RavenDB – Live projections, or how to do joins in a non relational database
This feature has me really excited, because it solves a pretty big problem and it does so in a really elegant fashion.
Let us start from the beginning. Documents are independent, which means that processing a single document should not require loading additional documents. This, in turn, leads to denormalization, so we can keep the data that we need about our associations in the same document.
The problem, of course, is that there are many cases where this denormalization is annoying. In particular, it means that you have to take responsibility for handling the updates to the denormalized data. There are good reasons to want to do that, particularly if you are working on a sharded data store. And yet… many people don’t run in a sharded store, so why make them pay for that?
I found it hard to answer, especially since we already introduced the includes feature. For a while, I thought that just doing the include was enough, but then I got into a discussion with Rob about this. And we had the chance to talk about projections vs. documents. We agreed that we wanted a solution, and we started throwing a lot of crazy ideas (multi sourced, background updated, materialized views – to name one) around. Until we finally realized that we were being really stupid. The data is already there!
It was just that I was too blind to see how we can push it out.
Let us talk in code for a minute, since it would be easier to demonstrate how things work:
using (var s = ds.OpenSession()) { var entity = new User { Name = "Ayende" }; s.Store(entity); s.Store(new User { Name = "Oren", AliasId = entity.Id }); s.SaveChanges(); }
This creates two documents and links between them.
Now, let us say that we want to display the following grid:
User Alias Oren Ayende
Well, we need to query for all the users that has an alias, then include the associated document. Something like this:
var usersWithAliases = from user in session.Query<User>().Include(x=>x.AliasId) where user.AliasId != null select user; var results = new List<UserAndAlias>(); foreach(var user in usersWithAliases) { results.Add( new UserAndAlias { User = user.Name Alias = session.Load<User>(user.AliasId).Name } ); }
Here is the deal, this is very efficient in terms of calling the database only once, but it does means that we are passing the full document back, which may be something that we may not want to do.
Not to mention that there is a whole lot of code here.
Okay, so far we have introduced the problem. Let us see how we can solve it. We can do that by applying a live projection at the server side. A live projection transforms the results of a query on the server side, and it has access to other documents as well. Let us see what I mean by that:
public class Users_ByAlias : AbstractIndexCreationTask<User> { public Users_ByAlias() { Map = users => from user in users select new {user.AliasId}; TransformResults = (database, users) => from user in users let alias= database.Load<User>(user.AliasId) select new {Name = user.Name, Alias = alias.Name}; } }
It is important to understand exactly what is going on here. The TransformResults will be executed on the results on the query, which gives it the change to modify, extend or filter them. In this case, it gives you the ability to look at data from another document.
For the DB guys among you, this performs a nested loop join.
Now, we can just write:
var usersWithAliases = (from user in session.Query<User, Users_ByAlias>() where user.AliasId != null select user).As<UserAndAlias>();
This will query the index, transform the results on the server side, and give us the UserAndAlias colelction that we can just use.
Did I mention that I am really excited about this feature?
Comments
Very useful. This can probably be materialized as well.
This is one of my favourite new things ever :D :D (Okay, I'm still a little excited too!!)
wow, that is a really useful feature! Ayende: respect;)
Nice.
How do you end up with the UserAndAlias type at the end when your transform's just creating an anonymous type?
Is the index persisted and maintained or just created for this one query and, if it is persisted, what happens when you update the code for the index creation task?
Derek,
The TransformResults function is called when you query the index, it is not persisted.
And the fact that you can do that is simply because the anon type & the UserAndAlias has the shape shape
Ayende: just a question about classes naming. Users_ByAlias: do you recommend naming classes with an underscore?
For indexes, yes.
Because the naming conventions turns them into: "Users/ByAlias"
I might be misreading this, but isn't this generating an 1+N query? While the good part is that it happens server side, so you avoid network roundtrips, it will still generate N additional reads.
Alex,
Yes & No.
What this actually generate is a load by ID, which is about as fast as you can get it.
As I mentioned, this is basically a nested loop join
Is it enforced to be get by key? (basically I'm wondering if someone would actually be able to shhot himself in the foot with this feature, by emiting non key subqueries)
Alex,
Yes, you can only query by PK
Comment preview