RavenDB – Live projections, or how to do joins in a non relational database

Oct 14 2010

RavenDB – Live projections, or how to do joins in a non relational database

time to read 7 min | 1252 words

This feature has me really excited, because it solves a pretty big problem and it does so in a really elegant fashion.

Let us start from the beginning. Documents are independent, which means that processing a single document should not require loading additional documents. This, in turn, leads to denormalization, so we can keep the data that we need about our associations in the same document.

The problem, of course, is that there are many cases where this denormalization is annoying. In particular, it means that you have to take responsibility for handling the updates to the denormalized data. There are good reasons to want to do that, particularly if you are working on a sharded data store. And yet… many people don’t run in a sharded store, so why make them pay for that?

I found it hard to answer, especially since we already introduced the includes feature. For a while, I thought that just doing the include was enough, but then I got into a discussion with Rob about this. And we had the chance to talk about projections vs. documents. We agreed that we wanted a solution, and we started throwing a lot of crazy ideas (multi sourced, background updated, materialized views – to name one) around. Until we finally realized that we were being really stupid. The data is already there!

It was just that I was too blind to see how we can push it out.

Let us talk in code for a minute, since it would be easier to demonstrate how things work:

using (var s = ds.OpenSession())
{
    var entity = new User { Name = "Ayende" };
    s.Store(entity);
    s.Store(new User { Name = "Oren", AliasId = entity.Id });
    s.SaveChanges();
}

This creates two documents and links between them.

Now, let us say that we want to display the following grid:

User Alias

Oren Ayende

Well, we need to query for all the users that has an alias, then include the associated document. Something like this:

var usersWithAliases = from user in session.Query<User>().Include(x=>x.AliasId)
                       where user.AliasId != null
                       select user;


var results = new List<UserAndAlias>();

foreach(var user in usersWithAliases)
{
    results.Add(
        new UserAndAlias
        {
            User = user.Name
            Alias = session.Load<User>(user.AliasId).Name
        }
    );
}

Here is the deal, this is very efficient in terms of calling the database only once, but it does means that we are passing the full document back, which may be something that we may not want to do.

Not to mention that there is a whole lot of code here.

Okay, so far we have introduced the problem. Let us see how we can solve it. We can do that by applying a live projection at the server side. A live projection transforms the results of a query on the server side, and it has access to other documents as well. Let us see what I mean by that:

public class Users_ByAlias : AbstractIndexCreationTask<User>
{
    public Users_ByAlias()
    {
        Map =
            users => from user in users
                     select new {user.AliasId};

        TransformResults =
            (database, users) => from user in users
                                 let alias= database.Load<User>(user.AliasId)
                                 select new {Name = user.Name, Alias = alias.Name};
    }
}

It is important to understand exactly what is going on here. The TransformResults will be executed on the results on the query, which gives it the change to modify, extend or filter them. In this case, it gives you the ability to look at data from another document.

For the DB guys among you, this performs a nested loop join.

Now, we can just write:

var usersWithAliases = 
     (from user in session.Query<User, Users_ByAlias>()
     where user.AliasId != null
     select user).As<UserAndAlias>();

This will query the index, transform the results on the server side, and give us the UserAndAlias colelction that we can just use.

Did I mention that I am really excited about this feature?

Tweet Share Share 11 comments

Tags:

Raven

Comments

14 Oct 2010
09:06 AM

tobi

Very useful. This can probably be materialized as well.

14 Oct 2010
09:06 AM

Rob Ashton

This is one of my favourite new things ever :D :D (Okay, I'm still a little excited too!!)

14 Oct 2010
09:18 AM

json

wow, that is a really useful feature! Ayende: respect;)

14 Oct 2010
09:23 AM

Derek Fowler

Nice.

How do you end up with the UserAndAlias type at the end when your transform's just creating an anonymous type?

Is the index persisted and maintained or just created for this one query and, if it is persisted, what happens when you update the code for the index creation task?

14 Oct 2010
10:13 AM

Ayende Rahien

Derek,

The TransformResults function is called when you query the index, it is not persisted.

And the fact that you can do that is simply because the anon type & the UserAndAlias has the shape shape

14 Oct 2010
11:27 AM

json

Ayende: just a question about classes naming. Users_ByAlias: do you recommend naming classes with an underscore?

14 Oct 2010
11:30 AM

Ayende Rahien

For indexes, yes.

Because the naming conventions turns them into: "Users/ByAlias"

14 Oct 2010
22:20 PM

Alex Popescu

I might be misreading this, but isn't this generating an 1+N query? While the good part is that it happens server side, so you avoid network roundtrips, it will still generate N additional reads.

15 Oct 2010
01:04 AM

Ayende Rahien

Alex,

Yes & No.

What this actually generate is a load by ID, which is about as fast as you can get it.

As I mentioned, this is basically a nested loop join

15 Oct 2010
01:13 AM

Alex Popescu

Is it enforced to be get by key? (basically I'm wondering if someone would actually be able to shhot himself in the foot with this feature, by emiting non key subqueries)

15 Oct 2010
01:20 AM

Ayende Rahien

Alex,

Yes, you can only query by PK

Comment preview

Comments have been closed on this topic.

Oren Eini

Oren Eini

CEO of RavenDB