Raven and client linq indexes, or: We hate strings, too

May 15 2010

When I designed Raven, one of the things that was very clear to me was that I wanted to take advantage of existing features in the framework to the greatest degree possible. This means that I don’t have to reinvent the wheel, and it means that Raven’s users will be able to understand what is going on and how to use Raven much more easily.

One of those decisions was to use Linq as the format for defining indexes. All other document databases use Javascript as their index definition format (yes, I know you can use Ruby for CouchDB, but that isn’t the default / common approach). But .NET already has this nice syntax, it already gives me so much information out of the box, and users already know how to make Linq do all sorts of crazy stuff. That reduces support questions, not to mention that it is a sexy little feature.

We are running those Linq queries on the server (doing dynamic compilation and a bunch of other stuff). The problem is how to get them to the server. We have a really nice UI to do so, of course:

But when it comes right down to it, it is a couple of text boxes, and users may want to be able to define indexes via code. That is perfectly understandable, but it means that users have to do something like this:

documentStore.DatabaseCommands.PutIndex(
    "UsersByRegion",
    new IndexDefinition()
    {
        Map = @"from user in docs.Users select new {user.Region}",
    });

If you cringe when you see that, welcome to the club. This is still more or less okay, but indexes can get complicated, like this guy:

store.DatabaseCommands.PutIndex("GameEventCountZoneBySpecificCharacter",
                    new IndexDefinition
                    {
                        Map = @"from doc in docs where doc.DataUploadId != null 
                && doc.RealmName != null 
                && doc.Region != null 
                && doc.CharacterName != null 
                && doc.Zone != null 
                && doc.SubZone != null
    select new
    {
        DataUploadId = doc.DataUploadId,
        RealmName = doc.RealmName,
        Region = doc.Region,
        CharacterName = doc.CharacterName,
        Zone = doc.Zone,
        Count = 1
    };",
                        Reduce = @"from result in results
        group result by new
        {
            DataUploadId = result.DataUploadId,
            RealmName = result.RealmName,
            Region = result.Region,
            CharacterName = result.CharacterName,
            Zone = result.Zone
        } into g
        select new
        {
            DataUploadId = g.Key.DataUploadId,
            RealmName = g.Key.RealmName,
            Region = g.Key.Region,
            CharacterName = g.Key.CharacterName,
            Zone = g.Key.Zone,
            Count = g.Sum(x => x.Count) 
        };"});

I could tolerate the first example, but I couldn’t tolerate this one. I spent a lot of time exploring how to set things up so you’ll be able to use Linq (including all the usual strong typing goodies) on the client to define server indexes. The short answer is that a complete solution would take about a month of development. A workable solution would be to serialize the expression tree over the wire, but that would produce the following index definition on the server:

That is not an acceptable option for me.

But if we limit the scope to “just get it working and accept that it won’t handle 100% of the cases”, the problem becomes much easier, and we can now do this:

documentStore.DatabaseCommands.PutIndex("UsersByLocation",
    new IndexDefinition<LinqIndexesFromClient.User>
    {
        Map = users => from user in users
                       select new { user.Region }
    });

And that would show up on the server as:

It isn’t the original expression, but it is clear enough, I think.
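For readers who have not worked with expression trees, here is a minimal sketch of the mechanism a typed index definition can build on: assigning a lambda to an Expression<...> makes the C# compiler produce a data structure describing the query instead of a compiled delegate, and that structure can be walked on the client. This is not Raven's translation code. The User class, the MapInspector visitor and the ExpressionSketch wrapper are hypothetical names used only to show what information is available in the tree.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical document type, for illustration only.
public class User
{
    public string Name { get; set; }
    public string Region { get; set; }
}

// Hypothetical visitor: records every method call and member access in the tree.
public class MapInspector : ExpressionVisitor
{
    public readonly List<string> Methods = new List<string>();
    public readonly List<string> Members = new List<string>();

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        Methods.Add(node.Method.Name);
        return base.VisitMethodCall(node);
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        Members.Add(node.Member.Name);
        return base.VisitMember(node);
    }
}

public static class ExpressionSketch
{
    public static MapInspector Inspect()
    {
        // Because the target type is Expression<...>, the query below is
        // captured as an expression tree and is never executed here.
        Expression<Func<IEnumerable<User>, IEnumerable>> map =
            users => from user in users
                     select new { user.Region };

        var inspector = new MapInspector();
        inspector.Visit(map);
        return inspector;
    }

    public static void Main()
    {
        var inspector = Inspect();
        Console.WriteLine(string.Join(", ", inspector.Methods)); // Select
        Console.WriteLine(string.Join(", ", inspector.Members)); // Region
    }
}
```

The query syntax has already been lowered to a Select call by the time the tree exists, which is one reason text regenerated from a tree will not match the original source character for character.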

What about that monster query? We can now write it like this:

documentStore.DatabaseCommands.PutIndex("GameEventCountZoneBySpecificCharacter",
    new IndexDefinition<Game.GameEvent, Game.GameEventCount>
    {
        Map = docs =>
            from doc in docs
            where doc.DataUploadId != null
                && doc.RealmName != null
                && doc.Region != null
                && doc.CharacterName != null
                && doc.Zone != null
                && doc.SubZone != null
            select new
            {
                doc.DataUploadId,
                doc.RealmName,
                doc.Region,
                doc.CharacterName,
                doc.Zone,
                Count = 1
            },
        Reduce = results => from result in results
                            group result by new
                            {
                                result.DataUploadId,
                                result.RealmName,
                                result.Region,
                                result.CharacterName,
                                result.Zone
                            }
                            into g
                            select new
                            {
                                g.Key.DataUploadId,
                                g.Key.RealmName,
                                g.Key.Region,
                                g.Key.CharacterName,
                                g.Key.Zone,
                                Count = g.Sum(x => x.Count)
                            }
    });

Well, this will turn into this:

It isn’t as nice as the original query, I’ll admit, but it is still highly readable.

And yes, it takes scary code to get there :-)
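The post does not show the Game.GameEvent and Game.GameEventCount types used as type arguments above. Going by the property names in the query and by Rob Ashton's description in the comments (GameEventCount inherits from GameEvent and adds a single property), they might look roughly like the sketch below. The property types, the TypedIndexSketch class and the sample zone names are assumptions, and the map/reduce is cut down to a single grouping key and run with plain LINQ to Objects purely to show that the typed query compiles against these classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Game
{
    // Assumed document shape; the property types are guesses.
    public class GameEvent
    {
        public string DataUploadId { get; set; }
        public string RealmName { get; set; }
        public string Region { get; set; }
        public string CharacterName { get; set; }
        public string Zone { get; set; }
        public string SubZone { get; set; }
    }

    // Adds the one extra property that the reduce step sums over.
    public class GameEventCount : GameEvent
    {
        public int Count { get; set; }
    }
}

public static class TypedIndexSketch
{
    // Cut-down map/reduce over in-memory objects, grouped by Zone only.
    public static Dictionary<string, int> CountByZone(IEnumerable<Game.GameEvent> docs)
    {
        IEnumerable<Game.GameEventCount> mapped =
            from doc in docs
            where doc.Zone != null
            select new Game.GameEventCount { Zone = doc.Zone, Count = 1 };

        var reduced =
            from result in mapped
            group result by result.Zone into g
            select new { Zone = g.Key, Count = g.Sum(x => x.Count) };

        return reduced.ToDictionary(r => r.Zone, r => r.Count);
    }

    public static void Main()
    {
        // Made-up sample data.
        var docs = new[]
        {
            new Game.GameEvent { Zone = "Elwynn" },
            new Game.GameEvent { Zone = "Elwynn" },
            new Game.GameEvent { Zone = "Durotar" },
            new Game.GameEvent { Zone = null }
        };

        foreach (var pair in CountByZone(docs))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

Renaming a property on GameEvent here would break the query at compile time, which is the kind of checking the string-based definitions cannot offer.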

Tags:

Raven

Comments

15 May 2010
09:17 AM

Ngoc Van Tran

I don't know why, but I actually use the extension methods more than direct Linq syntax. It just feels more natural to me. So your approach is absolutely beautiful.

15 May 2010
10:15 AM

Demis Bellot

I'm with @Ngoc, I actually prefer extension method syntax to pure LINQ. The fact that we can use either is just a testament to how well-designed, powerful and expressive LINQ is. Big kudos to Erik Meijer for this killer feature.

15 May 2010
10:29 AM

J Healy

Really? A whole 'EiniMonth'? Hmmm, I can't shake the feeling there is a relativistic time dilation associated with any such measure.

15 May 2010
10:33 AM

Ayende Rahien

J Healy,

Yes, a month. Linq is freaking complicated.

15 May 2010
14:55 PM

Frank Quednau

The second version of that monster query...are Game.GameEvent, Game.GameEventCount the types that are used to make the LINQ-Queries strong-typed? So, they could then become out of sync with the actual documents (even though the same goes for the string-style indexes, I suppose)? Are they merely placeholders, to be defined by a Raven user, or...

15 May 2010
15:05 PM

Rob Ashton

In that last query, GameEvent is the document itself, and yes, GameEventCount is a separate class to help with the Linq, but it only has one property, as it inherits from GameEvent to get all the other ones.

Thus no out of sync problems to worry about.

15 May 2010
21:03 PM

Guillaume

Is there any plan to have IQueryable support for client-side queries? Or to have a Count method in IDocumentQuery<T>?

15 May 2010
21:36 PM

Ayende Rahien

Guillaume,

IDocumentQuery has a TotalResults property that you can access.

As for IQueryable, probably, but it is a low priority item at the moment.

16 May 2010
08:53 AM

Max

Why not just pretty-print the expression tree when you show it to the user? That seems like a more direct and robust solution..

16 May 2010
08:56 AM

Ayende Rahien

Max,

Show me the code to do so, and the resulting output

17 May 2010
10:54 AM

Matt Freeman

Looking at the RavenDB code, where should I start if I wanted to extend it on the server side, e.g. drop in mymapreduce.dll? I don't think I'll ever be able to serialize my map reduce functions.

17 May 2010
11:48 AM

Ayende Rahien

Matt,

Look at the CompiledIndex test

Oren Eini

CEO of RavenDB