Raven and client linq indexes, or: We hate strings, too

When I designed Raven, one of the things that was very clear to me was that I wanted to take advantage of existing features in the framework to the greatest degree possible. This means that I don’t have to reinvent the wheel, and it means that Raven’s users will be able to understand what is going on and how to utilize Raven much more easily.

One of those decisions was to use Linq as the format for defining indexes. All other document databases use Javascript as their index definition format (yes, I know you can use Ruby for CouchDB, but that isn’t the default / common approach). But .NET already has this nice syntax, it already gives me so much information out of the box, and users already know how to make linq do all sorts of crazy stuff. That reduces support questions, not to mention that it is a sexy little feature.

We are running those linq queries on the server (doing dynamic compilation and a bunch of other stuff). The problem is how to get them to the server. We have a really nice UI to do so, of course:

image

But when it comes right down to it, it is a couple of text boxes, and users may want to be able to define indexes via code. That is perfectly understandable, but it means that users have to do something like this:

documentStore.DatabaseCommands.PutIndex(
    "UsersByRegion",
    new IndexDefinition()
    {
        Map = @"from user in docs.Users select new {user.Region}",
    });

If you cringe when you see that, welcome to the club. This is still more or less okay, but indexes can get complicated, like this guy:

store.DatabaseCommands.PutIndex("GameEventCountZoneBySpecificCharacter",
    new IndexDefinition
    {
        Map = @"from doc in docs
    where doc.DataUploadId != null
        && doc.RealmName != null
        && doc.Region != null
        && doc.CharacterName != null
        && doc.Zone != null
        && doc.SubZone != null
    select new
    {
        DataUploadId = doc.DataUploadId,
        RealmName = doc.RealmName,
        Region = doc.Region,
        CharacterName = doc.CharacterName,
        Zone = doc.Zone,
        Count = 1
    };",
        Reduce = @"from result in results
    group result by new
    {
        DataUploadId = result.DataUploadId,
        RealmName = result.RealmName,
        Region = result.Region,
        CharacterName = result.CharacterName,
        Zone = result.Zone
    } into g
    select new
    {
        DataUploadId = g.Key.DataUploadId,
        RealmName = g.Key.RealmName,
        Region = g.Key.Region,
        CharacterName = g.Key.CharacterName,
        Zone = g.Key.Zone,
        Count = g.Sum(x => x.Count)
    };"
    });

I could tolerate the first example, but I couldn’t tolerate this one. I spent a lot of time exploring how I could get things set up so you’ll be able to use linq (including all the usual strong typing goodies) on the client to define server indexes. The short answer is that a complete solution would take about a month of development. A workable solution could be built by serializing the expression tree over the wire. But that would produce the following index definition on the server:

image

That is not an acceptable option for me.
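To give a feel for why a raw expression tree reads so poorly, here is a rough sketch (not the output from the screenshot, and not Raven code) that builds a tiny map expression and looks at what the compiler actually produced. The `User` class and the `ExpressionTreePeek` helper are made up for illustration, and the delegate shape `Func<IEnumerable<User>, IEnumerable>` is my assumption about what a typed map would look like.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical document type, used only for this illustration.
public class User
{
    public string Region { get; set; }
}

public static class ExpressionTreePeek
{
    // The query syntax below is compiled into a call to Enumerable.Select;
    // the expression tree never contains the "from ... select" text itself.
    public static Expression<Func<IEnumerable<User>, IEnumerable>> BuildMap()
    {
        return users => from user in users
                        select new { user.Region };
    }

    // Returns the outermost method call in the map, skipping any
    // conversion node the compiler may have wrapped around it.
    public static MethodCallExpression RootCall(LambdaExpression map)
    {
        Expression body = map.Body;
        while (body.NodeType == ExpressionType.Convert)
            body = ((UnaryExpression)body).Operand;
        return (MethodCallExpression)body;
    }

    public static void Main()
    {
        var map = BuildMap();
        var call = RootCall(map);

        // prints "Enumerable.Select"
        Console.WriteLine(call.Method.DeclaringType.Name + "." + call.Method.Name);

        // The default string form is a method chain that mentions the
        // compiler-generated anonymous type, which is hard on the eyes.
        Console.WriteLine(map.ToString());
    }
}
```

Anything that ships the tree as-is has to expose this method-chain form, which is a long way from the query the user typed.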

But if we limit the scope to “just get it working and accept that it won’t handle 100% of the cases”, the problem becomes much easier, and we can now do this:

documentStore.DatabaseCommands.PutIndex("UsersByLocation",
    new IndexDefinition<LinqIndexesFromClient.User>
    {
        Map = users => from user in users
                       select new { user.Region }
    });

And that would show up on the server as:

image

It isn’t the original expression, but it is clear enough, I think.
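The general idea of "limit the scope" can be sketched in a few lines. This is emphatically not the code Raven uses; it is a minimal illustration that handles exactly one shape (a single `select new { x.A, x.B }`) and throws on everything else. `IndexTextSketch`, `ToIndexText`, and the `User` class are hypothetical names, and the `docs.<Collection>` prefix is borrowed from the string-based index shown earlier.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical document type, used only for this illustration.
public class User
{
    public string Name { get; set; }
    public string Region { get; set; }
}

public static class IndexTextSketch
{
    // Turns `from x in source select new { x.A, x.B }` back into query text.
    // Any other query shape is rejected rather than guessed at.
    public static string ToIndexText<TDoc>(
        Expression<Func<IEnumerable<TDoc>, IEnumerable>> map, string collection)
    {
        // Skip a conversion node if the compiler wrapped the body in one.
        Expression body = map.Body;
        while (body.NodeType == ExpressionType.Convert)
            body = ((UnaryExpression)body).Operand;

        var call = body as MethodCallExpression;
        if (call == null || call.Method.Name != "Select")
            throw new NotSupportedException("This sketch only handles a single select.");

        // Arguments[0] is the source sequence, Arguments[1] is the selector lambda.
        var selector = call.Arguments[1] as LambdaExpression;
        var projection = selector == null ? null : selector.Body as NewExpression;
        if (projection == null || projection.Members == null)
            throw new NotSupportedException("Expected a projection to an anonymous type.");

        string item = selector.Parameters[0].Name;
        var fields = new List<string>();
        for (int i = 0; i < projection.Arguments.Count; i++)
        {
            // Only direct property access on the range variable is supported.
            var member = projection.Arguments[i] as MemberExpression;
            if (member == null || member.Expression != selector.Parameters[0])
                throw new NotSupportedException("Only simple property access is supported.");
            fields.Add(projection.Members[i].Name + " = " + item + "." + member.Member.Name);
        }

        return "from " + item + " in docs." + collection
             + " select new { " + string.Join(", ", fields) + " }";
    }
}

public static class Program
{
    public static void Main()
    {
        string text = IndexTextSketch.ToIndexText<User>(
            users => from user in users
                     select new { user.Region },
            "Users");
        Console.WriteLine(text);
    }
}
```

Note that the shorthand `new { user.Region }` comes back as `Region = user.Region`. The tree only records which member is assigned from which expression, so the regenerated text is equivalent to what was typed, not identical to it.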

What about that monster query? We can now write it like this:

documentStore.DatabaseCommands.PutIndex("GameEventCountZoneBySpecificCharacter",
    new IndexDefinition<Game.GameEvent, Game.GameEventCount>
    {
        Map = docs =>
            from doc in docs
            where doc.DataUploadId != null
                && doc.RealmName != null
                && doc.Region != null
                && doc.CharacterName != null
                && doc.Zone != null
                && doc.SubZone != null
            select new
            {
                doc.DataUploadId,
                doc.RealmName,
                doc.Region,
                doc.CharacterName,
                doc.Zone,
                Count = 1
            },
        Reduce = results => from result in results
                            group result by new
                            {
                                result.DataUploadId,
                                result.RealmName,
                                result.Region,
                                result.CharacterName,
                                result.Zone
                            }
                            into g
                            select new
                            {
                                g.Key.DataUploadId,
                                g.Key.RealmName,
                                g.Key.Region,
                                g.Key.CharacterName,
                                g.Key.Zone,
                                Count = g.Sum(x => x.Count)
                            }
    });

Well, this will turn into this:

image 

It isn’t as nice as the original query, I’ll admit, but it is still highly readable.

And yes, it takes scary code to get there :-)