Implementing RavenDB Indexes

I got a couple of interesting questions about RavenDB's implementation, and I thought they would make a good blog post.

Is it somewhat correct to say: When doing a map and reduce you use a dynamic or static query-index (not sure what you call them) that has compiled a c# class which will be used when deserializing the JSON to this class. This is done at server side, in memory, right? You run the query against the Lucene indexes and then deserialize the JSON and apply the projection/reduce to create the result?

No, this isn’t the case. We take the LINQ expression that makes up the statement and compile it into a class. That class doesn’t represent the documents we index; it represents the indexing operation. It is probably easier to explain with an example. Here is a simple index definition:

from user in docs.Users select new { user.Name }

RavenDB is going to take this code and translate it into something like this:

using Raven.Abstractions;
using Raven.Database.Linq;
using System.Linq;
using System.Collections.Generic;
using System.Collections;
using System;
using Raven.Database.Linq.PrivateExtensions;
using Lucene.Net.Documents;
public class Index_MyIndex : AbstractViewGenerator
{
    public Index_MyIndex()
    {
        this.ViewText = @"from user in docs.Users select new { user.Name }";
        this.ForEntityNames.Add("Users");
        this.AddMapDefinition(docs => from user in docs
            where user["@metadata"]["Raven-Entity-Name"] == "Users"
            select new { user.Name, __document_id = user.__document_id });
        this.AddField("__document_id");
        this.AddField("Name");
        this.AddQueryParameterForMap("__document_id");
        this.AddQueryParameterForMap("Name");
        this.AddQueryParameterForReduce("__document_id");
        this.AddQueryParameterForReduce("Name");
    }
}

There is a lot going on here, but most of it is just internal bookkeeping for RavenDB. The important part is the AddMapDefinition call. You can see that we have taken the index definition, processed it a bit (adding the filter on Raven-Entity-Name and the __document_id field), and then treated it as a lambda. That is how RavenDB is able to go from an index defined as text to code that processes that index in memory.

Note that this has nothing whatsoever to do with deserialization. That is handled by another part of RavenDB, where we use the dynamic feature (along with a host of other stuff) to make it possible to run LINQ queries over schemaless data.

Another thing to remember is that indexing runs over the documents stored in the database (input), and the results go to Lucene (output). We never read information from Lucene as input for an index.
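
To make that direction of flow concrete, here is a minimal sketch. It is not RavenDB's code: MapIndexSketch, IndexingSketch, the Collection field (standing in for the Raven-Entity-Name metadata) and the plain list used in place of Lucene are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;

// Hypothetical stand-in for a compiled index: it only holds the map lambda.
public class MapIndexSketch
{
    public Func<IEnumerable<dynamic>, IEnumerable<dynamic>> Map { get; set; }
}

public static class IndexingSketch
{
    // Builds a schemaless document. "Collection" is an assumed field name
    // standing in for the Raven-Entity-Name metadata.
    public static dynamic Doc(string id, string collection, string name)
    {
        dynamic doc = new ExpandoObject();
        doc.Id = id;
        doc.Collection = collection;
        doc.Name = name;
        return doc;
    }

    // The map from the post, written directly as a lambda over dynamic documents.
    public static MapIndexSketch UsersByName()
    {
        return new MapIndexSketch
        {
            Map = docs => from user in docs
                          where user.Collection == "Users"
                          select new { user.Name, __document_id = user.Id }
        };
    }

    // Input: stored documents. Output: entries appended to indexOutput,
    // which stands in for the Lucene index. Nothing is read back from it.
    public static void Run(MapIndexSketch index,
                           IEnumerable<dynamic> storedDocs,
                           List<object> indexOutput)
    {
        foreach (object entry in index.Map(storedDocs))
        {
            indexOutput.Add(entry);
        }
    }
}
```

Calling IndexingSketch.Run with a mix of Users and Orders documents adds one entry per Users document to the output list and never consults that list while mapping.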

Could you describe how a drop of a property and a rename of a property will affect the Query-indexes?

If the index isn’t modified, it will try to index a missing property, which is basically a no-op. In fact, we can even do nested indexing into a missing property and still have no issues, because the index code inside RavenDB uses Null Objects for most things. You can write something like user.HelloWorld.NiceToMeeYou.Too, and it is basically translated into a “don’t index me” value instead of throwing.
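
The Null Object idea can be sketched with C#'s DynamicObject. This is only an illustration of the pattern, not RavenDB's actual classes: NullObjectSketch, DocumentSketch and MissingPropertySketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical null object: any member access on it returns itself,
// so arbitrarily deep chains never throw.
public sealed class NullObjectSketch : DynamicObject
{
    public static readonly NullObjectSketch Instance = new NullObjectSketch();

    private NullObjectSketch() { }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        result = this;
        return true;
    }
}

// Hypothetical schemaless document: unknown properties yield the null object
// instead of raising a binder exception.
public sealed class DocumentSketch : DynamicObject
{
    private readonly IDictionary<string, object> _values;

    public DocumentSketch(IDictionary<string, object> values)
    {
        _values = values;
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        if (!_values.TryGetValue(binder.Name, out result))
        {
            result = NullObjectSketch.Instance;
        }
        return true;
    }
}

public static class MissingPropertySketch
{
    // True when a mapped value should be skipped rather than indexed.
    public static bool ShouldSkip(object value)
    {
        return value == null || ReferenceEquals(value, NullObjectSketch.Instance);
    }
}
```

Given dynamic user = new DocumentSketch(...) with only a Name value, an access such as user.HelloWorld.Too evaluates to NullObjectSketch.Instance, and ShouldSkip reports that it should not be indexed.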

For more general information about RavenDB migrations, you can see: