Linq queries with RavenDB


People seem to be puzzled by my post about ad-hoc queries with RavenDB, mostly because I posted code from the very early experiments.

Basically, the main problem with supporting arbitrary Linq queries with RavenDB is outlined in this post: I have to compile each Linq query to an assembly, and assemblies can’t be unloaded except by unloading the whole app domain (we aren’t talking about dynamic assemblies, which can). That meant that supporting arbitrary Linq queries was essentially opening up a memory leak, which is why we didn’t implement it for a long time.

But the feature was so… shiny that I really couldn’t let it go. I tried testing it by serializing the data from storage across the app domain boundary, but that turned out to be prohibitive from a performance perspective. Then it occurred to me that if Mohammad can't go to the mountain, let the mountain come to Mohammad.

RavenDB uses Esent under the covers to handle storage. That means that the storage itself doesn’t really care about app domains, since it isn't managed code. Once I had that idea, it was very simple to write the rest of the code, which sets up querying in a separate app domain and tears it down once memory usage becomes too high.
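To make the app domain recycling idea concrete, here is a minimal sketch of the general technique, not the actual RavenDB code. The names QueryRunner, RecyclingQueryHost and the "LinearQueries" domain name are made up for illustration, the Run method is a placeholder for the real compile-and-scan work, and the allocation counter is only a rough stand-in for however memory usage is really measured. It relies on the AppDomain API of the full .NET Framework (4.0 or later for the monitoring properties).

```csharp
using System;

// Hypothetical worker that lives inside the throwaway app domain.
// Deriving from MarshalByRefObject means the host talks to it through a proxy.
public class QueryRunner : MarshalByRefObject
{
    // Returning null opts out of the remoting lease, so the proxy does not expire while idle.
    public override object InitializeLifetimeService()
    {
        return null;
    }

    public string Run(string query)
    {
        // Placeholder: the real work would compile the Linq query to an assembly
        // (loaded into THIS domain) and run it against storage.
        return AppDomain.CurrentDomain.FriendlyName + ": " + query;
    }
}

// Hypothetical host: creates the query domain lazily and unloads it past a threshold.
public class RecyclingQueryHost : IDisposable
{
    private readonly long memoryLimitInBytes; // assumed threshold, chosen by the caller
    private AppDomain domain;
    private QueryRunner runner;

    public RecyclingQueryHost(long memoryLimitInBytes)
    {
        this.memoryLimitInBytes = memoryLimitInBytes;
        AppDomain.MonitoringIsEnabled = true; // process-wide switch, cannot be turned off again
    }

    public string Execute(string query)
    {
        if (domain == null)
        {
            domain = AppDomain.CreateDomain(
                "LinearQueries", null, AppDomain.CurrentDomain.SetupInformation);
            runner = (QueryRunner)domain.CreateInstanceAndUnwrap(
                typeof(QueryRunner).Assembly.FullName,
                typeof(QueryRunner).FullName);
        }

        var result = runner.Run(query);

        // Rough proxy for "memory usage became too high": bytes allocated by the domain so far.
        if (domain.MonitoringTotalAllocatedMemorySize > memoryLimitInBytes)
            Dispose(); // unloading the domain also releases every assembly loaded into it

        return result;
    }

    public void Dispose()
    {
        if (domain == null) return;
        AppDomain.Unload(domain);
        domain = null;
        runner = null;
    }
}
```

The next call to Execute after a recycle simply creates a fresh domain, so callers never see the teardown. Only the query string and the result cross the app domain boundary here; the documents themselves would be read from storage inside the query domain.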

Note that the purpose of this feature is mainly for testing / exporting / migrating data.

The code in the previous post was the very first attempt to prove that this can be done. This is what it looks like in practice:

[Fact]
public void CanPerformQueryToSelectSingleItem()
{
    db.Put("ayende", null, JObject.FromObject(new {Name = "Ayende"}), new JObject(), null);
    
    var result = db.ExecuteQueryUsingLinearSearch(new LinearQuery
    {
        Query = "from doc in docs select new { doc.Name }"
    });

    Assert.Empty(result.Errors);
    Assert.Equal(@"{""Name"":""Ayende"",""__document_id"":""ayende""}", result.Results[0].ToString(Formatting.None));
}

We also integrated that into the HTTP API and WebUI, including all the usual taxes such as error handling, paging, etc.

What about the name? ExecuteQueryUsingLinearSearch is a pretty hefty method name. That is quite intentional. While most queries in RavenDB are done using indexes, this performs a linear search across all documents. The performance is O(N), so this isn’t really something that you want to run in production (in the same sense that you don’t want table scans in a production RDBMS).
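The difference between a linear search and an indexed lookup can be shown with a small in-memory analogy. This is only an illustration of the cost model and not how RavenDB's indexes are implemented; the ScanVersusIndex class and the generated document ids and names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScanVersusIndex
{
    // Linear search: the comparison runs once per document, so the cost grows with N.
    public static List<string> Scan(
        IList<KeyValuePair<string, string>> docs, string name, out int visited)
    {
        visited = 0;
        var ids = new List<string>();
        foreach (var doc in docs)
        {
            visited++;
            if (doc.Value == name) ids.Add(doc.Key);
        }
        return ids;
    }

    // Index: built once up front, after which a lookup does not touch unrelated documents.
    public static ILookup<string, string> BuildIndex(
        IEnumerable<KeyValuePair<string, string>> docs)
    {
        return docs.ToLookup(d => d.Value, d => d.Key);
    }

    public static void Main()
    {
        // Made-up data: document id -> Name.
        var docs = Enumerable.Range(0, 100000)
            .Select(i => new KeyValuePair<string, string>("docs/" + i, "Name" + i))
            .ToList();

        int visited;
        var scanned = Scan(docs, "Name42", out visited);
        var indexed = BuildIndex(docs)["Name42"].ToList();

        Console.WriteLine(visited);                        // 100000
        Console.WriteLine(scanned.SequenceEqual(indexed)); // True
    }
}
```

Both approaches return the same document, but the scan has to visit every document on every query, which is the table scan behavior the long method name is meant to warn about.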