Linq queries with RavenDB
People seem to be puzzled by my post about ad-hoc queries with RavenDB, mostly because I posted code from the very early experiments.
Basically, the main problem with supporting arbitrary Linq queries with RavenDB is outlined in this post: I have to compile a Linq query to an assembly, and assemblies can’t be unloaded (we aren’t talking about dynamic assemblies, which can) except by unloading the whole app domain. That meant that supporting arbitrary Linq queries essentially opened up a memory leak, which is why we didn’t implement it for a long time.
But the feature was so… shiny that I really couldn’t let it go. I tried serializing the data from storage across the app domain boundary, but that turned out to be prohibitive from a performance perspective. Then it occurred to me that if Mohammad can't go to the mountain, let the mountain come to Mohammad.
RavenDB uses Esent under the covers to handle storage. That means that the storage itself doesn’t really care about app domains, since it isn't managed code. Once I had that idea, it was very simple to write the rest of the code, which sets up querying in a separate app domain and tears it down once memory usage becomes too high.
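To make the set-up and tear-down policy concrete, here is a minimal sketch of the idea. It is not the RavenDB code: QueryHost and RecyclingQueryRunner are hypothetical names, the host is a plain object standing in for the separate app domain, and memory usage is faked by counting query characters. On the .NET Framework the real thing would presumably create the domain with AppDomain.CreateDomain and drop it with AppDomain.Unload.

```csharp
using System;

// Hypothetical stand-in for the separate app domain that runs the queries.
public sealed class QueryHost
{
    public long MemoryUsed { get; private set; }

    // Pretend every query leaves its compiled assembly behind in the host.
    public string Run(string query)
    {
        MemoryUsed += query.Length;
        return "ran: " + query;
    }
}

// Hypothetical wrapper that owns the host and recycles it when it grows too big.
public sealed class RecyclingQueryRunner
{
    private readonly long _memoryLimit;
    private QueryHost _host;

    public int Recycles { get; private set; }

    public RecyclingQueryRunner(long memoryLimit)
    {
        _memoryLimit = memoryLimit;
    }

    public string Run(string query)
    {
        // Set the host up lazily, the first time a query arrives.
        if (_host == null)
            _host = new QueryHost();

        var result = _host.Run(query);

        // Tear the host down once it has grown past the limit;
        // the next query gets a fresh one.
        if (_host.MemoryUsed > _memoryLimit)
        {
            _host = null;
            Recycles++;
        }

        return result;
    }
}
```

The point of the design is that the leak is never fixed, only bounded: the host is allowed to grow, and the whole thing is thrown away when it gets too big.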
Note that the purpose of this feature is mainly for testing / exporting / migrating data.
The code in the previous post was the very first attempt to prove that this can be done. This is what it looks like in practice:
[Fact]
public void CanPerformQueryToSelectSingleItem()
{
    db.Put("ayende", null, JObject.FromObject(new {Name = "Ayende"}), new JObject(), null);

    var result = db.ExecuteQueryUsingLinearSearch(new LinearQuery
    {
        Query = "from doc in docs select new { doc.Name }"
    });

    Assert.Empty(result.Errors);
    Assert.Equal(@"{""Name"":""Ayende"",""__document_id"":""ayende""}",
        result.Results[0].ToString(Formatting.None));
}
We also integrated that into the HTTP API and WebUI, including all the usual taxes such as error handling, paging, etc.
What about the name? ExecuteQueryUsingLinearSearch is a pretty hefty method name. That is quite intentional. While most queries in RavenDB are done using indexes, this performs a linear search across all documents. The performance is O(N), so this isn’t really something that you want to run in production (in the same sense that you don’t want table scans in a production RDBMS).
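For a feel of the difference, here is a small sketch that contrasts the two approaches. It is an illustration only, using a hypothetical in-memory dictionary in place of the document store; none of these names come from RavenDB.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinearSearchSketch
{
    // Hypothetical in-memory stand-in for the document store: document id -> Name.
    public static readonly Dictionary<string, string> Docs = new Dictionary<string, string>
    {
        { "ayende", "Ayende" },
        { "oren", "Oren" },
    };

    // Linear search: every document is visited on every query, so the cost is O(N).
    public static List<string> FindIdsByNameLinear(string name)
    {
        return Docs.Where(doc => doc.Value == name)
                   .Select(doc => doc.Key)
                   .ToList();
    }

    // Index: built once ahead of time (Name -> ids), after which a query
    // is a lookup by key instead of a scan over all documents.
    public static ILookup<string, string> BuildNameIndex()
    {
        return Docs.ToLookup(doc => doc.Value, doc => doc.Key);
    }
}
```

Calling LinearSearchSketch.FindIdsByNameLinear("Ayende") walks the whole dictionary to find "ayende", while BuildNameIndex()["Ayende"] pays the scan once up front and answers each later query with a lookup.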
Comments
s/since it is managed code/since it isn't managed code
Thanks, fixed
I'm curious about how LinqPad implements that
@Hendry, I verified with Process Explorer that LINQPad compiles queries to a separate AppDomain. But looking back at Ayende's first post on the subject, you'll note he was still able to transform upwards of 12000 documents per second using (what I assume is) vanilla remoting. That's plenty fast enough for an app like LINQPad.
Dathan,
Sometimes it isn't the perf that matters, but the purity of the solution.
For that matter, LinqPad could probably do WITHOUT the app domain, since each asm is tiny, and it would take literally thousands of queries in a single run of the app to create significant memory pressure