RavenDB Awesome Feature of the Day, Formatted Indexes
There is a chance that you’ll look at me strangely for calling this the “Feature of the Day”. But this is actually quite an important little feature.
Here is the deal, let us say that you have the following index:
public class Orders_Search : AbstractIndexCreationTask<Order, Orders_Search.ReduceResult>
{
    public class ReduceResult
    {
        public string Query { get; set; }
        public DateTime LastPaymentDate { get; set; }
    }

    public Orders_Search()
    {
        Map = orders => from order in orders
                        let lastPayment = order.Payments.LastOrDefault()
                        select new
                        {
                            Query = new object[]
                            {
                                order.FirstName,
                                order.LastName,
                                order.OrderNumber,
                                order.Email,
                                order.Email.Split('@'),
                                order.CompanyName,
                                order.Payments.Select(payment => payment.PaymentIdentifier),
                                order.LicenseIds
                            },
                            LastPaymentDate = lastPayment == null ? order.OrderedAt : lastPayment.At
                        };
    }
}
And you are quite happy with it. But that is the client-side perspective. We don’t have any types on the server, so you can’t just execute this there. Instead, we send a string representing the index to the server. That string is actually the output of the LINQ expression, which looks like this:
This is… somewhat hard to read, I think you’ll agree. So we had some minimal work done to improve this, and right now what you’ll get is (you’ll likely see it roll off the screen, that is expected):
docs.Orders
    .Select(order => new {order = order, lastPayment = order.Payments.LastOrDefault()})
    .Select(__h__TransparentIdentifier0 => new {Query = new System.Object []{__h__TransparentIdentifier0.order.FirstName, __h__TransparentIdentifier0.order.LastName, __h__TransparentIdentifier0.order.OrderNumber, __h__TransparentIdentifier0.order.Email, __h__TransparentIdentifier0.order.Email.Split(new System.Char []{'@'}), __h__TransparentIdentifier0.order.CompanyName, __h__TransparentIdentifier0.order.Payments
        .Select(payment => payment.PaymentIdentifier), __h__TransparentIdentifier0.order.LicenseIds}, LastPaymentDate = __h__TransparentIdentifier0.lastPayment == null ? __h__TransparentIdentifier0.order.OrderedAt : __h__TransparentIdentifier0.lastPayment.At})
This is still quite confusing, actually. But still better than the alternative.
As I said, it seems like a little thing, but those things are important. An index in its compiled form that is hard to understand for a user is a support issue for us. We needed to resolve this issue.
The problem is that source code beautifying is nontrivial. I started playing with parsers a bit, but it was all way too complex. Then I had an epiphany. I didn’t actually care about the code, I just wanted it sorted. There aren’t many C# code beautifiers around, but there are a lot for JavaScript.
I started with the code from http://jsbeautifier.org/, which Rekna Anker had already ported to C#. From there, it was a matter of making sure that, for my purposes, the code generated the right output. I had to teach it C# idioms such as @foo, null coalescing and lambda expressions, but that sounds harder than it actually was. With that done, we got this output:
docs.Orders.Select(order => new {
    order = order,
    lastPayment = order.Payments.LastOrDefault()
}).Select(__h__TransparentIdentifier0 => new {
    Query = new System.Object[] {
        __h__TransparentIdentifier0.order.FirstName,
        __h__TransparentIdentifier0.order.LastName,
        __h__TransparentIdentifier0.order.OrderNumber,
        __h__TransparentIdentifier0.order.Email,
        __h__TransparentIdentifier0.order.Email.Split(new System.Char[] { '@' }),
        __h__TransparentIdentifier0.order.CompanyName,
        __h__TransparentIdentifier0.order.Payments.Select(payment => payment.PaymentIdentifier),
        __h__TransparentIdentifier0.order.LicenseIds
    },
    LastPaymentDate = __h__TransparentIdentifier0.lastPayment == null ? __h__TransparentIdentifier0.order.OrderedAt : __h__TransparentIdentifier0.lastPayment.At
})
And this is actually much better. Still not good enough, mind; we can do better than that. It is a simple change:
docs.Orders.Select(order => new {
    order = order,
    lastPayment = order.Payments.LastOrDefault()
}).Select(this0 => new {
    Query = new System.Object[] {
        this0.order.FirstName,
        this0.order.LastName,
        this0.order.OrderNumber,
        this0.order.Email,
        this0.order.Email.Split(new System.Char[] { '@' }),
        this0.order.CompanyName,
        this0.order.Payments.Select(payment => payment.PaymentIdentifier),
        this0.order.LicenseIds
    },
    LastPaymentDate = this0.lastPayment == null ? this0.order.OrderedAt : this0.lastPayment.At
})
And now we have something far more readable.
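The rename from `__h__TransparentIdentifier0` to `this0` can be approximated with a single regular expression. What follows is only a sketch under stated assumptions, not RavenDB's actual implementation: the names `TransparentIdentifierRenamer` and `Rename` are made up for illustration, and the code assumes the compiler-generated names always follow the `__h__TransparentIdentifierN` pattern seen in the output above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of RavenDB): shortens compiler-generated
// transparent identifier names inside an index definition string.
public static class TransparentIdentifierRenamer
{
    // Assumption: generated names look like __h__TransparentIdentifier0, 1, 2, ...
    private static readonly Regex Generated =
        new Regex(@"\b__h__TransparentIdentifier(\d+)\b", RegexOptions.Compiled);

    public static string Rename(string indexDefinition)
    {
        // Keep the numeric suffix so that distinct identifiers stay distinct.
        return Generated.Replace(indexDefinition, "this$1");
    }
}
```

Calling `TransparentIdentifierRenamer.Rename(...)` on the index string before handing it to the beautifier would be one way to wire this in. Keeping the numeric suffix matters when a query has more than one `let` clause, since each one introduces its own generated identifier.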
Comments
Nice! Perhaps you could even leave out the "System." prefixes and "System.Object" altogether?
There are two changes I would make:
All in all, this is an awesome feature.
You could also use the code from ILSpy that transforms C# LINQ calls back into query expressions. (IntroduceQueryExpressions and CombineQueryExpressions transforms) Those two are purely syntactic transformations, they don't consume any additional information from previous decompiler stages.
Although pulling in a full-blown C# parser as a dependency might be overkill for this problem :)
Hi,
broken url... I can't see the url
http://ayende.com/blog/157665/data-virtualization-lazy-loading-stealth-pagingndash-whatever-you-want-to-call-it-herersquo-s-how-to-do-it-in-silverlight?key=f69eddad-8e64-4363-94ac-2da433d52515&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+AyendeRahien+%28Ayende+%40+Rahien%29
"Then I had an affiany." I believe the word you wanted was "epiphany"
One thing I don't quite get: In the reduce result, you specify Query as a string, but in the mapping it's clearly an array of objects. I thought that these had to match?
Also (to repeat one of your favorite lines) - What are you actually trying to do here? If this is just an index of all of those properties, why do you need the array of objects at all?
If I was to guess, it looks like the index is such that you can search across all of these fields at the same time? If so, is this the recommended approach, and is it written up somewhere that I can't seem to find?
@Matt
That index is from this blog post http://ayende.com/blog/152833/orders-search-in-ravendb.
And yes, the idea is that you can search across several fields at the same time.
Also it's not a Map/Reduce query, it's just using ReduceResult as the type for the shape of the Map output.
Roy, Good idea, I'll see if that can be made to work.
configurator,
1) Will be done. 2) Cannot really work. What happens if you already use x or y in your lambdas? this0 is much less likely to conflict.
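To make the naming concern concrete, here is a small sketch of a rename that refuses to reuse a name the user already has. It is an illustration only, not something the post says RavenDB does: `CollisionAwareRenamer` and `Rename` are hypothetical names, and the identifier scan is a deliberately crude regex that also counts words inside string literals (which only makes it more cautious).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of RavenDB): renames generated identifiers
// to thisN, falling back to _thisN, __thisN, ... if thisN is already taken.
public static class CollisionAwareRenamer
{
    private static readonly Regex Generated =
        new Regex(@"\b__h__TransparentIdentifier(\d+)\b");

    // Crude identifier scan; good enough to detect possible clashes.
    private static readonly Regex Identifier = new Regex(@"[A-Za-z_]\w*");

    public static string Rename(string source)
    {
        var used = new HashSet<string>(
            Identifier.Matches(source).Cast<Match>().Select(m => m.Value));

        var prefix = "this";
        // Prepend underscores until no existing name has the form prefix + digits.
        while (used.Any(name => IsPrefixPlusDigits(name, prefix)))
        {
            prefix = "_" + prefix;
        }

        return Generated.Replace(source, m => prefix + m.Groups[1].Value);
    }

    private static bool IsPrefixPlusDigits(string name, string prefix)
    {
        return name.Length > prefix.Length
            && name.StartsWith(prefix, System.StringComparison.Ordinal)
            && name.Substring(prefix.Length).All(char.IsDigit);
    }
}
```

A short, common name such as `x` would trip the collision check all the time, while `this0` almost never will, so in practice the fallback branch should rarely run.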
Daniel, We already have a dependency on NRefactory, although on the server, and not on the client, which is where this code is running. Any reference for how to use those two?
Matt, Thanks, typo fixed.
Regarding the results, RavenDB has an indexing model and a query model; they don't have to quite match from a types perspective, because we do a lot of funky stuff.
This index is explained here: http://ayende.com/blog/152833/orders-search-in-ravendb
I've extracted the query expression decompiler logic into a standalone program: https://gist.github.com/3414523
It might be a bit too aggressive though, sometimes it would be more readable to keep the lambdas around.
Thanks for the clarification on the search index. I actually have several places in my app where this technique will be useful. Thanks.
Daniel, Thanks, looks awesome. I wonder if there is a good way to get this without bringing the full parser in.