RavenDB and output collection pattern
A map/reduce index in RavenDB can be configured to output its results to a collection. This seems like a strange thing to want to do at first. We already have the results in the index itself, so why would we want to duplicate them by writing them to a collection?
As it turns out, this is a pretty cool feature, because it enables us to do quite a lot. It means that we can apply anything that works on documents to the results of a map/reduce index. This list includes:
- Map/Reduce – so you can create recursive / chained map/reduce operations.
- ETL – so you can push aggregated data to another location, allowing distributed aggregation at scale easily.
- Subscription / Changes – so you can get notified when an aggregated value has been changed.
The key point about the list above is that none of these features require you to know the id of the generated documents up front. Indeed, RavenDB uses document ids like the following for such documents:
Technically speaking, you can compute the id. RavenDB uses a predictable algorithm to generate it, but practically speaking, it can be hard to figure out exactly what the inputs to the id generation are. That means that certain document-related features are not available. In particular, you can’t easily:
- Include such a document
- Load it directly (you have to query)
So we need a better option to deal with it. The way RavenDB solves this issue is by allowing you to specify a pattern for the output collection, like so:
As you can see, we have a map/reduce index that groups by the company and year (marked in blue). We output the results to the YearlySummary collection, as shown in the previous image.
The pattern (marked in red) specifies how we should name the output documents. Here is the result of this index:
And here is what this document looks like:
Huh?
This is strange, you probably think. This is the document we need in order to show the summary for companies/9-A in 1998, but there is no such data here. Instead, you’ll notice that the document belongs to a references collection (marked in red) and that it points to (marked in blue) the actual document with the data. Why do we do things this way?
A map/reduce index is free to output multiple results for the same reduce key, so we need to handle multiple documents here. We also have to deal with multiple reduce outputs that end up with the same pattern. For example, if we use map/reduce by day, but our pattern only specifies the month, we’ll have multiple reduce keys that end up with the same pattern.
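To make the collision case concrete, here is a minimal Python sketch of the idea. It is a toy model, not RavenDB's implementation: the `DailySummary` collection name, the `sales/monthly/...` pattern, the `ReduceOutputs` field, the `Total` values, and the hash-based id helper are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def opaque_output_id(collection, reduce_key):
    """Hypothetical stand-in for an opaque, hash-based output document id."""
    digest = hashlib.sha256(repr(reduce_key).encode("utf-8")).hexdigest()[:16]
    return f"{collection}/{digest}"


def build_reference_docs(collection, reduce_outputs, pattern):
    """Group output document ids under the id produced by the pattern."""
    grouped = defaultdict(list)
    for output in reduce_outputs:
        reduce_key = (output["Company"], output["Day"])
        grouped[pattern(output)].append(opaque_output_id(collection, reduce_key))
    return {
        ref_id: {
            "ReduceOutputs": output_ids,
            "@metadata": {"@collection": f"{collection}/References"},
        }
        for ref_id, output_ids in grouped.items()
    }


# Reduce outputs grouped by company and day (made-up numbers).
daily_outputs = [
    {"Company": "companies/9-A", "Day": "1998-05-01", "Total": 120.0},
    {"Company": "companies/9-A", "Day": "1998-05-02", "Total": 80.0},
    {"Company": "companies/9-A", "Day": "1998-06-01", "Total": 45.5},
]


def monthly_pattern(output):
    # The pattern uses only the month, so different days collide on one id.
    return f"sales/monthly/{output['Company']}/{output['Day'][:7]}"


references = build_reference_docs("DailySummary", daily_outputs, monthly_pattern)
for ref_id, doc in references.items():
    print(ref_id, len(doc["ReduceOutputs"]))
```

In this model, the two May results land in the same reference document, which is why a reference has to hold a list of output ids rather than a single one.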
In practice, because RavenDB has great support for following documents by id, it doesn’t matter. Here is how I can use this index in a query:
This single query allows us to ask a question about companies (those that reside in London, in this case), as well as the sales total data for a particular year. Note that this doesn’t do any joins or anything expensive. We have the information at hand, and can just use it.
You’ll notice that the pattern we specified uses both of the fields that we reduce by. But that isn’t mandatory. We can also use this:
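The reason this is cheap is that the reference id can be built from values we already hold, so every step is a lookup by id. The Python sketch below models that with a plain dictionary standing in for the database. The document ids, the `sales/yearly/{company}/{year}` pattern, the company names, and the `City`, `Total`, and `ReduceOutputs` fields are all hypothetical, and this is not the RavenDB client API.

```python
# In-memory stand-in for a document store: id -> document (made-up data).
documents = {
    "companies/9-A": {"Name": "Example Co", "City": "London"},
    "companies/11-A": {"Name": "Sample Ltd", "City": "London"},
    "companies/20-A": {"Name": "Beispiel GmbH", "City": "Berlin"},
    # Output document with an opaque id that we never need to compute.
    "YearlySummary/a1": {"Company": "companies/9-A", "Year": 1998, "Total": 1500.0},
    # Reference document whose id follows the (assumed) pattern.
    "sales/yearly/companies/9-A/1998": {"ReduceOutputs": ["YearlySummary/a1"]},
}


def yearly_summaries(store, company_id, year):
    """Follow the reference document to the actual summary documents."""
    reference = store.get(f"sales/yearly/{company_id}/{year}")
    if reference is None:
        return []
    return [store[output_id] for output_id in reference["ReduceOutputs"]]


def london_sales(store, year):
    """Sales total per London company for one year, using id lookups only."""
    report = {}
    for doc_id, doc in store.items():
        if doc_id.startswith("companies/") and doc.get("City") == "London":
            summaries = yearly_summaries(store, doc_id, year)
            report[doc_id] = sum(s["Total"] for s in summaries)
    return report


report = london_sales(documents, 1998)
print(report)
```

A company with no summary for the requested year simply has no reference document, so the sketch reports a total of zero for it.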
Here we only specify the company in the pattern. What would be the result?
Now we get the sales totals for the company, on a per-year basis.
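As a rough Python model of what that looks like from the reader's side: with a company-only pattern, a single reference document lists one output per year. The ids, the `sales/yearly/{company}` pattern, the field names, and the totals below are all made up for illustration.

```python
# In-memory stand-in for a document store: id -> document (made-up data).
documents = {
    "YearlySummary/a1": {"Company": "companies/9-A", "Year": 1996, "Total": 300.0},
    "YearlySummary/b2": {"Company": "companies/9-A", "Year": 1997, "Total": 900.0},
    "YearlySummary/c3": {"Company": "companies/9-A", "Year": 1998, "Total": 1500.0},
    # One reference per company, pointing at every yearly output.
    "sales/yearly/companies/9-A": {
        "ReduceOutputs": ["YearlySummary/a1", "YearlySummary/b2", "YearlySummary/c3"],
    },
}


def totals_by_year(store, company_id):
    """Return {year: total} for a company by following its reference document."""
    reference = store.get(f"sales/yearly/{company_id}")
    if reference is None:
        return {}
    summaries = [store[output_id] for output_id in reference["ReduceOutputs"]]
    return {s["Year"]: s["Total"] for s in sorted(summaries, key=lambda s: s["Year"])}


print(totals_by_year(documents, "companies/9-A"))
```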
We can now run the following query:
And this will give us the following output:
As you can imagine, this opens up quite a few possibilities for advanced features. In particular, it makes it even easier to show and process aggregate information and to work through complex object models.
Comments
Always happy to read how some real use cases can be accomplished with such elegance. Thanks for the great work.
Nice, this (loading docs from an OutputCollection by effectively a "natural key") will allow for elaborate cascading map/reduce indexes, without having to use client code (and latency) to shovel around data.