Awesome indexing with RavenDB

time to read 18 min | 3561 words

I am currently teaching a course in RavenDB, and as usual during a course, we keep doing a lot of work that pushes what we do with RavenDB. Usually because we try to come up with new scenarios on the fly and adapting to the questions from the students.

In this case, we were going over the map/reduce stack and we kept coming more and more complex example and how to handle them, and then we got to this scenario.

Given the following class structure:

   1: public class Animal   2: {   3:     public string Name { get; set; }   4:     public string Species { get; set; }   5:     public string Breed { get; set; }   6: }

Give me the count of all the species and all the breeds. That is pretty easy to do, right? In SQL, you would write it like this:

   1: SELECT Species, Breed, Count(*) FROM Animals   2: GROUP BY Species, Breed

And that is nice, but it still means that you have to do some work on the client side to merge things up to get the final result, since we want something like this:

Dogs: 6
- German Shepherd: 3
- Labrador: 1
- Mixed: 2
Cats: 3
- Street: 2
- Long Haired: 1

In RavenDB, we can express the whole thing in a simple succinct index:

   1: public class Animals_Stats : AbstractIndexCreationTask<Animal, Animals_Stats.ReduceResult>   2: {   3:     public class ReduceResult   4:     {   5:         public string Species { get; set; }   6:         public int Count { get; set; }   7:         public BreedStats[] Breeds { get; set; }   8:     9:         public class BreedStats  10:         {  11:             public string Breed { get; set; }  12:             public int Count { get; set; }  13:         }  14:     }  15:    16:     public Animals_Stats()  17:     {  18:         Map = animals =>  19:                 from animal in animals  20:                 select new  21:                     {  22:                         animal.Species,  23:                         Count = 1,  24:                         Breeds = new [] {new {animal.Breed, Count = 1}}  25:                     };  26:         Reduce = animals =>  27:                     from r in animals  28:                     group r by r.Species  29:                     into g  30:                     select new  31:                         {  32:                             Species = g.Key,  33:                             Count = g.Sum(x => x.Count),  34:                             Breeds = from breed in g.SelectMany(x => x.Breeds)  35:                                     group breed by breed.Breed  36:                                     into gb  37:                                     select new {Breed = gb.Key, Count = gb.Sum(x => x.Count)}  38:                         };  39:    40:     }  41: }

And the result of this beauty?

And that is quite pretty, even if I say so myself.

Tweet Share Share 21 comments

Tags:

raven

Comments

05 Oct 2012
13:17 PM

TomCollins

In SQL you could do similar things. This example works on Oracle 11:

CREATE TABLE Animal ( Name VARCHAR2(30), Species VARCHAR2(30), Breed VARCHAR2(30));

INSERT INTO Animal (Name, Species, Breed) VALUES ('Albert','Dog','German Shepherd'); INSERT INTO Animal (Name, Species, Breed) VALUES ('Berta','Dog','German Shepherd'); INSERT INTO Animal (Name, Species, Breed) VALUES ('Carla','Dog','German Shepherd'); INSERT INTO Animal (Name, Species, Breed) VALUES ('Doris','Dog','Labrador'); INSERT INTO Animal (Name, Species, Breed) VALUES ('Emil','Dog','Mixed'); INSERT INTO Animal (Name, Species, Breed) VALUES ('Fritz','Dog','Mixed'); INSERT INTO Animal (Name, Species, Breed) VALUES ('Gerd','Cat','Street'); INSERT INTO Animal (Name, Species, Breed) VALUES ('Hans','Cat','Street'); INSERT INTO Animal (Name, Species, Breed) VALUES ('Ida','Cat','Long Haired');

SELECT Species, Breed, Count FROM (SELECT Species, 'Sum' AS Breed, 0 AS Detaillevel, COUNT() AS Count FROM Animal GROUP BY Species UNION SELECT Species, Breed, 1 AS Detaillevel, COUNT() AS Count FROM Animal GROUP BY Species, Breed ORDER BY 1, 3, 2);

Result: SPECIES BREED COUNT Cat Sum 3 Cat Long Haired 1 Cat Street 2 Dog Sum 6 Dog German Shepherd 3 Dog Labrador 1 Dog Mixed 2

05 Oct 2012
13:19 PM

TomCollins

Hope this format will look nicer:

CREATE TABLE Animal ( Name VARCHAR2(30),

Species VARCHAR2(30),

Breed   VARCHAR2(30));

INSERT INTO Animal (Name, Species, Breed) VALUES ('Albert','Dog','German Shepherd');

INSERT INTO Animal (Name, Species, Breed) VALUES ('Berta','Dog','German Shepherd');

INSERT INTO Animal (Name, Species, Breed) VALUES ('Carla','Dog','German Shepherd');

INSERT INTO Animal (Name, Species, Breed) VALUES ('Doris','Dog','Labrador');

INSERT INTO Animal (Name, Species, Breed) VALUES ('Emil','Dog','Mixed');

INSERT INTO Animal (Name, Species, Breed) VALUES ('Fritz','Dog','Mixed');

INSERT INTO Animal (Name, Species, Breed) VALUES ('Gerd','Cat','Street');

INSERT INTO Animal (Name, Species, Breed) VALUES ('Hans','Cat','Street');

INSERT INTO Animal (Name, Species, Breed) VALUES ('Ida','Cat','Long Haired');

SELECT Species, Breed, Count FROM

(SELECT Species, 'Sum' AS Breed, 0 AS Detaillevel, COUNT(*) AS Count FROM Animal GROUP BY Species

UNION

SELECT Species, Breed, 1 AS Detaillevel, COUNT(*) AS Count FROM Animal GROUP BY Species, Breed

ORDER BY 1, 3, 2);

SPECIES BREED COUNT

Cat Sum 3

Cat Long Haired 1

Cat Street 2

Dog Sum 6

Dog German Shepherd 3

Dog Labrador 1

Dog Mixed 2

05 Oct 2012
14:04 PM

wqweto

SELECT Species, Breed, Count(*) FROM Animals

GROUP BY Species, Breed

WITH ROLLUP

05 Oct 2012
14:13 PM

justin

What about ROLLUP which was designed for exactly this purpose and is almost every SQL implementation?

SQL server has had it since at SQL 2000:

http://msdn.microsoft.com/en-us/library/ms189305(v=SQL.90).aspx

05 Oct 2012
14:23 PM

Rasmus Schultz

Doing this with rollup in SQL yields the same result - but there's an important difference: the result of the projection is materialized, so it's extremely fast to query. Of course, if your SQL engine supports materialized views, again, you can accomplish something similar - but I'll be you anything, RavenDB indexes will still beat the performance of materialized views in any SQL engine ;-)

05 Oct 2012
14:26 PM

Ajai

select species, breed, count(1) from animals group by rollup(species, breed)

And that is pretty if I say so myself :)

Ajai

05 Oct 2012
15:41 PM

João P. Bragança

If only there were an in memory version of sql server to allow tests on top of this to execute quickly.

05 Oct 2012
17:13 PM

Justin

"but I'll be you anything, RavenDB indexes will still beat the performance of materialized views in any SQL engine ;-)"

I'll bet you a database engine that been around many more years than Raven has a good chance of winning that bet. Dumping row from a materialized view will be one of the fastest thing a SQL server can do.

"If only there were an in memory version of sql server to allow tests on top of this to execute quickly."

SQL server will host the pages of a materialized view in RAM for reads if it has enough. If your worried about write performance and not durability then put the SQL server db files on a RAMDISK:

http://www.sqltreeo.com/wp/sql-server-on-steroids-with-ram-disk/

05 Oct 2012
20:32 PM

João P. Bragança

"http://www.sqltreeo.com/wp/sql-server-on-steroids-with-ram-disk/"

That sounds suspiciously like effort. I'd rather have something xcopy deployable.

07 Oct 2012
08:20 AM

Rafal

I'm all enthusiastic about NoSQL but we still don't have a true data processing language. The C# + map-reduce construct is not very elegant compared to SQL select with rollup.

07 Oct 2012
10:21 AM

Ayende Rahien

Justin, There is a very important difference between the way RavenDB and SQL Server. RavenDB is going to compute the result once and then use that as the source for the query. SQL Server is going to have to go through the entire data set, on each query. Materialized views (indexed views in SQL Server) and quite limited - for example, they can't do Rollups.

07 Oct 2012
15:24 PM

Justin

"RavenDB is going to compute the result once and then use that as the source for the query."

Probably why everyone is talking about materialized views no?

In the original post you said: "And that is nice, but it still means that you have to do some work on the client side to merge things up to get the final result"

This has nothing to do with how often Raven or SQL computes the result, you said SQL simply could not compute the result on it own, which means one of two things, either you did not know ROLLUP existed or you are purposely are saying SQL the language can do less than it actually does.

Also the limitation of ROLLUP not being in am materialized view is a MS SQL Server limitation not a SQL/Relational limitation. Oracle and DB2 have no problem with ROLLUP in materialized view. Again this just shows that you simply don't know what is possible in the relational world or you are purposely ignoring the facts. For MS SQL ever heard of triggers or the job scheduler?

Oracle can even defer the refresh of of the materialized view for later if you want Raven DB like staleness.

07 Oct 2012
15:28 PM

Ayende Rahien

Justin, Are you aware of my history with regard to RDBMS?

07 Oct 2012
15:30 PM

Justin

Yes, so which is it, did you not know about ROLLUP or purposely say SQL can't do it?

07 Oct 2012
15:43 PM

Ayende Rahien

Justin, The only place where I talked about SQL was when I wanted to show the simple aggregation. I had nothing to say about the feasibility of this in SQL.

07 Oct 2012
16:06 PM

Justin

Still not answering the question.

You most definitely talked about the feasibility in SQL by saying work must be done on the client side while this was not necessary in Raven, simply untrue.

The whole article's premise was a contrast to SQL yet didn't mention the ROLLUP operator at all even though it is the textbook example of ROLLUP. You might as well have left off GROUP BY and COUNT and done ALL the work client side and declared SQL can't do aggregation but Raven can.

Adding WITH ROLLUP to the SQL statement is even simpler and more succinct than Raven if I do say so myself. 3 lines of SQL vs 40+ lines of C# map/reduce for the same result.

07 Oct 2012
20:21 PM

Duke

3 lines of SQL has my vote too on this one

07 Oct 2012
22:10 PM

Mr_Simple

08 Oct 2012
09:14 AM

Rytis

I like the idea about simple things expressed in 40 lines of code instead of 2-3lines :)

09 Oct 2012
15:15 PM

João P. Bragança

Yes, forget test-ability! This won't run on SQLITE so there goes quick test execution. Then again when was the last time you saw a DBA write a test for anything?

09 Oct 2012
19:47 PM

Alex Henderson

An application I work on, implements this really easily as part of an in-built query language - here's the one-liner in that case:

GROUP BY Species { GROUP BY Breed { COUNT } }

http://dev.enterprisetester.com/help/#Search_TQL_Aggregation

http://blog.bittercoder.com/2012/08/20/odata-tql-and-filtering/

We don't build an index, but the result sets are built on the fly against Lucene.Net indexes, so it's pretty fast (few milliseconds) to execute.

I think the ROLLUP/CUBE can work well for these scenarios too when using Sql Server (unfortunately I don't have the luxury of supporting just a single Database) - but I think as things get more complex, like faceted results, or including a list of entities matching a facet (top 10 / bottom 10 etc.) then the Sql Server solution becomes a lot more complex...

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB