RavenDB–Indexing performance timeline

time to read 3 min | 437 words

We get people asking us about the details of RavenDB indexes, why a certain index is busy, what is costly, etc. This is a pretty common set of questions, and while we were always able to give the customer a good answer, I was never truly happy with that. A lot of that was about understanding the system to a very deep level and being able to read fairly obscure to an outsider. Like most things that aren’t obvious and easy to use, that bugged me, deeply. Therefor, we spent a few weeks adding internal metrics and exposing everything in a nice and easy fashion.

Let me show you the details from one of our production databases:

image

The Y axis is for different indexes. On the Y axis, you have time. Note that the top part is for map operations, and the bottom part is for reduce operations, and that they are running in parallel. You can also visually see that those the map & reduce are actually running in parallel.

What is even better, you can visually see each individual indexing batch, let us focus on the long indexing batch in the middle:

image

As you can see, we are indexing 512 documents, but it took us a really long time to index them, why is that? We can see that the reason that this batch took so long is that the Orders/Stats and Products/Stats index were very long. Clicking on them, we can see:

image

 

The Orders/Stats index accepted a 105 documents, and outputted 1,372 map values. And the costly part of this index was to actually commit those (along with the Products/Stats, which also output a similar amount) to the map storage.

Another slow index is exposed here:

image

In this case, we have an index that is big enough that we need to flush it to disk from RAM. That explains things.

I’m really excited about this feature.