RavenDB performance optimizations
Just to note, you’ll probably read this post about a month after the change was actually committed.
I spent the day working on a very simple task: reducing the number of writes that RavenDB makes when we perform a PUT operation. I managed to remove one write operation from the process, but it took a lot of work.
I thought that I might show you what removing a single write operation means, so I built a simple test harness to give me consistent numbers (in the source, look for Raven.Performance).
Please note that the perf numbers are for vanilla RavenDB, with the default configuration, running in debug mode. We can do better than that, but what I am interested in is not absolute numbers, but the change in those numbers.
Here are the results for build 124, before the change:
Wrote 5,163 documents in 5,134ms: 1.01: docs/ms
Finished indexing in 8,032ms after last document write
And here are the numbers for build 126, after the change:
Wrote 5,163 documents in 2,559ms: 2.02: docs/ms
Finished indexing in 2,697ms after last document write
So we get double the speed at write time, and we also get much better indexing speed. That is something of an accidental by-product: we now index documents by range rather than by specific key. But it is a very pleasant accident.
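To make the comparison concrete, here is a small Python sketch that recomputes the throughput and speedup figures from the numbers reported above. The function and variable names are made up for illustration; only the document counts and timings come from the test output.

```python
def throughput(docs: int, elapsed_ms: int) -> float:
    """Documents written per millisecond."""
    return docs / elapsed_ms


DOCS = 5163

# Timings reported above for build 124 (before) and build 126 (after).
before_write_ms, after_write_ms = 5134, 2559
before_index_ms, after_index_ms = 8032, 2697

before = throughput(DOCS, before_write_ms)
after = throughput(DOCS, after_write_ms)

# Same document count in both runs, so the speedup is the ratio of the times.
write_speedup = before_write_ms / after_write_ms
index_speedup = before_index_ms / after_index_ms

print(f"build 124: {before:.2f} docs/ms")           # 1.01
print(f"build 126: {after:.2f} docs/ms")            # 2.02
print(f"write speedup: {write_speedup:.2f}x")       # 2.01x
print(f"indexing speedup: {index_speedup:.2f}x")    # 2.98x
```

So the write path is roughly twice as fast, and the time to finish indexing after the last write drops to about a third.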
Comments
Pretty good results ayende, should satisfy a lot of use-cases.
What type of documents are you using in these benchmarks?
According to github.com/.../Program.cs he's using relatively small User objects (Id, Email, Name).
Anyway, nice tweak!
Cool. I'm probably using a version from before this was added, and it was already fast. Faster than SQL and MongoDB in my simplistic tests... and I mean a LOT faster than both!
[This assumes the code at github.com/.../Program.cs is the code that has been run for those benchmarks]
Ayende, any reason you're calling SaveChanges once when batch == 128, then waiting until after the 5,163rd object has been processed to call SaveChanges again? E.g., you are not resetting batch to 0 in the if (github.com/.../Program.cs#L560).
For anyone interested I have modified benchmarks to include timings for Redis as well. I've kept it as close as possible to the RavenDB example including the 128 batch size which Redis doesn't need.
Basically, the results show that Redis stores all 5,163 documents in 981ms, making it 2.85x quicker than RavenDB in this scenario.
I have more information available on my blog post here:
http://www.servicestack.net/mythz_blog/?p=474
Although Redis and RavenDB are not exactly the same type of NoSQL data store (RavenDB is a document database while Redis is a data structures server) they still have some overlapping use cases.
Demis, you seem to have copied over the bug Simon Labrecque is talking about.
Does it make any difference when you reset the batch counter when 128 is reached?
// Ryan
Simon,
That is a bug, it should be batchSize % 128 == 0
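The effect of that bug can be sketched in a few lines of Python. This is a hypothetical reconstruction based only on the description in the comments above, not the actual Program.cs; the function names are invented, and each "flush" stands in for a SaveChanges call.

```python
def count_flushes_buggy(total_docs: int, batch_size: int = 128) -> int:
    """Counter is compared with == and never reset, as described above."""
    batch = 0
    flushes = 0
    for _ in range(total_docs):
        batch += 1
        if batch == batch_size:  # true exactly once, since batch keeps growing
            flushes += 1
    flushes += 1  # final flush after the loop picks up everything else
    return flushes


def count_flushes_fixed(total_docs: int, batch_size: int = 128) -> int:
    """Flush whenever the running count is a multiple of the batch size."""
    flushes = 0
    for written in range(1, total_docs + 1):
        if written % batch_size == 0:
            flushes += 1
    if total_docs % batch_size != 0:
        flushes += 1  # final partial batch
    return flushes


buggy = count_flushes_buggy(5163)
fixed = count_flushes_fixed(5163)
print(buggy, fixed)
```

With 5,163 documents the buggy loop flushes only twice (128 documents, then the remaining 5,035 in one go), while the modulo check flushes 41 times: 40 full batches of 128 plus a final batch of 43.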
Demis,
Just to point out: Redis writes to memory, RavenDB writes to disk.
Ok so there seems to be some confusion how Redis works, so I'll just copy a paragraph from my blog explaining it in more detail:
http://www.servicestack.net/mythz_blog/?p=474
Why is Redis so fast?
Based on the comments below there appears to be some confusion as to what Redis is and how it works. Redis is a high-performance data structures server written in C that operates predominantly in-memory, routinely persists to disk, and maintains an append-only transaction log file for integrity. Both of these are configurable and can be made to write to disk on every operation.
For redundancy it includes built-in replication where you can turn any Redis instance into a slave of another, which can be configured at runtime. It also features its own Virtual Machine implementation, so if your dataset exceeds your available memory, infrequently used values are swapped out to disk whilst the hot values remain in memory.
Like other high-performance network servers (e.g. Nginx, Node.js, etc.) it achieves maximum efficiency by running each Redis instance as a single process where all IO is asynchronous and no time is wasted context-switching between threads.
It achieves concurrency by being really fast and achieves integrity by having all operations atomic. You are not limited to the available transactions either, as you can compose any combination of Redis commands together and process them atomically in a single transaction.
Demis,
Did you configure your Redis server to write to disk on every operation (to match more closely what RavenDB is doing)?
The benchmarks use the standard configuration for both servers, so no.
I will re-run the benchmarks with the bug fix and configure it to write on every operation when I get home tonight.
Okay new benchmarks are in - details in my blog under the heading: Benchmarks – Take 2
http://www.servicestack.net/mythz_blog/?p=474
As any additional overhead is multiplied when the 'fsync' option is on, I removed some of the overheads imposed on the Redis client, i.e. active entity id tracking and batching (as it's not required for Redis), before enabling the appendonly transaction log with the ‘fsync always’ option.
Note: I’m using Redis's batch-ful MSET operation behind the scenes, so the fsync penalty is only paid once.
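The point about paying the fsync penalty once per batch rather than once per record can be illustrated with a plain append-only file. This is a generic Python sketch, not how Redis implements its log; the helper name, record format, and record count are all assumptions made for the example.

```python
import os
import tempfile


def append_with_fsync(path: str, records: list, sync_every_record: bool) -> int:
    """Append records to a log file and return how many fsync calls were made."""
    syncs = 0
    with open(path, "ab") as log:
        for record in records:
            log.write(record + b"\n")
            if sync_every_record:
                log.flush()             # push Python's buffer to the OS
                os.fsync(log.fileno())  # force the OS to write to disk
                syncs += 1
        if not sync_every_record:
            # One flush + fsync for the whole batch.
            log.flush()
            os.fsync(log.fileno())
            syncs += 1
    return syncs


# A small, made-up workload; kept short because each fsync is slow.
records = [f"user:{i}".encode() for i in range(200)]

with tempfile.TemporaryDirectory() as tmp:
    per_record = append_with_fsync(os.path.join(tmp, "a.log"), records, True)
    per_batch = append_with_fsync(os.path.join(tmp, "b.log"), records, False)

print(per_record, per_batch)
```

Both files end up with the same contents, but the first path forces 200 disk syncs and the second only one, which is why a single batched write hides most of the cost of syncing on every operation.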
The new benchmarks show Redis is now 11.75x faster than RavenDB with this configuration.
If you disable the append only transaction log Redis becomes 16.9x faster than RavenDB.
Not saying performance is the most important metric, I just wanted to show that Redis provides a high-performance NoSQL solution for .NET clients. Multiple choices benefit everyone.
Nice! I'd love to have accidents like this!
Side question:
Any idea when you guys are going to implement geocoding support at the core of Raven? I thought about hacking it in myself, but at the rate of change right now I figured that would be a bad idea. Alternatively, I could perform the algos outside in our logic but I'd rather they be native. (Map/Reduce seems like our best bet atm).
Thanks,
Chance
Chance,
RavenDB already supports spatial queries. I need to document it, though.
Ah! I can't believe I missed the email alert for your comment Ayende. That's awesome man, thanks!
By the way, it's still on your Todo list. If you've finished that, I can only imagine what else you've knocked off of that list. You guys are rocking hard on Raven - keep it up!