Voron Performance, let go of that tree, dude!
B-Trees have a few really awesome properties. Chief among them is that they play very well with the hierarchical nature of data access in our systems (L1 – L3 caches, main memory, disk, etc.). Another really nice property is that searches in a B-Tree have very low cost.
However, it became apparent to us during performance tests that searching the tree was actually quite costly. Assume that we have the following pseudo code:
for i in range(1, 1000 * 1000):
    add(i, "val")
What is the cost of actually doing this? Well, an interesting tidbit is that every time we add an item to the tree, we first need to search it, so we can find where in the tree the new value goes.
Cost of searching the tree, based on size:
| Number of entries | Cost |
|-------------------|------|
| 1                 | 0.19 |
| 1,000             | 1.93 |
| 10,000            | 2.57 |
| 25,000            | 2.83 |
| 50,000            | 3.02 |
| 100,000           | 3.21 |
| 250,000           | 3.47 |
| 500,000           | 3.66 |
| 1,000,000         | 3.86 |
| 2,000,000         | 4.05 |
The cost, by the way, is O(log36(N + 1)). The base of 36 comes from the number of entries that we can fit in a single page. This cost ignores the actual search inside a page, but it is a good enough approximation for our needs.
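As a quick sanity check, the formula can be evaluated directly (a sketch; the branching factor of 36 is taken from the paragraph above):

```python
import math

# Approximate cost of a single B-Tree search: log base 36 of (N + 1),
# where 36 is the number of entries that fit in a single page.
for n in [1, 1_000, 10_000, 100_000, 1_000_000, 2_000_000]:
    cost = math.log(n + 1, 36)
    print(f"{n:>9,} entries -> cost {cost:.2f}")
```

This reproduces the values in the table above (0.19 for a single entry, up to 4.05 for two million).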
Now, the cost of actually inserting 1 million items is the sum of all of those per-insert search costs, which comes to 3,576,242.35. This is another example of a Schlemiel the Painter's algorithm.
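That sum can be checked directly with the same log36 approximation (a sketch, ignoring the in-page search just as the formula above does):

```python
import math

# Each insert i must first search a tree that already holds i entries,
# so the total work is the sum of log36(i + 1) over all 1 million inserts.
total = sum(math.log(i + 1, 36) for i in range(1, 1_000_000 + 1))
print(f"{total:,.2f}")  # roughly 3,576,242, matching the figure above
```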
What we did was introduce a small cache that remembers the last few pages we inserted into. That turned the cost down from searching the tree to checking the cache, where we can usually find the value, and gave a nice performance boost.
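A minimal sketch of such a cache (names and structure are illustrative, not Voron's actual implementation): before paying for the full tree search, check whether one of the last few leaf pages we touched covers the key.

```python
from collections import deque, namedtuple

# Illustrative only -- a page is summarized here by the key range it covers.
Page = namedtuple("Page", ["first_key", "last_key"])

class RecentPageCache:
    """Remembers the last few pages we inserted into."""

    def __init__(self, size=4):
        self._pages = deque(maxlen=size)  # oldest entries fall off

    def find_page(self, key):
        # Cheap check against a handful of recent pages; on a miss the
        # caller falls back to the full O(log N) tree search.
        for page in self._pages:
            if page.first_key <= key <= page.last_key:
                return page
        return None

    def remember(self, page):
        if page not in self._pages:
            self._pages.append(page)
```

For sequential inserts, every lookup after the first lands in the most recently remembered page, so the tree search is skipped almost every time.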
Comments
Well, it's much cheaper to push items onto a stack or add them to a linked list. But if you want that data to be indexed, you need to maintain the index, and there's no way to do that in constant time. Poor Schlemiel can't carry such a huge bucket of paint everywhere on his back.
... but you gave him a nice little can of paint that he can carry everywhere and refill when it's empty. (After realizing I missed the last sentence.)
Rafal, Yes, that is pretty much it. By not having to check the full tree very often, we save a lot of back & forth that would get the same result.
I am impressed by Mehdi Gholam's MGIndex in RaptorDB, found here: http://www.codeproject.com/Articles/316816/RaptorDB-The-Key-Value-Store-V2 Would love to hear your thoughts on this index design vs. B-Trees. An interesting high-performance index in C#.
CCB+ Tree in action =).
@Ayende have you considered reusing "the cursor" from the last active transaction as part of a caching strategy? This appears to be especially useful in case inserts or searches are sequential.
Alex, This is effectively what I am doing.
Why does the cache remember only the last pages inserted into?
It should remember the most frequently accessed pages. Even in the case of bulk inserts, the insert location is not guaranteed to be the same.
Lex, the cache remembers the last few pages because if we tried to remember the most accessed pages, we would have a lot more complexity in the cache. Instead, we choose to remember the last few accessed pages, which makes it a lot easier to handle. In practice, it doesn't matter; a cache is local to a transaction anyway.
Infinitas, the major problem there is that he seems to be doing several things differently than us:
* Concurrent write access to the tree, while we allow only a single writer.
* It appears to be a lot more costly in terms of CPU.
* No support for anything like transactions or ACID.
* And while it is hard to compare, it is easy to see that we are at least an order of magnitude faster, according to their own benchmark.
So it isn't really an interesting topic.
In LMDB's cursor_set, we check whether the new key belongs to the page the cursor is pointing to before doing a full top-down tree traversal. Thus sequential inserts are always fast/constant time.
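The fast path described here can be sketched roughly as follows (hypothetical names; LMDB itself is written in C, so this is only an illustration of the idea):

```python
# Illustrative sketch, not LMDB's actual code. The cursor remembers the
# leaf page it last touched; if the next key still falls inside that
# page's key range, we skip the top-down traversal entirely.

class Leaf:
    def __init__(self, first_key, last_key):
        self.first_key, self.last_key = first_key, last_key
        self.entries = {}

    def covers(self, key):
        return self.first_key <= key <= self.last_key

def cursor_put(cursor, tree, key, value):
    page = cursor.get("page")
    if page is None or not page.covers(key):
        page = tree.search(key)   # full top-down O(log N) descent
        cursor["page"] = page     # remember where we landed
    page.entries[key] = value     # sequential inserts hit this fast path
```

With monotonically increasing keys, only the first insert (or a page split) pays for a full descent; every subsequent insert lands in the page the cursor already points at.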