Voron Performance, let go off that tree, dude!

time to read 3 min | 470 words

B-Trees has a few really awesome properties, chief among them is that they are playing very well with the hierarchical nature of data access that we have in our systems (L1 – L3 caches, main memory, disk, etc). Another really nice property is that searches in a B-Trees have very low costs.

However, it became apparent to us during performance tests that actually searching the tree was very costly. Assuming that we have the following pseudo code:

   1: for i in range(1, 1000 * 1000):
   2:    add(i, "val")

What is the cost of actually doing this? Well, an interesting tidbit about this is that every time we add an item to the tree, we need to search it, so we can find where in the tree that new value goes.

Cost of search the tree, based on size:

Number of entries

Cost

1

0.19

1,000

1.93

10,000

2.57

25,000

2.83

50,000

3.02

100,000

3.21

250,000

3.47

500,000

3.66

1,000,000

3.86

2,000,000

4.05

The cost, by the way, is O(log36 (N+1)). The log36 comes from the number of entries that we can fit in a single page. This cost ignores the actual search inside a page. But it is a good enough approximation for our needs.

Now, the cost of actually inserting 1 million items is the sum of all of those costs. Which means that the cost for 1 million is 3,576,242.35. This is another example of Schlemiel the Painter algorithm.

What we did was introduce a small cache, which remember the last few pages that we inserted to. That turned the cost down from searching the tree to checking the cache, where we can usually find the value, and gave a nice performance boost.