Codex KV: How to build a KV storage from scratch

Jun 05 2018

Codex KVHow to build a KV storage from scratch

time to read 3 min | 553 words

We are exploring a few data structure for a particular feature in RavenDB, and I run into something that is elegant, simple, easy and deep enough that we can discuss serious implementation details upon without getting too bogged down in the details.

The idea is that I’m going to be using this series of blog post to post a detailed walk through about building a key value store from scratch. Including all the intermediate steps and wrong turns along the way. In other words, this is a “Show YourWork” kind of series. The end result is going to be a key/value store that can:

Store arbitrary keys / values.
Get key’s value by the key.
Support range queries and iteration.
Support some form of ACID.

In this case, I’m going to start from the very basics and build up. The challenge we are going to deal with is ingesting all the titles of articles in Wikipedia, about 277MB of them. I took them from here: (https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz). There are 13,755,095 of them in my case.

I’m calling the KV store that we’ll be creating Codex. And I’m going to start from the most basic of example, just being able to store and check if a value exists in the store. Here is the code that reads from the titles list and add them to the store. Note that the articles are sorted, but we don’t want this advantage of adding sorted data, so we randomize things.

The question here, how are we going to store these titles in a way that allow us fast retrieval? Here is the idea, we are going to write the strings to the output file as they come, and also record their positions. When we are done inserting strings to the codex, we’ll run a sort based on the positions, and that will give us an array of offsets to the strings in the files, sorted by their value. The first version of this code looks like this:

If you’ll run this code on the Wikipedia titles, you’ll find that it takes a while to run. On my machine, that took just under 50 minutes.

Well, we are dealing with the full set of Wikipedia titles, but even so, that doesn’t sound like it should take this long. What gives?

Let’s analyze what is going on here, okay? If you run this code, you’ll note that it isn’t using CPU or I/O or really seems to be doing much. What is going on?

The key here is in the ReadFrom method. There, we do two seemingly innocent actions. We set the file’s position (translate to SetFilePointer call) and read a short string (translate to a ReadFile call). Now, why is that expensive? Well, the ReadFrom method is called twice each time we need to sort an entry. In this case, it means that ReadFrom will be called a total of 575,616,878 times.

That is not a typo. And each invocation means two separate system call. In other words, this innocent seeming piece of code executed over 1.15 billion system calls.

For reference, simple by reading the entire file to a MemoryStream and keeping everything else the same, I was able to bring the cost of this operation down to under 3 minutes.

Lesson learned, system calls are expensive, let’s try to reduce them as much as we can.

Tweet Share Share 0 comments

Tags:

Oren Eini

Oren Eini

CEO of RavenDB