Voron, LMDB and the external APIs, oh my!

One of the things that I really don’t like in LMDB is the API that is exposed to the user. Well, it is C, so I guess there isn’t much that can be done about it. But let us look at the abstractions that are actually exposed to the user, by looking at how you usually work with Voron.

    using (var tx = Env.NewTransaction(TransactionFlags.ReadWrite))
    {
        Env.Root.Add(tx, "key/1", new MemoryStream(Encoding.UTF8.GetBytes("123")));

        tx.Commit();
    }

    using (var tx = Env.NewTransaction(TransactionFlags.Read))
    {
        using (var stream = Env.Root.Read(tx, "key/1"))
        using (var reader = new StreamReader(stream))
        {
            var result = reader.ReadToEnd();
            Assert.Equal("123", result);
        }
        tx.Commit();
    }

This is a perfectly nice API. It is quite explicit about what is going on, and it gives you a lot of options with regard to how to actually make things happen. It also gives the underlying library almost no chance to do anything interesting. Worse, it means that you have to know, upfront, whether you want to do a read-only or a read/write operation. And since there can be only one write transaction at any given point in time… well, I think you get the point. If your code doesn’t respond well to an explicit demarcation between reads and writes, you have to create a lot of write transactions, essentially serializing pretty much your entire codebase.
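
To make the single-writer cost concrete, here is a small toy model in C#. It is not Voron’s or LMDB’s API: ToyEnv, BeginRead and Write are hypothetical names, and copying a whole dictionary on every write is only a crude stand-in for LMDB’s copy-on-write pages. Readers grab an immutable snapshot and never block, while every write transaction queues on one lock, so any code path that might write ends up running one call at a time.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical toy model of a single-writer store; NOT the Voron or LMDB API.
public sealed class ToyEnv
{
    // Only one write transaction may be active at any given time.
    private readonly object _writerLock = new object();

    // Committed state is never mutated in place, so handing out a reference
    // to it gives a reader a stable snapshot.
    private Dictionary<string, string> _committed = new Dictionary<string, string>();

    // Read transaction: never blocks, not even while a writer is running.
    public IReadOnlyDictionary<string, string> BeginRead() => Volatile.Read(ref _committed);

    // Write transaction: every caller queues on the same lock, so all code
    // that *might* write runs one call at a time.
    public void Write(Action<Dictionary<string, string>> work)
    {
        lock (_writerLock)
        {
            var copy = new Dictionary<string, string>(_committed);
            work(copy);                            // if this throws, nothing is published (abort)
            Volatile.Write(ref _committed, copy);  // commit: new readers see the new state
        }
    }
}
```

A login handler that cannot tell in advance whether it will write has to call Write every time, which is exactly the serialization described above.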

Now, sure, you might have good command / query separation, right? So you have queries for reads and commands for writes, problem solved. Except that the real world doesn’t operate in this manner. Let us consider the trivial case of a user logging in. When a user logs in, we need to check the credentials, and if they are wrong, we need to record the failure so we can lock the account after 5 failed tries. That means either always doing the login in a write transaction (meaning only one user can log in at any time), or starting with a read transaction and switching to a write transaction when we need to write.

Neither option is really nice as far as I am concerned. Therefore, I came up with a different API (which is internally built on the one above). It looks like this:

    var batch = new WriteBatch();
    batch.Add("key/1", new MemoryStream(Encoding.UTF8.GetBytes("123")), null);

    Env.Writer.Write(batch);

    using (var snapshot = Env.CreateSnapshot())
    {
        using (var stream = snapshot.Read(null, "key/1"))
        using (var reader = new StreamReader(stream))
        {
            var result = reader.ReadToEnd();
            Assert.Equal("123", result);
        }
    }

As you can see, we make use of snapshots and write batches. Both ideas are taken from LevelDB. A write batch is a set of changes that we want to apply to the database. We can add any number of changes to a write batch, and doing so requires no synchronization. When we want to actually write those changes, we call Env.Writer.Write(), which takes the entire batch and applies it as a single transactional unit.
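
A write batch can be sketched as nothing more than a private list of operations, with the store applying the whole list at once. The classes below (BatchOp, ToyWriteBatch, ToyStore) are hypothetical and simplified to string values; they are not Voron’s or LevelDB’s actual types. Building the batch touches no shared state, which is why it needs no locks; only ToyStore.Write synchronizes, and it publishes the result in one step so readers see either all of a batch or none of it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch of a LevelDB-style write batch; not Voron's actual classes.
public enum BatchOpKind { Put, Delete }

public readonly struct BatchOp
{
    public BatchOp(BatchOpKind kind, string key, string value)
    {
        Kind = kind; Key = key; Value = value;
    }
    public BatchOpKind Kind { get; }
    public string Key { get; }
    public string Value { get; }   // ignored for deletes
}

// Filling a batch only touches this object, so callers need no locks.
public sealed class ToyWriteBatch
{
    private readonly List<BatchOp> _ops = new List<BatchOp>();
    public void Add(string key, string value) => _ops.Add(new BatchOp(BatchOpKind.Put, key, value));
    public void Delete(string key) => _ops.Add(new BatchOp(BatchOpKind.Delete, key, string.Empty));
    public IReadOnlyList<BatchOp> Operations => _ops;
}

public sealed class ToyStore
{
    private readonly object _writeLock = new object();
    private Dictionary<string, string> _data = new Dictionary<string, string>();

    // Apply the whole batch to a private copy, then publish it in one step,
    // so readers observe either none of the batch or all of it.
    public void Write(ToyWriteBatch batch)
    {
        lock (_writeLock)
        {
            var next = new Dictionary<string, string>(_data);
            foreach (var op in batch.Operations)
            {
                if (op.Kind == BatchOpKind.Put) next[op.Key] = op.Value;
                else next.Remove(op.Key);
            }
            Volatile.Write(ref _data, next);
        }
    }

    public bool TryGet(string key, out string value) =>
        Volatile.Read(ref _data).TryGetValue(key, out value);
}
```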

However, while it applies each batch as a single unit, it can also merge concurrent calls to Writer.Write() into a single write transaction, which increases the concurrency we actually get by quite a bit. The expected usage pattern is that you create a snapshot, do whatever reading you need, adding or removing things via a WriteBatch along the way, and finally write it all out.
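
Merging concurrent writes like this is commonly called group commit. The sketch below assumes a single background writer thread fed by a queue: each caller enqueues its batch, and the writer drains everything already waiting and applies it as one transaction. GroupCommitWriter and WriteAsync are hypothetical names, not Voron’s API; LevelDB, for comparison, has the first waiting writer do this work on behalf of the others rather than using a dedicated thread.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical group-commit writer; a sketch, not Voron's actual implementation.
public sealed class GroupCommitWriter : IDisposable
{
    // One caller's batch plus a task that completes when it has been applied.
    private sealed class Pending
    {
        public Pending(List<KeyValuePair<string, string>> ops) => Ops = ops;
        public List<KeyValuePair<string, string>> Ops { get; }
        public TaskCompletionSource<bool> Done { get; } =
            new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
    }

    private readonly BlockingCollection<Pending> _queue = new BlockingCollection<Pending>();
    private readonly Thread _writerThread;
    private Dictionary<string, string> _data = new Dictionary<string, string>();
    private int _transactions;

    public GroupCommitWriter()
    {
        _writerThread = new Thread(WriterLoop) { IsBackground = true };
        _writerThread.Start();
    }

    // Write transactions actually executed (never more than the number of batches).
    public int TransactionCount => Volatile.Read(ref _transactions);

    // Safe to call from any thread; the task completes once the batch is applied.
    public Task WriteAsync(List<KeyValuePair<string, string>> ops)
    {
        var pending = new Pending(ops);
        _queue.Add(pending);
        return pending.Done.Task;
    }

    public string? Get(string key) =>
        Volatile.Read(ref _data).TryGetValue(key, out var value) ? value : null;

    private void WriterLoop()
    {
        foreach (var first in _queue.GetConsumingEnumerable())
        {
            // Fold every batch that is already waiting into this transaction.
            var group = new List<Pending> { first };
            while (_queue.TryTake(out var more)) group.Add(more);

            try
            {
                // One write transaction for the whole group: apply to a copy, then publish.
                var updated = new Dictionary<string, string>(_data);
                foreach (var pending in group)
                    foreach (var op in pending.Ops) updated[op.Key] = op.Value;
                Volatile.Write(ref _data, updated);
                Interlocked.Increment(ref _transactions);
                foreach (var pending in group) pending.Done.TrySetResult(true);
            }
            catch (Exception ex)
            {
                // Nothing was published, so every batch in the group fails together.
                foreach (var pending in group) pending.Done.TrySetException(ex);
            }
        }
    }

    public void Dispose()
    {
        _queue.CompleteAdding();   // the loop drains what is left, then exits
        _writerThread.Join();
        _queue.Dispose();
    }
}
```

Under contention, TransactionCount can end up lower than the number of batches written, because batches that arrive while a transaction is running get folded into the next one; with no contention, each batch simply gets its own transaction.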

Problems with this approach:

  • You can’t read back things that you just added, because they haven’t been written to the actual storage yet. (Generally not much of an issue in our expected use case.)
  • You need to worry about concurrently modifying the same value in different write batches. (We’re going to add an optimistic concurrency option for that purpose.)
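
The post does not say how the optimistic concurrency option will work. One common approach, sketched here purely as an assumption, is to keep a version number per key, have each write carry the version the caller saw when it read that key, and reject the whole batch if any of those versions has changed since. VersionedStore, Read and TryWrite are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical sketch of per-key optimistic concurrency; not a planned Voron API.
public sealed class VersionedStore
{
    private readonly object _lock = new object();
    private readonly Dictionary<string, (string Value, long Version)> _data =
        new Dictionary<string, (string Value, long Version)>();

    // Returns the value (null if absent) and its version (0 if absent).
    public (string? Value, long Version) Read(string key)
    {
        lock (_lock)
        {
            if (_data.TryGetValue(key, out var entry)) return (entry.Value, entry.Version);
            return (null, 0);
        }
    }

    // Each write carries the version the caller saw when it read that key.
    // If any key changed since then, the whole batch is rejected.
    public bool TryWrite(IReadOnlyList<(string Key, string Value, long ExpectedVersion)> batch)
    {
        lock (_lock)
        {
            // Validate everything first, so a conflict leaves the store untouched.
            foreach (var w in batch)
            {
                long current = _data.TryGetValue(w.Key, out var entry) ? entry.Version : 0;
                if (current != w.ExpectedVersion) return false;
            }
            foreach (var w in batch)
                _data[w.Key] = (w.Value, w.ExpectedVersion + 1);
            return true;
        }
    }
}
```

A caller whose TryWrite returns false would re-read the current values and retry its logic.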

Benefits of this approach:

  • We can optimize concurrent writes.
  • We don’t have to decide in advance whether we need read-only or read/write access.
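
To tie this back to the login example, here is a sketch of that flow built on a snapshot plus a write batch. ToyDb, LoginService and the users/{name}/... key layout are hypothetical, and passwords are compared in plain text only to keep the sketch short. The snapshot serves all the reads, and only a failed login produces anything to write, so successful logins never take the writer lock.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical toy model of the snapshot + write batch pattern; not Voron's classes.
public sealed class ToyDb
{
    private readonly object _writeLock = new object();
    private Dictionary<string, string> _current = new Dictionary<string, string>();

    // A snapshot is the current immutable version of the data; it never blocks.
    public IReadOnlyDictionary<string, string> CreateSnapshot() => Volatile.Read(ref _current);

    public void Write(List<KeyValuePair<string, string>> batch)
    {
        if (batch.Count == 0) return;   // nothing to write, so no writer lock is taken
        lock (_writeLock)
        {
            var next = new Dictionary<string, string>(_current);
            foreach (var kv in batch) next[kv.Key] = kv.Value;
            Volatile.Write(ref _current, next);
        }
    }
}

public static class LoginService
{
    private const int MaxFailures = 5;   // lock the account after 5 failed tries

    public static bool TryLogin(ToyDb db, string user, string password)
    {
        var snapshot = db.CreateSnapshot();
        var batch = new List<KeyValuePair<string, string>>();

        int failures = snapshot.TryGetValue($"users/{user}/failures", out var f) ? int.Parse(f) : 0;
        if (failures >= MaxFailures) return false;   // locked out; nothing to write

        // Plain-text comparison only to keep the sketch short.
        bool ok = snapshot.TryGetValue($"users/{user}/password", out var stored) && stored == password;
        if (!ok)
            batch.Add(new KeyValuePair<string, string>($"users/{user}/failures", (failures + 1).ToString()));

        db.Write(batch);   // decided at the end, not up front
        return ok;
    }
}
```

Note that two failed logins racing each other can both read the same count and both write count + 1, which is exactly the concurrent-modification problem listed above.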