Building data stores – Transaction log

time to read 3 min | 499 words

One of the interesting aspects in building a data store is that you run head on into things that you would generally leave to the infrastructure. By far, most developers deal with concurrency by relegating that responsibility to a database.

When you write your own database, you have to build this sort of thing. In essence, we have two separate issues here:

  • Maximizing Concurrency – does readers wait for writers? does writers wait for readers? does writers wait for writers?
  • Ensuring Consistency – can I read uncommitted data? can I read partially written data?

As I mentioned in my previous post, there are two major options when building a data store, Transaction Log & Append Only. There are probably a better name for each, but that is how I know them.

This post is going to focus on transaction log. Transaction log is actually pretty simple idea, conceptually. It simply requires that you would state what you intend to do before you do it, in such a way that you can reverse it.

For example, let us say that I want to store “users/ayende” –> "ayende@ayende.com”. All I need to do is to write the following to the transaction log.

{
   "Key": "users/ayende",
   "OldVal": "AYENDE@AYENDE.COM",
   "NewVal": "ayende@ayende.com",
   "TxId": 19474774
}

If the data store crashes before the transaction is completed, we can run a recovery process that would resolve any issues in the actual data from the transaction log. Once a transaction is committed, we can safely delete it from the transaction log.

As I said, conceptually it is a very simple idea, but it leads to some interesting implementation challenges:

  • You can optimize things by not writing to disk (except for writing to the transaction log) immediately.
  • You need to keep track of concurrent transactions touching the same records.
  • You need to handle background flushing to disk.
  • The crash recovery process can be.. finicky to write.

Concurrency control is something that you essentially have to roll on your own, and you can make it as granular as you feel like. There is some complexity involved in ensuring that you read the appropriate data from the transaction log / data file (based on whatever you are in a transaction reading data you modified or outside it, reading the old data), but where it gets really complex is the number of moving parts that you have to deal with.