Raven.Munin

time to read 7 min | 1390 words

Raven.Munin is the actual implementation of a low level managed storage for RavenDB. I split it out of the RavenDB project because I intend to make use of it in additional projects.

At its core, Munin provides high performance transactional, non relational, data store written completely in managed code. The main point in writing it was to support the managed storage in RavenDB, but it is going to be used for Raven MQ as well, and probably a bunch of other stuff as well. I’ll post about Raven MQ in the future, so don’t bother asking about it.

Let us look at a very simple API example. First, we need to define a database:

public class QueuesStorage : Database
{
    public QueuesStorage(IPersistentSource persistentSource) : base(persistentSource)
    {
        Messages = Add(new Table(key => key["MsgId"], "Messages")
        {
            {"ByQueueName", key => key.Value<string>("QueueName")},
            {"ByMsgId", key => new ComparableByteArray(key.Value<byte[]>("MsgId"))}
        });

        Details = Add(new Table("Details"));
    }

    public Table Details { get; set; }

    public Table Messages { get; set; }
}

This is a database with two tables, Messages and Details. The Messages table has a primary key of MsgId, and two secondary indexes by queue name and by message id. The Details table is sorted by the key itself.

It is important to understand one very important concept about Munin. Data stored in it is composed of two parts. The key and the data. It is easier to explain when you look at the API:

bool Remove(JToken key);
Put(JToken key, byte[] value);
ReadResult Read(JToken key);

public class ReadResult
{
    public int Size { get; set; }
    public long Position { get; set; }
    public JToken Key { get; set; }
    public Func<byte[]> Data { get; set; }
}

Munin doesn’t really care about the data, it just saves it. But the key is important. In the Details table case, the table would be sorted by the full key. In the Messages table case, things are different. We use the lambda to extract the primary key from the key for each item, and we use additional lambdas to extract secondary indexes. Munin can build secondary indexes only from the key, not from the value. It is only the secondary indexes that allow range queries, the PK allow only direct access, which is why we define both the primary key and a secondary index on MsgId.

Let us see how we can save a new message:

public void Enqueue(string queue, byte[]data)
{
    messages.Put(new JObject
    {
        {"MsgId", uuidGenerator.CreateSequentialUuid().ToByteArray()},
        {"QueueName", queue},
    }, data);
}

And now we want to read it:

public Message Dequeue(Guid after)
{
    var key = new JObject { { "MsgId", after.ToByteArray()} };
    var result = messages["ByMsgId"].SkipAfter(key).FirstOrDefault();
    if (result == null)
        return null;

    var readResult = messages.Read(result);

    return new Message
    {
        Id = new Guid(readResult.Key.Value<byte[]>("MsgId")),
        Queue = readResult.Key.Value<string>("Queue"),
        Data = readResult.Data(),
    };
}

We do a bunch of stuff here, we scan the secondary index for the appropriate value, then get it from the actual index, load it into a DTO and return it.

Transactions

Munin is fully transactional, and it follows an append only, MVCC, multi reader single writer mode. The previous methods are run in the context of:

using (queuesStroage.BeginTransaction())
{
    // use the storage
    queuesStroage.Commit();
}

Storage on disk

Munin can work with either a file or in memory (which makes unit testing it a breeze). It uses an append only model, so on the disk it looks like this:image

Each of the red rectangle represent a separate transaction.

In memory data

You might have noted that we don’t keep any complex data structures on the disk. This is because all the actual indexing for the data is done in memory. The data on the disk is used solely for building the index in memory. Do note that the actual values are not held, only the keys. That means that search Munin indexes is lightning fast, since we never touch the disk for the search itself.

The data is actually held in an immutable binary tree, which gives us the ability to do MVCC reads, without any locking.

Compaction

Because Munin is using the append only model, it require periodic compaction. It does so automatically in RavenDB, waiting for periods of inactivity to do so.

Summary

Munin is a low level api, not something that you are likely to use directly. And it was explicitly modeled to give me an interface similar in capability to what Esent gives me, but in purely managed code.

Please note that is is released under the same license as RavenDB, AGPL.