Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 6 min | 1068 words

Originally posted at 4/19/2011

Yesterday I posted the following challenge:

Given the following API, can you think of a way that would prevent memory leaks?

public interface IBufferPool
{
    byte[] TakeBuffer(int size);
    void ReturnBuffer(byte[] buffer);
}

The problem with having something like this is that forgetting to return the buffer is going to cause a memory leak. Instead of having that I would like to have the application stop if a buffer is leaked. Leaked means that no one is referencing this buffer but it wasn’t returned to the pool.

What I would really like is that when running in debug mode, leaking a buffer would stop the entire application and tell me:

  • That a buffer was leaked.
  • What was the stack trace that allocated that buffer.

Let us take a look at how we are going about implementing this, shall we? I am going to defer the actual implementation of the buffer pool to System.ServiceModel.Channels.BufferManager and focus on providing the anti leak features. The result is that this code:

IBufferPool pool = new BufferPool(1024*512, 1024);

var buffer = pool.TakeBuffer(512);
GC.WaitForPendingFinalizers(); // nothing here

pool.ReturnBuffer(buffer);
buffer = null;
GC.WaitForPendingFinalizers(); // nothing here, we released the memory properly

pool.TakeBuffer(512); // take and discard a buffer without returning to the pool
GC.WaitForPendingFinalizers(); // failure!

Will result in the following error:

Unhandled Exception: System.InvalidOperationException: A buffer was leaked. Initial allocation:
   at ConsoleApplication1.BufferPool.BufferTracker.TrackAllocation() in IBufferPool.cs:line 22
   at ConsoleApplication1.BufferPool.TakeBuffer(Int32 size) in IBufferPool.cs:line 60
   at ConsoleApplication1.Program.Main(String[] args) in Program.cs:line 21

And now for the implementation:

public class BufferPool : IBufferPool
{
    public class BufferTracker
    {
        private StackTrace stackTrace;

        public void TrackAllocation()
        {
            stackTrace = new StackTrace(true);
            GC.ReRegisterForFinalize(this);
        }

        public void Discard()
        {
            stackTrace = null;
            GC.SuppressFinalize(this);
        }

        ~BufferTracker()
        {
            if (stackTrace == null)
                return;

            throw new InvalidOperationException(
                "A buffer was leaked. Initial allocation:" + Environment.NewLine + stackTrace
                );
        }
    }

    private readonly BufferManager bufferManager;
    private ConditionalWeakTable<byte[], BufferTracker> trackLeakedBuffers = new ConditionalWeakTable<byte[], BufferTracker>();

    public BufferPool(long maxBufferPoolSize, int maxBufferSize)
    {
        bufferManager = BufferManager.CreateBufferManager(maxBufferPoolSize, maxBufferSize);
    }

    public void Dispose()
    {
        bufferManager.Clear();
        // note that disposing the pool before returning all of the buffers will cause a crash
    }

    public byte[] TakeBuffer(int size)
    {
        var buffer = bufferManager.TakeBuffer(size);
        trackLeakedBuffers.GetOrCreateValue(buffer).TrackAllocation();
        return buffer;
    }

    public void ReturnBuffer(byte[] buffer)
    {
        BufferTracker value;
        if(trackLeakedBuffers.TryGetValue(buffer, out value))
        {
            value.Discard();
        }
        bufferManager.ReturnBuffer(buffer);
    }
}

As you can see, utilizing ConditionalWeakTable is quite powerful, since it allows us to support a lot of really advanced scenarios in a fairly simple ways.

time to read 2 min | 208 words

Given the following API, can you think of a way that would prevent memory leaks?

public interface IBufferPool
{
    byte[] TakeBuffer(int size);
    void ReturnBuffer(byte[] buffer);
}

The problem with having something like this is that forgetting to return the buffer is going to cause a memory leak. Instead of having that I would like to have the application stop if a buffer is leaked. Leaked means that no one is referencing this buffer but it wasn’t returned to the pool.

What I would really like is that when running in debug mode, leaking a buffer would stop the entire application and tell me:

  • That a buffer was leaked.
  • What was the stack trace that allocated that buffer.

A solution will be posted tomorrow.

time to read 3 min | 514 words

One of the annoying bits about Silverlight is how much of what I consider core API isn’t there. For example, concurrent collections or thread local.

I didn’t have the time to implement concurrent collections properly, but thread local isn’t really that hard. Here is a simple implementation:

public class ThreadLocal<T>
{
    private readonly Func<T> valueCreator;

    public class Holder
     {
         public T Val;
     }

    [ThreadStatic]
    private static ConditionalWeakTable<object, Holder> _state;

    public ThreadLocal():this(() => default(T))
    {
    }

    public ThreadLocal(Func<T> valueCreator)
    {
        this.valueCreator = valueCreator;
    }

    public T Value
    {
        get
        {
            Holder value;
            if (_state == null || _state.TryGetValue(this, out value) == false)
            {
                var val = valueCreator();
                Value = val;
                return val;
            }
            return value.Val;
        }
        set
        {
            if (_state == null)
                _state = new ConditionalWeakTable<object, Holder>();
            var holder = _state.GetOrCreateValue(this);
            holder.Val = value;
        }
    }
}

The fun part, it satisfies the entire contract, but I am willing to bet it is going to cause some head scratching.

time to read 4 min | 684 words

There is a whole set of features that require collaboration from a set of severs. For example, when talking about auto scale scenarios, you really want the servers to figure things out on their own, without needing administrators to hold their hands and murmur sweet nothings at 3 AM.

We needed this feature in Raven DB, Raven MQ and probably in Raven FS, so I sat down and thought about what is actually needed and whatever I could package that in a re-usable form. I am on a roll for the last few days, and something that I estimated would take a week or two took me about six hours, all told.

At any rate, I realized that the important parts of this feature set is the ability to detect siblings on the same network, being able to detect failure of those siblings and the ability to dynamically select the master node.  The code is available here: https://github.com/hibernating-rhinos/Raven.SituationalAwareness under the AGPL license. If you want to use this code commercially, please contact me for commercial licensing arrangements.

Let us see what is actually involved here:

var presence = new Presence("clusters/commerce", new Dictionary<string, string>
{
    {"RavenDB-Endpoint", new UriBuilder("http", Environment.MachineName, 8080).Uri.ToString()}
}, TimeSpan.FromSeconds(3));
presence.TopologyChanged += (sender, nodeMetadata) =>
{
    switch (nodeMetadata.ChangeType)
    {
        case TopologyChangeType.MasterSelected:
            Console.WriteLine("Master selected {0}", nodeMetadata.Uri);
            break;
        case TopologyChangeType.Discovered:
            Console.WriteLine("Found {0}", nodeMetadata.Uri);
            break;
        case TopologyChangeType.Gone:
            Console.WriteLine("Oh no, {0} is gone!", nodeMetadata.Uri);
            break;
        default:
            throw new ArgumentOutOfRangeException();
    }
};
presence.Start();

As you can see, we are talking about a single class that is exposed to your code. You need to provide the cluster name, this allows us to run multiple clusters on the same network without conflicts. (For example, in the code above, we have a set of servers for the commerce service, and another for the billing service, etc). Each node also exposes metadata to the entire cluster. In the code above, we share the endpoint for our RavenDB endpoint.  The TimeSpan variable determines the heartbeat frequency for the cluster (how often it would check for failing nodes).

We have a single event that we can subscribe to, which let us know about changes in the system topology. Discovered and Gone are pretty self explanatory, I think. But MasterSelected is more interesting.

After automatically discovering all the siblings on the network, Raven Situation Awareness will use the Paxos algorithm to decide who should be the master. The MasterSelected event happens when a quorum of the nodes select a master. You can then proceed with your own logic based on that. If the master will fail, the nodes will convene again and the quorum will select a new master.

With the network topology detection and the master selection out of the way, (and all of that with due consideration for failure conditions) the task of actually implementing a distributed server system just became significantly easier.

time to read 2 min | 238 words

Originally posted at 4/15/2011

Take a look at the following piece of code:

if(items.ContainsKey(key) == false)
{
    lock(items)
    {
        if(items.ContainsKey(key) == false)
        {
            items.Add(key, val);
        }
    }
}

The answer is quite simple, and it is located directly in the docs:

image

Yes, on the face of it, this is safe code, in the sense that we will never get DuplicateKeyException. But the implementation of ContainsKey() isn’t safe to run when Add() is also executing.

The actual behavior depends on the implementation, but it is easy enough to imagine a scenario where invariants that ContainsKey() relies on are broken for the duration of the Add() call.

time to read 1 min | 163 words

Originally posted at 4/14/2011

The following diagrams were generated via Visual Studio (Architecture > Generate Dependency Graph).

I have shown before the assembly dependency graph for RavenDB, it has changed a bit since then (note the Raven.Json addition), but it is still quite a nice graph:

image

The problem was that for backward compatibility reasons, the namespaces weren’t nearly as ordered. Mostly because we moved things around in the assemblies but couldn’t change the associated namespaces.

We were going to experience a breaking changes anyway, because of Raven.Json, so I took the time to clean things up properly:

image

Yes, I admit, I am addicted to (mostly) straight lines and simple directed graphs :-)

time to read 8 min | 1543 words

One accusation that has been leveled at me often is that I keep writing my own implementation of Xyz (where Xyz is just about anything). The main problem is that I can get overboard with that, but for the most part, I think that I managed to strike the right balance. Wherever possible, I re-use existing, but when I run into problems that are easier to solve by creating my own solution, I would go with that.

A case in point is the JSON parser inside RavenDB. From the get go, I used Newtonsoft.Json.dll. There wasn’t much to think of, this is the default implementation from my point of view. And indeed, it has been an extremely fine choice. It is a rich library, it is available for .NET 3.5, 4.0 & Silverlight and it meant that I had opened up a lot of extensibility for RavenDB users.

Overall, I am very happy. Except… there was just one problem, with large JSON documents, the library showed some performance issues. In particular, a 3 MB JSON file took almost half a second to parse. That was… annoying. Admittedly, most documents tends to be smaller than that, but it also reflected on overall performance when batching, querying, etc. When you are querying, you are also building large json documents (a single document that contains a list of results, for example), so that problem was quite pervasive for us.

I set out to profile things, and discovered that the actual cost wasn’t in the JSON parsing itself, that part was quite efficient. The costly part was actually in building the JSON DOM (JObject, JArray, etc). When people usually think about JSON serialization performance, they generally think about the perf from and to .NET objects. The overriding cost in that sort of serialization is actually how fast you can call the setters on the objects. Indeed, when looking at perf metrics on the subject, most of the comparisons were concentrated on that aspect almost exclusively.

That make sense, since for the most part, that is how people use it. But for RavenDB, we are using JSON DOM for pretty much everything. This is how we are representing a document, after all, and that idea is pretty central to a document database.

Before setting out to write our own, I looked at other options.

ServiceStack.Json - that one was problematic for three reasons:

  • It wasn’t really nearly as rich in terms of API and functionality.
  • It was focused purely on reading to and from .NET objects, with no JSON DOM supported.
  • The only input format it had was a string.

The last one deserves a bit of explanation. We cannot afford to use a JSON implementation that accepts a string as input, because that JSON object we are reading may be arbitrarily large. Using a string means that we have to allocate all of that information up front. Using a stream, however, means that we can allocate far less information and reduce our overall memory consumption.

System.Json – that one had one major problem:

  • Only available on Silverlight, FAIL!

I literally didn’t even bother to try anything else with it. Other stuff we have looked on had those issues or similar as well, mostly, the problem was no JSON DOM available.

That sucked. I was seriously not looking to writing my own JSON Parser, especially since I was going to add all the bells & whistles of the Newtonsoft.Json. :-(

Wait, I can hear you say, the project is open source, why not just fix the performance problem? Well, we have looked into that as well.

The actual problem is pretty much at the core of how the JSON DOM is implemented in the library. All of the JSON DOM are basically linked lists, and all operations on the DOM are O(N). With large documents, that really starts to hurt. We looked into what it would take to modify that, but it turned out that it would have to be a breaking change (which pretty much killed the notion that it would be accepted by the project) or a very expensive change. That is especially true since the JSON DOM is quite rich in functionality (from dynamic support to INotifyPropertyChanged to serialization to… well, you get the point).

Then I thought about something else, can we create our own JSON DOM, but rely on Newtonsoft.Json to fill it up for us? As it turned out, we could! So we basically took the existing JSON DOM, stripped it out of everything that we weren’t using. Then we changed the linked list support to a List and Dictionary, wrote a few adapters (RavenJTokenReader, etc) and we were off to the races. We were able to utilize quite a large fraction of the things that Newtonsoft.Json already did, we resolved the performance problem and didn’t have to implement nearly as much as I feared we would.

Phew!

Now, let us look at the actual performance results. This is using a 3 MB JSON file:

  • Newtonsoft Json.NET - Reading took 413 ms
  • Using Raven.Json - Reading took 140 ms

That is quite an improvement, even if I say so myself :-)

The next stage was actually quite interesting, because it was unique to how we are using JSON DOM in RavenDB. In order to save the parsing cost (which, even when optimized, is still significant), we are caching in memory the parsed DOM. The problem with caching of mutable information is that you have to return a clone of the information, and not the actual information (because then it would be mutated by the called, corrupting the cached copy).

Newtonsoft.Json supports object cloning, which is excellent. Except for one problem. Cloning is also an O(N) operation. With Raven.Json, the cost is somewhat lower. But the main problem is that we still need to copy the entire large object.

In order to resolve this exact issue, we introduced a feature called snapshots to the mix. Any object can be turned into a snapshot. A snapshot is basically a read only version of the object, which we then wrap around another object which provide local mutability while preserving the immutable state of the parent object.

It is much easier to explain in code, actually:

public void Add(string key, RavenJToken value)
{
    if (isSnapshot)
        throw new InvalidOperationException("Cannot modify a snapshot, this is probably a bug");

    if (ContainsKey(key))
        throw new ArgumentException("An item with the same key has already been added: " + key);

    LocalChanges[key] = value; // we can't use Add, because LocalChanges may contain a DeletedMarker
}

public bool TryGetValue(string key, out RavenJToken value)
{
    value = null;
    RavenJToken unsafeVal;
    if (LocalChanges != null && LocalChanges.TryGetValue(key, out unsafeVal))
    {
        if (unsafeVal == DeletedMarker)
            return false;

        value = unsafeVal;
        return true;
    }

    if (parentSnapshot == null || !parentSnapshot.TryGetValue(key, out unsafeVal) || unsafeVal == DeletedMarker)
        return false;

    value = unsafeVal;

    return true;
}

If the value is on the local changes, we use that, otherwise if the value is in the parent snapshot, we use that. We have the notion of local deletes, but that is about it. All changes happen to the LocalChanges.

What this means, in turn, is that for caching scenarios, we can very easily and effectively create a cheap copy of the item without having to copy all of the items. Where as cloning the 3MB json object in Newtonsoft.Json can take over 100 ms to clone, we can create a snapshot (it involves a clone, so the first time it is actually expensive, around the same cost as Newtonsoft.Json is) and from the moment we have a snapshot, we can generate children for the snapshot at virtually no cost.

Overall, I am quite satisfied with it.

Oh, and in our tests runs, for large documents, we got 100% performance improvement from this single change.

time to read 1 min | 112 words

Originally posted at 4/13/2011

Yes, I know, you popped this out in the middle of a debug session, after VS didn’t respond to any user input for several minutes.

image

Urgh!

image

AYENDEPC is the local machine! You can’t lose a network connection to yourself.

Oh well, we already know that Visual Studio is schizophrenic.

time to read 3 min | 545 words

Originally posted at 4/13/2011

We discussed the problem in detail in my last post. As a reminder:

The requirement

Starting at a given time, charge as many accounts as fast as possible. Charging each account is a separate action, we don’t have the ability to do batching.

image_thumb2

The first question to ask, however, isn’t how we can get the data out of the database as fast as possible. The first question to ask is actually:

What are the limiting factors.

In this case, we have to make a separate request for each account. That means that we have to access a remote service, and that means we have to figure out whatever we can actually parallelize the work in any way.

Most production systems would limit the number of concurrent calls that a single customer can make (to prevent a customer with a lot of servers from crashing their own systems).

Let us say that we have 100,000 accounts to charge. And let us further say that the total time for a charge operation is in the order of 1 sec. What it means is that whatever we want, we are actually going to be limited by the processing capabilities on the accounts, not our own code.

A silly way of doing this would be something like:

var subscriptions = session.Query<Subscription>().Where(s=>s.NextCharge < DateTime.Now);
foreach(var sub in subscriptions)
{
   sub.ChargeAccount(); // 1 sec
}

This would be silly for several reasons:

  • It would take over a day to complete.
  • It would load a lot of data to memory, probably only to be used in several hours.
  • It puts a lot of strain on our own database and network backbone.

A better alternative would be to try to parallelize this. Assuming that we can perform 5 concurrent requests, we can drop the processing time to just below six hours.

Note where the major savings are, not in how we are actually doing stuff with the code, but rather in how much concurrency we can agree on with the accounts provider. Just to note, let us say that they allow us 25 concurrent requests per second, we are still much better doing chunks of the data all the way rather than trying to process it all at once.

What I would actually do in this scenario is just load the ids of the subscriptions that we need to update into a queue and let a bunch of workers (may be on separate machines) feed off the queue and try to handle as many account charges as they can possibly manage.

time to read 2 min | 359 words

Originally posted at 4/13/2011

As much as I rile against unbounded result set, I recently run into the first real case where “I need all of the data now” was a valid use case.

The issue is charging micro payments from customers in a particular market. In that market, accounts are usually handled using prepaid codes. So you open an account and load some amount of money into it. Once that money is over, there is no way to charge the account any longer. Users will occasionally re-charge their account. Now, we are talking about recurring billing scenario, where we need to charge users every week some amount of money.

So far, seems pretty reasonable, I hope. The catch is that it is pretty routine for billing to fail for insufficient funds. The strategy to resolve this is to analyze the user’s behavior and retry at likely times when we hope to catch the account with some money in it.

The problem?

Since failures are common, and since most people behave in a similar fashion, what happens is that billing cycles trends to focus on the likely times when we hope to find some money there. In other words, let us say that we noticed that on Sunday evening, most people charge their account…

What this means is that we will have to try to charge those accounts in that time period. To make things interesting, there are other parties as well, which probably also managed to figure out the times when most accounts have money in them. At that point, it actually becomes a race, who will be able to charge the accounts first, and get the money.

The requirement

Starting at a given time, charge as many accounts as fast as possible. Charging each account is a separate action, we don’t have the ability to do batching.

image

How would you solve this problem?

My solution, in the next post…

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}