Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,162
Privacy Policy · Terms
filter by tags archive
time to read 3 min | 532 words

In my previous post, I discussed a potential implementation for a buffer pool, and commented that it would cause issues if we had particular usage patterns.

As a reminder, here is the code:

    [ThreadStatic] private static Stack<byte[]>[] _buffersBySize;

    private static byte[] GetBuffer(int requestedSize)
    {
        if(_buffersBySize == null)
            _buffersBySize = new Stack<byte[]>[32];

        var actualSize = PowerOfTwo(requestedSize);
        var pos = MostSignificantBit(actualSize);

        if(_buffersBySize[pos] == null)
            _buffersBySize[pos] = new Stack<byte[]>();

        if(_buffersBySize[pos].Count == 0)
            return new byte[actualSize];

        return _buffersBySize[pos].Pop();
    }

    private static void ReturnBuffer(byte[] buffer)
    {
        var actualSize = PowerOfTwo(buffer.Length);
        if(actualSize != buffer.Length)
            return; // can't put a buffer of strange size here (probably an error)

        if(_buffersBySize == null)
            _buffersBySize = new Stack<byte[]>[32];

        var pos = MostSignificantBit(actualSize);

        if(_buffersBySize[pos] == null)
            _buffersBySize[pos] = new Stack<byte[]>();


        _buffersBySize[pos].Push(buffer);
    }

Now, consider a user who uses this buffer pool, but for some reason decided to move most of the disposal code to a dedicated thread. This can happen when using large files, where disposing of the file stream can take a very long time ( flushing OS buffers, etc). I have seen several places where people had a dedicated disposal threads.

What would happen in such a scenario?

Well, we would allocate buffers in threads #1 – #10, and only return them to the buffer pool on thread #12. That would mean that we would keep allocating new buffers, but all the buffered that were returned to the pool would actually go and sit in the threads that are just releasing them. Welcome memory leak Smile.

time to read 4 min | 662 words

In my previous post, I threw a bunch a code at you, with no explanation, and asked you to discuss it.

Here is the code, with full discussion below.

    [ThreadStatic] private static Stack<byte[]>[] _buffersBySize;

    private static byte[] GetBuffer(int requestedSize)
    {
        if(_buffersBySize == null)
            _buffersBySize = new Stack<byte[]>[32];

        var actualSize = PowerOfTwo(requestedSize);
        var pos = MostSignificantBit(actualSize);

        if(_buffersBySize[pos] == null)
            _buffersBySize[pos] = new Stack<byte[]>();

        if(_buffersBySize[pos].Count == 0)
            return new byte[actualSize];

        return _buffersBySize[pos].Pop();
    }

    private static void ReturnBuffer(byte[] buffer)
    {
        var actualSize = PowerOfTwo(buffer.Length);
        if(actualSize != buffer.Length)
            return; // can't put a buffer of strange size here (probably an error)

        if(_buffersBySize == null)
            _buffersBySize = new Stack<byte[]>[32];

        var pos = MostSignificantBit(actualSize);

        if(_buffersBySize[pos] == null)
            _buffersBySize[pos] = new Stack<byte[]>();


        _buffersBySize[pos].Push(buffer);
    }

There are a couple of interesting things going on here. First, we do allocations by power of two number, this reduce the number of different sizes we have to deal with. We store all of that in a small array (using the most significant bit to index into the array based on the requested size) that contains stacks for all the requested sizes.

In practice, most of the time we’ll use a very small number of sizes, typically 4KB – 32KB.  The basic idea is that you’ll pull an array from the pool, and if there is a relevant one, we save allocations. If not, we allocate a new one and return it to the user.

Once we gave the user a buffer, we don’t keep track of it. If they return it to us, this is great, if not, the GC will clean it up. This is important, because otherwise forgetting to call ReturnBuffer creates what is effectively a memory leak.

Another thing to notice is that we aren’t requiring that the same thread will be used for getting and returning the buffer. It is fine to use one thread to get it and another to return it. This means that async code will work well with thread hopping and this buffer pool.  We also use a stack, to try to keep the busy buffer close to the actual CPU cache.

Note that this is notepad code, so there are probably issues with it.

In fact, there is a big issue here that will only show up in particular usage patterns. Can you see it? I’ll talk about it in my next post.

time to read 3 min | 427 words

After my recent posts about allocations, I thought that I would present a possible solution for buffer management.

The idea is to have a good way to manage buffers for things like I/O operations, etc. Here is the code:

    [ThreadStatic] private static Stack<byte[]>[] _buffersBySize;

    private static byte[] GetBuffer(int requestedSize)
    {
        if(_buffersBySize == null)
            _buffersBySize = new Stack<byte[]>[32];

        var actualSize = PowerOfTwo(requestedSize);
        var pos = MostSignificantBit(actualSize);

        if(_buffersBySize[pos] == null)
            _buffersBySize[pos] = new Stack<byte[]>();

        if(_buffersBySize[pos].Count == 0)
            return new byte[actualSize];

        return _buffersBySize[pos].Pop();
    }

    private static void ReturnBuffer(byte[] buffer)
    {
        var actualSize = PowerOfTwo(buffer.Length);
        if(actualSize != buffer.Length)
            return; // can't put a buffer of strange size here (probably an error)

        if(_buffersBySize == null)
            _buffersBySize = new Stack<byte[]>[32];

        var pos = MostSignificantBit(actualSize);

        if(_buffersBySize[pos] == null)
            _buffersBySize[pos] = new Stack<byte[]>();


        _buffersBySize[pos].Push(buffer);
    }

I’m going to discuss it in my next post, but for now, can you figure out what it is doing, and what are the implications?

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 4 days from now
  3. What happens when a sparse file allocation fails? - 6 days from now
  4. NTFS has an emergency stash of disk space - 8 days from now
  5. Challenge: Giving file system developer ulcer - 11 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}