Ayende @ Rahien

filter by tags archive

architecture (612) rss
bugs (451) rss
challanges (123) rss
community (380) rss
databases (481) rss
design (895) rss
development (642) rss
hibernating-practices (71) rss
miscellaneous (592) rss
performance (397) rss
programming (1085) rss
raven (1450) rss
ravendb.net (534) rss
reviews (184) rss

2025
- June (7)
- May (10)
- April (10)
- March (10)
- February (7)
- January (12)
2024
- December (3)
- November (2)
- October (1)
- September (3)
- August (5)
- July (10)
- June (4)
- May (6)
- April (2)
- March (8)
- February (2)
- January (14)
2023
- December (4)
- October (4)
- September (6)
- August (12)
- July (5)
- June (15)
- May (3)
- April (11)
- March (5)
- February (5)
- January (8)
2022
- December (5)
- November (7)
- October (7)
- September (9)
- August (10)
- July (15)
- June (12)
- May (9)
- April (14)
- March (15)
- February (13)
- January (16)
2021
- December (23)
- November (20)
- October (16)
- September (6)
- August (16)
- July (11)
- June (16)
- May (4)
- April (10)
- March (11)
- February (15)
- January (14)
2020
- December (10)
- November (13)
- October (15)
- September (6)
- August (9)
- July (9)
- June (17)
- May (15)
- April (14)
- March (21)
- February (16)
- January (13)
2019
- December (17)
- November (14)
- October (16)
- September (10)
- August (8)
- July (16)
- June (11)
- May (13)
- April (18)
- March (12)
- February (19)
- January (23)
2018
- December (15)
- November (14)
- October (19)
- September (18)
- August (23)
- July (20)
- June (20)
- May (23)
- April (15)
- March (23)
- February (19)
- January (23)
2017
- December (21)
- November (24)
- October (22)
- September (21)
- August (23)
- July (21)
- June (24)
- May (21)
- April (21)
- March (23)
- February (20)
- January (23)
2016
- December (17)
- November (18)
- October (22)
- September (18)
- August (23)
- July (22)
- June (17)
- May (24)
- April (16)
- March (16)
- February (21)
- January (21)
2015
- December (5)
- November (10)
- October (9)
- September (17)
- August (20)
- July (17)
- June (4)
- May (12)
- April (9)
- March (8)
- February (25)
- January (17)
2014
- December (22)
- November (19)
- October (21)
- September (37)
- August (24)
- July (23)
- June (13)
- May (19)
- April (24)
- March (23)
- February (21)
- January (24)
2013
- December (23)
- November (29)
- October (27)
- September (26)
- August (24)
- July (24)
- June (23)
- May (25)
- April (26)
- March (24)
- February (24)
- January (21)
2012
- December (19)
- November (22)
- October (27)
- September (24)
- August (30)
- July (23)
- June (25)
- May (23)
- April (25)
- March (25)
- February (28)
- January (24)
2011
- December (17)
- November (14)
- October (24)
- September (28)
- August (27)
- July (30)
- June (19)
- May (16)
- April (30)
- March (23)
- February (11)
- January (26)
2010
- December (29)
- November (28)
- October (35)
- September (33)
- August (44)
- July (17)
- June (20)
- May (53)
- April (29)
- March (35)
- February (33)
- January (36)
2009
- December (37)
- November (35)
- October (53)
- September (60)
- August (66)
- July (29)
- June (24)
- May (52)
- April (63)
- March (35)
- February (53)
- January (50)
2008
- December (58)
- November (65)
- October (46)
- September (48)
- August (96)
- July (87)
- June (45)
- May (51)
- April (52)
- March (70)
- February (43)
- January (49)
2007
- December (100)
- November (52)
- October (109)
- September (68)
- August (80)
- July (56)
- June (150)
- May (115)
- April (73)
- March (124)
- February (102)
- January (68)
2006
- December (95)
- November (53)
- October (120)
- September (57)
- August (88)
- July (54)
- June (103)
- May (89)
- April (84)
- March (143)
- February (78)
- January (64)
2005
- December (70)
- November (97)
- October (91)
- September (61)
- August (74)
- July (92)
- June (100)
- May (53)
- April (42)
- March (41)
- February (84)
- January (31)
2004
- December (49)
- November (26)
- October (26)
- September (6)
- April (10)

RavenDB Workshops - Deep dive into practical use of Document Data Modeling

May 29 2025

RecordingRavenDB's Upcoming Optimizations Deep Dive

time to read 1 min | 60 words

Tweet Share Share 2 comments

Tags:

Yesterday I gave a live talk about some of the re-design we did to the internals of RavenDB’s storage engine (Voron). I think it went pretty well, and the record is here.

Would love to hear your feedback!

May 12 2025

Optimizing the cost of clearing a set

time to read 8 min | 1414 words

Tweet Share Share 6 comments

Tags:

Let’s say I want to count the number of reachable nodes for each node in the graph. I can do that using the following code:

void DFS(Node start, HashSet<Node> visited) 
{
    if (start == null || visited.Contains(start)) return;
    visited.Add(start);
    foreach (var neighbor in start.Neighbors) 
    {
        DFS(neighbor, visited);
    }
}


void MarkReachableCount(Graph g)
{
   foreach(var node in g.Nodes)
   {
       HashSet<Node> visited = [];
       DFS(node, visisted);
       node.ReachableGraph = visited.Count;
   }
}

A major performance cost for this sort of operation is the allocation cost. We allocate a separate hash set for each node in the graph, and then allocate whatever backing store is needed for it. If you have a big graph with many connections, that is expensive.

A simple fix for that would be to use:

void MarkReachableCount(Graph g)
{
   HashSet<Node> visited = [];
   foreach(var node in g.Nodes)
   {
       visited.Clear();
       DFS(node, visisted);
       node.ReachableGraph = visited.Count;
   }
}

This means that we have almost no allocations for the entire operation, yay!

This function also performs significantly worse than the previous one, even though it barely allocates. The reason for that? The call to Clear() is expensive. Take a look at the implementation - this method needs to zero out two arrays, and it will end up being as large as the node with the most reachable nodes. Let’s say we have a node that can access 10,000 nodes. That means that for each node, we’ll have to clear an array of about 14,000 items, as well as another array that is as big as the number of nodes we just visited.

No surprise that the allocating version was actually cheaper. We use the visited set for a short while, then discard it and get a new one. That means no expensive Clear() calls.

The question is, can we do better? Before I answer that, let’s try to go a bit deeper in this analysis. Some of the main costs in HashSet<Node> are the calls to GetHashCode() and Equals(). For that matter, let’s look at the cost of the Neighbors array on the Node.

Take a look at the following options:

public record Node1(List<Node> Neighbors);
public record Node2(List<int> NeighborIndexes);

Let’s assume each node has about 10 - 20 neighbors. What is the cost in memory for each option? Node1 uses references (pointers), and will take 256 bytes just for the Neighbors backing array (32-capacity array x 8 bytes). However, the Node2 version uses half of that memory.

This is an example of data-oriented design, and saving 50% of our memory costs is quite nice. HashSet<int> is also going to benefit quite nicely from JIT optimizations (no need to call GetHashCode(), etc. - everything is inlined).

We still have the problem of allocations vs. Clear(), though. Can we win?

Now that we have re-framed the problem using int indexes, there is a very obvious optimization opportunity: use a bit map (such as BitsArray). We know upfront how many items we have, right? So we can allocate a single array and set the corresponding bit to mark that a node (by its index) is visited.

That dramatically reduces the costs of tracking whether we visited a node or not, but it does not address the costs of clearing the bitmap.

Here is how you can handle this scenario cheaply:

public class Bitmap
{
    private ulong[] _data;
    private ushort[] _versions;
    private int _version;


    public Bitmap(int size)
    {
        _data = new ulong[(size + 63) / 64];
        _versions = new ushort[_data.Length];
    }


    public void Clear()
    {
        if(_version++ < ushort.MaxValue)
            return;
            
        Array.Clear(_data);
        Array.Clear(_versions);
    }


    public bool Add(int index)
    {
        int arrayIndex = index >> 6;
        if(_versions[arrayIndex] != _version)
        {
            _versions[arrayIndex] = _version;
            _data[arrayIndex] = 0;
        }


        int bitIndex = index & 63;
        ulong mask = 1UL << bitIndex;
        ulong old = _data[arrayIndex];
        _data[arrayIndex] |= mask;
        return (old & mask) == 0;
    }
}

The idea is pretty simple, in addition to the bitmap - we also have another array that marks the version of each 64-bit range. To clear the array, we increment the version. That would mean that when adding to the bitmap, we reset the underlying array element if it doesn’t match the current version. Once every 64K items, we’ll need to pay the cost of actually resetting the backing stores, but that ends up being very cheap overall (and worth the space savings to handle the overflow).

The code is tight, requires no allocations, and performs very quickly.

May 05 2025

RavenDB Storage Provider for Orleans

time to read 1 min | 187 words

Tweet Share Share 0 comments

Tags:

Orleans is a distributed computing framework for .NET. It allows you to build distributed systems with ease, taking upon itself all the state management, persistence, distribution, and concurrency.

The core aspect in Orleans is the notion of a “grain” - a lightweight unit of computation & state. You can read more about it in Microsoft’s documentation, but I assume that if you are reading this post, you are already at least somewhat familiar with it.

We now support using RavenDB as the backing store for grain persistence, reminders, and clustering. You can read the official announcement about the release here, and the docs covering how to use RavenDB & Microsoft Orleans.

You can use RavenDB to persist and retrieve Orleans grain states, store Orleans timers and reminders, as well as manage Orleans cluster membership.

RavenDB is well suited for this task because of its asynchronous nature, schema-less design, and the ability to automatically adjust itself to different loads on the fly.

If you are using Orleans, or even just considering it, give it a spin with RavenDB. We would love your feedback.

May 02 2025

RavenDB NewsMay 2025

time to read 3 min | 441 words

Tweet Share Share 0 comments

Tags:

RavenDB is moving at quite a pace, and there is actually more stuff happening than I can find the time to talk about. I usually talk about the big-ticket items, but today I wanted to discuss some of what we like to call Quality of Life features.

The sort of things that help smooth the entire process of using RavenDB - the difference between something that works and something polished. That is something I truly care about, so with a great sense of pride, let me walk you through some of the nicest things that you probably wouldn’t even notice that we are doing for you.

RavenDB Node.js Client - v7.0 released (with Vector Search)

We updated the RavenDB Node.js client to version 7.0, with the biggest item being explicit support for vector search queries from Node.js. You can now write queries like these:

const docs = session.query<Product>({collection: "Products"})
   .vectorSearch(x => x.withText("Name"),
      factory => factory.byText("italian food"))
  .all();

This is the famous example of using RavenDB’s vector search to find pizza and pasta in your product catalog, utilizing vector search and automatic data embeddings.

Converting automatic indexes to static indexes

RavenDB has auto indexes. Send a query, and if there is no existing index to run the query, the query optimizer will generate one for you. That works quite amazingly well, but sometimes you want to use this automatic index as the basis for a static (user-defined) index. Now you can do that directly from the RavenDB Studio, like so:

You can read the full details of the feature at the following link.

RavenDB Cloud - Incidents History & Operational Suggestions

We now expose the operational suggestions to you on the dashboard. The idea is that you can easily and proactively check the status of your instances and whether you need to take any action.

You can also see what happened to your system in the past, including things that RavenDB’s system automatically recovered from without you needing to lift a finger.

For example, take a look at this highly distressed system:

As usual, I would appreciate any feedback you have on the new features.

Apr 28 2025

The null check that didn't check for nulls

time to read 2 min | 342 words

Tweet Share Share 6 comments

Tags:

I wrote the following code:

if (_items is [var single])
{
    // no point invoking thread pool
    single.Run();
}

And I was very proud of myself for writing such pretty and succinct C# code.

Then I got a runtime error:

I asked Grok about this because I did not expect this, and got the following reply:

No, if (_items is [var single]) in C# does not match a null value. This pattern checks if _items is a single-element array and binds the element to single. If _items is null, the pattern match fails, and the condition evaluates to false.

However, the output clearly disagreed with both Grok’s and my expectations. I decided to put that into SharpLab, which can quickly help identify what is going on behind the scenes for such syntax.

You can see three versions of this check in the associated link.

if(strs is [var s]) // no null check


if(strs is [string s]) //  if (s != null)


if(strs is [{} s]) //  if (s != null)

Turns out that there is a distinction between a var pattern (allows null) and a non-var pattern. The third option is the non-null pattern, which does the same thing (but doesn’t require redundant type specification). Usually var vs. type is a readability distinction, but here we have a real difference in behavior.

Note that when I asked the LLM about it, I got the wrong answer. Luckily, I could get a verified answer by just checking the compiler output, and only then head out to the C# spec to see if this is a compiler bug or just a misunderstanding.

Apr 25 2025

AI Impact - normalizing Good Enough

time to read 3 min | 420 words

Tweet Share Share 0 comments

Tags:

I was just reviewing a video we're about to publish, and I noticed something in the subtitles. It said, "Six qubits are used for..."

I got all excited thinking RavenDB was jumping into quantum computing. But nope, it turned out to be a transcription error. What was actually said was, "Six kilobytes are used for..."

To be fair, I listened to the recording a few times, and honestly, "qubits" isn't an unreasonable interpretation if you're just going by the spoken words. Even with context, that transcription isn't completely out there. I wouldn't be surprised if a human transcriber came up with the same result.

Fixing this issue (and going over an hour of text transcription to catch other possible errors) is going to be pretty expensive. Honestly, it would be easier to just skip the subtitles altogether in that case.

Here's the thing, though. I think a big part of this is that we now expect transcription to be done by a machine, and we don't expect it to be perfect. Before, when it was all done manually, it cost so much that it was reasonable to expect near-perfection.

What AI has done is make it cheap enough to get most of the value, while also lowering the expectation that it has to be flawless.

So, the choices we're looking at are:

AI transcription - mostly accurate, cheap, and easy to do.
Human transcription - highly accurate, expensive, and slow.
No transcription - users who want subtitles would need to use their own automatic transcription (which would probably be lower quality than what we use).

Before, we really only had two options: human transcription or nothing at all. What I think the spread of AI has done is not just made it possible to do it automatically and cheaply, but also made it acceptable that this "Good Enough" solution is actually, well, good enough.

Viewers know it's a machine translation, and they're more forgiving if there are some mistakes. That makes it way more practical to actually use it. And the end result? We can offer more content.

Sure, it's not as good as manual transcription, but it's definitely better than having no transcription at all (which is really the only other option).

What I find most interesting is that it's the fact that this is so common now that makes it possible to actually use it more.

Yes, we actually review the subtitles and fix any obvious mistakes for the video. The key here is that we can spend very little time actually doing that, since errors are more tolerated.

Apr 14 2025

Who can cancel Carmen Sandiego?

time to read 2 min | 218 words

Tweet Share Share 0 comments

Tags:

RavenDB is a pretty big system, with well over 1 million lines of code. Recently, I had to deal with an interesting problem. I had a CancellationToken at hand, which I expected to remain valid for the duration of the full operation.

However, something sneaky was going on there. Something was cancelling my CancelationToken, and not in an expected manner. At last count, I had roughly 2 bazillion CancelationTokens in the RavenDB codebase. Per request, per database, global to the server process, time-based, operation-based, etc., etc.

Figuring out why the CancelationToken was canceled turned out to be a chore. Instead of reading through the code, I cheated.

token.Register(() =>
{
    Console.WriteLine("Cancelled!" + Environment.StackTrace);
});

I ran the code, tracked back exactly who was calling cancel, and realized that I had mixed the request-based token with the database-level token. A single line fix in the end. Until I knew where it was, it was very challenging to figure it out.

This approach, making the code tell you what is wrong, is an awesome way to cut down debugging time by a lot.

Apr 09 2025

When racing the Heisenbug, code quality goes out the Windows

time to read 8 min | 1522 words

Tweet Share Share 4 comments

Tags:

There are at least 3 puns in the title of this blog post. I’m sorry, but I’m writing this after several days of tracking an impossible bug. I’m actually writing a set of posts to wind down from this hunt, so you’ll have to suffer through my more prosaic prose.

This bug is the kind that leaves you questioning your sanity after days of pursuit, the kind that I’m sure I’ll look back on and blame for any future grey hair I have. I’m going to have another post talking about the bug since it is such a doozy. In this post, I want to talk about the general approach I take when dealing with something like this.

Beware, this process involves a lot of hair-pulling. I’m saving that for when the real nasties show up.

The bug in question was a race condition that defied easy reproduction. It didn’t show up consistently—sometimes it surfaced, sometimes it didn’t. The only “reliable” way to catch it was by running a full test suite, which took anywhere from 8 to 12 minutes per run. If the suite hung, we knew we had a hit. But that left us with a narrow window to investigate before the test timed out or crashed entirely. To make matters worse, the bug was in new C code called from a .NET application.

New C code is a scary concept. New C code that does multithreading is an even scarier concept. Race conditions there are almost expected, right?

That means that the feedback cycle is long. Any attempt we make to fix it is going to be unreliable - “Did I fix it, or it just didn’t happen?” and there isn’t a lot of information going on.The first challenge was figuring out how to detect the bug reliably.

Using Visual Studio as the debugger was useless here—it only reproduced in release mode, and even with native debugging enabled, Visual Studio wouldn’t show the unmanaged code properly. That left us blind to the C library where the bug resided. I’m fairly certain that there are ways around that, but I was more interested in actually getting things done than fighting the debugger.

We got a lot of experience with WinDbg, a low-level debugger and a real powerhouse. It is also about as friendly as a monkey with a sore tooth and an alcohol addiction. The initial process was all about trying to reproduce the hang and then attach WinDbg to it.

Turns out that we never actually generated PDBs for the C library. So we had to figure out how to generate them, then how to carry them all the way from the build to the NuGet package to the deployment for testing - to maybe reproduce the bug again. Then we could see in what area of the code we are even in.

Getting WinDbg attached is just the start; we need to sift through the hundreds of threads running in the system. That is where we actually started applying the proper process for this.

This piece of code is stupidly simple, but it is sufficient to reduce “what thread should I be looking at” from 1 - 2 minutes to 5 seconds.

SetThreadDescription(GetCurrentThread(), L"Rvn.Ring.Wrkr");

I had the thread that was hanging, and I could start inspecting its state. This was a complex piece of code, so I had no idea what was going on or what the cause was. This is when we pulled the next tool from our toolbox.

void alert() {
    while (1) {
        Beep(800, 200);
        Sleep(200);
    }
}

This isn’t a joke, it is a super important aspect. In WinDbg, we noticed some signs in the data that the code was working on, indicating that something wasn’t right. It didn’t make any sort of sense, but it was there. Here is an example:

enum state
{
  red,
  yellow,
  green
};


enum state _currentState;

And when we look at it in the debugger, we get:

0:000> dt _currentState
Local var @ 0x50b431f614 Type state
17 ( banana_split )

That is beyond a bug, that is some truly invalid scenario. But that also meant that I could act on it. I started adding things like this:

if(_currentState != red && 
   _currentState != yellow && 
   _currentState != green) {
   alert();
}

The end result of this is that instead of having to wait & guess, I would now:

Be immediately notified when the issue happened.
Inspect the problematic state earlier.
Hopefully glean some additional insight so I can add more of those things.

With this in place, we iterated. Each time we spotted a new behavior hinting at the bug’s imminent trigger, we put another call to the alert function to catch it earlier. It was crude but effective—progressively tightening the noose around the problem.

Race conditions are annoyingly sensitive; any change to the system—like adding debug code—alters its behavior. We hit this hard. For example, we’d set a breakpoint in WinDbg, and the alert function would trigger as expected. The system would beep, we’d break in, and start inspecting the state. But because this was an optimized release build, the debugging experience was a mess. Variables were often optimized away into registers or were outright missing, leaving us to guess what was happening.

I resorted to outright hacks like this function:

__declspec(noinline) void spill(void* ptr) {
    volatile void* dummy = ptr;
    dummy; // Ensure dummy isn't flagged as unused
}

The purpose of this function is to force the compiler to assign an address to a value. Consider the following code:

if (work->completed != 0) {
    printf("old_global_state : %p, current state: %p\n",
         old_global_state, handle_ptr->global_state);
    alert();
    spill(&work);
}

Because we are sending a pointer to the work value to the spill function, the compiler cannot just put that in a register and must place it on the stack. That means that it is much easier to inspect it, of course.

Unfortunately, adding those spill calls led to the problem being “fixed”, we could no longer reproduce it. Far more annoyingly, any time that we added any sort of additional code to try to narrow down where this was happening, we had a good chance of either moving the behavior somewhere completely different or masking it completely.

Here are some of our efforts to narrow it down, if you want to see what the gory details look like.

At this stage, the process became a grind. We’d hypothesize about the bug’s root cause, tweak the code, and test again. Each change risked shifting the race condition’s timing, so we’d often see the bug vanish, only to reappear later in a slightly different form. The code quality suffered—spaghetti logic crept in as we layered hacks on top of hacks. But when you’re chasing a bug like this, clean code takes a back seat to results. The goal is to understand the failure, not to win a style award.

Bug hunting at this level is less about elegance and more about pragmatism. As the elusiveness of the bug increases, so does code quality and any other structured approach to your project. The only thing on your mind is, how do I narrow it down?. How do I get this chase to end?

Next time, I’ll dig into the specifics of this particular bug. For now, this is the high-level process: detect, iterate, hack, and repeat. No fluff—just the reality of the chase. The key in any of those bugs that we looked at is to keep narrowing the reproduction to something that you can get in a reasonable amount of time.

Once that happens, when you can hit F5 and get results, this is when you can start actually figuring out what is going on.

Apr 07 2025

Production postmortemThe race condition in the interlock

time to read 18 min | 3547 words

Tweet Share Share 3 comments

Tags:

This post isn’t actually about a production issue—thankfully, we caught this one during testing. It’s part of a series of blog posts that are probably some of my favorite posts to write. Why? Because when I’m writing one, it means I’ve managed to pin down and solve a nasty problem.

This time, it’s a race condition in RavenDB that took mountains of effort, multiple engineers, and a lot of frustration to resolve.

For the last year or so, I’ve been focused on speeding up RavenDB’s core performance, particularly its IO handling. You might have seen my earlier posts about this effort. One key change we made was switching RavenDB’s IO operations to use IO Ring, a new API designed for high-performance, asynchronous IO, and other goodies. If you’re in the database world and care about squeezing every ounce of performance out of your system, this is the kind of thing that you want to use.

This wasn’t a small tweak. The pull request for this work exceeded 12,000 lines of code—over a hundred commits—and likely a lot more code when you count all the churn. Sadly, this is one of those changes where we can’t just split the work into digestible pieces. Even now, we still have some significant additional work remaining to do.

We had two or three of our best engineers dedicated to it, running benchmarks, tweaking, and testing over the past few months. The goal is simple: make RavenDB faster by any means necessary.

And we succeeded, by a lot (and yes, more on that in a separate post). But speed isn’t enough; it has to be correct too. That’s where things got messy.

Tests That Hang, Sometimes

We noticed that our test suite would occasionally hang with the new code. Big changes like this—ones that touch core system components and take months to implement—often break things. That’s expected, and it’s why we have tests. But these weren’t just failures; sometimes the tests would hang, crash, or exhibit other bizarre behavior. Intermittent issues are the worst. They scream “race condition,” and race conditions are notoriously hard to track down.

Here’s the setup. IO Ring isn’t available in managed code, so we had to write native C code to integrate it. RavenDB already has a Platform Abstraction Layer (PAL) to handle differences between Windows, Linux, and macOS, so we had a natural place to slot this in.

The IO Ring code had to be multithreaded and thread-safe. I’ve been writing system-level code for over 20 years, and I still get uneasy about writing new multithreaded C code. It’s a minefield. But the performance we could get… so we pushed forward… and now we had to see where that led us.

Of course, there was a race condition. The actual implementation was under 400 lines of C code—deliberately simple, stupidly obvious, and easy to review. The goal was to minimize complexity: handle queuing, dispatch data, and get out. I wanted something I could look at and say, “Yes, this is correct.” I absolutely thought that I had it covered.

We ran the test suite repeatedly. Sometimes it passed; sometimes it hung; rarely, it would crash.

When we looked into it, we were usually stuck on submitting work to the IO Ring. Somehow, we ended up in a state where we pushed data in and never got called back. Here is what this looked like.

0:019> k
 #   Call Site
00   ntdll!ZwSubmitIoRing
01   KERNELBASE!ioring_impl::um_io_ring::Submit+0x73
02   KERNELBASE!SubmitIoRing+0x3b
03   librvnpal!do_ring_work+0x16c 
04   KERNEL32!BaseThreadInitThunk+0x17
05   ntdll!RtlUserThreadStart+0x2c

In the previous code sample, we just get the work and mark it as done. Now, here is the other side, where we submit the work to the worker thread.

int32_t rvn_write_io_ring(void* handle, int32_t count, 
        int32_t* detailed_error_code)
{
        int32_t rc = 0;
        struct handle* handle_ptr = handle;
        EnterCriticalSection(&handle_ptr->global_state->lock);
        ResetEvent(handle_ptr->global_state->notify);
        char* buf = handle_ptr->global_state->arena;
        struct workitem* prev = NULL;
        for (int32_t curIdx = 0; curIdx < count; curIdx++)
        {
                struct workitem* work = (struct workitem*)buf;
                buf += sizeof(struct workitem);
                *work = (struct workitem){
                        .prev = prev,
                        .notify = handle_ptr->global_state->notify,
                };
                prev = work;
                queue_work(work);
        }
        SetEvent(IoRing.event);


        bool all_done = false;
        while (!all_done)
        {
                all_done = true;
                WaitForSingleObject(handle_ptr->global_state->notify, INFINITE)
                ResetEvent(handle_ptr->global_state->notify);
                struct workitem* work = prev;
                while (work)
                {
                        all_done &= InterlockedCompareExchange(
&work->completed, 0, 0);
                        work = work->prev;
                }
        }


        LeaveCriticalSection(&handle_ptr->global_state->lock);
        return rc;
}

We basically take each page we were asked to write and send it to the worker thread for processing, then we wait for the worker to mark all the requests as completed. Note that we play a nice game with the prev and next pointers. The next pointer is used by the worker thread while the prev pointer is used by the submitter thread.

You can also see that this is being protected by a critical section (a lock) and that there are clear hand-off segments. Either I own the memory, or I explicitly give it to the background thread and wait until the background thread tells me it is done. There is no place for memory corruption. And yet, we could clearly get it to fail.

Being able to have a small reproduction meant that we could start making changes and see whether it affected the outcome. With nothing else to look at, we checked this function:

void queue_work_origin(struct workitem* work)
{
    work->next = IoRing.head;
    while (true)
    {
        struct workitem* cur_head = InterlockedCompareExchangePointer(
                        &IoRing.head, work, work->next);
        if (cur_head == work->next)
            break;
        work->next = cur_head;
    }
}

I have written similar code dozens of times, I very intentionally made the code simple so it would be obviously correct. But when I even slightly tweaked the queue_work function, the issue vanished. That wasn’t good enough, I needed to know what was going on.

Here is the “fixed” version of the queue_work function:

void queue_work_fixed(struct workitem* work)
{
        while (1)
        {
                struct workitem* cur_head = IoRing.head;
                work->next = cur_head;
                if (InterlockedCompareExchangePointer(
&IoRing.head, work, cur_head) == cur_head)
                        break;
        }
}

This is functionally the same thing. Look at those two functions! There shouldn’t be a difference between them. I pulled up the assembly output for those functions and stared at it for a long while.

1 work$ = 8
 2 queue_work_fixed PROC                             ; COMDAT
 3        npad    2
 4 $LL2@queue_work:
 5        mov     rax, QWORD PTR IoRing+32
 6        mov     QWORD PTR [rcx+8], rax
 7        lock cmpxchg QWORD PTR IoRing+32, rcx
 8        jne     SHORT $LL2@queue_work
 9        ret     0
10 queue_work_fixed ENDP

A total of ten lines of assembly. Here is what is going on:

Line 5 - we read the IoRing.head into register rax (representing cur_head).
Line 6 - we write the rax register (representing cur_head) to work->next.
Line 7 - we compare-exchange the value of IoRing.head with the value in rcx (work) using rax (cur_head) as the comparand.
Line 8 - if we fail to update, we jump to line 5 again and re-try.

That is about as simple a code as you can get, and exactly expresses the intent in the C code. However, if I’m looking at the original version, we have:

1 work$ = 8
 2 queue_work_origin PROC                               ; COMDAT
 3         npad    2
 4 $LL2@queue_work_origin:
 5         mov     rax, QWORD PTR IoRing+32
 6         mov     QWORD PTR [rcx+8], rax
;                        ↓↓↓↓↓↓↓↓↓↓↓↓↓ 
 7         mov     rax, QWORD PTR IoRing+32
;                        ↑↑↑↑↑↑↑↑↑↑↑↑↑
 8         lock cmpxchg QWORD PTR IoRing+32, rcx
 9         cmp     rax, QWORD PTR [rcx+8]
10         jne     SHORT $LL2@queue_work_origin
11         ret     0
12 queue_work_origin ENDP

This looks mostly the same, right? But notice that we have just a few more lines. In particular, lines 7, 9, and 10 are new. Because we are using a field, we cannot compare to cur_head directly like we previously did but need to read work->next again on lines 9 &10. That is fine.

What is not fine is line 7. Here we are reading IoRing.headagain, and work->next may point to another value. In other words, if I were to decompile this function, I would have:

void queue_work_origin_decompiled(struct workitem* work)
{
    while (true)
    {
        work->next = IoRing.head;
//                        ↓↓↓↓↓↓↓↓↓↓↓↓↓ 
        struct workitem* tmp = IoRing.head;
//                        ↑↑↑↑↑↑↑↑↑↑↑↑↑
        struct workitem* cur_head = InterlockedCompareExchangePointer(
                        &IoRing.head, work, tmp);
        if (cur_head == work->next)
            break;
    }
}

Note the new tmp variable? Why is it reading this twice? It changes the entire meaning of what we are trying to do here.

You can look at the output directly in the Compiler Explorer.

This smells like a compiler bug. I also checked the assembly output of clang, and it doesn’t have this behavior.

I opened a feedback item with MSVC to confirm, but the evidence is compelling. Take a look at this slightly different version of the original. Instead of using a global variable in this function, I’m passing the pointer to it.

void queue_work_origin_pointer(
struct IoRingSetup* ring, struct workitem* work)
{
        while (1)
        {
                struct workitem* cur_head = ring->head;
                work->next = cur_head;
                if (InterlockedCompareExchangePointer(
&ring->head, work, work->next) ==  work->next)
                        break;
        }
}

And here is the assembly output, without the additional load.

ring$ = 8
work$ = 16
queue_work_origin PROC                              ; COMDAT
        prefetchw BYTE PTR [rcx+32]
        npad    12
$LL2@queue_work:
        mov     rax, QWORD PTR [rcx+32]
        mov     QWORD PTR [rdx+8], rax
        lock cmpxchg QWORD PTR [rcx+32], rdx
        cmp     rax, QWORD PTR [rdx+8]
        jne     SHORT $LL2@queue_work
        ret     0
queue_work_origin ENDP

That unexpected load was breaking our thread-safety assumptions, and that led to a whole mess of trouble. Violated invariants are no joke.

The actual fix was pretty simple, as you can see. Finding it was a huge hurdle. The good news is that I got really familiar with this code, to the point that I got some good ideas on how to improve it further 🙂.

Apr 02 2025

RavenDB.NET Aspire integration

time to read 2 min | 373 words

Tweet Share Share 0 comments

Tags:

.NET Aspire is a framework for building cloud-ready distributed systems in .NET. It allows you to orchestrate your application along with all its dependencies, such as databases, observability tools, messaging, and more.

RavenDB now has full support for .NET Aspire. You can read the full details in this article, but here is a sneak peek.

Defining RavenDB deployment as part of your host definition:

using Projects;


var builder = DistributedApplication.CreateBuilder(args);


var serverResource = builder.AddRavenDB(name: "ravenServerResource");
var databaseResource = serverResource.AddDatabase(
    name: "ravenDatabaseResource", 
    databaseName: "myDatabase");


builder.AddProject<RavenDBAspireExample_ApiService>("RavenApiService")
    .WithReference(databaseResource)
    .WaitFor(databaseResource);


builder.Build().Run();

And then making use of that in the API projects:

var builder = WebApplication.CreateBuilder(args);


builder.AddServiceDefaults();
builder.AddRavenDBClient(connectionName: "ravenDatabaseResource", configureSettings: settings =>
{
    settings.CreateDatabase = true;
    settings.DatabaseName = "myDatabase";
});
var app = builder.Build();


// here we’ll add some API endpoints shortly…


app.Run();

You can read all the details here. The idea is to make it easier & simpler for you to deploy RavenDB-based systems.

Oren Eini

Oren Eini

CEO of RavenDB

RecordingRavenDB's Upcoming Optimizations Deep Dive

Optimizing the cost of clearing a set

RavenDB Storage Provider for Orleans

RavenDB NewsMay 2025

RavenDB Node.js Client - v7.0 released (with Vector Search)

Converting automatic indexes to static indexes

RavenDB Cloud - Incidents History & Operational Suggestions

The null check that didn't check for nulls

AI Impact - normalizing Good Enough

Who can cancel Carmen Sandiego?

When racing the Heisenbug, code quality goes out the Windows

Production postmortemThe race condition in the interlock

Tests That Hang, Sometimes

RavenDB.NET Aspire integration

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication

Main feed
Comments feed