Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 4 min | 764 words

I previously asked what the code below does, and mentioned that it should give interesting insight into the kind of mindset and knowledge a candidate has. Take a look at the code again:


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/stat.h>


#define BUFFER_SIZE (3ULL * 1024 * 1024 * 1024) // 3GB in bytes


int main() {
    int fd;
    char *buffer;
    struct stat st;


    buffer = (char *)malloc(BUFFER_SIZE);
    if (buffer == NULL) {
        return 1;
    }


    fd = open("large_file.bin", O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        return 2;
    }


    if (write(fd, buffer, BUFFER_SIZE) == -1) {
        return 3;
    }


    if (fsync(fd) == -1) {
        return 4;
    }


    if (close(fd) == -1) {
        return 5;
    }


    if (stat("large_file.bin", &st) == -1) {
        return 6;
    }


    printf("File size: %.2f GB\n", (double)st.st_size / (1024 * 1024 * 1024));


    free(buffer);
    return 0;
}

This program will output: File size: 2.00 GB

And it will write 2 GB of zeros to the file:


~$ head  large_file.bin  | hexdump -C
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
7ffff000

The question is why? And the answer is quite simple: Linux limits how much a single write() call will transfer - at most 0x7ffff000 bytes (2 GiB minus one 4 KiB page), which is exactly the final offset you can see in the hexdump above. A write() call that attempts to write more than that will only write that much, and you’ll have to call it again for the remainder. This is not an error, mind. The write() call is always free to write less than the size of the buffer you passed to it.

Windows has the same limit, but it is honest about it

In Windows, WriteFile() takes the buffer size as a 32-bit DWORD, so this limitation is clearly communicated in the API. Windows will also ensure that for files, a WriteFile call that completes successfully writes the entire buffer to the disk.

And why am I writing 2 GB of zeros? In the code above, I’m using malloc(), not calloc(), so I wouldn’t expect the values to be zero. But because this is a large allocation, malloc() doesn’t carve it out of an existing heap; it asks the OS for the buffer directly, and the OS is contractually obligated to hand newly mapped pages to a process already zeroed.

time to read 3 min | 536 words

Here is a pretty simple C program, running on Linux. Can you tell me what you expect its output to be?


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/stat.h>


#define BUFFER_SIZE (3ULL * 1024 * 1024 * 1024) // 3GB in bytes


int main() {
    int fd;
    char *buffer;
    struct stat st;


    buffer = (char *)malloc(BUFFER_SIZE);
    if (buffer == NULL) {
        return 1;
    }


    fd = open("large_file.bin", O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        return 2;
    }


    if (write(fd, buffer, BUFFER_SIZE) == -1) {
        return 3;
    }


    if (fsync(fd) == -1) {
        return 4;
    }


    if (close(fd) == -1) {
        return 5;
    }


    if (stat("large_file.bin", &st) == -1) {
        return 6;
    }


    printf("File size: %.2f GB\n", (double)st.st_size / (1024 * 1024 * 1024));


    free(buffer);
    return 0;
}

And what happens when I run:


head  large_file.bin  | hexdump -C

This code exhibits surprising behavior and serves as a good opening for discussing a whole bunch of issues. In an interview setting, that can give us a lot of insight into the sort of knowledge a candidate has.

time to read 9 min | 1642 words

The scenario in question was performance degradation over time. The metric in question was the average request latency, and we could track a small but consistent rise in this number over the course of days and weeks. The load on the server remained pretty much constant, but the latency of the requests grew.

The problem was that this took time - many days or multiple weeks - for us to observe. But we had the charts to prove that this was pretty consistent. If the RavenDB service was restarted (we did not have to restart the machine), the situation would instantly fix itself and then slowly degrade over time.

The fact that the customer didn’t notice is an interesting story on its own. RavenDB automatically prioritizes the fastest node in the cluster to be the “customer-facing” one, and that alleviated the issue to such an extent that the metrics the customer usually monitors looked fine. The RavenDB Cloud team looks at the entire system, so we started the investigation long before the problem warranted users’ attention.

I hate these sorts of issues because they are really hard to figure out and subject to every caveat under the sun. In this case, we had exactly nothing to go on. The workload was pretty consistent, and I/O, memory, and CPU usage were all acceptable. There was no obvious starting point.

Those are also big machines, with hundreds of GB of RAM and running heavy workloads. These machines have great disks and a lot of CPU power to spare. What is going on here?

After a long while, we got a good handle on what was actually going on. When RavenDB starts, it creates memory maps of the files it works with. Over time, as needed, RavenDB will map, unmap, and remap those files. A process that has been running for a long while, with many databases and indexes in operation, will have accumulated a great deal of memory-mapping work.

In Linux, you can inspect those details by running:


$ cat /proc/22003/smaps


600a33834000-600a3383b000 r--p 00000000 08:30 214585                     /data/ravendb/Raven.Server
Size:                 28 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  28 kB
Pss:                  26 kB
Shared_Clean:          4 kB
Shared_Dirty:          0 kB
Private_Clean:        24 kB
Private_Dirty:         0 kB
Referenced:           28 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd mr mw me dw
600a3383b000-600a33847000 r-xp 00006000 08:30 214585                     /data/ravendb/Raven.Server
Size:                 48 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  48 kB
Pss:                  46 kB
Shared_Clean:          4 kB
Shared_Dirty:          0 kB
Private_Clean:        44 kB
Private_Dirty:         0 kB
Referenced:           48 kB
Anonymous:             0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd ex mr mw me dw

Here you can see the first page of entries from this file. Just starting up RavenDB (with no databases created) will generate close to 2,000 entries. The smaps virtual file can be invaluable for figuring out certain types of problems. In the snippet above, for example, you can see that we have some executable memory ranges mapped.

The problem is that over time, memory becomes fragmented, and we may end up with an smaps file that contains tens of thousands (or even hundreds of thousands) of entries.

Here is the result of running perf top on the system; you can see that the top three items hogging most of the resources are all related to smaps accounting.

This file provides such useful information that we monitor it on a regular basis. It turns out that this can have… interesting effects. Consider that while we are scanning through all the memory mappings, the process may also need to change its memory mappings. That leads to contention on the kernel locks that protect the mapping, of course.

It’s expensive to generate the smaps file

Reading from /proc/[pid]/smaps is not a simple file read. It involves the kernel gathering detailed memory statistics (e.g., memory regions, page size, resident/anonymous/shared memory usage) for each virtual memory area (VMA) of the process. For large processes with many memory mappings, this can be computationally expensive as the kernel has to gather the required information every time /proc/[pid]/smaps is accessed.

When /proc/[pid]/smaps is read, the kernel needs to access memory-related structures. This may involve taking locks on certain parts of the process’s memory management system. If this is done too often or across many large processes, it could lead to contention or slow down the process itself, especially if other processes are accessing or modifying memory at the same time.

If the number of memory mappings is high, and the frequency with which we monitor is short… I hope you can see where this is going. We effectively spent so much time running over this file that we blocked other operations.

This wasn’t an issue when we just started the process, because the number of memory mappings was small, but as we worked on the system and the number of memory mappings grew… we eventually started hitting contention.

The solution was three-fold. We made sure that there is only ever a single thread reading the information from smaps (previously the read might have been triggered from multiple locations). We added throttling to ensure that we aren’t hammering the kernel with requests for this file too often (returning cached information when needed). And we switched from smaps to smaps_rollup, which performs much better since it deals with summary data only.

With those changes in place, we deployed to production and waited. The result was flat latency numbers and another item that the Cloud team could strike off the board successfully.

time to read 8 min | 1561 words

We got an interesting question in the RavenDB discussion group: How to do aggregation on a tree structure?

The task is to build a Work Breakdown Structure, where you have:

  • Projects
  • Major deliverables
  • Sub-deliverables
  • Work packages

The idea is to be able to track EstimatedHours and CompletedHours across the entire tree. For example, let’s say that I have the following:

  • Project: Bee Keeper Chronicle App
  • Major deliverable: App Design
  • Sub-deliverable: Wireframes all screens
  • Work Package: Login page wireframe

Users can add the EstimatedHours and CompletedHours at any level, and we want to be able to aggregate the data upward. So the Project: “Bee Keeper Chronicle App” should have a total estimated time and the number of hours that were worked on.

The question is how to model & track that in RavenDB. Here is what I think the document structure should look like:


{
    "Name": "Login page wire frame",
    "Parent": {
        "Type": "Subs",
        "Id": "subs/0000000000000000009-A"
    },
    "EsimatedHours": 8,
    "CompletedHours": 3,
    "@metadata": {
        "@collection": "WorkPackages"
    }
}


{
    "Name": "Wire frames all screens",
    "Parent": {
        "Type": "Majors",
        "Id": "major/0000000000000000008-A"
    },
    "EsimatedHours": 20,
    "CompletedHours": 7,
    "@metadata": {
        "@collection": "Subs"
    }
}


{
    "Name": "App Design",
    "Parent": {
        "Type": "Projects",
        "Id": "projects/0000000000000000011-A"
    },
    "EsimatedHours": 50,
    "CompletedHours": 12,
    "@metadata": {
        "@collection": "Majors"
    }
}


{
    "Name": "Bee Keeper Chronicle App",
    "EsimatedHours": 34,
    "CompletedHours": 21,
    "@metadata": {
        "@collection": "Projects"
    }
}

I used a Parent relationship, since that is the most flexible (it allows you to move each item to a completely different part of the tree easily). Now, we need to aggregate the values for all of those items using a Map-Reduce index.

Because of the similar structure, I created the following JS function:


function processWorkBreakdownHours(doc) {
    let hours = {
        EsimatedHours: doc.EsimatedHours,
        CompletedHours: doc.CompletedHours
    };
    let results = [Object.assign({
        Scope: id(doc)
    }, hours)];


    let current = doc;
    while (current.Parent) {
        current = load(current.Parent.Id, current.Parent.Type);
        results.push(Object.assign({
            Scope: id(current)
        }, hours));
    }
    return results;
}

This will scan over the hierarchy and add the number of estimated and completed hours to all the levels. Now we just need to add this file as Additional Sources to an index and call it for all the relevant collections, like this:


map("WorkPackages",processWorkBreakdownHours);
map("Subs",processWorkBreakdownHours);
map("Majors",processWorkBreakdownHours);
map("Projects",processWorkBreakdownHours);

And the last step is to aggregate across all of them in the reduce function:


groupBy(x => x.Scope).aggregate(g => {
    return {
        Scope: g.key,
        EsimatedHours: g.values.reduce((c, val) => val.EsimatedHours + c, 0),
        CompletedHours: g.values.reduce((c, val) => val.CompletedHours + c, 0)
    };
})


The end result is automatic aggregation at all levels. Change one item, and it will propagate upward.

Considerations:

I’m using load() here, which creates a reference from the parent to the child. The idea is that if we move a Work Package from one Sub-deliverable to another (in the same or a different Major & Project), this index will automatically re-index what is required and get you the right result.

However, that also means that whenever the Major document changes, we’ll have to re-index everything below it (because it might have changed the Project). There are two ways to handle that.

  • Instead of using load(), we’ll use noTracking.load(), which tells RavenDB that when the referenced document changes, we should not re-index.
  • Provide the relevant scopes at the document level, like this:


{
    "Name": "Login page wire frame",
    "Scope": [
       "subs/0000000000000000009-A",
       "major/0000000000000000008-A",
       "projects/0000000000000000011-A"
    ],
    "EsimatedHours": 8,
    "CompletedHours": 3,
    "@metadata": {
        "@collection": "WorkPackages"
    }
}

Note that in this case, changing the root will be more complex because you have to scan / touch everything if you move between parts of the tree.

In most cases, that is such a rare event that it shouldn’t be a consideration, but it depends largely on your context.

And there you have it, a simple Map-Reduce index that can aggregate across an entire hierarchy with ease.

time to read 5 min | 803 words

We received a really interesting question from a user, which basically boils down to:

I need to query over a time span, either known (start, end) or (start, $currentDate), and I need to be able to sort on them.

That might sound… vague, I know. A better way to explain this is that I have a list of people, and I need to sort them by their age. That’s trivial to do since I can sort by the birthday, right? The problem is that we include some historical data, so some people are deceased.

Basically, we want to be able to get the following data, sorted by age ascending:

Name                | Birthday | Death
Michael Stonebraker | 1943     | N/A
Sir Tim Berners-Lee | 1955     | N/A
Narges Mohammadi    | 1972     | N/A
Sir Terry Pratchett | 1948     | 2015
Agatha Christie     | 1890     | 1976

This doesn’t look hard, right? I mean, all you need to do is something like:


order by datediff( coalesce(Death, now()), Birthday )

Easy enough, and would work great if you have a small number of items to sort. What happens if we want to sort over 10M records?

Look at the manner in which we are ordering: it requires us to evaluate each and every record. That means we’ll have to scan through the entire list and sort it, which can be really expensive. And because we are sorting over a date that changes from day to day, you can’t even get away with a computed field.

RavenDB will refuse to run queries that can only work with small amounts of data and will fail as the data grows. This is part of our philosophy that things should Just Work. Of course, in this case it doesn’t work, so how does refusing the query align with that philosophy?

The idea is simple: if we cannot make a query work in all cases, we reject it outright. This ensures that your system is not susceptible to hidden traps. By explicitly rejecting the query upfront, we make sure that you’ll build a good solution rather than something that will fail as your data size grows.

What is the appropriate behavior here, then? How can we make it work with RavenDB?

The key issue is that we want to figure out the value we’ll sort on during the indexing stage. This is important because otherwise we would have to compute it across the entire dataset on every query. We can do that in RavenDB by exposing the value to the index.

We cannot just call DateTime.Today, however. That won’t work when the day rolls over, of course. So instead, we store that value in a document config/current-date, like so:


{ // config/current-date
  "Date": "2024-10-10T00:00:00.0000000"
}

Once this is stored as a document, we can then write the following index:


from p in docs.People
let end = p.Death ?? LoadDocument("config/current-date", "Config").Date
select new
{
  Age = end - p.Birthday 
}

And then query it using:


from index 'People/WithAge'
order by Age desc

That works beautifully, of course, until the next day. What happens then? Well, we’ll need to schedule an update to the config/current-date document to correct the date.

At that point, because there is an association created between all the documents that loaded the current date, the indexing engine in RavenDB will go and re-index them. The idea is that at any given point in time, we have already computed the value, and can run really quick queries and sort on it.

When you update the configuration document, it is a signal that we need to re-index the referencing documents. RavenDB is good at knowing how to do that on a streaming basis, so it won’t need to do a huge amount of work all at once.

You’ll also note that we only load the configuration document if we don’t have an end date. So the deceased people’s records will not be affected or require re-indexing.

In short, we can benefit from querying over the age without incurring query time costs and can defer those costs to background indexing time. The downside is that we need to set up a cron job to make it happen, but that isn’t too big a task, I think.

You can utilize similar setups for other scenarios where you need to query over changing values. The performance benefits here are enormous. And what is more interesting, even if you have a huge amount of data, this approach will just keep on ticking and deliver great results at very low latencies.

time to read 4 min | 728 words

I got into an interesting discussion on LinkedIn about my previous post, talking about Code Rot. I was asked about Legacy Code defined as code without tests and how I reconcile code rot with having tests.

I started to reply there, but it really got out of hand and became its own post.

“To me, legacy code is simply code without tests.” Michael Feathers, Working Effectively with Legacy Code

I read Working Effectively with Legacy Code for the first time in 2005 or thereabout, I think. It left a massive impression on me and on the industry at large. The book is one of the reasons I started rigorously writing tests for my code, it got me interested in mocking and eventually led me to writing Rhino Mocks.

It is ironic that the point of this post is that I disagree with this statement by Michael because of Rhino Mocks. Let’s start with numbers: the last commit to the Rhino Mocks repository was about a decade ago. It has just under 1,000 tests and code coverage that ranges between 95% and 100%.

I can modify this codebase with confidence, knowing that I will not break stuff unintentionally. The design of the code is very explicitly meant to aid in testing and the entire project was developed with a Test First mindset.

I haven’t touched the codebase in a decade (and it has been close to 15 years since I really delved into it). The code itself was written in .NET 1.1 around the 2006 timeframe. It literally predates generics in .NET.

It compiles and runs all tests when I try to run it, which is great. But it is still very much a legacy codebase.

It is a legacy codebase because changing this code is a big undertaking. This code will not run on modern systems. We need to address issues related to dynamic code generation between .NET Framework and .NET.

That in turn requires a high level of expertise and knowledge. I’m fairly certain that given enough time and effort, it is possible to do so. The problem is that this will now require me to reconstitute my understanding of the code.

The tests are going to be invaluable for actually making those changes, but the core issue is that a lot of knowledge has been lost. It will be a Project just to get it back to a normative state.

This scenario is pretty interesting because I am actually looking back at my own project. Thinking about having to do the same to a similar project from someone else’s code is an even bigger challenge.

Legacy code, in this context, means that there is a huge amount of effort required to start moving the project along. Note that if we had kept the knowledge and information within the same codebase, the same process would be far cheaper and easier.

Legacy code isn’t about the state of the codebase in my eyes, it is about the state of the team maintaining it. The team, their knowledge, and expertise, are far more important than the code itself.

An orphaned codebase, one that has no one to take care of, is a legacy project even if it has tests. Conversely, a project with no tests but with an actively knowledgeable team operating on it is not.

Note that I absolutely agree that tests are crucial regardless. The distinction that I make between legacy projects and non-legacy projects is whether we can deliver a change to the system.

Reminder: A codebase that isn’t being actively maintained and has no tests is the worst thing of all. If you are in that situation, go read Working Effectively with Legacy Code, it will be a lifesaver.

I need a feature with an ideal cost of X (time, materials, effort, cost, etc). A project with no tests but people familiar with it will be able to deliver it at a cost of 2-3X. A legacy project will need 10X or more. The second feature may still require 2X from the maintained project, but only 5X from the legacy system. However, that initial cost to get things started is the killer.

In other words, what matters here is the inertia, the ability to actually deliver updates to the system.

time to read 4 min | 765 words

I’m currently deep in the process of modifying the internals of Voron, trying to eke more performance out of the system. I’m making great progress, but I’m also touching parts of the code that haven’t been looked at in a long time.

In other words, I’m mucking about with the most stable and most critical portions of the storage engine. It’s a lot of fun, and I’m actually seeing some great results, but it is also nerve-wracking.

We have enough tests that I’ve great confidence I would catch any actual stability issues, but the drive back toward a fully green build has been a slog.

The process is straightforward:

  • Change something.
  • Verify that it works better than before.
  • Run the entire test suite (upward of 30K tests) to see if there are any breaks.

The last part can be frustrating because it takes a while to run this sort of test suite. That would be bad enough, but some of the changes I made were things like marking a piece of memory that used to be read/write as read-only. Now any access to that memory would result in an access violation.

I fixed those in the code, of course, but we have a lot of tests, including some tests that intentionally corrupt data to verify that RavenDB behaves properly under those conditions.

One such test writes garbage to the RavenDB file, using read-write memory. The idea is to verify that the checksum matches on read and abort early. Because that test directly modifies what is now read-only memory, it generates a crash due to a memory access violation. That doesn’t just result in a test failure, it takes the whole process down.

I’ve gotten pretty good at debugging those sorts of issues (--blame-crash is fantastic) and was able to knock quite a few of them down and get them fixed.

And then there was this test, which uses encryption-at-rest. That test started to fail after my changes, and I was pretty confused about exactly what was going on. When trying to read data from disk, it would follow up a pointer to an invalid location. That is not supposed to happen, obviously.

Looks like I have a little data corruption issue on my hands. The problem is that this shouldn’t be possible. Remember how we validate the checksum on read? When using encryption-at-rest, we are using a mechanism called AEAD (Authenticated Encryption with Associated Data). That means that in order to successfully decrypt a page of data from disk, it must have been cryptographically verified to be valid.

My test results showed, pretty conclusively, that I was generating valid data and then encrypting it. The next stage was to decrypt the data (verifying that it was valid), at which point I ended up with complete garbage.

RavenDB trusts that since the data was properly decrypted, it is valid and tries to use it. Because the data is garbage, that leads to… excitement. Once I realized what was going on, I was really confused. I’m pretty sure that I didn’t break 256-bit encryption, but I had a very clear chain of steps that led to valid data being decrypted (successfully!) to garbage.

It was also quite frustrating to track because any small-stage test that I wrote would return the expected results. It was only when I ran the entire system and stressed it that I got this weird scenario.

I started practicing my Fields Medal acceptance speech while digging deeper. Something here had to be wrong. It took me a while to figure out what was going on, but eventually, I tracked it down to registering to the TransactionCommit event when we open a new file.

The idea is that when we commit the transaction, we’ll encrypt all the data buffers and then write them to the file. We register for an event to handle that, and we used to do that on a per-file basis. My changes, among other things, moved that logic to apply globally.

As long as we were writing to a single file, everything just worked. When we had enough workload to need a second file, we would encrypt the data twice and then write it to the file. Upon decryption, we would successfully decrypt the data but would end up with still encrypted data (looking like random fluff).

The fix was simply moving the event registration to the transaction level, not the file level. I committed my changes and went back to the unexciting life of bug-fixing, rather than encryption-breaking and math-defying hacks.

time to read 10 min | 1997 words

I usually talk about the things I do that are successful. Today I want to discuss something that I tried but failed at. Documenting failed approaches is just as important as documenting successes, though less enjoyable.

In order to explain the failure, I need to get a bit deeper into how computers handle memory. There is physical memory, the RAM sticks that you have in your machine, and then there is how the OS and CPU present that memory to your code. Usually, the abstraction is quite seamless, and we don’t need to pay attention to it.

Occasionally, we can take advantage of this model. Consider the following memory setup, showing a single physical memory page that was mapped in two different locations:

In this case, it means that you can do things like this:


*page1 = '*';
printf("Same: %d - Val: %c\n", (page1 == page2), *page2); 
// output is:
// Same: 0 - Val: *

In other words, because the two virtual pages point to the same physical page in memory, we can modify memory in one location and see the changes in another. This isn’t spooky action at a distance, it is simply the fact that the memory addresses we use are virtual and they point to the same place.

Note that in the image above, I modified the data using the pointer to Page 1 and then read it from Page 2. The Memory Management Unit (MMU) in the CPU can do a bunch of really interesting things because of this. You’ll note that each virtual page is annotated with an access permission.

In this case, the second page is marked as Copy on Write. That means that when we read from this page, the MMU will happily read the data from the physical page it is pointed to. But when we write, the situation is different.

The MMU will raise an exception to the operating system, telling it that a write was attempted on this page, which is forbidden. At this point, the OS will allocate a new physical page, copy the data to it, and then update the virtual address to point to the new page. Here is what this looks like:

Now we have two distinct mappings. A write to either one of them will not be reflected on the other. Here is what this looks like in code:


*page1 = '1'; // now 
printf("Page1: %c, Page2: %c\n", *page1, *page2); 
// output: Page1: 1, Page2: 1
*page2 = '2'; // force the copy on write to occur
printf("Page1: %c, Page2: %c\n", *page1, *page2); 
// output: Page1: 1, Page2: 2

As long as the modifications happened through the first page address (the orange one in the image), there was no issue and any change would be reflected in both pages. When we make a modification to the second page (the green one in the image), the OS will create a new physical page and effectively split them forever.

Changes made to either page will only be reflected in that page, not both, since they aren’t sharing the same page.

Note that this behavior applies at a page boundary. What happens if I have a buffer, 1GB in size, and I use this technique on it? Let’s assume that I created a copy-on-write mapping on top of that 1GB buffer.

The amount of physical memory that I would consume is still just 1GB.

In fact, I get what is effectively a very fast memcpy(), since I’m not actually copying anything. And for all intents and purposes, it works. I can change the data through the second buffer, and it would not show up in the first buffer. Of particular note is that when I modify the data through the second buffer, only the touched page is actually copied. Here is what this looks like:

So instead of having to copy 1GB all at once, we map the buffer again as copy on write, and we can get a new page whenever we actually modify our “copy” of the data.

So far, this is great, and it is heavily used for many optimizations. It is also something that I want to use to implement cheap snapshots of a potentially large data structure.

Here is the kind of code that I want to write:


var list = new CopyOnWriteList();
list.Put(1);
list.Put(2);

var snapshot1 = list.CreateSnapshot();

list.Put(3);

var snapshot2 = list.CreateSnapshot();

list.Put(4);

And the idea is that I’ll have (at the same time) the following:

list    | snapshot1 | snapshot2
1,2,3,4 | 1,2       | 1,2,3

I want to have effectively unlimited snapshots, and the map may contain a large amount of data. In graphical form, you can see it here:

We started with Page 1, created a Copy on Write mapping for Page 2, modified Page 2 (breaking the Copy on Write), and then attempted to create a Copy on Write on top of Page 2. That turns out to be a problem.

Let’s see the code that we need in order to create a copy using copy-on-write mapping on Linux:


#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int shm_fd = shm_open("/third", O_CREAT | O_RDWR, 0666);
ftruncate(shm_fd, 4096);
char *page1 = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
page1[0] = 'A'; page1[1] = 'B';
// page1 = 'AB'
char *page2 = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, shm_fd, 0);
// page2 = 'AB'
page1[0] = 'a';
// page1 = 'aB'
// page2 = 'aB' (still the same physical page)
page2[2] = 'C'; // force creation of the private copy
// page1 = 'aB'
// page2 = 'aBC'
page1[1] = 'b';
// page1 = 'ab'
// page2 = 'aBC' (no change here)

The code in Windows is pretty similar and behaves in the same manner:


HANDLE hMapFile = CreateFileMapping(INVALID_HANDLE_VALUE,
    NULL, PAGE_READWRITE, 0, 4096, TEXT("Local\\MySharedMemory"));
char* page1 = MapViewOfFile(hMapFile,
    FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, 4096);
page1[0] = 'A'; page1[1] = 'B';
// page1 = 'AB'
char* page2 = MapViewOfFile(hMapFile,
    FILE_MAP_COPY, 0, 0, 4096);
// page2 = 'AB'
page1[0] = 'a';
// page1 = 'aB'
// page2 = 'aB' (still the same physical page)
page2[2] = 'C'; // force a copy on write
// page1 = 'aB'
// page2 = 'aBC'
page1[1] = 'b';
// page1 = 'ab'
// page2 = 'aBC' (no change here)

Take a look at the API we have for creating a copy-on-write:


MapViewOfFile(hMapFile, FILE_MAP_COPY, 0, 0, 4096); // windows
mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, shm_fd, 0); // linux

A key aspect of the API is that we need to provide a source for the copy-on-write operation, and that source must be a file descriptor or mapping handle. We cannot perform a copy-on-write on top of a page that is already marked as copy-on-write, because there is nothing that refers to it: there is simply no source we can hand to the API for this sort of mapping.

I tried being clever and wrote the following code on Linux:


int selfmem = open("/proc/self/mem", O_RDWR);
char *page2 = mmap(NULL, 4096, PROT_READ | PROT_WRITE, 
                   MAP_PRIVATE, selfmem, (off_t)page1);

On Linux, you can use the special file /proc/self/mem to refer to your memory using file I/O. That means that I can get a file descriptor for my own memory, which provides a source for my copy-on-write operation.

I was really excited when I realized that this was a possibility. I spent a lot of time trying to figure out how I could do the same on Windows. However, when I actually ran the code on Linux, I realized that this doesn’t work.

The mmap() call will return ENODEV when I try that. It looks like this isn’t a supported action.

Linux has another call that looks almost right, which is mremap(), but that either zeros out the region or sets up a userfaultfd handler for it. So it can’t serve my needs.

Looking around, I’m not the first person to try this, but it doesn’t seem like there is an actual solution.

This is quite annoying since we are almost there. All the relevant pieces are available; if we had a way to tell the kernel to create the mapping, everything else would just work from there.

Anyway, this is my tale of woe, trying (and failing) to create a snapshot-based system using the Memory Management Unit (MMU). Hopefully, you’ll either learn something from my failure or let me know that there is a way to do this…

time to read 4 min | 683 words

Reading code is a Skill (with a capital letter, yes) that is really important for developers. You cannot be a good developer without it.

Today I want to talk about one aspect of this. The ability to go into an unfamiliar codebase and extract one piece of information out. The idea is that we don’t need to understand the entire system, grok the architecture, etc. I want to understand one thing about it and get away as soon as I can.

For example, you know that project Xyz is doing some operation, and you want to figure out how this is done. So you need to look at the code and figure that out, then you can go your merry way.

Today, I’m interested in understanding how the LMDB project writes data to the disk on Windows. This is because LMDB is based around a memory-mapped model, and Windows doesn’t keep the data between file I/O and mmap I/O coherent.

LMDB is an embedded database engine (similar to Voron, and in fact, Voron is based on some ideas from LMDB) written in C. If you are interested in it, I wrote 11 posts going through every line of code in the project.

So I’m familiar with the project, but the last time I read the code was over a decade ago. From what I recall, the code is dense. There are about 11.5K lines of code in a single file, implementing the entire thing.

I’m using the code from here.

The first thing to do is find the relevant section in the code. I started by searching for the WriteFile() function, the Win32 API for writing to a file. The first call to this function is in the mdb_page_flush function.

I look at this code, and… there isn’t really anything there. It is fairly obvious and straightforward code (to be clear, that is a compliment). I was expecting to see a trick there. I couldn’t find it.

That meant either the code had a gaping hole and potential data corruption (highly unlikely) or I was missing something. That led me to a long trip of trying to distinguish between documented guarantees and actual behavior.

The documentation for MapViewOfFile is pretty clear:

A mapped view of a file is not guaranteed to be coherent with a file that is being accessed by the ReadFile or WriteFile function.

I have had my own run-ins with this behavior, which were super confusing. This means that I had experimental evidence to say that this is broken. But it didn’t make sense; there was no code in LMDB to handle it, and this is pretty easy to trigger.

It turns out that while the documentation is pretty broad about not guaranteeing the behavior, the actual issue only occurs if you are working with remote files or using unbuffered I/O.

If you are working with local files and buffered I/O (which is 99.99% of the cases), then you can rely on this behavior. I found some vague references to this, but that wasn’t enough. There is this post that is really interesting, though.

I pinged Howard Chu, the author of LMDB, for clarification, and he was quick enough to assure me that yes, my understanding was (now) correct. On Windows, you can mix memory map operations with file I/O and get the right results.

The documentation appears to be a holdover from Windows 9x, with the NT line always being able to ensure coherency for local files. This is a guess about the history of documentation, to be honest. Not something that I can verify.

I had the wrong information in my head for over a decade. I did not expect this result when I started this post, I was sure I would be discussing navigating complex codebases. I’m going to stand in the corner and feel upset about this for a while now.

time to read 8 min | 1431 words

Today I got in my car to drive to work and realized that Waze suggested “Work” as the primary destination to select. I had noticed that before, and it is a really nice feature. Today, I got to thinking about how I would implement something like that.

That was a nice drive since I kept thinking about algorithms and data flow. When I got to the office, I decided to write about how we can implement something like that. Based on historical information, let’s suggest the likely destinations.

Here is the information we have:

The Lat & Lng coordinates represent the start location, the time is the start time for the trip, and the destination is obvious. In the data set above, we have trips to and from work, to the gym once a week, and to our parents over the weekends.

Based on this data, I would like to build recommendations for destinations. I could try to analyze the data and figure out all sorts of details. The prediction that I want to make is, given a location & time, to find where my likely destination is going to be.

I could try to analyze the data on a deep level, drawing on patterns, etc. Or I can take a very different approach and just throw some computing power at the problem.

Let’s talk in code since this is easier. I have a list of trips that look like this:


public record Trip(double Lat, double Lng, string Destination, DateTime Time);
Trip[] trips = RecentTrips(TimeSpan.FromDays(90));

Given that, I want to be able to write this function:


string[] SuggestDestination((double Lat, double Lng) location, DateTime now)

I’m going to start by processing the trips data, to extract the relevant information:


var historyByDest = new Dictionary<string, List<double[]>>();
foreach (var trip in trips)
{
    if (historyByDest.TryGetValue(trip.Destination, out var list) is false)
    {
        historyByDest[trip.Destination] = list = new();
    }
    list.Add([
        trip.Lat,
        trip.Lng,
        trip.Time.Hour * 100 + trip.Time.Minute, // time of day encoded as HHMM (9:30 -> 930)
        trip.Time.DayOfYear,
        (int)trip.Time.DayOfWeek
    ]);
}

What this code does is extract details (location, day of the week, time of day, etc.) from the trip information and store them in an array. For each trip, we basically break apart the trip across multiple dimensions.

The next step is to make the actual prediction we want, which will begin by extracting the same dimensions from the inputs we get, like so:


double[] compare = [
    location.Lat, 
    location.Lng, 
    now.Hour * 100 + now.Minute, 
    now.DayOfYear, 
    (int)now.DayOfWeek
];

Now we basically have an array of values from which we want to predict, and for each destination, an array that represents the same dimensions of historical trips. Here is the actual computation:


List<(string Dest, double Score)> scores = new();


foreach (var (dest, items) in historyByDest)
{
    double score = 0;
    foreach (var cur in items)
    {
        for (var i = 0; i < cur.Length; i++)
        {
            score += Math.Abs(cur[i] - compare[i]);
        }
    }
    score /= items.Count;
    scores.Add((dest, score));
}


scores.Sort((x, y) => x.Score.CompareTo(y.Score));

What we do here is compute the difference between the two arrays: the current start location & time compared to the start location & time of historical trips. We do that not only on the raw data but also extract additional features from the information.

For example, one dimension is the day of the week, and the other is the time of day. It is not sufficient to compare just the date itself.

The end result is the distance between the current trip start and previous trips for each of the destinations I have. Then I can return the destinations that most closely match my current location & time.

Running this over a few tests shows that this is remarkably effective. For example, if I’m at home on a Saturday, I’m very likely to visit either set of grandparents. On Sunday morning, I head to the Gym or Work, but on Monday morning, it is more likely to be Work.

All of those were mostly fixed, with the day of the week and the time being different. But if I’m at my parents’ house on a weekday (which is unusual), the location would have a far greater weight on the decision, etc. Note that the code is really trivial (I spent more time generating the actual data), but we can extract some nice information from this.

The entire code is here, admittedly it’s pretty dirty code since I wanted to test how this would actually work. At this point, I’m going to update my Curriculum Vitae and call myself a senior AI developer.

Joking aside, this approach provides a good (although highly simplified) overview of how modern AI systems work. Given a data item (image, text, etc.), you run that through the engine that outputs the embedding (those arrays we saw earlier, with values for each dimension) and then try to find its nearest neighbors across multiple dimensions.

In the example above, I explicitly defined the dimensions to use, whereas LLMs would have their “secret sauce” for this. The concept, at a sufficiently high level, is the same.
