Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 3 min | 498 words

In RavenDB 5.2 we made a small, but significant, change to how RavenDB processes cluster wide transactions. A cluster wide transaction in RavenDB prior to the 5.2 release would look like this:

A cluster wide transaction in RavenDB allows you to mix writes to documents and to compare exchange values in the same transaction. The difference between the two is that documents are stored at the database level, while compare exchange values are stored at the cluster level.

As such, a cluster wide transaction can only fail if the compare exchange values don’t match the expected values. To be rather more blunt, optimistic concurrency on a document with a cluster wide transaction is not going to work (in fact, we’ll throw an error if you try). Unfortunately, it can be tricky to understand that, and we have seen users in the field who made the wrong assumptions about how cluster wide transactions interact with documents.

The most common mistake that we saw was that people expected changes to documents to be validated as part of the cluster wide transaction. However, RavenDB is a multi master database, so changes can arrive through cluster wide transactions, normal transactions, and replication from isolated locations. That makes it challenging to match the expected behavior. Leaving the behavior as it is, however, means that you have a potential pitfall for users to stumble into. That isn’t where we want to be, so we fixed it.

Let’s consider the following piece of code:

Notice the code on line 8: we are creating a reservation for the username by allocating a document id for the new document. However, that is the kind of code that would work when running on a single server, not in a cluster. The change we made in RavenDB 5.2 means that we are now able to make code this simple work across the cluster in a consistent and usable manner.

Behind the scenes, RavenDB creates guard compare exchange values for the documents in the transaction. Those are used to validate the state of the transaction across the cluster.
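To make the idea of a guard concrete, here is a minimal sketch in Python. It is a toy in-memory model, not RavenDB code: the `ToyCluster` class, its `commit` method and the `ConcurrencyError` type are all hypothetical names made up for illustration. The only thing it tries to show is a cluster-level index per document id that must match what the writer saw, or the whole transaction is rejected.

```python
from dataclasses import dataclass, field


class ConcurrencyError(Exception):
    """Raised when a guard no longer matches the index the writer read."""


@dataclass
class ToyCluster:
    """In-memory stand-in for cluster-level state (hypothetical, not RavenDB code)."""
    raft_index: int = 0
    guards: dict = field(default_factory=dict)     # document id -> guard index
    documents: dict = field(default_factory=dict)  # document id -> content

    def read(self, doc_id):
        """Return (content, guard index); both are None for a missing document."""
        return self.documents.get(doc_id), self.guards.get(doc_id)

    def commit(self, writes):
        """Apply {doc_id: (content, expected guard index)} as one unit."""
        # Validate every guard before touching anything, so the
        # transaction is all-or-nothing.
        for doc_id, (_, expected) in writes.items():
            if self.guards.get(doc_id) != expected:
                raise ConcurrencyError(doc_id)
        self.raft_index += 1
        for doc_id, (content, _) in writes.items():
            self.documents[doc_id] = content
            self.guards[doc_id] = self.raft_index
```

In this model, two writers that both try to create `usernames/ayende` while expecting no guard (`None`) cannot both succeed: the first commit sets the guard, and the second fails validation without changing anything.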


The change vectors of the documents are also modified, like so: "RAFT:2-XiNix+yo0E+sArbTyoLLyQ,TRXN:559-bgXYYHaYiEWKR2mpjurQ1A".

The idea is that the TRXN notation gives us the relevant compare exchange value index, which we can use to ensure that the cluster wide state of the document matches the expected state at the time of the read.
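As a small illustration, the change vector shown above can be picked apart with a few lines of Python. This is only a sketch: it assumes each entry has the shape `TAG:number-id`, which is inferred from the example string rather than from a format specification, and the helper names `parse_change_vector` and `guard_index` are made up for this post.

```python
def parse_change_vector(change_vector):
    """Split 'TAG:number-id,TAG:number-id' into {tag: (number, id)}."""
    entries = {}
    for part in change_vector.split(","):
        tag, _, rest = part.strip().partition(":")
        # Split on the first '-' only; the id part is kept verbatim.
        number, _, node_id = rest.partition("-")
        entries[tag] = (int(number), node_id)
    return entries


def guard_index(change_vector):
    """Return the number in the TRXN entry, or None if there is no such entry."""
    entry = parse_change_vector(change_vector).get("TRXN")
    return entry[0] if entry else None


cv = "RAFT:2-XiNix+yo0E+sArbTyoLLyQ,TRXN:559-bgXYYHaYiEWKR2mpjurQ1A"
print(guard_index(cv))  # 559
```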

In short, you can skip using compare exchange values directly in cluster wide transactions. RavenDB will understand what you mean, create the relevant entries and manage them for you. That also works when you mix cluster wide and regular transactions.

A common scenario for the code above is that you’ll include the usernames/ayende reservation document whenever you modify the user’s name, and that will always use a cluster wide transaction. However, when modifying the user and not the user name, you can skip the cost of a cluster wide transaction.

time to read 3 min | 571 words

RavenDB 5.2 introduces a new concept for deploying indexes: rolling index deployment.

Typically, deploying an index to production on a loaded database is something that you do only with great trepidation. There are many horror stories about a new index causing the entire system to lock up for a long period of time.

RavenDB was specifically designed to address those concerns. Deploying a new index in production is something that you are expected to do. In fact, RavenDB will create indexes for you on the fly as needed, in production.

To be perfectly honest, the two aspects are very closely tied together. The fact that we expect to be able to create an index without disruption of service feeds into us being able to create indexes on the fly. And creating indexes on the fly means that we need to keep being able to create indexes without putting too much load on a running system.

RavenDB limits the amount of CPU time that a new index can consume and will control the amount of memory and I/O that is used by an index to prioritize user queries over background work.

In version 5.2, we have extended this behavior further. We now allow users to deploy their own code as part of RavenDB indexes, which makes it much harder to control what exactly is going on inside RavenDB during indexing. For example, you may choose to run something like Parallel.For(), which may use more CPU than RavenDB accounts for. The situation is a bit more complex in the real world, because we need to worry about other factors as well (memory, I/O, CPU and network come to mind).

Consider what happens if a user does something like this in an index:

RavenDB has no way to control what is actually going on there, and this code will use 1GB of RAM and quite a bit of CPU (over multiple cores) without RavenDB being able to limit it. This is a somewhat contrived example, I’ll admit (I can’t think of any reason you’d want to do this sort of thing in an index). It is far more common to want to do Machine Learning predictions in indexes now, which can have similar effects.

When pushing a large number of documents through such an index, such as the scenario of deploying a new index, that can put a lot of strain on the system. Enter: Rolling index deployment.

This is an index deployment mode where RavenDB will not immediately deploy the index to all the nodes. Instead, it will choose the least busy node and get it to run the index. At any time, only a single node in the cluster is going to run the index, and that node is going to be at the back of the line for any other work the cluster has for it. Once that node has finished indexing, RavenDB will select the next node (and reassign work as needed).

The idea is that even if you deploy an index that has a negative impact on the system behavior, you have mitigated the potential impact to a single (hopefully unused) node.

The cost of that, of course, is that the indexes are now going to run in a serial fashion, one node at a time. That means that they will take longer to deploy to the entire system, but the resource utilization is going to be far more stable.
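The rolling scheme described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how RavenDB implements it: "least busy" is reduced to a single load number per node, and `rolling_deploy`, `node_load` and `build_index` are hypothetical names used only for illustration.

```python
def rolling_deploy(node_load, build_index):
    """Run build_index on one node at a time, least busy node first.

    node_load: {node name: current load}. It is consulted again before
    each pick, so load that shifts between nodes is taken into account.
    Returns the order in which the nodes were indexed.
    """
    pending = set(node_load)
    order = []
    while pending:
        # Pick the least busy node that still lacks the index
        # (ties are broken by name to keep the result deterministic).
        node = min(pending, key=lambda n: (node_load[n], n))
        build_index(node)  # only this node pays the indexing cost right now
        pending.remove(node)
        order.append(node)
    return order


load = {"A": 0.7, "B": 0.1, "C": 0.4}
built = []
print(rolling_deploy(load, built.append))  # ['B', 'C', 'A']
```

Because `build_index` runs to completion before the next node is picked, the total deployment time is the sum of the per-node times, which is the serial cost mentioned above.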
