RavenDB 4.0: Maintaining transaction boundary integrity in a distributed cluster
Transactions are important for a database, even if it feels strange to talk about it. Sometimes it feels like taking out this ad:
We pretty much treat RavenDB’s transactional nature as a baseline, same as the safe assumption that any employee we hire will have a pulse. (Sorry, we discriminate against Zombies and Vampires because they create a hostile work environment, see here for details.)
Back to transactions, and why I’m bringing up a basic requirement like that. Consider the case when you need to pay someone. That operation is composed of two distinct operations. First the bank debits your account, and then the bank credits the other account. You generally want these to happen as a transactional unit: either both of them happened, or neither did. In practice, that isn’t how banks work at all, but that is the simplest way to explain transactions, so we’ll go with that.
With RavenDB, that is how it has always worked. You can save multiple documents in a single transaction, and we guarantee that we either save all of them, or none. There is a problem, however, when we start talking about distributed clusters. When you save data to RavenDB, it goes into a single node, and then it is replicated from there. In RavenDB 4.0 we now also guarantee that replicating the data between nodes respects the transaction boundary, so across the whole cluster you can rely on us never breaking transactions apart.
Let us consider the following transfer:
- Deduct 5$ from accounts/1234
- Credit 5$ to accounts/4321
These changes will also be replicated as a single unit, so the total amount of money in the system remains the same.
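To make the all-or-nothing guarantee concrete, here is a minimal sketch using a toy in-memory store. It is not RavenDB's API: `ToyStore`, `save_all`, and the starting balances are hypothetical names and figures chosen for illustration only.

```python
class ToyStore:
    """In-memory stand-in for a single node (hypothetical, not RavenDB's API)."""

    def __init__(self, docs):
        self.docs = dict(docs)

    def save_all(self, changes):
        """Apply every (doc_id, delta) change, or none of them."""
        staged = dict(self.docs)  # work on a copy, never on the live data
        for doc_id, delta in changes:
            new_balance = staged[doc_id] + delta
            if new_balance < 0:
                # Abort: the live data was never touched.
                raise ValueError(f"insufficient funds in {doc_id}")
            staged[doc_id] = new_balance
        self.docs = staged  # commit point: all changes become visible together


store = ToyStore({"accounts/1234": 10, "accounts/4321": 0})

# The transfer from the text: both changes are saved as one unit.
store.save_all([("accounts/1234", -5), ("accounts/4321", +5)])
total_after_transfer = sum(store.docs.values())

# A transfer that cannot complete leaves no partial changes behind,
# even though the credit was staged before the debit failed.
try:
    store.save_all([("accounts/4321", +20), ("accounts/1234", -20)])
except ValueError:
    pass
```

After the first call the total is still 10, and the failed second call leaves both balances exactly as they were.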
But a more complex set of operations can be:
- Deduct 5$ from accounts/1234
- Credit 5$ to accounts/4321
- Deduct 5$ from accounts/1234
- Credit 5$ to accounts/5678
In this case, we have a document that is involved in multiple transactions. When we need to replicate to another node, we’ll replicate the current state and we do that in batches. So it is possible that we’ll replicate just:
- Credit 5$ to accounts/4321
We don’t replicate accounts/1234 yet, because it has changed in a later transaction. That means that on one server, we suddenly have magically more money. While in general that would be a good thing, I’m told that certain parties aren’t in favor of such things, so we have another feature that interacts with this. You can enable document revisions, in which case even if your documents are modified multiple times in different transactions, we’ll always send the transactions across the wire as they were saved.
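The difference between the two modes can be sketched with a toy model. This is only an illustration under stated assumptions, not RavenDB's replication protocol: the `Write` record, the starting balances, and the batch size of one are all hypothetical.

```python
from collections import namedtuple

# One write: change order (etag), owning transaction, document id, resulting balance.
Write = namedtuple("Write", "etag tx doc_id balance")

# Starting balances on both nodes (hypothetical figures).
INITIAL = {"accounts/1234": 10, "accounts/4321": 0, "accounts/5678": 0}

# The two transfers from the text, as the writes they produce on the source node.
LOG = [
    Write(1, "tx1", "accounts/1234", 5),
    Write(2, "tx1", "accounts/4321", 5),
    Write(3, "tx2", "accounts/1234", 0),
    Write(4, "tx2", "accounts/5678", 5),
]


def current_state(log):
    """Keep only the latest write per document, ordered by etag."""
    latest = {}
    for w in log:
        latest[w.doc_id] = w  # later writes overwrite earlier ones
    return sorted(latest.values(), key=lambda w: w.etag)


def transactions(log):
    """Group every retained write (revision) by its transaction, in order."""
    groups = {}
    for w in log:
        groups.setdefault(w.tx, []).append(w)
    return list(groups.values())


def apply(target, writes):
    """Overwrite the target node's documents with the incoming writes."""
    for w in writes:
        target[w.doc_id] = w.balance
    return target


# Without revisions: only the current state is sent, in batches.
# A first batch of one document carries just the credit to accounts/4321.
first_batch = current_state(LOG)[:1]
partial = apply(dict(INITIAL), first_batch)

# With revisions: each transaction's writes travel together, as saved.
replica = dict(INITIAL)
totals = []
for tx_writes in transactions(LOG):
    apply(replica, tx_writes)
    totals.append(sum(replica.values()))
```

In this model `partial` sums to 15 instead of 10, which is the "magically more money" state, while `totals` stays at 10 after every replicated transaction.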
This gives you transaction boundaries on a distributed database, and allows you to reason about your data in a saner fashion.
Comments
Do you have a document or blog post somewhere where you have more details about RavenDB transactions? I always wondered how transactions are implemented in a NoSQL database ... are there read-locks/document(rows) versions etc.
Dalibor, I have _lots_, yes. Look at this blog for posts about Voron, which is the storage engine we use.