Proper document modeling and multi document transactions

time to read 3 min | 557 words

One of the first features that RavenDB had, from the very first release, was multi document ACID transactions. With RavenDB you could modify multiple documents at the same time and then save them all, knowing that they would be be saved as a single atomic unit. Other NoSQL databases had you jump through fire if you wanted transactional behavior (building your own two phase commit protocol at the application level, not fun). I consider this to be a pretty important feature, obviously. But why?

In Inside RavenDB book, there is the following advice about modeling considerations for documents:

  • Independent, meaning a document should have its own separate existence from any other documents.
  • Isolated, meaning a document can change independently from other documents.
  • Coherent, meaning a document should be legible on its own, without referencing other documents.

In other words, with proper modeling, you shouldn’t need to have multi document transactions. Any transaction should only contain a single document, so that should be enough, no? Why spend all the time and effort on building multi document transactions?

Well, to start with “proper modeling” is a very loaded term. It is great if you can get it, but there are many cases where you need to deviate from it for various reasons. For example, in this blog, we represent a blog post as two documents. One holds the text of the post, the other holds the comments for the post. The reason for this separation is that there are many cases where you want only the blog post, and not the (potentially very many) comments.

In this case, the layout of the document is subject to the physical realities. It is better to split the document to multiple documents based on their purpose. However, if I add a comment to a post, I want to both add it to the Comments document and to increase the NumberOfComments property on the Post document. Doing this in a single transaction means that I don’t have to worry about consistency.

Another good example of wanting to work with multiple documents at the same time is when my documents aren’t using just holding data. Consider the case of accepting a new employee to the company. I need to:

  • Create the new Employee document.
  • Create initial workflow requirements (orientation, employee handbook, tax papers, setup machine / user / vpn access, etc).

In other words, each document here is independent, but they are created together. I want to create the new Employee’s document and at the same time setup a task for the IT to create a new user, allocate a machine, etc. I want to have the employee go through orientation, do all the require paper work, etc.

Each one of those is modeled as a separate document, because they are. See the definition above and consider how they match. But I absolutely don’t want to have a partial state. That I have a new employee, but I didn’t setup payroll for them is a big problem. Having multi document transaction make things a lot simpler.

You can argue that it would be better to model things as a set of related services with independent databases. I’m not sure that I disagree, but this is a much more complex architecture. Not having to go there on day one, while having clear and easy to use consistency guarantees is a major plus in my eyes.