Document Databases are not Relational

time to read 4 min | 692 words

I got several similar questions regarding my post about modeling data for document databases:

…how would you handle a situation where you need (or want) to store some information in a relational database. For example, user accounts.
Would you duplicate the user accounts in the document db? If not, how would you relate posts to users and preserve some kind of integrity.

The most typical error people make when trying to design the data model on top of a document database is to try to model it the same way you would on top of a relational database. A document database is a non relational data store, and trying to hammer a relational model on top of it will produce sub optimal results. But you can get fantastic results by taking advantage on the documented oriented nature of Raven.

Documents, unlike a row in a RDBMS, are not flat, you are not limited to just storing keys and value. Instead, you can store complex object graphs as a single document. That includes arrays, dictionaries and trees. What it means, in turn, is that unlike a relational database, where a row can only contain simple values and more complex data structures need to be stored as relations, you don't need to work hard to get your data into Raven.

Let us take the following page as an example:

In a relational database, we would have to touch no less than 4 tables to show the data in this single page (Posts, Comments, Tags, RelatedPosts).

Using a document database, we can store all the data that we need to work with as a single document with the following format:

This format allows to get everything that we need to display the page shown above in a single request.

Documents are expected to be meaningful on their own. You can certainly store references to other documents, but if you need to refer to another document to understand what the current document means, you are probably using the document database wrongly.

With document database, you are encourage to include in your documents all the information they need. Take a look at the post example above. In a relational database, we would have a link table for RelatedPosts, which would contain just the ids of the linked posts. If we would have wanted to get the titles of the related posts, we would need to join to the Posts table again. You can do that in document database, but that isn't the recommended approach, instead, as shown in the example above, you should include all the details that you need inside the document. Using this approach, you can display the page with just a single request, leading to much better overall performance.

Nitpicker corner: Yes, it does mean that you need to update related posts if you edit the title of a post.

Once we established this context, we can try answering the actual question.

Assuming that we store users in a relational database, the question now becomes, what would we gain by replicating the users information to a document database?

If we were using a relational database, that would have given us the ability to join against the users. But a document database doesn’t support joins. Moreover, if we consider the apparent aim of the question “maintain some integrity”, we can see that it doesn’t really matter where we store the users’ data. A document database doesn’t support things like referential integrity in the first place, so putting the users inside the document database gives you no benefit.

Now, you may want to be able to put the users in the document database anyway, to benefit from the features that it brings to the table, but integrity isn’t one of those reasons.