The use of caches in OR/M

I recently talked quite a bit about caches in NHibernate, and I am a great believer in using them carefully to give an application much better performance. Frans Bouma, however, does not agree. Just to note, Frans is the author of LLBLGen Pro.

First, let me point to an issue that I have with the terminology that he uses. When Frans is talking about cache and uniquing, he is describing what is generally (at least by N/Hibernate & Fowler) called an Identity Map.

Frans:

 A cache is an object store which manages objects so you don't have to re-instantiate objects over and over again, you can just re-use the instance you need from the cache.

Fowler:

Ensures that each object gets loaded only once by keeping every loaded object in a map.

When speaking about the advantages of an Identity Map, performance is almost never the first reason to use it. It is a side benefit, which can have a certain effect, but it is not the main reason for using one. If we consider Frans' arguments as they apply to Identity Map, I agree. If nothing else, an Identity Map tends to be fairly short lived and limited in scope in most cases, so it doesn't have the chance to be very effective as a cache.
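To make the distinction concrete, here is a minimal sketch of an Identity Map in Python. It is not NHibernate's code; `Session`, `Customer` and the in-memory `rows` dictionary standing in for the database are all names I made up for illustration. The point it shows is that the map guarantees one instance per identity inside one session, and that it dies with the session.

```python
class Customer:
    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name


class Session:
    """Hypothetical session / context that owns an identity map."""

    def __init__(self, rows):
        self._rows = rows          # stand-in for the database: {id: name}
        self._identity_map = {}    # (type, id) -> instance already loaded
        self.db_hits = 0           # counts simulated trips to the database

    def get(self, customer_id):
        key = (Customer, customer_id)
        if key not in self._identity_map:
            # First request for this identity in this session: load it once.
            self.db_hits += 1
            self._identity_map[key] = Customer(customer_id, self._rows[customer_id])
        # Every later request in the same session gets the same instance.
        return self._identity_map[key]


session = Session({1: "Northwind"})
a = session.get(1)
b = session.get(1)
other = Session({1: "Northwind"}).get(1)
```

Here `a` and `b` are the same object, so a change made through one is visible through the other, while `other` comes from a different session and is a separate instance. The saved database trip is a side effect of the uniqueness guarantee, and it only lasts as long as the session does.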

But an OR/M has an opportunity to cache much more than just at the session / context level. A word of warning, though: as was mentioned in the post, caching by its very nature means that you are not seeing the very latest data. You can use cache invalidation policies (including the new data driven cache invalidation policies in .Net 2.0) to help, but you should be aware of this issue.

However, when we consider the common scenarios, it is not often that we need to have real time information. The case that Frans is presenting is a CRM application with a query on all the customers that have more than 5 orders in the last month.

Do we really need this data in real time? Or can we be satisfied with data from several minutes ago? The answer depends on the business scenario, but fairly often we can be reasonably satisfied with data that is a few minutes or hours behind the real events.

Even if we would like to get real time data, the data can change between the time that we queried it and the time that we displayed it, so we would need to query again as soon as we finished displaying (or maybe at the same time), ad infinitum.

If we assume that the business requirements allow us to use caching, this has a tremendous benefit performance wise. Let us assume that we have cached the query and its results (again, I'm using NHibernate as the model here, and its caches are not caching live entities, but rather their values). We can then satisfy the query entirely from the cache (which usually means in-proc memory).

The only real cost of the query is several hash table lookups, which are (by their nature) very fast, and constructing the objects, which I have already shown to be highly efficient. The end result is that we can serve the results immediately. In many cases, even a cache that is valid for only a few minutes can significantly reduce the number of queries that the DB has to process.
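As a rough illustration of that flow, here is a small Python sketch of a query cache that stores values rather than live entities and expires entries after a fixed time. This is my own simplification, not NHibernate's implementation: `QueryCache`, `customers_with_many_orders`, the `run_query` callback and the five minute lifetime in the usage below are all assumptions made for the example.

```python
import time


class Customer:
    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name


class QueryCache:
    """Hypothetical cache holding query results as plain values, not entities."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._results = {}   # query key -> (expires_at, [entity ids])
        self._values = {}    # entity id -> tuple of column values

    def put(self, query_key, rows):
        for customer_id, name in rows:
            self._values[customer_id] = (customer_id, name)
        ids = [customer_id for customer_id, _ in rows]
        self._results[query_key] = (self._clock() + self._ttl, ids)

    def get(self, query_key):
        entry = self._results.get(query_key)
        if entry is None or entry[0] <= self._clock():
            return None  # never cached, or too stale to trust
        # A few hash lookups plus object construction, no database involved.
        return [Customer(*self._values[i]) for i in entry[1]]


def customers_with_many_orders(cache, run_query):
    key = ("customers_with_orders_over", 5, "last_month")
    cached = cache.get(key)
    if cached is not None:
        return cached
    rows = run_query()       # the only path that touches the database
    cache.put(key, rows)
    return [Customer(*row) for row in rows]
```

With a cache created as `QueryCache(ttl_seconds=300)`, repeated calls inside the five minute window build fresh `Customer` objects from the cached values and never call `run_query`; the first call after the window expires goes back to the database. Passing a fake `clock` makes the expiry easy to exercise in a test.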

The concerns that Frans is raising are valid in the context* that he is talking about, but I disagree that caches are not extremely important to performance. That said, they should not be overused, and the DB is still the one and only authoritative source for the data. I have seen some places where the requirement is to run the application entirely from cache, without touching the database at all.

That is taking it way too far...

* Do you get the joke here?