Cache, it ain’t just remembering stuff
I mentioned that this piece of code have an issue:
public class LocalizationService { MyEntities _ctx; Cache _cache; public LocalizationService(MyEntities ctx, Cache cache) { _ctx = ctx; _cache = cache; Task.Run(() => { foreach(var item in _ctx.Resources) { _cache.Set(item.Key + "/" + item.LanguageId, item.Text); } }); } public string Get(string key, string languageId) { var cacheKey = key +"/" + languageId; var item = _cache.Get(cacheKey); if(item != null) return item; item = _ctx.Resources.Where(x=>x.Key == key && x.LanguageId == languageId).SingleOrDefault(); _cache.Set(cacheKey, item); return item; } }
And I am pretty sure that the lot of you’ll be able to find a lot of additional issues that I’ve not thought about.
But there are at least three major issues in the code above. It doesn’t do anything to solve the missing value problem, it doesn’t have good handling for expiring values and have no way to handle changing values.
Look at the code above, assume that I am making continuous calls to Get(“does not exists”, “nh-YI”), or something like that. The way the code is currently written, it will always hit the database to get that value.
The second problem is that if we have had a cache cleanup run, which expired some values, we will actually load them one at a time, in pretty much the worst possible way from the point of view of performance.
Then we have the problem of how to actually handle updating values.
Let us see how we can at least approach this. We will replace the Cache with a ConcurrentDictionary. That will mean that the data cannot just go away from under us, and since we expect the number of resources to be relatively low, there is no issue in holding all of them in memory.
Because we know we hold all of them in memory, we can be sure that if the value isn’t there, it isn’t in the database either, so we can immediately return null, without checking with the database.
Last, we will add a StartRefreshingResources task, which will do the actual refreshing in an async manner. In other words:
public class LocalizationService { MyEntities _ctx; ConcurrentDictionary<Tuple<string,string>,string> _cache = new ConcurrentDictionary<Tuple<string,string>,string>(); Task _refreshingResourcesTask; public LocalizationService(MyEntities ctx) { _ctx = ctx; StartRefreshingResources(); } public void StartRefreshingResources() { _refreshingResourcesTask = Task.Run(() => { foreach(var item in _ctx.Resources) { _cache.Set(item.Key + "/" + item.LanguageId, item.Text); } }); } public string Get(string key, string languageId) { var cacheKey = Tuplce.Create(key,languageId); var item = _cache.Get(cacheKey); if(item != null || _refreshingResourcesTask.IsCompleted) return item; item = _ctx.Resources.Where(x=>x.Key == key && x.LanguageId == languageId).SingleOrDefault(); _cache.Set(cacheKey, item); return item; } }
Note that there is a very subtle thing going on in here. as long as the async process is running, if we can’t find the value in the cache, we will go to the database to find it. This gives us a good balance between stopping the system entirely for startup/refresh and having the values immediately available.
Comments
Well, you could have said what is the Cache class you're replacing with ConcurrentDictionary. After all, we can't assume anything about it's cleanup procedure or updating entries, and without this information, well, I don't really see a reason to make the changes you've made. BTW, your implementation will also hit the database every time a missing key is accessed. And anyway, aren't localized strings supposed to remain constant? If so, why ever wonder about cache expiration?
Rafal, Cache implies some sort of expiration, so you need to deal with that. My implementation will only hit the db for a missing value as long as the load all items task is running.
Don't agree ;) It will query the db every time you hit a null item in the cache, which is every time you access a missing value
[code]
[/code]
Rafal, You are correct, this should be an OR, not an AND. Fixed now.
Hi Ayende,
I know your code is for showing how to implement caching with an async approach which I think is great.
However, I just wanted to mention three things: Firstly your Task.Run does not do exception handling, and I think it has problem to keep the state of cache correctly when exception happens. Secondly ConcurrentDictionary does not have Get/Set methods, but has TryGet/TryAdd/AddorUpdate/etc and I think you could easily use TryGet and GetOrAdd instead of Get/Set. Thirdly, your sample does deal with expiration.
I think cache expiration is always a difficult issue. In some situations it's no big problem working with an older version of a value but sometimes you want more "fresh" data. It depends on the use case. Therefore I propose to implement a Get function with an additional "freshness" parameter of type DateTime. The cache value timestamp must be younger than that otherwise it will be refreshed.
TomCollins, That is a great idea, until you realize that you just punted the buck down the road. How is the developer calling Get going to decide that?
I take the start time of a "long" lasting operation (e.g. start of a session, start of an export) and use that. Let's say you do an export of orders together with customer information. The customer will be fetched once "fresh" and later on from the cache.
In normal use cases there should be taken a default freshness value - to keep the developer away from the buck down the road" problem you mentioned :-)
Any reason you didn't use ForEach.Parallel in the StartRefreshingResources method?
You have the process running on its own thread, but it seems like you could use some parallelism (is that a word?) there.
Mekon256, We don't need to. We are doing only a single expensive operation, not many.
There is a race condition, where _cache.Get(cacheKey) returned null but _refreshingResourcesTask was still running, but by the time you check for _refreshingResourcesTask.IsCompleted it is already completed.
Comment preview