The problem of over-aggressive caching

Following the recent profiling effort, I decided to put far more aggressive caching into SvnBridge.

I set it up so it would cache the full revision from TFS on any query, and then serve it from the cache. When I ran it against the test server, it worked beautifully. Then I ran into this issue:

image

I am pretty sure that this is not going to be an acceptable scenario. To be exact, I would find it acceptable if it were a one-time cost, but the problem is that this is a cost that you have to pay per revision, and that is unacceptable. The major problem is that this uses the underlying QueryItems() method, which returns all of the results, including those from previous revisions. This means that on a busy server (like tfs03), the cost of doing such a query is high.

The number of files returned is actually pretty small (910 in this case), but I assume that it has to check all the files on the server for permission before it allows me to get them.

I wonder how Rhino Security would handle this situation. It wouldn't even get the data out of the DB, and the query enhancement is pretty lightweight. I assume it would be pretty fast.

Anyway, this is obviously a bad approach. For now, I made it load only the path (and its descendants) that we need. This means that we don't get the same benefit of preemptive caching and might talk to the server a bit too much. However, it turns out that SvnBridge and SVN make requests in a way that makes this style of caching work fairly well. We always ask for the directory before asking for its descendants, and we have fairly long conversations about the same revision, so it is a good candidate for caching.
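To make the idea concrete, here is a minimal sketch of path-scoped caching per revision. It is not the SvnBridge code (which is not Python), just an illustration under assumptions: `RevisionPathCache` and `fetch_subtree` are hypothetical names, and `fetch_subtree` stands in for whatever server call returns a path and its descendants at a given revision.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical callback: given (revision, path), return metadata for that path
# and all of its descendants, keyed by full path. It stands in for the real
# server query and is an assumption of this sketch.
FetchSubtree = Callable[[int, str], Dict[str, dict]]


class RevisionPathCache:
    """Caches item metadata per (revision, path), loading whole subtrees."""

    def __init__(self, fetch_subtree: FetchSubtree) -> None:
        self._fetch_subtree = fetch_subtree
        self._items: Dict[Tuple[int, str], dict] = {}
        # (revision, root path) pairs whose subtree was already fetched.
        self._loaded_roots: Set[Tuple[int, str]] = set()

    def _is_covered(self, revision: int, path: str) -> bool:
        # A path is covered if it equals, or lies under, a root that was
        # already loaded for the same revision.
        for rev, root in self._loaded_roots:
            if rev != revision:
                continue
            if path == root or path.startswith(root.rstrip("/") + "/"):
                return True
        return False

    def get(self, revision: int, path: str) -> Optional[dict]:
        if not self._is_covered(revision, path):
            # Cache miss: one server round trip loads the path and everything
            # below it, so later requests for descendants stay local.
            for item_path, meta in self._fetch_subtree(revision, path).items():
                self._items[(revision, item_path)] = meta
            self._loaded_roots.add((revision, path))
        # None means the path does not exist at this revision.
        return self._items.get((revision, path))
```

Because the directory is requested before its descendants, the first `get` for a directory pays the server cost, and the rest of the conversation about that revision is served from memory.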

That isn't optimal for big projects with a lot of files and a lot of activity, however. The way I handle it now, we download the entire project metadata for each revision. That can be a lot for those kinds of projects, and having to download it each and every time is a waste.

SvnBridge already contains a very smart piece of code (the UpdateDiffCalculator class) that can figure out the differences between two revisions and only get the items that it needs. The problem is that the caching layer is built mainly in order to support that class.
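As a rough illustration of the "only get the items that changed" idea, and explicitly not the UpdateDiffCalculator itself, the sketch below derives the metadata for a newer revision from an older cached one. Everything here is an assumption: `advance_cache`, the shape of `changes`, and the `fetch_item` callback are hypothetical names invented for the example.

```python
from typing import Callable, Dict


def advance_cache(
    old_items: Dict[str, dict],
    changes: Dict[str, str],
    fetch_item: Callable[[str], dict],
) -> Dict[str, dict]:
    """Build the metadata map for a newer revision from an older cached one.

    old_items:  path -> metadata for the older revision (already cached).
    changes:    path -> "add" | "edit" | "delete" between the two revisions
                (a hypothetical change list; its source is not specified).
    fetch_item: hypothetical callback that fetches one item's metadata at
                the newer revision.
    """
    new_items = dict(old_items)  # copy, so the older revision stays intact
    for path, kind in changes.items():
        if kind == "delete":
            new_items.pop(path, None)
        else:
            # Only added or edited items cost a server round trip.
            new_items[path] = fetch_item(path)
    return new_items
```

With this shape, a revision that touches three files out of several hundred costs three fetches instead of a full metadata download.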

I think that I'll need to get a bit smarter about this in the future, but for now it seems to be doing the work very well.