The curse of resource utilization


I got a request from a customer to look at the resource utilization of RavenDB under load. In particular, the customer loaded the entire FreeDB data set into RavenDB, then set up SQL Replication and a set of pretty expensive indexes (map/reduce, full text search, etc.).

This was a great scenario to look at, and we did find interesting things that we can improve on. But during the testing, we took the following recordings; in fact, all of them were taken within a minute of each other:

(Three screenshots of the server's resource utilization under load.)

The interesting thing about them is that they don't indicate a problem. They show a server making full use of the resources available to it in order to handle its workload.

While I am writing this, there is about 1GB of free RAM that we've left for the rest of the system. But basically, if you have a big server and you give it a lot of work to do, it is pretty much a given that you'll want to get your money's worth out of all of that hardware. The problem in the customer's case was that while the server was very busy, it didn't get things done properly, but that is a side note.

What I want to talk about is the assumption that if the server is using a lot of resources, that is bad. In fact, that isn't true, assuming it is using them properly. For example, if you have 32GB of RAM, there is little point in trying to conserve memory. And there is all the reason in the world to prepare answers to upcoming queries ahead of time, so we might be using CPU even in idle moments. The key here is how you detect when you are overusing the system's resources.
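To make that concrete, here is a minimal sketch (in Python, not RavenDB's actual code) of the kind of check a server might make before spending spare capacity on speculative work such as warming caches; the thresholds and the use of psutil are my own illustrative choices.

```python
import psutil  # illustrative choice of system-stats library

# Illustrative thresholds, not RavenDB's actual tuning values.
MEMORY_PRESSURE_THRESHOLD = 0.90   # fraction of RAM in use
IDLE_CPU_THRESHOLD = 0.30          # fraction of CPU that is busy

def can_do_speculative_work() -> bool:
    """Decide whether spare capacity may be spent preparing answers
    to likely upcoming queries (e.g. warming caches, prefetching)."""
    memory_used = psutil.virtual_memory().percent / 100.0
    cpu_busy = psutil.cpu_percent(interval=0.1) / 100.0

    under_memory_pressure = memory_used > MEMORY_PRESSURE_THRESHOLD
    cpu_is_idle = cpu_busy < IDLE_CPU_THRESHOLD

    # Spend resources ahead of time only when they are genuinely spare.
    return cpu_is_idle and not under_memory_pressure
```

The specific numbers don't matter; the point is that the decision is driven by measured pressure, not by a blanket policy of conserving resources.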

If you are under memory pressure, it is best to let go of that cache. And if you are busy handling requests, it is probably better to reduce the load from background tasks. We've been doing this sort of auto tuning in RavenDB for a while; a lot of the changes between 1.0 and 2.0 were done in that area.
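For the sake of illustration, here is a rough sketch (again in Python, with hypothetical cache and background-task interfaces; RavenDB's actual logic is considerably more involved) of what that kind of auto tuning can look like:

```python
class AutoTuner:
    """A minimal sketch of the auto tuning described above: release cache
    under memory pressure, and scale back background work (indexing,
    prefetching, etc.) while the server is busy serving requests.
    The collaborators and thresholds are hypothetical."""

    def __init__(self, cache, background_tasks, max_workers):
        self.cache = cache                         # assumed to expose evict(fraction)
        self.background_tasks = background_tasks   # assumed to expose set_parallelism(n)
        self.max_workers = max_workers

    def tune(self, memory_used_fraction: float, active_requests: int) -> None:
        # Under memory pressure, the cache is the first thing to let go of.
        if memory_used_fraction > 0.90:
            self.cache.evict(fraction=0.25)

        # When request load is high, background tasks yield capacity to it;
        # when the server quiets down, they get their workers back.
        if active_requests > 100:
            self.background_tasks.set_parallelism(max(1, self.max_workers // 4))
        else:
            self.background_tasks.set_parallelism(self.max_workers)
```

The important design choice is that the tuning reacts continuously to measured conditions, rather than being set once by configuration.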