Active Metrics or Slapping the Sloppy Developers
Any performance advice you hear starts with "Measure". And indeed, I know of nothing worse than making "performance improvements" without numbers that you can check. But I am not going to talk about perf tuning in this post. I am going to talk about something a bit different: how to avoid the major performance pitfalls in the first place, and more generally, how to put (strict) guidelines in place for developers.
It is safe to assume that the #1 cause of performance issues in most applications is talking to the database. It is really easy (especially if you are using an abstraction) to do things that are atrociously bad for the database, but it is pretty easy to do a lot of damage even when you are using the raw DB model directly.
Other causes usually include a hotspot in an unexpected location, or simply a complex piece of work that takes time by its nature, but those aren't what I am talking about here.
The cause is usually oversight, something that is really easy to fix in place (or escalate early on) at the time you are writing it. Trying to fix the issue three months later, when you suddenly discover that loading a page results in unacceptable performance... well, that is a whole other story.
My method for this is actually very simple: Measure and Fail. Set a limit on how many queries a page is allowed to perform to display its content. At the end of the request, check that it has not crossed that limit. If it has, fail. Throw an exception that stops the developer from doing any more work until they fix the issue. I have shown how this can be done for NHibernate here, and you can see that it is really easy to check the number of queries per page using this method.
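To make that concrete, here is a minimal sketch of the counting side, assuming NHibernate; the names (QueryCountingInterceptor, PerformanceViolationException) are mine for illustration, not from the linked post:

    using System;
    using NHibernate;
    using NHibernate.SqlCommand;

    public class PerformanceViolationException : Exception
    {
        public PerformanceViolationException(string message) : base(message)
        {
        }
    }

    public class QueryCountingInterceptor : EmptyInterceptor
    {
        // Per-thread counter; a simplification, since ASP.Net may switch
        // threads mid-request, but good enough for a dev / test check.
        [ThreadStatic]
        private static int queryCount;

        public static int QueryCount
        {
            get { return queryCount; }
        }

        public static void Reset()
        {
            queryCount = 0;
        }

        // NHibernate calls this for every SQL statement it is about to run,
        // so incrementing here counts the queries for the current request.
        public override SqlString OnPrepareStatement(SqlString sql)
        {
            queryCount++;
            return sql;
        }
    }

Register it once with cfg.SetInterceptor(new QueryCountingInterceptor()), and every session opened from that configuration gets counted.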
This all ties back to the Early Feedback Principle. It is very easy to see that a call is hitting the database once per customer, instead of loading all the data at once, a few minutes after you wrote it. It is a lot harder to look at a perf chart and try to analyze why the contact information page is taking three minutes to load.
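For instance, the "once per customer" mistake usually looks something like this in NHibernate (session is an open ISession; Customer and its Orders collection are stand-ins I made up):

    // N+1 selects: one query for the customers, then another query per
    // customer the first time its lazy Orders collection is touched.
    int total = 0;
    foreach (Customer customer in session.CreateCriteria(typeof(Customer)).List<Customer>())
        total += customer.Orders.Count;

    // One round trip instead: fetch the Orders collection eagerly, and
    // collapse the duplicate root rows that the join produces.
    IList<Customer> customers = session.CreateCriteria(typeof(Customer))
        .SetFetchMode("Orders", FetchMode.Join)
        .SetResultTransformer(new DistinctRootEntityResultTransformer())
        .List<Customer>();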
I would use this approach on the dev / test machines. This means that even if the developer didn't hit the limit during development, it is likely that QA would catch it and file a bug: "ContactInformation.aspx is throwing PerformanceViolationException when I do XYZ..."
A few notes about this approach:
- When you fail, fail hard. This means throwing an exception, not writing something to the trace. The idea is that the developer gets that yellow screen telling them "fix me!".
- When you fail, give exact details about what happened:
  Performance Violation - The page ContactInformation.aspx executed 56 database queries, but only 10 are allowed. Reduce the number of queries to the allowed limit, or talk to Oren about why this page should get a special extension.
- Make sure that you have an easy way to shut it off (for production, for instance).
- All the tests on the build server must run with this setting on, so they fail if it is violated.
- Provide a simple way to temporarily disable it (to demo functionality, to walk through all the steps of a problem, etc.). A query string parameter is usually enough: "hack=42"
- You will also probably want a list of exclusions for special pages.
ASP.Net HttpModules are very well suited for this approach, by the way, and they are a really nice way to set up a set of these checks.
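Here is a sketch of such a module, wiring the limit, the "hack=42" bypass and the exclusion list together. The interceptor is the one from above; the limit, page names and config key are all made up for illustration:

    using System;
    using System.Collections.Generic;
    using System.Configuration;
    using System.Web;

    public class PerformanceViolationModule : IHttpModule
    {
        private const int MaxQueriesPerPage = 10;

        // The exclusion list: pages that are allowed to break the limit.
        private static readonly List<string> excludedPages =
            new List<string> { "/Reports/MonthlySummary.aspx" };

        public void Init(HttpApplication application)
        {
            application.BeginRequest += delegate { QueryCountingInterceptor.Reset(); };
            application.EndRequest += OnEndRequest;
        }

        private static void OnEndRequest(object sender, EventArgs e)
        {
            HttpContext context = ((HttpApplication)sender).Context;

            // The production off switch, via a config flag.
            if (ConfigurationManager.AppSettings["EnforcePerformanceLimits"] == "false")
                return;

            // The temporary escape hatch for demos and walkthroughs.
            if (context.Request.QueryString["hack"] == "42")
                return;

            if (excludedPages.Contains(context.Request.Path))
                return;

            int count = QueryCountingInterceptor.QueryCount;
            if (count > MaxQueriesPerPage)
                throw new PerformanceViolationException(string.Format(
                    "Performance Violation - The page {0} executed {1} database queries, " +
                    "but only {2} are allowed. Reduce the number of queries to the " +
                    "allowed limit, or talk to Oren about why this page should get " +
                    "a special extension.",
                    context.Request.Path, count, MaxQueriesPerPage));
        }

        public void Dispose()
        {
        }
    }

Register it under <httpModules> in web.config and it runs on every request.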
Off the top of my head, I can think of the following things that I would measure and fail on in this way:
- Queries per page
- Total page execution time - note that this requires taking into account whether you are running under the debugger.
- Usage of session variables - which I tend to really dislike.
- Total size of a page.
- Total size of view state (see the sketch after this list).
- Validating XHTML / HTML output.
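The view state check, for instance, can live in a common base page. This is a sketch with a limit I picked arbitrarily:

    using System.IO;
    using System.Web.UI;

    public class MeasuredPage : Page
    {
        private const int MaxViewStateChars = 20 * 1024; // arbitrary limit

        protected override void SavePageStateToPersistenceMedium(object state)
        {
            // Serialize the view state the same way ASP.Net does, and
            // measure the result before letting the page persist it.
            using (StringWriter writer = new StringWriter())
            {
                new LosFormatter().Serialize(writer, state);
                if (writer.ToString().Length > MaxViewStateChars)
                    throw new PerformanceViolationException(string.Format(
                        "Performance Violation - The page {0} has {1} characters of " +
                        "view state, but only {2} are allowed.",
                        Request.Path, writer.ToString().Length, MaxViewStateChars));
            }
            base.SavePageStateToPersistenceMedium(state);
        }
    }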
This is something that you would probably want to do from the start, but I have no compunction about enabling things like this mid-project. In my opinion, this is the simplest (and perhaps the only) way to maintain a set of standards in a team.
I have found that this approach is the best way of handling cross cutting changes. I can usually find 90% of the places to change, but the last 10% are elusive, and a developer in the future may not remember that we need to do something like this because of this or that. Failing fast means that I get visible issues, not invisible ones.
A good example of that is security. I had to add column level security on all the grids in the application, after starting out with page level security. I solved this by adding the security logic to the grid itself, but the definition of the required operation for each column had to come from the column itself. That means each and every (actionable) column in the application had to have an operation specified. I didn't get all of them when I implemented this approach, but when I try to load a grid with a column that doesn't have an operation, I get a very nice error message saying that I really should care about security, and what about putting an operation on the "Delete User" column?
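To give a feeling for what that check looks like, here is an illustrative sketch (none of these names are from the actual application):

    using System;
    using System.Collections.Generic;

    public class SecuredColumn
    {
        public string Header;
        public string RequiredOperation; // e.g. "/Account/DeleteUser"
    }

    public static class GridSecurity
    {
        // Called when the grid binds its columns; a column without an
        // operation fails the whole grid, loudly, instead of rendering.
        public static void ValidateColumns(string gridName, IEnumerable<SecuredColumn> columns)
        {
            foreach (SecuredColumn column in columns)
            {
                if (string.IsNullOrEmpty(column.RequiredOperation))
                    throw new InvalidOperationException(string.Format(
                        "Column '{0}' on grid '{1}' has no operation defined. " +
                        "You really should care about security; what about putting " +
                        "an operation on this column?",
                        column.Header, gridName));
            }
        }
    }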
This way, I ensure that either everything works like it should, or it doesn't work at all. There are no half-measures here. This happens while you code, so the feedback cycle is very short and the issue is easy to solve.