What kind of logging should you do in production?
That really depends on the type of application that you write and what kind of operations team it is going to have.
I have applications that I set up, then forget about (this blog being one of them). In those types of applications, having a log in production is a burden: I need to purge it occasionally, or write the code to purge it automatically.
Then I have applications that have a strong operations team, where people are looking at the application every single day. An alert raised from the system is actually going to be looked at by a human before irate customers start calling. In those cases, a log is pretty important, and so is understanding how to properly distinguish between real errors (a human needs to look at them) and transient ones (do a review once a month).
Setting things up so that production sites log only error conditions, which is pretty common, is also a mistake. As a simple example, I once saw a log where 40% of the errors were users coming back to the site after their session had timed out, and the error was leading them to the error page.
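One way to keep an expected condition like a timed-out session from drowning out real errors is to handle it separately from everything else. The following is only a sketch in Python, chosen purely for illustration; `SessionExpired`, `handle_request`, the `"web"` logger name, and the `/login` and `/error` paths are all hypothetical names, not part of any particular framework.

```python
import logging

log = logging.getLogger("web")  # hypothetical logger name


class SessionExpired(Exception):
    """Hypothetical exception raised when a request carries a timed-out session."""


def handle_request(handler, request):
    """Run a handler and decide how loudly to report a failure.

    Returns a (status, body_or_location) tuple; the shape is illustrative only.
    """
    try:
        return 200, handler(request)
    except SessionExpired:
        # Expected condition: note it quietly and send the user to log in again.
        log.info("session expired, redirecting to login")
        return 302, "/login"
    except Exception:
        # Anything else is a real error that a human should look at.
        log.exception("unhandled error while serving request")
        return 500, "/error"
```

With this split, the session timeout never reaches the error page or the error count, so whatever is left in the error log is more likely to deserve attention.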
The way I try to do things is:
- Pay attention to messages that arrive in the error queue, and see if there is anything that can be done about them.
- Log & alert any time that an error crosses the system boundary (if the users see an error page, I really want to know about it).
- Set things up so I can change log levels in production without restarting, redeploying, etc.
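The last two points in the list can be sketched with Python's standard `logging` module. This is a minimal illustration under assumptions, not a recipe: `set_log_level`, `alert_on_boundary_error`, the `"app"` logger name, and the `notify` callback are hypothetical, and the admin hook that would call `set_log_level` in a running process (an HTTP endpoint, a signal handler, a config file watcher) is not shown.

```python
import functools
import logging

log = logging.getLogger("app")  # hypothetical logger name

# Explicit mapping so an unknown name fails loudly instead of being guessed.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def set_log_level(logger_name, level_name):
    """Change a logger's level in the running process, with no restart."""
    try:
        level = LEVELS[level_name.upper()]
    except KeyError:
        raise ValueError(f"unknown log level: {level_name!r}") from None
    logging.getLogger(logger_name).setLevel(level)
    return level


def alert_on_boundary_error(notify):
    """Wrap a boundary handler so that any escaping error is logged and alerted on.

    `notify` is a hypothetical callback (email, pager, chat) taking one message.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            try:
                return handler(*args, **kwargs)
            except Exception:
                log.exception("error crossed the system boundary in %s", handler.__name__)
                notify(f"user-visible error in {handler.__name__}")
                raise  # let the framework render its error page as usual
        return wrapper
    return decorator
```

Calling `set_log_level("app", "debug")` while chasing a problem and `set_log_level("app", "error")` afterwards gives the detailed output only for as long as it is needed.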
Please note that I am making a distinction here between developer’s log and audit trails or operations information. Depending on the type of system that you have, and the requirements on it, those two can be a gold mine when trying to troubleshoot issues.
Providing things like performance counters or access to internal state in your application is also important. For example, being able to ask the app for the worst performing queries, or for the cache miss ratios, is a great way of troubleshooting perf issues. It isn’t just logging that gives you visibility into the system.
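To make that concrete, here is a small in-process sketch in Python of the kind of state an application could expose. Everything in it is hypothetical (`AppStats` and its methods are names made up for this example), it is not thread-safe, and how the numbers get exposed (an admin page, a status endpoint) is left out.

```python
import heapq
import time
from contextlib import contextmanager


class AppStats:
    """Keeps the N slowest queries seen so far, plus cache hit/miss counts."""

    def __init__(self, keep=10):
        self._keep = keep
        self._slowest = []  # min-heap of (seconds, sequence, query_text)
        self._seq = 0       # tie-breaker so equal timings never compare texts
        self.cache_hits = 0
        self.cache_misses = 0

    def record_query(self, text, seconds):
        self._seq += 1
        item = (seconds, self._seq, text)
        if len(self._slowest) < self._keep:
            heapq.heappush(self._slowest, item)
        else:
            # Push the new item, then drop whichever entry is now the fastest.
            heapq.heappushpop(self._slowest, item)

    @contextmanager
    def timed_query(self, text):
        """Time the body of a `with` block and record it as a query."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record_query(text, time.perf_counter() - start)

    def record_cache(self, hit):
        if hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1

    def worst_queries(self):
        """Return (query_text, seconds) pairs, slowest first."""
        return [(text, seconds) for seconds, _, text in sorted(self._slowest, reverse=True)]

    def cache_miss_ratio(self):
        total = self.cache_hits + self.cache_misses
        return self.cache_misses / total if total else 0.0
```

Wrapping a database call in `with stats.timed_query(sql):` is enough to feed it, and `worst_queries()` and `cache_miss_ratio()` can then be read at any time without touching a log file.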
Something that I haven’t had the chance to do yet (but that I would like to try) is to plug the NH Prof backend (which is basically an event aggregation and analysis system) as a way to analyze log streams. That way, even if you do have some logging turned on, it doesn’t stay in its raw form, but is translated to something much more concise and understandable.
Comments
My personal preference is Log4j for Java applications, as it is simple and effective. I have used it in many production environments with no issues to date.
Applications that are set up to run and then forgotten about are actually applications that could most stand to have good logs. There should be no "burden" involved in purging the log files. At a minimum, a good logging system will "roll" the log files, ensuring a maximum amount of disk space usage at all times. No need to care that the logs are there... until something happens that you have to diagnose.
I find one feature extremely useful in production - the ability to reconfigure logging on the fly, without stopping or restarting the application. This way you can turn on more detailed logging if necessary and then return to 'standard' logging. And a perfect logging library would allow me to put log statements in source code without stopping the application ;) Up to now I've been able to do that only with scripts, and it worked well.
I find centralized logging to be a huge benefit in production. We are using logFaces for most of our log management and it has proven to be great. It aggregates the log data, stores it in a separate database, and makes the logs searchable and viewable in real time. And most importantly, you don't manage the log files, which is a major headache. Whoever needs the log can easily grab it from the database in a snap. This could be a developer or a support person.
We've experimented with NH before... let me just say there are better options out there. Don't get me wrong, it's not a horrible system, but it can be a pain in the ass from time to time.