Messaging Concepts - Auditability & Traceability
In normal systems, when we want to understand what is going on or investigate a problem, we have a really simple option: just debug it until we have found the root cause. This is so common that we actually have best practices against being too dependent on the debugger.
I think that we can all agree that the industry is moving more and more toward parallelism, and distributed applications are more and more common. This presents an interesting problem when we try to understand and troubleshoot a system. It is no longer possible to simply step through the code as it is executing and thus gain the knowledge that we need in order to understand exactly what is going on.
In order to understand these kinds of systems, we need to develop new tools and approaches. Microsoft already did some of that when they built web services for .NET. It is literally possible to debug through a web service call and move from the client to the server (assuming that they are on the same machine) just by pressing F11.
This doesn't really work for most scenarios, however. A common example is a system that is distributed in both time and space (imagine an authorization process that can take minutes or hours), or a system where the volume of messages is simply too high for us to understand each message individually.
In order to deal with this type of system, we need to go a long way back. Some time before we had debuggers, we still needed a way to figure out what was going on, and we found it.
Welcome to printf() debugging (or Response.Write debugging, if you prefer it this way).
And no, we are not quite in the same position, but we are close. One of the main problems here is that we need to coordinate several different machines and correlate work done at different times and with different capacity.
WCF calls this end-to-end logging, and achieves it by attaching a GUID that you have to carry around, plus some tools that allow you to merge different log files to give a unified view across systems.
BizTalk has the notion of Business Activity Monitoring, and other tools share the same concept. All of them are based on the notion of a common id that spans multiple messages and can be used across machines to get a single view of the entire set of actions.
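To make the common id idea concrete, here is a minimal sketch in Python. It is not how WCF or BizTalk implement it; the `LogEntry` type, the `merge_logs` and `trace` helpers, the machine names, and the sample timestamps are all hypothetical, and it assumes the machines' clocks are roughly in sync. Each machine writes its own log, every entry carries the id of the conversation it belongs to, and a small tool merges the logs and pulls out a single conversation.

```python
import heapq
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    timestamp: float      # seconds; assumes clocks are roughly in sync
    machine: str          # which machine wrote the entry
    correlation_id: str   # the common id carried by every related message
    text: str


def new_correlation_id() -> str:
    """Create the id once, at the start of a conversation, then carry it around."""
    return str(uuid.uuid4())


def merge_logs(*logs):
    """Merge per-machine logs (each already sorted by time) into one ordered view."""
    return list(heapq.merge(*logs, key=lambda entry: entry.timestamp))


def trace(merged, correlation_id):
    """Keep only the entries that belong to one conversation."""
    return [entry for entry in merged if entry.correlation_id == correlation_id]


# Two unrelated authorization requests, interleaved across two machines.
order_a = new_correlation_id()
order_b = new_correlation_id()

web_log = [
    LogEntry(1.0, "web", order_a, "authorization requested"),
    LogEntry(2.0, "web", order_b, "authorization requested"),
]
backend_log = [
    LogEntry(1.5, "backend", order_a, "credit check started"),
    LogEntry(3600.0, "backend", order_a, "authorization approved"),
]

for entry in trace(merge_logs(web_log, backend_log), order_a):
    print(f"{entry.timestamp:>8.1f} {entry.machine:<8} {entry.text}")
```

The important part is not the merge itself but the discipline: the id has to be created once and copied onto every message and every log line that follows from it, otherwise the thread is lost at the first hop that forgets it.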
In a world that is fast becoming more and more distributed, such tools are quickly becoming essential, and I foresee quite a few best practices that will be aimed solely at ensuring that we keep that single thread of traceability in place.
I find it quite amusing that we are basically going back to reading log files to figure out what is going on with our applications. Of course, there are logs and there are logs, and I'll talk about logging & auditing in a lot more detail in a future post.
Comments
"I find it quite amusing that we are basically going back to reading log files to figure out what is going on with our applications."
And all the people I've worked with recently who developed in Perl and 'just' wrote to the console are thinking "We'd tell you we told you so, but we are too busy getting work done...and trying to remember what our Perl code does."
Looking forward to your forthcoming posts on logs and logging.
If I want system-wide logging I get my services to publish messages to a mq topic, that way I can subscribe to all live service topics at any time with a single application and get a complete overview of which messages were sent and in what order.
Debugging is much more useful in single-threaded scenarios because you save the time of writing all the logging messages. Unfortunately, it is hopeless in a multi-threaded scenario and you have to put in all the debug messages, but that at least delays the work until it is really needed.
@Demis
Yes we did this.
One day we also noticed that the mq failed.
But I agree that it is one of the simplest and most robust ways to notify processes.