Queuing Systems
It is not a surprise that I have a keen interest in queuing system and messaging. I have been pondering building another queuing system recently, and that led to some thinking about the nature of queuing systems.
In general, there are two major types of queuing systems. Remote Queuing Systems and Local Queuing Systems. While the way they operate is very similar, they are actually major differences in the way you would work with either system.
With a Local Queuing System, the queue is local to your machine, and you can make several assumptions about it:
- The queue is always going to be available – in other words: queue.Send() cannot fail.
- Only the current machine can pull data from the queue – in other words: you can use transactions to manage queue interactions.
- Only one consumer can process a message.
- There is a lot of state held locally.
Examples of Local Queuing Systems include:
- MSMQ
- Rhino Queues
A Remote Queuing System uses a queue that isn’t stored on the same machine. That means that:
- Both send and receive may fail.
- Multiple machines may work against the same queue.
- You can’t rely on transactions to manage queue interactions.
- Under some conditions, multiple consumers can process the same message.
- Very little state on the client side.
An example of Remote Queuing System is a Amazon SQS.
Let us take an example of simple message processing with each system. Using local queues, we can:
using(var tx = new TransactionScope()) { var msg = queue.Receive(); // process msg tx.Complete(); }
There are a lot actually going on here. The act of receiving a message in transaction means that no other consumer may receive it. If the transaction complete, the message will be removed from the queue. If the transaction rollbacks, the message will become eligible for consumers once again.
The problem is that this pattern of behavior doesn’t work when using remote queues. DTC are a really good way to kill both scalability and performance when talking to remote systems. Instead, Remote Queuing System apply the concept of a timeout.
var msg = queue.Receive( ackTimeout: TimeSpan.FromMniutes(1) );
// process msg
queues.Ack(msg);
When the message is pulled from the queue, we specify the time that we promise to process the message by. The server is going to set aside this message for that duration, so no other consumer can receive it. If the ack for successfully processing the message arrives in the specified timeout, the message is deletes and everything just works. If the timeout expires, however, the message is now available for other consumers to process. The implication is that if for some reason processing a message exceed the specified timeout, it may be processed by several consumers. In fact, most Remote Queuing Systems implement a poison message handling so if X number of time consumers did not ack a message in the given time frame, the message is marked as poison and moved aside.
It is important to understand the differences between those two systems, because they impact the system design for systems using it. Rhino Service Bus, MassTransit and NServiceBus, for example, all assume that the queuing system that you use is a local one.
A good use case for a remote queuing system is when your clients are very simple (usually web clients) or you want to avoid deploying a queuing system.
Comments
Interesting, I was just doing some work with MSDTC and MSMQ remote queues. With the poison message scenario you describe (as an alternative to MSDTC) how would rollback be handled in the case of a larger transaction? What if there was a db operation and a queue operation that needed to be transactional but the db operation failed on commit? It seems like the poison message method only works if you are only using a message queue and nothing else is taking part in the transaction?
Would be interested to know if those scenarios are handled as they are quite common in large scale non trivial systems which is where MSDTC is of use.
Bill,
Generally, remote queuing systems can't really take part of a real transaction.
Mostly because it costs too much.
What you can do is set the timeout for a very long time, and ack on the transaction commit.
yes, thats one way of doing it but as you mention is not a real transaction i.e not 100% reliable. Its an interesting problem since the tradeoffs are performance for transactional safety.
There are some java solutions which look good but I am not sure about their performance compared to msdtc. I have for a while thought about looking in to java queuing and transactions since the options on the Microsoft side are limited.
@Bill
You need to either make your messages indempotent or have some sort of tracking system (such as a database) to keep an eye out for duplicates. Any other approach I've tried has not worked out well.
What about scheduling system for csharp? Almost any major website can use that for doing some work at some specific time.
@Dennis - the Quartz.NET project ( http://quartznet.sourceforge.net/) is a scheduler for .Net, if that suits your needs.
Have you taken a look at Sql Service Broker? You get transactions (msg + database activity), and free remoting with Sql Express.
Ayende, you mentioned Amazon SQS. Can you deploy local queues in a cloud? Is it only a cost of deploying a queuing system or maybe mismatch in paradigms?
It's a bummer .NET doesn't have a decent native messaging solution like java does (JMS implementations abound). MSMQ is a sorrow excuse for an enterprise messaging solution IMO.
Apache.NMS ( http://activemq.apache.org/nms.html) provides a decent client for .NET interaction with ActiveMQ. I have been using it quite a bit lately and rocks. It supports all the major features of JMS including transactions, durable consumers, topics, queues, different ack modes and so on and so forth.
@Jeff
AMQP (specifically RabbitMQ) have great .NET support these days. ZeroMQ also looks like it has a lot of potential buti have yet to try it.
NServiceBus doesn't always assume the queue you have is a local one:
http://www.nservicebus.com/Distributor.aspx
Brian,
Yes, it does.
The distributer simply split the process into several steps, each of them assumes a local queue.
Ayende, u should really have a look into ZooKeeper( http://hadoop.apache.org/zookeeper/),
It is not just queue, but have more features that the other lacking.., and probably could give u some idea
@Ayende
"it is important to run the distributor on a cluster configuring its queues as clustered resources"
We're looking at the definition of a local queue differently. You listed 5 things in your article that distinguish a local queue from a remote queue - I thought that a queue configured as a clustered resource would meet those requirements.
@Ayende,
I am unclear on your consequences around local queue.
My understanding is that a Local Queue is Local to the machine and not the process. Hence, based on the client implementation of Local Queue and the underlying operating system, you could use tcp. If for example the host system runs out of file descriptors, the send() and receive for local can fail also.
You can have multiple processes connected to a local queue on the same machine. And if multiple processes are connected to same queue, it is also possible for multiple consumers to consume the same message under certain conditions.
State is held in the queuing system and not the processing system. Irrespective of remote or local, the state will be held. Also based on queue system implementation, you can design for failovers.
Maninder,
Even in the same process, things can fail. For example HD space might have run out.
But in general, when speaking about such things, it is common to think of a single machine as a single unit.
In the same sense that you are composed of a complex system of organs working together, but you consider yourself a single entity.
And I am not aware of a single queuing impl. that would let multiple consumers consume the SAME message. Note that there are things like topics, streams and exchanges, but that isn't what I am talking about here.
@Ayende,
I said "And if multiple processes are connected to same queue, it is also possible for multiple consumers to consume the same message under certain conditions."
The key is "under certain conditions".
So if there are multiple processes on the host machine, picking up messages from the local queue, then it is possible under the model of "Client Acknowledgement" as opposed to "Auto Acknowledgement" that the consuming process could not respond with ACK, within timeout period and hence, queuing system put the message back on the queue , which gets picked by the second consumer. Hence, multiple consumer , under certain conditions can process the same message even on a local queue scenario.
Maninder,
a) this is typical for remote queues.
b) this is a failure scenario, where the first consumer has failed
@Ayende,
"This is a failure scenario, where the first consumer has failed"
In this case, second consumer may actually fail.
Comment preview