Oren Eini, CEO of RavenDB, a NoSQL Open Source Document Database


Queuing Systems

time to read 4 min | 756 words

It is not a surprise that I have a keen interest in queuing systems and messaging. I have been pondering building another queuing system recently, and that led to some thinking about the nature of queuing systems.

In general, there are two major types of queuing systems: Remote Queuing Systems and Local Queuing Systems. While the way they operate is very similar, there are actually major differences in the way you would work with each system.

With a Local Queuing System, the queue is local to your machine, and you can make several assumptions about it:

  • The queue is always going to be available – in other words: queue.Send()  cannot fail.
  • Only the current machine can pull data from the queue – in other words: you can use transactions to manage queue interactions.
  • Only one consumer can process a message.
  • There is a lot of state held locally.

Examples of Local Queuing Systems include:

  • MSMQ
  • Rhino Queues

A Remote Queuing System uses a queue that isn’t stored on the same machine. That means that:

  • Both send and receive may fail.
  • Multiple machines may work against the same queue.
  • You can’t rely on transactions to manage queue interactions.
  • Under some conditions, multiple consumers can process the same message.
  • Very little state on the client side.

An example of a Remote Queuing System is Amazon SQS.

Let us take an example of simple message processing with each system. Using local queues, we can:

using(var tx = new TransactionScope())
{
   var msg = queue.Receive();

   // process msg
  
   tx.Complete();
}

There is actually a lot going on here. The act of receiving a message in a transaction means that no other consumer may receive it. If the transaction completes, the message will be removed from the queue. If the transaction rolls back, the message will become eligible for consumers once again.

The problem is that this pattern of behavior doesn’t work when using remote queues. DTC is a really good way to kill both scalability and performance when talking to remote systems. Instead, Remote Queuing Systems apply the concept of a timeout.

var msg = queue.Receive( ackTimeout: TimeSpan.FromMinutes(1) );

// process msg

queue.Ack(msg);

When the message is pulled from the queue, we specify the time by which we promise to process the message. The server is going to set this message aside for that duration, so no other consumer can receive it. If the ack for successfully processing the message arrives within the specified timeout, the message is deleted and everything just works. If the timeout expires, however, the message becomes available for other consumers to process. The implication is that if, for some reason, processing a message exceeds the specified timeout, it may be processed by several consumers. In fact, most Remote Queuing Systems implement poison message handling, so if consumers fail to ack a message within the given time frame X number of times, the message is marked as poison and moved aside.
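To make the timeout model concrete, here is a minimal sketch of a consumer loop. It reuses the Receive / Ack shape from the snippet above; the receive count property, the MoveToPoisonQueue call and ProcessMessage are assumptions made for illustration, not any specific vendor's API:

while (running)
{
    // The message is hidden from other consumers for one minute.
    var msg = queue.Receive( ackTimeout: TimeSpan.FromMinutes(1) );
    if (msg == null)
        continue; // nothing to pull right now

    if (msg.ReceiveCount > 5)
    {
        // Too many failed attempts, treat it as a poison message.
        queue.MoveToPoisonQueue(msg);
        continue;
    }

    try
    {
        ProcessMessage(msg); // must be idempotent, it may run more than once
        queue.Ack(msg);      // the message is deleted only after a successful ack
    }
    catch (Exception)
    {
        // Do nothing, once the timeout expires the message becomes
        // visible again and another consumer will retry it.
    }
}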

It is important to understand the differences between those two types, because they impact the design of the systems that use them. Rhino Service Bus, MassTransit and NServiceBus, for example, all assume that the queuing system that you use is a local one.

A good use case for a remote queuing system is when your clients are very simple (usually web clients) or you want to avoid deploying a queuing system.

time to read 4 min | 723 words

I am getting some strange perf numbers from MSMQ, and I can’t quite figure out what is going on here.

The scenario is simple, I have a process reading from queue 1 and writing to queue 2. But performance isn’t anywhere near where I think it should be.

In my test scenario, I have queue 1 filled with 10,000 messages, each about 1.5 KB in size. My test code does a no-op move between the queues. Both queues are transactional.

Here is the code:

private static void CopyData()
{
    var q1 = new MessageQueue(@".\private$\test_queue1");
    var q2 = new MessageQueue(@".\private$\test_queue2");
    q2.Purge();
    var sp = Stopwatch.StartNew();
    while (true)
    {
        using (var msmqTx = new MessageQueueTransaction())
        {
            msmqTx.Begin();

            Message message;
            try
            {
                message = q1.Receive(TimeSpan.FromMilliseconds(0), msmqTx);
            }
            catch (MessageQueueException e)
            {
                Console.WriteLine(e);
                break;
            }

            q2.Send(message, msmqTx);

            msmqTx.Commit();
        }
    }
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}

Using this code, it takes 236.8 seconds to move 10,000 messages. If I use System.Transactions, instead of MSMQ’s internal transactions, I get comparable speeds.

Just to give you an idea, this is about 40 messages a second, which is ridiculously low.

Changing the code so each operation is a separate transaction, like this:

private static void CopyData()
{
    var q1 = new MessageQueue(@".\private$\test_queue1");
    var q2 = new MessageQueue(@".\private$\test_queue2");
    q2.Purge();
    var sp = Stopwatch.StartNew();
    while (true)
    {
        Message message;
        try
        {
            message = q1.Receive(TimeSpan.FromMilliseconds(0), MessageQueueTransactionType.Single);
        }
        catch (MessageQueueException e)
        {
            Console.WriteLine(e);
            break;
        }

        q2.Send(message, MessageQueueTransactionType.Single);
    }
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}

With this change, it takes 16.3 seconds, or about 600 messages per second, which is far closer to what I would expect.

This is on a quad core machine with 8 GB RAM (4 GB free), so I don’t think that the machine is the problem. I can see similar results on other machines as well.

Am I missing something? Is there something in my code that is wrong?

time to read 92 min | 18223 words

One of the more annoying problems with async messaging is that you have no way of knowing if the destination that you are sending to is up or not. Well, that is also one of the major advantages, of course. It gets annoying, however, when you need to be able to respond to node failure in a more proactive manner. There are ways around that, usually with message replies, heartbeats or timeouts, but they tend to be complex, and they also tend to be visible to the actual application.

I just added a very small feature to Rhino Queues that will let you know whenever we have failed to send a message to its destination. Note that failing to send the message to the destination doesn't really mean much; Rhino Queues will retry sending the message at increasing intervals for about 3 days. In fact, here is the retry schedule for a message before it is considered dead:

Retry | Delay | Time
0 | 0:00:00 | 4/6/2009 00:00:00
1 | 0:00:01 | 4/6/2009 00:00:01
2 | 0:00:04 | 4/6/2009 00:00:05
3 | 0:00:09 | 4/6/2009 00:00:14
4 | 0:00:16 | 4/6/2009 00:00:30
5 | 0:00:25 | 4/6/2009 00:00:55
6 | 0:00:36 | 4/6/2009 00:01:31
7 | 0:00:49 | 4/6/2009 00:02:20
8 | 0:01:04 | 4/6/2009 00:03:24
9 | 0:01:21 | 4/6/2009 00:04:45
10 | 0:01:40 | 4/6/2009 00:06:25
11 | 0:02:01 | 4/6/2009 00:08:26
12 | 0:02:24 | 4/6/2009 00:10:50
13 | 0:02:49 | 4/6/2009 00:13:39
14 | 0:03:16 | 4/6/2009 00:16:55
15 | 0:03:45 | 4/6/2009 00:20:40
16 | 0:04:16 | 4/6/2009 00:24:56
17 | 0:04:49 | 4/6/2009 00:29:45
18 | 0:05:24 | 4/6/2009 00:35:09
19 | 0:06:01 | 4/6/2009 00:41:10
20 | 0:06:40 | 4/6/2009 00:47:50
21 | 0:07:21 | 4/6/2009 00:55:11
22 | 0:08:04 | 4/6/2009 01:03:15
23 | 0:08:49 | 4/6/2009 01:12:04
24 | 0:09:36 | 4/6/2009 01:21:40
25 | 0:10:25 | 4/6/2009 01:32:05
26 | 0:11:16 | 4/6/2009 01:43:21
27 | 0:12:09 | 4/6/2009 01:55:30
28 | 0:13:04 | 4/6/2009 02:08:34
29 | 0:14:01 | 4/6/2009 02:22:35
30 | 0:15:00 | 4/6/2009 02:37:35
31 | 0:16:01 | 4/6/2009 02:53:36
32 | 0:17:04 | 4/6/2009 03:10:40
33 | 0:18:09 | 4/6/2009 03:28:49
34 | 0:19:16 | 4/6/2009 03:48:05
35 | 0:20:25 | 4/6/2009 04:08:30
36 | 0:21:36 | 4/6/2009 04:30:06
37 | 0:22:49 | 4/6/2009 04:52:55
38 | 0:24:04 | 4/6/2009 05:16:59
39 | 0:25:21 | 4/6/2009 05:42:20
40 | 0:26:40 | 4/6/2009 06:09:00
41 | 0:28:01 | 4/6/2009 06:37:01
42 | 0:29:24 | 4/6/2009 07:06:25
43 | 0:30:49 | 4/6/2009 07:37:14
44 | 0:32:16 | 4/6/2009 08:09:30
45 | 0:33:45 | 4/6/2009 08:43:15
46 | 0:35:16 | 4/6/2009 09:18:31
47 | 0:36:49 | 4/6/2009 09:55:20
48 | 0:38:24 | 4/6/2009 10:33:44
49 | 0:40:01 | 4/6/2009 11:13:45
50 | 0:41:40 | 4/6/2009 11:55:25
51 | 0:43:21 | 4/6/2009 12:38:46
52 | 0:45:04 | 4/6/2009 13:23:50
53 | 0:46:49 | 4/6/2009 14:10:39
54 | 0:48:36 | 4/6/2009 14:59:15
55 | 0:50:25 | 4/6/2009 15:49:40
56 | 0:52:16 | 4/6/2009 16:41:56
57 | 0:54:09 | 4/6/2009 17:36:05
58 | 0:56:04 | 4/6/2009 18:32:09
59 | 0:58:01 | 4/6/2009 19:30:10
60 | 1:00:00 | 4/6/2009 20:30:10
61 | 1:02:01 | 4/6/2009 21:32:11
62 | 1:04:04 | 4/6/2009 22:36:15
63 | 1:06:09 | 4/6/2009 23:42:24
64 | 1:08:16 | 4/7/2009 00:50:40
65 | 1:10:25 | 4/7/2009 02:01:05
66 | 1:12:36 | 4/7/2009 03:13:41
67 | 1:14:49 | 4/7/2009 04:28:30
68 | 1:17:04 | 4/7/2009 05:45:34
69 | 1:19:21 | 4/7/2009 07:04:55
70 | 1:21:40 | 4/7/2009 08:26:35
71 | 1:24:01 | 4/7/2009 09:50:36
72 | 1:26:24 | 4/7/2009 11:17:00
73 | 1:28:49 | 4/7/2009 12:45:49
74 | 1:31:16 | 4/7/2009 14:17:05
75 | 1:33:45 | 4/7/2009 15:50:50
76 | 1:36:16 | 4/7/2009 17:27:06
77 | 1:38:49 | 4/7/2009 19:05:55
78 | 1:41:24 | 4/7/2009 20:47:19
79 | 1:44:01 | 4/7/2009 22:31:20
80 | 1:46:40 | 4/8/2009 00:18:00
81 | 1:49:21 | 4/8/2009 02:07:21
82 | 1:52:04 | 4/8/2009 03:59:25
83 | 1:54:49 | 4/8/2009 05:54:14
84 | 1:57:36 | 4/8/2009 07:51:50
85 | 2:00:25 | 4/8/2009 09:52:15
86 | 2:03:16 | 4/8/2009 11:55:31
87 | 2:06:09 | 4/8/2009 14:01:40
88 | 2:09:04 | 4/8/2009 16:10:44
89 | 2:12:01 | 4/8/2009 18:22:45
90 | 2:15:00 | 4/8/2009 20:37:45
91 | 2:18:01 | 4/8/2009 22:55:46
92 | 2:21:04 | 4/9/2009 01:16:50
93 | 2:24:09 | 4/9/2009 03:40:59
94 | 2:27:16 | 4/9/2009 06:08:15
95 | 2:30:25 | 4/9/2009 08:38:40
96 | 2:33:36 | 4/9/2009 11:12:16
97 | 2:36:49 | 4/9/2009 13:49:05
98 | 2:40:04 | 4/9/2009 16:29:09
99 | 2:43:21 | 4/9/2009 19:12:30
100 | 2:46:40 | 4/9/2009 21:59:10
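The delays above grow quadratically: the delay before retry n is n² seconds, and the Time column is simply the running total of those delays. Here is a minimal C# sketch that reproduces the table under that assumption (the formula is inferred from the numbers above, not taken from the Rhino Queues source):

var start = new DateTime(2009, 4, 6);
var current = start;
for (int retry = 0; retry <= 100; retry++)
{
    // Inferred rule: the delay before retry n is n * n seconds.
    var delay = TimeSpan.FromSeconds(retry * retry);
    current = current.Add(delay);
    Console.WriteLine("{0,3} {1,10} {2}", retry, delay, current);
}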

This new feature doesn't change any of that; it will simply tell you whenever a send failure has occurred. This lets you build more sophisticated error handling strategies around it. You will probably want to wait for several consecutive failures to the same endpoint before deciding to do something about it, of course, but the capability is there.

time to read 3 min | 521 words

As I stated before, I started writing a queuing system a few days ago, loosely based on my previous efforts in this area, but aimed at creating a production ready queuing infrastructure that I can use in my own applications. The decision to build this was not made lightly, but I wanted a queuing system that met my needs and was flexible enough to extend as needed.

The design goals stated for Rhino Queues were:

  • XCopy deployable
  • Zero configuration
  • Durable
  • Supports System.Transactions
  • Works well with Load Balancing hardware
  • Supports sub queues
  • Support arbitrarily large messages

It is now publicly available, and I think it deserves some discussion.

Here is a very simple example of using it:

var queueManager = new QueueManager(new IPEndPoint(IPAddress.Loopback, 2200), "queues.esent");
queueManager.CreateQueues("Web");
var queue = queueManager.GetQueue("Web");

while(ContinueProcessing)
{
    using(var tx = new TransactionScope())
    {
        var msg = queue.Receive();
        Console.WriteLine("Message from {0}:", msg.Headers["source"]);
        Console.WriteLine(Encoding.Unicode.GetString(msg.Data));
        tx.Complete();
    }
}

This example is merely to show the API.

The actual software is pretty interesting. Communication between different queues is done over TCP, with a protocol that works well with load balancing hardware, making load balancing a queued application as easy as balancing any HTTP based app.

You cannot see it in the API example, but the system fully supports System.Transactions, so sending a message is delayed until the transaction is committed, and receipt of a message is rolled back on transaction rollback. We even support recovery after a hard crash, by plugging into the recovery mechanism that MSDTC offers us.
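To illustrate what that looks like on the sending side, here is a rough sketch. I have not verified this against the actual Rhino Queues send API; the Send method, the URI scheme and the MessagePayload type are assumptions used for illustration:

using (var tx = new TransactionScope())
{
    // Assumed API shape: the send enlists in the ambient transaction,
    // so nothing goes on the wire until tx.Complete() commits.
    queueManager.Send(
        new Uri("rhino.queues://localhost:2200/Web"),
        new MessagePayload { Data = Encoding.Unicode.GetBytes("hello") });

    tx.Complete(); // rolling back instead would discard the outgoing message
}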

All messages are durable, so a system reboot will not remove them. While the default mode for Rhino Queues is an embedded component in your application (XCopy deployable), we still support the ability to deliver a message to a server that is down, by implementing a fairly flexible message retry mechanism.

Because Rhino Service Bus makes such heavy use of this, we also support sub queues, and we have no hard limit on the size of messages that we can deliver.

I delayed announcing this until I could finish integrating it completely into Rhino Service Bus, but this is now done, and it should just work. Rhino Service Bus will still support MSMQ, but I think that a lot of the work that we are going to do on Rhino Service Bus from now on will be with the new queuing infrastructure.

time to read 1 min | 197 words

After running into a spate of problems with MSMQ, and following my previous decision, I decided that I might as well bite the bullet and do what I usually do: write my own. I have looked into other queuing systems, and all of them had something in them that precluded me from using them. And no, that something wasn't "I didn't write them", whatever some people may think.

I have the advantage of having written Rhino Queues 5 times already, but this time I am setting out with a set of new goals. I intend to unveil my creation soon, but for now, I thought it would be a good idea to talk a bit about the features that I intend to build.

Rhino Queues is a queuing system that is:

  • XCopy deployable
  • Zero configuration
  • Durable
  • Supports System.Transactions
  • Works well with Load Balancing hardware
  • Supports sub queues
  • Support arbitrarily large messages

I’ll talk about this in more detail when I can actually show working code (which will be in a bit)…

time to read 5 min | 876 words

Another example, which is naturally asynchronous, is the way most web sites create a new user. This is done in a two stage process. First, the user fills in their details and initial validation is done on the user information (most often, that we have no duplicate user names).

Then, we send an email to the user and ask them to validate their email address. It is only when they validate their account that we will create a real user and let them log into the system. If a certain period of time elapses (24 hours to a few days, usually), we need to undo any actions that we performed and make that user name available again.

When we want to solve the problem with messaging, we run into an interesting problem. The process of creating the user is a multi message process, in which we have to maintain the current state. Not only that, but we also need to deal with the timing issues built into this problem.

It gets a bit more interesting when you consider the cohesiveness of the problem. Let us consider a typical implementation.

First, we have the issue of the CreateUser page:

[diagram: the CreateUser page]

Then we have the process of actually validating the user:

[diagram: validating the user]

And, lest we forget, we have a scheduled job to remove expired user account reservations:

[diagram: scheduled job removing expired user account reservations]

We have the logic for this single feature scattered across three different places, which execute at different times, and likely reside in different projects.

Not only that, but if we want to make the experience pleasant for the user, we have a lot to deal with. Sending an email is slow. You don’t want to put this as part of a synchronous process, if only because of the latency it will add to showing a response to the user. It is also an unreliable process. And we haven’t even started to discuss error handling yet.

For that matter, sending an email is not something that you should just new an SmtpClient for. You have to make sure that someone doesn’t use your CreateUser page to bomb someone else’s mailbox, you need to keep track of emails for regulatory reasons, you need to detect invalid emails (from SMTP response), etc.

Let us see how we can do this with async messaging. First, we will tackle registering the user and sending them an email to validate their address:

[diagram: registering the user and sending the validation email]

When the user clicks on the link in their email, we have the following set of interactions:

[diagram: the user clicks the validation link]

And, of course, we need to expire the reserved username:

[diagram: expiring the reserved username]

In the diagrams, everything that happens in the App Server is happening inside the context of a single saga. This is a single class that manages all the logic relating to the creation of a new user. That is good in itself, but I gain a few other things from this approach.
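To make the idea of a single class managing the whole conversation concrete, here is a hypothetical sketch of such a saga. Every name here (the message types, the state properties, the Handle methods) is made up for illustration and is independent of any particular bus API:

// Hypothetical sketch: one class holds the entire "create user" conversation.
public class CreateUserSaga
{
    public Guid Id { get; set; }
    public bool IsCompleted { get; set; }
    public string ReservedUserName { get; set; }

    // Started by the registration message from the web server.
    public void Handle(NewUserRegistered message)
    {
        ReservedUserName = message.UserName;
        // reserve the user name and queue the validation email
    }

    // The user clicked the link in the validation email.
    public void Handle(EmailValidated message)
    {
        // create the real user account
        IsCompleted = true;
    }

    // The timeout message fired before the user validated.
    public void Handle(ReservationExpired message)
    {
        // release the reserved user name
        IsCompleted = true;
    }
}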

Robustness in the face of errors and fast response times are one thing, of course, but there are other things that I am not touching here. In the previous example, I showed a very simplistic approach to handling the behavior, where everything happens inline. This is done because, frankly, I didn’t have the time to sit and draw all the pretty boxes for the sync example.

In the real world, we would want pretty much the same separation in the sync example as well. And now we are running into even more “interesting” problems, especially if we started out with everything running locally. The sync model makes it really hard to ignore the fallacies of distributed computing. The async model puts them in your face, and makes you deal with them explicitly.

The level of complexity that you have to deal with using async messaging remains more or less constant when you bring in the issues of scalability, fault tolerance and distribution. It most certainly does not stay the same when you have a sync model.

Another advantage of this approach is that we are using the actor model, which makes it very explicit who is doing what and why, and allows us to work on each of those in an independent fashion.

The end result is a system composed of easily disassembled parts. It is easy to understand what is going on because the interactions between the parts of the system are explicit, understood and not easily bypassed.

time to read 6 min | 1007 words

Greg Young has a post about the cost of messaging. I fully agree that the cost isn't going to be in the time that you spend actually writing the message body. You are going to have a lot of those, and if you take more than a minute or two to write one, I am offering overpriced "speed up your typing" courses.

The cost of messaging, and a very real one, comes when you need to understand the system. In a system where message exchange is the form of communication, it can be significantly harder to understand what is going on. For a tightly coupled system, you can generally just follow the path of the code. But for messages?

When I publish a message, that is all I care about in the view of the current component. But in the view of the system? I sure as hell care about who is consuming it and what it is doing with it.

Usually, the very first feature in a system that I write is logging in a user. That is a good proof that all the systems are working.

We will ignore the UI and the actual backend for user storage for a second; let us think about how we would deal with this issue if we had messaging in place. We have the following guidance from Udi about this exact topic. I am going to try to break it down even further.

We have the following components in the process. The user, the browser (distinct from the user), the web server and the authentication service.

We will start looking at how this approach works by seeing how system startup works.

[diagram: system startup, the web server loads the users from the authentication service]

The web server asks the authentication service for the users. The authentication service sends the web server all the users it is aware of. The web server then caches them internally. When a user tries to log in, we can now satisfy that request directly from our cache, without having to talk to the authentication service. This means that we have a fully local authentication story, which would be blazingly fast.

[diagram: login request served from the local cache]
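As a rough sketch of what the web server side of this might look like (the message types, the User class and the cache shape are assumptions for illustration; only the idea of a message consumer comes from the bus):

// Hypothetical sketch of the web server's local authentication cache.
public class UserCache : ConsumerOf<UserCreated>, ConsumerOf<UserChanged>
{
    private readonly Dictionary<string, User> usersByName =
        new Dictionary<string, User>(StringComparer.OrdinalIgnoreCase);

    // Updates arrive as messages pushed from the authentication service.
    public void Consume(UserCreated message)
    {
        lock (usersByName)
            usersByName[message.UserName] = message.User;
    }

    public void Consume(UserChanged message)
    {
        lock (usersByName)
            usersByName[message.UserName] = message.User;
    }

    // Login is answered entirely from local state: no remote call, no waiting.
    public bool TryValidateLogin(string userName, string password, out User user)
    {
        lock (usersByName)
        {
            return usersByName.TryGetValue(userName, out user)
                && user.VerifyPassword(password);
        }
    }
}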

But what happens if we get a user that we don't have in the cache? (Maybe the user just registered and we weren't notified about it yet?).

We ask the authentication service whether or not this is a valid user. But we don't wait for a reply. Instead, we send the browser an instruction to call us back after a brief wait. The browser sets this up using JavaScript. During that time, the authentication service responds, telling us that this is a valid user. We simply put this into the cache, the way we would handle all user updates.

Then the browser calls us again (note that this is transparent to the user), and we have the information that we need, so we can successfully log them in:

[diagram: unknown user, deferred check followed by a successful login]

There is another scenario here: what happens if the user is not valid? The first part of the scenario is identical; we ask the authentication service to tell us if this is valid or not. When the service replies that this is not a valid user, we cache that. When the browser calls back to us, we can tell it that this is not a valid user.

[diagram: invalid user scenario]

(Just to make things interesting, we also have to ensure that the invalid users cache expires or has a limited size, because otherwise this is an invitation for a DoS attack.)

Finally, we have the process of creating a new user in the application, which works in the following fashion:

[diagram: creating a new user]

Now, I just took three pages to explain something that can be explained very easily using:

  • usp_CreateUser
  • usp_ValidateLogin

Backed by the ACID guarantees of the database, those two stored procedures are much simpler to reason about, explain and in general work with.

We have way more complexity to work with. And this complexity spans all layers, from the back end to the UI! My UI guys need to know about async messaging!

Isn't this a bit extreme? Isn't this heavy weight? Isn't this utterly ridiculous?

Yes, it is, absolutely. The problem with the two SPs solution is that it would work beautifully for a simple scenario, but it creaks when we start talking about the more complex ones.

Authentication is usually a heavy operation. ValidateLogin is not just doing a query. It is also recording stats, updating the last login date, etc. It is also something that users will do frequently. It makes sense to try to optimize that.

Once we leave the trivial solution area, we are faced with a myriad of problems that the messaging solution solves. There is no chance of farm wide locks in the messaging solution, because there is never a lock taking place. There are no waiting threads in the messaging solution, because we never query anything but our own local state.

We can take the authentication service down for maintenance and the only thing that will be affected is new user registration. The entire system is more robust.

Those are the tradeoffs that we have to deal with when we get to high complexity features. It makes sense to start crafting them, instead of just assembling a solution.

Just stop and think about what it would require of you to understand how logins work in the messaging system, vs. the two SP system. I don't think that anyone can argue that the messaging system is simpler to understand, and that is where the real cost is.

However, I think that you'll find that after getting used to the new approach, it starts making sense. Not only that, but it is fairly easy to see how to approach problems once you have started to get a feel for messaging.

time to read 1 min | 88 words

There is just one thing in the Msmq API that I hate.

If you try to send to a queue using the wrong transaction, it will silently not send your message, and give absolutely no error.

This is incredibly error prone, and has caused me quite a number of bugs.

If at all possible, an API should never fail silently. In this case, the API should throw an explicit argument exception, saying that this transaction is not valid for this queue.

That would make things much simpler all around.

time to read 15 min | 2848 words

First, I need to justify why I am doing this. In the past, I have evaluated both NServiceBus and Mass Transit, and I have created applications to try both of them out. Both code bases have enlightened me about the notions of messaging and how to make use of them. That said, they are both of much wider scope than I need right now, and that is hurting me. What do I mean by that?

Right now I am introducing a whole lot of concepts to a team, and I want to make sure that we have as few moving parts as possible. If you go through the code, you'll see that I liberally stole both ideas and code from Mass Transit and NServiceBus. My main goal was to get to no configuration, no complexity, and a high degree of flexibility from a design standpoint, but a very rigid structure for the users. Along the way I also took care to try to build in support for some concepts that I think are interesting (message interception, taking advantage of features in MSMQ 4.0, message modules, etc).

The current code base is tiny, ~2,550 lines of code, cohesive, and well designed (I believe). It doesn't do everything that NServiceBus or Mass Transit do, but I think that it is an interesting project, and I intend to take it to production.

Some of the highlights:

  • C# 3.0 & MSMQ 4.0 - taking advantage of features such as sub queues, async queue receiving.
  • Pluggable internal architecture.
  • Minimal moving parts, strongly opinionated.
  • Hard focus on developer productivity and ease of use.
  • Conventions, assertions and convictions.

Let me try to go over each item and explain what I mean by it.

I have the freedom to choose my platform for this project, and I decided to make as much use of the platform features as I can, to make my life easier. C# 3.0 makes a lot of tasks more pleasant, if only because of extension methods and the ability to filter using Where(), and I am quite addicted to var.

But probably more interesting is the use of MSMQ 4.0 sub queues, I am using them here to make sure that an endpoint is a consolidated unit. Take a look:

[diagram: an endpoint queue with its sub queues]

Using this approach, subscriptions are stored in a sub queue of the current queue, and I can reroute errors to the error sub queue. This keeps everything in a single place, and simplifies understanding what is actually going on in the application, not to mention that it keeps everything together (fewer queues, fewer places to check, fewer things that can go wrong). Async queue receiving is another different thing that I am doing. This allows me to avoid consuming application level threads when they are not needed, and defers this to the level of the OS.

The next two points are sort of contrary to one another. On the one hand, I have a very strong bias toward pluggable systems, but at the same time, I wanted to make something that requires very little from the user. That is, it should require very little configuration and a real effort to break. I solved that by creating the type of architecture that I generally do, but making the public API as simple as possible:

[diagram: the IServiceBus and IStartableServiceBus public API]

The startable service bus does just that: it allows you to start & dispose of the bus. The really interesting tidbits are in IServiceBus.

Subscribe and Unsubscribe should be pretty much self explanatory, I think. They let other endpoints know that you are interested / not interested in a particular message type.

Send will send (bet you wouldn't figure that one out if I didn't tell you!) a message to a particular endpoint, although you can (and probably will) just send a message and let the bus figure out who the message owner is on its own (one of the few pieces of configuration that I did put in).

Reply will send a message back to the originator of the message we are currently handling. If you aren't handling a message, it will throw.

Notify & Publish are very similar; both of them will publish the message to all interested subscribers, but they have one very different semantic. If you publish a message and it has no subscribers, it is considered an error, and it will throw. If you use notify, it is expected that you might not have any subscribers. I find this distinction important enough that I wanted to capture it in the API.

AddInstanceSubscription is something that I am not sure that I like yet, although I can see why I would need it if I didn't have it. It allows a live instance to request to subscribe to messages from the bus.

One thing that you will notice is that there is absolutely no facility for request / reply here. At least, not to the same instance of the object. Actually, that is not accurate; you can handle this scenario, but you would have to handle your own locking, and I am not going to provide anything there for you. This is a scenario that I want to discourage.

Let us move from the bus to the actual message handlers. In Rhino ServiceBus, they are specified as:

[diagram: the message consumer interfaces]

A standard consumer for messages needs to implement ConsumerOf<TMsg>, and that is about it. The bus will automatically check the container for all types that match that, subscribe them to their destinations, and get ready to start processing messages. OccasionalConsumerOf<TMsg> is a way to tell the bus that we aren't really interested in automatically subscribing; maybe we intend to use it with AddInstanceSubscription, or maybe we want to programmatically control subscriptions.
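For example, a consumer could look something like this (a minimal sketch; the NewOrderPlaced message type is made up, and the Consume method is my understanding of the ConsumerOf<TMsg> contract):

// Hypothetical message type for illustration.
public class NewOrderPlaced
{
    public Guid OrderId { get; set; }
}

// Implementing ConsumerOf<TMsg> is all the bus needs to find this class
// in the container, subscribe it, and dispatch matching messages to it.
public class NewOrderConsumer : ConsumerOf<NewOrderPlaced>
{
    public void Consume(NewOrderPlaced message)
    {
        Console.WriteLine("Processing order {0}", message.OrderId);
    }
}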

InitiatedBy<TMsg> and Orchestrate<TMsg> are for messages that drive sagas. I don't have much to say about them except that they work much the same way you would use them in NServiceBus or Mass Transit.

If you look at the API, you will probably notice something interesting. A lot of the concepts are very similar to the way NServiceBus handles them, but some of the API (especially for message handlers) is much closer to Mass Transit. As I mentioned, I dug deep into both of them before deciding to build my own. Much of the behavior is also modeled around the way NServiceBus works (transactional message interactions, auto roll backs on error, message handlers are not expected to deal with errors, auto retries, moving to the error sub queue for administrator attention, etc).

Here is the configuration for an endpoint in the service bus:

<castle>
  <facilities>
    <facility id="rhino.esb" >
      <bus threadCount="1"
           numberOfRetries="5"
           endpoint="msmq://localhost/test_queue2"
             />
      <messages>
        <add assembly="Rhino.ServiceBus.Tests"
             endpoint="msmq://localhost/test_queue"/>
      </messages>
    </facility>
  </facilities>
</castle>

And in order to start the bus:

var container = new WindsorContainer(new XmlInterpreter());// read config from app.config
container.Kernel.AddFacility("rhino.esb", new RhinoServiceBusFacility());// wire up facility for rhino service bus
container.Register(
    AllTypes.FromAssembly(Assembly)
        .BasedOn(typeof(IMessageConsumer))
    );

var bus = container.Resolve<IStartableServiceBus>();
bus.Start();

And this is it; messages arriving at the test_queue2 endpoint will now be served.

There isn't any additional configuration, and every line of configuration here is something that an administrator is likely to want to change. Note that we pick up consumers automatically from the container, and we register them automatically. This is another step that I am doing in order to reduce the amount of things that you should worry about.

I mentioned that I somehow had to reconcile my own drive toward pluggable architecture and operational flexibility with the need to create a simple, standard way of doing things. I solved the problem by making the code pluggable, but the RhinoServiceBusFacility has very strong opinions about how things should go, and it is responsible for setting the bus's policies.

Another design goal was that this system should be extremely developer friendly. This means that I took the time to investigate and warn about as many failure paths as possible. For example, publishing without subscribers, which is something that happened to me in the past, and took me some time to figure out. I consider errors to be part of the user interface of a framework. Bad errors are a big problem, and can cause a tremendous amount of wasted time along the way, just tracking down issues.

No configuration is another, but there are a few others. For example, Rhino ServiceBus contains a logging mode in which all messaging activities can be sent to a separate queue, to allow us a time delayed view of what is actually going on. The serialization format on the wire is also human readable (XML, heavily influenced by how NServiceBus's serialization format is built).

Here is an example of a test message:

<?xml version="1.0" encoding="utf-8"?>
<esb:messages xmlns:esb="http://servicebus.hibernatingrhinos.com/2008/12/20/esb"
              xmlns:tests.order="Rhino.ServiceBus.Tests.XmlSerializerTest+Order, Rhino.ServiceBus.Tests"
              xmlns:uri="uri"
              xmlns:int="int"
              xmlns:guid="guid"
              xmlns:datetime="datetime"
              xmlns:timespan="timespan"
              xmlns:array_of_tests.orderline="Rhino.ServiceBus.Tests.XmlSerializerTest+OrderLine[], Rhino.ServiceBus.Tests"
              xmlns:tests.orderline="Rhino.ServiceBus.Tests.XmlSerializerTest+OrderLine, Rhino.ServiceBus.Tests"
              xmlns:string="string"
              xmlns:generic.list_of_int="System.Collections.Generic.List`1[[System.Int32, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">
  <tests.order:Order>
    <uri:Url>msmq://www.ayende.com/</uri:Url>
    <int:Count>5</int:Count>
    <guid:OrderId>1909994f-8173-452c-a651-14725bb09cb6</guid:OrderId>
    <datetime:At>2008-12-17T00:00:00.0000000</datetime:At>
    <timespan:TimeToDelivery>P0Y0M1DT0H0M0S</timespan:TimeToDelivery>
    <array_of_tests.orderline:OrderLines>
      <tests.orderline:value>
        <string:Product>milk</string:Product>
        <generic.list_of_int:Fubar>
          <int:value>1</int:value>
          <int:value>2</int:value>
          <int:value>3</int:value>
        </generic.list_of_int:Fubar>
      </tests.orderline:value>
      <tests.orderline:value>
        <string:Product>butter</string:Product>
        <generic.list_of_int:Fubar>
          <int:value>4</int:value>
          <int:value>5</int:value>
          <int:value>6</int:value>
        </generic.list_of_int:Fubar>
      </tests.orderline:value>
    </array_of_tests.orderline:OrderLines>
  </tests.order:Order>
</esb:messages>

That is pretty readable, I think. And yes, I am aware of the other million of XML serialization formats that are out there. I wanted something readable, remember?

Another common pain in the... knee with messaging systems is debugging. Usually you have a sender and a receiver, which communicate with one another via messaging, and trying to track down what is going on between them can be a pretty nasty issue. Frankly, I would much rather not have to debug this at all, but that is not going to happen any time soon, so I wanted to create a good debugging experience. Rhino ServiceBus is hostable in any type of process, and for development, I set it up so my web front end also hosts the back end, which makes debugging through complex message flows pretty much seamless.

Last, conventions and assertions. I have a very clear idea about what I want to do with this library, and the code reflects that. Things like making a distinction between publish and notify, or making it hard to do request / reply, are just part of that. Another is the notion of an endpoint as a fully independent unit, as you can see in subscriptions being part of a sub queue on the endpoint, rather than being backed by either a service (Mass Transit) or a separate queue or database based subscription storage (NServiceBus).

And now we get to talk about convictions. Rhino ServiceBus will actively fight you if you try to do things that you shouldn't. For example, if you try to send a batch containing over 256 messages, it will fail. But more than that, if you try to send a message containing a collection that has more than 256 items, it will also fail. In both cases, it will alert you to a problem with an unbounded result set. The proper way to handle this scenario is to send several batches of messages. Then it is the job of the transport to ensure that they get to the destination in the appropriate fashion, and it is free to join them together if it really wants to.
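A simple way to stay within that limit is to chunk on the sending side (a sketch; the Send overload that accepts an array of messages is an assumption):

// Sketch: split a large set of messages into batches of at most 256
// items before handing them to the bus, instead of one giant batch.
const int MaxBatchSize = 256;

for (int i = 0; i < allMessages.Count; i += MaxBatchSize)
{
    var batch = allMessages
        .Skip(i)
        .Take(MaxBatchSize)
        .ToArray();

    bus.Send(batch); // assumed array/params overload
}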

One final notion relates to the idea of service isolation. Right now, Rhino ServiceBus doesn't have an explicit notion of that, but it does let you run several services as part of the same process. I intend to take it a bit further once I finish polishing up the host, to the point where you can say that a service is an assembly, and you can host several of them (isolated in their own AppDomains) in the same process.

But that is enough, I think. You can get the source in the usual place:

http://github.com/ayende/rhino-esb

Some stats:

  • I started the project at about noon, Sunday.
  • It has 36 commits so far, and I am already using it for a real project.
  • 2,550 lines of code.
  • 1,761 lines of tests
  • 86% test coverage in 81 tests
  • C# 3.0
  • MSMQ 4.0 - Vista or Windows Server 2008
