How to lead a convoy to safety


I recently ran into a convoy situation in NH Prof. Under sustained heavy load (not a realistic scenario for NH Prof), something very annoying would happen.

Messages would stream in from the profiled application faster than NH Prof could process them.

The term that I use for this is Convoy. It is generally bad news. With NH Prof specifically, it meant that it would consume larger and larger amounts of memory, as messages waiting to be processed queued up faster than NH Prof could handle them.

NH Prof uses the following abstraction to handle queuing:

public interface IQueue<T>
{
    void Enqueue(T o);
    T Dequeue();
    bool IsEmpty { get; }
}

Now, there are a few things that we can do to avoid a convoy. The simplest solution is to put a threshold on the queue and start dropping messages once we reach it. NH Prof is actually designed to handle things like an interrupted message stream, but I don’t think that this would be a nice thing to do.
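To make the threshold idea concrete, here is a minimal sketch of a queue that drops new messages once it is full. This is not NH Prof's actual code: `DroppingQueue<T>`, its `capacity` parameter and the `Dropped` counter are hypothetical names made up for illustration, and the interface is repeated only so the snippet stands on its own.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the queuing abstraction shown earlier in the post.
public interface IQueue<T>
{
    void Enqueue(T o);
    T Dequeue();
    bool IsEmpty { get; }
}

// Hypothetical sketch: once the queue holds `capacity` items,
// new messages are discarded instead of being queued.
public class DroppingQueue<T> : IQueue<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly object sync = new object();
    private readonly int capacity;
    private long dropped;

    public DroppingQueue(int capacity)
    {
        if (capacity <= 0)
            throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    // How many messages were discarded, so the consumer can report the gap.
    public long Dropped
    {
        get { lock (sync) { return dropped; } }
    }

    public void Enqueue(T o)
    {
        lock (sync)
        {
            if (items.Count >= capacity)
            {
                dropped++; // threshold reached: drop the newest message
                return;
            }
            items.Enqueue(o);
        }
    }

    public T Dequeue()
    {
        lock (sync)
        {
            // Throws InvalidOperationException when the queue is empty.
            return items.Dequeue();
        }
    }

    public bool IsEmpty
    {
        get { lock (sync) { return items.Count == 0; } }
    }
}
```

Memory use is now bounded by `capacity`, but the price is exactly the interrupted message stream described above: whatever arrives while the queue is full is gone for good.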

Another alternative would be to write everything to disk, so we don’t have memory pressure and can handle much larger queue sizes. The problem is, of course, that this requires something very subtle. T now must be serializable, and not just T, but everything that T references.

Oh, Joy!

This is one of the cases where just providing the abstraction is not going to be enough. Providing an alternative implementation means having to touch a lot of other code as well.
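A rough sketch of the disk option shows where the serialization requirement comes from. Everything here is an assumption made for illustration: `DiskQueue<T>` is a hypothetical name, it stores one JSON file per message, and it uses `System.Text.Json`, which is much newer than this post. It is not how NH Prof does it.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Same shape as the queuing abstraction shown earlier in the post.
public interface IQueue<T>
{
    void Enqueue(T o);
    T Dequeue();
    bool IsEmpty { get; }
}

// Hypothetical sketch: each message becomes its own JSON file, named by
// sequence number. Sequence numbers start at zero on every run, so this
// does not recover messages left over from a previous process.
public class DiskQueue<T> : IQueue<T>
{
    private readonly string directory;
    private readonly object sync = new object();
    private long head; // sequence number of the next message to read
    private long tail; // sequence number of the next message to write

    public DiskQueue(string directory)
    {
        this.directory = directory;
        Directory.CreateDirectory(directory);
    }

    private string PathFor(long sequence)
    {
        return Path.Combine(directory, sequence + ".json");
    }

    public void Enqueue(T o)
    {
        // This is the subtle part: the serializer has to walk T and
        // everything T references, so all of it must be serializable.
        string json = JsonSerializer.Serialize(o);
        lock (sync)
        {
            File.WriteAllText(PathFor(tail), json);
            tail++;
        }
    }

    public T Dequeue()
    {
        lock (sync)
        {
            if (head == tail)
                throw new InvalidOperationException("Queue is empty.");
            string path = PathFor(head);
            string json = File.ReadAllText(path);
            File.Delete(path);
            head++;
            return JsonSerializer.Deserialize<T>(json)!;
        }
    }

    public bool IsEmpty
    {
        get { lock (sync) { return head == tail; } }
    }
}
```

The queue itself is small. The cost is hidden in the `Serialize` and `Deserialize` calls, which only work if every message type, and every type those messages reference, can make the round trip. That is the "lot of other code" that has to change.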


Comments

20 Sep 2009
05:33 AM

Tuna Toksoz

Use an object database :)

20 Sep 2009
08:42 AM

Richard Dingwall

"not a realistic scenario for NH Prof" <-- I think you overestimate your customers.

I can think of at least half a dozen pages in one web application I work on that take anything from 70-700 SQL/cache requests per hit (30-40 mapped classes, 500 tables, 30GB database). During this time NH Prof frequently becomes unresponsive, and often remains busy for a few secs after the session ended.

We know our code is not the best -- using domain models for building a report, automapper resolvers getting more details per item, recursive trees, leaning far too much on the cache etc. Even after lots of fetching/joins/caching tuning there is still lots of SELECT N+1.

So unfortunately overloading NH Prof is a very realistic scenario for us.

20 Sep 2009
09:40 AM

Rafal

Maybe you should add an option of offline profiling - some small component would write all the trace information to a log and NH Prof would then be used to analyze that log? Live profiling is a problem in production environment - if you have memory/performance problems and want to analyze that with a profiler, the profiler will add more load to the system and seriously worsen the situation.

20 Sep 2009
09:46 AM

Frank Quednau

My question would be...what questions regarding NH usage can NH Prof answer in a heavy load scenario that couldn't be answered when running the app under less heavy load?

In such a case it might be OK to have NH Prof "degrade" to processing only messages of severe importance until it catches up again...

Of course this falls down again if the application is so bitchy that all messages are severe...

20 Sep 2009
09:47 AM

Ayende Rahien

Richard,

I am sorry, but we have different definitions for what sustained heavy load _means_. When I am talking about this I am talking about doing this for 30 minutes or so of non stop activity. That is rarely the case.

Anyway, I already have a branch where I am taking care of this, and I'll publish it sometime this week.

20 Sep 2009
09:49 AM

Ayende Rahien

Rafal,

InitializeOfflineProfiling() - it is there. :-)

20 Sep 2009
09:50 AM

Ayende Rahien

Frank,

The problem isn't with showing the information, the problem is in processing it fast enough

20 Sep 2009
10:28 AM

Frank Quednau

I didn't think UI was the problem...so I gather that the queuing of messages is absolutely "dumb" in that all possible messages are gathered, while I thought that there might be some form of "pre-processing". I suppose that isn't really possible, though, since defining whether a message is "severe" or not probably involves quite a bit of knowledge (= processor time).

Otoh, how expensive is RAM these days? If you're profiling an app with such throughput I'd hope that people could spare a few dollars on a couple of GBs.

20 Sep 2009
10:43 AM

Ayende Rahien

Frank,

It is possible that this would lead to an Out Of Memory Exception

And in general it is better not to try walking that line

20 Sep 2009
16:11 PM

Kyle Szklenski

Hm, I wonder if you could do a meta-analysis over a given number of messages knowing that some messages have been dropped. For example, if your profiler could run, say, 10 times on the same system with approximately the same load, you could average together the results, in a sense, to guarantee a stable conclusion. This would probably require some kind of ability to drop pseudo-random messages though, as you wouldn't be able to rely on just dropping when it starts to get overloaded - if you tried that, then you could very well be missing the exact thing which is causing the overload.

Differently, you could define certain messages (and that which they are dependent on) to be knowingly serializable, then only serialize those with a marker saying where they show up in the queue. This would probably end up creating a scheduling problem over the queue, though, so it's most likely not worth it.

21 Sep 2009
14:07 PM

Thomas Krause

Instead of dropping messages when you reach a threshold... why not simply block the host application, so it has to wait until it can write the next message to the queue?

Granted, this would reduce the performance of the host application, but if I want to debug/trace my application I usually would want to get all messages, even if it means that my application may run a bit slower while being traced...

21 Sep 2009
15:07 PM

Mike Rettig

Can you gain efficiency through batching? For instance, are you updating the screen on every update? With a slow resource such as a UI, file, or socket, batching can give you better throughput by merging updates and limiting the number of slow calls required.

For Example:

public void OnBatch(List<Updates> updates)
{
    ApplyAll(updates);
    UpdateScreen();
}

This way updates are efficiently throttled and the Queue doesn't fall far behind.

Of course, this is something that Retlang does for you.

http://code.google.com/p/retlang/

Mike

21 Sep 2009
17:42 PM

Ayende Rahien

Thomas,

One of the design goals is to have as little impact as possible on the profiled application.

Stopping the profiled application is not an option.

21 Sep 2009
17:43 PM

Ayende Rahien

Mike,

You seem to be missing the point. It isn't the time to update the screen that is meaningful. It is the time to process the messages.

I'll have a separate post about it, but let us just say that the same problem exists with no UI as well


Oren Eini

CEO of RavenDB