time to read 4 min | 631 words

In this post, I want to talk about libraries that not only need to support being used from multiple threads, but actually want to use multiple threads themselves. Remember, you are a library, not a framework. You are a guest in someone else's home, and you shouldn't litter.

The first thing to remember is error handling. That actually comes in two parts. First, unhandled exceptions from a thread will kill the application. There are very few things that people will find more annoying about your library than your errors killing their application. Second, and almost as important, you should have a way to report those errors.

Even more annoying than killing my application: failing silently, in a way that is really hard to debug, is going to cause major hair loss all around.

There are several scenarios that we need to consider:

  • Long running threads – I need to do something in a background thread that would usually live as long as the application itself.
  • Short term threads – I need to do something that requires a lot of threads, just for a short time.
  • Timeouts / delays / expirations – I need to do something every X amount of time.

In the first case, that of long running threads, there isn't much that needs to be done. You want to handle errors, obviously, and you want to make it crystal clear when you spin up your threads and when / how you tear them down again. Another important aspect is that you should name your threads. This is important because it means that when debugging, we can more easily figure out what this or that thread is doing.
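
Here is a minimal sketch of what that can look like. The class and member names are mine, not from any particular library; the point is the named background thread, the error event, and the explicit tear down:

using System;
using System.Threading;

public class BackgroundCleanup : IDisposable
{
    private readonly Thread thread;
    private readonly ManualResetEventSlim stop = new ManualResetEventSlim(false);

    // surface errors to the host application instead of killing it
    public event Action<Exception> OnError;

    public BackgroundCleanup()
    {
        thread = new Thread(Run)
        {
            Name = "MyLibrary.BackgroundCleanup", // shows up in the debugger
            IsBackground = true                   // never prevent the host app from exiting
        };
        thread.Start();
    }

    private void Run()
    {
        while (stop.Wait(TimeSpan.FromSeconds(1)) == false)
        {
            try
            {
                DoWork();
            }
            catch (Exception e)
            {
                // an unhandled exception here would take down the entire application
                var handler = OnError;
                if (handler != null)
                    handler(e);
            }
        }
    }

    private void DoWork() { /* the actual background work goes here */ }

    public void Dispose()
    {
        stop.Set();    // make the tear down explicit and deterministic
        thread.Join();
    }
}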

The next scenario is much more common: you just need some way to execute some code in parallel. The easiest thing to do is to reach for new Thread(), ThreadPool.QueueUserWorkItem or Task.Factory.StartNew(). Sure, this is easy to do, and it is also perfectly wrong.

Why is that, you say?

Quite simply, it ain't your app. You don't get to make such decisions for the application that is hosting your library. Maybe the app needs to conserve threads to serve requests? Maybe it is trying to use fewer threads to reduce CPU load and save power on a laptop running on batteries? Maybe they are trying to debug something, and all those threads popping up are driving them crazy?

The polite thing to do when you recognize that you have a threading requirement in your library is to:

  • Give the user a way to control that.
  • Provide a default implementation that works.

A good example of that can be seen in RavenDB’s sharding implementation.

public interface IShardAccessStrategy
{
    event ShardingErrorHandle<IDatabaseCommands> OnError;

    T[] Apply<T>(IList<IDatabaseCommands> commands, ShardRequestData request, Func<IDatabaseCommands, int, T> operation);
}

As you can see, we abstracted the notion of making multiple requests. We provide sequential and parallel implementations of this out of the box.
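
To make the shape of the extension point concrete, here is a rough sketch of what a user-supplied parallel strategy could look like. This is not the actual RavenDB implementation (among other things, it skips the OnError reporting), it only illustrates the seam, reusing the RavenDB types from the snippet above:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class MyParallelShardAccessStrategy : IShardAccessStrategy
{
    public event ShardingErrorHandle<IDatabaseCommands> OnError;

    public T[] Apply<T>(IList<IDatabaseCommands> commands, ShardRequestData request,
                        Func<IDatabaseCommands, int, T> operation)
    {
        var results = new T[commands.Count];
        // the user, not the library, decides how the work is parallelized
        Parallel.For(0, commands.Count, i =>
        {
            results[i] = operation(commands[i], i);
        });
        return results;
    }
}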

The last item, timeouts / expirations / delays, is also something that you want to give the user of your library control over. Ideally, using something like the strategy above. By all means, provide a default implementation that is wired up without the user needing to do anything.

But it is important to expose control over those things. The expert users of your library will want and need it.
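
For example, the same pattern could look something like this. The names here are hypothetical, not an actual RavenDB API: the library asks for a scheduling strategy, ships a default based on System.Threading.Timer, and lets the host application substitute its own.

using System;
using System.Threading;

public interface IScheduleStrategy
{
    // run the given work every interval, until the returned handle is disposed
    IDisposable Schedule(TimeSpan interval, Action work);
}

public class DefaultScheduleStrategy : IScheduleStrategy
{
    public IDisposable Schedule(TimeSpan interval, Action work)
    {
        // the default just works, with no configuration required from the user
        return new Timer(_ => work(), null, interval, interval);
    }
}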

time to read 3 min | 419 words

Next on the agenda for writing correct multi threaded libraries: how do you handle shared state?

The easiest way to handle that is to use the same approach that NHibernate and the RavenDB Client API use. You have a factory / builder / fizzy object that you use to construct all of your state; this is done on a single thread, and then you call a method that effectively "freezes" this state from then on.

All future accesses to this state are read only. This is really good for doing things like reflection lookups, loading configuration, etc.
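
A minimal sketch of the build-then-freeze idea (illustrative names, not an actual NHibernate or RavenDB class):

using System;
using System.Collections.Generic;

public class LibraryConfiguration
{
    private readonly Dictionary<string, string> settings = new Dictionary<string, string>();
    private bool frozen;

    public void Set(string key, string value)
    {
        if (frozen)
            throw new InvalidOperationException("The configuration has been frozen");
        settings[key] = value; // mutation only happens while building, on a single thread
    }

    public void Freeze()
    {
        frozen = true; // from this point on, the state is effectively read only
    }

    public string Get(string key)
    {
        string value;
        return settings.TryGetValue(key, out value) ? value : null;
    }
}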

But what happens when you actually need shared mutable state? A common example is a cache, or global statistics. This is where you actually need to pull out your copy of Concurrent Programming on Windows and very carefully write true multi threaded code.

It is over a thousand pages, you say? Sure, and you need to know all of this crap to get multi threading working properly. Multi threading is scary, hard and should not be used.

In general, even if you actually need shared mutable state, you really want to make sure that there is a clear distinction between the things that can be shared among multiple threads and the things that cannot. And you want to do most of the work in the parts where you don't have to worry about multi threading.

It also means that your users have a much easier time figuring out what the expected behavior of the system is. This is very important with the advent of C# 5.0, since async APIs are going to be a lot more common. Sure, you use the underlying async primitives, but did you consider what may happen when multiple concurrent async requests are issued? Is that allowed?

With C# 5.0, you can usually treat async code as if it was single threaded, but that breaks down if you are allowing multiple concurrent async operations.
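
A contrived sketch of where that breaks down (the Session class here is hypothetical and stands in for any async, but single threaded, API):

using System.Collections.Generic;
using System.Threading.Tasks;

public class Session
{
    private readonly List<string> pending = new List<string>(); // not thread safe

    public async Task LoadAsync(string id)
    {
        pending.Add(id);      // safe when the calls are awaited one at a time
        await Task.Delay(10); // simulate I/O
        pending.Remove(id);   // two concurrent calls can corrupt this list
    }
}

public static class Example
{
    public static async Task Run(Session session)
    {
        // fine: sequential awaits behave like single threaded code
        await session.LoadAsync("users/1");
        await session.LoadAsync("users/2");

        // questionable: both operations are in flight at once, touching the same
        // session state - the library has to decide whether this is allowed
        await Task.WhenAll(session.LoadAsync("users/1"), session.LoadAsync("users/2"));
    }
}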

In RavenDB and NHibernate, we use the notion of a Document Store / Session Factory, which is created once, safe for multi threading, and usually a singleton. And then we have the notion of sessions, which are single threaded, easy & cheap to create, and follow the notion of one per thread (actually, one per unit of work, but that is beside the point).
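
From the consuming application's side, the usage looks roughly like this (the method names approximate the RavenDB client API of the time; User is a placeholder class):

// created once, thread safe, typically a singleton for the whole application
var store = new DocumentStore { Url = "http://localhost:8080" };
store.Initialize();

// later, on any thread: one session per unit of work
using (var session = store.OpenSession())
{
    var user = session.Load<User>("users/1");
    user.Name = "Oren";
    session.SaveChanges();
} // sessions are single threaded, cheap to create and short lived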

In my next post, I'll discuss what happens when your library actually wants to go beyond just being safe for multi threading, when the library wants to use threading directly.

time to read 2 min | 274 words

The major difference between libraries and frameworks is that a framework is something that runs your code and is in general in control of its own environment, while a library is something that you use in your own code, where you control the environment.

Examples of frameworks: ASP.Net, NServiceBus, WPF, etc.

Examples of libraries: NHibernate, RavenDB Client API, JSON.Net, SharpPDF, etc.

Why am I talking about the distinction between frameworks and libraries in a post about multi threaded design?

Simple: there are vastly different rules for multi threaded design with frameworks and libraries. In general, frameworks manage their own threads and will let your code use one of their threads. Libraries, on the other hand, will use your threads.

The simple rule for multi threaded design for libraries? Just don’t do it.

Multi threading is hard, and you are going to cause issues for people if you don't know exactly what you are doing. Therefore, just write as if you were in a single threaded application, and make sure to hold no shared state.

For example, JSON.Net pretty much does this. The sole place where it does do multi threading is where it is handling caching, and it must be doing this really well because I never paid it any mind and we got no error reports about it.

But the easiest thing to do is to just not support multi threading for your objects. If the user wants to use the code from multiple threads, he is welcome to instantiate multiple instances and use one per thread.
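
For example, a caller that wants to use such a library from many threads can simply keep one instance per thread, e.g. with ThreadLocal<T>. This is a sketch; Serializer here is a stand-in for any non thread safe object, not a real JSON.Net type:

using System.Threading;
using System.Threading.Tasks;

public class Serializer // hypothetical, not thread safe
{
    public string Serialize(object value) { return value.ToString(); }
}

public static class PerThreadUsage
{
    public static void Run()
    {
        var serializer = new ThreadLocal<Serializer>(() => new Serializer());

        Parallel.For(0, 100, i =>
        {
            // each thread lazily gets, and then reuses, its own instance
            var json = serializer.Value.Serialize(i);
        });
    }
}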

In my next post, I’ll talk about what happens when you actually do need to hold some shared state.
