Refactoring threading code
Let us start by saying that threading is complex, hard and complicated.
A while ago I was asked to do a code review of a JoinAll() method; it looked somewhat like this:
public void JoinAll(IEnumerable<Thread> threads)
{
    foreach (Thread thread in threads)
    {
        if (thread.Join(500) == false)
            thread.Abort();
    }
}
I was properly horrified by the use of thread.Abort(), but as you know, all rules have exceptions, so instead of commenting on the implementation details of the method, I asked to understand the entire scenario. This piece of code will probably explain it better than I could:
List<Thread> threads = new List<Thread>();
List<string> results = new List<string>();
foreach (string urlParam in GetUrlsToQuery())
{
    string url = urlParam;
    Thread thread = new Thread(delegate()
    {
        WebRequest wr = WebRequest.Create(url);
        WebResponse response = wr.GetResponse();
        using (Stream stream = response.GetResponseStream())
        using (StreamReader sr = new StreamReader(stream))
        {
            string result = sr.ReadToEnd();
            lock (results)
            {
                results.Add(result);
            }
        }
    });
    threads.Add(thread);
    thread.Start();
}
foreach (Thread thread in threads)
{
    if (thread.Join(500) == false)
        thread.Abort();
}
ProcessResults(results);
That piece of code is in the Page_Load method of a page. The scenario was querying remote servers for information (15 - 20 of them, most often), and there was a time limit on how long we could wait before we had to display the results to the user.
The code above actually works, but it is tremendously wasteful of resources (creating a thread is expensive) and not a really nice way to treat a thread. It is entirely possible to get to a point where you break the locking mechanism and are left with a broken app because of aborted threads that were in the middle of a supposedly atomic operation. Not to mention that we are likely not going to clean up resources very well if we abort the thread.
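To make that concrete, here is a hypothetical illustration (invented names, not from the original code) of how an abort can corrupt a "supposedly atomic" update:

// Hypothetical illustration of why Thread.Abort is dangerous: the abort can land
// between the two statements, leaving shared state half-updated even though the
// lock itself is released when the ThreadAbortException unwinds the stack.
using System.Collections.Generic;

public class AbortHazard
{
    private readonly List<string> urlsSeen = new List<string>();
    private readonly List<string> results = new List<string>();

    public void AddResult(string url, string result)
    {
        lock (results)
        {
            urlsSeen.Add(url);    // step 1 of a "supposedly atomic" update
            // <-- a ThreadAbortException can be injected right here
            results.Add(result);  // step 2 never runs; the two lists now disagree
        }
    }
}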
The hard constraint was that the page must be rendered in a second or so. But here, if even two of those servers are not playing well, it is not going to render fast enough.
In short, do not write code like the above.
I took the code and changed it a bit; instead of explicit threading, let us use the async API, so we get this piece of code:
List<string> results = new List<string>();
List<WaitHandle> waitHandles = new List<WaitHandle>();
foreach (string url in GetUrlsToQuery())
{
    WebRequest wr = WebRequest.Create(url);
    IAsyncResult asyncResult = wr.BeginGetResponse(delegate(IAsyncResult ar)
    {
        WebResponse response = wr.EndGetResponse(ar);
        using (Stream stream = response.GetResponseStream())
        using (StreamReader sr = new StreamReader(stream))
        {
            string result = sr.ReadToEnd();
            lock (results)
            {
                results.Add(result);
            }
        }
    }, null);
    waitHandles.Add(asyncResult.AsyncWaitHandle);
}
WaitHandle.WaitAll(waitHandles.ToArray(), TimeSpan.FromMilliseconds(700), false);
string[] resultsCopy;
lock (results)
{
    resultsCopy = results.ToArray();
}
ProcessResults(resultsCopy);
Now, instead of having a lot of blocked threads, we have moved the waiting to the framework's responsibility. We then gather all the wait handles and wait on all of them (remember, there aren't more than 20, but if more than 64 are required, we can manage as well; see the sketch below).
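For reference, here is a minimal sketch of how the more-than-64 case could be handled, since WaitHandle.WaitAll refuses arrays of more than 64 handles (the helper name is invented for illustration):

using System;
using System.Collections.Generic;
using System.Threading;

static class WaitHelper
{
    // Wait on the handles in chunks of 64, shrinking the remaining timeout as we go.
    public static void WaitAllWithTimeout(List<WaitHandle> handles, TimeSpan timeout)
    {
        DateTime deadline = DateTime.Now + timeout;
        for (int i = 0; i < handles.Count; i += 64)
        {
            TimeSpan remaining = deadline - DateTime.Now;
            if (remaining <= TimeSpan.Zero)
                return; // out of time; the caller works with whatever has arrived

            int count = Math.Min(64, handles.Count - i);
            WaitHandle[] chunk = handles.GetRange(i, count).ToArray();
            WaitHandle.WaitAll(chunk, remaining, false);
        }
    }
}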
The important part comes when we decide that we want to give up: we make a copy of the results so far, and then start to process them. In the meantime, any request that has completed can go along its merry way and add the newly received result to the results list, but that is no longer relevant, because it got there too late.
If I wanted to get really deep into it, ASP.NET has the notion of async request handling, which is very cool, but it doesn't have an expiration time (that I know of, at least), and we didn't want to get into it in the first place.
Anxiously awaiting your comments, telling me how wrong I am.
Comments
Your rewrite of the threading code looks fine, but what scares me is this:
"That piece of code is in the Page_Load method of a page."
What the hell is code like that doing in code-behind? :)
Certainly better. But: BeginGetResponse uses a threadpool thread, right? After being burned by something like this very recently, I always try to check if my threading code is hogging the pool. It's alright in this case, but if there were more than 25 servers to query, some of them would have to wait for the first ones to complete.
Or - is the framework smart enough to queue all asynchronous web requests on the same thread?
I'd certainly consider using a separate thread pool for this. The problem with using the .NET ThreadPool in this way (i.e. queuing a lot of async operations from a single request thread) is that you are indeed hogging the thread pool already used by the core framework. I mean, it's fine to schedule one or two, but scheduling 15? On a single-processor server that's already eating up more than half of the thread pool, and then what happens when you've got multiple requests hitting your page?

Think about it this way: those extra threads you need to do the requests to the other sites in parallel are essentially I/O threads, so they will spend most of the time just sitting there waiting for I/O to complete. It makes sense to delegate that to a larger thread pool, but one where you can still reuse threads and control their lifetime.
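For illustration, a minimal sketch of that idea (not the commenter's actual implementation): a handful of long-lived dedicated threads draining a work queue, so the framework's shared ThreadPool is never touched.

using System.Collections.Generic;
using System.Threading;

public delegate void WorkItem();

// A tiny dedicated pool: fixed worker threads, a shared queue, explicit lifetime.
public class SimpleWorkerPool
{
    private readonly Queue<WorkItem> work = new Queue<WorkItem>();
    private bool stopping;

    public SimpleWorkerPool(int workerCount)
    {
        for (int i = 0; i < workerCount; i++)
        {
            Thread t = new Thread(Worker);
            t.IsBackground = true; // don't keep the process alive on shutdown
            t.Start();
        }
    }

    public void QueueWork(WorkItem item)
    {
        lock (work)
        {
            work.Enqueue(item);
            Monitor.Pulse(work); // wake one sleeping worker
        }
    }

    public void Stop()
    {
        lock (work)
        {
            stopping = true;
            Monitor.PulseAll(work);
        }
    }

    private void Worker()
    {
        while (true)
        {
            WorkItem item;
            lock (work)
            {
                while (work.Count == 0 && !stopping)
                    Monitor.Wait(work);
                if (work.Count == 0)
                    return; // stopping and nothing left to do
                item = work.Dequeue();
            }
            try { item(); }
            catch { /* log in real code; don't let one failure kill the worker */ }
        }
    }
}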
My understanding is that this is not done by blocking a thread on the backend, but by utilizing Windows' IO Completion ports, which means that we aren't actually taking any threads at all.
Paul,
Do you really want to get into The Things That People Should Not Do In Page_Load ?
I have a big list somewhere.
@Ayende:
Yes, it's true that the .NET ThreadPool is based on an I/O completion port, but there's nothing magical about IOCPs. Yes, they use threads (in fact, you specify how many threads the port can use when you create it), and it provides a very efficient way to use those threads, but you can still starve the pool and even deadlock it if you're not careful.

You might be right that this won't cause any problem... but I'd still test it thoroughly under heavy load to see how it behaves :) Also, notice that because you don't cancel pending requests when you decide to stop waiting and move on, you'll lose resources while those requests finish processing (even though you don't care about the results anymore).

So, imagine there's a problem with one of the sites you're connecting to and it just hangs there without responding for a long time... If this indeed is taking threads, then you've basically lost one until that site responds, and now imagine you queued ten requests for that site and, well, you get the picture.

I haven't done exactly the same thing you are doing, but I've had the "pleasure" of working on similar stuff (except waiting for processes or other network resources), and it can be tricky to make it work reliably under heavy load.
Tomas,
I have a system like that that scales to up to 10,000 concurrent active (send/receive) connections and pushes hundreds of MBs per second +on my laptop+, using the same approach.
That system is using the sockets directly, but it should hold for web request as well.
IOCPs don't hold up a thread while they wait; that is the beauty there.
Ayende: You don't have to explain IOCPs to me; I've been using them for a long time (heck, I even wrote my own C++ wrapper around them back in the good old NT5 days) and understand how they work; they aren't the issue here. Yes, in this case the .NET ThreadPool might behave like you expect; all I'm saying is test it to make sure.

You say "That system is using the sockets directly, but it should hold for web requests as well." That's a big assumption, unless you've poked quite a bit under the hood and actually checked the WebRequest code to ensure it pretty much behaves like your plain sockets case. Yes, it's a reasonable assumption, but I've been bitten by those in the past, so I tread more lightly. All I'm saying is test it.
Actually System.Net.HttpWebRequest.BeginGetResponse() does use the ThreadPool. Even worse than that, it throws an exception if ThreadPool.GetAvailableThreads() does not return at least two.
This probably means that you should never use both WebRequest.BeginGetResponse() and the ThreadPool in the same application.
Since the ThreadPool is supposed to spawn more threads if the other threads are blocking, I am completely baffled at why they have to implement this check.
At least HttpWebRequest.GetResponse() no longer uses BeginGetResponse()/EndGetResponse() internally (it did prior to .NET 2.0), so you can combine WebRequest.GetResponse() and the ThreadPool now.
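If you do need to mix the two, a small defensive check is possible. This is only a sketch: the helper name is invented, and the threshold of two is taken from the comment above rather than from documentation I have verified.

using System.Threading;

static class ThreadPoolGuard
{
    // Returns true if there appears to be enough ThreadPool headroom to safely
    // issue another async web request; otherwise defer or reduce parallelism.
    public static bool HasHeadroomForAsyncRequest()
    {
        int workerThreads, completionPortThreads;
        ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
        return workerThreads >= 2 && completionPortThreads >= 2;
    }
}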
I will stray from the discussion regarding IOCPs and go back to the matter at hand: async web requests that should be joined to display on a web page.

I have written exactly this kind of code for a Windows service, and I used the simple approach: you set a timeout on the HttpWebRequest; when it elapses you get a nice timeout exception, which you catch within the thread, log so you'll know which server is to blame, and then the thread goes to where all the threads go to die.

Obviously exceptions are not the way you'd like to handle this, but Thread.Abort and letting the thread continue to run even though you don't need the results anymore seems much worse… awaiting the lynch…
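A minimal sketch of that approach, assuming each query runs on its own worker thread as in the original code (the method name is invented; note that in practice the synchronous GetResponse() reports the elapsed timeout as a WebException with Status == WebExceptionStatus.Timeout rather than a dedicated timeout exception type):

using System.Collections.Generic;
using System.IO;
using System.Net;

static class TimedQuery
{
    public static void QueryWithTimeout(string url, List<string> results)
    {
        HttpWebRequest wr = (HttpWebRequest)WebRequest.Create(url);
        wr.Timeout = 500; // milliseconds; GetResponse throws once this elapses
        try
        {
            using (WebResponse response = wr.GetResponse())
            using (StreamReader sr = new StreamReader(response.GetResponseStream()))
            {
                string result = sr.ReadToEnd();
                lock (results)
                {
                    results.Add(result);
                }
            }
        }
        catch (WebException ex)
        {
            if (ex.Status == WebExceptionStatus.Timeout)
            {
                // Log which server is to blame, then simply let the thread end.
            }
            else
            {
                throw;
            }
        }
    }
}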
Tomas,
You are correct, I gave it a quick glance in reflector, but I can't really be sure that this is the case, and I am not in the mood to read documentation now.
Good point about the timeout, Moran. If for some reason setting a timeout is too inflexible - say, you would prefer to return a result after five seconds, but are prepared to wait 30 seconds to get at least one response - you can use (Http)WebRequest.Abort(), which is a clean way of aborting the request.
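A sketch of how that could be bolted onto the async version above: keep the requests around, abort whatever is still pending once the wait expires, and ignore the resulting cancellation in the callback. The helper name is invented, and it assumes a small change to the earlier snippet (each BeginGetResponse() call also adds its request to a pending list).

using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;

static class RequestCancellation
{
    // Wait up to the timeout, then abort whatever is still outstanding so it
    // stops consuming connections while we render the page.
    public static void WaitThenAbort(List<WaitHandle> waitHandles,
                                     List<WebRequest> pendingRequests,
                                     TimeSpan timeout)
    {
        WaitHandle.WaitAll(waitHandles.ToArray(), timeout, false);
        foreach (WebRequest wr in pendingRequests)
        {
            wr.Abort(); // believed harmless for requests that already completed
        }
    }
}

// Inside the BeginGetResponse callback, the abort shows up like this:
//   catch (WebException ex)
//   {
//       if (ex.Status != WebExceptionStatus.RequestCanceled)
//           throw; // a real failure, not our own Abort()
//   }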
Hello,
Well, I'm not really an expert on how ASP.NET threading works, but I'll still give you my opinion on that. Regarding the way threads are used, I think the behavior depends on the version of IIS you're using. If I'm not mistaken, in IIS 6 ASP.NET requests are handled by worker threads, which means that it should be OK to (for instance) call a web service in async mode, since that operation would end up using an I/O thread (and not a worker thread).

Regarding the async tasks available in ASP.NET, I think you can set the timeout of a request by using the AsyncTimeout property (hope I've got the name right), which can be set per page (i.e., by using a property) or in web.config in the <pages> section.
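For reference, a rough sketch of what that looks like: the page is marked with <%@ Page Async="true" AsyncTimeout="1" %> (the directive value is in seconds, as far as I recall), and the code-behind registers the requests as async tasks. Class, handler, and helper names here are illustrative only.

// Code-behind sketch: the framework starts the tasks, enforces AsyncTimeout,
// and calls the timeout handler for anything still outstanding.
using System;
using System.IO;
using System.Net;
using System.Web.UI;

public partial class QueryPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        foreach (string urlParam in GetUrlsToQuery())
        {
            string url = urlParam;
            WebRequest wr = WebRequest.Create(url);
            RegisterAsyncTask(new PageAsyncTask(
                delegate(object s, EventArgs args, AsyncCallback cb, object state)
                {
                    return wr.BeginGetResponse(cb, null);
                },
                delegate(IAsyncResult ar)
                {
                    using (WebResponse response = wr.EndGetResponse(ar))
                    using (StreamReader sr = new StreamReader(response.GetResponseStream()))
                    {
                        StoreResult(sr.ReadToEnd()); // hypothetical helper
                    }
                },
                delegate(IAsyncResult ar)
                {
                    // Timed out: the page simply renders with whatever arrived in time.
                },
                null,
                true)); // run the tasks in parallel
        }
    }

    private string[] GetUrlsToQuery() { /* as in the post */ return new string[0]; }
    private void StoreResult(string result) { /* collect for rendering */ }
}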
It's really interesting that the example you give corresponds perfectly to the example code I've got in NServiceBus (http://www.NServiceBus.com):
Send N async requests, store up the results as they come in up to a timeout of T. When all responses have returned OR T has elapsed, use the set of results available.
The implementation is also very efficient in terms of threads for client code - only one listener thread is needed.