A Story About Testing In Multi Threaded Environments

time to read 3 min | 538 words

I just finished a really nice project that involved heavy use of multi threaded code. The project is enterprisey (which has turned into a really bad word recently, thanks to Josling and The Daily WTF), so I needed to make sure that I was handling several failure cases appropriately.

Now, testing multi threaded code is hard. It is hard since you run into problems where the code under test and the test itself are running on seperate threads, and you need to add a way to tell the test that a unit of work was completed, so it could verify it. Worse, an unhandled exception in a thread can get lost, so you may get false positive on your tests.

In this case, I used events to signal to the tests that something happened, even though the code itself didn't need them at the beginning (and didn't need many of them at all in the end). Then it was fun making sure that none of the tests was going to complete before the code under test ( there has got to be  a technical term for this, CUT just doesn't do it for me ) has completed, or that there were no false positives.

As I was developing the application, I run into issues that broke the tests, sometimes it was plainly bugs in the thread safety of the tests, and I could see that the code is doing what it should, but the test is not handling it correctly. I was really getting annoyed with making sure that the tests run correctly in all scenarios.

In the end, though, I was just about done, and I run the tests one last time, and they failed. The error pointed to a threading problem in a test. It was accessing a disposed resource, and I couldn't quite figure why. Some time later, I found out that an event that was supposed to be fired once was firing twice. This one was a real bug in the code, which was caught (sometimes it passed, the timing had to be just wrong for it to fail) by the test.

Actually, this was a sort of an accident, since I never thought that this event could fire twice, and I didn't write a test to verify that it fired just once. (In this case, it was the WorkCompleted event, which was my signal to end the test, so I'm not sure how I could test it, but never mind that.)

I spent some non trivial time in writing the tests, but they allowed me to work a rough cut of the functionality, and then refine it, knowing that it is working correctly. That was how I managed to move safely from the delete each row by itself to the BulkDeleter that I blogged about earlier, or to tune the consumers, etc. That final bug was just a bonus, like a way to show me that the way I used was correct.

Now, the tests didn't show me that if I try to shove a 30,000 times more data than it expected, the program is going to fail, that was something that load testing discovered. But the tests will allow me to fix this without fear of breaking something else.