Fixing test failures using binary search
With Rhino Queues, I had a very serious problem. While all the tests would pass if they were run individually, running them in concert would cause them to fail. That is a pretty common case of some test affecting the global state of the system (in fact, the problem was one test not releasing resources, and all the other tests failing because they tried to access a locked resource).
The problem is finding out which one. By the way, note that I am making a big assumption here: that there is only one test doing this. For the purposes of this discussion it doesn't matter, though; the technique works either way.
At the heart of it, we have the following heuristic:
If you have tests that cause failures, and you don't know which, start by deleting tests.
In practice, it means:
- Run all the unit tests
- If the unit tests fail:
  - Delete a unit test file, with all its tests
  - Go back to the first step
- If the unit tests pass:
  - Restore one deleted unit test file (in reverse order of deletion)
  - Go back to the first step
When you have only a single file, you have pinpointed the problem.
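This loop is mechanical enough to automate. Below is a minimal C# sketch of it, under a couple of assumptions: RunAllTests is a hypothetical hook you would wire to your own runner (nunit-console, the build script, and so on), and a file is "deleted" by renaming it aside, which only works if the test project picks up files by wildcard.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class TestEliminator
{
    // Hypothetical hook: run the whole suite (e.g. by shelling out to
    // nunit-console or the build script) and return true if it passed.
    static bool RunAllTests() { throw new NotImplementedException(); }

    static void Main()
    {
        var testFiles = new List<string>(Directory.GetFiles("Tests", "*.cs"));
        var removed = new Stack<string>();

        // Step 1: keep removing test files until the suite passes.
        int next = 0;
        while (!RunAllTests() && next < testFiles.Count)
        {
            string file = testFiles[next++];
            File.Move(file, file + ".disabled"); // "delete" by renaming aside
            removed.Push(file);
        }

        // Step 2: restore files in reverse order of deletion; the one whose
        // return makes the suite fail again holds the offending test.
        while (removed.Count > 0)
        {
            string file = removed.Pop();
            File.Move(file + ".disabled", file);
            if (!RunAllTests())
            {
                Console.WriteLine("Culprit: " + file);
                return;
            }
        }
    }
}
```

Renaming rather than actually deleting makes the restore step trivial and keeps the files safe even if the script crashes halfway.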
In my case, the unit tests started to consistently pass once I had deleted a number of test files (I took a shortcut and deleted entire directories).
So I re-introduced UsingSubQueues, and… the tests passed.
So I re-introduced Storage/CanUseQueue, and… the tests passed.
This time I re-introduced Errors, because I had a feeling that QueueIsAsync might be the cause, and… the tests FAILED.
We have found the problem. Looking at the Errors file, it was clear that it wasn’t disposing resources cleanly, and that was a very simple fix.
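The shape of that kind of fix is usually just deterministic cleanup in the fixture. Here is a minimal sketch, with a made-up QueueHolder type standing in for whatever holds the file lock; this is not the actual Rhino Queues API, just an illustration of the failure mode.

```csharp
using System;
using System.IO;
using NUnit.Framework;

// Illustrative stand-in for a resource that holds an exclusive file lock.
public class QueueHolder : IDisposable
{
    private readonly FileStream stream;

    public QueueHolder(string path)
    {
        stream = new FileStream(path, FileMode.OpenOrCreate,
                                FileAccess.ReadWrite, FileShare.None);
    }

    public void Dispose() { stream.Dispose(); }
}

[TestFixture]
public class ErrorsTests
{
    private QueueHolder queue;

    [SetUp]
    public void SetUp() { queue = new QueueHolder("test.queue"); }

    // Without this teardown the file stays locked, and any later test
    // that opens the same file fails with a sharing violation.
    [TearDown]
    public void TearDown() { queue.Dispose(); }

    [Test]
    public void CanHandleErrors() { /* test body elided */ }
}
```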
The problem wasn't the actual fix; the problem was isolating it.
With the fix in place… the tests passed. So I re-introduced all the other tests and ran the full build again.
Everything worked, and I was on my way :-)
Comments
You could make your code less dependent on global state by replacing a static class with a normal class plus a single static variable. Production code uses the static facade, but tests use the normal class, creating a fresh instance for each test.
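A minimal sketch of that pattern, with made-up names: the static class becomes a thin facade over an ordinary class, and tests bypass the facade entirely.

```csharp
// All the logic lives in an ordinary class...
public class Counter
{
    private int count;
    public int Increment() { return ++count; }
}

// ...and the static class is just a thin facade holding one instance.
public static class GlobalCounter
{
    private static readonly Counter Instance = new Counter();
    public static int Increment() { return Instance.Increment(); }
}

// Production code calls GlobalCounter.Increment();
// each test creates its own new Counter(), so no state leaks between tests.
```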
In my code I usually have conflicting TCP ports. You can let the OS choose the port for a listening socket by binding to port 0 with Socket.Bind.
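In C#, the trick is to bind to port 0 and then ask the socket which port the OS actually assigned. A minimal sketch:

```csharp
using System;
using System.Net;
using System.Net.Sockets;

class PortPicker
{
    static void Main()
    {
        // Port 0 tells the OS to pick any free port.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();

        // Read back the port that was actually assigned.
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;
        Console.WriteLine("Listening on port " + port);

        listener.Stop();
    }
}
```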
Lastly, I think that if you have hard-to-pinpoint failures in your tests, you have tested and failed the debuggability of your system. It's time to fire up serious debuggers like WinDbg and make changes to your program so as to make determining the problem easier. If it's hard to debug unit tests, it is surely hard to debug the same program in production, so it pays off to have better debugging procedures in place.
This is a code smell anyway. Any test that affects the actual outcome of other tests should be looked at and made more isolated!
Bruno,
Tests do things that the real app never does.
In that example, the problem was that one of the tests wasn't releasing a file. That never happens in the app, because that file is used throughout the lifetime of the app.
Billy,
Duh!
The problem was a bug in the tests.
Ayende, I have some similar tests (not surprisingly on a similar problem, a transaction file to be specific).
I ended up using temp files to avoid this, so there would be no dependencies between my tests.
Cheers,
Greg
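A minimal sketch of the temp-file approach Greg describes, with made-up test names: every test gets its own uniquely named file, so no two tests can contend for the same resource.

```csharp
using System;
using System.IO;
using NUnit.Framework;

[TestFixture]
public class TransactionFileTests
{
    private string path;

    [SetUp]
    public void SetUp()
    {
        // Unique file per test; tests can no longer trip over each other.
        path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
    }

    [TearDown]
    public void TearDown()
    {
        if (File.Exists(path))
            File.Delete(path);
    }

    [Test]
    public void CanWriteTransaction()
    {
        File.WriteAllText(path, "data"); // stand-in for real transaction-file work
        Assert.IsTrue(File.Exists(path));
    }
}
```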
@Ayende
I think you may have missed an abstraction. Have you thought of getting rid of the specific dependency on the file (via an interface) for both these tests and the application?
From the follow-on comments, you stated that the application maintains the file during its lifetime. Do you not want your tests to reflect that real-world usage?
Dependencies on resources and primitives in tests are, to me, a code smell of state-based testing.
I believe that the failing tests were indicating that changes to the code were required. Fixing the tests, IMHO, seems a pretty anemic solution. It may come back and bite you in the ass.
The last time I had a problem like this, I made a change to the test runner I was using (in this particular case, python's unittest module) to let me binary search my full set of tests. It ended up being worthwhile since this sort of problem can come up often, and having to do the search manually is pretty painful.
I wouldn't be surprised if it was pretty easy to do the same in NUnit (or whatever runner you happen to be using).
I wonder why you didn't spot the problem when you wrote (and ran) the test that didn't release the file. It seems, from your description, that that particular test was written some time ago.
Joe,
Those are integration tests.
Gerardo,
Order, that was the problem.
For some reason, that test ran late in the queue, and the tests that ran after it didn't use the same resource.
It was only when I created some tests that did use it that the failures started, and that was long after the original test was written.
I'm not surprised that running your tests in concert would cause them to fail :)
Sounds like it might be a good feature for test frameworks to not run their tests in a pre-determined order?
Actually, it is a horrible idea.
The problem is that when you DO have such a problem, you have a really hard time figuring it out.
I thought you said you used binary search? I mean, it's pretty obvious, but some people may find the title confusing… The logic above is linear: you just remove one test after another. Instead, your logic should be "recursively disable half of the tests and see if that helped", right?
zvolkov,
It means disabling a big part of the tests and running the suite, then enabling some of the tests and running it again.
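For completeness, here is a minimal sketch of the bisect logic zvolkov describes, with the actual runner invocation left as a callback you would wire up yourself. It assumes a single offending test file, and that running a given subset of files together with the victim tests reproduces the failure whenever the culprit is among them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TestBisector
{
    // files: the suspect test files. runTests(subset) runs the subset plus
    // the victim tests, returning true when that combination passes.
    public static string FindCulprit(IList<string> files,
                                     Func<IList<string>, bool> runTests)
    {
        IList<string> suspects = files;
        while (suspects.Count > 1)
        {
            var firstHalf = suspects.Take(suspects.Count / 2).ToList();

            // If the first half passes on its own, the culprit must be in
            // the other half; otherwise it is in the first half.
            suspects = runTests(firstHalf)
                ? suspects.Skip(suspects.Count / 2).ToList()
                : firstHalf;
        }
        return suspects[0];
    }
}
```

With n suspect files, this pinpoints the culprit in about log2(n) full runs instead of n.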