The randomly failing test
We made a low-level change in how RavenDB writes to the journal. It was verified by multiple code reviews, a whole battery of tests, and production abuse. And yet, once in a blue moon, we'd get a test failure: utterly non-reproducible, and happening only once every week or two (out of hundreds or thousands of test runs). That was worrying, because this test checked the behavior of RavenDB when it crashed midway through a transaction, which is kind of an important property for us.
It took a long while to finally figure out what was going on there. The first thing we ruled out was non-reproducibility due to threading: the test was single threaded, and nothing could inject anything into the code.
The format of the test was something like this:
- Write 1000 random fixed-size values to the database.
- Close the database.
- Corrupt the last page of the journal.
- Start the db again and verify that all the values from the last transaction are not in the db.
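In Python-flavored pseudocode, the test looks roughly like this (`open_db` and `corrupt_last_journal_page` are hypothetical stand-ins for the real RavenDB test helpers):

```python
import random

def test_recovery_after_journal_corruption():
    # 1. Write 1000 random fixed-size values, then close cleanly.
    db = open_db("test.db")                        # hypothetical helper
    for i in range(1000):
        db.put(f"key-{i}", random.randbytes(100))  # fixed-size random value
    db.close()

    # 2. Damage the last page of the journal file on disk.
    corrupt_last_journal_page("test.db.journal")   # hypothetical helper

    # 3. On recovery, the transaction that hit the corrupted page
    #    must be rolled back, so its values must be gone.
    db = open_db("test.db")
    assert db.get("key-999") is None
```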
So far, awesome. So why would it fail?
The underlying reason was obvious, once we looked at it. The only thing that differed from test to test was the random call. But we were using fixed-size buffers for the writes, so that shouldn't have changed anything. The data itself is meaningless.
As it turned out, the data is not quite meaningless. As part of the commit process, we compress the data before we write it to the journal. And different patterns of random bytes have different compression characteristics. In other words, a buffer of 100 random bytes may compress to 90 bytes or to 102 bytes. And that mattered. If the test got enough random input to roll over to a new journal file, we would still corrupt the last page of that journal, but since we had already moved to a new journal, that last page hadn't been used yet. The transaction wouldn't become corrupt, the data would still be in the database, and the test would fail.
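You can see the effect with a few lines of illustrative Python (zlib here is just a stand-in for whatever compressor the journal actually uses):

```python
import random
import zlib

# Same-size buffers, different contents: the compressed size varies
# with the byte pattern. Drawing from a small alphabet makes the
# compression (and the variance) easy to see.
for seed in range(5):
    rng = random.Random(seed)
    buf = bytes(rng.choice(b"abcd") for _ in range(100))
    print(f"seed {seed}: 100 bytes -> {len(zlib.compress(buf))} compressed")
```

Run enough transactions through a compressor like that and the number of journal pages consumed, and therefore the point at which a new journal file gets started, depends on the random data itself.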
Comments
Why don't you record all the random values and store them when a test fails? This would give you a perfect reproduction scenario while still running random tests (potentially covering more cases in the long run).
The lesson is: use pseudo-random data, with a pre-ordained fixed seed. Good idea anyway.
Scooletz, sometimes we do. In this case, we didn't, and we didn't consider the random aspect of it as meaningful until very late.
Oleg, actually, a random seed is better, assuming you can repro. You test more stuff this way.
Yes, we have tests which use random input and we take care to log the seed. This way, when a test fails, we can reproduce the issue and add the seed to the list of fixed tests so the issue will not return. It is worth a lot of effort to make your tests reproducible.
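A minimal sketch of that pattern in Python (the `print` call is a stand-in for whatever logging your test framework provides):

```python
import random
import time

def new_logged_rng():
    # A fresh seed per run keeps the input space wide, but logging it
    # means any failure can be replayed exactly.
    seed = int(time.time() * 1000)
    print(f"test rng seed: {seed}")  # shows up in the failing test's output
    return random.Random(seed)

rng = new_logged_rng()
payload = rng.randbytes(100)  # rerunning with the logged seed reproduces this
```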