The right thing and what the user expects to happen are completely unrelated

“In theory, there is no difference between theory and the real world.”

One of the more annoying things to learn was that the kind of things that you worry about from inside the product are almost never the kind of things that your users worry about. Case in point, we spend an amazing amount of time making sure that RavenDB is crash proof, that you will not have data corruption and that transactions are atomic and durable in the face of what are sometimes horribly broken environments. Users often just assume “this must work” and move along, having no idea how hard some of these things are.

But that much, I get. It makes sense that you would just assume that things should work this way. In fact, one of the reasons that RavenDB exists is that none of the NoSQL products at the time provided what I considered to be basic functionality. Since then I learned that what a user considers basic functionality and what a database considers basic functionality are two very distinct things.

But I think that the most shocking thing was that users tend to not care about data consistency anywhere near the level you would expect them to. We spend time and effort and a whole lot of coding to ensure that it would be possible to reason about the behavior of a distributed and concurrent system in a fairly predictable manner, that data is never lost or misplaced, and no one notices. What is worse, when you get things right, and another database engine gets it clearly wrong, users will sometimes want to use the other guy’s (wrong) implementation, because doing the clearly wrong thing is easier for them.

For example, consider the case of two concurrent modifications to the same document. If you do nothing, you’ll get a Last Write Wins scenario: the second write silently overwrites the first. You can also do the proper thing and error when the second write comes, because it is based on a now out of date version of the document. A few weeks ago I got a frantic call from one of the marketing & sales people about “I broke our database” and “found a major issue”. That was quite strange, given that the person talking to me wasn’t a developer. Instead, she was using one of our internal systems to update a customer purchase and got an error. She then proceeded to figure out that she could reproduce this error at will. All she had to do was edit the same customer record at the same time as a colleague was also editing it. Whoever saved the record first would succeed, and the second would get an error.

For the developers among you, that is Optimistic Concurrency in action, absolutely expected and what we want in this scenario. But I had to give a full explanation of how this is not a bug, tell the marketing guys to put down the “Serious Bug Fixed, Upgrade Immediately” email template and explain that this is how it is meant to work. The problem, by the way, wasn’t that they couldn’t understand the issue. They just couldn’t figure out why they got an error in the first place. Surely the “system” was supposed to figure out what to do there and not give them an error.
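To make the difference concrete, here is a minimal sketch in Python of the two behaviors. This is not RavenDB's API or the code of our internal system. The `DocumentStore` class, the `ConcurrencyError` exception, the `simulate_concurrent_edit` helper and the `customers/1` document are all hypothetical names made up for illustration, and the store is just an in-memory dictionary that keeps a version number next to each document.

```python
class ConcurrencyError(Exception):
    """Raised when a write is based on an out-of-date version of a document."""


class DocumentStore:
    """Hypothetical in-memory store, for illustration only (not the RavenDB API)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, data)

    def load(self, doc_id):
        version, data = self._docs[doc_id]
        return version, dict(data)  # hand out a copy, like a form in an editor

    def save(self, doc_id, data, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Optimistic concurrency: refuse the write if the caller read an older
        # version than the one currently stored.
        if expected_version is not None and expected_version != current_version:
            raise ConcurrencyError(
                f"{doc_id}: expected version {expected_version}, "
                f"found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(data))
        return new_version


def simulate_concurrent_edit(check_versions):
    """Two users open the same record, then both save. Returns (outcome, final doc)."""
    store = DocumentStore()
    store.save("customers/1", {"name": "Acme", "seats": 5})

    # Both users load the record before either of them saves.
    version_a, doc_a = store.load("customers/1")
    version_b, doc_b = store.load("customers/1")
    doc_a["seats"] = 10
    doc_b["name"] = "Acme Ltd"

    # The first save always succeeds.
    store.save("customers/1", doc_a, version_a if check_versions else None)
    try:
        # The second save is based on a version that is now out of date.
        store.save("customers/1", doc_b, version_b if check_versions else None)
    except ConcurrencyError:
        return "rejected", store.load("customers/1")[1]
    return "accepted", store.load("customers/1")[1]
```

With `check_versions=False` the second save is accepted and the first user's change to `seats` quietly disappears, which is Last Write Wins. With `check_versions=True` the second save is rejected and the first user's change survives, which is the error our marketing colleague ran into.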

I’ll freely admit that we skimp on the UX of our internal systems because… well, they are internal, and it is easier to train a few dozen people on how the systems work than to train the systems how people work at that scale. But this really hit home because even after I explained the issue, asked them what they expected to happen and how this is supposed to work, I couldn’t get through. An error shown to them is obviously something that is wrong in the system. And being able to generate an error by their own actions means that the system is broken.

It took showing the same exact behavior in the accounting software (made by an external company) before they were half convinced that this isn’t actually an issue.

Now, to be fair, our marketing people aren’t technical, so they aren’t expected to understand concurrency and handling thereof. Typically, any error from the internal systems means that something is broken at the infrastructure level, so I can absolutely understand where they are coming from.

The sad thing is, this isn’t isolated to non-technical people, and we have to be careful to design things in such a manner that they match what the user expects. And the users in this case are the developers working with RavenDB and the ops teams responsible for its care and feeding. I’ll talk about one such decision in the next post.