The right thing and what the user expects to happen are completely unrelated
“In theory, there is no difference between theory and the real world.”
One of the more annoying things to learn was that the kind of things that you worry about from inside the product are almost never the kind of things that your users worry about. Case in point, we spend an amazing amount of time making sure that RavenDB is crash-proof, that you will not have data corruption and that transactions are atomic and durable in the face of what are sometimes horribly broken environments. Users often just assume “this must work” and move along, having no idea how hard some of these things are.
But that much, I get. It makes sense that you would just assume that things should work this way. In fact, one of the reasons that RavenDB exists is that none of the NoSQL products at the time provided what I considered to be basic functionality. Since then I learned that what a user considers basic functionality and what a database considers basic functionality are two very distinct things.
But I think that the most shocking thing was that users tend to not care about data consistency anywhere near the level you would expect them to. We spend time and effort and a whole lot of coding to ensure that it would be possible to reason about the behavior of a distributed and concurrent system in a fairly predictable manner, that data is never lost or misplaced, and no one notices. What is worse, when you get things right, and another database engine gets it clearly wrong, users will sometimes want to use the other guy's (wrong) implementation, because doing the clearly wrong thing is easier for them.
For example, consider the case of two concurrent modifications to the same document. If you do nothing, you’ll get a Last Write Wins scenario. You can also do the proper thing and error when the second write comes, because it is based on a now out-of-date version of the document. A few weeks ago I got a frantic call from one of the marketing & sales people about “I broke our database” and “found a major issue”. That was quite strange, given that the person talking to me wasn’t a developer. Instead, she was using one of our internal systems to update a customer purchase and got an error. She then proceeded to figure out that she could reproduce this error at will. All she had to do was edit the same customer record at the same time as a colleague was also editing it. Whoever saved the record first would succeed, and the second would get an error.
For the developers among you, that is Optimistic Concurrency in action, absolutely expected and what we want in this scenario. But I had to give a full explanation of how this is not a bug, tell the marketing guys to put down the “Serious Bug Fixed, Upgrade Immediately” email template and explain that this is how it is meant to work. The problem, by the way, wasn’t that they couldn’t understand the issue. They just couldn’t figure out why they got an error in the first place; surely the “system” was supposed to figure out what to do there and not give them an error.
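For readers who haven't run into it, the difference between the two behaviors can be sketched in a few lines. This is an in-memory toy, not RavenDB's API: VersionedStore, SaveLastWriteWins, SaveOptimistic and ConcurrencyConflictException are all names made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception type: raised when a save is based on a stale version.
public sealed class ConcurrencyConflictException : Exception
{
    public ConcurrencyConflictException(string id, long expected, long actual)
        : base($"Document '{id}' was at version {expected} when it was loaded, but is now at version {actual}.")
    {
    }
}

// Hypothetical in-memory document store that tracks a version number per document.
public sealed class VersionedStore
{
    private readonly Dictionary<string, (long Version, string Body)> _docs =
        new Dictionary<string, (long Version, string Body)>();
    private readonly object _gate = new object();

    // Returns the body plus the version the caller must echo back on an optimistic save.
    // A document that does not exist yet is reported as version 0 with an empty body.
    public (long Version, string Body) Load(string id)
    {
        lock (_gate)
        {
            if (_docs.TryGetValue(id, out var doc))
                return doc;
            return (0L, "");
        }
    }

    // Last Write Wins: whatever arrives last silently replaces what is there.
    public long SaveLastWriteWins(string id, string body)
    {
        lock (_gate)
        {
            long current = _docs.TryGetValue(id, out var doc) ? doc.Version : 0L;
            _docs[id] = (current + 1, body);
            return current + 1;
        }
    }

    // Optimistic concurrency: the save only succeeds if nobody else saved in between.
    public long SaveOptimistic(string id, long expectedVersion, string body)
    {
        lock (_gate)
        {
            long current = _docs.TryGetValue(id, out var doc) ? doc.Version : 0L;
            if (current != expectedVersion)
                throw new ConcurrencyConflictException(id, expectedVersion, current);
            _docs[id] = (current + 1, body);
            return current + 1;
        }
    }
}
```

If two people load the same customer record and both call SaveOptimistic with the version they loaded, the first call succeeds and the second throws, which is exactly the error that prompted the phone call. With SaveLastWriteWins neither of them sees an error, and the first person's edit quietly disappears.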
I’ll freely admit that we skimp on the UX of our internal systems because… well, they are internal, and it is easier to train a few dozen people on how the systems work than to train the systems how people work at that scale. But this really hit home because even after I explained the issue, asked them what they expected to happen and how this is supposed to work, I couldn’t get through. An error shown to them is obviously something that is wrong in the system. And being able to generate an error by their own actions means that the system is broken.
It took showing the same exact behavior in the accounting software (made by an external company) before they were half convinced that this isn’t actually an issue.
Now, to be fair, our marketing people aren’t technical, so they aren’t expected to understand concurrency and the handling thereof, and typically any error from the internal system means that something is broken at the infrastructure level, so I can absolutely understand where they are coming from.
The sad thing is, this isn’t isolated to non-technical people, and we have to be careful to design things in such a manner that they match what the user expects. And the users in this case are the developers working with RavenDB and the ops teams responsible for its care and feeding. I’ll talk about one such decision in the next post.
Comments
This is a UX issue then. Don't call it an "error"! That is misleading to most people: error = problem with the program. Perhaps give a friendly notice that "someone else has already changed the record since you viewed it. Please view the updated record and determine if your required update is still valid" etc.
From the user's perspective it's a serious defect: the user got into a situation where he's being threatened with losing his work because something or somebody else managed to change the data underneath. So our innocent user is being punished for somebody else's actions, without even a way to fix that. The only option is to lose your data now or later, after fumbling a little. Developers were shot for lesser offenses against the users.
Principle of least surprise ... so often we forget about it.
To give an example of what I mean, I encountered an Entity Framework issue not long ago.
Basically: if you have a column mapping in the form:
Where IsActive is a boolean property (not nullable).
You cannot insert a listing with IsActive = false, because it uses the default CLR value to mean "insert the store-generated value"; in this case false => true, true => true. Yup, see this issue. I've personally been bitten by this in production, when code as innocent as
ctx.Add(new TenantFeature { TenantId = x, FeatureId = y, IsEnabled = false });
was checked in, and surprise, in the DB it's Enabled: 1, just after save. Anyway ...
The principle of least surprise ... is too often ignored by programmers because we have "logical explanations" ready to give, like "that's how optimistic concurrency works", or "the default CLR value is used to mean the DB default", or others. This is one of the fallacies of being a programmer. It's insufficient to have a logical explanation for something; it should also be the solution least surprising to our users.
But there's a system handling concurrency issues smoothly and safely!
It's called git.
The fact that it works in one system so successfully, means there could be a strategy that would work for your use case too. UX-wise it might lead to visual diff between documents. You see both documents, with different field values highlighted. Then you go about and 'accept' or 'reject' changes.
If diffed fields come with an extra in-place (or flyout) mark showing who made the alternative edits and when -- you'll have happy users and structural correctness on your hands.
Not for free though :-)
Peter, you have to call it something, and the semantics are not that different. I agree that this would be friendlier to the user, even if it says the same thing, basically. And even after multiple explanations, the users thought it was an issue.
What is worse, we are seeing the same with a (different) feature where developers take the same position.
Rafal, Oh, I understand the position of the user, but unless they want to learn to do things like merge conflicts, I don't see another way. The problem is that I couldn't explain that in a way they will understand.
Pop Catalin, Can you explain how you would handle this with the principle of least surprise?
Oleg, Um... nope. A whole basket of nopes.
Leaving aside that Git is a wonderful system that manages to confuse developers all the time. Case in point: https://xkcd.com/1597/
The problem is that this only works if you are actually able to reason about each change independently. However, in business domains, changing a field is not just something that happens: it goes through validation and business logic, has stuff that fires as it happens, etc. For example, changing the state that a shipment is going to will lead to a different shipping charge, but what do you do when two users changed the address of an order? Default merge rules will typically mean that it will pass without conflicts, but the order will have a double shipping charge.
Ayende, considering that merging, automatic or otherwise, is not feasible or is too costly, I think a good choice would be to ask the user how to proceed. I.e.: A newer version of the document [view in new window] was saved on the server. Would you like to:
'Overwrite'
'Discard Changes'
This way no category of users is ever surprised. The authoritative category expects an overwrite to be the default (I'm the owner of this content, I should be able to overwrite it, not lose my changes). The non-authoritative category won't be surprised either (Oops, I've overwritten the changes my boss made by mistake, will I get fired?).
Pop Catalin, I have a very hard time accepting "overwrite changes you have never seen" as something that we should do.
Ayende, in general, yes, especially thinking out of context, you don't want to overwrite changes you've never seen. However, a user can decide that whatever those changes might be, his are most probably far more important.
The importance of the content that gets lost is what actually matters the most. Is it OK to overwrite? Maybe you just edited some customer information, adding important stuff over a phone call, like 5 new bank accounts and addresses, and new contact persons ... would you like to lose that info to an application error? Granted, it's far easier for the person receiving an error to fix data than for a person unknowingly overwriting it, but this is not about unknowingly overwriting data, it's about a conscious user choice.
What would happen if, instead of Visual Studio asking the famous question (image: Visual Studio dialog box), it would just either overwrite or discard your changes during save with an error (based on the original programmer's preference)?
@Ayende
in your Effectus article (https://msdn.microsoft.com/en-us/magazine/ee819139.aspx) you show a way of handling concurrency conflicts, as below...
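public void OnSave()
{
    bool successfulSave;
    try
    {
        using (var tx = Session.BeginTransaction())
        {
            Session.Update(Model.Action);
            // Commit so the version check actually runs against the database.
            tx.Commit();
        }
        successfulSave = true;
    }
    catch (StaleObjectStateException)
    {
        // Someone else saved a newer version first: tell the user instead of overwriting.
        successfulSave = false;
        MessageBox.Show(
            @"Another user already edited the action before you had a chance to do so. The application will now reload the new data from the database, please retry your changes and save again.");
    }

    // Published in both cases so other views refresh from the database.
    EventPublisher.Publish(new ActionUpdated { Id = Model.Action.Id }, this);

    if (successfulSave)
        View.Close();
}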
mario, that's a pattern from the relational database world. It's so easy to keep document history with a document DB compared to a relational DB that optimistic concurrency may be skipped entirely in some cases, if you keep the history and it's available to users. (Granted, that depends on business specifics.)
Ayende, you have a complex merge problem, that's what UI needs to target.
You cannot solve the problem by auto-merge, you cannot solve the problem by popping up error messages, nor YesNoCancel chickenboxes.
UI overlaying two documents and highlighting changes would be ideal, and leave no residual animosity -- neither for a user nor for a developer. That is a fair bit of work, but it correctly reflects the true complexity, while avoiding unnecessary theoretical voodoo.
"Hey user here's your 3-state merge in dead-clear consumable UX, you deal with it now"
Oleg, Doing a 3 way state merge is anything but easy. Especially with dead clear consumable UX. Even if it was, optimistic concurrency usually means that you have to apply it across the board. Will you have this for every CRUD screen in your app? For the 0.001% when this happens? Or would you rather just show an error?
This is an important question, which also affects the design of the application itself. If there is a need to have concurrent modifications on the same entity, you can design it so that it won't be using simple CRUD, but something that can handle this. Commands that can be applied, or maybe splitting the entity into individual components that can be safely modified separately, etc.
That tends to be much better than giving users a 3 way merge.
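A rough sketch of the "commands that can be applied" idea, under loud assumptions: Order, the two command classes, OrderCommandHandler and the pricing rule are all invented for illustration, and the handler simply serializes commands with a lock. The point is only that each command is applied to the latest state and derived values are recomputed, so two users touching different aspects of the same order don't need a merge, and two changes to the shipping state can't end up as a double charge.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity: the shipping charge is derived from the state, never edited directly.
public sealed class Order
{
    public string ShippingState { get; private set; } = "";
    public decimal ShippingCharge { get; private set; }
    public List<string> Notes { get; } = new List<string>();

    // Made-up pricing rule, purely for the example.
    private static decimal ChargeFor(string state) =>
        state == "AK" || state == "HI" ? 25m : 10m;

    public void ChangeShippingState(string state)
    {
        ShippingState = state;
        ShippingCharge = ChargeFor(state); // recomputed from scratch, not accumulated
    }

    public void AddNote(string note) => Notes.Add(note);
}

// A command describes an intent instead of shipping a whole edited copy of the entity.
public interface IOrderCommand
{
    void ApplyTo(Order order);
}

public sealed class ChangeShippingStateCommand : IOrderCommand
{
    private readonly string _state;
    public ChangeShippingStateCommand(string state) { _state = state; }
    public void ApplyTo(Order order) => order.ChangeShippingState(_state);
}

public sealed class AddNoteCommand : IOrderCommand
{
    private readonly string _note;
    public AddNoteCommand(string note) { _note = note; }
    public void ApplyTo(Order order) => order.AddNote(_note);
}

// Applies commands one at a time, always against the current state of the order.
public sealed class OrderCommandHandler
{
    private readonly object _gate = new object();
    private readonly Order _order;

    public OrderCommandHandler(Order order) { _order = order; }

    public void Handle(IOrderCommand command)
    {
        lock (_gate)
        {
            command.ApplyTo(_order);
        }
    }
}
```

Two conflicting ChangeShippingStateCommand instances still end up as "the later one wins", but the charge always matches the state that won, which a field-by-field merge of two stale copies would not guarantee.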
I think there are many cases where last wins is an acceptable approach. Users don't usually complain when this approach is taken, even when they overwrite changes they have never seen. However, there are other cases where it isn't acceptable. If concurrency conflicts are very rare and the potential loss of work is small, then you should show a message to the user when the conflict occurs, because this is the simplest thing you can do. However, if the potential loss of work is bigger or the concurrency conflicts are more likely to occur, you should take another approach. Developing UX for merging conflicts is very complex and needs a lot of time, which is the reason I usually take another approach: I apply the "check out/check in" pattern, which is conceptually similar to pessimistic locking. The user has to check the entity out before editing it, and when the user finishes editing the entity, the user checks it in. Only one user in one session can have the entity checked out at a time, preventing concurrency conflicts.
So we all know what the ideal solution is, but it would cost too much.
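The check out/check in pattern described above could look roughly like this. It is a minimal in-memory sketch with made-up names (CheckOutRegistry, TryCheckOut, CheckIn); a real system would presumably also need to persist the check-outs and expire the ones abandoned by users who closed their browser, which is left out here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry tracking which session currently holds which entity.
public sealed class CheckOutRegistry
{
    private readonly Dictionary<string, string> _holders = new Dictionary<string, string>();
    private readonly object _gate = new object();

    // Returns true if sessionId now holds the entity (or already held it).
    // Returns false if another session holds it; currentHolder says who.
    public bool TryCheckOut(string entityId, string sessionId, out string currentHolder)
    {
        lock (_gate)
        {
            if (_holders.TryGetValue(entityId, out var holder) && holder != sessionId)
            {
                currentHolder = holder;
                return false;
            }

            _holders[entityId] = sessionId;
            currentHolder = sessionId;
            return true;
        }
    }

    // Releases the entity, but only if the caller is the session that holds it.
    public bool CheckIn(string entityId, string sessionId)
    {
        lock (_gate)
        {
            if (_holders.TryGetValue(entityId, out var holder) && holder == sessionId)
            {
                _holders.Remove(entityId);
                return true;
            }

            return false;
        }
    }
}
```

The edit screen would call TryCheckOut before letting the user type anything and, on failure, show who has the record open instead of an error at save time.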
That's a rational compromise, but naturally it comes with slightly vexed users. Here are my opportunistic pragmatic suggestions:
^^^ that 3-way diff is of course a read-only view