Transactions are a figment of your imagination
This post is in response to a few comments here. In particular, I get the sense that people expect businesses and systems to behave as if transactions are a real thing. The problem is that this is quite far from the truth.
Let me define what I mean by a transaction here. I'm not talking about database transactions, ACID, or any such thing. I'm talking about the notion that any interaction between a user and a business, or between two businesses, can actually be modeled as a transaction similar to what we see in databases.
That is, an interaction that is either all there or not there at all. The most obvious example is the financial transaction: we debit one account and credit another, and we have to do that in such a way that either both accounts are modified or neither is. That is the classic example for database transactions, and it is wrong, as anyone who has ever written a check or sent a wire transfer can tell you. A good discussion of how that actually works can be found here. Note that in this case, the way money transfer works in the real world is that you upload a file over FTP, then wait three to five days to see if the orders you sent were rejected.
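To make the shape of that process concrete, here is a minimal sketch of the batch-and-wait model described above. The file layout, field names, and helper functions are hypothetical illustrations, not how any particular bank actually does it; the point is only that the "transaction" is a multi-day conversation between two systems, not an atomic operation.

```python
import csv
from datetime import date

def write_orders_file(orders, path):
    """Write today's payment orders to a batch file for upload."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for order in orders:
            writer.writerow([order["id"], order["from"], order["to"], order["amount"]])

def process_rejections_file(path, pending):
    """Days later: mark any rejected orders so a human can chase them."""
    with open(path, newline="") as f:
        for order_id, reason in csv.reader(f):
            if order_id in pending:
                pending[order_id]["status"] = "rejected"
                pending[order_id]["reason"] = reason

pending = {
    "1001": {"id": "1001", "from": "ACME", "to": "Initech", "amount": "5000", "status": "sent"},
}
write_orders_file(pending.values(), f"orders-{date.today()}.csv")
# ... upload the file over FTP, then wait three to five days ...
# process_rejections_file("rejections.csv", pending)  # whenever the response file shows up
```

Nothing in that flow can be rolled back once the file has been shipped; all you can do is react to whatever comes back.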
Another example is accepting an order in a transactional manner. If I accepted your order after verifying that I had reserved everything, and then my warehouse burns down, what do I do with your order? I can hardly roll it back.
To move away from businesses, let us consider doing something as unimportant as voting in a national election. Logically speaking, this is a pretty simple process: identify the voter, record their vote, total the votes, select the winner. Except that you can go back and force a re-election in a particular district if needed, or you might find a box of lost votes, or run into any of a hundred evil little things that crop up in the real world.
Any attempt to model such systems in neat transactional boxes with “all in or none at all” is going to break.
Comments
"...either both accounts were modified or none of them were modified."
If I'm transferring money to another account, of course that is my main expectation. And I'd much rather see that my bank:
* rejects my transaction if it cannot take it 100% reliably
* takes 5 days but processes it in the end
than:
* takes my transfer order and then loses it
* takes money from my account and then does not add it to the other account
It is simply not an option.
And I don't care if they internally use FTP or Sneakernet, my transaction order cannot be lost.
So, if you have reliable nodes and you guarantee that my order is reliably stored on the local node and that the message will be processed in a reasonable amount of time, then storing messages locally is OK. Otherwise it is not.
This is in the context of financial transactions.
If the app deals with non-critical data that can be lost or significantly delayed without consequences, that's another story.
PetarR, Welcome to the world of money transfers. It is actually pretty common for such transactions to get lost, and require chasing after them to recover.
This happens every few months for us. A customer pays via international money transfer (usually SWIFT). They present us with a proof of payment. That includes the transaction id for the money transfer. A week or two goes by, no money in our account.
We need to open a call with the bank to figure out what might have happened.
For example, it might have been put on hold for suspicion of tax fraud, and require getting a letter about it from the tax authority, or because money transfer laws have changed (I'm looking at you, Argentina), or because of a spelling error, or a hundred other reasons.
And those are transactions that can be for pretty big amounts, so they are important. So you deal with it.
Being held is not the same as being lost.
In all cases you described there is a way to solve the problem.
If you store the message on a local node (a cloud instance) and that node is killed, you lose the message.
PetarR, See my example in the previous post regarding elections, paper ballots, and the building burning down. The exact same behavior applies.
Right. But what do you propose?
Tobi, For the problem? I suggest belt AND suspenders. It won't help you 100%, but it can reduce the chance of failure, and it gives you more options.
I can testify that Argentina's money transfer rules change overnight without explanation. Wiring transfers to other countries or receiving them is a mess (sometimes taking far more than the usual five days).
But aside from that, having worked in finance before (connecting two central banks, among other clients), I can tell you that the transactional rules do not apply (they are nice for catching errors, though). That is why any transacting system has the concept of compensation built in for the general case. Shit happens, and we have to fix it within the constraints of traceability, both to catch money laundering and to avoid creating money out of thin air.
On the "shit happens" side of things, I have a friend who mistakenly deleted a 1M transaction with a wrong SQL statement. So I am inclined to agree: the real world is not transactional.
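For illustration, here is a rough sketch of what compensation might look like as an append-only ledger: once money has moved you don't edit or roll back the original entry, you append an opposite one and keep the full trace. The function names and data layout are made up for the example.

```python
ledger = []  # append-only: every correction is a new entry, never an edit

def post(account, amount, reference, note=""):
    """Record a single ledger entry."""
    ledger.append({"account": account, "amount": amount,
                   "reference": reference, "note": note})

def compensate(reference, note):
    """Reverse a previous posting by appending opposite entries."""
    originals = [e for e in ledger if e["reference"] == reference]
    for entry in originals:
        post(entry["account"], -entry["amount"], reference,
             note=f"compensation: {note}")

# A transfer that later turns out to be wrong:
post("alice", -100, "tx-42")
post("bob", +100, "tx-42")
compensate("tx-42", "recipient name mismatch, funds returned")
```

The history stays intact for auditing, which is exactly the traceability requirement mentioned above.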
"
"I suggest belt AND suspenders."
Except what you are suggesting is to wrap a thin thread through your belt loops and tell the caller you are wearing a belt. Storing a message on multiple disks in multiple data centers is a belt. Using a 'backup' approach that is less reliable, when the primary is down, simply cannot add reliability, only uptime. Storing on the local disk and in the remote queue, and only reporting success when both have happened, can improve reliability but does nothing for uptime.
That error handling in a client/server scenario is difficult and cannot be failproof does not mean that every situation calls for preferring uptime to reliability. Not all clients will simply give up if the service goes down for a few minutes, and some may prefer to reduce the likelihood of lost messages by retrying later. This is especially the case if the caller is already processing from a guaranteed queue and can simply NACK the message until it goes through to the next reliable stage.
I don't want to claim that the approach is bad or shouldn't ever be done. This is a tradeoff which can be made as long as it is made consciously. This article argues well that it may often be a worthwhile tradeoff. It fails to clearly acknowledge that a tradeoff is being made.
Forgot to mention that while it is certainly interesting to draw parallels to existing real world systems, the fact that a polling building can burn down does not imply we should just throw up our hands and not bother trying to build a system more reliable than a paper ballot.
Nate, Practically speaking, adding persistence when the network is down does work. It adds another way in which you can recover if there is an error.
Sure, there is a tradeoff, but the difference is that if you don't have a network, all you can do is fail. If you write to the local disk, you can also try to recover. It is more complex (you may need to deal with seriously out-of-date messages), but in many cases it is better than not doing it at all.
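As a hypothetical illustration of the belt-and-suspenders idea being discussed: try the remote queue first, spool to the local disk when that fails, and retry the spooled messages later. The send_to_queue callback and the spool directory layout are assumptions for the sketch, not a real broker API.

```python
import json
import os
import uuid

SPOOL_DIR = "spool"  # hypothetical local fallback location

def send_or_spool(message, send_to_queue):
    """Belt and suspenders: prefer the remote queue, fall back to local disk."""
    try:
        send_to_queue(message)                      # primary path: the remote queue
    except ConnectionError:
        os.makedirs(SPOOL_DIR, exist_ok=True)
        path = os.path.join(SPOOL_DIR, f"{uuid.uuid4()}.json")
        with open(path, "w") as f:                  # suspenders: persist locally
            json.dump(message, f)

def retry_spooled(send_to_queue):
    """Run periodically: push spooled messages once the network is back.
    Spooled messages may be seriously out of date, so consumers must cope."""
    if not os.path.isdir(SPOOL_DIR):
        return
    for name in os.listdir(SPOOL_DIR):
        path = os.path.join(SPOOL_DIR, name)
        with open(path) as f:
            send_to_queue(json.load(f))             # may raise again and stop the loop
        os.remove(path)                             # delete only after a successful send
```

This is exactly the tradeoff described above: the local spool buys you another recovery option, at the cost of having to handle stale messages downstream.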
I think the main differences in our viewpoints are:
1) The value judgment of whether failure recovery (or a best effort at recovery) is better left to the client or the server, and how much confidence a server should have in its ability to complete an action before reporting success.
2) A quibble over semantics: "the server fails when it tells the client it can't do something" vs. "the server fails when it tells the client it can do something and then does not do it".
I think we agree on the technical aspects of the tradeoff. The rest is context specific and I doubt either of our opinions is likely to change.
Dear Ayende,
What counts in your examples is not the actual implementation of the transaction, it's the normative character. Both in banking and in voting, the norm is transactional: there is a clear situation that everybody can agree upon, and that is the ultimate yardstick. A banking transaction should be credited on one side and debited on the other; a polling result should be clear after all votes are cast and counted. You don't seem to dispute that.
What you bring up are practical difficulties that may arise in banking, polling, and other domains. Difficulties that, incidentally, also arise in computing (input errors, for example). These errors do not put the model in question; they only show us that every implementation, be it a physical process or an IT process, has its flaws. The transactional model doesn't lose its normative value just because of some errors.
Regards,
Rudy
Rudy, If you look at the actual model for banks in particular, those issues dictate the entire approach to modeling this. Banks don't use the transactional model; they use the log reconciliation model.
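A toy sketch of what log reconciliation might look like: each side keeps its own log, and you periodically match the two and chase whatever doesn't line up, rather than relying on an atomic all-or-nothing transfer. The field names here are invented for the example.

```python
def reconcile(our_log, bank_statement):
    """Match entries by transfer id; return whatever needs a human to chase."""
    ours = {e["transfer_id"]: e for e in our_log}
    theirs = {e["transfer_id"]: e for e in bank_statement}

    missing_at_bank = [ours[i] for i in ours.keys() - theirs.keys()]
    unknown_to_us = [theirs[i] for i in theirs.keys() - ours.keys()]
    amount_mismatch = [(ours[i], theirs[i]) for i in ours.keys() & theirs.keys()
                       if ours[i]["amount"] != theirs[i]["amount"]]
    return missing_at_bank, unknown_to_us, amount_mismatch

our_log = [{"transfer_id": "T1", "amount": 500}, {"transfer_id": "T2", "amount": 75}]
bank_statement = [{"transfer_id": "T1", "amount": 500}]
print(reconcile(our_log, bank_statement))
# ([{'transfer_id': 'T2', 'amount': 75}], [], [])
```

The output is a work list for people to resolve, which is much closer to how the SWIFT example earlier in the thread actually plays out.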
I've long noticed that banks seem to be terribly backward when it comes to technology, best practices, state of the art, etc. Perhaps this is just another example?
R A, Banks predate computers by a significant margin; their business processes are not built around technology. It can be a good or a bad thing, but that is the way it is. For that matter, there are very few businesses whose business model relies on technology.