Distributed transactions aren’t, and microservices still don’t mix

time to read 3 min | 520 words

They just aren’t. And I’m talking as someone who has actually implemented multiple distributed transaction systems. People moving to microservices are now discovering a lot of the challenges and hurdles of distributed systems and it is only natural to want to go back to the cozy transactional world, where you can reason about things properly.

This post is in response to this article: Microservices and distributed transactions, which I read with interest, because it isn’t often that a post will refute it’s own premise with the very first statement.

The two-phase commit protocol was designed in the epoch of the “big iron” systems like mainframes and UNIX servers; the XA specification was defined in 1991 when the typical deployment model consisted of having all the software installed in a single server.

That is a really important observation, because in this case, we remove one big factor from the distributed transactions, the distributed part. Note that this is almost 30 years ago, distributed transactions and the two phase commit protocol aren’t running on a single node any longer. But the architecture is still rooted into the same concept. And it doesn’t work. I wrote a blog post explaining the core issues with two phase commit about 5 years ago. Nothing changed so far.

From a technical perspective, the approach that is shown in the article is interesting. It is really nice that you can have a “transaction” that spawn multiple services and databases. It is a problem that this isn’t going to result in an atomic behavior (you can observe some of the transactions being committed before others), it is a problem that this has really bad failure modes (hanging / timeout / inconsistencies) under fairly common scenarios and finally, it is a really bad approach because your microservices shouldn’t be composed using transactions.

Leaving aside all the technical details about why two phase commit is a bad idea, there is still the core architectural issue, you are tying together the services in your system. If service A is stalled for whatever reason, your service B is now impacted because it is waiting for a transaction to close.

Have fun trying to debug something like that, especially because you actual state is hidden away in some transaction manager and not readily visible. It means adding a tricky layer of complexity that will break, and will cause issues, and will create silent dependencies between your services. Silent ones, invisible ones, and they will come to haunt you.

The whole point of a microservice architecture is separation of concerns to independently managed, deployed and provisioned systems. If you need to actually have cross service transactions, you either have modelled things wrong or are doing very badly. Go back to a monolith with a single database backend and use that as the transactional store. You’ll be much happier.

Remember: Microservices. Are. Separated.

That isn’t a bug, that isn’t a hurdle to overcome. That is the point. Tying them close together is a mistake, but you’ll usually only see it after a few months of production. So take a measure of prevention before you’ll need a metric ton of cures.