Architecture > Code
Steve Py asks an interesting question in one of the comments to my On Infinite Scalability post:
Can you elaborate more on: "Note, those changes are not changes to the code, they are architectural and system changes. Where before you had a single database, now you have many. Where before you could use ACID, now you have to use BASE. You need to push a lot more tasks to the background, the user interaction changes, etc."
When you talk about jumping from 1 server to multiple servers, ACID to BASE, and how user interaction changes, how do you quantify that this is done without code changes?
The answer is that there is a mistaken assumption here. Changing the architecture is going to change the code. But that is rarely the relevant part, because changing the architecture is itself the big change. If you are moving from a single database to multiple databases, for example, there are going to be code changes, but that isn’t what you worry about. The major change is the architectural differences (how do you split the data, how do you do reporting, can some of the databases be down, etc.).
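To make those three questions a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the ShardRouter class, the shard names, and the choice of hash-based splitting are not taken from any real system. The point is that the code is tiny, while the decisions it encodes (what key to split on, what a report has to touch, what happens when a shard is down) are the real work.

```python
import hashlib


class ShardRouter:
    """Hypothetical router that maps a key to one of several databases."""

    def __init__(self, shard_names):
        if not shard_names:
            raise ValueError("at least one shard is required")
        self.shard_names = list(shard_names)
        self.down = set()  # shards currently marked as unavailable

    def shard_for(self, key):
        # How do you split the data? Here: a stable hash of the key,
        # so the same key always lands on the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_names)
        return self.shard_names[index]

    def is_available(self, key):
        # Can some of the databases be down? A key is only reachable
        # while the shard that owns it is up.
        return self.shard_for(key) not in self.down

    def shards_for_report(self):
        # How do you do reporting? A report across all users has to fan
        # out to every live shard and accept that it may be partial.
        return [name for name in self.shard_names if name not in self.down]
```

With three shards, marking the shard that owns "alice" as down makes that one user unreachable while a report still runs against the other two, and whether that partial report is acceptable is an architectural decision, not a coding one.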
Moving from ACID to BASE is an even greater change. The code might change a little or change drastically, but that isn’t where a lot of the effort is. Just defining the new system behavior in those scenarios is going to be much more complex. For example, taking something as simple as “user names are unique” would move from being a unique constraint in the database to something the system has to handle explicitly, such as detecting duplicate names after the fact and resolving them in a reasonable fashion.
Depending on your original architecture, the code change might be anything from replacing a single service implementation to re-writing significant parts of the code.
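As a rough illustration of the "replacing a single service implementation" end of that range, here is a sketch in Python. All names (UserNameRegistry, AcidRegistry, EventualRegistry) are hypothetical, the dictionaries stand in for real databases, and the shared counter used to order claims is a simplification that a real distributed system would not have. The "earliest claim wins" rule is just one possible policy, chosen here for brevity.

```python
from abc import ABC, abstractmethod


class UserNameRegistry(ABC):
    """Hypothetical service boundary the rest of the code depends on."""

    @abstractmethod
    def register(self, user_id, name):
        """Return True if the name was accepted."""


class AcidRegistry(UserNameRegistry):
    """Single database: uniqueness is enforced at write time."""

    def __init__(self):
        self._owner_by_name = {}  # stands in for a unique index

    def register(self, user_id, name):
        if name in self._owner_by_name:
            return False  # the unique constraint rejects the duplicate
        self._owner_by_name[name] = user_id
        return True


class EventualRegistry(UserNameRegistry):
    """Several nodes accept writes on their own; conflicts surface later."""

    def __init__(self, node_count):
        self._nodes = [dict() for _ in range(node_count)]
        self._clock = 0  # assumption: a shared counter orders the claims

    def register(self, user_id, name, node=0):
        self._clock += 1
        self._nodes[node].setdefault(name, []).append((self._clock, user_id))
        return True  # accepted provisionally; may be revoked on reconcile

    def reconcile(self):
        """Merge all nodes; the earliest claim keeps the name."""
        merged = {}
        for node in self._nodes:
            for name, claims in node.items():
                merged.setdefault(name, []).extend(claims)
        winners, losers = {}, []
        for name, claims in merged.items():
            claims.sort()
            winners[name] = claims[0][1]
            losers.extend((user_id, name) for _, user_id in claims[1:])
        return winners, losers
```

Swapping one class for the other is the easy part. The list of losers returned by reconcile is where the new system behavior has to be defined: someone was told their name was accepted and now has to be asked to pick another, which is a change in user interaction rather than in code.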
Comments
Of course architecture is bigger than code. A slight change in the architecture means a huge change in the code, and that's why most programmers hate to make any change in their architecture once they are halfway through the project, simply because of the huge amount of work involved. Of course, that change of architecture may save many days of work later on in the project, but the upfront overhead is scary for most developers, and if it's working, it's working!
"For example, taking something as simple as “user names are unique” would move from being a unique constraint in the database to something that needs to be able to handle those sort of things in a reasonable fashion."
You can't even talk about uniqueness in the context of BASE, as you never know when the state is consistent, so you can't decide whether a given name N is already in use or not. In every situation where the state is not known to be consistent, you can't decide whether a name N is violating a uniqueness constraint: e.g. it might be that the value N is also in the process of being removed, but it's only really removed when the state is consistent again, and you never know when that is.
Moving to a non-ACID environment has a tremendous impact on how your code should work, in every aspect: you can't 'assume' anything is valid, as there's no way you can assume, at the time you perform a given action, that the state is consistent. You can only see what's there, but decisions made on what's there are not guaranteed to be correct, as BASE implies that the state is, in fact, not consistent.
In a system which is highly volatile, it might even be that the BASE-oriented system never reaches a consistent state. Moving an ACID-oriented system to such an architecture is IMHO impossible, unless you rewrite / rearchitect the system around the concept of stale data / user-state==application-state from the ground up. It's a conceptual change: what you see as data is only valid for the current code, not for the entire application. Assuming that it is valid for the entire application (as with the uniqueness of a name) is therefore impossible.
More on what's called the CAP theorem, which illustrates the core aspects of what one has to do when moving away from ACID towards BASE: http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
I think what's essential here is that if you are coming from the ACID world (and we all are, more or less), moving towards the other end of the spectrum is not going to be easy, and requires an understanding of what it is that makes the other side of the spectrum actually work. That's not a matter of dropping the database, nor of dropping one letter of ACID. IMHO the article above illustrates that best.
Hope it's helpful, also for the people who think stale data is something they don't have to work with because they work with ACID databases.