Not so Impressive
I had a hard time deciding which examples to give of what impresses me and what does not. I didn't want to use real-world scenarios, but I did want concrete examples. Fortunately, I managed to think of a couple of good ones.
Distributed grids and distributed caching. Both are considered hard, in the sense that it takes a lot of expertise to build them. They are also the kind of purely technical problem that developers love: no need to deal with pesky tax laws or understand why delinquent customers are given more credit, just pure programming bliss.
The problem is that those just aren't impressive on their own. Building a distributed cache is easy and fun. There is nothing complicated going on there. A distributed grid sounds complex at first, but it is a very simple technical challenge. It is slightly more complex if you want to implement it with automatic binary distribution (that is, you don't need to manually deploy DLLs to the machines in the grid, it happens for you), but even then, it is firmly in the realm of the easy to do.
What is impressive in such a scenario is how you solve the management problem. How do you gracefully recover from a failed worker node on the grid? How do you handle another node adding itself to the cache? What happens if a server crashes?
Handling those problems is interesting, challenging and impressive, because it requires a bit of thinking beyond just technical expertise.
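To make the first of those questions concrete, here is a minimal sketch of one common way to recover from a failed worker: the coordinator tracks heartbeats and puts a silent worker's unfinished tasks back on the queue. Everything in it (the GridCoordinator class, its method names, the five second timeout) is made up for illustration and is not taken from any particular grid product.

```python
import time
from collections import deque


class GridCoordinator:
    """Hypothetical coordinator: tracks heartbeats, requeues work of silent workers."""

    def __init__(self, heartbeat_timeout=5.0, clock=time.monotonic):
        self.heartbeat_timeout = heartbeat_timeout
        self.clock = clock      # injectable so the logic can be tested without sleeping
        self.last_seen = {}     # worker id -> time of last heartbeat
        self.assigned = {}      # worker id -> set of task ids currently in flight
        self.pending = deque()  # task ids waiting for a worker

    def submit(self, task_id):
        self.pending.append(task_id)

    def heartbeat(self, worker_id):
        # A heartbeat from an unknown worker doubles as "join the grid".
        self.last_seen[worker_id] = self.clock()
        self.assigned.setdefault(worker_id, set())

    def next_task(self, worker_id):
        """Hand the next pending task to a worker, or None if nothing is queued."""
        self.heartbeat(worker_id)
        if not self.pending:
            return None
        task_id = self.pending.popleft()
        self.assigned[worker_id].add(task_id)
        return task_id

    def complete(self, worker_id, task_id):
        # Tolerates late reports from workers that were already reaped.
        self.assigned.get(worker_id, set()).discard(task_id)

    def reap_failed_workers(self):
        """Requeue the tasks of every worker that missed its heartbeat window."""
        now = self.clock()
        failed = [w for w, seen in self.last_seen.items()
                  if now - seen > self.heartbeat_timeout]
        for worker_id in failed:
            orphaned = sorted(self.assigned.pop(worker_id), reverse=True)
            # extendleft reverses its input, so orphaned tasks end up at the
            # front of the queue in ascending order and are retried first.
            self.pending.extendleft(orphaned)
            del self.last_seen[worker_id]
        return failed
```

The code is the easy part, which is rather the point. The decisions around it are harder: a worker that was merely slow may still finish a task that has already been handed to someone else, so tasks need to be safe to run twice, and someone has to decide what happens when the coordinator itself is the machine that crashes.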
Comments
I am just starting to look at building a distributed grid for processing millions of documents (.doc, .xls, etc). The things you mention are some of the things I have been pondering. Do you, or any of your readers, have any good guidance/references on this stuff? Or maybe some upcoming posts to look forward to?
A couple of things of interest I have come across for those interested are Digipede (http://www.digipede.net/) and Alchemi (http://www.alchemi.net/).
BTW - your recent posts on Multi Tenancy were excellent. Much appreciated.
We had some guys from Coherence (recently bought by Oracle) come in to give us a demo. They explained how their failover and recovery works - it's pretty clever and I was definitely impressed!
Essentially, no single node is responsible for managing the others. Each node has a backup, and the other nodes "vote" to determine whether they think a particular node is still active. If a node's backup goes down, the node chooses another node to be its backup and pushes its data onto it. Conversely, if the node itself goes down, its backup takes over as the master and picks a new backup. All transparently.
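A rough sketch of the general idea described above, not Coherence's actual algorithm: peers vote on whether a suspect node is reachable, and once a node is removed, its backups are promoted and new backups are chosen. The Cluster class, the strict-majority rule and the least-loaded backup policy are all assumptions made up for illustration.

```python
class Cluster:
    """Toy model: every partition has one primary node and one backup node."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.primary = {}  # partition -> node holding the live copy
        self.backup = {}   # partition -> node holding the standby copy

    def place(self, partition, primary, backup):
        self.primary[partition] = primary
        self.backup[partition] = backup

    def is_dead(self, suspect, votes):
        """votes maps a peer to True if it can still reach suspect, False if not."""
        peers = self.nodes - {suspect}
        # Only explicit "cannot reach" votes count; a missing vote is an abstention.
        unreachable = sum(1 for p in peers if votes.get(p) is False)
        return unreachable > len(peers) / 2  # assumed rule: strict majority of peers

    def _pick_backup(self, partition):
        # Assumed policy: the surviving node, other than the primary, that
        # currently holds the fewest backups (ties broken alphabetically).
        candidates = sorted(self.nodes - {self.primary[partition]})
        if not candidates:
            return None
        return min(candidates,
                   key=lambda n: sum(b == n for b in self.backup.values()))

    def remove(self, dead):
        """Drop a dead node and repair every partition that referenced it."""
        self.nodes.discard(dead)
        for partition in list(self.primary):
            if self.primary[partition] == dead:
                # The backup takes over as primary, then picks a new backup.
                self.primary[partition] = self.backup[partition]
                self.backup[partition] = self._pick_backup(partition)
            elif self.backup[partition] == dead:
                # The primary survives and would push its data to a new backup.
                self.backup[partition] = self._pick_backup(partition)
```

The sketch only tracks who owns what. Actually copying the data to the new backup, and agreeing on the vote when the network is partitioned, is where the clever part would have to live.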