Distributed computing fallacies: There is one administrator

time to read 4 min | 624 words

imageActually I would state it better as “there is any administrator / administration”.

This fallacy is meant to refer to different administrators defining conflicting policies, at least according to wikipedia. But our experience has shown that the problem is much more worse. It isn’t that you have two companies defining policies that end up at odds with one another. It is that even in the same organization, you have very different areas of expertise and responsibility, and troubleshooting any problem in an organization of a significant size is a hard matter.

Sure, you have the firewall admin which is usually the same as the network admin or at least has that one’s number. But what if the problem is actually in an LDAP proxy making a cross forest query to a remote domain across DMZ and VPN lines, with the actual problem being a DNS change that hasn’t been propagated on the internal network because security policies require DNSSEC verification but the update was made without one.

Figure that one out, with the only error being “I can’t authenticate to your software, therefor the blame is yours” can be a true pain. Especially because the person you are dealing with has likely no relation to the people who have a clue about the problem and is often not even able to run the tools you need to diagnose the issue because they don’t have permissions to do so (“no, you may not sniff our production networks and send that data to an outside party” is a security policy that I support, even if it make my life harder).

So you have an issue, and you need to figure out where it is happening. And in the scenario above, even if you managed to figure out what the actual problem was (which will require multiple server hoping and crossing of security zones) you’ll realize that you need the key used for the DNSSEC, which is at the hangs of yet another admin (most probably at vacation right now).

And when you fix that you’ll find that you now need to flush the DNS caches of multiple DNS proxies and local servers, all of which require admin privileges by a bunch of people.

So no, you can’t assume one (competent) administrator. You have to assume that your software will run in the most brutal of production environments and that users will actively flip all sort of switches and see if you break, just because of Murphy.

What you can do, however, is to design your software accordingly. That means reducing to a minimum the number of external dependencies that you have and controlling what you are doing. It means that as part of the design of the software, you try to consider failure points and see how much of everything you can either own or be sure that you’ll have the facilities in place to inspect and validate with as few parties involved in the process.

A real world example of such a design decision was to drop any and all authentication schemes in RavenDB except for X509 client certificates. In addition to the high level of security and trust that they bring, there is also another really important aspect. It is possible to validate them and operate on them locally without requiring any special privileges on the machine or the network. A certificate can be validate easy, using common tools available on all operating systems.

The scenario above, we had similar ones on a pretty regular basis. And I got really tired of trying to debug someone’s implementation of active directory deployment and all the ways it can cause mishaps.