3 Levels Of Failures

  • You have no failure: everything completed successfully and everyone is happy.
  • You have complete failure: someone just pulled the power cord out of the machine.
  • And in the middle, you have partial failure: something bad happened (the network is down, someone pulled the USB key out), but the system can (and should) keep running.

Most people give up on the complete failure scenario and leave it to transactional storage to handle. That is usually the smartest solution.

The problem occurs with partial failure, where you need to recover in meaningful ways. It is actually very easy to write a system where a complete failure is better than a partial failure.
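
To make the partial failure case concrete, here is a minimal sketch in Python. It is only an illustration, not a recommended design: the names process_batch and flaky_handler are hypothetical, and treating OSError (which covers network and device errors in Python) as "recoverable" is an assumption made for the example.

```python
from typing import Callable, Iterable, List, Tuple


def process_batch(
    items: Iterable[str],
    handler: Callable[[str], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Run handler on every item and keep going past recoverable errors."""
    succeeded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for item in items:
        try:
            handler(item)
        except OSError as exc:
            # Partial failure (assumed recoverable): record the item and
            # the reason so it can be retried later, then carry on.
            failed.append((item, str(exc)))
        else:
            succeeded.append(item)
    # Any other exception type is not caught here and stops the batch.
    return succeeded, failed


def flaky_handler(item: str) -> None:
    # Hypothetical handler that simulates the network going down for one item.
    if item == "b":
        raise ConnectionError("network is down")


ok, bad = process_batch(["a", "b", "c"], flaky_handler)
print(ok)
print(bad)
```

The point of returning the failed items explicitly is that the caller has to decide what a meaningful recovery looks like (retry, skip, alert). Silently swallowing the error would be exactly the kind of partial failure that is worse than a crash.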