Thoughts about building your own source control

Let me start by stating that you really don't want to do that. This is not something that you want to do, period.

Now that we are over that, I recently had the chance to go fairly deeply into SCM systems and how they are implemented, from two fairly different perspectives. This is a randomly collected set of observations about SCM systems. As usual, the order is arbitrary, and no attempt was made to shape this into one coherent idea.

  • It is all about the client. The client in an SCM system has significant responsibilities. It is in charge of reporting the client state, managing all the errors that the user can cause, and shouldering a lot of the burden.
  • It is all about the protocol. Anyone who designs an SCM system should be given a lousy DSL line with disconnects every 15 minutes. Oh, and they should also have to work on a plane a lot.
  • On the wire, it is all so simple. It is really surprising to see how the SCM complexity is really just a lot of tiny, easy-to-handle details.
  • The devil is in the details, though.
  • Complexities on the server side:
    • Space management - do you save the diff or the whole file?
    • If just the diff, how do you construct an arbitrary version?
    • Keeping history around for branches and copies
    • Cheap copies

  • Complexities on the client side:
    • Do you have one version on the client, for the whole working copy?
    • Do you have multiple versions, one per file?
    • Handling inconsistencies between server version, working copy version and original version.

  • What do you optimize for? Bandwidth? Roundtrips?
    • I know of one SCM product that optimizes poorly for both

  • Distributed SCM can be handled on top of centralized SCM.
  • It is not hard at all, except for all the details.
  • Don't write your own SCM.
  • Trust matters, and you really don't want to be in the situation where you don't trust your SCM.
  • Remember that SCM is temporal: you can go backward in time, and even sideways, to a branch.
  • There are only three types of operations in SCM:
    • Generate a change set between two paths at two versions
    • Apply a diff to a path, generating a new version
    • Report (logs, mostly, and outputting various formats of a changeset)
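
To make the "diff or whole file" question and the first two operations a bit more concrete, here is a minimal sketch in Python. It is not how any particular SCM does it. The names `make_delta`, `apply_delta` and `DeltaStore` are made up for illustration, and it assumes a single file stored as a list of lines, with version 0 kept in full and every later version kept as a delta against its predecessor.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Generate a change set that turns the list of lines `old` into `new`."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: only remember which slice of the old text to copy.
            delta.append(("copy", i1, i2))
        else:
            # replace / insert / delete: store just the new lines (possibly none).
            delta.append(("insert", new[j1:j2]))
    return delta


def apply_delta(old, delta):
    """Apply a change set to `old`, producing the next version."""
    new = []
    for op in delta:
        if op[0] == "copy":
            new.extend(old[op[1]:op[2]])
        else:
            new.extend(op[1])
    return new


class DeltaStore:
    """Hypothetical store: version 0 in full, every later version as a delta."""

    def __init__(self, first_version):
        self.base = list(first_version)
        self.deltas = []
        self._head = list(first_version)

    def commit(self, lines):
        self.deltas.append(make_delta(self._head, lines))
        self._head = list(lines)
        return len(self.deltas)  # the new version number

    def checkout(self, version):
        # Constructing an arbitrary version means replaying deltas from the base.
        lines = self.base
        for delta in self.deltas[:version]:
            lines = apply_delta(lines, delta)
        return list(lines)
```

Under these assumptions, `checkout(n)` has to replay `n` deltas, so saving space on the server is paid for when reading old (or, in this layout, new) versions. That is exactly the kind of detail where the complexity hides.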
Overall, it is a very simple endeavor. It gets complex when you start talking beyond the wire protocol. As a simple example, how much does it cost you to branch? How much does it cost you to find out if there have been any changes to the working copy?
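
As a rough illustration of the working copy question, here is one way a client could answer "has anything changed?" without reading every file. This is a sketch under stated assumptions, not a description of any real client: the `snapshot` and `changed_paths` functions and the manifest layout (size, modification time, SHA-256 digest per path) are invented for this example, and it assumes that a file whose size and mtime both match the manifest has not been edited.

```python
import hashlib
import os


def file_digest(path):
    """Hash a file in chunks so large files are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root):
    """Record (size, mtime, digest) for every file, e.g. right after checkout."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            rel = os.path.relpath(path, root)
            manifest[rel] = (st.st_size, st.st_mtime_ns, file_digest(path))
    return manifest


def changed_paths(root, manifest):
    """Return added, modified and deleted paths relative to `manifest`."""
    changed = set()
    seen = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            seen.add(rel)
            recorded = manifest.get(rel)
            if recorded is None:
                changed.add(rel)  # file was added
                continue
            size, mtime_ns, digest = recorded
            st = os.stat(path)
            if (st.st_size, st.st_mtime_ns) == (size, mtime_ns):
                continue  # cheap path: assume unchanged, skip hashing
            if file_digest(path) != digest:
                changed.add(rel)  # content really differs
    changed.update(set(manifest) - seen)  # files that were deleted
    return sorted(changed)
```

Even in this toy version the cost is one `stat` call per file in the working copy, plus a full read of every file whose metadata looks different. On a large tree, that is the difference between a status check that feels instant and one that does not.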
The other major issue is: How do you ensure that it is reliable?
Now, let me repeat myself: do not write your own source control!