Thoughts about building your own source control


Let me start by stating that you really don't want to do that. This is not something that you want to do, period.

Now that we got that out of the way: I recently had the chance to go fairly deeply into SCM systems and how they are implemented, from two quite different perspectives. This is a randomly collected set of observations about SCM systems. As usual, the order is arbitrary, and no attempt was made to make any coherent idea out of this.

It is all about the client. The client in an SCM system has significant responsibilities. It is in charge of reporting the client state, managing all the errors that the user can cause, and shouldering a lot of the burden.
It is all about the protocol. Anyone who designs an SCM system should be given a lousy DSL line that disconnects every 15 minutes. Oh, and they should also have to work on a plane a lot.
On the wire, it is all so simple. It is really surprising to see how much of an SCM's complexity is just a lot of tiny, easy-to-handle details.
The devil is in the details, though.
Complexities on the server side:
- Space management - do you save the diff or the whole file?
- If just the diff, how do you construct an arbitrary version?
- Keeping history around for branches and copies
- Cheap copies
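To make the "diff or whole file" trade-off concrete, here is a minimal Python sketch, assuming line-based text files held in memory. It stores a full snapshot every few revisions and a forward delta in between, so constructing an arbitrary version means starting from the nearest snapshot and replaying a bounded number of deltas. The names (`make_delta`, `apply_delta`, `FileHistory`, `snapshot_every`) are hypothetical, and the delta format is invented for this example on top of Python's `difflib.SequenceMatcher`; it is not the storage format of any real SCM.

```python
from difflib import SequenceMatcher


# Hypothetical delta format, invented for this sketch:
#   ("copy", i1, i2)   reuse old_lines[i1:i2]
#   ("insert", lines)  emit these new lines
def make_delta(old_lines, new_lines):
    ops = []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": the new side has content
            ops.append(("insert", new_lines[j1:j2]))
        # "delete": nothing from that old range survives, so emit nothing
    return ops


def apply_delta(old_lines, ops):
    result = []
    for op in ops:
        if op[0] == "copy":
            result.extend(old_lines[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


class FileHistory:
    """Full snapshot every `snapshot_every` revisions, forward deltas in between."""

    def __init__(self, snapshot_every=4):
        self.snapshot_every = snapshot_every
        self.entries = []  # index == revision; ("full", lines) or ("delta", ops)

    def commit(self, lines):
        rev = len(self.entries)
        if rev % self.snapshot_every == 0:
            self.entries.append(("full", list(lines)))
        else:
            previous = self.get(rev - 1)
            self.entries.append(("delta", make_delta(previous, lines)))
        return rev

    def get(self, rev):
        # Start from the nearest full snapshot at or before `rev`,
        # then replay the deltas forward up to `rev`.
        base = rev - rev % self.snapshot_every
        lines = list(self.entries[base][1])
        for r in range(base + 1, rev + 1):
            lines = apply_delta(lines, self.entries[r][1])
        return lines
```

The `snapshot_every` knob is the space/time trade-off in miniature: a smaller value costs more storage, a larger one makes reading old versions slower. Production systems use more careful schemes; Subversion's skip-deltas are one example.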
Complexities on the client side:
- Do you keep a single version on the client for the entire working copy?
- Do you keep multiple versions, one for each file?
- Handling inconsistencies between server version, working copy version and original version.
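The inconsistency problem is easier to reason about if the client keeps a pristine copy of what it checked out (Subversion's working copy does keep such a pristine base). Under that assumption, and with one version tracked per file, a file's status is just a comparison of three values. This is a hypothetical sketch; `file_status` and the status strings are made up for illustration.

```python
from typing import Optional


def file_status(base: Optional[str], working: Optional[str], server: Optional[str]) -> str:
    """Classify one file by comparing three versions of its content.

    base:    the pristine copy the client originally checked out
    working: what is on disk now
    server:  the latest version on the server
    None means the file does not exist in that version.
    """
    local_changed = working != base
    remote_changed = server != base
    if not local_changed and not remote_changed:
        return "unchanged"
    if local_changed and not remote_changed:
        return "locally modified"  # needs a commit
    if remote_changed and not local_changed:
        return "out of date"  # needs an update
    if working == server:
        return "same change on both sides"
    return "conflict"  # both sides diverged from base in different ways
```

A real client would attempt a three-way merge (using `base` as the common ancestor) before reporting a conflict; this sketch only detects that both sides diverged.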
What do you optimize for? Bandwidth? Roundtrips?
- I know of one SCM product that is a lousy optimizer for both
Distributed SCM can be handled on top of centralized SCM.
It is not hard at all, except for all the details.
Don't write your own SCM.
Trust matters, and you really don't want to be in the situation where you don't trust your SCM.
Remember that SCM is temporal: you can go backward in time, and even sideways, to a branch.
There are only three types of operations in SCM:
- Generate a change set between two paths at two versions
- Apply a diff to a path, generating a new version
- Reporting (logs, mostly, and outputting various formats of a changeset)
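The three operations can be sketched against a toy in-memory repository, assuming each revision is a snapshot mapping full paths to file contents. `Repository`, `ChangeSet`, and the method names are hypothetical. Change sets here work at whole-file granularity; a real system would carry line-level diffs inside them and would not copy the entire tree on every commit.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    # Paths are relative to the path the change set was generated for.
    added: dict = field(default_factory=dict)
    modified: dict = field(default_factory=dict)
    deleted: set = field(default_factory=set)


class Repository:
    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty tree; each maps path -> content
        self.log = [(0, "initial empty revision")]

    def _subtree(self, path, rev):
        prefix = path.rstrip("/") + "/"
        return {p[len(prefix):]: c for p, c in self.revisions[rev].items()
                if p.startswith(prefix)}

    def changeset(self, from_path, from_rev, to_path, to_rev):
        """Operation 1: the change set turning from_path@from_rev into to_path@to_rev."""
        old = self._subtree(from_path, from_rev)
        new = self._subtree(to_path, to_rev)
        return ChangeSet(
            added={p: c for p, c in new.items() if p not in old},
            modified={p: c for p, c in new.items() if p in old and old[p] != c},
            deleted={p for p in old if p not in new},
        )

    def apply(self, path, base_rev, changes, message):
        """Operation 2: apply a change set to path@base_rev, producing a new revision."""
        prefix = path.rstrip("/") + "/"
        tree = dict(self.revisions[base_rev])  # copy, so old revisions stay untouched
        for p in changes.deleted:
            del tree[prefix + p]  # KeyError if already gone; a real SCM would report a conflict
        for p, c in {**changes.added, **changes.modified}.items():
            tree[prefix + p] = c
        self.revisions.append(tree)
        rev = len(self.revisions) - 1
        self.log.append((rev, message))
        return rev

    def report(self):
        """Operation 3: reporting, here just a plain-text log."""
        return [f"r{rev}: {msg}" for rev, msg in self.log]
```

Comparing two different paths is what lets branching fall out of the same primitives: `repo.changeset("branches/x", r, "trunk", r)` yields every trunk file as "added", and applying that to `branches/x` creates the branch. That is exactly the expensive, full-copy kind of branch; the cheap-copy question from the server-side list is about avoiding it.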

Overall, it is a very simple endeavor. It gets complex when you start going beyond the wire protocol. As a simple example, how much does it cost you to branch? How much does it cost you to find out whether there have been any changes to the working copy?
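One common way to keep the "any changes to the working copy?" check cheap is to remember each file's size and modification time at checkout, and only read and hash files whose metadata no longer matches. Git's index caches stat information for a similar reason. The sketch below assumes that approach; `snapshot`, `find_modified`, and the cache layout are hypothetical.

```python
import hashlib
import os


def file_digest(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(paths):
    """Record (size, mtime_ns, digest) for each file, e.g. at checkout time."""
    cache = {}
    for path in paths:
        st = os.stat(path)
        cache[path] = (st.st_size, st.st_mtime_ns, file_digest(path))
    return cache


def find_modified(paths, cache):
    modified = []
    for path in paths:
        st = os.stat(path)
        size, mtime_ns, digest = cache[path]
        if (st.st_size, st.st_mtime_ns) == (size, mtime_ns):
            continue  # fast path: metadata unchanged, so skip reading the file
        # Metadata differs: hash the content to tell a real edit from a mere touch.
        if file_digest(path) == digest:
            # Touched but not edited: refresh metadata so the next scan takes the fast path.
            cache[path] = (st.st_size, st.st_mtime_ns, digest)
        else:
            modified.append(path)
    return modified
```

The fast path trusts metadata, so an edit that keeps the same size and lands within the filesystem's timestamp resolution can slip through; real systems add safeguards for that case.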
The other major issue is: How do you ensure that it is reliable?
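One building block for reliability is making corruption detectable: store each blob under a hash of its contents and verify that hash on every read. Git, for example, names objects by a hash of their content. The in-memory `ObjectStore` below is a hypothetical sketch of that idea only; it says nothing about atomic commits or crash recovery, which reliability also needs.

```python
import hashlib


class CorruptObjectError(Exception):
    pass


class ObjectStore:
    """Content-addressed store: each blob is keyed by the SHA-256 of its bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data  # identical content always maps to the same key
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # Re-hash on every read so silent corruption raises instead of being returned.
        if hashlib.sha256(data).hexdigest() != key:
            raise CorruptObjectError(key)
        return data
```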
Now, let me repeat myself, do not write your own source control!


Tags:

Subversion

Comments

30 Apr 2008
00:39 AM

Marcus Wyatt aka. Maruis Marais

Currently, my preferred SCM is Git. I've used the following SCMs:

CVS - Just horrible

SVN - Still use it day by day because I have to. It is still ok.

Team System - Too heavy and quite brittle.

Source Safe - No thanks....

In short, Git is a distributed SCM that makes tasks like branching and merging extremely easy. You also get features like stash, and what is nice is that you can run Git locally against a remote SVN repo while the rest of the team is totally oblivious to this fact.

When you first look at Git, the whole idea of being distributed without a single main repo (à la SVN) just sounds weird. But once you start using Git, you very quickly realize what an awesome SCM it is. I can share my branch of the code with your Git repo while the remote (let's call it the main repo) holds the official branch. You can then easily merge my branch into your branch and rebase your branch onto the official main branch. Or you can fork the main branch and take it in a completely separate direction. Anyway, as you can see, there is so much you can do. And Git makes tasks you would normally not even attempt as easy as saying cheese... (if you know Git, of course)

There are multiple good sources of information about Git on the web (Google Video of Linus, PeepCode, etc.).

30 Apr 2008
00:51 AM

Ken Sykora

For some reason, I feel very strongly that I should not write my own source control.

30 Apr 2008
00:54 AM

cristian

Yep, Git is good, but its support for Windows platforms still sucks; I prefer Mercurial myself.

30 Apr 2008
01:07 AM

Chad Myers

Sweet! Ayende announces Rhino.SCM! I know I'm not alone when I say that I eagerly await your estimated end-of-May beta release.

30 Apr 2008
01:37 AM

shanebush

"Distributed SCM can be handled on top of centralized SCM."

First thing I thought of: Git. First comment on blog post: about Git.

As much as I use and like Subversion, I do believe that if the client tools were there for Git like they are for svn, Git would soon surpass it.

Since you now have so much experience with SVN, why not go all out and help Torvalds with a Git implementation that rivals TortoiseSVN?

RhinoGit... has a good ring to it! You could also holler "Rhino! Git!" while making commits to Rhino Tools.

Shane

30 Apr 2008
09:12 AM

axl

Hmm, crap. :) I have just begun writing my own after giving up on finding one that does things the way I want it to.

As cristian says, Git lacks decent Windows support and requires manual db management.

Mercurial and Bazaar each have at least one feature the other doesn't and I want both. Trying to build either one from source to extend them failed miserably after I spent two days trying to build Python from source so it would accept extensions compiled in VS 2008. That's open source dependency hell for you.

Perforce has served me well for almost ten years, but its lack of good move/rename functionality as well as being cumbersome when it comes to branching is beginning to get on my nerves. It's got to go.

Subversion is not an option, its only upside is that it's free, everything else annoys me.

ClearCase is too expensive, too slow, too big and has too much legacy to be a real option.

BitKeeper looks nice, but the licensing model and license agreement, as well as the available documentation and the general attitude of the company, put me off.

AccuRev seems to have gotten the back-end right, but their GUI sucks eggs when you actually try to work in it.

And there's a bunch of other commercial alternatives that all lack in features or force me to work in ways I don't like.

So, even if it takes forever, I'm going to write my own.

And I'll use Rhino.Mocks with xUnit.Net to test it. :)

30 Apr 2008
10:59 AM

Neil Mosafi

Hmm... I wonder if anyone ever thought of building an SCM on top of Microsoft's new FeedSync protocol?

01 May 2008
08:51 AM

Dan

...or on top of MS Mesh, and integrate it into SharePoint and then use WCF. Oh, and be sure to use SSIS at some point, just to top it all off.


01 May 2008
08:58 AM

Ayende Rahien

Dan,

I may never recover from this suggestion

01 May 2008
20:55 PM

jdn

Don't forget to export to Excel and run it through a web service as part of the check-in process.

02 May 2008
08:34 AM

Neil Mosafi

I'll just get me coat then...

Comments have been closed on this topic.

Oren Eini

CEO of RavenDB