RavenFS & Rsync

One of the things that I considered when starting to design RavenFS is “Do we really need this? Can’t we just use rsync?”

There were several reasons why I decided to go ahead with RavenFS. The first was that rsync isn’t really used on the Windows platform, for various reasons that I won’t get into.

The second is that rsync is meant to be a general-purpose file synchronization tool. RavenFS is meant to be more than that.

More specifically, RavenFS is aimed at the distribution and synchronization of large files (tens of MB are considered small, hundreds of MB are common, and multi-GB files show up frequently). It turns out that very large files are a whole different kettle of fish. I am going to reference Andrew Tridgell’s thesis quite frequently for the rest of this post (the link is at the bottom). Andrew is the author of rsync, so he probably knows a thing or two about synchronization problems.

In particular, he considered and discarded my approach to synchronizing files in his thesis:

This algorithm is very simple and meets some of the aims of our remote update algorithm, but it is useless in practice. The problem with it is that A can only find matches that are on block boundaries. If the file on A is the same as B except that one byte has been inserted at the start of the file then no block matches will be found and the algorithm will transfer the whole file.

This is absolutely correct. My approach, by design, suffers from the “insert a single byte at the beginning of the file” problem. So why do I take such a stupid approach?

Put simply, because anything else is too expensive. The rsync approach is to compute a hash at every byte boundary. It gets away with that by using two hash functions: a rolling checksum that is very cheap to compute (but allows more collisions) and a stronger hash that is expensive to compute but has a far lower probability of collisions. The strong hash is only checked when the cheap rolling checksum matches.
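
To make that concrete, here is a minimal sketch of such a rollable weak checksum, modeled on the one described in Tridgell’s thesis. The struct and member names are my own, and this is an illustration, not rsync’s actual source:

```csharp
using System;

// A weak checksum that can slide one byte forward in O(1), in the style of
// the rolling checksum from Tridgell's thesis. Names are mine, not rsync's.
struct RollingChecksum
{
    const int Mod = 1 << 16;
    int a, b;        // the two 16-bit halves of the checksum
    int length;      // window size in bytes

    // Initialize over the window data[offset .. offset + length - 1].
    public static RollingChecksum Create(byte[] data, int offset, int length)
    {
        var r = new RollingChecksum { length = length };
        for (int i = 0; i < length; i++)
        {
            r.a = (r.a + data[offset + i]) % Mod;
            r.b = (r.b + (length - i) * data[offset + i]) % Mod;
        }
        return r;
    }

    // Slide the window one byte to the right: drop the outgoing byte, add
    // the incoming one. No re-scan of the window is needed.
    public void Roll(byte outgoing, byte incoming)
    {
        a = ((a - outgoing + incoming) % Mod + Mod) % Mod;
        b = ((b - length * outgoing + a) % Mod + Mod) % Mod;
    }

    public uint Value => (uint)a | ((uint)b << 16);
}
```

The point is the Roll method: moving the window one byte to the right costs a handful of arithmetic operations, no matter how large the window is.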

Now, go back up and look at the description of RavenFS. It is meant for very large files. Are you really going to perform an operation on every single byte of the file when the file is 2 GB in size?
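
By contrast, the block-boundary approach needs only one strong hash per block. A minimal sketch of that idea, under my own assumptions (SHA-256, 4 KB blocks, hypothetical names; this is not RavenFS’s actual code):

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

static class BlockSignature
{
    const int BlockSize = 4096;

    // Hash every fixed-size block; the last block may be shorter.
    public static byte[][] Compute(byte[] file)
    {
        using var sha = SHA256.Create();
        int blocks = (file.Length + BlockSize - 1) / BlockSize;
        var hashes = new byte[blocks][];
        for (int i = 0; i < blocks; i++)
        {
            int offset = i * BlockSize;
            int count = Math.Min(BlockSize, file.Length - offset);
            hashes[i] = sha.ComputeHash(file, offset, count);
        }
        return hashes;
    }

    // Only blocks whose hashes differ (or that exist on one side only)
    // need to be transferred.
    public static int CountBlocksToTransfer(byte[][] source, byte[][] destination)
    {
        int count = 0;
        for (int i = 0; i < source.Length; i++)
            if (i >= destination.Length || !source[i].SequenceEqual(destination[i]))
                count++;
        return count;
    }
}
```

A 2 GB file needs roughly half a million block hashes instead of some two billion rolling-checksum steps. The price, as the quote above says, is that a single byte inserted at the front of the file shifts every block boundary after it.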

This is where a very interesting property of file systems comes into play. Let us assume that you have the following file, shown as a sequence of bytes:

[1,2,3,4]

And let us say that we want to add the byte 0 at the beginning of the file. For now, we will assume a buffer size of 1. We will have to issue the following commands to the file system (simulated in the sketch right after the list):

  • Write 0 to position 1
  • Write 1 to position 2
  • Write 2 to position 3
  • Write 3 to position 4
  • Write 4 to position 5 (this write increases the file size)
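
Here is that sequence simulated in code, assuming a 1-based Write(position, value) primitive against an in-memory “file”. All the names here are mine, purely for illustration:

```csharp
using System;
using System.Collections.Generic;

class InsertDemo
{
    // One simulated file system write; positions are 1-based, matching
    // the list above.
    static void Write(List<byte> file, int position, byte value)
    {
        if (position == file.Count + 1)
            file.Add(value);             // writing one past the end grows the file
        else
            file[position - 1] = value;
        Console.WriteLine($"Write {value} to position {position}");
    }

    static void Main()
    {
        var file = new List<byte> { 1, 2, 3, 4 };

        // Insert 0 at the front with a one-byte buffer: each write displaces
        // the byte that used to live there, and the displaced byte becomes
        // the value of the next write.
        byte carry = 0;                   // the value being inserted
        int originalLength = file.Count;
        for (int pos = 1; pos <= originalLength + 1; pos++)
        {
            byte displaced = pos <= originalLength ? file[pos - 1] : (byte)0;
            Write(file, pos, carry);
            carry = displaced;
        }
        // file is now [0, 1, 2, 3, 4]: five writes to insert one byte.
    }
}
```

Running it prints exactly the five writes listed above; inserting one byte at the front of an n-byte file costs n + 1 writes.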

In other words, inserting a value (as opposed to modifying a value) is an O(File.Length – File.Position) operation: the closer to the beginning of the file you are, the more expensive it is to insert a new value.

Ponder that while considering the cost of doing something like this on truly big files. Let us assume a more reasonable buffer size of 4,096 bytes (the .NET default), and we realize that inserting a value at the beginning of a 500 MB file would require 128 thousand file system operations.
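
A quick back-of-the-envelope check of that number (a sketch, with variable names of my own choosing):

```csharp
using System;

// With a 4,096-byte buffer, shifting the contents of a 500 MB file by one
// insertion at the front means one buffered operation per 4 KB chunk.
long fileSize = 500L * 1024 * 1024;        // 500 MB
long bufferSize = 4096;                    // the .NET default buffer size
Console.WriteLine(fileSize / bufferSize);  // prints 128000
```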

In practice, this isn’t a real issue, because most of the time, when we are talking about such large files, we are talking about files that either use fixed-size records (no inserting whatsoever) or only ever have data appended at the end. That is mostly because there really isn’t any other choice if you want to preserve reasonable performance.

RavenFS is meant to take advantage of this aspect to turn the “useless in practice” option into a viable one.

On a separate issue, this quote made me laugh:

The lack of such a routine was quite a surprise and prompted me to look into the whole issue of parallel sorting, totally ignoring the fact that I only wanted a parallel sorting routine in order to solve a problem that didn’t really require sorting at all.

~ Andrew Tridgell http://samba.org/~tridge/phd_thesis.pdf