More thoughts about compression & storage


One of the things that I liked in LevelDB is that it pretty much had compression built in from day one. The entire format is built to save space just about wherever it can, from using 7-bit numbers to prefix compression to making compression a core part of the on-disk data structure. Considering the cost of I/O, it is usually worth trading CPU time for I/O costs. LMDB, in contrast, went quite the opposite way. It actively discourages compression, since there is no support for it in the format. But it is more than that: LMDB is meant to be used by directly getting a memory pointer, and it actively encourages you to do things such as having a pointer reference from one record to another record.
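To make the 7-bit numbers point concrete, here is a minimal sketch (not LevelDB's actual code) of the general varint idea: each byte carries 7 bits of the value plus a continuation bit, so small numbers take a single byte on disk instead of a fixed 4 or 8.

```csharp
using System;
using System.Collections.Generic;

static class VarInt
{
    // Encode an unsigned value 7 bits at a time; the high bit of each
    // byte says whether another byte follows.
    public static byte[] Encode(ulong value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)(value | 0x80)); // low 7 bits + continuation bit
            value >>= 7;
        }
        bytes.Add((byte)value);
        return bytes.ToArray();
    }
}

// VarInt.Encode(300) => 2 bytes (0xAC, 0x02) instead of a full 4-byte int.
```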

The idea is that you can just persist your in-memory state, with zero difference between the on-disk structure and the in-memory structure. This allows you to take full advantage of the memory-mapped nature of the database.
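As a rough illustration of what that buys you (a sketch only, with a hypothetical NodeHeader struct and file name rather than LMDB's or Voron's real layout), this is what reading a record straight out of a memory-mapped file looks like: the on-disk bytes are the in-memory structure, with no copy and no decode step.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct NodeHeader          // hypothetical record header, for illustration only
{
    public long PageNumber;
    public ushort KeySize;
    public ushort Flags;
}

class MappedRead
{
    // Requires compiling with unsafe blocks enabled.
    static unsafe void Main()
    {
        using (var mmf = MemoryMappedFile.CreateFromFile("data.voron", FileMode.Open))
        using (var accessor = mmf.CreateViewAccessor())
        {
            byte* ptr = null;
            accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
            try
            {
                ptr += accessor.PointerOffset;      // account for view alignment
                var header = (NodeHeader*)ptr;      // reinterpret mapped bytes in place
                Console.WriteLine(header->PageNumber);
            }
            finally
            {
                accessor.SafeMemoryMappedViewHandle.ReleasePointer();
            }
        }
    }
}
```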

Voron, however, doesn’t really do that. And compressing the data could significantly cut the amount of I/O we use, and would allow us to pack much more data into the same space. Now, in a few places, Howard Chu (creator of LMDB) points out that you don’t need any support from LMDB to add compression, and points to this as proof. I don’t find this satisfactory. I think that it puts a lot of effort on the application side, and requires a lot of special hacks.

Now, what do we talk about when we talk about compression? We can, of course, compress each record individually. But that just means that we remove redundancies within that single value. In practice, it is almost always better to compress multiple values together, because then you can get much better compression ratios.
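A quick way to see this, with made-up records and DeflateStream standing in for whatever codec you would actually use: compressing a hundred similar records one by one versus as a single block, the block wins by a wide margin, because the redundancy between records is visible to the compressor.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

class CompressionComparison
{
    static byte[] Deflate(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, leaveOpen: true))
                deflate.Write(data, 0, data.Length);
            return output.ToArray();
        }
    }

    static void Main()
    {
        // Hypothetical records that share a lot of structure, as documents usually do.
        var records = Enumerable.Range(0, 100)
            .Select(i => Encoding.UTF8.GetBytes($"{{\"users/{i}\":{{\"Name\":\"User {i}\",\"Active\":true}}}}"))
            .ToArray();

        long individually = records.Sum(r => (long)Deflate(r).Length);
        long together = Deflate(records.SelectMany(r => r).ToArray()).Length;

        Console.WriteLine($"Compressed one by one: {individually:N0} bytes");
        Console.WriteLine($"Compressed as a block: {together:N0} bytes");
    }
}
```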

For reference, I took the opposite view a while ago, see here: http://ayende.com/blog/162466/some-thoughts-about-compression-storage

But what if we compress the whole page? Taking a sample page from one of our files, I can see that compressing it (using DeflateStream) takes essentially zero time, and that a 4KB page of sample data got turned into about 800 bytes.
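The measurement is easy to reproduce. Here is a minimal sketch of that kind of check, using a synthetic repetitive 4KB page rather than the actual Voron sample, so the exact ratio and timing on your machine will differ:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Text;

class PageCompression
{
    const int PageSize = 4096;

    static void Main()
    {
        // Stand-in for a real page: repetitive key/value text, which is
        // exactly what makes page-level compression pay off.
        var page = new byte[PageSize];
        var sample = Encoding.UTF8.GetBytes("users/1234;Name=User 1234;Active=true;");
        for (int i = 0; i < PageSize; i++)
            page[i] = sample[i % sample.Length];

        var sw = Stopwatch.StartNew();
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, leaveOpen: true))
                deflate.Write(page, 0, page.Length);
            sw.Stop();
            Console.WriteLine($"{PageSize:N0} bytes -> {output.Length:N0} bytes in {sw.Elapsed.TotalMilliseconds:F3} ms");
        }
    }
}
```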

That means a lot, because it means that I can shove more data into a single page. And if I can do that, it means that I can also (drastically) reduce the height of the tree, and since that is one of the major factors in I/O costs in B-Tree based systems, it might actually have a lot more performance implications than it would seem.
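As a back-of-the-envelope check with made-up numbers (not Voron's actual fanout): a B-Tree's height is roughly ceil(log base fanout of N), so packing about twice as many entries into a page can shave a level off a large tree, and every level is a potential random read.

```csharp
using System;

class TreeHeight
{
    // Rough height estimate for a B-Tree with a given fanout.
    static int Height(long entries, int fanout) =>
        (int)Math.Ceiling(Math.Log(entries) / Math.Log(fanout));

    static void Main()
    {
        const long entries = 1_000_000_000;
        Console.WriteLine(Height(entries, 100)); // ~5 levels
        Console.WriteLine(Height(entries, 200)); // ~4 levels after doubling the fanout
    }
}
```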

It is also a lot harder. For example, you would have a very hard time telling ahead of time whether an addition / deletion is going to stay within the page boundary, or whether it is going to require a split (a rough sketch of that check is below). For now, it is a really nice idea, with some interesting perf implications for later on, but I don’t think we’ll explore this very closely in the near future.
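A sketch of why that is awkward, assuming hypothetical page contents and plain DeflateStream: the only reliable way to know whether the new entry still fits is to compress the candidate page content and look at the size you get back.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

static class CompressedPageCheck
{
    // With compressed pages you can't predict the post-compression size of an
    // insert up front; you have to compress the candidate content and check.
    public static bool FitsAfterAdd(byte[] currentEntries, byte[] newEntry, int pageSize)
    {
        var candidate = currentEntries.Concat(newEntry).ToArray();
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, leaveOpen: true))
                deflate.Write(candidate, 0, candidate.Length);
            return output.Length <= pageSize; // false means the page has to split
        }
    }
}
```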