The building blocks of a database: Transactional & persistent hash table
I have been working on a managed storage solution for RavenDB in an off-again, on-again fashion for a while now. The main problem is that doing something like this is not particularly hard, but it is complex. You either have to go with a transaction log or an append-only model.
There is more than enough material on the matter, so I won't touch that. The problem is that building that is going to take time, and probably a lot of it. I decided that it is better to have something than nothing, and scaled back the requirements.
The storage has to have:
- ACID
- Multi threaded
- Fully managed
- Support both memory and files
- Fast
- Easy to work with
The last one is important. Go and take a look at the codebase of any of the available databases. They can be… pretty scary.
But something has to give, so I decided that to make things easier, I am not going to implement indexing on the file system. Instead, I'll store the data on the disk, and keep the actual index completely in memory. There is an overhead of roughly 16 bytes per key plus the key itself; let us round it to 64 bytes per key held. Holding 10 million keys would cost ~600 MB. That sounds like a lot, because it is. But it is actually not too bad. It isn't that much memory for modern hardware. And assuming that our documents are 5 KB in size, we are talking about 50 GB for the database size, anyway.
Just to be clear, the actual data is going to be on the disk; it is the index that we keep in memory. And once we have that decision, the rest sort of follows on its own.
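To make that concrete, here is a minimal sketch (my own illustration, not the actual code), assuming a single data file and ignoring concurrency, transactions and crash recovery: the in-memory dictionary holds only the key and the position/length of the value inside the data file.

using System.Collections.Generic;
using System.IO;

// Minimal sketch: keys and file offsets live in memory, values live on disk.
public class InMemoryIndexStore
{
    private readonly Dictionary<string, (long Position, int Length)> index =
        new Dictionary<string, (long Position, int Length)>();
    private readonly FileStream dataFile;

    public InMemoryIndexStore(string path)
    {
        dataFile = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite);
    }

    public void Add(string key, byte[] value)
    {
        dataFile.Seek(0, SeekOrigin.End);
        long position = dataFile.Position;
        dataFile.Write(value, 0, value.Length);
        index[key] = (position, value.Length); // only the pointer is kept in memory
    }

    public byte[] Read(string key)
    {
        if (index.TryGetValue(key, out var entry) == false)
            return null;
        var buffer = new byte[entry.Length];
        dataFile.Seek(entry.Position, SeekOrigin.Begin);
        dataFile.Read(buffer, 0, entry.Length); // sketch only: ignores partial reads
        return buffer;
    }
}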
It is an intentionally low-level interface; mostly what it gives you is an Add/Read/Remove interface, but it gives you multiple tables, the ability to do key and range scans, and full ACID compliance (including crash recovery).
And the fun part, it does so in 400 lines!
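To give a feel for what that surface looks like, here is an illustrative API sketch; the names here are made up for the example and are not the actual ones:

using System;
using System.Collections.Generic;

// Illustrative only - the real API and names may differ.
public interface ITable
{
    void Add(string key, byte[] data);
    byte[] Read(string key);
    void Remove(string key);

    // key scans and range scans are served by the in-memory index
    IEnumerable<string> ScanKeysInRange(string fromKey, string toKey);
}

public interface IStorage : IDisposable
{
    ITable GetTable(string name); // multiple tables
    void Commit();                // full ACID: the batch either commits or it doesn't
}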
Comments
I remember years ago reading how Interbase does versioned transactions so that people can read existing data with repeatable read whilst someone else is still in a transaction that has updated the data.
Is that the approach you meant when you said "an append only model"?
@Peter Morris
I don't know anything about Interbase, and just a little bit of CouchDB, but this is how I understand CouchDB works.
CouchDB uses immutable trees for its tables. When you modify the table, a new tree is created (creating only one new leaf and only the branches from the leaf up to the root). This way you can ask CouchDB for the current root and then at your own pace browse through that snapshot, while the rest of the world races by, creating updates by creating new trees.
cont. (heh heh..)
CouchDB uses an append-only file to store your changes. At the end of the file, your data, the new leaf and all the new branches will be written.
'Snapshotting' doesn't require an immutable structure and an immutable structure doesn't require append-only, but with my feeble mind I wouldn't want to implement them any other way.
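A rough sketch of that path-copying idea, using a plain immutable binary tree (nothing CouchDB-specific, and ignoring balancing): an insert returns a new root, copies only the nodes on the path to the key, and shares every untouched subtree with the old root.

// Sketch: path copying on an immutable binary tree (illustrative only).
public sealed class Node
{
    public readonly string Key;
    public readonly byte[] Value;
    public readonly Node Left;
    public readonly Node Right;

    public Node(string key, byte[] value, Node left, Node right)
    {
        Key = key;
        Value = value;
        Left = left;
        Right = right;
    }
}

public static class ImmutableTree
{
    // Returns a NEW root; only the nodes on the path to the key are copied.
    public static Node Insert(Node root, string key, byte[] value)
    {
        if (root == null)
            return new Node(key, value, null, null);

        int cmp = string.CompareOrdinal(key, root.Key);
        if (cmp == 0) // overwrite, reusing both existing subtrees
            return new Node(key, value, root.Left, root.Right);
        if (cmp < 0)
            return new Node(root.Key, root.Value, Insert(root.Left, key, value), root.Right);
        return new Node(root.Key, root.Value, root.Left, Insert(root.Right, key, value));
    }
}

A reader that grabbed the old root before the insert still sees its snapshot unchanged; in an append-only file the same trick works by writing the new nodes at the end and then publishing the position of the new root.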
looks cool!
I have a question though
public Stream CreateTemporaryStream()
{
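    // (body elided in this comment; per a later comment, it builds the path with Path.GetTempFileName(), which creates a file on disk)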
}
who's gonna delete that file?
Dear Ayende,
what exactly makes you think that you as a (unquestionably exceptionally talented) developer and architect will not run into the same troubles that all those huge clumsy databases ran into after a while? There is a big codebase not only because all other coders suck.
Don't get me wrong: Sometimes small projects change the whole way we see things (Ruby is the most prominent example).
How about, though, starting with a more "humble" idea like: "Let's skip some things and see how far we get with it and whether it is usable for some scenarios."
Simplifying DBs is good, I support that. Conveying or implying the message that they are complicated without any reason is misleading.
Or is it a bit of "Not Invented Here" again?
Peter,
I am not aware of the Interbase model, but an append-only model makes MVCC trivial to work with.
Torvin,
That is done in the ReplaceAtomically function.
Rickrock,
Mostly, because I am not trying to target the same feature set.
And by not trying to hit the same feature set, I have drastically simplified my problem set.
I don't recall exactly how it works but this link looks as though it explains it quite well. It might give you some additional ideas...
conferences.embarcadero.com/.../32280
Something sounds wrong in this code:
You are writing commands to the log twice. When do you write commands to data?
Jesús,
Please note that the codebase there has undergone a lot of changes. I'll have a lot to say about this shortly, but you can look at the current codebase here:
github.com/ayende/ravendb/tree/master/Raven.Munin/
I spent time going back into Ayende's archives starting from the beginning while he was on jail duty and coding. It reminded me a little of a science fiction short story I once read about a child who was a gifted composer whom they kept apart from all other people in order that he should not hear any music composed by others and have his creativity tainted (whether through imitation or by consciously trying to avoid imitation).
I think Ayende is the type who should try and rewrite everything from scratch.
For example, even Microsoft re-implemented JavaScript regex for IE9, as mentioned here:
blogs.msdn.com/.../...-in-internet-explorer-9.aspx
If the indexes are kept in memory, it will take a significant amount of time to rebuild them every time.
Can you comment on it, please?
Dmyrii,
Not really. We tested it with millions of items, and it just works.
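(For a sense of why the rebuild is cheap: it is a single sequential pass over the data file. Here is a hedged sketch, assuming each record is stored as [key length][key bytes][value length][value bytes]; that layout is an assumption for illustration, not necessarily the real on-disk format.)

using System.Collections.Generic;
using System.IO;
using System.Text;

// Rebuild the in-memory index with a single sequential scan of the data file.
public static class IndexRebuilder
{
    public static Dictionary<string, long> Rebuild(string dataFilePath)
    {
        var index = new Dictionary<string, long>();
        using (var stream = File.OpenRead(dataFilePath))
        using (var reader = new BinaryReader(stream, Encoding.UTF8))
        {
            while (stream.Position < stream.Length)
            {
                int keyLength = reader.ReadInt32();
                string key = Encoding.UTF8.GetString(reader.ReadBytes(keyLength));
                int valueLength = reader.ReadInt32();
                long valuePosition = stream.Position;
                stream.Seek(valueLength, SeekOrigin.Current); // skip the value bytes
                index[key] = valuePosition;                   // keep only the pointer
            }
        }
        return index;
    }
}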
My first question would be exactly the same as Torvin already wrote: In "Path.Combine(basePath, Path.GetFileName(Path.GetTempFileName()));", who is going to delete that empty file created in the TEMP directory by Path.GetTempFileName()? Despite its name, GetTempFileName() doesn't just generate a name, it creates an actual file you are supposed to delete...
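(For reference, a hedged illustration of the difference, not the blog's actual code: Path.GetTempFileName() creates a real zero-byte file in %TEMP%, while Path.GetRandomFileName() only produces a name, so the latter avoids the leaked file.)

using System.IO;

public static class TempFileNames
{
    public static string BuildTempPath(string basePath)
    {
        // Path.GetTempFileName() creates an actual zero-byte file in the TEMP directory;
        // reusing only its name leaves that file behind:
        // string leaky = Path.Combine(basePath, Path.GetFileName(Path.GetTempFileName()));

        // Path.GetRandomFileName() just generates a name, no file is created on disk:
        return Path.Combine(basePath, Path.GetRandomFileName());
    }
}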