More on collection performance, immutability and designing for a specific scenario
Note: I started writing this a few minutes after starting the performance test; it took that long to run.
As you are probably already aware, we used the Immutable Collections package in Voron. That turned out to be a cause for pretty major performance issues, so I started investigating other approaches for dealing with the problem of both immutability and high performing code.
Before I go on, I would like to explain what we are using the immutable collections for. Basically, we have a page translation table, which tells us which pages belong where. This table changes on every write transaction.
However, locking isn’t a good option for us, because once started, a transaction’s view of the world can never be changed. That means that while we are modifying the page table in a write transaction, a read transaction should never see the modifications. This is exactly the case where immutable collections make sense. Read-only views aren’t going to work, since once the write tx modifies the page table, that change becomes visible to read transactions, which isn’t what we want.
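To make that requirement concrete, here is a minimal sketch of the publish-on-commit pattern this implies. The names (`PageTableHolder`, `BeginRead`, `CommitWrite`) are illustrative assumptions, not Voron’s actual code; the point is only that a reader captures a reference once, and immutability guarantees that reference never changes underneath it:

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Threading;

public class PageTableHolder
{
    // The current page table; replaced wholesale when a write tx commits.
    private ImmutableDictionary<long, long> _current =
        ImmutableDictionary<long, long>.Empty;

    // A read transaction grabs the reference once at start. Because the
    // instance is immutable, later commits can never change what it sees.
    public ImmutableDictionary<long, long> BeginRead()
    {
        return Volatile.Read(ref _current);
    }

    // The single write transaction builds its changes on the side, then
    // publishes a new immutable instance in one atomic reference write.
    public void CommitWrite(IEnumerable<KeyValuePair<long, long>> changes)
    {
        var updated = _current.SetItems(changes);
        Volatile.Write(ref _current, updated);
    }
}
```

With a single writer, the reference swap is the entire synchronization story: readers never block, and no reader can observe a half-applied transaction.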
Our test scenario is that after each write tx commit, we will update the page table with all the new pages locations. That means that we wrote the following test case:
var tmp = new Dictionary<long, object>();

var sp = Stopwatch.StartNew();
var rnd = new Random(32);
for (int i = 0; i < iterations; i++)
{
    foreach (var item in Enumerable.Range(rnd.Next(0, i), Math.Max(i * 2, 16)))
    {
        tmp[item] = null;
    }
}

Console.WriteLine(sp.Elapsed + " Adding items, mutable dictionary");
Every iteration is a “transaction”, and the work inside the for loop simulates us updating the page locations. Note, however, that the code above uses a mutable dictionary, and the changes are neither thread-safe, nor are we isolating other readers from seeing our changes.
We can do this using immutable dictionary using the following code:
var dic = ImmutableDictionary<long, object>.Empty;

var sp = Stopwatch.StartNew();

var rnd = new Random(32);
for (int i = 0; i < iterations; i++)
{
    var tmp = new Dictionary<long, object>();
    foreach (var item in Enumerable.Range(rnd.Next(0, i), Math.Max(i * 2, 16)))
    {
        tmp[item] = null;
    }

    dic = dic.SetItems(tmp);
}

Console.WriteLine(sp.Elapsed + " Adding items, immutable dictionary");
Note that we are using the optimal “SetItems” method, so we only modify the immutable dictionary once per transaction. That means that we create roughly 1 instance per transaction.
I then decided to see what would be the cost of running this with 50,000 transactions. Note that this means that in the end we have 149,697 items in the dictionary.
- Cost for running this in mutable dictionary: 1 minute and 12 seconds.
- Cost for running this in immutable dictionary: About 30 minutes.
Cost for reading 10,000 items from the dictionary:
- 0.0002 seconds for mutable dictionary
- 0.0061 seconds for immutable dictionary
Remember, the reason we got here in the first place is that we actually saw performance issues specifically because of the SetItems calls. But note that we have a performance problem for reads as well.
Initially, I tried the simple thing of just creating a new instance for every transaction. That worked quite nicely to remove the read costs, but it had a non-trivial cost for adds, quite aside from the GC cost. The test is:
var dic = new Dictionary<long, object>();

var sp = Stopwatch.StartNew();
var rnd = new Random(32);
for (int i = 0; i < iterations; i++)
{
    var tmp = new Dictionary<long, object>();
    foreach (var item in Enumerable.Range(rnd.Next(0, i), Math.Max(i * 2, 16)))
    {
        tmp[item] = null;
    }
    dic = new Dictionary<long, object>(dic);
    foreach (var o in tmp)
    {
        dic[o.Key] = o.Value;
    }
}

Console.WriteLine(sp.Elapsed + " Adding items, safe dictionary");
Cost for adding using new instance every time:
- 7 minutes and 37 seconds.
So right off the bat, we are over 4 times faster than the immutable dictionary. But this is still expensive. At least it has the same read speed as a standard dictionary, because it is one.
In order to deal with that, I created something that I call LinkedDictionary. A linked dictionary is based on the following idea: since dictionaries can only contain a single value per key, it is perfectly safe to layer them on one another. This means that we can just have a linked list of dictionaries, and when we want to read from it, we walk through the list in reverse addition order. That makes it pretty much perfect for an append-only model.
The downside of that is that we now have to traverse the linked list in order to find our value. The good thing about this is that we are able to limit that, by introducing a merge step along the way. The merge step would merge all the pending parts into a single dictionary. Currently we are doing a merge every 32 writes or so. This strikes a nice balance between read and write speed.
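The layering-plus-merge idea described above can be sketched roughly as follows. This is an illustrative reconstruction, not Voron’s actual LinkedDictionary; the class name, the `MergeThreshold` constant, and the use of a `List` to hold the layers are all assumptions made for the sketch:

```csharp
using System;
using System.Collections.Generic;

public class LinkedDictionary<TKey, TValue>
{
    // Assumed value; the post says a merge happens "every 32 writes or so".
    private const int MergeThreshold = 32;

    // Layers in addition order: oldest first, newest last.
    private readonly List<Dictionary<TKey, TValue>> _layers;

    private LinkedDictionary(List<Dictionary<TKey, TValue>> layers)
    {
        _layers = layers;
    }

    public static LinkedDictionary<TKey, TValue> Empty =>
        new LinkedDictionary<TKey, TValue>(new List<Dictionary<TKey, TValue>>());

    // Returns a NEW LinkedDictionary; the current instance is never mutated,
    // so read transactions holding it keep a stable view.
    public LinkedDictionary<TKey, TValue> SetItems(Dictionary<TKey, TValue> changes)
    {
        var layers = new List<Dictionary<TKey, TValue>>(_layers) { changes };
        if (layers.Count >= MergeThreshold)
        {
            // Merge step: collapse all layers into one dictionary, oldest
            // first, so newer layers overwrite older values for a key.
            var merged = new Dictionary<TKey, TValue>();
            foreach (var layer in layers)
                foreach (var kvp in layer)
                    merged[kvp.Key] = kvp.Value;
            layers = new List<Dictionary<TKey, TValue>> { merged };
        }
        return new LinkedDictionary<TKey, TValue>(layers);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        // Newest layer wins, so scan from the end of the list backwards.
        for (int i = _layers.Count - 1; i >= 0; i--)
        {
            if (_layers[i].TryGetValue(key, out value))
                return true;
        }
        value = default(TValue);
        return false;
    }
}
```

The write path copies only the (small) list of layer references, never the dictionaries themselves, which is why adds are so much cheaper than the cloned-dictionary approach; the merge threshold caps how many layers a read has to probe.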
The end result is that for 50,000 iterations, we have the following values:
               Mutable Dictionary   Immutable Dictionary   Cloned Dictionary   Linked Dictionary
50,000 adds    1 minute 12 sec      29 minutes 28 sec      7 minutes 37 sec    4 minutes 37 sec
10,000 reads   0.0002 seconds       0.0064 seconds         0.0002 seconds      0.0032 seconds
In other words, we get write times that are just under 4 times those of the standard mutable dictionary, and read times that are half those of the immutable dictionary.
For reference, the entire code is in the following gist.
Comments
Did you try the multi-value dictionary idea that came up in the comments? I think it's pretty clever.
@Oren, in which sequence did you do the performance measurements? I mean, first all the writes, then all the reads, or did you intermingle them, running them in parallel (more like real-world usage)?
I'm asking because a difference like this in the access pattern may change the results from good to bad or vice versa. What do you think?
But why do you need a linked list of dictionaries? Isn't it sufficient to have two dictionaries - the original version + the delta for current transaction ('delta' storing only modified entries)?
... and also entries that have already been accessed in the current transaction, to get a 'read committed' isolation level.
Hi! I just realized, this problem would be a perfect fit for the STM library that I wrote to you about ( for other readers: https://github.com/jbakic/Shielded - please forgive me for self-promotion ;) ). I'll try to reproduce this test and see how the ShieldedDict compares to the standard Dictionary. I've so far only compared it to the ConcurrentDictionary, not to the standard one. I presume writing will be slower than in your linked dictionary solution, since the ShieldedDict is about 4-5 times slower than the ConcurrentDictionary, which is itself probably slower than the standard one.
But, I think the ShieldedDict might score ok on reading - it's just a wrapper around a ConcurrentDictionary. It does an additional version check, and may travel down the version list to find the one for you, but this is key-oriented - if no concurrent transaction commits into the exact same key, then the reading does not have to travel down the version list, the first one will be ok.
Why not:
var dic = new Dictionary<long, object>();
var sp = Stopwatch.StartNew();
var rnd = new Random(32);
for (int i = 0; i < 5000; i++)
{
    var tmp = new Dictionary<long, object>(dic);
}
Console.WriteLine(sp.Elapsed + " Adding items, safe dictionary");
I'm not sure if I'm missing something, but as long as you only have one writer, why not just maintain an internal mutable dictionary and copy to a publicly accessible instance on every completed write tx? It has the read characteristic of a mutable dictionary, because it is, and writes seem to be slightly faster than the linked dictionary.
Tobi, Yes, I did, note that this post was written long before those comments.
Njy, First writes, then reads. I don't think it matters, this is testing the ideal condition.
Rafal, What happens on the next transaction?
David, I already tried that, but the cost of copying it on every tx is too big.
" Currently we are doing a merge every 32 writes or so. This strikes a nice balance between read and write speed."
How does this impact transactional ACID guarantees?
Chris, It doesn't.
"Rafal, What happens on the next transaction?"
Well, you have to merge your changes to some 'consensus' dictionary - ok, I see the point, you'll have to modify it and so probably use locking and deal with all versioning conflicts etc. But how do you avoid these problems by using immutable dictionaries? You will have multiple versions of the page table, possibly with some of them having conflicting modifications. All will be immutable, independent dictionaries, so how would you decide which one is the current state of affairs in the db? And how does transaction commit look like - don't you have to merge the changes into some common page table?
@Oren: you said "... got performance issues specifically because of the SETITEMS calls..." and "... but it had a non trivial cost for ADDS..." .
Now, because of this, I was thinking about a snapshot-like behaviour, where at every change you simply set a bool flag that keeps track of the fact that one or more changes happened. Then, if and only if someone wants to do a read, a snapshot would be created and stored, for everybody wanting to read the data. As soon as another write changes the state, the flag would be reset and the last snapshot deleted. In this way you would only have one snapshot lying around, and it would be created only if and when a read is needed.
This way SetItems and similar methods would do basically nothing (set a bool flag, null out the snapshot), and only the first read after a change would require creating a read-only version of the items list.
Rafal, The current page table at the time the transaction starts is the valid one. There is one write transaction at any given point in time. When it is over, it sets the current page table.
njy, Imagine that you had concurrent transactions running, one doing the writes and one doing the reads.
Oren, yeah I thought about that, and it can be handled in 2 ways: you either (1) use a lock while you write (do not like this, meh :-) or (2) have 1 additional snapshot to use while the writer writes. That would take the number of list copies to two, which is always less than creating a new one for each change. I have to admit I'm not that experienced in handling these kinds of scenarios, but it seems a reasonable approach, instead of creating a new list for each and every write/change.
Does it make sense?
Check out my fork at https://gist.github.com/StephenCleary/7932242
I found that using a sorted dictionary (with builder) was faster than a straight-up immutable dictionary. It won't/can't be as fast as your custom datatype, but it does perform more decently.
Also, if your updates follow a certain pattern (i.e., semi-clustered), then you may find an alternative type such as a sorted ImmutableList to be even faster.
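For readers unfamiliar with the builder pattern this comment refers to, a sketch might look like the following. The helper name `ApplyBatch` is made up for illustration; `ToBuilder`/`ToImmutable` are the actual System.Collections.Immutable APIs:

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;

public static class BuilderExample
{
    public static ImmutableSortedDictionary<long, object> ApplyBatch(
        ImmutableSortedDictionary<long, object> table,
        IEnumerable<KeyValuePair<long, object>> changes)
    {
        // The builder mutates a private copy of the tree in place, then
        // freezes it, avoiding an intermediate allocation per inserted key.
        var builder = table.ToBuilder();
        foreach (var kvp in changes)
            builder[kvp.Key] = kvp.Value;
        return builder.ToImmutable();
    }
}
```

The original `table` instance is untouched, so readers holding it are still isolated; only the batch-apply step gets cheaper.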
Ok, I forgot you're still allowing only one write transaction at a time. So in this case you should be fine with a double-buffering approach - two dictionaries, one constant and the other one modified by the current transaction, replacing the original one on commit. I still don't see why you need a chain of immutable dictionaries.
Rafal: because there is no way to know in advance how long a reader transaction will still need to reference the previous transaction. Simple double-buffering means you could overwrite a dictionary while a number of read txns are still using it.
Thanks, Howard. Makes sense.
Njy, You assume tx == write tx, but in our case, we have one write tx and multiple read tx. And we want to avoid slowing read tx down.
If I understood correctly, when you say "slowing down the read tx" you are referring to the fact that when a reader needs to read, if a change happened it needs to wait for the snapshot to be created. That is true, but doing so would at least require fewer copies of the list to be created, so less pressure on the memory, GC and friends. It should probably be measured, because if far fewer instances are created and the reads (but only the "first read after a change", not every read) are slowed down only a little, that may be a good compromise.
Just my two cents.
are you familiar with persistent data structures?
I did the test using the Shielded library, you can see the code here: https://github.com/jbakic/Shielded/blob/master/ConsoleTests/SequentialTests.cs
The results are not very good. Reading is excellent, just 2.7x slower than the mutable Dictionary<>, but writing was 100x slower at first. With some improvements in the ShieldedDict I got it down to 70x.
I suppose the conclusion would be that it's overkill to use an STM, given that you have a very specific concurrency profile, with just one writer and readers that don't require transactions. In any case, applying your test was very useful, revealed some inefficiencies in the lib, and I'll definitely look into further improvements.
Why don't you initialize tmp with dic in the first place, so you do not have to copy dic + tmp into a new dictionary after?
Cao, It ends up being pretty much the same work.