SafeList, SafeDictionary–fast immutable structures
After the issues we run into with immutable structures, I decided that I still wanted to maintain the same guarantees that immutable data structures gave us, but that still maintain the speed of mutable structures.
I managed to do that by making different design choices when building my implementation. While the BCL immutable collections are based on binary trees, and attempt to spare GC pressure in favor of CPU cycles, I find that for a lot of the things that we do, we can spare the memory in favor of actually being faster overall. In particular, this is true since the immutable collections are order of magnitude slower for reads than their mutable counterparts. I could have dealt with the slow writes, somehow. But we do a lot of reads, and that is utterly unacceptable.
Here is how I implemented SafeList:
1: public class SafeList<T> : IEnumerable<T>2: {
3: List<T> _inner = new List<T>();4:
5: public static readonly SafeList<T> Empty = new SafeList<T>();6:
7: private SafeList()8: {
9:
10: }
11:
12: public SafeList<T> AddRange(IEnumerable<T> items)13: {
14: var inner = new List<T>(_inner);15: inner.AddRange(items);
16: return new SafeList<T>17: {
18: _inner = inner
19: };
20: }
21:
22: public SafeList<T> Add(T item)23: {
24: return new SafeList<T>25: {
26: _inner = new List<T>(_inner) {item}27: };
28: }
29:
30: public int Count31: {
32: get { return _inner.Count; }33: }
34:
35: public T this[int i]36: {
37: get { return _inner[i]; }38: }
39:
40:
41: public SafeList<T> RemoveAll(Func<T, bool> filter)42: {
43: return new SafeList<T>44: {
45: _inner = new List<T>(_inner.Where(x => filter(x) == false))46: };
47: }
48:
49: public T Find(Predicate<T> predicate)50: {
51: return _inner.Find(predicate);52: }
53:
54: public SafeList<T> RemoveAllAndGetDiscards(Predicate<T> filter, out List<T> discards)55: {
56: var list = new List<T>();57:
58: discards = list;
59:
60: return new SafeList<T>61: {
62: _inner = new List<T>(_inner.Where(x =>63: {
64: if (filter(x) == false)65: {
66: list.Add(x);
67: return false;68: }
69: return true;70: }))
71: };
72:
73: }
74: }
The implementation for SafeDictionary is pretty similar as well. Note that I have dedicated & optimized methods for the type of things that I need in my code.
But the key part here is that I gain the safety by always creating another copy of the underlying implementation. We are never actually mutating anything.
Sure, this results in us having to deal with a lot more allocations, but we get the same speed for reads as you do for the standard mutable collections. And more importantly, we are still faster for writes.
Comments
But now a simple 'Add' executes in linear time so I can't see how this can be faster than a mutable implementation.
Rafal, Add happens a lot less often than reads. And it can't be faster than the mutable impl. We are trying to get something that is both safe, immutable and fast.
Yeah, I was referring to your last sentence, that the immutable implementation is faster for writes. So probably this means 'faster than BCL collections', not 'faster than mutable collections'.
Let's say it's fast for a special use case: * very few writes * very few different versions of the list
Hi,
I am unsure if Add() is really safe in a multithreaded environment:
_inner = new List<T>(_inner) {item}
isn't this only a shorter writing for;
_inner = new List<T>(_inner); _inner.Add(item);
..if yes, then there exists a time, were _inner is visible to another thread while not fully constructed.
Fabian, No, it goes to a temp variable first, so it is never exposed to the outside world before it happens.
@FabianWetzel. Es it is indeed so. So the SafeList is not immutable if you can change it's state and keep the instance reference. And it is not thread safe, if the Add or any other changing method don't return a new collection, because it can be changed inbetween. If the Add/Remove calls are so infrequent and thread safety so negligible, you can use the fast mutable collections. Otherwise I see no other possibility than to make a compromise - whether don't bother about thread safety or about mutability. One possibility I see to use Interlocked.CompareExchange, what makes the implementation even slower than ImmutableXYZ
Andy, Huh? Add returns a new list, and doesn't modify the existing instance.
I might be inclined to declare _inner as IReadOnlyList<T>, to make the intent (this state doesn't mutate) absolutely clear, except it doesn't expose the Find method.
It still could be valuable to refactor that field into a readonly field though. A helping check from the compiler to go "yes, this really never changes".
Having looked at the implementation of Find, I think I'd be comfortable re-implementing this to get that IReadOnlyList:
<pre> public T Find(Predicate<T> match) { if (match == null) { throw new ArgumentNullException("match"); } for (int i = 0; i < this._inner.Count; i++) { if (match(_inner[i])) { return _inner[i]; } } return default(T); } </pre>Damn, that < ate my code block!
@Fabian
is effectively:Source: C# Language Specification - 7.5.10.3 Collection Initializers
In this scenario, it's safe, because
is a member of the which is being composed and has not yet returned from the method. No other threads could possibly be able to access the instance yet.Would you be "leaking" an empty list every time you set _inner explicitly after creating new SafeList?
List<T> _inner = new List<T>();
return new SafeList<T> { _inner = inner // Overwriting empty list, so this is "leaking" };
That seems like unnecessary GC pressure.
Just 2 minor improvements: do not auto-initialize the _inner like @Harry said, and provide an addiitonal private ctor to pass an existing list instead of re-setting later, and check the AddRange parameter for null/empty param. Since the code using it will be generic and may receive empty lists, that's a minor optimization that doesn't cost a thing and may provide some gains here and there.
Gist here: https://gist.github.com/anonymous/edeca36cfc2a353c7c08
Since List<T> is using an array internally, you should initialize the new instances with precalculated Capacity, otherwise for certain number of items you will end up copying the data around twice.
http://www.dotnetframework.org/default.aspx/4@0/4@0/DEVDIV_TFS/Dev10/Releases/RTMRel/ndp/clr/src/BCL/System/Collections/Generic/List@cs/1305376/List@cs
@Oren: one more thing, you know what bothers me? That each and every change will generate a new instance. What if, in a certain amount of time, there are a lot of writes and a few reads (like in a bulk insert scenario) or if if reads a lot but not "near" (from a time-related point of view) the writes? An idea, probably silly: why not use the impl of the IEnumerable.GetEnumerator() to store-and-return an enumerator for a new list only if the data is changed since the last time? This way if nobody make a read, no new list would be created. Does this make any sense?
Hi Oren,
I'm probably way off here, but have you looked at how F# deals with immutable lists? Do they just use the same .Net Immutable Collections under the covers or are the classes in Microsoft.FSharp.Collections their own hand rolled implementations? You may be able to re-use these from your C# code.
I think it is a bit confusing that RemoveAllAndGetDiscards returns the discards and has the list with items removed as out parameter - and is named discards.
@kopvleeuwen??? "if (filter(x) == false)"
Well, yes, the double-negative makes it end up in the wrong list. Please run the following test if you do not believe me: https://gist.github.com/koen-lee/8098277
Comment preview