Immutable collections performance, take II
Why is the performance of an immutable list over 16 times slower than a standard list? I took a peek at what it was actually doing, and it made a lot of sense. In order to maintain efficient indexing access, the actual storage of the data in the immutable list is a binary tree. With the key being used as the indexer.
This result is a much higher cost for pretty much everything. Let us look at the following:
1: var listsp = Stopwatch.StartNew();2: var list = new List<int>(Enumerable.Range(0, 10*1000*1000));3:4: for (int i = 0; i < list.Count; i++)5: {6: var _ = list[i];7: }8:9: Console.WriteLine(listsp.Elapsed);10:11: var ilist = ImmutableList<int>.Empty.AddRange(list);12: listsp.Restart();13:14: for (int i = 0; i < ilist.Count; i++)15: {16: var _ = ilist[i];17: }18: Console.WriteLine(listsp.Elapsed);
This List<T> is 0.23 seconds, ImmutableList<T> takes 1.28 seconds. When I use foreach, instead, we get 0.22 seconds vs. 2.29 seconds.
As you can see from the blog post describing them, because immutable collections are mostly implemented as binary trees, I don’t really think that there is a good way to approach this as is. The problem is that the immutable collections actually need to do a lot more than what I need them to do.
Now, it might have been more acceptable to use them if the perf was limited to just writing to them, it might have been acceptable. But as you can see, we have the same problem when we are reading, and that is quite unacceptable.
 

Comments
Structures like Vector (see http://www.navision-blog.de/2012/05/29/porting-clojures-persistent-data-structures-to-net-part-1-of-n-persistentvector/ which comes originally from Clojure) are using Hash array mapped tries (see http://en.wikipedia.org/wiki/Hash_array_mapped_trie) which have much better constant factors that binary or red/black trees.
There are also implementations that use transients to speed up bulk writes (See http://www.navision-blog.de/blog/2012/05/29/porting-clojures-persistent-data-structures-to-net-part-2-of-n-transientvector/).
Maybe you should look in that direction.
Gosh, if only there was a nice fast immutable B+tree you could use instead of a binary tree. Oh wait, there is...
Howard, B+Tree isn't really what we want to do here. To start with, we cannot afford the minimum 4KB cost for every modification. Worse, we don't want O(logN) search perf, I really want to get the O(1) performance that I can get from a hash table.
There is no way to get a theoretical O(1), but Vector does at most 6 hops for all practical problems. So it's quasi O(1).
Steffen, The problem isn't with list/vector. My major problem is with dictionary.
You can implement a map on top of vector.
https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/APersistentMap.java
Comment preview