Trivial Lru Cache impl
It has been a while since I actually posted some code here, and I thought that this implementation was quite nice, in that it is simple & works for what it needs to do.
public class LruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();

    private class Node
    {
        public TValue Value;
        public volatile Reference<long> Ticks;
    }

    private readonly ConcurrentDictionary<TKey, Node> _nodes = new ConcurrentDictionary<TKey, Node>();

    public LruCache(int capacity)
    {
        Debug.Assert(capacity > 10);
        _capacity = capacity;
    }

    public void Set(TKey key, TValue value)
    {
        var node = new Node
        {
            Value = value,
            Ticks = new Reference<long> { Value = _stopwatch.ElapsedTicks }
        };

        _nodes.AddOrUpdate(key, node, (_, __) => node);
        if (_nodes.Count > _capacity)
        {
            // Over capacity: evict the 10% of entries with the oldest access ticks.
            foreach (var source in _nodes.OrderBy(x => x.Value.Ticks.Value).Take(_nodes.Count / 10))
            {
                Node _;
                _nodes.TryRemove(source.Key, out _);
            }
        }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        Node node;
        if (_nodes.TryGetValue(key, out node))
        {
            // Refresh the access time by swapping in a new Reference, not by mutating the old one.
            node.Ticks = new Reference<long> { Value = _stopwatch.ElapsedTicks };
            value = node.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }
}
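A minimal usage sketch (LoadUserBlob here is just a hypothetical stand-in for whatever produces the cached value):

var cache = new LruCache<string, byte[]>(2048);

byte[] blob;
if (cache.TryGet("users/1", out blob) == false)
{
    blob = LoadUserBlob("users/1"); // hypothetical loader, not part of the cache
    cache.Set("users/1", blob);
}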
Comments
_nodes.AddOrUpdate(key, node, (_, __) => node);
What's wrong with _nodes[key] = node ?

.Take(_nodes.Count / 10)
I think that should be .Take(_nodes.Count - _capacity)
Personally I would have made Reference immutable or used Interlocked to update the long.
I don't think a long-running Stopwatch is good practice. You might get into issues related to processor clock speeds.
@cao: how would that be an issue in this case, where its value is only used relative to the others? (genuinely interested)
@Patrick,
I think it's because when you hit capacity, you do not want to iterate the cache every time after that. Reducing by 10% makes sense here.
@Igor Kalders,
The problem with Stopwatch is that its tick count is processor-core dependent. If you read the same stopwatch on different cores, you get different results.
var first = stopwatch.TickCount; SwitchCore(); Assert(stopwatch.TickCount > first);
could fail. I don't know if the difference will continue to build up, though, so that eventually two cores get so far out of sync that this cache starts to get it noticeably wrong.
Probably the best thing is to replace
readonly Stopwatch _stopwatch = Stopwatch.StartNew();
with int _version; and replace each _stopwatch.ElapsedTicks with Interlocked.Increment(ref _version). That way you can also make Node.Ticks a volatile int.

Typo, it should've been
Assert(stopwatch.TickCount >= first);
could fail (missed the =).

Links about the caveats of Stopwatch:
http://kristofverbiest.blogspot.co.uk/2008/10/beware-of-stopwatch.html
http://www.virtualdub.org/blog/pivot/entry.php?id=106
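Roughly what that replacement might look like, as a sketch only (it assumes the rest of the class stays as in the post, and that Node.Ticks becomes a plain volatile int):

private int _version;

private class Node
{
    public TValue Value;
    public volatile int Ticks; // logical clock: higher means more recently touched
}

public void Set(TKey key, TValue value)
{
    var node = new Node
    {
        Value = value,
        Ticks = Interlocked.Increment(ref _version)
    };

    _nodes.AddOrUpdate(key, node, (_, __) => node);
    if (_nodes.Count > _capacity)
    {
        foreach (var source in _nodes.OrderBy(x => x.Value.Ticks).Take(_nodes.Count / 10))
        {
            Node _;
            _nodes.TryRemove(source.Key, out _);
        }
    }
}

// In TryGet, the refresh becomes: node.Ticks = Interlocked.Increment(ref _version);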
What is Reference<T>? I haven't seen it before...
As Cao said, I don't like the long-running stopwatch and don't think it is good practice, as Patrick Huizinga explained.
@Patrick Huizinga The problem with simply incrementing a count is that the results depend on real-life data in the worst-case scenario. If you have items used in bursts, they will stick in the cache for a long time but will not be accessed for a long time (think of a heavy, once-a-day scheduled job).
Usually for an LRU-like cache, the best performance I get on average across best/worst case scenarios (for a simple implementation) is to pair a timestamp with a count. The timestamp is immutable and the count is volatile and incremented on access with Interlocked. Then I order by (Count * weightBasedOnElapsedTimeSinceTimeStamp). This allows long-cached items to be released and recreated eventually, after a couple of minutes/hours/days/weeks, if they are only used in bursts. It works great when you have scheduled tasks that use a cached resource extensively once a day, preventing it from hogging the cache all the time.
Sometimes I also add a DelayRecycle/DoNotRecycle flag if the recreation is very costly, but often recreating a costly resource once a day is not that bad.
I believe that would be similar to what Ayende did for the administrator cache in Raven, but my implementation is usually more akin to the above post.
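A rough sketch of that kind of scoring (the class, field, and method names here are just illustrative):

private class ScoredNode
{
    public TValue Value;
    public readonly DateTime CreatedUtc = DateTime.UtcNow; // immutable creation timestamp
    public int AccessCount;                                // bumped with Interlocked.Increment on each hit
}

private static double Score(ScoredNode node)
{
    // Access count weighted down by age, so items that were hot in one burst
    // long ago eventually score low enough to be evicted and recreated.
    var ageMinutes = (DateTime.UtcNow - node.CreatedUtc).TotalMinutes;
    return node.AccessCount / (1.0 + ageMinutes);
}

// On eviction, order ascending by Score and remove the lowest-scoring entries first.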
@Doug Probably because volatile long is not allowed, but volatile on a reference type is, hence the Reference<T>.
Recently I saw an implementation that used a linked list to store when an item was accessed. On access, the item was moved to the head. On writing, the N items at the tail were removed. I liked that there was no count variable or sorting of all items needed. Of course, that comes at the expense of the linked list overhead.
// Ryan
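Roughly the shape of that approach (a sketch, not the implementation referenced above; the name LinkedLruCache is made up, and this version is not thread-safe as written):

public class LinkedLruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public LinkedLruCache(int capacity)
    {
        _capacity = capacity;
    }

    public void Set(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_map.TryGetValue(key, out node))
            _order.Remove(node);

        // The most recently used entry always lives at the head.
        _map[key] = _order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));

        // Evict from the tail once over capacity; no counting or sorting needed.
        while (_map.Count > _capacity)
        {
            var last = _order.Last;
            _order.RemoveLast();
            _map.Remove(last.Value.Key);
        }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_map.TryGetValue(key, out node))
        {
            // Move to the head to mark as most recently used.
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }
}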
Patrick, I am never certain what the indexer setter will do, and this is more obvious. And the reason we do _nodes.Count / 10 is to remove the bottom 10%, so we don't constantly add & remove stuff from the cache.
cao, We don't actually care what the values are; we don't even care if they are accurate. Sort-of accurate should be good enough.
Patrick, The problem with your approach is that Interlocked requires all processors to sync their memory. In contrast, we explicitly do not care about that here.
Doug, A class that encapsulates a value: public class Reference<T> { public T Value; }
It allows you to see either the previous or the current value, without fear of corrupted writes.
There's always a need for another LRU cache implementation, but this one has an unpleasant habit of sorting the whole collection on each insert once the capacity limit is reached.
... and an atomic GetOrAdd operation would be very handy too
Rafal, The default size I have for this is 2048, so that isn't really too bad. But note that it is NOT on each insert; it drops the oldest 10% on capacity breach. With 2048, it would do this sort once every 200 ops or so.
Right, I jumped to conclusions too fast.