Trivial Lru Cache impl
It has been a while since I actually posted some code here, and I thought that this implementation was quite nice, in that it is simple & works for what it needs to do.
public class LruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();

    private class Node
    {
        public TValue Value;
        public volatile Reference<long> Ticks;
    }

    private readonly ConcurrentDictionary<TKey, Node> _nodes = new ConcurrentDictionary<TKey, Node>();

    public LruCache(int capacity)
    {
        Debug.Assert(capacity > 10);
        _capacity = capacity;
    }

    public void Set(TKey key, TValue value)
    {
        var node = new Node
        {
            Value = value,
            Ticks = new Reference<long> { Value = _stopwatch.ElapsedTicks }
        };

        _nodes.AddOrUpdate(key, node, (_, __) => node);
        if (_nodes.Count > _capacity)
        {
            // Over capacity: evict the 10% of entries with the oldest access ticks.
            foreach (var source in _nodes.OrderBy(x => x.Value.Ticks.Value).Take(_nodes.Count / 10))
            {
                Node _;
                _nodes.TryRemove(source.Key, out _);
            }
        }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        Node node;
        if (_nodes.TryGetValue(key, out node))
        {
            // Refresh the access time by swapping in a new Reference, not by mutating the old one.
            node.Ticks = new Reference<long> { Value = _stopwatch.ElapsedTicks };
            value = node.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }
}
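A minimal usage sketch (LoadUserBlob here is just a hypothetical stand-in for whatever produces the cached value):

var cache = new LruCache<string, byte[]>(2048);

byte[] blob;
if (cache.TryGet("users/1", out blob) == false)
{
    blob = LoadUserBlob("users/1"); // hypothetical loader, not part of the cache
    cache.Set("users/1", blob);
}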
Comments
_nodes.AddOrUpdate(key, node, (_, __) => node);
What's wrong with _nodes[key] = node ?

.Take(_nodes.Count / 10)
I think that should be .Take(_nodes.Count - _capacity)
Personally I would have made Reference immutable or used Interlocked to update the long.
I don't think a long-running Stopwatch is good practice. You might get into issues related to processor clock speeds.
@cao: how would that be an issue in this case, where its value is only used relative to the others? (genuinely interested)
@Patrick,
I think it's because when you hit capacity, you do not want to iterate the cache every time after that. Reducing by 10% makes sense here.
@Igor Kalders,
The problem with Stopwatch is that its tick count is processor-core dependent. If you read the same stopwatch on different cores, you get different results.
var first = stopwatch.TickCount; SwitchCore(); Assert(stopwatch.TickCount > first);
could fail. I don't know if the difference will continue to build up, though, so that eventually two cores get so far out of sync that this cache starts to get it noticeably wrong.
Probably the best thing is to replace
readonly Stopwatch _stopwatch = Stopwatch.StartNew();
with int _version; and replace each _stopwatch.ElapsedTicks with Interlocked.Increment(ref _version). That way you can also make Node.Ticks a volatile int.

Typo, it should've been
Assert(stopwatch.TickCount >= first);
could fail (missed the =).

Links about the caveats of Stopwatch:
http://kristofverbiest.blogspot.co.uk/2008/10/beware-of-stopwatch.html
http://www.virtualdub.org/blog/pivot/entry.php?id=106
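Roughly what that replacement might look like, as a sketch only (it assumes the rest of the class stays as in the post, and that Node.Ticks becomes a plain volatile int):

private int _version;

private class Node
{
    public TValue Value;
    public volatile int Ticks; // logical clock: higher means more recently touched
}

public void Set(TKey key, TValue value)
{
    var node = new Node
    {
        Value = value,
        Ticks = Interlocked.Increment(ref _version)
    };

    _nodes.AddOrUpdate(key, node, (_, __) => node);
    if (_nodes.Count > _capacity)
    {
        foreach (var source in _nodes.OrderBy(x => x.Value.Ticks).Take(_nodes.Count / 10))
        {
            Node _;
            _nodes.TryRemove(source.Key, out _);
        }
    }
}

// In TryGet, the refresh becomes: node.Ticks = Interlocked.Increment(ref _version);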
What is Reference<T>? I haven't seen it before...
As Cao said, I don't like the long-running stopwatch and don't think it is good practice, as Patrick Huizinga explained.
@Patrick Huizinga The problem with simply incrementing a count is that the results depend on real-life data in the worst-case scenario. If you have items used in bursts, they will stick in the cache for a long time but will not be accessed for a long time (think of a heavy, once-a-day scheduled job).
Usually for an LRU-like cache, the best performance I get on average across best/worst case scenarios (for a simple implementation) is to pair a timestamp with a count. The timestamp is immutable and the count is volatile and incremented on access with Interlocked. Then I order by (Count * weightBasedOnElapsedTimeSinceTimeStamp). This allows long-cached items to be released and recreated eventually, after a couple of minutes/hours/days/weeks, if they are only used in bursts. It works great when you have scheduled tasks that use a cached resource extensively once a day, preventing it from hogging the cache all the time.
Sometimes I also add a DelayRecycle/DoNotRecycle flag if the recreation is very costly, but often recreating a costly resource once a day is not that bad.
I believe that would be similar to what Ayende did for the administrator cache in Raven, but my implementation is usually more akin to the above post.
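A rough sketch of that kind of scoring (the class, field, and method names here are just illustrative):

private class ScoredNode
{
    public TValue Value;
    public readonly DateTime CreatedUtc = DateTime.UtcNow; // immutable creation timestamp
    public int AccessCount;                                // bumped with Interlocked.Increment on each hit
}

private static double Score(ScoredNode node)
{
    // Access count weighted down by age, so items that were hot in one burst
    // long ago eventually score low enough to be evicted and recreated.
    var ageMinutes = (DateTime.UtcNow - node.CreatedUtc).TotalMinutes;
    return node.AccessCount / (1.0 + ageMinutes);
}

// On eviction, order ascending by Score and remove the lowest-scoring entries first.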
@Doug Probably because volatile long is not allowed, but volatile on a reference type is, hence the Reference<T>.
Recently I saw an implementation that used a linked list to store when an item was accessed. On access, the item was moved to the head. On writing, the N items at the tail were removed. I liked that there was no count variable or sorting of all items needed. Of course, that comes at the expense of the linked list overhead.
// Ryan
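Roughly the shape of that approach (a sketch, not the implementation referenced above; the name LinkedLruCache is made up, and this version is not thread-safe as written):

public class LinkedLruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public LinkedLruCache(int capacity)
    {
        _capacity = capacity;
    }

    public void Set(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_map.TryGetValue(key, out node))
            _order.Remove(node);

        // The most recently used entry always lives at the head.
        _map[key] = _order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));

        // Evict from the tail once over capacity; no counting or sorting needed.
        while (_map.Count > _capacity)
        {
            var last = _order.Last;
            _order.RemoveLast();
            _map.Remove(last.Value.Key);
        }
    }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_map.TryGetValue(key, out node))
        {
            // Move to the head to mark as most recently used.
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }
}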
Patrick, I am never certain what the indexer setter will do, and this is more obvious. And the reason we do _nodes.Count / 10 is to remove the bottom 10%, so we don't constantly add & remove stuff from the cache.
cao, We don't actually care what the values are; we don't even care if they are accurate. Sort-of accurate should be good enough.
Patrick, The problem with your approach is that Interlocked requires all processors to sync their memory. In contrast, we explicitly do not care about that here.
Doug, A class that encapsulates a value: public class Reference<T> { public T Value; }
It allows you to see either the previous or the current value, without fear of corrupted writes.
There's always a need for another LRU cache implementation, but this one has an unpleasant habit of sorting the whole collection on each insert once the capacity limit is reached.
... and an atomic GetOrAdd operation would be very handy too
Rafal, The default size I have for this is 2048, so that isn't really too bad. But note that it is NOT on each insert; it drops the oldest 10% on capacity breach. With 2048, it would do this sort once every 200 ops or so.
Right, I jumped to conclusions too fast.