Raven Streams: Aggregations–from the system point of view

Jun 05 2013

Previously we introduced the following index (not sure if that would be a good name, maybe aggregation or projection?):

    from msg in messages
    select new
    {
        Customer = msg.From,
        Count = 1
    }

    from result in results
    group result by result.Customer
    into g
    select new
    {
        Customer = g.Key,
        Count = g.Sum(x => x.Count)
    }

Let us consider how it works from the point of view of the system. The easy case is when we have just one event to work on. We are going to run it through the map and then through the reduce, and that would be about it.

What about the next step? Well, on the second event, all we actually need to do is run it through the map, then run it and the previous result through the reduce. The only thing we need to remember is the final result. No need to remember all the pesky details in the middle, because events are never updated or deleted, so nothing ever requires them.
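To make that concrete, here is a minimal sketch of the flow in plain C#. It is not how Raven Streams is (or will be) implemented; `Message`, `CustomerCount`, `CustomerCountAggregation` and `Accept` are hypothetical names made up for this illustration, and the state lives in an in-memory dictionary purely to keep the example short. The map and reduce mirror the index above; the only thing kept between events is the last reduced result per customer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Feed events one at a time; only the reduced results are kept around.
var aggregation = new CustomerCountAggregation();
aggregation.Accept(new Message("alice"));
aggregation.Accept(new Message("bob"));
aggregation.Accept(new Message("alice"));

foreach (var result in aggregation.Results.OrderBy(r => r.Customer))
    Console.WriteLine($"{result.Customer}: {result.Count}");
// prints:
// alice: 2
// bob: 1

// Hypothetical event and result shapes (assumptions for this sketch).
public record Message(string From);
public record CustomerCount(string Customer, int Count);

public class CustomerCountAggregation
{
    // The only state: the latest reduced result per customer.
    private readonly Dictionary<string, CustomerCount> _results = new();

    public IEnumerable<CustomerCount> Results => _results.Values;

    // Map: same shape as the first half of the index.
    public static IEnumerable<CustomerCount> Map(IEnumerable<Message> messages) =>
        from msg in messages
        select new CustomerCount(msg.From, 1);

    // Reduce: same shape as the second half of the index.
    public static IEnumerable<CustomerCount> Reduce(IEnumerable<CustomerCount> results) =>
        from result in results
        group result by result.Customer
        into g
        select new CustomerCount(g.Key, g.Sum(x => x.Count));

    public void Accept(Message msg)
    {
        // Run the new event through the map...
        foreach (var mapped in Map(new[] { msg }))
        {
            // ...then reduce it together with the previous result, if any.
            var inputs = new List<CustomerCount> { mapped };
            if (_results.TryGetValue(mapped.Customer, out var previous))
                inputs.Add(previous);

            // Overwrite the stored result; no intermediate entries are kept.
            foreach (var reduced in Reduce(inputs))
                _results[reduced.Customer] = reduced;
        }
    }
}
```

Note that `Accept` never looks at older events, only at the previous reduced result for the same customer. That only works because the reduce can be re-applied to its own output, which holds for a sum like this one.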

This makes the entire process so simple it is ridiculous. I am actually looking forward to doing this, if only because I had to dig through a lot of complexity to get RavenDB’s map/reduce indexes to where they are now.

Tags:

raven

More posts in "Raven Streams" series:

(06 Jun 2013) What to do with the data?
(05 Jun 2013) Aggregations–from the system point of view
(04 Jun 2013) aggregations–how the user sees them

Comments

05 Jun 2013
14:57 PM

Brian Vallelunga

Will there be a time component to this system? What about creating an index that shows data for the last 5 minutes? Obviously that index would have to be updated automatically on the server. If something like this could be done, that might be very useful for metric tracking.

I believe this is what StreamInsight is used for and am just beginning to research the possibilities.

06 Jun 2013
08:00 AM

nuodb

Ayende, what is your opinion of http://www.nuodb.com/? It's not specifically related to the things you discuss in this series of posts, but they say it supports ACID, and RavenDB has eventual consistency...

06 Jun 2013
12:07 PM

Ayende Rahien

Brian, time-sensitive stuff is pretty important. As I mentioned, I have absolutely no idea if / whether this will go forward, but I would like to do it like that. If you just want to get an aggregation over the last N units of time, that should be pretty easy to do, I guess.

06 Jun 2013
12:19 PM

Ayende Rahien

RavenDB is ACID. I haven't looked deeply into NuoDB.

07 Jun 2013
07:04 AM

nuodb

I know that RavenDB is ACID too. I'm trying to say that NuoDB is ACID and consistent. What happens with consistency? I found this post interesting: http://lostechies.com/jimmybogard/2013/05/15/eventual-consistency-in-rest-apis/

07 Jun 2013
07:15 AM

Ayende Rahien

I think you are missing something. Even from a brief, cursory read of the docs it was quite apparent that using NuoDB is not going to produce consistent results in failure modes, and good luck with doing aggregation on a distributed network, or joins across that. There is a reason that model is just not going to work.

07 Jun 2013
09:14 AM

nuodb

I'm sure it's possible that I'm missing something ;) but, quoting from their site: "And of course, you can rely on NuoDB to provide 100% Atomic, Consistent, Isolated, and Durable (ACID) transactions." Doesn't this contradict the impression you got from the docs?

07 Jun 2013
18:35 PM

Ayende Rahien

That is pretty much marketing only. ACID transactions and consistency are two very different things. For example, you can't get distributed consistent reads across a cluster in the presence of a failure.

Comments have been closed on this topic.

Oren Eini

CEO of RavenDB