Raven Streams: Aggregations–from the system point of view
Previously we introduced the following index (not sure if that would be a good name, maybe aggregation or projection?):
from msg in messages
select new
{
    Customer = msg.From,
    Count = 1
}

from result in results
group result by result.Customer
into g
select new
{
    Customer = g.Key,
    Count = g.Sum(x => x.Count)
}
Let us consider how it works from the point of view of the system. The easy case is when we have just one event to work on. We are going to run it through the map and then through the reduce, and that would be about it.
What about the next step? Well, on the second event, all we actually need to do is run it through the map, then run it and the previous result through the reduce. The only thing we need to remember is the final result. No need to remember all the pesky details in the middle, because we don’t have the notion of updates / deletes to require them.
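To make the two steps concrete, here is a minimal sketch of that loop in C#. It is an illustration under assumptions, not how Raven Streams is (or would be) implemented: the Message, CustomerCount and IncrementalAggregation types and the Append method are hypothetical names invented for this example. The only state kept between events is the latest reduce output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only; not part of any RavenDB API.
public record Message(string From);
public record CustomerCount(string Customer, int Count);

public class IncrementalAggregation
{
    // The only thing we remember between events: the final reduce result.
    private List<CustomerCount> _results = new();

    public IReadOnlyList<CustomerCount> Results => _results;

    // Map: one entry per message, each counting as 1.
    public static IEnumerable<CustomerCount> Map(IEnumerable<Message> messages) =>
        from msg in messages
        select new CustomerCount(msg.From, 1);

    // Reduce: group by customer and sum the counts.
    public static IEnumerable<CustomerCount> Reduce(IEnumerable<CustomerCount> results) =>
        from result in results
        group result by result.Customer
        into g
        select new CustomerCount(g.Key, g.Sum(x => x.Count));

    // Run the new event through the map, then run the mapped entry
    // and the previous result through the reduce.
    public void Append(Message msg)
    {
        var mapped = Map(new[] { msg });
        _results = Reduce(_results.Concat(mapped)).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var aggregation = new IncrementalAggregation();
        foreach (var sender in new[] { "alice", "bob", "alice" })
            aggregation.Append(new Message(sender));

        // Prints "alice: 2" and then "bob: 1".
        foreach (var r in aggregation.Results)
            Console.WriteLine($"{r.Customer}: {r.Count}");
    }
}
```

This works because the reduce accepts its own output as input, so folding a new mapped entry into the previous result gives the same answer as reducing everything from scratch. The sketch re-reduces every customer on each event for simplicity; a real system would presumably only touch the group the new event belongs to.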
This makes the entire process so simple it is ridiculous. I am actually looking forward to doing this, if only because I had to dig through a lot of complexity to get RavenDB’s map/reduce indexes to where they are now.
More posts in "Raven Streams" series:
- (06 Jun 2013) What to do with the data?
- (05 Jun 2013) Aggregations–from the system point of view
- (04 Jun 2013) aggregations–how the user sees them
Comments
Will there be a time component to this system? What about creating an index that shows data for the last 5 minutes? Obviously that index would have to be updated automatically on the server. If something like this could be done, that might be very useful for metric tracking.
I believe this is what StreamInsight is used for and am just beginning to research the possibilities.
ayende, what is your opinion of http://www.nuodb.com/? It's not specifically related to the things you discuss in this series of posts, but they say it supports ACID, while RavenDB has eventual consistency...
Brian, time-sensitive stuff is pretty important. As I mentioned, I have absolutely no idea if / whether this will go forward, but I would like to do it like that. If you just want to get an aggregation over the last N minutes, that should be pretty easy to do, I guess.
RavenDB is ACID. I haven't looked deeply into NuoDB.
I know that RavenDB is ACID too. I am trying to say that NuoDB is ACID and consistent. What happens with consistency? I found this post interesting: http://lostechies.com/jimmybogard/2013/05/15/eventual-consistency-in-rest-apis/
I think you are missing something. Even from a brief, cursory read of the docs, it was quite apparent that using NuoDB is not going to produce consistent results in failure modes, and good luck with doing aggregation on a distributed network, or joins across that. There is a reason that model is just not going to work.
I'm sure it is possible that I'm missing something ;) but, quoting from their site: "And of course, you can rely on NuoDB to provide 100% Atomic, Consistent, Isolated, and Durable (ACID) transactions.". Doesn't this contradict the impression you got from the docs?
That is pretty much marketing only. ACID transactions and consistency are two very different things. For example, you can't get distributed consistent reads across a cluster in the presence of a failure.