Ayende @ Rahien

Jun 06 2013

Raven Streams: What to do with the data?

Tags:

raven

So, we have events sitting in a stream, and you can append to it for as long as you like. You can read from it, too, and you can also generate aggregations and look at those. In the previous post, I used billing for a phone company as the example.

Let us consider this for a second. You’ll probably have an aggregation per customer per month. So loading one would probably look something like:

eventStream.AggregationFor<MonthlyBilling>("customers/1234/2013-05");

And that is nice and all, but what about the more complex things? What happens if I want to search for high billing statements? Or all statements for the past year? What happens if I want to do a lookup by the customer name, not its id?

Well, I already know how to use Lucene, so we can plug this into the system and allow you to sear…

Wait a second! I already wrote all of that. As it turns out, a lot of the stuff that we want to do is already done. RavenDB has it implemented. For that matter, storing the aggregation results in RavenDB is going to open up a lot of additional options. For example, we can do another map/reduce (inside RavenDB) to give you additional aggregations. We can do full text search, enrich the data and a lot more.

So that is what will probably happen. We will be writing the aggregated results from Raven Streams into RavenDB as standard documents, and you could then process them further using RavenDB’s standard tools.
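To make the questions above concrete, here is a minimal sketch of what they look like once the aggregation results are plain documents. It uses LINQ to Objects over an in-memory list as a stand-in for RavenDB's indexes and queries, so it says nothing about the actual RavenDB or Raven Streams API. The shape of MonthlyBilling (its field names) and the BillingQueries helper are assumptions made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a stored aggregation result; the field names are assumptions.
public record MonthlyBilling(string Id, string CustomerName, int Year, int Month, decimal Total);

public static class BillingQueries
{
    // "Search for high billing statements."
    public static IEnumerable<MonthlyBilling> HighStatements(IEnumerable<MonthlyBilling> docs, decimal threshold) =>
        docs.Where(d => d.Total >= threshold);

    // "All statements for the past year", expressed as everything since a given month.
    public static IEnumerable<MonthlyBilling> Since(IEnumerable<MonthlyBilling> docs, int fromYear, int fromMonth) =>
        docs.Where(d => d.Year > fromYear || (d.Year == fromYear && d.Month >= fromMonth));

    // "Lookup by the customer name, not its id."
    public static IEnumerable<MonthlyBilling> ByCustomerName(IEnumerable<MonthlyBilling> docs, string name) =>
        docs.Where(d => string.Equals(d.CustomerName, name, StringComparison.OrdinalIgnoreCase));

    public static void Main()
    {
        // Made-up sample documents, keyed the same way as the aggregation above.
        var docs = new List<MonthlyBilling>
        {
            new MonthlyBilling("customers/1234/2013-05", "Alice", 2013, 5, 250m),
            new MonthlyBilling("customers/1234/2012-04", "Alice", 2012, 4, 40m),
            new MonthlyBilling("customers/5678/2013-05", "Bob", 2013, 5, 75m),
        };

        foreach (var doc in HighStatements(docs, 100m))
            Console.WriteLine($"High: {doc.Id} ({doc.Total})");

        foreach (var doc in Since(docs, 2012, 6))
            Console.WriteLine($"Recent: {doc.Id}");

        foreach (var doc in ByCustomerName(docs, "alice"))
            Console.WriteLine($"Alice: {doc.Id}");
    }
}
```

In a real system those filters would be served by indexes rather than by scanning a list, which is exactly the part that RavenDB already has.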

Raven Streams: Aggregations – from the system point of view

Tags:

raven

Previously we introduced the following index (not sure if that would be a good name, maybe aggregation or projection?):

from msg in messages
select new
{
    Customer = msg.From,
    Count = 1
}

from result in results
group result by result.Customer
into g
select new
{
    Customer = g.Key,
    Count = g.Sum(x => x.Count)
}

Let us consider how it works from the point of view of the system. The easy case is when we have just one event to work on. We are going to run it through the map and then through the reduce, and that would be about it.

What about the next step? Well, on the second event, all we actually need to do is run it through the map, then run it and the previous result through the reduce. The only thing we need to remember is the final result. No need to remember all the pesky details in the middle, because we don’t have the notion of updates / deletes to require them.
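The steps above can be sketched roughly as follows. This is not how Raven Streams is implemented, just an illustration of the idea under a few assumptions: the Message and SmsCount types, the IncrementalAggregation class and the in-memory dictionary that holds the last result per customer are all hypothetical names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; they are not part of any Raven Streams API.
public record Message(string From);
public record SmsCount(string Customer, int Count);

public static class IncrementalAggregation
{
    // Map: one event becomes one partial result.
    public static SmsCount Map(Message msg) => new SmsCount(msg.From, 1);

    // Reduce: collapse any number of partial results into one result per customer.
    public static IEnumerable<SmsCount> Reduce(IEnumerable<SmsCount> results) =>
        from result in results
        group result by result.Customer into g
        select new SmsCount(g.Key, g.Sum(x => x.Count));

    // Apply one new event: map it, reduce it together with the previous result
    // for that customer (if any), and remember only the new result.
    public static void Apply(IDictionary<string, SmsCount> state, Message msg)
    {
        var mapped = Map(msg);
        var inputs = new List<SmsCount> { mapped };
        if (state.TryGetValue(mapped.Customer, out var previous))
            inputs.Add(previous);

        // Both inputs share the same customer, so the reduce yields exactly one result.
        state[mapped.Customer] = Reduce(inputs).Single();
    }

    public static void Main()
    {
        var state = new Dictionary<string, SmsCount>();
        foreach (var from in new[] { "customers/1234", "customers/5678", "customers/1234" })
            Apply(state, new Message(from));

        foreach (var result in state.Values)
            Console.WriteLine($"{result.Customer}: {result.Count}");
    }
}
```

Note that Apply never looks at older events or intermediate results. With no updates or deletes, the previous reduce output is all the history that is needed.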

This makes the entire process so simple it is ridiculous. I am actually looking forward to doing this, if only because I had to dig through a lot of complexity to get RavenDB’s map/reduce indexes to where they are now.

Raven Streams: Aggregations – how the user sees them

Tags:

raven

The major reason for streams is the idea that you don’t really care about each individual item on its own. What you care about a lot more is some sort of aggregation over those values. And sure, you want to be able to access the individual values, but you generally won’t need to.

Let us say that you are a phone company, and you want to use Raven Streams to record all the events that happened, so you can bill on them. Let us imagine that we are interested in just SMS for the moment, so we append each SMS to the stream.

Then we are going to write something like:

from msg in messages
select new
{
    Customer = msg.From,
    Count = 1
}

from result in results
group result by result.Customer
into g
select new
{
    Customer = g.Key,
    Count = g.Sum(x => x.Count)
}

If you have ever written RavenDB map/reduce indexes, this should be very familiar to you. However, unlike RavenDB, here we don’t need to handle any pesky updates or deletes. That means that the implementation is much simpler, but I’ll discuss that in my next post.

In the meantime, let us consider what the result of this would be. It would generate a result, which we would persist and allow you to look up. One can imagine that you can do this via the customer id, and get the sum total as it is right now.
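For a feel of what that result looks like, here is a small sketch that runs the same map and reduce over an in-memory batch with plain LINQ to Objects, then looks up one customer's total by id. It only mimics the shape of the result. The Sms record (apart from its From field), the SmsTotals class and the sample ids are assumptions made up for the example, not Raven Streams API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type; only the From field is taken from the index above.
public record Sms(string From, string Text);

public static class SmsTotals
{
    // Runs the map and the reduce from the index over an in-memory batch
    // and returns the totals keyed by customer id.
    public static Dictionary<string, int> Compute(IEnumerable<Sms> messages)
    {
        var mapped =
            from msg in messages
            select new { Customer = msg.From, Count = 1 };

        var reduced =
            from result in mapped
            group result by result.Customer into g
            select new { Customer = g.Key, Count = g.Sum(x => x.Count) };

        return reduced.ToDictionary(x => x.Customer, x => x.Count);
    }

    public static void Main()
    {
        var totals = Compute(new[]
        {
            new Sms("customers/1234", "hi"),
            new Sms("customers/5678", "hello"),
            new Sms("customers/1234", "bye"),
        });

        // Look up the current sum total by customer id.
        Console.WriteLine(totals["customers/1234"]); // prints 2
    }
}
```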

But you’ll probably want to do additional operations, so we need to consider this as well.

For that matter, imagine the scenario where we want to get the data about SMS, MMS, phone calls, etc. What would you expect that to look like?

Oren Eini

CEO of RavenDB
