Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,163
Privacy Policy · Terms
filter by tags archive
time to read 2 min | 254 words

So, we have events sitting in a stream, and you can append to them as long as you like. And you can read from them, too. And you can also generate aggregations and look at those. In the previous post, I discussed the example of billing for a phone company as the example.

Let us consider this for a second. You’ll probably have aggregation per customer per month. So loading them would probably look something like:

eventStream.AggregationFor<MonthlyBilling>(“customers/1234/2013-05”);

And that is nice & all, but what about the more complex things? What happen if I want to search for high billing statement? Or all statements for the past year? What happen if I want to do a lookup by the customer name, not its id?

Well, I already know how to use Lucene, so we can plug this into the system and allow you to sear…

Wait a second! I already wrote all of that. As it turns out, a lot of the stuff that we want to do is already done. RavenDB has it implemented. For that matter, storing the aggregation results in RavenDB is going to introduce a lot of additional options. For example, we can do another map/reduce (inside RavenDB) to give you additional aggregation. We can do full text search, enrich the data and a lot more.

So that is what will probably happen. We will be writing the aggregated results from Raven Streams into RavenDB as standard documents, and you could then process them further using RavenDB’s standard tools.

time to read 5 min | 941 words

Previously we introduced the following index (not sure if that would be a good name, maybe aggregation or projection?):

   1: from msg in messages
   2: select new
   3: {
   4:     Customer = msg.From,
   5:     Count = 1
   6: }
   7:  
   8: from result in results
   9: group result by result.Customer
  10: into g
  11: select new
  12: {
  13:     Customer = g.Key,
  14:     Count = g.Sum(x=>x.Count)
  15: }

Let us consider how it works from the point of view of the system. The easy case is when we have just one event to work on. We are going to run it through the map and then through the reduce, and that would be about it.

What about the next step? Well, on the second event, all we actually need to do is run it through the map, then run it and the previous result through the reduce. The only thing we need to remember is the final result. No need to remember all the pesky details in the middle, because we don’t have the notion of updates / deletes to require them.

This make the entire process so simple it is ridiculous. I am actually looking forward to doing this, if only because I have to dig through a lot of complexity to get RavenDB’s map/reduce’s indexes to where they are now.

time to read 6 min | 1013 words

The major reason for streams is the idea that you don’t really care about each individual item on its own. What you care about a lot more is some sort of aggregation over those values. And sure, you do want to be able to access the values, but you generally don’t.

Let us say that you are a phone company, and you want to use Raven Streams to record all the events that happened, so you can bill on them. Let us imagine that we are interested in just SMS for the moment, so we append each sms to the stream.

Then we are going to write something like:

   1: from msg in messages
   2: select new
   3: {
   4:     Customer = msg.From,
   5:     Count = 1
   6: }
   7:  
   8: from result in results
   9: group result by result.Customer
  10: into g
  11: select new
  12: {
  13:     Customer = g.Key,
  14:     Count = g.Sum(x=>x.Count)
  15: }

If you ever did RavenDB map/reduce indexes, this should be very familiar to you. However, unlike RavenDB, here we don’t need to handle any pesky updates or deletes. That means that the implementation is much simpler, but I’ll discuss that on my next post.

In the meantime, let us consider what is the result of this would be. It would generate a result, which we would persist and allow you to lookup. One can imagine that you can do this via the customer id, and get the sum total as it is right now.

But you’ll probably want to do additional operations, so we need to consider this as well.

For that matter, imagine the scenario where we want to get the data about SMS, MMS, phone calls, etc. How would you expect that to look like?

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - one day from now
  2. Configuration values & Escape hatches - 4 days from now
  3. What happens when a sparse file allocation fails? - 6 days from now
  4. NTFS has an emergency stash of disk space - 8 days from now
  5. Challenge: Giving file system developer ulcer - 11 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}