Raven Streams: aggregations–how the user sees them

Jun 04 2013

The major reason for streams is the idea that you don’t really care about each individual item on its own. What you care about a lot more is some sort of aggregation over those values. And sure, you want to be able to access the individual values, but you rarely do.

Let us say that you are a phone company, and you want to use Raven Streams to record all the events that happened, so you can bill for them. Let us imagine that we are interested in just SMS for the moment, so we append each SMS to the stream.

Then we are going to write something like:

    // map: emit one entry per message
    from msg in messages
    select new
    {
        Customer = msg.From,
        Count = 1
    }

    // reduce: sum the counts per customer
    from result in results
    group result by result.Customer
    into g
    select new
    {
        Customer = g.Key,
        Count = g.Sum(x => x.Count)
    }

If you ever did RavenDB map/reduce indexes, this should be very familiar to you. However, unlike RavenDB, here we don’t need to handle any pesky updates or deletes. That means that the implementation is much simpler, but I’ll discuss that in my next post.
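To make the two steps above concrete, here is a minimal sketch that runs the same map and reduce queries over an in-memory list using plain LINQ to Objects. It is not the Raven Streams API: the `Sms` class, the `SmsCountDemo` class, and the customer ids are all assumptions made up for illustration, and only the `From` field comes from the post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type; only the From field is taken from the post.
public class Sms
{
    public string From { get; set; }
    public string To { get; set; }
}

public static class SmsCountDemo
{
    // Runs the map and reduce steps over an in-memory sequence of messages.
    public static Dictionary<string, int> CountPerCustomer(IEnumerable<Sms> messages)
    {
        // Map: one entry per message, each with a count of 1.
        var results =
            from msg in messages
            select new { Customer = msg.From, Count = 1 };

        // Reduce: group the mapped entries by customer and sum the counts.
        var reduced =
            from result in results
            group result by result.Customer
            into g
            select new { Customer = g.Key, Count = g.Sum(x => x.Count) };

        return reduced.ToDictionary(r => r.Customer, r => r.Count);
    }

    public static void Main()
    {
        var messages = new List<Sms>
        {
            new Sms { From = "customers/1", To = "customers/2" },
            new Sms { From = "customers/1", To = "customers/3" },
            new Sms { From = "customers/2", To = "customers/1" },
        };

        // customers/1 sent two messages and customers/2 sent one.
        foreach (var pair in CountPerCustomer(messages))
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```

The reduce output has the same shape as the map output (`Customer`, `Count`), which is what allows a reduce step to be applied again to already reduced results.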

In the meantime, let us consider what the result of this would be. It would generate a result, which we would persist and allow you to look up. One can imagine that you can do this via the customer id, and get the sum total as it is right now.
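As a rough illustration of "look up by customer id and get the sum total as it is right now", the sketch below keeps a running count per customer that only ever grows on append. Everything here is an assumption for the sake of the example: `SmsCountAggregation`, `Append`, and `GetCount` are hypothetical names, and the dictionary stands in for whatever persistence the real implementation would use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a persisted aggregation result.
public class SmsCountAggregation
{
    private readonly Dictionary<string, long> _countByCustomer =
        new Dictionary<string, long>();

    // Appending is the only write. With no update or delete path,
    // reducing a new item is just adding 1 to the running total.
    public void Append(string fromCustomer)
    {
        // TryGetValue leaves current at 0 when the customer is not present yet.
        _countByCustomer.TryGetValue(fromCustomer, out var current);
        _countByCustomer[fromCustomer] = current + 1;
    }

    // Returns the total for one customer as it is right now (0 if unknown).
    public long GetCount(string customer)
    {
        return _countByCustomer.TryGetValue(customer, out var count) ? count : 0;
    }
}

public static class Program
{
    public static void Main()
    {
        var aggregation = new SmsCountAggregation();
        aggregation.Append("customers/1");
        aggregation.Append("customers/2");
        aggregation.Append("customers/1");

        Console.WriteLine(aggregation.GetCount("customers/1")); // prints 2
        Console.WriteLine(aggregation.GetCount("customers/3")); // prints 0
    }
}
```

Because items are never updated or deleted, a total never has to be recomputed or decremented, which is the simplification the post alludes to.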

But you’ll probably want to do additional operations, so we need to consider this as well.

For that matter, imagine the scenario where we want to get the data about SMS, MMS, phone calls, etc. How would you expect that to look?

Tags: raven

Comments

04 Jun 2013
09:21 AM

Moti

"However, unlike RavenDB"

I think you meant, "However, unlike map/reduce"

04 Jun 2013
12:05 PM

Damian Hickey

I'm assuming the streams here are homogenous, that is, a stream per message type. So a separate stream for each of SMS, MMS, Phone Calls, Data Connections (as each of these have different attributes, you'll prob want different aggregations across them), per billing period, per customer.

Then it would be nice to project a new stream from these multiple streams, transforming the source message to the target stream message type. For example, generating the itemized bill 'stream' (attributes: description, amount), per billing period, per customer. This could be used to figure out if the customer is still under their credit limit.

(Separately, I'm interested in looking at the underlying storage for DDD\ES type of apps where the event streams are heterogeneous.)

04 Jun 2013
12:20 PM

Khalid Abuhakmeh

Like I commented on in a previous post, snapshotting is going to be necessary.

The example you gave is incomplete, because as the phone company I need to bill based on a period of time. I would go out of business if I had to wait until the person stopped using my service to bill them.

I would need the ability to either set up a snapshot period, or create the map/reduce to be grouped by both the Month and the CustomerId.

For your second question, "For that matter, imagine the scenario where we want to get the data about SMS, MMS, phone calls, etc. How would you expect that to look?"

It would be cool if you could pass back the resulting object or its auto-generated Id and get back the collection that resulted in that outcome.

/results/1/collection -> All documents back

That way you can loop through each result and see what constituted it. This would be cool, but you would have to track it somehow.

04 Jun 2013
13:43 PM

Karhgath

Unlike Damian, I believe a stream should be heterogeneous. Let's say you have a Telco Stream, with SMS, MMS and Phone events.

Each item you post has a type that could, by convention, be the class name (SMSEvent, MMSEvent, PhoneEvent). The type should be in the metadata (if you keep any). This means you could do a multi map of each type:

from msg in messages.SMSEvents select new { Customer = msg.From, SMSCount = 1, MMSCount = 0, PhoneCount = 0 }

Or simply an index per type if you want to split them (and create "substreams").

Also, as Khalid said, we'd need snapshots. Since we're always moving forward, we should only be able to create a snapshot from the last snapshot (or the start of the stream if there is no snapshot) to the most recent item. They are on a per-index basis:

stream.Snapshot<IndexType>(); stream.SnapshotAll();

This could speed up Map/Reduce and start the aggregation with the last snapshot and move forward, if possible. If you do some date stuff and all in the map/reduce query, it would try to use a snapshot if possible, or rebuild from scratch (which would be slower)

If no events are indexed, or a minimum of new events isn't triggered (configuration, like "minimum 10 items per snapshot"), the snapshot isn't created. It should be mostly behind the scene stuff and never be directly accessed/managed. You'd need to figure out where to store them (in RavenDB?), how to reference them internally, etc.

You'd need conventions for automated snapshots (disabled, every 100 items, every hour/day/week/month, every Type, dynamic per index...). That could be triggered before each append. If we do allow manual triggering of snapshots, we'd need to have some stats like "Item Count Since Last Snapshot" and stuff like that.

We'd have an issue of append date vs item date, however (in the case above, we could append on a Monday but the phone call happened on Sunday), which is non-trivial to solve on a forward-only stream. We'd need to assume date-related stuff is always the server append date, or else we'd have ordering issues.

Unlike Khalid, and because of that last part, I think snapshots are performance-only and should never reflect business logic.

Now to handle Khalid's issues, we'd need a strategy for this. A Stream per month maybe? This means we could append an item to a specific month even after the month is over, and handle business logic on the software side (detect cutoff and all) and not in Raven Streams. For DDD AggregateRoot, you'd have a stream by AR.

Like Ids and collection names in RavenDB, we could have a convention for this:

store.Conventions.StreamIdFor<EventType> = (item) => "events/" + item.EventDate.Format("yyyyMM");

store.Conventions.StreamIdFor<AggregateRoot> = (item) => "aggregates/" + item.AggregateId;

04 Jun 2013
16:20 PM

Damian Hickey

@Karhgath Hey, I didn't say what I believed, I only stated an assumption based on the code shown - that a stream contains messages of one type. :)

04 Jun 2013
17:54 PM

Karhgath

@Damian No offense meant, just bad phrasing on my part ;)

06 Jun 2013
12:09 PM

Ayende Rahien

Moti, no, I meant what I said. Map/Reduce generally doesn't deal with updates. RavenDB has updatable map/reduce, but it is pretty rare.

06 Jun 2013
12:10 PM

Ayende Rahien

Khalid, I am not sure what you mean when you say snapshotting. If you are talking about being able to look at the aggregation value from a previous time, I guess we can provide that. We are going to keep all of that information around; we aren't going to keep just the aggregation.

06 Jun 2013
12:26 PM

Khalid Abuhakmeh

I guess what I mean by snapshotting is the ability to take a specified range, most likely delineated by date or time, and do one of two things with the data that falls within that range.

Save the aggregation into a collection automatically. This would be helpful for things like Account Usage scenarios that you might bill on a monthly basis. The cool thing about this is that you could treat this collection as another stream and in essence chain another aggregation on top of your previous one.

Day Stream -> Week Stream -> Month Stream -> Year Stream

(not sure if that makes sense)

Or be able to specify a range ad hoc, and run the aggregation in real time. This would be helpful for exploring past data.

I guess both scenarios could be accomplished by taking the saved stream and importing it into RavenDB or SQL Server and doing analysis there, but it would be nice if it had a mechanism built in to do that.

I know we don't want to compare this to EventStore, but one of the mind blowing realizations of that software is that you can rerun all past events and, in theory, never lose business information. That is a really exciting prospect, but I have yet to see a real implementation of that model.

06 Jun 2013
12:30 PM

Ayende Rahien

I haven't yet thought about the ability to do something like aggregate to a different stream, but if we support heterogeneous streams, I don't see a reason why that can't be the case. I assumed that if you need time-based data, you would do that using the aggregation already.

Oren Eini

CEO of RavenDB
