RavenDB: Includes

Aug 12 2010


When I set out to build RavenDB, I had a very clear idea about what I wanted to do. I wanted to build an integrated, opinionated solution: something that does the right thing most of the time, and lets you override that if you really want to.

One of the things that really drove the design was 6 – 7 years of experience in building applications based on RDBMS using ORMs. Let me put it gently, I am… well acquainted with the problems that people may run into when they use an ORM. One of the things that I wanted to avoid was duplicating the possibility of error with RavenDB.

One of the major design decisions that I made was to disallow associations between documents. This is part of the core design of the system.

Let us take the following example:

As you can see, we have two documents, an Order and a Customer. The order references a customer, but unlike in an RDBMS, we use a denormalized reference, with both the Id and the Name of the customer stored inside the Order document. That is advantageous because it allows us to perform most operations on the Order document without having to load the Customer document.

From the C# model, it looks like this:

public class Order
{
    public string Id { get; set; }
    public Address ShippingAddress { get; set; }
    public Address BillingAddress { get; set; }
    public DenormalizedReference Customer { get; set; }
}

public class DenormalizedReference
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

Note that there isn’t a direct reference between the Order and the Customer. Instead, Order holds a DenormalizedReference, which holds the interesting bits from Customer that we need to process requests on Order.

So far so good, but, and this is important, you can’t always set things up this way. There is a set of cases where you do want to be able to access the associated document.

Well, that is easy enough, isn’t it? All we need to do is load it:

var order = session.Load<Order>("orders/9432");
var customer = session.Load<Customer>(order.Customer.Id);

This is simple, easy to read, easy to understand, and makes me want to curl into a ball and weep. The problem is, of course, that this is going to generate two calls to Raven. And if there is one thing that I pay attention to, it is the number of remote calls that I am making.

I started to think about how I can make this scenario work better, and I came up with the following design.

Given the two documents

Then for GET /docs/orders/9432

And for GET /docs/orders/9432?include=Customer.Id

 

Note that in the second case, we get the full customer data merged into the order document.

From an implementation perspective, this would be very easy to do. The problem is how to represent this in the client API. We had a very interesting discussion on the topic in the mailing list.

Let me explain the problem in detail. Given the C# classes above, how do you express this notion of the include? You can’t use the Order model above, because that Customer property is going to be of type DenormalizedReference. We can’t make that property of type Customer, either, because then the Customer data would be embedded inside the Order document, which isn’t what we wanted.

On the mailing list, a lot of proposals were raised. The one that seemed to be the most popular was to drop the 1:1 mapping between the C# model and the document model and move to something like this:

[RootAggregate]
public class Order
{
    public string Id { get; set; }
    public string Name { get; set; }
    public Customer Customer { get; set; }
}

[RootAggregate]
public class Customer
{
    public string Id { get; set; }
    [Denormalized]
    public string Name { get; set; }
    public string Email { get; set; }
}

And then make the client API smart enough to understand the attributes. The model above would generate the same documents as the previous model, but would make it much easier to work on features such as this. This way, we can normally access the data that is embedded in the document, but also include the associated document when we need it.

There are several problems here:

- This creates a misleading API, making people think that things are normalized when they aren’t.
- It is going to bring back ALL the problems associated with lazy loading (worse, it is going to bring back all the problems associated with EF 1.0 lazy loading).
- It goes directly against the way I believe you should work with a document database.

But I couldn’t think of any other way, nor could anyone else.

Until Frank Schwieterman came to our rescue:

Maybe rather then join the documents into one result, such a request would cause the 'joined' entities to be preloaded instead.
From the client API perspective, I would load do the joined load of the user object, and the user is returned in its original form. But now the session has the customer object preloaded, so when I try to load the customer object via the client API no request is made to the server. From the caller's perspective, the only change to usage has been the preload hint passed in the original request.

Yes!

The problem was (from my perspective) never with the way the model is structured; the problem was that the way the documents were modeled caused a performance problem. Frank’s suggestion completely eliminated that issue.

It took some interesting coding to get it to work properly, but essentially, it is just an application of the Future usage for loading large object graphs in NHibernate. Now we can do:

var order = session
    .Include("Customer.Id")
    .Load<Order>("orders/1");

var customer = session.Load<Customer>(order.Customer.Id);

And this code will only go to the server once!
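To make the mechanics concrete, here is a minimal sketch of the idea. It is not RavenDB's actual implementation: `FakeServer`, `ToySession` and `IncludeDemo` are hypothetical names invented for this illustration, the "server" is an in-memory dictionary that counts round trips, and resolving the include path with reflection is an assumption about how it could be done. The models are trimmed versions of the classes shown earlier.

```csharp
using System;
using System.Collections.Generic;

public class DenormalizedReference { public string Id { get; set; } public string Name { get; set; } }
public class Order { public string Id { get; set; } public DenormalizedReference Customer { get; set; } }
public class Customer { public string Id { get; set; } public string Name { get; set; } public string Email { get; set; } }

// Hypothetical stand-in for the remote server: stores documents and counts requests.
public class FakeServer
{
    private readonly Dictionary<string, object> _docs = new Dictionary<string, object>();
    public int Requests { get; private set; }

    public void Put(string id, object doc) => _docs[id] = doc;

    // One call is one round trip. When includePath is given (e.g. "Customer.Id"),
    // the document it points at is returned alongside the requested one.
    public Dictionary<string, object> Get(string id, string includePath)
    {
        Requests++;
        var result = new Dictionary<string, object> { [id] = _docs[id] };
        if (includePath != null)
        {
            var includedId = Resolve(_docs[id], includePath);
            result[includedId] = _docs[includedId];
        }
        return result;
    }

    // Walks a dotted property path using reflection (an assumption for this sketch).
    private static string Resolve(object doc, string path)
    {
        object current = doc;
        foreach (var part in path.Split('.'))
            current = current.GetType().GetProperty(part).GetValue(current);
        return (string)current;
    }
}

// Hypothetical session: remembers every document it has seen, keyed by id.
public class ToySession
{
    private readonly FakeServer _server;
    private readonly Dictionary<string, object> _loaded = new Dictionary<string, object>();
    private string _includePath;

    public ToySession(FakeServer server) { _server = server; }

    public ToySession Include(string path) { _includePath = path; return this; }

    public T Load<T>(string id)
    {
        // Only go to the server when the document is not already in the session.
        if (!_loaded.ContainsKey(id))
        {
            foreach (var pair in _server.Get(id, _includePath))
                _loaded[pair.Key] = pair.Value; // included documents are preloaded here
        }
        _includePath = null;
        return (T)_loaded[id];
    }
}

public static class IncludeDemo
{
    // Loads an order and then its customer; returns how many round trips were made.
    public static int CountRequests(bool useInclude)
    {
        var server = new FakeServer();
        server.Put("customers/1", new Customer { Id = "customers/1", Name = "Oren", Email = "oren@example.com" });
        server.Put("orders/1", new Order { Id = "orders/1", Customer = new DenormalizedReference { Id = "customers/1", Name = "Oren" } });

        var session = new ToySession(server);
        if (useInclude) session.Include("Customer.Id");
        var order = session.Load<Order>("orders/1");
        session.Load<Customer>(order.Customer.Id);
        return server.Requests;
    }
}
```

In this sketch, `IncludeDemo.CountRequests(false)` makes two round trips and `IncludeDemo.CountRequests(true)` makes one. The calling code is identical in both cases except for the include hint, which is the whole point of Frank's suggestion.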

We get to keep the separate model, and we can manipulate how we are loading associations easily. I really like this solution.


Tags:

Raven

Comments

12 Aug 2010
07:55 AM

Roy

Don't you mean "?include=Customer" instead of "?include=Company"?

12 Aug 2010
07:59 AM

Ayende Rahien

Roy,

Yes, fixed, thanks

12 Aug 2010
08:55 AM

Louis Haußknecht

I'm, a bit confused by the GET url: GET /docs/users/oren?include=Customer.Id

To retrieve the order with the customer included I'd use

GET /docs/orders/9432?include=Customer.Id

12 Aug 2010
08:58 AM

Thomas Eyde

Isn't there a way to get rid of the string argument in Include()? It looks funny compared to the typesafe Load<T>().

12 Aug 2010
08:59 AM

Ayende Rahien

Thomas,

Yes, there is a Lambda option as well

12 Aug 2010
09:09 AM

Ayende Rahien

Louis,

Damn, I have a lot of typos in this post, fixed, thanks.

12 Aug 2010
09:26 AM

Adam

Will this work for indexes as well? So I can do

var orders = session
    .Include("Customer.Id")
    .Query<Order>("orders_all");

and get all customers loaded for all orders.

12 Aug 2010
09:33 AM

Ayende Rahien

Adam,

Yes

12 Aug 2010
09:36 AM

Adam

Great feature, love it, thanks

12 Aug 2010
11:23 AM

Jonty

What is an example of a case where you would need the email from the orders? Wouldn't you create a new document for that purpose? This seems like it might be encouraging bad design decisions.

12 Aug 2010
11:48 AM

Ayende Rahien

Jonty,

A common case would be if you need to notify the customer about delay in the order

12 Aug 2010
11:54 AM

tucaz

I´m not a NoSql user yet, but I have a question. If the type of Customer object in Order class is a DenormalizedReference is is possible for one to expect getting the full customer when calling for order.Customer somehow?

I understood that you can pre-fetch the Customer and then access it through Session.Load but then we would need to 1) Keep a reference to the session object in Order instance to encapsulate the GetCustomer() method to return the full Customer or 2) Let the caller know that order.Customer will never be entire filled and if he wants it he will need to call Session.Load(order.Customer.Id).

I´m think of a way to both communicate the user that he has everything available when he needs and still don´t leak infrastructure aspects to the domain.

12 Aug 2010
11:59 AM

Ayende Rahien

Tucaz,

In short, no. See the discussion on the model changes required to make this work, and why I don't like them.

The easy way to handle this is to use an ambient session, which is the general recommendation anyway.

And I like the fact that you need to take an extra step. You don't want to be able to reference stuff outside your own aggregate easily. See the discussion on Root Aggregates in DDD

12 Aug 2010
13:59 PM

Brian Vallelunga

Any chance we'll be able to include a collection of Ids? I think it might be useful in situations like this:

public class Customer {
    public string Id { get; set; }
    public string[] OrderIds { get; set; }
}

I'd want to be able to preload all orders here.

12 Aug 2010
14:23 PM

Ayende Rahien

Brian,

This scenario just works

12 Aug 2010
14:57 PM

Charles Strahan

Very neat, Ayende. How would you support "including" a chain of references - such as Order.Customer.BestFriend.Etc?

-Charles

12 Aug 2010
15:07 PM

Benjamin

Hi Ayende,

Good stuff! With reference to Brian Vallelunga's question, how does a query (to preload all orders) look?

Thanks!

12 Aug 2010
15:09 PM

Ayende Rahien

Benjamin,

Exactly the same.

Include("Orders")

12 Aug 2010
15:10 PM

Ayende Rahien

Charles,

I wouldn't support it, I can't think of an actual scenario where you need it in a document database.

12 Aug 2010
16:19 PM

Jonty

"A common case would be if you need to notify the customer about delay in the order"

That sounds like a search screen where the rows would have the email in them already - ie a different document.

12 Aug 2010
18:32 PM

Ayende Rahien

Jonty,

That might be the case, yes, and it might also be the case that it is a problem with the way we model stuff.

But that is a features that a lot of people wanted

13 Aug 2010
00:41 AM

jdn

Thanks for accepting override patches.

13 Aug 2010
08:33 AM

cowgaR

first thing, changing the order of chaining methods would bring IMO much better meaning of what you were trying to accomplish.

var order = session.Load<Order>("orders/1").Include("Customer.Id");

second, if all you're after is having to hit the DB just once when retrieving the Customer document for known Order.Id, why don't make API to have the Customer document loaded in one straight call?

var customer;
var order = session.Include("Customer.Id", out customer).Load<Order>("orders/1");

for third, I would again change the ordering of methods so...

var customer;
var order = session.Load<Order>("orders/1").Include("Customer.Id", out customer);

just an idea, I know you're prove me wrong :)

13 Aug 2010
09:36 AM

Ayende Rahien

cowgaR,

Regarding the method ordering.

Sure, I would like that.

Now make it work when you don't include stuff as weel.

var person = session.Load(Person)

var person = session.Load(Person).Include("Customer")

Regarding the out param, good idea

13 Aug 2010
13:34 PM

oharab

If you changed the API slightly could you have the order like cowgaR suggests.

var person=session.Get(Person).Load();

var person=session.Get(Person).Include("Customer").Load();

13 Aug 2010
14:35 PM

Ayende Rahien

oharab,

That would make the default case (no includes) much uglier.

17 Aug 2010
00:19 AM

Daniel Steigerwald

I am not sure, if denormalized references are not actually premature optimization. Why should user care about minified models?

I know, it's stored twice, still...

Aggregate is class with Id

public class Order
{
    public string Id { get; set; }
    public string Name { get; set; }
    public Customer Customer { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

What's wrong with loading Order with whole Customer? I suppose it's explicit enough. It should work also for storing.

Instead of Hibernate default laziness, Raven can eager load everything.

Maybe it is stupid idea, but I would like to hear your opinion.

17 Aug 2010
00:43 AM

Daniel Steigerwald

I hope I understand all consequences related to sql joins, especially lazy evaluated. I am still newbie, but as I see it, raven document database is about eagerness (as opposite to sql laziness).

We have to create index before we can use it. Fine. We can put documents into db without scheme, super fine. So we should be able to load whole objects graphs in one step as well.

This code:

var order = session
    .Include("Customer.Id")
    .Load<Order>("orders/1");

Is equivalent to afore mentioned, in case laziness is forbidden.

17 Aug 2010
00:59 AM

Daniel Steigerwald

|This creates a misleading API, making people think that things are normalized when they aren’t.

It suppose it is implementation detail. Remember that object with id contained in another object with id is stored twice is easy.

|It is going to bring back ALL the problems associated with lazy loading (worse, it is going to bring back all the problems associated with EF 1.0 lazy loading).

So disallow lazy load at all. I don't need it anyway.

|It goes directly against the way I believe you should work with a document database.

What's wrong with hypertext documents? They are still documents.

Include is nice feature, but soon enough my code probably will be full of includes.

PS: Maybe I overlooked something (or everything) ;) It was just written brainstorming :)

17 Aug 2010
05:57 AM

Ayende Rahien

Daniel,

And Customer has a reference to Company, which has reference to Products, which has reference to...

In other words, you loaded the entire database.

It isn't premature, it is something that you have to deal with

17 Aug 2010
05:58 AM

Ayende Rahien

Daniel,

Yes, that is pretty much the point. Because you want to be able to control this for each scenario.

There is no one scenario that fit all

17 Aug 2010
06:01 AM

Ayende Rahien

Daniel,

You can't say it is an implementation detail, not when the impact is making remote calls.

And you can't disallow lazy loading, not when I consider this lazy loading as well:

session.Load(order.Customer.Id);

Hypertext docs are great, but you only read ONE doc at a time.

With DocDB documents, you may want to access more than that

16 Sep 2010
03:30 AM

Andres

What about index based join?

Like this:

Map:

from doc in docs

where doc["@metadata"]["Raven-Entity-Name"] == "Products" || doc["@metadata"]["Raven-Entity-Name"] == "ProductInputs"

select new {

Code = doc["@metadata"]["Raven-Entity-Name"] == "Products" ? doc.Code : doc.ProductCode,

Input = doc["@metadata"]["Raven-Entity-Name"] == "Products" ? doc.FirstInput : doc.Input

};

Reduce:

from result in results

group result by result.Code into g

select new

{

Code = g.Key,

Count = g.Count(),

TotalInputs = g.Sum(x => x.Input ?? 0)

}

16 Sep 2010
10:58 AM

Ayende Rahien

Andres,

While you can make this work, I am not quite sure what is the purpose. Especially in the context of includes.

16 Sep 2010
11:32 AM

Andres

That, maybe the includes and other denormalizations can be done by indexes.

16 Sep 2010
11:41 AM

Ayende Rahien

Why would this be beneficial?

16 Sep 2010
13:21 PM

Andres

It is faster and simpler than triggers and than non-intuitive queries like this:

var order = session.Include("Customer.Id").Load<Order>("orders/1");

(magic string, and how you now that you are loading a Customer?)

But Raven index syntax is not enough expressive. Doesn't it?

Sorry about my bad English.

Oren Eini

CEO of RavenDB
