RavenDB: Includes
When I set out to build RavenDB, I had a very clear idea about what I wanted to do. I wanted to build an integrated, opinionated, solution. Something that will do the right thing most of the time, and let you override that if you really want to.
One of the things that really drove the design was 6 – 7 years of experience in building applications based on RDBMS using ORMs. Let me put it gently, I am… well acquainted with the problems that people may run into when they use an ORM. One of the things that I wanted to avoid was duplicating the possibility of error with RavenDB.
One of the major design decisions that I made was to disallow associations between documents. This is part of the core design of the system.
Let us take the following example:
As you can see, we have two documents, an Order and a Customer. The order references a customer, but unlike in RDBMS, we use a denormalized reference, with both the Id and the Name of the customer stored inside the Order document. That is advantageous because it allows us to perform most operations on the Order document without having to load the Customer document.
From the C# model, it looks like this:
public class Order
{
    public string Id { get; set; }
    public Address ShippingAddress { get; set; }
    public Address BillingAddress { get; set; }
    public DenormalizedReference Customer { get; set; }
}

public class DenormalizedReference
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}
Note that there isn’t a direct reference between the Order and the Customer. Instead, Order holds a DenormalizedReference, which holds the interesting bits from Customer that we need to process requests on Order.
So far so good, but, and this is important, you can’t always set things up this way. There is a set of cases where you do want to be able to access the associated document.
Well, that is easy enough, isn’t it? All we need to do is load it:
var order = session.Load<Order>("orders/9432");
var customer = session.Load<Customer>(order.Customer.Id);
This is simple, easy to read, easy to understand, and makes me want to curl into a ball and weep. The problem is, of course, that this is going to generate two calls to Raven. And if there is one thing that I pay attention to, it is the number of remote calls that I am making.
I started to think about how I can make this scenario work better, and I came up with the following design.
Given the two documents
Then for GET /docs/orders/9432
And for GET /docs/orders/9432?include=Customer.Id
Note that in the second case, we get the full customer data merged into the order document.
From an implementation perspective, this would be very easy to do. The problem is how to represent this in the client API. We had a very interesting discussion on the topic on the mailing list.
Let me explain the problem in detail. Given the C# classes above, how do you express this notion of the include? You can’t use the Order model above, because that Customer property is going to be of type DenormalizedReference. We can’t make that property of type Customer, either, because then the Customer data would be embedded inside the Order document, which isn’t what we wanted.
On the mailing list, a lot of proposals were raised. The one that seemed to be the most popular was to drop the 1:1 mapping between the C# model and the document model and move to something like this:
[RootAggregate]
public class Order
{
    public string Id { get; set; }
    public string Name { get; set; }
    public Customer Customer { get; set; }
}

[RootAggregate]
public class Customer
{
    public string Id { get; set; }
    [Denormalized]
    public string Name { get; set; }
    public string Email { get; set; }
}
And then make the client API smart enough to understand the attributes. The model above would generate the same documents as the previous model, but would make it much easier to work on features such as this. This way, we can normally access the data that is embedded in the document, but also include the associated document when we need it.
There are several problems here:
- This creates a misleading API, making people think that things are normalized when they aren’t.
- It is going to bring back ALL the problems associated with lazy loading (worse, it is going to bring back all the problems associated with EF 1.0 lazy loading).
- It goes directly against the way I believe you should work with a document database.
But I couldn’t think of any other way, nor could anyone else.
Until Frank Schwieterman came to our rescue:
Maybe rather than join the documents into one result, such a request would cause the 'joined' entities to be preloaded instead.
From the client API perspective, I would do the joined load of the user object, and the user is returned in its original form. But now the session has the customer object preloaded, so when I try to load the customer object via the client API no request is made to the server. From the caller's perspective, the only change to usage has been the preload hint passed in the original request.
Yes!
The problem was (from my perspective) never with the way the model is structured; the problem was that the way the documents were modeled caused a performance problem. Frank’s suggestion completely eliminated that issue.
It took some interesting coding to get it to work properly, but essentially, it is just an application of the Future usage for loading large object graphs in NHibernate. Now we can do:
var order = session
    .Include("Customer.Id")
    .Load<Order>("orders/1");
var customer = session.Load<Customer>(order.Customer.Id);
And this code will only go to the server once!
We get to keep the separate model, and we can manipulate how we are loading associations easily. I really like this solution.
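The preloading idea can be sketched without RavenDB at all. What follows is a minimal, self-contained illustration, not RavenDB's actual implementation: `FakeServer`, `SketchSession`, and the `RemoteCalls` counter are hypothetical names invented for this example. The include is passed as a delegate that extracts the referenced id rather than as a "Customer.Id" string, and "customers/1" is an assumed document id.

```csharp
using System;
using System.Collections.Generic;

public class DenormalizedReference { public string Id { get; set; } public string Name { get; set; } }
public class Order { public string Id { get; set; } public DenormalizedReference Customer { get; set; } }
public class Customer { public string Id { get; set; } public string Name { get; set; } public string Email { get; set; } }

// Hypothetical stand-in for the remote document store; it counts round trips.
public class FakeServer
{
    private readonly Dictionary<string, object> docs = new Dictionary<string, object>();
    public int RemoteCalls { get; private set; }

    public void Put(string id, object doc) { docs[id] = doc; }

    // One round trip: returns the requested document and, when an include
    // selector is given, the document it points to, side by side (not merged).
    public Dictionary<string, object> Get<T>(string id, Func<T, string> include = null)
    {
        RemoteCalls++;
        var result = new Dictionary<string, object> { { id, docs[id] } };
        if (include != null)
        {
            var includedId = include((T)docs[id]);
            result[includedId] = docs[includedId];
        }
        return result;
    }
}

// Hypothetical session: everything the server returns is tracked by id,
// so a later Load for an included document never leaves the process.
public class SketchSession
{
    private readonly FakeServer server;
    private readonly Dictionary<string, object> loaded = new Dictionary<string, object>();

    public SketchSession(FakeServer server) { this.server = server; }

    public T Load<T>(string id, Func<T, string> include = null)
    {
        if (!loaded.ContainsKey(id))
        {
            foreach (var pair in server.Get<T>(id, include))
                loaded[pair.Key] = pair.Value;
        }
        return (T)loaded[id];
    }
}

public static class Program
{
    public static void Main()
    {
        var server = new FakeServer();
        server.Put("customers/1", new Customer { Id = "customers/1", Name = "Example Customer", Email = "customer@example.com" });
        server.Put("orders/1", new Order
        {
            Id = "orders/1",
            Customer = new DenormalizedReference { Id = "customers/1", Name = "Example Customer" }
        });

        var session = new SketchSession(server);
        var order = session.Load<Order>("orders/1", o => o.Customer.Id);
        var customer = session.Load<Customer>(order.Customer.Id);

        // The second Load is answered from the session, so this prints 1.
        Console.WriteLine(server.RemoteCalls);
    }
}
```

Dropping the include selector from the first `Load` in this sketch brings the counter back to two, which is the two-call behavior described earlier. The Order class itself is unchanged in both cases; only the loading hint differs.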
Comments
Don't you mean "?include=Customer" instead of "?include=Company"?
Roy,
Yes, fixed, thanks
I'm a bit confused by the GET url: GET /docs/users/oren?include=Customer.Id
To retrieve the order with the customer included I'd use
GET /docs/orders/9432?include=Customer.Id
Isn't there a way to get rid of the string argument in Include()? It looks funny compared to the typesafe Load<T>().
Thomas,
Yes, there is a Lambda option as well
Louis,
Damn, I have a lot of typos in this post, fixed, thanks.
Will this work for indexes as well? So I can do
<Order>("orders_all");
and get all customers loaded for all orders.
Adam,
Yes
Great feature, love it, thanks
What is an example of a case where you would need the email from the orders? Wouldn't you create a new document for that purpose? This seems like it might be encouraging bad design decisions.
Jonty,
A common case would be if you need to notify the customer about delay in the order
I'm not a NoSql user yet, but I have a question. If the type of the Customer object in the Order class is a DenormalizedReference, is it possible for one to expect to get the full customer when calling order.Customer somehow?
I understood that you can pre-fetch the Customer and then access it through Session.Load, but then we would need to 1) keep a reference to the session object in the Order instance to encapsulate the GetCustomer() method to return the full Customer, or 2) let the caller know that order.Customer will never be entirely filled and that if he wants it he will need to call Session.Load(order.Customer.Id).
I'm thinking of a way to both communicate to the user that he has everything available when he needs it and still not leak infrastructure aspects into the domain.
Tucaz,
In short, no. See the discussion on the model changes required to make this work, and why I don't like them.
The easy way to handle this is to use an ambient session, which is the general recommendation anyway.
And I like the fact that you need to take an extra step. You don't want to be able to reference stuff outside your own aggregate easily. See the discussion on Root Aggregates in DDD
Any chance we'll be able to include a collection of Ids? I think it might be useful in situations like this:
public class Customer {
}
I'd want to be able to preload all orders here.
Brian,
This scenario just works
Very neat, Ayende. How would you support "including" a chain of references - such as Order.Customer.BestFriend.Etc?
-Charles
Hi Ayende,
Good stuff! With reference to Brian Vallelunga's question, how does a query (to preload all orders) look?
Thanks!
Benjamin,
Exactly the same.
Include("Orders")
Charles,
I wouldn't support it, I can't think of an actual scenario where you need it in a document database.
"A common case would be if you need to notify the customer about delay in the order"
That sounds like a search screen where the rows would have the email in them already - ie a different document.
Jonty,
That might be the case, yes, and it might also be the case that it is a problem with the way we model stuff.
But that is a feature that a lot of people wanted
Thanks for accepting override patches.
first thing, changing the order of chaining methods would bring IMO much better meaning of what you were trying to accomplish.
var order = session.Load<Order>("orders/1").Include("Customer.Id");
second, if all you're after is having to hit the DB just once when retrieving the Customer document for known Order.Id, why don't make API to have the Customer document loaded in one straight call?
var customer;
var order = session.Include("Customer.Id", out customer
for third, I would again change the ordering of methods so...
var customer;
var order = session.Load<Order>("orders/1").Include("Customer.Id", out customer);
just an idea, I know you'll prove me wrong :)
cowgaR,
Regarding the method ordering.
Sure, I would like that.
Now make it work when you don't include stuff as well.
var person = session.Load(Person)
var person = session.Load(Person).Include("Customer")
Regarding the out param, good idea
If you changed the API slightly could you have the order like cowgaR suggests.
var person=session.Get(Person).Load();
var person=session.Get(Person).Include("Customer").Load();
oharab,
That would make the default case (no includes) much uglier.
I am not sure, if denormalized references are not actually premature optimization. Why should user care about minified models?
I know, it's stored twice, still...
Aggregate is class with Id
public class Order
{
}
public class Customer
{
}
What's wrong with loading Order with whole Customer? I suppose it's explicit enough. It should work also for storing.
Instead of Hibernate default laziness, Raven can eager load everything.
Maybe it is stupid idea, but I would like to hear your opinion.
I hope I understand all consequences related to sql joins, especially lazy evaluated. I am still newbie, but as I see it, raven document database is about eagerness (as opposite to sql laziness).
We have to create index before we can use it. Fine. We can put documents into db without scheme, super fine. So we should be able to load whole objects graphs in one step as well.
This code:
var order = session
<Order>("orders/1");
Is equivalent to afore mentioned, in case laziness is forbidden.
|This creates a misleading API, making people think that things are normalized when they aren’t.
I suppose it is an implementation detail. Remembering that an object with an id contained in another object with an id is stored twice is easy.
|It is going to bring back ALL the problems associated with lazy loading (worse, it is going to bring back all the problems associated with EF 1.0 lazy loading).
So disallow lazy load at all. I don't need it anyway.
|It goes directly against the way I believe you should work with a document database.
What's wrong with hypertext documents? They are still documents.
Include is nice feature, but soon enough my code probably will be full of includes.
PS: Maybe I overlooked something (or everything) ;) It was just written brainstorming :)
Daniel,
And Customer has a reference to Company, which has reference to Products, which has reference to...
In other words, you loaded the entire database.
It isn't premature, it is something that you have to deal with
Daniel,
Yes, that is pretty much the point. Because you want to be able to control this for each scenario.
There is no one scenario that fits all
Daniel,
You can't say it is an implementation detail, not when the impact is making remote calls.
And you can't disallow lazy loading, not when I consider this lazy loading as well:
session.Load(order.Customer.Id);
Hypertext docs are great, but you only read ONE doc at a time.
With DocDB documents, you may want to access more than that
What about index based join?
Like this:
Map:
from doc in docs
where doc["@metadata"]["Raven-Entity-Name"] == "Products" || doc["@metadata"]["Raven-Entity-Name"] == "ProductInputs"
select new {
};
Reduce:
from result in results
group result by result.Code into g
select new
{
}
Andres,
While you can make this work, I am not quite sure what is the purpose. Especially in the context of includes.
That, maybe the includes and other denormalizations can be done by indexes.
Why would this be beneficial?
It is faster and simpler than triggers and than non-intuitive queries like this:
var order = session.Include("Customer.Id").Load<Order>("orders/1");
(magic string, and how do you know that you are loading a Customer?)
But Raven index syntax is not expressive enough, is it?
Sorry about my bad English.