Tertiary includes in RavenDB
For a while, I was absolutely against this. In fact, I refused to implement it several times when asked, because it is complicated to do and indicates design problems in your domain model.
During a course today, I found out that I might want to refuse, but the feature has actually been in the product for the last two+ years. We just didn’t know about it.
But what are tertiary includes in the first place?
Let us imagine that we have something like this:
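The diagram that originally appeared here is not reproduced. The following is only a sketch of the model that the rest of the post implies. Customer, Lines, Product and Location are the names the post uses; the class name OrderLine and the other properties (Id, Name, Quantity, StreetAddress) are assumptions added for illustration.

```csharp
using System.Collections.Generic;

// Sketch of the document model implied by the post. References between
// documents are stored as string ids (for example "customers/1").
public class Order
{
    public string Id { get; set; }
    public string Customer { get; set; }   // id of the Customer document
    public List<OrderLine> Lines { get; set; } = new List<OrderLine>();
}

public class OrderLine                      // class name is an assumption
{
    public string Product { get; set; }    // id of the Product document
    public int Quantity { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }       // assumed property
}

public class Product
{
    public string Id { get; set; }
    public string Name { get; set; }       // assumed property
    public string Location { get; set; }   // id of the Location document
}

public class Location
{
    public string Id { get; set; }
    public string StreetAddress { get; set; } // assumed property
}
```

Customer and Product are one hop away from the Order, while Location is reachable only through a Product. That extra hop is what the rest of the post is about.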
Now, I want to load an order, but also show its Customer and Products. Those are secondary includes, and are very easy to do with RavenDB:
var orders = session.Query<Order>()
    .Include("Customer")
    .Include("Lines,Product")
    .ToList();
There is also a strongly typed option for this, of course.
What this does is instruct RavenDB to load the Customer and the associated Products into the session, so when you do something like this:
var cust = session.Load<Customer>(order.Customer);
The value will be loaded from the session cache, without going to the server. As I said, this is a feature that we have had for quite a while, and it is a really nice one, because it drastically reduces the number of queries that you have to make.
The problem is that some people want to take it one step further. They want to be able to search on an Order, but also load the Location of a Product. I don’t really like this, and as I said, when asked for this feature, I consistently said that it isn’t there because it represents a remnant of relational thinking in your design.
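To make the caching behaviour concrete, here is a toy, in-memory sketch of the idea. It is not the RavenDB client API: ToyServer, ToySession, IncludeDemo and all of their members are hypothetical names invented for this illustration, and the RoundTrips counter simply stands in for network requests.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the server: stores documents and counts requests.
public class ToyServer
{
    private readonly Dictionary<string, object> _docs = new Dictionary<string, object>();
    public int RoundTrips { get; private set; }

    public void Put(string id, object doc) => _docs[id] = doc;

    // A single request returns the document plus every document that the
    // include selector names.
    public List<KeyValuePair<string, object>> Get(string id, Func<object, IEnumerable<string>> includes)
    {
        RoundTrips++;
        var doc = _docs[id];
        var result = new List<KeyValuePair<string, object>>
        {
            new KeyValuePair<string, object>(id, doc)
        };
        if (includes != null)
            foreach (var includedId in includes(doc))
                result.Add(new KeyValuePair<string, object>(includedId, _docs[includedId]));
        return result;
    }
}

// Hypothetical stand-in for the session: caches everything a request returns.
public class ToySession
{
    private readonly ToyServer _server;
    private readonly Dictionary<string, object> _cache = new Dictionary<string, object>();

    public ToySession(ToyServer server) { _server = server; }

    public T Load<T>(string id, Func<T, IEnumerable<string>> includes = null)
    {
        if (!_cache.ContainsKey(id))
        {
            Func<object, IEnumerable<string>> selector = null;
            if (includes != null) selector = doc => includes((T)doc);
            foreach (var pair in _server.Get(id, selector))
                _cache[pair.Key] = pair.Value; // included documents land in the cache too
        }
        return (T)_cache[id];
    }
}

// Trimmed-down documents, just enough for the demo.
public class Customer { public string Name { get; set; } }
public class Order { public string Customer { get; set; } }

public static class IncludeDemo
{
    // Returns how many requests were needed to load an order and its customer.
    public static int Run()
    {
        var server = new ToyServer();
        server.Put("customers/1", new Customer { Name = "Customer One" });
        server.Put("orders/1", new Order { Customer = "customers/1" });

        var session = new ToySession(server);
        var order = session.Load<Order>("orders/1", o => new[] { o.Customer });
        var cust = session.Load<Customer>(order.Customer); // answered from the cache
        return cust == null ? -1 : server.RoundTrips;
    }
}
```

In this sketch, IncludeDemo.Run() returns 1: the second Load call never reaches the toy server because the customer arrived with the order.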
But as it turned out, we do support this, although quite by accident.
The reason is quite simple: we evaluate Includes only after we evaluate the TransformResults function, which means that the TransformResults function gets to choose whatever you want to include. Here is how it works:
TransformResults = (database, results) =>
    from result in results
    select new
    {
        Order = result,
        Locations = result.Lines.Select(x => database.Load<Product>(x.Product).Location)
    };
And then, you can just ask to Include("Locations"), and you are pretty much set.
Except that this is a really awkward thing to do, and I don’t really like it at all.
Sure, I don’t like this feature, but people will use it, and if it is already there, we might as well make it elegant. Therefore, we now have the option of doing:
TransformResults = (database, results) =>
    from result in results
    let _ = database.Include(result.Lines.Select(x => x.Product))
    select result;
I think you’ll agree that this is much nicer all around. It tells the server to include the data, without us needing to explicitly ask for it from the client.
Comments
Just out of curiosity, how would you model the domain to avoid this? Why is it poor to have location as its own document and need to know information about it from an order?
Matt, I don't like this because it means that your operations don't match your model. If you need to do stuff to locations from the order, why isn't it also associated with the order?
Ok yep can definitely see that, I was more thinking of a read scenario than a write.
If you did associate locations with the order, does that mean you would then have to ensure that the "denormalized" location data is kept up to date whenever a location changes? Assuming you have 20 different models that require location data within them, would that mean writing the same info 20 separate times in application code (write the location change to Model 1, now Model 2, now Model 3, etc.), or is there some way Raven can sync them automatically?
Tyler, You see the problem, right? That is the modeling issue. Either locations are important to the order for another reason, or they aren't. If they are important, they should be with the order, if they aren't, why do you need tertiary include for it?
I definitely see that, I'm just trying to wrap my head around the maintenance of not having "one source of truth", but twenty sources of truth. I can envision developers that aren't aware of those twenty sources performing updates to some of them, but not all, causing data disparity. Then we have a maintenance nightmare where "location/1" has a different Street Address in it than what is in the Order model for the "location/1" Street Address.
...or are you saying that you should do something like this with the Orders model:
Lines: [ { Product: "products/1", Quantity: 3, Location: "locations/1234" } ]
In this case, when the location of a product changes, does that mean you would have to update the Product model AND all of the Orders that use that product to use that new location?
TLDR; If I want to change the location of "products/1" to "location/5555", do I have to manually update the Product model AND the Orders to that new location?
Tyler, snort, see http://ayende.com/blog/156353/entities-associations-point-in-time-vs-current-associations
Without diving into tertiary relations, since you bring up the temporal issue again: Can you walk through what happens when you want to change orders/1 line for products/1 quantity from 3 to 4? Does orders/1 become orders/2, or do you edit it to have a 3rd line where the original gets marked off with an end date? How do you dive through the indexing to ensure the correct data state for a particular point in time is returned?
I know how to do this with relational data & SQL, just curious how a NoSQL solution changes perspective on this problem.
Steve, For orders? Orders generally come in two shapes. One is mutable and not really important; it has no real meaning except as a container for the stuff the user wants to buy.
The other is fixed: after you have processed the order, it cannot be changed.
I don't think it was a good decision to make this an 'official' feature by posting it on your blog. It's still far too complicated, so it will produce lots of confusion when people try to use it. If at all, I would rather have it support syntax like this:
.Include("Lines,Product,Location")
But that would be hard to implement I guess.