Thinking about the Entity Framework
So, a few more things about the Entity Framework that came up today:
- One thing that worries me more than a little is that the Entity Framework has this three-layer model, but all the focus is currently on the 1:~1 (1:1 or nearly so) approach. This means that the interesting aspect of the Entity Framework, radically different models, is the least visited code path.
At best, this means that it wouldn't be supported by the designer; at worst, it wouldn't be supported at all. In either case, pain is waiting for the implementors.
- The Entity Framework doesn't support lazy loading, which means that your code is going to have stuff like "customer.Orders.Load()" scattered throughout it. Microsoft is saying that they are doing this on purpose, in order to make it explicit when you are going to the database.
Frankly, I consider this a cop-out on the lazy loading problem. Now Microsoft can blame the developers for this problem. This is something that should be solved by the framework, not by shifting the blame elsewhere.
- This brings me to another issue: the Entity Framework's eager loading facility is a Cartesian product nightmare. Trying to load a deep object graph will return a very wide result. NHibernate, on the other hand, has MultiQuery/MultiCriteria, which can be used to load a deep object graph with a single roundtrip, but with multiple result sets, drastically reducing the amount of data that goes on the wire.
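To put rough numbers on the Cartesian product point, here is a minimal back-of-the-envelope sketch. Every size and column count in it is a made-up assumption, and it assumes a Post with two sibling collections (Comments and Tags; Tags is hypothetical and not part of the examples in this post). It only counts how many cells would come back over the wire; it does not talk to a database.

```python
# Hypothetical sizes, chosen only for illustration.
BLOGS = 10
POSTS_PER_BLOG = 20
COMMENTS_PER_POST = 15
TAGS_PER_POST = 5

# Hypothetical number of columns per entity.
COLUMNS = {"blog": 6, "post": 8, "comment": 5, "tag": 3}


def single_join_cells():
    """Cells returned by one query joining Blog, Post, Comment and Tag."""
    # The two sibling collections on Post multiply each other's rows.
    rows = BLOGS * POSTS_PER_BLOG * COMMENTS_PER_POST * TAGS_PER_POST
    # Every row carries the columns of all four entities.
    width = sum(COLUMNS.values())
    return rows * width


def multi_result_set_cells():
    """Cells returned by three parent/child queries sent in one round trip."""
    posts = BLOGS * POSTS_PER_BLOG
    blog_posts = posts * (COLUMNS["blog"] + COLUMNS["post"])
    post_comments = posts * COMMENTS_PER_POST * (COLUMNS["post"] + COLUMNS["comment"])
    post_tags = posts * TAGS_PER_POST * (COLUMNS["post"] + COLUMNS["tag"])
    return blog_posts + post_comments + post_tags


if __name__ == "__main__":
    print("single join:      ", single_join_cells())
    print("multi result sets:", multi_result_set_cells())
```

Each parent/child query still repeats the parent columns on every child row, so there is duplication either way, but the sibling collections no longer multiply each other.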
Comments
Why oh why would they make you use a Load method! That's just crap. Why bother having entities at all?
"NHibernate, on the other hand, has MultiQuery/MultiCriteria, which can be used to load a deep object graph with a single roundtrip, but with multiple result sets, drastically reducing the amount of data that goes on the wire."
A deep graph in a single roundtrip? You gotta be kidding. :) As soon as a graph has a branch, you need subqueries, as joins will kill your performance pretty badly because of all the empty columns.
In straight, linear graphs (no branches), you might get away with this, but still, it will give a lot of columns to wade through.
Another thing is that you can't optimize fetches of nodes based on a small set of values: you always have to rely on the filter of the parent to formulate the subset of the child node. This can hurt performance pretty badly if the filter is expensive and gives few results.
So all in all, I don't think it pays off to have everything crammed into a single roundtrip.
Frans,
No, actually, I am not kidding, at least I hope so.
session.CreateMultiQuery()
.Add("from Blog b left join fetch b.Posts")
.Add("from Post p left join fetch p.Comments")
.List();
Here is one very trivial example. It would send the two queries to the database in a single roundtrip, and they return multiple result sets.
There is still too much duplicated data that is being returned, BTW, but the cost of that vs. Cartesian product is huge.
Bleh. I just don't like what MS is doing with Entities at all. Still way too much "the database is first" design. Yeah, we all know about leaky abstractions, but this is taking it too far in the other direction. I agree with others that MS is, for the most part, taking ORM backward rather than forward.
Thankfully, Linq itself is still cool, and highly extensible. You can still get the niceness of Linq queries with the power of NHibernate.
"Here is one very trivial example, it would send the two queries to the database, in a single roundtrip, which return multiply result set back.
There is still too much duplicated data that is being returned, BTW, but the cost of that vs. Cartesian product is huge."
Ok, so it does send multiple queries. I was under the impression it would send a single query. But how does the second query filter on b (say we specify only blog '1')? Or are we talking about different things?
The product isn't necessary: you merge the results in memory using hashes. The advantage is that you can optimize for smaller sets of parents, because you can cut out joins and filter directly using IN clauses with values instead of a subquery with the parent filter. For example, if you have 1000 blogs and you want the graph of blog '1', your second query needs that filter as well, otherwise you'll get back all posts of all blogs. You have to base the filter of the second query on the size of the result set of the first. This isn't doable in a single roundtrip, but it could save a lot of time in the second query, so it's definitely worth the extra 5 milliseconds it takes to send a new query over the wire. :)
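The merge described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tables are hypothetical in-memory lists, `fetch_blogs` and `fetch_posts_for` are made-up stand-ins for the two SQL queries, and no real O/R mapper API is shown.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the Blog and Post tables.
BLOGS = [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]
POSTS = [
    {"id": 10, "blog_id": 1, "title": "first"},
    {"id": 11, "blog_id": 1, "title": "second"},
    {"id": 12, "blog_id": 2, "title": "other"},
]


def fetch_blogs(predicate):
    """First query: apply the (possibly expensive) parent filter once."""
    return [dict(row) for row in BLOGS if predicate(row)]


def fetch_posts_for(blog_ids):
    """Second query: the equivalent of WHERE blog_id IN (...) with literal ids."""
    wanted = set(blog_ids)
    return [dict(row) for row in POSTS if row["blog_id"] in wanted]


def load_graph(predicate):
    """Load blogs and their posts in two round trips, merged in memory."""
    blogs = fetch_blogs(predicate)
    posts = fetch_posts_for(blog["id"] for blog in blogs)

    # Hash merge: group the children by foreign key, then attach to each parent.
    posts_by_blog = defaultdict(list)
    for post in posts:
        posts_by_blog[post["blog_id"]].append(post)
    for blog in blogs:
        blog["posts"] = posts_by_blog[blog["id"]]
    return blogs


if __name__ == "__main__":
    print(load_graph(lambda blog: blog["id"] == 1))
```

The second query only sees the ids that the first one returned, so it needs neither the parent join nor the parent filter.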
Frans,
It is somewhat awkward, but you basically specify the constraint on each query:
session.CreateMultiQuery()
.Add("from Blog b left join fetch b.Posts where b.Id = 1")
.Add("from Post p left join fetch p.Comments where p.Blog.Id = 1")
.List();
Ah, you have to do it yourself. Ok, I didn't know that, I assumed NHibernate was smart enough to fill it in for you ;) :P
No, that is for manual stuff.
I haven't had the chance to integrate similar functionality into the NHibernate core.
Does that multiple result set feature work with Oracle, or just SQL Server? I was just trying to figure out how to send multiple commands in a single trip to an Oracle database, and the only thing Google could tell me was that you need to build a stored procedure that returns a REF CURSOR. I can't imagine that solution would work for NHibernate, so I'd love to know how they do it.
Joshua,
I'm not an Oracle expert by any means, but I know that some guys at Oracle are working on adding this capability to Oracle.
This is more complex in Oracle because you need an anonymous SQL block that returns multiple output cursors. This should come up on the mailing list; the guy from Oracle is hanging out there.