What is up with the Entity Framework vNext?
Every now and then I do a quick check on the EF blog, just to see what their status is. My latest peek caused me to gulp. I am not sure where the EF is going with things, I just know that I don’t really like it.
For a start, take a look at the following sample from their Code Only mapping (basically Fluent NHibernate):
.Case<Employee>(
    e => new {
        manager = e.Manager.Id,
        thisIsADiscriminator = "E"
    }
)
There are several things wrong here: “manager” and “thisIsADiscriminator” are strings for all intents and purposes. The compiler isn’t going to check them, they aren’t there to do something, they are just there to avoid being a literal string. But they are strings.
Worse, “thisIsADiscriminator” is a magic string.
Second, and far more troubling, I am looking at this class definition and I cringe:
public class Category{
public int ID {get;set;}
public string Name {get;set;}
public List<Product> Products {get;set;}
}
The problem is quite simple, this class has no hooks for lazy loading. You have to load everything here in one go. Worse, you probably need to load the entire graph in one go. That is scaring me on many levels.
I am not sure how the EF is going to handle it, but short of IL Rewrite techniques, which I don’t think the EF is currently using, this is a performance nightmare waiting to happen.
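To make the "hooks" point concrete, here is a rough sketch of what I mean. It is hand-rolled for illustration only: `CategoryProxy`, the loader delegate and `LazyLoadingDemo` are names I made up, and this is not the code that EF or NHibernate actually generate. The only point is that a virtual property gives a subclass somewhere to intercept the first access.

```csharp
using System;
using System.Collections.Generic;

public class Product
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public class Category
{
    public int ID { get; set; }
    public string Name { get; set; }

    // Virtual, so a subclass can intercept the first access.
    public virtual ICollection<Product> Products { get; set; }
}

// Hand-written stand-in for the kind of proxy an ORM could generate at runtime.
public class CategoryProxy : Category
{
    // Hypothetical loader: given a category ID, fetch its products.
    private readonly Func<int, ICollection<Product>> _loader;
    private bool _loaded;

    public CategoryProxy(Func<int, ICollection<Product>> loader)
    {
        _loader = loader;
    }

    public override ICollection<Product> Products
    {
        get
        {
            if (!_loaded)
            {
                // The first access is the only one that goes to the "database".
                base.Products = _loader(ID);
                _loaded = true;
            }
            return base.Products;
        }
        set
        {
            base.Products = value;
            _loaded = true;
        }
    }
}

public static class LazyLoadingDemo
{
    // Returns how many times the loader ran after two accesses to Products.
    public static int CountLoads()
    {
        int loads = 0;
        var category = new CategoryProxy(id =>
        {
            loads++;
            return new List<Product> { new Product { ID = 1, Name = "Sample" } };
        }) { ID = 42, Name = "Books" };

        int first = category.Products.Count;  // triggers the load
        int second = category.Products.Count; // served from the loaded collection
        return loads;
    }
}
```

With a non-virtual `List<Product>` property, as in the class above, there is nowhere to put that override, which is the whole problem.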
Comments
I'm not so sure there is anything wrong with the first part there.
Yes, "manager" and "thisIsADiscriminator" are essentially strings, but those are column names. At some point they have to be strings.
I think using anon classes like this has some value: 1) you don't have to type the quotes (small I know) and, 2) these types can be used later in the mapping call chain via type inference which could then provide you with compiler checking of those column names if they are referenced again. (This one is something of a gut-feel, I haven't figured the details out.)
I don't have any comment on the second item though.
Oren,
I wrote that post.
So as you can imagine, I'm sorry I made you cringe.
"thisIsADiscriminator" isn't a magic string by itself, it is assigned a constant value, which makes it a 'magic' string, or in this case a discriminator.
EF4 supports pure POCOs using snapshots, but of course we will encourage virtuals etc. so we can handle lazy loading, change tracking etc. via proxies.
So maybe I should just go ahead and re-write the sample illustrating best practices for client code, i.e. Virtuals etc.
It's tempting to think that everything written in every code snippet has some hidden meaning / implications, but sometimes, well, it just doesn't.
Cheers
Alex
I agree about the anonymous type part, but that is only one way to map a type so I don't really care. And they just forgot to put virtual on the properties. If you followed the Entity Framework blog a bit more you would have known that they do lazy loading through proxies just like NHibernate. Maybe do a bit more research than just a cursory glance before you click post.
I have to agree with darren. Just because an example is bad doesn't make the whole thing bad. I'm glad you gave yourself an out by stating that you don't follow it very much. :)
@Alex - I would very much like to see what the best practices are for EF vNext. I'm currently using LINQ to SQL on a project (chosen before EF came out) and wish we were using it differently. Our current usage is based on what we saw originally. Now, of course, the recommendations are quite different.
One of the best things Ayende and others do with NHibernate is provide excellent examples and helpful feedback on how to use the tool. You won't find very many bad examples from those who promote NH. If you want EF to succeed, you should focus on getting a message of excellence out there.
I'm not a huge fan of the relationship mapping (column pointing directly at the Mgr.Id - meh), but wouldn't the corresponding Fluent NH mapping be very similar:
ClassMap <employee {
....
<manager(m => m.Map(x => x.IsEProp))
<janitor(m => m.Map(x => x.IsJProp));
....
}
Now this reminds me of a recent discussion here between Ayende and Alex Yakunin, author of DataObjects ORM - but at least Alex had done some homework before pointing out weaknesses in a 'competitor' product. Since both parties are very attached to their products the arguments become more emotional than rational. Apart from that, I'm sure I remember the issue of lazy loading in EF being brought up and explained here some time ago, so it is strange that it's here again.
Alex,
The problem in supporting non virtual by default is that you are going to hit every performance boundary that there is.
I am not sure what you mean by snapshots, but I can tell you that with NHibernate, not enforcing virtual by default was a decision that we came to regret and we changed that between 1.0 and 1.2
People will forget, and then they will load the entire database, then they call you at some ridiculous hour crying that their production site is down and how EF is to blame because it loads everything.
The end result is going to be a lot of anger about the code only version, because it is full of trip wires with regards to forcing eager loading.
There is a reason why it makes me cringe. The defaults _matter_, and using eager loading by default will KILL most applications.
As for the column names, I don't see the point in making this an anonymous type. You are better off making an honest string out of that column name, at least that way tools like R# can track it.
Darren,
The issue isn't with how they are doing lazy loading, I know how they do that.
The issue is that an eagerly loaded class is something that you should be jumping through hoops to get, not something that is there by default.
Ryan,
I have an issue with specifying the column names in this manner, it is a string, so treat it like a string.
Not having double quotes around it doesn't make it any less of a string
It's sad to see the EF team going down this path. Same as ASP.NET MVC - they needed a Dictionary<string, object>, so instead they just ask for "object" and expect you to use anon types. Silly. It's a hack to get around C# not having tuples or nicer collection literals. They should just be able to write
e => new[] { {"manager", e.ManagerId }, ... }
Unless I'm misunderstanding your gripe about the query, that LINQ syntax is standard for LINQ. It is not particular to EF or to code-only or to Alex.
Oren,
EF v4 does not eagerly load associations. By default you have to explicitly load them by using the PropertynameRef.Load() command. Lazy-loading is optional and has to be enabled on the data context.
Snapshot change tracking means the data context compares values for every property in the graph before saving changes which can get quite slow. So proxies get very useful for change tracking.
The biggest problem with EF proxies is there are no extension points. You cannot have a proxy of a class instance with only the ID property populated which means you have to jump through hoops to assign an association without loading the object from the database. The upcoming foreign keys feature is just a workaround in my opinion. There is also no enum support. Also, there is no way on the data context to require proxies so if you forget to mark a property virtual you silently get a different behavior.
Dmitry
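For readers who have not met the term, the snapshot change tracking Dmitry describes can be sketched roughly like this. `SnapshotTracker`, `Customer` and `SnapshotDemo` are hypothetical names made up for the illustration; this is not EF's implementation, only the general idea of keeping the original values and comparing every property before saving.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string City { get; set; }
}

// Hypothetical tracker for illustration; not an Entity Framework type.
public class SnapshotTracker<T> where T : class
{
    private readonly Dictionary<T, Dictionary<string, object>> _snapshots =
        new Dictionary<T, Dictionary<string, object>>();

    // Record the property values as they were when the entity was loaded.
    public void Attach(T entity)
    {
        _snapshots[entity] = Read(entity);
    }

    // Compare every property with its snapshot. This per-property comparison
    // is the cost that grows with the size of the tracked graph.
    public IList<string> GetChangedProperties(T entity)
    {
        var original = _snapshots[entity];
        var current = Read(entity);
        return current
            .Where(pair => !Equals(pair.Value, original[pair.Key]))
            .Select(pair => pair.Key)
            .OrderBy(name => name)
            .ToList();
    }

    // Read all readable, non-indexed public properties via reflection.
    private static Dictionary<string, object> Read(T entity)
    {
        return typeof(T).GetProperties()
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .ToDictionary(p => p.Name, p => p.GetValue(entity, null));
    }
}

public static class SnapshotDemo
{
    public static IList<string> Run()
    {
        var tracker = new SnapshotTracker<Customer>();
        var customer = new Customer { Id = 1, Name = "Ann", City = "Oslo" };
        tracker.Attach(customer);

        customer.City = "Bergen"; // the only property that changes

        return tracker.GetChangedProperties(customer);
    }
}
```

A proxy, by contrast, can note the change at the moment a virtual setter is called, so nothing has to be compared at save time.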
I personally believe that lazy loading should not be the default behavior of an ORM.
Having lazy loading disabled will not introduce bugs into your application, just potential performance issues and you can optimize these by lazy loading or rethinking your DAL strategy.
Having lazy loading enabled can introduce bugs into your application. In a 3 tier winforms scenario, you jump to an application service to load your domain object, jump back to the client and hit a lazy loaded collection and now im trying to execute a query on a workstation that doesnt even know about a database.
The fix? Force the developer to explicitly say they dont want to lazy load on EVERY mapped collection. NHibernate's lazy-load by default is a premature optimization IMO.
I use NHibernate. I love NHibernate. But I disagree with "you are going to get people crying at night unless you prematurely make everything lazy loaded!!!". This argument is implying that developers should lean on their ORM tool to prevent them from having to think about scaling issues. Naughty boy!
I think this might be an awesome decision for 2-tier web applications, but not everything.
Julie,
My gripe is not with the query, because there IS no query, it is a way of defining mapping.
And as a way of defining mapping, it tries to avoid using literal strings, ending up in just literals. There is no difference, and the syntax looks worse.
Dmitry,
With the class definition that was given in the post, you have to eagerly load, because you have no way to DO lazy loading.
That is a huge performance problem.
P,
Write an ORM or two and then come back and tell me about it.
Trying to force the user to tell you what they want exactly doesn't work, it puts too much burden on the user.
Trying to do eager loading all the time is a performance nightmare, and you will only see it on production, in many cases.
Oren,
The class will be eagerly loaded but the association property will be null unless you explicitly run the ProductRef.Load() method which will replace the List<T> instance.
Oren,
Again, this is all because you haven't been following the changes to EF, and that they made a simple mistake in the blog entry. Dmitry is right that in EF v4 everything is lazy loaded by default, and it is done by using proxies and the properties have to be virtual.
If you go back and visit the blog entry, you will notice that Alex has updated the code sample, so maybe you should do the same to this blog entry?
I am a user of your ORM, and my burden is having to say stop lazy loading everything! :)
And I can optimize my DAL techniques to get around larger quantities of data. I cannot optimize my code around some framework that magically executes a query clientside when im looping through a collection.
[QUOTE]The problem is quite simple, this class has no hooks for lazy loading. You have to load everything here in one go. Worse, you probably need to load the entire graph in one go. That is scaring me on many levels.[/QUOTE]
This isnt scary to me, because I have invested the time to think about and look at the queries that are generated by NHibernate. However, if you use NHibernate as the next "Set it, and forget it!" tool then yes, I suppose that is scary.
To each their own :)
Set it and forget it: http://www.youtube.com/watch?v=lsY6eaKsFW4
Dmitry,
You are kidding me, right?
If that is the case, I am not sure what is scarier. The implications of lazy loading or the implications for the broken model that you are going to have.
Darren,
I don't consider this a simple mistake, not taking into account lazy loaded semantics is BIG.
This isn't the first time that I commented about this exact issue.
P,
default-lazy="false", you are done.
I totally got owned!
"In a 3 tier winforms scenario, you jump to an application service to load your domain object, jump back to the client and hit a lazy loaded collection and now im trying to execute a query on a workstation that doesnt even know about a database."
You're sending entities over a service boundary and thus outside of your UoW. Most often a very bad decision.
Our service boundary is our UoW.
@Oren,
This is exactly how EF v1 works and the default behavior of the data context in v4. You can enable lazy-loading and proxies manually in the constructor by extending the data context class. But if you do that, you need to be responsible to make sure everything is virtual or you will get strange behavior silently. It does not throw exceptions like NHibernate but rather defaults to snapshot/explicit loading behavior.
Dmitry
A huge problem with this explicit loading is that if the data context is shared across multiple functions (say inside an MVC controller), a query can produce different results based on whether the association was loaded or not inside previous method calls. I totally agree that the model is broken.
Dmitry,
This is totally broken, then.
p - the problem is that the client is receiving entities, and not DTOs.
With a proper way to load things eagerly (e.g. with proper prefetch paths which allow you to filter, sort etc.) eager loading isn't that performance intensive.
The thing with lazy loading is that it allows developers to bypass repositories, or a service, and for example allow a GUI developer to cut corners and use lazy loading to fetch data.
Eager loading of data into a model doesn't mean you load everything, you still tell the db what to load. So I'm a little curious why you point to eager loading as something utterly evil.
Frans,
Having a model that forces ONLY eager loading is evil.
LukeB - I understand what you are saying now.
Frans,
[QUOTE]Eager loading of data into a model doesn't mean you load everything, you still tell the db what to load. So I'm a little curious why you point to eager loading as something utterly evil.[/QUOTE]
This was my thought exactly.
@Alex James
I haven't finished all the comments, but you really need to make sure that any example you post on the EF site is written properly. This has long been an issue with Microsoft (just look at the horrible WCF documentation for example).
There are plenty of developers who read posts that come from the mothership as gospel, so you owe it to them to only produce quality examples.
I'm sick of hearing developers say "But I built it just like Microsoft said!" when trying to explain why something crashed and burned.
I agree fully that forcing eager loading is VERY EVIL. It'll just kill off any application in production, especially a web one.
Lazy loading is actually the best default you can use and the most failsafe method, as it does not torture your data bandwidth with all that data you don't need.
However, in EFv1 I missed lazy loading the most. Magic strings and having to know how database is structured and type all those table names? - Horror!
Smart LINQ 2 SQL lazy loading model is soooo ftw.
Eager loading is evil.
But it does not force eager loading unless you are talking about local properties that come from the same table/view. In fact the only way to force eager loading is through the Include (prefetch) extension method in LINQ or do a similar thing in ESQL.
"Having lazy loading enabled can introduce bugs into your application. In a 3 tier winforms scenario, you jump to an application service to load your domain object, jump back to the client and hit a lazy loaded collection and now im trying to execute a query on a workstation that doesnt even know about a database."
P, in my opinion that is bad design. You are using your domain model for UI purposes. That will get you into trouble anyways if your business tier and data tier are deployed separately.
Best design practice in my opinion is to map queries to special DTO's for displaying purposes. And send commands to your business tier, which will actually use your domain model with lazy loading. Since - at least in my applications - most commands to the business tier only operate on one aggregate (for example order, with order lines, delivery settings, etc.), lazy loading will be perfect.
Looking at it from that perspective, then NHibernate does the correct thing. Lazy loading by default.
EF in .net 4.0 is going to be pretty nice, but not perfect. I think people will have to weigh the benefit of integrating nHibernate into a new/existing solution and having to potentially write plumbing code...or get EF "as is" (the bad and good) and have it work automagically with Microsoft's other ADO.NET-type middleware/services.
Frank,
[QUOTE]P, in my opinion that is bad design. You are using your domain model for UI purposes. That will get you into trouble anyways if your business tier and data tier are deployed separately.[/QUOTE]
The application has a rather large aggregate root that is brought down to the client. There is a series of presentation models that are all mapped to different UI components, and each presentation model represents a specific chunk of the aggregate root. After hitting about 10-15 different UI pieces in a wizard like fashion, the aggregate root is flushed to the service boundary, where a unit of work is wrapped around persisting the root and executing commands against the root.
I am not sure why this is bad design? We chose to optimize for a single call to the service boundary and build all of our presentation models from this, rather than call the service boundary 15 times when a user is taken to the next phase of the wizard. The problem is creating the presentation models from the root. If i hit a lazy loaded portion of my root while creating my presentation model its all over.
Please tell me why this is bad design? Please tell me why in every scenario, lazy loading is always better. I wish I had never played devils advocate because now there is going to be endless claims about how bad my example is without actually having any context.
+1 Ayende for the default setting I didnt know about
+1 Frans for saying what I was trying to say, but better
We are probably misunderstanding each other.
I can certainly agree with your decision to return the complete aggregate when you ask the business tier for it.
In my case, I would probably go with a domain model which stays in my business tier and cannot be directly accessed by the UI tier.
For presentation purposes I would have DTOs that represent the complete aggregate. When the UI tier asks for it, I would start a session (or datacontext, etc.), then retrieve the actual domain model entity, convert it to the DTOs, and end the session. That way you won't be bothered by lazy loading or whatsoever in the UI tier.
For executing commands on the business tier, like updating the complete aggregate, I would use about the same steps. (Re)using the DTOs from UI to business tier, start a session, update the actual domain model entity with data from the DTOs (or whatever exciting business case you might execute), flush the session, and close it.
In both cases I can leave lazy loading at its default, and where possible, based on some hard facts, decide where to actually default to eager loading.
I hope this all made sense. And if people have better ideas, or think differently about this, I would love to hear. I'm all for new ideas to improve my own applications.
PS: At my current company, we are unfortunately not using OR/Ms, but I apply somewhat the same strategy even when using DataSets and when the application logic (business tier and data tier) must be centrally located. The rest of my thoughts is based on playing around with NHibernate, LINQ to SQL and such. So, don't consider me to be a big expert on the subject.
"...but I apply somewhat the same strategy even when using DataSets..."
Ouch!!!! I feel your pain brother!
P,
"If i hit a lazy loaded portion of my root while creating my presentation model its all over."
Evict (Detach) the object from the session from your service layer and you won't get that nasty exception.
But that still doesn't solve the problem of the data for the property not being loaded. :)
"You are better off making a honest string out of that column name, at least that way tools like R# can track it"
Excuse my ignorance, What is R#? I went on a small wild goose chase googling it with no apparent result.
Frank,
To be clear, when I say client I am not meaning UI. The client just means the client machine. My domain model is what is loaded through NHibernate and what I ship back and forth between the client and the service layer. However, the domain model is never pushed into the UI layer.
To me the client is a physical layer that is composed of 2 logical layers; the presentation layer (view models & presenter) and the UI layer (controls aka. views). The presenter invokes the service layer, gets a domain object, pushes it into a stateful store, and then maps aspects of the domain model into view models that are tailored for databinding on the UI. The translation from domain model -> view model is where lazy loading can really sting you.
So a solution is to introduce a 3rd type of model that has no connection with NHibernate, and you use just to ship the aggregate root across the wire but what does that buy you? At this point im fighting with a tool that is supposed to be helping me. The alternate solution is to never lazy load any collections and have a fine tuned aggregate root that doesnt kill the system when its loaded/flushed.
I am sorry if I am hijacking this thread, but I really want to make it clear that lazy loading, in some contexts, is not a great idea.
Thilak Nathen,
R# = Resharper
[QUOTE]Evict (Detach) the object from the session from your service layer and you won't get that nasty exception.[/QUOTE]
As you said, this doesnt fix the problem; it just stops the errors from coming up. The fix is just to not lazy load.
Resharper! smacks forehead
For the particular architecture you're talking about, your solution is absolutely correct. It's a disconnected service layer providing "completely loaded" data as per the service contract.
Lazy loading starts to really shine and make more sense in architectures that are more stateful... where there are no explicit service boundaries... where the domain layer is much more interactive all the way up. A lot of things all of a sudden start to fall into place. You can now take advantage of contextful lazy loading, sorting, paging, etc.
Whether I agree with lazy loading being default on an ORM, I'm not too sure. It's almost like saying paging should be there by default.
"Having a model that force ONLY eager loading is evil. "
I don't see that. Eager loading is a way to tell the o/r mapper to load what you want and pass back the graph to the caller so the code in the caller doesn't have the chance to touch the DB. That's essential in code which uses repositories: any object returned from a repository which supports lazy loading bypasses the repository of the element that's lazy loaded.
In fact, I'll go as far as that with a puristic DDD implementation, there's no lazy loading nor is there eager loading, simply because all related entities are always loaded through their repositories: the orders of a customer are read by the order repository, not by touching the Orders property on a customer instance.
I have read 'evil' a couple of times now here, but no-one has given even the slightest argument why. So let's give that one, shall we?
Eager loading is 'evil' if the eager loading system is not allowing you to define the exact graph in detail, as the 'evil' part comes from the fact that you will load data you don't need. That might be inefficient, but in practice, that's just BS in a lot of cases, simply because the DB engine has read these rows anyway.
A solid eager loading system can load related entities with 1 query per graph node. That's much more efficient than lazy loading.
Lazy loading is also 'evil', namely in the situation where you need to load a lot of related elements which cause a lot of queries, which could have serious impact on the performance as well.
So calling one 'evil' over the other is really stupid, as both aren't evil all the time, they're only evil in some situations, namely, the situation where their downsides are noticeable.
I know I've been a big fan of lazy loading as well, however as I built a solid eager loading system later, I understood it's not all black and white: both are less-optimal versions of what's really required: a custom projection of a joined set.
I'll point to an old blog by Matt Warren, one of the Linq to sql developers. It's about spans in objectspaces. The blog talks about the pros and cons of lazy loading and eager loading (spans). If you scroll down you'll read a reaction from me from 2004 which supports your PoV. But as I said, eager loading is IMHO not evil, it's just not always the best choice, and neither is lazy loading (SELECT N+1 problem comes to mind):
blogs.msdn.com/.../90275.aspx
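The query-count difference Frans points at (one query per graph node versus SELECT N+1) can be sketched with a toy example. `FakeDb` is a hypothetical in-memory stand-in that only counts round trips; no real ORM or database is involved, and the three customers and orders are made-up data.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
}

// Hypothetical in-memory stand-in for a database that counts round trips.
public class FakeDb
{
    private readonly List<Order> _orders = new List<Order>
    {
        new Order { Id = 1, CustomerId = 1 },
        new Order { Id = 2, CustomerId = 2 },
        new Order { Id = 3, CustomerId = 3 },
    };

    public int QueryCount { get; private set; }

    public List<int> LoadCustomerIds()
    {
        QueryCount++;
        return new List<int> { 1, 2, 3 };
    }

    // One round trip per customer: what a lazy collection does when touched.
    public List<Order> LoadOrdersFor(int customerId)
    {
        QueryCount++;
        return _orders.Where(o => o.CustomerId == customerId).ToList();
    }

    // One round trip for the whole set of customers: an eager fetch of the node.
    public ILookup<int, Order> LoadOrdersForMany(IEnumerable<int> customerIds)
    {
        QueryCount++;
        var ids = new HashSet<int>(customerIds);
        return _orders.Where(o => ids.Contains(o.CustomerId))
                      .ToLookup(o => o.CustomerId);
    }
}

public static class SelectNPlusOneDemo
{
    // Lazy style: one query for the customers, then one per customer touched.
    public static int LazyQueryCount()
    {
        var db = new FakeDb();
        foreach (var id in db.LoadCustomerIds())
        {
            db.LoadOrdersFor(id);
        }
        return db.QueryCount;
    }

    // Eager style: one query per graph node (customers, then all their orders).
    public static int EagerQueryCount()
    {
        var db = new FakeDb();
        var ids = db.LoadCustomerIds();
        db.LoadOrdersForMany(ids);
        return db.QueryCount;
    }
}
```

With three customers the lazy path issues 1 + 3 queries and the eager path issues 2. The gap widens with N, but only when every customer's orders are actually touched.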
Funny how its ok to slam EF but when anyone criticises nHibernate they're wrong.
Although EF is still in its infancy I can see it being far better for the nHibernate application I am working on which currently has many performance problems with loading data - both eager and lazy loading.
Care to elaborate why NHibernate is causing performance problems with loading data and how EF will solve that?
Frans,
You are probably aware that I am not overly impressed with repositories.
The issue isn't eager loading evil or not.
The issue is that if you don't have lazy loading as well, your model is broken when you try to touch the parts that weren't eagerly loaded.
There are situations where it is actually MORE efficient to page things using lazy loading than any other way.
@Frank - I can't really explain in depth as this is not the place for it..
but basically the object model has relations that force the loading of child/parent data. We are fixed in to having to return data against this model, so eager loading loads many related objects, even when you don't need the data from them. Lazy loading is not possible via ActiveRecord on one to many relationships, so thats out. The one to many relations can be lazy loaded, but this causes excessive server roundtrips.
With EF I would be able to define an entityset that only includes the data I'm interested in, whilst not disturbing the overall model.
@Mark
QUOTE Although EF is still in its infancy... UNQUOTE
Microsoft have been talking about EF and writing articles about it since 2006. That makes it a technology in its "Toddleracy" at the very least!
Title: Visual Studio 2005 Technical Articles: The ADO.NET Entity Framework Overview
Date: (June 2006)
link: http://msdn.microsoft.com/en-us/library/aa697427(VS.80).aspx
Microsoft needs to dump EF and just use NHibernate.
@Brian Chavez Why?
nHibernate doesn’t have proper support for Linq (for example, joins). Don’t tell me to use HQL, which has no strongly typed support (IntelliSense).
@Ayende any plans for linq to nhibernate 2 which won’t use Criteria API?