Persistence Ignorance in the Entity Framework
Dan Simmons is talking about persistence ignorance in the Entity Framework. It looks like Microsoft is going to make an effort to make the EF support at least some level of PI in V1, and pure PI in following versions. This is a Good Thing.
I would like to talk about some of the details in his post:
There's no question, though, that complete persistence ignorance comes at a price--both in the performance of applications built with "pure POCO", persistence ignorant domain models and as a result in the complexity of the entity framework which enables them.
There is a question, actually. I can't speak to the complexity of the EF, but I can speak about the framework requirements of an OR/M solution that wants to support persistence ignorance. From the rest of the post, it looks like trying to do PI in the EF is going to run into some issues with the existing infrastructure.
Dan is bringing up two issues:
- They must store a copy of an EntityKey.
Proposed solution: Choosing not to store EntityKeys on the entities, for instance, means that navigating from an entity to the ObjectStateEntry which matches it either requires a brute-force search of the ObjectStateManager or for the ObjectStateManager to maintain a dictionary mapping from CLR reference to ObjectStateEntry which is a significant expense.
My reaction: You are going to need to keep track of loaded objects anyway, because you need to do identity resolution. That almost guarantees that you are already tracking them somewhere, so why not use that same structure for this purpose?
Also, what is the significant expense in keeping a dictionary of <EntityKey, Entity>?
- They must provide change tracking notifications through a prescriptive interface.
Proposed solution: Not supporting the change tracking mechanism means that the ObjectStateManager must cache a copy of the original values for each entity (all original values and they must be cached whether or not the entity is modified).
My reaction: The way you do it now is basically the same, only the original values are kept in the entity itself rather than in the state manager. What is the big difference? There is a case to be made here because of value type copy semantics (entities are usually composed of value types and strings), but I don't think it is a big issue. In most cases, the lifetime of both the entities and the session cache is short.
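To make the two reactions so far concrete, here is a minimal sketch in Python. It is not the EF's ObjectStateManager: `StateManager`, `StateEntry`, `attach`, `entry_for` and `modified_values` are hypothetical names, and the entity is assumed to be a plain object whose fields live in its instance dictionary. It keeps one dictionary from key to entry for identity resolution, a second from object reference to entry so the entity does not have to carry its key, and a snapshot of the original values taken when the entity is attached.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable


@dataclass
class Customer:
    """A plain entity: no key field, no base class, no notifications."""
    name: str
    city: str


@dataclass
class StateEntry:
    """Hypothetical per-entity bookkeeping, held outside the entity."""
    key: Hashable
    entity: Any
    original: Dict[str, Any] = field(default_factory=dict)


class StateManager:
    """Hypothetical stand-in for a session cache (an assumption, not the EF API)."""

    def __init__(self) -> None:
        self._by_key: Dict[Hashable, StateEntry] = {}  # identity map
        self._by_ref: Dict[int, StateEntry] = {}       # object reference -> entry

    def attach(self, key: Hashable, entity: Any) -> Any:
        """Identity resolution: the first instance loaded for a key wins."""
        existing = self._by_key.get(key)
        if existing is not None:
            return existing.entity
        # Snapshot every original value up front, modified later or not.
        entry = StateEntry(key, entity, dict(vars(entity)))
        self._by_key[key] = entry
        # id() stays unique here because the entry keeps the entity alive.
        self._by_ref[id(entity)] = entry
        return entity

    def entry_for(self, entity: Any) -> StateEntry:
        """Navigate from a bare entity to its entry without a key on the entity."""
        return self._by_ref[id(entity)]

    def modified_values(self, entity: Any) -> Dict[str, Any]:
        """Compare current values against the snapshot taken at attach time."""
        original = self.entry_for(entity).original
        return {name: value for name, value in vars(entity).items()
                if original.get(name) != value}
```

The cost being argued about is visible here: one extra dictionary entry and one shallow copy of the values per loaded entity, both of which go away when the session does.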
If it bothers you, you can implement change tracking by interception, and add only the modified values to the ObjectStateManager.
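A rough sketch of what change tracking by interception could look like, again in Python and again with hypothetical names (`make_tracked`, `Order`). The assumption is that the framework is allowed to swap in a generated subclass of the entity type, which is one common way of intercepting property writes. The entity class itself contains no tracking code, and an original value is recorded only the first time a property is actually changed.

```python
from typing import Any, Dict


class Order:
    """A plain entity with no tracking code of its own."""

    def __init__(self, status: str, total: int) -> None:
        self.status = status
        self.total = total


def make_tracked(entity: Any, originals: Dict[str, Any]) -> Any:
    """Hypothetical helper: intercept writes through a generated subclass.

    `originals` plays the role of the state manager's storage. It receives
    the old value of a property the first time that property changes.
    """
    base = type(entity)

    def tracking_setattr(self: Any, name: str, value: Any) -> None:
        if name not in originals and hasattr(self, name):
            old = getattr(self, name)
            if old != value:
                originals[name] = old  # keep only what was really modified
        base.__setattr__(self, name, value)

    proxy_type = type("Tracked" + base.__name__, (base,),
                      {"__setattr__": tracking_setattr})
    entity.__class__ = proxy_type  # the instance is still an instance of `base`
    return entity
```

With this in place, an untouched entity costs nothing in stored originals, which is the trade Dan describes for entities that raise notifications, but without the prescriptive interface on the domain class.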
One thing that is really encouraging is the suggestion that users of the EF will be able to use custom collections. They are extremely helpful in a number of scenarios. I already mentioned the temporal-aware collections that we hooked into NHibernate in order to get a much better domain model. The last I heard, this was not supported in the EF, so it is good to hear that they are thinking about it.
One thing that I haven't heard anything about is the extensibility mechanisms that are exposed. Specifically, custom types, persistence approaches, etc.
Comments
Thanks for the response to my posting!
Implementing PI in the EF isn't going to create problems with the existing infrastructure--it's just a matter (as always) of picking the correct features and then being careful to implement them.
With regard to the two issues... The problem with not storing an EntityKey on the entity is not that we don't have the information somewhere else, you are right that we do, it's more a matter of the efficiency of navigating from an entity back to the place where the additional information is stored. The entity key is just a more efficient way of doing this navigation--all of the additional information is stored indexed by EntityKey. If you have the EntityKey on the entity, then it's easy and efficient to get to the extra info, but if you don't have the EntityKey on the entity, and the entity is all you have, then you need a different kind of lookup--either brute force or an additional index (in this case a dictionary). We will support both mechanisms, but the perf is affected. In many cases, the effect won't be large, so it's fine, but in large scalability situations it does become an issue. As you might expect, since the EF is a core framework we are giving a lot of attention to performance in an attempt to keep the overhead to an absolute minimum. So we're very sensitive to this kind of thing.
With regard to the original values, in the EF today (and in the future if you opt-in for change tracking directly from the entity), we can store original values on a value-by-value basis only when the property in question is modified. If you don't provide notifications at just the right time, though, the framework must store the original values ahead of time which means storing more copies of the data than may strictly be necessary. Again, this is not a big deal for some scenarios but can be a big deal in others.
Custom collections is something I also believe important, but unfortunately I doubt we'll be able to enable them for the first release--that will probably have to wait for the next release.
With regard to extensibility mechanisms, I'd love to talk about what kinds of mechanisms you have in mind and think would be most helpful.
Custom types is an interesting one--I assume that you mean custom primitive types with "conversion" of some sort between what's actually stored in the database and that type. I don't know how far we'll be able to get in that direction this time around, but it is something we're talking about to a certain extent. One interesting question is whether it's reasonable to model these with custom classes which have properties or fields that are 1-1 with the database (what we call complex types in the EF and will definitely be supported), or if we have critical scenarios where you want a custom type which requires custom code to convert from values in the DB to the custom type fields and back.
Also, what do you have in mind when you talk about extensibility of "persistence approaches"?
Again, thanks for the thoughtful response.
Custom types means that I have data in the database, residing in one or more columns, that I want to translate into something else in my model.
Simple examples: a "yes"/"no" field to a boolean, Amount/Currency columns to a Money object, BINARY to System.Drawing.Image, etc.
There are various scenarios around that, usually when integrating with legacy databases, but not just there.
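As an illustration of the first two examples, here is a small Python sketch. The converter shape (`from_columns` / `to_columns`), the `hydrate` helper and the column names are all assumptions made up for this example. They are not an EF or NHibernate API.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str


class YesNoType:
    """Hypothetical custom type: one 'yes'/'no' text column <-> bool."""

    def from_columns(self, values: Tuple[Any, ...]) -> bool:
        (text,) = values
        return text.strip().lower() == "yes"

    def to_columns(self, value: bool) -> Tuple[Any, ...]:
        return ("yes" if value else "no",)


class MoneyType:
    """Hypothetical custom type: amount + currency columns <-> Money."""

    def from_columns(self, values: Tuple[Any, ...]) -> Money:
        amount, currency = values
        return Money(Decimal(str(amount)), currency)

    def to_columns(self, value: Money) -> Tuple[Any, ...]:
        return (value.amount, value.currency)


def hydrate(row: Dict[str, Any],
            mapping: Dict[str, Tuple[Any, Tuple[str, ...]]]) -> Dict[str, Any]:
    """Turn a raw row into model values using the custom type per property.

    `mapping` goes from property name to (custom type, column names).
    """
    return {prop: ctype.from_columns(tuple(row[c] for c in cols))
            for prop, (ctype, cols) in mapping.items()}
```

The point is that the conversion code is supplied by the user of the framework and plugged into the mapping, rather than being limited to a 1-1 copy of column values into fields.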
Extensibility of persistence approaches - the ability to extend the way that the framework gets the data. This is a wide topic, but it includes anything from being able to provide custom SQL, to custom SQL generators, to talking to non-relational sources, etc. All within the framework.
It is something that I, as the end user of the framework, should be able to do. These are features that I have found crucial when it comes to integrating with most systems.
Dan,
Regarding change tracking and performance: my approach has been to first have a lightweight read-only Entity without any change tracking plumbing. In many cases Entities are read much more often than they are written to, so you get performance and footprint gains.
When tracking state, it can be done on the Entity or in a State Manager. I have done this both ways, and yes, it goes against POCO, but a read-write Entity by design needs change tracking, so I choose to place it in the Entity. In the end it's a matter of choice; each will be more or less performant depending on the end usage scenario. Writes in most cases happen much less often than reads, so I think it's a non-issue if read-only is used.
I have no idea what you are doing with EF collections, lazy and aggressive loading, and concurrency. Entity Relations and Inheritance are the more complex issues. Why don't you start with some code and relationship scenarios and post them for feedback? You could also enable anonymous comments on your blog.