Clemens on O/R Mappers: Take II

Clemens has a new post about O/R Mappers, and this time he brings up several relevant points that I agree with :-). He starts with a comment (by Scott E) from the previous post:

I've hiked up the learning curve for Hibernate (the Java flavor) only to find that what time was saved in mapping basic CRUD functionality got eaten up by out-of-band custom data access (which always seems to be required) and tuning to get performance close to what it would have been with a more specialized, hand-coded DAL.

This comment seems to assume that the hand-coded DAL will not need to be tuned, which I find hard to believe. Yes, there are times when you'll need to tune the O/RM where you would not need to tune the DAL, but the reverse is true as well. In general, O/R Mappers will generate better code than you'll write by hand (I can't be at my sharpest on the 54th search SP that I write to do something a bit more complex than the code generator can handle).

Defining and resolving associations is difficult. 1:N is hard, because you need to know what your N looks like. You don't want to dehydrate 10000 objects to find a value in one of them or to calculate a sum over a column....

No, it's not hard at all. You just need to define the association, and the O/RM will take care of the rest. Both examples that he gives can be handled efficiently by an O/RM. Finding specific values in an association:

select p
from Blog blog join blog.Posts p
where blog.Id = @id
  and size(p.Comments) > 50

Getting the sum of a property (not column, mind you):

select sum(c.Credit) from Customer c where c.Id = @id

Both of these statements work on the logical representation of the classes, not on the physical schema in the database. The first statement traverses two 1:N relationships (from the blog to its posts to their comments) without affecting performance. [That said, this is probably not the way to do this; filtering on the collection is a better way, and just as efficient.]
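
For the curious, filtering on the collection with NHibernate looks roughly like this; ISession.CreateFilter lets you query and aggregate over an entity's collection without hydrating its elements. The Customer entity and its Transactions collection are made up for the example, and this is only a sketch:

Customer customer = (Customer) session.Load(typeof(Customer), customerId);

// The filter is translated to SQL and runs in the database; the
// collection elements are never loaded into memory.
object totalCredit = session
    .CreateFilter(customer.Transactions, "select sum(this.Credit)")
    .UniqueResult();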

1:N is so difficult because an object model is inherently about records, while SQL is about sets. N:M is harder.

I just wrote a post on using Active Record with many-to-many associations. Check it out; it's not hard at all, not even when you do searching / filtering / aggregation on it.
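
For reference, here is roughly what a many-to-many mapping looks like with Castle Active Record; the class, table, and column names are made up for the example, not taken from that post:

using System.Collections;
using Castle.ActiveRecord;

[ActiveRecord]
public class Post : ActiveRecordBase
{
    private int id;
    private IList categories = new ArrayList();

    [PrimaryKey]
    public int Id
    {
        get { return id; }
        set { id = value; }
    }

    // The association table and its key columns are declared here;
    // Active Record manages the join table for us. Category is assumed
    // to be another mapped class.
    [HasAndBelongsToMany(typeof(Category),
        Table = "PostsCategories", ColumnKey = "PostId", ColumnRef = "CategoryId")]
    public IList Categories
    {
        get { return categories; }
        set { categories = value; }
    }
}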

[in relation to in memory cache] What do you do if you happen to find an object you have already loaded as you resolve an 1:N association and realize that the object has meanwhile changed on disk?

Um, doesn't this belong to my concurrency strategy? Either I use optimistic concurrency and get an exception when the row was updated behind my back, or I handle it beforehand. In general, the in-memory cache is short lived; it has a lifetime of a single request, so this isn't a problem. If it is, I can specify that the request will bypass the first-level cache.
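
To make that concrete, here is a rough sketch of the optimistic path with NHibernate, assuming the entity is mapped with a version column; the Blog class and its Name property are made up:

try
{
    using (ITransaction tx = session.BeginTransaction())
    {
        Blog blog = (Blog) session.Load(typeof(Blog), blogId);
        blog.Name = "New name";
        // The UPDATE checks the version column; if the row changed on
        // disk in the meantime, the commit throws instead of silently
        // overwriting the other change.
        tx.Commit();
    }
}
catch (StaleObjectStateException)
{
    // Reload, merge, or tell the user about the conflict.
}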

Another question is what the scope of the object identity is. Per appdomain/process, per machine or even a central object server (hope not)?

In NHibernate, the scope is the session you're working with. A session is similar to a DbConnection in many ways, so it is usually a short-lived object.
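
Put another way, the identity map lives in the session, something like this (Blog is again a made-up entity):

using (ISession session = sessionFactory.OpenSession())
{
    Blog first = (Blog) session.Load(typeof(Blog), 1);
    Blog second = (Blog) session.Load(typeof(Blog), 1);
    // Same session, same row, same object instance.
    bool sameInstance = ReferenceEquals(first, second); // true
}

using (ISession other = sessionFactory.OpenSession())
{
    // A different session gives a different instance; identity does
    // not cross the session boundary.
    Blog third = (Blog) other.Load(typeof(Blog), 1);
}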

Transactions are hard...
... If you are loading and managing data as object-graphs, how do you manage transaction isolation? How do you identify the subtree that's being touched by a transaction? How do you manage rollbacks? What is a transaction, anyways?

A transaction is a kind of a poisonous snake that will give you ACID stomach, I believe.

Yes, transactions are hard. No, I don't give them up when I'm using an O/RM. I really don't care about identifying the subtree that the transaction is working on; that is the job of the O/RM and the DB, not the application developer. Here is how I'll manage a transaction:

using (ITransaction trans = session.BeginTransaction(IsolationLevel.Snapshot))
{
    // do stuff
    trans.Commit();
}

This will roll back automatically on exception, of course, but I can do it manually if I really want to as well. What is the problem with that again?
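
If I really do want to handle it by hand, it is just the obvious try / catch (a sketch, using the same session and isolation level as above):

ITransaction trans = session.BeginTransaction(IsolationLevel.Snapshot);
try
{
    // do stuff
    trans.Commit();
}
catch
{
    trans.Rollback();
    throw;
}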

Changing the underlying data model is hard.

No, it's not. All your programming is done against the logical layout of your objects, not against the physical schema. You want to change the data model? Just change the mapping, and you're done. There are some cases where you can't do this without changing the code, but those would usually require code modifications anyway.
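
As an illustration, here is a made-up fragment of an .hbm.xml mapping; if the column gets renamed in the database, only the column attribute changes, and the code keeps working against the Credit property:

<property name="Credit" column="CustomerCreditAmount" />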

Reporting and data aggregation is hard.

Reporting is one thing that I'll try to do in SQL, because the tools for it are great, and because I'll often be working against a de-normalized database rather than my production one. But aggregation is easy. The example that Clemens gives is XPath vs. DOM manipulation, and he would be correct if I had to do this stuff manually, but I don't. I get the O/RM to do it for me (in the database, where those actions belong).
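
Running that kind of aggregation through NHibernate is a one-liner, roughly like this (a sketch, assuming the Customer entity from above with a decimal Credit property):

decimal totalCredit = (decimal) session
    .CreateQuery("select sum(c.Credit) from Customer c where c.Id = :id")
    .SetParameter("id", customerId)
    .UniqueResult();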

One thing that I'm not sure that I understood is:

O/R is object->relational mapping.
LINQ is relational->object mapping.
LINQ acknowledges the relational nature of the vast majority of data, while O/R attempts to deny it.

I'm not sure which O/RM he is talking about, but it's not one that I have used. An O/RM that denies the relational nature of the data just isn't going to work well. It seems to me that Clemens had a bad experience with O/RMs (maybe the C++ one that he mentions?) and didn't look into the current offerings in the field with enough depth. I just don't see many of the problems that he mentions as real problems, unless you misuse the tools that you have.

That said, O/RMs are indeed a leaky abstraction, and you need to understand what is happening under the covers; but GC is a leaky abstraction as well, and it is still a great productivity tool.