Clemens on O/R Mappers


Clemens has a post about O/R Mappers. From his post:

Another argument I keep hearing is that O/R mapping yields a significant productivity boost. However, if that were the case and if using O/R mapping would shorten the average development cost in a departmental development project by – say – a quarter or more, O/R mapping would likely have taken over the world by now. It hasn't. And it's not that the idea is new. It’s been around for well more than a decade.

O/R Mappers are hard to build without framework services such as reflection. Code generation can support this for languages without reflection, but it is harder to do and much less flexible. I think that this is one of the reasons that ORM is only now getting a lot of notice from people ("now" being relative: Hibernate is very strong in the Java space, and has been for quite a while). I believe that ORM adoption is slow simply because Microsoft's stance on the matter was "Using Stored Procedures and Result Sets (COM) / DataSets (.Net)" for a long time, and now it is "Wait for DLinq."

O/R Mappers certainly do not take the complexity away. I still need to be aware of it, but it doesn't get shoved in my face every time I need to do something that touches the database. Understanding what is happening in all layers is important in this, as in all other things. The major benefit of using O/R Mappers in your application comes when you realize how flexible you suddenly are. You need to make a query that you have not thought of before? Just ask the O/R mapper to do it for you. There is no context switch while you add an additional method to your DAL, write the stored procedure, test them both, and then return to your code. Instead, you just tell the O/R mapper to give you the data you want. This is very powerful, and it is a huge productivity enhancement.
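To make the "just ask for the data" point concrete, here is a minimal sketch of a criteria-style query helper. It is written in Python against the standard `sqlite3` module, not NHibernate, and everything in it (`find_where`, the `TABLES` metadata, the `orders` table) is a hypothetical stand-in for what a real mapper would derive from its mappings.

```python
import sqlite3

# Hypothetical mapping metadata: table name -> mapped columns.
# A real O/R mapper would read this from its mapping files.
TABLES = {"orders": ("id", "customer", "status", "total")}

def find_where(conn, table, **criteria):
    """Return rows of a mapped table matching every column=value pair."""
    columns = TABLES[table]  # KeyError for tables that are not mapped
    unknown = set(criteria) - set(columns)
    if unknown:
        raise ValueError(f"unmapped columns: {sorted(unknown)}")
    # Identifiers come only from the trusted mapping; values are bound parameters.
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if criteria:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in criteria)
    rows = conn.execute(sql, tuple(criteria.values())).fetchall()
    return [dict(zip(columns, row)) for row in rows]

def demo_connection():
    """Build a small in-memory database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "acme", "open", 10.0), (2, "acme", "shipped", 25.0), (3, "initech", "open", 7.5)],
    )
    return conn
```

A query nobody planned for, such as `find_where(conn, "orders", customer="acme", status="open")`, needs no new DAL method and no new stored procedure; the caller only states the criteria.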

Clemens goes on to say that we should treat our database as a service, and apply the following principles to it:

  • Boundaries are explicit => Database access is explicit
  • Services avoid coupling (autonomy) => Database schema and in-process data representation are disjoint and mapped explicitly
  • Share schema not code => Query/Sproc result sets and Sproc inputs form data access schema (aliased result sets provide a degree of separation from phys. schema)

I already have an explicit boundary. There is my code, and there is the O/R library. The O/R library exposes a set of classes / methods that I can use to query the database. How they do it, I really don't care. It's pretty clear when I'm accessing the database and when I'm not. The fact that I don't have CustomersDAL or SqlConnection in sight doesn't mean that the boundary is less real.

Coupling between the in-process data representation and the database schema is not something I worry about. Why? Because my O/R mapper of choice (NHibernate, in case you missed it) can handle pretty much every schema that I throw at it, and do it more quickly (and likely more efficiently) than I can. I'm not tied to a particular schema, and I don't care if my database is changed. All I need to do is update the mappings, and I'm done with it.
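As an illustration of "update the mappings and I'm done", here is a small sketch in Python with the standard `sqlite3` module. It is not NHibernate's mapping format; the `Customer` class, the two mapping dictionaries, and the `load_all` helper are all hypothetical. The point is only that the class stays the same while the mapping absorbs a renamed table and renamed columns.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    email: str

# Hypothetical mappings: attribute name -> column name, before and after
# a schema change. Only this data changes; Customer does not.
MAPPING_OLD = {"table": "customers", "columns": {"name": "name", "email": "email"}}
MAPPING_NEW = {"table": "tbl_cust", "columns": {"name": "cust_nm", "email": "email_addr"}}

def load_all(conn, cls, mapping):
    """Load every row of the mapped table as an instance of cls."""
    attrs = list(mapping["columns"])
    cols = ", ".join(mapping["columns"][attr] for attr in attrs)
    # Identifiers come from the trusted mapping, never from user input.
    rows = conn.execute(f"SELECT {cols} FROM {mapping['table']}")
    return [cls(**dict(zip(attrs, row))) for row in rows]
```

Calling code keeps using `Customer.name` and `Customer.email` whichever mapping is in effect, which is the separation from the physical schema that the mapper already provides.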

I'm not sure what the difference between the second and third points is, since to me it seems like repeating the same statement with a different focus. I don't care for separation from the physical schema in the database layer because I've already got that separation by using an O/R mapper.

Every class of data items (table) surround special considerations: read-only, read/write, insert-only; update frequency, currency and replicability; access authorization; business relevance; caching strategies; etc. 

Yes, indeed, I fully agree with Clemens on this one. But I just don't see the relevance of this to the O/R vs Custom DAL argument. I can define all of that with ORM more easily than I could if I was writing my own DAL.

Proper data management is the key to great architecture. Ignoring this and abstracting data access and data management away just to have a convenient programming model is … problematic.

I sort of agree with this, and then I sort of don't. What does Clemens suggest here? Write Stored Procedures and Custom DALs? For what purpose, exactly? The API that I'll get from a well designed DAL is likely to be very similar to the API I get now (for much less work and effort) by using an ORM.

One thing that Clemens said that is likely to cause problems is:

Many of the proponents of O/R mapping that I run into (and that is a generalization and I am not trying to offend anyone – just an observation) are folks who don't know SQL and RDBMS technology in any reasonable depth and/or often have no interest in doing so.

Well, I know SQL, and I think that I know it pretty well. I am not a guru by any means, but I can use the tools that SQL gives me to get the correct result back. I still love O/R Mapping, not because I have no interest in SQL or RDBMS, but because they are grunt work. I enjoy working with SQL to solve complex problems, and I can make SQL Server cry and beg for mercy, but doing this for everyday work? Why on earth would I want to do that? I don't do memory management on my own anymore, and for much the same reasons. It's boring, it's easy to get wrong and it's just not relevant to what I'm trying to do most of the time.

I would like to end with a story about explaining the cost of developing a new feature to a co-worker. (He didn't work on the project, by the way.) He just couldn't believe my estimates. The conversation went something like this:

Me: Well, I guess this thingie will take us about a day.
Co-Worker: Are you sure? You haven't done anything in this area yet.
Me: Yes, we need to build all the UI from scratch, which is why it will take so long. It is mostly showing the data to the user, but there is some complex UI stuff we need to do here.
Co-Worker: But you don't have anything that gets the data!
Me: So, it's just querying the database for it?
Co-Worker: But you haven't written anything to do it yet.

... a couple of minutes go by where I just can't understand what he is talking about

Co-Worker: Look, you have not done this, and you didn't allocate any time for it. How can you keep your estimate?
Me: Like this [notepad] Container.Repository<EmployeeContract>().FindWhere(Expression.Eq("Employee",employee)) [/notepad], this will fetch us all the contracts for the employee, and we can then show them to the user, who can then do XYZ.
Co-Worker: But it doesn't get you Salary and Bonuses for this contract.
Me: They are loaded when we load the contract, we don't need to take care of it.

(EmployeeContract, Salary and Bonus are fictional, of course, but the idea was that I needed to pull the data from 4 - 5 tables for this page.) The problem in this conversation was that I already had the infrastructure to pull the data from the database with great flexibility. I don't worry about how the database is structured; I let the O/R mapper take care of this.
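For readers who have not seen this style of data access, here is a rough sketch of what "the salary and bonuses are loaded when we load the contract" can look like. It is Python over the standard `sqlite3` module, hand-written rather than mapper-generated, and the table layout, the `find_contracts` function and the `demo_db` helper are assumptions made up for illustration (the entities themselves are the fictional ones from the story).

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class EmployeeContract:
    id: int
    employee: str
    salary: float
    bonuses: list = field(default_factory=list)

def find_contracts(conn, employee):
    """Load an employee's contracts with salary and bonuses already attached."""
    contracts = {}
    rows = conn.execute(
        "SELECT c.id, c.employee, s.amount FROM contracts c "
        "JOIN salaries s ON s.contract_id = c.id "
        "WHERE c.employee = ? ORDER BY c.id",
        (employee,),
    )
    for contract_id, emp, amount in rows:
        contracts[contract_id] = EmployeeContract(contract_id, emp, amount)
    if contracts:
        # One extra query fetches the bonuses for all loaded contracts.
        marks = ", ".join("?" for _ in contracts)
        bonus_rows = conn.execute(
            f"SELECT contract_id, amount FROM bonuses "
            f"WHERE contract_id IN ({marks}) ORDER BY id",
            tuple(contracts),
        )
        for contract_id, amount in bonus_rows:
            contracts[contract_id].bonuses.append(amount)
    return list(contracts.values())

def demo_db():
    """Build a small in-memory database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contracts (id INTEGER PRIMARY KEY, employee TEXT);
        CREATE TABLE salaries (id INTEGER PRIMARY KEY, contract_id INTEGER, amount REAL);
        CREATE TABLE bonuses (id INTEGER PRIMARY KEY, contract_id INTEGER, amount REAL);
        INSERT INTO contracts VALUES (1, 'dana'), (2, 'dana'), (3, 'lee');
        INSERT INTO salaries VALUES (1, 1, 5000.0), (2, 2, 6500.0), (3, 3, 4000.0);
        INSERT INTO bonuses VALUES (1, 1, 300.0), (2, 1, 150.0), (3, 3, 200.0);
    """)
    return conn
```

With an O/R mapper the body of `find_contracts` is what the mappings give you for free; the page code only ever sees contracts that already carry their salary and bonuses.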