Lazy loading: The Good, The Bad, And The Evil Witch
Frans Bouma commented on lazy loading:
In general lazy loading is more of a burden than a blessing.
The reason for this is that it leaks persistent storage access to different tiers via the lazy-loadable associations. If you want to prevent your UI developers from utilizing lazy loading, or are sending entities across the wire to a service, how do you prevent lazy loading from being called under the hood? We support two models: one has lazy loading, the other one doesn't (and is more geared towards disconnected environments).
You don't really miss lazy loading in the second model, as long as you have prefetch paths to specify what you want to prefetch (also into an existing graph). The thing is that the model then forces you to write more service-oriented software: make the call to the data producer and tell the data producer (or repository, whatever you want to call it) what to get, and you get the data and work with it. There's no leaky lazy loading under the hood bypassing the repository; you need to call the data producer to get the data, period.
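A minimal sketch of what that model looks like in code (the types and the FetchCustomer signature here are hypothetical, not any particular mapper's API):

using System;
using System.Collections.Generic;

class Order { public List<string> LineItems = new List<string>(); }
class Customer { public List<Order> Orders = new List<Order>(); }

interface ICustomerRepository
{
    // the caller names the whole graph up front; nothing is fetched behind its back later
    Customer FetchCustomer(int id, params string[] prefetchPaths);
}

class Demo
{
    static void Use(ICustomerRepository repository)
    {
        var customer = repository.FetchCustomer(42, "Orders", "Orders.LineItems");
        foreach (var order in customer.Orders)        // already in memory,
            Console.WriteLine(order.LineItems.Count); // no hidden database call
    }
}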
The problem with this approach is that it presupposes that you are going to provide all the fetch modes in the application. I find that this is quite a burden in my applications, because I explicitly don't want to deal with those concerns 90% of the time. I never quite understood the desire to protect the application from the UI developers, but that is another issue.
I have heard lazy loading described as virtual memory for data, and I agree with this description. Lazy loading is a tremendous convenience compared to manual management of the data. If you walk into a serious performance conversation, you will often hear terms such as memory locality, minimizing paging, etc. That is after quite a bit of time with operating systems that take care of all that transparently. I believe that there are still people out there who do their own manual paging, but those are usually guys like Oracle and MS SQL Server.
Those are willing to take on the burden of managing paging themselves, because they know they have better knowledge of what is happening, and they need this type of control. There is a reason there is such a thing as SQL OS.
For most applications, even the very big ones, trying to do that is foolhardy in the extreme. You certainly want to be aware of what is going on, and you will certainly get significant performance improvements by reducing the paging patterns of the application, but you don't want to manage it yourself.
I view lazy loading in the same manner. It is something that I really don't want to live without. It is something that can cause some really bad performance if you are misusing it, but so can any tool in the toolbox. (You had better believe that a hammer can do some serious damage.)
While I don't think that we are at the same level of maturity for lazy loading as we are for paging, I definitely see this as the way we are headed. Manually loading the stuff we need is cumbersome; it is much easier to let the tools do their job, and hint them in the right direction when they don't do the right thing.
Comments
My only struggle with lazy loading is when I have an n-tiered application that involves a web service (or remoting).
For example:
The UI controller calls for 'Employee'; the web service is invoked and returns an employee. The employee could have a list of addresses. There is no direct connection between the UI and the database, since all traffic goes through the web service. I find that either the whole object graph must be sent (i.e. all addresses), or a manual mechanism is required, e.g. retrieve addresses by employee Id (sketched after the list below):
UI Controller
WebService
Repository/Domain Objects, etc..
Database
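Roughly, the manual mechanism looks like this (all names are invented for illustration; the bodies are stubbed):

using System.Collections.Generic;

public class Employee { public int Id; }
public class Address { public string City; }

public class EmployeeService // the web service layer
{
    public Employee GetEmployee(int id)
    {
        // fetch the root entity only; no addresses travel with it
        return new Employee { Id = id };
    }

    public List<Address> GetAddressesByEmployeeId(int employeeId)
    {
        // second, explicit round-trip, initiated by the UI controller
        return new List<Address>();
    }
}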
Now, without the web service layer, this is easier. But I can't always architect the application that way.
Perhaps my approach is flawed?
I wouldn't call it flawed, but I wouldn't do it. For remoting scenarios, there is a different set of best practices that should be followed.
In those scenarios, self-contained messages are probably a good solution, but not if you find yourself building something like:
OrderService.GetOrder()
OrderService.GetOrderWithLines()
CustomerService.GetCustomerWithOrders()
CustomerService.GetCustomerWithOrdersAndLineItems()
etc...
You can actually do lazy loading yourself, it is not very difficult to do, but there are tradeoffs that you may need to consider (complexity, for instance).
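A minimal sketch of doing it yourself (names are illustrative; thread safety is deliberately ignored, which is one of the tradeoffs):

using System;

public class LazyRef<T> where T : class
{
    private readonly Func<T> loader;
    private T cached;

    public LazyRef(Func<T> loader) { this.loader = loader; }

    // runs the loader on first access, then serves the cached value
    public T Value
    {
        get { return cached ?? (cached = loader()); }
    }
}

public class Order
{
    // the entity exposes the association through the wrapper
    private readonly LazyRef<string> customerName =
        new LazyRef<string>(() => "loaded from the database on first touch");

    public string CustomerName { get { return customerName.Value; } }
}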
I view lazy loading with suspicion now. While Base4 supports lazy loading and pre-loading, the problem is it doesn't support no-loading or custom lazy loading, i.e. I can't tell Base4 not to lazy load if the data isn't pre-loaded.
That may sound like a minor point, but it is critical. See, unlike other systems that I know of, Base4 stores the definitions of entities, properties, relationships, etc. in the database, and they are accessed and manipulated in the same way as all other data, i.e. through the same API.
You end up with a sort of chicken-and-egg problem, and without going into specifics, let's just say I lost count of the number of times in the core of the system I wanted to alter how the lazy loading occurred so I didn't get into a stupid stack overflow situation, for example while converting a filter expression into SQL.
I can sympathize with this pain. How have you implemented lazy loading? Using proxies, or some other way?
Two different ways.
For simple foreign keys it was a simple proxy, i.e.:
public Person Mother {
    get {
        // lazy load on first access, then cache; _mother and LoadMother are illustrative
        return _mother ?? (_mother = LoadMother());
    }
}
For one-to-many or many-to-many I used a custom collection that tracked things like attempts to enumerate (i.e. do a lazy load) or alter the collection, and provided interception points so that a transaction of the necessary synchronizing updates could be created (i.e. the insert into the ManyToMany join table).
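In sketch form (hypothetical names, not the actual Base4 code):

using System;
using System.Collections;
using System.Collections.Generic;

public class LazyList<T> : IEnumerable<T>
{
    private readonly Func<List<T>> loader; // runs the query on first use
    private List<T> items;

    public LazyList(Func<List<T>> loader) { this.loader = loader; }

    private List<T> Items
    {
        get { return items ?? (items = loader()); } // the lazy load itself
    }

    public void Add(T item)
    {
        Items.Add(item);
        // interception point: queue a synchronizing update here,
        // e.g. the pending insert into the many-to-many join table
    }

    public IEnumerator<T> GetEnumerator() { return Items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}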
This, incidentally, is why I like REST.
It is nice to be able to inject logic so that when someone does a POST here: http://server/person/8/friends I can have an HTTP handler that can change independently of the client and do arbitrarily complicated jobs to keep the conceptual/object model in sync with the data model.
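For instance, a rough sketch of such a handler (the route shape is assumed from the URL above; the actual work is stubbed):

using System.Web;

public class FriendsHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // e.g. "/person/8/friends" -> person id 8
        string[] parts = context.Request.Path.Trim('/').Split('/');
        int personId = int.Parse(parts[1]);

        // arbitrarily complicated server-side work goes here: validate,
        // update the join table, denormalize, all invisible to the client
        context.Response.StatusCode = 201; // Created
    }
}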
Cool huh...
For web applications, I'd embrace lazy loading. It's tricky, but it can be managed.
For smart clients, it's very dangerous; you don't have 'explicit boundaries'. Your 'GetCustomerWithOrdersAndLineItems' is actually not doing lazy loading but eager loading.
I wrote a post 2.5 years ago that seems relevant: http://weblogs.asp.net/aaguiar/archive/2004/10/06/Lazy-Loading-is-a-domain-problem.aspx
Regards,
Andres
I agree with Andres. By definition, a web app defines and knows the context in which the Domain is being called. It can be tricky to figure out exactly what is to be fetched, but it's a solvable problem. I don't even know where to start thinking about how to effectively architect a smart client in this way.
In a previous post I described the two common scenarios for fetching an object from the database. More importantly, I showed the importance of having a different fetching strategy in the case of retrieving data for the purposes of passing...
Recently I replied to a post on Ayende's blog which I'll quote below: In general lazy loading is more
Hey, I was just re-reading my last comment and wondered out loud: do you understand any of my REST ramblings at all?
More on Lazy Loading vs. Pre-loading in O/R mapping scenarios
@Alex,
That isn't different from the way an OR/M does it. The implementation differs, of course, but the basic concepts are just the same.
I don't claim to grok REST at any meaningful level, but I get intercepting the call and getting the data.
I would rather put the logic for the lazy load in the interception, rather than in the HttpHandler.
An OR/M may do this stuff with interception etc., but the fundamental issue is that the interception is compiled??? Which reinforces the dual schema problem?
Or am I missing something?
NHibernate does this by generating runtime proxies for the classes, and intercepting the calls this way. It doesn't require any intervention in the compiled IL.
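Conceptually, the generated proxy does something like this (a sketch of the idea, not NHibernate's actual generated code):

using System;

public class Person
{
    public virtual Person Mother { get; set; } // must be virtual to be proxied
}

public class PersonProxy : Person
{
    private readonly Func<Person> load; // supplied by the mapper at runtime
    private bool loaded;

    public PersonProxy(Func<Person> load) { this.load = load; }

    public override Person Mother
    {
        get
        {
            if (!loaded) { base.Mother = load(); loaded = true; }
            return base.Mother;
        }
        set { base.Mother = value; loaded = true; }
    }
}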
These runtime proxies live inside the executable, right? So if your data model changes, surely you need to restart in order to get it working again?
So I suppose while there is a dual schema problem, it is quite a lot less severe... i.e. a restart is needed rather than re-code + recompile + restart.
The thing about REST is you can change at runtime without a restart?
Or am I missing something still???
They live inside an assembly that was generated at runtime.
How do you handle schema changes at runtime right now? Do you generate an interface, or work against strings, like rest["Name"]?
Basically, how do you get notified about a schema change, and what do you do then?
Just picking up on your comment about wanting to protect the application from the UI developers - I see it like this:
The UI developers shouldn't have to be aware of the "edges" of the object graph.
The main problem I have with ORM objects being passed to the UI is that they represent a vast object space which can theoretically be navigated endlessly. Back at the server, with the support of lazy loading, this is the case; but at the UI end (at least in disconnected systems, a WinForms app, for instance), where there is no database connection, things fall apart.
When ORM objects are passed to the UI developer, they need to be aware of the edges. What I prefer is to pass back a crisply-defined business object which (back on the server) was loaded using an ORM (and using the lazy-loading features of the ORM to the fullest extent!)
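For example (illustrative names only):

using System.Collections.Generic;
using System.Linq;

public class Employee            // ORM-managed entity (stand-in)
{
    public string Name;
    public IList<Address> Addresses; // may be a lazy collection
}
public class Address { public string City; }

public class EmployeeDto         // what actually crosses the wire
{
    public string Name;
    public string[] Cities;      // the graph ends here, by design
}

public static class EmployeeAssembler
{
    public static EmployeeDto ToDto(Employee entity)
    {
        return new EmployeeDto
        {
            Name = entity.Name,
            // any lazy load fires here, on the server, never at the client
            Cities = entity.Addresses.Select(a => a.City).ToArray()
        };
    }
}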
@Craig: why would one need lazy loading on the SERVER side, if you know THERE what you have to load? Lazy loading is used when you load data LATER ON, so 'on demand'. If you know up front what to load, you define a graph with prefetch paths or whatever they're called by the O/R mapper of choice, and fetch the graph you want and return that, as that's much more efficient.
Anyone who uses lazy loading on the SERVER in a typical client <-> service approach doesn't understand what lazy loading is all about, as it is then effectively the most inefficient way to fetch data available.
Frans,
Because the server is usually not about dumb serving of the data to the client.
It is about sending the result of some operation on behalf of the client, something which usually includes business logic, in which case lazy loading plays a very important part.
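For example, a server-side operation whose data needs depend on runtime state (illustrative names): with lazy loading, the orders are fetched only on the path that actually needs them, while an up-front graph definition would have to load them always, or split the method.

using System.Collections.Generic;

public class Order { public decimal Total; }
public class Customer
{
    public bool IsPreferred;
    public IList<Order> Orders; // lazily loaded by the mapper
}

public class DiscountService
{
    public decimal CalculateDiscount(Customer customer)
    {
        if (!customer.IsPreferred)
            return 0m;                         // Orders are never loaded

        decimal total = 0m;
        foreach (var order in customer.Orders) // the lazy load fires here, if at all
            total += order.Total;
        return total > 1000m ? 0.1m : 0.05m;
    }
}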
Frans: I should clarify further. I'm a LLBLGen Pro user and a CSLA.NET user. When I use LLBLGen to fetch data to populate my CSLA business objects I'm using your self-servicing model.
The reason I like self-servicing is that I can be "lazy" as a developer - I don't have to know up front what the complete object graph I'm going to be loading is. However, when I develop the data access part of my business objects I'm aware of the data that is being loaded. At that point I can setup pre-fetch paths as necessary to make fetching efficient. BUT, I'll only do that once I see there's a problem: premature optimization is the root of all evil (well maybe not all!)
I couldn't agree more with Craig's point from above about not passing "ORM objects" to the front-end. I think THE key to success with ORM in non-trivial server applications is to isolate the "persistent" objects from the "business/domain" objects. This approach sidesteps many of the objections to lazy loading. Another benefit is better decoupling of the front-end from the database schema. I went into this in more depth in my blog (http://softwaredevscott.spaces.live.com/blog/cns!1A9E939F7373F3B7!349.entry) where I posted a notional diagram, if you're interested.