Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 3 min | 401 words

About three weeks ago I introduced the problem of ghost objects in NHibernate. In short, given the following model:

image

This code will not produce the expected result:

var comment = s.Get<Comment>(8454);
if (comment.Post is Article)
{
    // never reached: comment.Post is a proxy of Post, not an Article
}

You can check the actual post for the details; it is related to proxying and to when NHibernate decides to load a lazy loaded instance. In short, however, comment.Post is a lazy loaded object, and NHibernate, at this point in time, has no idea what it is. But since it must return something, it returns a proxy of Post, which will load the actual instance when needed. That leads to some problems when you want to downcast the value.
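To make the problem concrete without NHibernate in the picture, here is a minimal sketch with a hand-written proxy. PostProxy, GhostDemo and the Title property are assumptions made up for the illustration; the real proxy is generated at runtime, but the shape of the problem should be the same: the proxy derives from Post, so it can never pass a type check for Article.

```csharp
using System;

public class Post
{
    public virtual string Title { get; set; }
}

public class Article : Post { }

public class Comment
{
    public virtual Post Post { get; set; }
}

// Hypothetical stand-in for a runtime-generated proxy. It can only derive
// from Post, because the concrete type is unknown until the row is loaded.
public class PostProxy : Post
{
    private readonly Func<Post> load;
    private Post target;

    public PostProxy(Func<Post> load) { this.load = load; }

    // Loads the real instance on first use and caches it.
    public Post Target
    {
        get
        {
            if (target == null) target = load();
            return target;
        }
    }

    public override string Title
    {
        get { return Target.Title; }
        set { Target.Title = value; }
    }
}

public static class GhostDemo
{
    // Simulates s.Get<Comment>(...): the row behind Post is really an Article.
    public static Comment LoadComment()
    {
        return new Comment
        {
            Post = new PostProxy(() => new Article { Title = "Ghosts" })
        };
    }
}
```

With this in place, GhostDemo.LoadComment().Post is Article evaluates to false, even though reading Title through the proxy works and the instance behind the proxy is an Article.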

Well, I got fed up with explaining about this and set about to fix the issue. NHibernate now contains the following option:

<many-to-one name="Post" lazy="no-proxy"/>

When lazy is set to no-proxy, the following things happen:

  • The association is still lazy loaded. (Note that in older versions of NHibernate, setting it to no-proxy would trigger eager loading; this is no longer the case.)
  • The first time that you access the property the value will be loaded from the database, and the actual type will be returned.

In short, this should completely resolve the issue.

However, note the key phrase here: like lazy properties, this works by intercepting the property load, so if you want to take advantage of this feature you should use the property to access the value.
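Roughly speaking, intercepting the property load means that the owning entity is the one being subclassed, rather than the associated one. The sketch below is a hand-written approximation of that idea, not NHibernate's actual implementation; InterceptedComment, LoadCount and the loader delegate are assumptions for the sake of the example.

```csharp
using System;

public class Post
{
    public virtual string Title { get; set; }
}

public class Article : Post { }

public class Comment
{
    public virtual Post Post { get; set; }
}

// Hypothetical stand-in for the generated subclass of the owning entity.
// The association is loaded the first time the property getter runs.
public class InterceptedComment : Comment
{
    private readonly Func<Post> loadPost;
    private bool loaded;

    public InterceptedComment(Func<Post> loadPost) { this.loadPost = loadPost; }

    // Counts how many times the loader was invoked.
    public int LoadCount { get; private set; }

    public override Post Post
    {
        get
        {
            if (!loaded)
            {
                base.Post = loadPost(); // the real instance, with its real type
                loaded = true;
                LoadCount++;
            }
            return base.Post;
        }
        set
        {
            loaded = true;
            base.Post = value;
        }
    }
}
```

Nothing is loaded until the Post property is read, and what comes back is the real instance, so an is Article check behaves as expected. It also shows why the access has to go through the property: code that bypasses the getter never triggers the load.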

time to read 5 min | 812 words

This feature is now available on the NHibernate trunk. Please note that it is currently only available when using the Castle Proxy Factory.

Lazy properties is a very simple feature. Let us go back to my usual blog example, and take a look at the Post entity:

image

As you can see, it is a pretty simple example, but we have a problem. The Text property may contain a lot of text, and we don’t want to load that unless we explicitly ask for it.

If we try to execute this code:

var post = session.CreateQuery("from Post")
    .SetMaxResults(1)
    .UniqueResult<Post>();

You can see from the SQL that NHibernate will load the Text property. For large columns (text, images, etc.), the cost of loading a column value is prohibitive, and should be avoided unless absolutely needed.

image

This new feature allows you to mark a specific property as lazy, like this:

<property name="Text" lazy="true"/>

Once that is done, we can try querying for posts:

var post = session.CreateQuery("from Post")
    .SetMaxResults(1)
    .UniqueResult<Post>();

System.Console.WriteLine(post.Text);

And the resulting SQL is going to be:

image

Note that we aren’t loading the Text property when we query for the post, and if we inspect the stack trace of the second query we can see it being generated from the Console.WriteLine call.

But what if we want to query for posts with their Text property? Doing it this way may very well lead to SELECT N+1 if we need to load the Text properties of all the posts. NHibernate provides an HQL hint to allow this:

var post = session.CreateQuery("from Post fetch all properties")
    .SetMaxResults(1)
    .UniqueResult<Post>();

System.Console.WriteLine(post.Text);

Which will result in the following SQL:

image

What about multiple lazy properties? NHibernate supports them, but you need to keep one thing in mind. NHibernate will load all the entity’s lazy properties, not just the one that was immediately accessed. By that same token, you can’t eagerly load just some of an entity’s lazy properties from HQL.

This feature is mostly meant for unique circumstances, such as Person.Image, Post.Text, etc. As usual, be cautious about overusing it.

One last word of caution: this feature is implemented via property interception (and not field interception, like in Hibernate). That was a conscious decision, because we didn’t want to add a bytecode weaving requirement to NHibernate. What this means is that if you mark a property as lazy, it must be a virtual automatic property. If you attempt to access the underlying field value, instead of going through the property, you will circumvent the lazy loading of the property, and may get unexpected results.
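The field access pitfall is easier to see in code. This is a hand-written sketch under a few assumptions: LazyPost stands in for whatever subclass gets generated, the loader delegate stands in for the second SQL query, and Post deliberately uses an explicit backing field (which is exactly what you should not do) so the bypass can be shown.

```csharp
using System;

public class Post
{
    // An explicit backing field, present only to demonstrate the pitfall.
    protected string text;

    public virtual string Text
    {
        get { return text; }
        set { text = value; }
    }

    // Reads the field directly, so no interception can take place.
    public int TextLengthViaField()
    {
        return text == null ? 0 : text.Length;
    }

    // Goes through the virtual property, so the override gets a chance to load.
    public int TextLengthViaProperty()
    {
        return Text == null ? 0 : Text.Length;
    }
}

// Hypothetical stand-in for the generated subclass that intercepts Text.
public class LazyPost : Post
{
    private readonly Func<string> loadText;
    private bool textLoaded;

    public LazyPost(Func<string> loadText) { this.loadText = loadText; }

    public override string Text
    {
        get
        {
            if (!textLoaded)
            {
                base.Text = loadText(); // stands in for the second query
                textLoaded = true;
            }
            return base.Text;
        }
        set
        {
            textLoaded = true;
            base.Text = value;
        }
    }
}
```

On a freshly created LazyPost, TextLengthViaField() sees a null field and reports 0, while TextLengthViaProperty() triggers the load and reports the real length. Same entity, two different answers, depending only on how the value was reached.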

time to read 3 min | 592 words

I got this question a while ago from Kyle, and I think it is a great one. It is especially great since it is an exchange of emails that resulted in the following (all of which are Kyle's words):

I've been annoyed lately by the MVVM pattern. It seems like it requires that the data on your business classes be public so that the view-model can get at it, and that completely breaks encapsulation and goes against standard OO design theory (in my opinion).

The UI layer should be allowed to reference the data layer. I recalled a post you wrote where your UI needs to basically pull things out of queries and such directly (that's what I understood it to mean, anyway). I'm not sure how to pull this off easily just yet, because it seems like it would still break encapsulation somewhere down the line, but it's an interesting thought.

And yeah, I realized after sending the email about CQS. I've decided that my preferred way is actually having my model be able to create a view-model. It's still not pretty, but it's much better (in my view) than having all public data on my business models. I can use commands to bind directly to the model, and the view-model can cause that to happen correctly.

I thought about CQS more, and have a really nice way of doing the whole shebang, I think. It does kind of use your "Two different models for read vs write" concept. I've even come up with a little pseudo-enterprisey application to write using this design style. You'll like it - it's a Netflix for books [[netflix for books is a library]], essentially.

My answer to that is that Kyle is correct. On the one hand, we have the needs of the UI to show information, and on the other hand, we want to have good encapsulation for our business entities. The UI forces us to expose information to the user, and that encourages property-laden models. The problem with this approach is that often we try to make use of the same model for several tasks, such as using business entities for the user interface, or even asking the business entities to generate the view models that they represent.

CQS is a design methodology that is aimed at resolving this conflict. At its heart, it is actually very simple: it stipulates that you are going to have two different models for representing the system, one for reads (queries) and another for writes (commands). Once we accept that, we can see that we can evolve each of those models independently. And then we get to the point where we see that the data storage mechanism that we use for each model can be optimized independently for each use case.

For example, when using commands, we generally perform lookups by primary key alone, so we can avoid the overhead of indexes, or even select a storage format that is suitable for key based lookups (DHT, for example), while updating the query data store as a background process, which allows the entire system to stay stable under a high degree of stress.

In other words, once we have split the responsibilities of the system up so we don’t overload the responsibilities of a single model to be both read and write capable, we are in a much better position to shape the way we handle our software.
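As a rough illustration of that split, here is a small sketch built around the book lending idea from Kyle's email. All the names (Book, BookRow, Library) are made up for the example, and the read side is updated inline to keep it short; in the setup described above it would more likely be refreshed by a background process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Write model: no public data, only behaviour that protects its invariants.
public class Book
{
    private string borrower;

    public void LendTo(string member)
    {
        if (borrower != null)
            throw new InvalidOperationException("The book is already lent out.");
        borrower = member;
    }
}

// Read model: a flat, all-public row shaped for the screen that shows it.
public class BookRow
{
    public int Id { get; set; }
    public string Title { get; set; }
    public bool Available { get; set; }
}

public class Library
{
    // Command side: looked up by primary key only.
    private readonly Dictionary<int, Book> books = new Dictionary<int, Book>();

    // Query side: updated inline here; it could just as well be a background process.
    private readonly Dictionary<int, BookRow> rows = new Dictionary<int, BookRow>();

    public void AddBook(int id, string title)
    {
        books.Add(id, new Book());
        rows.Add(id, new BookRow { Id = id, Title = title, Available = true });
    }

    // Command: changes state, returns nothing.
    public void Lend(int id, string member)
    {
        books[id].LendTo(member);   // throws if the rule is violated
        rows[id].Available = false; // reached only when the command succeeded
    }

    // Query: returns data, changes nothing.
    public List<BookRow> AvailableBooks()
    {
        return rows.Values.Where(r => r.Available).OrderBy(r => r.Id).ToList();
    }
}
```

The UI binds to BookRow and never sees Book, so the business entity keeps its data private, and the two shapes are free to evolve (and be stored) separately.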

time to read 4 min | 701 words

Roy Osherove has a few tweets about commercial tools vs. free ones in the .NET space. I’ll let his tweets serve as the background story for this post:

image

image

The backdrop is that Roy seems to be frustrated with the lack of adoption of what he considers to be better tools if there are free tools that deal with the same problem even if they are inferior to the commercial tools. The example that he uses is Final Builder vs. NAnt/Rake.

As someone who is writing both commercial and free tools, I am obviously very interested in both sides of the argument. I am going to accept, for the purpose of the argument, that the commercial tool X does more than the free tool Y that deals with the same problem. Now, let us see what the motivations are for picking either one of those.

With a free tool, you can (usually) download it and start playing around with it immediately. With commercial products, you need to pay (usually after the trial is over), which means that in most companies, you need to justify yourself to someone, get approval, and generally deal with things that you would rather not do. In other words, the barrier for entry is significantly higher for commercial products. I actually did the math a while ago, and the conclusion was that good commercial products usually pay for themselves in a short amount of time.

But, when you have a free tool in the same space, the question becomes more complex. Roy seems to think that if the commercial product does more than the free one, you should prefer it. My approach is slightly different. I think that if the commercial product solves a pain point or removes friction that you encounter with the free product, you should get it.

Let us go back to Final Builder vs. NAnt. Let us say that it is going to take me 2 hours to set up a build using Final Builder and 8 hours to set up the same build using NAnt. It seems obvious that Final Builder is the better choice, right? But if I have to spend 4 hours to justify buying Final Builder, the numbers are drastically different: 6 hours against 8. And that is a conservative estimate.

Worse, let us say that I am an open minded guy who has used NAnt in the past. I know that it would take ~8 hours to set up the build using NAnt, and I am pretty sure that I can find a better tool to do the work. However, doing a proper evaluation of all the build tools out there is going to take three weeks. Can I really justify that to my client?

As the author of a commercial product, it is my duty to make sure that people are aware that I am going to fix their pain points. If I have a product that is significantly better than a free product, but isn’t significantly better at reducing pain, I am not going to succeed. The target in the product design (and later in the product marketing) is to identify and resolve pain points for the user.

Another point that I want to bring up is the importance of professional networks in bringing information to us. No one can really keep track of all the things that are going on in the industry, and I have come to rely more & more on the opinions of the people in my social network to evaluate and consider alternatives in areas that aren’t offering acute pain. That allows me to be on top of things and learn what is going on at an “executive brief” level. It also allows me to concentrate on the things that are acute to me, knowing that other people running into other problems will explore other areas and bring their results to my attention.

time to read 2 min | 223 words

It has been quite a journey for me, starting in 2007(!) up until about a month ago, when the final revision came out. I am very happy to announce that my book is now available in its final form.

When I actually got the book in my hands I was ecstatic. That represents about two years' worth of work, and some pretty tough hurdles to cross (think about the challenge of editing something the size of a book from my English). And getting the content right was even harder.

On the one hand, I wanted to write something that is actionable. My success criterion for the book is that after reading it, you can go ahead and write production-worthy Domain Specific Language implementations. On the other hand, I didn’t want to leave the reader without the theoretical foundation that is required to understand what is actually going on.

Looking back at this, I think that I managed to get that done well enough. The total page count is ~350 pages, and without the index & appendixes, it is just about 300 pages. Which, I hope, is big enough to give you working knowledge without bogging you down with too much theory.

time to read 4 min | 611 words

Uncle Bob has a post about why you should limit your use of IoC containers. I read that post with something very close to trepidation, because the first example that I saw told me a lot about the underlying assumptions made when this post was written.

Just to give you an idea about how many problems there are with this example when you want to talk about IoC in general, I made a small (albeit incomplete) list:

  • The example is a class that has two dependencies, which themselves have no dependencies.
  • There is manual mapping between services and their implementations.
  • All services share the same life span.
  • The container is used via the Service Locator pattern.

Now, moving to the concrete parts of the post, I mostly agree that this is an anti pattern, but not because the code is using IoC. The code is actually misusing it quite badly, and trying to draw conclusions about the practice of IoC from that sample (or ones similar to it) is like saying that we should abolish SQL because an example using string concatenation has security issues.

I am not really sure about the practices of IoC usage on the Java side, but in the .NET world, that sort of code has been frowned upon for at least 4 or 5 years. The .Net IoC community has been very loud about how you should use an IoC. We have been saying for a long time that the appropriate place to get instances from the IoC is deep in the bowels of the application infrastructure. A good example of that is the ASP.Net MVC Controller Factory, which is the only place in the application that will make use of the container directly.

Now, that takes care of the direct dependency on the container, let us talk about a dependency graph that has more than a single level to it. Here is something that is still fairly simplistic:

 

image

I colored all the things that share the same instance and those that do not. Trying to keep track of those manually, or through factories, would be a pure nightmare. Just try to imagine just how much code you are going to need to do that.

Furthermore, what about when we have different life spans for different components (logger is singleton, database is per request, tracking service is per session, etc.)? At this point you raise the complexity of the hand rolled solution by an order of magnitude once again. Using an IoC, on the other hand, means that you just need to configure things properly.

Which leads me to the next issue: manually mapping between services and their implementations is something that we more or less stopped doing circa 2006. All containers in the .Net space support some form of auto registration, which means that usually we don’t have to do anything to get things working.
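To give a feel for what the container is doing on your behalf, here is a toy sketch of those three ideas together: convention based auto registration, life spans as configuration, and a single factory that is the only code touching the container. TinyContainer and everything around it are assumptions written for this illustration, not the API of any real container, and Transient stands in for the per request life span, which a real container would scope properly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public interface ILogger { }
public class Logger : ILogger { }

public interface IDatabase { ILogger Logger { get; } }
public class Database : IDatabase
{
    public Database(ILogger logger) { Logger = logger; }
    public ILogger Logger { get; private set; }
}

// Knows nothing about the container: dependencies arrive via the constructor.
public class ReportController
{
    public ReportController(IDatabase database, ILogger logger)
    {
        Database = database;
        Logger = logger;
    }
    public IDatabase Database { get; private set; }
    public ILogger Logger { get; private set; }
}

public enum Lifestyle { Transient, Singleton }

public class TinyContainer
{
    private readonly Dictionary<Type, Type> implementations = new Dictionary<Type, Type>();
    private readonly Dictionary<Type, Lifestyle> lifestyles = new Dictionary<Type, Lifestyle>();
    private readonly Dictionary<Type, object> singletons = new Dictionary<Type, object>();

    // Convention: class Foo is registered as the implementation of IFoo.
    public void AutoRegister(Assembly assembly)
    {
        foreach (var impl in assembly.GetTypes().Where(t => t.IsClass && !t.IsAbstract))
            foreach (var service in impl.GetInterfaces().Where(i => i.Name == "I" + impl.Name))
                implementations[service] = impl;
    }

    public void SetLifestyle<TService>(Lifestyle lifestyle)
    {
        lifestyles[typeof(TService)] = lifestyle;
    }

    public object Resolve(Type service)
    {
        object existing;
        if (singletons.TryGetValue(service, out existing)) return existing;

        Type impl;
        if (!implementations.TryGetValue(service, out impl))
        {
            if (service.IsAbstract)
                throw new InvalidOperationException("No registration for " + service.Name);
            impl = service; // unregistered concrete types resolve to themselves
        }

        // Use the greediest public constructor and resolve its parameters recursively.
        var ctor = impl.GetConstructors().OrderByDescending(c => c.GetParameters().Length).First();
        var args = ctor.GetParameters().Select(p => Resolve(p.ParameterType)).ToArray();
        var instance = ctor.Invoke(args);

        Lifestyle lifestyle;
        if (lifestyles.TryGetValue(service, out lifestyle) && lifestyle == Lifestyle.Singleton)
            singletons[service] = instance;
        return instance;
    }
}

// The only class that touches the container, in the spirit of a controller factory.
public class ControllerFactory
{
    private readonly TinyContainer container;
    public ControllerFactory(TinyContainer container) { this.container = container; }

    public T Create<T>() where T : class
    {
        return (T)container.Resolve(typeof(T));
    }
}
```

Setting it up takes three lines: call AutoRegister with the application assembly, call SetLifestyle<ILogger>(Lifestyle.Singleton), and hand the container to the ControllerFactory. Two controllers created through the factory then get separate Database instances but share one Logger, and neither the controller nor its dependencies ever reference the container.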

As I said, I am not really sure what the status is in the Java world, but I have to say that while the issues that Uncle Bob pointed out in the post are real, the root cause isn’t the use of IoC, it is the example he was working with. And if this is a typical example of IoC usage in the Java world, then he should peek over the fence to see how IoC is commonly implemented in the .Net space.
