PerformanceMultiply Collections Fetch With NHibernate

time to read 12 min | 2310 words

We are back to the traditional model of Blog -> Posts -> Comments. Now, we want to load all the blogs, posts and comments and index them to improve search performance. A naive implemention will use this code:

foreach (Blog blog in session.CreateQuery("from Blog").List<Blog>())

{

       Index(blog,"Blog");

       foreach (Post post in blog.Posts)

       {

              Index(post, "Post");

              foreach (Comment comment in post.Comments)

              {

                     Index(comment, "Comments");

              }

       }

}

This code is going to produce an ungodly amount of database queries. We want to grab the data from the database in as few queries as possible. Obviously we want to use some sort of eager loading, but what can we use? Let us analyze what are our options for solving this.

The easiest route is to tell NHibernate to load the entire object graph in one go. For a long time, NHibernate had a limitation of a single collection eager load per query. This means that you couldn't load both the Posts collection and the Comments collection. Sergey has recently removed this limitation, so let us replacing the HQL query with:

from Blog blog

     left join fetch blog.Posts post

     left join fetch post.Comments

This query results in this SQL (I cleaned up the names a bit):

select  blog.Id ,                                    post.BlogId ,

        post.Id ,                                    post.UserId ,

        comment.Id ,                                 comment.Name ,

        blog.Title ,                                 comment.Email ,

        blog.Subtitle ,                              comment.HomePage ,

        blog.AllowsComments ,                        comment.Ip ,

        blog.CreatedAt ,                             comment.Text ,

        post.Title ,                                 comment.PostId ,

        post.Text ,                                  post.BlogId ,

        post.PostedAt ,                              post.Id ,

        comment.PostId ,                             comment.Id

from    Blogs blog

        left outer join Posts post on blog.Id = post.BlogId

        left outer join Comments comment on post.Id = comment.PostId

A word of warning, though:

There is a non trivial cartesian product here. If you know that you are working on small sets, that is fine, but be wary of using this technique over large sets of data.

More posts in "Performance" series:

  1. (10 Jan 2025) IOPS vs. IOPS
  2. (03 Jan 2025) Managed vs. Unmanaged memory