On Umbraco’s NHibernate’s Pullout

time to read 10 min | 1818 words

I was asked to comment on this topic:

image

Here is the link to the full article, but here is the gist:

Performance.
The culprit of v5 is our usage of NHibernate. This doesn't mean that NHibernate is evil, just that it has turned out to be the wrong fit for our project (in part due to our architecture and db schema, in part due to lack of proper NHibernate experience on the team). It's a badly tamed NHibernate that is caused the many N+1 SQL statements. The long term solution is to replace NHibernate with native, optimized SQL calls but it's a bigger operation that we'll look into over the summer. The short team solution is to continue with finding as many bottlenecks as we can as tune those, combined with introducing a Lucene based cache provider which will cache queries and results which will survive app pool restarts (much like the v4 XML cache). The architecture has been prepared for the latter and we'll have this in place by the end of May.


Another thing that causes massive performance issues is that we're seeing a trend where people are calling ToList() on razor queries to compensate for the lack of LINQ features in our implementation. Unfortunately, this causes most (if not all) of the database to be loaded into memory, naturally causing big performance issues (and an enormous number of N+1 SQL calls). This leads us to...

And a while later:

V5 couldn't be fixed. We needed to completely rewrite the entire business layer to ever get v5 into being the type of product people expect from Umbraco

I marked the parts that I intend to respond to in yellow, green highlight is for some additional thoughts later on, keep it in mind.

Now, to be fair, I looked at Umbraco a few times, but I never actually used it, and I never actually dug deep.

The entire problem can be summed in this page, from the Umbraco documentation site. This is the dynamic model that Umbraco exposes to the views to actually generate the HTML. From the docs:

DynamicModel is a dynamic object, this gives us a number of obvious benefits, mostly a much lighter syntax for accessing data, as well as dynamic queries, based on the names of your DocumentTypes and their properties, but at the same time, it does not provide intellisense.

Any time you provide the view with a way to generate queries, it is a BIG issue. For the purpose of discussion, I’ll focus on the following example,  also taken from Umbraco docs:

   1:  @inherits PartialViewMacroPage
   2:  @using Umbraco.Cms.Web
   3:  @using Umbraco.Cms.Web.Macros
   4:  @using Umbraco.Framework
   5:   
   6:  @{
   7:      @* Get the macro parameter and check it has a value - otherwise set to empty hive Id *@
   8:      var startNodeID = String.IsNullOrEmpty(Model.MacroParameters.startNodeID) ? HiveId.Empty.ToString() : Model.MacroParameters.startNodeID;
   9:  }
  10:   
  11:  @* Check that startNodeID is not an empty HiveID AND also check the string is a valid HiveID *@
  12:  @if (startNodeID != HiveId.Empty.ToString() && HiveId.TryParse(startNodeID).Success)
  13:  {
  14:      @* Get the start node as a dynamic node *@
  15:      var startNode = Umbraco.GetDynamicContentById(startNodeID);
  16:   
  17:      @* Check if the startNode has any child pages, where the property umbracoNaviHide is not True *@    
  18:      if (startNode.Children.Where("umbracoNaviHide != @0", "True").Any())
  19:      {
  20:          <ul>
  21:              @* For each child page under startNode, where the property umbracoNaviHide is not True *@
  22:              @foreach (var page in startNode.Children.Where("umbracoNaviHide != @0", "True"))
  23:              { 
  24:                  <li><a href="@page.Url">@page.Name</a></li>
  25:              }
  26:          </ul>
  27:      }    
  28:  }

This is a great example of why you should never do queries from the views. Put simply, you don’t have any idea what is going to happen. Let us start going through the onion, and see where we get.

Umbraco.GetDynamicContentById – Is actually UmbracoHelper.GetDynamicContentById, which looks like this:

image

I mean, I guess I could just stop right here, look at the first line, and it should tell you everything. But let us dig deeper. Here is what GetById looks like:

image

Note that with NHibernate, this is a worst practice, because it forces a query, rather than allow us to use the session cache and a more optimized code paths that Load or Get would use.

Also, note that this is generic on top of an IQueryable, so we have to go one step back and look at Cms().Content, which is implemented in:

image

I just love how we dispose of the Unit of Work that we just created, before it is used by any client. But let us dig a bit deeper, okay?

image

image

image

And we still haven’t found NHibernate! That is the point where I give up, I am pretty sure that Umbraco 5 is using NHibernate, I just can’t find how or where.

Note that in those conditions, it is pretty much settled that any optimizations that you can offer isn’t going to work, since you don’t have any good way to apply it, it is so hidden behind all those layers.

But I am pretty sure that it all end up with a Linq query on top of  a session.Query<Content>(), so I’ll assume that for now.

So we have this query in line 15, to load the object, then, on line 18, we issue what looks like a query:

if (startNode.Children.Where("umbracoNaviHide != @0", "True").Any())

We actually issue two queries here, one to load the Children collection, and the second in the Where(). It took a while to figure out what the Where was, but I am pretty sure it is:

image

Which translate the string to a Linq query, which then have to be translated to a string again. To be perfectly honest, I have no idea if this is executed against the database ( I assume so ) or against the in memory collection.

Assuming it is against the database, then we issue the same query again, two lines below that.

And this is in a small sample, think about the implications for a more complex page. When you have most of you data access so abstracted, it is very hard to see what is going on, let alone do any optimizations. And when you are doing data access in the view, the very last thing that the developer has in his mind is optimizing data access.

This is just setting up for failure. Remember what I said about: replace NHibernate with native, optimized SQL calls, this is where it gets interesting. Just about every best practice of NHibernate was violated here, but the major problem is not the NHibernate usage. The major problem is that there is very little correlation between what is happening and the database. And the number of queries that are likely to be required to generate each page are truly enormous.

You don’t issues queries from the view, that is the #1 reason for SELECT N+1, and when you create a system where that seems to be the primary mode of operation, that is always going to cause you issues.

Replace NHibernate with optimized SQL calls, but you are still making tons of direct calls to the DB from the views.

I actually agree with the Umbraco decision, it should be re-written. But the starting point isn’t just the architectural complexity. The starting point should be the actual interface that you intend to give to the end developer.