On Umbraco’s NHibernate’s Pullout
I was asked to comment on this topic:
Here is the link to the full article, but here is the gist:
Performance.
The culprit of v5 is our usage of NHibernate. This doesn't mean that NHibernate is evil, just that it has turned out to be the wrong fit for our project (in part due to our architecture and db schema, in part due to lack of proper NHibernate experience on the team). It's a badly tamed NHibernate that is caused the many N+1 SQL statements. The long term solution is to replace NHibernate with native, optimized SQL calls but it's a bigger operation that we'll look into over the summer. The short team solution is to continue with finding as many bottlenecks as we can as tune those, combined with introducing a Lucene based cache provider which will cache queries and results which will survive app pool restarts (much like the v4 XML cache). The architecture has been prepared for the latter and we'll have this in place by the end of May.
Another thing that causes massive performance issues is that we're seeing a trend where people are calling ToList() on razor queries to compensate for the lack of LINQ features in our implementation. Unfortunately, this causes most (if not all) of the database to be loaded into memory, naturally causing big performance issues (and an enormous number of N+1 SQL calls). This leads us to...
And a while later:
V5 couldn't be fixed. We needed to completely rewrite the entire business layer to ever get v5 into being the type of product people expect from Umbraco
I marked the parts that I intend to respond to in yellow, green highlight is for some additional thoughts later on, keep it in mind.
Now, to be fair, I looked at Umbraco a few times, but I never actually used it, and I never actually dug deep.
The entire problem can be summed in this page, from the Umbraco documentation site. This is the dynamic model that Umbraco exposes to the views to actually generate the HTML. From the docs:
DynamicModel is a dynamic object, this gives us a number of obvious benefits, mostly a much lighter syntax for accessing data, as well as dynamic queries, based on the names of your DocumentTypes and their properties, but at the same time, it does not provide intellisense.
Any time you provide the view with a way to generate queries, it is a BIG issue. For the purpose of discussion, I’ll focus on the following example, also taken from Umbraco docs:
1: @inherits PartialViewMacroPage
2: @using Umbraco.Cms.Web3: @using Umbraco.Cms.Web.Macros4: @using Umbraco.Framework5:
6: @{
7: @* Get the macro parameter and check it has a value - otherwise set to empty hive Id *@8: var startNodeID = String.IsNullOrEmpty(Model.MacroParameters.startNodeID) ? HiveId.Empty.ToString() : Model.MacroParameters.startNodeID;
9: }
10:
11: @* Check that startNodeID is not an empty HiveID AND also check the string is a valid HiveID *@12: @if (startNodeID != HiveId.Empty.ToString() && HiveId.TryParse(startNodeID).Success)13: {
14: @* Get the start node as a dynamic node *@15: var startNode = Umbraco.GetDynamicContentById(startNodeID);
16:
17: @* Check if the startNode has any child pages, where the property umbracoNaviHide is not True *@18: if (startNode.Children.Where("umbracoNaviHide != @0", "True").Any())19: {
20: <ul>
21: @* For each child page under startNode, where the property umbracoNaviHide is not True *@22: @foreach (var page in startNode.Children.Where("umbracoNaviHide != @0", "True"))23: {
24: <li><a href="@page.Url">@page.Name</a></li>25: }
26: </ul>
27: }
28: }
This is a great example of why you should never do queries from the views. Put simply, you don’t have any idea what is going to happen. Let us start going through the onion, and see where we get.
Umbraco.GetDynamicContentById – Is actually UmbracoHelper.GetDynamicContentById, which looks like this:
I mean, I guess I could just stop right here, look at the first line, and it should tell you everything. But let us dig deeper. Here is what GetById looks like:
Note that with NHibernate, this is a worst practice, because it forces a query, rather than allow us to use the session cache and a more optimized code paths that Load or Get would use.
Also, note that this is generic on top of an IQueryable, so we have to go one step back and look at Cms().Content, which is implemented in:
I just love how we dispose of the Unit of Work that we just created, before it is used by any client. But let us dig a bit deeper, okay?
And we still haven’t found NHibernate! That is the point where I give up, I am pretty sure that Umbraco 5 is using NHibernate, I just can’t find how or where.
Note that in those conditions, it is pretty much settled that any optimizations that you can offer isn’t going to work, since you don’t have any good way to apply it, it is so hidden behind all those layers.
But I am pretty sure that it all end up with a Linq query on top of a session.Query<Content>(), so I’ll assume that for now.
So we have this query in line 15, to load the object, then, on line 18, we issue what looks like a query:
if (startNode.Children.Where("umbracoNaviHide != @0", "True").Any())
We actually issue two queries here, one to load the Children collection, and the second in the Where(). It took a while to figure out what the Where was, but I am pretty sure it is:
Which translate the string to a Linq query, which then have to be translated to a string again. To be perfectly honest, I have no idea if this is executed against the database ( I assume so ) or against the in memory collection.
Assuming it is against the database, then we issue the same query again, two lines below that.
And this is in a small sample, think about the implications for a more complex page. When you have most of you data access so abstracted, it is very hard to see what is going on, let alone do any optimizations. And when you are doing data access in the view, the very last thing that the developer has in his mind is optimizing data access.
This is just setting up for failure. Remember what I said about: replace NHibernate with native, optimized SQL calls, this is where it gets interesting. Just about every best practice of NHibernate was violated here, but the major problem is not the NHibernate usage. The major problem is that there is very little correlation between what is happening and the database. And the number of queries that are likely to be required to generate each page are truly enormous.
You don’t issues queries from the view, that is the #1 reason for SELECT N+1, and when you create a system where that seems to be the primary mode of operation, that is always going to cause you issues.
Replace NHibernate with optimized SQL calls, but you are still making tons of direct calls to the DB from the views.
I actually agree with the Umbraco decision, it should be re-written. But the starting point isn’t just the architectural complexity. The starting point should be the actual interface that you intend to give to the end developer.
Comments
It doesn't matter what you're using at the end. O/R Mapper, old school ADO.NET or NoSQL. If you don't know your tools and don't know how to use them, there will be always such kind of situations in the development process.
Great post. Umbraco is what I work with daily, it would be great to see more of your take on the mistakes that were made regarding architectural decisions. They are now focusing on the 4.7 branch, but i wonder, how much better is it? Imho there is a lot of funky code and decisions there too. Feels like they could need some help with that kind of stuff.
There's been so much blame slapped in the face of ORMs lately and it almost all cases it comes down to complete mis-use of the ORM to begin with...
One such argument I always here is that ORM's always lead you into this SELECT N+1 and that not using an ORM will solve this common mistake...
?!?!?!
Hmmm, methinks that Umbraco would be even better than it already is with a lovely RavenDB backend. :)
The saddest part about the over abstraction is that umbraco is traditionally a very data centric CMS, as opposed to DNNs page centric approach.
I think it's a good thing, that ORMs are so openly dissed recently. IMO a lot of people believed that they would take away all their database related problems and forgot that the O/R impedance mismatch would never really allow that. Not using an ORM forces you to actually think about what you are doing when accessing the database. So if people are getting more hesitant when deciding to use an ORM for a new project or not I think it's only for the better. Of course those of us who know the delights and limitations of their ORM of choice can still use it anyway.
People often think that they can fully abstract over data access. I have never seen this turn out true.
Semantically, you can. But performance is a cross-cutting concern. You can't abstract it away.
Not sure how ADO.NET will make this a better experience. The issue fully remains while at the same time they loose all the benefits of an ORM.
Being an Umbraco AND NHibernate developer, and also one of those who've criticized the architecture but still love the project, I came here fearing an all out bashing. But (since we fundamentally agree...), this was spot on, Ayende. I guess what sums it up best is:
"And we still haven’t found NHibernate!"
I have two approaches with NHibernate. "Hide" it in a purpose built layer (e.g. don't expose IQueryable) or the opposite, work with "raw" sessions. One layer has all the control. The seemingly flexible middle ground always ends in tears+1.
Being able to query/iterate/transform in the templates/views is part of what makes Umbraco so great, but it has previous to v5 been done on an in-memory data model (XML + XSLT or Razor), which makes it snappy. So, query in the view is OK, just not abstracted querying on the database.
I think it's more a psychological problem. People tend to think that any serious system must have a complex architecture, otherwise it would be trivial.
Hurray for a MVC based cms!
oh, wait. where exactly is the MVC patterns in this? Wasn't it all about letting the CONTROLLER pass data to the VIEW !?
My biggest takeaway from something like this is quite simple: your DATABASE and HOW YOU TALK TO IT matters. I keep seeing these unit of work and repository patterns, but there is a big difference between working with an ISession, a DbContext, an IDocumentSession, an IDbConnection, or plain old in-memory objects. And what about object caches like Redis? Or using Raven as writable transactional system along with an RDBMS for reporting?
These are all abstractions anyway. The repository pattern just further abstracts it away, making it all that much more difficult to work with.
Hi Ayende,
Thanks for the thorough analysis. The core team has said it's going to continue working on Umbraco v4, and I think this will help them focus on 'the good parts'
I should add that the Umbraco v5 project folded not only, or maybe even not at all, because of the performance issues. My tweet may have given some of your readers that idea.
But the persistent performance issues did lead the core team and the community to re-evaluate the whole project, with the conclusion that it was overall too complex (I mean, even you couldn't find NHibernate). This seems to be built bottom-up, whereas your advice would be to design it form a top-down perspective, i.e. starting with the interfaces we can use inside views. Sounds about right, as that's where most CMS developers will write most of their code.
I think a document database like RavenDB is better suited to the needs of a CMS with dynamic content types. I often wonder why we are trying to stuff all this 'dynamic data' into a strictly relational database. Umbraco v5 basically used an EAV model, that can't make querying much faster, can it?
Thanks again!
-Michiel
Heh, modern .Net gives all the guns and ammo to shoot your foot any way you want. I bet the application would run better if they used Visual Basic 6.
Using a repository pattern has some benefits, but simplicity is not one of them. It makes it easier isolate business logic in your unit tests because you can use a mock/fake repository. If your repository is in a separate assembly, it's easy to re-use it if multiple applications need the same data.
I generally avoid the use of ORMs since in my experience they always seem to perform worse than native calls and impose an additional layer of complexity. I'd rather write a repository with SqlConnection, SqlCommand, and DataReader objects making direct use of stored procedures than fight with configuring NHibernate, LLBLGEN, or any other ORM I've experimented with, including Entity Framework in it's current form. If EF Code First ever gets decent support for stored procedures I might give it another look.
@Bryan, agreed. Giving up something at runtime to get (maybe) something during the coding has always struck me as a bad bargain.
100% agree with you Ayende...this is not going to get any better with native SQL...sounds like this Dev-team has bought in to the worst-possible definition of "persistence ignorance".
Wrapping your persistence logic with so many layers that by the time you get there you have no idea what kind of action we are supposed to be doing is never going to really be helpful.
Whatever happened to picking a helpful persistence abstraction for the problem at hand.
Michiel, I was very careful not to inject any RavenDB into this post. Yes, RavenDB is ideal for CMS, and there are quite a few that use us for that.
Looking at their API it seems like they wanted to hide NHibernate behind their API, but the problems bled out while the solutions to those problems were hidden behind their API wall.
I bet the developers sat around and said "We don't want developers knowing that we use NHibernate when developing against Umbraco." When the shit hits the fan: "Hey developers, it was all NHibernates fault."
@tobi is absolutely right!
@Gene - How's assembly language treating you these days? :)
I contributed a little to Umbraco in the early days, and it has been a architectural mess since forever. But this? They effectively ruined their business, brand and community support for a long long time. No-one will invest time in 5 or 6, until something is released and proven to perform. Ouch.
They had a great opportunity to rework/scrap old their architecture, and they missed it completely.
RavenDB i guess would be a no-go for Umbraco, considering the licensing costs. Mongodb or any other free nosql with .net support would be a nice fit imho.
YOU ARE ALL WRONG. THE DATABASE IS AN IMPLEMENTATION DETAIL http://blog.8thlight.com/uncle-bob/2012/05/15/NODB.html
AAAAAAAAHHHHHHHHHHHHHHHHHHHHH
@Jeremy, you did notice the "(maybe)" right? ;-)
Touche, though, I did word that a bit too absolutely. Nonetheless, those types of tradeoffs I look at very closely.
At the end of the day isn't CMS basically CRUD? I don't understand the need for all this complexity.
I've been tasked with replacing our current CMS + shopping cart app. It does a lot of querying from the view and the server just chokes when there are 10 concurrent users. I was hoping to use umbraco for the CMS portion. I really do not want to roll my own but I don't see a good alternative...
No CMS is simply CRUD when you go beyond simple editing.
nick: In umbraco' case the requirements are quite well known, so talking about what database to use is highly relevant. The ability to customize the document-types in umbraco requires a lot in sql database:
Look at this monster: http://blogbustingbeats.files.wordpress.com/2011/01/screenshot046.jpg
This would have been MUCH simpler in a no-sql database.
Never mind the previous link, look at this:
http://blog.hendyracher.co.uk/wp-content/uploads/2009/01/umbracodb1.gif
@AndersM That database schema is from Umbraco v4, right? In v5 they opted for an EAV model. See: http://our.umbraco.org/forum/core/umbraco-5-general-discussion/27399-The-database-schema-using-in-v5
@Michiel: You are right - compared to v4 they added a table for each attribute datatype as well (date, decimal, string, long string and integer). The values are mapped as one-to-many with their attribute.
By why on earth they did not consider performance earlier on I cannot understand. They could have benefited a lot from talking to Ayende, or at least buying nhprof.
Sounds like a copy of NHProf would be well worth their while.
@AndersM: they're called Architecture Astronauts
@AndersM: oh, and a wonderful example of a Second System Effect too!
Great analysis. I apreciate the effort you put into this.
I have been programming for 25 years and focused on ultra high performance data systems for 10. I love the new tools we have to work with such as NHibernate and other ORMs, but because of the high performance aspect of what I do their cost becomes very apparent.
A programmer who I have a lot of respect for once told me that it is impossible to make a program go faster, you can only make it do less work in the process of getting the results you need. Every time you add an abstraction layer of any type you are making the computer do more work. In addition you are programming yourself further away from the core reason for the program, to get results. The tradeoff is easier programming and possibly better maintainability at the surface level. I am not against this, but too many people use these tools without really understanding them and worse, without really understanding their own programming problem. Just slap another layer on and everything will be cool.
When we get down to it a database is an abstraction layer on top of the file system and in memory data. Then we have the database communication layer. Now add an ORM layer. Then a queryable object wrapper. Then expose it as an API. At some point we are so abstracted that we have no clue about what is happening with the core reason for our program, our data.
Hopefully there is a lesson in the Umbraco implosion somewere. Mine would be that I shouldn't use Umbraco again.
This whole debacle is why this quote grates on my nerves every time I hear it:
"Premature optimization is the root of all evil"
Time and time again programmers use it as an excuse for bad programming practices, while it may have had merit in its original context, it has become the catch-all excuse, for bad programmers to write any kind of cr@ppy code they want, and worry about if it runs fast enough to use later. Well in this case, you can plainly see, Umbraco could have benefited from some 'premature' optimization.
This whole debacle is why this quote grates on my nerves every time I hear it:
"Premature optimization is the root of all evil"
Time and time again programmers use it as an excuse for bad programming practices, while it may have had merit in its original context, it has become the catch-all excuse, for bad programmers to write any kind of cr@ppy code they want, and worry about if it runs fast enough to use later. Well in this case, you can plainly see, Umbraco could have benefited from some 'premature' optimization.
EJB - I'd argue Umbraco did the 'premature' optimization.
That is they built a whole bunch of abstraction layers and complexity to solve a problem only to find out it wasn't the right problem to solve.
The actual quote is "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." - Donald Knuth
What drugs does it take to create such a horrible mess!
@Steve sheldon
http://umbraco.com/follow-us/blog-archive/2012/1/4/umbraco-5-on-performance-and-the-perils-of-premature-optimisation.aspx
Several commenters have noted that as much of a CMSs data is not relational, so a document database would be more appropriate.
While this may be true, Umbraco is a mass market CMS, so it needs to be installable with minimum fuss on a wide range of real life hosting providers. For .NET hosting, this usually means SQL Server, with occasional MySQL (Umbraco V4 supports both).
There simply isn't the critical mass of hosting providers to create a widely used .NET+NoSQL only CMS at this time. You could use an embedded RavenDB installation, but you still need to support the other relational DBs to get market share.
I'm very surprised the Umbraco development team didn't figure these glaring performance issues. I don't think the problem is premature optimization, it is a poor understanding of nanoseconds vs milliseconds. You can optimize a piece of code to the point that it can't run any faster, but if it calls a function that copies a terabyte file over a network, it won't matter.
I'm getting the sense that the developers we're seeing in the open source community are young. They were fortunate enough to miss out on the days of assembly language and C as the only way of getting any kind of reasonable performance. With today's compiler technologies and ever increasing storage and speeds, a game like PACMAN can be written in a few days using flash. The same effort would have taken months of painful, handcoded assembly language with no fancy debuggers.
Umbraco 5 can be fixed, but it will take some work, but not 9 months as mentioned by Hartvig. The main problem is that they are trying to store dynamic types that aren't geared for storage and retrieval using SQL and certainly not for indexing. They can get a lot of mileage by simply storing user types as xml blobs or value-pairs with built in object fields such as url, id, security, etc. The data can be deserialized into dynamic objects for the sake of creating a view model. Instead, they made objects fields dynamic which is slow to query even with the most optimized SQL statements. This is a result of aiming for maximum flexibility at the expense of performance.
Command-Query separation. I guess Umbraco has a clever db structure from which I learn how to structure CMS data wisely. However the performance side - the query side, this db scheme not working very well. No-SQL db is a better fit in this case. A fused solution if possible I think serves better. Not sure if CQRS pattern could have some contribution in this case.
Comment preview