Optimizing Windsor
Recently we got a bug report about the performance of Windsor when registering large number of components (thousands). I decided to sit down and investigate this, and found out something that was troublesome.
Internally, registering a component would trigger a check for all registered components that are waiting for a dependency. If you had a lot of components that were waiting for dependency, registering a new component degenerated to an O(N^2) operation, where N was the number of components with waiting dependencies.
Luckily, there was no real requirement for an O(N^2) operation, and I was able to change that to an O(N) operation.
Huge optimization win, right?
In numbers, we are talking about 9.2 seconds to register 500 components with no matching dependencies. After the optimization, we dropped that to 500 milliseconds. And when we are talking about larger number of components, this is still a problem.
After optimization, registering 5,000 components with no matching dependencies took 44.5 seconds. That is better than before (where no one has the patience to try and figure out the number), but I think we can improve up it.
The problem is that we are still paying that O(N) cost for each registration. Now, to suppose systems that already uses Windsor, we can’t really change the way Windsor handle registrations by default, so I came up with the following syntax, that safely change the way Windsor handles registration:
var kernel = new DefaultKernel();
using (kernel.OptimizeDependencyResolution())
{
for (int i = 0; i < 500; i++)
{
kernel.AddComponent("key" + i, typeof(string), typeof(string));
}
}
Using this method, registering 5,000 components drops down to 2.5 seconds.
I then spent additional time finding all the other nooks and crannies where optimizations hid, dropping the performance down to 1.4 seconds.
Now, I have to say that this is not linear performance improvement. Registering 20,000 components will take about 25 seconds. This is not a scenario that we worry over much about.
The best thing about the non linear curve is that for a 1,000 components, which is what we do care about, registration takes 240 milliseconds. Most applications don’t get to have a thousand components, anyway.
There are also other improvements made in the overall runtime performance of Windsor, but those would be very hard to notice outside of a tight loop.
Comments
When can we expect a release with this?
(We dont like running with the SVN version)
Dennis,
After they had some chance to sit in the trunk and we can see that they are stable.
Just for curiosity, could you share with us the details of the optimization?
Ricardo,
Check the SVN log :-)
Strange nobody else stumbled upon this.
And great that it is now improved!
Thanks both for those who reported the bug, and who fixed it!
To clarify, you're saying that the original algorithm was an O(N^2) operation for each of N components and you initially got this down to O(N)?
So in fact the registration of all components was N * O(N^2) = O(N^3) and you got it down to O(N^2)? No wonder it was taking so long for large numbers ;-)
Mike,
Yes & no.
a) you need to pay attention to what N represent (invalid components)
b) it isn't the same N.
Let us say we add 100 invalid components.
1st: O(0)
2nd: O(1)
3rd: O(2)
100: O(99)
I am not sure what the math symbol for that is.
It is usually close enough to O(N^2 / 2) * N, though.
I changed it so it would be O(N^2) without the Optimize and O(N) with the Optimize
Darius,
I suppose not that many people usually have that many distinct components served up by the DI container. that's the problem with exponential performance loss...
Ayende,
Ah, I see. This discussion seems to be degenerating into mathematics ;-) AFAIK with big O notation you're interested in how the algorithm varies with N. So constants, e.g. division by 2, don't vary and are thus dropped from the notation.
1,2,3...N is the canonical arithmetic sequence, the sum of which is given by N(N+1)/2. Since you're starting at zero, i.e. =N-1, the sum becomes N(N-1)/2.
OK, "normal programming service has been resumed..." ;-)
Even so, it's better to register your components in an order that ensures the fewest number of components is awaiting dependencies at a time. CAB might have been onto something there :)
Chris,
That is usually not possible. If I have 500 components, just trying to figure out what the right order is is longer than the registration time.
just curious if similar optimizations could be made in Spring.Net's IoC, or if the same logic would apply
Scott,
I don't know.
The optimization is divided into two pieces.
First, change the way we handle resolving missing dependencies to avoid the requirement for a O(N^2) per registration
Second, aggressively cache resolution queries
Ayende,
That's entirely correct, if you're shoe-horning this onto an existing application using Castle Windsor. It's also true if you're registering all 500 components in one method. If you distribute the registration according to the structure of your application, it's tolerable, but a waste of time.
Scott, do you experience any concrete issue? If so, please let us know. I didn't write specific tests for this yet, but since Spring.NET's wiring logic is quite different from Windsor's, I am not sure how much of the optimizations Oren did are applicable
Thats the thing I can't uderstand. Why should I register all my dependencies every time my application starts if all of them are known to me (as a developer) before compilation.
Here you can see how I used generics as static IoC.
seermindflow.blogspot.com/.../...y-dependency.html
SeeR,
I find code like that REALLY ugly.
And you make invalid assumption when you say that you know what the dependencies are.
In most big apps, you don't. They change, they are flexible.
"In most big apps, you don't. They change, they are flexible."
You mean they change after compilation and they should be in some configuration file ??
Comment preview