My distributed build system
Yes, I know that you are probably getting geared up to hear about some crazy setup, and in some manner, it is crazy. My distributed build system is this:
Yep, that is me manually distributing the build to do a cross check in a reasonable time frame.
I’ve mentioned before that our build time is atrocious. With over 3,500 tests, it is no longer feasible to just run them all normally. So we started parallel efforts (pun intended) to reduce the time it takes the system to build, to reduce individual test times, and to improve our ability to parallelize things.
We are actually at the point where we can run concurrent tests, even those we previously had to distribute. And we can even do that with replicated tests, which are usually the ones that take the longest. But what we’ll probably end up with is just a bunch of test projects (currently we have ~5) that we will run on different machines at the same time.
We are using TeamCity for our build system, and I know that it has capabilities in this regard, but I’ve never actually looked into them. We’re going to be pretty busy in the next couple of weeks, so I thought that this would be a good time to ask.
My current thinking is:
- One build that would actually do the compilation, etc.
- Then branch off that to a build per test project, which can run on different agents
- Then a build on top of that, which can actually deploy the build (a rough sketch of kicking off such a chain follows below).
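The idea is that the whole chain hangs off snapshot / artifact dependencies, so that queuing the compilation build is all anyone ever has to do. A minimal sketch of kicking that off through TeamCity's REST API, with a made-up server URL and build configuration ID:

```powershell
# Queue the compilation build; the per-project test builds and the deploy
# build would follow automatically via snapshot/artifact dependencies.
# Server URL and build configuration ID below are placeholders.
$teamCityUrl = 'http://teamcity.example.com'
$buildTypeId = 'MyProduct_Compile'          # hypothetical build configuration ID
$cred        = Get-Credential               # a TeamCity user allowed to trigger builds

Invoke-RestMethod -Uri "$teamCityUrl/httpAuth/app/rest/buildQueue" `
    -Method Post `
    -Credential $cred `
    -ContentType 'application/xml' `
    -Body "<build><buildType id='$buildTypeId'/></build>"
```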
Any recommendations on that? We are going to need 5 – 7 agents to run the tests in parallel. Any recommendations on how to get those? Ideally I would like to avoid having to pay for those machines to sit idle 60% – 80% of the time.
Any other alternatives?
Comments
You could try using Azure. See this: http://blog.maartenballiauw.be/post/2013/08/05/An-autoscaling-build-farm-using-TeamCity-and-Windows-Azure.aspx
TeamCity has support for auto-scaling a build farm on EC2. I've never used it, but it seems to be a good fit for your needs.
We have a Hyper-V server (the free OS, basically Windows Server Core with only the hypervisor feature). The build, along with some fast smoke tests, is done in the classical way: as several steps in a single TeamCity build configuration (BC for short). It produces deployable artifacts (in our case several MSI packages plus some additional files used only for testing). A second BC, 'Long test starter', depends on the first one's artifacts. It triggers a PowerShell script that goes to the Hyper-V host, creates a bunch of VMs there (from a known baseline, so creation takes seconds; I'm not talking about installing an OS on a VM from scratch) and deploys our MSI to them. At that point the second BC is finished, because we don't want to keep a build agent busy for the several hours our tests run on the VMs.
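The VM-creation part of that script looks roughly like this; paths, names and credentials are simplified, and the copy/install step assumes a Hyper-V version with PowerShell Direct and the guest file-copy integration service:

```powershell
# Runs on the Hyper-V host: clone test VMs from an exported baseline and
# install the MSI produced by the first build configuration.
$baselineVm = 'D:\Baselines\TestVM\Virtual Machines\baseline.vmcx'
$msiPackage = 'D:\Drops\Product.msi'
$guestCred  = Get-Credential               # local admin inside the VMs

1..5 | ForEach-Object {
    $name = "test-vm-$_"

    # Importing a copy of the baseline takes seconds, unlike a full OS install
    $vm = Import-VM -Path $baselineVm -Copy -GenerateNewId `
                    -VirtualMachinePath "D:\VMs\$name" -VhdDestinationPath "D:\VMs\$name"
    Rename-VM -VM $vm -NewName $name
    Start-VM -Name $name

    # (in reality we wait here until the guest has finished booting)

    # Push the MSI into the guest and install it silently
    Copy-VMFile -Name $name -SourcePath $msiPackage `
                -DestinationPath 'C:\Temp\Product.msi' -FileSource Host -CreateFullPath
    Invoke-Command -VMName $name -Credential $guestCred -ScriptBlock {
        Start-Process msiexec.exe -ArgumentList '/i C:\Temp\Product.msi /qn' -Wait
    }
}
```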
Then there is the reporting of results. It is done via a separate (kind of servicing) git repository, 'test-results', where we keep some statistics for each test (e.g. min/max/avg memory load during each test execution, duration, etc.), plus a 'trigger file' that a third TeamCity BC watches. So the testing VMs execute tests and push their results to that git repository. Each VM has a dedicated folder, so no conflicts (at least none that can't be resolved automatically) are possible. After pushing its results, each VM checks whether it was the last one; if so, it rewrites the 'trigger file', which kicks the third BC into life.
The third BC, 'Gathering test results', basically imports all the results from the 'test-results' repository and produces nice statistics.
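The VM-side script that pushes results and decides whether to touch the trigger file is, roughly (repository layout, file names and the 'last VM' check are simplified):

```powershell
# Runs on each test VM after its batch of tests has finished.
$resultsRepo = 'C:\test-results'           # local clone of the 'test-results' repo
$vmName      = $env:COMPUTERNAME
$expectedVms = 5                           # total number of VMs in this run

Set-Location $resultsRepo
git pull --rebase

# Each VM writes only into its own folder, so pushes never conflict
New-Item "$resultsRepo\$vmName" -ItemType Directory -Force | Out-Null
Copy-Item 'C:\TestRun\stats\*.json' "$resultsRepo\$vmName\" -Force
git add $vmName
git commit -m "Results from $vmName"
git push

# If every VM's folder now has results, rewrite the trigger file that the
# 'Gathering test results' BC is watching
$finished = (Get-ChildItem $resultsRepo -Directory |
             Where-Object { Get-ChildItem $_.FullName -Filter *.json }).Count
if ($finished -ge $expectedVms) {
    Get-Date -Format o | Set-Content "$resultsRepo\trigger.txt"
    git add trigger.txt
    git commit -m "All $expectedVms VMs finished"
    git push
}
```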
This whole system took some time to get up and running, and there are some rough edges, but overall I'm very happy with it. We're using it to run UI tests in parallel on several VMs.
It is probably reinventing the (cloud) wheel, but we're unable to use Azure or something like that due to legal restrictions.
P.S. I could share the scripts if needed.
As an occasional contributor, I have been thoroughly stung by the current test situation; not knowing whether something was already broken or whether I broke it has really wasted a lot of my time.
This is what I suggest:
Leverage TeamCity fully. Every commit of every branch of every repo of every developer gets a build report. Know exactly when and where something broke. TeamCity can also build every pull request, automatically attempt to merge it and build that, and then post the result to GitHub (https://github.com/jonnyzzz/TeamCity.GitHub).
If there is a problem and there isn't a corresponding CI build report then it didn't happen. No more "on my machine blah" "but on my machine another blah blah". Nobody cares. One Source Of Truth - the CI build report.
Build agent licences are cheap; fast feedback loops of truth are valuable. Throw agents at the problem. A rule of thumb in my experience is approximately one agent per developer, plus or minus.
Leverage TeamCity's Cloud Agent support to scale up the number of agents available when needed, and downscale at night, on weekends, etc. Our shut-down quiet period is 20 minutes. We use this on NEventStore, amongst other projects, and it keeps costs very much under control.
Partition the tests into fast / slow / slow as hell. Use TeamCity's Agent Pools feature and set up your build configs to ensure there are sufficient agents available for the fast tests and fast feedback. Slow tests are handled by another pool, where an immediate feedback loop is not as important.
Slow / slow-as-hell build configs may not need to test every commit. Tweak as you see fit.
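A cheap way to do that partitioning, assuming the slow fixtures carry NUnit categories, is to give the fast and slow build configurations different runner filters; category names and paths here are made up:

```powershell
# Sketch of the fast/slow split using NUnit 2.x category filters.
$runner = 'C:\Tools\NUnit\nunit-console.exe'

# Fast build config: everything not tagged as slow, on the big agent pool
& $runner .\Tests\MyProduct.Tests.dll /exclude:Slow,SlowAsHell /xml:fast-results.xml

# Slow build config: only the tagged fixtures, routed to a separate agent pool
& $runner .\Tests\MyProduct.Tests.dll /include:Slow,SlowAsHell /xml:slow-results.xml
```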
How long does your build currently take to execute on average? How necessary is it to run all the tests on these builds?
The immediate issue I see with an auto-scaling cloud of build agents is going to be the ramp-up time to create an agent. If the primary objective is to get the builds to be very quick, that ramp-up is going to be in the order of minutes, rather than seconds and could blow the whole objective.
The way I've solved slow builds in the past has been to try and understand the objective of the automatic build. If your objective is to perform continuous integration, then some intelligence can be applied to which tests to run continuously and reduce the scope of testing. This can be supplemented with a daily build that runs every test you want to throw at it.
It would be ideal to run all your tests all the time, but that's going to be a slow process, getting slower each time you add a test. If that slows you down, isn't that eating into the objective of CI in the first place - finding problems earlier in the process, to save you time tracking them down much later? You can be clever and use parallel builds to try and reduce the time taken, but it's a hard game to play and it's going to come at the cost of quite a lot of complexity. Be sure there's no lower-hanging fruit before tackling this one.
Paul, we have 3,500 tests, and they take roughly 2 hours to run at the moment. Ramp-up time isn't that meaningful in this case, especially if we can start a build and, while the first set of tests is running, spin up the new agents and then run the rest of the tests there. And it is pretty important to run them all if we want to have high confidence so we can push a build out.
The number of features and abilities that we have is non-trivial, and a simple change can affect things that appear unrelated.
Two hours is pretty impressive; an on-demand build farm could fit this well, especially if your builds are infrequent.
This does seem a prime candidate for splitting your "release" and CI builds into different configurations, and removing all the heavy tests from your CI builds. High confidence is most definitely a requisite for your release mechanism, but you can probably trade in a little confidence for a lot of speed in your CI builds.
Have you determined where the bottleneck in your test execution is?
You might want to investigate parallel test execution within your testing framework. I know of ways to achieve this with MSTest and NUnit (with support from MSBuild), but regardless of the tool the goal is to execute the tests concurrently within the test-runner, to make sure you're getting the most out of a single build machine before the jump to multiple physical build agents.
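One simple approximation of this, which is process-level rather than true in-runner parallelism, is to launch one runner process per test assembly and wait for all of them; the runner path and assembly layout below are assumptions:

```powershell
# Run each test assembly in its own nunit-console process and wait for them all.
$runner     = 'C:\Tools\NUnit\nunit-console.exe'
$assemblies = Get-ChildItem .\build\*.Tests.dll

$processes = foreach ($asm in $assemblies) {
    Start-Process -FilePath $runner -PassThru -NoNewWindow `
        -ArgumentList "`"$($asm.FullName)`" /xml:`"$($asm.BaseName).xml`""
}

$processes | Wait-Process

# Fail the build step if any runner reported failing tests
if ($processes | Where-Object { $_.ExitCode -ne 0 }) { exit 1 }
```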
To facilitate a parallel test mechanism, you would configure builds using the Dependencies feature of TeamCity with "Build Artifacts" to share the output of one build with the next build: http://confluence.jetbrains.com/display/TCD8/Dependent+Build#DependentBuild-ArtifactDependency. As you've already suggested, performing a compile and then fanning out test execution to build agents as dependent builds is the cleanest way to set this up.
Configuration of your build agents will depend on your licensing model for TeamCity. I've never gone beyond the 3-agent limit of the free edition, but there's potentially nothing stopping you from making your own development machines into a cheap pool of build agents.
TeamCity has a facility for spinning up agents in the Amazon cloud. It also allows you to configure an idle time setting so that it can shutdown build agents that are not busy to save you money. Pretty effective. http://www.jetbrains.com/teamcity/features/amazon_ec2.html
What we have done is install multiple TeamCity agents on the same machine, so we have a basic compile and then, if that passes, it kicks off our 3 test projects simultaneously.
The problem for us is that we usually have tests that take a port, which is a system-wide resource.
Could you make that port a configurable parameter somehow and then feed it in with TeamCity?
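For example, a small build step could pick a free port on the agent and expose it to the tests through a TeamCity parameter; the parameter name here is made up:

```powershell
# Pick a free TCP port and hand it to later build steps via a TeamCity service
# message (the tests would read it from the TEST_PORT environment variable).
$listener = New-Object -TypeName System.Net.Sockets.TcpListener `
                       -ArgumentList ([System.Net.IPAddress]::Loopback), 0
$listener.Start()
$port = ([System.Net.IPEndPoint]$listener.LocalEndpoint).Port
$listener.Stop()

# There is a small race between Stop() and the tests binding the port,
# but it is usually good enough.
Write-Output "##teamcity[setParameter name='env.TEST_PORT' value='$port']"
```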
In general I second the recommendation to cut up the tests a bit -- have a quick suite that runs every time and a full release suite that runs at more appropriate times.
In terms of where to run this: given your likely MSDN licensing status (Windows Server licenses for CI at $0 additional) and the cost of hardware these days, I would take a hard look at running this stuff locally on iron and/or on VMs on local hypervisors. If you are kicking around big test-run results, you will be paying for, and waiting for, those bytes to get moved around. Also, TeamCity agents don't do TLS, so anything you send externally is in the clear, which I'm not a huge fan of.
http://www.thoughtworks.com/products/go-continuous-delivery looks interesting
As others have said, do you really need to run the full integration tests on every check-in? And is 2 hours really that bad for a release? Would a constantly running integration build, with only unit tests on every check-in, be enough?
Are there any other bottlenecks (like a hard disk) that you could duplicate to increase parallelism?
I don't like the option of relying on TeamCity too much. Every build should be reproducible on a dev machine, and having TC do crazy things would make that harder.
Based on your requirements I think the best option may be to:
1) Buy a beefy build server (or three) with super-fast CPUs and plenty of memory.
2) Using Hyper-V, create many "build agents" that are just VMs.
3) Have TeamCity run the tests across as many build agents as possible.
TeamCity can't run tests in parallel as far as I am aware (at least for NUnit), so #3 may not be entirely possible; the second-best option would be separate test projects that could be placed on each build agent automatically.
Incidentally, I've been working on a project whose purpose is exactly to serve as a distributed build system, with no central server, unlike most other systems available today. It's in very early stages, but if anyone's interested in jumping in, here it is: https://github.com/simoneb/audrey/
Flukus, Here is something that happened yesterday. We added a feature to add more diagnostics for replication. That broke attachment replication in a specific scenario. We had to run all our tests to find it out. And that was a small change. Finding out about this days later would make it much harder to track.
When you say that you "had to run all [your] tests" to find the problem with attachment replication, is that really the case? I would have expected the problem to show up as a failing test, and to detect the failure you only need to run that failing test.
If we suppose that the test that does detect this failure is removed from the CI build, then you would detect it when the test ran as part of your "scheduled" build. How soon that is depends on how often you run that schedule. I would suggest a nightly build is probably good enough, so then you would have to go over the change-sets from yesterday to find the problem. If you can't identify faulty changes in a day's worth of work, you may have another, more immediate problem to solve.
I admire your desire to work towards a more complete solution than is being suggested, but you might want to make life easier on yourself and see how much time you could shave off by looking at the biggest and beefiest tests being executed, before you take on the complexity of a distributed build system.
Paul, How do you know which test is going to fail? You have to run all tests to see if any of them is going to fail.
Yes, I agree with you. I'm not suggesting you need some kind of clairvoyant build system, just that to detect a failure you only have to run the failing test; the passing tests don't tell you anything definite, but they help you feel more confident that your changes are good.
This speaks to my earlier point: the purpose of CI is to get quick feedback on your changes, usually the quicker the better. If you could remove 20% of your CI test coverage and get a red/green build result in less than a minute, being only 80% confident in the build might be preferable.
"Finding out about this days later would make it much harder to track."
Days yes, but what about a couple of hours?
I also think there are options to perform a build for every checkin, not just the latest version, but every actual commit. Depending on how many commits you make these could catch up overnight. Not ideal from a developer workflow perspective though.
Sounds like it's time to write Rhino.DistributedTester to me..
1) Build the main build on TeamCity.
2) Spin up a test runner service at the end of the build.
3) Use a torrent-like protocol for distributing binaries to the agents.
4) Spread the tests out to the agents via hash range, or by category if they can run faster side by side.
5) Have the agents report back the results and aggregate them.
You can use speculative execution to run tests that haven't completed yet when some agents report their test results back faster than others.
Doing something like this would also allow developers to run the whole suite on the agents from their local machines.
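The hash-range split in step 4 could be as simple as hashing fixture names and taking a modulus; how fixtures are discovered and fed to the runner is hand-waved here:

```powershell
# Assign test fixtures to this agent by hashing the fixture name into buckets.
$agentCount = 5
$agentIndex = 2                            # this agent's slot, 0..($agentCount - 1)

# Pretend this list came from reflecting over the test assemblies
$allFixtures = Get-Content .\all-fixtures.txt

$md5 = [System.Security.Cryptography.MD5]::Create()
$myFixtures = $allFixtures | Where-Object {
    $hash = $md5.ComputeHash([System.Text.Encoding]::UTF8.GetBytes($_))
    # Bucket on the first four bytes of the hash
    ([System.BitConverter]::ToUInt32($hash, 0) % $agentCount) -eq $agentIndex
}

# Each agent runs only its bucket; feeding this list to the actual test runner
# is left to whatever runner you use
$myFixtures | Set-Content ".\agent-$agentIndex-fixtures.txt"
```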
I know it will not solve your problems, but not many people know that ReSharper has the ability to run many unit tests at the same time, if you have them organized in multiple test projects. I described it here: http://writesoft.wordpress.com/2013/09/25/szybkie-testy/ It's in Polish, but the screenshots are universal :)
I think Azure is a good fit for this; have 1 machine that you own that runs your CI server, and write PowerShell scripts to spin up environments to deploy to and run tests on, collating the results back. Billing on Azure is by the minute nowadays, meaning you're not paying for redundant hardware, and spinning up machines doesn't take that long at all relative to a 2 hour test cycle. It could even be something you could perhaps use the new WebJobs to do (http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx), if a full VM is overkill.
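A rough shape of those scripts, assuming the classic (Service Management) Azure PowerShell cmdlets of that era; subscription, service, image, size and password values are all placeholders:

```powershell
# Sketch only: spin up worker VMs for a test run, then tear them down once
# the results have been collected.
Import-Module Azure
Select-AzureSubscription -SubscriptionName 'Build-Subscription'

$service = 'mybuild-testfarm'
$image   = 'windows-server-2012-r2-image'      # pick a real image name from Get-AzureVMImage
$workers = 1..5 | ForEach-Object { "test-worker-$_" }

New-AzureService -ServiceName $service -Location 'West Europe'

foreach ($name in $workers) {
    New-AzureQuickVM -Windows -ServiceName $service -Name $name `
        -ImageName $image -InstanceSize Medium `
        -AdminUsername 'buildadmin' -Password 'P@ssw0rd!'   # don't hard-code this in real scripts
}

# ... deploy the build, run the tests, collate the results back ...

# Per-minute billing only helps if the VMs are torn down promptly
foreach ($name in $workers) {
    Remove-AzureVM -ServiceName $service -Name $name
}
```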