Organizing a small development team is an art. Organizing a large team is a challenge. Organizing a global team, scattered across eight countries and four continents, is a job for Sisyphus.

Scali’s company was in exactly that situation. Their self-appointed Sisyphus was actually named Steven. Steven’s slot on the org-chart was “Chief Application Architect for the Australian Region”, or CAAAR for short.

[Image: “Aguja Hilo 1” by Jorge Barrios]


After a few successful projects, Steven started chatting with upper management. He made promises: if he ran all software development in the world, every project would go perfectly smoothly. They’d deliver more, on tighter timelines. He just needed to be given a little bit more power.

Steven got a little more power. He decreed that everyone in the world would use an Agile methodology. Contrary to Agile principles, this would be imposed from above: teams would all be forced to adhere to standard policies and forbidden from self-organizing. Since all of the existing policies couldn’t be re-written, every team would also have to follow the Waterfall process while being Agile.

The transition didn’t go well, but Steven pointed out that he simply hadn’t been given enough power to do the job correctly. If they let him set a few more policies…

His next decree was to throw away their existing VCS and move everyone over to Git. The distributed nature would better support being Agile. Of course, they also had to comply with old policies, so one central repository was configured. With an associated build and integration server, code could only be released through that repository. It lived in Australia, of course.

Steven was a generous dictator. He provided a set of Ruby programs and build instructions that allowed each local office to set up their own build/integration server. There was no budget for new hardware, so most offices had to make do with whatever desktop machine was sitting in an unused cubicle at the time. Scali was no exception here, but his build server immediately started showing problems.

While the box wasn’t beefy, any time Scali tried to test their core product on it, the entire server fell face down in the dirt and started thrashing like a child being told it was bath time. Scali asked Steven to take a look.

“Well, it works on our server,” Steven said. “You must have broken something during the install process. Wipe the machine and do it again.”

Scali wanted to argue the point, but knew he couldn’t possibly win. Steven was the CAAAR, after all. Scali redid the install and documented the entire process to prove he had followed the instructions exactly. The problems remained, and Steven continued to blame Scali.

Scali ran the same tests against the “canonical” server in Australia, and while it didn’t fall over and scream, it took far too long to run the tests. Before the switch to Steven’s scripts, the process had taken only about 30 minutes. It was almost like the scripts were the problem…

Scali didn’t have commit permissions on that code, but he branched a copy and took a look. He didn’t really know Ruby, but with basic programming knowledge and some searching, he could get a feel for what it did. It didn’t take long for him to spot the problem.

For each unit test in the project, the script spawned a new thread to execute it, then handed that pile of threads off to a test runner that spawned a few of its own threads to do the job. With their gigantic, enterprise-ready code-base, this could mean thousands of threads. It was, in essence, a fork-bomb that could cripple any machine given enough time. The only reason it worked in Australia was that their machine simply had more resources to bring to bear.

With a few more searches and some careful programming, Scali made the script smarter. Instead of blindly spawning threads, it managed a pool. The size of the pool was controlled by an environment variable called MAX_UNIT_TEST_THREADS, and the active pool landed in a variable called unitTestProcesses. The changes were simple, well documented, and well tested by the time Scali was done. And the tests! Even his crappy build server could complete the tests in 35 minutes. With some help from an Australian peer, the whole thing only took 15 minutes on the “canonical” server.
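For the curious, the pooling idea can be sketched in a few lines of Ruby. This is not Scali’s actual script, which the story doesn’t show. It is a minimal illustration that assumes each unit test can be represented as a callable, and the method name run_tests_in_pool, the default pool size of 4, and the toy “tests” are all invented for the example. Only the environment variable name MAX_UNIT_TEST_THREADS comes from the story.

```ruby
# Hypothetical sketch of a bounded pool of test threads.
# Instead of one thread per test, a fixed number of workers
# pull tests off a shared queue until the queue is drained.
def run_tests_in_pool(tests, pool_size)
  work = Queue.new
  tests.each_with_index { |test, index| work << [index, test] }
  pool_size.times { work << :stop } # one stop marker per worker

  finished = Queue.new
  workers = Array.new(pool_size) do
    Thread.new do
      # Each worker runs tests until it pulls a stop marker.
      while (job = work.pop) != :stop
        index, test = job
        finished << [index, test.call]
      end
    end
  end
  workers.each(&:join)

  # Return results in the order the tests were submitted.
  Array.new(finished.size) { finished.pop }.sort_by(&:first).map(&:last)
end

# The pool size comes from the environment; the default of 4 is an assumption.
pool_size = Integer(ENV.fetch("MAX_UNIT_TEST_THREADS", "4"))

# Stand-in "tests": each one is just a lambda that returns true.
tests = Array.new(20) { -> { true } }
results = run_tests_in_pool(tests, pool_size)
puts "#{results.count(true)} of #{results.length} tests passed"
```

However many tests are queued, the number of live test threads never exceeds the pool size, which is what keeps a modest desktop machine from thrashing.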

Scali showed his work to Steven. “I appreciate what you’re trying to do, but we can’t deploy this. It’s not up to our coding standards. Look here, just at naming conventions. MAX_UNIT_TEST_THREADS sounds like the total number of possible threads, when it’s really the total number of simultaneous threads. And unitTestProcesses is a list that contains threads, not processes. This is far too confusing and you’ve made my script unsupportable. But don’t worry: I do recognize that there are some performance issues, and I’m going to be releasing a script that performs better in the next few weeks.”

Scali did his best to appeal the decision, but Steven was the CAAAR and had momentum on his side. Within a few days, Steven released his “fixed” script. In what was probably a coincidence, it was almost exactly like the script Scali had shown him. Almost. After all, Scali’s script didn’t depend on a variable named MAX_SIMULTANEOUS_UNIT_TEST_THREADS, nor did it have a variable called executingUnitTestThreads.