GRG's Worst Production Failure

Last month, in //TODO: Uncomment Later, I asked: what was your worst production failure? This month, G.R.G. (from Insecurity Doors, Mystery of the High Test Scores, Saving A Few Minutes, and so many more) tells of his worst production failure.

Long ago, I worked as a programmer at a university’s hearing research lab. They were awarded a large government grant to study the effects of different kinds of noise on hearing. For the really loud and really faint noises, the researchers used animal subjects with ears that are similar to human ears. Specifically, chinchillas.

The chinchillas would be put into a special chamber for several hours at a time to have their hearing tested. Since the little rodents don’t respond so well to questions like “which sound is louder?”, a good amount of time had to be spent training them to jump over a little bar in their chamber whenever they heard a beep.

Because a large part of the research project was to study the long-term effects of noise on hearing, the tests would have to be run twenty-four hours a day, seven days a week, for several years. Obviously, it was pretty important that the chinchilla testing be automated. Not very important, though. If it had been very important, they would have had someone other than a grad student write it.

I joined the team about a year into the project and was tasked with rewriting the beep-jump-reward program. It was a ridiculous mess of spaghetti code that seemed to have more GOTO statements than actual code. There were no comments anywhere, and no documentation of the algorithm the program used to control the beeps and rewards.

After a little while, I was able to figure out the algorithm and rewrite the application. A month or two later, the rewrite was put into production. I documented my work, said my goodbyes, and moved on to my next contract.

A year or so later, the researchers compiled the data and noticed some very surprising results: the chinchillas were a lot more hearing-impaired than they should have been. While this may not seem too big a deal, the findings would have some serious ramifications. Occupational noise-exposure laws would be changed, lawsuits would be filed, and billions would be spent correcting the issue.

Before the results were published, another team of researchers went over the data and the study with a fine-toothed comb to ensure that the results were correct. And whammo, they found a bug in my code. Under certain conditions, one part of the application did not correctly check that the chinchilla had jumped at the right time. This meant that the program would deny the chinchilla a food pellet, giving it negative feedback when it had in fact done the right thing. This led to some rather confused chinchillas that had no idea when they were actually supposed to jump.

In the end, over a year’s worth of data was thrown out, a few man-years of work were wasted, and there were a whole lot of cute little rodents that were rather confused and hard of hearing. I still feel bad for deafening those poor chinchillas...
