About two years ago, we took a little trip to the Galapagos- a tiny, isolated island where processes and coding practices evolved… a bit differently. Calvin, as an invasive species, brought in new ways of doing things- like source control, automated builds, and continuous integration- and changed the landscape of the island forever.

Geospiza parvula

Or so it seemed, until the first hiccup. Shortly after putting all of the code into source control and automating the builds, the application started failing in production. Specifically, the web service calls out to a third party web service for a few operations, and those calls universally failed in production.

“Now,” Hank, the previous developer and now Calvin’s supervisor, “I thought you said this should make our deployments more reliable. Now, we got all these extra servers, and it just plumb don’t work.”

“We’re changing processes,” Calvin said, “so a glitch could happen easily. I’ll look into it.”

“Looking into it” was a bit more of a challenge than it should have been. The code was a pasta-golem: a gigantic monolith of spaghetti. It had no automated tests, and wasn’t structured in a way that made it easy to test. Logging was nonexistent.

Still, Calvin’s changes to the organization helped. For starters, there was a brand new test server he could use to replicate the issue. He fired up his testing scripts, ran them against the test server, and… everything worked just fine.

Calvin checked the build logs, to confirm that both test and production had the same version, and they did. So next, he pulled a copy of the code down to his machine, and ran it. Everything worked again. Twiddling the config files didn’t accomplish anything. He build a version of the service configured for remote debugging, and chucked it up to the production server… and the error went away. Everything suddenly started working fine.

Quickly, he reverted production. On his local machine, he did something he’d never really had call to do- he flipped the build flag from “Debug” to “Release” and recompiled. The service hung. When built in “Release” mode, the resulting DLL had a bug that caused a hang, but it was something that never appeared when built in “Debug” mode.

“I reckon you’re still workin’ on this,” Hank asked, as he ambled by Calvin’s office, thumbs hooked in his belt loops. “I’m sure you’ve got a smart solution, and I ain’t one to gloat, but this ain’t never happened the old way.”

“Well, I can get a temporary fix up into production,” Calvin said. He quickly threw a debug build up onto production, which wouldn’t have the bug. “But I have to hunt for the underlying cause.”

“I guess I just don’t see why we can’t build right on the shared folder, is all.”

“This problem would have cropped up there,” Calvin said. “Once we build for Release, the problem crops up. It’s probably a preprocessor directive.”

“A what now?”

Hank’s ignorance about preprocessor directives was quickly confirmed by a search through the code- there was absolutely no #if statements in there. Calvin spent the next few hours staring at this block of code, which is where the application seemed to hang:

public class ServiceWrapper
{
    bool thingIsDone = false;
    //a bunch of other state variables

    public string InvokeSoap(methodArgs args)
    {
        //blah blah blah
        soapClient client = new Client();
        client.doThingCompleted += new doThingEventHandler(MyCompletionMethod);
        client.doThingAsync(args);

        do
        {
            string busyWork = "";
        }
        while (thingIsDone == false)

        return "SUCCESS!" //seriously, this is what it returns
    }

    private void MyCompletionMethod(object sender, completedEventArgs e)
    {
        //do some other stuff
        thingIsDone = true;
    }
}

Specifically, it was in the busyWork loop where the thing hung. He stared and stared at this code, trying to figure out why thingIsDone never seemed to become true, but only when built in Release. Obviously, it had to be a compiler optimization- and that’s when the lightbulb went off.

The C# compiler, when building for release, will look for variables whose values don’t appear to change, and replace them with in-lined constants. In serial code, this can be handled with some pretty straightforward static analysis, but in multi-threaded code, the compiler can make “mistakes”. There’s no way for the compiler to see that thingIsDone ever changes, since the change happens in an external thread. The fix is simple: chuck volatile on the variable declaration to disable that optimization.

volatile bool thingIsDone = false solved the problem. Well, it solved the immediate problem. Having seen the awfulness of that code, Calvin couldn’t sleep that night. Nightmares about the busyWork loop and the return "SUCCESS!" kept him up. The next day, the very first thing he did was refactor the code to actually properly handle multiple threads.

[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!