When you're a developer like Joe, and your clients all have dedicated servers, and they all call at the same time to complain that their servers have gone down, you can't help but start hoping there was an earthquake. Unless the data center housing all that dedicated hardware has been wiped off the face of the earth, the bug is going to be in your software. And sure enough, in the midst of the legacy C++ module responsible for processing the day's transactions, Joe found this:

bool done = false;
while(!done)
{
    try
    {
        //I'm not sure if having a log entry for the irregular
        //Febuary[sic] 29th will destroy everything else.
        //To be on the safe side, we'll just wait 'til tomorrow instead
        Date *currentDate = new Date();
        int DOY = currentDate->dayOfYear();
        if( DOY == 60 && //day 60 is feb 29
            ( lastDigit(currentDate->year()) == 0 ||
              lastDigit(currentDate->year()) == 4 ||
              lastDigit(currentDate->year()) == 8))
        {
            while(currentDate->dayOfYear() == 60) { currentDate = new Date(); }
        }
        else
        {
            //SNIP: code that actually runs part of the maintenance
            done = true;
        }
    }
    catch(...) {} //If we failed we need to try again until we succeed
}

You might have noticed that ill-conceived check for the leap day. It treats any year ending in 0, 4, or 8 as a leap year, and day 60 of a non-leap year is March 1st, so the problem occurred on March 1st, 2014, rather than two years prior (an actual leap year, but one ending in 2). But did you notice what caused the server meltdown? Using new in C++ creates the Date object on the heap instead of on the stack. Since nothing in that code ever deletes the Date objects, and since nothing breaks out of the inner while loop other than the system clock reaching Day 61, the program happily creates Date objects until it exhausts available memory. Since "if we failed we need to try again until we succeed", it responds to the out-of-memory exception by... continuing to loop and hold onto its memory, bringing every server running the processing job to its RAM-starved knees.
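
For contrast, here is a minimal sketch of what a sane version of that check might look like. It assumes the standard Gregorian leap-year rule and a 1-based day of the year, as in the original comment ("day 60 is feb 29"). The names isLeapYear, isLeapDay, and todayIsLeapDay are hypothetical helpers invented for illustration; Joe's Date class is not shown in the article, so the sketch reads the clock through the standard <ctime> functions instead, and it never touches new.

```cpp
#include <ctime>

// Hypothetical helper: the Gregorian leap-year rule.
// Divisible by 4, except century years, which must be divisible by 400.
bool isLeapYear(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

// Hypothetical helper: dayOfYear is 1-based, so day 60 is
// February 29th only when the year is actually a leap year.
bool isLeapDay(int year, int dayOfYear)
{
    return dayOfYear == 60 && isLeapYear(year);
}

// Hypothetical helper: checks today's local date without allocating
// anything on the heap, so there is nothing to leak.
bool todayIsLeapDay()
{
    std::time_t now = std::time(nullptr);
    const std::tm *local = std::localtime(&now);
    if (local == nullptr)
    {
        return false; // could not read the clock; assume an ordinary day
    }
    // tm_year counts from 1900 and tm_yday is 0-based.
    return isLeapDay(local->tm_year + 1900, local->tm_yday + 1);
}
```

With a check like this, isLeapDay(2014, 60) is false and isLeapDay(2012, 60) is true, which is the opposite of what the last-digit test decided.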

Joe did find a silver lining, however: after the long and painful server-restoration process, he inspected the logs. It turns out the log entries this code allowed to be written on February 29, 2012, had not caused a single issue.
