“It’ll be a cold day in Hell,” Roger said, “when this system goes down.”


With those words, Roger, Systems Architect, went on sabbatical from Monocorp. The edifice he left behind served its purpose as foretold, until the day Danny O. was pulled out of a meeting by a panicked intern. “Everything is down,” the young man panted, short of breath and sweaty from a brisk dash around the office, trying to find which boardroom the IT team had been assigned for that day’s conference. “Everything! All requests to the web tier are returning some kind of duplicate record error that doesn’t even make sense! We’re dead in the water!”

Danny shook the intern briskly to help restore his composure, and hurried back to his workstation. It was true: all production systems were inaccessible, all requests returning the “duplicate record” error. It sounded like a database problem, so Danny turned to The Database. It was The Database, singular, because the 90 application servers, each running the same enormous Program that hosted all of Monocorp’s internal and external web services, all connected to the same one. Roger had designed The Program to be stateless, so that an arbitrary number of copies could be run in parallel. The single catalog in the single database instance not only provided these copies with their data, but was also the single point of coordination between them.

The Program, meanwhile, was Roger’s answer to the problem of providing a set of web-facing services with common functionality, a sort of “framework for the web”, if you will. It had a sub-system for receiving and routing HTTP requests on top, a sub-system for talking to the database on the bottom, and, sandwiched between, a series of modules that contained all of Monocorp’s business logic, producing results that the web tier could package up as either an HTML or JSON response. Everything, from the Monocorp website to the apps used internally on the employees’ mobiles, relied on The Program. And The Program was trying and failing to put something in The Database.

Danny’s first port of call in The Database was the ERRORS table, a central repository of everything that ever went wrong in any copy of The Program. He could see the “duplicate record” errors, a couple thousand by now, but none of them mentioned a particular module. The errors seemed to be happening in the web tier itself. Danny turned to the MONITORING table next, and saw something very unusual. There was code in the web tier that captured the start and end time of every single web request and saved them to the MONITORING table. A module within The Program kept an eye on this table and analyzed the data for trends and problems, emailing regular reports to the IT team. What Danny noticed when he checked the records in the table since the problems started was… there weren’t any. The last record in MONITORING bore a timestamp from an hour earlier. Though web requests were obviously being received, hence the records in the ERRORS table, they weren’t being logged. Could the monitoring code itself be causing the problem?

Danny studied that last MONITORING record for a long while before it struck him: the ID of the record was 2147483647. I’m sure you noticed, just as Danny did, that this number is the largest value a signed 32-bit integer can hold! The MONITORING table had an auto-incrementing primary key, and when it reached its limit it just kept on trying to insert the same key into the table, resulting in the “duplicate record” errors… on every single request to every copy of The Program. Danny wasn’t sure how to solve the problem, other than sacrificing the existing monitoring data by truncating the table. Before his bosses would allow that, however, they wanted the Architect to weigh in.
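For the curious, the failure Danny spotted can be sketched in a few lines of Python. This is only a toy model, not Monocorp's code: the `MonitoringTable` class, its `insert` method, and `DuplicateRecordError` are all hypothetical names, and the key counter is assumed to stick at its maximum rather than wrap around, which is the behavior the story describes.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the largest signed 32-bit integer


class DuplicateRecordError(Exception):
    """Raised when a primary key is already present in the table."""


class MonitoringTable:
    """Toy stand-in for a table with a signed 32-bit auto-increment key."""

    def __init__(self, last_id=0):
        self.rows = {}          # primary key -> (start, end)
        self.last_id = last_id  # most recently issued key

    def insert(self, start, end):
        # Assumption: the counter saturates at INT32_MAX instead of wrapping.
        next_id = min(self.last_id + 1, INT32_MAX)
        if next_id in self.rows:
            raise DuplicateRecordError(f"duplicate record for key {next_id}")
        self.rows[next_id] = (start, end)
        self.last_id = next_id
        return next_id


# Start one key short of the limit, as if two billion requests were already logged.
table = MonitoringTable(last_id=INT32_MAX - 1)
first_id = table.insert("12:00:00", "12:00:01")  # takes the last free key

try:
    table.insert("12:00:01", "12:00:02")  # asks for the same key again
    failed = False
except DuplicateRecordError:
    failed = True
```

Once the last key is taken, every later insert fails the same way, which is why every request to every copy of The Program hit the error at once.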

It took three tries before Roger answered his phone. Clearly irritated at being interrupted on his sabbatical (Danny could hear the faint sounds of seagulls and surf), he nonetheless listened intently as they described the problem. His thoughts went quickly to the defense of his vision.

“It’s not a problem with the architecture,” he assured them. The bosses nodded, but Danny was unconvinced.

“Are you sure? We’re currently logging a couple million web requests per day across the entire business. If the monitoring table can only hold two billion records, we’ll have to empty the table every year or two. The monitoring system doesn’t have any provision for doing that, or any way to archive historical data…”

“Did you say millions of requests per day?” Roger asked. “It was only maybe ten thousand when I left…”
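The arithmetic behind this exchange is easy to check. The sketch below assumes one MONITORING row per request and reads “a couple million” as exactly 2,000,000 per day; both the `days_until_full` helper and those round figures are illustrative assumptions, not numbers from Monocorp.

```python
INT32_MAX = 2**31 - 1  # highest key a signed 32-bit auto-increment column can issue


def days_until_full(requests_per_day, rows_already_used=0):
    """Whole days before the key space runs out, at one row per request."""
    return (INT32_MAX - rows_already_used) // requests_per_day


# Hypothetical rates: Roger's "maybe ten thousand" and Danny's "couple million".
rogers_days = days_until_full(10_000)
dannys_days = days_until_full(2_000_000)

print(f"10,000/day: about {rogers_days / 365:.0f} years")
print(f"2,000,000/day: about {dannys_days / 365:.1f} years")
```

At Roger's rate the table would last for centuries. At two million requests a day it fills in a little under three years, and sooner as traffic keeps growing.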

“We’ve added a lot of modules since then,” Danny explained. “That was the point of your architecture, right? That we could fit whatever web apps and APIs we needed inside it?”

“Listen,” Roger said, “you can’t blame the architecture when you change the requirements.”

Danny’s boss saw the look on his face and quickly gestured for him to bite his tongue. They confirmed that nothing terrible would happen if they truncated the MONITORING table, and left it at that. When the work was done, Monocorp shuddered back into motion, and Danny put something else in motion, too: his search for a new job. He swore that, by the time the MONITORING table needed truncating again, Monocorp would have to find another monkey to maintain their monolith.
