Russell M. knew better than to tempt fate. The last time someone asked him about Big Telco’s network downtime, he bragged about not having any since he began … only for the network to go down within minutes. That time, a construction worker plugged a power drill into a UPS and drained it.

This time, with no construction on-site, he couldn’t use that excuse.

“No one can call anyone, anywhere, on our regional network,” the CEO rasped through a speakerphone. Russell sat in a conference room with the other four employees in SysOps. The executive board were all on the other line. “I want hourly reports … No, semi-hourly reports. If someone’s not calling me every thirty minutes about what’s been done, there’ll be a box and a pink slip at each of your desks come Monday morning.”

The CEO had a flair for the dramatic, but Russell knew he couldn’t brush this one off, not after last time. “Okay,” he said to the rest of SysOps, “let’s do a visual check.”

Throwing the Book at It

Down in the equipment room, a basement where phone books were once archived, Russell and the others visually inspected Big Telco’s CLEC infrastructure. All of the switches and other equipment were stored here. Russell and the rest of SysOps rarely came through for anything other than an emergency.

Russell, checking that all of those boxes were still turned on, at first didn’t notice the three-inch-thick paperback manual splayed on top of a keyboard. He picked up the blue-and-white doorstop and put it back onto the shelf above. He switched on the monitor and checked the logs. Through bad luck, the manual had mashed just the right key combo to trigger a feedback loop in the network switch. This was what had brought down an entire regional telephone network: some bad commands triggered by a knocked-over equipment manual. However, Russell had no means to stop the loop from the console itself. He sighed, knowing the fire and brimstone about to come down on him.

He rebooted the switch.

The Tel-X Files

Big Telco’s investigative team arrived that Monday. Russell knew you never rebooted part of the network unless you had no other choice, because Big Telco would be obligated to file a report with the FCC. The investigative team, nicknamed “Scully” and “Mulder,” summoned Russell to the conference room.

“Accounting estimates we lost a million dollars in revenue due to the outage,” Scully began. “That’s potential revenue, estimates for lost customers over eight quarterly cycles. So tell me: did you really have to reboot that piece of equipment?”

“No other choice,” Russell said. “It would have kept the network down indefinitely.”

“If I could speculate,” Mulder said, “you said that the book had fallen off the shelf. Could it have been sabotage? Do you think a member of SysOps could have done it?”

“No one ever goes down there,” Russell said, “except us and custodial.”

Mulder wrote on a notepad. “Check … custodial … for … recent … hires.”

“I mean, you couldn’t have prevented it,” Russell continued. “Unless you chained the manuals to the shelves and put plexiglass over the keyboards.”

“Is that your recommendation?” Scully asked.

Installation Procedures

A shipping pallet stacked with boxes was delivered to SysOps the following week. Russell and the others unpacked them. Inside were twenty-something pieces of moulded plexiglass, along with hundreds of feet of thin chains. Russell read aloud the attached instructions.

“‘Use a 3/4 inch bit to drill a hole in the spine large enough to pass the chain through. Only use three feet of chain per book.’” That would barely be enough to open the book on its shelf. “‘Each plexiglass shield has been custom-made for a single keyboard in the equipment room, per the latest inventory. Do not put a plexiglass shield over the wrong keyboard.’”

After SysOps implemented the changes recommended by Scully and Mulder, Big Telco submitted its findings to the FCC, and it dodged a hefty fine for the outage.

A few months later, one of the ancient bookshelves collapsed onto a console. The monitor shattered, but the keyboard, protected by its shield, remained intact. None of its sensitive keys had been pressed.
