Twenty years out, people have a hard time remembering that Y2K was an actual thing, an actual problem, and it was only solved because people recognized the danger well ahead of time, and invested time and effort into mitigating the worst of it. Disaster didn’t come to pass because people worked their butts off to avoid it.
Gerald E was one of those people. He worked for a cellular provider as a customer service rep, providing technical support and designing the call-center scripts for providing that support. As 1999 cranked on, Gerald was pulled in to the Y2K team to start making support plans for the worst case scenarios.
The first scenario? Handling calls when “all phone communication stopped working”. Gerald didn’t see much point in building a script for that scenario, but he gamely did his best to pad “we can’t answer the phones if they don’t ring” into a “script”.
There were many other scenarios, though, and Gerald was plenty busy. Since he was in every meeting with the rest of the Y2K team, he got to watch their preparedness increase in real time, as different teams did their tests and went from red-to-green in the test results. A few weeks before the New Year, most everything was green.
Y2K fell on a Saturday. As a final preparation, the Y2K team decided to do a final dry-run test, end-to-end, on Wednesday night. They already ran their own internal NTP server which every device on the network pulled from in one way or another, so it was easy to set the clock forward. They planned to set the clock so that at December 29th, 22:30 wall-clock time the time server would report January 1st, 00:00.
The Y2K team gathered to watch their clock count down, and had plans to watch the changeover happen and then go party like it was 1999 while they still had time.
At 22:29, all systems were green. At 22:30- when the time server triggered Y2K- the entire building went dark. There was no power. The backup generator didn’t kick on. The UPSes didn’t kick over. Elevator, Phones, HVAC, everything was down.
No one had expected this catastrophic a failure. The incident room was on the 7th floor of the building. The server room was in the basement. Gerald, as the young and spry CSR was handed a flashlight and ended up spending the next few hours as the runner, relaying information between the incident room and the server room.
In the wee hours of the morning, and after Gerald got his cardio for the next year, the underlying problem became clear. The IT team had a list of IT assets. They had triaged them all, prioritized their testing, and tested everything.
What no one had thought to do was inventory the assets managed by the building services team. Those assets included a bunch of industrial control systems which managed little things, like the building’s power system. Nothing from building services had ended up in their test plan. The backup generator detected the absence of power and kicked on- but the building’s failure meant that the breakers tripped and refused to let that power get where it was needed. Similar issues foiled their large-scale UPS- they could only get the servers powered up by plugging them directly into battery backups.
It was well into the morning on December 30th when they started scrambling to solve the problem. Folks were called back from vacation, electricians were called in and paid exorbitant overtime. It was an all-hands push to get the building wired up in such a way that it wouldn’t just shut down.
It was a straight crunch all the way until New Year’s Eve, but when the clock hit midnight, nothing happened.