We’re still in the early part of the year, and as little glitches show up from “sliding window” fixes to the Y2K bug, we’re seeing more and more little stories of other date rollover weirdness in our inbox.
Like, for example, the Y2K15 bug, which caught Encore by surprise. It feels like date issues are turning into a sports game franchise: new releases of the same thing every year.
A long, long time ago, Encore’s company released a piece of industrial machinery with an embedded controller. It was so long ago and so embedded that things like floating point operations were a little too newfangled and expensive to execute, and memory was at an extreme premium.
The engineer who originally designed the device had a clever solution to storing dates. One byte of EEPROM could be dedicated to storing the last two digits of the year. In RAM, a nibble- 4 bits- would then store an offset relative to that base year.
Yes, this had Y2K issues, but that wasn’t really a concern at the time. It also had a rollover issue every 16 years. That also wasn’t really a concern, because it was attached to a giant machine which needed annual service to keep functioning properly. Every few years, the service tech could bring an EEPROM programmer device and flash the base year value in the EEPROM. And if someone missed 16 years’ worth of service calls, they probably had other problems.
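To make the scheme concrete, here is a minimal Python sketch of how a base year plus a 4-bit offset might decode into a calendar year. The names `CENTURY`, `decode_year`, and `advance_year` are made up for illustration, and the fixed 2000 century is an assumption; the real firmware was certainly not Python and may have handled the century differently.

```python
# Hypothetical sketch of the date scheme; none of these names come from the real firmware.
CENTURY = 2000  # assumption: the two-digit base year is read as 20xx


def decode_year(base_yy: int, offset: int) -> int:
    """Combine the EEPROM base year (two digits) with the 4-bit RAM offset."""
    return CENTURY + (base_yy % 100) + (offset & 0x0F)


def advance_year(offset: int) -> int:
    """Tick the offset at New Year; a nibble only holds 0..15, so it wraps."""
    return (offset + 1) & 0x0F


# Fifteen years past the base year is the last one the nibble can represent.
last_good = decode_year(0x00, 15)
# One more tick wraps the offset to 0, and the date snaps back to the base year.
after_wrap = decode_year(0x00, advance_year(15))
```

With a base year of `0x00`, `last_good` is 2015 and `after_wrap` is 2000, which is the whole 16-year rollover in two lines.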
Time passed. Some customers did miss 16 years of service calls. Over time, new features got added. The control interface got an improved LCD. Bluetooth got attached. The networking stack changed. A reporting database got bundled with the product, so all the data being produced by the device could get aggregated and reported on. The way the software interacted with the hardware changed, and it meant that the hardware ran at a lower temperature and could go longer between service calls. But at its core, the chip and the software didn’t change all that much.
In that time, there were also changeovers in the engineering team. People left the company, new engineers joined, documentation languished, never getting updated. Years might pass without anybody touching the software, then suddenly a flurry of customer requests that needed to be patched RIGHT NOW would come through, and anybody who vaguely understood the software got roped in to do the work, then shunted back off to other projects.
On New Year’s Day, 2016, a deluge of tickets started coming in. Encore, as the last person to have touched the software, started picking them up. They all expressed the same problem: the date had rolled over to 2000. The reporting database was confused, the users were confused, and even if they tried to set the clock to 2016 manually, it would roll back from 2015 to 2000.
Now, no one at the company, including Encore, actually knew about the date system in use at this point. The support manual did say that rollovers meant the device had gone 16 years without being properly serviced, but some of these customers had brand new devices, less than a year old. And customers with devices older than 16 years weren’t seeing this problem.
Encore investigated, and picked apart how the date handling worked. That, itself, wasn’t the problem. It took a lot more investigation to track down the problem, including going back to the board schematics to trace how various hardware components were connected. After a few hair-on-fire weeks of crisis management, Encore pieced together the series of events as best they could.
Sometime after the year 2000, Bluetooth was added to the device. Something about how the Bluetooth module connected to the other components had broken the flasher-software that could update the base year. This meant that the devices had never had their base year set, and simply had a 0 value- 0x00, or the year 2000.
Which meant, for the next 16 years, everything was fine. Techs went out, tried to flash the EEPROM, reset the clock to the correct date, and went about their business, never aware that they hadn’t actually done anything. But come 2016, all of these devices rolled back over to the year 2000.
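The failure is easy to replay in a toy model. The Python sketch below assumes a device whose base year reads 0x00, whose clock was set correctly in its first year, and whose flasher either works or silently does nothing; `displayed_years` and its parameters are invented for this illustration and are not taken from the actual system.

```python
# Hypothetical replay of the failure; every name here is illustrative only.
CENTURY = 2000  # assumption: a base year of 0x00 is read as 2000


def displayed_years(start, end, flasher_works, service_years=()):
    """Map each actual year in [start, end] to the year the device would show.

    Assumes start is between 2000 and 2015 and the clock was set correctly then.
    """
    base_yy = 0x00                       # base year that was never really written
    offset = (start - CENTURY) & 0x0F    # 4-bit offset held in RAM
    shown = {}
    for year in range(start, end + 1):
        if year != start:
            offset = (offset + 1) & 0x0F  # New Year tick, wraps after 15
        if year in service_years and flasher_works:
            # A working flash moves the base year forward and resets the offset.
            base_yy, offset = year - CENTURY, 0
        shown[year] = CENTURY + base_yy + offset
    return shown


broken = displayed_years(2010, 2016, flasher_works=False, service_years=(2012,))
fixed = displayed_years(2010, 2016, flasher_works=True, service_years=(2012,))
```

In this model `broken` shows the right year all the way through 2015 and then reports 2000 for 2016, while `fixed` sails past 2016 because the 2012 service call actually moved the base year. Nothing about the broken case looks wrong until the nibble wraps, which matches how the techs never noticed.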
Encore was able to figure out a script to trick the system into adjusting the output to correct the base year issue, but it also meant many customers had databases crammed with bad data that needed to be adjusted to correct the erroneous year.
After this, Encore’s company released an upgraded version of the system which contained a GPS receiver, so that it could set its date based on that, but a large number of their customers weren’t interested in the upgrade. Encore has already blocked off the first few weeks of 2032 in preparation.