Calculating the true cost of downtime is almost impossible. There's not only the obvious loss of labor to consider, but all sorts of indirect losses like missed opportunities, repair expenses, customer frustration and so on. Fortunately for Eric M.'s company, management knows exactly how many real dollars it will cost them when their system -- "MCL," as I'll call it -- goes down. Eric's employer is a logistics service provider with a sole customer: a major U.S. automaker. His company is primarily responsible for getting the right auto parts to the right areas in the right plants, on time. One unexpected delay or shipment error and the entire assembly line can shut down -- and when that happens, the service provider gets to foot the bill to the tune of $5,000 per minute.

To manage this mission-critical distribution operation, along with the rest of the company's mission-critical operations, the logistics service provider trusts MCL. As each of the 500 trucks that arrive every day pulls up to the company's massive warehouse with inbound material, MCL tracks it and directs it to one of the 200 or so loading docks. Similarly, when the 300 or so crates on the trucks come in, each crate's order number is scanned and its contents are moved.

When the thousands of daily parts requests come in from the automaker, warehouse workers fill up hundreds of labeled containers and place them on one of the 200 or so daily outbound trucks. All in all, MCL handles hundreds of thousands of requests from users throughout the company, and is absolutely critical to the operation.

So for such an important application, with such an expensive rate of failure, what highly reliable application platform did the software developers choose? Microsoft Access.

Running the System

Though generally slow, incredibly clunky and very confusing to use, MCL actually manages to work. Most of the time, at least. When it doesn't work, however, all hell breaks loose.

If the automaker's incoming orders can't be added to MCL, the receiving server does the only logical thing: it sends the incoming datastream to large, continuous-feed printers. From there, the IT staff, Eric included, has to do all of the error checking, organizing and coordinating by hand, and literally run the orders to the right area in the 45-acre warehouse.

While the occasional hiccup isn't too bad, if all of the incoming orders are dumped to the printer, as has happened more than once, the IT staff can keep up for all of 15 minutes before being completely overwhelmed by a never-ending stream of paper.

Learning the Hard Way

Of course, as a new developer, Eric had not been told just how critical and fragile MCL was. With no development, testing or QA environment for MCL in sight, Eric did the most sensible thing he could think of before making some minor changes to a report: he exported a table from the live database to his local workstation. That way, he wouldn't risk impacting the live system while he worked on the report.

As it turned out, that wasn't such a good idea. After a few minutes of watching the "Exporting Table ..." dialog, Eric's screen flashed and a lock icon appeared in the upper-right corner. Windows Task Manager opened up and, unable to move the cursor himself, Eric watched the ghost in his machine click on one program after another, terminating each one.

Shortly thereafter, Notepad opened up and the following message appeared, one letter at a time:

DO NOT DO WHATEVER YOU WERE DOING!! 
YOU DEADLOCKED -- MCL CRASHED!!!!!
- THANK YOU, IT/MCL SUPPORT

Consequences

Though Eric didn't get in any trouble for single-handedly crashing the company's mission-critical system, he did learn how painful it was to manually process the 17 orders that failed to make it into MCL while he had it inadvertently deadlocked.

He was lucky, though. The previous deadlock brought the entire operation to its knees, requiring a few hundred orders to be entered manually. This, in turn, caused the warehouse workers not only to pack the wrong items, but also to get them out after the deadline. All told, the production line had to be shut down, leaving Eric's company with a $285,000 bill. Not too bad for an hour's worth of work.


A Barely Accessible System was originally published in Alex's DevDisasters column in the Jun 1, 2008 issue of Redmond Developer News. RDN is a free magazine for influential readers and provides insight into Microsoft's plans, and news on the latest happenings and products in the Windows development marketplace.
