"A bunch of orders out of Winnipeg haven't been sending MSDSes!"
"I'm on it," Rick said. He grabbed his six-guns (er, six-pack of soda), hopped on his horse (er, ergonomic office chair), and went to clean up Dodge (er, Winnipeg).
Material Safety Data Sheets mattered. Rick's employer dealt in expensive chemicals. Any time it sold a formulation to a customer, that customer needed explicit instructions detailing the horrible things the chemical would do if you ingested it, smeared it on your genitals, or gave it to children to play with. Whether the product was sodium hydroxide or distilled water, the MSDS had to be sent.
Three years prior, Rick wrote the "GetMSDSRequest" application. This simple little app read records off of a flat-file on the mainframe and sent them to the "Product Management Services" database, owned by the Safety and Health IT team. As with any Enterprise Integration project that involved sharing data with other business units, the whole thing had been a giant mess. On Tuesday, the columns mapped like this. On Wednesday, like that. On Thursday it was discovered they were going to the entirely wrong staging table and needed to be sent elsewhere. Eventually, Rick hammered out an agreement, put the application into production, and mostly forgot about it. The only reminder he had of its existence was a nightly heartbeat email it fired off, proving that it was running. An Outlook rule filed those away for him.
This was the first call about the application in three years of running without incident. Because of the trouble he'd had at the start, Rick had designed the application to generate a painfully verbose log file on top of its nightly heartbeat email. One quick grep through that log would tell him everything he needed to know about what might have gone wrong. There was only a slim chance that his application was the culprit. It was more likely that the mainframe wasn't sending the data, and most likely of all that PMS was simply losing the records.
Rick called Operations. "Hey, I need you to grab me a log file off of PMSAPP01. It's called msdsprocess.log."
"Where on PMSAPP01?"
"I don't know. You guys installed it. By now, it should be like 700 megs of logging data."
"There's nothing like that on PMSAPP01."
Rick and Operations crawled up one side of the server and back down the other. His documentation said it had been installed there. He had the email request he had sent to install it. Operations had the ticket that said it had been installed there. The only problem was that it wasn't there.
Rick double-checked his inbox, but sure enough, the heartbeat emails were still coming in, one a night, just as they were supposed to. So Rick turned to SharePoint. A search for the application name turned up a diagram, buried in a PowerPoint slide from over a year earlier. According to that slide, the PMS environment had his application deployed on PMSAPP07.
"The hell," Rick complained to himself. "You can't just go moving my app around like that." He sent Operations looking over there.
"Nope, nothing there either."
Rick picked up the phone and started the task of trying to find someone, anyone, from Safety and Health IT. The task was more Herculean than it sounds. 99% of their job was to sit in on meetings and hash out which columns in their system mapped to which columns in other systems. The remaining 1% was devoted to the actual support work needed to keep PMS running. When the phone failed, he walked across the complex to their offices. The door was locked and a fine coating of dust covered all the horizontal surfaces.
He went back to the phones, and after going down the emergency contact list, he got Sally, the team leader. "Oh, that application? Yeah, it's been getting bounced between servers as we need to. The only person that would know where it is would be Jack. He's in a meeting in Building C until six."
"You don't have a document, or something? A spreadsheet?" Rick asked.
"Oh, there is, but only Jack knows where the documentation is."
Rick counted to pi before hanging up the phone and releasing a torrent of profanity that blanched the upholstery on his cube walls. He wasn't done yet, though. He went back to the emergency contact list and found Jack's cellphone number. He called it. And when there was no answer, he called it again. And again.
"What? I'm in a meeting."
"Where's my application?" Rick asked. He quickly filled Jack in on the background.
"No idea. I'd have to go back to my desk to check. I think it's either on PMSAPP02 or 05. I have to go back to my meeting."
Actually, it was PMSAPP04. With Operations' help, Rick finally got his log file. In total, it took three days to actually get it. After that, fifteen minutes with grep proved that the mainframe had sent all of the orders to his application, his application had picked them up and sent them to PMS, and they had arrived without errors. He wrote up that report and jammed one more critical issue into the 1% of time the S&H IT folks had to solve problems. And, for good measure, he also patched his application. From then on, the application's heartbeat emails no longer said, "I'm alive," and instead announced, "I'm alive on {server}".