Consider a small European country with more than 20 social insurance institutions, each using their own proprietary software. Now consider sharing data between them. After decades of integration failures, these institutions decided to standardize on a handful of applications. One of these institutions hired Philipp’s firm to migrate their data to DB2.
Philipp’s boss gave him the assignment with a clear conscience. “They have a data transfer interface already established. This should be a quick process.”
However, Philipp’s dreams of webservices, integration end-points, clean XML, and a well organized workflow were shattered when he was handed a few examples of the COBOL-generated flat files the company currently used for data transfer, via FTP. There was no documentation regarding the schema. Philipp sat down with William, an employee at the client site who had worked with this data for the better part of a generation, and had discovered its quirks through trial and error.
“Now, these files look exactly like the ones that we actually send, except they may or may not have an extra field stuffed into character 12,” William explained. “If there’s a ‘Q’ there, then we know we’re using the alternate message block, but only if the customer data flag contains a letter ‘B’.”
Philipp struggled to take notes that his brain would be able to parse later. “And where’s the customer data flag?”
“Oh, we call that column ‘R’. That’s a right-aligned field that starts at character 120. Be careful, because column ‘S’ is left aligned and starts at character 125. If you’re just skimming the file, it’s easy to think they’re the same field.” William chuckled. “Column ‘F’ is the tricky one, though- it needs two leading spaces, then a five-character field value, then five trailing spaces. That’s all one field, mind you.”
Mapping the file to the underlying data was even more of a challenge, as William explained. “Field ‘M’ is a substring across one of the database columns from the Patient database.”
“Which column?” Philipp asked.
“Oh, I don’t know. X1 or X8, I think. I’d have to reread the source code to be sure...”
The data from the flat files- sensitive patient data, transmitted as plain text across FTP- needed additional formatting and cleaning before it could move into its new home in DB2. The destination schema was as clearly specified and documented as the source schema- i.e., it wasn’t. The “already established” process Philipp’s boss had mentioned was a single gigantic stored procedure- thousands of lines of Oracle’s PL/SQL.
Philipp braced his temple. “Oracle? How do I log into Oracle?”
“You don’t,” William said. “We don’t have an Oracle database. You have to work with Stephen, he’s got a local instance on an off-site machine.”
“Could I just write my own DB2 stored procedure instead?”
“Absolutely not! Do you know how much we paid to get the PL/SQL procedure written? We can’t afford to pay that again. Work with Stephen.”
Philipp wasn’t the first person who needed to work with Stephen. The process for doing so was well-documented and formalized. Phillip took the output from his flat-file processing and emailed it to Stephen’s Gmail address. Stephen would import the data into Oracle, run the stored procedure against it, export the results to a CSV file, then email that gigantic file back. Finally, Philipp could import the data into the target DB2 database.
Philipp wasn’t a lawyer, so he had no idea how many privacy laws this violated, but he wasn’t allowed to do anything else. It would only be a one-time process, anyway…
…until after they ran through it, and discovered the data that had ended up in DB2 had significant flaws, requiring iterative corrections.
Since the data entry clerks weren’t allowed to access the test database (“It’s a development environment, and they’d only get confused,” William explained), the data had to be loaded into production. There, the clerks would correct it. Philipp had no access to production (“Security is very important to us”), so the DBA would copy the corrected data back to the test environment. The DBA refused to truncate the table before loading, and refused to drop the table, which meant each time through this cycle created a new table, named something like PRODUCTION_TEST_DATA_13, or 14_TEST_ATTEMPT.
The DBA account owned and controlled each new table. Obtaining access was a separate request to the DBAs each time, with a paragraph justifying why Philipp needed access (“Security is very important to us”).
By the time the DBAs granted him access to PRODUCTION_TESTING_47, Philipp was confident that the migration had finally succeeded. Not long after, he got a call from his boss. ”We’re getting complaints from the client. What’s this I hear about you designing an overly complex migration process?”
Images: Oracle plane, and Midsummer bonfire. Collage by Remy Porter.