When you've been in IT for as long as Pat McGee, you're bound to have survived at least one or two COBOL horror stories. While COBOL is certainly not the worst platform to develop software on (MUMPS will most certainly hold that title through at least our grandchildren’s lifetimes), its extreme verbosity and unique idiosyncrasies make it a challenge for organizations to develop clean, maintainable code.
To COBOL's credit, it was one of the first attempts – actually, it was probably the first attempt – at self-obsolescence. Like today, the programmers of old were far too talented to meddle in trite matters like "business rules." After all, if the managers and analysts could conjure up these business rules, they could certainly write them up in a business-oriented language. A COmmon Business-Oriented Language, if you will. Of course, we all know how that story ends, and five decades later, COBOL programmers are still paying for that arrogance.
Back in the late '90s, Pat found himself doing exactly that. Unlike many of his colleagues, he wasn't working on any exciting Y2K bugs, but instead was tasked with something much more mundane: write a program to import several million records of COBOL-format, tape-based files into Oracle. While the hardware had long since been upgraded to use "virtual tapes", they had not aged well.
At the heart of the system was a 10,000 line COBOL record descriptor from a design that started back in the 1960s – long before anyone had heard of 3rd Normal Form, much less believed it would be a good thing. Record descriptors aren't terribly difficult to follow; they mostly just map field names and data types to positions in a record. For example, a simple descriptor would look like this:
01 Employee-Rec.
   02 Employee-ID            PIC X(10).
   02 Employee-Name.
      03 Last-Name           PIC X(20).
      03 First-Name          PIC X(12).
      03 Middle-Init         PIC X.
   02 Position.
      03 Job-Code            PIC X(4).
      03 Department          PIC X(3).
      03 Manager-ID          PIC X(10).
   02 Hourly-Pay             PIC 9(3)V99.
   02 Past-Job-Codes.
      03 Past-Job-Code1      PIC X(4).
      03 Change-Date1.
         04 Change-Month1    PIC 99.
         04 Change-Day1      PIC 99.
         04 Change-Year1     PIC 99.
      03 Past-Job-Code2      PIC X(4).
      03 Change-Date2.
         04 Change-Month2    PIC 99.
         04 Change-Day2      PIC 99.
         04 Change-Year2     PIC 99.
      03 Past-Job-Code3      PIC X(4).
      03 Change-Date3.
         04 Change-Month3    PIC 99.
         04 Change-Day3      PIC 99.
         04 Change-Year3     PIC 99.
While a corresponding record would look like this:
ABCD123456MCGEE               JAMES       PACCTAR ABCD65432104250CLRK010195INTN010397
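To make the "field names to positions" idea concrete, here is a minimal Python sketch that slices a record using the widths from the first eight fields of the descriptor. It is only an illustration, not the importer Pat wrote: the `FIELDS` table, the `unpack` helper, and the blank padding in `SAMPLE` are all assumptions reconstructed from the `PIC` widths.

```python
# Field names and widths copied from the Employee-Rec descriptor.
# Hypothetical helper names; the padding in SAMPLE is an assumption
# derived from the PIC widths, not data taken from the real system.
FIELDS = [
    ("Employee-ID", 10),
    ("Last-Name", 20),
    ("First-Name", 12),
    ("Middle-Init", 1),
    ("Job-Code", 4),
    ("Department", 3),
    ("Manager-ID", 10),
    ("Hourly-Pay", 5),  # PIC 9(3)V99: five digits, implied decimal point
]


def unpack(record):
    """Slice a fixed-width record into a dict, field by field."""
    out = {}
    pos = 0
    for name, width in FIELDS:
        out[name] = record[pos:pos + width].rstrip()
        pos += width
    # The V in 9(3)V99 means two implied decimal places.
    out["Hourly-Pay"] = int(out["Hourly-Pay"]) / 100
    return out


SAMPLE = (
    "ABCD123456"
    + "MCGEE".ljust(20)
    + "JAMES".ljust(12)
    + "P"
    + "ACCT"
    + "AR".ljust(3)
    + "ABCD654321"
    + "04250"
    + "CLRK010195"
    + "INTN010397"
    + " " * 10  # third past-job slot left blank
)

employee = unpack(SAMPLE)
print(employee["Last-Name"], employee["Hourly-Pay"])  # prints "MCGEE 42.5"
```

Every width is hard-coded, so any change to the descriptor means editing the table by hand, which is exactly the chore the rest of this story is about.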
COBOL-format records and record descriptors don't respond to change very well, and like any piece of business software, they are changed very often. On the system Pat was working on, they changed very, very often. And this meant that any program that had to deal with the COBOL program's data (such as the Oracle record importer that Pat maintained) had to change just as often.
To make matters worse, Pat had absolutely no influence over, or visibility into, the update process; he simply had to take the COBOL output and make it work. It was boring, tedious work, and Pat had all sorts of ideas on how to improve the process. Of course, it would have taken an Act of God for the customer to be willing to make any changes, and He definitely wasn't on Pat's team. What this meant was that Pat had to update the Oracle importer tool every week or so, whenever the customer made tweaks to the descriptor and corresponding files.
After the third or so week, Pat found that this simple tweak represented a whole lot of hassle. The code changes were relatively easy, but they just kept coming and coming and coming. He thought about it for a bit, and figured that he could probably write something that would do exactly what he did: read the COBOL record descriptor and generate a new transfer mapping each time the format changed.
What he ended up with was a LEX/YACC grammar that described the COBOL record format, and some C sections for each parsed item. Those C sections generated a C++ program that implemented the translation. A quick compile of the C++ program, and the translator program could chug along, happily reading the virtual tapes and writing text files that could then be imported into Oracle with the standard tools.
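The core idea, reading the descriptor and deriving the field map from it, can be sketched in a few lines of Python. This is not Pat's tool: it swaps the LEX/YACC grammar for a single regular expression, and it assumes only plain `PIC X` and `PIC 9` items stored as text (no `OCCURS`, `REDEFINES`, signs, or packed fields). The names `ITEM`, `pic_width`, and `layout` are made up for the example.

```python
import re

# Matches elementary items such as "03 Last-Name PIC X(20)."
# Group items ("02 Employee-Name.") have no PIC clause and are skipped.
ITEM = re.compile(r"\b(\d\d)\s+([\w-]+)\s+PIC\s+(\S+?)\.(?=\s|$)")


def pic_width(pic):
    """Return the number of characters a simple PIC string occupies."""
    # Expand repeat counts: "9(3)V99" -> "999V99", "X(4)" -> "XXXX".
    expanded = re.sub(
        r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic
    )
    # V marks an implied decimal point and takes up no space.
    return len(expanded.replace("V", ""))


def layout(descriptor):
    """Build a list of (name, offset, width) from a record descriptor."""
    fields, offset = [], 0
    for _level, name, pic in ITEM.findall(descriptor):
        width = pic_width(pic)
        fields.append((name, offset, width))
        offset += width
    return fields


# Excerpt of the Employee-Rec descriptor, up to Hourly-Pay.
DESCRIPTOR = """
01 Employee-Rec.
   02 Employee-ID PIC X(10).
   02 Employee-Name.
      03 Last-Name PIC X(20).
      03 First-Name PIC X(12).
      03 Middle-Init PIC X.
   02 Position.
      03 Job-Code PIC X(4).
      03 Department PIC X(3).
      03 Manager-ID PIC X(10).
   02 Hourly-Pay PIC 9(3)V99.
"""

for name, offset, width in layout(DESCRIPTOR):
    print(f"{name:<12} offset={offset:>2} width={width}")
```

A sketch like this handles the tidy example descriptor. Anything outside its narrow assumptions falls through silently, which hints at why a real 10,000 line descriptor is a much harder target.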
At least, that was the theory. As soon as he passed in the actual COBOL record descriptor, he learned that his LEX/YACC/C/C++ program couldn't quite handle all the oddities that the customer managed to include in the COBOL record descriptor. So he wrote some SED scripts to rewrite sections of the COBOL stuff before feeding it into the program. The SED script worked like a charm, and the program generator program spit out a perfect field translation map.
Then the customer requested another change. And then another. And then another. As it turned out, Pat hadn't quite managed to capture rules for all the really weird changes they could make in the COBOL record format. This meant that, almost every time the customer requested a change, Pat would then have to change the SED/LEX/YACC/C/C++ code, and then re-run it to regenerate the translator.
After a few months, Pat had re-written the SED/LEX/YACC/C/C++ code several times over, each time adding more and more validation capabilities. Despite all this, a change to his program generator was required at least 50% of the time. And since no one else on his team was willing to learn anything about LEX and YACC, much less SED, maintaining and executing the program generator became his primary responsibility.
The end came much sooner than Pat had expected. Not to the COBOL program or the project as a whole – just his particular assignment. He maintained ties with the folks in his group and learned that, the very next week after his last, the SED/LEX/YACC/C/C++ program stopped working. In response, the whole project was shut down for a month while someone wrote another C++ program, by hand, to do the translation. Of course, that person got stuck spending a couple of days each week updating the program.
But at least he had job security.