Steven worked for a company that sold “big iron” to big companies, for big bucks. These companies didn’t just want the machines, though, they wanted support. They wanted lots of support. With so many systems, processing so many transactions, installed at so many customer sites, Steven’s company needed a better way to analyze when things went squirrelly.
Thus was born a suite of applications called “DICS”: the Diagnostic Investigation Console System. It was, at its core, a processing pipeline. On one end, it would reach out to a customer’s site and download log files. The log files would pass through a series of analytic steps, and eventually reports would come out the other end. Steven mostly worked on the reporting side of things.
While working on reports, he’d sometimes hear about hiccups in the downloader portion of the pipeline, but as it was “not his circus, not his monkeys”, he didn’t pry too deeply. At least, he didn’t until one day, when his boss knocked on his cubicle divider.
“Hey, Steven. You know Perl, right?”
“Uh… sure.”
“And you’ve worked with XML files, right?”
“I… yes?”
“Great. Bob’s leaving. You’re going to need to take over the downloader portion of DICS. Talk to him ASAP. Great, thanks!”
Perl gets a reputation for being a “write only language”, which is at least partially undeserved. Bob was quite sensitive about that reputation, so he stressed, “I’ve worked really, really hard to keep the code as clean and clear as possible. Everything in the design is object oriented.”
Bob wasn’t kidding. Everything was wrapped up as a class. Everything. It was so class-happy it made the Spring framework jealous. JEE consultants would look at it and say, “Whoa, maybe slow down with the classes there.” A UML diagram of the architecture would drain ten printers worth of toner. The config file was stored in XML, and just for parsing out that file and storing the results, Bob had written 25 different classes, some as small as three lines. All in all, the whole downloader weighed in at about 5,000 lines of Perl code.
In the whirlwind tour, Steven asked Bob about the complexity. “It’s not complex. Each class is extremely simple. Well, aside from the config file wrapper, but it needs to have lots of methods because it has lots of data! There are so many fields in the XML file, and I needed to create getters and setters for them all! That way we can have Data Abstraction! That’s important! Data Abstraction is how we keep this project maintainable. What if the XML file format changes? It’s happened, you know. This will make it easy to keep our code in sync!”
Steven marveled at Bob’s ability to pronounce “data abstraction” as if it were in bold face, and resolved to touch the downloader script as little as possible. That resolution failed pretty much a week after Bob left, when the script fell down in production, leaving the DICS pipeline empty. Steven had to roll up his sleeves and get hands on with the code.
Now, one of Perl’s selling points is its rich library. While CPAN may have its own issues as a package manager, if you want to do something like parse an XML file, there’s a library that does it. There are a dozen libraries that’ll do it. They all follow a vaguely Perl idiom, favoring associative arrays over classes. That way, when you want to get something like the contents of the ip_addr tag from the config file, you could write code like this:
$ip_addr = $config->{hosts}[$n]{ip_addr};
This makes it easy to understand how the structure of the XML file relates to the Perl data structure, but that kind of mapping means that there isn’t any Data Abstraction, and thus it was utterly the wrong approach. Instead, everything was done as a getter/setter method.
$ip_addr = $Config_object->host($n)->get_addr();
That doesn’t look too different, perhaps, but the devil is in the details. First, 90% of the getters were “thin”, so get_addr might look something like this:
sub get_addr { my $self = shift; return $self->{Addr}; }
That raises questions about the value of these getters/setters for fetching config values, but the bigger problem was this: there was nothing in the config file called “Addr”. Does this method return the IP address? Or a string in the form “$ip_addr:$port”? Or maybe even an array, like [$ip_addr, $port]?
Throughout the whole API, it was a bit of a crapshoot as to what any given method might return. And as for checking the documentation: they’d created a system that provided Data Abstraction, so they didn’t need documentation, did they?
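To make the contrast concrete, here is a minimal sketch of the two styles side by side. It is not Bob’s code: the Host class, its constructor, and the sample values are all invented for illustration, and the config hash is typed in by hand rather than parsed from XML.

```perl
use strict;
use warnings;

# Hypothetical config, shaped the way a hash-based XML parser might return it.
my $config = {
    hosts => [
        { ip_addr => '10.0.0.1', port => 8080 },
        { ip_addr => '10.0.0.2', port => 8081 },
    ],
};

# Direct access: the lookup path mirrors the structure of the file.
my $n       = 1;
my $ip_addr = $config->{hosts}[$n]{ip_addr};

# A "thin" getter class in the style described above (illustrative names only).
{
    package Host;

    sub new {
        my ($class, %args) = @_;
        return bless { Addr => $args{Addr} }, $class;
    }

    # Nothing in the name or the body says what shape "Addr" has.
    sub get_addr {
        my $self = shift;
        return $self->{Addr};
    }
}

# Whoever builds the object decides what "Addr" means; here it is "ip:port".
my $entry  = $config->{hosts}[$n];
my $host   = Host->new(Addr => join(':', $entry->{ip_addr}, $entry->{port}));
my $getter = $host->get_addr();

print "direct: $ip_addr\n";
print "getter: $getter\n";
```

With the hash, the key name tells you which field you are reading. With the getter, the only way to learn that it returns “ip:port” rather than a bare address is to go find the code that built the object.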
To track any given getter back to the actual field in the XML file it was getting, Steven had to trace through half a dozen different classes. It was frustrating and tedious, and Steven had half a mind to just throw the whole thing out and start over, consequences be damned. When he saw the “Translation” subsystem, he decided that it really did need to be thrown out, entirely.
You see, Bob’s goal with Data Abstraction was to make it so that, if the XML file changed, it would be easy to adapt the code. But the code was a mess. So when the XML file did change a few years back, Bob couldn’t update the config handling classes in any way that worked. So he did the next best thing: he wrote a “translation” module that would, using regular expressions, convert the new-style XML files back into the old-style XML files. Then his config-file classes could load and parse the old-style files.
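For a sense of what regex-based translation looks like, here is a hedged sketch. The story doesn’t show either file format, so both the “new-style” and “old-style” layouts below are made up, as is the translate_to_old_style function; only the general technique (rewriting XML text with a substitution) comes from the description above.

```perl
use strict;
use warnings;

# Hypothetical formats, not taken from the real DICS config:
#   new style: <host addr="10.0.0.1" port="8080"/>
#   old style: <host><ip_addr>10.0.0.1</ip_addr><port>8080</port></host>
sub translate_to_old_style {
    my ($xml) = @_;

    # Rewrite each new-style host element as the older nested form.
    # The pattern assumes addr always comes before port.
    $xml =~ s{<host\s+addr="([^"]*)"\s+port="([^"]*)"\s*/>}
             {<host><ip_addr>$1</ip_addr><port>$2</port></host>}g;

    return $xml;
}

my $translated = translate_to_old_style('<host addr="10.0.0.1" port="8080"/>');

# Same element, attributes in the other order: still valid XML,
# but the pattern no longer matches, so it passes through untouched.
my $missed = translate_to_old_style('<host port="8080" addr="10.0.0.1"/>');

print "$translated\n";
print "$missed\n";
```

The second call hints at why this kind of layer tends to be brittle: a pattern like this matches text, not XML, so a harmless change such as reordered attributes can slip through untranslated without any error.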
Steven sums it up perfectly:
Bob’s classes weren’t data abstraction. It was just… data abstracturbation.
When Steven was done reimplementing Bob's work, he had about 500 lines of code, and the downloader stopped failing every few days.