It was only his third day on the job, but Dave could tell it was going to be a long one. His fear had come true; what should've been an easy fix (capture an extra data field) was going to involve him debugging a long regular expression that had no comments revealing its pattern. Its arcane characters may as well have been hieroglyphics, and as regular expressions often do, it looked as though someone had held down shift while randomly mashing the number keys. Worse still, there were recursive methods used to parse these expressions. If you added in linked lists you'd have a CS101 student's personal hell.

"I mean, it's not that regular expressions are bad," he explained to a colleague, "it's just that they're ridiculously hard to interpret when they get to have so many groups and submatches and whatnot."

His colleague Matt stared back blankly, unblinking. "It used to be worse."

Scrapes and Bruises

The utility was more or less a web scraper, but it also needed the ability to automatically post form input to access certain pages. This was handled by three steps:

  1. Fetch the pages
    This step took an XML-serialized query (not just the page URLs, but any POST data, custom headers, cookies, etc.). The query was transformed to a list of pages via XSLT. Some page requests had custom logic or their own set of regular expressions for some mysterious reason.
  2. Send the request, format the response
    The response would be formatted into XML via a clowder of regular expressions, which is where Dave had gotten tripped up in the first place. Thanks to heaps of cyclic data structures, this was the step that would occasionally take down the web site: if the input hit just the right combination of characters, the parser would recurse infinitely. Who knew VB6 would happily gobble up over two gigabytes of RAM? Well, Dave, now.
  3. Re-write the response XML file
    The response generated in step two would be converted to one of two XML representations via XSLT. These two formats were neither documented nor verified.
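
The runaway recursion in step two is the usual hazard of walking a cyclic data structure without remembering where you have already been. A minimal sketch of the standard guard, in Python rather than the original VB6; the Node class and walk function are hypothetical names invented for illustration, not anything from Dave's codebase:

```python
class Node:
    """Hypothetical parse-tree node; children may point back at an ancestor."""

    def __init__(self, name):
        self.name = name
        self.children = []


def walk(node, visited=None):
    """Return node names depth-first, visiting each node at most once."""
    if visited is None:
        visited = set()
    if id(node) in visited:
        # Already seen: stop here instead of recursing forever.
        return []
    visited.add(id(node))
    names = [node.name]
    for child in node.children:
        names.extend(walk(child, visited))
    return names


# Build a two-node cycle: response -> field -> response
root = Node("response")
field = Node("field")
root.children.append(field)
field.children.append(root)

print(walk(root))  # ['response', 'field']
```

Without the visited set, the same traversal would bounce between the two nodes until the process ran out of stack or memory.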

The XSLT templates were version controlled, but the regular expressions only existed in the wild, untamed database. In production. Thanks to a ridiculously convoluted configuration, no one was able to set up a reasonable test version. The team had just learned to accept that making changes to the regular expressions meant trying a change and then tweaking it like crazy until production wasn't broken anymore.
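
For contrast, here is roughly what a documented, locally checkable expression could look like. This is a sketch in Python under assumed conditions: the table-cell markup, FIELD_PATTERN, and extract_fields are all made up for illustration, and the real expressions were almost certainly longer and stranger.

```python
import re

# Verbose mode lets the pattern carry its own comments, and named groups
# replace positional submatches like \1 and \2.
FIELD_PATTERN = re.compile(
    r"""
    <td\s+class="(?P<name>\w+)">   # opening cell; the class attribute names the field
    \s*(?P<value>[^<]*?)\s*        # cell text, trimmed of surrounding whitespace
    </td>                          # closing cell
    """,
    re.VERBOSE,
)


def extract_fields(html):
    """Map each field name to its text for every matching cell."""
    return {m.group("name"): m.group("value") for m in FIELD_PATTERN.finditer(html)}


# A check like this can run on a developer's machine, long before production.
sample = '<tr><td class="name"> Dave </td><td class="count">3</td></tr>'
print(extract_fields(sample))  # {'name': 'Dave', 'count': '3'}
```

Capturing an extra data field then means adding one commented line and one sample, rather than tweaking a live pattern until the site stops falling over.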

And Again

Dave threw his hands up in the air and sighed. He went to Matt to complain again.

Matt tried to be reassuring. "It used to be worse."

Turns out that the system used to be built from just two XSLT templates, each over forty thousand lines long. In those dark times, the interface to maintain the regular expressions was an email to the manager with the updated expression. There wasn't even an attempt at an environment other than production.

Naming conventions were... unconventional. Duplicate variables that apparently held the same values were scattered throughout the code: variables like "numPersons," "peopleCount," and "totalPeople." Some functions were designed with arguments passed in as a space-delimited string (for example, doSomething("0 0 0") rather than doSomething(0, 0, 0)). Also worth noting is that doSomething (yes, that was the actual function name) was one of the core functions of the system. Apparently the developer couldn't decide on a good name for the function and gave up.

Functions that performed the same task were duplicated, though each used a different algorithm to get there. There were at least four functions that would return an upper case string: "uc," "getUpperCase," "touppercase," and "uppercase." I'm sure if he dug enough Dave would've found upperCaseIt.

Dave could barely believe it, but Matt's explanation made him feel better. At least he didn't have to manage that godawful version.

Matt smirked. "But even before that... it used to be worse." Matt went on to describe the first version of the software, which had been written by the CEO. In Excel. I'll spare you the details.

Improving It

Months later, Dave had grown accustomed to the insanity, and had even managed to make some positive change in the application. It was still something to be ashamed of, of course, but much of the redundancy had been eliminated: the duplicate functions, and the functions designed to handle tasks that are built into the language. They even had a generally-working version in test.

Still, the design was a little crazy, the XSLT templates complex, and the regular expressions mostly undocumented. A new developer came to Dave to vent his frustrations.

Dave smiled. "Well, Jim, it used to be worse..."