We've all learned to write programs a bit more generically than the immediate problem requires. We use interfaces in our code and include concrete implementation classes via some language-appropriate mechanism. We use factories to produce the object we want, rather than hard code them. We use code generators to spew forth mountains of code from configurable templates. Sometimes, we even generate SQL on-the-fly. It provides more flexibility that way; instead of having to write a separate query for every permutation of question, we can write something that can dynamically create the query, execute it and return the results.

Drop Leaf Table

At Initech, Ian was assigned an interesting JIRA ticket: Investigate errors regarding column length. Since that was all the information written in the ticket, Ian hunted down the author of the ticket to pry out a tad more information. Apparently, the part of Initech's intranet website that was used by the sales agents was suddenly throwing errors about some kind of maximum-size error, and he needed to find out what it was, why it was happening and how to fix it.

Sounds routine enough. Someone probably overflowed some buffer somewhere, or had too much data to fit in some column in the database.

The website in question was nothing out of the ordinary. It was written in PHP using the Zend framework connecting via PDO to a MySQL database. It had been mostly written by several programmers who had left the company a few months prior, and was of a quality that was not unusual for a corporate PHP website. There were functions with reasonably sensible names. The MVC pattern was applied in a rational way. It was possible to follow the code structure without retching or having seizures.

Of course, since this was a fairly decent sized company, there were also the requisite SQL injection vulnerabilities on every single input field. No input was ever sanitized; not even manually. There was no error checking on query result sets. Countless uninitialized variables littered the landscape, scattered amongst informative debug statements like: echo("whyyyyyyyy?");

The page producing the error was written by Gary. It was a multi-tabbed form with only some minor WTFs. One drop-down box had a completely different look&feel than all the others on the page, and any kind of error more complicated than a validation error produced the error message "1".

Sales agents were supposed to use it while talking to customers to enter information about the person they were calling. The error they were getting was a MySQL error regarding maximum row sizes, but it wasn't because of too much data in an input field. After a double-take and a quick Googling, Ian found that yes, MySQL -does- have a maximum size on the length of a row, namely 64KB. But this was just an Insert statement, right? What would maximum row length have to do with an existing table schema?

After several spelunking sessions down through the fetid bowels of the system, Ian found the fecal mass of detritus that was causing the problem.

It turns out that Gary was being micro-managed by the VP of Operations. This led to an endless series of requests to add a new field to the form. Gary got tired of doing the same work over and over, and took it upon himself to automate the problem. To this end, any time a new field was added to the form (usually by someone SSH'ing onto the remote server and just editing the necessary PHP files directly in production), the page would later on detect that the database did not have a matching column for it, and would create an alter table statement to add the new column to the database. Automatically. On the fly. With no human knowledge of, or intervention thereof. Live. In Production.

The fact that it had lasted several months without causing some kind of outage was impressive.

The name of the new column was the ID of the field on the form. But what would the data type be? The answer Gary settled upon was... varchar(255). Fast-forward a few months; all those varchar(255) columns started to add up, as management decreed more and more and more fields for more and more and more customer data to be recorded. While having a database table with 200+ Stringly-Typed columns is a WTF in and of itself, the fact that they were automatically added and with a substantial size only made the problem worse, particularly as 255 characters was far more than necessary for most fields. Two-letter state abbreviations, 5-digit Postal codes, 4-digit Zip+4 postal codes, 8-digit customer IDs, apartment numbers, phone numbers, and many others were all blanketed under the varchar(255) datatype.

While Ian wanted to tear out this monstrosity and banish it to the recesses of the Git commit log, he was informed by his boss that, since the Intranet website was going to be taken down in a few months, it would be more prudent to simply resize some of the columns to buy them enough time to take the whole site down. And so, choosing some of the more egregious offenders, Ian altered ten columns or so to reduce them to a sane size (or, better yet, to turn columns holding integer values into actual Integers).