PT didn’t provide very much information about today’s Representative Line.
Clearly, bits and bytes were not something this SQL stored procedure's author ever studied. Additionally, source control versions are managed with comments. OVER 90 Thousand!
--Declare @PrevNumber smallint
--2015/11/18 - SMALLINT causes overflow error when it goes over 32000 something
-- - this sp errors but can only see that when
-- code is run in SQL Query analyzer
-- - we should also check if it goes higher than 99999
DECLARE @PrevNumber int --2015/11/18
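The overflow in question is easy to reproduce: a T-SQL smallint is a signed 16-bit integer, topping out at 32,767 (the "32000 something" of the comment). A minimal sketch of both the failure and the upper-bound check the comment asks for — note that widening the variable, as the author did, only moves the problem:

```sql
-- smallint is a signed 16-bit integer: range -32768 to 32767
DECLARE @PrevNumber smallint = 32767;
-- SET @PrevNumber = @PrevNumber + 1;  -- Msg 220: Arithmetic overflow error

-- Widening to int avoids the overflow, but the five-digit
-- ceiling from the comment still has to be checked by hand:
DECLARE @Widened int = 100000;
IF @Widened > 99999
    RAISERROR('Value exceeds five digits; downstream system cannot store it.', 16, 1);
```

Of course, per the comments, no such check was ever added — only a note that "we should also check".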
Fortunately, I am Remy Poirot, the world’s greatest code detective. To your untrained eyes, you simply see the kind of comment which would annoy you. But I, an expert, with experience of the worst sorts of code man may imagine, can piece together the entire lineage of this code.
Let us begin with the facts: no source control is in use. Version history is managed in the comments. From this, we can deduce a number of things: the database where this code runs is also where it is stored. Changes are almost certainly made directly in production.
And when those changes fail, they may only be detected when the “code is run in SQL Query Analyzer”. This ties in with the “changes in production/no source control” deduction, but it also tells us that it is possible to run this code, have it fail, and have no one notice. This means this code must be part of an unattended process, a batch job of some kind. Even an overflow error vanishes into the ether.
This code also, according to the comments, should “also check if [@PrevNumber] goes higher than 99999”. This is our most vital clue, for it tells us that the value has a maximum width: more than five characters to represent it is a problem. This obviously means that the target system is a mainframe with a flat-file storage model.
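A five-character field is exactly the sort of constraint a fixed-width flat-file layout imposes. A hedged sketch of how such a field is typically rendered on the way out (the zero-padding convention is an assumption; PT's actual file layout is unknown):

```sql
DECLARE @Id int = 123;
-- Zero-pad to exactly five characters, a common flat-file convention
SELECT RIGHT('00000' + CAST(@Id AS varchar(10)), 5) AS FixedWidthId;  -- '00123'
-- For @Id = 100000 this yields '00000': the leading digit is lost,
-- which is why the comment demands a check for values over 99999
-- before the record is ever written.
```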
Already, from one line and a handful of comments, we’ve learned a great deal about this code, but one need not be the world’s greatest code detective to figure out this much. Let’s see what else we can tease out.
@PrevNumber must tie to some ID in the database, likely the “last processed ID” from the previous run of the batch job. The confusion over smallint and the need to enforce a five-digit limit imply that this database isn’t actually in control of its data. Either the data comes from a front-end with no validation (certainly possible) or it comes from an external system. But a value greater than 99999 isn’t invalid in the database, otherwise they could enforce that restriction via a constraint. This means the database holds data coming from and going to different places: it’s a “business integration” database.
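If values above 99999 really were invalid everywhere, the database could refuse them at the door. A sketch of the constraint the author evidently could not add (the table and column names here are invented for illustration):

```sql
-- Only possible if *every* consumer of this table agrees that
-- six-digit values are invalid -- which, per the deduction above,
-- they do not: only the mainframe cares.
ALTER TABLE dbo.BatchCheckpoint  -- hypothetical table
ADD CONSTRAINT CK_BatchCheckpoint_FiveDigits
    CHECK (PrevNumber BETWEEN 0 AND 99999);
```

The absence of such a constraint is itself a clue: the limit belongs to one downstream consumer, not to the data.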
With these clues, we can assemble the final picture of the crime scene.
In a dark corner of a datacenter are the components of a mainframe system. The original install was likely in the 70s, and while it saw updates and maintenance for the next twenty years, starting in the 90s it was put on life support. “It’s going away, any day now…” Unfortunately, huge swathes of the business depended on it and no one is entirely certain what it does. They can’t simply port business processes to a new system, because no one knows what the business processes are. They’re encoded as millions of lines of code in a dialect of COBOL no one understands.
A conga line of managers has passed through the organization, trying to kill the mainframe. Over time, they’ve brought in ERP systems from Oracle or SAP. Maybe they’ve taken some home-grown ERP written by users in Access and tried to extend it into an enterprise solution. Worse, over the years, acquisitions have come along, bringing new systems from other vendors under the umbrella of one corporate IT group.
All these silos need to talk to each other. They also have to be integrated into the suite of home-grown reporting tools, automation, and Excel spreadsheets that run the business. Shipping can only pull from the mainframe, while the P/L dashboard the executives use can only pull from SAP. At first, it’s a zoo of ETL jobs, until someone gets the bright idea to centralize it.
They launch a project, and give it a slick three-letter-acronym, like “QPR” or “LMP”. Or, since there are only so many TLAs one can invent, they probably reuse an existing one, like “GPS” or “FBI”. The goal: have a central datastore that integrates data from every silo in the organization, and then the silos can pull back out the data they want for their processes. This project has a multi-million dollar budget, and has exceeded that budget twice over, and is not yet finished.
The code PT supplied for us is but one slice of that architecture. It’s pulling data from one of those centralized tables in the business integration database, and massaging it into a format it can pass off to the mainframe. Like a murder reconstructed from just a bit of fingernail, we’ve laid bare the entire crime with our powers of observation, our experience, and our knowledge of bad code.