The Rules of Optimization are simple. Rule 1: Don’t do it. Rule 2 (for experts only): Don’t do it yet.
These rules were coined by Michael A. Jackson (no, not that Michael Jackson; yes, that Jackson of the "Jackson Diagram") in his 1975 book, Principles of Program Design. It’s an interesting and depressing read that reminds us how much we knew about good software development then, and how little we follow that knowledge today. In fact, three decades and almost infinitely more computing power later, many programmers still fail to remember these rules on optimization. And among those that do remember, many have no idea what "optimization" means.
As the tech lead at his company, Chris was responsible for purchasing and setting up a database server requested by the development team. Although they recommended a fairly low-end server, Chris knew that the company was growing and wanted to make sure the hardware could keep up. He bought a Dual Xeon 2.8Ghz with 4GB of memory and a 3x36G 15K SCSI RAID5.
Development needed the server for the brand-new data management application they were creating. Their existing system, though vital to the business, was developed around the same time as Jackson’s book and did not quite appreciate with age. Everyone in the company – including Chris, who was responsible for supporting the ancient application – was eager to switch to something new.
The new system was built with the same lessons the original team didn’t learn. In the database, every column was defined as VARCHAR(100). Sure, INTs and DATETIMEs are nice for recording integers and dates, and probably should be used for recording integers and dates, but this way, developers only had to remember a single datatype. In a few months, the database ballooned to 10GB. Several tables were over 1GB in size, despite only having around 100,000 records.
Chris objected to his boss and suggested correcting datatypes throughout the system. "No, finishing development is what we have to worry about now. We’ll optimize later." Considering the complexity involved in changing VARCHAR(100) fields across the whole database to their correct types, Chris new that VARCHAR(100) was here to stay.
The system was already live, too, and Chris’s company was depending on it. Users needed to be able to run reports so they could get their jobs done. Running a report, though, would sometimes cause the servers to chug for up to 30 minutes without any feedback to the user. Chris again objected to his boss, asking to add a progress indicator while the report ran, but again, "it’s not time for optimization!"
Despite his best efforts, Chris took the blame for not tuning it correctly. He tried indexing (not very helpful when everything’s a VARCHAR), query optimization, and even in-memory tables. Nothing worked. Most of the tables pushed the maximum row size limit and all type conversions were done on the database server. That’s right – in order to do even the simplest retrievals (say, a small range of records within a date range), SQL Server had to read every single row, convert the date column to a DATETIME, and then determine if it was a match. All of this added up to considerable computational stress, and the database server could barely keep up.
Chris is no longer there, but the system he worked on is still in use. In fact, if he clicked "Run Report" on the day he quit, it should be finished pretty soon.