Henry was a hotshot developer working on a team that specialized in performance tuning. When other teams in his company had performance problems they couldn’t solve, Henry’s team was called in. Through profiling, analysis of algorithms, and database tuning, his team excelled at turning inefficient, slow, bug-ridden software into applications that actually did real work in a timely manner.

Usually.

Henry was selected to help a customer with issues with a brand-new web application developed by another team in his company. Performance was terrible. Requests often timed out and the server’s CPU typically sat at 100% usage for hours at a time. The server admins had to restart server processes multiple times per day to keep the application running.

several variations on a B-Tree

He met with the lead developer of the other team to discuss their performance woes.

“We’ve combed through every line of code,” the other developer, a young man named Colin, explained. “I don’t think it’s our code’s fault. I think it’s the database, but we already tried switching from MySQL to PostgreSQL and it made no difference.”

After looking at a few profile reports, Henry agreed that something was off with the database. “Who wrote the database layer?”

“Bob did,” Colin replied. “He’s our manager. None of us know anything about SQL so he did all the database work. But he’s on vacation for the next couple weeks.”

“Well, let’s take a look at the customer’s database, see if anything stands out.” After some back-and-forth with the customer, they were able to get a copy of the database in question. It came in at over thirty gigabytes.

Henry cringed when he opened the database up for inspection. The first thing he found was that primary keys and foreign keys did not always use the same datatype. Often, an ID would be a VARCHAR(36) while any foreign keys referencing it would be some other datatype such as NUMERIC, VARBINARY, or even BLOB, and queries were full of casts and other shims to make JOINing possible. For example,

    SELECT * FROM Product
    INNER JOIN Inventory ON CAST(Product.ID as CHAR(36)) = CAST(Inventory.ProductID as CHAR(36))
    WHERE Inventory.Quantity < 100;

Henry explained the concept of foreign keys and JOINs to Colin, then went on to describe how this was a performance problem. “All these casts make MySQL ignore the keys’ indexes and do a much-slower linear search instead. It will help a lot if primary keys and foreign keys use the same datatype.”

Colin nodded as he took notes to pass on to Bob.

Then he noticed that the vast majority of the database’s contents were stored in a table called Audit_Table which was filled with VARCHAR(255) fields. The table was populated by a trigger and it appeared that every single possible user action was logged here. Even page refreshes and button clicks generated entries! The table had over 15 million rows in it, despite the application only being in production for a few days so far. And every single column was indexed.

No wonder it was so slow! Any time a user did anything, an audit record was generated. And since every column was indexed, the application server spent most of its time trying to keep all the indexes updated on a massive 30 GB table.

And then Henry noticed one of the columns was this:

    IsDeletable VARCHAR(255) DEFAULT 'Y'

A not-so-quick “SELECT UNIQUE(IsDeletable) FROM Audit_Table” revealed that every single row had a value of “Y” for this column. Indexing it was a pointless waste of CPU time.

Henry and Colin spent the week trying to fix the database design without breaking things: correcting mismatched datatypes, removing a massive number of unnecessary indexes, and a multitude of other problems that, all added together, easily explained their performance problems. Henry explained everything he did to Colin as he worked, and they committed their fixes to source control late that Friday.

Come Monday he got a panicked email from Colin. It contained a forward originally from Bob who had sent this while on vacation.

To: colin@initech.com
From: bob.the.phb@initech.com
Subject: unauthorized database changes
i noticed you removed indexes. i rolled it back. those are very important for performance, every column must be indexed!!

i locked the schema in source control so no one but me can edit it. you obviously don’t know what you’re doing. i suppose that is my fault for not training your team in database.

we will talk more about this infraction when i return from vacation.

Bob The PHB
Senior Development Manager, Development Teams C - H
bob.the.phb@initech.com
(985) 555–2500
———————————————————
“If I really want to improve my situation, I can work on the one thing over which I have control - myself.” - Stephen Covey
———————————————————
Sent from my BlackBerry

Bob refused to budge on the database design, claiming that indexes always improved performance and the customer was forced to live with it. Several months later, Henry was pulled in again for a different reason. The customer’s server kept running out of disk space, due to the sheer size of the Audit_Table table. It was now measured in terabytes. After some investigation Henry realized that nothing actually used the table and nobody was reporting on this tracking data, so he showed Colin and the customer how to TRUNCATE it to free up disk space. As a happy side effect, the customer realized that the application performed much better after the audit table was truncated and agreed to make a nightly scheduled task out of it, finally defeating the performance-killing audit table indexes.

[Advertisement] Universal Package Manager – store all your Maven, NuGet, Chocolatey, npm, Bower, TFS, TeamCity, Jenkins packages in one central location. Learn more today!