Relational databases have very different constraints on how they structure and store data than most programming languages- tables and relationships don't map neatly to objects. They also have very different ways in which they can be altered. With software, you can just release a new version, but a database requires careful alterations lest you corrupt your data.

There are many ways to try and address that mismatch in our software, but sometimes the mismatch isn't in our software, it's in our brains.

Peter was going through an older database, finally migrating its schema definition into scripts that can be source-controlled. This particular database had never received any care from the database team, and instead all of the data modeling was done by developers. Developers who might not have been quite ready to design a database system from scratch.

The result was this:

mysql> DESC `preferences`; +--------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | user_id | int(11) | YES | MUL | NULL | | | time_format | varchar(255) | YES | | 24 | | | date_format | varchar(255) | YES | | eu | | | boolean_val1 | tinyint(1) | YES | MUL | 0 | | | boolean_val2 | tinyint(1) | YES | MUL | 0 | | | boolean_val3 | tinyint(1) | YES | MUL | 0 | | | boolean_val4 | tinyint(1) | YES | MUL | 0 | | | boolean_val5 | tinyint(1) | YES | MUL | 0 | | | boolean_val6 | tinyint(1) | YES | MUL | 0 | | | boolean_val7 | tinyint(1) | YES | MUL | 0 | | | boolean_val8 | tinyint(1) | YES | MUL | 0 | | | boolean_val9 | tinyint(1) | YES | MUL | 0 | | +--------------+--------------+------+-----+---------+----------------+

So, let's start with the boolean_val* fields. As you can see, they're helpfully named in a way that gives you absolutely no idea what they might be used for. Obviously, as this is the preferences table, they should be storing some sort of preference. Well, at least one of them is- boolean_val2 has a mix of ones and zeroes. All the other boolean_val* fields just store zeroes. Is that because they're unused? Or because all the users have left the corresponding preference at its default value? Nobody knows!

With that in mind, let's turn to time_format and date_format. If one were going "by the book" on normal forms, these should probably be foreign keys to a table which lists the options, but that means extra joins and boy, that just might be overkill if you only have a handful of options.

So it's not wrong that they store these as strings in the field. But it's worth noting that time_format has only two allowed values- 12 and 24. And date_format also has only two allowed values - eu and us. So again, not wrong, but with a solid understanding that there are only two possible values for each field, and that big pile of boolean_val* fields which may or may not be used, it's at least ironic that using booleans never occurred to them.

As a bonus, while time_format only has two possible values, the schema permits it to be null. That means there are definitely nulls:

mysql> SELECT DISTINCT(`time_format`) FROM `preferences`; +-------------+ | time_format | +-------------+ | 24 | | 12 | | NULL | +-------------+

There are definitely cases where it is null. Peter hadn't yet confirmed, but it's likely the front-end wasn't expecting nulls, and this accounts for a number of reported bugs in the UI.

[Advertisement] Utilize BuildMaster to release your software with confidence, at the pace your business demands. Download today!