How about those NoSQL databases, huh? There’s nothing more trendy than a NoSQL database, and while they lack many of the features that make a traditional RDBMS desirable (like, um… guaranteeing writes?) , they compensate by being more scalable and easier to integrate into an application.
Chuck D’s company made a big deal out of migrating their data to a more “modern”, “JSON-based” solution. Chuck wasn’t involved in that project, but after the result went live, he got roped in to diagnose a problem: the migration of data from the old to the new database created duplicate records. Many duplicates. So he took a look at the migration script, and found piles of code that looked like this:
UPDATE DataItemSet SET Content = '{"id":116,"type":"Plan for Today", "title":"Initech Retirement Fund", "learnItemLink":"","content":"Re-balance fund portfolio."}' WHERE Name = 'OPERATION' and Content like '{"id":116,%'
If reading that line doesn’t cause you to break out into hives, take a closer look at the schema.
Id | Content | Unread | Type | Name |
---|---|---|---|---|
79 | {“id”:9,“title”:“Initech Facilities Revision”,“type”:“CR05”,“img”:“images/initechfac”,content:“{"id:"55, "title":"Billing Code"… ”} | 0 | Global_Customer | CUSTOMERRESOURCE |
102 | {“id”:94,“title”:“Initech Facilities Construction”,“type”:“CR05”,“img”:“images/initechfac”,content:“{"id:"55, "title":"Billing Code"… ”} | 0 | Global_Customer | CUSTOMERRESOURCE |
The Content
column holds a string of text, and that string of text is expected to be JSON, and they often need to filter by the content of the column, which means they have lots of queries with WHERE
clauses like: WHERE Content LIKE '%"title":"Initech"%'
, which rank as one of the slowest possible queries you can run in SQL. The ID field in the Content
column has no relationship to the autonumbered ID field that actually is on the database table. The name
column sometimes contains IDs, many of the fields in the JSON content (like the img
and learnItemLink
fields) are actually foreign key references to other tables in the database. There’s a type
column on the record, which seems to control scope, but shouldn’t be confused with the type
field on the JSON document, which may or may not mean anything.
After many weeks of combing through the migration scripts, it turned out not to be a problem with the scripts at all. One of their customers started a process of simplifying their billing codes, which meant billing codes were constantly changing. Instead of only changing the codes that were… well, actually being changed, the web app which provided that interface created new records for every billing code.