"I don't know, Gerald, this doesn't look right."

"What do you mean?"

"I can't find MedHealth Hospital... NPI number 1123581321, referenced on incident PN-03-49302..." Janice tapped her long fingernails on the table making a slight clicking sound. "When did we last update the NPI data?"

Gerald sighed. "No idea, I'll have to get back to you." Great, he thought, I'm going to have to ask Mort.

Mort was the old-timer on the team; a veteran of the flat-file days. He was now in a DBA role, and was the designer of the NPI Update DTS Package, as well as its protector – he closely and jealously guarded the package's secrets. NPI is short for "National Provider Identifier," a federally-assigned ID given to health care providers. The package that Mort maintained would run daily to download a flat file of health care provider information and import it into the database. Mort was so meticulous about it that he didn't even want people looking at the underlying table let alone the DTS package; they should only use the view he'd created. And the developers were happy to do this – it was one less system for them to worry about, plus it was a critical system with hooks into accounting data and joins in lots of forms and reports – no one wanted to have to worry about breaking it.

Still, Gerald figured there was no harm in taking a peek under the hood, despite his conscience's warnings not to. With how protective Mort was with the DTS package, it had to be bad. It probably opened a million connections, left transactions uncommitted, used concatenation for queries rather than parameterization, received data via 14.4 modem, made incantations to Cthulhu, something.

Except not really. It was designed pretty much as Gerald would have built it himself. It followed as simple a process as it could: 1) Get data file via FTP download, and if successfully downloaded, 2) Import the data into a table.

Gently knocking on Mort's cubicle wall, Gerald asked "so, uh, hey Mort." Mort turned his head about two degrees to the left. "Can you check the provider synch from last night?"

Without hesitation, a dismissive "Ran fine." Gerald insisted that he check, which was met with a grumble, a few clicks, and finally a "See? Ran fine."

Gerald called Janice back to tell her that it had run the previous night, and then cracked open the view. Paging through the results, however, Janice was right – at least one of the NPIs was missing. Perhaps a bug with the view? Gerald pulled up the view's definition.

SELECT rtrim(substring(DataType, 1, 25)) AS ProviderName,
       rtrim(substring(DataType, 26, 15)) AS ProviderAddress,
       rtrim(substring(DataType, 41, 10)) AS ProviderNPI,
       /* *SNIP* more columns */
  FROM dbo.ProviderTable
 WHERE RecordType = '02'

The underlying table had just two columns; DataType (VARCHAR(78)) and RecordType (VARCHAR(2)). And DataType was being parsed into several fields with every row.

Despite the need for a full table scan with every query, however, pulling up a single record by NPI takes less than a second. On the other hand, when a DTS package runs over 500,000+ claims where the NPI number has to be validated on four fields per row...

Optimize Sometimes

For an unscientific experiment, I created a table with 1,048,576 NPIs in it (compared to the over 2.5 million currently assigned) and ran a query 1000 times to select based on an indexed NPI column defined as VARCHAR(10), as well as 1000 queries to select based on a substring. For 1000 selects based on the indexed VARCHAR, 30ms. For 1000 selects based on a substring, a whopping 379220ms. That's over ten thousand times as long! This is based on my relatively crappy laptop and a script I put together here (run at your own risk, you're responsible for whatever happens if you run it, blah blah blah).