If you run a business of any size, you need some sort of resource-management/planning software. Really small businesses use Excel. Medium businesses use Excel. Enterprises use Excel. But in addition to that, the large businesses also pay through the nose for a gigantic ERP system, like Oracle or SAP, that they can wire up to Excel.
Small and medium businesses can’t afford an ERP, but they might want to purchase a management package in the niche realm of “SMB software”- small and medium business software. Much like their larger cousins, these SMB tools have… a different idea of code quality.
Cassandra’s company had deployed such a product, and with it came a slew of tickets. The performance was bad. There were bugs everywhere. While the company provided support, Cassandra’s IT team was expected to also do some diagnosing.
While digging around in one nasty performance problem, Cassandra found that one button in the application would generate and execute this block of SQL code using a SQLCommand
object in C#.
DECLARE @tmp TABLE (Id uniqueidentifier)
--{ Dynamic single insert statements, may be in the hundreds. }
IF NOT EXISTS (SELECT TOP 1 1 FROM SomeTable AS st INNER JOIN @tmp t ON t.Id = st.PK)
BEGIN
INSERT INTO SomeTable (PK, SomeDate) SELECT Id, getdate() as SomeDate FROM @tmp
END
ELSE
BEGIN
UPDATE st
SET SomeDate = getdate()
FROM @tmp t
LEFT JOIN SomeTable AS st ON t.Id = st.PK AND SomeDate = NULL
END
At its core, the purpose of this is to take a temp-table full of rows and perform an “upsert” for all of them: insert if a record with that key doesn’t exist, update if a record with that key does. Now, this code is clearly SQL Server code, so a MERGE handles that.
But okay, maybe they’re trying to be as database agnostic as possible, and don’t want to use something that, while widely supported, has some dialect differences across databases. Fine, but there’s another problem here.
Whoever built this understood that in SQL Server land, cursors are frowned upon, so they didn’t want to iterate across every row. But here’s their problem: some of the records may exist, some of them may not, so they need to check that.
As you saw, this was their approach:
IF NOT EXISTS (SELECT TOP 1 1 FROM SomeTable AS st INNER JOIN @tmp t ON t.Id = st.PK)
This is wrong. This will be true only if none of the rows in the dynamically generated INSERT
statements exist in the base table. If some of the rows exist and some don’t, you aren’t going to get the results you were expecting, because this code only goes down one branch: it either inserts or updates.
There are other things wrong with this code. For example, SomeDate = NULL
is going to have different behavior based on whether the ANSI_NULLS
database flag is OFF
(in which case it works), or ON
(in which case it doesn’t). There’s a whole lot of caveats about whether you set it at the database level, on the connection string, during your session, but in Cassandra’s example, ANSI_NULLS
was ON
at the time this ran, so that also didn’t work.
There are other weird choices and performance problems with this code, but the important thing is that this code doesn’t work. This is in a shipped product, installed by over 4,000 businesses (the vendor is quite happy to cite that number in their marketing materials). And it ships with code that can’t work.