The Daily WTF: Curious Perversions in Information Technology

2024-10-24 Reply Admin

It's a shame that so many developers fail to take into account how their code will work at scale. Sure, it works fine when it's working with a hundred records - but what happens to memory, network bandwidth and CPU usage when it's dealing a hundred thousand records. It's all so easy to pass that off as Someone Else's Problem, but just taking that little bit of time to think of the future will actually save the developer - and all the sysadmins - time and effort.

An hour's work today will avoid a week's work in the future.

Tangentially, I had a self-described 'Data Analyst' who complained that one of his processes kept failing and therefore the IT systems were terrible. Turned out he was essentially doing a 'Select * from *' on a database and putting the results into Excel so he could run a filter on it.

2024-10-24 Reply Admin

Disagree... Consider an application which uses ZipCodes [5 digit] You need information about your customers [their address has a zip code] and some demographics [stored in a zip code table....]... MUCH better (subject to memory) to read the 41,642 rows of the zip code once, rather than doing a join on 10K customers and having the same static information returned on each row.

Steve_The_Cynic · 2024-10-24 Reply Admin

As a general rule, if you're using an RDBMS and can solve your problem using SQL, you should solve your problem using SQL.

And this rule explains all those posts we've had about people trying to do date calculations or basic mathematics using SQL queries rather than writing application-level code for that.

I'd also suggest that the problem was not that the dev used a query, but that the dev used a query.

2024-10-24 Reply Admin

And on the opposite side of the spectrum there's the old adagio "Premature optimization Is the root of all evil"... I'm guessing this started out with only a few tables where the posted approach still worked decently. Nevertheless someone should have changed this at some point.

2024-10-24 Reply Admin

Seeing that everything already is in separate tables, wouldn't the wiser decision have been to not use an Entity table, or if you really, really wanted to (because, say, some Architect had Decided that there would be an Entity Event Bus) use a view that combined the tables? Anyway, a horrible thing.

2024-10-24 Reply Admin

Even on the database. If you can do it in Sql instead of a procedure, do it in Sql. The description / name should have been a field in the entity table and the whole process would have been a simple select on one table.

2024-10-24 Reply Admin

Would it help to include EntityType in the conditions for the joins: ON E.EntityType='Application' AND E.EntityID=A.EntityID etc.? This would allow the database to optimise the query execution.

2024-10-24 Reply Admin

I am more concerned about the fact that you can have an Entity and not know if you are working with an Automation, a Group, an Organization, a Person, a Resource or a Batch.

Are they just bundling fields with the same name into a common table simply because they are the same type of data?

The entire Entity table seems wrong to me. As a minimum it is terribly named, as Entity tells me nothing about the table.

2024-10-24 Reply Admin

"An hour's work today will avoid a week's work in the future"

But managers do not think like that. An hour's work now is an extra $XX I have to pay now, so no.

2024-10-24 Reply Admin

The strange thing is that all these types of entities have a name and yet the name is not in the common table. So it's a two-layer WTF, the schema is WTFy and then WTFy code to process it.

2024-10-24 Reply Admin

This is a case where something that is good in one context is bad in another. Assuming this is SQL Server, you might see that you can write a function, and think to yourself that a function that encapsulates some logic would be a great thing. Unfortunately, a scalar function (as this is written) makes a query ineligible for parallelism. So if you use this function in your query and it has to process a bunch of records, it will have to process them one by one using a single thread. The fix as described might speed things up some, but I'd bet that the query still doesn't go parallel with large data. This particular function could be rewritten as an inline table valued function, which would allow the server to use parallelism, and then your performance hit would be much smaller. But this is one of those cases where SQL Server shows you functionality that looks nice and shiny, but ends up punching you in the face, and it's not really clear why.

2024-10-24 Reply Admin

Since they're dealing with a SQL system that has a case statement, it likely has sub-select capabilities too. Instead of reading the different name columns across the join, they could have sub selected from just the table they know they're operating against in each case statement, so the main clause only needs the Entities table. A bit more PL/SQL approach than the pure SQL statement they used, but still more SQL focused than a pure PL/SQL approach where you just end up with if statements to decide which secondary query to run.

But it's a horrible schema design. If Entities doesn't contain a common name field, WTF does it contain that led to it being normalized into its own table?

2024-10-24 Reply Admin

Or, y'know, you could use a union

TheGreatLobachevsky · 2024-10-24 Reply Admin

And there's more profit to be made if you charge a week's work versus an hour.

2024-10-24 Reply Admin

"The strange thing is that all these types of entities have a name and yet the name is not in the common table." That depends on what you call an "Entity" and how atomic it is. For example, (a real-world example) an insurance company (Acme Corp.) can have a division that handles health insurance (GetWellSoon, inc.) and another branch (Oops, Inc.) that handles auto. It's the same corporate structure (the same home officer, the same financials, etc) But the divisions are regulated differently, so the information you're interested in will vary depending on context. And gathering all the divisions of Acme Corp becomes a simple JOIN.

2024-10-24 Reply Admin

This code should have been done on the server, so the proper query would be ran against the DB based on the entity type.

2024-10-24 Reply Admin

Very much this.
Functions operate in a loop, and calling more than one record will compound the issue.

2024-10-24 Reply Admin

Entities do not contain a name--note that some of the cases the name is a derived concept. That being said, this is probably a case that warrants denormalization.

2024-10-24 Reply Admin

If that's the case though, it's just badly written SQL, it should have been done with correlated subqueries... ie

select @Name = case E.EntityType when 'Application' then (SELECT A.ApplicationName FROM AP_Applications A WHERE E.EntityID = A.EntityID) when etc... end from Entities E where E.EntityID = @EntityID

Would then only look up the other tables when required

2024-10-25 Reply Admin

True WTF is why having an entity table if you're not using it for storing something basic as a name? Yes, it might create some redundancy, and isn't the prettiest thing, but it would perform better and give more feeling with what entity you're dealing with.

Also, you could probably improve performance by using unions and inner joins, or possibly first getting the entity type and then getting the entity from the proper table. This could all be done in the database itself

MaxiTB · 2024-10-25 Reply Admin

Ah, business logic written in untestable SQL. Classic. Not really a surprise that the quality of the script is also somewhere between garbage and full garbage.

Join Our Naming

Leave a comment on “Join Our Naming”