The Daily WTF: Curious Perversions in Information Technology

2020-02-26 Reply Admin

var nullpointer = from null in nullptr
from none in wtf
where null.Id=none.Id
select null;

2020-02-26 Reply Admin

TRWTF is, why not use PIVOT? It's pretty much made for this purpose.

2020-02-26 Reply Admin

Well, the result of this query can easily be imported into an Excel sheet.

2020-02-26 Reply Admin

SELECT a.Name ,b.RegisterId ,pvt.Apples ,pvt.Oranges ,pvt.Zucchini ,pvt.Other ,pvt.Apples + pvt.Oranges + pvt.Zucchini + pvt.Other as GrandTotal FROM Cashier a INNER JOIN Sale b ON a.RegisterId = b.RegisterId AND CONVERT(DATE, b.SaleDate) BETWEEN @startDate AND @endDate LEFT JOIN ( SELECT ProduceName ,ProduceId ,1 AS Cnt FROM ProduceType ) AS src PIVOT(COUNT(Cnt) FOR ProduceName IN (Apples, Oranges, Zucchini, Other)) AS pvt ON pvt.ProduceId = b.ProduceId

2020-02-26 Reply Admin

As someone who used to be able to pass as a mathematician, I am disappointed in SQL every time I realize there is a solution with multiple self joins that I can't use for performance reasons.

2020-02-26 Reply Admin

Or just do a regular group by, and handle pivoting to horizontal in your display layer, as God intended.

Jaime · 2020-02-26 Reply Admin

TRWTF is, why not use PIVOT?

Because the "SQL Expert" hasn't learned anything in twenty years.

My first thought is... what kind of front end can run a query that returns a variable number of columns, with dynamically generated names? Pretty much only Excel, or horrible hand-written result handling. Using a standard group by and having the front end render a sensible UI is actually much easier than trying to have the database figure out the table layout.

Dragnslcr · 2020-02-26 Reply Admin

Am I missing something? I don't see a self-join anywhere in that query. I see Sale joined to ProduceType way too many times, but I don't see a table joined to itself, i.e. the same table on both sides of a comparison in the ON condition.

2020-02-26 Reply Admin

Another way to rephrase Wesley's question: "How does this even occur to someone?"

On the plus side, the names of produce never grow, and are the same world-wide, so this is good to go for eternity.

2020-02-26 Reply Admin

I bet he used Excel to create all those rows. Paste the products in column A, write a skeleton query in column B, and concat them in column C. Copy, paste to repeat on all rows, and bang you get massive SQL written for you with no risk of typos. Recent versions of Excel will even suggest to do the copy for you.

Once you master Excel-generated SQL, there's no need to know the language well, brute force is there for you.

2020-02-26 Reply Admin

Seems like the prior DB guy was more of a vegetable than a produce(r).

2020-02-26 Reply Admin

No kidding! I use Excel when I must ... which is usually never. This is crazy.

2020-02-26 Reply Admin

Aggregate tables please.

2020-02-26 Reply Admin

The query returns a fixed number of columns. Unless you're thinking they update the query and not the front-end when they add new produce.

Barry Margolin 0 · 2020-02-26 Reply Admin

@P The DBMS isn't specified, but if they use MySQL it doesn't have PIVOT.

Does the query even give the correct counts? It's doing a cross-product between all the products, so the counts are multiplied. The idiom

SELECT SUM(ProduceType.ProduceName = 'Apples') AS Apples, SUM(ProduceType = 'Oranges') AS Oranges, ...

doesn't suffer from this bug.

2020-02-26 Reply Admin

That WHERE clause is non-sargable and will cause row by row evaluation. Don't compare a function on a column to fixed parameter values. Compare the column to your parameter values that are appropriate in type and granularity to your column data type.

Improving this query is like shooting fish in a barrel

2020-02-26 Reply Admin

Good call Remy. Good call.

2020-02-26 Reply Admin

Good call Remy, good call. Although the assertion is probably low hanging fruit:

Easy Reader Version: Don't worry, the comments will show you a million ways to write this query.

2020-02-26 Reply Admin

My first thought on looking at this: They're COUNTing, not SUMing the associated field that records the number in each sale?

I mean, OK, you're basically not going to be able to get the number of apples sold, ever, because they're by pound. But you can get the pounds of apples sold at least. Same for any other produce sold by weight.

But just counting, you only get the number of times each type of produce was sold.

Or is this the sort of system where they have to scan each and every item individually, even for the guy who bought 50 pounds of cherries? Do you scan the cherries per bag, per pound, or each cherry?

2020-02-26 Reply Admin

I assume this code has been transcribed wrongly here. There's no alias for x y or z.. which makes me think that this line isn't real:

LEFT OUTER JOIN ProduceType ae ON b.ProduceId = z.ProduceId AND z.ProduceName IS NOT NULL

.. which is the only actual bug in the SQL. The rest of it is pretty much what a PIVOT would be doing behind the scenes anyway. It's ugly, but not broken, and ought to perform pretty well - assuming there are less than a few thousand product types, that outer table should sit happily in a single 8k page which will be 1 logical read and no physical reads.

It's debatable whether a CTE or a PIVOT would be prettier or more readable than this code, but I've certainly seen a lot worse than this.

2020-02-26 Reply Admin

... just realised, that other line isn't a bug either. It's joining on ProduceId, so again, it won't duplicate anything. - so apart from being a little ugly, it's really not that bad.

2020-02-26 Reply Admin

Yep. Assuming you mean the inner join on the date, that is the worst thing about this query. In fact it's the only thing which will make it perform badly.

2020-02-26 Reply Admin

It's counting the non-null joins - which is the same as summing 1s for where the join matched and 0 where it didn't.. It's returning the number of sales for each produceType - we don't know if they store the number of items in each sale anywhere.

Jaime · 2020-02-26 Reply Admin

Does the query even give the correct counts? It's doing a cross-product between all the products, so the counts are multiplied.

Nope. It's OUTER JOINing on the primary key, so it's either a single hit or a miss. The raw data before grouping will either have xx.ProductId null, or with the appropriate id, so a count will count the number of sales records for that ProductId for each cashier name. The real surprising thing to me is that a Sale record has a ProduceId column. This database schema has no way of ringing up more than one type of item at the same time.

bradwood · 2020-02-26 Reply Admin

So the requirements also included that the result must be one row? That's the only way you would need pivot / anything other than group by, right?

2020-02-26 Reply Admin

Ehh... I’ve seen worse.
(Takes another swig of whiskey)

2020-02-26 Reply Admin

If we are talking SQL SERVER, PIVOT can perform like shit under certain cases. Sometimes you will be better off doing an old-school style "pivot" for performance purposes. But do agree with you here that this is a better way to skin the cat.

2020-02-27 Reply Admin

Also many grid widgets.

Jaime · 2020-02-27 Reply Admin

Sure, but any decent widget set will include a crosstab widget, which will do exactly what was requested with just a basic SELECT.

2020-02-27 Reply Admin

The WTF: Not knowing how to use CASEs, Subqueries to reduce the self-joins used to count occurrences of a value The Real WTF: Not knowing to use Pivots where possible

2020-02-27 Reply Admin

Surely dumping the filtered sales table and lookup tables into excel and running an Excel pivot would perform better than this POS, no matter what size box you run your DB on?

Saw a lot of this sort of thing on a recent gig, the DB egg-spurt there I think judged his brillant genus on how much horsepower had to be thrown at his queries rather than how little. He was hogging one of the largest physical clusters in Europe with things that (written properly) could have been done on a meaty laptop.

2020-02-28 Reply Admin

The only performance issue with this query is the non-sargable bit in the date join. The rest of it will perform just fine - nested loops to a PK in a small lookup table. Swapping it do sum(case when...) statements would hardly make any performance difference at all.

2020-02-29 Reply Admin

Hmm, yeah, good point, I just so 30 or saw left joins and got a flashback! TRWTF with that date is why SaleDate in the sales table is not a sensible type, but then I see that so often, along with weirdly formatted "unique" identifiers that need CONVERTing to join I almost don't notice. Organisations are so precious about changing the sales table, or even adding a parallel table to fix this, but don't seem concerned about accepting any old crap into it that comes from the terminals/resellers.

2020-02-29 Reply Admin

Massive self join? I think it's a massive self-own.

2020-02-29 Reply Admin

That's a real lemon

2020-03-03 Reply Admin

My favorite part is the "AND z.ProduceName IS NOT NULL" in the last LEFT OUTER JOIN. Why?

If the matching produce there IS NULL, then it'll emit a NULLy row anyway because it's a LEFT OUTER JOIN... this conditional basically does nothing, EXCEPT the database is really broken and the id field is not unique.

2020-03-05 Reply Admin

If you use a grouo by, you can't have a total for the register. Or, well, you can - but you have to use a sub query top don't everything foe the register.

Tks query gets the individuals and the gran total at the same time. The fact that there are only three products defined aside, i can honestly see them weighing the pros vs cons. Maybe considering EXPLAIN output, and deciding this is the best method (as opposed to a subquery for the grand total).

Producing Self Joins

Leave a comment on “Producing Self Joins”