- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
var nullpointer = from null in nullptr
from none in wtf
where null.Id=none.Id
select null;
Admin
TRWTF is, why not use
PIVOT
? It's pretty much made for this purpose.Admin
Well, the result of this query can easily be imported into an Excel sheet.
Admin
SELECT a.Name ,b.RegisterId ,pvt.Apples ,pvt.Oranges ,pvt.Zucchini ,pvt.Other ,pvt.Apples + pvt.Oranges + pvt.Zucchini + pvt.Other as GrandTotal FROM Cashier a INNER JOIN Sale b ON a.RegisterId = b.RegisterId AND CONVERT(DATE, b.SaleDate) BETWEEN @startDate AND @endDate LEFT JOIN ( SELECT ProduceName ,ProduceId ,1 AS Cnt FROM ProduceType ) AS src PIVOT(COUNT(Cnt) FOR ProduceName IN (Apples, Oranges, Zucchini, Other)) AS pvt ON pvt.ProduceId = b.ProduceId
Admin
As someone who used to be able to pass as a mathematician, I am disappointed in SQL every time I realize there is a solution with multiple self joins that I can't use for performance reasons.
Admin
Or just do a regular group by, and handle pivoting to horizontal in your display layer, as God intended.
Admin
Because the "SQL Expert" hasn't learned anything in twenty years.
My first thought is... what kind of front end can run a query that returns a variable number of columns, with dynamically generated names? Pretty much only Excel, or horrible hand-written result handling. Using a standard group by and having the front end render a sensible UI is actually much easier than trying to have the database figure out the table layout.
Admin
Am I missing something? I don't see a self-join anywhere in that query. I see Sale joined to ProduceType way too many times, but I don't see a table joined to itself, i.e. the same table on both sides of a comparison in the ON condition.
Admin
Another way to rephrase Wesley's question: "How does this even occur to someone?"
On the plus side, the names of produce never grow, and are the same world-wide, so this is good to go for eternity.
Admin
I bet he used Excel to create all those rows. Paste the products in column A, write a skeleton query in column B, and concat them in column C. Copy, paste to repeat on all rows, and bang you get massive SQL written for you with no risk of typos. Recent versions of Excel will even suggest to do the copy for you.
Once you master Excel-generated SQL, there's no need to know the language well, brute force is there for you.
Admin
Seems like the prior DB guy was more of a vegetable than a produce(r).
Admin
No kidding! I use Excel when I must ... which is usually never. This is crazy.
Admin
Aggregate tables please.
Admin
The query returns a fixed number of columns. Unless you're thinking they update the query and not the front-end when they add new produce.
Admin
@P The DBMS isn't specified, but if they use MySQL it doesn't have PIVOT.
Does the query even give the correct counts? It's doing a cross-product between all the products, so the counts are multiplied. The idiom
SELECT SUM(ProduceType.ProduceName = 'Apples') AS Apples, SUM(ProduceType = 'Oranges') AS Oranges, ...
doesn't suffer from this bug.
Admin
That WHERE clause is non-sargable and will cause row by row evaluation. Don't compare a function on a column to fixed parameter values. Compare the column to your parameter values that are appropriate in type and granularity to your column data type.
Improving this query is like shooting fish in a barrel
Admin
Good call Remy. Good call.
<!-- Easy Reader Version: Don't worry, the comments will show you a million ways to write this query. -->Admin
Good call Remy, good call. Although the assertion is probably low hanging fruit:
Easy Reader Version: Don't worry, the comments will show you a million ways to write this query.
Admin
My first thought on looking at this: They're COUNTing, not SUMing the associated field that records the number in each sale?
I mean, OK, you're basically not going to be able to get the number of apples sold, ever, because they're by pound. But you can get the pounds of apples sold at least. Same for any other produce sold by weight.
But just counting, you only get the number of times each type of produce was sold.
Or is this the sort of system where they have to scan each and every item individually, even for the guy who bought 50 pounds of cherries? Do you scan the cherries per bag, per pound, or each cherry?
Admin
I assume this code has been transcribed wrongly here. There's no alias for x y or z.. which makes me think that this line isn't real:
LEFT OUTER JOIN ProduceType ae ON b.ProduceId = z.ProduceId AND z.ProduceName IS NOT NULL
.. which is the only actual bug in the SQL. The rest of it is pretty much what a PIVOT would be doing behind the scenes anyway. It's ugly, but not broken, and ought to perform pretty well - assuming there are less than a few thousand product types, that outer table should sit happily in a single 8k page which will be 1 logical read and no physical reads.
It's debatable whether a CTE or a PIVOT would be prettier or more readable than this code, but I've certainly seen a lot worse than this.
Admin
... just realised, that other line isn't a bug either. It's joining on ProduceId, so again, it won't duplicate anything. - so apart from being a little ugly, it's really not that bad.
Admin
Yep. Assuming you mean the inner join on the date, that is the worst thing about this query. In fact it's the only thing which will make it perform badly.
Admin
It's counting the non-null joins - which is the same as summing 1s for where the join matched and 0 where it didn't.. It's returning the number of sales for each produceType - we don't know if they store the number of items in each sale anywhere.
Admin
Nope. It's OUTER JOINing on the primary key, so it's either a single hit or a miss. The raw data before grouping will either have xx.ProductId null, or with the appropriate id, so a count will count the number of sales records for that ProductId for each cashier name. The real surprising thing to me is that a Sale record has a ProduceId column. This database schema has no way of ringing up more than one type of item at the same time.
Admin
So the requirements also included that the result must be one row? That's the only way you would need pivot / anything other than group by, right?
Admin
Ehh... I’ve seen worse.
(Takes another swig of whiskey)
Admin
If we are talking SQL SERVER, PIVOT can perform like shit under certain cases. Sometimes you will be better off doing an old-school style "pivot" for performance purposes. But do agree with you here that this is a better way to skin the cat.
Admin
Also many grid widgets.
Admin
Sure, but any decent widget set will include a crosstab widget, which will do exactly what was requested with just a basic SELECT.
Admin
The WTF: Not knowing how to use CASEs, Subqueries to reduce the self-joins used to count occurrences of a value The Real WTF: Not knowing to use Pivots where possible
Admin
Surely dumping the filtered sales table and lookup tables into excel and running an Excel pivot would perform better than this POS, no matter what size box you run your DB on?
Saw a lot of this sort of thing on a recent gig, the DB egg-spurt there I think judged his brillant genus on how much horsepower had to be thrown at his queries rather than how little. He was hogging one of the largest physical clusters in Europe with things that (written properly) could have been done on a meaty laptop.
Admin
The only performance issue with this query is the non-sargable bit in the date join. The rest of it will perform just fine - nested loops to a PK in a small lookup table. Swapping it do sum(case when...) statements would hardly make any performance difference at all.
Admin
Hmm, yeah, good point, I just so 30 or saw left joins and got a flashback! TRWTF with that date is why SaleDate in the sales table is not a sensible type, but then I see that so often, along with weirdly formatted "unique" identifiers that need CONVERTing to join I almost don't notice. Organisations are so precious about changing the sales table, or even adding a parallel table to fix this, but don't seem concerned about accepting any old crap into it that comes from the terminals/resellers.
Admin
Massive self join? I think it's a massive self-own.
Admin
That's a real lemon
Admin
My favorite part is the "AND z.ProduceName IS NOT NULL" in the last LEFT OUTER JOIN. Why?
If the matching produce there IS NULL, then it'll emit a NULLy row anyway because it's a LEFT OUTER JOIN... this conditional basically does nothing, EXCEPT the database is really broken and the id field is not unique.
Admin
If you use a grouo by, you can't have a total for the register. Or, well, you can - but you have to use a sub query top don't everything foe the register.
Tks query gets the individuals and the gran total at the same time. The fact that there are only three products defined aside, i can honestly see them weighing the pros vs cons. Maybe considering EXPLAIN output, and deciding this is the best method (as opposed to a subquery for the grand total).