• NULLPTR (unregistered)

    var nullpointer = from null in nullptr
    from none in wtf
    where null.Id=none.Id
    select null;

  • P (unregistered)

    TRWTF is, why not use PIVOT? It's pretty much made for this purpose.

  • Eyeball (unregistered)

    Well, the result of this query can easily be imported into an Excel sheet.

  • digitrev (unregistered)

    SELECT a.Name ,b.RegisterId ,pvt.Apples ,pvt.Oranges ,pvt.Zucchini ,pvt.Other ,pvt.Apples + pvt.Oranges + pvt.Zucchini + pvt.Other as GrandTotal FROM Cashier a INNER JOIN Sale b ON a.RegisterId = b.RegisterId AND CONVERT(DATE, b.SaleDate) BETWEEN @startDate AND @endDate LEFT JOIN ( SELECT ProduceName ,ProduceId ,1 AS Cnt FROM ProduceType ) AS src PIVOT(COUNT(Cnt) FOR ProduceName IN (Apples, Oranges, Zucchini, Other)) AS pvt ON pvt.ProduceId = b.ProduceId

  • Plant (unregistered)

    As someone who used to be able to pass as a mathematician, I am disappointed in SQL every time I realize there is a solution with multiple self joins that I can't use for performance reasons.

  • Dave Hemming (unregistered)

    Or just do a regular group by, and handle pivoting to horizontal in your display layer, as God intended.

  • (nodebb) in reply to P

    TRWTF is, why not use PIVOT?

    Because the "SQL Expert" hasn't learned anything in twenty years.

    My first thought is... what kind of front end can run a query that returns a variable number of columns, with dynamically generated names? Pretty much only Excel, or horrible hand-written result handling. Using a standard group by and having the front end render a sensible UI is actually much easier than trying to have the database figure out the table layout.

  • (nodebb)

    Am I missing something? I don't see a self-join anywhere in that query. I see Sale joined to ProduceType way too many times, but I don't see a table joined to itself, i.e. the same table on both sides of a comparison in the ON condition.

  • Scott (unregistered)

    Another way to rephrase Wesley's question: "How does this even occur to someone?"

    On the plus side, the names of produce never grow, and are the same world-wide, so this is good to go for eternity.

  • Raj (unregistered)

    I bet he used Excel to create all those rows. Paste the products in column A, write a skeleton query in column B, and concat them in column C. Copy, paste to repeat on all rows, and bang you get massive SQL written for you with no risk of typos. Recent versions of Excel will even suggest to do the copy for you.

    Once you master Excel-generated SQL, there's no need to know the language well, brute force is there for you.

  • WTFguy (unregistered)

    Seems like the prior DB guy was more of a vegetable than a produce(r).

  • Andrew (unregistered) in reply to Raj

    No kidding! I use Excel when I must ... which is usually never. This is crazy.

  • Certainly, some sort of Prime Mover (unregistered) in reply to Dave Hemming

    Aggregate tables please.

  • James (unregistered) in reply to Jaime

    The query returns a fixed number of columns. Unless you're thinking they update the query and not the front-end when they add new produce.

  • (nodebb)

    @P The DBMS isn't specified, but if they use MySQL it doesn't have PIVOT.

    Does the query even give the correct counts? It's doing a cross-product between all the products, so the counts are multiplied. The idiom

    SELECT SUM(ProduceType.ProduceName = 'Apples') AS Apples, SUM(ProduceType = 'Oranges') AS Oranges, ...

    doesn't suffer from this bug.

  • Aspie Architect (unregistered)

    That WHERE clause is non-sargable and will cause row by row evaluation. Don't compare a function on a column to fixed parameter values. Compare the column to your parameter values that are appropriate in type and granularity to your column data type.

    Improving this query is like shooting fish in a barrel

  • SyntaxError (unregistered)

    Good call Remy. Good call.

    <!-- Easy Reader Version: Don't worry, the comments will show you a million ways to write this query. -->
  • SyntaxError (unregistered)

    Good call Remy, good call. Although the assertion is probably low hanging fruit:

    Easy Reader Version: Don't worry, the comments will show you a million ways to write this query.

  • Some Ed (unregistered)

    My first thought on looking at this: They're COUNTing, not SUMing the associated field that records the number in each sale?

    I mean, OK, you're basically not going to be able to get the number of apples sold, ever, because they're by pound. But you can get the pounds of apples sold at least. Same for any other produce sold by weight.

    But just counting, you only get the number of times each type of produce was sold.

    Or is this the sort of system where they have to scan each and every item individually, even for the guy who bought 50 pounds of cherries? Do you scan the cherries per bag, per pound, or each cherry?

  • Rich (unregistered) in reply to Barry Margolin 0

    I assume this code has been transcribed wrongly here. There's no alias for x y or z.. which makes me think that this line isn't real:

    LEFT OUTER JOIN ProduceType ae ON b.ProduceId = z.ProduceId AND z.ProduceName IS NOT NULL

    .. which is the only actual bug in the SQL. The rest of it is pretty much what a PIVOT would be doing behind the scenes anyway. It's ugly, but not broken, and ought to perform pretty well - assuming there are less than a few thousand product types, that outer table should sit happily in a single 8k page which will be 1 logical read and no physical reads.

    It's debatable whether a CTE or a PIVOT would be prettier or more readable than this code, but I've certainly seen a lot worse than this.

  • Rich (unregistered) in reply to Barry Margolin 0

    ... just realised, that other line isn't a bug either. It's joining on ProduceId, so again, it won't duplicate anything. - so apart from being a little ugly, it's really not that bad.

  • Rich (unregistered) in reply to Aspie Architect

    Yep. Assuming you mean the inner join on the date, that is the worst thing about this query. In fact it's the only thing which will make it perform badly.

  • Rich (unregistered) in reply to Some Ed

    It's counting the non-null joins - which is the same as summing 1s for where the join matched and 0 where it didn't.. It's returning the number of sales for each produceType - we don't know if they store the number of items in each sale anywhere.

  • (nodebb) in reply to Barry Margolin 0

    Does the query even give the correct counts? It's doing a cross-product between all the products, so the counts are multiplied.

    Nope. It's OUTER JOINing on the primary key, so it's either a single hit or a miss. The raw data before grouping will either have xx.ProductId null, or with the appropriate id, so a count will count the number of sales records for that ProductId for each cashier name. The real surprising thing to me is that a Sale record has a ProduceId column. This database schema has no way of ringing up more than one type of item at the same time.

  • (nodebb)

    So the requirements also included that the result must be one row? That's the only way you would need pivot / anything other than group by, right?

  • AnotherDBA (unregistered)

    Ehh... I’ve seen worse.
    (Takes another swig of whiskey)

  • Barf4Eva (unregistered) in reply to P

    If we are talking SQL SERVER, PIVOT can perform like shit under certain cases. Sometimes you will be better off doing an old-school style "pivot" for performance purposes. But do agree with you here that this is a better way to skin the cat.

  • ichbinkeinroboter (unregistered) in reply to Jaime

    Also many grid widgets.

  • (nodebb) in reply to ichbinkeinroboter

    Sure, but any decent widget set will include a crosstab widget, which will do exactly what was requested with just a basic SELECT.

  • Robert (unregistered)

    The WTF: Not knowing how to use CASEs, Subqueries to reduce the self-joins used to count occurrences of a value The Real WTF: Not knowing to use Pivots where possible

  • MiserableOldGit (unregistered)

    Surely dumping the filtered sales table and lookup tables into excel and running an Excel pivot would perform better than this POS, no matter what size box you run your DB on?

    Saw a lot of this sort of thing on a recent gig, the DB egg-spurt there I think judged his brillant genus on how much horsepower had to be thrown at his queries rather than how little. He was hogging one of the largest physical clusters in Europe with things that (written properly) could have been done on a meaty laptop.

  • Rich (unregistered) in reply to MiserableOldGit

    The only performance issue with this query is the non-sargable bit in the date join. The rest of it will perform just fine - nested loops to a PK in a small lookup table. Swapping it do sum(case when...) statements would hardly make any performance difference at all.

  • MiserableOldGit (unregistered) in reply to Rich

    Hmm, yeah, good point, I just so 30 or saw left joins and got a flashback! TRWTF with that date is why SaleDate in the sales table is not a sensible type, but then I see that so often, along with weirdly formatted "unique" identifiers that need CONVERTing to join I almost don't notice. Organisations are so precious about changing the sales table, or even adding a parallel table to fix this, but don't seem concerned about accepting any old crap into it that comes from the terminals/resellers.

  • concept14 (unregistered)

    Massive self join? I think it's a massive self-own.

  • Espy (unregistered)

    That's a real lemon

  • Rudolf Polzer (google)

    My favorite part is the "AND z.ProduceName IS NOT NULL" in the last LEFT OUTER JOIN. Why?

    If the matching produce there IS NULL, then it'll emit a NULLy row anyway because it's a LEFT OUTER JOIN... this conditional basically does nothing, EXCEPT the database is really broken and the id field is not unique.

  • Two different sums (unregistered)

    If you use a grouo by, you can't have a total for the register. Or, well, you can - but you have to use a sub query top don't everything foe the register.

    Tks query gets the individuals and the gran total at the same time. The fact that there are only three products defined aside, i can honestly see them weighing the pros vs cons. Maybe considering EXPLAIN output, and deciding this is the best method (as opposed to a subquery for the grand total).

Leave a comment on “Producing Self Joins”

Log In or post as a guest

Replying to comment #513044:

« Return to Article