• (cs) in reply to JohnO
    JohnO:

    Jeff S:

    #5 -- And finally, I guess I give up.  I took an awful lot of time to show you some techniques you obviously were not aware of, ...

    Your arrogance is astounding.  I've used SUM() on a CASE many times to achieve a single pass through the table.

    Jeff S:
    JohnO:

    Jeff S:
    And do not forget the "join" most under-utilized and misused above all:  the UNION [ALL] "join".  A very powerful tool for comparing two tables and doing things that you might think only a FOJ can do.

    Taking that argument to it's logical conclusion, we don't need anything but the six fundamental relational operations, which does not include any join but CROSS JOIN.  Are you advocating that?



    umm ... huh? can you expand on that? you lost me there.

    http://en.wikipedia.org/wiki/Relational_algebra



    John, I am not being arrogant at all.  I didn't show you anything overly complicated.  In fact, my entire point is that I showed you something *simplier* than what you were doing in the first place.   If you felt that a FOJ was the best way to handle that situation, as you claimed, then I think it is safe to say that (hopefully) you didn't know how to write a GROUP BY that way. 

    As for the wiki article, uh ... thanks, I guess!  good stuff.   However, if you could, can you still help me out with what my "logical conclusion" is when you get a chance; I still don't follow I am afraid. 
    Somehow because I think UNION's are useful that means that I don't like any join other than a CROSS JOIN?   I don't get where you got that from.  thanks in advance.
  • (cs) in reply to JohnO
    JohnO:

    Jeff S:
    #2 -- RIGHT OUTER JOINS are even *worse* than FOJ's, and they are valid as well.

    To me you lost all credibility with this statement.  You seem to lack some basic understanding of how a good query optimizer woks.  If two statements are functionally equivalent (which you can prove with relational algebra), then an ideal optimizer will come up with the same plan regardless of which statement you use.  Oviously, you can't achive the "ideal" or perfect optimizer.  However, I would hope that you would agree that any "good" optimizer can easily transform between left and right join.



    Point = completely missed.   For someone who apparently agreed with me about the style and clarity benefits of using JOIN syntax, I would hope you could also apprecite the style and clarity benefits of avoiding RIGHT OUTER JOINS.  I think you are just picking fights now, since your response was to something I wrote where I completely agreed with you -- there's no point in a RIGHT JOIN, just use LEFT ones.
  • (cs) in reply to Jeff S

    At the risk of beating a dead horse, I tried to come up with a compelling practical application of FULL OUTER JOIN.  I'll leave RIGHT OUTER JOIN for another day. :)

    Here's what I came up with...

    -- the task is to implement an IM-style buddy list
    -- buddies can optionally be included in a single group
    -- the buddy list should show all active buddies, whether in a group or not
    -- the buddy list should also show all groups, whether there are active buddies or not

    -- there are many ways to do this
    -- one PERFECTLY GOOD WAY is to use FULL OUTER JOIN as shown below
    -- another good way is to run two queries (not shown)
    -- (
    --    That would work like this
    --    The first query lists the groups in alpha order
    --    The second query lists the buddies in group order
    --    The caller then iterates through both queries in "merge sort" style
    -- )

    -- the FULL OUTER JOIN method is very natural...
    -- create a "buddies" table and a "groups" table
    -- the two following requirements suggest a FULL OUTER JOIN:
    -- 1) show empty groups
    -- 2) show buddies that aren't in a group

    create table buddies (
        buddyid int primary key,
        buddyname varchar(50) not null,
        active bit not null,
        ingroupid int null
    )

    create table groups (
        groupid int primary key,
        groupname varchar(50) not null
    )

    go

    create view activebuddies as
    select * from buddies where active = 1

    go

    insert into groups (groupid, groupname)
    values (1, 'Work')

    insert into groups (groupid, groupname)
    values (2, 'Family')

    insert into groups (groupid, groupname)
    values (3, 'Friends')

    insert into buddies (buddyid, buddyname, active, ingroupid)
    values (1, 'Joe', 1, 1)

    insert into buddies (buddyid, buddyname, active, ingroupid)
    values (2, 'Fred', 1, null)

    insert into buddies (buddyid, buddyname, active, ingroupid)
    values (3, 'Sarah', 1, 1)

    insert into buddies (buddyid, buddyname, active, ingroupid)
    values (4, 'Joyce', 0, 2)

    insert into buddies (buddyid, buddyname, active, ingroupid)
    values (5, 'Emily', 0, 2)

    insert into buddies (buddyid, buddyname, active, ingroupid)
    values (6, 'John', 1, null)

    insert into buddies (buddyid, buddyname, active, ingroupid)
    values (7, 'Jill', 1, 2)

    -- here's the query that, in one fell swoop, returns:
    -- 1) grouped buddies by group
    -- 2) empty groups
    -- 3) ungrouped buddies
    -- I maintain that a single report with all of 1), 2), and 3) is rational

    select
        entryname =
            case
                when buddyid is null then groupname + ' (nobody active)'
                when groupid is null then buddyname
                else groupname + ' > ' + buddyname
            end,
        *
    from
        groups as g full outer join
        activebuddies as b on
            g.groupid = b.ingroupid
    order by
        -- just for fun, let's put top-level things in alphabetical order
        -- so ungrouped buddies mix with groups
        isnull(g.groupname, b.buddyname),
        -- watch out for groups and ungrouped buddies with the same name
        case when g.groupid is null then 1 else 2 end,
        -- watch out for groups with the same name
        g.groupid,
        -- within groups, alphabetize by buddy name
        b.buddyname,
        -- then by buddyid
        b.buddyid
    go

    drop view activebuddies
    drop table buddies
    drop table groups
    go

  • (cs) in reply to Jeff S

    Jeff S:
    #2 -- RIGHT OUTER JOINS are even *worse* than FOJ's, and they are valid as well.

    Lol, how could I have agreed with you when your statement was ROJs are worse than FOJs, yet I am arguing that FOJs have their place and can be used in place of convoluted unions of left joins or sums on cases?  Not sure how I am picking fights.  There is no simplistic equivalent to FOJ like there is for ROJ. 

    The point of my reference to the wiki article on relational algrebra was that both LOJ and FOJ can be expressed using only UNION and IJ.  Yet you arbitrarily accept LJ and reject FOJ saying the FOJ can be expressed as the union of LJs.  Well, we can come up with many equivalent constructs using different combinations of operations to achieve exactly the same thing.  There are only two criteria to judge them by: does the optimizer come up with a good plan for them and how easy is it to construct and maintain the query.

    ROJ is worthless in my book, not from a performance standpoint, but from a simplification stand point.  You can't express anything in less SQL using ROJ vs LOJ, so why have it?  That doesn't mean it leads to horrible performance.  Any good optimizer should get the same plan with either A ROJ B or B LOJ A.

    However, FOJ does allow you to express something in less SQL than union of LOJs.  A good optimizer should decompose A FOJ B and (A LOJ B) UNION (B LOJ A) into the same plan.  You've completely ignore the non-self join case.  It's easy to ask for examples and pick them apart piece by piece.

    So to summarize, it seems our whole discussion simplifies to you prefer (A LOJ B) UNION (B LOJ A) over A FOJ B and claim it has better performance.  I disagree.  If it does have better performance, the query optimizer has serious short-comings.  From a writability/readability standpoint, I find A FOJ B easier to understand, write, and read.

  • foxyshadis (unregistered) in reply to Jeff S
    Jeff S:
    a lot of wasted typing...

    How is it that you manage to get into a flamewar over somewhat esoteric database concepts every single time a DB wtf shows up?

  • (cs) in reply to JohnO
    JohnO:

    So to summarize, it seems our whole discussion simplifies to you prefer (A LOJ B) UNION (B LOJ A) over A FOJ B and claim it has better performance. 

    No, i never said that and that is completely incorrect.  You have a tendency to make up your own conclusions and assume that it what I am saying, you've done it about 3 times so far, and each time you've been completely wrong about what you are assuming.  Therefore, it seems I don't have much of a chance of making a point with you since you never hear or consider what I actually write, you only immediately start trying to come up with your own conclusions about what I *might* think.  So, I accept the fact that you will assume I am only some arrogant person trying to prove some meaningless point to get my thrills, and that my arguments or ideas have no merit and should be completely ignored.

    Foxyshadis is correct, I allow myself to get pulled into these discussions when this is definitely not the place.  Why do I do it?  I guess I spent too much time helping people fix up crap like this on forums like sqlteam.com and I start thinking that I should help out some misguided souls here as well.   Not worth it. 

  • (cs) in reply to JohnO
    JohnO:

    Jeff S:
    #2 -- RIGHT OUTER JOINS are even *worse* than FOJ's, and they are valid as well.

    Lol, how could I have agreed with you when your statement was ROJs are worse than FOJs, yet I am arguing that FOJs have their place and can be used in place of convoluted unions of left joins or sums on cases?  Not sure how I am picking fights.  There is no simplistic equivalent to FOJ like there is for ROJ. 

    The point of my reference to the wiki article on relational algrebra was that both LOJ and FOJ can be expressed using only UNION and IJ.  Yet you arbitrarily accept LJ and reject FOJ saying the FOJ can be expressed as the union of LJs.  Well, we can come up with many equivalent constructs using different combinations of operations to achieve exactly the same thing.  There are only two criteria to judge them by: does the optimizer come up with a good plan for them and how easy is it to construct and maintain the query.

    ROJ is worthless in my book, not from a performance standpoint, but from a simplification stand point.  You can't express anything in less SQL using ROJ vs LOJ, so why have it?  That doesn't mean it leads to horrible performance.  Any good optimizer should get the same plan with either A ROJ B or B LOJ A.

    However, FOJ does allow you to express something in less SQL than union of LOJs.  A good optimizer should decompose A FOJ B and (A LOJ B) UNION (B LOJ A) into the same plan.  You've completely ignore the non-self join case.  It's easy to ask for examples and pick them apart piece by piece.

    So to summarize, it seems our whole discussion simplifies to you prefer (A LOJ B) UNION (B LOJ A) over A FOJ B and claim it has better performance.  I disagree.  If it does have better performance, the query optimizer has serious short-comings.  From a writability/readability standpoint, I find A FOJ B easier to understand, write, and read.

    JohnO -- I will leave you with this:  I discussed style a bit, esp. in regards to RIGHT joins, and how those should be written as LEFT joins.  But if you read my discussions on FULL OUTER JOINS, hopefully you will see that style never comes into play -- it's based on the results of a FULL OUTER JOIN -- nulls in key columns, wrapping all columns in COALESCE, the lack of index use when sorting or joining further tables, the lack of a true "driving" recordset*, etc.  I never once that that to produce the results of a FULL OUTER JOIN, you should write them a different way.  I said over and over that those very results of a FOJ are not desirable and you should rewrite your join (or union or grouping) in a more logical and efficient manner to avoid producing the results generated by a FOJ.  The goal isn't to search + replace the FOJ with some alternate syntax just to avoid using them; the goal is to produce clear, logical, efficeint results with proper key values and proper indexes still in effect, which a FOJ does not do. 

    I think I can see a little bit why you might be confused, and my comments about clarity and style in addition to the references to RIGHT joins most likely contributed to that confusion.  I hope that helped clear things up a little and you will take the time to try to think about the words I've written (in this post and especially the previous ones) and hopefull not try to make up your own conclusion at the expense of understanding and considering mine.

     

    * note that I did not say "table", which was one of your previous false conclusions.

  • Honest Questioner (unregistered) in reply to Jeff S
    Jeff S:
    Anonymous:
    Jeff S:


    "real joins" give you:

    1) a much clearer, more logical SELECT with
      a) a clear distinction between the join condition and the criteria
      b) a clear, concise FROM clause to indicate your driving recordset (note I did not say "table") which is the most important part of any SELECT with joins


    Um, huh?  When using the JOIN syntax it seems to me that you get a long, heterogeneous FROM clause versus the short, homogeneous list of table tables when using the WHERE clause.


    2) The ability to clearly express OUTER joins, and join order precendence (though this is rarely necessary, it can be handy)


    The question is really explicit INNER JOINS vs implicit inner joins via the WHERE clause, so let's leave OUTER JOINS out of it.


    3) The ability to add criteria to the join expression itself (technically only needed for outer joins, but it can make inner joins more clear as well):

    SELECT ..
    FROM A
    LEFT OUTER JOIN  B
    ON A.ID = B.ID AND B.Status = "Active"


    Again, this seems to take all the conditions, which were once in the same place in a WHERE clause and scatter them all about.



    4) the ability to see explicitly when a CROSS JOIN is being done on purpose, as opposed to leaving yourself to manually read the entire WHERE clause to check to see if all tables are joined properly and wondering if the cartesian product is intential or not.


    This sounds reasonable.


    5) "Backwards-compatable"?  uh, I guess not.  you are correct.   that is why I never use C#, because it is not backwards compatable with C, or CSS because it is not backwards compatable with netscape 2.0.


    Agreed.



    Finally,

    6) It is ANSI standard to use join syntax.  that means it will be FORWARD compatable.  I will let you decide for yourself which is more important, to be forward-compatable or backwards compatable.


    Are there databases that do not support doing implicit joins via the WHERE clause?  (I'm serious, I do not know, I do not pretend to have that much experience...)

    In summary it seems that all your answers go to style and maintenance, correct?  Are there any performance benefits?

    Thanks,
    H.Q.


    For #1 -- if you feel that

    select 
    ...
    from
    a,b,c,d
    where
     a.id = b.id and
    b.id=c.id and
     b.col = c.col and
     b.startDate > '1/1/2000' and
     c.status='active' and
     d.id *= c.id and
    d.col *=  c.col and
     d.value *= 0   

    is more logical and it is easier to understand and to edit the "structure" of the SELECT rather than

    select
    ...
    from
      a
    inner join b
      on a.id = b.id
    inner join c
      on b.id = c.id and
           c.col = b.col
    left outer join d
      on d.id = c.id and
          d.col = c.col and
          c.value = 0
    where
      b.startDate > '1/1/200' and
      c.status = 'active'

    then there's not much I can do to convince you otherwise, I guess.  yes, the first one is shorter, i will give you that.  Even without the outer join, it is much easier and clearer to see how you have structured your SELECT and what the relations are between the tables.


    I do find the first notation easier, but that is likely mostly due to familarity.  I did prefer the horizontally aligned version someone posted after you better than yours. :-)

    I think that I am beginning to see the appeal; separating the join conditions from the "search" conditions (for lack of a better term).




    (is that c.value *=0 syntax even legal?   Can you write c.value *> 12 ?  I don't even know if I did that right, it is so f'ed up to use that *= syntax I have no idea which need to go on which side .. how do you put in outer join criteria w/o using ANSI syntax?)

    For #3 -- you are missing the effect of putting the extra (non-join) criteria in the outer join. that is NOT the same as putting that criteria in the WHERE clause -- it returns completely different results.


    I'm not really interested in outer joins.  I rarely need them.  I am more interested in hearing about the differences in explicit inner joins and implicit inner joins via the WHERE clause.

    H.Q.
  • HyperNewbie (unregistered)

    20 000 000 000? Thats not much. Lets try bubble sorting that.

  • (cs) in reply to foxyshadis
    Anonymous:
    Jeff S:
    a lot of wasted typing...

    How is it that you manage to get into a flamewar over somewhat esoteric database concepts every single time a DB wtf shows up?

    I guess it's because Jeff S is a DB geek and is extremely passionate about his work and domain of expertise.

    While it provides interresting and sometimes entertaining reads for others, it seems to be quite bad for his heart.

  • Anaerin (unregistered) in reply to Moobar
    Anonymous:
    JohnO:
    Anyone that's ever done any real development know there's not 45 people working on it.


    That's just plain wrong.  Anyone that's ever worked on large complex software would have been in a team of around that size or larger.

    I'm currently on project that has 43 people developing a single system.   The largest project at my site has 200 programmers working on it (and it's the only one they are assigned to).  


    Wow. You're lucky. I had to develop a fully working CMS system, that worked both online and offline, for IE5.x, with searchable fields (Searchable offline, as well as online), and was based around a whole page set to "contentEditable=True" to edit the content. Then add the complication of having to intelligently parse MS Word document's HTML and convert it to match the "House Style" (That actually worked well, but was kinda slow).

    Now, I was an independant contractor, with a manager all to myself, and team of 2 people (That's me and a colleague), who's last coding job was writing COBOL for mainframe systems, and had no knowledge of VBScript, or JavaScript, or XSLT. And this was in JavaScript, XML, XSLT's and a tiny bit of ASP/SQL for server-side storage.

    Oh, and I had a project deadline of 4 months, for both a fully-functioning HTML editor, XML/XSLT *CLIENT-SIDE* interactive dynamic editable menu system (That needs to work both online and offline for remote branch offices), Triple-layer security system (Akin to *nix's security levels), automatic, dynamic encryption, page-level information exclusions (So a manager could see all their department's salary increases, but a peon could only see their own, all from the same page), all to use XHTML1.0 Transitional + CSS/1.0.

    I did my time (including the two 1-month extensions we managed to get), and built the system as best I could. It was almost all there, too. But in the end, they pulled funding (as internal politics threatened to make the whole project moot).

    I was supposed to fully document the code afterwards, for which I'd recieve an amount back, but I took the wage I got for the 6 months and ran like hell (To Canada).

    I wrote the code, and even I could find some huge WTF's there (mostly to get around IE's hideous DOM problems, to fix Word's HTML problems, and to deal with the hideous nature of the task at hand).
  • JR (unregistered) in reply to Mischa
    Anonymous:
    BlackTigerX:
    Anonymous:

    I've seen this type of stuff before.

    This is simplifed of course.

    select col1, col2, col3 from a, b, c, d where a.c4 = b.c4 and b.c4 = c.c4;

    Off course they got back duplicate rows, so the programmer changed it to be

    select unique col1, col2, col3 from a, b, c, d where a.c4 = b.c4 and b.c4 = c.c4;

    All was fine when system was very small, but as data got added the report that had this query ran slower, and slower, and slower.  This is how the SQL was fixed.

    select col1, col2, col3 from a, b, c, d where a.c4 = b.c4 and b.c4 = c.c4 and c.c4 = d.c4;

     

     

    aaaaarrrrghhh... the googles!!! they do nothing!!

    why is people using "where" to join?




    Because you cannot use inner joins (or outer joins for that matter) with some DBs. Oracle 8 anyone?

     

    Ding, ding, ding, we have a winner.....  But it was Oracle version 6, they didnb't support the join syntax yet.

  • wind (unregistered) in reply to consultant
    Anonymous:
    skicow:
    That's also assuming that the consultants can write basic SQL and recognize that they incorrectly joined a few tables.


    It's not just consultants, either.

    I worked with a firm where the development staff (all CS from top rated schools) decided to implement their own object relational mapping atop SQL.  They implemented object inheritance by modeling the is-a relationship with two tables and a join. Eight levels of inheritance = eight tables to join in one query.

    The part catalog built atop this mess took days to import a couple thousand SKU's. to get it to meet the 500,000 SKU client requirement, the consulting wing of the firm rewrote the thing in a sensible manner.


    Gallery 2 extactly do that same thing as you said.
  • NRC (unregistered) in reply to JR
    Anonymous:
    Anonymous:
    BlackTigerX:
    Anonymous:

    I've seen this type of stuff before.

    This is simplifed of course.

    select col1, col2, col3 from a, b, c, d where a.c4 = b.c4 and b.c4 = c.c4;

    Off course they got back duplicate rows, so the programmer changed it to be

    select unique col1, col2, col3 from a, b, c, d where a.c4 = b.c4 and b.c4 = c.c4;

    All was fine when system was very small, but as data got added the report that had this query ran slower, and slower, and slower.  This is how the SQL was fixed.

    select col1, col2, col3 from a, b, c, d where a.c4 = b.c4 and b.c4 = c.c4 and c.c4 = d.c4;

     

     

    aaaaarrrrghhh... the googles!!! they do nothing!!

    why is people using "where" to join?




    Because you cannot use inner joins (or outer joins for that matter) with some DBs. Oracle 8 anyone?

     

    Ding, ding, ding, we have a winner.....  But it was Oracle version 6, they didnb't support the join syntax yet.



    He was right the first time....Oracle 9 was the first to support ANSI 1999 query syntax.
  • humanToLazyToLogIn (unregistered) in reply to NRC
    Anonymous:
    He was right the first time....Oracle 9 was the first to support ANSI 1999 query syntax.
    I assume you mean it was the first Oracle DB to support it not the first for DBs in general.
  • NRC (unregistered) in reply to humanToLazyToLogIn
    Anonymous:
    Anonymous:
    He was right the first time....Oracle 9 was the first to support ANSI 1999 query syntax.
    I assume you mean it was the first Oracle DB to support it not the first for DBs in general.


    Correct
  • Thommy Tanker (unregistered) in reply to JohnO

    4) the ability to see explicitly when a CROSS JOIN is being done on purpose, as opposed to leaving yourself to manually read the entire WHERE clause to check to see if all tables are joined properly and wondering if the cartesian product is intential or not.

    select *

       from abstracttable1 at

       inner join abstractlayer al

       on at.pkcol = at.pkcol

    Sure...

     

     

  • (cs) in reply to Thommy Tanker
    Anonymous:

    select *

       from abstracttable1 at

       inner join abstractlayer al

       on at.pkcol = at.pkcol

    Sure...

     


    SELECT * FROM A INNER JOIN B ON 1 = 1 -- cross join
    SELECT * FROM A LEFT OUTER JOIN B ON 1 = 0 -- fancy way to get A
    SELECT * FROM A FULL OUTER JOIN B ON 1 = 0 -- fancy way to get A and B
    SELECT * FROM A FULL OUTER JOIN B ON 1 = 1 -- cross join but returns records even if one table is empty

  • (cs)

    This situation seem all too familiar....

    I inherited a system a that when the records doubled one year (because we simply received more data)  the reports that took 30 seconds now took 30 minutes....

    Of course after a few join changes, and the addition of much needed indexes, reports ran in seconds....but it took me a Saturday to figure it out. (back when I was a newbie)

  • DWalker (unregistered) in reply to consultant

    Come on, "ConsultantAdam", don't use "totaly gay" as a put-down or an insult.  It's insulting to those of us who happen to be gay.  And who know how to spell "totally".   :-)

  • Ben (unregistered)

    20 billion? That's nothing. We once had an issue with an email subscriber list where short-sighted database design and inefficient use of joins resulted in a query which returned literally TRILLIONS of rows.

Leave a comment on “Reporting All Through The Night”

Log In or post as a guest

Replying to comment #:

« Return to Article