• Donkey (unregistered)

    WTF... This cannot be in production code, right?

  • DanStory (unregistered)

    They most have created an app to generate all the possible orders... :-|

  • RC Pinchey (unregistered)

    For the sake of the coder's sanity, I hope he didn't type that out longhand... although to be fair, I'm not sure there's too much sanity left to salvage based on that mess.

    "Well, this is taking hours, and it's clearly the wrong way to do it... but I've started, so I'll finish". There's a certain tragic beauty in that.

  • Dave (unregistered)
    Comment held for moderation.
  • BarraCoder (unregistered) in reply to Dave
    Comment held for moderation.
  • Doug (unregistered)

    Most likely this was a quick fix for a program than only expected two possible permutations but as time went by, more were added and instead of fixing the problem, it was just perpetuated.

    The WTF here, fix if before it gets out of hand.

    Dev Managers excuse: "If it works, why fix it?"

  • tralala (unregistered)

    Nice way to prevent SQL-Injection

  • tharfagreinir (cs) in reply to DanStory
    DanStory:
    They most have created an app to generate all the possible orders... :-|

    Since some permutations are missing, that's an unlikely explanation, I'm afraid. Some poor soul coded this by hand.

  • Ron (unregistered) in reply to RC Pinchey
    RC Pinchey:
    For the sake of the coder's sanity, I "Well, this is taking hours, and it's clearly the wrong way to do it... but I've started, so I'll finish". There's a certain tragic beauty in that.
    Hah, I've been there. Usually it's around the 8-hour mark that I re-think what the hell I'm doing.
  • ChrisH (unregistered) in reply to Dave
    Comment held for moderation.
  • Diado (unregistered) in reply to tharfagreinir
    tharfagreinir:
    DanStory:
    They most have created an app to generate all the possible orders... :-|

    Since some permutations are missing, that's an unlikely explanation, I'm afraid. Some poor soul coded this by hand.

    Some permutations are missing I imagine because this isn't the entire code snippet, as indication by the "... (original snippet continues in this manner)" text.

  • Glenn Lasher (unregistered) in reply to Dave
    there's the LIKE statement without wild cards which is just a way to use =, but probably slower.

    Depends on the database. I've noticed, for instance, that PostgreSQL will use an index rather than a sequential scan for any LIKE expression that doesn't begin with '%'. For example, if you do something like:

    SELECT * FROM users WHERE name LIKE 'SMITH%';

    ...PostgreSQL will take that and use the index (if an appropriate one exists) to narrow down the options to those that start with SMITH.

    If you do this:

    SELECT * FROM users WHERE name LIKE 'SMITH%JOHN';

    ...then it will, again, use the index to narrow down the candidates before it starts gronking on all of the Smiths to find the SMITH%JOHNs.

    Given that they have performed this optimisation, it would seem that if you did this:

    SELECT * FROM users WHERE name LIKE 'SMITH, JOHN';

    ...that it would likewise hit the index. Since there is no % in the expression, it can match on the whole expression during its index search and land on it every bit as fast as if you had used the = operator.

    Your mileage may vary with other database engines. I only mention PostgreSQL because of my familiarity with it.

    Captcha: dubya (speaking of slow)

  • Aaron (unregistered) in reply to Doug
    Doug:
    Most likely this was a quick fix for a program than only expected two possible permutations but as time went by, more were added and instead of fixing the problem, it was just perpetuated.
    Actually, that's unlikely in this case given that the sorting "parameter" is a varchar or other string containing the actual names of the fields to order by. If it was originally only meant to handle two or three permutations, it would probably have started out as a bit or integer value. I've done that myself on occasion, although in my defense those permutations also had different GROUP BY clauses, which doesn't lend itself well to CASE solutions.

    No, what's really sad here is that this is very likely used to serve up some report that allows dynamic sorting. I'm guessing that the application generates the column list dynamically and passes it as a parameter, and that's what's really sad here - it would have been far simpler and more maintainable to just use dynamic SQL. Probably someone there got the idea into their head that they HAD to use stored procedures because they're more [secure / efficient / maintainable] (pick any or all of the above).

  • abx (cs) in reply to Dave

    Actually, it's loaded with the single-character wildcard (underscore), although that's probably not intended.

  • lmodllmodl (unregistered) in reply to ChrisH
    Comment held for moderation.
  • fennec (cs) in reply to Diado
    Diado:
    tharfagreinir:
    DanStory:
    They most have created an app to generate all the possible orders... :-|

    Since some permutations are missing, that's an unlikely explanation, I'm afraid. Some poor soul coded this by hand.

    Some permutations are missing I imagine because this isn't the entire code snippet, as indication by the "... (original snippet continues in this manner)" text.

    Read The Fine Submission, people.

    Jeremiah found the much, much longer version of a solution to this problem. Once he stopped laughing, he notes for us that many, many permutations are missing, save for some of the ORDER BY criteria being treated as a single unit.
    Emphasis added.
  • wgh (cs)

    Don't make the mistake of thinking this technique is only useful for sorting--it lends itself nicely to a search algorithm for the next Google killer.

    ...[lines snipped]...
    } else if (search == "george bush") {
      sql += "WHERE text LIKE '%george bush%'";
    } else if (search == "brangelina") {
      sql += "WHERE text LIKE '%brangelina%'";
    } else if (search == "pancake houses in san antonio") {
      sql += "WHERE text LIKE '%pancake houses in san antonio%'";
    } else if ...
    
  • daef (unregistered)

    example: 6 columns to order mean 6! = more than 700 if statements... 7 columns would even produce more than 5000 possibilities to order...

    thinking about writing a programm wich produces the sql-if-elseif-statements

  • db2 (cs)

    Just did an experiment in SQL Server 2000 and got interesting results. Both of these queries produce subtly different execution plans:

    SELECT * FROM Northwind..Customers 
    WHERE CompanyName = 'Gourmet Lanchonetes'
    SELECT * FROM Northwind..Customers 
    WHERE CompanyName LIKE 'Gourmet Lanchonetes'

    The difference lies in how the index seek is performed.

    |--Index Seek(OBJECT:([Northwind].[dbo].[Customers].[CompanyName]), 
    SEEK:([Customers].[CompanyName]
    =Convert([@1])) ORDERED FORWARD)
    vs.
    |--Index Seek(OBJECT:([Northwind].[dbo].[Customers].[CompanyName]), 
    SEEK:([Customers].[CompanyName] 
    >= 'Gourmet Lanchonetes' AND [Customers].[CompanyName] 
    <= 'Gourmet Lanchonetes'),  
    WHERE:(like([Customers].[CompanyName], 'Gourmet Lanchonetes', NULL)) ORDERED 
    FORWARD)

    Note the use of >= AND <= rather than just =. The execution cost actually ends up being a tiny bit larger, according to Query Analyzer (50.01% of the batch for the LIKE, and 49.99% for the = version). It's entirely possible that's just rounding error, though.

    SQL Server 2005 comes up with some weird execution plans involving inner joins instead of the bookmark lookup when I tell it to use the index on CompanyName (which it otherwise ignores). Not sure what's going on there...

  • Gedoon (unregistered) in reply to Doug
    Doug:
    Most likely this was a quick fix for a program than only expected two possible permutations but as time went by, more were added and instead of fixing the problem, it was just perpetuated.

    The WTF here, fix if before it gets out of hand.

    Dev Managers excuse: "If it works, why fix it?"

    Been there, done that. Quick-n-dirty is bad if it becomes something you need a lot and may need to expand it. Well actually it's always bad no matter what...

    When I saw a fellow developer create a table with six columns for six different standard messages I was thinking hey, those should be six rows, not columns, so I just noted the guy that that's not how it's supposed to go, but we let it go, cause it would've needed a lot of fixing, and it was for just six messages... This happened two years ago. The application is still in use and has been tweaked a few times since that... Last time I checked, the table had 50 columns, 1 row. It started innocent and, well.. it works... :(

  • RobertB (cs)

    I don't believe Sybase allows you to build SQL on the fly (or at least, it didn't when we started creating our stored procedures, and by golly, we're not going to start doing something different just because the Intarweb says we can). I wouldn't want to count on those strings being passed correctly, though... I'd have used magic numbers, and probably been included in a WTF of my own at some point.

    IF @OrderByCriteria = 1 /* 1 = order by version */
     SELECT * FROM @Table_Input
     ORDER BY receipt_date, Account_Number, major_version, minor_version, Vendor_ID 
    
    ELSE IF @OrderByCriteria = 2 /* 2 = order by vendor */
     SELECT * FROM @Table_Input
     ORDER BY receipt_date, Account_Number, Vendor_ID, major_version, minor_version 
    /* etc, up to ~16383 total conditions */
    

    (cue sound of Coyote screeching to a halt)

    Wait a minute. If we can't build SQL interactively, how the heck does "Select * from @anything" work?

  • mrsticks1982 (cs) in reply to RobertB
    RobertB:
    I don't believe Sybase allows you to build SQL on the fly (or at least, it didn't when we started creating our stored procedures, and by golly, we're not going to start doing something different just because the Intarweb says we can). I wouldn't want to count on those strings being passed correctly, though... I'd have used magic numbers, and probably been included in a WTF of my own at some point.
    IF @OrderByCriteria = 1 /* 1 = order by version */
     SELECT * FROM @Table_Input
     ORDER BY receipt_date, Account_Number, major_version, minor_version, Vendor_ID 
    
    ELSE IF @OrderByCriteria = 2 /* 2 = order by vendor */
     SELECT * FROM @Table_Input
     ORDER BY receipt_date, Account_Number, Vendor_ID, major_version, minor_version 
    /* etc, up to ~16383 total conditions */
    

    (cue sound of Coyote screeching to a halt)

    Wait a minute. If we can't build SQL interactively, how the heck does "Select * from @anything" work?

    @anything would have to be a table object, otherwise you will get a "muhaha, you can't do that stupid!" from the sql compiler.

    http://www.developer.com/db/article.php/3414331 - the bottom example

  • sir_flexalot (cs) in reply to mrsticks1982
    mrsticks1982:
    otherwise you will get a "muhaha, you can't do that stupid!" from the sql compiler.

    Dang! My SQL engine doesn't yell at me, you must be using "option explicit"...

  • anonymous coward (unregistered) in reply to Donkey
    Donkey:
    WTF... This cannot be in production code, right?

    Haven't read much production code, have you son?

    captcha: gotcha

  • president_ch0ice (cs)

    The REAL WTF here is something I see frequently: programmers not understanding the difference between IMPERATIVE and DECLARATIVE languages.

    SQL is a declarative language. If you try to use in SQL the same algorithmic approach that you use in C# or Fortran, the performance of your SQL code will be awful. Generic or dynamic code is excellent in C# but terrible in SQL.

    In a declarative language (like SQL) you provide the interpreter with the RESULT you are looking for, not the METHOD to get the result. If you make things dynamic, the server/interpreter always see the requests as something new and cannot use its internal optimizer efficiently. On the other hand if you hardcode everything, an excellent optimization can be done by the interpreter.

    On SQL Server, if the server can reuse a previously calculated execution plan for a piece of code, the performance will be greatly improved. This is the main purpose of stored procedure. But when a "smartass" writes dynamic code using EXEC(), the server has no way to predict what the dynamic code will be, and therefore cannot reuse an execution plan. And performance goes down the drain.

    So basically the guy who wrote the permutations is doing the right thing. Using dynamic SQL would not be optimal. Unless there is a ridiculous amount of permutations, in which case this is a design issue.

    If there is a relatively small number of permutations, say under 100, a good DBA will use a scripting language to generate the SQL. And the stored procedure itself will contain the fully-generated SQL.

    So before crapping on someone code, it would be better to learn about the language involved.

  • Ben Hutchings (unregistered) in reply to ChrisH

    MSSQL is claimed to implement the most basic level of SQL-92, which is almost the same as SQL-89. However I came across a case where it follows SQL-89 semantics. In short, very little of its functionality is implemented according to current SQL standards.

  • ammoQ (cs) in reply to president_ch0ice
    president_ch0ice:
    But when a "smartass" writes dynamic code using EXEC(), the server has no way to predict what the dynamic code will be, and therefore cannot reuse an execution plan. And performance goes down the drain.

    So basically the guy who wrote the permutations is doing the right thing. Using dynamic SQL would not be optimal. Unless there is a ridiculous amount of permutations, in which case this is a design issue.

    If you have 100 possible permutations, and all of them are used, the database engine still has to keep plans for 100 different statements, no matter if they are created at runtime using exec() or explicitely written in the stored procedure like in the original posting. Anyway, if the string given to EXEC() is always the same, the database should be able to reuse the plan it created last time. (Disclaimer: I mostly use Oracle, but I'm confident SQLServer can do the same)

  • Peter (unregistered) in reply to president_ch0ice
    Comment held for moderation.
  • codemoose (cs) in reply to wgh
    wgh:
    Don't make the mistake of thinking this technique is only useful for sorting--it lends itself nicely to a search algorithm for the next Google killer.
    ...[lines snipped]...
    } else if (search == "george bush") {
      sql += "WHERE text LIKE '%george bush%'";
    } else if (search == "brangelina") {
      sql += "WHERE text LIKE '%brangelina%'";
    } else if (search == "pancake houses in san antonio") {
      sql += "WHERE text LIKE '%pancake houses in san antonio%'";
    } else if ...
    

    you owe me a new keyboard

  • wwwrat (unregistered)
    Comment held for moderation.
  • java.lang.NullReferenceException (cs) in reply to BarraCoder
    BarraCoder:
    If all anyone had to do was "ORDER BY @OrderByCriteria" this probably wouldn't have happened...

    You must be new around here...

  • Blaufish (unregistered) in reply to Donkey
    Donkey:
    WTF... This cannot be in production code, right?

    I think I know why it looks like this.

    Original code was SQL Injection vulnerable. Rewrite had to be done in an hour, with no time to do a proper redesign. Retrofitting security quickly is an exercise in bad engineering and gives birth to dirt ugly code like this.

  • iMalc (unregistered)

    One could perhaps generate a text string containing the exact sql code to execute and then use sp_execsql to run it.

  • Aaron (unregistered) in reply to president_ch0ice
    president_ch0ice:
    But when a "smartass" writes dynamic code using EXEC(), the server has no way to predict what the dynamic code will be, and therefore cannot reuse an execution plan. And performance goes down the drain.
    First of all, that's not true. Execution plans can only be saved for a complete logic control path, so the number of possible execution plans for a stored procedure is equal to the number of different branch combinations. This could be very efficient, if there's no control logic, or horribly inefficient, if there are 10 sequential if/then/else clauses (i.e. 2^10 possible execution plans).

    On the other hand, most databases including MSSQL cache all execution plans, no matter where the SQL "comes from", so executing the same SQL over and over again with EXEC or sp_executesql will only have to generate a single execution plan, and often the results themselves will be cached as well.

    The net result is that there's no performance advantage to having a "dynamic" stored procedure with complex control logic over just generating the SQL client-side and executing it on the server.

    Incidentally, I'm not inclined toward dynamic SQL within SQL server itself. I was talking about generating it from the client, a practice which you're going to see becoming far more common in the near future with the propagation of ORM tools and LINQ.

  • Rick DeBay (unregistered)

    This will be prepared every time, but for a query that on the surface looks like a simple table scan, that should be zero.

    EXECUTE STATEMENT 'SELECT * FROM ' || :Table_Input || ' ORDER BY ' || :OrderByCriteria

    The app should probably be doing the ordering anyway, unless it's a large result set and you want to stop some time before seeing all the results.

    Rick DeBay

  • Zygo (unregistered) in reply to president_ch0ice
    president_ch0ice:
    The REAL WTF here is something I see frequently: programmers not understanding the difference between IMPERATIVE and DECLARATIVE languages.

    SQL is a declarative language. If you try to use in SQL the same algorithmic approach that you use in C# or Fortran, the performance of your SQL code will be awful. Generic or dynamic code is excellent in C# but terrible in SQL.

    Not sure I'd characterize the difference in those terms. In one sense C# is a declarative language too--you want the compiler to generate machine code that does a particular task in a particular way. Some specialized C++ compilers have optimizers which can find codec algorithms in their input files and replace them with equivalents from a library of hand-optimized SSE instructions. In another sense SQL can be interpreted imperatively if you use a RDBMS engine with no optimizer, although that means e.g. all table joins run in O(n*m) time for two tables of size n and m even if the join condition specifies equality between unique columns on both tables. If SQL was truly declarative, it wouldn't matter which of several possible mathematically equivalent ways to express a query were used, but in most real-world implementations it makes a huge difference.

    I do agree with the rest of what was said. It is true that interpreting SQL statements is quite expensive compared to other languages, and a lot of coders don't grasp the concepts of algorithmic complexity or implementation cost. Imagine what performance would be like if you were writing code in C# or some similar language which did the following:

    1. Generate some C#, C++, Java, etc. code for a function in a text string.
    2. Invoke compiler to produce a loadable code library (DLL, dso, .so, .class, whatever).
    3. Load library into application.
    4. Execute function.
    5. Unload library.
    6. Delete DLL, dso, .so, .class file, whatever.
    7. Repeat steps 1-6 in a loop.

    As silly as the above sounds, it does work, and with some caching and slightly more intelligent design it can even work well.

    A simple SQL statement like

    select ca, cb from tbl where cb like 'xyz' order by ca;

    has to answer a bunch of questions before it even starts to execute (and for clarity I'm leaving out several necessary steps like locks and transaction handling):

    1. Which of several objects named 'tbl' is the query referring to? (extra cost here if 'tbl' is actually a view or some exotic kind of dynamically generated object)
    2. Does the user have select privilege on tbl?
    3. When we've read data from disk, which part of it is 'ca' and which data type is it? (extra cost here if the answer is 'user defined type')
    4. When we've read data from disk, which part of it is 'cb' and which data type is it? (extra cost here if the answer is 'user defined type')
    5. Does the user have select privilege on column ca of tbl?
    6. Does the user have select privilege on column cb of tbl?
    7. Is there an index which can help with "cb like [a text string]"?
    8. Is there an index which can help with "order by ca"?
    9. Is there an index which can help with "where cb like [a text string] order by ca"?
    10. Is it better to use one of those indexes, two or three of them, or just read the whole table and sort the results?
    11. Has the user exceeded a time, space, or other resource limit? Will this query change the answer to that question?

    Each of these questions on average requires O(log(N)) disk seeks to answer (which is why modern DB servers are dead in the water without adequate RAM caching). Just imagine how many steps there would be if there were joins, group by clauses, etc. Often there is a combinatorial explosion of possible questions to answer, so the DB has to skip a few steps and choose a suboptimal but working solution just to avoid spending the rest of the Sun's runtime analyzing the query (that's "Sun" as in "the big fusion reactor at the center of the solar system", not "the hardware vendor in California").

    Some DB engines provide a facility to precompile the constant parts of an SQL expression. This incurs the cost of most of the steps above once, then produces an object which starts at step #11 with specific data values each time it is invoked; however, sometimes the result is not as optimized as it would be if the specific data values are known. Consider the potential to optimize "like 'foo'", which is equivalent to "= foo" vs. "like '%foo%'" which requires something like a permuted string index or a full table scan. If the planner is given "like [variable]" then it can't use plans that might be faster but only work for special cases of [variable].

    Usually prepared queries provide the ability to substitute column values (so you could substitute a variable in place of 'xyz' in the example) but not column names, table names, or any part of SQL except constant subexpressions (so you couldn't substitute a variable in place of 'tbl' in the example, or change the "order by" clause). The reason for this limitation is due to the implementation, which is trying to save costs per query by fixing the execution plan of the query in advance--change the query in all but a few specific ways, and you'll have to redo most or all of the work you were trying to avoid.

    On the other hand, if there are only a few ORDER BY expressions, a clever client could simply prepare and cache those queries that it does generate. Clever servers in turn can do this for the clients. Not-so-clever servers take this one step too far, by caching the results of the query too.

  • Mark (unregistered) in reply to db2

    Actually there is a subtle difference between the two anyway, '=' ignores trailing spaces and LIKE doesn't

    SELECT * FROM Northwind..Customers WHERE CompanyName = 'Gourmet Lanchonetes '

    SELECT * FROM Northwind..Customers WHERE CompanyName LIKE 'Gourmet Lanchonetes '

  • Coyne (unregistered)

    Speaking as someone who works at an organization that routinely deals with tables of rather large size (we have 300 tables of >= 1M rows) the idea of using CASE and CONVERT to generate ad-hoc ORDER BY is rather boggling.

    Such an approach is only useful if a table is relatively small and/or there is a need to be able to sort by every possible permutation. I can only imagine how loud our users would scream about response times on the order of hours (if not days) necessitated by such a dynamic ORDER BY applied to such large tables...especially since our transaction response time target is 3 seconds walltime.

    Experience has shown that, in general, few permutations of ordering are needed for any given table. In a requisition line table, for example, who would want to sort all the rows by unit-of-measure code (each, box, pint, etc.) without first sorting by item? (Seems kind of lame to order the rows so that "each/MRI machine" is followed by "each/mug, coffee".)

    Where tables are large, sorting of necessity must be backed up by an appropriate index, and the number of indexes is sharply limited by management and performance issues. So the "multiple-SQL-statement" approach is quite reasonable in such environments, since there are probably no more such statements than there are indexes.

  • chadillac (cs)

    Looking at this and thinking about maintainablity and that being pushed into production made my feel physically nauseous

    No seriously.... I don't feel well

  • Calvin Lawson (unregistered) in reply to ChrisH
    Comment held for moderation.
  • noise (unregistered) in reply to Zygo

    your first seven steps just described ( more or less ) the way xml serialization in c# works :) [ though truth be told it's only between runs not during runs, and i hear you can cache the generated code / dll these days ]

  • Samah (cs) in reply to ChrisH
    ChrisH:
    Dave:

    SELECT * FROM @Table_Input ORDER BY @OrderByCriteria

    Sadly, and very strangely this isn't an option with MSSQL.

    The CASE option mentioned is one way, there are precious few other ways.

    The real WTF is MSSQL. Learn 2 use semicolons thx Microsoft.

  • Krop (unregistered)

    Ordering output should not be handled by the database, this is for the front end to do.

  • Watson (unregistered) in reply to fennec
    fennec:
    Read The Fine Submission, people.
    Jeremiah found the much, much longer version of a solution to this problem. Once he stopped laughing, he notes for us that many, many permutations are missing, save for some of the ORDER BY criteria being treated as a single unit.
    Emphasis added.
    Viz: guessing that major_version and minor_version always appear together and in that order, the number of permutations drops from 120 to a much more manageable 24.
  • dkf (unregistered) in reply to Krop
    Krop:
    Ordering output should not be handled by the database, this is for the front end to do.
    There's a real WTF right there, right up there with the original's desire to order by an enormous number of columns...

    Captcha: dubya (and his database design too...)

  • poochner (unregistered) in reply to Krop
    Krop:
    Ordering output should not be handled by the database, this is for the front end to do.

    WTF?! No, that's one of the things the database is for.

  • icelava (cs)

    MY EYES! MY EYES! IT BURNS!

  • Jon (unregistered)

    This isn't bad, in certain cases I could consider using this. It's in a stored procedure so the interface simply has to stay the same and you can update the datalayer at will. It prevents Sql Injection which is a big plus. Its very simple and repeats itself A LOT!!! Make a program that generates this dynamically and you have your maintainablity. There are probably better more maintainable ways of doing this like trimming the string as it comes in so that it can only have this amount of chars no more no less. Also making sure each element that is expected is present.

  • AdT (unregistered)
    Comment held for moderation.
  • Homser S (unregistered)

    Paid per code-lines --> he'll be rich ;-)

Leave a comment on “The Really, Really Long Way”

Log In or post as a guest

Replying to comment #:

« Return to Article