Admin
FP!
Admin
This is the software that tracks your bank account folks...
Admin
Wow!
Really... wow!
Though scarily I can think of a situation in which this would run quicker than creating a second database table and performing a join: Cross database queries in Oracle where the databases are at different physical locations.
In that scenario, a join across the two sites would really be nasty... so executing one simple piece of SQL at one end, returning the results, creating a nasty query like the above and executing locally would be quicker.
And before someone says temporary tables... on older versions of Oracle temporary tables logged... and that means your log files really do start eating disk. In effect, temp tables weren't as temporary and discardable as you would desire. So the use of them was avoided quite heavily.
That said... why not a regular table with queryId and custNum, and populate it each time and truncate occasionally?
Hmm... I dunno... I'm beyond trying to figure out why. Usually I can just about manage to, but even with knowledge of bizarre old Oracle quirks the above is still a WTF to be proud of.
Admin
I take this case as a layer:
this code was made by another program. Take some time to look at it closer, for instance you have
Admin
Sweet, I just learned a new skill. I never would have thought of it.
Admin
Oracle has this (quite reasonable) limitation that it only allows 999 elements in the IN clause. To get around that, one would use col IN (1, 2, ..., 999) OR col IN (1000, 1001, ...). That said, before doing stuff like this one SHOULD think about whether it is a good idea in the first place.
The sad part is that the reason I know this limitation is because some 'clever' developers decided to do exactly the same thing in one of our projects :(
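For anyone who does end up stuck with this workaround, here is a minimal sketch of the chunking step in Python. The function name `build_chunked_in`, the column name `cust_num`, and the default chunk size of 999 are illustrative assumptions, not anything taken from the original query; the column name is interpolated into the SQL text, so it must come from trusted code rather than user input.

```python
def build_chunked_in(column, values, chunk_size=999):
    """Return (sql_fragment, params) with values split across OR'd IN lists.

    `column` is a hypothetical, trusted column name; the values themselves
    are passed as bind parameters rather than pasted into the SQL text.
    """
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    groups = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        groups.append(f"{column} IN ({placeholders})")
    # Wrap in parentheses so the OR chain composes safely with other AND terms.
    return "(" + " OR ".join(groups) + ")", values


# 2,000 keys become three IN lists: 999 + 999 + 2.
fragment, params = build_chunked_in("cust_num", range(1, 2001))
```

The fragment would be dropped into a WHERE clause and executed with `params`. It only sidesteps the per-list limit; it does nothing about the underlying question of whether passing thousands of keys is a good idea.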
Admin
Another good example of where the problem isn't what it seems ...
The problem is not that there are better ways to pass in a list of hundreds of keys as parameters; it is the fact that they are passing in lists of hundreds of keys as parameters in the first place.
A typical "requirement" on many jobs I've done goes like this:
Customer: "I need a report that shows salaries for these employees." (hands over a list of 20 employee numbers)
Me: "Why these 20 employees?"
Customer: "Those are the ones I want to see!"
Me: "Yes, but what is it about these 20 employees that makes you want to see them on this particular report? What do they have in common?"
Customer: "Are you dense? What they have in common is that I want the report to show their salaries!"
Me: "Yes, I understand that ... why did you choose this list of 20? Will this list ever change? What makes it change?"
Customer: "I don't follow. I just want to see the salaries of everyone in the Finance department on the report, and this is the list! Also, I don't want employees that have been terminated, of course."
Me: "So you want a report that shows the salaries of active employees in the Finance department? "
Customer: "Yes! Isn't that what I said? Also, we will eventually need reports for the other departments as well."
Me: "So, maybe a report that *prompts* for a department, and from there, shows the salaries of the active employees in that department?"
Customer: "Yes! Isn't that what I've been saying?"
The problem is the inability of the customer to say what he really needs, and the inability of the programmer to sometimes stop and think "there must be a way of doing this without hard-coding in 200 employee numbers!" I am amazed at how often conversations like this occur and how often these talks are handled poorly on both ends.
Often, these requirements (when finally stated logically and clearly) demonstrate the need for additional attributes that aren't yet stored on the tables; if you constantly need to list a set of rows that have something in common for any purpose, those rows should have a common attribute. Your data should be driving your logic; you shouldn't be embedding data in your code.
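To make the salary-report example concrete, here is a rough sketch using Python's built-in sqlite3 module. The `employees` table, its columns, and the sample rows are all invented for illustration; the point is only that the report is driven by a department and an active flag stored in the data, not by a hard-coded list of employee numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the "common attributes" live on the row itself.
conn.execute(
    """CREATE TABLE employees (
           emp_no     INTEGER PRIMARY KEY,
           name       TEXT,
           department TEXT,
           active     INTEGER,   -- 1 = employed, 0 = terminated
           salary     REAL
       )"""
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Finance", 1, 52000.0),
        (2, "Bob", "Finance", 0, 48000.0),
        (3, "Cy", "Sales", 1, 45000.0),
        (4, "Dee", "Finance", 1, 61000.0),
    ],
)


def salary_report(conn, department):
    """Salaries of active employees in the department the user was prompted for."""
    return conn.execute(
        "SELECT emp_no, name, salary FROM employees "
        "WHERE department = ? AND active = 1 ORDER BY emp_no",
        (department,),
    ).fetchall()


finance = salary_report(conn, "Finance")
```

When someone joins or leaves Finance, the data changes and the report follows; nobody has to edit a list of IDs in the code.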
Admin
Where I work, one of our products generates a worklist that could potentially have a very long list of IDs in the IN clause. None of our clients ever complained, because they didn't have a lot of items on their worklists, but we hit the problem when running the application against our Oracle test database, because the IN clause exceeded 255 values. I ended up fixing this by having the application create a temporary table with all the IDs that were needed and joining the main query to the IDs in the temp table. After that it would drop the temporary table.
Maybe that's why he has all of these ORs in the query. Perhaps he was getting the same database error and this was the workaround.
However, our query didn't have a UNION on multiple queries. I can't figure out why they needed to do that.
So, what is a good solution for a situation where you need to select rows from a table and there are a large number of IDs that need to be included in the IN clause and these IDs are not known until runtime?
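For reference, the temp-table workaround described above looks roughly like this. It is a sketch against SQLite via Python's sqlite3 module, not the Oracle code in question; the `orders` table, the `wanted` temp table, and the sample data are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_num INTEGER, total REAL)"
)
conn.execute("CREATE INDEX idx_orders_cust ON orders (cust_num)")
# 500 made-up orders spread over customer numbers 0..49 (10 orders each).
conn.executemany(
    "INSERT INTO orders (cust_num, total) VALUES (?, ?)",
    [(n % 50, float(n)) for n in range(1, 501)],
)


def orders_for(conn, cust_nums):
    """Load the runtime ID list into a temp table, join to it, then drop it."""
    conn.execute("CREATE TEMP TABLE wanted (cust_num INTEGER PRIMARY KEY)")
    try:
        # OR IGNORE tolerates duplicate IDs in the caller's list.
        conn.executemany(
            "INSERT OR IGNORE INTO wanted VALUES (?)", [(c,) for c in cust_nums]
        )
        return conn.execute(
            "SELECT o.order_id, o.cust_num, o.total "
            "FROM orders o JOIN wanted w ON w.cust_num = o.cust_num "
            "ORDER BY o.order_id"
        ).fetchall()
    finally:
        conn.execute("DROP TABLE wanted")


rows = orders_for(conn, [3, 7, 7, 11])
```

The query text stays the same no matter how many IDs arrive, and the list length is no longer bounded by any IN-clause limit. Whether it beats a long IN list on a given database is something to measure rather than assume.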
Admin
Your points are valid, but irrelevant to this WTF. There's no way that someone handed a programmer a paper with thousands of customer numbers and had them write a query against it.
The obvious choice has been stated, poor middleware writing queries like this.
b
Admin
You make a very good point. This kind of dialog between the vendor and customer is very important. Once you understand why the customer needs something, it is much easier to figure out the best way of doing it or come up with better alternatives to just blindly hardcoding in 20 employee IDs.
Admin
And given the incremental way that these requests can grow, you can have such a situation creep up on you. You did not mean to write such code, but you are like the frog in the pot of heating water.
BTDT. I am not alone.
In actuality, you probably have to prompt the customer for whether it should be only active employees and whether this report will be needed for different departments as well. Do not forget the order of the report. Employee number, employee name, salary, or something else?
Quite. Again, the frog in the pot is too easy.
Sincerely,
Gene Wirchenko
Admin
The problem, of course, is that there are LETTERS mixed in with the NUMBERS:
So, what is the solution, SQL people?
A temporary / session table, which you fill with the IDs you want, and join on that?
Returning all the rows is not an option, for large tables...
Admin
The solution is to store common attributes about these entities in the data and to select them based on those attributes w/o explicitly listing them all out.
Unless your app displays a long list of 10,000 items, and forces the user to manually click each one to select them, and the user is just randomly picking from that list without any rhyme or reason or pattern, you should never have to implement a SQL statement like this.
Admin
You see, this is why us crack C++ developers are smarter than anyone else. Only an idiot would work with something as brain-dead as SQL. I mean, didn't it occur to the geniuses that invented SQL that sometimes you want to just pull all records from the table without any criteria? I mean is it just too blindingly obvious that SQL should let you just leave out the Where block to accomplish this? But, no, I get stuck with this so-called "language" and have to spend all day dividing a huge In list into manageable bites. And now every time they add a customer, I have to change this code to add a new Cust_Num. What a WTF.
Well, at least the second step is sensible. That's where I use C++ to copy the query result into a link list so I can loop through and sum the total orders, which is all they really want anyway.
If SQL were a real language, it would have a Sum function.
--Rank
Admin
It is completely relevant to this WTF.
If the specs are that a user must manually click on 1000's of completely unrelated, random customer id's and pass them to the next layer, then the problem isn't the middleware, it's the specs.
Admin
I've been guilty of this in the past, though not by my choice.
The situation in which I did this was when I needed to make a selection against a database based on the records contained in a geographical area chosen by a user.
The mapping software would return their IDs, which was the only key that was used to hook it to the database. I then had to query the database for all of the IDs. I would have preferred to use a geospatial database, which would allow me to tie the spatial records directly to the attribute records, but for this client, that was not approved software. So a giant "IN" clause was necessary, and rather unavoidable.
There are scenarios where you have no real choice. And in all honesty, it didn't perform poorly. Even when it was fetching some 10,000 records.
I do not know if creating a temporary table and inserting those records and then doing a join would be faster, I haven't benchmarked it.
Admin
Um....did nobody ever hear of the EXISTS command?? The simple fact they used IN is a WTF to me...
Admin
Aside from EXISTS' behavior being somewhat RDBMS-specific, I'm not sure how it could be used to make this query cleaner? Explain.
Admin
I've unfortunately seen this code, and can explain why they did this.
In Oracle, there's some magic number of IN variables, around 38, after which it will cease to use an index, and will always use a table scan.
Believe it or not, code like the above, as ugly as it is, will actually make use of the index!
If I see someone write code like this I commend them for knowing that this issue exists, then take away their lunch money until they fix it the right way.
Admin
It's worse! The number of comparisons is the same, and you have to insert the values!
You have #orderRows*#clientsToCompare tests in both cases. We can assume that the 2nd table will be indexed, but we'll have a slower insertion.
Admin
How exactly would using EXISTS make this "better" ?
Admin
In MS SQL, I prefer to use WHERE CHARINDEX('|' + CONVERT(varchar, ColumnName) + '|', @list) > 0 where @list is varchar(8000). You would pass @list with values like '|100|266|174|' or '|TX|AK|MO|'. This gets around the temp table issues and the IN limitations.
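Here is a rough analogue of that delimited-list trick, sketched with SQLite's `instr()` in place of T-SQL's CHARINDEX. The `stores` table, the helper names `pack` and `stores_in_list`, and the sample data are assumptions for illustration only. The delimiters on both sides of each value are what stop 100 from matching inside 1000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (store_id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO stores VALUES (?, ?)",
    [(100, "TX"), (174, "MO"), (266, "AK"), (300, "NY"), (1000, "TX")],
)


def pack(values, sep="|"):
    """Build a list string like '|100|266|174|' with leading and trailing separators."""
    return sep + sep.join(str(v) for v in values) + sep


def stores_in_list(conn, column, values):
    # The column name is interpolated, so restrict it to known columns.
    if column not in ("store_id", "state"):
        raise ValueError("unexpected column")
    sql = (
        "SELECT store_id FROM stores "
        f"WHERE instr(?, '|' || CAST({column} AS TEXT) || '|') > 0 "
        "ORDER BY store_id"
    )
    return [row[0] for row in conn.execute(sql, (pack(values),))]


by_id = stores_in_list(conn, "store_id", [100, 266, 174])
by_state = stores_in_list(conn, "state", ["TX", "AK"])
```

Because the column is wrapped in an expression, the database has to evaluate the condition for every row, so this is probably best kept to small tables or short lists.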
Admin
That SELECT will be less efficient than a very long IN() clause; every row needs to be scanned to evaluate the WHERE condition.
If you have to pass a list of values to a SELECT, the most efficient way (unless you are querying a very small table or a table with no indexes) is to populate a temp table or table variable with what you need and to join to that.
Admin
It's sad, but I've run into that one too. It was an unlikely situation that only came up during stress testing. It took me forever to figure out what was happening because I didn't know about this limitation before I found the issue. I took it as proof outsourcing is a bad idea.
Admin
Passing in the IDs as an XML document would be another way.
Admin
Wrong; your assumption completely ignores indexes.
If your table is properly indexed and it is sufficiently large so that efficiency is important, adding rows to a temp table (or table variable) and then joining to that will be much, much more efficient.
Admin
Excellent rant, except for one thing: you've obviously never used SQL before.
This WTF makes me shudder... mainly because I actually programmed something like that in one of my apps directly after I learned about information_schema waaaaaay back. "It auto-magically pulls everything from every table guys! Look!" I was really after a way to abuse every resource available.
Admin
You are assuming an index on the column in question.
Admin
You can also check for one item and optimize for that or unwind the string into a temp table in the stored procedure.
Admin
Yes.
Admin
I think he was being sarcastic .... read it again.
Admin
I'll have to try that. I primarily use MySQL -- anyone have experience with temp tables using MySQL 5.0 and its performance? I have never used them before in it. I have in SQL Server.
Admin
So I'll assume that, for the sake of brevity, you omitted the requirement where only certain people could access the report, using some sort of protection mechanism? At least where I work, employee salaries are privileged confidential information that is only available to the people who need to know it to do their job :)
Admin
Caught again! Damn my autism!
Admin
Indeed, this point bears repeating: What criteria generated that list of numbers, and why is that information not being used to select the appropriate data from the database? Even if the list was generated by 100 telemarketing monkeys clicking the "sucker" button on their call sheet, that click should be updating the "sucker" flag in the database.
Admin
Sometimes you can't properly index for every query you need. Maybe this query runs once a month, so if it takes 3 minutes to run there's no problem...
Admin
Point completely missed ! But thanks for playing!
(I bet you're the guy who interrupts with pointless questions when someone is giving an example ... you know, if someone asks "I ride a train travelling 100MPH from point A to point B, 300 miles away. How long did it take?" you'd respond with "How did you get on the train? Did you buy a ticket? Where did you sit? What was the conductor's name?")
Admin
I think I hear their database screaming in agony from here...
[pi]
Admin
SELECT SQL_STMT
FROM TDWTF
WHERE I LIKE
Admin
No pizza for me.
Admin
I'll admit to falling into this trap once, when handling data synchronisation from remote clients. The magic IN clause got its keys from the sync message, so it was like:
"Give me the rows that were updated after date X, but not those in (keys of rows that were just inserted)"
The real solution would've been to use JavaScript...
Admin
If you're building a huge dynamic SQL query like this WTF (I'm assuming), I've always wondered if it would be better to do an IN with a bunch of parameters, or to do several hundred Where CUST_NUM = 'blah1' OR CUST_NUM = 'blah2' OR CUST_NUM = 'blah3' ... ad nauseam.
For instance, to return a group of stores in a company or large franchise. Which would be less resource intensive/return results more quickly?
A couple of years ago I was placed in a situation where I had to make this choice, and I opted for the latter... either that or completely reprogram their product, which I didn't have time for.
Admin
I read what you highlighted the first time -- a table could have many indexes, just not one on the column you have criteria for.
Admin
As if that whole thing wasn't bad enough, using "UNION" instead of "UNION ALL" further slows it down.
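A small illustration of the difference, sketched with Python's sqlite3 module. The `orders` table and its rows are made up. UNION has to de-duplicate the combined result, which costs extra work and, when the selected columns don't include a unique key, can quietly drop rows that were legitimately identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_num TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (cust_num, total) VALUES (?, ?)",
    # Customer A1 placed two separate orders for the same amount.
    [("A1", 10.0), ("A1", 10.0), ("B2", 5.0), ("C3", 7.5)],
)

FIRST = "SELECT cust_num, total FROM orders WHERE cust_num IN ('A1')"
SECOND = "SELECT cust_num, total FROM orders WHERE cust_num IN ('B2', 'C3')"

# UNION removes duplicate rows from the combined result.
union_rows = conn.execute(f"{FIRST} UNION {SECOND}").fetchall()

# UNION ALL simply concatenates the two result sets.
union_all_rows = conn.execute(f"{FIRST} UNION ALL {SECOND}").fetchall()

print(len(union_rows), len(union_all_rows))
```

When the branches select disjoint sets of customers, as in the WTF, the de-duplication step has nothing useful to do, so UNION ALL is the safer default there.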
Admin
"So, what is a good solution for a situation where you need to select rows from a table and there are a large number of IDs that need to be included in the IN clause and these IDs are not known until runtime?"
Two words: bind variables.
Another problem with this particular WTF is that the ids are hard-coded into the SQL text, instead of being bound to SQL placeholders of some sort. Virtually every modern RDBMS supports placeholders or bind variables, and for good reason. With placeholders (and appropriate client-side programming) the db engine only parses and optimizes the query once. Without bind variables, the smallest change to the parameters will force the db to reparse and re-optimize the query every time it's executed. At best, it's extra performance overhead that's easily avoided. At worst, it can drag a db server to its knees, as it attempts to cache many nearly identical query plans, eating up resources, forcing other queries out of the cache, etc. If someone tried crap like this on the dbs I'm responsible for, they'd find their modules locked out immediately.
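As a minimal sketch of what that looks like from client code, here is the same idea with Python's sqlite3 module, which uses `?` as its placeholder style (other drivers use `:name`, `%s`, and so on). The `orders` table, the `QUERY` constant, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_num TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "A1", 10.0), (2, "B2", 5.0), (3, "A1", 2.5)],
)

# One fixed SQL text; only the bound value changes between executions.
QUERY = "SELECT order_id, total FROM orders WHERE cust_num = ? ORDER BY order_id"


def orders_for_customer(conn, cust_num):
    """Run the fixed query text with cust_num supplied as a bind variable."""
    return conn.execute(QUERY, (cust_num,)).fetchall()


a1 = orders_for_customer(conn, "A1")
b2 = orders_for_customer(conn, "B2")
```

Because the SQL text is identical on every call, the engine (and the driver's statement cache, where it has one) can reuse the prepared statement. With the values pasted into the string, every distinct customer number would look like a brand-new query.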
Admin
There is never a good reason for this sort of code. Ever.
Even if you, god forbid, are taking a list of IDs from some user input in some sick twisted fashion, you would be better off loading those records into a temp table and joining that with the table you're selecting from. Because if the column you're joining with is indexed, it'll be a lot faster than this, and if it's not indexed, well, it sure as hell can't be any slower.
And, of course, there's rarely a good reason to take a list of arbitrary IDs from the user. Usually they're selecting based on something else, like department, or salary, or something that these users have in common. Even if they want multiple sets of commonalities, you can work it into a complex query of some type.
And if this is really a list of every ID in the table, then the programmer needs to be shot. In the gut. So he can feel all the pain we had to feel from reading this code. [:D]
Admin
That is a GREAT point ... Probably 9 times of 10 that UNION is used, UNION ALL should be used instead.
Admin
Tell me that was a joke. PLEASE.
Admin
Good guess on the ratio;-) I've just counted on a project where I've recently checked all unions; it's 398 of 433 times.
Admin
One GUI pattern comes to my mind: a database grid, full of records; one column is a checkbox where the user can select rows he wants to process; and then there is a button labeled "select all" that marks all checkboxes. Implement that in a straightforward fashion and it looks like today's WTF.