• (cs) in reply to Joel

    Anonymous:
    "So, what is a good solution for a situation where you need to select rows from a table and there are a large number of IDs that need to be included in the IN clause and these IDs are not known until runtime?"

    Two words: bind variables. Another problem with this particular WTF is that the IDs are hard-coded into the SQL text instead of being bound to SQL placeholders of some sort. Virtually every modern RDBMS supports placeholders or bind variables, and for good reason. With placeholders (and appropriate client-side programming) the db engine only parses and optimizes the query once.

    Without bind variables, the smallest change to the parameters forces the db to reparse and re-optimize the query every time it's executed. At best, that's extra performance overhead that's easily avoided. At worst, it can drag a db server to its knees as it attempts to cache many nearly identical query plans, eating up resources, forcing other queries out of the cache, etc. If someone tried crap like this on the dbs I'm responsible for, they'd find their modules locked out immediately.

    Joel -- the issue is the HUNDREDS of parameters you'd need to accommodate the large list of values passed in.  Are you one of the people who think

    SELECT ... FROM ... WHERE ID IN ( ? )

    works when the parameter passed in is "1,2,3,4"?
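For anyone who hasn't been bitten by this: here's a quick sqlite3 sketch (table name invented) showing why a single `?` can't carry a comma-separated list, and the usual client-side fix of generating one placeholder per value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(1,), (2,), (3,), (4,)])

# Binding the string "1,2,3,4" to a single placeholder compares each id
# against that one string value -- it does NOT expand into four values.
miss = conn.execute("SELECT id FROM t WHERE id IN (?)", ("1,2,3,4",)).fetchall()
print(len(miss))   # 0 rows come back

# The workable client-side approach: generate one placeholder per value.
ids = [1, 2, 3, 4]
marks = ",".join("?" * len(ids))
hits = conn.execute(f"SELECT id FROM t WHERE id IN ({marks})", ids).fetchall()
print(len(hits))   # all 4 rows come back
```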

  • Pope (unregistered) in reply to Otto
    Otto:

    There is never a good reason for this sort of code. Ever.

    Even if you, god forbid, are taking a list of IDs from some user input in some sick twisted fashion, you would be better off loading those records into a temp table and joining that with the table you're selecting from. Because if the column you're joining with is indexed, it'll be a lot faster than this, and if it's not indexed, well, it sure as hell can't be any slower.
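    A minimal sketch of that temp-table-and-join approach, using Python's sqlite3 with made-up table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_num INTEGER)")
conn.executemany("INSERT INTO orders (id, cust_num) VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30), (4, 20)])

# Stage the runtime ID list in a temp table...
wanted = [20, 30]
conn.execute("CREATE TEMP TABLE wanted (cust_num INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted (cust_num) VALUES (?)", [(w,) for w in wanted])

# ...then join against it instead of splicing a giant IN list into the SQL.
rows = conn.execute("""
    SELECT o.id
    FROM orders o
    JOIN wanted w ON w.cust_num = o.cust_num
    ORDER BY o.id
""").fetchall()
print([r[0] for r in rows])  # [2, 3, 4]
```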

    A data-gathering company takes millions of "calls" per day for reporting purposes on another company.  Each "call" is a record in the "calls" table.  The end-user wants to see a report on a group of 500 stores that are performing poorly for a certain fiscal period.  These stores have nothing in common other than being in the same company.  That's it...

    The current tools the end-user is provided with amount to selecting each of the stores in a multiple-select list.  The next page displays the report.

    Are you saying that it would be best to take those millions of records from the whole company - with somewhere near 100 fields per record and nearly 500 users using the same system at any given time - then manipulate them on the reporting page?  Would that cause memory problems?

    I just figured it would be best to do a Where StoreId IN (...) or Where StoreId = 'blah1' OR StoreId='blah2'... etc.  Please school me.

  • (cs) in reply to rogthefrog
    rogthefrog:
    JohnO:

    In MS SQL, I prefer to use WHERE CHARINDEX('|' + CONVERT(varchar, ColumnName) + '|', @list) > 0 where @list is varchar(8000).  You would pass @list with values like '|100|266|174|' or '|TX|AK|MO|'.  This gets around the temp table issues and the in limitations.



    Tell me that was a joke. PLEASE.

    Nope, no joke.  If you have the requirement that you only interface with the database via stored procedures, you need to provide multi-value filtering for many criteria, and you want to avoid the problems associated with creating a temp table in code that is externally referenced by a stored procedure, this works quite well.  As I said above, you can unwind the list into a temp table if you have a selective index on the column in question.  Quite often, you have queries where certain highly selective criteria are required but you offer optional filtering on many other less selective criteria so it's not unreasonable to use this technique even without the temp table.
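    For the curious, JohnO's delimited-string trick translates to other engines too; here's a sqlite3 sketch (SQLite spells CHARINDEX(needle, haystack) as instr(haystack, needle), and the table/column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (state TEXT)")
conn.executemany("INSERT INTO stores (state) VALUES (?)",
                 [("TX",), ("AK",), ("CA",), ("MO",)])

# The '|' delimiters on both sides of each value prevent partial matches
# (e.g. '|1|' will not match inside '|11|').
state_list = "|TX|AK|MO|"
rows = conn.execute(
    "SELECT state FROM stores WHERE instr(?, '|' || state || '|') > 0",
    (state_list,),
).fetchall()
print(sorted(r[0] for r in rows))  # ['AK', 'MO', 'TX']
```

    Note that this predicate can't use an index on the column, which is why the temp-table unwind is preferred when the column is selective.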

  • (cs) in reply to Pope
    Anonymous:
    Otto:

    There is never a good reason for this sort of code. Ever.

    Even if you, god forbid, are taking a list of IDs from some user input in some sick twisted fashion, you would be better off loading those records into a temp table and joining that with the table you're selecting from. Because if the column you're joining with is indexed, it'll be a lot faster than this, and if it's not indexed, well, it sure as hell can't be any slower.

    A data-gathering company takes millions of "calls" per day for reporting purposes on another company.  Each "call" is a record in the "calls" table.  The end-user wants to see a report on a group of 500 stores that are performing poorly for a certain fiscal period.  These stores have nothing in common other than being in the same company.  That's it...

    The current tools the end-user is provided with amount to selecting each of the stores in a multiple-select list.  The next page displays the report.

    Are you saying that it would be best to take those millions of records from the whole company - with somewhere near 100 fields per record and nearly 500 users using the same system at any given time - then manipulate them on the reporting page?  Would that cause memory problems?

    I just figured it would be best to do a Where StoreId IN (...) or Where StoreId = 'blah1' OR StoreId='blah2'... etc.  Please school me.

    Do you really not get it?

    How does the user pick these 500 stores?  Is there a page displaying 10,000 stores, and one by one the user clicks on the 500 he wants from amongst those 10,000, and then after about 4 or 5 hours he hits "OK"?

    Don't you think the criteria that the user used to pick that 500 stores and check them off one by one could be the same criteria that the report uses later on?

     

  • Big Daddy (unregistered) in reply to Pope

    How about something like:

    select top 500 store_data from store_table group by store_data order by sum(store_sales) asc

    Please forgive bad syntax and lack of completeness but you get the idea.  Notice you'd get the 500 stores with the lowest sales regardless of what the user thinks the stores are.
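    Big Daddy's idea, with the syntax tidied up for an engine I can actually test (SQLite uses LIMIT rather than TOP; the table names are invented, and LIMIT 2 stands in for the real top-500):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (store_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO store_sales VALUES (?, ?)",
                 [(1, 500.0), (1, 300.0), (2, 50.0), (3, 120.0), (4, 900.0)])

# Bottom-N stores by total sales, computed by the database itself --
# no store list needs to be passed in at all.
bottom_two = conn.execute("""
    SELECT store_id
    FROM store_sales
    GROUP BY store_id
    ORDER BY SUM(amount) ASC
    LIMIT 2
""").fetchall()
print([r[0] for r in bottom_two])  # [2, 3]  (totals: 50 and 120)
```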

  • (cs) in reply to Jeff S
    Jeff S:

    ammoQ:
    As if that whole thing wasn't bad enough, using "UNION" instead of "UNION ALL" further slows it down.

    That is a GREAT point ...  Probably 9 times out of 10 that UNION is used, UNION ALL should be used instead.

    Except that this is one of the times you "should" use UNION, because there are duplicate IDs in the IN clauses. I like the GUI-with-select-all+dynamic-sql theory.

    I've always felt that code which worked, and whose only problem was "it could be faster", didn't qualify as a WTF. I mean, we all write apps which could run faster, but we don't tune them because they are fast enough and there are new features which need to be built or bugs which need to be fixed.

    Today's post may violate that rule of mine. [8-)]

    I'm surprised no one has mentioned a table variable. We use one when a user ticks boxes from a list of things to search and we pass a comma-delimited string into a query sproc. The sproc parses out the IDs and creates a table variable, and we join/filter against that. Of course, we only need to handle 20 items coming in, so we've been able to get by on an nvarchar(255).

    Though table variables can only be indexed on unique or primary-key columns, which doesn't work in today's example.
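    A rough sketch of that parse-the-CSV-and-join pattern, done in Python/sqlite3 with a temp table standing in for the table variable (all names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "d")])

def search_by_csv(conn, csv_param):
    # Parse the delimited string (the sproc does this step in T-SQL),
    # stage the IDs, then join -- the table-variable pattern in miniature.
    ids = [int(s) for s in csv_param.split(",") if s.strip()]
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS picked (id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM picked")
    conn.executemany("INSERT INTO picked (id) VALUES (?)", [(i,) for i in ids])
    return conn.execute(
        "SELECT i.name FROM items i JOIN picked p ON p.id = i.id ORDER BY i.id"
    ).fetchall()

print(search_by_csv(conn, "2,4"))  # [('b',), ('d',)]
```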

  • (cs)

    Everyone say left join with me.... Even if you had to hard code the cust_num list because it was an arbitrary list, you could create an extra table for just that brain dead purpose...

    <sigh>  Sometimes, people make me SOOOO sad...

  • Pope (unregistered) in reply to Jeff S
    Jeff S:
    Anonymous:
    Otto:

    There is never a good reason for this sort of code. Ever.

    Even if you, god forbid, are taking a list of IDs from some user input in some sick twisted fashion, you would be better off loading those records into a temp table and joining that with the table you're selecting from. Because if the column you're joining with is indexed, it'll be a lot faster than this, and if it's not indexed, well, it sure as hell can't be any slower.

    A data-gathering company takes millions of "calls" per day for reporting purposes on another company.  Each "call" is a record in the "calls" table.  The end-user wants to see a report on a group of 500 stores that are performing poorly for a certain fiscal period.  These stores have nothing in common other than being in the same company.  That's it...

    The current tools the end-user is provided with amount to selecting each of the stores in a multiple-select list.  The next page displays the report.

    Are you saying that it would be best to take those millions of records from the whole company - with somewhere near 100 fields per record and nearly 500 users using the same system at any given time - then manipulate them on the reporting page?  Would that cause memory problems?

    I just figured it would be best to do a Where StoreId IN (...) or Where StoreId = 'blah1' OR StoreId='blah2'... etc.  Please school me.

    Do you really not get it?

    How does the user pick these 500 stores?  Is there a page displaying 10,000 stores, and one by one the user clicks on the 500 he wants from amongst those 10,000, and then after about 4 or 5 hours he hits "OK"?

    Don't you think the criteria that the user used to pick that 500 stores and check them off one by one could be the same criteria that the report uses later on?

    Actually, yes, lol.  That is the framework I was speaking of earlier when I said I didn't have time to redesign it (actually I didn't get any time at all).  They started off as a small company and now have more phone lines than most Fortune 500 companies... but the funniest part is that the people in charge don't think developers have any special skills, so they keep hiring people basically off the street to "improve" their product.  I was asked by someone who worked there what I thought.  I said that if you just need to get by, you could do it this way... and so the story goes.

    Anyway, listen, I've already submitted to your "superior" knowledge, so I was really looking for an honest answer, not for someone to make me feel stupid.  My mother does a good enough job of that, thanks.  I'm sorry that you didn't get to sit at the cool table in grade school, but that's no reason to take your teen angst out on me.  I bet you're one of those developers who tells someone a complex chain of commands and then yells at them after they get it wrong.  *thumbs up*  Way to go.

  • sdether (unregistered)

    I used to work at a media company that used giant in-clauses almost exclusively. And what's worse, it was the best way to do it. Let me 'splain:

    The database was mysql 3.x. Everything was table-locked (no choice). We had to pull some number of records (10,000-100,000) from a 2-million-record table. Then that data was used as criteria to pull several hundred thousand records from another table. With the amount of write traffic we always had going to both of those table-locked tables, and the join optimization (or lack thereof) in that version of mysql, it was a) faster and b) better for lock contention to use an in-clause than to join the two tables in the first place. The craziest thing was that mysql would let you do in-clauses of over 200,000 records, and do it really fast. I guess it was their excuse for locking things up so horribly when you tried to do joins on large tables.

    When we started using Sybase for some systems, that wouldn't fly anymore (a 2k-character stack for all SQL) and joins were used instead. For data that used criteria from a mysql DB in the where clause of a Sybase database, we did the temp-table insert & join trick.

  • Pope (unregistered) in reply to Big Daddy
    Anonymous:

    How about something like:

    select top 500 store_data from store_table group by store_data order by sum(store_sales) asc

    Please forgive bad syntax and lack of completeness but you get the idea.  Notice you'd get the 500 stores with the lowest sales regardless of what the user thinks the stores are.

    Yeah, I thought that would get by as well.  No such luck.

  • (cs) in reply to OneFactor
    OneFactor:
    Jeff S:

    ammoQ:
    As if that whole thing wasn't bad enough, using "UNION" instead of "UNION ALL" further slows it down.

    That is a GREAT point ...  Probably 9 times out of 10 that UNION is used, UNION ALL should be used instead.

    Except that this is one of the times you "should" use UNION, because there are duplicate IDs in the IN clauses.

    So what? Suppose you have exactly one record with ID=4711; how many rows does "SELECT * FROM mytable WHERE ID in (4711, 4711, 4711)" return?

    I've always felt that code which worked, and whose only problem was "it could be faster", didn't qualify as a WTF. I mean, we all write apps which could run faster, but we don't tune them because they are fast enough and there are new features which need to be built or bugs which need to be fixed.

    This code is not only slow, it's mostly unreadable. Definitely a WTF.

  • (cs) in reply to ammoQ
    ammoQ:
    OneFactor:
    Jeff S:

    ammoQ:
    As if that whole thing wasn't bad enough, using "UNION" instead of "UNION ALL" further slows it down.

    That is a GREAT point ...  Probably 9 times out of 10 that UNION is used, UNION ALL should be used instead.

    Except that this is one of the times you "should" use UNION, because there are duplicate IDs in the IN clauses.

    So what? Suppose you have exactly one record with ID=4711; how many rows does "SELECT * FROM mytable WHERE ID in (4711, 4711, 4711)" return?


    How many rows does "SELECT * FROM mytable WHERE ID in (4711) UNION ALL SELECT * FROM mytable WHERE ID in (4711)" return?
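    Both questions are easy to answer empirically; a sqlite3 sketch of the three cases:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO mytable (id) VALUES (4711)")

# Duplicates inside a single IN list are harmless: still one row.
in_rows = conn.execute(
    "SELECT * FROM mytable WHERE id IN (4711, 4711, 4711)").fetchall()
print(len(in_rows))  # 1

# UNION ALL of two SELECTs that each match the row: the duplicate survives.
ua_rows = conn.execute(
    "SELECT * FROM mytable WHERE id IN (4711) "
    "UNION ALL SELECT * FROM mytable WHERE id IN (4711)").fetchall()
print(len(ua_rows))  # 2

# Plain UNION de-duplicates, at the cost of a distinct step.
u_rows = conn.execute(
    "SELECT * FROM mytable WHERE id IN (4711) "
    "UNION SELECT * FROM mytable WHERE id IN (4711)").fetchall()
print(len(u_rows))  # 1
```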

  • (cs) in reply to JohnO
    JohnO:
    ammoQ:
    OneFactor:
    Jeff S:

    ammoQ:
    As if that whole thing wasn't bad enough, using "UNION" instead of "UNION ALL" further slows it down.

    That is a GREAT point ...  Probably 9 times out of 10 that UNION is used, UNION ALL should be used instead.

    Except that this is one of the times you "should" use UNION, because there are duplicate IDs in the IN clauses.

    So what? Suppose you have exactly one record with ID=4711; how many rows does "SELECT * FROM mytable WHERE ID in (4711, 4711, 4711)" return?


    How many rows does "SELECT * FROM mytable WHERE ID in (4711) UNION ALL SELECT * FROM mytable WHERE ID in (4711)" return?



    That would be a different kind of beast, but as far as I can see, the duplicate numbers in today's WTF always appear in the same IN clause, and probably always will. It looks like each SELECT is for a certain range of keys.
  • (cs) in reply to Jeff S

    What's the emoticon for hitting a nail on the head?

  • (cs)

    ah!!! so that's how you add more items to the IN clause... mmm...

    now, back to work [8-|]

  • Anon (unregistered) in reply to Xepol

    Assuming that the CUST_NUM list was created on a client, what kind of network traffic would be generated by sending an INSERT INTO #TempTable command for every value on the list to the DBMS? To say nothing of the overhead on the DBMS of inserting so many rows into a table, even a temporary one. Maybe that leviathan of a query is still the fastest (yet horrible) method of obtaining what they wanted.

     

  • ByteJuggler (unregistered) in reply to Jeff S

    IMNSHO that's the real WTF here.... their results are likely to be subtly wrong on occasion.... 

  • ByteJuggler (unregistered) in reply to ByteJuggler
    Anonymous:
    IMNSHO that's the real WTF here.... their results are likely to be subtly *wrong* on occasion.... 


    Sorry, that was meant to include the UNION vs UNION ALL stuff.  (Oh, and if they have duplicate keys in their IN list then that's another bug in the generation code IMO... fortunately, duplicates within a single IN clause do not matter, as has been pointed out.)

    Actually, as an aside: I've seen situations where such code has also been required.  On AS/400 systems running old System/36 files under emulation, querying the files using joins can be *enormously* painful.  The optimizer is not really capable of properly optimising the joins even if suitable "logicals"/indexes exist.  Really, really painful.  As a result, it is several orders of magnitude quicker to first select a set of orders (say), then extract a list of related customer IDs, and then query for customers with those particular IDs directly, avoiding joins completely.  (Except that the AS/400 also has limits on the number of entries it allows in IN clauses, plus SQL statement length limits, which necessitates further jiggery-pokery...)
  • Anonymous Coward (unregistered) in reply to Otto
    Otto:

    Even if you, god forbid, are taking a list of IDs from some user input in some sick twisted fashion, you would be better off loading those records into a temp table and joining that with the table you're selecting from. Because if the column you're joining with is indexed, it'll be a lot faster than this, and if it's not indexed, well, it sure as hell can't be any slower.



    A couple of people have made this claim, and it is not obvious to me why using a temporary table would be faster.  I have two main reasons:

    First, to go the temporary table route, you have to create the table and invoke 10,000 INSERT statements.  That's a lot of db calls.  I guess you could do it in a stored procedure to reduce db calls, but how do you pass a variable number of arguments?

    Second, if that technique is really faster, wouldn't the programmers working for Oracle, etc. know this?  Why wouldn't they implement the db to do this under the hood?  Wouldn't that be the best approach: for the db vendors to implement the optimal solution for SQL constructs, saving customers from constantly having to implement it themselves in non-straightforward ways?  Perhaps I am being naive.

    I'm no db guru and I grant that it is very possible I am wrong, but I would like to be convinced.
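    On the first point: the round trips can be cut dramatically by batching many values into each INSERT, even without a stored procedure. A sqlite3 sketch (table name invented; the batch size is arbitrary, chosen to stay under typical per-statement parameter limits):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE ids (id INTEGER PRIMARY KEY)")

ids = list(range(10_000))

# One multi-row INSERT per batch instead of one statement per value
# cuts 10,000 round trips down to a couple of dozen.
BATCH = 500
for start in range(0, len(ids), BATCH):
    chunk = ids[start:start + BATCH]
    marks = ",".join("(?)" for _ in chunk)
    conn.execute(f"INSERT INTO ids (id) VALUES {marks}", chunk)

print(conn.execute("SELECT COUNT(*) FROM ids").fetchone()[0])  # 10000
```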



  • (cs) in reply to Pope
    Anonymous:

    Anyway, listen, I've already submitted to your "superior" knowledge, so I was really looking for an honest answer, not for someone to make me feel stupid.  My mother does a good enough job of that, thanks.  I'm sorry that you didn't get to sit at the cool table in grade school, but that's no reason to take your teen angst out on me.  I bet you're one of those developers who tells someone a complex chain of commands and then yells at them after they get it wrong.  *thumbs up*  Way to go.

    When you wrote "Please school me" at the end of your post, that kind of implied to me that you were ready to take some criticism.  And that *was* an honest answer.  I'm sorry I made you feel stupid.  (Unfortunately, you were also a bit wrong about my personality on both of your assumptions; I suppose this just hasn't been your day overall!) [:(]

  • (cs) in reply to Pope
    Pope:
    Jeff S:
    Pope:
    Rank Amateur:

    You see, this is why us crack C++ developers are smarter than anyone else. Only an idiot would work with something as brain-dead as SQL. I mean, didn't it occur to the geniuses that invented SQL that sometimes you want to just pull all records from the table without any criteria? Is it just too blindingly obvious that SQL should let you just leave out the Where block to accomplish this? But no, I get stuck with this so-called "language" and have to spend all day dividing a huge In list into manageable bites. And now every time they add a customer, I have to change this code to add a new Cust_Num. What a WTF.

    Well, at least the second step is sensible. That's where I use C++ to copy the query result into a linked list so I can loop through and sum the total orders, which is all they really want anyway.

    If SQL were a real language, it would have a Sum function.

    --Rank

    Excellent rant, except for one thing: you've obviously never used SQL before.

    This WTF makes me shudder... mainly because I actually programmed something like that in one of my apps directly after I learned about information_schema waaaaaay back.  "It auto-magically pulls everything from every table, guys!  Look!"  I was really after a way to abuse every resource available. 

    I think he was being sarcastic .... read it again.

    Caught again!  Damn my autism! 

    Heh, well, the history of this site shows you're in good company. If it's any consolation, you're right: I've never really used SQL. Everything I know about SQL would easily fit on an 8.5x11 piece of paper. It's just that the third thing on that paper would be "omit the WHERE clause to select all rows" (the crack about the SUM function was just based on a hunch -- I had to Google to see that it really existed).

    So remember kids, if the Rank Amateur sees the WTF, it's a real WTF!

    --RA, who's hardly a C++ wiz either.

  • Joel (unregistered) in reply to Jeff S

    "Joel -- the issue is the HUNDREDS of parameters you'd need to accomodate the large list of values passed in. Are you are one of the people that think

    SELECT ... FROM ... WHERE ID IN ( ? )"

    Absolutely not. But if you know that it's going to be desirable to run that select with many different values in the IN clause, then you're a lot better off deciding what a reasonable number of values is, writing something like:

    SELECT ... FROM ... WHERE ID IN (?, ?, ?, ..., ?)

    ... batching the values to fit the number of bind variables, and using a placeholder value or NULL if you don't have enough values to bind, than [deep breath] hard-coding values into the statement.

    Now, I'm not saying that's the only way to approach the problem. E.g., using a temp table, or a join with an intelligent restriction on another table might be a better way to go, depending on the circumstances.

    But using a hard-coded list of values instead of using bind variables is very rarely the right thing to do.
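    Joel's batching scheme can be sketched in Python/sqlite3 (the fixed placeholder count of 3 is arbitrary for illustration; NULL padding works because NULL never compares equal to anything, so the padding matches no rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(1, 8)])

# One prepared statement with a fixed number of placeholders, reused
# for every batch; the final short batch is padded with NULL.
SLOTS = 3
sql = f"SELECT id FROM t WHERE id IN ({','.join('?' * SLOTS)})"

wanted = [2, 3, 5, 7]
found = []
for start in range(0, len(wanted), SLOTS):
    batch = wanted[start:start + SLOTS]
    batch += [None] * (SLOTS - len(batch))  # pad the last batch
    found += [r[0] for r in conn.execute(sql, batch)]

print(sorted(found))  # [2, 3, 5, 7]
```

    Because the SQL text never changes, the engine can reuse one cached plan for every batch.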

  • A chicken passeth by (unregistered) in reply to Joel

    SELECT ... FROM ... WHERE ID IN (?, ?, ?, ..., ?)


    That's just dumb. I prefer to use "SELECT x FROM x WHERE x IN(*)", then:
    - Build a parameter string outside the DB structure (I used listboxes to get everything, and a for loop to actually construct this)
    - Split the SQL by the character '*'
    - Insert the parameter string at where the character '*' used to be.

    There is SQL... and then there is too much SQL when you can easily make the native language do most of the work for you. >_>
  • (cs) in reply to A chicken passeth by

    Anonymous:

    SELECT ... FROM ... WHERE ID IN (?, ?, ?, ..., ?)


    That's just dumb. I prefer to use "SELECT x FROM x WHERE x IN(*)", then:
    - Build a parameter string outside the DB structure (I used listboxes to get everything, and a for loop to actually construct this)
    - Split the SQL by the character '*'
    - Insert the parameter string at where the character '*' used to be.

    There is SQL... and then there is too much SQL when you can easily make the native language do most of the work for you. >_>

    His point was that you won't get a cached query plan using your technique.  That may or may not be relevant, depending on the frequency with which the statement gets called and how complex the statement is.

  • (cs)

    Won't the UNIONs allow the optimizer to split the query into multiple threads? Perhaps this was an attempt to get this beast of a query to run a bit more quickly.

  • Joel (unregistered) in reply to A chicken passeth by

    "That's just dumb. I prefer to use "SELECT x FROM x WHERE x IN(*)", then:

    • Build a parameter string outside the DB structure (I used listboxes to get everything, and a for loop to actually construct this)
    • Split the SQL by the character '*'
    • Insert the parameter string at where the character '*' used to be."

    Yes, and I prefer to shut users who do this out of the database.

    Seriously: while that may be okay for one-off, ad-hoc kinds of queries, it is a crappy way to work with a production database of any significance. If you don't understand why, then I suggest you read up on how (say) the query engine in Oracle parses, optimizes and caches query plans. The db is trying to help you, but when you do what you're suggesting -- hard-coding frequently-changing parameters into the SQL string -- you're subverting the database and ultimately hurting its performance. You're forcing it to reparse and re-optimize the query every time you run it. That's not free.

    Now, maybe you're only running a few hundred queries a day, in which case it's no big deal. We run tens if not hundreds of millions. You don't have to get anywhere near that scale to see performance benefits when you use bind variables. When you're at that scale, not using bind variables will kill your database.

  • (cs) in reply to Joel

    Anonymous:
    Without bind variables, the smallest change to the parameters will force the db to reparse and re-optimize the query every time it's executed. At best, it's extra performance overhead that's easily avoided. At worst, it can drag a db server to its knees, as it attempts to cache many nearly identical query plans, eating up resources, forcing other queries out of the cache, etc. If someone tried crap like this on the dbs I'm responsible for, they'd find their modules locked out immediately.

    Not So - from SQL Server Books Online:

    If an SQL statement is executed without parameters, SQL Server 2000 parameterizes the statement internally to increase the possibility of matching it against an existing execution plan.

    Consider this statement:

    SELECT * FROM Northwind.dbo.Products WHERE CategoryID = 1
    

    The value 1 at the end of the statement can be specified as a parameter. The relational engine builds the execution plan for this batch as if a parameter had been specified in place of the value 1. Because of this auto-parameterization, SQL Server 2000 recognizes that the following two statements generate essentially the same execution plan and reuses the first plan for the second statement:

    SELECT * FROM Northwind.dbo.Products WHERE CategoryID = 1
    

    SELECT * FROM Northwind.dbo.Products WHERE CategoryID = 4

    M$ decided they'd better do that when Oracle started doing it.

    Which is more efficient - ORs or In()

    They optimise to the same query plan.

  • Pope (unregistered) in reply to Jeff S
    Jeff S:
    Anonymous:

    Anyway, listen, I've already submitted to your "superior" knowledge, so I was really looking for an honest answer, not for someone to make me feel stupid.  My mother does a good enough job of that, thanks.  I'm sorry that you didn't get to sit at the cool table in grade school, but that's no reason to take your teen angst out on me.  I bet you're one of those developers who tells someone a complex chain of commands and then yells at them after they get it wrong.  *thumbs up*  Way to go.

    When you wrote "Please school me" at the end of your post, that kind of implied to me that you were ready to take some criticism.  And that *was* an honest answer.  I'm sorry I made you feel stupid.  (Unfortunately, you were also a bit wrong about my personality on both of your assumptions; I suppose this just hasn't been your day overall!) [:(]

    Bah, I've gone and made myself feel bad.  Sorry for being... well, me.  I can take criticism... I was just having fun.  Ever heard of projection?  lol.  Yeah... *sniff*

    Anyway, to get back to your other post, and my original question: yes, their system is really set up so that they would spend 5 hours picking through some 10,000 stores, build a report and save the report.  Hurray for homegrown help!  I talked to the guy again today and he said that even though the reports are saved, they still have to loop through all the stores and set up the whole query when the reports are built.  There's just no way around it right now, since they acquiesce to every single little project a client asks for and have NO time for updating old algorithms (3 web developers and over 100 clients that take a remarkable number of "calls" every day - seriously, I think the owner has a phobia of succeeding).  Compiling scores, even nightly, isn't an option now with old clients because they're used to up-to-the-minute data.  AAAAANND... to make it worse, ONE, count 'em, ONE damn server carries the whole load.  This system deserves a standing ovation for still being able to process anything.  But I digress...

    I was wondering, out of sheer curiosity: would it be faster for these poor guys to create a query with a ton of IN parameters (StoreId IN ('1', '2', ...)) or a ton of ORs (StoreId = '1' OR StoreId = '2'...)?  It may not have any value in the "real world," but I'm still curious which is less of a strain on SQL.

  • Pope (unregistered) in reply to BJReplay
    BJReplay:

    Which is more efficient - ORs or In()

    They optimise to the same query plan.

    Thank you.

  • Joel (unregistered) in reply to BJReplay

    "Not So - from SQL Server Books Online:

    If an SQL statement is executed without parameters, SQL Server 2000 parameterizes the statement internally to increase the possibility of matching it against an existing execution plan."

    "M$ decided they'd better do that when Oracle started doing it."

    Really? Do you know the Oracle version/patch, and if there are any required configuration settings to enable it?

    This certainly isn't the behavior I've seen.

  • (cs)

    There is a tool for Borland Delphi called Bold (now called MDA since Borland bought it and integrated it) that does this.  It is an object-persistence tool that loads all the objects the user works with into memory, where they can then be queried with a language called OCL.  While OCL is quite powerful and can sometimes be mapped directly to SQL, it often uses functions that must be evaluated in memory.  The result is that if one is not careful, an OCL query can load the contents of entire tables into objects just to evaluate the query.  When it needs to retrieve a list of objects, it frequently uses huge 'IN' queries.

    It's a cool tool, but it does make it easy to shoot oneself in the foot.

  • (cs)

    This clearly has the potential of overworking the SQL engine.  Oddly, at the same time it can be highly efficient, depending on the number of conditions and the speed of the comparison.  In this example the comparison is 6 bytes of text, with a tempting pattern of 2 numeric bytes followed by a char byte followed by 3 numeric bytes.  That would demonstrate a pattern..

    OMG!  It's a dynamic SQL block cipher.

  • (cs) in reply to Rank Amateur
    Rank Amateur:

    Heh, well, the history of this site shows you're in good company. If it's any consolation, you're right: I've never really used SQL. Everything I know about SQL would easily fit on an 8.5x11 piece of paper. It's just that the third thing on that paper would be "omit the WHERE clause to select all rows" (the crack about the SUM function was just based on a hunch -- I had to Google to see that it really existed).

    So remember kids, if the Rank Amateur sees the WTF, it's a real WTF!

    --RA, who's hardly a C++ wiz either.


    I hereby propose the Rank Amateur Line: if Rank Amateur would not commit that WTF in SQL, then it is no real WTF after all.

  • (cs) in reply to Anonymous Coward

    Anonymous:

    First, to go the temporary table route, you have to create the table and invoke 10,000 INSERT statements.  That's a lot of db calls.  I guess you could do it in a stored procedure to reduce db calls, but how do you pass a variable number of arguments?

    Actually, I have a fairly complex report for one program, which does processing best left in a stored procedure.  Rather than make 1000 stored procedures, I use a client-side scripting language to build the stored procedure, and THAT is what I send to the server (drop/create procedure) - each user has their own stored procedure.  I imagine a similar process would work here: 10 or 50 stored-procedure updates containing multiple inserts, with a call to the stored procedure between each block, would greatly reduce the client/server chatter and allow you to do the inserts server-side.  It could definitely work.  Heck, depending on your needs and server limits, perhaps even just the one stored procedure.


    Anonymous:


    Second, if that technique is really faster, wouldn't the programmers working for Oracle, etc. know this?  Why wouldn't they implement the db to do this under the hood?  Wouldn't the best approach be for the db vendors to implement the optimal solution for SQL constructs, saving customers from constantly having to implement it themselves in non-straightforward ways?  Perhaps I am being naive.

    DB designers program for the generic situation, not specialized cases.  And Oracle already has a bad enough reputation that you have to wonder whether their developers are the top of their industry.

    Regardless, 10,000 inserts would perform significantly better than this mess.  Want to be convinced?  Easy: try seeing how well the IN ( set ) operator actually works, and especially how well the UNION operator works.  Usually, those are the two worst-implemented calls.  Besides which, each SELECT statement in the union will require rescanning the table, again usually NOT implemented efficiently.

    This mess is CLEARLY written by someone who has had less than zero training in SQL and just used what they could figure out from crappy documents and bad examples (actually, it resembles some of the amateur code I have seen for MySQL and PHP, but there you at least understand why it is badly written)... Attempts at a join likely resulted in Cartesian joins, causing the guy to give up and use something that worked instead of something that killed the db server every time he tried it.

  • Oracle Fan Boy (unregistered) in reply to Joel
    Anonymous:
    "Not So - from SQL Server Books Online:

    If an SQL statement is executed without parameters, SQL Server 2000 parameterizes the statement internally to increase the possibility of matching it against an existing execution plan."

    "M$ decided they'd better do that when Oracle started doing it."

    Really? Do you know the Oracle version/patch, and if there are any required configuration settings to enable it?

    This certainly isn't the behavior I've seen.

    Oracle supports this feature: ALTER SESSION/SYSTEM SET cursor_sharing = force

    It's disabled by default, as it's a crutch for shit software

    see http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:992630543630 for more info
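    For what it's worth, the client-side pattern several posters allude to -- one bind placeholder per value, generated at runtime -- can be sketched like this. This is an illustrative sketch only (the table, columns, and data are made up, and sqlite3 stands in for whatever RDBMS/driver you actually use); the point is that the values travel as bound parameters, never as literals pasted into the SQL text:

    ```python
    import sqlite3

    # Hypothetical schema and data, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ordr (ordr_num INTEGER, cust_num TEXT)")
    conn.executemany("INSERT INTO ordr VALUES (?, ?)",
                     [(1, "110019"), (2, "110015"), (3, "999999")])

    cust_nums = ["110019", "110015", "110013"]  # known only at runtime

    # One '?' placeholder per value: the driver binds each element
    # separately, so the SQL text never contains the data itself.
    placeholders = ",".join("?" for _ in cust_nums)
    sql = "SELECT ordr_num FROM ordr WHERE cust_num IN (%s)" % placeholders
    rows = conn.execute(sql, cust_nums).fetchall()
    print(sorted(r[0] for r in rows))  # [1, 2]
    ```

    Note the caveat raised elsewhere in the thread: the statement *shape* still varies with the list length, so plan-cache reuse only kicks in for lists of the same size; for truly huge lists the temp-table approach discussed below in the thread avoids that too.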

  • Frank (unregistered) in reply to Rank Amateur
    Rank Amateur:

    You see, this is why us crack C++ developers are smarter than anyone else. Only an idiot would work with something as brain-dead SQL.

    ...

    If SQL were a real language, it would have a Sum function.

    Good joke, man. Ever had *any* exposure to SQL or still caught in the maze of pointers?

  • (cs) in reply to Joel
    Anonymous:
    "Not So - from SQL Server Books Online: If an SQL statement is executed without parameters, SQL Server 2000 parameterizes the statement internally to increase the possibility of matching it against an existing execution plan." "M$ decided they'd better do that when Oracle started doing it." Really? Do you know the Oracle version/patch, and if there are any required configuration settings to enable it? This certainly isn't the behavior I've seen.


    The feature is called "cursor_sharing", IIRC it was introduced in Oracle8i EE.
  • Justin. (unregistered) in reply to Otto
    Otto:

    There is never a good reason for this sort of code. Ever.

    Even if you, god forbid, are taking a list of IDs from some user input in some sick twisted fashion, you would be better off loading those records into a temp table and joining that with the table you're selecting from. Because if the column you're joining with is indexed, it'll be a lot faster than this, and if it's not indexed, well, it sure as hell can't be any slower.



    Unless there are so many values that a hash join becomes effective (I'd put that at around the 10,000 mark), you're crazy.

    Otherwise it'll be an ordinary join, so you're proposing either a lot of INSERT statements (very bad), or a bulk insert followed by a table read to get the values back, followed by the same indexed query you would have had in the first place!

    Justin.
  • (cs) in reply to Pope
    Anonymous:
    Anyway, to get back to your other post, and my original question: Yes, their system is really set up to where they would spend 5 hours picking through some 10,000 stores, build a report and save the report. 


    But by what criteria are the stores selected from that list? If it's not done on complete whim, it MUST be possible to store those criteria in the DB (if they aren't already there) and use them in the query.

    It's a business-process WTF. It's just fundamentally wrong to pay someone to sit down for 5 hours and pick 500 entries from a list of 10,000: if not because you pay 5 hours of salary and wait 5 hours for something a computer can do in a fraction of a second, then because a human will almost certainly make mistakes in such a boring task.

    If I ever saw a good reason to call in a $2000-a-day consultant to do business analysis, this is it. The sad thing is that any half-competent developer could tell them the same for free, but management won't believe it unless they've paid $$$ to a consultant.
  • (cs) in reply to brazzy
    brazzy:
    Anonymous:
    Anyway, to get back to your other post, and my original question: Yes, their system is really set up to where they would spend 5 hours picking through some 10,000 stores, build a report and save the report. 


    But by what criteria are the stores selected from that list? If it's not done on complete whim, it MUST be possible to store those criteria in the DB (if they aren't already there) and use them in the query.

    It's a business-process WTF. It's just fundamentally wrong to pay someone to sit down for 5 hours and pick 500 entries from a list of 10,000: if not because you pay 5 hours of salary and wait 5 hours for something a computer can do in a fraction of a second, then because a human will almost certainly make mistakes in such a boring task.

    If I ever saw a good reason to call in a $2000-a-day consultant to do business analysis, this is it. The sad thing is that any half-competent developer could tell them the same for free, but management won't believe it unless they've paid $$$ to a consultant.


    It's perfectly possible that a manager has a jealously guarded Excel file which contains the 500 magic entries, as well as some well-thought-out formulas for how to select them; and it would be more than difficult for the programmer to convince the manager to reveal how the selection is done, since those formulas are a strategic advantage in the rivalry within the company.
  • Justin. (unregistered) in reply to JohnO
    JohnO:
    rogthefrog:
    JohnO:

    In MS SQL, I prefer to use WHERE CHARINDEX('|' + CONVERT(varchar, ColumnName) + '|', @list) > 0 where @list is varchar(8000).  You would pass @list with values like '|100|266|174|' or '|TX|AK|MO|'.  This gets around the temp table issues and the in limitations.



    Tell me that was a joke. PLEASE.

    Nope, no joke.  If you have the requirement that you only interface with the database via stored procedures, need to provide multi-value filtering for many criteria, and want to avoid the problems associated with creating a temp table in code that is externally referenced by a stored procedure, this works quite well. 



    Argggh!  That is a horrible (and terribly inefficient) way of doing this.  Build an API that takes an array of values!  (Possibly MS SQL doesn't allow this?  I'm an Oracle guy).

    This technique guarantees a totally unnecessary full table scan.  I wouldn't use it on any table with more than a few hundred, maybe a thousand, rows.
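    To make the objection concrete, here is a rough sketch of JohnO's delimited-string trick outside MS SQL (hypothetical table and data; sqlite's instr() standing in for CHARINDEX, || for +). It does work, but because the predicate wraps the *column* in string concatenation, no index on that column can be used and every row must be scanned:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, state TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, "TX"), (2, "AK"), (3, "CA"), (4, "MO")])

    # The pipe-delimited list parameter, as in JohnO's example.
    state_list = "|TX|AK|MO|"

    # instr() returns the 1-based position of the substring, or 0 if
    # absent -- a rough stand-in for CHARINDEX.  The '|' delimiters
    # prevent 'TX' from matching a hypothetical value like 'XTX1'.
    rows = conn.execute(
        "SELECT id FROM orders WHERE instr(?, '|' || state || '|') > 0",
        (state_list,),
    ).fetchall()
    print(sorted(r[0] for r in rows))  # [1, 2, 4]
    ```

    That full-scan cost is exactly Justin's point: fine for a few hundred rows, painful beyond that.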
  • NoInLists (unregistered) in reply to Anonymous Coward
    Anonymous:
    Otto:

    Even if you, god forbid, are taking a list of IDs from some user input in some sick twisted fashion, you would be better off loading those records into a temp table and joining that with the table you're selecting from. Because if the column you're joining with is indexed, it'll be a lot faster than this, and if it's not indexed, well, it sure as hell can't be any slower.



    A couple of people have made this claim, and it is not obvious to me why using a temporary table would be faster.  I have two main reasons:

    First, to go the temporary table route, you have to create the table and invoke 10,000 INSERT statements.  That's a lot of db calls.  I guess you could do it in a stored procedure to reduce db calls, but how do you pass a variable number of arguments?

    Second, if that technique is really faster, wouldn't the programmers working for Oracle, etc. know this?  Why wouldn't they implement the db to do this under the hood?  Wouldn't the best approach be for the db vendors to implement the optimal solution for SQL constructs, saving customers from constantly having to implement it themselves in non-straightforward ways?  Perhaps I am being naive.

    I'm no db guru and I grant that it is very possible I am wrong, but I would like to be convinced.


    Because table joins are the one thing you can be sure is optimized to the max on any database. Some databases have a limit on IN-list length far, far lower than 10,000. Some databases will quit using indexes on long IN lists. IN lists will trash your query-plan cache. Some databases will internally create temp tables for IN lists anyway. So with an IN list you face many unknowns and limits, while with temp tables there are no unknowns and no limits.
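    The temp-table approach being debated can be sketched as follows (illustrative names and data, sqlite via Python; in a networked RDBMS you would use the driver's bulk or array insert rather than row-at-a-time statements). The list is bulk-loaded into a keyed temp table, and the query becomes an ordinary equi-join that the planner can drive off the index, regardless of how long the list is:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (id INTEGER, store TEXT)")
    conn.execute("CREATE INDEX idx_calls_store ON calls(store)")
    # 100 fake call records spread across five stores S0..S4.
    conn.executemany("INSERT INTO calls VALUES (?, ?)",
                     [(n, "S%d" % (n % 5)) for n in range(100)])

    wanted = ["S1", "S3"]  # the runtime-selected store list

    # Load the list into a temp table; executemany batches the inserts
    # instead of issuing one round trip per value.
    conn.execute("CREATE TEMP TABLE wanted_stores (store TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO wanted_stores VALUES (?)",
                     [(s,) for s in wanted])

    # A plain join -- the optimizer can use the index on calls.store,
    # and there is no IN-list length limit to worry about.
    rows = conn.execute(
        "SELECT c.id FROM calls c JOIN wanted_stores w ON c.store = w.store"
    ).fetchall()
    print(len(rows))  # 40 (20 calls per store, two stores selected)
    ```

    Whether this actually beats a bound IN list depends on list length and the specific engine, as the posters above disagree about; the structural advantage is simply that a join has no list-length ceiling and a stable query plan.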
  • (cs)

    SELECT ORDR_NUM, ORDR_TTL, TAX_CD
      FROM ORDR
     WHERE CUST_NUM IN (
        SELECT '110019' FROM DUAL
        UNION ALL
        SELECT '110015' FROM DUAL
        UNION ALL
        SELECT '110013' FROM DUAL
        UNION ALL
        SELECT '110011' FROM DUAL
        UNION ALL
        SELECT '110017' FROM DUAL
        UNION ALL
        SELECT '110015' FROM DUAL
        UNION ALL
        SELECT '110013' FROM DUAL
        UNION ALL
        SELECT '110018' FROM DUAL
        UNION ALL
        SELECT '110016' FROM DUAL
        UNION ALL
        SELECT '110014' FROM DUAL
        UNION ALL
        SELECT '110012' FROM DUAL
        UNION ALL
        SELECT '120015' FROM DUAL
        UNION ALL
        SELECT '120016' FROM DUAL
        UNION ALL
        SELECT '150019' FROM DUAL
        UNION ALL
        SELECT '150010' FROM DUAL
        UNION ALL
        SELECT '150012' FROM DUAL
        UNION ALL
        SELECT '150018' FROM DUAL
        UNION ALL
        SELECT '150016' FROM DUAL
        UNION ALL
        SELECT '150017' FROM DUAL
        UNION ALL
        SELECT '150011' FROM DUAL
        UNION ALL
        SELECT '150017' FROM DUAL
        UNION ALL
        SELECT '150015' FROM DUAL
        UNION ALL
        SELECT '210017' FROM DUAL
        UNION ALL
        SELECT '210013' FROM DUAL
        UNION ALL
        SELECT '210011' FROM DUAL
        UNION ALL
        SELECT '21V000' FROM DUAL
        UNION ALL
        SELECT '22K018' FROM DUAL
        UNION ALL
        SELECT '22W010' FROM DUAL
        UNION ALL
        SELECT '22W018' FROM DUAL
        UNION ALL
        SELECT '230057' FROM DUAL
        UNION ALL
        SELECT '230237' FROM DUAL
        UNION ALL
        SELECT '230502' FROM DUAL
        UNION ALL
        SELECT '230311' FROM DUAL
        UNION ALL
        SELECT '230314' FROM DUAL
        UNION ALL
        SELECT '230318' FROM DUAL
        UNION ALL
        SELECT '230357' FROM DUAL
        UNION ALL
        SELECT '41K020' FROM DUAL
        UNION ALL
        SELECT '41K016' FROM DUAL
        UNION ALL
        SELECT '41K018' FROM DUAL
        UNION ALL
        SELECT '41K011' FROM DUAL
        UNION ALL
        SELECT '41K012' FROM DUAL
        UNION ALL
        SELECT '41K018' FROM DUAL
        UNION ALL
        SELECT '41K019' FROM DUAL
        UNION ALL
        SELECT '41K015' FROM DUAL
        UNION ALL
        SELECT '41K015' FROM DUAL
        UNION ALL
        SELECT '41K018' FROM DUAL
        UNION ALL
        SELECT '41K016' FROM DUAL
        UNION ALL
        SELECT '41K014' FROM DUAL
        UNION ALL
        SELECT '41K012' FROM DUAL
        UNION ALL
        SELECT '41M012' FROM DUAL
        UNION ALL
        SELECT '41M014' FROM DUAL
        UNION ALL
        SELECT '41M013' FROM DUAL
        UNION ALL
        SELECT '41M018' FROM DUAL
        UNION ALL
        SELECT '41M013' FROM DUAL
        UNION ALL
        SELECT '41M015' FROM DUAL
        UNION ALL
        SELECT '41N014' FROM DUAL
        UNION ALL
        SELECT '41S006' FROM DUAL
        UNION ALL
        SELECT '41T003' FROM DUAL
        UNION ALL
        SELECT '41T001' FROM DUAL
        UNION ALL
        SELECT '41T001' FROM DUAL
        UNION ALL
        SELECT '41T012' FROM DUAL
        UNION ALL
        SELECT '41T010' FROM DUAL
        UNION ALL
        SELECT '41T018' FROM DUAL
        UNION ALL
        SELECT '41T015' FROM DUAL
        UNION ALL
        SELECT '41V003' FROM DUAL
        UNION ALL
        SELECT '41V002' FROM DUAL
        UNION ALL
        SELECT '610016' FROM DUAL
        UNION ALL
        SELECT '610012' FROM DUAL
        UNION ALL
        SELECT '610010' FROM DUAL
        UNION ALL
        SELECT '610013' FROM DUAL
        UNION ALL
        SELECT '610024' FROM DUAL
        UNION ALL
        SELECT '610032' FROM DUAL
        UNION ALL
        SELECT '610040' FROM DUAL
        UNION ALL
        SELECT '610019' FROM DUAL
        UNION ALL
        SELECT '610010' FROM DUAL
        UNION ALL
        SELECT '610017' FROM DUAL
        UNION ALL
        SELECT '610010' FROM DUAL
        UNION ALL
        SELECT '610019' FROM DUAL
        UNION ALL
        SELECT '610012' FROM DUAL
        UNION ALL
        SELECT '610010' FROM DUAL
        UNION ALL
        SELECT '610018' FROM DUAL
        UNION ALL
        SELECT '610018' FROM DUAL
        UNION ALL
        SELECT '610016' FROM DUAL
        UNION ALL
        SELECT '610017' FROM DUAL
        UNION ALL
        SELECT '611011' FROM DUAL
        UNION ALL
        SELECT '611012' FROM DUAL
        UNION ALL
        SELECT '612011' FROM DUAL
        UNION ALL
        SELECT '612019' FROM DUAL
        UNION ALL
        SELECT '68E015' FROM DUAL
        UNION ALL
        SELECT '68K000' FROM DUAL
        UNION ALL
        SELECT '68K012' FROM DUAL
        UNION ALL
        SELECT '68K019' FROM DUAL
        UNION ALL
        SELECT '68K016' FROM DUAL
        UNION ALL
        SELECT '68K012' FROM DUAL
        UNION ALL
        SELECT '68K018' FROM DUAL
        UNION ALL
        SELECT '68K014' FROM DUAL
        UNION ALL
        SELECT '68K012' FROM DUAL
        UNION ALL
        SELECT '68K010' FROM DUAL
        UNION ALL
        SELECT '68K019' FROM DUAL
        UNION ALL
        SELECT '68N019' FROM DUAL
        UNION ALL
        SELECT '68T010' FROM DUAL
        UNION ALL
        SELECT '68U015' FROM DUAL
        UNION ALL
        SELECT '68V021' FROM DUAL
        UNION ALL
        SELECT '68V039' FROM DUAL
        UNION ALL
        SELECT '68V047' FROM DUAL
        UNION ALL
        SELECT '68V054' FROM DUAL
        UNION ALL
        SELECT '68V062' FROM DUAL
        UNION ALL
        SELECT '68V070' FROM DUAL
        UNION ALL
        SELECT '68V088' FROM DUAL
        UNION ALL
        SELECT '68V096' FROM DUAL
        UNION ALL
        SELECT '68V104' FROM DUAL
        UNION ALL
        SELECT '68V112' FROM DUAL
        UNION ALL
        SELECT '68V120' FROM DUAL
        UNION ALL
        SELECT '68V138' FROM DUAL
        UNION ALL
        SELECT '68V146' FROM DUAL
        UNION ALL
        SELECT '68V153' FROM DUAL
        UNION ALL
        SELECT '68V017' FROM DUAL
        UNION ALL
        SELECT '68V018' FROM DUAL
        UNION ALL
        SELECT '68V011' FROM DUAL
        UNION ALL
        SELECT '68V016' FROM DUAL
        UNION ALL
        SELECT '920001' FROM DUAL
        UNION ALL
        SELECT '920002' FROM DUAL
        UNION ALL
        SELECT '920003' FROM DUAL
        UNION ALL
        SELECT '920000' FROM DUAL
        UNION ALL
        SELECT '920008' FROM DUAL
        UNION ALL
        SELECT '920007' FROM DUAL
        UNION ALL
        SELECT '920015' FROM DUAL
        UNION ALL
        SELECT '920010' FROM DUAL
        UNION ALL
        SELECT '920012' FROM DUAL
        UNION ALL
        SELECT '920018' FROM DUAL
        UNION ALL
        SELECT '920013' FROM DUAL
        UNION ALL
        SELECT '920019' FROM DUAL
        UNION ALL
        SELECT '920019' FROM DUAL
        UNION ALL
        SELECT '920012' FROM DUAL
        UNION ALL
        SELECT '920013' FROM DUAL
        UNION ALL
        SELECT '920012' FROM DUAL
          );

    Snicker Snicker

  • (cs) in reply to ammoQ
    ammoQ:

    It's perfectly possible that a manager has a jealously guarded Excel file which contains the 500 magic entries, as well as some well-thought-out formulas for how to select them; and it would be more than difficult for the programmer to convince the manager to reveal how the selection is done, since those formulas are a strategic advantage in the rivalry within the company.


    In which case it's seriously time to do some analysis, or even break out your books on machine learning and try a Bayesian classifier, neural network, or genetic programming to figure out the criteria, and then make sure his manager or a rival manager overhears you talking about how you've "automated that simple little report" that this guy claims in his status reports takes up 50% of his time... :)

    Vidar

  • (cs) in reply to vhokstad
    vhokstad:
    ammoQ:

    It's perfectly possible that a manager has a jealously guarded Excel file which contains the 500 magic entries, as well as some well-thought-out formulas for how to select them; and it would be more than difficult for the programmer to convince the manager to reveal how the selection is done, since those formulas are a strategic advantage in the rivalry within the company.


    In which case it's seriously time to do some analysis, or even break out your books on machine learning and try a Bayesian classifier, neural network, or genetic programming to figure out the criteria, and then make sure his manager or a rival manager overhears you talking about how you've "automated that simple little report" that this guy claims in his status reports takes up 50% of his time... :)

    Vidar



    You ignore the fact that at least 10% of these 500 entries are determined by black-magic rituals, ancient lore, voodoo, sudden inspirations, or TheWordOfGod(tm). Those formulas only determine candidates for the list of 500; the manager insists on doing the final selection himself.
  • (cs) in reply to RevMike
    RevMike:
    SELECT ORDR_NUM, ORDR_TTL, TAX_CD
      FROM ORDR
     WHERE CUST_NUM IN (
        SELECT '110019' FROM DUAL
        UNION ALL
        SELECT '110015' FROM DUAL
        UNION ALL
    snip
        UNION ALL

        SELECT '110013' FROM DUAL

        UNION ALL
        SELECT '920012' FROM DUAL
          );

    Snicker Snicker


    How I wish I could laugh about that.
  • (cs) in reply to ammoQ
    ammoQ:
    It's perfectly possible that a manager has a jealously guarded Excel file which contains the 500 magic entries, as well as some well-thought-out formulas for how to select them; and it would be more than difficult for the programmer to convince the manager to reveal how the selection is done, since those formulas are a strategic advantage in the rivalry within the company.


    Which makes the hiring of a $2000-a-day consultant from a big-name consulting company, whom upper management will actually believe when he says this is a bad thing, the strategic countermeasure for that guy's rivals.

    I think we are in agreement.

  • (cs) in reply to brazzy
    brazzy:
    ammoQ:
    It's perfectly possible that a manager has a jealously guarded Excel file which contains the 500 magic entries, as well as some well-thought-out formulas for how to select them; and it would be more than difficult for the programmer to convince the manager to reveal how the selection is done, since those formulas are a strategic advantage in the rivalry within the company.

    Which makes the hiring of a $2000-a-day consultant from a big-name consulting company, whom upper management will actually believe when he says this is a bad thing, the strategic countermeasure for that guy's rivals.

    Managers in the middle tier know they can be more easily replaced or removed if their "knowledge" (or voodoo magic) is generally available. Anyway, the rival managers won't do anything about it, because they most likely have their own magic Excel files, too.
  • (cs) in reply to Frank
    Frank:
    Rank Amateur:

    You see, this is why us crack C++ developers are smarter than anyone else. Only an idiot would work with something as brain-dead SQL.

    ...

    If SQL were a real language, it would have a Sum function.

    Good joke, man. Ever had *any* exposure to SQL or still caught in the maze of pointers?

    I'm a SQL expert. Know it inside and out. I must have seen "The Empire Strikes Back" 16 times alone.

    --Rank

  • Snack (unregistered) in reply to ammoQ

    That's fine too.  Just instruct him that he needs to make a hard copy and get out his trusty highlighter.  Some things just shouldn't be automated.

Leave a comment on “Everything ... and I mean Everything”
