• (cs) in reply to SteveM

    And in addition, has anyone ever checked how many of those comments come from registered users and how many are anonymous?

  • Anon (unregistered) in reply to SteveM
    SteveM:
    Has anyone done any analysis on the average time it takes a DailyWTF thread to move from 'this particular piece of code is a WTF' to 'Microsoft is evil?'

    Or perhaps how long it takes for a thread to move into a language war?

  • (cs) in reply to jzlondon

    There was one not so long ago written in Perl.  There have been plenty in C++ (I don't remember whether it was VC++ or otherwise).  SQL isn't strictly an MS product, they just make one of the DB engines (that is to say, I could do this exact same WTF in DB2 or Oracle or MySQL if I wanted to).

    So, rather than "shoot the tool maker", it's a case of "randomly shoot someone who may or may not have had anything to do with making the tool" or possibly "just shoot MS, even if they're completely innocent".

  • (cs) in reply to Anon

    I think it's been said before on the forum, but I'll stress it, just in case: M$ development tools make it very easy for clueless people to become DBAs, programmers and so on. Even when the tools themselves are ok, those using them don't do it right, because they lack a minimum of formal training. And that's because with those tools, they can afford it.

    Now, honestly, the Redmond guys aren't the only ones making programming too easy. Think Javascript. But that was an accident. It's the deliberate promotion of mediocrity that I criticise.

    Oh, and I think the last posts belong to another thread, in another section.

  • (cs) in reply to Anon

    Performance of a query is 100% unrelated to the number of lines in a query. I can write 2 line queries that have horrible performance. And I can write 2000 line queries that produce results in a matter of milliseconds.

    Also, using "in ()" or "inner join" to filter out data performs the same in any modern database because the query optimizer will work out the same query plan for both constructions.

    This query is not slow due to the way it is written, although the construction is horrible.
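    The claim above (IN versus INNER JOIN returning the same rows, with the optimizer free to pick the plan) can be sketched in miniature. This is a hedged illustration using sqlite3 as a stand-in engine with invented mini-tables; whether the two forms really get identical plans depends on the engine and version, which is exactly what EXPLAIN-style output lets you check.

    ```python
    import sqlite3

    # Stand-in schema loosely modeled on the article's table names.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sms_r_system (resource_id INTEGER PRIMARY KEY);
        CREATE TABLE sms_g_system_system (resource_id INTEGER, name TEXT);
        INSERT INTO sms_r_system VALUES (1), (2), (3);
        INSERT INTO sms_g_system_system VALUES (1, 'A'), (2, 'B'), (4, 'D');
    """)

    # Same filter expressed two ways: IN (subquery) vs. an inner join.
    in_form = """
        SELECT name FROM sms_g_system_system
        WHERE resource_id IN (SELECT resource_id FROM sms_r_system)
    """
    join_form = """
        SELECT DISTINCT g.name
        FROM sms_g_system_system g
        JOIN sms_r_system r ON r.resource_id = g.resource_id
    """

    # Both forms return the same rows...
    assert sorted(conn.execute(in_form)) == sorted(conn.execute(join_form))

    # ...and EXPLAIN QUERY PLAN shows how the engine actually runs each one.
    for label, q in (("IN", in_form), ("JOIN", join_form)):
        for row in conn.execute("EXPLAIN QUERY PLAN " + q):
            print(label, row)
    ```

    On SQL Server the equivalent check is comparing the estimated execution plans of the two statements.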

     

  • (cs) in reply to mizhi
    mizhi:
    Anonymous:

    I beg to differ.  I see much WTF-ery in Java, Perl and tools like Ant pretty regularly.  Programming languages don't kill applications...$hitty programmers kill applications.  Maybe I'll port "theDaily" to Java.


    Put another way, any language that is Turing complete can be used to create a WTF. [:)]



    With some of the SQL "code" I've seen recently at this site, I guess the language doesn't even have to be Turing complete for people to make an absolute mess of it  :-)

  • (cs) in reply to askeeto
    askeeto:

    Performance of a query is 100% unrelated to the number of lines in a query.

    Oh? Has it occurred to you that a longer query is inherently more complex? And more error-prone because of sheer code size? Think again, Mr. Expert :-)

    askeeto:

    Also, using "in ()" or "inner join" to filter out data performs the same in any modern database because the query optimizer will work out the same query plan for both constructions.

    Let me guess: you have studied the output of EXPLAIN on all modern database servers :D

    askeeto:

    This query is not slow due to the way it is written, although the construction is horrible.

    You could be right in this particular case. But I wouldn't generalize.

  • Guest (unregistered)

    I think it's pretty clear that the original developer was paid per line of code.  Heh.  They multiplied their pay for that query!

    /fears days when I find contributions for The Daily WTF

  • Dunbar (unregistered) in reply to felix

    Eh?

    I think you do not understand: just because you wrap your SQL statement in a view, it will not perform any faster. It is the same SQL query; you just access it more conveniently through a view name. A messed-up SQL select will perform equally poorly when you create a view from it.

    What I really find amusing, as long as I don't have to clean it up, is when people create a bunch of views, and then start joining them together... 'But it looks so simple, all I'm doing is joining these three views together, why is it taking such a long time to execute?'... Priceless.


  • boohiss (unregistered)
    Alex Papadimoulis:
    Or, most likely, it was SQL Server, which everyone knows is a really slow database.

    I don't know if you were being sarcastic here or not, but I didn't realize "everyone knows" that SQL Server is slow.

    In fact, in my experience, it is comparably fast compared to other fine databases such as MySQL, Oracle, and AS/400, and also has a ton more features (Views, Stored Procedures, DTS).

    I have a feeling this is either sarcasm that I'm missing, or more senseless MS bashing.

  • (cs)

    Gag.. Cough.. This is awful... I can think of two different ways to accomplish this without giving much thought at all...

     

  • (cs) in reply to felix
    felix:
    askeeto:

    Performance of a query is 100% unrelated to the number of lines in a query.

    Oh? Has it occurred to you that a longer query is inherently more complex? And more error-prone because of sheer code size? Think again, Mr. Expert :-)



    What askeeto means is that the number of cases in which the performance of a query and its length or complexity are not inversely related is significant, perhaps even close to 50%, as he showed with his two pseudo-examples; and I assume he assumes that the theoretical query is error-free.

    But I claim 100% truth in 0% of the cases.

    Which is an inherently false statement.

    Still, comparing each value to 22 10-char strings doesn't strike me as terribly efficient.
  • Wuffie (unregistered)

    I'd bet that this Query had been clicked together by some visual SQL Admin tool...

  • (cs) in reply to boohiss

    I don't know if you were being sarcastic here or not, but I didn't realize "everyone knows" that SQL Server is slow ... I have a feeling this is either sarcasm that I'm missing, or more senseless MS bashing.

    Your sarcasm meter needs repair. I actually tried to put a "censor filter" to replace M$ and Micro$oft with humorous replacement text linking to that classic Penny Arcade comic (which I now cannot find ...). Naturally, this worthless forums software doesn't like "$" symbols in censor words.

  • Hank Miller (unregistered) in reply to mizhi
    mizhi:

    I think one trick engineers use is to tell people that an elevator can handle only 10 people and design it to work with 20.  [:)]


    No, telling people it can only handle 10 isn't enough.  They design it so physically you can't fit more than 10 live (you can pack a lot of dead people in an elevator I suppose, but that generally isn't something that people do) people in the little box.   Then they make it able to handle 20.
  • (cs) in reply to Hank Miller
    you can pack a lot of dead people in an elevator


    :)
    This has got to be made into some kind of quote, or expression to use in a certain situation.

    eg.

    "This sucks"
    "Well, you sure can pack a lot of dead people in that elevator"
  • Hank Miller (unregistered) in reply to Guest
    Anonymous:
    I think it's pretty clear that the original developer was paid per line of code.  Heh.  They multiplied their pay for that query!


    Not likely.   I can easily triple the LOC count for that query without making any significant changes


    SELECT

       DISTINCT     

           SMS_G_System_SYSTEM.Name


    See, I've already tripled my pay compared to this guy, and I didn't even have to think about it.   I wish I could get paid by lines of code; I'd be retired by the end of the year.   Course I pity the poor guy who replaces me.  Particularly if whitespace is counted too - my teachers always liked double spacing in school, and triple spacing must be better, right?


    SELECT



        DISTINCT


     

          SMS_G_System_SYSTEM.Name


    Ewww....   I hope none of you get paid per LOC on something I have to take over.

  • Ayende Rahien (unregistered) in reply to Alex Papadimoulis

    Alex, regarding censor words, it's probably using a regex for this; try "\$"
       

  • (cs) in reply to Earp
    Anonymous:

    sozin:
    ...and somehow I doubt whoever coded it thought about the 255 max # of elems you can have in the IN( ... ) clause

     

    Microsoft SQL Server has no such limitation.



    I've more experience with MSSql, but when recently doing a data 'cleanup' for Oracle 8i I ran into the IN (n, n...) limitation of 1000.  I've seen a lot of comments against MSSql temp tables, but create #mytable( attributes ) was always a favorite of mine.
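    The Oracle 8i limit mentioned above (at most 1000 literals per IN list) is commonly worked around by splitting the values into chunks and OR-ing the IN clauses together. A small sketch; `chunked_in_clause` and the column name are invented for illustration.

    ```python
    # Hypothetical helper: build "col IN (...) OR col IN (...)" so that no
    # single IN list exceeds the engine's limit (1000 for Oracle 8i, as
    # mentioned above). "?" placeholders keep the values parameterized.
    def chunked_in_clause(column, values, chunk_size=1000):
        chunks = [values[i:i + chunk_size]
                  for i in range(0, len(values), chunk_size)]
        clause = " OR ".join(
            "{} IN ({})".format(column, ", ".join(["?"] * len(chunk)))
            for chunk in chunks)
        return "(" + clause + ")", list(values)

    # 2500 values -> three OR-ed IN lists of 1000, 1000, and 500 placeholders.
    clause, params = chunked_in_clause("name",
                                       ["STC%07d" % i for i in range(2500)])
    assert clause.count(" IN (") == 3
    assert len(params) == 2500
    ```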
  • sped (unregistered) in reply to jzlondon
    Anonymous:

    SQL Server is actually a seriously good product, albeit one written by an arguably evil company.



    Sybase is evil?

    http://en.wikipedia.org/wiki/Microsoft_SQL_Server
  • (cs) in reply to Hank Miller
    Anonymous:
    mizhi:

    I think one trick engineers use is to tell people that an elevator can handle only 10 people and design it to work with 20.  [:)]


    No, telling people it can only handle 10 isn't enough.  They design it so physically you can't fit more than 10 live (you can pack a lot of dead people in an elevator I suppose, but that generally isn't something that people do) people in the little box.   Then they make it able to handle 20.


    Yeah, but then you get frat boys stuffing pledges into elevators during Greek week or Rush. [:)]
  • (cs) in reply to Alex Papadimoulis
    Alex Papadimoulis:

    I don't know if you were being sarcastic here or not, but I didn't realize "everyone knows" that SQL Server is slow ... I have a feeling this is either sarcasm that I'm missing, or more senseless MS bashing.

    Your sarcasm meter needs repair. I actually tried to put a "censor filter" to replace M$ and Micro$oft with humorous replacement text linking to that classic Penny Arcade comic (which I now cannot find ...). Naturally, this worthless forums software doesn't like "$" symbols in censor words.



    Are you referring to the one where Tycho secretly switched Gabe's XBox controller with an 800lb Gorilla per the old Folgers commercials?
  • Eric (unregistered) in reply to Ross Day
    Ross Day:
    I would imagine that if the query were analyzed and rewritten with joins rather than many IN and NOT IN clauses, it would be much faster.  IN and NOT IN seem inherently slow, and so are subqueries.

    I know it's a different DBMS, but I once helped optimize an Access query from about 15 seconds down to 1.5 just by rewriting IN and NOT IN subqueries as good joins that did the same thing.  I've seen similar optimizations make things faster in some brief experiences with Oracle, however YMMV.

    In SQL Server, there doesn't seem to be much difference between joins and equivalent IN/NOT IN conditions. In fact, some joins get rewritten by SQL Server as IN/NOT IN conditions before execution, so I hope it's not that much slower...
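    The IN-to-join rewrite discussed above can be sketched concretely. This is a hedged illustration in sqlite3 with invented table names: a NOT IN subquery versus the equivalent anti-join (LEFT JOIN ... IS NULL). One caveat worth remembering: the two forms only agree when the subquery column contains no NULLs; with a NULL in the list, NOT IN returns no rows at all.

    ```python
    import sqlite3

    # Invented mini-schema for the demonstration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        CREATE TABLE blacklist (customer_id INTEGER PRIMARY KEY);
        INSERT INTO orders VALUES (1, 10), (2, 20), (3, 30);
        INSERT INTO blacklist VALUES (20);
    """)

    # Form 1: the NOT IN subquery.
    not_in = conn.execute("""
        SELECT id FROM orders
        WHERE customer_id NOT IN (SELECT customer_id FROM blacklist)
    """).fetchall()

    # Form 2: the anti-join rewrite -- keep rows with no blacklist match.
    anti_join = conn.execute("""
        SELECT o.id FROM orders o
        LEFT JOIN blacklist b ON b.customer_id = o.customer_id
        WHERE b.customer_id IS NULL
    """).fetchall()

    assert sorted(not_in) == sorted(anti_join) == [(1,), (3,)]
    ```

    Whether the join form is actually faster is entirely engine-specific, which matches the Access-vs-SQL-Server experience reported above.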

  • (cs) in reply to [humanBob breathing: NO];
    Anonymous:
    Isn't funny how pretty much all WTFs are written in Microsoft products?


    You're obviously a script kiddie, so just stick to what you know.

    There are plenty of examples on this site alone to prove you are mercifully free of the ravages of intelligence.
  • smelliot (unregistered) in reply to John Smallberries

    The use of SELECT DISTINCT does not necessarily indicate poor normalization.

    You're right, most of the time it doesn't. However, most of the time I see people using distinct, it's because they're not joining properly or some similar poor practice that needs to be corrected. It's like using count(*)-- most of the time it just isn't necessary.

  • (cs) in reply to felix
    felix:

    Oh? Has it occurred to you that a longer query is inherently more complex? And more error-prone because of sheer code size? Think again, Mr. Expert :-)

    If we are talking performance, complexity is not important. When querying a 90 million record table I can't afford to write "simple" queries that are less "error prone". I have to write the query that returns the data the fastest. And if that requires a 200 line super complex query, so be it.

    felix:

    Let me guess: you have studied the output of EXPLAIN on all modern database servers :D

    Try it instead of making a smartass remark.

  • (cs) in reply to smelliot
    Anonymous:
    The use of SELECT DISTINCT does not necessarily indicate poor normalization.

    You're right, most of the time it doesn't. However, most of the time I see people using distinct, it's because they're not joining properly or some similar poor practice that needs to be corrected. It's like using count(*)-- most of the time it just isn't necessary.


    What information are you adding? I said the exact same thing in my original post:

    John Smallberries:

    The use of SELECT DISTINCT does not necessarily indicate poor normalization.

    If I want to know which of the 50 states I have customers in,


    select distinct state from customer


    works fine. Typically I would use GROUP BY (obviously if  want to know how many customers in each state) but SQL Server usually generates the same query plan.


    I usually see SELECT DISTINCT when somebody has a bad join condition and can't figure out how to fix it.
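    The state example above is easy to check in miniature. A minimal sketch using sqlite3 as a stand-in (the post is about SQL Server, so the same-plan claim is engine-specific; what this shows is that the two forms return the same result set):

    ```python
    import sqlite3

    # Tiny customer/state table matching the example in the post.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (name TEXT, state TEXT);
        INSERT INTO customer VALUES ('a', 'OH'), ('b', 'OH'), ('c', 'TX');
    """)

    distinct = conn.execute("SELECT DISTINCT state FROM customer").fetchall()
    grouped = conn.execute(
        "SELECT state FROM customer GROUP BY state").fetchall()

    # Same set of states either way; GROUP BY additionally allows aggregates
    # (e.g. COUNT(*)) if you want customers per state.
    assert sorted(distinct) == sorted(grouped) == [('OH',), ('TX',)]
    ```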

  • (cs) in reply to sped

    Anonymous:

    Since Version 6.0 (released 1995), MS SQL Server has contained no Sybase-written code.  (They had a little party when they replaced the last module)

     

  • (cs) in reply to SteveM

    SteveM:
    Has anyone done any analysis on the average time it takes a DailyWTF thread to move from 'this particular piece of code is a WTF' to 'Microsoft is evil?'

    Actually, it doesn't happen that often here.  Now, over on Slashdot, that's a different story.  Usually it takes only about 10-20 messages, and it takes that long only because many discussions there have nothing to do with computers.   Also, if the first mention of Microsoft on a Slashdot thread isn't a bash, the first response to the message (with 100% certainty) will be.

  • Daniel T (unregistered) in reply to askeeto
    askeeto:

    Performance of a query is 100% unrelated to the number of lines in a query. I can write 2 line queries that have horrible performance. And I can write 2000 line queries that produce results in a matter of milliseconds.

    Also, using "in ()" or "inner join" to filter out data performs the same in any modern database because the query optimizer will work out the same query plan for both constructions.

    This query is not slow due to the way it is written, although the construction is horrible.



    In my experience, horrible construction usually goes hand-in-hand with horrible inefficiency.

    I happen to agree with the people who say this looks like an otherwise good coder suffering from temporary insanity (due to time pressure, or frustration, or whatever).  I've been there - "JUST WORK DAMMIT". 
  • Disgruntled DBA (unregistered)

    Ahh, SMS.  My nemesis rears his ugly head, yet again.

  • (cs) in reply to Daniel T

    Anonymous:
    ...horrible inefficiency...

    wouldn't horrible inefficiency be good?

  • (cs) in reply to JamesCurran
    JamesCurran:

    SteveM:
    Has anyone done any analysis on the average time it takes a DailyWTF thread to move from 'this particular piece of code is a WTF' to 'Microsoft is evil?'

    Actually, it doesn't happen that often here.  Now, over on Slashdot, that's a different story.  Usually it takes only about 10-20 messages, and it takes that long only because many discussions there have nothing to do with computers.   Also, if the first mention of Microsoft on a Slashdot thread isn't a bash, the first response to the message (with 100% certainty) will be.

    One thing that really helps is that the moderator over here (me) is pro-Microsoft. The "MS is evil" posts get quickly deleted for wasting everyone's time ... Slashdot is the place to go for that ...

  • (cs) in reply to Rick
    Rick said: At an interview recently, I was asked, "How many tables can you join in one Sybase query?"  I responded, "More than you should use, but it used to be 16. I don't know what the current limit is." I wasn't impressed with the interviewer.
     
    I'm impressed with that answer!  That's perfect. 
     
    MS SQL 2000 has some patch that raises the total number of tables + views that you can reference in one query, from 256 to 260.  Wow.  I was not impressed with that either.  (See http://support.microsoft.com/kb/828269/)  Obviously, if you're hitting the 256 table limit, something is wrong -- and the fix adds a measly 4 to the limit? 
  • (cs) in reply to Alex Papadimoulis
    Alex Papadimoulis:
    JamesCurran:

    SteveM:
    Has anyone done any analysis on the average time it takes a DailyWTF thread to move from 'this particular piece of code is a WTF' to 'Microsoft is evil?'

    Actually, it doesn't happen that often here.  Now, over on Slashdot, that's a different story.  Usually it takes only about 10-20 messages, and it takes that long only because many discussions there have nothing to do with computers.   Also, if the first mention of Microsoft on a Slashdot thread isn't a bash, the first response to the message (with 100% certainty) will be.

    One thing that really helps is that the moderator over here (me) is pro-Microsoft. The "MS is evil" posts get quickly deleted for wasting everyone's time ... Slashdot is the place to go for that ...

    Alex / James -- have you seen this:

    http://www.bbspot.com/toys/slashtitle/

    It's hilarious .... keep refreshing or clicking "next" to see the different "slashdot" articles that come up.  It's pretty funny, and they did a good job of incorporating some Microsoft hate into at least every other article ....

  • (cs) in reply to Dunbar
    Anonymous:
    Eh?

    I think you do not understand that just because you create a view from your sql statement it will not perform any faster. It is the same SQL query, you just access it more conveniently through a view name... a messed up sql select will perform equally poorly when you create a view from it.

    What I really find amusing, as long as I don't have to clean it up, is when people create a bunch of views, and then start joining them together... 'But it looks so simple, all I'm doing is joining these three views together, why is it taking such a long time to execute?'... Priceless.


    I'm not an SQL guru, but I think SQL servers can optimise queries better when you just tell them what you want, one step at a time (hence my plea for using views), instead of making a big mess in which you try to impose a particular query plan yourself.

    And what about my second argument, the one about query correctness? Using views is like expressing a complex algorithm as many small functions instead of one big lump of spaghetti code. Doesn't that matter at all?

  • (cs) in reply to felix

    I'm not an SQL guru, but I think SQL servers can optimise queries better when you just tell them what you want, one step at a time (hence my plea for using views), instead of making a big mess in which you try to impose a particular query plan yourself.

    Well, that's where you're wrong. You start with that approach, sure. But when it becomes painfully obvious that the server is fouling up the query (not having the 'out of the box' knowledge that you have it's understandable, of course) you have to start optimising using a "forced" query plan.

    The gains you can achieve, regardless of DBMS, are enormous. You only do this on "problem" queries, of course. profiling and "explain" are your friends here.

    And what about my second argument, the one about query correctness? Using views is like expressing a complex algorithm as many small functions instead of one big lump of spaghetti code.

    It's an appealing view, if you'll pardon the pun. but it pretty much destroys performance in most cases.

    Doesn't that matter at all?

    Nope. Query "correctness" matters not a jot if it takes until the heat death of the universe to complete.

    Complex joins on views are hideously hard for query optimisers to deal with. Couple this with the fact that views are generally created based on relatively 'difficult' queries that have already been optimised to do that job; pulling two or more such queries together will tend to pull the optimiser in about 40 directions simultaneously, with the net result that it gives up and ends up tablescanning pretty much everything, almost always at the bottom of massively nested subqueries.

    It's hellish difficult to optimise queries based on views. The first thing to do in cases when you're faced with one of these is to reconstitute it into one query, which can be an "interesting" task in itself; I once spent 2 days optimising one query at $major_financial_institution. It started off as a 20 line query using a shitload of complex views and nasty conditions joining to other tables, and generally took up to an hour to run. The 'finished' version was big (well, very big, in fact, getting on for 300 lines) but ran in 20 seconds in the worst case, and averaged under 10 seconds. A large part of the work was to do with flattening the views (and subselects thereupon) back into the main query. It was a maddening task, and frankly it would have been easier if I'd started from scratch.

    Simon

  • (cs) in reply to felix

    felix:

    I'm not an SQL guru, but I think SQL servers can optimise queries better when you just tell them what you want, one step at a time (hence my plea for using views), instead of making a big mess in which you try to impose a particular query plan yourself.

    And what about my second argument, the one about query correctness? Using views is like expressing a complex algorithm as many small functions instead of one big lump of spaghetti code. Doesn't that matter at all?

    I like your idea of using views to reduce the spaghetti of SQL, but the use of views has NO impact on optimisation for SQL Server.

    Assume I create a view:

    create view v_customer
    as
    select customer_nr, customer_name
    from t_customer
    where nationality = 'US'

    When you run the query:

    Select customer_nr, customer_name from v_customer where customer_nr = 100

    What SQL Server will execute is the following

    Select customer_nr, customer_name 
    from (select customer_nr, customer_name
             from t_customer
             where nationality = 'US')  v
    where customer_nr = 100

    In most cases this will produce the same query plan as the query below, but as the complexity of your view increases this becomes less and less predictable:

    Select customer_nr, customer_name 
    from t_customer
    where customer_nr = 100 and nationality = 'US'

    There is no caching or pre-fetching of view results done. And the query plan of the view is not even stored. SQL Server will cache segments of the query plan, but that is unrelated to the fact that it is a view. It does exactly the same for normal queries.
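    The expansion described above can be checked end to end. A runnable sketch using sqlite3 as a stand-in engine and the names from the post (t_customer, v_customer): selecting through the view, running the expanded derived-table form, and running the flattened query all return the same rows, because the view is just a named query.

    ```python
    import sqlite3

    # Schema and view taken from the post above.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_customer (customer_nr INTEGER, customer_name TEXT,
                                 nationality TEXT);
        INSERT INTO t_customer VALUES (100, 'Acme', 'US'), (200, 'Beta', 'DE');
        CREATE VIEW v_customer AS
            SELECT customer_nr, customer_name
            FROM t_customer
            WHERE nationality = 'US';
    """)

    # 1. Query through the view.
    via_view = conn.execute(
        "SELECT customer_nr, customer_name FROM v_customer "
        "WHERE customer_nr = 100").fetchall()

    # 2. What the engine effectively executes: the view as a derived table.
    expanded = conn.execute("""
        SELECT customer_nr, customer_name
        FROM (SELECT customer_nr, customer_name
              FROM t_customer
              WHERE nationality = 'US') v
        WHERE customer_nr = 100""").fetchall()

    # 3. The fully flattened equivalent.
    flattened = conn.execute(
        "SELECT customer_nr, customer_name FROM t_customer "
        "WHERE customer_nr = 100 AND nationality = 'US'").fetchall()

    assert via_view == expanded == flattened == [(100, 'Acme')]
    ```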

     

  • (cs) in reply to askeeto

    and why did this forum destroy my post?

  • (cs) in reply to DWalker59
    DWalker59:
    MS SQL 2000 has some patch that raises the total number of tables + views that you can reference in one query, from 256 to 260.  Wow.  I was not impressed with that either.  (See http://support.microsoft.com/kb/828269/)  Obviously, if you're hitting the 256 table limit, something is wrong -- and the fix adds a measly 4 to the limit? 


    Fry: "Oh the fools! If only they'd built it with 6001 hulls! When will they learn?"
  • Anonymous coward (unregistered) in reply to Alex Papadimoulis

    Your sarcasm meter needs repair. I actually tried to put a "censor filter" to replace M$ and Micro$oft with humorous replacement text linking to that classic Penny Arcade comic (which I now cannot find ...). Naturally, this worthless forums software doesn't like "$" symbols in censor words.

    You are truly a man after my own heart.

  • (cs) in reply to tufty
    tufty:
    > I'm not an SQL guru, but I think SQL servers can optimise queries better when you just tell them what you want, one step at a time (hence my plea for using views), instead of making a big mess in which you try to impose a particular query plan yourself.

    Well, that's where you're wrong. You start with that approach, sure. But when it becomes painfully obvious that the server is fouling up the query, you have to start optimising using a "forced" query plan. The gains you can achieve, regardless of DBMS, are enormous.

    > And what about my second argument, the one about query correctness? Using views is like expressing a complex algorithm as many small functions instead of one big lump of spaghetti code. Doesn't that matter at all?

    Nope. Query "correctness" matters not a jot if it takes until the heat death of the universe to complete. Complex joins on views are hideously hard for query optimisers to deal with ... It's hellish difficult to optimise queries based on views. The first thing to do when you're faced with one of these is to reconstitute it into one query; I once spent 2 days optimising one query at $major_financial_institution. It started off as a 20 line query using a shitload of complex views and nasty conditions joining to other tables, and generally took up to an hour to run. The 'finished' version was big (very big, in fact, getting on for 300 lines) but ran in 20 seconds in the worst case, and averaged under 10 seconds. It was a maddening task, and frankly it would have been easier if I'd started from scratch.

    Simon

    The part about how it is often "better to start from scratch" rather than using existing spaghetti code is absolutely true, but the rest is pretty questionable.

    Using views that properly fit into the SELECT you are writing is EXACTLY the same as using derived tables in all respects, except for the "side effect" of having more readable code.

    Now, using random views that just happen to be there w/o understanding what they do, and with logic and joins that you don't need, and just joining them all together and randomly messing with things until it "looks" right (i.e., often by adding DISTINCT as mentioned earlier) is of course going to generate horrible execution plans.  Maybe that is the argument you are trying to make? But that is NOT an argument against using views in general!  That is like saying "code in C++ can be slow and buggy; therefore C++ is bad."
  • ML (unregistered)

    Well, at least his FROM clause prevents the query from dead-locking itself. :)

     

    ML

  • Smokey (unregistered)

    Amateurs!

    Now if you really want to slow the system down, it's just silly to link them by any fields. So get rid of it. 

    You really want a Cartesian join with the system's largest table. And if that query finishes within the day, you're going to have to join that table a couple of times.

    SELECT DISTINCT SMS_G_System_SYSTEM.Name
      FROM SMS_G_System, SMS_R_System [r1, SMS_R_System r2...]
           INNER JOIN SMS_G_System_SYSTEM
                   ON SMS_G_System_SYSTEM.ResourceID = SMS_R_System.ResourceID
     WHERE SMS_G_System_SYSTEM.Name NOT IN (
             SELECT SMS_G_System_SYSTEM.Name
               FROM SMS_G_System, SMS_R_System [r1, SMS_R_System r2...]
                    INNER JOIN SMS_G_System_SYSTEM
                            ON SMS_G_System_SYSTEM.ResourceID = SMS_R_System.ResourceId
              WHERE SMS_G_System_SYSTEM.Name IN (
                ...
                )
             )

    Now, we're cooking...some silicon.

  • oj (unregistered) in reply to christoofar

    Join is valid here because he wants to only select "name" from SMS_R_System if it also exists in SMS_G_System_SYSTEM.

    The 'distinct' gives him unique names (i.e. it could be a 1:M). He could use 'group by name' also to give him unique names.

  • (cs)

    Without knowing the database it's hard to say if this is a screwed up situation or not.  That query could have been auto generated by a tool of some sort without the knowledge to optimize, for instance.

    I like to point out something to those who are quick to condemn:  what's good for the system isn't always good for the business.  Case in point:  I have a stored procedure which uses (gasp) cursors.  It does per-record logic on a subset of data that gets fairly in-depth.  I used to have the business layer do it, but that took upwards of five minutes because of the heavy data manipulation.  Placing this routine in a stored procedure on the server, it now takes less than a second to execute.  Is it "horrible" code, abusing SQL Server for things it's not supposed to be used for?

    You bet.

    Did my end users love me when I made the change?

    You bet.

    Sometimes priorities have to be set, and priority #1 is almost never "good code."

  • (cs) in reply to bmschkerke

    bmschkerke:
    Without knowing the database it's hard to say if this is a screwed up situation or not.  That query could have been auto generated by a tool of some sort without the knowledge to optimize, for instance.

    I like to point out something to those who are quick to condemn:  what's good for the system isn't always good for the business.  Case in point:  I have a stored procedure which uses (*gasp*) cursors.  It does per-record logic on a subset of data that gets fairly in-depth.  I used to have the business layer do it, but that took upwards of five minutes because of the heavy data manipulation.  Placing this routine in a stored procedure on the server, it now takes less than a second to execute.  Is it "horrible" code, abusing SQL Server for things it's not supposed to be used for?

    You bet.

    Did my end users love me when I made the change?

    You bet.

    Sometimes priorities have to be set, and priority #1 is almost never "good code."

    Would it be even faster, shorter, and more efficient if it was set-based?

    You bet.

    And if it was set-based, would your users AND your fellow developers love you even more?

    You bet!
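    The cursor-versus-set-based exchange above is easy to demonstrate in miniature. A hedged sketch in sqlite3 with an invented prices table: the same per-row adjustment done as a row-at-a-time loop (standing in for a cursor) and as one set-based UPDATE, yielding identical results while the set-based form leaves all the work to the engine.

    ```python
    import sqlite3

    # Two identical copies of an invented table, one per approach.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE p1 (id INTEGER PRIMARY KEY, amount REAL);
        CREATE TABLE p2 (id INTEGER PRIMARY KEY, amount REAL);
        INSERT INTO p1 VALUES (1, 10.0), (2, 20.0), (3, 30.0);
        INSERT INTO p2 SELECT * FROM p1;
    """)

    # Cursor-style: fetch each row, compute in the client, write it back
    # one statement at a time.
    for row_id, amount in conn.execute("SELECT id, amount FROM p1").fetchall():
        conn.execute("UPDATE p1 SET amount = ? WHERE id = ?",
                     (amount * 1.1, row_id))

    # Set-based: one statement does the same work with no round-trips.
    conn.execute("UPDATE p2 SET amount = amount * 1.1")

    assert conn.execute("SELECT amount FROM p1 ORDER BY id").fetchall() == \
           conn.execute("SELECT amount FROM p2 ORDER BY id").fetchall()
    ```

    With real data volumes the gap widens dramatically, since the loop pays per-row statement overhead that the set-based form never incurs.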

  • Benjamin smith (unregistered) in reply to Earp

    Neither does Postgresql. (Just tried it, with 300 elements in an "in" statement - it worked fine)

    What kind of bone-headed database allows only 255 elements in an "in" clause?

  • Anonymous (unregistered) in reply to christoofar
    Anonymous:
    //Fixed

    SELECT DISTINCT SMS_G_System_SYSTEM.Name
    FROM SMS_R_System
    INNER JOIN SMS_G_System_SYSTEM
    ON SMS_G_System_SYSTEM.ResourceID = SMS_R_System.ResourceID
    WHERE SMS_G_System_SYSTEM.Name NOT IN (
    'STC0017127', 'STC0015341', 'STC0015342', 'STC0015343',
    'STC0015344', 'STC0015345', 'STC0015346', 'STC0015347',
    'STC0015348', 'STC0017117', 'STC0017118', 'STC0017119',
    'STC0017120', 'STC0017121', 'STC0017122', 'STC0017123',
    'STC0017124', 'STC0017125', 'STC0017126', 'STC0017128',
    'STC0019158', 'STC0019160'
    )

    And my guess is that he doesn't need to do the inner join... unless he's really sure he needs records that are both in SMS_G and SMS_R, which I think are fucked up table names.



    Having written hairy SQL myself all the time, I can tell what the person's trying to do, and they should have chosen EITHER a subquery OR an inner join, but not an inner join with a subquery containing an inner join.  The alternating IN and NOT IN forms didn't help readability either.

    When I first looked at the query, I thought "WTF is this inner join for? None of the data from the other table is used, so why include it" then I realized that it WAS being used... the query-writer was looking for "this ResourceID exists in both _G_ and _R_ tables".

    The obvious solution would be to alter the table and add a column, something like "we_care_about_this_row boolean" (sorry, I use Postgres, so I don't know SQL Server's datatypes), and just use that column instead of this IN/NOT IN junk; but if the schema just can't be changed, just replacing the IN (SELECT ...) with NOT IN (the list in the subquery) like you did would certainly speed it up.

Leave a comment on “Are You IN Or NOT IN?”
