• (cs) in reply to Roganu

    Anonymous:
    This reminds me of a previous job. Our system architect had forbidden the use of temp tables (apparently the chance of a tempdb lock was bad) so everyone used these 'temp real' tables instead.

    The only situation I can think of where this would occur would be contention on the database schema stability lock, or blocking on tempdb's data files being extended. The schema stability lock would apply whether using tempdb or your real database. The downside of using your real database is that you get junk in the transaction log, which increases the size of the log backup and reduces the number of real transactions that can be recorded.

    The default for tempdb's size can be altered, I believe. This KB article suggests that the size used for tempdb at startup is the size last configured. To configure, it appears you use 'ALTER DATABASE tempdb MODIFY FILE ...'
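    For reference, a hedged sketch of the resize command mentioned ('tempdev' is SQL Server's default logical name for tempdb's primary data file; check yours with sp_helpfile before running anything like this):

```sql
-- Inspect tempdb's current files and their configured sizes
USE tempdb;
EXEC sp_helpfile;

-- Set a larger startup size for the primary data file.
-- 'tempdev' is the default logical name; adjust if your instance differs.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 512MB);
```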

  • (cs)

    "we can't possibly risk altering the code for this procedure; the risk of failure is just too high."

    Translation:

    "my code is crappy and I don't know what I'm doing."

    Besides, if it's erroring out and not cleaning up its temp tables, it's already failing!

  • (cs) in reply to dbomp
    Anonymous:
    Digitalbath:
    Anonymous:

    WTF is "order by 2"?

    WTF ironic CAPTCHA: random (2nd time in a row) 

    You can reference a column by its position in the select list (1-based) in the ORDER BY clause, instead of using the column name.

    Yeah, though why you'd want to bother ordering when you're inserting into a table is a bit of a mystery.

    Truth be told, I still have some code that uses temporary tables like this.  Back in 1994 or so, old Ingres didn't do temporary tables very well (or at all?), so we had lots of worthless temp-yet-too-permanent tables lying around.  This month I started rewriting code to ditch the last major use of them.

     

    In a large DB, the order you insert records can affect where they get written on disk. We do stuff like this so that we can align storage with expected future queries. We're dealing with a little over 1TB...I doubt their sales commissions data is nearly as big.

    Of course, nobody in their right mind would allow a developer to submit code that used "order by 2". Besides being unnecessarily confusing syntax (some people don't know what it means, and even if you do, you still have to look back at the FROM and JOIN clauses to figure out which column it actually is), it creates maintenance headaches when somebody comes along and modifies the query in the future.

  • snakemick (unregistered)

    XMLord

    A tip an experienced DBA once taught me:

    Whenever someone screws with the database, drop their tables, drop their procedures, drop their user and so on. Basically, drop everything they have. It's sort of like a three-strikes rule, only there's just one strike. Of course, a logical backup is advised first, but most of the time you never use it.
    At the time, it seemed a bit harsh, but there's such an abundance of stupid developers that you just can't be easy on them.

    BOFH style, but effective.

    Fp

    Yep, we got an app exactly like that from a 3rd party here in the UK! It's taken 3 years to get them to even consider using "real" temp tables as opposed to what they considered temp tables. At first they were dumping them in the system tablespace (Orakel), until we managed to convince them to let us change the processing user's defaults. A processing run leaves the tables behind, then another job comes in and drops them. Absolutely awful, but we can't touch it for fear of voiding our support, what little we actually get out of these clowns. Just another example of that db-is-a-black-box mentality.

    Maybe you _should_ void your support :D At my current workplace I am trying to rely as little as possible on the 3rd-party support (even if that means more hours spent in my cubicle), not to mention that many of our in-house improvements were fed back to us (for some real money) as "upgrades" and/or "patches".

    Mike Dimmick

    If this is SQL Server 2000 or later, table variables are even better than temp tables. Only use temp tables if you need to use the same table in more than one batch on the same connection - temp tables are scoped to a connection, not a procedure. Temp tables also need space in the transaction log. A downside is that you can't SELECT INTO a table variable - you must declare the table variable first, then use INSERT ... SELECT. I think that Alex, in the anonymization process, has transformed a SELECT INTO into INSERT ... SELECT, because SELECT INTO always creates a table.
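    To illustrate the distinction (hypothetical table and column names): SELECT INTO creates a temp table on the fly, while a table variable must be declared up front and populated with INSERT ... SELECT.

```sql
-- Temp table: created implicitly by SELECT INTO; scoped to the
-- connection, so it survives into subsequent batches.
SELECT Itm_Nm, MAX(Itm_Amt) AS Max_Itm_Amt
INTO #MaxAmounts
FROM Sales
GROUP BY Itm_Nm;

-- Table variable: declared first, then filled with INSERT ... SELECT;
-- scoped to the batch/procedure and logged far more lightly.
DECLARE @MaxAmounts TABLE (Itm_Nm varchar(50), Max_Itm_Amt money);
INSERT INTO @MaxAmounts (Itm_Nm, Max_Itm_Amt)
SELECT Itm_Nm, MAX(Itm_Amt)
FROM Sales
GROUP BY Itm_Nm;
```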

    The original design causes transaction log explosion as well. All the metadata creation has to be logged, then all the stuff inserted, plus any indexing, and this isn't cleared when the table is dropped, so this happens whether or not the drop table statements run. This also happens for temp tables, but tempdb is always set in Simple recovery mode, so the log space is reused after the transactions commit. If your live database is in Simple recovery mode, that's a WTF of its own, since you can never recover to the point of failure (only to the point of last full backup).
    ...

        Anonymous:
        This reminds me of a previous job. Our system architect had forbidden the use of temp tables (apparently the chance of a tempdb lock was bad) so everyone used these 'temp real' tables instead.

    The only situation I can think of where this would occur would be contention on the database schema stability lock, or blocking on tempdb's data files being extended. The schema stability lock would apply whether using tempdb or your real database. The downside of using your real database is that you get junk in the transaction log, which increases the size of the log backup and reduces the number of real transactions that can be recorded.

    The default for tempdb's size can be altered, I believe. This KB article suggests that the size used for tempdb at startup is the size last configured. To configure, it appears you use 'ALTER DATABASE tempdb MODIFY FILE ...'

    Mike, you're good :) I have a similar problem with "tempdb is full" because of a poorly written SP (one of my colleagues wrote it and I was too lazy to rewrite it), but now I have a serious incentive to look at it :D After all, I wouldn't like to be responsible for another WTF...

    The KB article is relevant, but how many monkeys read KBs? Not to mention that sharing knowledge implies _having_ it in the first place.

    Peace

    captcha = java (oh, yeah) 

  • (cs)

    Hi guys, this is Mike Rod

    And yes, everything posted here was real; it was extremely painful to debug this code. One thing I didn't tell Alex is the (incredibly high) number of cursors: basically, each Inner Join/Left Outer Join required:

       - One select into a "temp" table for the left side of the join

       - One select into a "temp" table for the right side of the join

       - One, maybe two cursor(s)

       - A third "temp" table to hold the result

     

    A simple UNION statement also required three temp tables in this code (this is real). 
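    For anyone who has never seen this anti-pattern, a cursor-implemented "inner join" looks roughly like the following sketch (hypothetical table and column names), next to the one statement it replaces:

```sql
-- The anti-pattern: emulating INNER JOIN with a cursor and temp tables
DECLARE @Id int, @Name varchar(50);
DECLARE left_cur CURSOR FOR SELECT Id, Name FROM __TMP_LEFT;
OPEN left_cur;
FETCH NEXT FROM left_cur INTO @Id, @Name;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- for each left-side row, probe the right side and emit matches
    INSERT INTO __TMP_RESULT (Id, Name, Amt)
    SELECT @Id, @Name, r.Amt FROM __TMP_RIGHT r WHERE r.Id = @Id;
    FETCH NEXT FROM left_cur INTO @Id, @Name;
END;
CLOSE left_cur;
DEALLOCATE left_cur;

-- The set-based equivalent
INSERT INTO __TMP_RESULT (Id, Name, Amt)
SELECT l.Id, l.Name, r.Amt
FROM __TMP_LEFT l
INNER JOIN __TMP_RIGHT r ON r.Id = l.Id;
```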

     

    They asked me to find a bug (which happened to be a variable inside a cursor that wasn't reset on every loop), but I found the whole 5000+ LOC procedure to be one gargantuan, hellish bug. You can only imagine the pain and frustration of even understanding a part of this code. I was like:

     (after understanding 50 lines of code)

    "Gaaa! This is only an inner join implemented with a cursor"

    (after reading the union with three "temp" tables)

    "Holy cow! I am getting nowhere... I need a smoke"

     

    I finished all my smokes before lunch  =(

     

    Mike Rod
     

     

  • (cs) in reply to john
    Anonymous:

    > How could anyone be in charge of developing such a vital stored procedure without KNOWING SQL.

    I suspect he just learned the bare minimum of what he thought he needed to know, and then stuck with the design principles of C/Pascal/Fortran/Basic or whatever else procedural language he knew.  I did the exact same thing when I first learned db stuff -- but then, I wasn't being paid for it, and I did bother to keep learning for a while before I did get paid for it.  Even today, I consider myself a novice.



    That's exactly what I dealt with at my last job.  I inherited a database system designed by a cobol developer.  All the logic in the system was in the stored procedures which he wrote, and the design of that system was the first time he'd ever done anything with SQL.  I won't even get started on the table layout.  I get nauseous when I think about it.
  • (cs)

    "Oh, no no no," the developer replied, "we can't possibly risk altering the code for this procedure; the risk of failure is just too high."

    The thing that bothers me is that the developer apparently no longer knows how it works, and is afraid to make changes. If an organization no longer has anyone who can understand the problem, much less the solution... it's screwed.

  • anonymous (unregistered) in reply to notromda
    notromda:
    "Oh, no no no," the developer replied, "we can't possibly risk altering the code for this procedure; the risk of failure is just too high."

    The thing that bothers me is that the developer apparently no longer knows how it works, and is afraid to make changes. If an organization no longer has anyone who can understand the problem, much less the solution... it's screwed.

    Imagine if we were able to build immortal robots with the wrong Rules of Robotics, and no one remembered how to recode them. We would be really screwed. Except all the immortal robots will die in 2038 :/

     Note: I will be away with images of female bot-sex
     

     

  • Paul (unregistered) in reply to Dazed

    Anonymous:
    Step 2 (if step 1 isn't sufficiently effective): just tweak something somewhere so that the commissions no longer get credited. A mere DBA may not be able to kick a development manager into action, but an entire sales force baying for blood most certainly will.

    Oh, sabotage. Great idea.

  • Old DBA (unregistered) in reply to rmr

    A 5700 line stored procedure is pretty ugly in and of itself.  Having stored procedures in production that can error off with incomplete results and not, at least, produce an alert to the operations staff is a serious problem.

     

    Sometimes you can't just add # to the beginning of table names.  Temp tables have limited visibility.  They would not be visible within any stored procedures that the big one called, unless you used the ## form.

  • Cousin Jamie (unregistered) in reply to Kp
    Anonymous:

     

    <snip>I've actually seen code use this in conjunction with an identity field. Inserting them in a certain order makes sure that the identity field's values are assigned in that order. </snip>

     No, no, no!  Go to the back of the class.  Ordering does not ensure IDs get assigned in that order.



    Oh, no. Let's not start that flame war again: http://thedailywtf.com/forums/permalink/29028/28969/ShowThread.aspx#28969


  • (cs) in reply to Old DBA
    Anonymous:

    A 5700 line stored procedure is pretty ugly in and of itself.  Having stored procedures in production that can error off with incomplete results and not, at least, produce an alert to the operations staff is a serious problem.

     

    Sometimes you can't just add # to the beginning of table names.  Temp tables have limited visibility.  They would not be visible within any stored procedures that the big one called, unless you used the ## form.

    Temp tables (at least in MS-SQL 2000/5) can be accessed by any proc called by the proc that created them.  I have a couple of systems that use that functionality to make the code more modular.  They don't need the ##<tablename> format for that, just #<tablename>.

    I only use temporary tables in these situations where I have to pass a table of data from one proc to another, since you can't have a table variable as an input parameter, but you can pass a temp table's control from proc to proc, so long as one proc calls the other (that keeps it in the same scope).  In a situation like the one outlined in this WTF, if I had to use "temp tables" at all, I'd use table variables, since it's all one giant proc.  That's assuming I couldn't do the whole thing with a few set-based statements that would take the place of the whole proc (which is quite likely the real solution).
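    A minimal sketch of that scoping behavior (hypothetical procedure and table names):

```sql
CREATE PROCEDURE InnerProc AS
BEGIN
    -- #Work was created by the caller; a local temp table is visible
    -- to any procedure invoked while it is in scope.
    UPDATE #Work SET Processed = 1;
END;
GO

CREATE PROCEDURE OuterProc AS
BEGIN
    CREATE TABLE #Work (Id int, Processed bit);
    INSERT INTO #Work (Id, Processed) VALUES (1, 0);
    EXEC InnerProc;              -- sees #Work without needing ##Work
    SELECT Id, Processed FROM #Work;
END;                             -- #Work is dropped when OuterProc ends
```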

  • (cs)


    Developer: "Tell me again, why are you moving me to Kazakhstan?"
    Boss: "Don't worry, it's only a temp. position."

  • A DBA (unregistered) in reply to powerlord

    powerlord:
    DigitalLogic:
    What if Itm_Nm is not unique?  You'd receive a different data set with the Group By clause.

    And I am not sure about other vendors as I've only used SQL Server, but with MS you would receive an error: "Itm_Nm is invalid in the selection list because it is not contained in either an aggregate function or the group by clause."


    I haven't yet seen a DB server that would allow it. Here's what a few other DBs return:

    Oracle: [ORA-00979: not a GROUP BY expression]
    MySQL: ERROR 1140 (42000): Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause
    PostgreSQL: 42803 GROUPING ERROR

    Well, actually, I don't know how PostgreSQL returns errors... that was just a guess from the manual.


    What I get from PostgreSQL 8.1 is:

    ERROR:  column "<tablename>.<columnname>" must appear in the GROUP BY clause or be used in an aggregate function

    I have worked with a lot of database products and have never seen one which would return the desired results if you drop GROUP BY as previously suggested.  Some would return huge result sets which would not be very useful; most have not taken to generating an error.

    The most common SQL WTF I see is programmers who don't get set logic.  Many times I have replaced a set of statements which laboriously wended through many tables with a single statement which joined the tables.  The single statement generally runs several times as fast, partly because of the reduced overhead in parsing and planning, and partly because an application programmer generally can't use (for example) bitmaps and hashes the way a database engine can.

    Someone asked whether the optimizer wouldn't do the same thing the application programmer did here.  Hard to say from available information, but the odds are extremely high that if the optimizer was given a description of the RESULT NEEDED using set logic, it would have come up with a better way to navigate the data to get there.

     

  • A DBA (unregistered) in reply to A DBA
    Anonymous:

    powerlord:
    DigitalLogic:
    What if Itm_Nm is not unique?  You'd receive a different data set with the Group By clause.

    And I am not sure about other vendors as I've only used SQL Server, but with MS you would receive an error: "Itm_Nm is invalid in the selection list because it is not contained in either an aggregate function or the group by clause."


    I haven't yet seen a DB server that would allow it. Here's what a few other DBs return:

    Oracle: [ORA-00979: not a GROUP BY expression]
    MySQL: ERROR 1140 (42000): Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause
    PostgreSQL: 42803 GROUPING ERROR

    Well, actually, I don't know how PostgreSQL returns errors... that was just a guess from the manual.


    What I get from PostgreSQL 8.1 is:

    ERROR:  column "<tablename>.<columnname>" must appear in the GROUP BY clause or be used in an aggregate function

    I have worked with a lot of database products and have never seen one which would return the desired results if you drop GROUP BY as previously suggested.  Some would return huge result sets which would not be very useful; most have not taken to generating an error.

    The most common SQL WTF I see is programmers who don't get set logic.  Many times I have replaced a set of statements which laboriously wended through many tables with a single statement which joined the tables.  The single statement generally runs several times as fast, partly because of the reduced overhead in parsing and planning, and partly because an application programmer generally can't use (for example) bitmaps and hashes the way a database engine can.

    Someone asked whether the optimizer wouldn't do the same thing the application programmer did here.  Hard to say from available information, but the odds are extremely high that if the optimizer was given a description of the RESULT NEEDED using set logic, it would have come up with a better way to navigate the data to get there.

     

    Er, that should read "most have now taken"...
     

  • (cs) in reply to merreborn's nemesis
    Anonymous:
    Anonymous:
    mrprogguy:

    The WTF for me is that people who write stuff in a syntax like

    INSERT INTO [__TMP_STEP05SC]
    SELECT [Itm_Nm], MAX([Itm_Amt]) AS [Max_Itm_Amt]
      FROM [__TMP_STEP04SC]
    GROUP BY [Itm_Nm]
    ORDER BY 2

    get to call themselves "developers."

     What's wrong with this syntax? It's perfectly acceptable.

    What would you do instead?

     

    How about:

    INSERT INTO [__TMP_STEP05SC]
    SELECT [Itm_Nm], MAX([Itm_Amt]) AS [Max_Itm_Amt]
      FROM [__TMP_STEP04SC]

    That's NOT the same. You are now inserting the overall MAX instead of the per item max. (I am not a DBA, but that's what I'm seeing)

  • (cs) in reply to A DBA
    Anonymous:

    The most common SQL WTF I see is programmers who don't get set logic.  Many times I have replaced a set of statements which laboriously wended through many tables with a single statement which joined the tables.  It generally runs several times as fast as a single statement, partly because of the reduced overhead in parsing and planning, and partly because an application programmer can't generally use (for example) bitmaps and hashes the same way a database engine can. 



    One of my close friends never took any database classes in college, but he learned set theory.  He became a competent database programmer after about 2-3 weeks of working with SQL.  The system architect at my last job had been working with SQL for 3 years when I started.  I had to rewrite almost everything he wrote.  One of the first things he said to me was, "Forget everything you learned in college.  It's completely useless."  

    Maybe if he had remembered how to measure the order of magnitude of an algorithm, he would have realized why his queries that used cursors to iterate over records in a table were so much slower than the ones that did updates in a single SQL statement.
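    The contrast in question, sketched with hypothetical names: the cursor issues one UPDATE per row, while the set-based form is a single statement the optimizer can plan as a whole.

```sql
-- Cursor version: one UPDATE per row fetched
DECLARE @Id int;
DECLARE cur CURSOR FOR SELECT Id FROM Orders WHERE Region = 'West';
OPEN cur;
FETCH NEXT FROM cur INTO @Id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Orders SET Commission = Total * 0.05 WHERE Id = @Id;
    FETCH NEXT FROM cur INTO @Id;
END;
CLOSE cur;
DEALLOCATE cur;

-- Set-based version: one statement, one plan
UPDATE Orders SET Commission = Total * 0.05 WHERE Region = 'West';
```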
  • (cs) in reply to Grimoire
    Grimoire:
    How about not using temp tables, and using proper queries instead?  Or did I miss the sarcasm tag again?


    Well, of course you could use views and proper queries and stuff, but the risk of failure would be too high.

  • Kluge Doctor (unregistered) in reply to savar

    Actually, the "ORDER BY #" syntax (where # is equal to a counting number) is frequently needed when doing a UNION query.  At times, the column name may be different across two tables, so ordering by a column name is invalid.

    For example:

    SELECT taField1, COUNT(taField2) FROM tblTableA GROUP BY taField1
    UNION
    SELECT tbField1, COUNT(tbField2) FROM tblTableB GROUP BY tbField1
    ORDER BY 2

     

    You might get away with column aliases, but not always.  Sometimes, depending on the database, the syntax above is the only one that works.

    -- Kluge Doctor
     

  • A DBA (unregistered) in reply to Kluge Doctor
    Anonymous:

    Actually, the "ORDER BY #" syntax (where # is equal to a counting number) is frequently needed when doing a UNION query.  At times, the column name may be different across two tables, so ordering by a column name is invalid.

    For example:

    SELECT taField1, COUNT(taField2) FROM tblTableA GROUP BY taField1
    UNION
    SELECT tbField1, COUNT(tbField2) FROM tblTableB GROUP BY tbField1
    ORDER BY 2

     

    You might get away with column aliases, but not always.  Sometimes, depending on the database, the syntax above is the only one that works.

    -- Kluge Doctor
     

    In an ANSI-compliant database product, you use the alias of the column in the first SELECT statement of the UNION, or the unqualified name if there is no alias.  In your example, an alias would be required, but only on the first SELECT.  I haven't worked with a database that fails to comply with this standard in a while, although I'm sure they're out there.  (Of course, the ordinal column number is also valid ANSI syntax; it's just a bit more fragile in the face of modifications to the query.)  Most products also extend the standard to allow the ORDER BY clause to reference any column or expression which would be valid in the context of the result column declarations, but that is non-standard and subject to problems in a UNION.
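    In other words (a sketch): alias the column in the first SELECT and order by that alias. The ordinal form works too, but it silently breaks if the select list is later reordered.

```sql
SELECT taField1 AS GroupKey, COUNT(taField2) AS Cnt
FROM tblTableA GROUP BY taField1
UNION
SELECT tbField1, COUNT(tbField2)
FROM tblTableB GROUP BY tbField1
ORDER BY Cnt;   -- the alias from the first SELECT names the union column
```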




     

  • Skurry (unregistered) in reply to Gsquared

    And I thought this would be about TempDB in MSSQL 2005 growing 50GB large on occasion... 

    Gsquared:
    Anonymous:

    A 5700 line stored procedure is pretty ugly in and of itself.  Having stored procedures in production that can error off with incomplete results and not, at least, produce an alert to the operations staff is a serious problem.

     

    Sometimes you can't just add # to the beginning of table names.  Temp tables have limited visibility.  They would not be visible within any stored procedures that the big one called, unless you used the ## form.

    Temp tables (at least in MS-SQL 2000/5) can be accessed by any proc called by the proc that created them.  I have a couple of systems that use that functionality to make the code more modular.  They don't need the ##<tablename> format for that, just #<tablename>.

    I only use temporary tables in these situations where I have to pass a table of data from one proc to another, since you can't have a table variable as an input parameter, but you can pass a temp table's control from proc to proc, so long as one proc calls the other (that keeps it in the same scope).  In a situation like the one outlined in this WTF, if I had to use "temp tables" at all, I'd use table variables, since it's all one giant proc.  That's assuming I couldn't do the whole thing with a few set-based statements that would take the place of the whole proc (which is quite likely the real solution).

     Oh noes! Not the nested stored procedure nightmare! Been there, done that. It's a real PITA to maintain something like this. Repeat: TSQL is not a proper programming language!

  • d (unregistered)

    <40

  • (cs) in reply to XMLord
    XMLord:

    Whenever someone screws with the database, drop their tables, drop their procedures, drop their user and so on. Basically drop everything they have. It sort of like a 3 strikes rule, only that there's only one strike. Of course, a logical backup is advised first, but most of the time you never use it.

    Where I work, this would be followed by a DROP DBA. 

  • eric bloedow (unregistered)

    I've been reading these in reverse order, so I can't help but think of a later story where some idiot wanted to clean up temp files by deleting "temp"... this would have also deleted the TEMPerature warnings and important financial TEMPlate files...

Leave a comment on “They're Just Temp Tables”
