The Daily WTF: Curious Perversions in Information Technology

2007-01-08 Reply Admin

Me no want neither.

PS. No funny comments to read yet...

2007-01-08 Reply Admin

wow. Sad thing is that I have seen worse. But... wow

2007-01-08 Reply Admin

OMG - Where can I hire the boy-genius that came up with this? :)

DaveK · 2007-01-08 Reply Admin

Wouldn't it be great if it turned out that 'replace' and 'REPLACE' were two completely different functions? ;)

jrwr00 · 2007-01-08 Reply Admin

Wow, I'm a Complete Noob at SQL coding, and I've could of come up with something better then that, i wonder, is this PHP?

2007-01-08 Reply Admin

Maybe some art of SQL obfuscation?

2007-01-08 Reply Admin

i'm writing sql right now @ work and this makes me want to quit and go skydiving with no parachute.

greets from Argentina, the bad-english country!

The MAZZTer · 2007-01-08 Reply Admin

jrwr00:
Wow, I'm a Complete Noob at SQL coding, and I've could of come up with something better then that, i wonder, is this PHP?

No, this is just SQL. SQL has functions too.

2007-01-08 Reply Admin

Well AFAIK there is no better way of doing character substitution in SQL. Unless you create a table & perform lookup from that table using a stored procedure.

2007-01-08 Reply Admin

Suraj:
Well AFAIK there is no better way of doing character substitution in SQL. Unless you create a table & perform lookup from that table using a stored procedure.

I agree. Thing I cannot figure out is that this person seems to be replacing all control characters between 1 and 31, inclusive, w/ nulls. My guess is that somewhere, in either the translate or filter functions, these nulls are removed. I understand the desire to eliminate the control chars...but to do it in sql? What a pain in the arse.

2007-01-08 Reply Admin

I'm guessing this is Oracle, which only introduced regular expressions in 10g - so granted, in earlier versions this sort of thing can be annoying - but then such cleanup would normally be done in an input filter, not as an in-table update. At least use a UDF...

2007-01-08 Reply Admin

Good night... Can anyone tell me what exactly that does? I lost interest somewhere around the 15th "Replace".

mkb · 2007-01-08 Reply Admin

serge:
I'm guessing this is Oracle, which only introduced regular expressions in 10g - so granted, in earlier versions this sort of thing can be annoying - but then such cleanup would normally be done in an input filter, not as an in-table update. At least use a UDF...

Similar gotchas exist in MySQL.

ammoQ · 2007-01-08 Reply Admin

If this is Oracle (kinda looks like), "translate" is a build-in function. Using a UDF would have been by far more beautiful, but it comes with a small performance penalty.

2007-01-08 Reply Admin

Can oracle not call out to c functions? If so, wouldn't character replacement be done more efficiently and comprehensibly there?

2007-01-08 Reply Admin

mav:
Good night... Can anyone tell me what exactly that does? I lost interest somewhere around the 15th "Replace".

Replaces the first 31 chars ('\0', bell, tab, etc. The unprintables) with NULL.

Jeff S · 2007-01-08 Reply Admin

This can't be SQL Server .. replacing a single character in a string (i.e., varchar) with a NULL results in NULL for the entire string:

select replace('jeff', 'j',null)

returns

NULL

(1 row(s) affected)

As expected .... I mean, why would you think that you could replace a single character with NULL ? Makes no sense. If the goal is to take a string with ANY of these characters ANYWHERE and simply return NULL for the entire string, I guess it would work "properly", but I doubt that's it ...

2007-01-08 Reply Admin

get a better database

REPLACE isn't an SQL95 standard call either

postgresl has

translate(string text, from text, to text) Any character in string that matches a character in the from set is replaced by the corresponding character in the to set

e.g. translate('12345', '14', 'ax') a23x5

so we have to translate any of \1-\31 with sentinel (I've chosen \1) and then replace that with our desired value (NULL for some reason)

set name=replace(translate(new_name, '\1\2\3\4\5\6\7\8\9\10\11\12\13\14\15\16\17\18\19\20\21\22\23\24\25\26\27\28\29\30\31', '\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1'), '\1', NULL);

but this will output a NULL string if any of \1-\31 are encountered

2007-01-08 Reply Admin

Maybe that code was written in a contest to see who could call replace() the most.

Michael Buschbeck · 2007-01-08 Reply Admin

Jeff S:
This can't be SQL Server .. replacing a single character in a string (i.e., varchar) with a NULL results in NULL for the entire string:

I don't actually know which RDBMS was supposed to run this statement, but I'd venture the guess that REPLACE(text, substring, NULL) effectively strips all occurrences of the search string from the searched string on that database system.

PostgreSQL would return NULL for that input just like SQL Server.

2007-01-08 Reply Admin

I'd guess this is Oracle, too. Replacing a character with null will effectively delete that specific character from the string:

select replace('abcdef','a',null) from dual;

REPLA

bcdef

1 row selected.

2007-01-08 Reply Admin

There's different ways to do that, but the most elegant solution would be something like:

CREATE FUNCTION dbo.RemoveBadChars(@InputString VARCHAR(8000) = NULL)
 RETURNS @OutputString VARCHAR(8000) AS
 BEGIN
 /*
 You can either walk through each character and eliminate the ones
 not needed or do the nested replaces. You could also build your own
 extended procedures that uses some compiled DLL, but that could be
 an overkill depending on how this code is used.
 */

 SET @OutputString = REPLACE(REPLACE........**INSERT SCARY CODE HERE**.......))
 RETURN
END

Then, you can just call that ONE function for ALL columns. And since the functions are compiled optimized, the code will also run faster.

2007-01-08 Reply Admin

This is when you need to write a (Perl/Python/PHP/C/whatever) program that writes your program. It's the only thing that will keep you out of the padded office if you have to, dog forbid, change it.

2007-01-08 Reply Admin

I think the function idea would be best - something along the lines of a series of replaces, each on a different line.

however, the original writer may have been trying to avoid the need for create function or procedure privileges - so just jamming it all in one script leaves you with less options.

if it is Oracle (i suspect so) then at least the script could be written as a PLSQL block - and it could look a little more like traditional code

captcha = craptastic

ammoQ · 2007-01-08 Reply Admin

maht:
get a better database
REPLACE isn't an SQL95 standard call either

postgresl has

translate(string text, from text, to text) Any character in string that matches a character in the from set is replaced by the corresponding character in the to set

e.g. translate('12345', '14', 'ax') a23x5

so we have to translate any of \1-\31 with sentinel (I've chosen \1) and then replace that with our desired value (NULL for some reason)

set name=replace(translate(new_name, '\1\2\3\4\5\6\7\8\9\10\11\12\13\14\15\16\17\18\19\20\21\22\23\24\25\26\27\28\29\30\31', '\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1\1'), '\1', NULL);

but this will output a NULL string if any of \1-\31 are encountered

In Oracle, the "translate" function does exactly the same; unfortunately, escaping special characters like '\1' doesn't work.

2007-01-08 Reply Admin

and god invented for loops for a reason...

Satanicpuppy · 2007-01-08 Reply Admin

I don't see why anyone would design a table that had a record that would need to be updated like this. This code is just the tip of the WTF.

Designing a database, you should have fields that change often, and fields that change rarely if at all. Records that change often should be foreign keyed to fields that change seldom, and when they need to be changed, they should be REPLACED, not abused with some crazy ass serial update stored procedure.

Temporary items like semaphore implementations, should be deleted, or the entries should be moved to a log table or something. They should never be cleaned.

If you do need to clean up your tables, it should be done by dates, which is easy as pie.

This sort of thing is almost always the result of poor table design. It's just absurd.

2007-01-08 Reply Admin

In SQL Server:

DECLARE @BadStrings VARCHAR(50) SET @BadStrings = 'Hello' + CHAR(5) + ' There'

SELECT REPLACE(@BadStrings, SUBSTRING(@BadStrings, PATINDEX('%[^a-zA-Z0-9 '''''']%', @BadStrings), 1), '')

That shoudl about do it.

2007-01-08 Reply Admin

I was going to put something witty here, however my captcha is "giggity" and that seems to sum it up pretty well.

2007-01-08 Reply Admin

I dunno about you, but I would use a trigger that calls a cleanup stored procedure for each field... :new.name=fix(:new.name) ....etc???

2007-01-08 Reply Admin

Data cleanup scripts should only be used once, and then only after every bit of data entry has been fixed to not insert undesired data in the first place.

2007-01-08 Reply Admin

You people are morons. This is the only way to perform the required operation in sql without sacrificing performance. Not all programming is rainbows and sunshine. There will come days with the code gets ugly in oreder to perform. Get a grip, and let this one go.

Jeff S · 2007-01-08 Reply Admin

DarthGeek:
In SQL Server:
DECLARE @BadStrings VARCHAR(50) SET @BadStrings = 'Hello' + CHAR(5) + ' There'

SELECT REPLACE(@BadStrings, SUBSTRING(@BadStrings, PATINDEX('%[^a-zA-Z0-9 '''''']%', @BadStrings), 1), '')

That shoudl about do it.

It should about do it if you only want to replace only the very first "bad" character, and if you are defining bad characters as including punctuation and other things that you probably would want to keep.

2007-01-08 Reply Admin

Anon:
This is when you need to write a (Perl/Python/PHP/C/whatever) program that writes your program. It's the only thing that will keep you out of the padded office if you have to, dog forbid, change it.

No kidding. For a moment I thought that this code was generated in such a way, but then I saw the random capitalization. I guess you'd want something like:

s = columnName
for i in range(1, 32):
    s = 'replace(%s,CHR(%d),NULL)' % (s, i)

slamb · 2007-01-08 Reply Admin

In Oracle, replace('foo', null) is just a stupid way of writing replace('foo', ''), because in Oracle NULL and '' are the same thing. (Which is a WTF for another day...)

There are (at least) three WTFs here:

(1) They regularly have to run a script to remove this sort of thing from a well-respected data source (can't they just fix whatever keeps putting them in there, and run it once on the already-entered data?)

(2) The writer of this fragment apparently didn't know how to write functions in SQL, though someone must have written "filter" and "translate". Even if this horrible expresssion were justified once, there's no reason to write it out once for each of the 32 columns.

(3) They aren't using regular expressions, or built-in translate function correctly...they could at least do

translate('x' || chr(1) || chr(2) || chr(3) || ... chr(31), 'x')

which would strip the control characters without all the nested calls. (I guess their translate is something else...IIRC, Oracle would allow a user-defined translate/1 (one-argument translate) to co-exist with the builtin translate/3 (three-argument translate).

2007-01-08 Reply Admin

Mind that this is clean-up code : write once, execute once!

Not quite T-SQL I think, so I can't be sure on what other 'better' functions might be available; but apart from using a UDF (that holds more or less the same code, be it better 'hidden' and most probably a lot slower) I don't think there is much improvement to be made on this.

From personal experience, more often than not you're given 'full' access (SUID) to a set of tables, but find yourself totally locked away from making your own objects (eg. functions) when doing ETL stuff, so the guy might have had to do what was available and did so nicely. Although it might not be the nicest thing to look at, it's correct and it's fast! Where's the WTF ?

I'm pretty sure the author in question wrote a little script to generate given code (32 x 32 is A LOT of replaces =), if not, well hmm, THAT might be a WTF indeed =)

PS: translate is a 'standard' SQL function according to "SQL in a nutshell", not quite sure I understand 'what it can look like' ?!

Translate : Converts a string from one character set to another.

2007-01-08 Reply Admin

DarthGeek:
In SQL Server:
DECLARE @BadStrings VARCHAR(50) SET @BadStrings = 'Hello' + CHAR(5) + ' There'

SELECT REPLACE(@BadStrings, SUBSTRING(@BadStrings, PATINDEX('%[^a-zA-Z0-9 '''''']%', @BadStrings), 1), '')

That shoudl about do it.

That will eliminate the first instance of a bad character, but not more.

For example:

DECLARE @BadStrings VARCHAR(50) SET @BadStrings = 'Hello' + CHAR(5) + char(2)+' There'

SELECT REPLACE(@BadStrings, SUBSTRING(@BadStrings, PATINDEX('%[^a-zA-Z0-9 '''''']%', @BadStrings), 1), '')

(I added a second, different, undesired character)

leaves a char(2) in the result.

A UDF with a loop in it like:

while patindex(@string, '%[-]%', @string) > 0 select @string = replace(@string, substring(@string, patindex(@string, '%[-]%', @string), 1), '')

Will clean up the whole string. Performance isn't too bad, and it's much easier to maintain than 30+ nested replace() functions.

(I hope the characters in my example show. I copied-and-pasted them into the editor here.)

slamb · 2007-01-08 Reply Admin

deroby, you're wrong on all points. It sounds like they're running this regularly, it's not a question of access (someone of similar skill must have written filter and translate), and they're not using the built-in translate function...it's overloaded with their own. The performance penalty for user-defined functions obviously is not a concern if they're calling two of them and nesting 32 calls to a built-in (31 of which are unnecessary). They must not have written a script to generate this, since the case is not consistent.

2007-01-08 Reply Admin

And what if the feed is actually sending ISO-8859-1 or UTF-8? Then what? I would hope there's a better solution than enumerating all the "missing" characters in the Unicode space!

Oracle supports in-DB Java since at least Oracle 8.

2007-01-08 Reply Admin

mav:
Good night... Can anyone tell me what exactly that does? I lost interest somewhere around the 15th "Replace".

I lost interest a lot sooner, but yes, I can tell you what it does. It replaces in the "name" column any character with an ASCII value of 1 to 31 with a null value. In other words, it strips out most "control" characters.

Of course, it COULD be written like this:

variable SRCH_STR VARCHAR2(40)

BEGIN :SRCH_STR := chr(1)||chr(2)||chr(3)||chr(4)||chr(5) ; :SRCHSTR := :SRCHSTR || chr(6)||chr(7)||chr(8)||chr(9)||chr(10) ; :SRCHSTR := :SRCHSTR || chr(11)||chr(12)||chr(13)||chr(14)||chr(15) ; :SRCHSTR := :SRCHSTR || chr(16)||chr(17)||chr(18)||chr(19)||chr(20) ; :SRCHSTR := :SRCHSTR || chr(21)||chr(22)||chr(23)||chr(24)||chr(25) ; :SRCHSTR := :SRCHSTR || chr(26)||chr(27)||chr(28)||chr(29)||chr(30)||chr(31) ; END; /

UPDATE OWNER_USER.all_candidates SET name = filter(translate( dbms_standard.translate( name, 'a'||:SRCH_STR, 'a' ) )), ...

This perversion probably occurred because somebody DID write their own TRANSLATE() function (therefore my "dbms_standard." junk).

But I don't even want to go there ...

2007-01-08 Reply Admin

Probably written by a DBA with a obsessive compulsive disorder

2007-01-08 Reply Admin

The biggest WTF here is that it is being done in the database. Doesn't matter what RDBMS it is. This sort of string manipulation belongs in a scrubbing application. It's like building HTML in the database. Why?

brendan · 2007-01-08 Reply Admin

The main problem is that it is done at the wrong level, it should of been implemented as a stored function, so that the 32 other columns could call it. Or it should of been part of the applications job to validate the columns. Also it could of been implemented as a trigger.

The other problem is the way it was done. It could of been done like this (implemented using PostgreSQL, Note I've only known about Stored Procedures for a week, so if there is any little errors, blame it on lack of experience with them):

CREATE OR REPLACE FUNCTION cleanUpCandidateColum(TEXT columnData)
      RETURNS TEXT
BEGIN
   RETURN regexp_replace(columnData,'[\x1-\x1F]+', '', 'g');
END;

2007-01-08 Reply Admin

I wouldn't be surprised if this code was actually generated by a program, maybe something which translates SQL queries from one database to another...

2007-01-09 Reply Admin

I'm not going to read the comments.

I just want to say.. "RegEx primer"

PS this place has the funniest CAPTCHA, thanks :)

2007-01-09 Reply Admin

I was working for a very large US-company (which actually should know better), and we did all the system integration in VBScript (!) - where any kind of necessary text encoding cleanup was done like this:

out = replace(out, "ä", "ae") out = replace(out, "Ä", "Ae") ... out = replace(out, "é", "e") ...

etc., etc.. hundreds of lines like this. This didn't even take into account that the replacement Ä -> Ae only works with some languages.

Oh yes, and the target encoding was actually UTF-8...

I quit there after half a year.

DM

Captcha: Java

2007-01-09 Reply Admin

I will admit to having written code which did something like this, for a routine which generated XML files from Oracle tables. You see, ASCII 0-8, 11-12, 14-31 are not valid XML characters, so I had to strip them out from the data. I just used a single REPLACE call which was nowhere near as ugly as this example.

Cheers, APC

2007-01-09 Reply Admin

slamb: not going to start a YesNo war here. I simply wanted to point out that sometimes there are situations where you need 'do' with the tools you've been offered. Nested replaces are not that terrible IMHO. Sure it can be done nicer/faster/better/whatever, but without proper context I consider it a bit harsh on the author to call this a proper WTF.

Makes me think of situations where you're sent to some company with the mission to convert some mapping from the "mainframe" to "the application server". You're fully briefed and prepared on the functional part, you know plenty of the app-servers architecture and best of all, you get the entire week to 'massage' a measly couple of ten-thousand lines into the new format, easy! Off course the local IT guys will love to help you out ("damn external overpaid consultant I don't have time for"), so they give you terminal access to a 'well-respected data source' (not familiar with AS400 ? Don't worry, it's self-explanatory). After 2 days lost you manage to write the files to disk and FTP them to your server where you then get a 'just-a-bit-more-than-public' account to manipulate and load the data again, only to find they're full of strange characters after you BCP the file in. So you start looking for problems and fix them one by one, hoping it will all be done before the end of the week because you don't want to miss your flight back home. Sometimes getting it working is a matter of economics (read: within the allotted time-frame and budget), not proper design. Sad but true. Does that make it a WTF, imho not, YMMV.

FWIW :

Copy pasting this only takes only 5 iterations, casing seems to contradict scripting, but heck, I HOPE the other columns were cased identically ?!
The Translate() and Filter() functions make my 'not enough access' a moot point indeed, although it might be that he only added those afterwards and didn't want to break what already worked.
Although NOW I'd rather "loose" my first days fighting the local powers that be in order to get the proper tools for the task (eg. DTS or alike for a job like this), I've run into 'described' situations myself while I was a 'newbie' and 'getting it done' was what counted!
Although it's not said explicitly that this script runs on a regular basis, if they can than this is proof that he made it 'rerunnable', much better than those "manuscripts" left behind by "some experts" that explain in 18 steps using 15 tools how to redo the reload of the data
ETL can be hell.

As a general note on this web-site : get real guys, there's way too much elitists (sp?) here. Sure it's cool to write "... I quit the week after that ...", but that doesn't make the problem go away. If everyone would react like that its no surprise that people get replaced by technical "know-nothings" with total lack of "coding-style" and maybe a funny accent, because at least THAT guy got something working the way "the boss" had requested. (sort off). Although I like to think the opposite too sometimes, coding is NOT a call, it's a JOB.

2007-01-09 Reply Admin

So this is what happens when you have bad karma: you come back as a SQL programmer writing crap like this.

And "repeated nearly verbatim for the 32 other columns in the table". That really is icing on the WTF cake.

2007-01-09 Reply Admin

Ross:
You people are morons. This is the only way to perform the required operation in sql without sacrificing performance. Not all programming is rainbows and sunshine. There will come days with the code gets ugly in oreder to perform. Get a grip, and let this one go.

mmm, I love the way troll smells in the morning

The Data Cleanup

REPLA

Leave a comment on “The Data Cleanup”