The Daily WTF: Curious Perversions in Information Technology

jonnyq · 2011-11-21

snoofle:
Nobody knows everything, least of all beginners, so I can understand not knowing that the "unusual" characters are effectively translated-down when using "like" (I didn't realize that either).
Not knowing enough to ask if there's a better way than what they came up with, which looks sufficiently complicated that there just has to be a better way, is unforgivable in this day and age.

It's ok to not know things. TRWTF is in how you approach it. If you say "databases suck" and start putting in hacks, even those labeled as such, then you end up here. If you Google "how to mysql/t-sql/oracle/yourmomql characters with accents" because databases are a part of your job and you should learn how to deal with them, you'll end up reading the first article on collation and figure it out.

I use mysql. In MySQL, the = operator defaults to a case-insensitive search and the collation depends on the database/table/column settings, so often you wouldn't even need the extra "collate blabbityblah like" bit. I assume other DBMSs have a way to set a default collation.
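Here is a small illustration of a column-level default collation, where a plain = already ignores case and accents so the query needs no explicit COLLATE clause. It is a minimal sketch that uses Python's built-in sqlite3 module instead of MySQL. SQLite has no built-in accent-insensitive collation, so the collation `accent_ci` and the helper `fold` are hypothetical ones registered from Python. SQLite's LIKE ignores collations, which is why the sketch uses =.

```python
import sqlite3
import unicodedata


def fold(s: str) -> str:
    """Strip combining accents (via NFD) and case-fold what is left."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def accent_ci(a: str, b: str) -> int:
    """Hypothetical collation: compare strings ignoring case and accents."""
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)


conn = sqlite3.connect(":memory:")
conn.create_collation("accent_ci", accent_ci)
# The collation belongs to the column, so a plain '=' picks it up automatically.
conn.execute("CREATE TABLE people (name TEXT COLLATE accent_ci)")
conn.executemany(
    "INSERT INTO people VALUES (?)", [("Hélène",), ("HELENE",), ("Marc",)]
)
matches = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM people WHERE name = ? ORDER BY rowid", ("helene",)
    )
]
print(matches)  # ['Hélène', 'HELENE']
```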

2011-11-21

DavidTC:
Erm, let's assume I don't know anything about collation...why the hell would I use SQL functions? Hell, the code _itself_ asks why they're doing it in SQL. You do the replacement outside the SQL, and pass that variable in. Uh...duh?

Fails if the database contains accented characters. Which it presumably does, because the replacement is being performed on the database columns, not on the parameters being passed in.

Vombatus · 2011-11-21

Robbert:
I don't think this is very wtf-worthy. If I hadn't accidentally used an accent-insensitive primary key in a project that had both an accented and non-accented version of the same primary key, I would have never found out about this feature (until now of course, that's why TDWTF is awesome).

Some would argue that having anything other than long integers as a primary key is wrong.
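Robbert's accident is easy to reproduce. With an accent-insensitive key, an accented value and its unaccented twin count as the same key. This is a minimal sketch in Python's sqlite3. The `accent_ci` collation is a hypothetical one registered from Python, standing in for something like SQL Server's `_AI` collations.

```python
import sqlite3
import unicodedata


def accent_ci(a: str, b: str) -> int:
    """Hypothetical collation: equal if the strings match ignoring accents and case."""
    def fold(s: str) -> str:
        nfd = unicodedata.normalize("NFD", s)
        return "".join(c for c in nfd if not unicodedata.combining(c)).casefold()

    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)


conn = sqlite3.connect(":memory:")
conn.create_collation("accent_ci", accent_ci)
# The primary key's uniqueness check uses the column's collation.
conn.execute("CREATE TABLE codes (code TEXT PRIMARY KEY COLLATE accent_ci)")
conn.execute("INSERT INTO codes VALUES ('Jose')")

try:
    conn.execute("INSERT INTO codes VALUES ('José')")
    collided = False
except sqlite3.IntegrityError:
    collided = True  # 'José' is treated as a duplicate of 'Jose'

print(collided)  # True
```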

ContraCorners · 2011-11-21

gizmore:
I wonder if it would transcribe "Umlauts" like Ü to UE.
abico: I am a wooden calculator.

Wrong. abico is the company that makes the wooden calculators.

2011-11-21

DavidTC:
You do the replacement outside the SQL, and pass that variable in. Uh...duh?

You want to pass a variable that will match several different-looking entries (with/without accents) in an SQL query. How would you do that outside of SQL? What should the modified string look like?

If you don't use the collation way (and assuming you don't know more SQL specifics than the coders in this example for the particular database), you would have to pass all possible variations of the string. That soon comes to a lot of strings. Another way would be to fetch all the data, then do the selection in PHP. That would also probably be the worst solution imaginable.
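Here is a rough count of how many strings "a lot of strings" can mean. If every vowel could appear in any of its accented forms, the number of spellings grows multiplicatively with the length of the name. This is a small Python sketch. The `VARIANTS` table is an assumption, loosely modelled on the characters handled by the nested Replace chain quoted in this thread. It covers lowercase letters only and ignores case variations entirely.

```python
from itertools import product

# Assumed accent variants per base letter (lowercase only), loosely
# modelled on the characters handled by the nested Replace calls.
VARIANTS = {
    "e": "eéèêë",
    "a": "aàâä",
    "i": "iïî",
    "o": "oôö",
    "u": "uùûü",
    "c": "cç",
}


def spellings(word: str) -> list[str]:
    """Every spelling of `word` obtainable by accenting its letters."""
    choices = [VARIANTS.get(ch, ch) for ch in word]
    return ["".join(p) for p in product(*choices)]


variants = spellings("helene")
print(len(variants))  # 125: five forms for each of the three e's
```

A two-word search multiplies the counts of both words, which is why generating the variants outside SQL and passing them in does not scale.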

"haero" - I think of Usagi Yojimbo.

2011-11-21

Daniel:
DavidTC:
You do the replacement outside the SQL, and pass that variable in. Uh...duh?
You want to pass a variable that will match several different-looking entries (with/without accents) in an SQL query. How would you do that outside of SQL? What should the modified string look like?
If you don't use the collation way (and assuming you don't know more SQL specifics than the coders in this example for the particular database), you would have to pass all possible variations of the string. That soon comes to a lot of strings. Another way would be to fetch all the data, then do the selection in PHP. That would also probably be the worst solution imaginable.

"haero" - I think of Usagi Yojimbo.

Oh dear.

2011-11-21

Machtyn:
And this is exactly why I use Google. Sometimes, the correct answer is given to us when I need it, not when there's a relevant article by chance. (Not being a db programmer or an experienced programmer, I would have never known about trying the correct method.)

FTFY

2011-11-21

Richard:
Neither query is SARGable [1]; any index on the name or surname columns will be ignored. As the table grows, the query will get slower, which will no doubt be the fault of the database, and used as an argument to go NoSQL [2].
The simple solution is to change the collation on the name columns and drop the "collate ..." clauses from the query.

If that won't work for some reason, use a computed column with the correct collation instead. [3]

[1] http://en.wikipedia.org/wiki/Sargable
[2] http://en.wikipedia.org/wiki/NoSQL
[3] http://www.sqlteam.com/article/using-indexed-computed-columns-to-improve-performance

You definitely want to preserve your full data in the actual column, and use a computed column for the lossy collation if you plan on showing the accent characters.
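The idea of keeping the original data and indexing only the lossy version can be shown with a minimal sketch. It uses SQLite expression indexes through Python's sqlite3 instead of SQL Server computed columns. The table, the index name, and the single `replace()` are illustrative assumptions. The point is that the planner can use an index only when the query repeats the indexed expression; otherwise it has to scan. The exact `EXPLAIN QUERY PLAN` wording varies between SQLite versions, so no output is shown for the plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, surname TEXT)")
conn.executemany(
    "INSERT INTO people (surname) VALUES (?)",
    [("Le Blanc",), ("Dupont",), ("LeBlanc",)],
)

query = "SELECT id FROM people WHERE replace(surname, ' ', '') = ?"


def plan(sql: str) -> str:
    """Return the planner's description (last column) of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("LeBlanc",)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


before = plan(query)  # no usable index yet: a full table scan
# The original column stays intact; only the space-stripped form is indexed.
conn.execute(
    "CREATE INDEX ix_surname_nospace ON people (replace(surname, ' ', ''))"
)
after = plan(query)  # the identical expression can now use the index

print(before)
print(after)
ids = sorted(row[0] for row in conn.execute(query, ("LeBlanc",)))
print(ids)  # [1, 3]
```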

2011-11-21

Oded:
/* HACK AGAIN! Oh yeah, no cod reuse. Why are we doing this in SQL!? */
I do try to reuse my fish, whenever possible.

That's disgusting! I bet you do the same with toilet paper as well.

swiers · 2011-11-21

frits:
/* HACK! Being SQL, there's no concept of RegEx, so we have to this horrible hack */
Well they should have constructed the query in a proper language that has Regexes, then. ;)

Like, say, MySQL? http://dev.mysql.com/doc/refman/5.1/en/regexp.html

Or SQL server? http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

Or probably just about every SQL flavor out there that you care to Google with "SqlLanguageName + regex"?
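The claim that most SQL flavors can do regular expressions is easy to check. SQLite, for instance, parses `X REGEXP Y` but leaves the implementation to a user-defined `regexp(Y, X)` function, which Python's sqlite3 can supply. This is a minimal sketch with a made-up table and pattern. As other commenters point out, a regex does not fix the accent problem.

```python
import re
import sqlite3


def regexp(pattern: str, value) -> bool:
    """Called by SQLite for `value REGEXP pattern`; NULL never matches."""
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
# SQLite rewrites "X REGEXP Y" as a call to regexp(Y, X).
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)", [("Renée",), ("Rene",), ("Bob",)])

rows = conn.execute(
    "SELECT name FROM people WHERE name REGEXP ? ORDER BY rowid", ("^Ren",)
).fetchall()
print(rows)  # [('Renée',), ('Rene',)]
```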

2011-11-21

Vombatus:
Robbert:
I don't think this is very wtf-worthy. If I hadn't accidentally used an accent-insensitive primary key in a project that had both an accented and non-accented version of the same primary key, I would have never found out about this feature (until now of course, that's why TDWTF is awesome).

Some would argue that having anything other than long integers as a primary key is wrong.

Some would argue that using long integers as a primary key, when any other field is sufficient for uniquely defining a row, is being like those people who click "apply" before they click "OK", just to be sure.

It's redundant information and therefore, by definition, bad database design. Unless, of course, you have a strong performance-related technical argument for the use of long integers, but those are for experts. The way you use the word "wrong" in your sentence indicates you're not one of those. Also, I personally can't think of any case in which such an argument would apply, because I, too, am no expert, but I do like to think I know a few things about binary trees and indexing.

Anyway, back on topic: I had to explain the difference between a primary key and an auto_increment column to my boss the other day (who has been developing database-driven applications for a decade now). Talk about a WTF. The conversation started because in my first month with the company, I'd made a table with three integer columns in it, all of which combined were the primary key. He had no idea that was possible.
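For anyone who, like the boss in this story, hasn't seen one: a primary key can span several columns, and only the combination has to be unique. This is a minimal sketch with Python's sqlite3. The table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Three integer columns that together form the primary key.
conn.execute(
    """
    CREATE TABLE enrolment (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        term_id    INTEGER NOT NULL,
        PRIMARY KEY (student_id, course_id, term_id)
    )
    """
)
conn.execute("INSERT INTO enrolment VALUES (1, 10, 2011)")
conn.execute("INSERT INTO enrolment VALUES (1, 10, 2012)")  # different term: allowed

try:
    conn.execute("INSERT INTO enrolment VALUES (1, 10, 2011)")  # same triple again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
print(duplicate_rejected, count)  # True 2
```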

2011-11-21

PedanticCurmudgeon:
me_again:
fisherman:
benmurphyx:
These people always flounder around for a solution
Looks like we not be gettin many bite today...
If I were on the hook for this I would have been like a fish out of water.
I would expect someone to carp about these puns.

I haven't seen so many bad puns come down the pike in quite some time.

DaveK · 2011-11-21

Mike:
PedanticCurmudgeon:
me_again:
fisherman:
benmurphyx:
These people always flounder around for a solution
Looks like we not be gettin many bite today...
If I were on the hook for this I would have been like a fish out of water.
I would expect someone to carp about these puns.

I haven't seen so many bad puns come down the pike in quite some time.

They're giving me a haddock.

2011-11-21

The real WTF is using LIKE and REPLACE together for a table-scan search. Even with a 'text index' (lol!) this is terribly inefficient, as the REPLACE calls are performed on every row and every row has to be scanned for matches!

This could all be replaced with a ridiculously more efficient full-text search system, like MSSQL's, using FREETEXT or CONTAINS. It actually indexes words, searches based on the words in the string, and can be marked as accent-insensitive, solving the speed, indexing and accent problems all in one go - bam! It can even return results ranked by relevance for you.
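MSSQL's full-text engine is its own product. The core idea behind it can be sketched in a few lines of Python: index each word once, accent-folded, so a search is a dictionary lookup rather than a REPLACE-per-row scan. This is a toy inverted index, not how FREETEXT or CONTAINS are actually implemented. The `fold` helper and the sample rows are assumptions.

```python
import re
import unicodedata
from collections import defaultdict


def fold(word: str) -> str:
    """Lower-case and strip accents (NFD, then drop combining marks)."""
    nfd = unicodedata.normalize("NFD", word)
    return "".join(c for c in nfd if not unicodedata.combining(c)).casefold()


def build_index(rows: dict[int, str]) -> dict[str, set[int]]:
    """Map each folded word to the ids of the rows containing it (done once)."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, text in rows.items():
        for word in re.findall(r"\w+", text):
            index[fold(word)].add(row_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Ids of rows containing every word of `query`, ignoring accents and case."""
    words = [fold(w) for w in re.findall(r"\w+", query)]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result


rows = {1: "Hélène Dupré", 2: "Helene Martin", 3: "Marc Dupre"}
index = build_index(rows)
print(sorted(search(index, "helene")))  # [1, 2]
print(sorted(search(index, "DUPRE")))   # [1, 3]
```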

2011-11-21

Toon:
Some would argue that using long integers as a primary key, when any other field is sufficient for uniquely defining a row, is being like those people who click "apply" before they click "OK", just to be sure.

A case to illustrate my point while I'm up here on my soapbox: people might be identified by a long integer, because there can be two people living at the same address with the same name, sex and date of birth. That, as I sometimes explain to people who say they're "just a number", is why we have social security numbers, employee numbers, etc.

(captcha: vereor. it sounds like a gurgle!)

aliquot · 2011-11-21

Toon:
there can be two people living at the same address with the same name, sex and date of birth.

Wait, what?

Same address = roommates or family. How many parents can there really be who give more than one of their kids the same name?

I guess it's possible for two people with the same first, last and middle to decide to move in together. Or two with same first & middle to get married & take the same last name. Now I want to know if this has ever happened. And how confused their credit reports are.

2011-11-21

Toon:
Some would argue that using long integers as a primary key, when any other field is sufficient for uniquely defining a row, is being like those people who click "apply" before they click "OK", just to be sure.

The catch is "when any other field is sufficient". But often, there's no such field (or combination of fields) - either nothing is really unique, or something is unique but nullable. Actual unique business identifiers are pretty rare in my experience.
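A common way to handle the "unique but nullable" case is a surrogate integer key plus a UNIQUE constraint on the natural identifier. This is a minimal sketch with Python's sqlite3, and the table is hypothetical. NULL handling in UNIQUE columns differs between products: SQLite and PostgreSQL allow many NULLs, while SQL Server's plain UNIQUE constraint allows only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        id     INTEGER PRIMARY KEY,  -- surrogate key, assigned by the database
        tax_id TEXT UNIQUE,          -- natural identifier: unique when present
        name   TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO customer (tax_id, name) VALUES ('123-45-6789', 'Ann')")
conn.execute("INSERT INTO customer (tax_id, name) VALUES (NULL, 'Bob')")
conn.execute("INSERT INTO customer (tax_id, name) VALUES (NULL, 'Cy')")  # second NULL is fine

try:
    conn.execute("INSERT INTO customer (tax_id, name) VALUES ('123-45-6789', 'Dee')")
    duplicate_ok = True
except sqlite3.IntegrityError:
    duplicate_ok = False  # a repeated non-NULL value is still rejected

ids = [row[0] for row in conn.execute("SELECT id FROM customer ORDER BY id")]
print(duplicate_ok, ids)  # False [1, 2, 3]
```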

2011-11-21

Vombatus:
Robbert:
I don't think this is very wtf-worthy. If I hadn't accidentally used an accent-insensitive primary key in a project that had both an accented and non-accented version of the same primary key, I would have never found out about this feature (until now of course, that's why TDWTF is awesome).

Some would argue that having anything other than long integers as a primary key is wrong.

And they should be auto-generated and not changeable by the user.

swiers · 2011-11-21

aliquot:
Toon:
there can be two people living at the same address with the same name, sex and date of birth.
Wait, what?
Same address = roommates or family. How many parents can there really be who give more than one of their kids the same name?

Not many, but you often have a father and son with exactly the same name, and no legal "junior", let alone a "the third". I knew a guy who used to get hauled into jail for his father's outstanding warrants on a semi-regular basis.

2011-11-21

swiers:
aliquot:
Toon:
there can be two people living at the same address with the same name, sex and date of birth.
Wait, what?
Same address = roommates or family. How many parents can there really be who give more than one of their kids the same name?

Not many, but you often have a father and son with exactly the same name, and no legal "junior", let alone a "the third". I knew a guy who used to get hauled into jail for his father's outstanding warrants on a semi-regular basis.

Not often a father and son would share the same date of birth, I would think....

2011-11-21

Vombatus:
Robbert:
I don't think this is very wtf-worthy. If I hadn't accidentally used an accent-insensitive primary key in a project that had both an accented and non-accented version of the same primary key, I would have never found out about this feature (until now of course, that's why TDWTF is awesome).

Some would argue that having anything other than long integers as a primary key is wrong.

It's not wrong to use things other than longs. But to be future-proof, it is advisable to always use surrogate keys, because primary keys based on real-world data can't be guaranteed to stay unique in the future. Requirements can change and will change, and your cool primary key will need to be changed, except that millions of things now depend on it.

D-Coder · 2011-11-21

PedanticCurmudgeon:
Simon:
Bob:
Anyone who thinks regexes are a solution to character set issues deserves a good shooting. I can understand not knowing the answer, but thinking regex is the solution is insane.
Yes, I know: Anyone who thinks regexes are a solution deserves a good shooting. FTFY.

Yup... the good old "two problems" solution....
Someone should have brought up the "two problems" thing at least 2 hours ago. What's wrong with you people?

We're retarded.

And let me assure you...

2011-11-21

Toon:
Vombatus:
Robbert:
I don't think this is very wtf-worthy. If I hadn't accidentally used an accent-insensitive primary key in a project that had both an accented and non-accented version of the same primary key, I would have never found out about this feature (until now of course, that's why TDWTF is awesome).

Some would argue that having anything other than long integers as a primary key is wrong.

Some would argue that using long integers as a primary key, when any other field is sufficient for uniquely defining a row, is being like those people who click "apply" before they click "OK", just to be sure.

It's redundant information and therefore, by definition, bad database design. Unless, of course, you have a strong performance-related technical argument for the use of long integers, but those are for experts. The way you use the word "wrong" in your sentence indicates you're not one of those. Also, I personally can't think of any case in which such an argument would apply, because I, too, am no expert, but I do like to think I know a few things about binary trees and indexing.

Anyway, back on topic: I had to explain the difference between a primary key and an auto_increment column to my boss the other day (who has been developing database-driven applications for a decade now). Talk about a WTF. The conversation started because in my first month with the company, I'd made a table with three integer columns in it, all of which combined were the primary key. He had no idea that was possible.

I agree with you, but you cannot say it is always a good thing to pick the key from the real world. In practice, surrogate keys simplify things and can even make the system perform better. It is not bad design to use autoincrement (or surrogate) keys. Most commercial database products default to them for a reason: in most situations they are just fine. And you can always add a unique constraint, which doesn't cause a lot of dependencies (foreign key relationships) to be changed when requirements change.

This says it all: "Actual unique business identifiers are pretty rare in my experience." The real world provides poor unique keys.

Why waste time carefully crafting a database schema, only to see it massacred by crude reality a little while later? And it will cost more to change all the primary keys and foreign keys.

I once used the CPF (in Brazil, it is like a social security number) as a key in a system, but then the client changed its business and allowed customers with only a telephone number and address to buy through the system, and all the damn keys needed to be changed. Now I don't think twice about choosing surrogate keys over natural keys. It doesn't hurt, unless you are a purist theoretical moron who came from academia and hasn't lived in the real world for long.

Sorry, my English is poor.
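The CPF story is the classic argument for pointing foreign keys at a surrogate. When the natural identifier becomes optional, only one column on one table changes, and every referencing row stays put. This is a minimal sketch with Python's sqlite3. The schema and the CPF-like strings are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    """
    CREATE TABLE client (
        id    INTEGER PRIMARY KEY,  -- surrogate key
        cpf   TEXT UNIQUE,          -- natural identifier, now optional
        phone TEXT
    )
    """
)
conn.execute(
    """
    CREATE TABLE sale (
        id        INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL REFERENCES client (id),
        amount    REAL NOT NULL
    )
    """
)
conn.execute("INSERT INTO client (cpf, phone) VALUES ('111.111.111-11', NULL)")
conn.execute("INSERT INTO sale (client_id, amount) VALUES (1, 99.5)")

# New business rule: a client may buy with only a phone number and no CPF.
conn.execute("INSERT INTO client (cpf, phone) VALUES (NULL, '555-0100')")
conn.execute("INSERT INTO sale (client_id, amount) VALUES (2, 10.0)")

# Correcting a CPF touches one row; sales reference the id, not the CPF.
conn.execute("UPDATE client SET cpf = '222.222.222-22' WHERE id = 1")

rows = conn.execute(
    "SELECT s.id, c.cpf, c.phone FROM sale s "
    "JOIN client c ON c.id = s.client_id ORDER BY s.id"
).fetchall()
print(rows)  # [(1, '222.222.222-22', None), (2, None, '555-0100')]
```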

2011-11-21

aliquot:
How many parents can there really be who give more than one of their kids the same name?

You say that like it's a bad thing.

2011-11-21

Duh!

Replace(Replace(Replace(Replace( Replace(Replace(Replace(Replace( Replace(Replace(Replace(Replace( Replace(Replace(Replace(Replace(name, ' ',''), 'É','E'),'È','E'),'Ê','E'),'Ë','E'), 'À','A'),'Â','A'),'Ä','A'), 'Ï','I'),'Î','I'), 'Ç','C'), 'Ô','O'),'Ö','O'), 'Ü','U'),'Ù','U'),'Û','U')
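The nested Replace chain above can be written as a single translation table, which is easier to read and to extend. This is a minimal Python sketch that mirrors the mapping in the chain as written, including the removal of spaces. Like the original, it handles only the listed uppercase letters, so anything else (Ó, Ÿ, Œ, lowercase letters) passes through untouched.

```python
# Mirror of the nested Replace(...) chain: each listed character maps to its
# replacement, and ' ' maps to None (deleted), just like Replace(name, ' ', '').
FOLD_TABLE = str.maketrans(
    {
        " ": None,
        "É": "E", "È": "E", "Ê": "E", "Ë": "E",
        "À": "A", "Â": "A", "Ä": "A",
        "Ï": "I", "Î": "I",
        "Ç": "C",
        "Ô": "O", "Ö": "O",
        "Ü": "U", "Ù": "U", "Û": "U",
    }
)


def fold_like_original(name: str) -> str:
    """Apply the same substitutions as the nested Replace calls, in one pass."""
    return name.translate(FOLD_TABLE)


print(fold_like_original("FRANÇOIS LE BÈGUE"))  # FRANCOISLEBEGUE
print(fold_like_original("ÓSCAR"))              # ÓSCAR (Ó is not in the list)
```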

2011-11-21

George Foreman:
aliquot:
How many parents can there really be who give more than one of their kids the same name?

You say that like it's a bad thing.

Try being a person with a Lusitanic name in India. Accented characters be damned, most IT systems reject an apostrophe as an illegal character.

2011-11-22

Bob has introduced a bug: the original code also removed spaces from the fields, but his code does not.

Replace(name, ' ','')

2011-11-22

Oh my cod!

2011-11-22

et:
trwtf is that most DBMSs have RegEx support

Yes and I can use my screwdriver as a hammer

2011-11-22

Some would argue that having anything other than long integers as a primary key is wrong.
And they should be auto-generated and not changeable by the user.

And they should come from non-overlapping ranges for different tables.

In theory, you should never have a loose ID floating around. In practice, when you're on a support call and someone says "I have a number", you want to say "thank you, I see it" rather than "is that an account number, a person number, an invoice number or a ...?".
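Here is a minimal sketch of the non-overlapping ranges idea. Each table gets its own block of IDs, so a bare number tells you what kind of record it is. The table names and the one-billion block size are assumptions for illustration. In a real database this would more likely be done with per-table sequences started at different values.

```python
import itertools

# Hypothetical ID blocks: each entity type owns a disjoint range.
# Assumes no table ever needs more than BLOCK - 1 ids.
BLOCK = 1_000_000_000
RANGES = {"account": 1, "person": 2, "invoice": 3}

# One counter per table, starting at the bottom of its block.
_counters = {name: itertools.count(start * BLOCK + 1) for name, start in RANGES.items()}


def next_id(table: str) -> int:
    """Allocate the next ID for `table` from its own range."""
    return next(_counters[table])


def kind_of(record_id: int) -> str:
    """Tell the support desk what a loose number refers to."""
    block = record_id // BLOCK
    for name, start in RANGES.items():
        if start == block:
            return name
    raise ValueError(f"{record_id} is not in any known range")


inv = next_id("invoice")
print(inv, kind_of(inv))  # 3000000001 invoice
```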

BTW, for stripping accents, rather than manually making a list and probably missing one, it's probably better to normalise it to Unicode NFD (or NFKD) and filter out anything non-ASCII.
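The NFD approach takes only a few lines with Python's standard `unicodedata` module: decompose, then drop everything that isn't ASCII. This is a minimal sketch. It turns "Ü" into "U", not the German "UE" asked about further up the thread, which would need a language-specific map. Letters with no decomposition, such as "ø" or "Œ", are dropped entirely by the ASCII filter. NFKD only adds compatibility forms, such as the "ﬁ" ligature becoming "fi".

```python
import unicodedata


def strip_accents(text: str, compat: bool = False) -> str:
    """Decompose with NFD (or NFKD), then keep only ASCII characters.

    The decomposition separates combining accents from their base letters,
    and the ASCII filter removes them; characters that have no ASCII
    decomposition at all are removed as well.
    """
    form = "NFKD" if compat else "NFD"
    decomposed = unicodedata.normalize(form, text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("Crème Brûlée"))                # Creme Brulee
print(strip_accents("Über"))                        # Uber
print(strip_accents("\ufb01nal", compat=True))      # final (ligature expanded)
print(strip_accents("S\u00f8ren"))                  # Sren (ø is dropped, not mapped)
```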

2011-11-22

Luiz Felipe:
Vombatus:

Some would argue that having anything other than long integers as a primary key is wrong.

It's not wrong to use things other than longs. But to be future-proof, it is advisable to always use surrogate keys, because primary keys based on real-world data can't be guaranteed to stay unique in the future. Requirements can change and will change, and your cool primary key will need to be changed, except that millions of things now depend on it.

I've long thought that the issue of surrogate keys divides the database community into Programmers and DBAs.

Programmers are taught to always abstract, to always encapsulate, to always separate interface from implementation.

DBAs are taught to never duplicate information.

I've been on both sides of the fence. There are good reasons for both approaches. And database efficiency is NOT the only, or even the primary reason for avoiding duplicate information. But my heart lies with the programmers.

Severity One · 2011-11-22

Oh yeah, no cod reuse.

It's people like these, that don't reuse their cods, that put this fish on the brink of extinction.

2011-11-22

They must have known that :-) And they have a good reason to hate databases for that.

If they wanted to eliminate the upper half of the ASCII table, then 'COLLATE Latin_General_CI_AI' is not enough; for example, it does not transform the letter 'ó' into 'o'.

And by the way, the second WTF is that SQL Server still doesn't have native Regex support, not even in the upcoming v. 2012 :-)

2011-11-22

What does Nethack have to do with this?

captcha: nulla - ...I got nothing

2011-11-22

Toon:
That, as I sometimes explain to people who say they're "just a number", is why we have social security numbers, employee numbers, etc.

Even social security numbers aren't good primary identifiers; they can be (and are) reused, and not everyone has one - even if you include only US citizens. Makes me cringe thinking how many incorrectly designed systems there are that use SSNs as primary keys (not to mention the inability to store them securely; encrypted primary keys anyone?)

This demonstrates why meaningless long integers are the best primary keys to use. As another poster pointed out, truly unique natural keys are very rare. If you think you have one, you don't. If you are sure you have one, you probably still don't.

If you are 100% double dog dare sure that you have one and have 10 business / DB experts that agree with you, it's STILL better to use integers for performance reasons anyway, since computers cannot compare /anything/ faster than they can compare integers; certainly not composites thereof.

Using "business keys" as primary keys is a basic error; and one of the few that can sink a project all by itself if requirements change. The design technique is fundamentally flawed.

2011-11-22

Those of you who are replying to me about the primary keys are making valid points; I've learned a few things this winter morning.

However, just to be clear, I wasn't advocating the use of a social security number as a primary key for one's own application, but explaining that I feel that arguably, a number is the only way to describe a person, especially when you're dealing with as many people as live in the United States, or Brazil.

QJo · 2011-11-22

DaveK:
Mike:
PedanticCurmudgeon:
me_again:
fisherman:
benmurphyx:
These people always flounder around for a solution
Looks like we not be gettin many bite today...
If I were on the hook for this I would have been like a fish out of water.
I would expect someone to carp about these puns.

I haven't seen so many bad puns come down the pike in quite some time.

They're giving me a haddock.

Now, now - this is no plaice for postings like this - are you herring me, out there?

2011-11-22

There was an accident with a contraceptive and a time machine. I can't explain it right now.

QJo · 2011-11-22

Buddy:
Oh yeah, we've had anti-DB people. I wouldn't be surprised if the code sample was from the same shop. One guy didn't trust databases to run queries, so he dumped everything to enormous files and ran his homemade C utilities on them. When I showed him how much easier and faster it is to do queries in SQL, he would point out the differences in the results as evidence that "SQL wasn't working right". When I countered that it was because NULLs weren't being handled properly in his C utilities, e.g. atoi("0") == atoi("NULL"), he said "That's okay, the errors aren't significant." What do you say to someone so refractorily obtuse? At the time, I think I was going through two-fifths of vodka a week.

What do you say to them? I'd say: "Please clear your desk and report to HR to collect your paperwork."
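The `atoi("0") == atoi("NULL")` problem is easy to reproduce. A parser that turns anything non-numeric into 0 counts NULLs as real zeros, so its aggregates disagree with SQL, whose `AVG` ignores NULLs. This is a minimal sketch in Python. The function `naive_atoi` is a hypothetical imitation of C's `atoi` for these inputs, and Python's sqlite3 stands in for the real database.

```python
import re
import sqlite3
from typing import Optional


def naive_atoi(s: str) -> int:
    """Imitate C atoi(): parse leading digits, return 0 if there are none."""
    m = re.match(r"\s*([+-]?\d+)", s)
    return int(m.group(1)) if m else 0


def parse_nullable(s: str) -> Optional[int]:
    """Keep NULL distinct from 0 by returning None for it."""
    return None if s == "NULL" else int(s)


raw = ["10", "NULL", "20"]

naive_avg = sum(naive_atoi(v) for v in raw) / len(raw)  # NULL counted as 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(parse_nullable(v),) for v in raw])
sql_avg = conn.execute("SELECT AVG(v) FROM t").fetchone()[0]  # NULL ignored

print(naive_atoi("0") == naive_atoi("NULL"))  # True
print(naive_avg, sql_avg)                     # 10.0 15.0
```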

QJo · 2011-11-22

Simon:
Toon:
Some would argue that using long integers as a primary key, when any other field is sufficient for uniquely defining a row, is being like those people who click "apply" before they click "OK", just to be sure.

The catch is "when any other field is sufficient". But often, there's no such field (or combination of fields) - either nothing is really unique, or something is unique but nullable. Actual unique business identifiers are pretty rare in my experience.

More to the point, using long integers consistently in a database design for primary keys reduces the overall long-term complexity and maintenance burden. As all the PKs are in the same format, the algorithms to handle them are much more similar than they would be if the PKs consisted of various disparate structures. It may not actually be the best way of designing your database in your particular instance, but a good engineer would take such a point into consideration before jumping in with a half-baked design.

2011-11-22

How presumptuous of Bob to just change the code. How does he know that the existing method was not derived through many iterations to be the absolute best the company's business required, and that his change won't jeopardize the entire company and leave his co-workers unemployed?

QJo · 2011-11-22

Luiz Felipe:
Toon:
Vombatus:
Robbert:
I don't think this is very wtf-worthy. If I hadn't accidentally used an accent-insensitive primary key in a project that had both an accented and non-accented version of the same primary key, I would have never found out about this feature (until now of course, that's why TDWTF is awesome).

Some would argue that having anything other than long integers as a primary key is wrong.

Some would argue that using long integers as a primary key, when any other field is sufficient for uniquely defining a row, is being like those people who click "apply" before they click "OK", just to be sure.

It's redundant information and therefore, by definition, bad database design. Unless, of course, you have a strong performance-related technical argument for the use of long integers, but those are for experts. The way you use the word "wrong" in your sentence indicates you're not one of those. Also, I personally can't think of any case in which such an argument would apply, because I, too, am no expert, but I do like to think I know a few things about binary trees and indexing.

Anyway, back on topic: I had to explain the difference between a primary key and an auto_increment column to my boss the other day (who has been developing database-driven applications for a decade now). Talk about a WTF. The conversation started because in my first month with the company, I'd made a table with three integer columns in it, all of which combined were the primary key. He had no idea that was possible.

I agree with you, but you cannot say it is always a good thing to pick the key from the real world. In practice, surrogate keys simplify things and can even make the system perform better. It is not bad design to use autoincrement (or surrogate) keys. Most commercial database products default to them for a reason: in most situations they are just fine. And you can always add a unique constraint, which doesn't cause a lot of dependencies (foreign key relationships) to be changed when requirements change.

This says it all: "Actual unique business identifiers are pretty rare in my experience." The real world provides poor unique keys.

Why waste time carefully crafting a database schema, only to see it massacred by crude reality a little while later? And it will cost more to change all the primary keys and foreign keys.

I once used the CPF (in Brazil, it is like a social security number) as a key in a system, but then the client changed its business and allowed customers with only a telephone number and address to buy through the system, and all the damn keys needed to be changed. Now I don't think twice about choosing surrogate keys over natural keys. It doesn't hurt, unless you are a purist theoretical moron who came from academia and hasn't lived in the real world for long.

Sorry, my English is poor.

Fair play to you - your English is considerably better than my Portuguese.

QJo · 2011-11-22

DD:
Duh!
Replace(Replace(Replace(Replace( Replace(Replace(Replace(Replace( Replace(Replace(Replace(Replace( Replace(Replace(Replace(Replace(name, ' ',''), 'É','E'),'È','E'),'Ê','E'),'Ë','E'), 'À','A'),'Â','A'),'Ä','A'), 'Ï','I'),'Î','I'), 'Ç','C'), 'Ô','O'),'Ö','O'), 'Ü','U'),'Ù','U'),'Û','U')

Reads like a song by Sonic Youth.

2011-11-22

aliquot:
Wait, what?
Same address = roommates or family. How many parents can there really be who give more than one of their kids the same name?

Just think of two men named John Smith who happen to have the same date of birth and live in the same building in NY, NY: one on the 3rd floor, the other on the 5th.

Most databases I know don't have an extra field for the number of the apartment you are living in, so what do you do now? ;-)

QJo · 2011-11-22

bob:
et:
trwtf is that most DBMSs have RegEx support

Yes and I can use my screwdriver as a hammer

If I use enough brute force, I can use my hammer as a screwdriver.

2011-11-22

And sometimes it is the right thing to do. I'm a Perl guy. In Perl, to execute an Oracle stored procedure, you have to explicitly bind the input and output parameters to your PL/SQL procedure-calling code (e.g., begin foo(bar); end;).

Now, I had to tie into probably 50+ PL/SQL stored procedures, many of which use the same parameters, but for different reasons. So I started off with the simple approach: write the wrapper, including the bind_param statements, for the first method; copy the method; change the procedure name and parameters as necessary; lather, rinse, repeat.

Sometime about the 5th iteration of this my brain pulled me up short with "there has to be a better way to do this..." Hours passed as I pondered the situation. Then, like a flash of lightning, it hit me...

One routine does the binding. It takes a procedure name (as a scalar, including the database schema name), an arrayref of parameter descriptors (name, parameter size, whether it is in/out), and the input data values via @_. It walks the parameter list in parallel with @_, binding to the procedure wrapper code, executes the procedure, and bubbles any output parameters back up via return.

Now a stored procedure wrapper is a Perl routine wrapper with some descriptors calling the worker method--and I can call my PL/SQL procedures as though they were native Perl methods.
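The descriptor-driven idea carries over to other languages. Below is a minimal Python sketch of the same shape. One worker builds the anonymous PL/SQL block and the bind values from a list of parameter descriptors, and each wrapper is just a name plus descriptors. Everything here is hypothetical: the `Param` type, the procedure name, and `FakeCursor`, which stands in for a real Oracle driver cursor so the example runs on its own. It is not the commenter's Perl code, and a real driver's handling of OUT parameters would differ.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Param:
    """Hypothetical parameter descriptor, like the Perl arrayref entries."""
    name: str
    size: int  # a real driver would use this when binding; the fake ignores it
    is_out: bool = False


class FakeCursor:
    """Stand-in for a database cursor so the sketch runs without Oracle."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []

    def execute(self, sql: str, binds: dict[str, Any], out_names: list[str]) -> dict[str, Any]:
        """Record the call and return made-up values for the OUT parameters."""
        self.calls.append((sql, dict(binds)))
        return {name: f"<{name}>" for name in out_names}


def make_wrapper(cursor: FakeCursor, proc: str, params: list[Param]) -> Callable[..., tuple]:
    """The one worker: turn a procedure name plus descriptors into a callable."""
    placeholders = ", ".join(f":{p.name}" for p in params)
    sql = f"BEGIN {proc}({placeholders}); END;"
    in_params = [p for p in params if not p.is_out]
    out_params = [p for p in params if p.is_out]

    def call(*args: Any) -> tuple:
        if len(args) != len(in_params):
            raise TypeError(f"{proc} expects {len(in_params)} input value(s)")
        # Walk the IN descriptors in parallel with the arguments (like @_ in Perl).
        binds = {p.name: value for p, value in zip(in_params, args)}
        outs = cursor.execute(sql, binds, [p.name for p in out_params])
        return tuple(outs[p.name] for p in out_params)

    return call


# Each wrapper is now just a name plus descriptors.
cur = FakeCursor()
get_balance = make_wrapper(
    cur, "billing.get_balance", [Param("account_id", 10), Param("balance", 20, is_out=True)]
)
print(get_balance(42))  # ('<balance>',)
print(cur.calls[0][0])  # BEGIN billing.get_balance(:account_id, :balance); END;
```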

xorsyst · 2011-11-22

Vombatus:
Robbert:
I don't think this is very wtf-worthy. If I hadn't accidentally used an accent-insensitive primary key in a project that had both an accented and non-accented version of the same primary key, I would have never found out about this feature (until now of course, that's why TDWTF is awesome).

Some would argue that having anything other than long integers as a primary key is wrong.

I prefer UUIDs as primary keys. They are automatically unique across tables, can be generated outside the database if desired, and avoid primary key clashes when using multi-master database replication.
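Here is a minimal sketch of the UUID argument. Two independently written databases (two "masters") generate keys in the application, and their rows can then be merged without a key clash. Two auto-increment counters would both have handed out id 1. The sketch uses Python's `uuid` and `sqlite3`, and the schema and merge step are illustrative only. Random UUIDs make a clash vanishingly unlikely rather than strictly impossible.

```python
import sqlite3
import uuid


def new_db() -> sqlite3.Connection:
    """An independent 'site' with its own copy of the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE note (id TEXT PRIMARY KEY, body TEXT)")
    return conn


def add_note(conn: sqlite3.Connection, body: str) -> str:
    """Generate the key in the application, not in the database."""
    key = str(uuid.uuid4())
    conn.execute("INSERT INTO note VALUES (?, ?)", (key, body))
    return key


site_a, site_b = new_db(), new_db()
add_note(site_a, "written at A")
add_note(site_b, "written at B")

# "Replicate" B into A: no primary-key clash, because the keys are random UUIDs.
site_a.executemany(
    "INSERT INTO note VALUES (?, ?)",
    site_b.execute("SELECT id, body FROM note").fetchall(),
)
total = site_a.execute("SELECT COUNT(*) FROM note").fetchone()[0]
print(total)  # 2
```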

2011-11-22

Actual unique business identifiers that stay unique over more than a few years are even rarer.

The poop of DOOM · 2011-11-22

Deepak D'Souza:
George Foreman:
aliquot:
How many parents can there really be who give more than one of their kids the same name?

You say that like it's a bad thing.

Try being a person with a Lusitanic name in India. Accented characters be damned, most IT systems reject an apostrophe as an illegal character.

An apostrophe in your name? You haker schoolboy!

2011-11-22

Our old mess Sergeant's taste buds had been shot off in the war,
But his savory collations add to our esprit de corps...

The Anti-SQL Coalition
