- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
It's ok to not know things. TRWTF is in how you approach it. If you say "databases suck" and start putting in hacks, even those labeled as such, then you end up here. If you Google "how to mysql/t-sql/oracle/yourmomql characters with accents" because databases are a part of your job and you should learn how to deal with them, you'll end up reading the first article on collation and figure it out.
I use mysql. In MySQL, the = operator defaults to a case-insensitive search and the collation depends on the database/table/column settings, so often you wouldn't even need the extra "collate blabbityblah like" bit. I assume other DBMSs have a way to set a default collation.
Admin
Fails if the database contains accented characters. Which it presumably does, because the replacement is being performed on the database columns, not on the parameters being passed in.
Admin
Some would argue that having anything other than long integers as a primary key is wrong.
Admin
Admin
If you don't use the collation way (and assuming you don't know more SQL specifics than the coders in this example for the particular database), you would have to pass all possible variations of the string. That soon comes to a lot of string. Another way would be to fetch all data, then do the selection in PHP. That also happens to probably be the worst solution imaginable.
"haero" - I think of Usagi Yojimbo.
Admin
Oh dear.
Admin
Admin
You definitely want to preserve your full data in the actual column, and use a computed column for the lossy collation if you plan on showing the accent characters.
Admin
Admin
Like, say, MySQL? http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Or SQL server? http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Or probably just about every SQL flavor out there that you care to Google with "SqlLanguageName + regex" ?
Admin
Some would argue that using long integers as a primary key, when any other field is sufficient for uniquely defining a row, is being like those people who click "apply" before they click "OK", just to be sure.
It's redundant information and therefore, by definition, bad database design. Unless, of course, you have a strong performance-related technical argument for the use of long integers, but those are for experts. The way you use the word "wrong" in your sentence indicates you're not one of those. Also, I can personally not think of any case in which such an argument would apply, because I too, am no expert, but I do like to think I know a few things about binary trees and indexing.
Anyway, back on topic: I had to explain the difference between a primary key and an auto_increment column to my boss the other day (who has been developing database-driven applications for a decade now). Talk about a WTF. The conversation started because in my first month with the company, I'd made a table with three integer columns in it, all of which combined were the primary key. He had no idea that was possible.
Admin
I haven't seen so many bad puns come down the pike in quite some time.
Admin
Admin
The real WTF is both the use of LIKE and the REPLACE commands to table scan search. Even with a 'text index' (lol!) this is terribly inefficient as the replace commands are all performed on each row and each row needs to be scanned for matches!
This could all be replaced with a ridiculously more efficient text search system, like MSSQL's and use of FREETEXT or CONTAINS. It will actually index words and search based on the words in the string and can be marked as accent insensitive and solve the speed, indexing and accent problem all in one go - bam! or can even return results based on relevance for you
Admin
A case to illustrate my point while I'm up here on my soapbox: people might be defined by a long integer, because there can be two people living on the same address with the same name, sex and date of birth. That, as I sometimes explain to people who say they're "just a number", is why we have social security numbers, employee numbers, etc.
(captcha: vereor. it sounds like a gurgle!)
Admin
Same address = roommates or family. How many parents can there really be who give more than one of their kids the same name?
I guess it's possible for two people with the same first, last and middle to decide to move in together. Or two with same first & middle to get married & take the same last name. Now I want to know if this has ever happened. And how confused their credit reports are.
Admin
The catch is "when any other field is sufficient". But often, there's no such field (or combination of fields) - either nothing is really unique, or something is unique but nullable. Actual unique business identifiers are pretty rare in my experience.
Admin
And they should be auto-generated and not changeable by the user.
Admin
Not many, but you often have a father and son with exactly the same name, and no legal "junior", let alone a "the third". I knew I guy who used to get hauled into jail for his father's outstanding warrants on a semi-regular basis.
Admin
Admin
Its not wrong to use other things than long. But to be future proof is advised to always use surrogate keys, because primary keys that are based on data from real world cant be garanted to be always unique in the future. requeriments can change and will change, and you cool primary key will need to be changed, except that milions of thing now depends on it.
Admin
And let me assure you...
Admin
I agree with you, but you cannot say that always is a good thing to pick the key from real world. In practice the use of surrogate keys simplify things and even make the system more performant. Its is not bad design to use autoincrement (or surrogate). Most of commercial database products defaults to it because an reason, the reason that in most situations it is just fine. And you always can use some unique constraint that dont cause a lot of dependencies (foreign relationships) to be changed when requirements change.
this say all: "Actual unique business identifiers are pretty rare in my experience" Real world provide poor unique keys to be used.
Why lose time carefully crafting an database schema, only to wait a bit of time for it to be massacrated by crude reality. And it will cost more to change all the primary keys and foreign keys.
I have some experiences using CPF (in brazil, it is like social number) in some system, but then the client make a change in its business and allowed clients with only telephone and address to buy in the system. and all damn keys needed to be changed. Never more i think much when choosing to use surrogate keys instead of natural key, it doest hurts, only if you are a purist teoric moron that come from academia and havent lived in real world for much time.
Sorry because i am poor in english.
Admin
You say that like it's a bad thing.
Admin
Duh!
Replace(Replace(Replace(Replace( Replace(Replace(Replace(Replace( Replace(Replace(Replace(Replace( Replace(Replace(Replace(Replace(name, ' ',''), 'É','E'),'È','E'),'Ê','E'),'Ë','E'), 'À','A'),'Â','A'),'Ä','A'), 'Ï','I'),'Î','I'), 'Ç','C'), 'Ô','O'),'Ö','O'), 'Ü','U'),'Ù','U'),'Û','U')
Admin
Try being a person with a Lusitanic name in India. Accented characters be dammed, most IT systems reject an apostrophe as an illegal charachter.
Admin
Bob has introduced a bug: the original code also removed spaces from the fields, but his code does not.
Replace(name, ' ','')
Admin
Oh my cod!
Admin
Yes and I can use my screwdriver as a hammer
Admin
And they should come from non-overlapping ranges for different tables.
In theory, you should never have a loose ID floating around. In practice, when you're on a support call and someone says "I have a number", you want to say "thank you, I see it" rather than "is that an account number, a person number, an invoice number or a ...?".
BTW, for stripping accents, rather than manually making a list and probably missing one, it's probably better to normalise it to Unicode NFD (or NFKD) and filter out anything non-ASCII.
Admin
Programmers are taught to always abstract, to always encapsulate, to always separate interface from implementation.
DBA's are taught to never duplicate information.
I've been on both sides of the fence. There are good reasons for both approaches. And database efficiency is NOT the only, or even the primary reason for avoiding duplicate information. But my heart lies with the programmers.
Admin
Admin
They must have been knowing that :-) And they have a good reason to hate database for that.
If they wanted to eliminate the upper half of ASCII table, then 'COLLATE Latin_General_CI_AI' is not enough; for example it does not transform the letter 'ó' into 'o'.
And by the way, the second WTF is that SQL Server still doesn't have native Regex support, also in upcoming v. 2012 :-)
Admin
What does Nethack have to do with this?
captcha: nulla - ...I got nothing
Admin
Even social security numbers aren't good primary identifiers; they can be (and are) reused, and not everyone has one - even if you include only US citizens. Makes me cringe thinking how many incorrectly designed systems there are that use SSNs as primary keys (not to mention the inability to store them securely; encrypted primary keys anyone?)
This demonstrates why meaningless long integers are the best primary keys to use. As another poster pointed out, truly unique natural keys are very rare. If you think you have one, you don't. If you are sure you have one, you probably still don't.
If you are 100% double dog dare sure that you have one and have 10 business / DB experts that agree with you, it's STILL better to use integers for performance reasons anyway, since computers cannot compare /anything/ faster than they can compare integers; certainly not composites thereof.
Using "business keys" as primary keys is a basic error; and one of the few that can sink a project all by itself if requirements change. The design technique is fundamentally flawed.
Admin
Thos of you who are replying to me about the primary keys are making valid points; I've learned a few things this winter morning.
However, just to be clear, I wasn't advocating the use of a social security number as a primary key for one's own application, but explaining that I feel that arguably, a number is the only way to describe a person, especially when you're dealing with as many people as live in the United States, or Brazil.
Admin
Now, now - this is no plaice for postings like this - are you herring me, out there?
Admin
There was an accident with a contraceptive and a time machine. I can't explain it right now.
Admin
What do you say to them? I'd say: "Please clear your desk and report to HR to collect your paperwork."
Admin
More to the point, using long integers consistently in a database design for primary keys reduces the overall long-term complexity and maintenance burden. As all the PKs are in the same format, the algorithms to handle them are much more similar than they would be if the PKs consisted of various disparate structures. It may not actually be the best way of designing your database in your particular instance, but a good engineer would take such a point into consideration before jumping in with a half-baked design.
Admin
How presumptuous of Bob to just change the code. How does he know that the existing method was not derived through many iterations to be the absolute best that the company business required and that his change won't jeopardize the entire company and leave his co-workers unemployed.
Admin
Fair play to you - your English is considerably better than my Portuguese.
Admin
Reads like a song by Sonic Youth.
Admin
Most databases I know don't have an extra field for the number of the appartment you are living in, so what do you do now? ;-)
Admin
If I use enough brute force, I can use my hammer as a screwdriver.
Admin
And sometimes it is the right thing to do. I'm a Perl guy. In Perl, to execute an Oracle stored procedure, you have to explicitly bind the input and output parameters to your PL/SQL procedure calling code (e.g., to begin foo(bar); end;).
Now, I had to tie into probably 50+ PL/SQL stored procedures, many of which use the same parameters, but for different reasons. So I started off on the simple approach: write the wrapper, including bind_param statement for the first method, copy the method, change the procedure name and parameters as necessary, later, rinse, repeat.
Sometime about the 5th iteration of this my brain pulled me up short with "there has to be a better way to do this..." Hours passed as I pondered the situation. Then, like a flash of lightning, it hit me...
One routine does the binding. It takes a procedure name (as a scalar, including database schema name), an arrayref of parameter descriptors (name, parameter size, whether it is in/out), and the input data values via @. It walks the parameter list, in parallel with @, binding to the procedure wrapper code, executes the procedure, and bubbles any output parameters back up via return.
Now a stored procedure wrapper is a Perl routine wrapper with some descriptors calling the worker method--and I can call my PL/SQL procedures as though they were native Perl methods.
Admin
I prefer UUIDs as primary keys. They are automatically unique across tables, can be generated outside the database if desired, and avoid primary key clashes when using multi-master database replication.
Admin
Actual unique business identifiers that stay unique over more than a few years are even rarer
Admin
Admin
Our old mess Sargeant's taste buds had been shot off in the war But his savory collations add to our esprit de corps...