- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
I feel so pwned.
Admin
I know there will probably still be retrospective records for dead people on the IRS database until their tax affairs are finalised but that's not what it says above...
Admin
In America, a Social Security Number conveniently serves two vital purposes:
It is a secret password that gives access to sensitive financial and personal information about a person. As such, it should only be given to people with a validated, vital need to see this information.
It is your universal identification number that can be used to uniquely identify you for a variety of purposes. As such, it should be given to everyone with whom you do business in any way.
Admin
"One way to prevent this using a naming schema like id_
."I would say that is a bad idea as it makes the names non-generic and redundant when properly qualified, i.e. suppliers.suppliers_id rather than suppliers.id
Instead use "id" for the current table, but
Admin
"Correct testing in an environment similar to production would have caught the timestamp performance issue,"
But what if the specifications for the system were "process no more than x items per second". Then the slow system met the spec and the faster system broke the spec.
Admin
This is what they hammered into our heads during a database-course during university studies.
Having a automatically incremented key acting as a part of a primary key was said to be the last way out. This was something that conflicted with my previous view of the field. But after thinking about it, it does make sense to greater extent. Especially in pure theory.
Still, there are alot of cases where you can't live without id's. Like a transaction log for example. It's very possible that two equal transactions can happen at the same time. Even though they consist of the same data, they are not the same entity.
Admin
Any time someone urges the use of natural keys, the advocates of surrogate keys always point to examples of how someone made really bad choice for a natural key.
The story in this article is an excellent example of incompetent use of natural keys. I can just hear someone saying, "To make the primarky key, we'll string together the customer number, item number, and transaction type." Then someone objects, "But what if a customer buys two of the same item?" "Oh, good point. Okay, let's slap a timestamp on there to make it unique. There's no way both items will get processed in one millisecond."
I worked on a system years ago where someone got the brilliant (or brillant) idea that customer last name plus birth date plus zip code would make a good primary key. Hey, they said, what's the chance that two people would have the same last name, live in the same town, AND have the same birthdate? Ummm, how about twin brothers who are both still living in the town where they were born?
In general, any time someone says, "The chance that we'll get two of these is so small that ...", you have a problem. I don't care if the odds are a million to 1 against a duplicate. 1 in a million that any given key will be the same as any other given key is not the same as 1 in a million that there will be at least one key in the database that is a duplicate of some other key in the database. And in any case, any system that "probably" should work "most of the time" is, by definition, unreliable.
Admin
Fortunately, this is not true. According to the Social Security Administration's website, http://www.ssa.gov/history/hfaq.html
"Q20: Are Social Security numbers reused after a person dies?
A: No. We do not reassign a Social Security number (SSN) after the number holder's death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system."
If it was true, it would be a really, really bad design. All sorts of organizations keep all sorts of records for many many years. How could the Social Security Administration know when it's "safe" to re-use a number? Will they check with every business in the country to make sure that no one still has any data using that SSN? If not, then if they recycled a number, you would never know when old data about some other person would suddenly become attached to you. I suppose if the bank suddenly told me that I had a large sum of money in an account under my SSN, I wouldn't complain. But imagine if some child is assigned his SSN, and the next day he gets a letter from the collection agency for this 50-year-old debt, and then a little later the police show up to arrest him for an outstanding warrant, and ... and ...
Admin
Suppose you have serial numbers on every item you sell? Suppose that you must track to whom you have sold every item for, say, warranty purposes? Then a record with "storeid, itemid, quantity" is not going to cut it. You need a separate record for each item.
Of course this record should not include all the data about the item. It should have an identifier of the item, which should link to an item table that contains things that are the same for all such items, like description, weight, manufacturer, etc.
Admin
Here's an interesting excersize for those thinking that a 1 in a million chance happens only at a million records.
There are 365 days in a year. The chances that someone will have the same birthday as someone else is 1 in 365. How many people do you need before the chances of a duplicate birthday are statiscally garunteed?
The answer is 27 people, not 365 people.
It's not you havign the same birthday as another, it is any two people having the same birthday.
With 2 people it is 1 in 365 (A to B). With 3 people it is 3 in 365 (A to B, A to C, B to C). With 4 people it is 6 in 365 (A to B, A to C, A to D, B to C, B to D, C to D) With 5 people it is 10 in 365(A to B, A to C, A to D, A to E, B to C, B to D, B to E, C to D, C to E, D to E). etc... With 27 people it is 367 to 365.
Admin
[quote user="Jay"] Fortunately, this is not true. According to the Social Security Administration's website, http://www.ssa.gov/history/hfaq.html
[quote]
If that was the only place SSN's are given, this would not be a problem, thing is it isn't.
There is a pool of numbers that most people are not aware of that is used for foriegn nationals getting treatment from either our armed forces or through travel to the USA and being treated in our hospitals. They do not have an SSN given to them by our government, thier "SSN" designation is assigned by these entities during treatment, in the field. These numbers are then used throughout the treatment process, and sometimes also reused for other patients lacking a number. These numbers are also at times used to assign to a john doe entering a hospital.
Now if SSN was needed to be unique in these systems and used as a key there woudl be problems when a number is reused, or the person is indentified and the number needs to be changed. The numberis not required to be unique, nor is it used as the identifying number for these patients, a patient number is assigned by the system and that is used by the system to track that patient. SSN is simply meta data that can be sorted, indexed and searched on, but not listed as unique.
Edit: I should add that SSN is assigned because other linking systems require this information.
Admin
Did you ever notice that statements that start out, "I am not X ...", very often end up with a statement that pretty much identifies the speaker as X by definition? Like, "I'm not a racist, I just think that all black people are ...", or "I'm not a censor, I just think that books like that should not be allowed to be printed."
I particular like, "I'm not X, I just use common sense, and I conclude that ...". Umm, I don't doubt that your common sense or intuition or careful study of history or whatever leads you to certain conclusions. But if those conclusions are what we generally call, say, "libertarian", then pretty much by definition that makes you a libertarian. Like, how do you think people who disagree came to adopt their ideas, if not by using THEIR common sense or logic or study? Ditto if those ideas are socialist or vegetarian or whatever, of course.
Admin
Ummm, no, it's not guaranteed. 23 people gives you a 50% chance. 27 puts you at about the 60% mark.
See http://en.wikipedia.org/wiki/Birthday_problem
Damn Akismet.
Admin
[quote user="nonpartisan] Ummm, no, it's not guaranteed. 23 people gives you a 50% chance. 27 puts you at about the 60% mark.
See http://en.wikipedia.org/wiki/Birthday_problem
Damn Akismet. [/quote]
Hell, I knew it was different, just the actual equation I got wrong, still makes the point.
a 1 in a million chance is not something that will happen out at the million mark for your indexes. It happens much sooner.
Admin
Hmm, that really doesn't correlate with your anecdote, now does it?
Admin
Well now, wait. You're saying that SSN is not unique, because in some computer systems people enter a number into a field labeled "SSN", but the number they enter is not really an SSN, but some other number that they've made up? Then the problem isn't that SSN is not unique, but that the union of the set of SSN and made-up numbers is not unique. That is, the system doesn't work because users enter invalid data.
I think pretty much by definition, a "Social Security Number" is a number assigned by the Social Security Administration. If somebody makes up a random string of digits and types it into the SSN field, that's not the fault of either SSA or the system designers. That's the fault of users who don't know how to use the system.
Well, I suppose it could be a design flaw if the system requires an SSN, but we must enter people into the system who don't have an SSN (e.g. they are from other countries), or whose SSN we have no way to know (e.g. an unidentified person is brought into the hospital emergency room). But in that case, surely the solution is not to tell the user to make up a random number and enter it into the SSN field. We'd have no way to assure uniqueness. You need to not require SSN.
You might as well say that phone number does not uniquely identify a telephone and so is useless for routing calls, because a user could type his birth date into the phone number field.
Admin
Sure, the birthday problem is relevant, but maybe the odds of a collision are a little less likely than you think, and harder to predict without knowing the expected age distribution of people being uniquely identified.
Admin
Quoting Jay: Well, I suppose it could be a design flaw if the system requires an SSN, but we must enter people into the system who don't have an SSN (e.g. they are from other countries), or whose SSN we have no way to know (e.g. an unidentified person is brought into the hospital emergency room). But in that case, surely the solution is not to tell the user to make up a random number and enter it into the SSN field. We'd have no way to assure uniqueness. You need to not require SSN.
Which is exactly the point to make here, SSN should not be a unique field even though some system out there tries to say it is unique. Within the system of usage it is not. Even if you make it an optional field, more than one person will have blank, therefore making it non-unique.
Admin
Birthdays specifically are not the point, the point is that a 1 in a million chance becomes statistically garunteed long before you get to 1 million.
Just as a birthday is 1 in 365, but it takes only about 55 people to statiscally garuntee a match.
Admin
I'm with you on theoretical reason to use a generated id and never, ever believe that any key is truly static. It's certainly the problem 95% of the time. However, sometimes it makes performance suck really bad. Like a huge transaction table that has a currency_id field instead of the actual currency code which is what you need 99% of the time. In this case I very much like using a natural key and ON UPDATE CASCADE. If all you're implementing is essentially a lookup and not a true entity, that's usually a better solution. It saves you endless joins to find that currency_id 1 = USD.
Admin
And please consider that the manager responsible for databases in the second WTF is also a woman. I have some fifteen years of DB experience, and I'd take her word as gospel any day of the week.
I also remember an extremely competent female DB engineer who in my opinion would have earned better money as a top model. Or maybe not, management probably knew how good she was.
Admin
It's happened in the UK too with National Insurance numbers. It only got spotted when one of the women retired. Unravelling 40+ years of NI contributions was amusing for someone, I'm sure (they will point blank refuse to accept it's their fault until the evidence become irrefutable).
Admin
Admin
British military personnel serial numbers are a pool assignment, iirc. They have no meaning and are just assigned from the pool and returned once you leave. To identify someone you would need to know their serial number and their period of service. I've no idea if there's a minimum period between the reuse.
Admin
And I have seen a query rewrite lower execution time from 7 hours to 7 seconds. OK that's only factor 3600, probably less than your example, but I sure it impressed management a lot more. And the guys who wrote the original query swore that they had optimized it so well that they didn't want us to touch it.
Admin
Just doubling the number doesn't make it correct.
To guarantee a match, you need 366 people.
Do please go and read that wikipedia page on the topic to which you were referred earlier. You're recollection of the problem is faulty, and your reasoning defective.
Admin
My shining moments at my job are when I take one look at a subsystem that is known to be a performance hog, after several man weeks are spent by another team "optimizing", add an index to the backend DB and watch the performance jump 5000%.
The funny (sad?) part is that it is so obvious where the issue is but everyone still treats me like I'm some sort of black magic wielding magician.
Admin
fortunately for them Name + SSN does = unique key. that's probably the only thing that keeps the IRS and the credit reporting agencies from being utterly confused.
IIRC their birthdates are close if not identical. and they're both guys.
Admin
that's not even a true statement, because a clerical error can result in two people having the same one. generally one or the other of them will elect to get a new one assigned. neither of these guys were willing to do that. both of them are registered real estate agents.
Admin
using a natural key is not the solution. natural keys contain meaningful information and most of the time lead to the database becoming fragile.
the solution to your issue is that the primary key shouldn't be the only UNIQUE column, and that your submissions page should be contain some semi-intelligence to prevent double submits. like a shopping system drains the users cart upon an order being submitted, second one goes "wtf? empty cart, no!"
Admin
that's not inventory data, that's sales history data, and would appropriate be in a different table than an associator between store and item.
Admin
Admin
so you're saying absolutist positions on surrogate vs natural keys are TRWTF? :P
Admin
You also didn't notice where I later agreed she was the bigger WTF because she had 10 years' experience vs the student.
I object to someone declaring a person to be a WTF solely based on gender. That's not being a feminist. That's being someone who thinks skills or lack thereof should determine a person's WTFness.
Admin
Indeed. I work for a local hospital on a network engineering team with one lady who helped build their first Ethernet network from scratch. Token Ring, Ethernet, AppleTalk, NetWare, thicknet, thinnet, I'm sure other protocols, she's done it all. She knows a lot of things, doesn't know everything, admits that she doesn't know everything, but will help with what she can (which is most things). She's not conceited and is one of my first go-to people. Less than three years away from retirement -- she will be sorely missed.
Admin
Thanks boog.
I call myself nonpartisan because, while knowing that everyone is biased in some sense, I try hard to look at every situation from as many points of view as I can. The fact that I come to a conclusion at all means I'm not truly nonpartisan -- I weigh facts, accept those that seem reasonable and discard those that don't. So there is always some bias.
But highlighting all the female pronouns and saying "There's your WTF!" was over the top.
Admin
That's what unique indexes are for.
Yes, because we all know that business users are all rational and think about how a hard to remember, randomly assigned number is so much better than just remembering that Bob Smith from Winterset is the nice guy who always buys on Monday and Bob Smith from Waterloo is the guy who only buys sale items.
Oh, but the randomly assigned 9 digit customer number had meaning and was in no way arbitrary.
And there is the problem. Users should never be doing data mining directly. RDBMs are not designed to be front end systems so the muckety-mucks can say, "go query table A for all the invoices from september 1, 1999 to september 30, 2001."
Admin
Good comment! Now i feel like i'm not alone in this DBWorld!
Admin
And you use a CID as PK right?
Admin
Mommy... I do my entire Business logic in database... At least i use Identity PK fields... Tell me i'm good kid mommy please!
Admin
Admin
Admin
If you got tables with config that should be exportable and reusable in other applications, it's a deadly sin to use an autonumber as PK. It simply adds unnecessary bloat.
If you got a table that's a list of people, like first_name, last_name, day_of_birth, then adding an autonumber as PK is a must. You can't go through your entire application referencing to "John", "Doe", "25-12-1947". It's alot better to pull his number(snicker) after quering for his name and dob, then using that ID as reference from there on. It will also allow for referencing the correct John Doe born on 25-12-1947 (in case there are two or more fitting those criteria)
So really, it depends on the table's purpose whether or not to add an autonumber PK.
Admin
Sorry, but I've got to bite on this tasty piece of bait:
Never a good idea for business users to have direct access to the database. You should design the application to provide a search screen so that your user can select this user. If this is too much like hard work for the user (too many steps to enter the data when he wants to go to the same user repeatedly) then implement a system of favourites.
If the user genuinely has a user ID which is publicised and unique per user, and understood by a business user, then use that. But at no stage should an internal DB identifier for any user be exposed to a business user (internal or external). If there is no other way to uniquely identify a user without needing to use this identifier, you've done the analysis wrong.
Many systems address this problem by requiring that the user select a unique identifier, which will usually contain the user's name (or nickname, or amusing alias, whatever) and often a number as well. If that is then used as a primary key, then off you go.
Admin
<quote>To guarantee a match, you need 366 people.</quote>
I think you'll find you need 367 people.
Admin
Admin
And I did not say guarantee, I said Statistically guarantee. To guarantee you need 366, to statistically guarantee you need 55.
That is an important difference.
Admin
There are some years that have 366 days. Ergo, to guarantee a match you need 367 people . . .
Admin
I'll just leave this here.
Admin