The Daily WTF: Curious Perversions in Information Technology

CodeRage · 2011-06-29 Reply Admin

oh:
CodeRage:
Ace:
The problem arises when "developers" refuse to use triggers and instead set the sequence number in the service layer (OpenJPA seems to prefer this by default, or at least so it seems). Did anyone say "data integrity"? Aka "it's not the software's fault; I think we need a more powerful server..or two". </rant>
Ugh, a "DBA". How about just start transaction, get next value of sequence, insert new record using sequence number, then insert child records using sequence number, then close transaction.
Using a trigger makes it very hard to find the foreign key to set in the child records.

Perhaps you should try "Insert into y returning x", why don't you?

I feel so pwned.

2011-06-29 Reply Admin

Maurits:
itsmo:
Kazan:
with the lot:
Social Security Number? Bingo!!

SSN is not usable as a key, you can have collisions. there are two different guys in iowa with the same SSN . as noted above i helped reimplement software that was implemented based on the same flawed assumption you just made.

WTF - how do the IRS deal with that one? No I don't owe you $1,000,000 - it's that other Iowan...

SSNs are only given to one person at a time. When someone dies that SSN is thrown back into the "unused" pool and is handed out again at a later date.

I know there will probably still be retrospective records for dead people on the IRS database until their tax affairs are finalised but that's not what it says above...

2011-06-29 Reply Admin

Zebedee:
You give your social security number to every site you wanr to make a purchase from? hmmm

In America, a Social Security Number conveniently serves two vital purposes:

It is a secret password that gives access to sensitive financial and personal information about a person. As such, it should only be given to people with a validated, vital need to see this information.
It is your universal identification number that can be used to uniquely identify you for a variety of purposes. As such, it should be given to everyone with whom you do business in any way.

2011-06-29 Reply Admin

"One way to prevent this using a naming schema like id_

."

I would say that is a bad idea as it makes the names non-generic and redundant when properly qualified, i.e. suppliers.suppliers_id rather than suppliers.id

Instead use "id" for the current table, but

_id for the foreign key, i.e. fk(supplier_id) = suppliers.id

2011-06-29 Reply Admin

"Correct testing in an environment similar to production would have caught the timestamp performance issue,"

But what if the specifications for the system were "process no more than x items per second". Then the slow system met the spec and the faster system broke the spec.

2011-06-29 Reply Admin

This is what they hammered into our heads during a database-course during university studies.

Having a automatically incremented key acting as a part of a primary key was said to be the last way out. This was something that conflicted with my previous view of the field. But after thinking about it, it does make sense to greater extent. Especially in pure theory.

Still, there are alot of cases where you can't live without id's. Like a transaction log for example. It's very possible that two equal transactions can happen at the same time. Even though they consist of the same data, they are not the same entity.

2011-06-29 Reply Admin

Any time someone urges the use of natural keys, the advocates of surrogate keys always point to examples of how someone made really bad choice for a natural key.

The story in this article is an excellent example of incompetent use of natural keys. I can just hear someone saying, "To make the primarky key, we'll string together the customer number, item number, and transaction type." Then someone objects, "But what if a customer buys two of the same item?" "Oh, good point. Okay, let's slap a timestamp on there to make it unique. There's no way both items will get processed in one millisecond."

I worked on a system years ago where someone got the brilliant (or brillant) idea that customer last name plus birth date plus zip code would make a good primary key. Hey, they said, what's the chance that two people would have the same last name, live in the same town, AND have the same birthdate? Ummm, how about twin brothers who are both still living in the town where they were born?

In general, any time someone says, "The chance that we'll get two of these is so small that ...", you have a problem. I don't care if the odds are a million to 1 against a duplicate. 1 in a million that any given key will be the same as any other given key is not the same as 1 in a million that there will be at least one key in the database that is a duplicate of some other key in the database. And in any case, any system that "probably" should work "most of the time" is, by definition, unreliable.

2011-06-29 Reply Admin

Maurits:
itsmo:
Kazan:
with the lot:
Social Security Number? Bingo!!

SSN is not usable as a key, you can have collisions. there are two different guys in iowa with the same SSN. as noted above i helped reimplement software that was implemented based on the same flawed assumption you just made.

WTF - how do the IRS deal with that one? No I don't owe you $1,000,000 - it's that other Iowan...

SSNs are only given to one person at a time. When someone dies that SSN is thrown back into the "unused" pool and is handed out again at a later date.

Fortunately, this is not true. According to the Social Security Administration's website, http://www.ssa.gov/history/hfaq.html

"Q20: Are Social Security numbers reused after a person dies?

A: No. We do not reassign a Social Security number (SSN) after the number holder's death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system."

If it was true, it would be a really, really bad design. All sorts of organizations keep all sorts of records for many many years. How could the Social Security Administration know when it's "safe" to re-use a number? Will they check with every business in the country to make sure that no one still has any data using that SSN? If not, then if they recycled a number, you would never know when old data about some other person would suddenly become attached to you. I suppose if the bank suddenly told me that I had a large sum of money in an account under my SSN, I wouldn't complain. But imagine if some child is assigned his SSN, and the next day he gets a letter from the collection agency for this 50-year-old debt, and then a little later the police show up to arrest him for an outstanding warrant, and ... and ...

2011-06-29 Reply Admin

Kazan:
design wtf... a record for each item per store?
should be a stores table, a distribition centers table, and an items table.

there should be an associator table from store to distribution center, and an associator table from store to items table. the store<-->items associator would hold an inventory quantitiy as well. you could also then have an associator table from distribution center o items.

huge amount of duplicate data eliminated from the DB.

Suppose you have serial numbers on every item you sell? Suppose that you must track to whom you have sold every item for, say, warranty purposes? Then a record with "storeid, itemid, quantity" is not going to cut it. You need a separate record for each item.

Of course this record should not include all the data about the item. It should have an identifier of the item, which should link to an item table that contains things that are the same for all such items, like description, weight, manufacturer, etc.

KattMan · 2011-06-29 Reply Admin

Here's an interesting excersize for those thinking that a 1 in a million chance happens only at a million records.

There are 365 days in a year. The chances that someone will have the same birthday as someone else is 1 in 365. How many people do you need before the chances of a duplicate birthday are statiscally garunteed?

The answer is 27 people, not 365 people.

It's not you havign the same birthday as another, it is any two people having the same birthday.

With 2 people it is 1 in 365 (A to B). With 3 people it is 3 in 365 (A to B, A to C, B to C). With 4 people it is 6 in 365 (A to B, A to C, A to D, B to C, B to D, C to D) With 5 people it is 10 in 365(A to B, A to C, A to D, A to E, B to C, B to D, B to E, C to D, C to E, D to E). etc... With 27 people it is 367 to 365.

KattMan · 2011-06-29 Reply Admin

[quote user="Jay"] Fortunately, this is not true. According to the Social Security Administration's website, http://www.ssa.gov/history/hfaq.html

[quote]

If that was the only place SSN's are given, this would not be a problem, thing is it isn't.

There is a pool of numbers that most people are not aware of that is used for foriegn nationals getting treatment from either our armed forces or through travel to the USA and being treated in our hospitals. They do not have an SSN given to them by our government, thier "SSN" designation is assigned by these entities during treatment, in the field. These numbers are then used throughout the treatment process, and sometimes also reused for other patients lacking a number. These numbers are also at times used to assign to a john doe entering a hospital.

Now if SSN was needed to be unique in these systems and used as a key there woudl be problems when a number is reused, or the person is indentified and the number needs to be changed. The numberis not required to be unique, nor is it used as the identifying number for these patients, a patient number is assigned by the system and that is used by the system to track that patient. SSN is simply meta data that can be sorted, indexed and searched on, but not listed as unique.

Edit: I should add that SSN is assigned because other linking systems require this information.

2011-06-29 Reply Admin

nonpartisan:
I'm far from a feminist. I just calls 'em as I sees 'em. ...

Did you ever notice that statements that start out, "I am not X ...", very often end up with a statement that pretty much identifies the speaker as X by definition? Like, "I'm not a racist, I just think that all black people are ...", or "I'm not a censor, I just think that books like that should not be allowed to be printed."

I particular like, "I'm not X, I just use common sense, and I conclude that ...". Umm, I don't doubt that your common sense or intuition or careful study of history or whatever leads you to certain conclusions. But if those conclusions are what we generally call, say, "libertarian", then pretty much by definition that makes you a libertarian. Like, how do you think people who disagree came to adopt their ideas, if not by using THEIR common sense or logic or study? Ditto if those ideas are socialist or vegetarian or whatever, of course.

nonpartisan · 2011-06-29 Reply Admin

KattMan:
Here's an interesting excersize for those thinking that a 1 in a million chance happens only at a million records.
There are 365 days in a year. The chances that someone will have the same birthday as someone else is 1 in 365. How many people do you need before the chances of a duplicate birthday are statiscally garunteed?

The answer is 27 people, not 365 people.

It's not you havign the same birthday as another, it is any two people having the same birthday.

With 2 people it is 1 in 365 (A to B). With 3 people it is 3 in 365 (A to B, A to C, B to C). With 4 people it is 6 in 365 (A to B, A to C, A to D, B to C, B to D, C to D) With 5 people it is 10 in 365(A to B, A to C, A to D, A to E, B to C, B to D, B to E, C to D, C to E, D to E). etc... With 27 people it is 367 to 365.

Ummm, no, it's not guaranteed. 23 people gives you a 50% chance. 27 puts you at about the 60% mark.

See http://en.wikipedia.org/wiki/Birthday_problem

Damn Akismet.

KattMan · 2011-06-29 Reply Admin

[quote user="nonpartisan] Ummm, no, it's not guaranteed. 23 people gives you a 50% chance. 27 puts you at about the 60% mark.

See http://en.wikipedia.org/wiki/Birthday_problem

Damn Akismet. [/quote]

Hell, I knew it was different, just the actual equation I got wrong, still makes the point.

a 1 in a million chance is not something that will happen out at the million mark for your indexes. It happens much sooner.

boog · 2011-06-29 Reply Admin

Jay:
nonpartisan:
I'm far from a feminist. I just calls 'em as I sees 'em. ...

Did you ever notice that statements that start out, "I am not X ...", very often end up with a statement that pretty much identifies the speaker as X by definition? Like, "I'm not a racist, I just think that all black people are ...", or "I'm not a censor, I just think that books like that should not be allowed to be printed."

I particular like, "I'm not X, I just use common sense, and I conclude that ...". Umm, I don't doubt that your common sense or intuition or careful study of history or whatever leads you to certain conclusions. But if those conclusions are what we generally call, say, "libertarian", then pretty much by definition that makes you a libertarian. Like, how do you think people who disagree came to adopt their ideas, if not by using THEIR common sense or logic or study? Ditto if those ideas are socialist or vegetarian or whatever, of course.

Uh, sure, except his (her?) post didn't do that. The part of the comment that you omitted was railing on the female character as the bigger idiot. Not a single letter of the comment suggests that "nonpartisan" actually is a feminist, in fact potentially the opposite.

Hmm, that really doesn't correlate with your anecdote, now does it?

2011-06-29 Reply Admin

KattMan:
Jay:
Fortunately, this is not true. According to the Social Security Administration's website, http://www.ssa.gov/history/hfaq.html

If that was the only place SSN's are given, this would not be a problem, thing is it isn't.

There is a pool of numbers that most people are not aware of that is used for foriegn nationals getting treatment from either our armed forces or through travel to the USA and being treated in our hospitals. They do not have an SSN given to them by our government, thier "SSN" designation is assigned by these entities during treatment, in the field. These numbers are then used throughout the treatment process, and sometimes also reused for other patients lacking a number. These numbers are also at times used to assign to a john doe entering a hospital.

Now if SSN was needed to be unique in these systems and used as a key there woudl be problems when a number is reused, or the person is indentified and the number needs to be changed. The numberis not required to be unique, nor is it used as the identifying number for these patients, a patient number is assigned by the system and that is used by the system to track that patient. SSN is simply meta data that can be sorted, indexed and searched on, but not listed as unique.

Edit: I should add that SSN is assigned because other linking systems require this information.

Well now, wait. You're saying that SSN is not unique, because in some computer systems people enter a number into a field labeled "SSN", but the number they enter is not really an SSN, but some other number that they've made up? Then the problem isn't that SSN is not unique, but that the union of the set of SSN and made-up numbers is not unique. That is, the system doesn't work because users enter invalid data.

I think pretty much by definition, a "Social Security Number" is a number assigned by the Social Security Administration. If somebody makes up a random string of digits and types it into the SSN field, that's not the fault of either SSA or the system designers. That's the fault of users who don't know how to use the system.

Well, I suppose it could be a design flaw if the system requires an SSN, but we must enter people into the system who don't have an SSN (e.g. they are from other countries), or whose SSN we have no way to know (e.g. an unidentified person is brought into the hospital emergency room). But in that case, surely the solution is not to tell the user to make up a random number and enter it into the SSN field. We'd have no way to assure uniqueness. You need to not require SSN.

You might as well say that phone number does not uniquely identify a telephone and so is useless for routing calls, because a user could type his birth date into the phone number field.

boog · 2011-06-29 Reply Admin

nonpartisan:
KattMan:
Here's an interesting excersize for those thinking that a 1 in a million chance happens only at a million records.
There are 365 days in a year. The chances that someone will have the same birthday as someone else is 1 in 365. How many people do you need before the chances of a duplicate birthday are statiscally garunteed?

The answer is 27 people, not 365 people.

It's not you havign the same birthday as another, it is any two people having the same birthday.

With 2 people it is 1 in 365 (A to B). With 3 people it is 3 in 365 (A to B, A to C, B to C). With 4 people it is 6 in 365 (A to B, A to C, A to D, B to C, B to D, C to D) With 5 people it is 10 in 365(A to B, A to C, A to D, A to E, B to C, B to D, B to E, C to D, C to E, D to E). etc... With 27 people it is 367 to 365.

Ummm, no, it's not guaranteed. 23 people gives you a 50% chance. 27 puts you at about the 60% mark.

See http://en.wikipedia.org/wiki/Birthday_problem

Birthdates (when used as part of a unique identifier, which is what started this discussion) usually include a year, if I recall.

Sure, the birthday problem is relevant, but maybe the odds of a collision are a little less likely than you think, and harder to predict without knowing the expected age distribution of people being uniquely identified.

KattMan · 2011-06-29 Reply Admin

Quoting Jay: Well, I suppose it could be a design flaw if the system requires an SSN, but we must enter people into the system who don't have an SSN (e.g. they are from other countries), or whose SSN we have no way to know (e.g. an unidentified person is brought into the hospital emergency room). But in that case, surely the solution is not to tell the user to make up a random number and enter it into the SSN field. We'd have no way to assure uniqueness. You need to not require SSN.

Which is exactly the point to make here, SSN should not be a unique field even though some system out there tries to say it is unique. Within the system of usage it is not. Even if you make it an optional field, more than one person will have blank, therefore making it non-unique.

KattMan · 2011-06-29 Reply Admin

boog:
Birthdates (when used as part of a unique identifier, which is what started this discussion) usually include a year, if I recall.
Sure, the birthday problem is relevant, but maybe the odds of a collision are a little less likely than you think, and harder to predict without knowing the expected age distribution of people being uniquely identified.

Birthdays specifically are not the point, the point is that a 1 in a million chance becomes statistically garunteed long before you get to 1 million.

Just as a birthday is 1 in 365, but it takes only about 55 people to statiscally garuntee a match.

2011-06-29 Reply Admin

Databastard:
All because somebody was ignorant of the basic idea that a key should never hold any meaningful information.

I'm with you on theoretical reason to use a generated id and never, ever believe that any key is truly static. It's certainly the problem 95% of the time. However, sometimes it makes performance suck really bad. Like a huge transaction table that has a currency_id field instead of the actual currency code which is what you need 99% of the time. In this case I very much like using a natural key and ON UPDATE CASCADE. If all you're implementing is essentially a lookup and not a true entity, that's usually a better solution. It saves you endless joins to find that currency_id 1 = USD.

2011-06-29 Reply Admin

nonpartisan:

Way to be a sexist pig, considering that the first WTF mentioned was a guy.

And please consider that the manager responsible for databases in the second WTF is also a woman. I have some fifteen years of DB experience, and I'd take her word as gospel any day of the week.

I also remember an extremely competent female DB engineer who in my opinion would have earned better money as a top model. Or maybe not, management probably knew how good she was.

hikari · 2011-06-29 Reply Admin

itsmo:
Kazan:
with the lot:
Social Security Number? Bingo!!

SSN is not usable as a key, you can have collisions. there are two different guys in iowa with the same SSN. as noted above i helped reimplement software that was implemented based on the same flawed assumption you just made.

WTF - how do the IRS deal with that one? No I don't owe you $1,000,000 - it's that other Iowan...

It's happened in the UK too with National Insurance numbers. It only got spotted when one of the women retired. Unravelling 40+ years of NI contributions was amusing for someone, I'm sure (they will point blank refuse to accept it's their fault until the evidence become irrefutable).

boog · 2011-06-29 Reply Admin

KattMan:
boog:
Birthdates (when used as part of a unique identifier, which is what started this discussion) usually include a year, if I recall.
Sure, the birthday problem is relevant, but maybe the odds of a collision are a little less likely than you think, and harder to predict without knowing the expected age distribution of people being uniquely identified.

Birthdays specifically are not the point, the point is that a 1 in a million chance becomes statistically garunteed long before you get to 1 million.

Just as a birthday is 1 in 365, but it takes only about 55 people to statiscally garuntee a match.

Now you know why I qualified my comment by admitting "the birthday problem is relevant".

hikari · 2011-06-29 Reply Admin

Jay:
Maurits:
itsmo:
Kazan:
with the lot:
Social Security Number? Bingo!!

SSN is not usable as a key, you can have collisions. there are two different guys in iowa with the same SSN. as noted above i helped reimplement software that was implemented based on the same flawed assumption you just made.

WTF - how do the IRS deal with that one? No I don't owe you $1,000,000 - it's that other Iowan...

SSNs are only given to one person at a time. When someone dies that SSN is thrown back into the "unused" pool and is handed out again at a later date.

Fortunately, this is not true. According to the Social Security Administration's website, http://www.ssa.gov/history/hfaq.html

"Q20: Are Social Security numbers reused after a person dies?

A: No. We do not reassign a Social Security number (SSN) after the number holder's death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system."

If it was true, it would be a really, really bad design. All sorts of organizations keep all sorts of records for many many years. How could the Social Security Administration know when it's "safe" to re-use a number? Will they check with every business in the country to make sure that no one still has any data using that SSN? If not, then if they recycled a number, you would never know when old data about some other person would suddenly become attached to you. I suppose if the bank suddenly told me that I had a large sum of money in an account under my SSN, I wouldn't complain. But imagine if some child is assigned his SSN, and the next day he gets a letter from the collection agency for this 50-year-old debt, and then a little later the police show up to arrest him for an outstanding warrant, and ... and ...

British military personnel serial numbers are a pool assignment, iirc. They have no meaning and are just assigned from the pool and returned once you leave. To identify someone you would need to know their serial number and their period of service. I've no idea if there's a minimum period between the reuse.

2011-06-29 Reply Admin

mickeyding:
I have seen response times of over 30 seconds drop to a few milliseconds simply by getting the primary keys and indexes right - especially in Oracle.

And I have seen a query rewrite lower execution time from 7 hours to 7 seconds. OK that's only factor 3600, probably less than your example, but I sure it impressed management a lot more. And the guys who wrote the original query swore that they had optimized it so well that they didn't want us to touch it.

2011-06-29 Reply Admin

Just as a birthday is 1 in 365, but it takes only about 55 people to statiscally garuntee a match.

Just doubling the number doesn't make it correct.

To guarantee a match, you need 366 people.

Do please go and read that wikipedia page on the topic to which you were referred earlier. You're recollection of the problem is faulty, and your reasoning defective.

2011-06-29 Reply Admin

Topper:
mickeyding:
I have seen response times of over 30 seconds drop to a few milliseconds simply by getting the primary keys and indexes right - especially in Oracle.

And I have seen a query rewrite lower execution time from 7 hours to 7 seconds. OK that's only factor 3600, probably less than your example, but I sure it impressed management a lot more. And the guys who wrote the original query swore that they had optimized it so well that they didn't want us to touch it.

My shining moments at my job are when I take one look at a subsystem that is known to be a performance hog, after several man weeks are spent by another team "optimizing", add an index to the backend DB and watch the performance jump 5000%.

The funny (sad?) part is that it is so obvious where the issue is but everyone still treats me like I'm some sort of black magic wielding magician.

Kazan · 2011-06-29 Reply Admin

itsmo:
Kazan:
with the lot:
Social Security Number? Bingo!!

SSN is not usable as a key, you can have collisions. there are two different guys in iowa with the same SSN. as noted above i helped reimplement software that was implemented based on the same flawed assumption you just made.

WTF - how do the IRS deal with that one? No I don't owe you $1,000,000 - it's that other Iowan...

fortunately for them Name + SSN does = unique key. that's probably the only thing that keeps the IRS and the credit reporting agencies from being utterly confused.

IIRC their birthdates are close if not identical. and they're both guys.

Kazan · 2011-06-29 Reply Admin

Maurits:
itsmo:
Kazan:
with the lot:
Social Security Number? Bingo!!

SSN is not usable as a key, you can have collisions. there are two different guys in iowa with the same SSN. as noted above i helped reimplement software that was implemented based on the same flawed assumption you just made.

WTF - how do the IRS deal with that one? No I don't owe you $1,000,000 - it's that other Iowan...

SSNs are only given to one person at a time. When someone dies that SSN is thrown back into the "unused" pool and is handed out again at a later date.

that's not even a true statement, because a clerical error can result in two people having the same one. generally one or the other of them will elect to get a new one assigned. neither of these guys were willing to do that. both of them are registered real estate agents.

Kazan · 2011-06-29 Reply Admin

Les:
Just to be clear, I didn't say that you should NEVER use a surrogate key. In this specific WTF, adding a unique sequence column to the existing primary key has the same effect as having no primary key at all.
That may be an acceptable fix when orders are failing, but someone needs to have a good look at the schema and figure-out what columns (if any) need to be in the primary key for that table. Right now, you've got something that looks like a primary key, but doesn't behave like a primary key.

Imagine this. It's a few weeks later, and a bug in the front end (or a user with a sticky keyboard) is submitting orders twice. The DB won't notice because the primary key is just there for show. Rick's customers (and his manager) won't be happy when they start getting invoiced for double of what they ordered.

using a natural key is not the solution. natural keys contain meaningful information and most of the time lead to the database becoming fragile.

the solution to your issue is that the primary key shouldn't be the only UNIQUE column, and that your submissions page should be contain some semi-intelligence to prevent double submits. like a shopping system drains the users cart upon an order being submitted, second one goes "wtf? empty cart, no!"

Kazan · 2011-06-29 Reply Admin

Jay:
Kazan:
design wtf... a record for each item per store?
should be a stores table, a distribition centers table, and an items table.

there should be an associator table from store to distribution center, and an associator table from store to items table. the store<-->items associator would hold an inventory quantitiy as well. you could also then have an associator table from distribution center o items.

huge amount of duplicate data eliminated from the DB.

Suppose you have serial numbers on every item you sell? Suppose that you must track to whom you have sold every item for, say, warranty purposes? Then a record with "storeid, itemid, quantity" is not going to cut it. You need a separate record for each item.

Of course this record should not include all the data about the item. It should have an identifier of the item, which should link to an item table that contains things that are the same for all such items, like description, weight, manufacturer, etc.

that's not inventory data, that's sales history data, and would appropriate be in a different table than an associator between store and item.

Kazan · 2011-06-29 Reply Admin

Jay:

Well now, wait. You're saying that SSN is not unique, because

it isn't unique. it is not uncommon for clerical errors to result in two different people having the same number. it is not as common with younger people, but older people when everything was done on paper it is not terribly uncommon.

Kazan · 2011-06-29 Reply Admin

Kjella:
Databastard:
All because somebody was ignorant of the basic idea that a key should never hold any meaningful information.

I'm with you on theoretical reason to use a generated id and never, ever believe that any key is truly static. It's certainly the problem 95% of the time. However, sometimes it makes performance suck really bad. Like a huge transaction table that has a currency_id field instead of the actual currency code which is what you need 99% of the time. In this case I very much like using a natural key and ON UPDATE CASCADE. If all you're implementing is essentially a lookup and not a true entity, that's usually a better solution. It saves you endless joins to find that currency_id 1 = USD.

so you're saying absolutist positions on surrogate vs natural keys are TRWTF? :P

nonpartisan · 2011-06-29 Reply Admin

Jay:
nonpartisan:
I'm far from a feminist. I just calls 'em as I sees 'em. ...

Did you ever notice that statements that start out, "I am not X ...", very often end up with a statement that pretty much identifies the speaker as X by definition? Like, "I'm not a racist, I just think that all black people are ...", or "I'm not a censor, I just think that books like that should not be allowed to be printed."

I particular like, "I'm not X, I just use common sense, and I conclude that ...". Umm, I don't doubt that your common sense or intuition or careful study of history or whatever leads you to certain conclusions. But if those conclusions are what we generally call, say, "libertarian", then pretty much by definition that makes you a libertarian. Like, how do you think people who disagree came to adopt their ideas, if not by using THEIR common sense or logic or study? Ditto if those ideas are socialist or vegetarian or whatever, of course.

You also didn't notice where I later agreed she was the bigger WTF because she had 10 years' experience vs the student.

I object to someone declaring a person to be a WTF solely based on gender. That's not being a feminist. That's being someone who thinks skills or lack thereof should determine a person's WTFness.

nonpartisan · 2011-06-29 Reply Admin

Lorens:
nonpartisan:

Way to be a sexist pig, considering that the first WTF mentioned was a guy.

And please consider that the manager responsible for databases in the second WTF is also a woman. I have some fifteen years of DB experience, and I'd take her word as gospel any day of the week.

I also remember an extremely competent female DB engineer who in my opinion would have earned better money as a top model. Or maybe not, management probably knew how good she was.

Indeed. I work for a local hospital on a network engineering team with one lady who helped build their first Ethernet network from scratch. Token Ring, Ethernet, AppleTalk, NetWare, thicknet, thinnet, I'm sure other protocols, she's done it all. She knows a lot of things, doesn't know everything, admits that she doesn't know everything, but will help with what she can (which is most things). She's not conceited and is one of my first go-to people. Less than three years away from retirement -- she will be sorely missed.

nonpartisan · 2011-06-29 Reply Admin

boog:
Jay:
nonpartisan:
I'm far from a feminist. I just calls 'em as I sees 'em. ...

Did you ever notice that statements that start out, "I am not X ...", very often end up with a statement that pretty much identifies the speaker as X by definition? Like, "I'm not a racist, I just think that all black people are ...", or "I'm not a censor, I just think that books like that should not be allowed to be printed."

I particular like, "I'm not X, I just use common sense, and I conclude that ...". Umm, I don't doubt that your common sense or intuition or careful study of history or whatever leads you to certain conclusions. But if those conclusions are what we generally call, say, "libertarian", then pretty much by definition that makes you a libertarian. Like, how do you think people who disagree came to adopt their ideas, if not by using THEIR common sense or logic or study? Ditto if those ideas are socialist or vegetarian or whatever, of course.
Uh, sure, except his (her?) post didn't do that. The part of the comment that you omitted was railing on the female character as the bigger idiot. Not a single letter of the comment suggests that "nonpartisan" actually is a feminist, in fact potentially the opposite.

Hmm, that really doesn't correlate with your anecdote, now does it?

Thanks boog.

I call myself nonpartisan because, while knowing that everyone is biased in some sense, I try hard to look at every situation from as many points of view as I can. The fact that I come to a conclusion at all means I'm not truly nonpartisan -- I weigh facts, accept those that seem reasonable and discard those that don't. So there is always some bias.

But highlighting all the female pronouns and saying "There's your WTF!" was over the top.

2011-06-29 Reply Admin

A Gould:
Ideally (and let's emphasize the "ideal" here), your Primary Key should be how your company tells the various entities apart. An arbitrary key that doesn't relate back to the business risks duplication.

That's what unique indexes are for.

A Gould:
In practice, this isn't a massive deal - any company with sufficient suppliers will start handing out "vendor numbers" and "customer numbers" (to avoid having to try and tell two "Bob Smith"s apart). Ditto for products. The trick is to make sure that the system's key matches the implicit key that is already being used as much as possible.

Yes, because we all know that business users are all rational and think about how a hard to remember, randomly assigned number is so much better than just remembering that Bob Smith from Winterset is the nice guy who always buys on Monday and Bob Smith from Waterloo is the guy who only buys sale items.

A Gould:
The only exception I've seen is when the distinction is truly arbitrary. Order numbers, for example (the number doesn't mean anything other than acting as a reference to a specific order).

Oh, but the randomly assigned 9 digit customer number had meaning and was in no way arbitrary.

A Gould:
(I spend a lot of time doing database mining at work - a well-designed database is a pleasure to work with, while a badly designed one makes me wish the designer was within punching distance.)

And there is the problem. Users should never be doing data mining directly. RDBMs are not designed to be front end systems so the muckety-mucks can say, "go query table A for all the invoices from september 1, 1999 to september 30, 2001."

2011-06-29 Reply Admin

Good comment! Now i feel like i'm not alone in this DBWorld!

2011-06-29 Reply Admin

Meep:
K:
But then I want to ask something more serious: Why would anyone ever not want a unique, autoincremented integer to be the primary key? I learned in uni that there are many ways to do PKs, but all I have ever implemented was an extra integer with no other meaning than "this is the PK".

Example: Say you have two tables A and B, and you're a fan of surrogate keys, so A has a column ID, and B has a column ID, both of which are autogenerated surrogate keys.

Now lets say, further, that there's a many to many relationship between A and B. So you create a table C with two columns AID and BID that have foreign key constraints to A and B, respectively.

Clearly the natural key to use is the compound key AID and BID.

Given that many-many relationships are fairly ubiquitous, this seems like a pretty common use case.

And you use a CID as PK right?

2011-06-29 Reply Admin

Andrew:
Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

Yeah, understand your argument but entities can evolve too as the system grows. I had to support (ugh) a system with 90% of the business logic implemented at database level. Changes on tables fired triggers called stored procedures making changes on other tables firing triggers...

Now, the big deal here was the EmployeeNumber (Employee.EmployeeNumber as a candidate for an entity defining its own PK, yes?). Well that's what the designers did. EmployeeNumbers spread through the entire schema as FKs. And what happened when they implemented a feature to change an EmployeeNumber? (happens, contractor workers suddenly becoming corporate workers and need their emp-numbers to match what SAP says. All hail mighty SAP!)

They used a transaction encapsulating a whole lot of INSERT and DELETE statements. Remember what I said about the business logic implemented with triggers? A lot of extra logic to maintain to prevent orphaned records. And customized tables was a a different animal altogether.

So... not saying "always use auto-num" (design principles and all that), but entities (even well defined ones) evolve too. If they used an auto-num, the entity could have been mutable and a lot of issues would have been avoided (even if they didnt choose to do business logic in triggers. Still gives me nightmares).

Mommy... I do my entire Business logic in database... At least i use Identity PK fields... Tell me i'm good kid mommy please!

2011-06-29 Reply Admin

lofwyr:
Ob boy! The real WTF seems to be those comments about comparing different DBMS und data model design.
To answer a question that someone here asked: Yes, DB programmers still exist, I consider myself to be one. That said, I'm under the impression that we're a dying breed. Why bother with engineering a data model when you can use language x and framework y to put everything on the application layer (and hope for the best)?

The Oracle DBMS does provide sequences: unique numeric ids that can't have duplicates. Version 11g and onward doesn't even require triggers anymore, you can put them as default values for your columns. People, who claim otherwise, should consult the user manual.

As for the natural vs. synthetic PK debate: It's as old as data modeling and it'll never get solved. Those that advocate natural PKs either love writinge sql code (a lot of it) or have very basic data models which didn't even make it to a master <-> detail <-> detail relationships with additional lookup tables etc. Then they would realize that those joins they're writing are getting hard to read. Needless to say that if that natural PK is not as unique as hoped or can even be changed by the user, you'll end up in "update hell" - no problem when the used database is small (in size), big problem when you have to update millions (or even billions) of records.

BTW, don't name those colums only ID. Sooner or later you'll end up mixing IDs of different tables or end up joing them. One way to prevent this using a naming schema like id_
. So in the example with tables A and B in a master <-> detail relationship, you get:
A: id_a (PK) B: id_b (PK), id_a (FK)
Even a model with additioal levels of details is straight forward:
C: id_c (PK), id_b (FK) D: id_d (PK), id_c (FK)
PK ... Primary Key, FK ... Foreign KEY
A join would take only one condition per relationship, just imagine how a natural key approach would like, where you would have to add at least one addional column for each level to ensure uniqueness.
Even someone who is not familiar with the data model should have no trouble writing those joins in the model I presented, I'm not sure that holds true for the natural key approach.
As for some questions regarding how to decide , all I can offer is a of thumb that has been working for me: Entities where any of their assumed unique attributes could get changed, even by correcting a typo from data entered by a user, should go straight to synthetic PK which should not have a meaning at all. In fact, the user should never even be aware of its existance. Natural PK works best for tables at the bottom of relationships, such as lookup tables, like currency, country. However, even there, chances are that you'll encounter problems. Famous examples are the conversion of several currencies into the EURO or the change of german zip codes from 4 to 5 digits after the German Reunification.
I've got only 20+ years of experience in the IT industry and 10+ years of DB programming, so I'm open for suggestions by people that have collected their share of experience, please don't hesitate to come forward and educate me/us.
l.
I'm with you bro... That's exactly my everyday DB design. Power to the DBA (and the UI programmer that respects this)
Captcha: "commoveo", doy! That's spanish for "As i see, i give"

2011-06-30 Reply Admin

John:
"One way to prevent this using a naming schema like id_."
I would say that is a bad idea as it makes the names non-generic and redundant when properly qualified, i.e. suppliers.suppliers_id rather than suppliers.id
Instead use "id" for the current table, but

_id for the foreign key, i.e. fk(supplier_id) = suppliers.id
I'm not sure that aiming for "generic" column names is a good idea either. Aside from the fact, that there is the option to use aliases, for example SELECT su.id_suppliers FROM suppliers su WHERE ... which comes in handy when you have to write queries joining several queries.
l.

2011-06-30 Reply Admin

Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

Yes and no. It depends on, really.

If you got tables with config that should be exportable and reusable in other applications, it's a deadly sin to use an autonumber as PK. It simply adds unnecessary bloat.

If you got a table that's a list of people, like first_name, last_name, day_of_birth, then adding an autonumber as PK is a must. You can't go through your entire application referencing to "John", "Doe", "25-12-1947". It's alot better to pull his number(snicker) after quering for his name and dob, then using that ID as reference from there on. It will also allow for referencing the correct John Doe born on 25-12-1947 (in case there are two or more fitting those criteria)

So really, it depends on the table's purpose whether or not to add an autonumber PK.

2011-06-30 Reply Admin

Pedant:
A Gould:
Ideally (and let's emphasize the "ideal" here), your Primary Key should be how your company tells the various entities apart. An arbitrary key that doesn't relate back to the business risks duplication.

That's what unique indexes are for.

A Gould:
In practice, this isn't a massive deal - any company with sufficient suppliers will start handing out "vendor numbers" and "customer numbers" (to avoid having to try and tell two "Bob Smith"s apart). Ditto for products. The trick is to make sure that the system's key matches the implicit key that is already being used as much as possible.

Yes, because we all know that business users are all rational and think about how a hard to remember, randomly assigned number is so much better than just remembering that Bob Smith from Winterset is the nice guy who always buys on Monday and Bob Smith from Waterloo is the guy who only buys sale items.

Sorry, but I've got to bite on this tasty piece of bait:

Never a good idea for business users to have direct access to the database. You should design the application to provide a search screen so that your user can select this user. If this is too much like hard work for the user (too many steps to enter the data when he wants to go to the same user repeatedly) then implement a system of favourites.

If the user genuinely has a user ID which is publicised and unique per user, and understood by a business user, then use that. But at no stage should an internal DB identifier for any user be exposed to a business user (internal or external). If there is no other way to uniquely identify a user without needing to use this identifier, you've done the analysis wrong.

Many systems address this problem by requiring that the user select a unique identifier, which will usually contain the user's name (or nickname, or amusing alias, whatever) and often a number as well. If that is then used as a primary key, then off you go.

2011-06-30 Reply Admin

<quote>To guarantee a match, you need 366 people.</quote>

I think you'll find you need 367 people.

2011-06-30 Reply Admin

Someone who can't be bothered to login from work:
C-Octothorpe:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.
Wha?

It's not actually as insane as it sounds. You make the auto-number field unique, but not the primary key. It can still be used for quick look ups from an index, but you make the primary key some combination of values from the row which must always be unique. It's essentially just there as a row ID (the ROWID pseudo-column will change if you export and then reimport the database).

We deal with racing data, our primary key for a race is the date of the race, the identity of the course and the race number. There can only be one race one at course foo on a given date. If anything ever tries to insert that race again it will choke with a unique key constraint violation, rather than creating a duplicate entry.

What the frack's the use of a PK if you're going to consistently use another unique field for lookups?

KattMan · 2011-06-30 Reply Admin

Pedant:
<quote>To guarantee a match, you need 366 people.</quote>
I think you'll find you need 367 people.

And I did not say guarantee, I said Statistically guarantee. To guarantee you need 366, to statistically guarantee you need 55.

That is an important difference.

nonpartisan · 2011-06-30 Reply Admin

KattMan:
Pedant:
<quote>To guarantee a match, you need 366 people.</quote>
I think you'll find you need 367 people.

And I did not say guarantee, I said Statistically guarantee. To guarantee you need 366, to statistically guarantee you need 55.

That is an important difference.

There are some years that have 366 days. Ergo, to guarantee a match you need 367 people . . .

frits · 2011-06-30 Reply Admin

Jay:
In general, any time someone says, "The chance that we'll get two of these is so small that ...", you have a problem. I don't care if the odds are a million to 1 against a duplicate. 1 in a million that any given key will be the same as any other given key is not the same as 1 in a million that there will be at least one key in the database that is a duplicate of some other key in the database. And in any case, any system that "probably" should work "most of the time" is, by definition, unreliable.

I'll just leave this here.

boog · 2011-06-30 Reply Admin

nonpartisan:
KattMan:
Pedant:
<quote>To guarantee a match, you need 366 people.</quote>
I think you'll find you need 367 people.

And I did not say guarantee, I said Statistically guarantee. To guarantee you need 366, to statistically guarantee you need 55.

That is an important difference.

There are some years that have 366 days. Ergo, to guarantee a match you need 367 people . . .

Yes, but those years of 366 days only occur 25% of the time*, so technically you need 366.25 people to guarantee a match.

Yes, I know it's not exactly once every 4 years, but let's approximate just for the sake of this silly comment.

The Key to a Good Schema

Leave a comment on “The Key to a Good Schema”