The Daily WTF: Curious Perversions in Information Technology

Jeff S · 2005-11-04 Reply Admin

John Smallberries:

*The canonical example of this is a bank using AccountNumber as the PK and printing it on a person's statements. Works fine until you merge with another bank. Then if the AccountNumbers have differing formats, (or worse, overlap) you're in for a big rewrite.

Another one of my favorite examples. So, you use an identity to represent acount 1000 (say, "Cash"), instead of it's account number, "1000". So in your master Accounts table, the table that defines each account, the question is: does your database stop you from having multiple accounts numbered "1000"?

Hopefully, assuming you are just a little misguided about identities, have good data modelling skills, and are working hard to implement a good, solid, reliable and accurate database model, you have two answers to give:

* No, There can only be one account number 1000. We add a unique constraint on Account Number, of course! We want a reliable database with integrity.

or

* Yes, but that's because we have more than one company, so each company can have an account number 1000. So, we have a unique constraint on the combination of comany/account. This unqiue constraint allows us to have a reliable database with integrity so we know each company can only have each account number entered once.

In situation #1, the "No" answer, you have a useless identity as a primary key in addition to your unique constraint. So you have wasted valueable space that many claim to save by using an identity, and added an extra index as well. And now, of course, you cannot expand your table when you merge, since your constraint doesn't allow it. Nothing gained by using your indentity column in your "merger" scenerio. And every table that has an "AccountID" column requires a join just to display the AccountNumber, or to do a simple sort, that otherwise would not be necessary if that number was stored in your tables directly as a natural foreign key.

In situation #2, you are have the same redundant/useless primary key scenerio, and of course you have indicated what a better natural key of your table should be -- a composite of Company/Account. And guess what happens when you need to expand when you take over more companies !?? Your table will allow it either way. So, again, nothing gained by using your identity, and the difference is you have an extra, useless identity column and index to store your artifical key. In addition, any table that contains a reference back to your Accounts table can only an AccountID listed and no Company anywhere, right? (Your answer needs to be yes if you believe in normalization, otherwise you have redundant data). So if I want all transactions for a Company, I need extra joins -- horror, all those "hard joins" that we were trying to avoid -- just to list the transactions instead of filtering my transaction table directly on the company of the transaction.

The only way that identities "help" you in this situation is if you have a weak data model and wish to just keep stuffing random account numbers and values into your tables, because your identity will allow it, when you have that big merge. Not good, fellas. As I've mentioned before, literally *every day* at sqlteam.com we have to help people unravel messes like that.

Jeff S · 2005-11-04 Reply Admin

Anonymous:
Well that sounds to me like in your database I would never be able to have an unordered set of anything. What about a 1->many where the many doesn't have a logical "type." I have three cans of tomato soup in my cubbard, do I really care about differentiating them? Not tonight I don't. And I don't want to add a "sequence" column either. I have three cans of soup in no particular order. And I like it that way. Sometimes. Geesh this is stupidest thread EVAR.

"..do I really care..", "...not tonight, I don't ... "., "I don't want...", "...I like it...", "...stupidest thread EVAR..."

Based on those, uh, "highlights", I assume you are just trolling/goofing around, but if you really do want to have an intelligent discussion about your "cans of soup" scenario (which is a good one to think about, actually) let me know.

procyon112 · 2005-11-05 Reply Admin

<blockquote> Anonymous wrote:

The 'boolean' is vital (perhaps not a great name) but really, you have to know the value of your bits.

My bits are quite valuable to me, but that doesn't mean it's a "yes or no" thing. I mean, if someone was going to take my life or my bits, I would have to make a judgement there; and if all I've got is "true" meaning "Yes, I value my bits," then I've got no context for making the decision between life and bits. Furthermore, I could never instruct a computer to make the same choice for me, other than to make random choice between everything that has a value of 'true'.

That's why every table should have a unique 'SortOrder' column.</blockquote>

I must commend you on the first person to make me snort in at least 6 months. I kept imagining a "Scotty" voice.

2005-11-05 Reply Admin

Yup. As Jeff indicated above, it's better if all your other data tables embedded the composite of Company/Account when referring to "that" table. Much better than using some artificial "index" or some such equally unique entry to mean the same thing. And you've saved the space you'd otherwise waste storing that artificial index.

Yup. Much better.

John Smallberries · 2005-11-05 Reply Admin

Jeff S:
John Smallberries:

*The canonical example of this is a bank using AccountNumber as the PK and printing it on a person's statements. Works fine until you merge with another bank. Then if the AccountNumbers have differing formats, (or worse, overlap) you're in for a big rewrite.
In situation #1, the "No" answer, you have a useless identity as a primary key in addition to your unique constraint. So you have wasted valueable space that many claim to save by using an identity, and added an extra index as well. And now, of course, you cannot expand your table when you merge, since your constraint doesn't allow it. Nothing gained by using your indentity column in your "merger" scenerio. And every table that has an "AccountID" column requires a join just to display the AccountNumber, or to do a simple sort, that otherwise would not be necessary if that number was stored in your tables directly as a natural foreign key.

I appears you've missed a key point. The AccountId (identity) has no meaning other than to provide uniqueness; it never leaves the database (or near-db business code). Therefore, you're free to restructure the tables after the merger any way you like. You can reseed the identity, combine or break up tables, etc. By giving your PK to customers (in the form of AccountNumber), you lock yourself in to that design...you can't call all your customers up tell them "the next time you call customer service, tell them your AccountNumber is 1234 and your CompanyNumber is 5678".

John Smallberries · 2005-11-05 Reply Admin

Anonymous:
Yup. As Jeff indicated above, it's better if all your other data tables embedded the composite of Company/Account when referring to "that" table. Much better than using some artificial "index" or some such equally unique entry to mean the same thing. And you've saved the space you'd otherwise waste storing that artificial index.

Yup. Much better.

The "space saving" argument is dubious.

To provide uniqueness, you would probably need a composite key. This key will likely comprise larger data types than ints, possibly large character columns. When used as an FK, all those attributes get migrated to the child tables.

If a single int is used, you save a lot of space.

Jeff S · 2005-11-05 Reply Admin

Anonymous:
Yup. As Jeff indicated above, it's better if all your other data tables embedded the composite of Company/Account when referring to "that" table. Much better than using some artificial "index" or some such equally unique entry to mean the same thing. And you've saved the space you'd otherwise waste storing that artificial index.

Yup. Much better.

Since that is 99% sarcasm, I can't tell if you are being serious or not, but hopefully you have understood that the "space saving" argument is one made by proponens of surrogate keys and that is simply addressing it. Saving space is fair less important than having data integrity.

Jeff S · 2005-11-05 Reply Admin

John Smallberries:
Jeff S:

John Smallberries:

*The canonical example of this is a bank using AccountNumber as the PK and printing it on a person's statements. Works fine until you merge with another bank. Then if the AccountNumbers have differing formats, (or worse, overlap) you're in for a big rewrite.

In situation #1, the "No" answer, you have a useless identity as a primary key in addition to your unique constraint. So you have wasted valueable space that many claim to save by using an identity, and added an extra index as well. And now, of course, you cannot expand your table when you merge, since your constraint doesn't allow it. Nothing gained by using your indentity column in your "merger" scenerio. And every table that has an "AccountID" column requires a join just to display the AccountNumber, or to do a simple sort, that otherwise would not be necessary if that number was stored in your tables directly as a natural foreign key.

I appears you've missed a key point. The AccountId (identity) has no meaning other than to provide uniqueness; it never leaves the database (or near-db business code). Therefore, you're free to restructure the tables after the merger any way you like. You can reseed the identity, combine or break up tables, etc. By giving your PK to customers (in the form of AccountNumber), you lock yourself in to that design...you can't call all your customers up tell them "the next time you call customer service, tell them your AccountNumber is 1234 and your CompanyNumber is 5678".

You said a table of Accounts, and I took that to mean a table of GL Accounts, not a table of Customers. That, again, is a different story.

So, if no customers ever recieve that artificial key, it is for the DB only, then what do they tell you their account # is when they call up? What shows up on their invoice? You said we shouldn't give the "Real" PK to them, so what do you give them?

2005-11-05 Reply Admin

Jeff S:
Anonymous:
Yup. As Jeff indicated above, it's better if all your other data tables embedded the composite of Company/Account when referring to "that" table. Much better than using some artificial "index" or some such equally unique entry to mean the same thing. And you've saved the space you'd otherwise waste storing that artificial index.

Yup. Much better.

Since that is 99% sarcasm, I can't tell if you are being serious or not, but hopefully you have understood that the "space saving" argument is one made by proponens of surrogate keys and that is simply addressing it. Saving space is fair less important than having data integrity.

Damn. Missed perfection by 1%. But I am relieved to see that you recognized it. On this forum, one worries about such things.

The problem is that using the composite key (Company/Account) to refer to this record from other tables can actually threaten data integrity, rather than ensuring it. Because the individual columns of "Company" and "Account" have meaning outside the database, so their contents are "free" in other contexts within the database.

That's confusing. Let's consider an example.

You're building a table of transactions. And each transaction wants to refer to one (or maybe two, if you're doing double-entry accounting) "accounts". You can choose to (a) use an index internal to the database to refer to the company/account, or (ii) use the composite company/account. Except that under (ii) you can't ensure the company/account is unique, and you can't ensure it actually corresponds to an entry in the accounts table. Under (a) you can't really ensure the index has a corresponding entry, but because it's internal to the database, you can come a lot closer to ensuring that fact: because it's internal, nobody is tempted to make it visible except through a lookup in the "accounts" table.

If you're guaranteeing that company/account is unique in one table, you've got to ensure consistency across all tables, even where it's not unique (e.g. the transactions table I just discussed). But if you're storing company/account in those other tables, it's tempting to use that information directly and not go "through" the proper tables to ensure that uniqueness.

Jeff S · 2005-11-05 Reply Admin

Anonymous:

You're building a table of transactions. And each transaction wants to refer to one (or maybe two, if you're doing double-entry accounting) "accounts". You can choose to (a) use an index internal to the database to refer to the company/account, or (ii) use the composite company/account. Except that under (ii) you can't ensure the company/account is unique, and you can't ensure it actually corresponds to an entry in the accounts table. Under (a) you can't really ensure the index has a corresponding entry, but because it's internal to the database, you can come a lot closer to ensuring that fact: because it's internal, nobody is tempted to make it visible except through a lookup in the "accounts" table.

If you're guaranteeing that company/account is unique in one table, you've got to ensure consistency across all tables, even where it's not unique (e.g. the transactions table I just discussed). But if you're storing company/account in those other tables, it's tempting to use that information directly and not go "through" the proper tables to ensure that uniqueness.

You seem to be confusing primary keys and foreign keys, and speaking about transaction tables and entity tables and going back and forth between the two. It helps to state clearly which aspects of a database design you are discussing.

This is the worst part about these discussions -- people who don't understand composite primary keys. Under (ii), assuming you are talking about the accounts table (not the transaction table), of course you are guaranteeing that company/account is unique if you do it properly -- it is the primary key! As for the transaction table, then, that references that Accounts table: again, it seems many people also don't undertand how foreign key constraints work, either. You can have more than 1 column participate in a foreign key constraint also! This is definitely new stuff to lots of people who think that a pk = an identity and nothing else, or that a foreign key can only rerfence a single identity column and nothing else.

So, as you said, "it's tempting to use that information directly if you store company/account in each table" -- OF COURSE! That is, again, the point of a transaction table. If the transaction is booked to a combination of acompany/account, then that is what is should contain -- the company and the account it is booked to. If that combo is not valid, the fk constraint fails. If it is valid, it works and now you don't have to use a join on some silly AccountID just to determine which account number or company that each transaction is booked to.

If you mean something else, then an example might be useful.

Jeff S · 2005-11-05 Reply Admin

John Smallberries:
Anonymous:
Yup. As Jeff indicated above, it's better if all your other data tables embedded the composite of Company/Account when referring to "that" table. Much better than using some artificial "index" or some such equally unique entry to mean the same thing. And you've saved the space you'd otherwise waste storing that artificial index.

Yup. Much better.

The "space saving" argument is dubious.

To provide uniqueness, you would probably need a composite key. This key will likely comprise larger data types than ints, possibly large character columns. When used as an FK, all those attributes get migrated to the child tables.

If a single int is used, you save a lot of space.

And, again, John, you are missing the point. As we said -- we want data integrity. We must make this assumption during this debate, otherwise it is over, I win and you lose, since my data is accurate, yours is not, and it doesn't matter which is "easier to use" or "has less hard joins to write" or "uses less space".

If you agree that we are going to build a database that is modelled to be accurate and have full integrity, then you must add a unique constraint on your combination of company/account in addiition to your extra identity "primary key". So, in effect, you are using more space by adding your extra identity column.

growse · 2005-11-05 Reply Admin

So, if I have a table with just unique email addresses, doesn't it make sense to make an artificial key if I ever want to do any foreign-key-ing?

Say I have another table, which has record which relate to email addresses, surely it'd take up less room to just have an integer pointing to the index of the email address table, rather than have the whole email address? Especially if there are 4000 records for one email address?

Sure, natural keying might be "neater" on the email address table, if I've got other properties like name, address in there, but when I come to foreign key that, isn't it quick and more efficient to have an artificial index key thingie?

murphyman · 2005-11-05 Reply Admin

Anonymous:

The problem is that using the composite key (Company/Account) to refer to this record from other tables can actually threaten data integrity, rather than ensuring it. Because the individual columns of "Company" and "Account" have meaning outside the database, so their contents are "free" in other contexts within the database.

That's confusing. Let's consider an example.

No, it's not confusing, merely confused. Data is just data as far as a schema is concerned - meaning is irrelevant. That's what presentation / interpretation / datawarehousing is for.

Anonymous:

You can choose to (a) use an index internal to the database to refer to the company/account, or (ii) use the composite company/account. Except that under (ii) you can't ensure the company/account is unique, and you can't ensure it actually corresponds to an entry in the accounts table. Under (a) you can't really ensure the index has a corresponding entry, but because it's internal to the database, you can come a lot closer to ensuring that fact: because it's internal, nobody is tempted to make it visible except through a lookup in the "accounts" table.

Foreign Keys anyone? Data integrity is incredibly important for commercial data - it is not something you want to "come a lot closer to" - its something you simply enforce. But, giiven that you find it hard to enforce integrity on your own numbering system, I can only imagine how hard you find it with actual data.

2005-11-05 Reply Admin

Take a look at the following excerpt.

http://www.acm.org/classics/nov95/s1p4.html

Please tell me: what is a man#? Who assigns it and maintains it? How is it a natural data attribute of a person?

Jeff S · 2005-11-05 Reply Admin

growse:
So, if I have a table with just unique email addresses, doesn't it make sense to make an artificial key if I ever want to do any foreign-key-ing?

Say I have another table, which has record which relate to email addresses, surely it'd take up less room to just have an integer pointing to the index of the email address table, rather than have the whole email address? Especially if there are 4000 records for one email address?

Sure, natural keying might be "neater" on the email address table, if I've got other properties like name, address in there, but when I come to foreign key that, isn't it quick and more efficient to have an artificial index key thingie?

Again, and I apologize for constaintly repeating myself, can you have more than 1 entry for the same email address in your "master" table of email addresses? An identity as the PK allows this unless you add a unique constraint on email address.

And, again, as I've said over and over there are many situations where you might want to add a surrogate key, and identities are great for those cases. It's not one way or the other in all cases. It is just scary to read some of these comments, where natural keys are a "last resort" or "should never be used" or people who don't understand the difference between a primary key and an identity.

I would estimate, if you poll 100 database "experts" out there and ask them to describe a primary key, 50 of them will say "it's a unique autonumber of some sort you add to a table", 40 will say "it is the column that uniquely identifies each row in your table", and probably only 10 will say (the correct answer) "it is the set of columns in your table that uniquely identifies each row." Far too many database experts out the have never joined on more than 1 column between two tables, or never created a multi-column fk constraint, and not only don't know how, they don't know that they even can!

2005-11-05 Reply Admin

murphyman:
Anonymous:

The problem is that using the composite key (Company/Account) to refer to this record from other tables can actually threaten data integrity, rather than ensuring it. Because the individual columns of "Company" and "Account" have meaning outside the database, so their contents are "free" in other contexts within the database.

That's confusing. Let's consider an example.

No, it's not confusing, merely confused. Data is just data as far as a schema is concerned - meaning is irrelevant. That's what presentation / interpretation / datawarehousing is for.

Anonymous:

You can choose to (a) use an index internal to the database to refer to the company/account, or (ii) use the composite company/account. Except that under (ii) you can't ensure the company/account is unique, and you can't ensure it actually corresponds to an entry in the accounts table. Under (a) you can't really ensure the index has a corresponding entry, but because it's internal to the database, you can come a lot closer to ensuring that fact: because it's internal, nobody is tempted to make it visible except through a lookup in the "accounts" table.

Foreign Keys anyone? Data integrity is incredibly important for commercial data - it is not something you want to "come a lot closer to" - its something you simply enforce. But, giiven that you find it hard to enforce integrity on your own numbering system, I can only imagine how hard you find it with actual data.

Some days it's just not worth explaining things to people. They just don't get it. They think that data integrity is guaranteed because of good design, that things can never go wrong if only some simple rules are followed. And they never understand when something lighthearted has been injected into the discussion: their comprehension doesn't run that deep.

It makes me want to scream. It makes me want to call them names, but they won't recognize the truth there, either.

Data integrity is never guaranteed by good design. But it is almost impossible with bad design.

We've discussed various aspects of bad design. And yes, foreign keys are a valuable (if performance reducing) tool which are available.

But let's talk about another aspect of bad design: cross-populating multiple tables with keys based on data that you don't own.

Some have advocated vociferously for embedding a company's name across multiple tables, as part of the best possible key to use. Except that they forget that the real world works on a multidimensional "best", instead of their unidimensional theoretical best.

They forget that companies change name. They get absorbed into their parent holding companies, spun out from them, and acquired by other companies. Each time, a decision is made as to whether it's a new relationship or the same relationship under a different name. And their data is screwed because they used data they don't own all over the place.

Jeff S · 2005-11-05 Reply Admin

Anonymous:
Take a look at the following excerpt.

http://www.acm.org/classics/nov95/s1p4.html

Please tell me: what is a man#? Who assigns it and maintains it? How is it a natural data attribute of a person?

Again (boy I am tired of starting every post with "Again") there is nothing wrong with deciding to assign a number to an entity upon creation, or to use that number as the primary key in your table, if that is part of your business process. And, in fact, you might even choose to use an identity column to do this! Why not, that's what it does, right? If your business process states that creating a unique code per customer is useful, then do so. If that business process states that a good time to assign that code is upon entry into the database, then there you ! And guess what? In that case, that identity is a NATURAL key ! That's right, "anti-natural key" folks, some of your identities out there are NATURAL KEYS! The horror! Quick, time for a re-design! This cannot be right! Our keys MUST be meaningless, the Access 97 "new table wizard" told me so!

Guys and gals, long before relational databases existed, people have recognized the advantage of assigning numeric codes to entities to keep things short and simple, for tagging items, customer reference codes, printing on invoices, etc. A numeric code being assigned to an entity as part of your business process NOT the same as tagging every row of every table with a meaningless, random ID, and then using that meaningless, random ID to completely definie all relations in all tables throughout your system!

Jeff S · 2005-11-05 Reply Admin

Anonymous:

But let's talk about another aspect of bad design: cross-populating multiple tables with keys based on data that you don't own.

Some have advocated vociferously for embedding a company's name across multiple tables, as part of the best possible key to use. Except that they forget that the real world works on a multidimensional "best", instead of their unidimensional theoretical best.

They forget that companies change name. They get absorbed into their parent holding companies, spun out from them, and acquired by other companies. Each time, a decision is made as to whether it's a new relationship or the same relationship under a different name. And their data is screwed because they used data they don't own all over the place.

Hmmmm ... I don't recall anyone hear saying that a good choice for a key would be an external customer name for a table ....

If you want to join the debate, scream a little, let it out, and then start addressing the actual points raised in the posts and don't make up your own.

However, it does give me a good chance to address another of my favorite "anti-natural key" reasons: Because external company names can change! Or people can get married! Therefore, you should never use natural keys because your system becomes a mess!

OK, let's tackle that one ... the premise is "an external company name might change" and the conclusion is "therefore, never use natural keys". Hmmm .. I might rework this a little and say " an external company name might change" with a conclusion of "therefore, don't use an external company name as a primary key". Hmmm .. Makes a little more sense, right? now we are not jumping to absolute conclusions, right?

So, here's what we have learned: Don't use external data that can change as your PK. OK, done. Now, how does that tell me that natural keys are bad? Based on that premise, have you proved me to that natural keys are bad and should never be used, and that all tables must have random, meaningless ID columns added?

2005-11-05 Reply Admin

Jeff S:

Again (boy I am tired of starting every post with "Again") there is nothing wrong with deciding to assign a number to an entity upon creation, or to use that number as the primary key in your table, if that is part of your business process. And, in fact, you might even choose to use an identity column to do this! Why not, that's what it does, right? If your business process states that creating a unique code per customer is useful, then do so. If that business process states that a good time to assign that code is upon entry into the database, then there you ! And guess what? In that case, that identity is a NATURAL key ! That's right, "anti-natural key" folks, some of your identities out there are NATURAL KEYS! The horror! Quick, time for a re-design! This cannot be right! Our keys MUST be meaningless, the Access 97 "new table wizard" told me so!

Guys and gals, long before relational databases existed, people have recognized the advantage of assigning numeric codes to entities to keep things short and simple, for tagging items, customer reference codes, printing on invoices, etc. A numeric code being assigned to an entity as part of your business process NOT the same as tagging every row of every table with a meaningless, random ID, and then using that meaningless, random ID to completely definie all relations in all tables throughout your system!

Jeff,

I think we agree that not every single table needs a numeric ID column. But I would argue that the reason many "experts" fall on such a practice is that, in the large majority of business applications, entities like "Parts", "Orders", and "Employees" all commonly have an otherwise meaningless ID number assigned to them. It is absolutely unnecessary to have an ID in a relationship table, e.g.:

ID     EmployeeNo     DepartmentNo
 1             50               17

I don't think anyone here disagrees. What I think is misleading is people conflating your argument with the argument that "No entity should ever have a meaningless identity number!"

Jeff S · 2005-11-05 Reply Admin

Anonymous:
Jeff S:

Again (boy I am tired of starting every post with "Again") there is nothing wrong with deciding to assign a number to an entity upon creation, or to use that number as the primary key in your table, if that is part of your business process. And, in fact, you might even choose to use an identity column to do this! Why not, that's what it does, right? If your business process states that creating a unique code per customer is useful, then do so. If that business process states that a good time to assign that code is upon entry into the database, then there you ! And guess what? In that case, that identity is a NATURAL key ! That's right, "anti-natural key" folks, some of your identities out there are NATURAL KEYS! The horror! Quick, time for a re-design! This cannot be right! Our keys MUST be meaningless, the Access 97 "new table wizard" told me so!

Guys and gals, long before relational databases existed, people have recognized the advantage of assigning numeric codes to entities to keep things short and simple, for tagging items, customer reference codes, printing on invoices, etc. A numeric code being assigned to an entity as part of your business process NOT the same as tagging every row of every table with a meaningless, random ID, and then using that meaningless, random ID to completely definie all relations in all tables throughout your system!

Jeff,

I think we agree that not every single table needs a numeric ID column. But I would argue that the reason many "experts" fall on such a practice is that, in the large majority of business applications, entities like "Parts", "Orders", and "Employees" all commonly have an otherwise meaningless ID number assigned to them. It is absolutely unnecessary to have an ID in a relationship table, e.g.:
ID     EmployeeNo     DepartmentNo
 1             50               17
I don't think anyone here disagrees. What I think is misleading is people conflating your argument with the argument that "No entity should ever have a meaningless identity number!"

You are probably right. I tried to state over and over that surrogate keys have their place, but people read/understand what they want. The issue is those people who don't understand the difference between a pk and an identity, those that think "joins are too hard" w/o identities and thefore always use them, and those in the camp that you should NEVER have a natural key or anything other than an identity. Unfortunately (and unlike the assumption many are making about me always being against surrogates) those statements have been made, many times and repeated over and over, and that is the worst part.

UncleMidriff · 2005-11-05 Reply Admin

Here come a bunch of posts in response to some of the other posts in this thread. Sorry for the post blitz:

Jeff S:
Hey Uncle -- read the thread I posted a link to. Sorry for the smug comment, I tried pretty sincerely and with a great deal of effort to discuss the topic previously to no avail before.

I have read that thread...and reread it and thought about it and read it again, etc. You (and Alex) made smug comments that pissed me off in that thread as well. And while I don't remember the specific content of each of your posts in that thread, I did read them, and I do remember not being convinced. Maybe if I have time later I'll go back and read it again, but be sure that I have read the entire thread at least twice..

UncleMidriff · 2005-11-05 Reply Admin

Jeff S:

Jeff S:
Hey Uncle -- read the thread I posted a link to. Sorry for the smug comment, I tried pretty sincerely and with a great deal of effort to discuss the topic previously to no avail before.

Oh, and by the way, what did I say that was so bad that insulted you so much? If you think that an identity and a primary key are the same thing, then what is the problem? I didn't say it wrong to think that way in my "smug" post, did I?

First of all, I'd like to apologize for the flame-y-ness of my post. I was already hacked off about something at work and I posted without remembering that every freaking time I make a flame-o-rific post, I end up regretting it. Anyway...sorry.
What insults me so much is that I am being told that the way I design databases, using artificial primary keys, is a) fundamentally incorrect, b) obviously stupid, c) will surely result catastrophic failure and financial ruin for all parties involved at some point in the future (or at least such catastrophe has happened in the past and artificial keys were obviously to blame). Being relatively fresh out of college, this scares me to death, especially coming from people like Alex and yourself, who are much more experienced than I am.

Then, when I press for an explanation as to why using artificial primary keys is fundamentally incorrect, obviously stupid, and will result is disaster, I get a handful of examples of really crappy databases that used artificial primary keys and eventually blew up in the faces of everyone involved. Essentially, all that those examples tell me is that a really crappy database that uses artificial primary keys is a really crappy database; they don't explain how artifical primary keys, own their very own, make a database crappy.

At this point I should point out that you've done a better job of steering clear of this kind of thing than most of the other natural key proponents on this site. Although you do mention that you've had to rescue artificial primary key proponents over at sqlteam.com, you also at least try to give some -good- examples of when/why artificial primary keys are bad. None of them have convinced me yet, but at least you've tried.

Anyway, I'm told by you guys, that have more experience than I do, that the way I'm doings things is horribly wrong. My experience, education, and discussions with other people more experienced than myself disagree, so I ask for clarification. What I get is mostly scary stories and smugness. I find it insulting when people tell me what I do sucks and then can't or won't give a good reason as to why.

And now, for a bit of clarification:

I like artificial primary keys a lot.
I do not thnk every database table in existence should use them.
I do not think natural primary keys are "teh OMG Evils!11!"
I do not think that "an identity and a primary key are the same thing"

Jeff S · 2005-11-05 Reply Admin

UncleMidriff:
Here come a bunch of posts in response to some of the other posts in this thread. Sorry for the post blitz:

Jeff S:
Hey Uncle -- read the thread I posted a link to. Sorry for the smug comment, I tried pretty sincerely and with a great deal of effort to discuss the topic previously to no avail before.

I have read that thread...and reread it and thought about it and read it again, etc. You (and Alex) made smug comments that pissed me off in that thread as well. And while I don't remember the specific content of each of your posts in that thread, I did read them, and I do remember not being convinced. Maybe if I have time later I'll go back and read it again, but be sure that I have read the entire thread at least twice..

OK, then, read up in this thread in which I made more sincere efforts to clear things up, and feel free to join the debate. I hope you can appreciate my honest effort not to preach but to hopefully help others to improve their database knowledge.

UncleMidriff · 2005-11-05 Reply Admin

RevMike:
Ytram:
I have yet to see someone explain to me how using identities is NOT relational. Perhaps you can tell me without telling me to go read a book by Date.

It is relational, just not third normal. "The key, the whole key, and nothing but the key, so help me Codd."

If there is a true natural key and an artificial identity then your row fails the "nothing but the key" test and the design is not third normal.

Designers should be very careful, however, when choosing a natural candidate key as a primary key. Some things work in limited domains but fail easily has the system gets extended. For example, I once maintained a personnel system that used social security number as a primary key for person. It failed when we had to 1) add contract workers for whom we had no SSN and 2) we had guest employees who worked for a foreign sister company come to the US temporarily. Also there are a (fairly small) number of people who have been issued duplicate SSNs.

Can you give me a reference or an example explaining why the use of artificial primary keys breaks third normal form? I wasn't aware that this was so.

UncleMidriff · 2005-11-05 Reply Admin

Anonymous:
UncleMidriff:

Why do you suck so hard at explaining why artificial primary keys are bad relational database design? I and several other people here have asked you and the other natural primary key proponents here on more than one occassion to convince us of your position. All we get is the same old bullshit, straw men arguments and comments like yours above to Alex, every damn time. Perhaps if you cannot defend your position (acting smug isn't a valid defense) you should find a new one.

Give me a break, throw out a non-utteraly fallacious argument, and I won't throw out the straw man card. Arguing against ID columns is ok with me (Hint: I never proclaimed to be an ID column advocate). However, to make up your own invalid argument, such as this rediculous example and claim that it has any bearing at all for the pro ID crowd is complete BS. It certainly does not constitute a "valid defense".

I'm not sure what to make of your post. The post of mine that you quoted was directed at Jeff S. and at him alone. Also, based on the rest of your posts in this thread, I'd say we agree with each other. Sorry for the confusion.

UncleMidriff · 2005-11-05 Reply Admin

Alex Papadimoulis:

Ytram:
I have yet to see someone explain to me how using identities is NOT relational. Perhaps you can tell me without telling me to go read a book by Date.

Did anyone else trip on this? Let's take a trip across business lines ...

    * Please explain to me why it's a bad idea to go above 3% copper *without* explaining the fundamentals of mettalurgy and alloys.
    * What's so wrong with booking accrual expenses in the actuary accounts? And don't tell me to go learn the "accounting" nonesense.
    * So what if I don't check for liens prior to appraisal? And don't get all into the "basics of the mortgage process" with me.

You need to learn the fundamentals of something before you can understand why what you are doing is fundamentally wrong.

I think we understand that just fine. And I know I wouldn't be opposed to you refuting my stance that artificial primary keys are A-OK with something like, "No, Uncle Midriff, you're wrong. See, here in "An Introduction to Database Systems, Eighth Edition [Hardcover] by Date, C.J.", chapter 6, he says that artificial primary keys aren't valid in the relational database model because..."

But that's not what we get from you. Instead, we get something like, "No, you're wrong. Artificial primary keys aren't valid in the relational database model. Date, Codd and all the books on the subject I've read agree. Trust me."

And now, for some more clarification:
I understand that some folks consider identities in the various relational database management systems in existence to be non-relational because, in some of those systems, once you specify that a column is an identity column it can't be changed. That makes at least a small amount of sense to me, but I don't think it has much to say about the use of artificial primary keys in general.

UncleMidriff · 2005-11-05 Reply Admin

Anonymous:

rikkus:
I'm still waiting to hear an argument against identities from someone who wasn't just trying to make themselves appear to be enlightened in the zen of database design. The 'arguments' always seem to be of the angry-significant-other kind: "Well, if you don't know what's wrong, I'm not going to tell you!" or the I've-been-developing-for-200-years kind: "You try using artificial primary keys in an Ultra Enterprise system and see where that gets you, sonny!"

There are good logical arguments for using either approach, but there seems to be some kind of snob value in avoiding artificial keys, perhaps due to the seeming 'purity' of a 'naturally' keyed system.

There's little point arguing the toss here.

As was pointed out above one of the major simple arguments against artificial keys is Codd's (the guy that got us all hooked on relational databases) insistence that there be nothing in that database but real data of real use in describing the data. Thus any 'magic' identity column you tack onto a tuple that doesn't have a real world meaning breaks the model and allows you to put two otherwise identical tuples into your table which then allows you to alter one and not the other and then you have all the data redundancy issues that relational databases were supposed to save us from. Now at this point you can hop up on your high horse and say that you could constrain your columns to not allow data redundancy. That is true but now you've added two artificial properties to your data, you've added an artificial key that is meaningless to the data itself and you've had to add an artificial set of constraints to your table in order to preserve the data integrity that would have already been protected by the system had it been used properly in the first place. So in reality you've actually complicated your DB (+1 column, + what ever is built to support your 'fake'
constraints) in order to make your life as a programmer simpler. It also offends the OO and Relational purists in that you are exposing implementation details. According to Codd if you want information about purple elephants then you ask the DB for information about purple elephants and there should never ever be a way to get information about purple elephants without asking for information about purple elephants or some superset that includes purple elephants. If you have artificial keys then you can ask for some nosense like 123 and get information about purple elephants... now what the f* does 123 have to do with purple elephants nothing except in your particular implementation of the DB. Now your going to have code that is asking the DB for 123 and it won't make sense where as code that asks the DB for purple elephants will.

I see I've begun to ramble, anyway to make a long story short there is probably not a really nice pat answer to your question. Any explaination as to why fake keys are bad is going to go into deep theory and what not because finding the best way to represent data is a 'hard' question that hasn't really been solved even now, though we are doing much better than we used to when we were stuck with Flatfile and CODASYL systems. Also every real world DB system has all sorts of crutches to allow real programmers to abuse the relational model to get data into it how they want and use it the way they want so you can 'work around' most of the problems of breaking the relational model and many people just expect this. Things like non-key constraints, 'like' operators, stored procedures and what not all break the model but people need and use them every day.

I'd like to thank you for your well thought out post, and I'd also like a little bit of clarification if you have the time.

Can you give me any sort of reference to support the claim that Codd insists that if I need information about purple elephants I should ask the database for information about purple elephants and not for information about entity #123? While that does make some sense, I wonder (sincerely, not merely trying to be an ass) what Codd would have to say about a column named "Cust#" in a table containing customer information? Would he insist that in order to get information about Adam Collins at 3446 N. Washington Ave, Oklahoma City, Oklahoma, 73516 that I ask the database for information about Adam Collins at 3446 N. Washington Ave, Oklahoma City, Oklahoma, 73516 instead of asking the database for information about Customer #4452? If not, why not?

Also, how exactly do stored procedures break the relational model? I've always understood them to be primarily and merely saved SQL statements.

UncleMidriff · 2005-11-05 Reply Admin

Jeff S:
Anonymous:

Now at this point you can hop up on your high horse and say that you could constrain your columns to not allow data redundancy. That is true but now you've added two artificial properties to your data, you've added an artificial key that is meaningless to the data itself and you've had to add an artificial set of constraints to your table in order to preserve the data integrity that would have already been protected by the system had it been used properly in the first place. So in reality you've actually complicated your DB (+1 column, + what ever is built to support your 'fake' constraints) in order to make your life as a programmer simpler. It also offends the OO and Relational purists in that you are exposing implementation details. According to Codd if you want information about purple elephants then you ask the DB for information about purple elephants and there should never ever be a way to get information about
purple elephants without asking for information about purple elephants or some superset that includes purple elephants. If you have artificial keys then you can ask for some nosense like 123 and get information about purple elephants... now what the f* does 123 have to do with purple elephants nothing except in your particular implementation of the DB. Now your going to have code that is asking the DB for 123 and it won't make sense where as code that asks the DB for purple elephants will

Good stuff. That's one of the same points I made in the thread mentioned earlier, one which is commonly ignored.

To simplify: You need a table of U.S. states. You create "StateID" as PK. One day, we get two entries for "North Dakota". So we add a constraint saying only 1 entry per state name. Then someone enters "N. Dakota". More problems. Now what? Looking at the data, we realize we can add a constraint saying only 1 entry per 2-char abbreviation. Now we are feeling good. And, of course, this all could have been avoided if we just declared the PK as the standard 2-char state abbreviation in the first place, and now we have extra columns and extra constraints in our table, and a meaningless "StateID" value all over our place in which a join is required just to print the 2-char abbreviation.

In a table of U.S. States, State name and State abbreviation are both candidate keys and therefore both need uniqueness constraints, regardless of whether or not you decide to append an artificial primary key to the table. So, adding an artificial primary key to the table adds exactly one column and exactly one constraint. Your statement makes it seem that adding an artificial primary key to the table requires the addition of unnecessary uniqueness constraints to the other columns in the table, and that is incorrect.

Also, most reasonable artificial primary key proponents, including myself, would probably agree with you that a table of U.S. States probably doesn't really need an artificial primary key, so using it as an example of why artificial primary keys are bad in general is somewhat lacking.

Jeff S:
Another one I see often is "OK, then, what is the natural PK of an address table? It must be all columns!" no ... it's whatever the PK of your entity is, plus an addressType identifier if your entities can have more than 1 address. (and/or maybe a "status" of some sort) Storing more than 1 address for an entity without explicitly classifying the difference between the two is completely, uterly pointless, and something that identies often not only allow but encourage. But if I force you to have exactly 1 "billing" address, and 1 "home" address, and so on, by forcing you to create or assign types for each one you add, then your data has meaning and your address table has a perfectly natural, intelligent primary key.

You can and should do all that even if you're using artificial primary keys. Just because I tack an AddressID column onto an address table to make life simpler doesn't mean I shouldn't or can't enforce the uniquness constraints you specify.

Now, what if Joe, Sue, Bubba, and Bob all live together and join your club. You'd have four rows in your address table all with the exact same data save for the Customer foreign key. While this would work well, and I can't forsee it causing any problems, it bugs me that we're storing the exact same address more than once. I might propose that we move all the address info to a separate table, append some kind of AddressID to it, and then add a reference to that AddressID to our first table. That way, all the same constraints are enforced, but instead of repeating the entire address 4 times, were only repeating the small AddressID value 4 times. What do you think? Have I gone off the deep end?

UncleMidriff · 2005-11-05 Reply Admin

Jeff S:
John Smallberries:
*The canonical example of this is a bank using AccountNumber as the PK and printing it on a person's statements. Works fine until you merge with another bank. Then if the AccountNumbers have differing formats, (or worse, overlap) you're in for a big rewrite.

Another one of my favorite examples. So, you use an identity to represent acount 1000 (say, "Cash"), instead of it's account number, "1000". So in your master Accounts table, the table that defines each account, the question is: does your database stop you from having multiple accounts numbered "1000"?

Hopefully, assuming you are just a little misguided about identities, have good data modelling skills, and are working hard to implement a good, solid, reliable and accurate database model, you have two answers to give:

* No, There can only be one account number 1000. We add a unique constraint on Account Number, of course! We want a reliable database with integrity.

or

* Yes, but that's because we have more than one company, so each company can have an account number 1000. So, we have a unique constraint on the combination of comany/account. This unqiue constraint allows us to have a reliable database with integrity so we know each company can only have each account number entered once.

In situation #1, the "No" answer, you have a useless identity as a primary key in addition to your unique constraint. So you have wasted valueable space that many claim to save by using an identity, and added an extra index as well. And now, of course, you cannot expand your table when you merge, since your constraint doesn't allow it. Nothing gained by using your indentity column in your "merger" scenerio. And every table that has an "AccountID" column requires a join just to display the AccountNumber, or to do a simple sort, that otherwise would not be necessary if that number was stored in your tables directly as a natural foreign key.

In situation #2, you are have the same redundant/useless primary key scenerio, and of course you have indicated what a better natural key of your table should be -- a composite of Company/Account. And guess what happens when you need to expand when you take over more companies !?? Your table will allow it either way. So, again, nothing gained by using your identity, and the difference is you have an extra, useless identity column and index to store your artifical key. In addition, any table that contains a reference back to your Accounts table can only an AccountID listed and no Company anywhere, right? (Your answer needs to be yes if you believe in normalization, otherwise you have redundant data). So if I want all transactions for a Company, I need extra joins -- horror, all those "hard joins" that we were trying to avoid -- just to list the transactions instead of filtering my transaction table directly on the company of the transaction.

I'd answer with the first option, until a merger happened, as it is correct until such a time. Then when a merger happened, I'd add a Company column to the accounts table and change the uniqueness constraint to include the new column. Then I'd say, "Done!"

If I understand you correctly, when a merger happened, you'd add a Company column to the accounts table, change the primary key on the table to include the new column, and then change each and every row in each and every table that references the accounts table to include the Company column in the foreign key.

Each method has it advantages and disadvantages. I like my method better than yours is all I'm sayin'.

Jeff S:
The only way that identities "help" you in this situation is if you have a weak data model and wish to just keep stuffing random account numbers and values into your tables, because your identity will allow it, when you have that big merge. Not good, fellas. As I've mentioned before, literally *every day* at sqlteam.com we have to help people unravel messes like that.

There you go again, acting like only no-talent ass-clowns would ever consider using artificial primary keys.

UncleMidriff · 2005-11-05 Reply Admin

Jeff S:
A numeric code being assigned to an entity as part of your business process NOT the same as tagging every row of every table with a meaningless, random ID, and then using that meaningless, random ID to completely definie all relations in all tables throughout your system!

Why?

How does a randomly generated number from a business rule differ so much from a randomly generated number from the developer? Why is the former ok, but the later magically breaks the relational model into a billion pieces, makes both Codd and Date cry bitter tears, and makes your entire business explode?

</hyperbole>

UncleMidriff · 2005-11-05 Reply Admin

And so ends the post blitz. Sorry, but I didn't have time to respond -and- make sure my posts were ready to be posted before just now, so when I finally got some time I decided to post them all at once. If this upset any of you, please feel free to insult my mother.

And Jeff, thanks for taking the time you have taken to better explain your position. Some of the first posts I just put up probably sound a little more antagonistic than they should, given some of the things you've said in this thread between when I wrote them and posted them. Sorry for that. Based on your last few posts, you seem more reasonable to me now than you did when I was writing the first few posts.

2005-11-05 Reply Admin

oh cmon! this is a real WTF.. been visiting this site for months now and still can' belive it

does that person know that there a way to get date infos on every Database system??!!!

its surely not standard

2005-11-05 Reply Admin

Time to rewrite the isTrue function:

function isTrue($x) {
$db = open_connection('oracledb','tWTF','1234');
exec_sql($db, 'CREATE TABLE temp ( tempval int ) AS SELECT COUNT(*) FROM tBoolean WHERE tBoolean.BooleanID = '.$x.' AND t.BooleanName = 'Yes' );
$true = exec_sql($db, 'SELECT * FROM temp').row[0][0];
exec_sql($db,'DROP TABLE temp');

if (!isFalse($true)) {
return $true;
} else {
return $false;
}

close_connection($db);
}

ammoQ · 2005-11-05 Reply Admin

We've gone through this whole discussion before and IMO it's more about religious believe than anything else. I must admit that I'm shocked by the amount of premature arrogance shown here.

John Smallberries · 2005-11-05 Reply Admin

ammoQ:
We've gone through this whole discussion before and IMO it's more about religious believe than anything else. I must admit that I'm shocked by the amount of premature arrogance shown here.

There is a little zealotry here, but I think it's more "theory" vs. "the real world".

Now, I'm big on theory, and I fully appreciate that the application of theory helps us in engineering. That said, sometimes theory cannot be applied to the letter because of real world constraints.

I would love to design a database on top of solid and immutable business rules. So far, I have never had this luxury...not even close. Businesses change (rapidly) and an agile database make life much easier.

I believe anyone can fuck up a DB design whether they use natural keys or surrogate keys. Care must be taken ensure integrity no matter what. I use natural keys where they make sense, but surrogate identities where there is no concise natural key I can depend on not changing. I add alternate keys and inversion entries where appropriate, and enforce FKs with constraints. I'm concerned about scalability & performance (in that order) but not really about space because that's cheap.

Jeff: I appreciate your arguments, but instead of repeatedly beating the same dead horse (and offending everybody by implying they have a lack of knowledge or skill), why don't you try seeing it from our point of view, and speak to pragmatics, not theory.

Jeff S · 2005-11-05 Reply Admin

John Smallberries:
I use natural keys where they make sense, but surrogate identities where there is no concise natural key I can depend on not changing. I add alternate keys and inversion entries where appropriate, and enforce FKs with constraints. I'm concerned about scalability & performance (in that order) but not really about space because that's cheap.

Now, that is pretty different from your earlier comments, and 100% agrees with every post I've made here.

John Smallberries:

Jeff: I appreciate your arguments, but instead of repeatedly beating the same dead horse (and offending everybody by implying they have a lack of knowledge or skill), why don't you try seeing it from our point of view, and speak to pragmatics, not theory.

Each example I've given, each argument I've refuted, has absolutely nothing to do with theory. Every one has been a real-world scenerio, I don't think I've used "tuples" or "widgets" or anything purely theoretical. Honestly, believe or not, I don't care much for theory. My goal is all practical: modelling accurate data, planing for the future, ensuring RI, and so on. Now, some call those goals "theoretical" and "unattainable", but I've had pretty good success with it and it's pretty easy once you get into the habit of taking the time to plan ahead and come up with a good solid model. Which, of course, often includes surrogates where appropriate.

I don't choose natural keys or surrogate keys because of a rule, or because some one told me to, or because I want to look smart. I use the best key where appropriate after because they help contribute to an intelligent db design. People who don't do this, and blindly follow rules or guidelinees w/o thinking intelligently about their design (i.e., just add an ID to each table!) are the ones I am speaking to, so if you are not in that camp you have no reason to be offended.

2005-11-05 Reply Admin

This thread has made me wince several times.

So much confusion over logical versus physical...

I’ll say up front I lean more to the “natural” side of things simple because I spend most of my time modelling databases. At this level there is no such thing as “identity” properties or “autonumbering”. just attributes wanting a nice safe place to live….. “Won’t someone think of the non-key attributes!.” :-)

When it comes time for my beautiful models to be “implemented”, they ALWAYS get mangled. Theory is forced to live with reality… Artificial keys appear and other weird by-products of the SQLisation.process.

It is not really OK, but I can cope. Why? Because as long as I have been able to exactly duplicate my logical model’s constraints and purpose I don’t really care.

“We can’t create this ternary junction table because it exceeds the key limit.” “Damn! We’ll have to chuck a surrogate candidate key in back here.. Damn you SQL, damn you all to hell!”

emurphy · 2005-11-05 Reply Admin

Jeff S:
However, it does give me a good chance to address another of my favorite "anti-natural key" reasons: Because external company names can change! Or people can get married! Therefore, you should never use natural keys because your system becomes a mess!

OK, let's tackle that one ... the premise is "an external company name might change" and the conclusion is "therefore, never use natural keys". Hmmm .. I might rework this a little and say " an external company name might change" with a conclusion of "therefore, don't use an external company name as a primary key". Hmmm .. Makes a little more sense, right? now we are not jumping to absolute conclusions, right?

So, here's what we have learned: Don't use external data that can change as your PK. OK, done. Now, how does that tell me that natural keys are bad? Based on that premise, have you proved me to that natural keys are bad and should never be used, and that all tables must have random, meaningless ID columns added?

Well, now you've gone and asked about the absolute case again, with "never" and "all". Let me back up to the part that I fully agree with, and respond from there:

"Don't use external data that can change as your PK."

Corollaries:

External data can often change.
Even if you think it never ever will.
The only sufficiently safe exception to #2 that comes to mind offhand is when the external data is some sort of glacially-moving global standard, such as USPS state abbreviations.

Questions, since I have this nagging feeling that another big chunk of the argument (once absolutism and "external data that can change" are cleared up) is a matter of terminology:

Suppose you have a table of customers. There is a column CustID (varchar) whose value is chosen by a human being, which must be unique, which is printed on statements and other things. There is a column CustKey (int) whose value is determined by a standard auto-increment routine, which must be unique, which is never printed on anything (used only for joins).

Do you use "external key" to mean "key which is printed or displayed on screen during normal usage of the software", and "internal key" to mean "key which is not"?
Do you use "natural key" as synonymous with "external key", and "artificial key" as synonymous with "internal key"?

If the answers are all "yes", then CustID is an external key which can change; so we should not make it the PK; so, since any other external keys are probably also changeable and thus cannot be the PK, we must implement CustKey and make it the PK.

The original post is not really a key argument at all, but rather "examples of data which doesn't even need its own table". Unless (as someone else pointed out) you need to crunch such a huge amount of data that encoding, say, weekend/holiday rules in a table is actually necessary for efficiency.

John Smallberries · 2005-11-05 Reply Admin

Jeff S:
John Smallberries:
I use natural keys where they make sense, but surrogate identities where there is no concise natural key I can depend on not changing. I add alternate keys and inversion entries where appropriate, and enforce FKs with constraints. I'm concerned about scalability & performance (in that order) but not really about space because that's cheap.

Now, that is pretty different from your earlier comments, and 100% agrees with every post I've made here.

No, I've been pretty consistent:

John Smallberries:

Anyway, all thoeretical points aside, I advocate the use of meaningless, artificial keys unless there is a very good reason not to use them. I find that (in the "real world") you get into far less trouble with PKs that have no other semantics and simply provide uniqueness: tables are smaller (which usually indicates better normalization), keys are smaller and easier to manage (for code & people), you don't run into "changing the PK value" problem*, etc.

I use natural keys far less frequently than surrogate keys.

And I do it simply to look smart.

2005-11-05 Reply Admin

John Smallberries:
Anyway, all thoeretical points aside, I advocate the use of meaningless, artificial keys unless there is a very good reason not to use them. I find that (in the "real world") you get into far less trouble with PKs that have no other semantics and simply provide uniqueness: snip..

Maybe you just worded that poorly, but that just seems plan weird.

I am trying to fathom a model that has a “meaningless” key and yet that same key is meant to be the backbone for dependencies and relationships.

Perhaps what you meant to say (and I hope) is this…

“After identifying and IMPLEMENTING the natural key, its suitability for us in RI is evaluated and usual quickly rejected. We find that adding a meaningless candidate key of small physical size improves certain sql operations and seems to be easier to maintain from a cascading RI point of view.”

murphyman · 2005-11-05 Reply Admin

John Smallberries:

There is a little zealotry here, but I think it's more "theory" vs. "the real world".

Now, I'm big on theory, and I fully appreciate that the application of theory helps us in engineering. That said, sometimes theory cannot be applied to the letter because of real world constraints.

I would love to design a database on top of solid and immutable business rules. So far, I have never had this luxury...not even close. Businesses change (rapidly) and an agile database make life much easier.

I believe anyone can fuck up a DB design whether they use natural keys or surrogate keys. Care must be taken ensure integrity no matter what. I use natural keys where they make sense, but surrogate identities where there is no concise natural key I can depend on not changing. I add alternate keys and inversion entries where appropriate, and enforce FKs with constraints. I'm concerned about scalability & performance (in that order) but not really about space because that's cheap.

Jeff: I appreciate your arguments, but instead of repeatedly beating the same dead horse (and offending everybody by implying they have a lack of knowledge or skill), why don't you try seeing it from our point of view, and speak to pragmatics, not theory.

A-frikkin-men. Best post I've read so far on this topic.

Databases should be conceptually sewn-up - but in practice it is rarely so. Hands up here whose had to work on tables with 90+ columns? I've seen bizareness to the extent of columns called SUBSCRIPTION on an 'account' table, closely followed by columns SUBSCRIPTION_2001, and SUBSCRIPTION_2002 (none after that - clearly they employed someone in 2003 with a bit of sense). Databases, even in big, important companies, sometimes have scarily arranged databases. In those situations I view 'arbitrary' PKs as the way out of the hell that's been created. Yes - maybe in the perfect world we don't need them, but I for one don't work there...

2005-11-06 Reply Admin

This kind of retarded schema makes it virtually impossible to historical queries, eg date BETWEEN start AND end

Also, to the person complaining that changing their PK requires lots of changes, if your data schema allows/requires changes to your PK's then you should be posting it here as a WTF in itself.

2005-11-06 Reply Admin

John Smallberries:
ammoQ:
We've gone through this whole discussion before and IMO it's more about religious believe than anything else. I must admit that I'm shocked by the amount of premature arrogance shown here.

There is a little zealotry here, but I think it's more "theory" vs. "the real world".

Now, I'm big on theory, and I fully appreciate that the application of theory helps us in engineering. That said, sometimes theory cannot be applied to the letter because of real world constraints.

I would love to design a database on top of solid and immutable business rules. So far, I have never had this luxury...not even close. Businesses change (rapidly) and an agile database make life much easier.

I believe anyone can fuck up a DB design whether they use natural keys or surrogate keys. Care must be taken ensure integrity no matter what. I use natural keys where they make sense, but surrogate identities where there is no concise natural key I can depend on not changing. I add alternate keys and inversion entries where appropriate, and enforce FKs with constraints. I'm concerned about scalability & performance (in that order) but not really about space because that's cheap.

Jeff: I appreciate your arguments, but instead of repeatedly beating the same dead horse (and offending everybody by implying they have a lack of knowledge or skill), why don't you try seeing it from our point of view, and speak to pragmatics, not theory.

That's asking a lot from a man who has demonstrated that he doesn't have the required depth.

2005-11-06 Reply Admin

UncleMidriff:
Anonymous:

... snip...
According to Codd if you want information about purple elephants then you ask the DB for information about purple elephants and there should never ever be a way to get information about purple elephants without asking for information about purple elephants or some superset that includes purple elephants. If you have artificial keys then you can ask for some nosense like 123 and get information about purple elephants... now what the f* does 123 have to do with purple elephants nothing except in your particular implementation of the DB. Now your going to have code that is asking the DB for 123 and it won't make sense where as code that asks the DB for purple elephants will.
...snip...

I'd like to thank you for your well thought out post, and I'd also like a little bit of clarification if you have the time.

Can you give me any sort of reference to support the claim that Codd insists that if I need information about purple elephants I should ask the database for information about purple elephants and not for information about entity #123? While that does make some sense, I wonder (sincerely, not merely trying to be an ass) what Codd would have to say about a column named "Cust#" in a table containing customer information? Would he insist that in order to get information about Adam Collins at 3446 N. Washington Ave, Oklahoma City, Oklahoma, 73516 that I ask the database for information about Adam Collins at 3446 N. Washington Ave, Oklahoma City, Oklahoma, 73516 instead of asking the database for information about Customer #4452? If not, why not?

Also, how exactly do stored procedures break the relational model? I've always understood them to be primarily and merely saved SQL statements.

He probably would be against the Cust# thing but in his genius he was a bit of a fanatic. But I have no problem with using a Customer Number that is actually part of the customer's data. If that customer number appears on his invoices... if the customer can identify himself to your company by that number then the datum is real and is a perfectly reasonable thing to use as a primary key. As Jeff has been pointing out, he is not against artificially created primary keys if they have meaning and are part of the business model. GL account numbers, Customer numbers, company account numbers, credit card numbers, SSNs, and EINs are all examples of these sorts of keys, but you also notice that they have taken on real meaning in the real world. What we are arguing against are number sequences that exist in the database and the application code that never reach the outside world and have no real reason to exist. They make the SQL meaningless, they make the application code meaning less. Since you basically have to create a key on the natural key in the DB anyway, to preserve the integity of the data, why waste DB resources creating and storing an extra row and indexing structures to support the artificial meaningless key.

Jeff S · 2005-11-06 Reply Admin

Anonymous:

That's asking a lot from a man who has demonstrated that he doesn't have the required depth.

Often,during a databate or disagreement, I find it helpful to stop and ask the other person if they can state my position. I'd be curious to hear your interpretation of mine, and why you feel that I have "clearly demonstrated" that I don't have "the required depth."

Jeff S · 2005-11-06 Reply Admin

UncleMidriff:

Also, most reasonable artificial primary key proponents, including myself, would probably agree with you that a table of U.S. States probably doesn't really need an artificial primary key, so using it as an example of why artificial primary keys are bad in general is somewhat lacking.

Once again, I will re-state, again, once more, another time, yet again, the problem is the many people here (and in the other thread) who claim that EVERY table ALWAYS must have an "ID", and that is ONLY way to have a PK or relations or to write joins. And, again, ironically, the US States table question was directly asked by myself (see the other thread), and to my amazement many people claimed (apparently, the were also serious) that of course a table of US States would have a StateID. (One guy claimed this way it could be expandable to eventually include internal information and such -- in your US States table!) That is why I go back to that.

If you agree with me that StateID is pointless to add to this table, then you can relax and take comfort knowing that none of comments are directed at you. In fact, if you read carefully any of what I have wrote, and you really know what a primary key is (see my post regarding many "experts'" definition of this) and you don't design your database striclty to avoid those "hard joins involving more than 1 column", then you really should know that the statements I am making are not directed at you.

Jeff S · 2005-11-06 Reply Admin

Jeff S:

Once again, I will re-state, again, once more, another time, yet again, the problem is the many people here (and in the other thread) who claim that EVERY table ALWAYS must have an "ID", and that is ONLY way to have a PK or relations or to write joins. And, again, ironically, the US States table question was directly asked by myself (see the other thread), and to my amazement many people claimed (apparently, the were also serious) that of course a table of US States would have a StateID. (One guy claimed this way it could be expandable to eventually include international information and such -- in your US States table!) That is why I go back to that.

oops -- see the correction in my previous post, above.

I really wish we had like even a small 1 minute window for corrections.

Jeff S · 2005-11-06 Reply Admin

UncleMidriff:

Now, what if Joe, Sue, Bubba, and Bob all live together and join your club. You'd have four rows in your address table all with the exact same data save for the Customer foreign key. While this would work well, and I can't forsee it causing any problems, it bugs me that we're storing the exact same address more than once. I might propose that we move all the address info to a separate table, append some kind of AddressID to it, and then add a reference to that AddressID to our first table. That way, all the same constraints are enforced, but instead of repeating the entire address 4 times, were only repeating the small AddressID value 4 times. What do you think? Have I gone off the deep end?

That's a great modelling question, and there's many ways to handle that, and it can be a good discussion. Unfortunately, I don't see where a single word I've written disallows this; that is very different from the storage of addresses scenerio I mentioned earlier. If each entity can have exactly one address, but some enties can share addresses, then indeed creating an address entity might be the way to go, and adding a surrogate key to that table would probably be a great idea, instead of having the key being all columns of course. The situation I refered to was where a single entity can have multiple addresses, and ways to model that situation.

I will repeat (yet again) that there are many situations in which surrogate keys are very useful.

Maybe I should just add that to my signature? It will save some typing.

RevMike · 2005-11-06 Reply Admin

UncleMidriff:
RevMike:
Ytram:
I have yet to see someone explain to me how using identities is NOT relational. Perhaps you can tell me without telling me to go read a book by Date.

It is relational, just not third normal. "The key, the whole key, and nothing but the key, so help me Codd."

If there is a true natural key and an artificial identity then your row fails the "nothing but the key" test and the design is not third normal.

Designers should be very careful, however, when choosing a natural candidate key as a primary key. Some things work in limited domains but fail easily has the system gets extended. For example, I once maintained a personnel system that used social security number as a primary key for person. It failed when we had to 1) add contract workers for whom we had no SSN and 2) we had guest employees who worked for a foreign sister company come to the US temporarily. Also there are a (fairly small) number of people who have been issued duplicate SSNs.

Can you give me a reference or an example explaining why the use of artificial primary keys breaks third normal form? I wasn't aware that this was so.

Please read this very closely. If the data already contains a true natural key, then adding an artificial key breaks 3NF (or perhaps BCNF, I don't remember which). A row should be uniquely identified by one and only one key.

Let me note here that I am not a purist of religious fervour. Those that want to create an artificial key when a composite key gets substantially complex have a good point. I wouldn't do it for a composite key of three columns, though. It would probably need to be six or seven columns before I'd make that compromise. Others have pointed out that joins on integer columns can be faster. I'd benchmark this for my platform (I actively work in DB2 on LUW, DB2 on OS/390, Sybase, and Oracle 8i through 10g) but the normal caution against premature optimization also applies.

RevMike · 2005-11-06 Reply Admin

RevMike:
UncleMidriff:

Can you give me a reference or an example explaining why the use of artificial primary keys breaks third normal form? I wasn't aware that this was so.

Please read this very closely....

I should have preef-read better because I didn't finish this thought.

I don't for one minute say that the use of artifical primary keys breaks third normal form. I do say that using an artificial primary key when there is a valid natural key violates third normal form. If there is no valid natural key then artificial keys are the right thing to do.

A Truly ID-iotic Design

Leave a comment on “A Truly ID-iotic Design”