- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- It Figures
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Another one of my favorite examples. So, you use an identity to represent acount 1000 (say, "Cash"), instead of it's account number, "1000". So in your master Accounts table, the table that defines each account, the question is: does your database stop you from having multiple accounts numbered "1000"?
Hopefully, assuming you are just a little misguided about identities, have good data modelling skills, and are working hard to implement a good, solid, reliable and accurate database model, you have two answers to give:
* No, There can only be one account number 1000. We add a unique constraint on Account Number, of course! We want a reliable database with integrity.
or
* Yes, but that's because we have more than one company, so each company can have an account number 1000. So, we have a unique constraint on the combination of comany/account. This unqiue constraint allows us to have a reliable database with integrity so we know each company can only have each account number entered once.
In situation #1, the "No" answer, you have a useless identity as a primary key in addition to your unique constraint. So you have wasted valueable space that many claim to save by using an identity, and added an extra index as well. And now, of course, you cannot expand your table when you merge, since your constraint doesn't allow it. Nothing gained by using your indentity column in your "merger" scenerio. And every table that has an "AccountID" column requires a join just to display the AccountNumber, or to do a simple sort, that otherwise would not be necessary if that number was stored in your tables directly as a natural foreign key.
In situation #2, you are have the same redundant/useless primary key scenerio, and of course you have indicated what a better natural key of your table should be -- a composite of Company/Account. And guess what happens when you need to expand when you take over more companies !?? Your table will allow it either way. So, again, nothing gained by using your identity, and the difference is you have an extra, useless identity column and index to store your artifical key. In addition, any table that contains a reference back to your Accounts table can only an AccountID listed and no Company anywhere, right? (Your answer needs to be yes if you believe in normalization, otherwise you have redundant data). So if I want all transactions for a Company, I need extra joins -- horror, all those "hard joins" that we were trying to avoid -- just to list the transactions instead of filtering my transaction table directly on the company of the transaction.
The only way that identities "help" you in this situation is if you have a weak data model and wish to just keep stuffing random account numbers and values into your tables, because your identity will allow it, when you have that big merge. Not good, fellas. As I've mentioned before, literally *every day* at sqlteam.com we have to help people unravel messes like that.
Admin
"..do I really care..", "...not tonight, I don't ... "., "I don't want...", "...I like it...", "...stupidest thread EVAR..."
Based on those, uh, "highlights", I assume you are just trolling/goofing around, but if you really do want to have an intelligent discussion about your "cans of soup" scenario (which is a good one to think about, actually) let me know.
Admin
The 'boolean' is vital (perhaps not a great name) but really, you have to know the value of your bits.
My bits are quite valuable to me, but that doesn't mean it's a "yes or no" thing. I mean, if someone was going to take my life or my bits, I would have to make a judgement there; and if all I've got is "true" meaning "Yes, I value my bits," then I've got no context for making the decision between life and bits. Furthermore, I could never instruct a computer to make the same choice for me, other than to make random choice between everything that has a value of 'true'.
That's why every table should have a unique 'SortOrder' column.</blockquote>
I must commend you on the first person to make me snort in at least 6 months. I kept imagining a "Scotty" voice.
Admin
Yup. As Jeff indicated above, it's better if all your other data tables embedded the composite of Company/Account when referring to "that" table. Much better than using some artificial "index" or some such equally unique entry to mean the same thing. And you've saved the space you'd otherwise waste storing that artificial index.
Yup. Much better.
Admin
I appears you've missed a key point. The AccountId (identity) has no meaning other than to provide uniqueness; it never leaves the database (or near-db business code). Therefore, you're free to restructure the tables after the merger any way you like. You can reseed the identity, combine or break up tables, etc. By giving your PK to customers (in the form of AccountNumber), you lock yourself in to that design...you can't call all your customers up tell them "the next time you call customer service, tell them your AccountNumber is 1234 and your CompanyNumber is 5678".
Admin
The "space saving" argument is dubious.
To provide uniqueness, you would probably need a composite key. This key will likely comprise larger data types than ints, possibly large character columns. When used as an FK, all those attributes get migrated to the child tables.
If a single int is used, you save a lot of space.
Admin
Since that is 99% sarcasm, I can't tell if you are being serious or not, but hopefully you have understood that the "space saving" argument is one made by proponens of surrogate keys and that is simply addressing it. Saving space is fair less important than having data integrity.
Admin
You said a table of Accounts, and I took that to mean a table of GL Accounts, not a table of Customers. That, again, is a different story.
So, if no customers ever recieve that artificial key, it is for the DB only, then what do they tell you their account # is when they call up? What shows up on their invoice? You said we shouldn't give the "Real" PK to them, so what do you give them?
Admin
Damn. Missed perfection by 1%. But I am relieved to see that you recognized it. On this forum, one worries about such things.
The problem is that using the composite key (Company/Account) to refer to this record from other tables can actually threaten data integrity, rather than ensuring it. Because the individual columns of "Company" and "Account" have meaning outside the database, so their contents are "free" in other contexts within the database.
That's confusing. Let's consider an example.
You're building a table of transactions. And each transaction wants to refer to one (or maybe two, if you're doing double-entry accounting) "accounts". You can choose to (a) use an index internal to the database to refer to the company/account, or (ii) use the composite company/account. Except that under (ii) you can't ensure the company/account is unique, and you can't ensure it actually corresponds to an entry in the accounts table. Under (a) you can't really ensure the index has a corresponding entry, but because it's internal to the database, you can come a lot closer to ensuring that fact: because it's internal, nobody is tempted to make it visible except through a lookup in the "accounts" table.
If you're guaranteeing that company/account is unique in one table, you've got to ensure consistency across all tables, even where it's not unique (e.g. the transactions table I just discussed). But if you're storing company/account in those other tables, it's tempting to use that information directly and not go "through" the proper tables to ensure that uniqueness.
Admin
You seem to be confusing primary keys and foreign keys, and speaking about transaction tables and entity tables and going back and forth between the two. It helps to state clearly which aspects of a database design you are discussing.
This is the worst part about these discussions -- people who don't understand composite primary keys. Under (ii), assuming you are talking about the accounts table (not the transaction table), of course you are guaranteeing that company/account is unique if you do it properly -- it is the primary key! As for the transaction table, then, that references that Accounts table: again, it seems many people also don't undertand how foreign key constraints work, either. You can have more than 1 column participate in a foreign key constraint also! This is definitely new stuff to lots of people who think that a pk = an identity and nothing else, or that a foreign key can only rerfence a single identity column and nothing else.
So, as you said, "it's tempting to use that information directly if you store company/account in each table" -- OF COURSE! That is, again, the point of a transaction table. If the transaction is booked to a combination of acompany/account, then that is what is should contain -- the company and the account it is booked to. If that combo is not valid, the fk constraint fails. If it is valid, it works and now you don't have to use a join on some silly AccountID just to determine which account number or company that each transaction is booked to.
If you mean something else, then an example might be useful.
Admin
And, again, John, you are missing the point. As we said -- we want data integrity. We must make this assumption during this debate, otherwise it is over, I win and you lose, since my data is accurate, yours is not, and it doesn't matter which is "easier to use" or "has less hard joins to write" or "uses less space".
If you agree that we are going to build a database that is modelled to be accurate and have full integrity, then you must add a unique constraint on your combination of company/account in addiition to your extra identity "primary key". So, in effect, you are using more space by adding your extra identity column.
Admin
So, if I have a table with just unique email addresses, doesn't it make sense to make an artificial key if I ever want to do any foreign-key-ing?
Say I have another table, which has record which relate to email addresses, surely it'd take up less room to just have an integer pointing to the index of the email address table, rather than have the whole email address? Especially if there are 4000 records for one email address?
Sure, natural keying might be "neater" on the email address table, if I've got other properties like name, address in there, but when I come to foreign key that, isn't it quick and more efficient to have an artificial index key thingie?
Admin
No, it's not confusing, merely confused. Data is just data as far as a schema is concerned - meaning is irrelevant. That's what presentation / interpretation / datawarehousing is for.
Foreign Keys anyone? Data integrity is incredibly important for commercial data - it is not something you want to "come a lot closer to" - its something you simply enforce. But, giiven that you find it hard to enforce integrity on your own numbering system, I can only imagine how hard you find it with actual data.
Admin
Take a look at the following excerpt.
http://www.acm.org/classics/nov95/s1p4.html
Please tell me: what is a man#? Who assigns it and maintains it? How is it a natural data attribute of a person?
Admin
Again, and I apologize for constaintly repeating myself, can you have more than 1 entry for the same email address in your "master" table of email addresses? An identity as the PK allows this unless you add a unique constraint on email address.
And, again, as I've said over and over there are many situations where you might want to add a surrogate key, and identities are great for those cases. It's not one way or the other in all cases. It is just scary to read some of these comments, where natural keys are a "last resort" or "should never be used" or people who don't understand the difference between a primary key and an identity.
I would estimate, if you poll 100 database "experts" out there and ask them to describe a primary key, 50 of them will say "it's a unique autonumber of some sort you add to a table", 40 will say "it is the column that uniquely identifies each row in your table", and probably only 10 will say (the correct answer) "it is the set of columns in your table that uniquely identifies each row." Far too many database experts out the have never joined on more than 1 column between two tables, or never created a multi-column fk constraint, and not only don't know how, they don't know that they even can!
Admin
Some days it's just not worth explaining things to people. They just don't get it. They think that data integrity is guaranteed because of good design, that things can never go wrong if only some simple rules are followed. And they never understand when something lighthearted has been injected into the discussion: their comprehension doesn't run that deep.
It makes me want to scream. It makes me want to call them names, but they won't recognize the truth there, either.
Data integrity is never guaranteed by good design. But it is almost impossible with bad design.
We've discussed various aspects of bad design. And yes, foreign keys are a valuable (if performance reducing) tool which are available.
But let's talk about another aspect of bad design: cross-populating multiple tables with keys based on data that you don't own.
Some have advocated vociferously for embedding a company's name across multiple tables, as part of the best possible key to use. Except that they forget that the real world works on a multidimensional "best", instead of their unidimensional theoretical best.
They forget that companies change name. They get absorbed into their parent holding companies, spun out from them, and acquired by other companies. Each time, a decision is made as to whether it's a new relationship or the same relationship under a different name. And their data is screwed because they used data they don't own all over the place.
Admin
Again (boy I am tired of starting every post with "Again") there is nothing wrong with deciding to assign a number to an entity upon creation, or to use that number as the primary key in your table, if that is part of your business process. And, in fact, you might even choose to use an identity column to do this! Why not, that's what it does, right? If your business process states that creating a unique code per customer is useful, then do so. If that business process states that a good time to assign that code is upon entry into the database, then there you ! And guess what? In that case, that identity is a NATURAL key ! That's right, "anti-natural key" folks, some of your identities out there are NATURAL KEYS! The horror! Quick, time for a re-design! This cannot be right! Our keys MUST be meaningless, the Access 97 "new table wizard" told me so!
Guys and gals, long before relational databases existed, people have recognized the advantage of assigning numeric codes to entities to keep things short and simple, for tagging items, customer reference codes, printing on invoices, etc. A numeric code being assigned to an entity as part of your business process NOT the same as tagging every row of every table with a meaningless, random ID, and then using that meaningless, random ID to completely definie all relations in all tables throughout your system!
Admin
Hmmmm ... I don't recall anyone hear saying that a good choice for a key would be an external customer name for a table ....
If you want to join the debate, scream a little, let it out, and then start addressing the actual points raised in the posts and don't make up your own.
However, it does give me a good chance to address another of my favorite "anti-natural key" reasons: Because external company names can change! Or people can get married! Therefore, you should never use natural keys because your system becomes a mess!
OK, let's tackle that one ... the premise is "an external company name might change" and the conclusion is "therefore, never use natural keys". Hmmm .. I might rework this a little and say " an external company name might change" with a conclusion of "therefore, don't use an external company name as a primary key". Hmmm .. Makes a little more sense, right? now we are not jumping to absolute conclusions, right?
So, here's what we have learned: Don't use external data that can change as your PK. OK, done. Now, how does that tell me that natural keys are bad? Based on that premise, have you proved me to that natural keys are bad and should never be used, and that all tables must have random, meaningless ID columns added?
Admin
Jeff,
I think we agree that not every single table needs a numeric ID column. But I would argue that the reason many "experts" fall on such a practice is that, in the large majority of business applications, entities like "Parts", "Orders", and "Employees" all commonly have an otherwise meaningless ID number assigned to them. It is absolutely unnecessary to have an ID in a relationship table, e.g.:
I don't think anyone here disagrees. What I think is misleading is people conflating your argument with the argument that "No entity should ever have a meaningless identity number!"
Admin
You are probably right. I tried to state over and over that surrogate keys have their place, but people read/understand what they want. The issue is those people who don't understand the difference between a pk and an identity, those that think "joins are too hard" w/o identities and thefore always use them, and those in the camp that you should NEVER have a natural key or anything other than an identity. Unfortunately (and unlike the assumption many are making about me always being against surrogates) those statements have been made, many times and repeated over and over, and that is the worst part.
Admin
Here come a bunch of posts in response to some of the other posts in this thread. Sorry for the post blitz:
I have read that thread...and reread it and thought about it and read it again, etc. You (and Alex) made smug comments that pissed me off in that thread as well. And while I don't remember the specific content of each of your posts in that thread, I did read them, and I do remember not being convinced. Maybe if I have time later I'll go back and read it again, but be sure that I have read the entire thread at least twice..
Admin
First of all, I'd like to apologize for the flame-y-ness of my post. I was already hacked off about something at work and I posted without remembering that every freaking time I make a flame-o-rific post, I end up regretting it. Anyway...sorry.
What insults me so much is that I am being told that the way I design databases, using artificial primary keys, is a) fundamentally incorrect, b) obviously stupid, c) will surely result catastrophic failure and financial ruin for all parties involved at some point in the future (or at least such catastrophe has happened in the past and artificial keys were obviously to blame). Being relatively fresh out of college, this scares me to death, especially coming from people like Alex and yourself, who are much more experienced than I am.
Then, when I press for an explanation as to why using artificial primary keys is fundamentally incorrect, obviously stupid, and will result is disaster, I get a handful of examples of really crappy databases that used artificial primary keys and eventually blew up in the faces of everyone involved. Essentially, all that those examples tell me is that a really crappy database that uses artificial primary keys is a really crappy database; they don't explain how artifical primary keys, own their very own, make a database crappy.
At this point I should point out that you've done a better job of steering clear of this kind of thing than most of the other natural key proponents on this site. Although you do mention that you've had to rescue artificial primary key proponents over at sqlteam.com, you also at least try to give some -good- examples of when/why artificial primary keys are bad. None of them have convinced me yet, but at least you've tried.
Anyway, I'm told by you guys, that have more experience than I do, that the way I'm doings things is horribly wrong. My experience, education, and discussions with other people more experienced than myself disagree, so I ask for clarification. What I get is mostly scary stories and smugness. I find it insulting when people tell me what I do sucks and then can't or won't give a good reason as to why.
And now, for a bit of clarification:
I like artificial primary keys a lot.
I do not thnk every database table in existence should use them.
I do not think natural primary keys are "teh OMG Evils!11!"
I do not think that "an identity and a primary key are the same thing"
Admin
OK, then, read up in this thread in which I made more sincere efforts to clear things up, and feel free to join the debate. I hope you can appreciate my honest effort not to preach but to hopefully help others to improve their database knowledge.
Admin
Can you give me a reference or an example explaining why the use of artificial primary keys breaks third normal form? I wasn't aware that this was so.
Admin
I'm not sure what to make of your post. The post of mine that you quoted was directed at Jeff S. and at him alone. Also, based on the rest of your posts in this thread, I'd say we agree with each other. Sorry for the confusion.
Admin
I think we understand that just fine. And I know I wouldn't be opposed to you refuting my stance that artificial primary keys are A-OK with something like, "No, Uncle Midriff, you're wrong. See, here in "An Introduction to Database Systems, Eighth Edition [Hardcover] by Date, C.J.", chapter 6, he says that artificial primary keys aren't valid in the relational database model because..."
But that's not what we get from you. Instead, we get something like, "No, you're wrong. Artificial primary keys aren't valid in the relational database model. Date, Codd and all the books on the subject I've read agree. Trust me."
And now, for some more clarification:
I understand that some folks consider identities in the various relational database management systems in existence to be non-relational because, in some of those systems, once you specify that a column is an identity column it can't be changed. That makes at least a small amount of sense to me, but I don't think it has much to say about the use of artificial primary keys in general.
Admin
I'd like to thank you for your well thought out post, and I'd also like a little bit of clarification if you have the time.
Can you give me any sort of reference to support the claim that Codd insists that if I need information about purple elephants I should ask the database for information about purple elephants and not for information about entity #123? While that does make some sense, I wonder (sincerely, not merely trying to be an ass) what Codd would have to say about a column named "Cust#" in a table containing customer information? Would he insist that in order to get information about Adam Collins at 3446 N. Washington Ave, Oklahoma City, Oklahoma, 73516 that I ask the database for information about Adam Collins at 3446 N. Washington Ave, Oklahoma City, Oklahoma, 73516 instead of asking the database for information about Customer #4452? If not, why not?
Also, how exactly do stored procedures break the relational model? I've always understood them to be primarily and merely saved SQL statements.
Admin
In a table of U.S. States, State name and State abbreviation are both candidate keys and therefore both need uniqueness constraints, regardless of whether or not you decide to append an artificial primary key to the table. So, adding an artificial primary key to the table adds exactly one column and exactly one constraint. Your statement makes it seem that adding an artificial primary key to the table requires the addition of unnecessary uniqueness constraints to the other columns in the table, and that is incorrect.
Also, most reasonable artificial primary key proponents, including myself, would probably agree with you that a table of U.S. States probably doesn't really need an artificial primary key, so using it as an example of why artificial primary keys are bad in general is somewhat lacking.
You can and should do all that even if you're using artificial primary keys. Just because I tack an AddressID column onto an address table to make life simpler doesn't mean I shouldn't or can't enforce the uniquness constraints you specify.
Now, what if Joe, Sue, Bubba, and Bob all live together and join your club. You'd have four rows in your address table all with the exact same data save for the Customer foreign key. While this would work well, and I can't forsee it causing any problems, it bugs me that we're storing the exact same address more than once. I might propose that we move all the address info to a separate table, append some kind of AddressID to it, and then add a reference to that AddressID to our first table. That way, all the same constraints are enforced, but instead of repeating the entire address 4 times, were only repeating the small AddressID value 4 times. What do you think? Have I gone off the deep end?
Admin
I'd answer with the first option, until a merger happened, as it is correct until such a time. Then when a merger happened, I'd add a Company column to the accounts table and change the uniqueness constraint to include the new column. Then I'd say, "Done!"
If I understand you correctly, when a merger happened, you'd add a Company column to the accounts table, change the primary key on the table to include the new column, and then change each and every row in each and every table that references the accounts table to include the Company column in the foreign key.
Each method has it advantages and disadvantages. I like my method better than yours is all I'm sayin'.
There you go again, acting like only no-talent ass-clowns would ever consider using artificial primary keys.
Admin
Why?
How does a randomly generated number from a business rule differ so much from a randomly generated number from the developer? Why is the former ok, but the later magically breaks the relational model into a billion pieces, makes both Codd and Date cry bitter tears, and makes your entire business explode?
</hyperbole>
Admin
And so ends the post blitz. Sorry, but I didn't have time to respond -and- make sure my posts were ready to be posted before just now, so when I finally got some time I decided to post them all at once. If this upset any of you, please feel free to insult my mother.
And Jeff, thanks for taking the time you have taken to better explain your position. Some of the first posts I just put up probably sound a little more antagonistic than they should, given some of the things you've said in this thread between when I wrote them and posted them. Sorry for that. Based on your last few posts, you seem more reasonable to me now than you did when I was writing the first few posts.
Admin
oh cmon! this is a real WTF.. been visiting this site for months now and still can' belive it
does that person know that there a way to get date infos on every Database system??!!!
its surely not standard
Admin
Time to rewrite the isTrue function:
function isTrue($x) {
$db = open_connection('oracledb','tWTF','1234');
exec_sql($db, 'CREATE TABLE temp ( tempval int ) AS SELECT COUNT(*) FROM tBoolean WHERE tBoolean.BooleanID = '.$x.' AND t.BooleanName = 'Yes' );
$true = exec_sql($db, 'SELECT * FROM temp').row[0][0];
exec_sql($db,'DROP TABLE temp');
if (!isFalse($true)) {
return $true;
} else {
return $false;
}
close_connection($db);
}
Admin
We've gone through this whole discussion before and IMO it's more about religious believe than anything else. I must admit that I'm shocked by the amount of premature arrogance shown here.
Admin
There is a little zealotry here, but I think it's more "theory" vs. "the real world".
Now, I'm big on theory, and I fully appreciate that the application of theory helps us in engineering. That said, sometimes theory cannot be applied to the letter because of real world constraints.
I would love to design a database on top of solid and immutable business rules. So far, I have never had this luxury...not even close. Businesses change (rapidly) and an agile database make life much easier.
I believe anyone can fuck up a DB design whether they use natural keys or surrogate keys. Care must be taken ensure integrity no matter what. I use natural keys where they make sense, but surrogate identities where there is no concise natural key I can depend on not changing. I add alternate keys and inversion entries where appropriate, and enforce FKs with constraints. I'm concerned about scalability & performance (in that order) but not really about space because that's cheap.
Jeff: I appreciate your arguments, but instead of repeatedly beating the same dead horse (and offending everybody by implying they have a lack of knowledge or skill), why don't you try seeing it from our point of view, and speak to pragmatics, not theory.
Admin
Now, that is pretty different from your earlier comments, and 100% agrees with every post I've made here.
Each example I've given, each argument I've refuted, has absolutely nothing to do with theory. Every one has been a real-world scenerio, I don't think I've used "tuples" or "widgets" or anything purely theoretical. Honestly, believe or not, I don't care much for theory. My goal is all practical: modelling accurate data, planing for the future, ensuring RI, and so on. Now, some call those goals "theoretical" and "unattainable", but I've had pretty good success with it and it's pretty easy once you get into the habit of taking the time to plan ahead and come up with a good solid model. Which, of course, often includes surrogates where appropriate.
I don't choose natural keys or surrogate keys because of a rule, or because some one told me to, or because I want to look smart. I use the best key where appropriate after because they help contribute to an intelligent db design. People who don't do this, and blindly follow rules or guidelinees w/o thinking intelligently about their design (i.e., just add an ID to each table!) are the ones I am speaking to, so if you are not in that camp you have no reason to be offended.
Admin
This thread has made me wince several times.
So much confusion over logical versus physical...
I’ll say up front I lean more to the “natural” side of things simple because I spend most of my time modelling databases. At this level there is no such thing as “identity” properties or “autonumbering”. just attributes wanting a nice safe place to live….. “Won’t someone think of the non-key attributes!.” :-)
When it comes time for my beautiful models to be “implemented”, they ALWAYS get mangled. Theory is forced to live with reality… Artificial keys appear and other weird by-products of the SQLisation.process.
It is not really OK, but I can cope. Why? Because as long as I have been able to exactly duplicate my logical model’s constraints and purpose I don’t really care.
“We can’t create this ternary junction table because it exceeds the key limit.” “Damn! We’ll have to chuck a surrogate candidate key in back here.. Damn you SQL, damn you all to hell!”
Admin
Well, now you've gone and asked about the absolute case again, with "never" and "all". Let me back up to the part that I fully agree with, and respond from there:
"Don't use external data that can change as your PK."
Corollaries:
Suppose you have a table of customers. There is a column CustID (varchar) whose value is chosen by a human being, which must be unique, which is printed on statements and other things. There is a column CustKey (int) whose value is determined by a standard auto-increment routine, which must be unique, which is never printed on anything (used only for joins).
The original post is not really a key argument at all, but rather "examples of data which doesn't even need its own table". Unless (as someone else pointed out) you need to crunch such a huge amount of data that encoding, say, weekend/holiday rules in a table is actually necessary for efficiency.
Admin
No, I've been pretty consistent:
I use natural keys far less frequently than surrogate keys.
And I do it simply to look smart.
Admin
I am trying to fathom a model that has a “meaningless” key and yet that same key is meant to be the backbone for dependencies and relationships.
Perhaps what you meant to say (and I hope) is this…
“After identifying and IMPLEMENTING the natural key, its suitability for us in RI is evaluated and usual quickly rejected. We find that adding a meaningless candidate key of small physical size improves certain sql operations and seems to be easier to maintain from a cascading RI point of view.”
Admin
A-frikkin-men. Best post I've read so far on this topic.
Databases should be conceptually sewn-up - but in practice it is rarely so. Hands up here whose had to work on tables with 90+ columns? I've seen bizareness to the extent of columns called SUBSCRIPTION on an 'account' table, closely followed by columns SUBSCRIPTION_2001, and SUBSCRIPTION_2002 (none after that - clearly they employed someone in 2003 with a bit of sense). Databases, even in big, important companies, sometimes have scarily arranged databases. In those situations I view 'arbitrary' PKs as the way out of the hell that's been created. Yes - maybe in the perfect world we don't need them, but I for one don't work there...
Admin
This kind of retarded schema makes it virtually impossible to historical queries, eg date BETWEEN start AND end
Also, to the person complaining that changing their PK requires lots of changes, if your data schema allows/requires changes to your PK's then you should be posting it here as a WTF in itself.
Admin
That's asking a lot from a man who has demonstrated that he doesn't have the required depth.
Admin
He probably would be against the Cust# thing but in his genius he was a bit of a fanatic. But I have no problem with using a Customer Number that is actually part of the customer's data. If that customer number appears on his invoices... if the customer can identify himself to your company by that number then the datum is real and is a perfectly reasonable thing to use as a primary key. As Jeff has been pointing out, he is not against artificially created primary keys if they have meaning and are part of the business model. GL account numbers, Customer numbers, company account numbers, credit card numbers, SSNs, and EINs are all examples of these sorts of keys, but you also notice that they have taken on real meaning in the real world. What we are arguing against are number sequences that exist in the database and the application code that never reach the outside world and have no real reason to exist. They make the SQL meaningless, they make the application code meaning less. Since you basically have to create a key on the natural key in the DB anyway, to preserve the integity of the data, why waste DB resources creating and storing an extra row and indexing structures to support the artificial meaningless key.
Admin
Often,during a databate or disagreement, I find it helpful to stop and ask the other person if they can state my position. I'd be curious to hear your interpretation of mine, and why you feel that I have "clearly demonstrated" that I don't have "the required depth."
Admin
Once again, I will re-state, again, once more, another time, yet again, the problem is the many people here (and in the other thread) who claim that EVERY table ALWAYS must have an "ID", and that is ONLY way to have a PK or relations or to write joins. And, again, ironically, the US States table question was directly asked by myself (see the other thread), and to my amazement many people claimed (apparently, the were also serious) that of course a table of US States would have a StateID. (One guy claimed this way it could be expandable to eventually include internal information and such -- in your US States table!) That is why I go back to that.
If you agree with me that StateID is pointless to add to this table, then you can relax and take comfort knowing that none of comments are directed at you. In fact, if you read carefully any of what I have wrote, and you really know what a primary key is (see my post regarding many "experts'" definition of this) and you don't design your database striclty to avoid those "hard joins involving more than 1 column", then you really should know that the statements I am making are not directed at you.
Admin
oops -- see the correction in my previous post, above.
I really wish we had like even a small 1 minute window for corrections.
Admin
That's a great modelling question, and there's many ways to handle that, and it can be a good discussion. Unfortunately, I don't see where a single word I've written disallows this; that is very different from the storage of addresses scenerio I mentioned earlier. If each entity can have exactly one address, but some enties can share addresses, then indeed creating an address entity might be the way to go, and adding a surrogate key to that table would probably be a great idea, instead of having the key being all columns of course. The situation I refered to was where a single entity can have multiple addresses, and ways to model that situation.
I will repeat (yet again) that there are many situations in which surrogate keys are very useful.
Maybe I should just add that to my signature? It will save some typing.
Admin
Please read this very closely. If the data already contains a true natural key, then adding an artificial key breaks 3NF (or perhaps BCNF, I don't remember which). A row should be uniquely identified by one and only one key.
Let me note here that I am not a purist of religious fervour. Those that want to create an artificial key when a composite key gets substantially complex have a good point. I wouldn't do it for a composite key of three columns, though. It would probably need to be six or seven columns before I'd make that compromise. Others have pointed out that joins on integer columns can be faster. I'd benchmark this for my platform (I actively work in DB2 on LUW, DB2 on OS/390, Sybase, and Oracle 8i through 10g) but the normal caution against premature optimization also applies.
Admin
I should have preef-read better because I didn't finish this thought.
I don't for one minute say that the use of artifical primary keys breaks third normal form. I do say that using an artificial primary key when there is a valid natural key violates third normal form. If there is no valid natural key then artificial keys are the right thing to do.