Admin
I am not a DBA. I have not even studied databases academically, but I am forced to design some (small ones) occasionally. So, may I ask: isn't the question of how to specify the PK and the question of how to tie tables together really two separate issues?
In this example, real-world data (the store number) occurred in two tables. That was the root of the problem, right? The store table should have contained a row index, and all other tables referring to that store should have been using that row index. But why does it have to be the PK?
Wouldn't it be better to aim for natural PKs, but not to use any of that data outside the table itself?
Let's say a merger occurs and other stores, with an overlapping numbering scheme, need to be brought into the database. Say an additional column, 'chain_name', is used to identify which brand each store belongs to. Could you then easily update your approach of having a unique constraint on the store number? (Compare this to just adding chain_name to the PK.) Even if you could, it seems the result would be that you are implementing the PK concept on columns which aren't the PK, while the actual PK is really just an auto-increment.
Admin
I'm slightly surprised that Rick wasn't worried that changing the primary key on the table would break foreign keys pointing to it. And if the PK is changed it's likely to mean that most joins involving the table need to be checked as well.
Admin
The real WTF is that nobody pointed out that the hardware upgrade was probably totally unnecessary in the first place. Given that the design of the database was clearly in need of attention, the people responsible for maintaining the system should have asked an experienced DBA to look at the database design BEFORE attempting to fix things by poking around with the hardware of a DB server. I have seen response times of over 30 seconds drop to a few milliseconds simply by getting the primary keys and indexes right - especially in Oracle.
Admin
I'm curious as to how you would design a table to hold Person information under these criteria.
As this is a real-world application for a multicultural society, First Name/Surname/DOB must all be optional fields and the addresses must be one-to-many (history). Also, telephone numbers are not unique, are optional and are sometimes shared, and there is no third-party unique identifier, such as a social security number, in the country of implementation. E-mail addresses are sometimes shared and are optional. Oh, and names change often as well, due to half the client base being female and some of them getting married, etc.
Admin
They think relations ARE tables; that's the root of all SQL evil. The purpose of a primary key is to distinguish set elements. And so on... I'm rather surprised vanilla graduates haven't been told that.
Admin
Where I come from, if you implement a change and it has side effects, even if they're impossible to plan for, or even because another application is incorrectly pointing at your web service despite knowing it was being decommissioned like 3 years ago, it's your fault. It doesn't matter that the other app's owner didn't fix it correctly; it's still your fault because your change caused something to break.
Now tell me that isn't screwy...
Admin
Talk about snatching defeat from the jaws of victory...
Not exactly. The upgrade was not the reason for the errors, merely the change in circumstances that allowed the bug to produce those errors. Like disease and symptoms: just because you don't show any symptoms doesn't mean you don't have the disease...
Admin
Which is presumably why Rick looked at it, said "What the fuck?" and knew exactly which site to send the story to.
Maybe there's a dailygoodpractices site where stories of sensible key construction are told but I've never looked for it.
Admin
Oh boy! The real WTF seems to be those comments comparing different DBMSs and data model design.
To answer a question that someone here asked: yes, DB programmers still exist; I consider myself to be one. That said, I'm under the impression that we're a dying breed. Why bother engineering a data model when you can use language X and framework Y to put everything in the application layer (and hope for the best)?
The Oracle DBMS does provide sequences: generators of unique numeric IDs that can't produce duplicates. Newer versions don't even require triggers anymore; you can use a sequence as the default value for a column. People who claim otherwise should consult the user manual.
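For anyone who hasn't used them, here's a minimal sketch (table and sequence names are made up purely for illustration):

CREATE SEQUENCE orders_seq START WITH 1 INCREMENT BY 1;

-- Classic usage: reference the sequence explicitly in the INSERT.
INSERT INTO orders (order_id, customer_id)
VALUES (orders_seq.NEXTVAL, 42);

-- On releases that support sequence defaults, no trigger is needed at all:
-- CREATE TABLE orders (
--   order_id    NUMBER DEFAULT orders_seq.NEXTVAL PRIMARY KEY,
--   customer_id NUMBER NOT NULL
-- );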
As for the natural vs. synthetic PK debate: it's as old as data modeling and it'll never be settled. Those who advocate natural PKs either love writing SQL code (a lot of it) or have very basic data models which never even made it to a master <-> detail <-> detail relationship with additional lookup tables, etc.; otherwise they would realize that the joins they're writing are getting hard to read. Needless to say, if that natural PK is not as unique as hoped, or can even be changed by the user, you'll end up in "update hell" - no problem when the database is small (in size), a big problem when you have to update millions (or even billions) of records.
BTW, don't name those columns just ID. Sooner or later you'll end up mixing up IDs from different tables or joining on the wrong ones. One way to prevent this is a naming scheme like id_<table>. So in the example with tables A and B in a master <-> detail relationship, you get:
A: id_a (PK)
B: id_b (PK), id_a (FK)
Even a model with additional levels of detail is straightforward:
C: id_c (PK), id_b (FK)
D: id_d (PK), id_c (FK)
PK ... Primary Key, FK ... Foreign Key
A join takes only one condition per relationship; just imagine what a natural-key approach would look like, where you would have to add at least one additional column for each level to ensure uniqueness.
Even someone who is not familiar with the data model should have no trouble writing those joins in the model I presented; I'm not sure that holds true for the natural-key approach.
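To make that concrete, here's a quick sketch of tables A, B and C using that naming scheme (types and constraint syntax kept generic):

CREATE TABLE a (
  id_a INTEGER PRIMARY KEY
);

CREATE TABLE b (
  id_b INTEGER PRIMARY KEY,
  id_a INTEGER NOT NULL REFERENCES a (id_a)
);

CREATE TABLE c (
  id_c INTEGER PRIMARY KEY,
  id_b INTEGER NOT NULL REFERENCES b (id_b)
);

-- One join condition per level, and the column names tell you which key is which:
SELECT *
  FROM a
  JOIN b ON b.id_a = a.id_a
  JOIN c ON c.id_b = b.id_b;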
As for the questions regarding how to decide, all I can offer is a rule of thumb that has been working for me: entities where any of the assumed unique attributes could change, even by correcting a typo in data entered by a user, should go straight to a synthetic PK, which should have no meaning at all. In fact, the user should never even be aware of its existence. A natural PK works best for tables at the bottom of relationships, such as lookup tables (currency, country). However, even there, chances are that you'll encounter problems; famous examples are the conversion of several currencies into the euro, or the change of German postal codes from 4 to 5 digits after reunification.
I've got only 20+ years of experience in the IT industry and 10+ years of DB programming, so I'm open to suggestions from people who have collected their share of experience; please don't hesitate to come forward and educate me/us.
l.
Admin
Gosh, guess we've just been lucky!
Admin
Well, MSSQL now has a hierarchy data type, so you could start with /adam, then /adam/cain and /adam/abel and so on. Might need some bigger disks.
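For the curious, a rough T-SQL sketch of what that looks like (table and values invented for illustration; note that the paths the type actually stores are numeric, like /1/2/, rather than /adam/cain):

CREATE TABLE family_tree (
  node hierarchyid PRIMARY KEY,
  name nvarchar(50) NOT NULL
);

INSERT INTO family_tree (node, name) VALUES
  (hierarchyid::GetRoot(),    N'Adam'),
  (hierarchyid::Parse('/1/'), N'Cain'),
  (hierarchyid::Parse('/2/'), N'Abel');

-- ToString() shows the materialized path for each node.
SELECT node.ToString() AS path, name FROM family_tree;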
Admin
Reading the comments, I almost lost all hope. But after 3 pages I found the comment from lofwyr. Thank you, sir.
(On the other hand, all those WTF-y comments guarantee high-quality WTF articles in the future. Yay!)
Admin
So glad to see that you contributed to this ongoing conversation with your very own witty and insightful comments...
golf clap
BTW, he reiterated pretty much what everybody else has been saying, albeit much better worded.
Admin
Everybody? After reading the comments I had the impression that I am in the minority, which could be one way to explain the never-ending flow of WTFs.
l.
Admin
I would assume there was an underlying sequence and the trigger was just pulling NEXTVAL. It's about the only time I can think of where using a trigger is acceptable.
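Something like this hypothetical sketch (table and sequence names invented), where the trigger does nothing but fill in the surrogate key:

CREATE SEQUENCE order_seq;

CREATE OR REPLACE TRIGGER order_bi
BEFORE INSERT ON orders
FOR EACH ROW
BEGIN
  -- On older releases you'd SELECT order_seq.NEXTVAL INTO :NEW.order_id FROM dual;
  IF :NEW.order_id IS NULL THEN
    :NEW.order_id := order_seq.NEXTVAL;
  END IF;
END;
/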
Admin
Bigger disks? Don't think so. We're talking about something like 60 centuries, at 3 generations per century, more or less - that'll be 180 generations. Call it 200. A first name takes, what, ten characters? That's about 2K per Person for unique identification. Not great, but when $100 buys you a terabyte drive, retail, size isn't really the first thing that kills this deal.
Admin
Not screwy at all. There's a bug in the system which is either caused by, or has been uncovered by, activity which you have perpetrated. Therefore it is up to you to find out what its cause is. Even when you find that the underlying cause of the bug was not your direct fault, it is perfectly reasonable to request that you take the responsibility for ensuring that it be fixed.
However, if your change uncovers the bug before it gets to production, that's something else altogether. Then you are fully entitled to jump up and down and crow like a cockerel that you've found another of Billy Bodger's billion bugs and get lots more pats on the back from your admiring co-workers.
Admin
It's not actually as insane as it sounds. You make the auto-number field unique, but not the primary key. It can still be used for quick lookups from an index, but you make the primary key some combination of values from the row which must always be unique. The auto-number is essentially just there as a row ID (the ROWID pseudo-column will change if you export and then reimport the database).
We deal with racing data; our primary key for a race is the date of the race, the identity of the course and the race number. There can only be one "race one" at course foo on a given date. If anything ever tries to insert that race again, it will choke with a unique key constraint violation rather than creating a duplicate entry.
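A sketch of that arrangement, with column names guessed for illustration:

CREATE TABLE race (
  race_date   DATE    NOT NULL,
  course_id   INTEGER NOT NULL,
  race_number INTEGER NOT NULL,
  race_id     INTEGER NOT NULL UNIQUE,  -- auto-number: for fast lookups and joins only
  CONSTRAINT race_pk PRIMARY KEY (race_date, course_id, race_number)
);

-- Inserting race 1 at course foo for the same date twice violates race_pk
-- instead of quietly creating a duplicate entry.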
Admin
Fair enough - the non-biblical version might require a bit more storage though :-)
Admin
WTF - how does the IRS deal with that one? No, I don't owe you $1,000,000 - it's that other Iowan...
Admin
Probably it's this underlying idea: a database that performs too well means unemployment for the DBA.
Admin
Well, imagine a simple point-of-sale application with a scanner......
Admin
Sadly, there's a school of thought in database design that insists you should never use a unique ID in a table. A strange thought, since double-entry bookkeeping had been doing exactly that for 500 years before there were any computers.
Admin
Alright, let's just say that the change triggered the failure. Or would it be a sequence?
Admin
This is very true. The RETURNING clause is your friend and has been around for some time.
Triggers can really bog down bulk insertion (context switching between SQL and PL/SQL for each row... shiver), so proper use of bulk insert operations, bulk returns (if needed at all) and sequence caching is the way to go for high throughput.
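A hypothetical PL/SQL sketch of both points (table, sequence and variable names invented):

DECLARE
  l_id orders.order_id%TYPE;
BEGIN
  -- RETURNING hands the generated key straight back, no second round trip:
  INSERT INTO orders (order_id, customer_id)
  VALUES (orders_seq.NEXTVAL, 42)
  RETURNING order_id INTO l_id;

  -- For bulk loads, FORALL keeps the SQL/PL-SQL context switching to a minimum,
  -- e.g. FORALL i IN 1 .. l_rows.COUNT INSERT INTO orders VALUES l_rows(i);
END;
/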
--
Also, whoever had a CEO take them to task over implementing referential integrity has an awesome boss. It always amazes me when someone thinks they can do this better than the built-in tools and then wonders why performance suffers and bugs arise.
Admin
accounting != general database use
person != dbms
Admin
Apparently, some enterprises in South Korea require you to provide your ID number for some things.
Admin
I've noticed that a few times now. Whenever C-Octothorpe wants to legitimize a statement, he either states or implies that "everyone" or "security experts" believe it too.
Does he actually think that way?
A sure path to delusion.
Admin
SSNs are only given to one person at a time. When someone dies, that SSN is thrown back into the "unused" pool and is handed out again at a later date.
Admin
Well, it works for me and everyone else...
I should have been more clear. What I meant was everyone who was talking about natural vs surrogate keys... Natural is generally shite, surrogate is good. I blame my ADD for my absolutist comments.
Admin
FTFY
Admin
[quote user="Kensey"][quote user="@Deprecated"]Well, in one important sense, the user was not right in what he probably assumed: It wasn't that the upgrade was the only cause of the problem, or was done incorrectly. It was merely that a valid configuration change in the hardware tickled a bug in the application.
If you switch from ordinary water to filtered water in an industrial system, and it causes the system to fail because mineral deposits were sealing cracks in the water lines that weren't supposed to be there in the first place, [b]I wouldn't say the water switch "caused" the failure[b/]. Rather, the failure was already there in the system and the previous state of affairs was masking it.[/quote]
Actually, that is exactly what you said.
Whether or not the mineral deposits were there intentionally (or the slowness in the system in this wtf story), these were part of a system which was functioning.
Correct testing in an environment similar to production would have caught the timestamp performance issue, and performing this testing is the responsibility of those making the current change.
Admin
[quote user="C-Octothorpe]Well, it works for me and everyone else...
I should have been more clear. What I meant was everyone who was talking about natural vs surrogate keys... Natural is generally shite, surrogate is good. I blame my ADD for my absolutist comments.[/quote]
Natural vs. Surrogate: anyone saying use one and not the other is wrong.
Surrogate keys are for the system to track precise records: auto-generated, incremental fields that mean nothing else to the data involved. Natural keys are for human readability and searching.
For example: you want to find a person, so you search for John Smith in Virginia; your table anticipates this and has an index built on it, so the search is fast and returns 5 records. The user looks at the details, picks the right one and chooses to edit. The system then saves the auto-incremented id value, so it knows exactly which John Smith you are working with. This value is NEVER shown to the user, because it has no meaning to them; its only meaning is to the system, for uniquely identifying and linking records. You do not want to make users include all the details in their search to narrow it down, because the human mind can make these decisions very fast; reduce their typing by getting just enough info to narrow the search and display all matching records. Two indexes: one surrogate key and one natural key, with no limit on how many natural-key indices you may want to create.
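In other words, something along these lines (table and column names invented for illustration):

CREATE TABLE person (
  person_id  INTEGER PRIMARY KEY,  -- surrogate; never shown to the user
  first_name VARCHAR(50),
  last_name  VARCHAR(50),
  state      VARCHAR(30)
);

-- Natural-key index for the humans doing the searching:
CREATE INDEX person_name_ix ON person (last_name, first_name, state);

-- The UI searches on what the user knows...
SELECT person_id, first_name, last_name, state
  FROM person
 WHERE last_name = 'Smith' AND first_name = 'John' AND state = 'Virginia';

-- ...and once the user picks a row, the system works with person_id alone.
UPDATE person SET last_name = 'Jones' WHERE person_id = 12345;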
Admin
This. I've been in situations where strange side effects (read: something broke) happened from decommissioning old components, and it was determined to be our fault because we didn't perform adequate testing.
The moral of the story is that if you're going to enter a china shop, make sure you don't break anything. Even if the person who was in there just before you thought it would be a good idea to precariously stack all the dishes right at the foot of the door, it was you who actually broke the dishes...
Admin
[quote user="KattMan"][quote user="C-Octothorpe]Well, it works for me and everyone else...
I should have been more clear. What I meant was everyone who was talking about natural vs surrogate keys... Natural is generally shite, surrogate is good. I blame my ADD for my absolutist comments.[/quote]
Natural vs. Surrogate, anyone saying use one and not the other is wrong.
Surrogate keys are for the system to track precise records, auto-generated, incremental fields that mean nothing else to the data involved. Natural keys are for human readability and searching.
For example: You want to find a person, You search for John Smith in Virginia, your table anticipates this and has an index built on this so the search is fast and returns 5 records. The user looks at the details, picks the right one and chooses to edit. The system then saves the incremented id value so it knows exactly which john smith you are working with. This value is NEVER shown to the user because it has no meaning to them, it's meaning only exists to the system for uniquly identifying and linking records. You do not want to include all the details in your search to narrow it down because the human mind can make these decisions very fast, reduce thier typing by getting just enough info to narrow the search and display all matching records. Two indexes, one key, one surrogate, one natural, with no limit on how many natural key indecies you may want to create.[/quote]
That's what I said about 120 posts ago, but yeah, I totally agree with you. The problem is that some devs use them interchangeably. My general approach is what you said: have a meaningless surrogate PK (GUID or auto-increment) and then put unique constraints on certain fields, which are then used to search for entities.
Admin
Just to be clear, I didn't say that you should NEVER use a surrogate key. In this specific WTF, adding a unique sequence column to the existing primary key has the same effect as having no primary key at all.
That may be an acceptable fix when orders are failing, but someone needs to have a good look at the schema and figure out which columns (if any) need to be in the primary key for that table. Right now, you've got something that looks like a primary key but doesn't behave like one.
Imagine this: it's a few weeks later, and a bug in the front end (or a user with a sticky keyboard) is submitting orders twice. The DB won't notice, because the primary key is just there for show. Rick's customers (and his manager) won't be happy when they start getting invoiced for double what they ordered.
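Illustratively (all names hypothetical), this is the kind of constraint that would catch a double submission:

CREATE TABLE customer_order (
  order_id     INTEGER PRIMARY KEY,   -- surrogate from a sequence/identity
  customer_id  INTEGER     NOT NULL,
  customer_ref VARCHAR(30) NOT NULL,  -- e.g. the customer's own PO number
  order_date   DATE        NOT NULL,
  CONSTRAINT customer_order_uk UNIQUE (customer_id, customer_ref)
);

-- Submitting the same order twice now fails on customer_order_uk instead of
-- silently producing a second invoice.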
Admin
That's the purpose of secondary keys (UNIQUE). Functional keys shouldn't be PKs, for various reasons already given:
Although they are unique, they can be modified. It's functional data, so at some point it will be modified (wrong input from the user, or whatever ERP is feeding your application).
Your SPs are going to look awful with their multi-line joins.
You shouldn't be waiting for your DB to reject inputs. It's nice to have a guard so the coherence of your data doesn't get compromised by a future modification of the code, but you're not going to tell the user who entered a duplicate "hey, it seems that you can't do that, 'cause my DB just rejected your data". That's a functional rule, so it should be in your business layer in the first place.
Admin
Yes. This, OMG, this. Listen to the man.
If you ever find yourself writing an update script to handle changes made to PK values, stop, back away from the keyboard and take a long, hard look at your DB design.
Admin
I'm not a DBA, and my only experience is with SQL Server, but doesn't having multiple fields in a primary key also make searches slower? Having more fields (and most likely some non-integers) would mean fitting fewer records into the pages and extents (sorry if those are the wrong terms, it's been a while) and, therefore, more disk reads at the leaf level of the index tree?
Admin
As stated before, that's why you put a unique constraint on whatever information the business uses to identify an order.
But really, you shouldn't be relying on your DB to protect you from bugs in your business logic.
Admin
Completely agree with surrogate keys. Ming the Merciless (aka Joe Celko) rails mightily against them as cardinal crimes against the Great Goddess Relational Purity, but, dare I say, in real-world applications they are the way to go for precisely the reasons you give.
Admin
If you are reading multiple records (perhaps you are doing a JOIN and have a bunch of foreign keys to dereference), it probably won't matter because after a handful of lookups you'll be in bookmark lookup hell and the DBMS will switch to a scan. If you aren't in bookmark lookup hell, then your primary key is clustered and it doesn't matter how long the key is.
Admin
Ideally (and let's emphasize the "ideal" here), your Primary Key should be how your company tells the various entities apart. An arbitrary key that doesn't relate back to the business risks duplication.
In practice, this isn't a massive deal - any company with sufficient suppliers will start handing out "vendor numbers" and "customer numbers" (to avoid having to try and tell two "Bob Smith"s apart). Ditto for products. The trick is to make sure that the system's key matches the implicit key that is already being used as much as possible.
The only exception I've seen is when the distinction is truly arbitrary. Order numbers, for example (the number doesn't mean anything other than acting as a reference to a specific order).
(I spend a lot of time doing database mining at work - a well-designed database is a pleasure to work with, while a badly designed one makes me wish the designer was within punching distance.)