Admin
I am not a DBA. I have not even studied databases academically, but I am forced to design some (small ones) occasionally. So, may I ask: isn't the question of how to specify the PK and the question of how to tie tables together really two separate issues?
In this example, real-world data (the store number) occurred in two tables. That was the root of the problem, right? The store table should have contained a row index, and all other tables referring to that store should have been using that row index. But why does it have to be the PK?
Wouldn't it be better to aim for natural PKs, but not to use any of that data outside the table itself?
Let's say a merger occurs and other stores, with an overlapping numbering scheme, need to be brought into the database. Say an additional column, 'chain_name', is used to identify which brand each store belongs to. Could you then easily update your approach of having a unique constraint on the store number? (Compare this to just adding chain_name to the PK.) Even if you could, it seems the result would be that you are implementing the PK concept on columns which aren't the PK, while the actual PK is really just an auto-increment.
Admin
I'm slightly surprised that Rick wasn't worried that changing the primary key on the table would break foreign keys pointing to it. And if the PK is changed it's likely to mean that most joins involving the table need to be checked as well.
Admin
The real WTF is that nobody pointed out that the hardware upgrade was probably totally unnecessary in the first place. Given that the design of the database was clearly in need of attention, the people responsible for maintaining the system should have asked an experienced DBA to look at the database design BEFORE attempting to fix things by poking around with the hardware of a DB server. I have seen response times of over 30 seconds drop to a few milliseconds simply by getting the primary keys and indexes right - especially in Oracle.
Admin
I'm curious as to how you would design a table to hold Person information under these criteria.
As this is a real-world application for a multicultural society, First Name/Surname/DOB must all be optional fields and the addresses must be one-to-many (history). Also, telephone numbers are not unique, are optional and are sometimes shared, and there is no third-party unique identifier, such as a social security number, in the country of implementation. E-mail addresses are sometimes shared and are optional. Oh, and names change often as well, due to half the client base being female and some of them getting married, etc.
Admin
They think relations ARE tables; that's the root of all SQL evil. The purpose of a primary key is to distinguish set elements. And so on... I'm rather surprised vanilla graduates haven't been told that.
Admin
Where I come from, if you implement a change and it has side effects, even if they're impossible to plan for, or even because another application is incorrectly pointing at your web service despite knowing it was being decommissioned like 3 years ago, it's your fault. It doesn't matter that the other app's owner didn't fix it correctly; it's still your fault because your change caused something to break.
Now tell me that isn't screwy...
Admin
Talk about snatching defeat from the jaws of victory...
Not exactly. The upgrade was not the reason for the errors, merely the change in circumstances that allowed the bug to produce those errors. Like disease and symptoms: just because you don't show any symptoms doesn't mean you don't have the disease...
Admin
Which is presumably why Rick looked at it, said "What the fuck?" and knew exactly which site to send the story to.
Maybe there's a dailygoodpractices site where stories of sensible key construction are told but I've never looked for it.
Admin
Oh boy! The real WTF seems to be those comments comparing different DBMSs and data model design.
To answer a question that someone here asked: yes, DB programmers still exist; I consider myself to be one. That said, I'm under the impression that we're a dying breed. Why bother engineering a data model when you can use language X and framework Y to put everything in the application layer (and hope for the best)?
The Oracle DBMS does provide sequences: generators of unique numeric IDs that can't produce duplicates. Newer versions don't even require triggers anymore; you can use a sequence as the default value for a column. People who claim otherwise should consult the user manual.
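For anyone who hasn't used them, here's a minimal sketch (table and sequence names are made up purely for illustration):

CREATE SEQUENCE orders_seq START WITH 1 INCREMENT BY 1;

-- Classic usage: reference the sequence explicitly in the INSERT.
INSERT INTO orders (order_id, customer_id)
VALUES (orders_seq.NEXTVAL, 42);

-- On releases that support sequence defaults, no trigger is needed at all:
-- CREATE TABLE orders (
--   order_id    NUMBER DEFAULT orders_seq.NEXTVAL PRIMARY KEY,
--   customer_id NUMBER NOT NULL
-- );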
As for the natural vs. synthetic PK debate: it's as old as data modeling and it'll never be settled. Those who advocate natural PKs either love writing SQL code (a lot of it) or have very basic data models which never even made it to a master <-> detail <-> detail relationship with additional lookup tables, etc.; otherwise they would realize that the joins they're writing are getting hard to read. Needless to say, if that natural PK is not as unique as hoped, or can even be changed by the user, you'll end up in "update hell" - no problem when the database is small (in size), a big problem when you have to update millions (or even billions) of records.
BTW, don't name those columns just ID. Sooner or later you'll end up mixing up IDs from different tables or joining on the wrong ones. One way to prevent this is a naming scheme like id_<table>. So in the example with tables A and B in a master <-> detail relationship, you get:
A: id_a (PK)
B: id_b (PK), id_a (FK)
Even a model with additional levels of detail is straightforward:
C: id_c (PK), id_b (FK)
D: id_d (PK), id_c (FK)
PK ... Primary Key, FK ... Foreign Key
A join takes only one condition per relationship; just imagine what a natural-key approach would look like, where you would have to add at least one additional column for each level to ensure uniqueness.
Even someone who is not familiar with the data model should have no trouble writing those joins in the model I presented; I'm not sure that holds true for the natural-key approach.
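To make that concrete, here's a quick sketch of tables A, B and C using that naming scheme (types and constraint syntax kept generic):

CREATE TABLE a (
  id_a INTEGER PRIMARY KEY
);

CREATE TABLE b (
  id_b INTEGER PRIMARY KEY,
  id_a INTEGER NOT NULL REFERENCES a (id_a)
);

CREATE TABLE c (
  id_c INTEGER PRIMARY KEY,
  id_b INTEGER NOT NULL REFERENCES b (id_b)
);

-- One join condition per level, and the column names tell you which key is which:
SELECT *
  FROM a
  JOIN b ON b.id_a = a.id_a
  JOIN c ON c.id_b = b.id_b;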
As for the questions regarding how to decide, all I can offer is a rule of thumb that has been working for me: entities where any of the assumed unique attributes could change, even by correcting a typo in data entered by a user, should go straight to a synthetic PK, which should have no meaning at all. In fact, the user should never even be aware of its existence. A natural PK works best for tables at the bottom of relationships, such as lookup tables (currency, country). However, even there, chances are that you'll encounter problems; famous examples are the conversion of several currencies into the euro, or the change of German postal codes from 4 to 5 digits after reunification.
I've got only 20+ years of experience in the IT industry and 10+ years of DB programming, so I'm open to suggestions from people who have collected their share of experience; please don't hesitate to come forward and educate me/us.
l.
Admin
Gosh, guess we've just been lucky!
Admin
Well, MSSQL now has a hierarchy data type, so you could start with /adam, then /adam/cain and /adam/abel and so on. Might need some bigger disks.
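For the curious, a rough T-SQL sketch of what that looks like (table and values invented for illustration; note that the paths the type actually stores are numeric, like /1/2/, rather than /adam/cain):

CREATE TABLE family_tree (
  node hierarchyid PRIMARY KEY,
  name nvarchar(50) NOT NULL
);

INSERT INTO family_tree (node, name) VALUES
  (hierarchyid::GetRoot(),    N'Adam'),
  (hierarchyid::Parse('/1/'), N'Cain'),
  (hierarchyid::Parse('/2/'), N'Abel');

-- ToString() shows the materialized path for each node.
SELECT node.ToString() AS path, name FROM family_tree;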
Admin
Reading the comments, I almost lost all hope. But after 3 pages I found the comment from lofwyr. Thank you, sir.
(On the other hand, all those WTF-y comments guarantee high-quality WTF articles in the future. Yay!)
Admin
So glad to see that you contributed to this ongoing conversation with your very own witty and insightful comments...
golf clap
BTW, he reiterated pretty much what everybody else has been saying, albeit much better worded.
Admin
Everybody? After reading the comments I had the impression that I am in the minority, which could be one way to explain the never-ending flow of WTFs.
l.
Admin
I would assume there was an underlying sequence and the trigger was just pulling NEXTVAL. It's about the only time I can think of where using a trigger is acceptable.
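Something like this hypothetical sketch (table and sequence names invented), where the trigger does nothing but fill in the surrogate key:

CREATE SEQUENCE order_seq;

CREATE OR REPLACE TRIGGER order_bi
BEFORE INSERT ON orders
FOR EACH ROW
BEGIN
  -- On older releases you'd SELECT order_seq.NEXTVAL INTO :NEW.order_id FROM dual;
  IF :NEW.order_id IS NULL THEN
    :NEW.order_id := order_seq.NEXTVAL;
  END IF;
END;
/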
Admin
Bigger disks? Don't think so. We're talking about something like 60 centuries, at 3 generations per century, more or less - that'll be 180 generations. Call it 200. A first name takes, what, ten characters? That's about 2K per Person for unique identification. Not great, but when $100 buys you a terabyte drive, retail, size isn't really the first thing that kills this deal.
Admin
Not screwy at all. There's a bug in the system which is either caused by, or has been uncovered by, activity which you have perpetrated. Therefore it is up to you to find out what its cause is. Even when you find that the underlying cause of the bug was not your direct fault, it is perfectly reasonable to request that you take the responsibility for ensuring that it be fixed.
However, if your change uncovers the bug before it gets to production, that's something else altogether. Then you are fully entitled to jump up and down and crow like a cockerel that you've found another of Billy Bodger's billion bugs and get lots more pats on the back from your admiring co-workers.
Admin
It's not actually as insane as it sounds. You make the auto-number field unique, but not the primary key. It can still be used for quick lookups from an index, but you make the primary key some combination of values from the row which must always be unique. The auto-number is essentially just there as a row ID (the ROWID pseudo-column will change if you export and then reimport the database).
We deal with racing data; our primary key for a race is the date of the race, the identity of the course and the race number. There can only be one "race one" at course foo on a given date. If anything ever tries to insert that race again, it will choke with a unique key constraint violation rather than creating a duplicate entry.
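A sketch of that arrangement, with column names guessed for illustration:

CREATE TABLE race (
  race_date   DATE    NOT NULL,
  course_id   INTEGER NOT NULL,
  race_number INTEGER NOT NULL,
  race_id     INTEGER NOT NULL UNIQUE,  -- auto-number: for fast lookups and joins only
  CONSTRAINT race_pk PRIMARY KEY (race_date, course_id, race_number)
);

-- Inserting race 1 at course foo for the same date twice violates race_pk
-- instead of quietly creating a duplicate entry.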
Admin
Fair enough - the non-biblical version might require a bit more storage though :-)
Admin
WTF - how does the IRS deal with that one? No, I don't owe you $1,000,000 - it's that other Iowan...
Admin
Probably it's this underlying idea: a database that performs too well means unemployment for the DBA.
Admin
Well, imagine a simple point-of-sale application with a scanner......
Admin
Sadly, there's a school of thought in database design that insists you should never use a unique ID in a table. A strange thought, since double-entry bookkeeping had been doing exactly that for 500 years before there were any computers.
Admin
Alright, let's just say that the change triggered the failure. Or would it be a sequence?
Admin
This is very true. The RETURNING clause is your friend and has been around for some time.
Triggers can really bog down bulk insertion (context switching between SQL and PL/SQL for each row... shiver), so proper use of bulk insert operations, bulk returns (if needed at all) and sequence caching is the way to go for high throughput.
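A hypothetical PL/SQL sketch of both points (table, sequence and variable names invented):

DECLARE
  l_id orders.order_id%TYPE;
BEGIN
  -- RETURNING hands the generated key straight back, no second round trip:
  INSERT INTO orders (order_id, customer_id)
  VALUES (orders_seq.NEXTVAL, 42)
  RETURNING order_id INTO l_id;

  -- For bulk loads, FORALL keeps the SQL/PL-SQL context switching to a minimum,
  -- e.g. FORALL i IN 1 .. l_rows.COUNT INSERT INTO orders VALUES l_rows(i);
END;
/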
--
Also, whoever had a CEO take them to task over implementing referential integrity has an awesome boss. It always amazes me when someone thinks they can do this better than the built-in tools and then wonders why performance suffers and bugs arise.
Admin
accounting != general database use
person != dbms
Admin
Apparently, some enterprises in South Korea require you to provide your ID number for some things.
Admin
I've noticed that a few times now. Whenever C-Octothorpe wants to legitimize a statement, he either states or implies that "everyone" or "security experts" believe it too.
Does he actually think that way?
A sure path to delusion.
Admin
SSNs are only given to one person at a time. When someone dies, that SSN is thrown back into the "unused" pool and is handed out again at a later date.
Admin
Well, it works for me and everyone else...
I should have been more clear. What I meant was everyone who was talking about natural vs surrogate keys... Natural is generally shite, surrogate is good. I blame my ADD for my absolutist comments.
Admin
FTFY
Admin
[quote user="Kensey"][quote user="@Deprecated"]Well, in one important sense, the user was not right in what he probably assumed: It wasn't that the upgrade was the only cause of the problem, or was done incorrectly. It was merely that a valid configuration change in the hardware tickled a bug in the application.
If you switch from ordinary water to filtered water in an industrial system, and it causes the system to fail because mineral deposits were sealing cracks in the water lines that weren't supposed to be there in the first place, [b]I wouldn't say the water switch "caused" the failure[b/]. Rather, the failure was already there in the system and the previous state of affairs was masking it.[/quote]
Actually, that is exactly what you said.
Whether or not the mineral deposits were there intentionally (or the slowness in the system in this wtf story), these were part of a system which was functioning.
Correct testing in an environment similar to production would have caught the timestamp performance issue, and performing this testing is the responsibility of those making the current change.
Admin
[quote user="C-Octothorpe]Well, it works for me and everyone else...
I should have been more clear. What I meant was everyone who was talking about natural vs surrogate keys... Natural is generally shite, surrogate is good. I blame my ADD for my absolutist comments.[/quote]
Natural vs. Surrogate: anyone saying use one and not the other is wrong.
Surrogate keys are for the system to track precise records: auto-generated, incremental fields that mean nothing else to the data involved. Natural keys are for human readability and searching.
For example: you want to find a person, so you search for John Smith in Virginia; your table anticipates this and has an index built on it, so the search is fast and returns 5 records. The user looks at the details, picks the right one and chooses to edit. The system then saves the auto-incremented id value, so it knows exactly which John Smith you are working with. This value is NEVER shown to the user, because it has no meaning to them; its only meaning is to the system, for uniquely identifying and linking records. You do not want to make users include all the details in their search to narrow it down, because the human mind can make these decisions very fast; reduce their typing by getting just enough info to narrow the search and display all matching records. Two indexes: one surrogate key and one natural key, with no limit on how many natural-key indices you may want to create.
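In other words, something along these lines (table and column names invented for illustration):

CREATE TABLE person (
  person_id  INTEGER PRIMARY KEY,  -- surrogate; never shown to the user
  first_name VARCHAR(50),
  last_name  VARCHAR(50),
  state      VARCHAR(30)
);

-- Natural-key index for the humans doing the searching:
CREATE INDEX person_name_ix ON person (last_name, first_name, state);

-- The UI searches on what the user knows...
SELECT person_id, first_name, last_name, state
  FROM person
 WHERE last_name = 'Smith' AND first_name = 'John' AND state = 'Virginia';

-- ...and once the user picks a row, the system works with person_id alone.
UPDATE person SET last_name = 'Jones' WHERE person_id = 12345;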
Admin
This. I've been in situations where strange side effects (read: something broke) happened from decommissioning old components, and it was determined to be our fault because we didn't perform adequate testing.
The moral of the story is that if you're going to enter a china shop, make sure you don't break anything. Even if the person who was in there just before you thought it would be a good idea to precariously stack all the dishes right at the foot of the door, it was you who actually broke the dishes...
Admin
[quote user="KattMan"][quote user="C-Octothorpe]Well, it works for me and everyone else...
I should have been more clear. What I meant was everyone who was talking about natural vs surrogate keys... Natural is generally shite, surrogate is good. I blame my ADD for my absolutist comments.[/quote]
Natural vs. Surrogate, anyone saying use one and not the other is wrong.
Surrogate keys are for the system to track precise records, auto-generated, incremental fields that mean nothing else to the data involved. Natural keys are for human readability and searching.
For example: You want to find a person, You search for John Smith in Virginia, your table anticipates this and has an index built on this so the search is fast and returns 5 records. The user looks at the details, picks the right one and chooses to edit. The system then saves the incremented id value so it knows exactly which john smith you are working with. This value is NEVER shown to the user because it has no meaning to them, it's meaning only exists to the system for uniquly identifying and linking records. You do not want to include all the details in your search to narrow it down because the human mind can make these decisions very fast, reduce thier typing by getting just enough info to narrow the search and display all matching records. Two indexes, one key, one surrogate, one natural, with no limit on how many natural key indecies you may want to create.[/quote]
That's what I said about 120 posts ago, but yeah, I totally agree with you. The problem is that some devs use them interchangeably. My general approach is what you said: have a meaningless surrogate PK (GUID or auto-increment) and then put unique constraints on certain fields, which are then used to search for entities.
Admin
Just to be clear, I didn't say that you should NEVER use a surrogate key. In this specific WTF, adding a unique sequence column to the existing primary key has the same effect as having no primary key at all.
That may be an acceptable fix when orders are failing, but someone needs to have a good look at the schema and figure out which columns (if any) need to be in the primary key for that table. Right now, you've got something that looks like a primary key but doesn't behave like one.
Imagine this: it's a few weeks later, and a bug in the front end (or a user with a sticky keyboard) is submitting orders twice. The DB won't notice, because the primary key is just there for show. Rick's customers (and his manager) won't be happy when they start getting invoiced for double what they ordered.
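Illustratively (all names hypothetical), this is the kind of constraint that would catch a double submission:

CREATE TABLE customer_order (
  order_id     INTEGER PRIMARY KEY,   -- surrogate from a sequence/identity
  customer_id  INTEGER     NOT NULL,
  customer_ref VARCHAR(30) NOT NULL,  -- e.g. the customer's own PO number
  order_date   DATE        NOT NULL,
  CONSTRAINT customer_order_uk UNIQUE (customer_id, customer_ref)
);

-- Submitting the same order twice now fails on customer_order_uk instead of
-- silently producing a second invoice.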
Admin
That's the purpose of secondary keys (UNIQUE). Functional keys shouldn't be PKs, for various reasons already given:
Although they are unique, they can be modified. It's functional data, so at some point it will be modified (wrong input from the user, or whatever ERP is feeding your application).
Your SPs are going to look awful with their multi-line joins.
You shouldn't be waiting for your DB to reject inputs. It's nice to have a guard so the coherence of your data doesn't get compromised by a future modification of the code, but you're not going to tell the user who entered a duplicate "hey, it seems that you can't do that, 'cause my DB just rejected your data". That's a functional rule, so it should be in your business layer in the first place.
Admin
Yes. This, OMG, this. Listen to the man.
If you ever find yourself writing an update script to handle changes made to PK values, stop, back away from the keyboard and take a long, hard look at your DB design.
Admin
I'm not a DBA, and my only experience is with SQL Server, but doesn't having multiple fields in a primary key also make searches slower? Having more fields (and most likely some non-integers) would mean fitting fewer records into the pages and extents (sorry if those are the wrong terms, it's been a while) and, therefore, more disk reads at the leaf level of the index tree?
Admin
As stated before, that's why you put a unique constraint on whatever information the business uses to identify an order.
But really, you shouldn't be relying on your DB to protect you from bugs in your business logic.
Admin
Completely agree with surrogate keys. Ming the Merciless (aka Joe Celko) rails mightily against them as cardinal crimes against the Great Goddess Relational Purity, but, dare I say, in real-world applications they are the way to go for precisely the reasons you give.
Admin
If you are reading multiple records (perhaps you are doing a JOIN and have a bunch of foreign keys to dereference), it probably won't matter because after a handful of lookups you'll be in bookmark lookup hell and the DBMS will switch to a scan. If you aren't in bookmark lookup hell, then your primary key is clustered and it doesn't matter how long the key is.
Admin
Ideally (and let's emphasize the "ideal" here), your Primary Key should be how your company tells the various entities apart. An arbitrary key that doesn't relate back to the business risks duplication.
In practice, this isn't a massive deal - any company with sufficient suppliers will start handing out "vendor numbers" and "customer numbers" (to avoid having to try and tell two "Bob Smith"s apart). Ditto for products. The trick is to make sure that the system's key matches the implicit key that is already being used as much as possible.
The only exception I've seen is when the distinction is truly arbitrary. Order numbers, for example (the number doesn't mean anything other than acting as a reference to a specific order).
(I spend a lot of time doing database mining at work - a well-designed database is a pleasure to work with, while a badly designed one makes me wish the designer was within punching distance.)