The Daily WTF: Curious Perversions in Information Technology

2011-06-28 Reply Admin

this is vulnerable to sql injection..

xkcd 327

2011-06-28 Reply Admin

BentFranklin:
Unique, auto-numbered, never null... Shouldn't TRXID have been the primary key all along?

Well. There are people who thinks that natural keys should be used (as default). Of course transaction time is not a natural key...

frits · 2011-06-28 Reply Admin

Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

unique identifier != metadata.

The name "Remy Porter" is a surrogate key. Maybe you should change your name to a collection of your human attributes that makes you distinct from all other humans?

TheCPUWizard · 2011-06-28 Reply Admin

SCSimmons:
I just wanted to mention, for the benefit of anyone fortunate enough not to have worked with Oracle databases, that this: "For the same reason, he didn't want to change the timestamp to an actual timestamp datatype, which supported fractional seconds." is not nonsensical. A date field in Oracle has date plus hours/minutes/seconds, a timestamp field has the same information but at higher precision. But for some unfathomable reason, most PL/SQL queries that are handed a date when they were expecting a timestamp, or vice-versa, will choke and die. Unlike when dealing with numeric fields of different precisions, or datetime fields of different precisions in any other DBMS that I know of, Oracle refuses to cope with this and just spits errors at you.
If you're smart, you write explicit CAST statements into your PL/SQL to handle column definition changes of this kind. If you're even smarter, you use a different DBMS. :)

And if it was SqlServer (I don't know if this capability was available on Oracle at any given version), the TimeStamp field is setup to be guaranteed and unique (at least for distinct transactions). So a simple column type change would likely have addressed the issue..

[NOT claimin that the WTF would be solved by the above, simply that it might have been the minimal work to get the system back online - expecially if he had not found the other field...]

C-Octothorpe · 2011-06-28 Reply Admin

frits:
Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

unique identifier != metadata.

The name "Remy Porter" is a surrogate key. Maybe you should change your name to a collection of your human attributes that makes you distinct from all other humans?

I guess the sum of all your human attributes may identify you uniquely... Except you're never allowed to move, change your name, your age, height, weight, hair color, and god forbid, gender...

2011-06-28 Reply Admin

anon:
BentFranklin:
Unique, auto-numbered, never null... Shouldn't TRXID have been the primary key all along?

Well. There are people who thinks that natural keys should be used (as default). Of course transaction time is not a natural key...

This is a hard problem to begin with because you often have to work out all your data maintenance processes in advance to understand what you can use to uniquely identify something. It gets harder if you sometimes have this data but not always. It gets harder when the thing in question changes, and so forth.

SQL doesn't make it easy, of course, even a simple join becomes tedious when you have a composite key. And, generally, most clients assume keys will be static; SQL's anemic type system can't do opaque types.

So people use surrogate keys, and often times ignore the problem, or blame users.

But what a database typically does is store a record of things in the real world, and the reason you're using it in the first place is that people can't keep up with the quantity of data. A natural key is a way of describing the relationship the record in the system has with the actual thing in the real world, and the database can use simple logic to ensure that there's a 1:1 correspondence between its records and actual things that have been recorded.

Surrogate keys don't provide the mechanisms to ensure that 1:1 correspondence between your records and the things you are recording. Without it, your database is wrong, and if it becomes even modestly complex, you can't predict how wrong any queries based on that data will be.

boog · 2011-06-28 Reply Admin

C-Octothorpe:
I guess the sum of all your human attributes may identify you uniquely...

Is addition really applicable? What if two distinct combinations of human attributes add up to the same value? Collisions seem almost guaranteed to occur at some point.

2011-06-28 Reply Admin

I'm lucky enough to have worked with both sql server and oracle.

I've only seen a couple of comments here which indicate that the posters have the faintest inkling about Oracle.

As has been posted above, the answer to getting a PK for a parent and the child is

Insert blah into x returning y.

Which numpty suggested using a trigger to populate a pk? What's wrong with INSERT INTO

<sequence name>.nextval? If you want to know the justification for numpty, see http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2575882200346616184.

As for complaining about a language rejecting an implicit cast, well, I'd say that's a good way to stop bugs. As far as I'm concerned, an implicit cast IS a bug. I only wish Oracle was as fussy with all it's data typing. Ah, but then, think of all that code that would break, due to all the bugs (implicit casts) in it....

To the illustrious know-it-all that decries surrogate PK's without reason, all I can do is say "Based on what evidence?" Ever tried an inner join on a compound key instead of a surrogate? Ever tried manipulating the PK's, because when the business says "it will never change" they lied!

As per usual from me, put up or shut up.

2011-06-28 Reply Admin

C-Octothorpe:
frits:
Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

unique identifier != metadata.

The name "Remy Porter" is a surrogate key. Maybe you should change your name to a collection of your human attributes that makes you distinct from all other humans?

I guess the sum of all your human attributes may identify you uniquely... Except you're never allowed to move, change your name, your age, height, weight, hair color, and god forbid, gender...

But giving a person some 10-digit number will work perfectly. They just have to never lose it, lie about it, and they just have to trust us never to hand it out to anyone who asks.

Anonymou5 · 2011-06-28 Reply Admin

Dan:
Sweatpants Wearing Nagesh:
The Article:
But dangling from the end of the key like an angry Klingon hung "D_TIMESTAMP".

In India, Software Engineers have Mega-Maids for taking care of this.

"What happened?" "It's Mega-Maid, sir. She's gone from suck to blow!"

Ludicrous speed? Sir, we've never gone that fast before. I don't know if this server can take it.

nonpartisan · 2011-06-28 Reply Admin

C-Octothorpe:
Lorens:
her she her she'd She she she

Ahh, there's your problem...

Way to be a sexist pig, considering that the first WTF mentioned was a guy. Quote:

Lorens:
I have worked with at least two supposedly competent persons who did not understand the concept of a primary key.
The first one (third-year student) set the key to be the client name plus the client's address. Many of our clients were multi-located. When I asked what he thought would happen when a client moved, he proudly replied that he'd thought of that: the code that did the address change went through all the related tables and updated the key.

Anonymou5 · 2011-06-28 Reply Admin

nonpartisan:
C-Octothorpe:
Lorens:
her she her she'd She she she
Ahh, there's your problem...
Way to be a sexist pig, considering that the first WTF mentioned was a guy.

Way to be an over-sensitive feminist.

Matt Westwood · 2011-06-28 Reply Admin

nonpartisan:
C-Octothorpe:
Lorens:
her she her she'd She she she

Ahh, there's your problem...

Way to be a sexist pig, considering that the first WTF mentioned was a guy. Quote:

Lorens:
I have worked with at least two supposedly competent persons who did not understand the concept of a primary key.
The first one (third-year student) set the key to be the client name plus the client's address. Many of our clients were multi-located. When I asked what he thought would happen when a client moved, he proudly replied that he'd thought of that: the code that did the address change went through all the related tables and updated the key.

Typical humourless feminist. Bet you're American as well.

2011-06-28 Reply Admin

But then I want to ask something more serious: Why would anyone ever not want a unique, autoincremented integer to be the primary key? I learned in uni that there are many ways to do PKs, but all I have ever implemented was an extra integer with no other meaning than "this is the PK".

2011-06-28 Reply Admin

Lorens:
BentFranklin:
Unique, auto-numbered, never null... Shouldn't TRXID have been the primary key all along?

I suspect the original programmer did not understand the goal of a primary key. TRXID must have been added by somebody else, probably someone who left in disgust before he could implement it or document it.

I have worked with at least two supposedly competent persons who did not understand the concept of a primary key.

A long time ago in a galaxy far far away there were 2 different kinds of programmers, database programmers and gui or interface programmers. The later always got into trouble when you started talking about database normalization, primary keys, table relation ships etc etc. It's been awhile since I wore my database programming hat but I'm going to guess it's pretty much the same today as it was back then.

2011-06-28 Reply Admin

Tharg:
As for complaining about a language rejecting an implicit cast, well, I'd say that's a good way to stop bugs. As far as I'm concerned, an implicit cast IS a bug. I only wish Oracle was as fussy with all it's data typing. Ah, but then, think of all that code that would break, due to all the bugs (implicit casts) in it....

Oracle is (except for strings and nulls!) following the standard; implicit casting certainly is broken by design, but that's a problem with SQL itself.

For example, it should always be the case that A UNION B = B UNION A. And if a row is in (A UNION B), then it is should be in A or B or both.

But, nope, implicit casting breaks both of those expectations.

Ever tried an inner join on a compound key instead of a surrogate?

Are you really crying about typing FROM FOO F JOIN BAR B ON F.A = B.A AND F.B = B.B AND F.C = B.C? It's SQL, it's verbose, deal with it.

Ever tried manipulating the PK's, because when the business says "it will never change" they lied!

Yes, it's like any other schema change. You might have to do some views to show the old natural key until clients are updated, but it's not that hard.

2011-06-28 Reply Admin

Matt Westwood:
Typical humourless feminist. Bet you're American as well.

Heh.

How many feminists does it take to change a light bulb?

One. And it's Not Funny!

2011-06-28 Reply Admin

K:
But then I want to ask something more serious: Why would anyone ever not want a unique, autoincremented integer to be the primary key? I learned in uni that there are many ways to do PKs, but all I have ever implemented was an extra integer with no other meaning than "this is the PK".

Example: Say you have two tables A and B, and you're a fan of surrogate keys, so A has a column ID, and B has a column ID, both of which are autogenerated surrogate keys.

Now lets say, further, that there's a many to many relationship between A and B. So you create a table C with two columns AID and BID that have foreign key constraints to A and B, respectively.

Clearly the natural key to use is the compound key AID and BID.

Given that many-many relationships are fairly ubiquitous, this seems like a pretty common use case.

2011-06-28 Reply Admin

boog:
"Well," Rick sighed, "the user was right."
And with that, the universe collapsed inward and promptly ceased to exist.

And was promptly replaced with one even more weird and wonderfull then the last.

2011-06-28 Reply Admin

Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

Yeah, understand your argument but entities can evolve too as the system grows. I had to support (ugh) a system with 90% of the business logic implemented at database level. Changes on tables fired triggers called stored procedures making changes on other tables firing triggers...

Now, the big deal here was the EmployeeNumber (Employee.EmployeeNumber as a candidate for an entity defining its own PK, yes?). Well that's what the designers did. EmployeeNumbers spread through the entire schema as FKs. And what happened when they implemented a feature to change an EmployeeNumber? (happens, contractor workers suddenly becoming corporate workers and need their emp-numbers to match what SAP says. All hail mighty SAP!)

They used a transaction encapsulating a whole lot of INSERT and DELETE statements. Remember what I said about the business logic implemented with triggers? A lot of extra logic to maintain to prevent orphaned records. And customized tables was a a different animal altogether.

So... not saying "always use auto-num" (design principles and all that), but entities (even well defined ones) evolve too. If they used an auto-num, the entity could have been mutable and a lot of issues would have been avoided (even if they didnt choose to do business logic in triggers. Still gives me nightmares).

nonpartisan · 2011-06-28 Reply Admin

Matt Westwood:
nonpartisan:
C-Octothorpe:
Lorens:
her she her she'd She she she

Ahh, there's your problem...

Way to be a sexist pig, considering that the first WTF mentioned was a guy. Quote:

Lorens:
I have worked with at least two supposedly competent persons who did not understand the concept of a primary key.
The first one (third-year student) set the key to be the client name plus the client's address. Many of our clients were multi-located. When I asked what he thought would happen when a client moved, he proudly replied that he'd thought of that: the code that did the address change went through all the related tables and updated the key.

Typical humourless feminist. Bet you're American as well.

Way to know how to rile people up . . .

I'm far from a feminist. I just calls 'em as I sees 'em. I actually think the female was the bigger WTF as she had 10 years' experience. I think it was more a WTF expecting a third-year student to be competent (and, in my view, I expect "competent" to mean "to be solidly knowledgeable in the basic concepts of database design and server administration") enough to reject this idea outright. Even if I did come up with this idea as a student, I know my WTF bell would be ringing and I'd be more inclined to take that idea and ask Lorens about it first before I implemented it.

2011-06-28 Reply Admin

"Well," Rick sighed, "the user was right."

Not really. The upgrade just made the error more likely to occur; it exposed the problem, it didn't cause it.

2011-06-28 Reply Admin

Meep:
K:
But then I want to ask something more serious: Why would anyone ever not want a unique, autoincremented integer to be the primary key? I learned in uni that there are many ways to do PKs, but all I have ever implemented was an extra integer with no other meaning than "this is the PK".
Example: Say you have two tables A and B, and you're a fan of surrogate keys, so A has a column ID, and B has a column ID, both of which are autogenerated surrogate keys.
Now lets say, further, that there's a many to many relationship between A and B. So you create a table C with two columns AID and BID that have foreign key constraints to A and B, respectively.

Clearly the natural key to use is the compound key AID and BID.

Given that many-many relationships are fairly ubiquitous, this seems like a pretty common use case.

That's the main one. I've also been known to use short varchar PKs for tables that are rarely updated, such as a list of countries PKed on their two-letter country code. (Even if one has to be updated, an ON UPDATE CASCADE takes care of it.)

boog · 2011-06-28 Reply Admin

AndrewH:
"Well," Rick sighed, "the user was right."
Not really. The upgrade just made the error more likely to occur; it exposed the problem, it didn't cause it.

Hey, Rick said it. Take it up with him.

2011-06-28 Reply Admin

anon:
Well. There are people who thinks that natural keys should be used

And those people would be Wrong.

Remy Porter:
Your primary key should be a unique identifier for your _entity_. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

I worked for a large nationwide retail chain once. Primary key for the store table was store number -- naturally. In the inventory table, which told us how many of each item was on hand in each store, there was a foreign key to store number.

A small store typically stocks, at one time or another throughout its life, at least 100,000 items. Larger stores have a million or more.

A million items times a thousand stores = a billion rows in the inventory table.

The first digit of the store number identified which distribution center it got its merchandise from. Oops!! Danger! Danger! There is meaningful information stored in the key!!!

Sure enough, the business grew (dammit) and we had to open a new distribution center.

Store numbers all had to change.

A billion rows had to be updated with their new store numbers. That's not just a billion writes. Each one has to go through the foreign key constraint logic, and you have to change the store number in the same transaction as you change the million rows that are children of that store. Needless to say the corporate database was effectively "down" for a while.

All because somebody was ignorant of the basic idea that a key should never hold any meaningful information.

If you are worried about accidentally creating duplicate store records, put a unique index on the store number even though it is not the key. Duplicates can't be created, but when a store number changes it is just one update to one row. The relationships still hold, and nothing else needs to be updated.

Now, who wants to talk about using GUIDs for row IDs instead of a sequential number?

2011-06-28 Reply Admin

Anon:
BentFranklin:
Unique, auto-numbered, never null... Shouldn't TRXID have been the primary key all along?

Yes, well done captain obvious.

Shouldn't that have been 'Kathryn Obvious' ??

2011-06-28 Reply Admin

Databastard:
All because somebody was ignorant of the basic idea that a key should never hold any meaningful information.

Amen brother!

Sometimes I feel like a dataless key; Sometimes I feel like a dataless key; Sometimes I feel like a dataless key A long ways from home....

2011-06-28 Reply Admin

dkallen:
The obvious solution is to catch the error, loop until SYSDATE changes, then retry. The loop should go very fast on the new hardware.

+1, that made me lol or at least gol

2011-06-28 Reply Admin

Bit cynical, eh Remy?

"...because they didn't like change..." - Do you think maybe the users are intelligent enough to realise that when a system breaks, and there has been a recent change (especially when the problem starts immediately after that change is deployed), then there's a good chance that that change is somehow related?

Reasonably stable systems (even when they're crap), tend to be reasonably stable - that is, they don't fallover one day just because they can....

2011-06-28 Reply Admin

Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

So what would be a good primary key for a 'Customer' ?

2011-06-28 Reply Admin

Meep:
C-Octothorpe:
frits:
Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

unique identifier != metadata.

The name "Remy Porter" is a surrogate key. Maybe you should change your name to a collection of your human attributes that makes you distinct from all other humans?

I guess the sum of all your human attributes may identify you uniquely... Except you're never allowed to move, change your name, your age, height, weight, hair color, and god forbid, gender...

But giving a person some 10-digit number will work perfectly. They just have to never lose it, lie about it, and they just have to trust us never to hand it out to anyone who asks.

Doesn't America have those social security number things? In Australia we have tax file numbers, same thing but 9 digits. I guess the tax office could put together a natural key... or not.

2011-06-28 Reply Admin

JD:
Meep:
C-Octothorpe:
frits:
Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

unique identifier != metadata.

The name "Remy Porter" is a surrogate key. Maybe you should change your name to a collection of your human attributes that makes you distinct from all other humans?

I guess the sum of all your human attributes may identify you uniquely... Except you're never allowed to move, change your name, your age, height, weight, hair color, and god forbid, gender...

But giving a person some 10-digit number will work perfectly. They just have to never lose it, lie about it, and they just have to trust us never to hand it out to anyone who asks.

Doesn't America have those social security number things? In Australia we have tax file numbers, same thing but 9 digits. I guess the tax office could put together a natural key... or not.

You give your social security number to every site you wanr to make a purchase from? hmmm

2011-06-28 Reply Admin

C-Octothorpe:
Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

You're referring to natural keys, which unfortunately I have used and been burned by them in the past. Also because natural keys are occasionally maintained by users (think something like ItemNo), and you can guess what kind of rat-hole this could go down...

Surrogate keys, IMO, are the safest route, although I do agree that they are meaningless to the entity. They also usually allow increased performance when performing joins because you're comparing a single unique value vs multiple values which can be int, datetime, varchar, etc.

If you need uniqueness, just add a unique constraint on them, therefore you have them indexed, unique plus the gains from not having to manage the PK through triggers manually and you know (locally) you won't have ID collisions.

The original point is one of them "works in theory" things...

The problem is, how do we identify 'Unique data' for a person... Name? No Name Surnmae? No, Name Surname DOB? Still not unique.... Name Surname DOB Address? Still not guaranteed to be unique (although likely). Social Security Number? Bingo!!

Actually, in hindsight, that;s how the world works

2011-06-28 Reply Admin

Nope, I'm Australian and I don't have one. But as an argument for natural keys over surrogate keys do you argue that you should have a natural key instead of a social security number?

2011-06-28 Reply Admin

Matt Westwood:
C-Octothorpe:
Lorens:
her she her she'd She she she

Ahh, there's your problem...

Naughty ... I shouldn't, but: +1 PML

Didn't we already scare away one foxy lady like that????

2011-06-28 Reply Admin

boog:
C-Octothorpe:
I guess the sum of all your human attributes may identify you uniquely...
Is addition really applicable? What if two distinct combinations of human attributes add up to the same value? Collisions seem almost guaranteed to occur at some point.

this is exactly why we use '+' for string concatenation - some one sued the word 'sum of' when they meant 'concatenation of'

captcha: nibh

2011-06-28 Reply Admin

anon:
Lorens:
BentFranklin:
Unique, auto-numbered, never null... Shouldn't TRXID have been the primary key all along?

I suspect the original programmer did not understand the goal of a primary key. TRXID must have been added by somebody else, probably someone who left in disgust before he could implement it or document it.

I have worked with at least two supposedly competent persons who did not understand the concept of a primary key.

A long time ago in a galaxy far far away there were 2 different kinds of programmers, database programmers and gui or interface programmers. The later always got into trouble when you started talking about database normalization, primary keys, table relation ships etc etc. It's been awhile since I wore my database programming hat but I'm going to guess it's pretty much the same today as it was back then.

Hmm...yeah, apps programmers amde things work irrespective of the data, and DB programmers complained that the data was crap.

Things haven't changed, but DB progs who complain that Apps progs do stupid things on the DB only have themselves to blame for not insisting that THEY design (at least control) the database, and the AP looks after the rest. The reason most Databases are crap, is because they are initially designed/created by an apps prog not a DBA. We should all be on the same side, instead it's all the DBAs fault (or vice versa if you're a DBA)....

<note> Some pedantic turd will probably say that there is a difference between a DB programmer and a DBA. They're probably right, but that's really the problem, isn't it...do DB prgroammers still exist?

2011-06-28 Reply Admin

Zebedee:
Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

So what would be a good primary key for a 'Customer' ?

The order in which they entered your system - that's natural not auto-generated....Oh wait....

2011-06-28 Reply Admin

JD:
Nope, I'm Australian and I don't have one. But as an argument for natural keys over surrogate keys do you argue that you should have a natural key instead of a social security number?

Nope, I'm pretty sure I never said that. Try reading a bit slower, take your time, it's ok.

2011-06-28 Reply Admin

I'm so sorry you poor column :( Here's a hug! <hugs>

2011-06-28 Reply Admin

Zebedee:
You give your social security number to every site you wanr to make a purchase from? hmmm

It is illegal to require someone to provide their Tax File Number except in a few situations.

Using TFNs as a primary key is still a bad idea, because not everybody has one.

2011-06-28 Reply Admin

Zebedee:
JD:
Nope, I'm Australian and I don't have one. But as an argument for natural keys over surrogate keys do you argue that you should have a natural key instead of a social security number?
Nope, I'm pretty sure I never said that. Try reading a bit slower, take your time, it's ok.

Funny, I was about to say the same thing to you. MY POINT was that we have the equivalent of surrogate keys in real life because it's too hard to identify people using natural keys. That's why the tax office distributes TFNs. Real life surrogate keys.

What that has to do with distributing a tax file number mystifies me. If you have a primary key assigned to apples, and you send a message to an entirely different company that also has an apples item do you share your primary key?

In future try not to misunderstand my post, reply something out of context then rip on me for low reading comprehension.

Captcha: nobis

2011-06-28 Reply Admin

Jacques Chester:
Zebedee:
You give your social security number to every site you wanr to make a purchase from? hmmm

It is illegal to require someone to provide their Tax File Number except in a few situations.

Using TFNs as a primary key is still a bad idea, because not everybody has one.

Clearly I didn't explain myself clearly enough. I'm suggesting that the Tax office uses the TFN as a kind of surrogate primary key, and everyone who pays tax does indeed have a TFN. I'm in absolutely no way suggesting that anyone uses a TFN as a primary key in their database, that would be insane. But the concept of assigning meaningless numbers to people has precedent everywhere. So create your own TFN. Call it a customer reference number, bam.

Kazan · 2011-06-28 Reply Admin

Meep:
C-Octothorpe:
frits:
Remy Porter:
Omnomnonymous:
Les:
If N_TRXN_ID is auto-populated by a trigger, it has no business being in a primary key.
Think about it. N_TRXN_ID will always be unique, even if your code starts inserting the same row over and over. You may as well not have a primary key; the end result is the same, and you'll save a little bit of overhead.

Not sure if troll?

Actually, I agree with him. Your primary key should be a unique identifier for your entity. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.

unique identifier != metadata.

The name "Remy Porter" is a surrogate key. Maybe you should change your name to a collection of your human attributes that makes you distinct from all other humans?

I guess the sum of all your human attributes may identify you uniquely... Except you're never allowed to move, change your name, your age, height, weight, hair color, and god forbid, gender...

But giving a person some 10-digit number will work perfectly. They just have to never lose it, lie about it, and they just have to trust us never to hand it out to anyone who asks.

SSN isn't a unique key. that was actually a bug in some ecommerce software that i helped reimplement the data access layer and the database because they incorrectly assumed that SSN could be the primary key.

replaced it all with auto increment unique surrogate key column. so much better than trying to make a derived key.

derived keys WILL have collisions.

2011-06-28 Reply Admin

Jacques Chester:
Zebedee:
You give your social security number to every site you wanr to make a purchase from? hmmm

It is illegal to require someone to provide their Tax File Number except in a few situations.

Using TFNs as a primary key is still a bad idea, because not everybody has one.

It's not illegal to refuse to provide your TFN, even when dealing with the Tax Office itself. Read the top of your Tax Return (for Australians, you should have just received yours in the mail).

From memory (because mine is not in front of me): "It is not an offence to not provide your TFN, but not doing so may cause delays in the processing of your Tax Return"

Kazan · 2011-06-28 Reply Admin

Databastard:
anon:
Well. There are people who thinks that natural keys should be used
And those people would be Wrong.
Remy Porter:
Your primary key should be a unique identifier for your _entity_. Which means it should convey some meaning about the identity of the entity. In general, slapping an auto-numbered PK on a table is a hack. A well defined entity should define its own primary key.
I worked for a large nationwide retail chain once. Primary key for the store table was store number -- naturally. In the inventory table, which told us how many of each item was on hand in each store, there was a foreign key to store number.
A small store typically stocks, at one time or another throughout its life, at least 100,000 items. Larger stores have a million or more.

A million items times a thousand stores = a billion rows in the inventory table.

The first digit of the store number identified which distribution center it got its merchandise from. Oops!! Danger! Danger! There is meaningful information stored in the key!!!

Sure enough, the business grew (dammit) and we had to open a new distribution center.

Store numbers all had to change.

A billion rows had to be updated with their new store numbers. That's not just a billion writes. Each one has to go through the foreign key constraint logic, and you have to change the store number in the same transaction as you change the million rows that are children of that store. Needless to say the corporate database was effectively "down" for a while.

All because somebody was ignorant of the basic idea that a key should never hold any meaningful information.

If you are worried about accidentally creating duplicate store records, put a unique index on the store number even though it is not the key. Duplicates can't be created, but when a store number changes it is just one update to one row. The relationships still hold, and nothing else needs to be updated.

Now, who wants to talk about using GUIDs for row IDs instead of a sequential number?

design wtf... a record for each item per store?

should be a stores table, a distribition centers table, and an items table.

there should be an associator table from store to distribution center, and an associator table from store to items table. the store<-->items associator would hold an inventory quantitiy as well. you could also then have an associator table from distribution center o items.

huge amount of duplicate data eliminated from the DB.

Kazan · 2011-06-28 Reply Admin

with the lot:
Social Security Number? Bingo!!

SSN is not usable as a key, you can have collisions. there are two different guys in iowa with the same SSN. as noted above i helped reimplement software that was implemented based on the same flawed assumption you just made.

Scarlet Manuka · 2011-06-29 Reply Admin

Kazan:
Databastard:
I worked for a large nationwide retail chain once. Primary key for the store table was store number -- naturally. In the inventory table, which told us how many of each item was on hand in each store, there was a foreign key to store number.
A small store typically stocks, at one time or another throughout its life, at least 100,000 items. Larger stores have a million or more.[...]

design wtf... a record for each item per store?

should be a stores table, a distribition centers table, and an items table.

there should be an associator table from store to distribution center, and an associator table from store to items table. the store<-->items associator would hold an inventory quantitiy as well. you could also then have an associator table from distribution center o items.

huge amount of duplicate data eliminated from the DB.

Reading comprehension fail. The stores<->items associator you describe is what they did have. A large store, over its lifetime, can have over a million distinct products.

I'm assuming that the store number <-> distribution center link needed to be preserved for business reasons. If not, you are correct that the simplest way to handle it would be to add a store <-> distribution center table.

Otherwise, the solution to the problem should have been to convert the existing store number column into a surrogate key (store_id or some such), and add a new column for the stores table with the store number on it. Add a unique index to the new store number column.

Of course either approach requires updating your application logic to use the new design. I suppose if that's not an option then you're stuck doing it the hard way.

Also, I was pleasantly surprised by the ending. I was expecting that the solution would be to convert D_TIMESTAMP into a fractional seconds datatype...

2011-06-29 Reply Admin

dkallen:
The obvious solution is to catch the error, loop until SYSDATE changes, then retry. The loop should go very fast on the new hardware.

Nice approach :) What gets me though is that according to TFA: "Their testing had shown that the beefy CPU and half-ton or RAM had improved their benchmarks. " So how come they didn't experience the problem during benchmarking and implement your ingenious solution?

2011-06-29 Reply Admin

In MySQL (though I don't have a client handy to test the syntax):

select min(col) from tbl where col not in (select a.col from tbl a inner join tbl b on a.col<b.col);

The Key to a Good Schema

Leave a comment on “The Key to a Good Schema”