- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
What if your customers don't tell you the phone number? Will you put made-up phone numbers into your database so you have your unique key?
Admin
Well just wait until you get 2 comments posted at the exact same second...
Admin
Well, a timestamp is fine until two people post comments at the same time.
Aside: a forum WTF: "Something didn't quite work out ... " is not what I call a helpful error.
Admin
Yeah I thought of that, I add one second to it in that case.
Admin
Honestly, for a simple low-profile application like a single-user blog, there is hardly a database design bad enough to stop it from working.
Admin
I prefer working with database systems that allow for more than one unique index for each table. Anyway, when we talk about 255 million rows, I would not want it to break for those 10 million "John Smith" who have no phone.
Admin
This is rather appropriate for me, because at one point in my life, I lived at "18 John St" in one city & state, and worked at "18 John St" in a different city & state.
However, one thing which I haven't seen anyone comment on, is the fact that
And a whole bunch of other variation are (as far as the USPS is concerned) all the same address. So, whether or not the system can find your record depends on how you choose to abbrev your address that day.
Also, say you're the second person at "123 Main St". How are you supposed to know that in the future, when they ask you for you address, you're supposed to say "2 123 Main St"? And, how is the service rep, taken this information over the phone supposed to know that it's "2 123 Main St", and not "2123 Main St"?
That "developer" really shouldn't be allowed near computers (or any electrical device).
Admin
I think you meant for your post to read "This is frowned on *by some people*...".
There is nothing inherently wrong with the design of using artificial keys. There are other constructs for managing uniqueness other than just the primary key - alternate keys and unique indexes - there's nothing special about using the primary key to enforce uniqueness. What is special about the primary key is that 1) it needs to be unique and unchanging, 2) it's values will be propagated to all other tables with foreign key relationships and 3) it will be heavily used in joins.
'Natural' keys tend to suffer some deficiencies here. The unique and unchanging natural data that you select frequently turns out to be neither in practice (people make entry errors, phone numbers change, names change, SS#'s aren't always available for everyone, etc., etc.). When the data does change the fact can't simply be updated in a single place, it has to be updated in every single place referring to the original table as well. Systems using natural keys that I've seen tend to develop key bloat when the primary key of dependent tables includes all the columns from the parent tables original key plus additional columns for the child - I've seen systems with 7 or 8 part primary keys due to a couple layers of nested dependency. Finally the size and efficiency of your primary key index for joins is at the mercy of what your natural key happens to be - is it a simple number (small and fast to compare, many records to an index page) or a concatenation of several text columns (from your example a phone number and a customer name).
Artificial keys suffer none of these deficiencies. The only legitimate argument you made against artificial keys is that you'll have one additional index on your table for the artificial value - though typically that index is going to be better for joins than an index on a natural key is going to be.
Admin
That is so hilarious!
Admin
I don't get it, you should be unaware of the "facts" of the table.
What I don't understand about using "natural" data for the Primary Key. How do I relate another table to this database?
Table1 (customertable)has PK with Customer Name and Primary Key Table2 (ordertable) is a list of Orders. One of the fields points back to the custoemrtable. What is this field? A name/phonenumber?
Admin
The first six are. Why you think "court" and "avenue" are evaluated to be "street" is beyond me, though.
I worked on USPS software for two years and can assure you they aren't. You can also look at
http://www.usps.com/ncsc/lookups/usps_abbreviations.html
ok
dpm
Admin
I just say "I don't want that job." It's worked so far. I also like to say "No, I don't know COBOL" even though I do. Who could blame me?
ok
dpm
Admin
Not only are duplicate SSNs used illegitimately, but SSNs are legitimately re-used as people die and are born.
This causes problems for companies that need to hold data for a long time, like life insurance companies.
Admin
Because the USPS officially ignores the suffix and considers all of them to be just "1 Forrest". New streets must be unique within the town irrespective of the suffix.
Admin
Admin
That's plainly not true. I assure you that 123 8th St and 123 8th Ave are two different locations in Manhatten
Admin
I code all of my IDs by converting them to binary strings where 0s are tabs and 1s are spaces. This encrypts the IDs so that they can't be seen when printed out, because confidentiality is extremely important in our office due to HIPAA.
Admin
hence he said "new streets"
Admin
I like unique keys. I like to think of databases tables kinda like a huge multidimensional array, with the first dimension being the index/primary key. My tendancy in db programming is to always use a field for the key that autoincrements. No, I don't do alot (if any) of db programming. Bottom line though, is that I've never had any problems with this situation.
Admin
I cannot imagine why you say that. At what point in processing do they get "ignored"? I've worked on the code
and the database-generation scripts, so I know they don't --- what have you done?
And if that were true, how would mail get delivered? God knows there are thousands of examples against your claim:
anyone from Atlanta care to count how many "Peachtree" names there are in that area? Jeez. The first google entry
for (atlanta "how many peachtree") is http://www.newcolonist.com/peachtree.html which features a picture of the
signs at the intersection of Peachtree Road and Peachtree Avenue. http://www.newcolonist.com/peachtree.html
ok
dpm
Admin
> ...but you would want your PK to be the phone number PLUS the caller's name. This identifies a fact that is UNIQUE to each row in the table, prevents duplicates, and it's not bad design.
Actually, for this data you should use a generator to create a unique number that is unrelated to the data. The phone number plus name should be a seperate, unique constraint. This unique constraint is also called a candidate key.
Now the code actually reflects what you're trying to accomplish, and the unique constraint can be changed if the rules change. You may get a case where John Smith is assigned the phone number that used to belong to John Smith.
If you are concerned about retrieval speed, that would be handled by an index based on the query and the potential plans generated by the optimizer.
Rick DeBay
Admin
Which wouldn't make a damned bit of difference to the USPS databases and applications.
What would it matter if "new" streets are unique, when you still have to deal with the
already existing thousands of duplicates?
ok
dpm
Admin
Lovely, lovely! Did the cities have different names?
In one newsgroup, there was a thread on addresses, and one poster wrote of classifying addresses by even or odd. (This is especially applicable in Canada since the Postal Code will typically be different for each side of the street.) I, myself, live at 1313 1/2 Railroad Avenue. Yes, that is a one-half. It does cause some doubletakes.
In Vancouver, BC, Canada, there is an address "South Foot of Main Street". It was in the Postal Code guide the last time I checked for it. (No, you can not simply fill in 0. That would be at the other end of Main Street.)
Sincerely,
Gene Wirchenko
Admin
So you're to blame!
http://compsoc.dur.ac.uk/whitespace/
Admin
Just to add one more vote in favor of that statement being untrue. I have worked for a company that did direct mailing(we had our very own post office), and worked quite a bit with address issues. Street suffixes are not ignored, and if they were it would be a mess. As another poster said, even if the requirement was made that new streets had to be uniquely named, this solves nothing because of all of the pre-existing street with non-unique names.
And another thing: I just moved into a brand new neighborhood, and we have a street named SE 89th Terrace, that runs about 50 yards parallel to SE 89th Street. So if this law/rule/whatever requiring unique name exists, it must be very new.
Admin
Artificial keys require more lookups and therefore often additional joins. And of course they need additional space, though this might be the smallest concern.
In this forum, we've had a discussion about this topic a few weeks ago; and since I'm generally pro-artificial key (many systems use them for every single table), I've kept an eye on problems related to artificial keys.
My perception is that they hurt most where they are most useless.
The prerequisite is that my company makes software that is supposed to be "standardized", but in fact it is heavily customized for each client, to the point where it is almost individually developed. Although all clients generally need the same tables - product, customer, order, order_detail etc., the structure of the natural keys differ. E.g. for some clients, the product number is unique, while another clients needs an additional field, e.g. "company_number", to make the unique key. Using artificial keys for all those tables greatly improved reusability of the code between clients. On the other hand, some tables describe internal features of our software - parameters, code-description-mappings, messages etc. In many cases, there would be a "natural" key, and since these tables are not directly related to the customer's business, so there is the only reason for using the autogenerated IDs as unique key has been our stubbornness (called "consistency"). Anyway, looking back, it was a rather poor decision. Not a "OMG we are doomed" bad one, but definitely something I will do different the next time I have a chance to.
Admin
Again, yes. I live at a Woodstone Place, but the adjoining street is Woodstone drive. There are overlaps in the numeric ranges, as well.
Admin
Generally speaking, the artificial versus natural key debate comes down to this: You'll never do anything bad by using artificial keys. There are some cases when you can do better using natural keys and lots of cases when you can do worse using natural keys. If you're not 100% about using a natural key, go with artificial.
Admin
My thoughts exactly. If you are equipped to handle the special case of incrementing the timestamp on a collision, you should have just incremented a counter in the first place. the hack is just as much work as doing it right.
In general, you have a bad design because you can never scale beyond 60 transactions per second.
Admin
WTF!
I can't type anything in the reply box, only the html box!</>
Admin
"I've seen systems with 7 or 8 part primary keys due to a couple layers of nested dependency"
Actually, having all of the keys of all of the parent tables can be a help in many situations. So you have a DB structure that involves 7 or 8 levels of nesting, if each table had it's own artificial key and only the key of it's immediate parent, you would have to join between all 7 levels of tables to get down to the detail tables for a specific top level entity. If all of the child tables contain the key of every table above it, it's trivial to skip the intermediate tables and join straight to the details you need.
Mind you this argument works with either natural or artificial keys. Sometimes it can be useful to do both, so each table has it's own single column primary key, and an alternate key that is based on the parent table + whatever is naturally unique.
Admin
Interesting... When logging in with FireFox I can't type in this box and there are two tabs at the bottom.
I ran into this situation at a very large defense contractor who builds state of the art aircraft. Some of their systems allowed users to "decorate" data in fields to make it unique when there was duplication. Needless to say it was a nightmare to parse their data...
Admin
Yep, me too. I work in criminal justice. We come across non-unique SSNs all the time, not to mention lots of criminals with multiple SSNs.
Inaccurate data aside, most people don't realize that occosionally the same SSN is legitimately assigned to more than one person.
Admin
He's using jargon particularly common in the OLAP subdiscipline of database design. Data is broken into "facts" and "dimensions". Then a "star schema" is built placing the fact table at the center and relating it by foreign key to several dimension tables. The primary key of each record in the fact table is a composite of the foreign keys of all the dimensions.
For instance, say you were to build an OLAP environment for sales data. You might build dimensions for product, sales team, and quarter. Then you'll build a fact table that has, for every combination of product, sales team, and time period, a set of values for gross sales, net sales, discounts, etc. This structure is very efficient for certain types of analysis.
The reason you don't understand this is that there is lots of data for which there is not a reasonable natural key, so your example is rigged to fail. To build a more reasonable example, say you were building a system to handle stock trades. A composite key of ticker symbol and exchange would be an appropriate natural key. (However it would not actually be appropriate for a portfolio management app, because tickers symbols can and do change over time.)
Admin
Sure, go ahead and add indexes and constraints willy-nilly. Who cares about insert/update performance.
Admin
I know someone like that? Is this Sybase code for a major investment bank in Manhattan?
Admin
Depends on your platform. Some suffer from this deficiency and others don't.
Admin
Why would stock ticker and exchange be a reasonal natural key?
And that still doesn't answer my question: Table1 has a PK of Stock Ticker and Exchange Table2 indexes back into Table 1...
WHAT value gets placed into this field in Table2? Is it "GOOGNYSE"?
I truley don't know, and am trying to understand what is going on.
Admin
after reading this post, and examining the website it links too, i am now converting all client code to this new language!
AWESOME!
Admin
Because when you have millions of records -- and growing by thousands a day -- you can treat a couple thousand as a special case.
But, you are looking at it from the wrong point of view. If you have a letter for "1 Forrest St" in a town with no "Forrest St" but does have a "Forrest Ave", do you deliver it there or return it?
Admin
Of course you need the same 2 fields in Table2, not a concatenated string. So Table2 probably has a compound key of at least 3 columns.
Admin
That's NOTHING!
Here in .au the austpost are very picky about their addresses
The address:
ClientName
Attn: Contact
123 Some St
Town State PostCode
Will be sent to some magically address linked to ClientName -- sometimes...
Attn: Contact
123 Some St
Town State PostCode
Will be returned saying incomplete address -- sometimes...
ClientName
Attn: Contact
123 Wrong St
Town State PostCode
Will be returned saying incomplete address -- sometimes...
I'm still yet to figure out wtf they are thinking?
Go go Australia Post!
Admin
The MOST interesting part of the conversation was when the OWNER claimed that "MOST" of the marketing post was not delivered. Never mind the stupidity of the developer... what are the odds that this affected "MOST" of the clients? It was either a very small database OR the clients swap addresses often.
Admin
Why do you people give someone your phone number just because they ask you for it? Just say no.
Admin
This post should have also displayed the attributes of the record in question. That way we could really see the extent of his poor judgment. It would be HORRIBLE if valid candidate keys existed, but the developer overlooked them.
This guy definately needs to read a book on database design. Mistakes like these just shouldn't happen unless you are a total beginner and have no idea what properties must hold in a primary or candidate key. Though, the company is at fault here too. They had nobody that could validate the work that was being done for them.
I agree that anyone working in the software field should be licensed to do their work. Other engineers must be licensed. Why should software engineering be any different?
Admin
What's wrong with you? Please don't tell me you are too fucking stupid to use a simple forum, yet still feel competent enough to comment on other's work.
Admin
I don't believe this is true. Where I live, there is a point where Chennault Beach Dr. and Chennault Beach Rd. meet. I'm certain that the Post Office would be rather upset if a letter was addressed incorrectly between those two.
~TuxGirl
Admin
Manni: An Area Code + Exchange + last four digits is a hell of a lot more unique than "1 Main Street" would ever be.
The heuristics to identify a possible duplicate aren't all that difficult. But you have to put some human imput into the process. Things that work for a primarliy Anglo Saxon database won't work for slavic names, or asian names. One has to look at results of match pass, and be able to adjust parameters on the fly.
I've been doing this stuff for 20-odd years. Hardly a day goes by without an "oh, shit"
Admin
The software analyzes the accompanying image, and treats the response accordingly.
Admin
"If you don't give me your telephone number, we won't be able to verify your address, and your order may not arrive at its intended destination"
Now Listen Carefully.
Unless you're standing in front of the clerk with a wad of real money in you hand, you're going to have a hard time getting your "stuff" without divulging your telephone number.
It is, after all, the least ambiguous piece of information that your going to give to the person (or software), that (who) is going to be responsible for ensuring that the deal is done. . . i.e. goods are received, AND bankers are satisfied.