- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- It Figures
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Ah, yes. . . Canada Post Address Accuracy Program.
Gene. . . If you flap your wings really hard, you can join us in the Aerie
The Canadian postal code definition was a wonderful thing. I say "was", because some of the codes have become so overloaded. . . and in British Columbia, because of population expansion, they've started running out of Postal codes, to the extent (no kidding), that they may have to invent new alphabetic characters for the ANA NAN format.
'
Admin
Ok, so you have various IBM departments as customers. What's your phone number, sir?
1-800-CALL-IBM...
3,700 customers later...
I'm a fan of the "unnatural key" method, but at times, I've had to use natural primary keys. Typically, I'll use a multiple primary key. One that might sorta work in this case:
create table customers (
name varchar,
address varchar,
phone1 varchar,
phone2 varchar,
primary key (name, address)
);
In this case, it's the combination of name and address that uniquely identify the customer. However, this breaks quickly IF THE CUSTOMER MOVES. Thus, having an un-natural customer ID is best, even if you use phone number, name, or address to find the customer.
Admin
Gnaa, that reminds me of my first ever IT project: the change of Germany's zip code from four to five digits in 1993.
In that project we discovered DDFH (database design from hell): Two database tables were connected to each other by ... the address! WTF? But it became more worse: these two tables were updated by two different applications so the data diverged and two formerly connected records were not connected anymore. The whole project team desired to meet the developer of that crap in a quiet corner of a public park at 2:00am.
And BTW: Can you imagine how inventive users can be by creating address abbreviations?
Torsten
Admin
>I wish I knew the name of that developer because I think he found his way to the project I'm currently working on. He's our 'project manager'
What is a project manager doing specifying table design? He should be managing the project.
Admin
If I move house my DB primary key should not change. Other records may have linked to me by storing the value of that primary key.
Admin
Theoretically, you could run cascading updates to change those other records, too... But IMO you are right, using the address as primay key is kinda weird.
Admin
I have to agree, not to spark up the old debate, but artificial ids are so often the best way forward.
I actually have to say i prefer to use NEWID() in sql server, because programmers so often like to infer some sort of meaning in the order of records generated by identities, which is just a huge no no.
Admin
Again not truly terrible.
As we all know dates are just a humoungous pain whatever the weather, especially in international corporations. Never mind that things like the google api consider all dates to be Julian (something of a wtf in itself, though they're bright lads down google way so they must have a good reason).
The SQL server ascii standard for dates is a char 8 yyyymmdd so storing a date in that form has in my humble experience led to less problems that datetimes, front ends can then do a simple formatting, or allow regional settings to show it as appropriate.
re the use of integer, again not shocking if that integer is the number of seconds since la la la 1798 (again something of SQL server design wtf, just how far back do customer records go? what are we bankers to the pope or something?)
Admin
<FONT face=Arial>Not if 2 customers share a house...</FONT>
Admin
That was in response to someone saying that the phone number would always be unique
Admin
Or most of the customers were from just a few different companies. I can quite easily imagine the scenario where a company has a purchasing department of 5 people, and each person is registered in this database. In that case, only 1 out of the 5 people would have mail addressed to them correctly.
Admin
Excuse me? Storing dates as integer numbers (preferably 64bit though) in seconds since 1 jan 1970 is imho one of the easiest ways to work with date/times. This is the way it works on any *nix system in the world.
Most databases support a function to convert from/to unixtime from whatever date-format they export, but internally they usually use 32 or 64bit ints, which are easy to index and very fast to work with. On most platforms functions to create/convert dates in unix date-format are present, and if it isn't there by default, there are plenty of free implementations of it you can find easily on the net.
The real power of using dates as integer numbers is calculating and comparing, if you know basic math operators like +, -, modulo, and the facts that one minute is 60sec, one hour is 3600secs, and one day 86400secs. Not that hard I think? Comparing numbers is by far a lot easier for the developer than working with some complicated date/time library.
Admin
The real question is "reasonable natural key for WHAT?". And I can't think of a good answer even though I work on such a system. The closest I can think of is price data, but then you'd still have to add a timestamp. In that case it's actually a good example where an artificial key would be stupid.
As someone else has said before: you'd have two (three, in my example) separate columns in Table2.
Admin
Storing dates as Integers is a pain in the arse for getting the information back that you need. as you have to know what the base date is for the DB that your using, + the base date for the client application, and Add / Subtract the difference from the value before.
The system that I work with has dates stored as an integer in 1900 base format, windows generally uses 1904 standard, so I have to convert to and from oracle dates with the ugly ToNum(ToChar(ToDate( syntax, it's a pain in the arse, and terribly slow when doing bulk operations.
DJ
Admin
As far as I know, both MSSql Server and Oracle store dates internally as a floating point number of days (and portions of day). MS use 1/1/1900 as it's zero point.
<FONT color=#0000ff size=2>select</FONT><FONT size=2> </FONT><FONT color=#ff00ff size=2>cast</FONT><FONT color=#808080 size=2>(</FONT><FONT size=2>0.5 </FONT><FONT color=#0000ff size=2>as</FONT><FONT size=2> </FONT><FONT color=#0000ff size=2>datetime</FONT><FONT color=#808080 size=2>),</FONT><FONT size=2> </FONT><FONT color=#ff00ff size=2>cast</FONT><FONT color=#808080 size=2>(</FONT><FONT size=2>2958463 </FONT><FONT color=#0000ff size=2>as</FONT><FONT size=2> </FONT><FONT color=#0000ff size=2>datetime</FONT><FONT color=#808080 size=2>),</FONT><FONT color=#ff00ff size=2>cast</FONT><FONT color=#808080 size=2>(-</FONT><FONT size=2>53690 </FONT><FONT color=#0000ff size=2>as</FONT><FONT size=2> </FONT><FONT color=#0000ff size=2>datetime</FONT><FONT color=#808080 size=2>)</FONT>
<FONT color=#808080 size=2><FONT size=1>1900-01-01 12:00:00.000 9999-12-31 00:00:00.000 1753-01-01 00:00:00.000
</FONT></FONT>Admin
You were right-ish up to the point you proposed calculation and comparison of converted numbers. They are an encoding result - you don't mess with encoded data. You are ALWAYS better off using the built-in date/time libraries of the language. Attempting to reinvent a date/time library armed only with the knowledge that 1min=60sec, one is sure to end up with a WTF the examples of which we have all seen here. Please imagine adding three months (not 90 days, mind you) to any given date encoded as a long integer.
Admin
And, what if the client moves (changes his address) ?
Using anything that can change (address, phone number, waist size) as the primary key seems to be simply useless.
Admin
Actually, this was a huge problem for a telephone company about 10 to 20 years ago.
Don't remember which company and all the details of why this created such a big problem for them.
Sometimes, people that live in one area code end up with a different one a few years later because the area becomes more subdivided. My aunt lives in New Jersey and her area code used to be 908 but now it is 732.
It is hardly ever a good idea to use a column that has data that you may need update as your primary key.
A previous poster suggested using a client ID and that is what I would've done as well.
Admin
Yeah, to me, that is the kicker. You would think that after the first few phone calls, the developer would start thinking or maybe talk to someone else and try to find a solution to this problem. Instead, he just goes and changes the address to an invalid address and he keeps on doing this again and again! Beautiful!
Admin
>>
If you're a telphone services compnay, this really isn't an issue, since if they change phone numbers, they change accounts.
<<
Not true. I work at a telephone company, so I know.
If your phone number changes, depending on the circumstances, it's likely still to be the same account. We need to retain the billing information, listing information, and so forth across the change. I'm glossing over some some complications but in general your main telephone number (you can have multiple numbers) is used as part of the account ID in the billing system. It works. I am not at liberty to say more than that, but it works.
Admin
What about my hometown, in which addresses on roads running NE-SW are something like "123 NE nth ave. st." and roads running NW-SE are "123 NW nth st.". And I've also seen people put the "NE" at the end of the address (i.e. "123 nth ave. st. NE"). And they get delivered just fine. The point is that the address can be pretty mutilated.
Of course, this is why the postal service invented zip codes, which cannot be rearranged. Get the mail to the right post office, and some human will figure out that a letter addressed to "123 fake dr" in a town with no Fake Dr. but a Fake Rd. should really go to "123 Fake Rd.", while also deciphering the difference between "avenue street" and "street".
When you think about it, all the post office needs is a street address and a zip code. They have you put the name and city and state there just in case you put the wrong zip code, so that when it shows up to the wrong town someone will say "that's wrong", and they'll mark out the zip code with a sharpie and very angrily write the correct one and send it on, and you get your mail a week late.
Sidenote- I noticed in Jacksonville, FL that there is a "Blvd. St." and I thought that was a little silly.
Admin
Also, someone can have more than one phone number on the same account. And they might decide later that they want to drop any one of those phone lines (they'll probably pick the one for which they get the most wrong number callers).
Admin
I got so frustated with the braindead functions in the C time library, that I rolled my own. Basically, you just go back and forth between julian day numbers and the calendar of your choice. E.g:
<FONT face="Times New Roman">This is quite useful, too:</FONT>
Admin
<font size="1" style="font-family: verdana;">
<font size="2">WTF is an "avenue street"? WhoTF came up with that mess?
</font></font>
Admin
Last I read, over 160 streets in Atlanta have Peachtree in there name.
Admin
The post office needs just enough identification to get the mail to you.
My uncle once got a letter that was address to
Dan Miller
City, state
No street address or zip. Enough that the letter got to the post office in that city, and from there there was only one person in town with that name.
The reason for street addresses and zip codes is that it is a backup check. The above letter was not meant for my uncle, it was to someone who happened to have the same name as my uncle. The post office was reduced to delivering that letter to all the Dan Millers in town until one claimed it. (And of course each on the way would open the letter to see if it was theirs) If this was a big town (or city) they could not have done that, but there were only 2 Dan Millers in town so they had a reasonable chance of getting it there. (I'm not sure if this is a good plan, but it is what they did - IIRC there was no return address)
A address intentionally has much redundant information so that when (not if!) people make a mistake the post office can still get the letter to the right person. A great protocol that works well despite the poor transmission medium it works with.
Admin
I'm no DBA but I can't think of any situations where an artificial key would be really bad. I can think of situations where a natural key would be bad, and situations where either one would work, though.
I suppose part of this could be to do with what you consider a "key". For me, a key could be a reference number (such as an order reference number in an order tracking system). You might consider this to be a natural key, I suppose, but to me it's not directly connected to the actual data, is often generated along the same lines as an artificial key, and its sole reason for existence is to be unique. To me, this makes it an artificial key.
Admin
You think wrong. Many places use daylight savings time, which means that each year, they have one day that is 23 hours long and one that is 25 hours long.
Admin
In Toronto, ON, there is a major road called "Avenue Road". I always thought that was a bit silly too.
Admin
You forgot the humor emoticon.
I prefer correct data and maintainability myself. If you need many willy-nilly units of indexes or constraints, it's time to reexamine your data model.
Rick DeBay
Admin
>And that still doesn't answer my question:
>Table1 has a PK of Stock Ticker and Exchange
>Table2 indexes back into Table 1...
>WHAT value gets placed into this field in Table2?
>Is it "GOOGNYSE"?
You don't make a composite key by appending two pieces of data and shoving it in one field. The PK consists of two fields. The FK pointing from table2 to table1 would also be two fields.
Admin
In Leeds there's a road called "Street Lane"
And that's not even in America ;)
Admin
Is that French for "Of eBay" ?
Admin
Encoded data? Time is time, one of the measurements of time is seconds, what part of that is "encoded"? Internally - throughout all our applications, unix timestamps are used, only formatting and input conversions are done. For this, and your point of adding i.e. 3 month we indeed wrote some basic 100% platform-portable libraries, which actually isn't that hard - or much work... Reinventing the wheel? Maybe, but try to do this in a quick and easy portable way across database systems in any other way, I dare you, and then we're not even talking about timezone stuff - that really messes things up. If SQL had a decent standardized way of doing date-handling, there would be no doubt that I would use it's functionality, but in it's current state? No tx... I like it clean and separated, a database's job is to manage relational data, not to write half my application in... Posix unix time in GMT it is plz.
Admin
You are wrong. You have listed three distinct addresses: 1 FORREST ST, 1 FORREST CT, and 1 FORREST AVE. The USPS has some very convoluted logic for address matching. Look at the MASS/CASS technical manual on http://ribbs.usps.gov/files/cass. When doing address standardization for delivery, you are allowed to change suffix (eg ST, BLVD, CT, DR), and change or remove predirectionals and postdirectionals to make a match. Directionals may only be changed within 45 degrees (eg N may be changed to NE or NW) except, I believe, for this year they are addding a new rule that you can change it completely if you're using delivery point validation. If your mailpiece has 123 Main ST (my canonical test address) on it, and there are both a 123 E MAIN ST and a 123 MAIN CT, as I recall it will be delievered to 123 MAIN CT because changing the suffix is a considered smaller weight change than adding a directional.
You really need to learn about USPS address standardization rules if you want to make these assertions. Further, the USPS does not have any control what-so-ever over street and road naming. That is entirely controlled by local government (city/county/state). If a city council decides to name a street "East Westnorth Avenue Street Court" and have North, South, East, and West segments of it as well as a "East Westnorth Avenue Street Court Drive", the USPS has to suck it up and make it work. And municipalities do this. I have seen plenty of "East Park ST" (a directional and two suffixes) and "S 87TH STREET CT E";
I truly thank God that I no longer work in that industry.
Admin
Fuck this, this forum soft has to be a wtf on it's own... You can't edit, you can't quote, you can't preview a message before posting, you can't use it in anything else than IE? What a piece of junk... A little kid with some basic PHP, mysql and html knowledge could do better...
I simply wanted to fix the quotations in my previous post...
Admin
Let me guess, you live in Miami? I used to live on SW 88th Pl, which is right next to SW 88th Ct, which is right next to SW 88th St.
Then of course, you have streets with multiple names, like SW. 184 St. is also known as Eureka Drive, and also Burger King Drive. Let's see you key on that!
Admin
thank you. this is an excellent idea and i feel i need to definitely study more about databases. is there a good book? mysql from o'reilly a good idea? or it teachers more syntax than good database practices? college text books seem to go way into relational algebra and more or less a waste of time for me.,
Admin
First, a primary key can be composed of a single unique field, or a combination of fields that together are unique. So for a stock trading application the fields "TickerSymbol" and "Exchange" together would make a very appropriate natural primary key. They are natural attributes that uniquely identify a security that could be traded.
If table 2 were to reference back to table 1, it would need to have the entire foreign key - both the columns "TickerSymbol" and "Exchange". Furthermore, if table 2 had a master/detail relationship with table 1, the primary key of table 2 would consist of at least 3 columns, the 2 column foreign key that references back to table 1 and one or more columns that distinguish rows in table 2 that share a common "TickerSymbol" and "Exchange".
Let's flesh out this stock trading system a little more...
First, we have to build a list of securities that can be traded. In the industry these are typically called products, so we'll create a Product table. We need some sort of primary key for this table. Fortunately for us there are services that provide us data on the available products. One of the services is provided by Reuters. They've already assigned a unique identifier, called a Reuters Instrument Code, which we can use as a natural key. Here is our Product table: (note, I didn't research how large RIC needs to be, I just guessed at the size.)
Product
RIC CHAR(7) PK
ProductDesc VARCHAR(50)
It contains data like
"BMWG.DE", "Bayerische Motoren-Werke AG (BMWG) common stock"
Now, this stock may be tradeable on several exchanges. Now we need an Exchange table with the available exchanges:
Exchange
ExchangeID CHAR(6) PK
ExchangeDesc VARCHAR(50)
It contains data like
"NYSE", "New York Stock Exchange"
"NASDAQ", "NASDAQ"
"TOR", "Toronto Stock Exchange"
"HOUS", "Houstan Stock Exchange"
Note that we could have a generated key here, but this table needs only occassional maintenence, and there is a benefit to making the data readable in other tables.
Now we need to create a symbol table. There can be at most one symbol for each combination of RIC and exchange, and symbols must be unique within their exchange. We create this table:
Symbol
RIC FK Product PK
ExchangeCd FK Exchange PK
Symbol CHAR(8)
Unique constraint on ExchangeCd, Symbol.
Data in this table looks like this:
"BMWG.DE", "NYSE", "BMW"
"BMWG.DE", "TOR", "BMW"
"BMWG.DE", "HOUS", "Beamer"
Now, we need a table to track an order. We create an Order table that looks like this... (for simplicity we ignore that Order is a reserved word for the moment.)
Order
OrderID ID PK
RIC FK Symbol
ExchangeCd FK Symbol
Quantity INTEGER
Price DECIMAL
BuySellIndic CHAR(1)
BrokerID -- FK to another table we haven't defined yet
Entered TIMESTAMP
Executed TIMESTAMP
Now here is where natural keys show their value:
First, there is some set of processes (could be people or machine) monitoring this table looking for orders to execute. Something is looking for orders to send the NYSE while something else is looking for orders to send to HOUS, etc. Since those elements have been forwarded through this entire structure, we don't need to perform a join in order to extract that data, but can get the data directly. This makes the structure much more efficient transactionally.
Second, there is some group of auditors monitoring the orders in real time, looking for patterns that suggest insider trading and the like. They can see, for instance, all the activity for a particular product or a particular exchange without performing any joins. Again, this is a very efficient structure.
I hope this helps it all make sense.
Admin
lol, hopefully when writing queries, you aren't having to convert an indexed column. If u r, kiss ur index goodbye... hell, in a DB like this, would it matter? lol
Admin
I live (in Lousiana) on East Boulevard Street (as in, I live on the eastern part of Boulevard St.). A couple of blocks over, there is an East Boulevard (as in, the boulevard is named "East"). There is also a Boulevard Ave in town, but I don't associate with those people.
Admin
For performance critical systems, a good natural key can be significantly better. The propogation of the key can mean that joins can be eliminated. The rule of thumb should always be, however, "When in doubt use an artificial key."
Admin
I know I used the word "encoded" very lightly, but a timestamp is encoded: unless you know the starting point, the number does not mean anything "as a date". Also, time is certainly time but date is not exactly time, meaning, you can not simply represent three months in seconds. How many seconds there are in a given three month period depends on a lot of factors. What day of what month the period begins? Is it a leap year? Is daylight savings in effect? You cannot just add 90*24*60*60*60 to DateTime.Now in the application layer - that was what I meant. From the last paragraph of your original post, it seemed as if you preferred manipulating the actual dates represented as integers, in the business logic. Adding seconds to dates is not the responsibility of the application layer, it is the responsibility of date time libraries, and now that you made it clear that you do use hand-rolled libraries for this purpose, we are agreed on this point. And congrats too, for implementing a portable date-time library, because I still do think that handling dates is non-trivial and one could screw up badly, if not careful.
I also see that my capitalized "always" would look better if replaced with "almost always". I realize that possibly not all languages have decent date/time libraries, and when you have to work with those, of course you have to come up with something else. However, I did not suggest using database's own date handling - I know that that is not portable or consistent across systems. I was imagining a system where the db simply stored numbers without any clue as to their meaning and the actual application concerned itself with the conversion - not during the formatting and input, but storage and retrieval.
Admin
I don't know if this counts as "new" but around 1986 my brother moved into a brand-new condo development in Farmington Hills MI that comprised the streets Country Way, Country Circle, and Country Bluff. Cute, but WTF were they thinking?
--RA
Admin
Depends on the DBMS. The natural key was very efficient on IBM DB2 (MVS edition) and that was without using Computer Associates' tuning utilities, but a poor performer on MSSQL using the built-in query optimizer. We did use an IDENTITY column on the table, but this wasn't the primary key for the table and that column would never be used other than to join; the composite key was used. Because of the way the primary key was identified on the table, inserts on DB2 were wicked fast on a volume that spanned 6 DASDs as were retreivals.
Like I said in the example, the primary query client was a PBX machine, which only queried the thing one way, so I made the table as optimized as possible for that purpose alone. I'm sure QVC, AMEX and a bunch of other companies that receive high incoming call volumes (30+ calls per second, any query > 300msec to run is unacceptable) and key off callerID and+or customer-input information do the same thing.
Of course, this was replacing a crappily-indexed MSSQL database running on medium-range Intel hardware, and the 1-2sec response time from MSSQL on a 115million row table with 13 columns seemed like dire straits to the client. </shrug>
Admin
Thanks RevMike - thought it might be something like that.
Just remembered (on the duplicate values debate) phone numbers in the UK go through periodic changes. Leeds was entirely (0113) 2xx xxxx a few years back. Now the first digit after the area code can be 2 or 3. It's because of population rises, increase in business numbers and so on, meaning higher demand for phone numbers.
Admin
You should never use anything but a unique, auto-incremented, id. I put one in just about every table, as a matter of course, even if I don't think I'll need a key any time soon. It may take up a little space, but it adds so much value I don't think the space concern has weight.
Admin
I actually used a database (a very old one) where the phone number was one of the keys, and even there, it wasn't the primary key...there was a generated customer number for that. I'll tell you, if you do a lot of business in one town for 200 years or so, you're going to have duplicate phone numbers ALL THE TIME, as people who move in pick up phone numbers that were assinged to people who have moved out or died, or whatever.
Admin
That advice does not always scale to large or high performance systems. First of all, autoincrmenting IDs can cause otherwise parallelizable transactions to serialize. Second, appropriate use of natural keys and composite keys can limit the number of joins required to satisfy certain queries.
Admin
Well, the USPS does make a distinction between St and Ct. and Ave. But if you're talking about St, Str, Street, etc, sure they're the same...if you're not counting zip code. But within a zip code, they're unique. That's the whole point of a zip code.