The Daily WTF: Curious Perversions in Information Technology

2005-12-22 Reply Admin

Gene Wirchenko:
JamesCurran:
This is rather appropriate for me, because at one point in my life, I lived at "18 John St" in one city & state, and worked at "18 John St" in a different city & state.

Lovely, lovely! Did the cities have different names?

In one newsgroup, there was a thread on addresses, and one poster wrote of classifying addresses by even or odd. (This is especially applicable in Canada since the Postal Code will typically be different for each side of the street.) I, myself, live at 1313 1/2 Railroad Avenue. Yes, that is a one-half. It does cause some doubletakes.

In Vancouver, BC, Canada, there is an address "South Foot of Main Street". It was in the Postal Code guide the last time I checked for it. (No, you can not simply fill in 0. That would be at the other end of Main Street.)

Sincerely,

Gene Wirchenko

Ah, yes. . . Canada Post Address Accuracy Program.

Gene. . . If you flap your wings really hard, you can join us in the Aerie

The Canadian postal code definition was a wonderful thing. I say "was", because some of the codes have become so overloaded. . . and in British Columbia, because of population expansion, they've started running out of Postal codes, to the extent (no kidding), that they may have to invent new alphabetic characters for the ANA NAN format.

'

2005-12-22 Reply Admin

Ok, so you have various IBM departments as customers. What's your phone number, sir?

1-800-CALL-IBM...

3,700 customers later...

I'm a fan of the "unnatural key" method, but at times, I've had to use natural primary keys. Typically, I'll use a multiple primary key. One that might sorta work in this case:

create table customers (
name varchar,
address varchar,
phone1 varchar,
phone2 varchar,
primary key (name, address)
);

In this case, it's the combination of name and address that uniquely identify the customer. However, this breaks quickly IF THE CUSTOMER MOVES. Thus, having an un-natural customer ID is best, even if you use phone number, name, or address to find the customer.

TOPCTEH · 2005-12-22 Reply Admin

JamesCurran:

However, one thing which I haven't seen anyone comment on, is the fact that

1 Forrest St.
1 Forrest Str.
1 Forrest Street
One Forrest St.
One Forrest Str.
One Forrest Street
1 Forrest Ct
1 Forrest Court.
1 Forrest Ave.

And a whole bunch of other variation are (as far as the USPS is concerned) all the same address.

Gnaa, that reminds me of my first ever IT project: the change of Germany's zip code from four to five digits in 1993.

In that project we discovered DDFH (database design from hell): Two database tables were connected to each other by ... the address! WTF? But it became more worse: these two tables were updated by two different applications so the data diverged and two formerly connected records were not connected anymore. The whole project team desired to meet the developer of that crap in a quiet corner of a public park at 2:00am.

And BTW: Can you imagine how inventive users can be by creating address abbreviations?

Torsten

2005-12-22 Reply Admin

>I wish I knew the name of that developer because I think he found his way to the project I'm currently working on. He's our 'project manager'

What is a project manager doing specifying table design? He should be managing the project.

2005-12-22 Reply Admin

If I move house my DB primary key should not change. Other records may have linked to me by storing the value of that primary key.

ammoQ · 2005-12-22 Reply Admin

Anonymous:
If I move house my DB primary key should not change. Other records may have linked to me by storing the value of that primary key.

Theoretically, you could run cascading updates to change those other records, too... But IMO you are right, using the address as primay key is kinda weird.

2005-12-22 Reply Admin

pjabbott:
Anonymous:
I'm still just a hobbiest (so please forgive stupidity)

but wouldn't a phone number be a (always) unique natural key?

-- snip --

Generally speaking, it is very, very hard to come up with a natural PK when dealing with people. Name obviously wouldn't work. SSN might work, but then you might deal with people who are here on visas, international orders, or people who flat out don't want to give you that kind of info. About the only marginally successful thing I have seen is e-mail address (places like amazon.com use email address as your login), but then you have cascade update issues if they wish to change it. Best to just stick to an identify field.

I have to agree, not to spark up the old debate, but artificial ids are so often the best way forward.

I actually have to say i prefer to use NEWID() in sql server, because programmers so often like to infer some sort of meaning in the order of records generated by identities, which is just a huge no no.

2005-12-22 Reply Admin

Anonymous:
I did forget to mention that he's stored dates as strings in some tables of the same db. No kidding. Needless to say, our developers have to code in conversions for all these disparate types. They're going mental.

Again not truly terrible.
As we all know dates are just a humoungous pain whatever the weather, especially in international corporations. Never mind that things like the google api consider all dates to be Julian (something of a wtf in itself, though they're bright lads down google way so they must have a good reason).

The SQL server ascii standard for dates is a char 8 yyyymmdd so storing a date in that form has in my humble experience led to less problems that datetimes, front ends can then do a simple formatting, or allow regional settings to show it as appropriate.

re the use of integer, again not shocking if that integer is the number of seconds since la la la 1798 (again something of SQL server design wtf, just how far back do customer records go? what are we bankers to the pope or something?)

2005-12-22 Reply Admin

Not if 2 customers share a house...

2005-12-22 Reply Admin

Not if 2 customers share a house...

That was in response to someone saying that the phone number would always be unique

2005-12-22 Reply Admin

olddog:
The MOST interesting part of the conversation was when the OWNER claimed that "MOST" of the marketing post was not delivered. Never mind the stupidity of the developer... what are the odds that this affected "MOST" of the clients? It was either a very small database OR the clients swap addresses often.

Or most of the customers were from just a few different companies. I can quite easily imagine the scenario where a company has a purchasing department of 5 people, and each person is registered in this database. In that case, only 1 out of the 5 people would have mail addressed to them correctly.

KoFFiE · 2005-12-22 Reply Admin

Anonymous:
... One of his MANY claims to shame is that he stores dates as intergers (int). Just try and get records from one date to another with that kind of design. ...

Excuse me? Storing dates as integer numbers (preferably 64bit though) in seconds since 1 jan 1970 is imho one of the easiest ways to work with date/times. This is the way it works on any *nix system in the world.

Most databases support a function to convert from/to unixtime from whatever date-format they export, but internally they usually use 32 or 64bit ints, which are easy to index and very fast to work with. On most platforms functions to create/convert dates in unix date-format are present, and if it isn't there by default, there are plenty of free implementations of it you can find easily on the net.

The real power of using dates as integer numbers is calculating and comparing, if you know basic math operators like +, -, modulo, and the facts that one minute is 60sec, one hour is 3600secs, and one day 86400secs. Not that hard I think? Comparing numbers is by far a lot easier for the developer than working with some complicated date/time library.

brazzy · 2005-12-22 Reply Admin

Anonymous:
>>> To build a more reasonable example, say you were building a system to handle stock trades. A composite key of ticker symbol and exchange would be an appropriate natural key. (However it would not actually be appropriate for a portfolio management app, because tickers symbols can and do change over time.) Why would stock ticker and exchange be a reasonal natural key?

The real question is "reasonable natural key for WHAT?". And I can't think of a good answer even though I work on such a system. The closest I can think of is price data, but then you'd still have to add a timestamp. In that case it's actually a good example where an artificial key would be stupid.

Anonymous:
And that still doesn't answer my question: Table1 has a PK of Stock Ticker and Exchange Table2 indexes back into Table 1...
WHAT value gets placed into this field in Table2? Is it "GOOGNYSE"?

I truley don't know, and am trying to understand what is going on.

As someone else has said before: you'd have two (three, in my example) separate columns in Table2.

2005-12-22 Reply Admin

KoFFiE:
Anonymous:
... One of his MANY claims to shame is that he stores dates as intergers (int). Just try and get records from one date to another with that kind of design. ...

Excuse me? Storing dates as integer numbers (preferably 64bit though) in seconds since 1 jan 1970 is imho one of the easiest ways to work with date/times. This is the way it works on any *nix system in the world.

Most databases support a function to convert from/to unixtime from whatever date-format they export, but internally they usually use 32 or 64bit ints, which are easy to index and very fast to work with. On most platforms functions to create/convert dates in unix date-format are present, and if it isn't there by default, there are plenty of free implementations of it you can find easily on the net.

The real power of using dates as integer numbers is calculating and comparing, if you know basic math operators like +, -, modulo, and the facts that one minute is 60sec, one hour is 3600secs, and one day 86400secs. Not that hard I think? Comparing numbers is by far a lot easier for the developer than working with some complicated date/time library.

Storing dates as Integers is a pain in the arse for getting the information back that you need. as you have to know what the base date is for the DB that your using, + the base date for the client application, and Add / Subtract the difference from the value before.

The system that I work with has dates stored as an integer in 1900 base format, windows generally uses 1904 standard, so I have to convert to and from oracle dates with the ugly ToNum(ToChar(ToDate( syntax, it's a pain in the arse, and terribly slow when doing bulk operations.

DJ

JamesCurran · 2005-12-22 Reply Admin

Anonymous:

re the use of integer, again not shocking if that integer is the number of seconds since la la la 1798 (again something of SQL server design wtf, just how far back do customer records go? what are we bankers to the pope or something?)

As far as I know, both MSSql Server and Oracle store dates internally as a floating point number of days (and portions of day). MS use 1/1/1900 as it's zero point.

select cast(0.5 as datetime), cast(2958463 as datetime),cast(-53690 as datetime)

1900-01-01 12:00:00.000 9999-12-31 00:00:00.000 1753-01-01 00:00:00.000

ishmaeel · 2005-12-22 Reply Admin

KoFFiE:

...The real power of using dates as integer numbers is calculating and comparing, if you know basic math operators like +, -, modulo, and the facts that one minute is 60sec, one hour is 3600secs, and one day 86400secs. Not that hard I think? Comparing numbers is by far a lot easier for the developer than working with some complicated date/time library.

You were right-ish up to the point you proposed calculation and comparison of converted numbers. They are an encoding result - you don't mess with encoded data. You are ALWAYS better off using the built-in date/time libraries of the language. Attempting to reinvent a date/time library armed only with the knowledge that 1min=60sec, one is sure to end up with a WTF the examples of which we have all seen here. Please imagine adding three months (not 90 days, mind you) to any given date encoded as a long integer.

ParkinT · 2005-12-22 Reply Admin

And, what if the client moves (changes his address) ?

Using anything that can change (address, phone number, waist size) as the primary key seems to be simply useless.

ferrengi · 2005-12-22 Reply Admin

Alex Papadimoulis:
Anonymous:

I'm still just a hobbiest (so please forgive stupidity)

but wouldn't a phone number be a (always) unique natural key?

That would really depend on the business requirements. Consider that, instead of creating a Customer Number for new customers, you use their Phone Number. What happens when they change phone numbers? You will have to create a new account for them, or they will use their old account with their old phone number.

If you're a telphone services compnay, this really isn't an issue, since if they change phone numbers, they change accounts.

Actually, this was a huge problem for a telephone company about 10 to 20 years ago.
Don't remember which company and all the details of why this created such a big problem for them.
Sometimes, people that live in one area code end up with a different one a few years later because the area becomes more subdivided. My aunt lives in New Jersey and her area code used to be 908 but now it is 732.

It is hardly ever a good idea to use a column that has data that you may need update as your primary key.
A previous poster suggested using a client ID and that is what I would've done as well.

ferrengi · 2005-12-22 Reply Admin

Anonymous:
>> Developer: That generates an error - the user then rings me up and I generate the customer record. Ahhh yes, I like to write code where the user has to call me up when he hits an error. Makes me feel important, and job security and all...

Yeah, to me, that is the kicker. You would think that after the first few phone calls, the developer would start thinking or maybe talk to someone else and try to find a solution to this problem. Instead, he just goes and changes the address to an invalid address and he keeps on doing this again and again! Beautiful!

2005-12-22 Reply Admin

>>

If you're a telphone services compnay, this really isn't an issue, since if they change phone numbers, they change accounts.

<<

Not true. I work at a telephone company, so I know.

If your phone number changes, depending on the circumstances, it's likely still to be the same account. We need to retain the billing information, listing information, and so forth across the change. I'm glossing over some some complications but in general your main telephone number (you can have multiple numbers) is used as part of the account ID in the billing system. It works. I am not at liberty to say more than that, but it works.

kipthegreat · 2005-12-22 Reply Admin

mlathe:
Anonymous:
JamesCurran:

dpm:
The first six are. Why you think "court" and "avenue" are evaluated to be "street" is beyond me, though.

Because the USPS officially ignores the suffix and considers all of them to be just "1 Forrest". New streets must be unique within the town irrespective of the suffix.

That's plainly not true. I assure you that 123 8th St and 123 8th Ave are two different locations in Manhatten

hence he said "new streets"

What about my hometown, in which addresses on roads running NE-SW are something like "123 NE nth ave. st." and roads running NW-SE are "123 NW nth st.". And I've also seen people put the "NE" at the end of the address (i.e. "123 nth ave. st. NE"). And they get delivered just fine. The point is that the address can be pretty mutilated.

Of course, this is why the postal service invented zip codes, which cannot be rearranged. Get the mail to the right post office, and some human will figure out that a letter addressed to "123 fake dr" in a town with no Fake Dr. but a Fake Rd. should really go to "123 Fake Rd.", while also deciphering the difference between "avenue street" and "street".

When you think about it, all the post office needs is a street address and a zip code. They have you put the name and city and state there just in case you put the wrong zip code, so that when it shows up to the wrong town someone will say "that's wrong", and they'll mark out the zip code with a sharpie and very angrily write the correct one and send it on, and you get your mail a week late.

Sidenote- I noticed in Jacksonville, FL that there is a "Blvd. St." and I thought that was a little silly.

kipthegreat · 2005-12-22 Reply Admin

Anonymous:
>>

If you're a telphone services compnay, this really isn't an issue, since if they change phone numbers, they change accounts.

<<

Not true. I work at a telephone company, so I know.

If your phone number changes, depending on the circumstances, it's likely still to be the same account. We need to retain the billing information, listing information, and so forth across the change. I'm glossing over some some complications but in general your main telephone number (you can have multiple numbers) is used as part of the account ID in the billing system. It works. I am not at liberty to say more than that, but it works.

Also, someone can have more than one phone number on the same account. And they might decide later that they want to drop any one of those phone lines (they'll probably pick the one for which they get the most wrong number callers).

2005-12-22 Reply Admin

ishmaeel:
You are ALWAYS better off using the built-in date/time libraries of the language. Attempting to reinvent a date/time library armed only with the knowledge that 1min=60sec, one is sure to end up with a WTF the examples of which we have all seen here. Please imagine adding three months (not 90 days, mind you) to any given date encoded as a long integer.

I got so frustated with the braindead functions in the C time library, that I rolled my own. Basically, you just go back and forth between julian day numbers and the calendar of your choice. E.g:

DATE date_subtract(DATE from,int n_days)
{
  int julian=get_julian_day_n(from.year,from.month,from.day);
  julian-=n_days;
  return julian_day_n_to_gregorian(julian);
}

This is quite useful, too:

int get_day_of_week(int year,int month,int day) // 0 = Sunday, 1 = Monday, ...
{
  int a=(14-month)/12;
  int y=year-a;
  int m=month+12*a-2;
  return (day+y+y/4-y/100+y/400+(31*m)/12) % 7;
}

cconroy · 2005-12-22 Reply Admin

kipthegreat:
What about my hometown, in which addresses on roads running NE-SW are something like "123 NE nth ave. st." and roads running NW-SE are "123 NW nth st.". And I've also seen people put the "NE" at the end of the address (i.e. "123 nth ave. st. NE"). And they get delivered just fine. The point is that the address can be pretty mutilated.

WTF is an "avenue street"? WhoTF came up with that mess?

2005-12-22 Reply Admin

dpm:
JamesCurran:
dpm:
Why you think "court" and "avenue" are evaluated to be "street" is beyond me, though.

Because the USPS officially ignores the suffix and considers all of them to be just "1 Forrest". New streets must be unique within the town irrespective of the suffix.

I cannot imagine why you say that. At what point in processing do they get "ignored"? I've worked on the code
and the database-generation scripts, so I know they don't --- what have you done?

And if that were true, how would mail get delivered? God knows there are thousands of examples against your claim:
anyone from Atlanta care to count how many "Peachtree" names there are in that area? Jeez. The first google entry
for (atlanta "how many peachtree") is http://www.newcolonist.com/peachtree.html which features a picture of the
signs at the intersection of Peachtree Road and Peachtree Avenue. http://www.newcolonist.com/peachtree.html

ok
dpm

Last I read, over 160 streets in Atlanta have Peachtree in there name.

hank miller · 2005-12-22 Reply Admin

kipthegreat:

When you think about it, all the post office needs is a street address and a zip code. They have you put the name and city and state there just in case you put the wrong zip code, so that when it shows up to the wrong town someone will say "that's wrong", and they'll mark out the zip code with a sharpie and very angrily write the correct one and send it on, and you get your mail a week late.

The post office needs just enough identification to get the mail to you.

My uncle once got a letter that was address to
Dan Miller
City, state

No street address or zip.   Enough that the letter got to the post office in that city, and from there there was only one person in town with that name.

The reason for street addresses and zip codes is that it is a backup check.    The above letter was not meant for my uncle, it was to someone who happened to have the same name as my uncle.   The post office was reduced to delivering that letter to all the Dan Millers in town until one claimed it.   (And of course each on the way would open the letter to see if it was theirs) If this was a big town (or city) they could not have done that, but there were only 2 Dan Millers in town so they had a reasonable chance of getting it there. (I'm not sure if this is a good plan, but it is what they did - IIRC there was no return address)

A address intentionally has much redundant information so that when (not if!) people make a mistake the post office can still get the letter to the right person. A great protocol that works well despite the poor transmission medium it works with.

johnl · 2005-12-22 Reply Admin

I'm no DBA but I can't think of any situations where an artificial key would be really bad. I can think of situations where a natural key would be bad, and situations where either one would work, though.

I suppose part of this could be to do with what you consider a "key". For me, a key could be a reference number (such as an order reference number in an order tracking system). You might consider this to be a natural key, I suppose, but to me it's not directly connected to the actual data, is often generated along the same lines as an artificial key, and its sole reason for existence is to be unique. To me, this makes it an artificial key.

brazzy · 2005-12-22 Reply Admin

KoFFiE:
The real power of using dates as integer numbers is calculating and comparing, if you know basic math operators like +, -, modulo, and the facts that one minute is 60sec, one hour is 3600secs, and one day 86400secs. Not that hard I think?

You think wrong. Many places use daylight savings time, which means that each year, they have one day that is 23 hours long and one that is 25 hours long.

stevekj · 2005-12-22 Reply Admin

kipthegreat:

Sidenote- I noticed in Jacksonville, FL that there is a "Blvd. St." and I thought that was a little silly.

In Toronto, ON, there is a major road called "Avenue Road". I always thought that was a bit silly too.

2005-12-22 Reply Admin

Sure, go ahead and add indexes and constraints willy-nilly. Who cares about insert/update performance.

You forgot the humor emoticon.
I prefer correct data and maintainability myself. If you need many willy-nilly units of indexes or constraints, it's time to reexamine your data model.

Rick DeBay

2005-12-22 Reply Admin

>And that still doesn't answer my question:
>Table1 has a PK of Stock Ticker and Exchange
>Table2 indexes back into Table 1...

>WHAT value gets placed into this field in Table2?
>Is it "GOOGNYSE"?

You don't make a composite key by appending two pieces of data and shoving it in one field. The PK consists of two fields. The FK pointing from table2 to table1 would also be two fields.

johnl · 2005-12-22 Reply Admin

In Leeds there's a road called "Street Lane"

And that's not even in America ;)

kipthegreat · 2005-12-22 Reply Admin

Anonymous:
> Sure, go ahead and add indexes and constraints willy-nilly. Who cares about insert/update performance.

You forgot the humor emoticon.
I prefer correct data and maintainability myself. If you need many willy-nilly units of indexes or constraints, it's time to reexamine your data model.

Rick DeBay

Is that French for "Of eBay" ?

KoFFiE · 2005-12-22 Reply Admin

Anonymous:
KoFFiE:

...
Storing dates as Integers is a pain in the arse for getting the information back that you need. as you have to know what the base date is for the DB that your using, + the base date for the client application, and Add / Subtract the difference from the value before.

The system that I work with has dates stored as an integer in 1900 base format, windows generally uses 1904 standard, so I have to convert to and from oracle dates with the ugly ToNum(ToChar(ToDate( syntax, it's a pain in the arse, and terribly slow when doing bulk operations.

DJ

According to the Posix standard, a unix date-time always stores the date in seconds since 1 jan 1970. This is a world-wide accepted standard on a lot of other platforms also, so I fail to see the problem there. All unix platforms have standard functions to convert from/to a readable structure and functions to format it. Every database uses it's own Date-datatype, which makes porting applications hell. I just always store dates in Posix time in a 64bit integer in databases (so you don't hit the year 2038 barrier and can go waaay back also), which makes the application more database-portable. No issues when changing from oracle to ms-sql, postgres, or even mysql or sqlite.

I really can't see the problem of being capable of using simple mathematical operators to work with dates, once you're used to it, you can't go back. I had to do a lot of date/time manipulations in applications I wrote (client timeslot management for modem cascades etc) which were rather complicated. I do the date processing 100% in the application, I never ever let the db-engine handle dates, to the db-engine, it's just a number, and indexing on numbers is simply the fastest you can get...

I just keep all the formatting and date-handling in the application, I absolutely don't like the database handling things like that, for the same reason I don't like triggers and stored procedures to handle half of the application's job - maintenance and portability tends to be a BIG mess. Sure, in some cases they are usefull, or even would even be stupid not to use them for performance and/or data-integrity issues, but it shouldn't be used as an excuse for "fast development" or simply to give an extra guy a job. I know for sure that when tomorrow the commercial guy here walks in and asks if my application runs on database X that I can answer at least "that can be up and running in 3 months" (including development, testing and on-site installation), knowing very well that it could run on that database engine without a problem in a week in "basic fallback mode" once the database-connection plugin is finished. Refinement and testing afterwards takes somewhat more time, but if it has to be up and running next month, it's perfectly possible. I may use some techniques that are "not done" for database people, but they work, are fast and safe. We do use transactions when possible, and a collegue is even writing a transaction-alike mechanism for our database (just for the fun of it, don't know if we'll ever put in production), to ensure data-integrity even when transactions aren't supported.

[quote user="ishmaeel...
You were right-ish up to the point you proposed calculation and comparison of converted numbers. They are an encoding result - you don't mess with encoded data. You are ALWAYS better off using the built-in date/time libraries of the language. Attempting to reinvent a date/time library armed only with the knowledge that 1min=60sec, one is sure to end up with a WTF the examples of which we have all seen here. Please imagine adding three months (not 90 days, mind you) to any given date encoded as a long integer.

Encoded data? Time is time, one of the measurements of time is seconds, what part of that is "encoded"? Internally - throughout all our applications, unix timestamps are used, only formatting and input conversions are done. For this, and your point of adding i.e. 3 month we indeed wrote some basic 100% platform-portable libraries, which actually isn't that hard - or much work... Reinventing the wheel? Maybe, but try to do this in a quick and easy portable way across database systems in any other way, I dare you, and then we're not even talking about timezone stuff - that really messes things up. If SQL had a decent standardized way of doing date-handling, there would be no doubt that I would use it's functionality, but in it's current state? No tx... I like it clean and separated, a database's job is to manage relational data, not to write half my application in... Posix unix time in GMT it is plz.

RoKe · 2005-12-22 Reply Admin

However, one thing which I haven't seen anyone comment on, is the fact that
* 1 Forrest St.
* 1 Forrest Str.
* 1 Forrest Street
* One Forrest St.
* One Forrest Str.
* One Forrest Street
* 1 Forrest Ct
* 1 Forrest Court.
* 1 Forrest Ave.
And a whole bunch of other variation are (as far as the USPS is concerned) all the same address. So, whether or not the system can find your record depends on how you choose to abbrev your address that day.

You are wrong. You have listed three distinct addresses: 1 FORREST ST, 1 FORREST CT, and 1 FORREST AVE. The USPS has some very convoluted logic for address matching. Look at the MASS/CASS technical manual on http://ribbs.usps.gov/files/cass. When doing address standardization for delivery, you are allowed to change suffix (eg ST, BLVD, CT, DR), and change or remove predirectionals and postdirectionals to make a match. Directionals may only be changed within 45 degrees (eg N may be changed to NE or NW) except, I believe, for this year they are addding a new rule that you can change it completely if you're using delivery point validation. If your mailpiece has 123 Main ST (my canonical test address) on it, and there are both a 123 E MAIN ST and a 123 MAIN CT, as I recall it will be delievered to 123 MAIN CT because changing the suffix is a considered smaller weight change than adding a directional.

You really need to learn about USPS address standardization rules if you want to make these assertions. Further, the USPS does not have any control what-so-ever over street and road naming. That is entirely controlled by local government (city/county/state). If a city council decides to name a street "East Westnorth Avenue Street Court" and have North, South, East, and West segments of it as well as a "East Westnorth Avenue Street Court Drive", the USPS has to suck it up and make it work. And municipalities do this. I have seen plenty of "East Park ST" (a directional and two suffixes) and "S 87TH STREET CT E";

I truly thank God that I no longer work in that industry.

KoFFiE · 2005-12-22 Reply Admin

Fuck this, this forum soft has to be a wtf on it's own... You can't edit, you can't quote, you can't preview a message before posting, you can't use it in anything else than IE? What a piece of junk... A little kid with some basic PHP, mysql and html knowledge could do better...

I simply wanted to fix the quotations in my previous post...

2005-12-22 Reply Admin

Let me guess, you live in Miami? I used to live on SW 88th Pl, which is right next to SW 88th Ct, which is right next to SW 88th St.

Then of course, you have streets with multiple names, like SW. 184 St. is also known as Eureka Drive, and also Burger King Drive. Let's see you key on that!

2005-12-22 Reply Admin

christoofar:
Anonymous:
i know this maybe bad practice, but whenever i did relational databases, i always used uid autonumber as my primary unique keys. so uid/pid/etc would link and work nicely. but i really never had any databases more than a few relations so i dont think it was that bad in design. please let me know if uid would been appropiate here, as i learn by example expecially bc seeing awsful code design implemations.

This is frowned on for some main reasons:
It shows that you are unaware of what the "facts" inside each table are and you probably aren't aware that perhaps a composite key would be the PK.
It allows duplicates (you are simply avoiding the key aspect and just using an identity column to number your rows... like a sheet of columnar paper... you can write JOHN SMITH 444 STATE STREET POUGHKEEPSIE NY five times on that sheet of paper and each row is unique)
Consider a table of people which is frequently queried on by phone number (say a PBX system queries it as calls come in).

You could probably be safe with an IDENTITY column for numbering the customers, but you would want your PK to be the phone number PLUS the caller's name. This identifies a fact that is UNIQUE to each row in the table, prevents duplicates, and it's not bad design. If you believe in MSSQL, a good fill-factor and a CLUSTERED index, then it makes retrieving a unique record a sub-second operation.

A nice thing with setting your PKs right is if you feed in data from bulk information services (market aggregators like Merkle and such). You can feed in this data during the day (a program to throttle the inserts so OLTP functionality is not affected), and as constraints are dinged the exceptions can be fed into a dupe file or dupe table, which you can go back to your aggregator for a refund.

What I see most often is some ID field at the beginning that's set IDENTITY CLUSTERED, then the rest of the table is heavy with non-clustered indexes as the table gets very big. Sad, really.

For extremely large people tables on MSSQL, I have set the primary key to the phone number plus the SOUNDEX of the name, and seperate index for the SOUNDEX of the address (seperate columns are maintained to store the SOUNDEX values and are updated with triggers). This makes an address/name search on a table with 255 million+ rows on it come back with results in seconds for LIKE searches.

thank you. this is an excellent idea and i feel i need to definitely study more about databases. is there a good book? mysql from o'reilly a good idea? or it teachers more syntax than good database practices? college text books seem to go way into relational algebra and more or less a waste of time for me.,

RevMike · 2005-12-22 Reply Admin

Anonymous:
RevMike:
To build a more reasonable example, say you were building a system to handle stock trades. A composite key of ticker symbol and exchange would be an appropriate natural key. (However it would not actually be appropriate for a portfolio management app, because tickers symbols can and do change over time.)

Why would stock ticker and exchange be a reasonal natural key?

And that still doesn't answer my question:
Table1 has a PK of Stock Ticker and Exchange
Table2 indexes back into Table 1...

WHAT value gets placed into this field in Table2?
Is it "GOOGNYSE"?

I truley don't know, and am trying to understand what is going on.

First, a primary key can be composed of a single unique field, or a combination of fields that together are unique. So for a stock trading application the fields "TickerSymbol" and "Exchange" together would make a very appropriate natural primary key. They are natural attributes that uniquely identify a security that could be traded.

If table 2 were to reference back to table 1, it would need to have the entire foreign key - both the columns "TickerSymbol" and "Exchange". Furthermore, if table 2 had a master/detail relationship with table 1, the primary key of table 2 would consist of at least 3 columns, the 2 column foreign key that references back to table 1 and one or more columns that distinguish rows in table 2 that share a common "TickerSymbol" and "Exchange".

Let's flesh out this stock trading system a little more...

First, we have to build a list of securities that can be traded. In the industry these are typically called products, so we'll create a Product table. We need some sort of primary key for this table. Fortunately for us there are services that provide us data on the available products. One of the services is provided by Reuters. They've already assigned a unique identifier, called a Reuters Instrument Code, which we can use as a natural key. Here is our Product table: (note, I didn't research how large RIC needs to be, I just guessed at the size.)

Product
RIC CHAR(7) PK
ProductDesc VARCHAR(50)

It contains data like
"BMWG.DE", "Bayerische Motoren-Werke AG (BMWG) common stock"

Now, this stock may be tradeable on several exchanges. Now we need an Exchange table with the available exchanges:

Exchange
ExchangeID CHAR(6) PK
ExchangeDesc VARCHAR(50)

It contains data like
"NYSE", "New York Stock Exchange"
"NASDAQ", "NASDAQ"
"TOR", "Toronto Stock Exchange"
"HOUS", "Houstan Stock Exchange"

Note that we could have a generated key here, but this table needs only occassional maintenence, and there is a benefit to making the data readable in other tables.

Now we need to create a symbol table. There can be at most one symbol for each combination of RIC and exchange, and symbols must be unique within their exchange. We create this table:

Symbol
RIC FK Product PK
ExchangeCd FK Exchange PK
Symbol CHAR(8)
Unique constraint on ExchangeCd, Symbol.

Data in this table looks like this:
"BMWG.DE", "NYSE", "BMW"
"BMWG.DE", "TOR", "BMW"
"BMWG.DE", "HOUS", "Beamer"

Now, we need a table to track an order. We create an Order table that looks like this... (for simplicity we ignore that Order is a reserved word for the moment.)

Order
OrderID ID PK
RIC FK Symbol
ExchangeCd FK Symbol
Quantity INTEGER
Price DECIMAL
BuySellIndic CHAR(1)
BrokerID -- FK to another table we haven't defined yet
Entered TIMESTAMP
Executed TIMESTAMP

Now here is where natural keys show their value:

First, there is some set of processes (could be people or machine) monitoring this table looking for orders to execute. Something is looking for orders to send the NYSE while something else is looking for orders to send to HOUS, etc. Since those elements have been forwarded through this entire structure, we don't need to perform a join in order to extract that data, but can get the data directly. This makes the structure much more efficient transactionally.

Second, there is some group of auditors monitoring the orders in real time, looking for patterns that suggest insider trading and the like. They can see, for instance, all the activity for a particular product or a particular exchange without performing any joins. Again, this is a very efficient structure.

I hope this helps it all make sense.

2005-12-22 Reply Admin

lol, hopefully when writing queries, you aren't having to convert an indexed column. If u r, kiss ur index goodbye... hell, in a DB like this, would it matter? lol

2005-12-22 Reply Admin

kipthegreat:

Sidenote- I noticed in Jacksonville, FL that there is a "Blvd. St." and I thought that was a little silly.

I live (in Lousiana) on East Boulevard Street (as in, I live on the eastern part of Boulevard St.). A couple of blocks over, there is an East Boulevard (as in, the boulevard is named "East"). There is also a Boulevard Ave in town, but I don't associate with those people.

RevMike · 2005-12-22 Reply Admin

johnl:
I'm no DBA but I can't think of any situations where an artificial key would be really bad. I can think of situations where a natural key would be bad, and situations where either one would work, though.

For performance critical systems, a good natural key can be significantly better. The propogation of the key can mean that joins can be eliminated. The rule of thumb should always be, however, "When in doubt use an artificial key."

ishmaeel · 2005-12-22 Reply Admin

KoFFiE:
Encoded data? Time is time, one of the measurements of time is seconds, what part of that is "encoded"? Internally - throughout all our applications, unix timestamps are used, only formatting and input conversions are done. For this, and your point of adding i.e. 3 month we indeed wrote some basic 100% platform-portable libraries, which actually isn't that hard - or much work... Reinventing the wheel? Maybe, but try to do this in a quick and easy portable way across database systems in any other way, I dare you, and then we're not even talking about timezone stuff - that really messes things up. If SQL had a decent standardized way of doing date-handling, there would be no doubt that I would use it's functionality, but in it's current state? No tx... I like it clean and separated, a database's job is to manage relational data, not to write half my application in... Posix unix time in GMT it is plz.

I know I used the word "encoded" very lightly, but a timestamp is encoded: unless you know the starting point, the number does not mean anything "as a date". Also, time is certainly time but date is not exactly time, meaning, you can not simply represent three months in seconds. How many seconds there are in a given three month period depends on a lot of factors. What day of what month the period begins? Is it a leap year? Is daylight savings in effect? You cannot just add 90*24*60*60*60 to DateTime.Now in the application layer - that was what I meant. From the last paragraph of your original post, it seemed as if you preferred manipulating the actual dates represented as integers, in the business logic. Adding seconds to dates is not the responsibility of the application layer, it is the responsibility of date time libraries, and now that you made it clear that you do use hand-rolled libraries for this purpose, we are agreed on this point. And congrats too, for implementing a portable date-time library, because I still do think that handling dates is non-trivial and one could screw up badly, if not careful.

I also see that my capitalized "always" would look better if replaced with "almost always". I realize that possibly not all languages have decent date/time libraries, and when you have to work with those, of course you have to come up with something else. However, I did not suggest using database's own date handling - I know that that is not portable or consistent across systems. I was imagining a system where the db simply stored numbers without any clue as to their meaning and the actual application concerned itself with the conversion - not during the formatting and input, but storage and retrieval.

Rank Amateur · 2005-12-22 Reply Admin

JamesCurran:

dpm:
The first six are. Why you think "court" and "avenue" are evaluated to be "street" is beyond me, though.

Because the USPS officially ignores the suffix and considers all of them to be just "1 Forrest". New streets must be unique within the town irrespective of the suffix.

I don't know if this counts as "new" but around 1986 my brother moved into a brand-new condo development in Farmington Hills MI that comprised the streets Country Way, Country Circle, and Country Bluff. Cute, but WTF were they thinking?

--RA

christoofar · 2005-12-22 Reply Admin

RevMike:
Anonymous:
> ...but you would want your PK to be the phone number PLUS the caller's name. This identifies a fact that is UNIQUE to each row in the table, prevents duplicates, and it's not bad design.

Actually, for this data you should use a generator to create a unique number that is unrelated to the data. The phone number plus name should be a seperate, unique constraint. This unique constraint is also called a candidate key.
Now the code actually reflects what you're trying to accomplish, and the unique constraint can be changed if the rules change. You may get a case where John Smith is assigned the phone number that used to belong to John Smith.
If you are concerned about retrieval speed, that would be handled by an index based on the query and the potential plans generated by the optimizer.

Rick DeBay

Sure, go ahead and add indexes and constraints willy-nilly. Who cares about insert/update performance.

Depends on the DBMS. The natural key was very efficient on IBM DB2 (MVS edition) and that was without using Computer Associates' tuning utilities, but a poor performer on MSSQL using the built-in query optimizer. We did use an IDENTITY column on the table, but this wasn't the primary key for the table and that column would never be used other than to join; the composite key was used. Because of the way the primary key was identified on the table, inserts on DB2 were wicked fast on a volume that spanned 6 DASDs as were retreivals.

Like I said in the example, the primary query client was a PBX machine, which only queried the thing one way, so I made the table as optimized as possible for that purpose alone. I'm sure QVC, AMEX and a bunch of other companies that receive high incoming call volumes (30+ calls per second, any query > 300msec to run is unacceptable) and key off callerID and+or customer-input information do the same thing.

Of course, this was replacing a crappily-indexed MSSQL database running on medium-range Intel hardware, and the 1-2sec response time from MSSQL on a 115million row table with 13 columns seemed like dire straits to the client. </shrug>

johnl · 2005-12-22 Reply Admin

Thanks RevMike - thought it might be something like that.

Just remembered (on the duplicate values debate) phone numbers in the UK go through periodic changes. Leeds was entirely (0113) 2xx xxxx a few years back. Now the first digit after the area code can be 2 or 3. It's because of population rises, increase in business numbers and so on, meaning higher demand for phone numbers.

Satanicpuppy · 2005-12-22 Reply Admin

mlathe:
The worst thing is that he added the uniqueness to the front of address. I would have added whitespace to the end of the address

You should never use anything but a unique, auto-incremented, id. I put one in just about every table, as a matter of course, even if I don't think I'll need a key any time soon. It may take up a little space, but it adds so much value I don't think the space concern has weight.

Satanicpuppy · 2005-12-22 Reply Admin

Anonymous:
Anonymous:
I'm still just a hobbiest (so please forgive stupidity)

but wouldn't a phone number be a (always) unique natural key?

Possibly. Probably, even. But consider a case where two people from the same household have an account (husband/wife, parent/child, two siblings, roomates, whatever) If they share the house line then you're SOL.

I actually used a database (a very old one) where the phone number was one of the keys, and even there, it wasn't the primary key...there was a generated customer number for that. I'll tell you, if you do a lot of business in one town for 200 years or so, you're going to have duplicate phone numbers ALL THE TIME, as people who move in pick up phone numbers that were assinged to people who have moved out or died, or whatever.

RevMike · 2005-12-22 Reply Admin

Satanicpuppy:
mlathe:
The worst thing is that he added the uniqueness to the front of address. I would have added whitespace to the end of the address

You should never use anything but a unique, auto-incremented, id. I put one in just about every table, as a matter of course, even if I don't think I'll need a key any time soon. It may take up a little space, but it adds so much value I don't think the space concern has weight.

That advice does not always scale to large or high performance systems. First of all, autoincrmenting IDs can cause otherwise parallelizable transactions to serialize. Second, appropriate use of natural keys and composite keys can limit the number of joins required to satisfy certain queries.

Satanicpuppy · 2005-12-22 Reply Admin

JamesCurran:

However, one thing which I haven't seen anyone comment on, is the fact that

1 Forrest St.
1 Forrest Str.
1 Forrest Street
One Forrest St.
One Forrest Str.
One Forrest Street
1 Forrest Ct
1 Forrest Court.
1 Forrest Ave.

And a whole bunch of other variation are (as far as the USPS is concerned) all the same address.

Well, the USPS does make a distinction between St and Ct. and Ave. But if you're talking about St, Str, Street, etc, sure they're the same...if you're not counting zip code. But within a zip code, they're unique. That's the whole point of a zip code.

Uniquely Addressing

Leave a comment on “Uniquely Addressing”