• (cs) in reply to RevMike
    RevMike:
    Ytram:
    Jeff - The Anonymous One:
    The Cusips and maturity date are entered by hand, so I always have to assume they could be wrong.


    This might possibly be the best reason I can think of to not use natural keys(in general of course).  At some point, a user must input the value for the natural key, and we all know how stupid users are.  No matter what we do to validate input(double-key, regex, etc.) users will still be able to enter incorrect values, and depending on the number of users, it could be fairly frequently.

    If we use surrogate keys, we will find that correction of invalid input becomes a lot easier because we don't have to worry about the foreign key in another table when we fix the invalid input.  If the user enters "HORE", and this value is used as a foreign key in child tables, a cascade update might possibly be troublesome.  But if the value he enters is simply for a column with a unique constraint with a surrogate key representing that row, he can go back and easily change it to "HORSE"(what did you think it was? sicko!) without worrying about the possible troubles involved in a cascade update.

    Of course, let me qualify all my statements as a person who believes surrogate keys are preferred, not required for all tables, yadda yadda yadda....


    I'm not sure I see why this would be a huge problem.  If a record labeled "HORE" wa entered incorrectly, the user would simply create a new record labeled "HORSE" and go on without a problem.  This maintains history. (I have a bias, having spent 5 years maintaining an accounting system that never had a delete.) :)


    But if you have a new record, you might possibly need the relationships that are currently associated with "HORE".  I could see your response applying to an entity with transactions associated with it(such as customer -> orders, or employee -> sales), but what if the relationships are necessary for the proper functionality of the entity?(ex.  user -> permission settings)  If you need to reproduce the relationships for the new entity, that can be just as troublesome as a cascading update.

    However, if you use a surrogate key, you can quickly and easily update the value that was incorrectly entered by the user without worrying about reproduction of relationships or cascading updates.
  • (cs) in reply to Jeff S
    Jeff S:

    RevMike:
    UncleMidriff:


    I'm not sure if you were addressing Jeff's design or mine, so I'm sorry if I'm missing the point.

    In my proposed design, you'd have an address table and a table linking people to addresses.  That table would have PersonName, AddressID, and AddressType columns.  If Joe, Sue, Bob, and Bubba all live at the address with an AddressID of 123, then our table would have the following 4 rows in it:

    Joe, 123, Home
    Sue, 123, Home
    Bob, 123, Home
    Bubba, 123, Home

    If Bob moves to another address, say the address with an AddressID of 456, then all you need to do is change Bob's row in our table so that we have:

    Joe, 123, Home
    Sue, 123, Home
    Bob, 456, Home
    Bubba, 123, Home

    Of course, this will potentially leave you with addresses in the address table that aren't tied to anybody, and that's obviously not ideal.  Anyway, how else is this needlessly complicating the system, or am I just misunderstanding you?  If so, please let me know.


    Part of the problem with this, aside from the orphan row problem, is that every address change involves a lookup of all addresses to determine whether there is a matching record already.  The logic would need to create the new address record or simply update the relationship to the existing record.  This relatively expensive search needs to be serialized, otherwise two users could simultaneously enter duplicate addresses.  Now your overhead is way up and throughput is limited.

    I was actually thinking about it in the opposite way.  Rather than create a new address we could update the existing address record.

    When I model, I don't like to reduce coincidental relationships.  There is nothing fundamentally wrong with doing so, I just don't like to.  From your example I inferred that Joe, Sue, Bob, and Bubba are four different people with four distinct relationships that we are trying to track.  I don't see a good purpose in reducing them to a single detail.  If the relationship was more than coincidental, however, I have no problem with them sharing an address through an intermediate table describing the relationship, e.g. a Household table.

    As I think about it more, I'd really like to put the address type on the address table itself.  I don't like having the Many-To-Many table there.  I think this is more intuitive....

    Person
    PersonID, PersonName, BirthDate
    1, Joe, 1/1/1970
    2, Sue, 1/2/1971
    3, Bob, 1/3/1972
    4, Bubba, 1/4/1973

    PersonAddress
    PersonID, AddressType, Street, City, State, Zip
    1, Home, 123 Main St, Anywhere, NY, 11223
    2, Home, 123 Main St, Anywhere, NY, 11223
    3, Home, 123 Main St, Anywhere, NY, 11223
    4, Home, 123 Main St, Anywhere, NY, 11223

    Now, to update Bob's home address, I just update to this...

    PersonAddress
    PersonID, AddressType, Street, City, State, Zip
    1, Home, 123 Main St, Anywhere, NY, 11223
    2, Home, 123 Main St, Anywhere, NY, 11223
    3, Home, 999 Broadway, Anywhere, CA, 99999
    4, Home, 123 Main St, Anywhere, NY, 11223

    Note that this update doesn't require any large searchs and happens cleanly inside a very small transaction.  If my platform support row level locking, other updates to other records happen simultaneously.

    Depending on the application and platform I may or may not create a domain table for address type, potentially using a constraint or a user defined type to limit the values.  I'm not religious about this for an extremely small set of values.  In this case I'm thinking the AddressTypes might be limited to Home, Office, and Other.  If I do make the AddressType a foreign key reationship, I'd likely use a small character field rather than an ID.  It is admittedly a crutch, but one which eases the work of maintainers.  I don't mind compromising normal form, I mind when normal from is compromised without consideration of the impact.

    The question to ask is: "Are addresses attributes or entities?"

    If an address is just an ATTRIBUTE of an entity, just info you just want to store for a customer, then you either a) put it on the customer table or b) create a child addresses table of customers so you can store many addresses (1 per type, of course; pk = customer/AddressType).  That situation is similiar to what RevMike is describing.

    If an address is an ENTITY, something that you want to keep track of, something that you want other entities in your database to potentially share, then you have a separate Addresses table (pk of addressID if you want, be sure to add unqiue constraints) and you have a situation similiar to what Uncle M has described where you relate entities to addresses.  Either each customer has an AddressID column in the table (for 1 address per customer), or you store multiple addresses per customer in a table with a pk of (Customer/AddressID/Type).

    See the difference?  It depends on your needs; both situations are correct.

    That makes a lot of sense.  I dunno why I started thinking of addresses as entities that needed to be related to people rather than merely attributes of people, the latter of which will, in most situations, be the case.  Regardless, thanks for clearing that up for me.

  • (cs) in reply to Jeff
    Anonymous Jeff:
    Jeff S:

    Well, then, there you go.  The key to this  (no pun intended) is that the two cuspid are not related.  Therefore, you would not want a composite key, since they would have no common parent.



    Let me try to rephrase this as a general principal, and please let me know if I get it right.  If a row in a database can be identified based on a single column, then that column should be used as your primary key.


    Yes.  But be carefully about being seduced by near-keys like SSN.  If you have any doubt, it is safer to use an artificial key.

    Anonymous Jeff:
    If there are multiple columns then you might make a composite key, depending on whether or not rows with a matching value for part of the key would share some data.  If they would not, then an artifical key makes more sense (yet still putting the contstraint on your data to keep from duplicating rows you shouldn't).


    I'm not exactly sure what you mean here.  If a column is only dependent on part of the key, then that column should be normalized into a different table.  When you say "rows that share part of a key also share some data" do you mean they join to the same master in a master detail relationship?  This is a natural consequence of composite keys.  I don't see any reason that this situation should trigger the use of an artificial key.  Don't forget that there is nothing invalid about creating a composite key with both natural and artificial elements.  If you have a one-to-many relationship and there is no distinguishing characteristic to the many side, you can use the foreign key to the parent along with an artifical distinguisher to uniquely identify each row.

    Anonymous Jeff:
    If I understand this correctly then wouldn't at least part of each of your composite keys always exist in a parent table as a primary key?


    This is a very common design pattern when you have a master-detail-[subdetail[-subsubdetail]] structure.  Each table is keyed with a foreign key to its parent and a distinguisher of some sort.

    Anonymous Jeff:

    Also, is there a good book on DB design that I should make part of my reading over the next few months?  Something long on practical with enough theory so I'd know the "whys" of good db design?  I don't intend to ever be a db programmer, but I like to know enough about whatever I touch that I don't create more problems than I solve.


    That is a good goal.   I'm sorry I can't help.  I learned through an apprenticeship with some good designers many years ago.
  • (cs) in reply to Ytram
    Ytram:
    RevMike:
    Ytram:
    Jeff - The Anonymous One:
    The Cusips and maturity date are entered by hand, so I always have to assume they could be wrong.


    This might possibly be the best reason I can think of to not use natural keys(in general of course).  At some point, a user must input the value for the natural key, and we all know how stupid users are.  No matter what we do to validate input(double-key, regex, etc.) users will still be able to enter incorrect values, and depending on the number of users, it could be fairly frequently.

    If we use surrogate keys, we will find that correction of invalid input becomes a lot easier because we don't have to worry about the foreign key in another table when we fix the invalid input.  If the user enters "HORE", and this value is used as a foreign key in child tables, a cascade update might possibly be troublesome.  But if the value he enters is simply for a column with a unique constraint with a surrogate key representing that row, he can go back and easily change it to "HORSE"(what did you think it was? sicko!) without worrying about the possible troubles involved in a cascade update.

    Of course, let me qualify all my statements as a person who believes surrogate keys are preferred, not required for all tables, yadda yadda yadda....


    I'm not sure I see why this would be a huge problem.  If a record labeled "HORE" wa entered incorrectly, the user would simply create a new record labeled "HORSE" and go on without a problem.  This maintains history. (I have a bias, having spent 5 years maintaining an accounting system that never had a delete.) :)


    But if you have a new record, you might possibly need the relationships that are currently associated with "HORE".  I could see your response applying to an entity with transactions associated with it(such as customer -> orders, or employee -> sales), but what if the relationships are necessary for the proper functionality of the entity?(ex.  user -> permission settings)  If you need to reproduce the relationships for the new entity, that can be just as troublesome as a cascading update.

    However, if you use a surrogate key, you can quickly and easily update the value that was incorrectly entered by the user without worrying about reproduction of relationships or cascading updates.


    On the other hand, in my experience the amount of data that needs to be modified for these "relationship cases" like user->permissions tends to be much smaller (and simpler in structure) than transaction relationships.  Therefore it should be fairly trivial to modify these by hand.

    Just so you realize, I'm arguning at this point just for sport.  I don't strongly disagree with your point.
  • (cs) in reply to Jeff

    Anonymous:
    Let me try to rephrase this as a general principal, and please let me know if I get it right.  If a row in a database can be identified based on a single column, then that column should be used as your primary key.  If there are multiple columns then you might make a compisite key, depending on whether or not rows with a matching value for part of the key would share some data.  If they would not, then an artifical key makes more sense (yet still putting the contstraint on your data to keep from duplicating rows you shouldn't).

    If I understand this correctly then wouldn't at least part of each of your composite keys always exist in a parent table as a primary key?

    Also, is there a good book on DB design that I should make part of my reading over the next few months?  Something long on practical with enough theory so I'd know the "whys" of good db design?  I don't intend to ever be a db programmer, but I like to know enough about whatever I touch that I don't create more problems than I solve.

    I will see if I can track down a good book that isn't all theory; that's the problem -- it seems most are either all theory or not enough.

    >>If I understand this correctly then wouldn't at least part of each of your composite keys always exist in a parent table as a primary key?

    As a "rule of thumb", that is a pretty good way of thinking about it.   Otherwise, the composite key is just a set of attributes, and indeed it may be clearer,easer, and/or more efficient to put a surrogate key on the table.  But some of those attributes are actually entities, those entities and all info dependent on those entities should be stored in a related table (i.e., you should be normalized) and the reference to that entity should be part of your composite primary key.

     

     

  • Jeff (unregistered) in reply to RevMike
    RevMike:


    Anonymous Jeff:
    If there are multiple columns then you might make a composite key, depending on whether or not rows with a matching value for part of the key would share some data.  If they would not, then an artifical key makes more sense (yet still putting the contstraint on your data to keep from duplicating rows you shouldn't).


    I'm not exactly sure what you mean here.  If a column is only dependent on part of the key, then that column should be normalized into a different table.  When you say "rows that share part of a key also share some data" do you mean they join to the same master in a master detail relationship?
    The question Jeff S asked me was whether or not two records with the same Cusip are related, which I interpreted as there would be common data between them.  Of course, when that happens you create a child record for the different data and keep the same data in the parent table.  However, in my case there is no common data between the two records with the same cusip.  Cusip is a near-unique column, just like SSN, and and there is no similarity between two records with the same Cusip other than that single column.  I mention the Maturity Date because that column will ALWAYS be different if the Cusips are the same (where other fields might be the same coincidentially).  If I understand Jeff S correctly (which is what I was trying to asertain),  because two records with the same Cusip are not related in any way I should use an artifical key.

    RevMike:
    Anonymous Jeff:
    If I understand this correctly then wouldn't at least part of each of your composite keys always exist in a parent table as a primary key?

    This is a very common design pattern when you have a master-detail-[subdetail[-subsubdetail]] structure.  Each table is keyed with a foreign key to its parent and a distinguisher of some sort.

    Well good- that means I understand the implications!

    On a totally not related note- I've seen posts say that CAPTCHA doesn't get the first post.  It appears to me that it has nothing to do with "first post" but rather a timeout.  Forgive me if this has been noted- I'm rather new around here.
  • (cs) in reply to RevMike

    Reverend Mike Jackson:
    On the other hand, in my experience the amount of data that needs to be modified for these "relationship cases" like user->permissions tends to be much smaller (and simpler in structure) than transaction relationships.  Therefore it should be fairly trivial to modify these by hand.

    Just so you realize, I'm arguning at this point just for sport.  I don't strongly disagree with your point.

    True enough, although row-level security(user -> permissions) can have a fairly large amount of rows per user.

    Another point:  Would your customers want to have to worry about looking at history over two(or more!) different customer ids?  Let's go with the example of customer -> orders, where customer id is a natural key entered by a user.  Initially the stupid user entered a potentially offensive name of "HORE" for the Arabian Horse Owner's Association.  With this id, the customer made 10,000 transactions from 1/1/2005 to 2/31/2005, then the correction was made and a new record of "HORSE" was created.  From 3/1/2005 to 4/30/2005, another 10,000 transactions were made.

    Now the customer wants to go look at their history.  They enter their customer id of "HORSE" and enter a date range of 1/1/2005 to 4/30/2005.  They only see 10,000 transactions when they are expecting 20,000.  Then they try 1/1/2005 through 2/31/2005.  No transactions at all.  Then they slap their heads and realize:  There is only 28 days in February!  They also remember that their customer id was changed, so they go and look up "HORE" for the first date range and get the expected number.

    I know any customers I've ever dealt with wouldn't care for this at all.  The situation could have been completely avoided by using surrogate keys.

    <FONT size=1>or maybe cascading updates.</FONT>

  • (cs) in reply to Ytram
    Ytram:

    Reverend Mike Jackson:
    On the other hand, in my experience the amount of data that needs to be modified for these "relationship cases" like user->permissions tends to be much smaller (and simpler in structure) than transaction relationships.  Therefore it should be fairly trivial to modify these by hand.

    Just so you realize, I'm arguning at this point just for sport.  I don't strongly disagree with your point.

    True enough, although row-level security(user -> permissions) can have a fairly large amount of rows per user.

    Another point:  Would your customers want to have to worry about looking at history over two(or more!) different customer ids?  Let's go with the example of customer -> orders, where customer id is a natural key entered by a user.  Initially the stupid user entered a potentially offensive name of "HORE" for the Arabian Horse Owner's Association.  With this id, the customer made 10,000 transactions from 1/1/2005 to 2/31/2005, then the correction was made and a new record of "HORSE" was created.  From 3/1/2005 to 4/30/2005, another 10,000 transactions were made.

    Now the customer wants to go look at their history.  They enter their customer id of "HORSE" and enter a date range of 1/1/2005 to 4/30/2005.  They only see 10,000 transactions when they are expecting 20,000.  Then they try 1/1/2005 through 2/31/2005.  No transactions at all.  Then they slap their heads and realize:  There is only 28 days in February!  They also remember that their customer id was changed, so they go and look up "HORE" for the first date range and get the expected number.

    I know any customers I've ever dealt with wouldn't care for this at all.  The situation could have been completely avoided by using surrogate keys.

    <FONT size=1>or maybe cascading updates.</FONT>

    Two things:

    1) you mentioned "maybe cascading updates".  If you decide on ANY natural keys that might EVER change, and cascading updates are not available, then you shouldn't use them.  That's pretty cut and dry.  Again, do not confuse keys that define an entity with keys that relate entities.  The former very often might be surrogates. The later almost never should be.

    2) Another thing you missed is that if you decide to leave HORE as is, but then create a new HORSE customer, you would book negative transactions to zero out the original HORE customer account and then book the positives to move the transactions to the new customer.  Now you have the best of both worlds:  a correct customer code, correct transactions booked in the right place, and a perfect trail of exactly what occurred. 

  • (cs) in reply to Ytram
    Ytram:

    Reverend Mike Jackson:
    On the other hand, in my experience the amount of data that needs to be modified for these "relationship cases" like user->permissions tends to be much smaller (and simpler in structure) than transaction relationships.  Therefore it should be fairly trivial to modify these by hand.

    Just so you realize, I'm arguning at this point just for sport.  I don't strongly disagree with your point.

    True enough, although row-level security(user -> permissions) can have a fairly large amount of rows per user.

    Another point:  Would your customers want to have to worry about looking at history over two(or more!) different customer ids?  Let's go with the example of customer -> orders, where customer id is a natural key entered by a user.  Initially the stupid user entered a potentially offensive name of "HORE" for the Arabian Horse Owner's Association.  With this id, the customer made 10,000 transactions from 1/1/2005 to 2/31/2005, then the correction was made and a new record of "HORSE" was created.  From 3/1/2005 to 4/30/2005, another 10,000 transactions were made.

    Now the customer wants to go look at their history.  They enter their customer id of "HORSE" and enter a date range of 1/1/2005 to 4/30/2005.  They only see 10,000 transactions when they are expecting 20,000.  Then they try 1/1/2005 through 2/31/2005.  No transactions at all.  Then they slap their heads and realize:  There is only 28 days in February!  They also remember that their customer id was changed, so they go and look up "HORE" for the first date range and get the expected number.

    I know any customers I've ever dealt with wouldn't care for this at all.  The situation could have been completely avoided by using surrogate keys.

    <font size="1">or maybe cascading updates.</font>



    I think I mentioned earlier that much of my experience is with accounting systems that did not ever do a delete.  We dealt with things like this on a regular basis by providing the users with transfer tools.  In this way, there would be in HORE 10,000 new transactions and 10,000 offsetting "Transfer To" transactions moving the activity out of HORE into HORSE.  In HORSE there would be 10,000 new transactions and 10,000 "Transfer From" transactions recieving the activity from HORE.  Now the data is right, but there is also a complete audit trail and the system can reproduce the state at any point in time.
  • (cs)

    LAST!

  • (cs) in reply to Jeff S
    Ytram:

    I know any customers I've ever dealt with wouldn't care for this at all.  The situation could have been completely avoided by using surrogate keys.

    One more thing to consider:

    So you create a database-only surrogate key, CustID, is strictly internal for relations in your database.  The customer code, "XYZ", is our business customer code which we stamp invoices with, customers use when they call, etc., but we avoid this as our natural key and use a surrogate CustID identity.  This allows us to change the customer code as often as we want, right?

    OK, let's take customer ID #45, and change the customer code from XYZ to ABC to correct a mistake or to update our records.  Done!  No messy cascading updates, no fuss, nothing else changed!  Life is good. Take a vacation.

    Two weeks later, someone calls customer support for help with your product.  

    "No problem", says the support rep on the phone," what is your customer number?" 

    Unfortunately, the person calling doesn't know that the customer # has changed, and tells you XYZ.  Customer not found !  hmmm ... must have forgotten the customer code.  Thinking quick, looks up the invoice for the product which had the customer code printed on it.   He finds it, and there is again: XYZ! 

    "Still not found," replies the support person.  Then he has an idea:  "Well, since you have the invoice, what is the invoice # that you found? That will help me get the customer code."  The customer replies invoice #1235.  He looks it up, sees that it is linked to customer ID#45.  Customer ID#45 has a customer code of .... "ABC" !  

    "Sir, something's not right here, the system says invoice #1235 was for customer ABC! "

    "But right here, on the invoice i am looking at, it says invoice #1235, customer code XYZ!  I am holding it in my hand!"

    "Uh .. .please hold!"

     

     

  • (cs) in reply to Jeff S
    Jeff S:

    2) Another thing you missed is that if you decide to leave HORE as is, but then create a new HORSE customer, you would book negative transactions to zero out the original HORE customer account and then book the positives to move the transactions to the new customer.  Now you have the best of both worlds:  a correct customer code, correct transactions booked in the right place, and a perfect trail of exactly what occurred. 



    I hate it when someone posts my point while I am composing my own post. :)

    Jeff, it looks like you've been playing with the big boys for a bit.  Drop me an email at my nickname at gmail.  I might be in a position to hire someone in New England in the next six months.
  • (cs) in reply to Jeff S

    One benefit of natural keys that I don't think I've seen anyone mention is select performance.  You can only have one clustered index on a table.  If the business key can be used as the primary key, you get the benefit of fast look-ups for both user requests and joins.  This is can be particularly helpful if you are querying large detail (descendant) tables and you have carried the ancestors' key fields down to the child table as part of the primary key.

    Bear in mind that I am not suggesting that this is always the optimal way to go.  Preserving the ancestor key fields could use so much space that you are getting many fewer rows per page, negating the benefit of using the clustered index. 

  • (cs) in reply to RevMike
    RevMike:
    Jeff S:

    2) Another thing you missed is that if you decide to leave HORE as is, but then create a new HORSE customer, you would book negative transactions to zero out the original HORE customer account and then book the positives to move the transactions to the new customer.  Now you have the best of both worlds:  a correct customer code, correct transactions booked in the right place, and a perfect trail of exactly what occurred. 



    I hate it when someone posts my point while I am composing my own post. :)

    Jeff, it looks like you've been playing with the big boys for a bit.  Drop me an email at my nickname at gmail.  I might be in a position to hire someone in New England in the next six months.

    I definitely will, I am always up for new challenges .. you can also contact me as well through my weblog at sqlteam.com.

  • (cs) in reply to Jeff S
    Jeff S:

    One more thing to consider:

    So you create a database-only surrogate key, CustID, is strictly internal for relations in your database.  The customer code, "XYZ", is our business customer code which we stamp invoices with, customers use when they call, etc., but we avoid this as our natural key and use a surrogate CustID identity.  This allows us to change the customer code as often as we want, right?

    OK, let's take customer ID #45, and change the customer code from XYZ to ABC to correct a mistake or to update our records.  Done!  No messy cascading updates, no fuss, nothing else changed!  Life is good. Take a vacation.

    Two weeks later, someone calls customer support for help with your product.  

    "No problem", says the support rep on the phone," what is your customer number?" 

    Unfortunately, the person calling doesn't know that the customer # has changed, and tells you XYZ.  Customer not found !  hmmm ... must have forgotten the customer code.  Thinking quick, looks up the invoice for the product which had the customer code printed on it.   He finds it, and there is again: XYZ! 

    "Still not found," replies the support person.  Then he has an idea:  "Well, since you have the invoice, what is the invoice # that you found? That will help me get the customer code."  The customer replies invoice #1235.  He looks it up, sees that it is linked to customer ID#45.  Customer ID#45 has a customer code of .... "ABC" !  

    "Sir, something's not right here, the system says invoice #1235 was for customer ABC! "

    "But right here, on the invoice i am looking at, it says invoice #1235, customer code XYZ!  I am holding it in my hand!"

    "Uh .. .please hold!"

     


    Dear customer,

    as you might already have heard or read, our company has recently merged with  Jeff's Empire.
    As part of the consolidation process, our customer codes will be prefixed with "2-", while Jeff's Empire's customer codes are prefixed with "1-". So, from now on, your customer code is 2-XYZ. Please use that customer code for all future communications. Of course we will accept requests, orders etc. with your old customer code till Dec 31, 2006.

    Best regards,
    ammoQ's Lair

  • (cs) in reply to ammoQ
    ammoQ:
    Jeff S:

    One more thing to consider:

    So you create a database-only surrogate key, CustID, is strictly internal for relations in your database.  The customer code, "XYZ", is our business customer code which we stamp invoices with, customers use when they call, etc., but we avoid this as our natural key and use a surrogate CustID identity.  This allows us to change the customer code as often as we want, right?

    OK, let's take customer ID #45, and change the customer code from XYZ to ABC to correct a mistake or to update our records.  Done!  No messy cascading updates, no fuss, nothing else changed!  Life is good. Take a vacation.

    Two weeks later, someone calls customer support for help with your product.  

    "No problem", says the support rep on the phone," what is your customer number?" 

    Unfortunately, the person calling doesn't know that the customer # has changed, and tells you XYZ.  Customer not found !  hmmm ... must have forgotten the customer code.  Thinking quick, looks up the invoice for the product which had the customer code printed on it.   He finds it, and there is again: XYZ! 

    "Still not found," replies the support person.  Then he has an idea:  "Well, since you have the invoice, what is the invoice # that you found? That will help me get the customer code."  The customer replies invoice #1235.  He looks it up, sees that it is linked to customer ID#45.  Customer ID#45 has a customer code of .... "ABC" !  

    "Sir, something's not right here, the system says invoice #1235 was for customer ABC! "

    "But right here, on the invoice i am looking at, it says invoice #1235, customer code XYZ!  I am holding it in my hand!"

    "Uh .. .please hold!"

     


    Dear customer,

    as you might already have heard or read, our company has recently merged with  Jeff's Empire.
    As part of the consolidation process, our customer codes will be prefixed with "2-", while Jeff's Empire's customer codes are prefixed with "1-". So, from now on, your customer code is 2-XYZ. Please use that customer code for all future communications. Of course we will accept requests, orders etc. with your old customer code till Dec 31, 2006.

    Best regards,
    ammoQ's Lair

    hmm ... you didn't read the post very carefully, nor did you get the point ......   If I were to respond to this with a response of similiar effectiveness/relevance to yours, I might say "the mail doesn't get delivered on sundays."

  • (cs) in reply to Jeff S
    Jeff S:

    hmm ... you didn't read the post very carefully, nor did you get the point ......   If I were to respond to this with a response of similiar effectiveness/relevance to yours, I might say "the mail doesn't get delivered on sundays."



    No sane company arbitrarily changes customer codes from XYZ to ABC. And even if, XYZ will not be assigned to another customer. And if everything else fails, every customer has a name and an address, so the call center guy would not say "it's for customer ABC", but rather "you are Mr. Smith, right?" Your example is totally unrealistic. On the other hand, because of mergers, reorganisations etc., it might be necessary to fix the situation that customer numbers are no longer unique. Prefixing like explained in my example seems like a realistic option to me.

    Of course you need procedures to deal with customers who use their old codes, at least for a while. The guy at the call center knows about the merger, knows how to handle it, so if the customer says "XYZ", he will
    a) know that this must be an old cold, since new codes are 5 digits
    b) look up and see customer code "XYZ" is eighter Mr. Smith or Ms. Chin Quai, so
    c) when in doubt, will ask.

    Probably he will see which number was called, so he knows that the customer called the telefone number of company X, so he must be Mr. Smith.
    Anyway, at some point in time, the merged companies will want to integrate their systems, their databases etc. and in that case, the artificial ID will come handy.
    I don't say you must use IDs everywhere if you don't feel like it. It's just that I cannot stand people who dismiss that without valid reasons. I know a system which used natural keys; it has been replaced with a system that mostly uses artificial IDs. Both systems did resp. do their job.
    The system with the IDs avoided some of the problems that riddled the previous system, of course at some costs. In that case, it's worth the cost. I don't know your systems, your requirements and I would not dare to tell you what is better in your case.

    <°)))><
  • (cs) in reply to RevMike
    RevMike:

    I think I mentioned earlier that much of my experience is with accounting systems that did not ever do a delete.  We dealt with things like this on a regular basis by providing the users with transfer tools.  In this way, there would be in HORE 10,000 new transactions and 10,000 offsetting "Transfer To" transactions moving the activity out of HORE into HORSE.  In HORSE there would be 10,000 new transactions and 10,000 "Transfer From" transactions recieving the activity from HORE.  Now the data is right, but there is also a complete audit trail and the system can reproduce the state at any point in time.


    I routinely deal with accounting systems designed not to delete things (once the data is moved from data entry to permanent history, and not counting purging of old history at year end) but still allowing retroactive renumbering (not mandating it; you can still do things your way if you like).  As usual, complete audit trails are at odds with convenience, and the right balance for an individual case depends on what level of audit trail you really need in practice.

  • codemonkey (unregistered) in reply to emurphy

    I think most people may have missed a small detail on why 1899 is ground zero.

    Did you know that an OLE Date (you know, the one used in VB's date format) is a double value, where 0.0 = Dec 30th 1899. That is, CDbl(DateSerial(1899,12,30)) equals zero.

  • (cs) in reply to ammoQ

    ammoQ:

    I don't say you must use IDs everywhere if you don't feel like it. It's just that I cannot stand people who dismiss that without valid reasons.

    Do you feel I've been "dismissing surrogate keys" in this thread without valid reasons?  I have just been posting over and over "surrogates are bad" without making any valid points?  Before you decide that you can't stand me, do me a favor and see if you can sum up my feelings on this topic in a sentence or two.

  • (cs) in reply to Jeff S
    Jeff S:

    ammoQ:

    I don't say you must use IDs everywhere if you don't feel like it. It's just that I cannot stand people who dismiss that without valid reasons.

    Do you feel I've been "dismissing surrogate keys" in this thread without valid reasons?  I have just been posting over and over "surrogates are bad" without making any valid points?  Before you decide that you can't stand me, do me a favor and see if you can sum up my feelings on this topic in a sentence or two.



    That last sentence didn't mean you personally, I'm sorry. It was also meant to read "... cannot stand it when people..."
  • (cs) in reply to Jeff S
    Jeff S:
    Ytram:

    I know any customers I've ever dealt with wouldn't care for this at all.  The situation could have been completely avoided by using surrogate keys.

    One more thing to consider:

    So you create a database-only surrogate key, CustID, is strictly internal for relations in your database.  The customer code, "XYZ", is our business customer code which we stamp invoices with, customers use when they call, etc., but we avoid this as our natural key and use a surrogate CustID identity.  This allows us to change the customer code as often as we want, right?

    OK, let's take customer ID #45, and change the customer code from XYZ to ABC to correct a mistake or to update our records.  Done!  No messy cascading updates, no fuss, nothing else changed!  Life is good. Take a vacation.

    Two weeks later, someone calls customer support for help with your product.  

    "No problem", says the support rep on the phone," what is your customer number?" 

    Unfortunately, the person calling doesn't know that the customer # has changed, and tells you XYZ.  Customer not found !  hmmm ... must have forgotten the customer code.  Thinking quick, looks up the invoice for the product which had the customer code printed on it.   He finds it, and there is again: XYZ! 

    "Still not found," replies the support person.  Then he has an idea:  "Well, since you have the invoice, what is the invoice # that you found? That will help me get the customer code."  The customer replies invoice #1235.  He looks it up, sees that it is linked to customer ID#45.  Customer ID#45 has a customer code of .... "ABC" !  

    "Sir, something's not right here, the system says invoice #1235 was for customer ABC! "

    "But right here, on the invoice i am looking at, it says invoice #1235, customer code XYZ!  I am holding it in my hand!"

    "Uh .. .please hold!"

    My initial thought about this is that this isn't really a matter of what you decide to use as a primary key, but more a matter of knowing how to handle this kind of situation.  Let's say that you like natural primary keys better than artificial primary keys, so when you design this table you decide to use the customer code as your primary key.  Fortunately, your RDBMS has excellent cascading update capabilities and such, so changing the customer code for your customer from "XYZ" to "ABC" and changing all foreign key references is a snap.  But once you do so your screwed just like the person in your story who used artificial primary keys, because you've erroneously decided that no history of a change like this should be kept, and that's where the real problem lies.

    Does that make sense?  It's time to go home and I'm getting low on caffeine, so I doubt my ability to communicate effectively.

  • (cs) in reply to ammoQ
    ammoQ:
    <°)))><

    Fiiiiiiiishy. =D
  • (cs) in reply to UncleMidriff
    UncleMidriff:
    Jeff S:
    Ytram:

    I know any customers I've ever dealt with wouldn't care for this at all.  The situation could have been completely avoided by using surrogate keys.

    One more thing to consider:

    So you create a database-only surrogate key, CustID, is strictly internal for relations in your database.  The customer code, "XYZ", is our business customer code which we stamp invoices with, customers use when they call, etc., but we avoid this as our natural key and use a surrogate CustID identity.  This allows us to change the customer code as often as we want, right?

    OK, let's take customer ID #45, and change the customer code from XYZ to ABC to correct a mistake or to update our records.  Done!  No messy cascading updates, no fuss, nothing else changed!  Life is good. Take a vacation.

    Two weeks later, someone calls customer support for help with your product.  

    "No problem", says the support rep on the phone," what is your customer number?" 

    Unfortunately, the person calling doesn't know that the customer # has changed, and tells you XYZ.  Customer not found !  hmmm ... must have forgotten the customer code.  Thinking quick, looks up the invoice for the product which had the customer code printed on it.   He finds it, and there is again: XYZ! 

    "Still not found," replies the support person.  Then he has an idea:  "Well, since you have the invoice, what is the invoice # that you found? That will help me get the customer code."  The customer replies invoice #1235.  He looks it up, sees that it is linked to customer ID#45.  Customer ID#45 has a customer code of .... "ABC" !  

    "Sir, something's not right here, the system says invoice #1235 was for customer ABC! "

    "But right here, on the invoice i am looking at, it says invoice #1235, customer code XYZ!  I am holding it in my hand!"

    "Uh .. .please hold!"

    My initial thought about this is that this isn't really a matter of what you decide to use as a primary key, but more a matter of knowing how to handle this kind of situation.  Let's say that you like natural primary keys better than artificial primary keys, so when you design this table you decide to use the customer code as your primary key.  Fortunately, your RDBMS has excellent cascading update capabilities and such, so changing the customer code for your customer from "XYZ" to "ABC" and changing all foreign key references is a snap.  But once you do so your screwed just like the person in your story who used artificial primary keys, because you've erroneously decided that no history of a change like this should be kept, and that's where the real problem lies.

    Does that make sense?  It's time to go home and I'm getting low on caffeine, so I doubt my ability to communicate effectively.

    No, you are making sense.  The bottom line is that mergers and acquisitions are a pain in the ass no matter how you model customers.  It's just degress of pain in the ass.

  • (cs) in reply to foxyshadis
    foxyshadis:

    If you have a table with fields that change constantly (defined as "often enough that cascaded updates that cause a major performance impact can't be scheduled for downtimes") you'd be insane to make that the primary key.


    not that I'm agreeing or disagreeing with you (or the person you are responding to here), but the correct response here may just be: "what downtime?"

    In theory our system is expected to be up and running 24*7 - there is no "regularly scheduled downtime"... we only take it down as an absolute last resort, so any way of avoiding having to do major database surgery is a Good Thing... even if it means an extra key.

    Of course I'm speaking in theory here as in practice our database structure sucks worse than the given WTF... but I was hoping just to inject an idea that maybe sometimes it's a good idea to avoid this sort of thing *ever* having to be done.
  • (cs) in reply to taryn

    erg - sorry.. replied while still reading through the thread... so have duplicated another post. Apologies - i'll try to remember not to do that again. :(

  • (cs)

    <FONT face=Arial size=2>Hi guys</FONT>

    <FONT face=Arial size=2>Just thought I'd add a couple more details of this system. The database is a SQL 2000 database (so "ground zero" is 1 Jan 1900 rather than the end of 1899), and dates are stored in datetime fields (small mercies, and all that) so referential integrity isn't an excuse. One table Alex removed from my original message to him, that I was particularly fond of, was tPercentage. I'm sure I don't need to give you the full details of what that contained. Let's just say that it had 100 rows that start like this:</FONT>

    <FONT face=Arial size=2>tPercentage</FONT>
    <FONT face=Arial size=2>PercentageID
    int identity</FONT>
    <FONT face=Arial size=2>PercentageValue
    varchar(50)</FONT>
    <FONT face=Arial size=2>1</FONT> <FONT face=Arial size=2>1</FONT>
    <FONT face=Arial size=2>2</FONT> <FONT face=Arial size=2>2</FONT>

    <FONT face=Arial size=2>I hadn't meant this to turn into a physical versus artificial key debate, though it's been enjoyable. My reaction to this was more based on the unnecessary tables and odd choice of data types. The system, incidentally, was developed within the last five years, and I don't think an ex-Cobol developer (despite the evidence to the contrary) had been anywhere near it.</FONT>

    <FONT face=Arial size=2>Dave</FONT>

  • (cs) in reply to Jeff
    Anonymous:

    Ok, this is a very interesting thread to me because I am NOT a database guy, though I have been called upon to design databases to support my applications.  In these cases I always give it my best shot and then take it to someone I respect who has been around databases longer than I have to give constructive input.  Actually, this is generally the same process I do with any design, but with databases it is more like approaching a teacher rather than a peer.  I say all this to demonstrate I am very open and realize my knowledge is limited.


     So here's a very specific question from a real database I’ll be re-designing in the next six months: (snip).  So there you go- textbook case for a multi column natural key right?  <!--[if !supportEmptyParas]-->You can set the maturity date and the Cusip as the primary key and you’re done.

    So now I store both pieces of information in every table that I want to relate to this bond, correct?  The Cusip and maturity date pair will proliferate through the data and my joins will now be on both fields.  Not what I’m used to, but I can handle that if it’s going to make my data integrity stronger.  But what do I do when the user suddenly realizes that the Cusip or maturity date was entered wrong and this error has been proliferated throughout the entire database?  My integrity is intact, but the data is very wrong and fixing it is going to take quite a bit of error-prone updates to the database.  The more relations there are to this table, the more difficult it is going to be to make sure they are all changed correctly.  I suppose I can just force the users to enter new records and do all of the updates by hand, but now I’m putting constraints on the user that are difficult to explain, especially when I’m not certain myself.  But if I only stored a reference to the data (an identity column) rather than proliferate the errors throughout the database, the data is easy and safe to correct.  So how to you (the the natural key camp) handle this situation which seems textbook made for a natural key?



    Answer: Start by designing it right.   Fully normalize, with only naterual keys.   Get this design right first.  Save it in version control for everyone to see (even though it will soon be obsolete and only refered to again in a debate).   Then step back and sleep on it a little.   

    Now use experience to intelligently break accidemic theories.   It looks like you have a good excuse to create an artifical key.   You (or your expert friends) may also have some experience in the domain that says some intelligent de-normalization may be useful to make the database faster (in someplace where speed is needed), you can do this too, but make sure you carefully document your reasoning.   Also make sure that this isn't premature optimization which is more likely when doing something like this.
  • (cs) in reply to UncleMidriff
    UncleMidriff:
    Jeff S:
    A numeric code being assigned to an entity as part of your business process NOT the same as tagging every row of every table with a meaningless, random ID, and then using that meaningless, random ID to completely definie all relations in all tables throughout your system!


    Why?

    How does a randomly generated number from a business rule differ so much from a randomly generated number from the developer?  Why is the former ok, but the later magically breaks the relational model into a billion pieces, makes both Codd and Date cry bitter tears, and makes your entire business explode?


    Exaggeration aside, it is because the first number has meaning outside of the database, and the second does not.  For example, a bank account number has meaning Out There.  A problem with the internal number is that it can escape.  What happens then?

    Sincerely,

    Gene Wirchenko

  • (cs) in reply to Gene Wirchenko

    Gene Wirchenko:

    Sincerely,

    Gene Wirchenko

    <FONT face="Courier New" size=2>dear gene,</FONT>

    <FONT face="Courier New" size=2>can i put your signature on a t-shirt?</FONT>

    <FONT face="Courier New" size=2>casually bloated,</FONT>

    <FONT face="Courier New" size=2>emptyset</FONT>

  • (cs) in reply to Gene Wirchenko
    Gene Wirchenko:
    A problem with the internal number is that it can escape.  What happens then?


    You hunt it down. With dogs.
  • (cs) in reply to emptyset
    emptyset:

    Gene Wirchenko:

    Sincerely,

    Gene Wirchenko

    <font face="Courier New" size="2">dear gene,</font>

    <font face="Courier New" size="2">can i put your signature on a t-shirt?</font>

    <font face="Courier New" size="2">casually bloated,</font>

    <font face="Courier New" size="2">emptyset</font>



    I like how Gene is so full of himself that he won't reply to this or any other question of him. What a pompous jerk!

    Sincerely,

    Richard Nixon
  • Vidar Hokstad (unregistered) in reply to Jeff S
    Jeff S:

    To simplify:  You need a table of U.S. states.  You create "StateID" as PK.  One day, we get two entries for "North Dakota".  So we add a constraint saying only 1 entry per state name.  Then someone enters "N. Dakota".  More problems. Now what?  Looking at the data, we realize we can add a constraint saying only 1 entry per 2-char abbreviation.  Now we are feeling good.  And, of course, this all could have been avoided if we just declared the PK as the standard 2-char state abbreviation in the first place, and now we have extra columns and extra constraints in our table, and a meaningless "StateID" value all over our place in which a join is required just to print the 2-char abbreviation.



    And then you use the two char abbreviation to refer to that everywhere, including outside the database, and suddenly one day North Dakota decides to change name to just Dakota and decides to change it's two letter abbreviation as well, and you're up the creek without a paddle, having to figure out all the places you've used the two letter abbreviation. Or the US government decides to switch to three letter codes for whatever reason, and suddenly your two letter codes ARE artificial keys (though I would argue they already are - they are an identifier created to refer to the natural key - the state name - in a shorter more succint way).

    Country codes? Same problem - many country codes HAVE changed through the years to reflect country name changes, or have changed meanings as the countries they used to represent have been broken up.

    The point is that id fields are as much part of the data as anything else. VERY FEW things genuinely have a naturally occurring identity suitable for use as a natural key. People keep bringing up SSN's, but forget that 1) SSN's aren't always unique, 2) SSN's can change for a person, 3) non-US citizens doesn't have a SSN and might not even have a unique ID number.

    The same is true for a lot of other things. You brought up addresses, and tried to justify it by introducing an "addressType". However what about people with more than one home address? More than one office address? You would need to be able to refer to a specific address for the purpose of referring to it from another table, and you'd need to add something like "home address #2", at which point you might as well add a unique address id and stop shooting yourself in the foot.

    But let's do something difficult. How do you uniquely identify a person? A customer, for instace? You can't use name, obviously - not only will you have duplicates, but they change too to make things more painful. You can't even use name and address - plenty of families have more than one person of the same name living at the same address. You can't use SSN as mentioned above. You can't use passport numbers - lots of people doesn't have them and they're not guaranteed to be unique.

    THERE IS NO NATURAL KEY UNIQUELY IDENTIFYING AN INDIVIDUAL. Heck, even the full DNA structure won't be sufficient thanks to the joys of cloning. You have to invent a customer id. Now, you can try to pretend that this is part of the data so it isn't really an artificial key, but that is just bullshit. Artifical keys exists just for this purpose: When your assumptions about what constraints are valid on the candidate keys can't be guaranteed to always hold.

    As a bonus they are useful because RDBMS's do not implement the relational model in a way that makes it efficient to use arbitrary long keys and because querying based on them is inefficient with SQL. If SQL let you do "SELECT foo,bar FROM table1,table2 WHERE table1.key(somekey) = table2.key(anotherkey)" or something similar, and automatically used artificial keys and constraints behind your back to optimize things and to handle changes in the key values smoothly, then I might agree for some cases (but that still leaves the cases where the identity of a relation needs to be referenced outside the control of the DBMS).

    In practice, I can think of extremely few things that can be uniquely identified by natural data that are guaranteed unchanging. Locations? Names, zip code/postal codes, even country names and codes changes all the time. You could use latitude/longitude, but even that may be problematic (a region might be subdivided or a country might see regions gain indepenence, for instance, and the lat/long you've used may turn out to be in a part that's breaking away).

    Here in the UK county borders have changed many times through history, including recent decades - the area I lived in changed counties AND had a land area more than the size of the original borough, yet the current Croydon borough (now part of London) is a direct continuation of the then much smaller borough of Croydon, Surrey, so this isn't just an academic example.

    What's the natural key for an order? Surely not the items ordered, or the customers details, or the price or time or any combination of that, as they're not guaranteed to be unique? The closest to a "natural key" would be an invoice id - again an artificial piece of information (an aritifical key originally created for paper objects if you like), but even that isn't safe - it is not uncommon for a single order to result in multiple invoices or for a single invoice to refer to multiple ordes. And while that may not be the case in your application today, expect it to change the next time sales, marketing or finance gets a bright idea.

    I challenge you to list 5 things likely to occur in a real application that have "natural keys" that aren't simply artifical keys created to manage the things they identify in the real world, and that are actually likely to be unique and stable (i.e. won't need to change)

    Vidar Hokstad

  • Vidar Hokstad (unregistered) in reply to Jeff S
    Jeff S:

    In situation #1, the "No" answer, you have a useless identity as a primary key in addition to your unique constraint.  So you have wasted valueable space that many claim to save by using an identity, and added an extra index as well.  And now, of course, you cannot expand your table when you merge, since your constraint doesn't allow it.  Nothing gained by using your indentity column in your "merger" scenerio.  And every table that has an "AccountID" column requires a join just to display the AccountNumber, or to do a simple sort, that otherwise would not be necessary if that number was stored in your tables directly as a natural foreign key.



    Not a very good try. Account numbers change. "Often". While the account itself stays the same. The account number will often be referenced from large numbers of other systems, including systems that will NOT store their data in your database.

    And account numbers may/will be recycled - the frequency would depend on bank policy and legislation, but it's not like the numbers are limitless.

    A typical example of this is a bank with multiple sort codes that denotes a specific branch - though the branch system is slowly breaking down in most countries, some banks will change the account number if someone wishes to change the branch. But the account is the same - only the id used to refer to it EXTERNALLY has changed.

    And an account may often be referred to by multiple numbers. Case in point: Credit card accounts, which usually have multiple card numbers associated with them.

    In both these cases not tieing the account number to the identity of the account  serves an important purpose: It decouples the two so that when one changes, it doesn't require an immediate update of half a dozen other systems, some of which may even be beyond the banks control (such as standing orders with companies etc.).

    I recently changed my card numbers, but my old card numbers can still be used as payment references to settle my card balances - they're just blocked so they can't be used to charge the account any more.

    So what would the natural keys for a bank account or credit card account be?

    The nearest thing I could think of would be the original account number + the creation date, assuming no account numbers will be recycled within a single day (if so you obviously only increase the granularity). But this is information that really is no better than an artificial key for a bank account relation - the original account number alone only has value for a limited period before it becomes just a meaningless value used as part of a key.

    Vidar Hokstad



  • (cs) in reply to Richard Nixon

    Richard Nixon:

    I like how Gene is so full of himself that he won't reply to this or any other question of him. What a pompous jerk!

    <FONT face="Courier New" size=2>if there are no objections, i'm going to look into cafepress'ing this puppy.</FONT>

  • (cs) in reply to emptyset

    I've been reading the DWTF for probably close to a year now, but haven't bothered to sign up or comment on anything, until today.

    I'm a hobby programmer, with no formal CS education and nobody to learn from face-to-face, since I work in an unrelated field. And as a hobbyist the types of problems I'm exposed to are relatively limited (no large databases, for example). This site is a great way to hear experts', and self-proclaimed experts', comments about a broader range of topics.

    To me as a reader, this has been the best thread in a very long time. Long and thorough discussions of fundamental questions, with lots of examples, and far more interesting to read than a textbook. Thanks guys. [Y]

  • (cs) in reply to Vidar Hokstad
    Anonymous:
    Jeff S:

    In situation #1, the "No" answer, you have a useless identity as a primary key in addition to your unique constraint.  So you have wasted valueable space that many claim to save by using an identity, and added an extra index as well.  And now, of course, you cannot expand your table when you merge, since your constraint doesn't allow it.  Nothing gained by using your indentity column in your "merger" scenerio.  And every table that has an "AccountID" column requires a join just to display the AccountNumber, or to do a simple sort, that otherwise would not be necessary if that number was stored in your tables directly as a natural foreign key.



    Not a very good try. Account numbers change. "Often". While the account itself stays the same. The account number will often be referenced from large numbers of other systems, including systems that will NOT store their data in your database.

    And account numbers may/will be recycled - the frequency would depend on bank policy and legislation, but it's not like the numbers are limitless.

    A typical example of this is a bank with multiple sort codes that denotes a specific branch - though the branch system is slowly breaking down in most countries, some banks will change the account number if someone wishes to change the branch. But the account is the same - only the id used to refer to it EXTERNALLY has changed.

    And an account may often be referred to by multiple numbers. Case in point: Credit card accounts, which usually have multiple card numbers associated with them.

    In both these cases not tieing the account number to the identity of the account  serves an important purpose: It decouples the two so that when one changes, it doesn't require an immediate update of half a dozen other systems, some of which may even be beyond the banks control (such as standing orders with companies etc.).

    I recently changed my card numbers, but my old card numbers can still be used as payment references to settle my card balances - they're just blocked so they can't be used to charge the account any more.

    So what would the natural keys for a bank account or credit card account be?

    The nearest thing I could think of would be the original account number + the creation date, assuming no account numbers will be recycled within a single day (if so you obviously only increase the granularity). But this is information that really is no better than an artificial key for a bank account relation - the original account number alone only has value for a limited period before it becomes just a meaningless value used as part of a key.

    Vidar Hokstad



    Vidar !  This is great stuff.  Thank you for your posts.  They are missing one thing, though: can you let me know what your conclusion is after all this, or what your thesis is?  You made a large amount of points using several examples, but can you tie it all together for me so I can really get a full feel for your perspective on the issues.  (I..e, what kinds of keys are good on what kinds of tables, rules of thumb to use when modelling, physical versus logical, etc)

    Thanks for taking the the time to respond.

     

  • Vidar Hokstad (unregistered) in reply to Jeff S
    Jeff S:

    One more thing to consider:

    So you create a database-only surrogate key, CustID, is strictly internal for relations in your database.  The customer code, "XYZ", is our business customer code which we stamp invoices with, customers use when they call, etc., but we avoid this as our natural key and use a surrogate CustID identity.  This allows us to change the customer code as often as we want, right?

    OK, let's take customer ID #45, and change the customer code from XYZ to ABC to correct a mistake or to update our records.  Done!  No messy cascading updates, no fuss, nothing else changed!  Life is good. Take a vacation.

    Two weeks later, someone calls customer support for help with your product.  

    "No problem", says the support rep on the phone," what is your customer number?" 

    Unfortunately, the person calling doesn't know that the customer # has changed, and tells you XYZ.  Customer not found !  hmmm ... must have forgotten the customer code.  Thinking quick, looks up the invoice for the product which had the customer code printed on it.   He finds it, and there is again: XYZ! 

    "Still not found," replies the support person.  Then he has an idea:  "Well, since you have the invoice, what is the invoice # that you found? That will help me get the customer code."  The customer replies invoice #1235.  He looks it up, sees that it is linked to customer ID#45.  Customer ID#45 has a customer code of .... "ABC" !  

    "Sir, something's not right here, the system says invoice #1235 was for customer ABC! "

    "But right here, on the invoice i am looking at, it says invoice #1235, customer code XYZ!  I am holding it in my hand!"

    "Uh .. .please hold!"

     


    Or more likely (per a recent experience with my card company):

    - Blocked my cards because I lost my wallet.

    - Few days later I had to do a transfer, but hadn't gotten my new cards. Called customer service:

    "Hi, I need to do a transfer, but I don't have my new card number - I reported my cards lost a few days ago. Can you help?"

    "Sure, what's your old number?"

    "...."

    "Ok, looking up your account now. There, just provide me with some answers to some security questions, and then I'll give you the new number that's been linked to this account and we can arrange the transfer for you.

    Me: gives them the security details.

    Generally account numbers/card numbers is a many to one relationship to your account, where the key linking them have no semantics. It may be your original account number or account number plus creation date, or whatever, but there's no reason to do so - the data linking the two has only one function: To reference a unique account from an account/card number entity.

    And for your invoice example, I would NEVER, EVER design an invoice table that doesn't either duplicate ALL information that should be displayed on the invoice OR reference tables where I know the information will NEVER be updated or deleted (insertions only) - I do realise you're not advocating referring to data that will change here... Just wanted to point this one out. I've seen changing invoice data in too many US systems - here in the EU your auditors would get a stroke and your finance director would start worrying about jail time if they find out the development team has been building systems where issued invoices can change...

    My solution for the scenario above would be to add new records, and provide a mapping from the user visible customer id's to an artificial key. I  would not update the old ones, and I'd provide the customer care team with a simple way of looking up past user id's for this user.. I'd also as mentioned make bloody sure the invoice table kept unchanging copies of any field used for the invoice - and most likey I would keep an HTML or PDF rendered version of the invoice on original insertion. Yes, finance people often are - and more often should be - paranoid about ensuring their invoices never changes.

    Vidar Hokstad

  • Vidar Hokstad (unregistered) in reply to Gene Wirchenko
    Gene Wirchenko:

    Exaggeration aside, it is because the first number has meaning outside of the database, and the second does not.  For example, a bank account number has meaning Out There.  A problem with the internal number is that it can escape.  What happens then?

    Sincerely,

    Gene Wirchenko



    Ah, but the fact that the first number has meaning outside of the database is one of the main reasons why I would generally (not always) avoid using it as a primary key - things that have meaning outside the database have an ugly tendency to change. Both meaning, format, and for values for specific instances.

    If the internal number "escapes", then you have the same problem with that as you do with any other number with meaning outside the database, but no worse. I'd rather take the risk that a value may gather some limited meaning outside the database than choose a primary key that definitively does have a value outside the database and where the meaning is under the control of someone else.

    There are cases where values are stable enough for me to use natural keys as the primary key, but not often. Suddenly what was previously a unique key turns out to not be unique, or it needs to change but a significant number of external (to the database) systems contains references to the old value that you have no control over and you find yourself needing ot maintain a history of the externally visible id's mapped to a single entity.

    Vidar Hokstad

  • (cs) in reply to Vidar Hokstad
    Anonymous:
    Jeff S:

    To simplify:  You need a table of U.S. states.  You create "StateID" as PK.  One day, we get two entries for "North Dakota".  So we add a constraint saying only 1 entry per state name.  Then someone enters "N. Dakota".  More problems. Now what?  Looking at the data, we realize we can add a constraint saying only 1 entry per 2-char abbreviation.  Now we are feeling good.  And, of course, this all could have been avoided if we just declared the PK as the standard 2-char state abbreviation in the first place, and now we have extra columns and extra constraints in our table, and a meaningless "StateID" value all over our place in which a join is required just to print the 2-char abbreviation.



    And then you use the two char abbreviation to refer to that everywhere, including outside the database, and suddenly one day North Dakota decides to change name to just Dakota and decides to change it's two letter abbreviation as well, and you're up the creek without a paddle, having to figure out all the places you've used the two letter abbreviation. Or the US government decides to switch to three letter codes for whatever reason, and suddenly your two letter codes ARE artificial keys (though I would argue they already are - they are an identifier created to refer to the natural key - the state name - in a shorter more succint way).

    Country codes? Same problem - many country codes HAVE changed through the years to reflect country name changes, or have changed meanings as the countries they used to represent have been broken up.

    The point is that id fields are as much part of the data as anything else. VERY FEW things genuinely have a naturally occurring identity suitable for use as a natural key. People keep bringing up SSN's, but forget that 1) SSN's aren't always unique, 2) SSN's can change for a person, 3) non-US citizens doesn't have a SSN and might not even have a unique ID number.

    The same is true for a lot of other things. You brought up addresses, and tried to justify it by introducing an "addressType". However what about people with more than one home address? More than one office address? You would need to be able to refer to a specific address for the purpose of referring to it from another table, and you'd need to add something like "home address #2", at which point you might as well add a unique address id and stop shooting yourself in the foot.

    But let's do something difficult. How do you uniquely identify a person? A customer, for instace? You can't use name, obviously - not only will you have duplicates, but they change too to make things more painful. You can't even use name and address - plenty of families have more than one person of the same name living at the same address. You can't use SSN as mentioned above. You can't use passport numbers - lots of people doesn't have them and they're not guaranteed to be unique.

    THERE IS NO NATURAL KEY UNIQUELY IDENTIFYING AN INDIVIDUAL. Heck, even the full DNA structure won't be sufficient thanks to the joys of cloning. You have to invent a customer id. Now, you can try to pretend that this is part of the data so it isn't really an artificial key, but that is just bullshit. Artifical keys exists just for this purpose: When your assumptions about what constraints are valid on the candidate keys can't be guaranteed to always hold.

    As a bonus they are useful because RDBMS's do not implement the relational model in a way that makes it efficient to use arbitrary long keys and because querying based on them is inefficient with SQL. If SQL let you do "SELECT foo,bar FROM table1,table2 WHERE table1.key(somekey) = table2.key(anotherkey)" or something similar, and automatically used artificial keys and constraints behind your back to optimize things and to handle changes in the key values smoothly, then I might agree for some cases (but that still leaves the cases where the identity of a relation needs to be referenced outside the control of the DBMS).

    In practice, I can think of extremely few things that can be uniquely identified by natural data that are guaranteed unchanging. Locations? Names, zip code/postal codes, even country names and codes changes all the time. You could use latitude/longitude, but even that may be problematic (a region might be subdivided or a country might see regions gain indepenence, for instance, and the lat/long you've used may turn out to be in a part that's breaking away).

    Here in the UK county borders have changed many times through history, including recent decades - the area I lived in changed counties AND had a land area more than the size of the original borough, yet the current Croydon borough (now part of London) is a direct continuation of the then much smaller borough of Croydon, Surrey, so this isn't just an academic example.

    What's the natural key for an order? Surely not the items ordered, or the customers details, or the price or time or any combination of that, as they're not guaranteed to be unique? The closest to a "natural key" would be an invoice id - again an artificial piece of information (an aritifical key originally created for paper objects if you like), but even that isn't safe - it is not uncommon for a single order to result in multiple invoices or for a single invoice to refer to multiple ordes. And while that may not be the case in your application today, expect it to change the next time sales, marketing or finance gets a bright idea.

    I challenge you to list 5 things likely to occur in a real application that have "natural keys" that aren't simply artifical keys created to manage the things they identify in the real world, and that are actually likely to be unique and stable (i.e. won't need to change)

    Vidar Hokstad



    Vidar,

    I echo Jeff S.  Good Stuff.

    I have a couple of points.

    First, I don't think that anyone here has been arguing that SSN is a good key.  In fact I have used it several times as an example of a something that seems seductive but makes a poor key.

    Second, we don't build data models in a vaccuum.  We build them to implement business processes of some sort.  So while there may not be a general unique natural key for employee, invoice, or order, a business process will frequently implement one.  Every business of reasonable size will implement some sort of employee number, invoice number, or order number.  Our data models are supposed to be reflective of the business processes we are building.  It is impossible to build a generic data model, so our choice is really what parts of the business rules get built into the system and what parts get abstracted out.  If some attribute isn't likely to change and has a reasonable set of business rules around it, we can use it as a natural key.  Otherwise we fall back to an artificial key.  This is where a database designer needs a good understanding of the business domain so that he can make appropriate judgements.
  • Vidar Hokstad (unregistered) in reply to Jeff S

    Jeff S:
    Anonymous:

    Not a very good try. Account numbers change. "Often". While the account itself stays the same. The account number will often be referenced from large numbers of other systems, including systems that will NOT store their data in your database.

    And account numbers may/will be recycled - the frequency would depend on bank policy and legislation, but it's not like the numbers are limitless.

    A typical example of this is a bank with multiple sort codes that denotes a specific branch - though the branch system is slowly breaking down in most countries, some banks will change the account number if someone wishes to change the branch. But the account is the same - only the id used to refer to it EXTERNALLY has changed.

    And an account may often be referred to by multiple numbers. Case in point: Credit card accounts, which usually have multiple card numbers associated with them.

    In both these cases not tieing the account number to the identity of the account  serves an important purpose: It decouples the two so that when one changes, it doesn't require an immediate update of half a dozen other systems, some of which may even be beyond the banks control (such as standing orders with companies etc.).

    I recently changed my card numbers, but my old card numbers can still be used as payment references to settle my card balances - they're just blocked so they can't be used to charge the account any more.

    So what would the natural keys for a bank account or credit card account be?

    The nearest thing I could think of would be the original account number + the creation date, assuming no account numbers will be recycled within a single day (if so you obviously only increase the granularity). But this is information that really is no better than an artificial key for a bank account relation - the original account number alone only has value for a limited period before it becomes just a meaningless value used as part of a key.

    Vidar Hokstad



    Vidar !  This is great stuff.  Thank you for your posts.  They are missing one thing, though: can you let me know what your conclusion is after all this, or what your thesis is?  You made a large amount of points using several examples, but can you tie it all together for me so I can really get a full feel for your perspective on the issues.  (I..e, what kinds of keys are good on what kinds of tables, rules of thumb to use when modelling, physical versus logical, etc)

    Thanks for taking the the time to respond.

     



    Jeff,
    I must admit I first thought you were dead set against using artificial keys, but as I read through the rest of your messages I think we agree about quite a few things.

    I'm certainly not against using natural keys where they are guaranteed to be stable, or where changes doesn't matter (i.e. where cascading updates isn't a problem). For me, using artificial keys serve two purposes:
       - I would use them if the externally visible key is not guaranteed stable and I need stability, or where they are not guaranteed unique. I find that this is the case much more often than I'd like. I.e. the account number issue.
      - I would use them as a possible optimization/simplification where the primary key would otherwise be a large composite key. I see this as overcoming database shortcomings - I mentioned at some point that I wished SQL had a simple way of requesting equality of a full composite key against a full composite key of another table without explicitly listing the fields, and that the DBMS then could do the whole artificial key optimization for us behind the scenes exactly when it makes sense from a performance point of view. That would save a whole lot of uglines and a whole lot of profiling that otherwise might be needed for pragmatic performance reasons.

    The first reason above is really the most important for me. The more I work with databases the more I see uniqueness constraints on externally visible keys breaking down, and/or needs to maintain history and/or changes that needs to be contained.

    My general solution to that is to either introduce a purely internal artificial key AND if the system needs to maintain history I will likely either pull the real key out into a separate mapping table OR create a mapping table for historical data.

    I see either as workarounds to contain "real world breakdown" on what is otherwise sound theoretical models by containing all the ugliness that stems from dealing with actual customers (internal to the organisation or external).

    How far I go depends on how glacial the changes are (I mentioned US states changing names as a possibility, but more as an example of how you always need to take your specific needs into account before assuming data doesn't change... For most apps. an occasional change like that doesn't warrant the introduction of artificial keys) and how static my data needs to be.

    For an invoicing system for instance my basic assumption is: The invoice NEVER EVER changes, and any data it refers to needs to be static, but customer data rarely is, so I either introduce an indirection, most likely using artificial keys that should never make it out of  the system anywhere, or I duplicate data. I may end up keeping the customers address in the invoice table, for instance, or copy addresses into a never changing invoice address table, or I may end up referring to an address table (this is one of those rare occasions where an address makes sense outside of a location).

    I probably end up with more artificial keys than many people would like, often out of paranoid after seeing what kinds of "static" data suddenly changes ;) But it is not that I have a strong objection to use natural keys, only that I guess I have a very pessimistic view on what types of data actually is unique and changes rarely enough, and also that I often work with systems where the possible natural keys are used all over the place externally and where a change may take a lot of time to propagate to all external systems that reference them (hence the mentioned mapping tables/history tables).

    In the long term I think this is much more of a problem with existing database technology than it is an issue with the relational model - database systems COULD do a lot more to reduce the need (or temptation if you like...) to use artificial keys by providing opaque ways of doing many of the things artificial keys are often used for, or at least abstracting away the actual values of those keys - if I use an artificial key to map an external id to a specific entity I don't care about the value of the artificial key.

    In fact, I'd love it if the value of it was never, ever visible to anyone. What I want is simply an efficient way of linking one or more external id's to a unique record that may not have any suitable natural keys or where the natural keys that would otherwise be used are simply too large or change too frequently. That SHOULD be an implementation detail of the database. Obviously some places the values would need to be returned, but one way of doing this might be to introduce a datatype that only get bound to a specific user visible value during the lifetime of a single connection to the database. SPARQL, one of the query languges for RDF, allows this for attributes that does not have a provided value (they are essentially used as unnamed nodes in a graph), to able to explicitly prevent you from assigning any long term meaning to the returned attribute.

    Vidar Hokstad

  • (cs) in reply to Vidar Hokstad
    Vidar Hokstad:

    In the long term I think this is much more of a problem with existing database technology than it is an issue with the relational model - database systems COULD do a lot more to reduce the need (or temptation if you like...) to use artificial keys by providing opaque ways of doing many of the things artificial keys are often used for, or at least abstracting away the actual values of those keys - if I use an artificial key to map an external id to a specific entity I don't care about the value of the artificial key.

    In fact, I'd love it if the value of it was never, ever visible to anyone. What I want is simply an efficient way of linking one or more external id's to a unique record that may not have any suitable natural keys or where the natural keys that would otherwise be used are simply too large or change too frequently. That SHOULD be an implementation detail of the database...

    Vidar Hokstad


    I'm reminded of the choice in Java to place pointers out of reach and instead use an opaque object reference.  In this way, there is no temptation to do unsafe manipulation of pointers.
  • Vidar Hokstad (unregistered) in reply to RevMike
    RevMike:

    I'm reminded of the choice in Java to place pointers out of reach and instead use an opaque object reference.  In this way, there is no temptation to do unsafe manipulation of pointers.


    As much as I dislike Java, that was one of the good decisions they made IMHO. There are the occasional instances where being able to do pointer arithmetic is useful in C/C++, but far more often than not when it is being used it is for things where it's complete pointless... Interestingly enough current practice for C++ is moving more and more in the direction of avoiding bare pointers and instead use smart pointer templates that hide the actual pointer values - which also is pretty close to what I had in mind.

    It would remove the issue that Gene Wirchenko pointed out as well about the id's "escaping" - if they're on purpose modified for each database connection, then even if some of the values "escapes" people will quickly learn that those values are useless to them for any other purpose than as a temporary system internal reference to an entity.

    I'm almost tempted to do a proof of concept implementation for some simple DB like Sqlite where it would be quite easy to hook it in.... If only I had time :)

    Vidar

  • (cs) in reply to Vidar Hokstad

    Anonymous:
    I'm almost tempted to do a proof of concept implementation for some simple DB like Sqlite where it would be quite easy to hook it in.... If only I had time :)

    <FONT face="Courier New" size=2>vidar,</FONT>

    <FONT face="Courier New" size=2>bring me the gentle force of magma.  take it to the accumulator and continuously destroy the toppings.  please produce a notarized stamp in the likeness of the virgin mary; my heart condition requires it.  walk my minature schnauzer in the darkness of the full moon.  cook me the soup of a thousand meals and provide a large stack of those addictive crackers you crumble up and throw in.  make the signal propagate through the air and land in the basement of a man by the name of charlie: he's got red hair and a big cigar.  participate in the octagon and win the prize.  watch out for snakes.  feed the giant panda before it destroys the fig tree.  ordain yourself and marry this lamp to my third cousin.</FONT>

    <FONT face="Courier New" size=2>sincerely,</FONT>

    <FONT face="Courier New" size=2>emptyset</FONT>

  • (cs) in reply to RevMike
    RevMike:

    Second, we don't build data models in a vaccuum.  We build them to implement business processes of some sort.  So while there may not be a general unique natural key for employee, invoice, or order, a business process will frequently implement one.  Every business of reasonable size will implement some sort of employee number, invoice number, or order number.  Our data models are supposed to be reflective of the business processes we are building.  It is impossible to build a generic data model, so our choice is really what parts of the business rules get built into the system and what parts get abstracted out.  If some attribute isn't likely to change and has a reasonable set of business rules around it, we can use it as a natural key.  Otherwise we fall back to an artificial key.  This is where a database designer needs a good understanding of the business domain so that he can make appropriate judgements.


    If you build a system especially for one company, you can depend on their business rules. They say "emplyee numbers are unique and never change" and you use it a your primary key. No problem.
    On the other hand, if you create a system that is meant to be sold to several companies, you have to be paranoid about it. Your next customer might code the department into the emplyoee number, so it changes whenever a employee is moved to another department. Or they might have several profit centers, and the employee numbers are only unique within one profit center.
    The main business of the company I'm working for is creating warehouse management systems, and after 10 years or so in business, we eventually decided to put artificial IDs into each and every database table. This gives us a better chance that at least some of the code can be reused. As far as I can tell after about 5 years of running this strategy, these are the drawbacks I've noticed:
    a) Additional joins are necessary to retrieve what would otherwise be the natural key; but this should not be overestimated, since in many cases the join is necessary anyway (e.g. a list of order items not only shows the product number, but also the product name, so I have to do the join anyway).
    b) More difficult are situations where the whole content of a table is regulary imported from an external system and the easiest way to synchronize would be to delete the old data and insert the new version. (Of course referential constraints have to be disabled to do that). Artificial keys make the delete-insert method impossible, since the newly assigned IDs would not match the old values.
    c) In some cases it is desirable to be able to delete a database row that is referenced by another database row, and keep the other row.  (Of course you need to omit the referential constraint to do that.) If the foreign key in that other row is a natural key, external systems can be used to look up it's meaning should it be ever necessary, which is "good enough" in some cases. For example, it can be desirable to delete orders (in the warehouse management system) a few weeks or months after they have been processed, but keep the stock transaction log (which refers to the orders) longer.

    On the plus side, the issues of changing/non-unique natural keys are gone; code resuse between different customers has become much easiser. E.g. a STOCK_ITEM is always located on a LOCATION and contains a PRODUCT, but the structure of the natural keys of LOCATION and PRODUCT varies between companies; so with artifical keys, moving a stock item to a new location is always update stock_item set location_id = :new_location_id where stock_item_id = :stock_item_id, while natural keys might force me to write update stock_item set location_no = :new_location_no where warehouse = :old_warehouse and location_no = :old_location_no and company_no=:company_no and product_no = :product_no and product_variant = :product_variant for one company and something completely different for the next. (This example has been simplified, in reality it's more complex)
    The fact that all keys have a predictable name and the same datatype gives us additional benefits on the technical side.
  • (cs) in reply to ammoQ

    (part 2, since the forum software did not post the final version... I guess everything I changed after switching back from HTML to Design is gone)

    Drawback d): ad-hoc queries are less readable, or you write longer queries to do all the joins

    To avoid the drawbacks where they really hurt, in some cases we use the "natural key" as foreign key in other tables. (We keep the artificial IDs because of the mentioned technical benefits)

  • (cs) in reply to ammoQ
    ammoQ:
    If you build a system especially for one company, you can depend on their business rules. They say "emplyee numbers are unique and never change" and you use it a your primary key. No problem.


    Until they merge with another company and say "we need to change the employee numbers so that the two companies' employee number systems don't collide". I rather doubt they'll be happy with the answer "That's impossible. You said the numbers would never change and I based my DB design on that!"

    I think it was Martin Fowler who said "There are few things that are less logical than so-called business logic".
  • (cs) in reply to brazzy
    brazzy:
    ammoQ:
    If you build a system especially for one company, you can depend on their business rules. They say "emplyee numbers are unique and never change" and you use it a your primary key. No problem.

    Until they merge with another company and say "we need to change the employee numbers so that the two companies' employee number systems don't collide". I rather doubt they'll be happy with the answer "That's impossible. You said the numbers would never change and I based my DB design on that!"

    The correct answer in that case: "It's possible, but it will cost a lot. The requirements specification said the numbers would never change and I based my DB design on that!"

  • (cs) in reply to ammoQ
    ammoQ:
    brazzy:
    ammoQ:
    If you build a system especially for one company, you can depend on their business rules. They say "emplyee numbers are unique and never change" and you use it a your primary key. No problem.

    Until they merge with another company and say "we need to change the employee numbers so that the two companies' employee number systems don't collide". I rather doubt they'll be happy with the answer "That's impossible. You said the numbers would never change and I based my DB design on that!"

    The correct answer in that case: "It's possible, but it will cost a lot. The requirements specification said the numbers would never change and I based my DB design on that!"



    Don't forget that for fundamental changes such as this, lots of things need to change and the data model is only one of them.  To use your employee number change as an example:  Even if the data model is quick and easy to change, it isn't hard to imagine that easily 50% of your transaction screens and reports will need to be modified.  Unless someone did a truly amazing design job, the application changes are likely going to dwarf the data model changes regardless.

    Anytime you modify a key that is fundamental to your system, it is going to be expensive.  Should adding a few hundred thousand dollars on a several million dollar project that may or may not happen at indeterminate time in the future drive your design decisions now?  This potential future expense has to be balanced against needing to perform additional joins for virtually every query right now, as well as maintain additional indexes and constraints for a large percentage of your tables.
  • (cs) in reply to ammoQ

    ammoQ:
    brazzy:
    ammoQ:
    If you build a system especially for one company, you can depend on their business rules. They say "emplyee numbers are unique and never change" and you use it a your primary key. No problem.

    Until they merge with another company and say "we need to change the employee numbers so that the two companies' employee number systems don't collide". I rather doubt they'll be happy with the answer "That's impossible. You said the numbers would never change and I based my DB design on that!"

    The correct answer in that case: "It's possible, but it will cost a lot. The requirements specification said the numbers would never change and I based my DB design on that!"

    This is another one that drives me nuts.  As "RevMike" points out, changing the database tables is the least of your worries.  Your entire application changes.  This is not a good argument for never using a natural key; it is a good argument for thinking ahead and designing your database with flexibility.  Using natural keys does not in any way inhibit flexibility or growth if you design things properly.  In fact, it forces you to think ahead and forces your application coders and report writers to plan ahead when they write their code if they are using a properly designed datamodel and not just pulling ID's everywhere. 

    Often, using surrogates gives you that false impression that everything is perfect and flexible and a change is just a quick UPDATE statment to change customer numbers.  That mindset is far, far more dangerous and costly than actually taking the time to design your model to be flexible.

    There is also a huge contradiction in this scenerio.  Why do we have to change employee numbers? Well, because of the merger, we have operlapping employee numbers.  But we are changing the employee numbers!  By definition, if we can do that, then we can avoid the overlaps !  Either we are changing them , or we are not!   If we can change the old numbers to append a "1-" before them, and change the new numbers to append a "2-" before them, why do we have to change both the old numbers and the new numbers?  In either scenario, we assign the new guys new numbers !

     

Leave a comment on “A Truly ID-iotic Design”

Log In or post as a guest

Replying to comment #:

« Return to Article