• (cs) in reply to Hank Miller
    Anonymous:
    Jeff S:

    What should the primary key be of a table listing U.S. States?



    The two letter post office abbreviation, or the full name spelled correctly.

    Do not use a number!   States were signed into being in order.  It may seem like trivia, but to some this order is important.  Someone will assume meaning of your numbers when it doesn't exist.  Calling Alaska anything other than the 50th state (49th?   I wasn't born then and can't remember) would be wrong.  

    You cannot use this number as any key.  There is no 40th state, both North Dakota and South Dakota claim 39.   (The president covered their names before he signed them into being, and then shuffeled the papers before revealing the names)

    This is always the problem wish assigning something meaningless as a primary key - someone may assume meaning you did not intend.

    No, it's not a problem. The assumption is what's retarded.

    Primary keys should be meaningless and numeric, and their assignment should have NOTHING to do with the semantics of the data. I.e. a higher id should never be interpreted as "the data refer to a later event", as something could be backfilled, etc. That's what date/time/timestamp fields are for. Primary keys should be numeric because indexing and searching can be a lot more efficient that way.

    The primary key is a db-internal element and should be logically separate from the data in the database, just like page numbers in a book have nothing to do with the content on each page.

    For states, this is the optimal design:

    <FONT face="Courier New">state_id</FONT>

    <FONT face="Courier New">state_name</FONT>

    <FONT face="Courier New">state_abbreviation</FONT>

    and if you need it,

    <FONT face="Courier New">state_birth_order (<FONT face="Times New Roman">or</FONT> state_birth_date)</FONT>

    That's all there is to it. Using the state abbreviation as the PK is shortsighted because your needs might change (e.g. if you expand to foreign countries, or what have you) and scanning the table is always slower than a nicely indexed numerical key lookup.

    No wonder sites like this exist with what some folks post in response to WTFs.

  • (cs) in reply to Ben Cooke
    Anonymous:
    A guy I used to know had an interesting technique involving exceptions. For some inexplicable reason, he would pre-allocate certain exceptions ready to be thrown or even returned later. One interesting thing came of this, though: I discovered that Java's exceptions actually remember where they were created, not where they were thrown. His pre-allocated exceptions would always give the line number and stack trace for the point at which he called "new Exception", not the point at which he used the throw keyword. Weird stuff.


    Delphi always has a EOutOfMemory exception pre-allocated, for quite obvious reasons :P
  • (cs) in reply to rogthefrog

    "I have a table," I tell you, "in which I want to store information regarding the 50 States in the U.S.  What should the primary key should be?"

    You tell me "easy, an identity called "StateID".

    "Why?" I ask, "Surely a 2-char abbreviation would be a fine natural key, right?"

    And you respond "Of course not ! What if we ever want to store things other than US States in the U.S. States table!?  Then your database is broken, you moron!  Plan for the future!  Maybe we want to put 'Quebec' in there! Never use real data as primary keys in your tables!"

    Stop and think a minute and think about that.

    It's a table of U.S. States! 

  • (cs) in reply to Jeff S
    Jeff S:

    "I have a table," I tell you, "in which I want to store information regarding the 50 States in the U.S.  What should the primary key should be?"

    You tell me "easy, an identity called "StateID".

    "Why?" I ask, "Surely a 2-char abbreviation would be a fine natural key, right?"

    And you respond "Of course not ! What if we ever want to store things other than US States in the U.S. States table!?  Then your database is broken, you moron!  Plan for the future!  Maybe we want to put 'Quebec' in there! Never use real data as primary keys in your tables!"

    Stop and think a minute and think about that.

    It's a table of U.S. States! 

    What happens when W has his way, the US annexes Canada and Mexico, and you have to include all those province abbreviations?

    Seriously, using data as primary key is always retarded, and this is no exception.

  • (cs) in reply to rogthefrog

    rogthefrog:
    If we want to be really anal about it, "FBI" is an initialism. An acronym can be pronounced as a word, while an initialism is pronounced as discrete letters. "LED" should be an acronym, since "ledd" is perfectly pronounceable, but for some odd reason people pronounced it "ell-ee-dee". Compare that to a true acronym like "fubar" or "snafu".

    According to the dictionary 'acronym' is an acceptable synonym for 'initialism'.

  • (cs) in reply to dubwai

    Which is why I prefaced my speech with "if we want to be really anal about it".

  • (cs) in reply to rogthefrog

    >>Seriously, using data as primary key is always retarded, and this is no exception.

    Well, you have me convinced of the error of my ways!  Can't argue with that logic.

  • Some guy (unregistered) in reply to rogthefrog

    rogthefrog:

    Primary keys should be meaningless and numeric, and their assignment should have NOTHING to do with the semantics of the data. I.e. a higher id should never be interpreted as "the data refer to a later event", as something could be backfilled, etc. That's what date/time/timestamp fields are for. Primary keys should be numeric because indexing and searching can be a lot more efficient that way.

    ....

    No wonder sites like this exist with what some folks post in response to WTFs.

    You will not find a single expert in the field who will agree with you. Your assertion is based on an incorrect understanding of relational databases and goes against over fourty years of "lessons learned" in data modeling.

    I can tell that you have "self-taught" yourself databases. One thing you skipped over is the fundamentals. Go pick up a book on databases. I recommend Date's INTRODUCTION TO DATABASE SYSTEMS. You can find used copies for $5.

    Once you get a grasp of the Relational Model, you will probably realize how true your latter statement is, though not really in the way you intended it to be.

  • (cs) in reply to Jeff S
    Jeff S:

    >>Seriously, using data as primary key is always retarded, and this is no exception.

    Well, you have me convinced of the error of my ways!  Can't argue with that logic.


    Sarcasm doesn't work well when you ignore the reasoned arguments while focusing on the ones that imply your point is retarded.
  • (cs) in reply to Some guy
    Some guy:

    rogthefrog:

    Primary keys should be meaningless and numeric, and their assignment should have NOTHING to do with the semantics of the data. I.e. a higher id should never be interpreted as "the data refer to a later event", as something could be backfilled, etc. That's what date/time/timestamp fields are for. Primary keys should be numeric because indexing and searching can be a lot more efficient that way.

    ....

    No wonder sites like this exist with what some folks post in response to WTFs.

    You will not find a single expert in the field who will agree with you. Your assertion is based on an incorrect understanding of relational databases and goes against over fourty years of "lessons learned" in data modeling.


    Now forgive my ignorance, but my experience agrees with rog.  While I'm not qualified to make assertions regarding the performance of numeric vs. non-numeric keys, his other words ring quite true and I would be quite interested in exactly what arguments the experts would make that could enlighten me.

    PS: Yes, I "self-taught" myself databases.
  • (cs) in reply to Chris F
    Chris F:
    Jeff S:

    >>Seriously, using data as primary key is always retarded, and this is no exception.

    Well, you have me convinced of the error of my ways!  Can't argue with that logic.


    Sarcasm doesn't work well when you ignore the reasoned arguments while focusing on the ones that imply your point is retarded.

    Ignore the "reasoned arguments" ?????

    I made a direct statement refuting the  "reasoned argument" and the response was first sarcasm, then a general statement that "natural keys are retarded", ignoring my entire point. 

  • (cs) in reply to Jeff S
    Jeff S:
    Never use real data as primary keys in your tables!"


    I do want to point out that I'm not in the camp that you MUST use meaningless identity keys.  I just think that it's the best route to take in most cases.  In the case of a table of US States, using the state abbreviation as the PK would be perfectly reasonable.  In the case of this table, there's only going to be ~50 states, so performance should not be an issue. 

    However, whenever there's the possibility of large amounts of data, I believe it will be sheer folly to have a string-type PK.  String comparisons on joins will be more expensive than numeric comparisons.  Plus consider the possibility of having to do a join on a table that has a multi-column PK, that would be complex and even more expensive.

    I understand the arguments for natural keys, and in some cases I agree.  But what are the arguments against identity keys?  Other than the fact that they are meaningless, and that some developer from South Dakota might freak out when he sees that South Dakota has an Id of 40.
  • Anonymous (unregistered) in reply to Volmarias

    Would this not be better written as:

    <FONT face="Courier New" size=2>Object obj /*= null */;
    try
    {
       obj = someFunctionOrAnother(...);
    </FONT><FONT face="Courier New"><FONT size=2>} catch {
       obj = null;</FONT>
    </FONT><FONT face="Courier New" size=2>} finally {
       if(obj!=null) doReleaseStuff(obj);
    }</FONT>

    Or more likely:

    <FONT face="Courier New" size=2>


    Object obj = someFunctionOrAnother(...);
    try
    {
       ...
    } finally {
       doReleaseStuff(obj);
    }

    I'm not sure about Java, but I cannot think of a good reason to initialize an object to null... ever.  Maybe someone could provide a better example?</FONT>

    No, this would not. Java is pedantic about local objects; if it's not been initialized and you try to check the value, the compiler will yell at you. For example: String foo; if(bar) { foo = "bar"; } else { foo = "foo"; } System.out.println(foo); This code will cause a compiler error; foo may not have been initialized. Lets take your examples.

    <FONT face="Courier New" size=2>Object obj /*= null */;
    try
    {
       obj = someFunctionOrAnother(...);
    </FONT><FONT face="Courier New"><FONT size=2>} catch {
       obj = null;</FONT>
    </FONT><FONT face="Courier New" size=2>} finally {
       if(obj!=null) doReleaseStuff(obj);
    }</FONT>

    In the catch, why are you setting obj to null? Lets say that obj contains a critical resource that has to specifically be unlocked before it can be used again. If obj is not null, then you MUST unlock it with doReleaseStuff(obj). In your second example, if someFunctionOrAnother(...) throws an exception, it's not in the try block. And if you put the object declaration in the try block, then outside of the try/catch/finally, it will be out of scope. This could be awkward if you're going to do something else with it. Suffice to say, the fact that he's setting the object to null isn't a bad practice in and of itself; it's that he's setting it to null for no reason since the object will always be initialized in a way that the compiler will recognize, and that furthermore he's checking to see if it's null later on. Someone correct me if I'm wrong, but the way I've programmed java includes declaring objects initialized to null if their real initialization occurs in a further if, try, or other such scope.

    ----------------------------------------------------

    Assuming that the <FONT face="Courier New" size=2>someFunctionOrAnother<FONT face="Times New Roman" size=3> throws an exception, and this executing the catch block, there is no way that the object 'obj' would have a value assigned.  So there is no possibility of having critical resources to dispose.</FONT></FONT>

    I will revise the code such that if ANOTHER method throws an exception, you could dispose of the object:

    <FONT face="Courier New" color=#000000 size=2>Object obj = someFunctionOrAnother();
    try
    {
      ...some other code...
    }
    finally
    {
      //Unless obj is assigned a null value in the try block, there is no way it would be null here.
      obj.doReleaseStuff();
    }</FONT>

    <FONT face="Courier New" color=#000000 size=2>Oh wait, I aleady gave this example, and you could not state any objections.

    </FONT>

    <FONT face="Courier New" color=#000000><FONT face="Times New Roman" color=#0000ff>I still ask you to show me an example that REQUIRES you to initialize a value to null.  Initializing a value to null is sloppy and lazy, without exception.</FONT>

    </FONT>
  • (cs) in reply to Chris F

    Chris F:
    Some guy:
    You will not find a single expert in the field who will agree with you. Your assertion is based on an incorrect understanding of relational databases and goes against over fourty years of "lessons learned" in data modeling.


    Now forgive my ignorance, but my experience agrees with rog.  While I'm not qualified to make assertions regarding the performance of numeric vs. non-numeric keys, his other words ring quite true and I would be quite interested in exactly what arguments the experts would make that could enlighten me.

    PS: Yes, I "self-taught" myself databases.

    In my travels, I have been to all corners of North America. I've been atop mountans. I've peered out of skyscraper. I've even flown in a plane. With all my traveling experience, I have yet to find any evidence whatsoever that the earth is round; as far as I can tell, it's flat as a board (moutants et al withstanding).

    Despite my personal observations, I have not joined the Flat Earth Society. I rely on the expertise of others to mathematically prove a round earth. I trust the experience of those who have personally observed that the earth is in fact round.

    When you teach yourself something, you end up relying on your own observations and assumptions. When you rely on what you know, you lose out on volumes of collective knowledge and experience. Had you not read it in a science book, do you really think you'd come to the conclusion that the earth is round?

    The same holds true for databases. When a programmer self-teaches himself databases, he applies his own experience: procedural programming. This is what happened in the fifties when there were no databases and, therefore, no experience or expertise to go by. Programmers back then modeled data exactly like you do now: like a programmer.

    Fifty years, the data field has made a lot of progress. The best model we have now is the relational model. One of the fundamentals of this model is that data do not need to be accessed by pointers (meaningless keys); they can be accessed (and related) by their value.

    When you ignore this fundamental, your model becomes closer and closer to the ineffective ways of the past. Worse still, you lose the tools they had back then (IDMS pointer functions, for example) to keep their models running. As Santayana said, "Those who cannot remember the past are condemned to repeat it."

  • (cs) in reply to Anonymous
    Anonymous:
    <FONT face="Courier New" color=#000000 size=2>
    </FONT>

    <FONT face="Courier New" color=#000000><FONT face="Times New Roman" color=#0000ff>I still ask you to show me an example that REQUIRES you to initialize a value to null.  Initializing a value to null is sloppy and lazy, without exception.</FONT>

    </FONT>

    Can you rewrite this such that you don't have to assign null to lastChild?

    <FONT color=#008000>Object lastChild = null;</FONT>

    <FONT color=#008000>for (Child c : allChildren) {
        if (c.parent == parent) lastChild = o;
    }</FONT>

    <FONT color=#008000>return lastChild;</FONT>

  • (cs) in reply to Jeff S
    Jeff S:
    Chris F:
    Jeff S:

    >>Seriously, using data as primary key is always retarded, and this is no exception.

    Well, you have me convinced of the error of my ways!  Can't argue with that logic.


    Sarcasm doesn't work well when you ignore the reasoned arguments while focusing on the ones that imply your point is retarded.

    Ignore the "reasoned arguments" ?????

    I made a direct statement refuting the  "reasoned argument" and the response was first sarcasm, then a general statement that "natural keys are retarded", ignoring my entire point. 


    Your refutation is poor because it completely ignores what everyone has been saying: That when change occurs your methodology is ill prepared to deal with it.  Your refutation, if you want to call it that, was to restrict the scope of the request to an unchangeable, narrowed view.  This is the debator's equivalent of closing your eyes and sticking your fingers in your ears while humming happy songs.  The problem doesn't go away just because you're doing this.
  • (cs) in reply to Ytram

    Ytram:
    Jeff S:
    Never use real data as primary keys in your tables!"


    I do want to point out that I'm not in the camp that you MUST use meaningless identity keys.  I just think that it's the best route to take in most cases.  In the case of a table of US States, using the state abbreviation as the PK would be perfectly reasonable.  In the case of this table, there's only going to be ~50 states, so performance should not be an issue. 

    However, whenever there's the possibility of large amounts of data, I believe it will be sheer folly to have a string-type PK.  String comparisons on joins will be more expensive than numeric comparisons.  Plus consider the possibility of having to do a join on a table that has a multi-column PK, that would be complex and even more expensive.

    Shouldn't a relational DB optimize the foreign key relationships internally? This kind of seems like you arer saying that you should do the DB's work for it.

  • (cs) in reply to Alex Papadimoulis
    Alex Papadimoulis:

    Chris F:
    Some guy:
    You will not find a single expert in the field who will agree with you. Your assertion is based on an incorrect understanding of relational databases and goes against over fourty years of "lessons learned" in data modeling.


    Now forgive my ignorance, but my experience agrees with rog.  While I'm not qualified to make assertions regarding the performance of numeric vs. non-numeric keys, his other words ring quite true and I would be quite interested in exactly what arguments the experts would make that could enlighten me.

    PS: Yes, I "self-taught" myself databases.

    In my travels, I have been to all corners of North America. I've been atop mountans. I've peered out of skyscraper. I've even flown in a plane. With all my traveling experience, I have yet to find any evidence whatsoever that the earth is round; as far as I can tell, it's flat as a board (moutants et al withstanding).

    Despite my personal observations, I have not joined the Flat Earth Society. I rely on the expertise of others to mathematically prove a round earth. I trust the experience of those who have personally observed that the earth is in fact round.

    When you teach yourself something, you end up relying on your own observations and assumptions. When you rely on what you know, you lose out on volumes of collective knowledge and experience. Had you not read it in a science book, do you really think you'd come to the conclusion that the earth is round?

    The same holds true for databases. When a programmer self-teaches himself databases, he applies his own experience: procedural programming. This is what happened in the fifties when there were no databases and, therefore, no experience or expertise to go by. Programmers back then modeled data exactly like you do now: like a programmer.

    Fifty years, the data field has made a lot of progress. The best model we have now is the relational model. One of the fundamentals of this model is that data do not need to be accessed by pointers (meaningless keys); they can be accessed (and related) by their value.

    When you ignore this fundamental, your model becomes closer and closer to the ineffective ways of the past. Worse still, you lose the tools they had back then (IDMS pointer functions, for example) to keep their models running. As Santayana said, "Those who cannot remember the past are condemned to repeat it."



    When I ask a science professor to show me why we think the earth is round, she shows me a picture and says, "Here.  Astronauts went up into space and took this picture of the earth.  It's round.  And way back in the day, some discoveries were made that showed the earth was round by...(explanation follows)"

    When I ask self-proclaimed database experts why meaningless primary keys are bad, they tell me, "because they're bad."  When I ask why again, they take an arrogant tone and imply that if I'd only read what they had read, I'd easily understand.  While this may be true, I'm prone to think that if meaningless primary keys are indeed so obviously diametric to the very fundamentals of relational database design, an illustrative example or two showing exactly why they are so would be forthcoming from those who are so well read.  This hasn't been the case though, here or anywhere else.
  • Anonymous (unregistered) in reply to dubwai
    dubwai:
    Anonymous:
    <FONT face="Courier New" color=#000000 size=2>
    </FONT>

    <FONT face="Courier New" color=#000000><FONT face="Times New Roman" color=#0000ff>I still ask you to show me an example that REQUIRES you to initialize a value to null.  Initializing a value to null is sloppy and lazy, without exception.</FONT>

    </FONT>

    Can you rewrite this such that you don't have to assign null to lastChild?

    <FONT color=#008000>Object lastChild = null;</FONT>

    <FONT color=#008000>for (Child c : allChildren) {
        if (c.parent == parent) lastChild = o;
    }</FONT>

    <FONT color=#008000>return lastChild;</FONT>

    No of course I can not do it without assigning a null value to lastChild at some point, as null is an allowable return value for the method.  What I can do, is not blindly assign a null value to lastChild:

    Object lastChild;
    if(allChildred.Count==0) {
      lastChild = null;
    } else {
      bool foundChild = false;
      for(Child c : allChildren)
      {
        if(c.parent==parent) {
          lastChild = c;
          foundChild = true;
        }
      }
      if(!foundChild)
        lastChild = null;
    }
    return lastChild;

    The two statments 'lastChild = null;' could be replaced by throwing exceptions.

  • (cs) in reply to Anonymous
    Anonymous:
    dubwai:
    Anonymous:
    <FONT face="Courier New" color=#000000 size=2>
    </FONT>

    <FONT face="Courier New" color=#000000><FONT face="Times New Roman" color=#0000ff>I still ask you to show me an example that REQUIRES you to initialize a value to null.  Initializing a value to null is sloppy and lazy, without exception.</FONT>

    </FONT>

    Can you rewrite this such that you don't have to assign null to lastChild?

    <FONT color=#008000>Object lastChild = null;</FONT>

    <FONT color=#008000>for (Child c : allChildren) {
        if (c.parent == parent) lastChild = o;
    }</FONT>

    <FONT color=#008000>return lastChild;</FONT>

    No of course I can not do it without assigning a null value to lastChild at some point, as null is an allowable return value for the method.  What I can do, is not blindly assign a null value to lastChild:

    Object lastChild;
    if(allChildred.Count==0) {
      lastChild = null;
    } else {
      bool foundChild = false;
      for(Child c : allChildren)
      {
        if(c.parent==parent) {
          lastChild = c;
          foundChild = true;
        }
      }
      if(!foundChild)
        lastChild = null;
    }
    return lastChild;

    The two statments 'lastChild = null;' could be replaced by throwing exceptions.

    1.  That can't compile in Java, even after the errors are corrected.  It gives a 'local variable lastChild may not have been initialized' error.

    2.  Do you really think that code is better than what I posted?

  • (cs) in reply to Anonymous

    No of course I can not do it without assigning a null value to lastChild at some point, as null is an allowable return value for the method.  What I can do, is not blindly assign a null value to lastChild:

    Object lastChild;
    if(allChildred.Count==0) {
      lastChild = null;
    } else {
      bool foundChild = false;
      for(Child c : allChildren)
      {
        if(c.parent==parent) {
          lastChild = c;
          foundChild = true;
        }
      }
      if(!foundChild)
        lastChild = null;
    }
    return lastChild;

    The two statments 'lastChild = null;' could be replaced by throwing exceptions.


    You turned four lines of easy-to-read and compact code into that mess?

    What is the difference between:

    Object lastChild;
    and
    Object lastChild = null;

    In .NET there is none, and I can hardly remember anything from my Java days.  Despite all of that, I see no benefit of your code over dubwai's example.  In fact, it turned into a fairly convoluted way of doing a very simple thing.  That is the very basis of most WTFs.
  • (cs) in reply to JThelen

    I am talking about practical, real-world application, where performance and maintenance are important. I'm not talking about theoretical RDB theory.

    The old saw about CS types throwing recursion at EVERYTHING even when it's neither efficient nor readable nor maintainable nor scalable is true of a lot of fields in IT. Witness the abundance of XML crap on this site for an example of the old "when all you know how to use is a hammer, everything looks like a nail" corollary.

    If it is set in stone that your table will never change, then yes, what the hell, just use the abbreviation as the PK. Performance isn't an issue and it is indeed more readable.

    But nothing in business is set in stone. Business decisions change, and the data structures underlying a business had better be flexible enough to accommodate new business rules. If your business expands to Canada, I don't know that your 2-letter abbreviation will work anymore (I don't know enough about Canadian provinces to know if any of their abbreviations overlap with US states, but in principle, shit can happen, and it usually does). Regardless of the situation, the way the data is accessed should be independent of the data's semantics, especially when joining tables.

    I remember an old thread here about people ragging on me for storing qualified paths. A lot of the raggers were self-appointed RDB experts. I suppose being an expert is a prelude to becoming a highly-paid consultant.

    In any case, I'd like to read a reasonable practical argument against semantically empty primary keys, and none of the grandstanding "read such and such a book, you young fool" bullshit some folks here have been spewing.

  • (cs) in reply to Alex Papadimoulis
    Alex Papadimoulis:
    Fifty years, the data field has made a lot of progress. The best model we have now is the relational model. One of the fundamentals of this model is that data do not need to be accessed by pointers (meaningless keys); they can be accessed (and related) by their value.

    When you ignore this fundamental, your model becomes closer and closer to the ineffective ways of the past. Worse still, you lose the tools they had back then (IDMS pointer functions, for example) to keep their models running. As Santayana said, "Those who cannot remember the past are condemned to repeat it."

    You have a good point, but still I am reserved on this issue.  Allow me to explain why:  The way I see it, a meaningless key abtracts the concept of record identity.  In general, abstractions are always removed from the fundamentals (or other abstractions) that they are built upon.  While frivolous abstraction needlessly complicates a system, the right abstractions can make worlds of difference with how the system holds up against the forces of entropy.

    Thus when one makes the argument that we are moving further from the relational model, my answer is always: Of course.  That is the point.  If implementations and interfaces with the relational model allowed us to manipulate our meaningful key structures as easily as we can manipulate a meaningless key, that would allow us the same benefit without the abstraction and it would be my choice.  But since we are not living in a world with such ideal terms, the abstraction is necessary -- and highly recommended -- because of its great benefit.
  • (cs) in reply to rogthefrog
    rogthefrog:

    I am talking about practical, real-world application, where performance and maintenance are important. I'm not talking about theoretical RDB theory.

    The old saw about CS types throwing recursion at EVERYTHING even when it's neither efficient nor readable nor maintainable nor scalable is true of a lot of fields in IT. Witness the abundance of XML crap on this site for an example of the old "when all you know how to use is a hammer, everything looks like a nail" corollary.

    If it is set in stone that your table will never change, then yes, what the hell, just use the abbreviation as the PK. Performance isn't an issue and it is indeed more readable.

    But nothing in business is set in stone. Business decisions change, and the data structures underlying a business had better be flexible enough to accommodate new business rules. If your business expands to Canada, I don't know that your 2-letter abbreviation will work anymore (I don't know enough about Canadian provinces to know if any of their abbreviations overlap with US states, but in principle, shit can happen, and it usually does). Regardless of the situation, the way the data is accessed should be independent of the data's semantics, especially when joining tables.

    I remember an old thread here about people ragging on me for storing qualified paths. A lot of the raggers were self-appointed RDB experts. I suppose being an expert is a prelude to becoming a highly-paid consultant.

    In any case, I'd like to read a reasonable practical argument against semantically empty primary keys, and none of the grandstanding "read such and such a book, you young fool" bullshit some folks here have been spewing.

     With all due respect, that is completely 100% wrong.

    So your application is designed for the future by using a StateID field.  When your app goes multi-national, you simply stuff countries, provinces, continents, villages, and so on into your States table and say "done!"  There is an address table somewhere with a column called StateID that links to your various items stored in the State table.  You put in application logic that says "if stateID>50, don't ask for zip code" and so on.  Or maybe you add columns to your States table to include other things like what Country the State is in, or if it should really be a region or a county.  That's a nice, clean, expandable application and database design you have there! 

    My app has a table of States. When we go multi-national, I add as many new tables to PROPERLY store the new entities that I need to track or provide in drop-downs.  I expand the "Address" tables (or create 1-1 related subtables) that include columns for Country or Province or whatever else is needed for storage of that type.  The UI is reconfigured properly to allow for these new changes.  Logic is written to properly handle area codes and postal codes for different countries and regions, all in properly formed tables with actual data in them that makes sense and is also properly constrained and related to the various entities. 

    People here have mentioned the "downfall" of using a natural key for the States table.  "You can only put states in there!  You can't put anything else in your table!  If the 2-char abbreviation for CA is in there, how can I add Canada?  The stupid PK stops me!" 

    Uh ... yeah!  That's what databases DO -- they store clearly defined entities in tables.  Natural PK's and constraints ensure that your data is well-formed and doesn't grow randomly and organically -- it constrains your data to ensure structure and consistency.  That "stupid primary key" that limits the growth of your table and forces you to design your database correctly and to only store proper data in the proper place is SUPPOSED to do that.

    I have honestly never read weaker arguments than in this thread; the worse is not that people are defending the occasional use of surrogate keys (which many DB purists almost never allow) but people here are avocating ALWAYS using them and claiming that ALL natural keys are bad!   Absolutely horrible.

    You may note I never claimed that you should only use natural keys; to be so absolute-minded, stubborn and inflexible would be moronic. 

    Every day -- literally every day -- in DB forums like SQLTeam we help people clean up their databases, remove duplicates, add additional constraints, and restructure and normalize their data because of mindsets like those vocalized in this thread.  People who figured it is "easier" to just add random, meaningless identities to every table w/o questioning it and putting thought into their design, so the database won't force them to put proper dat in the proper places and so it will never throw those annoying "primary key violation" errors.

    That is the biggest argument you and others here have used -- using a natural keys forces you to think ahead and design correctly and be consistent with your data, and you just don't have time to do that.

  • (cs) in reply to Jeff S
    Jeff S:
    rogthefrog:

    I am talking about practical, real-world application, where performance and maintenance are important. I'm not talking about theoretical RDB theory.

    The old saw about CS types throwing recursion at EVERYTHING even when it's neither efficient nor readable nor maintainable nor scalable is true of a lot of fields in IT. Witness the abundance of XML crap on this site for an example of the old "when all you know how to use is a hammer, everything looks like a nail" corollary.

    If it is set in stone that your table will never change, then yes, what the hell, just use the abbreviation as the PK. Performance isn't an issue and it is indeed more readable.

    But nothing in business is set in stone. Business decisions change, and the data structures underlying a business had better be flexible enough to accommodate new business rules. If your business expands to Canada, I don't know that your 2-letter abbreviation will work anymore (I don't know enough about Canadian provinces to know if any of their abbreviations overlap with US states, but in principle, shit can happen, and it usually does). Regardless of the situation, the way the data is accessed should be independent of the data's semantics, especially when joining tables.

    I remember an old thread here about people ragging on me for storing qualified paths. A lot of the raggers were self-appointed RDB experts. I suppose being an expert is a prelude to becoming a highly-paid consultant.

    In any case, I'd like to read a reasonable practical argument against semantically empty primary keys, and none of the grandstanding "read such and such a book, you young fool" bullshit some folks here have been spewing.

     With all due respect, that is completely 100% wrong.

    So your application is designed for the future by using a StateID field.  When your app goes multi-national, you simply stuff countries, provinces, continents, villages, and so on into your States table and say "done!"  There is an address table somewhere with a column called StateID that links to your various items stored in the State table.  You put in application logic that says "if stateID>50, don't ask for zip code" and so on.  Or maybe you add columns to your States table to include other things like what Country the State is in, or if it should really be a region or a county.  That's a nice, clean, expandable application and database design you have there! 

    My app has a table of States. When we go multi-national, I add as many new tables to PROPERLY store the new entities that I need to track or provide in drop-downs.  I expand the "Address" tables (or create 1-1 related subtables) that include columns for Country or Province or whatever else is needed for storage of that type.  The UI is reconfigured properly to allow for these new changes.  Logic is written to properly handle area codes and postal codes for different countries and regions, all in properly formed tables with actual data in them that makes sense and is also properly constrained and related to the various entities. 

    People here have mentioned the "downfall" of using a natural key for the States table.  "You can only put states in there!  You can't put anything else in your table!  If the 2-char abbreviation for CA is in there, how can I add Canada?  The stupid PK stops me!" 

    Uh ... yeah!  That's what databases DO -- they store clearly defined entities in tables.  Natural PK's and constraints ensure that your data is well-formed and doesn't grow randomly and organically -- it constrains your data to ensure structure and consistency.  That "stupid primary key" that limits the growth of your table and forces you to design your database correctly and to only store proper data in the proper place is SUPPOSED to do that.

    I have honestly never read weaker arguments than in this thread; the worse is not that people are defending the occasional use of surrogate keys (which many DB purists almost never allow) but people here are avocating ALWAYS using them and claiming that ALL natural keys are bad!   Absolutely horrible.

    You may note I never claimed that you should only use natural keys; to be so absolute-minded, stubborn and inflexible would be moronic. 

    Every day -- literally every day -- in DB forums like SQLTeam we help people clean up their databases, remove duplicates, add additional constraints, and restructure and normalize their data because of mindsets like those vocalized in this thread.  People who figured it is "easier" to just add random, meaningless identities to every table w/o questioning it and putting thought into their design, so the database won't force them to put proper dat in the proper places and so it will never throw those annoying "primary key violation" errors.

    That is the biggest argument you and others here have used -- using a natural keys forces you to think ahead and design correctly and be consistent with your data, and you just don't have time to do that.



    So, you've given some advantages of natural primary keys, and you've shown how meaningless primary keys in the hands of no-talent ass-clowns can be bad, but you still haven't shown why one shouldn't, in general, use meaningless primary keys.  I'm still waiting for someone to do that.  And no, I'm not trying to be a smart-ass; I honestly am waiting for someone to show me why meaningless primary keys are so awful.  I'm getting ready to work on designing a database at work soon, and all I've learned in college and in experience has pointed to meaningless primary keys being perfectly alright if not ideal.  If they are indeed so awful and will lead to the certain eventual ruin of my database, then I want to know now rather than later.  And yes, I'm looking to get a copy of something by Date.
  • (cs) in reply to Jeff S

    I most certainly do NOT advocate hacking together a system that would cram all manner of countries into a US state table and check for zipcode > 99999 or whatever. That's what separate tables are for.

    I was only talking about expanding to a neighboring country where addresses are stored in a similar way to US addresses, namely Canada. Granted, that's a contrived system and not the most compelling argument for numeric keys in a state table.

    However, you still haven't made a single argument why semantically empty keys are wrong in this instance, which is the question I was asking. All you've said is that natural keys are acceptable in this small example, and I granted that they are in an earlier post. The argument that a state_id is useless if you use the state_abbreviation as PK doesn't count, since you were so high and mighty about my original argument, which is that data PKs should generally be avoided.

  • (cs) in reply to Alex Papadimoulis
    Alex Papadimoulis:

    In my travels, I have been to all corners of North America. I've been atop mountans. I've peered out of skyscraper. I've even flown in a plane. With all my traveling experience, I have yet to find any evidence whatsoever that the earth is round; as far as I can tell, it's flat as a board (moutants et al withstanding).

    Despite my personal observations, I have not joined the Flat Earth Society. I rely on the expertise of others to mathematically prove a round earth. I trust the experience of those who have personally observed that the earth is in fact round.

    When you teach yourself something, you end up relying on your own observations and assumptions. When you rely on what you know, you lose out on volumes of collective knowledge and experience. Had you not read it in a science book, do you really think you'd come to the conclusion that the earth is round?

    The same holds true for databases. When a programmer self-teaches himself databases, he applies his own experience: procedural programming. This is what happened in the fifties when there were no databases and, therefore, no experience or expertise to go by. Programmers back then modeled data exactly like you do now: like a programmer.

    Fifty years, the data field has made a lot of progress. The best model we have now is the relational model. One of the fundamentals of this model is that data do not need to be accessed by pointers (meaningless keys); they can be accessed (and related) by their value.

    When you ignore this fundamental, your model becomes closer and closer to the ineffective ways of the past. Worse still, you lose the tools they had back then (IDMS pointer functions, for example) to keep their models running. As Santayana said, "Those who cannot remember the past are condemned to repeat it."



    While earth is always round - I think we can accept that as an indisputable truth - there is no such absolute answer to the question if natural keys are better or meaningless keys are better.
    It depends on many parameters which way will work out better in a certain scenario.
    BTW: Those guys in the 50s and 60s made computers work. Computers that had by far less computing power than todays cell phones. Don't tell me they did not know anything about performance. IMO the biggest problem with pre-relational data modelling is the lack of flexibility - access paths are pre-defined (e.g. in hierachical structures, DL/1 anyone?) and creating an analysis like 'which customers bought laxatives within a week after buying chicken' means a lot of programming in this systems, while it is a relatively short statement in SQL.
    Anyway, this flexibility of relational databases does not depend on a certain kind of primary key. For obvious reasons, meaningless keys result in more look-ups. In some cases, it means you have to join more tables for a query, thus loosing performance. On the other hand, you never have to worry if they are unique or volatile. Anyway, much of our (computing) environment is already defined by rather meaningless keys, like zip codes and SSNs and EAN codes etc, so
    it might not make much of a difference how you decide. Because of my personal experience with both ways, I decided to go the "meaningless key" way. I prefer to have all relations on exactly one key field, not on concetenated keys of up to 6 fields (no joking). It means more look-ups, I accept that. In many cases, I would need that look-up anyway, because I need more than just the key from the other table.
    As a final statement, I'd like you to remember that about every relational database system has a mechanism to create stupid meaningless keys. Autoincremented fields, sequence objects or both. Are these only relicts of the pre-relational past? I don't think so.


  • (cs) in reply to Jeff S
    Jeff S:

    I have honestly never read weaker arguments than in this thread; the worse is not that people are defending the occasional use of surrogate keys (which many DB purists almost never allow) but people here are avocating ALWAYS using them and claiming that ALL natural keys are bad!   Absolutely horrible.



    There are practical reasons for using the same kind of surrogate key for all tables. That doesn't mean natural keys are bad, but in some cases it's more convenient to standardize on surrogate keys. For example, if you want to have one extra table which keeps track of who accessed records in all (or at least: a lot of) other database tables, it's easy to do when every table has a number(12) key. Tablename+ID will be sufficient to identify any record in the whole database.
  • (cs) in reply to Jeff S
    Jeff S:
    So your application is designed for the future by using a StateID field.  When your app goes multi-national, you simply stuff countries, provinces, continents, villages, and so on into your States table and say "done!"

    There is an address table somewhere with a column called StateID that links to your various items stored in the State table.  You put in application logic that says "if stateID>50, don't ask for zip code" and so on.  Or maybe you add columns to your States table to include other things like what Country the State is in, or if it should really be a region or a county.  That's a nice, clean, expandable application and database design you have there!

    You are hurting your own argument by tearing down exceptionally poor examples of the opposing side.  Please read up on straw men.

    Jeff S:
    My app has a table of States. When we go multi-national, I add as many new tables to PROPERLY store the new entities that I need to track or provide in drop-downs.  I expand the "Address" tables (or create 1-1 related subtables) that include columns for Country or Province or whatever else is needed for storage of that type.

    When my app goes multi-national, I rename 'states' to 'administrative regions' and extend this region table with the country to which it belongs.  The country table relates to an 'administrative region types' table which tells us the name of the region (state, province, municipality, etc.).  This solution requires a modification to two isolated tables and renames on the referencing columns to clearly reflect the change.  Your solution requires a modification to the basic structure of a state reference for every reference.  My solution is far more elegant because it has a complexity of O(1) for administrative region extensions, while yours is O(n), where n is the number of administrative region references (DB-side only, of course).

    Suffice to say, you have failed to convince me.

    Jeff S:
    The UI is reconfigured properly to allow for these new changes.

    Just like every other app, meaningless keys or not.

    Jeff S:
    Logic is written to properly handle area codes and postal codes for different countries and regions, all in properly formed tables with actual data in them that makes sense and is also properly constrained and related to the various entities.

    Just like every other well-designed app, meaningless keys or not.

    Jeff S:
    That is the biggest argument you and others here have used -- using a natural keys forces you to think ahead and design correctly and be consistent with your data, and you just don't have time to do that.

    You're putting words in people's mouths that clearly nobody has said.  Your integrity in this argument is hovering near zero with this.
  • (cs) in reply to Chris F

    Chris F-- you completely missed the point.  The point wasn't what we each do to fix our db to become muti-national (i.e., "my way is better than yours!"), it's that simply adding StateID to your States table and relying on that as your PK and in relations doesn't help you in anyway for "it's more expandable!" argument, which was given several times so far.  If I choose to expand by creating the same structure you proposed (which might very well be the way to go depending on your needs) then we have the same amont of work to do. As you repeated several times, "just like every other app needs to do", it seems we agree that the "more expandable" argument is finally resolved.

    (by the way, I'll read up on "straw man", you read up on big O notation; that "analysis" of exactly how much more efficient your upgrade method is than my vague comments was pretty embarassing to read.)

    Since that particular anti-natural key argument is squashed, I'll rely on my silly notion of data integrity and PK's that enforce consistency and integrity in your data and require careful planning in your database design.  (hint: those are good things)  My PK of Abbreviation ensures that only 1 entry for MA exists in the table; your PK of StateID does not do this.  My PK gives the code "MA" a unique meaning, a definition, it the clearly defined label for an entity.  Your StateID is meaningless.  Technically, a database cannot even be considered normalized if you have nothing but meaningless keys on your tables!

    "Hold on!" you say; for data integrity and to guard against duplicates you can always just add a unqiue constraint on the Abbreviation column as well.  Definitely a good idea .. and now your meaningless, random StateID column has *no purpose at all* in your table, it is just there for that warm and fuzzy "I have an ID column like MS Access taught me I should!" feeling ....

    Do you have a random MonthID column on a table of Months?  or a random NumberID column on a table of numbers?  YearID on a table of years?  (it will be interesting to read the responses regarding the notion of having tables like these in your db's -- a great way to get a feel for how much database skill and experience is out there)

    It's simple: if a natural key arises in your data, use it.  If nothing short or simple naturally can be used, go ahead and use a surrogate key.  Heck, even identities aren't always surrogate in my opinion -- if your business rules dictate that each employee should be assigned a unique numeric ID, and that this code should be assigned at the point of data entry into the system, by all means use an identity !  I think assigning short codes to entities is a great idea in general.  But it is NOT a good idea to ignore preexisting short codes or ways to identify an entity if they exist already and then to assign your own random ones instead. 

    If you want to blindly ignore the natural ways to validate and constrain and relate your data, and use identity columns in all of your tables, go ahead, but guys like me are going to yell at you when you (or the poor guy who inherits your work) come to us for help to clean things up.

    It's hilarious that I mentioned a scenerio in which need to design a customers table, and you are told the business process will have assigned each one a short, unqiue code before entry into the system .  That is an absolute blessing, a godsend in the world of Db design to have that scenerio.   So what do people here think the PK for customer table should be?  Ignore the short, unique, existing, meaningful code and use a meaningless random number!

  • (cs) in reply to Jeff S
    Jeff S:
    Chris F-- you completely missed the point.  The point wasn't what we each do to fix our db to become muti-national (i.e., "my way is better than yours!"), it's that simply adding StateID to your States table and relying on that as your PK and in relations doesn't help you in anyway for "it's more expandable!" argument, which was given several times so far.

    Er, but it does make it more expandable.  Your solution requires changing (1) database functions / stored procedures that reference administrative regions, (2) tables that reference administrative regions including all foreign keys, and (3) all code which depends on the changed natural key for determining identity.  My solution requires changing the administrative regions table and that's it.  We do have some shared changes -- when we make the initial jump then of course things like UI and logic need to be changed.  But things like foreign keys and identity checks do not.

    You are arguing that there is no purpose for the abstraction because both the non-abstracted and abstracted versions require the same amount of work.  This is clearly not the case.

    Jeff S:
    If I choose to expand by creating the same structure you proposed (which might very well be the way to go depending on your needs) then we have the same amont of work to do.

    This is simply not true.  You would not only have to rename the columns, you would have to add columns and manipulate every join so that they included the new natural key (which presumably would include the full administrative region name and the country name).  Also, you would have to fill in the previous country on all of your tables.

    Is that hard?  Heavens no.  Is it more work?  Undoubtedly yes.

    Jeff S:
    As you repeated several times, "just like every other app needs to do", it seems we agree that the "more expandable" argument is finally resolved.

    I agreed that some changes were shared but not all.  Please reread my post if you missed that.

    Jeff S:
    (by the way, I'll read up on "straw man", you read up on big O notation; that "analysis" of exactly how much more efficient your upgrade method is than my vague comments was pretty embarassing to read.)

    The "analysis" was accurate.  The efficiency of your solution is linear whereas mine was constant on the DB-side.

    Jeff S:
    Since that particular anti-natural key argument is squashed, I'll rely on my silly notion of data integrity and PK's that enforce consistency and integrity in your data and require careful planning in your database design.  (hint: those are good things)

    You're on this careful planning and design crusade when nobody has made any statements to the contrary.  You must be mistaking meaningless keys for poor planning even though nearly every argument has given evidence it is the opposite.

    Jeff S:
    My PK of Abbreviation ensures that only 1 entry for MA exists in the table; your PK of StateID does not do this.  My PK gives the code "MA" a unique meaning, a definition, it the clearly defined label for an entity.  Your StateID is meaningless.

    It is not meaningless, it is an abstraction that serves a significant purpose should the natural key change!  Are you even reading these posts?!

    Jeff S:
    Do you have a random MonthID column on a table of Months?  or a random NumberID column on a table of numbers?  YearID on a table of years?  (it will be interesting to read the responses regarding the notion of having tables like these in your db's -- a great way to get a feel for how much database skill and experience is out there)

    It depends entirely on the purpose of the table.

    Jeff S:
    It's hilarious that I mentioned a scenerio in which need to design a customers table, and you are told the business process will have assigned each one a short, unqiue code before entry into the system .  That is an absolute blessing, a godsend in the world of Db design to have that scenerio.   So what do people here think the PK for customer table should be?  Ignore the short, unique, existing, meaningful code and use a meaningless random number!

    Actually, I ran across such a scenario recently.  A short, unique code for every customer in the system!  And the previous architect did indeed use it for a key into everything -- right up until the guys in charge needed to import an old database of customers which did not conform to these rules, and they did not want to assign codes to every one of these old customers because the codes have some significance within their business that these old entries did not fit.  Suddenly it didn't seem like such a good idea anymore because the system had to change instead of the business.

  • (cs) in reply to Chris F

    >>Er, but it does make it more expandable.  Your solution requires changing (1) database functions / stored procedures that reference administrative regions, (2) tables that reference administrative regions including all foreign keys, and (3) all code which depends on the changed natural key for determining identity.  My solution requires changing the administrative regions table and that's it.  We do have some shared changes -- when we make the initial jump then of course things like UI and logic need to be changed.  But things like foreign keys and identity checks do not.

    >>The "analysis" was accurate.  The efficiency of your solution is linear whereas mine was constant on the DB-side

    >> You must be mistaking meaningless keys for poor planning even though nearly every argument has given evidence it is the opposite.

    >>It is not meaningless, it is an abstraction that serves a significant purpose should the natural key change!  Are you even reading these posts?!

    >>Actually, I ran across such a scenario recently.  A short, unique code for every customer in the system!  And the previous architect did indeed use it for a key into everything -- right up until the guys in charge needed to import an old database of customers which did not conform to these rules, and they did not want to assign codes to every one of these old customers because the codes have some significance within their business that these old entries did not fit.  Suddenly it didn't seem like such a good idea anymore because the system had to change instead of the business.

    Chris, buddy, you just don't get it.  And that's OK.  Some day, you will ....  I gave it my best shot. 

    [Y]

  • (cs) in reply to Jeff S
    Jeff S:
    Chris, buddy, you just don't get it.  And that's OK.  Some day, you will ....  I gave it my best shot.

    OK.  It's not like my statements weren't empirically disprovable.  All you'd have to show is how you really did less work and I'd believe you.
  • oakie (unregistered) in reply to Chris F

    For what it's worth, I'd like to elaborate on one of Jeff's earlier points about migrating data across databases (say from a development to production environment), because it's a real, practical example of where identity primary keys unnecessarily complicate matters. Suppose you wanted to copy a table of employee skills from one database to another.

    Using surrogate keys:

    <!--StartFragment --><!--StartFragment -->Employee(EmployeeId, <!--StartFragment --> EmployeeCode, ...)
    Skill(SkillId, SkillName, ...)
    EmployeeSkill(EmployeeSkillId, EmployeeId, SkillId)

    Words in italics are identity primary keys; words in bold are fields with unique constraints as an alternative way to preserve data integrity.

    1. First you populate Employee and Skill in the destination database, because EmployeeSkill has references to these tables.
    2. You then join (by Id) EmployeeSkill to Employee and Skill in the source tables in order to get a set of employees and skills by their actual defining values instead of meaningless numbers.
    3. You join this set of employee-skill values from the source database to the destination Employee and Skill tables, by EmployeeCode and SkillName. From this you can determine which Id's they've just been assigned.
    4. You insert these Id's into the target EmployeeSkill table.

    Using natural keys:

    <!--StartFragment --><!--StartFragment -->Employee(EmployeeCode,<!--StartFragment --> ...)
    Skill(SkillName, ...)
    EmployeeSkill(EmployeeCode, SkillName)

    1. You populate Employee and Skill in the destination database, for the same reason as before.
    2. Because our primary keys are based directly on data, they're totally predictable and remain the same across databases. A straight copy is all that's required. And because we've avoided a whole bunch of joins, this insert operation may in fact be faster as well.
  • (cs) in reply to oakie
    Anonymous:

    For what it's worth, I'd like to elaborate on one of Jeff's earlier points about migrating data across databases (say from a development to production environment), because it's a real, practical example of where identity primary keys unnecessarily complicate matters. Suppose you wanted to copy a table of employee skills from one database to another.

    Using surrogate keys:

    <!--StartFragment --><!--StartFragment -->Employee(EmployeeId, <!--StartFragment --> EmployeeCode, ...)
    Skill(SkillId, SkillName, ...)
    EmployeeSkill(EmployeeSkillId, EmployeeId, SkillId)

    Words in italics are identity primary keys; words in bold are fields with unique constraints as an alternative way to preserve data integrity.

    1. First you populate Employee and Skill in the destination database, because EmployeeSkill has references to these tables.
    2. You then join (by Id) EmployeeSkill to Employee and Skill in the source tables in order to get a set of employees and skills by their actual defining values instead of meaningless numbers.
    3. You join this set of employee-skill values from the source database to the destination Employee and Skill tables, by EmployeeCode and SkillName. From this you can determine which Id's they've just been assigned.
    4. You insert these Id's into the target EmployeeSkill table.

    Using natural keys:

    <!--StartFragment --><!--StartFragment -->Employee(EmployeeCode,<!--StartFragment --> ...)
    Skill(SkillName, ...)
    EmployeeSkill(EmployeeCode, SkillName)

    1. You populate Employee and Skill in the destination database, for the same reason as before.
    2. Because our primary keys are based directly on data, they're totally predictable and remain the same across databases. A straight copy is all that's required. And because we've avoided a whole bunch of joins, this insert operation may in fact be faster as well.


    I think you are referencing to the way "Autoincrement" fields work on some database systems, which makes this kind of migration extremely painfull. Other database systems, like Oracle, use "Sequence" objects instead; because of that, you can simply transfer the tables, without adjusting anything, preserving the original IDs. After the migration, you have one easy task to do: adjust the sequence objects so they will not produce the same IDs like in the migrated data. So you argument against meaningless keys is rather an argument against autoincrement fields.
  • (cs) in reply to Alex Papadimoulis
    Alex Papadimoulis:

    In my travels, I have been to all corners of North America. I've been atop mountains. I've peered out of skyscraper. I've even flown in a plane. With all my traveling experience, I have yet to find any evidence whatsoever that the earth is round; as far as I can tell, it's flat as a board (moutants et al withstanding).

    Go to the Ocean (preferably near a shipping port) and stare out at the sea.

    Wait for a large sail boat to approach (a bit tricky to find one these days, but a few are still around)

    First you will see the top of the sails rising from the edge of the water, then the rest of the sails, and only then, finally the hull of the boat.   There you have it --- Solid evidence that the planet is round (discovered and widely known circa 2000 BCE)

    Actually, the fact that you can look out at the ocean and NOT see the continent on the other side is pretty fair proof the world is round.

  • rdm (unregistered)

    Can't believe nobody's mentioned this yet...but...

    What's with the TOTAL lack of comments?  How do we even know what this function is supposed to do?  How do we know it's not MEANT to return that exception? (sure, I can't imagine WHY someone would want a function to return an exception, but I count nothing out).

    Thus, my list:

    1. May return an exception
    2. 'table' parameter is a string (tableName would make sense; table does not)
    3. Parameter name i (this is probably the WORST wtf -- how is anyone supposed to know what this is without delving into the code?  What if the person writing the code that calls this doesn't know any SQL?)
    4. Initializing r to null immediately before giving it another value
    5. Variable name r (not as bad as #3, because it's internal, but a name like 'record' would be a LOT clearer)
    6. Very, very bad SQL query formation (uses select * when getting a single record, does no checking on table name integrity and no comment informs the user of this, can only search tables where id is an int)
    7. Returning the exception
    8. Poor formation of the exception (all errors look the same, and they look like a completely unrelated error)
    9. Total lack of comments
  • bp (unregistered)
    Alex Papadimoulis:

    <font color="#000099">public Object</font> findRecord(<font color="#000099">String</font> table, int i) {
    <font color="#000099">Object</font> r = <font color="#000099">null</font>;
    r = executeQuery(<font color="#990000">"select * from "</font> + table + <font color="#990000">" where id = "</font> + i);
    <font color="#000099">if</font> (r == <font color="#000099">null</font>) r = <font color="#000099">new</font> <font color="#800080">ClassNotFoundException</font>();
    <font color="#000099">return</font> r;
    }



    Only 5 jump out at me:
    1. Query not parameterized
    2. Dynamically specified table
    3. returns Object which could be an exception or a ResetulSet
    4. Returns ClassNotFoundExcepton for some reason.
    5. doesn't thrown any exceptions (there are a couple of checked exceptions in JDBC)
    Of course there is a pretty basic WTF'age in the design of the code besides the obvious.

  • bp (unregistered)

    Almost every comment in the post has its Own WTF.

    In a method, a variable is not defined when dimensioned, so setting it to null is actually the preffered pratice in Java. In fact a lot of compilers will warn you that the var is undefined.

    However, the query will return null or a ResultSet (so you might not get a warning about an uninitialized var).

  • (cs) in reply to Schol-R-LEA
    Schol-R-LEA:
    For that matter, how is that executeQuery() supposed to work, anyway? It's called as if it were a global function, rather than as a method for a Statement object. Java doesn't even allow global functions! There's no way this would compile. Perhaps that was the instance variable it should have been referring to, eh?


    Amazing.  You came that close to figuring it out, and then missed it.  ammoQ proved above that this cannot possibly be Statment.executeQuery().  It's their own executeQuery(), and we don't know anything about what it does.  It should be read as "this.executeQuery()".
  • (cs) in reply to strongarm
    strongarm:

    4.) Not using a StringBuffer for creating the SQL.  I read somewhere (can't remember where off the top of my head) that if there are more than three things being concatenated, StringBuffer will perform better than concatenation.  Less than three, the cost of allocating the StringBuffer and converting it to a string is more expensive then allocating the new strings on each concat.


    If you have several separate statements that are concatenating values together, or especially if concatenation is being done in a loop, then this could be true.  But in this WTF, the code that would be generated is exactly the same as if a StringBuffer was used "manually".  Try compiling something that concats like this and then disassemble it with javap -c, and see for yourself.

  • (cs) in reply to maribert
    maribert:
    I'm not so sure about Java either, but in languages like C it is very advisable to initialize any and all pointer variables to NULL for roughly the same reasons why you should not do that in Java, as far as I have gathered from this thread. If you don't initialize a pointer in C, it just might contain any old bit pattern, which just might point to something. So using a non-initialized pointer can appear to work correctly while insidiously destroying some completely unrelated part of memory.


    It's about redundant initialization, not initialization period.  In C, you wouldn't do the following, would you?

        char *p = NULL;
        p = argv[1];

    maribert:
    If you use a pointer that has been set to NULL, it will blow up in your face. At least it will not corrupt anything or read random data out.


    That wasn't always true, but thankfully we now have operating systems where that is the case.
  • (cs) in reply to Ytram
    Ytram:
    What is the difference between:

    Object lastChild;
    and
    Object lastChild = null;

    In .NET there is none, and I can hardly remember anything from my Java days.


    You know, I've been wondering about this.  So many people have said that they're the same (they aren't), and I thought that maybe it was because .NET did it that way.  Certainly no language I've ever used initializes local variables for you, for performance reasons.  So after reading the above message I thought I would give it a try in C#.

    using System;

    class Initter {
        Object instance;

        public void Foo() {
            Object local;
            Console.WriteLine("instance: " + (instance == null));
            Console.WriteLine("local: " + (local == null));
        }

        public static void Main(string[] args) {
            new Initter().Foo();
        }
    }

    And the result?

    Foo.cs(9,40): error CS0165: Use of unassigned local variable 'local'

    So it looks like C# has the same rules as Java, here.  Not too surprising, considering the provenance of C#.

  • (cs) in reply to rogthefrog
    rogthefrog:

    If your business expands to Canada, I don't know that your 2-letter abbreviation will work anymore (I don't know enough about Canadian provinces to know if any of their abbreviations overlap with US states, but in principle, shit can happen, and it usually does).



    The Canadian abbreviations don't overlap with the U.S., but there are Mexican states that do (Veracruz with Vermont, Hidalgo with Hawaii, and for that matter Nuevo Leon with Newfoundland and Labrador, or Baja California Norte with British Columbia), which I guess proves once again that shit rolls downhill?

  • (cs)

    hey i was reading this and was like a daily wtf page, thatd be cool.  So i went here and seen this computer coding and im where the hell is the funny jokes and shit.  and i looked and then seen the wtf forum pages and went there.  but then i realized that the coding really was the wtf joke.  so i started reading and i seen that there is tons of peaple that read this shit and think of ways to  fix the coding, and the guy who wrote this is like a fucking king in the computer world.  and i mn like whoa this is some weird shit , i mean go  who does this shit.  like make forums about weird coding u make up , and peoplw trying to fix it for fum.  thats fuckin weird isnt it?  so i just wrote this to tell u i thought u are fuckin insane crazy and shit so have fun makin your crazy puzzles.

    -peace out
    A town-

  • (cs) in reply to DrCode
    DrCode:
    Ytram:
    What is the difference between:

    Object lastChild;
    and
    Object lastChild = null;

    In .NET there is none, and I can hardly remember anything from my Java days.


    You know, I've been wondering about this.  So many people have said that they're the same (they aren't), and I thought that maybe it was because .NET did it that way.  Certainly no language I've ever used initializes local variables for you, for performance reasons.  So after reading the above message I thought I would give it a try in C#.

    using System;

    class Initter {
        Object instance;

        public void Foo() {
            Object local;
            Console.WriteLine("instance: " + (instance == null));
            Console.WriteLine("local: " + (local == null));
        }

        public static void Main(string[] args) {
            new Initter().Foo();
        }
    }

    And the result?

    Foo.cs(9,40): error CS0165: Use of unassigned local variable 'local'

    So it looks like C# has the same rules as Java, here.  Not too surprising, considering the provenance of C#.



    You're right of course.  I was thinking global members, versus locals.  Any global members need not be initialized, and their values will always be null or the default value(0 for numeric, false for boolean).

    class Foo
    {
        string stringMember;
        MyObject myObjectMember;
        int intMember;

        public Foo()
        {
        }
    }

    All variables will be null references except for intMember, which will be zero.
  • (cs) in reply to kkkkyle

    Always interesting to see what others think of this website. Pretty funny. I've always found it pretty educational, especially with stuff I know little about ... like dbs and asp.

  • (cs) in reply to diGriz
    diGriz:  "COBOL programs tend to bloat. They need an excessive header (most of it is obsolete, but the compiler whines, if you don't do it...."
     
    Like:
     
    SOURCE_COMPUTER: IBM 360
    OBJECT_COMPUTER: IBM 360
     
    (From memory of reading COBOL written by others; I never wrote any.)
     
    Actually, maybe there's a reason behind telling the compiler that you were compiling on one computer, but the program was going to run on something different, maybe a PDP-11...
     
     
  • (cs) in reply to Jeff S
    "My app has a table of States. When we go multi-national, I add as many new tables to PROPERLY store the new entities that I need to track or provide in drop-downs."
     
    Hear, hear, Jeff!  Why are most commenters assuming that anyone, ANYONE, would add things like Canadian provinces to a table of US states?  That's just dumb.  Jeff's post is great.
  • (cs) in reply to DWalker59

    ... but perhaps the commenters weren't actually saying that, and if so, I apologize.  Still, Jeff hasd some good things to say.

Leave a comment on “Exception Disfunction”

Log In or post as a guest

Replying to comment #:

« Return to Article