The Daily WTF: Curious Perversions in Information Technology

2005-07-13 Reply Admin

Maybe the execute query returns an exception.

CornedBee · 2005-07-13 Reply Admin

ammoQ:

For example, in Austria, the social security number should be unique; but there are a few cases where two people got the same number. If you build a system that relies on the SSN being unique, it may work perfectly through 1000000 test cases but fail in production.

Interesting detail, especially as right now I'm building a system that does use Austrian SSNs as keys. But then - how DO you correctly find identity between two persons entered by different people?

2005-07-13 Reply Admin

Anonymous:
imo, there's nothing inherently wrong with using "select * ".

if the database is designed properly, and the query is limiting the select correctly with WHERE statements, it is fully possible and even reasonable that returning the data in every column in the particular table is what is actually desired.

in that case, explicitly calling column names is both error prone (typos, etc.) and less flexible (suppose a column is added or removed).

The problem is, most databases are designed poorly, and most queries are written incorrectly, so using "select * " has become a wtf to most folks.

my 2 bits.

While I agree that its possible that you do need every column I think the Select * is like a goto.. sure its there and it can be used correctly (or atleast nonhorrifically) . Listing out the column names are the plus to me where you see a minus. Typos should be an error and if columns are removed then the code should complain about not finding the column. Adding columns rarely screw things up though if you are naming the columns. Using Select * makes it uncertain. Misspelled column names will probably still be caught with an exception (in java) or worst case just always give you a NULL value (php and the like). The NULL might cause enough problems to remind you that "Oh yea I removed that column and need to change my code" if you're lucky. Using Select * and adding columns might be okay unless other WTFs are in the code like using the column index to get data. In that case adding columns might make the data wrong but not enough to break anything. So using Select * can cause hard to spot errors in code while enumerating the column names makes errors very loud (atleast in java since I have the most experience with that). And I think that loud errors are better than silent or hard to spot errors. Hence to me Select * signals possible trouble in the future and should be avoided.

2005-07-13 Reply Admin

this code merely plants the seeds for may other WTFs to grow from....

Think of all the classes that have to do something like...

/*******************************
WTFDatabaseHandler dbHandler = new WTFDatabaseHandler();

Object aRecord = dbHandler.findRecord("some_table", 55);

if(!aRecord instanceof ClassNotFoundException){
    //Yay!!! I can do stuff....
} else {
    //do i return the exception here? hmm....
    //no record returned?
    //database meltdown?
    //end of the world is coming?
}
/********************************

I'd like to see some of the code that has to work with this crappy 'findRecord' method...

Jeff S · 2005-07-13 Reply Admin

Anonymous:
Jeff S:

What should the primary key be of a table listing U.S. States?

How about customers or employees if your business process assigns them each a unique code when they come aboard?

How about the chart of accounts in a GL? Or a summary table containing 1 value per month/account?

Or a table relating Employees to Skills?

Identities for all of these?

Blindly adding identity columns to tables as the "primary key" ensures no integrity whatsoever in your data, and also hinders things like moving from DEV to PROD or re-importing or re-updating tables. If you have lots of code that checks to ensure StatusID = 34 and DeptID = 938 and OfficeID = 27, what happens when you need to re-import the data or move from DEV to PROD? Isn't it a *little* bit more consistent and easier to work with when you have Status="A" and Dept = "ACCT" and Office = "BOS" which also won't randomly be assigning and/or change?

Wtf are you talking about? If you have code doing hard coded ID checks, then you have already committed a WTF far greater than any I have seen on this page.

It's an old debate -- identities everywhere versus "natural" keys.

There's no point in getting into too thoroughly from here, but I can't tell you how many times we have to help people clean up duplicate, redundant and sloppy tables at sqlteam.com every day because people don't assign meaningful primary keys that actual help to logically constrain and clearly define their data.

I have seen plenty *more* bugs dealing with improperly defined unique constraints, and changing uniqueness rules than I ever want to deal with. I have seen tables with literally 5 or 6 column primary keys. Maybe the data I work is just more dynamic than yours, but I hate being constrained with changing a department name, because of a database integrity constraint, for some reason my customer's don't like that either.

As for the debate, if you don't want to continue, I don't really care. My original point is that if you going to use numeric ID column, in Java atleast it needs to be a long.

(It would be nice if you'd answer the questions! I'm quite curious to hear about the design decisions you'd choose for those tables)

Also -- not a good argument: "lots of people don't define constraints properly, so I don't bother -- I avoid them completely!". Lots of people don't comment their code properly or use appropriate datatypes, do you follow the same logic and avoid those also?

And, again, a natural key doesn't have to be a department's name or a department's description or anything that might change. It might even be a number. Just because a key is natural doens't mean it has to be a varchar(255) descriptive text! You are still allowed to store other columns in your tables and to join to them to present names and descriptions of your objects even if you don't use identities!

We once had a consultant in the office who kept confusing "identity" with "primary key" -- and refused to acknowledge that any table w/o an identity column could have a primary key.

Ytram · 2005-07-13 Reply Admin

I'll go ahead and take a shot at this, with my standing being that in general non-identity primary keys should be avoided. There are cases where they could be used, but in those cases an identity can usually be used as well.

What should the primary key be of a table listing U.S. States?

This is a case where a non-numeric key could very easily be used. However, an identity field could also be used for the primary key. Just an example schema: States(StateId, StateAbbreviation, StateName). Joins could be done on the identity field, StateId, and they would be less expensive than string comparisons. Either way in this case, it's not a big deal.

How about customers or employees if your business process assigns them each a unique code when they come aboard?

Ideally, the unique code given to the employee/customer would BE the identity column when they are entered in as a new employee/customer. But either way,

Employee(EmployeeId, EmployeeCode, FirstName, LastName, etc)

How about the chart of accounts in a GL?

I'm not sure what GL stands for(although I'll probably smack myself when you tell me later ;) )

Or a table relating Employees to Skills?

Example Schemas:
Employee(EmployeeId, FirstName, LastName, Address, etc)
Skill(SkillId, SkillName, SkillDescription)
EmployeeSkill(EmployeeSkillId, EmployeeId, SkillId)

UncleMidriff · 2005-07-13 Reply Admin

Jeff S:
We once had a consultant in the office who kept confusing "identity" with "primary key" -- and refused to acknowledge that any table w/o an identity column could have a primary key.

Well that is patently retarded. Please remember though that not everyone of us who would, in general, rather use an auto-increment ID column for a primary key instead of a natural primary key is likewise retatrded.

Instead of hijacking this thread any further with this debate, I think this thread, which has already been hijacked with this debate, should be hijacked further.

dubwai · 2005-07-13 Reply Admin

Ytram:
I'll go ahead and take a shot at this, with my standing being that in general non-identity primary keys should be avoided. There are cases where they could be used, but in those cases an identity can usually be used as well.

What should the primary key be of a table listing U.S. States?

This is a case where a non-numeric key could very easily be used. However, an identity field could also be used for the primary key. Just an example schema: States(StateId, StateAbbreviation, StateName). Joins could be done on the identity field, StateId, and they would be less expensive than string comparisons. Either way in this case, it's not a big deal.

It could be a big deal. Lets say the company expands internationally. Now, you need to modify specify what country the code belongs to If you used the state abbreviation as the key, you are most likely going to want to change the primary key of that table. If an identity was used, you can just add the new columns. The existing records that referenced the US states are completely unaffected.

Now, what about a case that I am working on right now. Without getting into the requirements too deeply, we have a need for some very focused custom mappings from one thing o another. We don't want to create a new table for each mapping, and we don't want to hard code these things. We also don't want to denormalize other tables that are semi-related. So I am creating a normalized table with a number of columns where the PK is all of the columns except one. Essentially the columns say, who I am, what I'm doing, what my input type is, what output type I'm looking for and in and out values.

There shouldn't ever be a need for other tables to have foreign keys to this one and I can't think of any reason to add an identity column. Would you argue that I should do it anyway? I don't really have a good reason why I didn't do it the other way other than laziness.

Ytram · 2005-07-13 Reply Admin

dubwai:
Ytram:
I'll go ahead and take a shot at this, with my standing being that in general non-identity primary keys should be avoided. There are cases where they could be used, but in those cases an identity can usually be used as well.

What should the primary key be of a table listing U.S. States?

This is a case where a non-numeric key could very easily be used. However, an identity field could also be used for the primary key. Just an example schema: States(StateId, StateAbbreviation, StateName). Joins could be done on the identity field, StateId, and they would be less expensive than string comparisons. Either way in this case, it's not a big deal.

It could be a big deal. Lets say the company expands internationally. Now, you need to modify specify what country the code belongs to If you used the state abbreviation as the key, you are most likely going to want to change the primary key of that table. If an identity was used, you can just add the new columns. The existing records that referenced the US states are completely unaffected.

Now, what about a case that I am working on right now. Without getting into the requirements too deeply, we have a need for some very focused custom mappings from one thing o another. We don't want to create a new table for each mapping, and we don't want to hard code these things. We also don't want to denormalize other tables that are semi-related. So I am creating a normalized table with a number of columns where the PK is all of the columns except one. Essentially the columns say, who I am, what I'm doing, what my input type is, what output type I'm looking for and in and out values.

There shouldn't ever be a need for other tables to have foreign keys to this one and I can't think of any reason to add an identity column. Would you argue that I should do it anyway? I don't really have a good reason why I didn't do it the other way other than laziness.

In my opinion, one of the primary reasons for using an identity field is for foreign key references in another table. If you can say with a reasonable amount of certainty that it won't be used as a foreign key, I wouldn't personally see an issue with leaving out an identity key.

2005-07-13 Reply Admin

Jeff S:
Jeff S:

What should the primary key be of a table listing U.S. States?

How about customers or employees if your business process assigns them each a unique code when they come aboard?

How about the chart of accounts in a GL? Or a summary table containing 1 value per month/account?

Or a table relating Employees to Skills?

Identities for all of these?

(It would be nice if you'd answer the questions! I'm quite curious to hear about the design decisions you'd choose for those tables)

I will try (briefly below).

Also -- not a good argument: "lots of people don't define constraints properly, so I don't bother -- I avoid them completely!". Lots of people don't comment their code properly or use appropriate datatypes, do you follow the same logic and avoid those also?

You don't seem to understand, this wasn't designed in the first place like this, it evolved into this horric state over time. When it was first created, I guess the developer thought like you and had one column unique identifier. But then it turns, the business rules changed and we organized everything in to business units, and different business units wanted the same unique code... OUCH. Then we get categorization of this table so that each business unit for each category can share the unique name, but unfortenately the category table had likewise been improperly designed with a two column unique key. Viola, we now have the horrific table design that I was forced to deal with. (Btw this happened over years and many developers).
My point is that it can be really hard to tell what users will eventually want, I don't like adding unnecessary constraints to make a database look "cleaner", when my ORM will abstract all that crap away anyways. Not to mention you can do nice abstraction and generic helper routines if you store keys in a consistent way. This can of course be done using a Object Key model, but its harder.

What should the primary key be of a table listing U.S. States?

Why are you storing the US states? If it is just for UI drop downs? If so I would probably use a more generic lookup table. Then I can put a nice abstraction around handling of the Lookup objects for conveniently handling lookups based on a lookup type code.

How about customers or employees if your business process assigns them each a unique code when they come aboard?

Will that unique code stay unique? Or will the business process down the line start reusing code of expired users?

How about the chart of accounts in a GL? Or a summary table containing 1 value per month/account?

I am not well versed in General Ledger business rules, so I can't really comment.

Or a table relating Employees to Skills?

Absolutely provide ids, the employee name, and skills are both likely to have changes.

2005-07-13 Reply Admin

I'll say it again - executeQuery NEVER RETURNS A NULL. Thanks for duplicating the WTF!

pjabbott:
Why re-use the variable "r" for both a recordset and an exception?

Object r = null;
r = executeQuery("select * from " + table + " where id = " + i);
if (r == null) {
return new ClassNotFoundException;
}
else {
return r;
}

Heck, why not do this? Only declaring one variable saves memory, right? [;)]

Object r = null;
r = "select * from " + table + " where id = " + i;
r = executeQuery(r)
if (r == null) r = new ClassNotFoundException;
return r;

dubwai · 2005-07-13 Reply Admin

Anonymous:
I'll say it again - executeQuery NEVER RETURNS A NULL. Thanks for duplicating the WTF!

I'll ask again, how do you know?

christoofar · 2005-07-13 Reply Admin

Object r = null; //wtf? Object r; is the same as Object r = null;
select *... mild WTF
Raising a CLASSNOTFOUND exception for a DATABASE fuckup (ahem I mean error). Either raise Application.Exception or SqlException, one of the two or roll your own... but not that!
This function is called findRecord, and yet I bet "executeQuery" is doing something besides just executing a query. It is probably returning a dataset or some other enumeration, not just a record.
5. The ClassNotFoundException would only be raised if executeQuery did something like this
   try{
      //foo
    catch(){..nocode..}, because database exceptions would have been thrown. So unless you wrap executeQuery in try(), why bother checking for null?
6. Why return a type Object if this is data? Shoudln't the return type be of type DataSet?

Schol-R-LEA · 2005-07-13 Reply Admin

The function itself is a WTF! While I am all for decomposing programs down into a lot of small functions and abstracting methids as far as possible, in this case there is a loss of abstraction from using this function. The caller has to discard the information about both the actual parameters of the query and the query's results. Furthermore, the incorrect exception handling means that the function swallows the intended exception (and would be using an incorrect exception type anyway), and it fails to either handle of declare the possible exceptions from the executeQuery() as well - hell, it shouldn't even compile like that. If they'd simply written out the damn query execution (as a parameterized query, of course) where it is being used, it would actually be a lot clearer.

The saddest part is that it looks like the reason he returns an Object rather than a RecordSet is so it can pass either a ResultSet or a ClassNotFoundException. If the coder had known what they were doing, they would never have done it that way.

Oh yes, and it makes no reference to the instance variables of the class it belongs to; it's a static method in an instance method's clothing. Since it isn't mentioned what sort of class it is written for, I assume it should have been a static function (and probably a private one at that), though I wouldn't be surprised if the objects of the class in question would already have the table name in an instance var. Does this coder actually know Java, or OO programming, at all? Apparently not.

For that matter, how is that executeQuery() supposed to work, anyway? It's called as if it were a global function, rather than as a method for a Statement object. Java doesn't even allow global functions! There's no way this would compile. Perhaps that was the instance variable it should have been referring to, eh?

dubwai · 2005-07-13 Reply Admin

christoofar:
1. Object r = null; //wtf? Object r; is the same as Object r = null;

That's only true for instance and class variables. Local variables have no default initialization.

dubwai · 2005-07-13 Reply Admin

Schol-R-LEA:

For that matter, how is that executeQuery() supposed to work, anyway? It's called as if it were a global function, rather than as a method for a Statement object. Java doesn't even allow global functions! There's no way this would compile. Perhaps that was the instance variable it should have been referring to, eh?

I'm not sure if this is what you mean in the above, but executeQuery is apparently a member of the instance or class.

2005-07-13 Reply Admin

Jeff Wrote: 'Don't get me wrong, there is a time and a place for everything. And some tables (most often transaction tables or tables that take lots of raw, unvalidated user input) should have identities as the PK. But not every single table in your database for goodness sake!'

I would say that you have gotten the rule backwards. Most tables should have identities as the PK, but there are exceptions.

loneprogrammer · 2005-07-13 Reply Admin

ammoQ:
No need to check the return value, just try to cast it to (presumably) RecordSet. If it was an exception, this attempt will - voila - throw an exception. ;-)

<voice style="two guys in a british pub"> Brillant! </voice>

dubwai · 2005-07-13 Reply Admin

loneprogrammer:
ammoQ:
No need to check the return value, just try to cast it to (presumably) RecordSet. If it was an exception, this attempt will - voila - throw an exception. ;-)

<VOICE>Brillant! </VOICE>

Annoyingly, this is how a lot of code I have to maintain works. Just substitute NullPointerException for ClassCastException.

Interestingly enough, if that's actually what is done, the exception message from that would be something like:

"ClassCastException: ClassCastException"

2005-07-13 Reply Admin

Anonymous:
I would say that you have gotten the rule backwards. Most tables should have identities as the PK, but there are exceptions.

I would definately agree with this.

2005-07-13 Reply Admin

Schol-R-LEA:

For that matter, how is that executeQuery() supposed to work, anyway? It's called as if it were a global function, rather than as a method for a Statement object. Java doesn't even allow global functions! There's no way this would compile. Perhaps that was the instance variable it should have been referring to, eh?

Well if he extended a Statement object it would exist as a method and could be called the way that its listed... thats what made me wonder about that possibility.

2005-07-13 Reply Admin

Ahh finally some java on this site :)

Anonymous:
imo, there's nothing inherently wrong with using "select * ".

if the database is designed properly, and the query is limiting the select correctly with WHERE statements, it is fully possible and even reasonable that returning the data in every column in the particular table is what is actually desired.

in that case, explicitly calling column names is both error prone (typos, etc.) and less flexible (suppose a column is added or removed).

The problem is, most databases are designed poorly, and most queries are written incorrectly, so using "select * " has become a wtf to most folks.

my 2 bits.

Yes, I used select * allot in my webform code. Maybe thats because I've designed the tables I'm quering so I need every column in a given record anyways. I don't consider this a wtf.

I say that setting a new obj or var to null to initialize is fine, as long as your checking against it being null. However, in this case it's borderline WTF and it's a good clue as to the authors skills, they probably initialize all their objects to null like this which could definately cause problems.

Returning an exception as an object, wow, I've never seen that before! Lol, glad I don't do that. Yes, definately a wtf.

The call to method: executeQuery(String sql) could be a wtf or not, depending on what that method looks like. That exectuteQuery could be creating a prepared statement and opening and closing connections to the db, or even validating the table name. Without a wider context of what this little gem is doing it's hard to say.

Concatinating (sp) an int into a string directly, no problem with that in java, no wtf there.

Using "i" as the int name, not a WTF (in this case) imo, it's a local variable name(local to that method only). However, in general practice it's not good form and this falls into he same category as initilizing to null. Based on these two little "clues" I bet there are some coding doosies by this author that trump this example.

Another little style choice, naming the string table should be "tableName", but it's a minor niggle.

loneprogrammer · 2005-07-13 Reply Admin

It looks as if the guy read this article and came up a way to eliminate the "multiple exit points" problem.

2005-07-13 Reply Admin

Jeff S:
Anonymous:

We are going off the subject of the WTF, although I'll note that assuming the key is always 'id' is a little wonky.
But this is something that I've heard both ways. Some people say always use an index, others say always use meaningful columns.

The way I see it, using meaningful columns is nice because they are, well, meaningful. The downside is if you ever need to change the PK, it's a noghtmare (that I've been through more than once.)

I seem to recall there being a downside to using indexes as the PKs, but I can't remember what it was.

Thats downside is large enough to justify IDs on its own, but there are plenty others.
For instance, if I have a table Employee referencing a table Company, and Company's primary key is "Company_Name" instead of an ID column. Then if the Company names gets changed (gee that never happens)... OOPS integrity contstraint.

That's why you don't use CompanyName. You have to pick things intelligently. What makes each company unique? Does it have a code name, or a short abbreviation? You can even use a number or a code if you want, just don't use randomly changing, meaningless identities.

Don't get me wrong, there is a time and a place for everything. And some tables (most often transaction tables or tables that take lots of raw, unvalidated user input) should have identities as the PK. But not every single table in your database for goodness sake!

I know, lets abuse the DUNS number.

dubwai · 2005-07-13 Reply Admin

loneprogrammer:
It looks as if the guy read this article and came up a way to eliminate the "multiple exit points" problem.

Thanks for posting that. I knew there was a reason I think Joel is a maroon.

2005-07-13 Reply Admin

BTW, the select * is merely defining the width of my result set, a WHERE clause is a must with little exception (no pun intended).

Furthermore with a little more reflection on this, setting the object to "null" in this case is probabaly retarded as excecuteQuery won't most likely be returning a null. But without seeing said method, there's no way to really know what it's returning.

2005-07-13 Reply Admin

1) Declaration of table and i not final.
2) Assigning a local (r) to null.
3) Redundantly assigning a local (r).
4) Assigning to a reference of a more abstract type (java.lang.Object) to an interface type (java.sql.ResultSet).
5) Passing a parameter (table) of a public API to the database (i.e. potential SQL Injection vulnerability).
6) A one line statement block not surrounded in braces.
7) Failure to restrict scope optimally.

Everything else that has been mentioned is an extrapolation that assumes too much context (perhaps is could be provided?).

dubwai · 2005-07-13 Reply Admin

Anonymous:

Furthermore with a little more reflection on this, setting the object to "null" in this case is probabaly retarded as excecuteQuery won't most likely be returning a null. But without seeing said method, there's no way to really know what it's returning.

If executeQuery could return null, would that make setting the variable to null before that less retarded?

2005-07-13 Reply Admin

Alex Papadimoulis:
** Three Letter Acronym, which happens to be the best self-describing thing ever.

Wanna know something really geeky?

It's actually 'Three Letter Abbreviation'. An 'acronymn' is an abbreviation that forms a word, like 'laser' or 'radar'.

Grammar geekiness rules! [:D]

Cheers,
mabster

Jeff S · 2005-07-13 Reply Admin

Jonathan -- great job avoiding answering every question. The one you came closest to actually attempting to answer (the last), you still didn't. I didn't ask what foreign keys you would use to relate the EmployeeSkills table to the Employees and the Skills tables, I asked what the primary key of THAT table would be.

Clearly, this is not the proper forum for an intelligent database discussion!

dubwai · 2005-07-13 Reply Admin

Anonymous:

Alex Papadimoulis:
** Three Letter Acronym, which happens to be the best self-describing thing ever.

Wanna know something really geeky?

It's actually 'Three Letter Abbreviation'. An 'acronymn' is an abbreviation that forms a word, like 'laser' or 'radar'.

Grammar geekiness rules! [:D]

Cheers,
mabster

http://merriamwebster.com/cgi-bin/dictionary?va=acronym

http://merriamwebster.com/cgi-bin/dictionary?book=Dictionary&va=abbreviation

examples from these:

acronym: FBI

abbreviation: amt (for amount)

2005-07-13 Reply Admin

Jeff S:

What should the primary key be of a table listing U.S. States?

A numbered index, Jeff!

Remember when U.S. State abbreviations weren't just two Capital letters? Massachusetts was "Mass.", not "MA", for example. Values change, even if you don't think they will. Use numbered indexes and keep them unrelated to the data itself.

2005-07-13 Reply Admin

Here's what I think is going on.

Somewhere in this class is a method along the lines of:

public ResultSet executeQuery(String query) {
    Statement statement = comeEstablishedConnection.createStatement();
    ResultSet record = null;
    try {
       record = statement.executeQuery(query);
       return record;
    }
    catch (SQLException e) {
       return null;
    }
}

The author of "findRecord" just takes the above written piece of amatuerism and builds on everything that's wrong with it.

Schol-R-LEA · 2005-07-13 Reply Admin

dubwai:
Schol-R-LEA:

For that matter, how is that executeQuery() supposed to work, anyway? It's called as if it were a global function, rather than as a method for a Statement object. Java doesn't even allow global functions! There's no way this would compile. Perhaps that was the instance variable it should have been referring to, eh?

I'm not sure if this is what you mean in the above, but executeQuery is apparently a member of the instance or class.

*kicks self* Oh, yes if course, I should have realized, especially since it was even mentioned earlier. Never mind.

Jeff S · 2005-07-13 Reply Admin

Anonymous:
Jeff S:

What should the primary key be of a table listing U.S. States?

A numbered index, Jeff!

Remember when U.S. State abbreviations weren't just two Capital letters? Massachusetts was "Mass.", not "MA", for example. Values change, even if you don't think they will. Use numbered indexes and keep them unrelated to the data itself.

The nice thing is with a "numbered index" is luckily nothing ever attempts to stop you from having

1,MA

2,Massachusetts

3,Mass

4,Massachusets

5,Ma.

6,Mass.

in your table .... No pesky constraint or primary key violations to worry about!

Why would you want to make any attempt at the database layer to enforce only 1 value for massachusetts, right!?

It is April 1st or something?

2005-07-13 Reply Admin

dubwai:
Anonymous:

Furthermore with a little more reflection on this, setting the object to "null" in this case is probabaly retarded as excecuteQuery won't most likely be returning a null. But without seeing said method, there's no way to really know what it's returning.

If executeQuery could return null, would that make setting the variable to null before that less retarded?

No it wouldn't :)

Just thinking out loud.

Theres a big difference between something that isn't "proper", or bad practice from a style standpoint and something that is just flat out wrong. To me, real WTFs are those things are the latter.

Setting this to null in this case is just unecessary but not really harmful, as if this needed something else wrong with it to make it more borked up then it is.

2005-07-13 Reply Admin

The real WTF is how the editor can claim to be a programmer and never have heard of LOC before.

UncleMidriff · 2005-07-13 Reply Admin

Jeff S:
Anonymous:
Jeff S:

What should the primary key be of a table listing U.S. States?

A numbered index, Jeff!

Remember when U.S. State abbreviations weren't just two Capital letters? Massachusetts was "Mass.", not "MA", for example. Values change, even if you don't think they will. Use numbered indexes and keep them unrelated to the data itself.

The nice thing is with a "numbered index" is luckily nothing ever attempts to stop you from having

1,MA

2,Massachusetts

3,Mass

4,Massachusets

5,Ma.

6,Mass.

in your table .... No pesky constraint or primary key violations to worry about!

Why would you want to make any attempt at the database layer to enforce only 1 value for massachusetts, right!?

It is April 1st or something?

Gosh dangit, all of you, can we stop yelling at each other and calling each other names? This is a subject of great interest to me and I'd rather learn from you people than watch you bicker! ;)

Jeff, please correct me if I am wrong. Let's say you don't have a auto-increment primary key on your States table. I'm assuming you would rather use the name of the state? If so, you could still have craptastic data, e.g.

MA

Massachusetts

Mass

Massachusets

Ma.

Mass.

Just to refrain from using an auto-increment primary key alone won't give you any "pesky constraint or primary key violations to worry about" in this situation, will it?

Something I have seen quite a bit from those who don't condone the use of meaningless primary keys is the assumption that those who do are incapable of using additional and necessary constraints. The mere fact that I might use an auto-increment primary key on a table does not necessitate that I let the user's do as they please with the meaningful columns in that table all slapdash and willy-nilly. I can have a table States(StateID, StateName) such that StateID is the primary key AND StateName is unique.

strongarm · 2005-07-13 Reply Admin

My nickel's worth....some of it's already been said, some of it disagrees with what's been said...

First a correction to people that are saying Object r = null; is bad because it should already be initialized by Java. Java doesn't work that way. Variables and references declared inside of a method are not automatically initialized (at the class level they are). The author could have condensed the two lines together though as in Object r = executeQuery....

On to my WTF's...

1.) Returning an exception....that amuses me.
1.a.) Returning Object doesn't bother me as much, depending on the context in which this is being used. Maybe Object is the most concrete thing that can be returned in this case. So it may be a WTF and may not be.
2.) Using ClassNotFoundException....seems like a RecordNotFoundException to me. At least he's not propagating SQLException all the way out to the caller.
3.) Exceptions should be for exceptional conditions...not finding a record is not exceptional (and in many cases expected). As a personal rule, my methods named "findBlahBlah" do not throw exceptions if they can't find it. My rationale for this has been...the find method doesn't have enough information to make the determination whether it's an exception....e.g. "System please find me this record if it exists." "Oh good, it doesn't exist, so we can execute this functionality." or "Ok, I guess we should create it then." Whatever method calls findRecord should make the determination as to whether it's an error condition.
4.) Not using a StringBuffer for creating the SQL. I read somewhere (can't remember where off the top of my head) that if there are more than three things being concatenated, StringBuffer will perform better than concatenation. Less than three, the cost of allocating the StringBuffer and converting it to a string is more expensive then allocating the new strings on each concat.
5.) Using i as the variable name for an id.
6.) Returning an exception....it's so bad it counts twice.
7.) Not condensing Object r= null; and the following line into one line.
8.) I have a feeling the author of the executeQuery method is swallowing the exceptions...no try catch surrounding executeQuery and no throws clause in the declaration (unless he's throwing an unchecked exception -- then I retract #8).

Also, did anybody notice that the there are two posts hanging out above the original?

strongarm · 2005-07-13 Reply Admin

Anonymous:
Jeff S:
What should the primary key be of a table listing U.S. States?
How about customers or employees if your business process assigns them each a unique code when they come aboard?

How about the chart of accounts in a GL? Or a summary table containing 1 value per month/account?

Or a table relating Employees to Skills?

Identities for all of these?

Blindly adding identity columns to tables as the "primary key" ensures no integrity whatsoever in your data, and also hinders things like moving from DEV to PROD or re-importing or re-updating tables. If you have lots of code that checks to ensure StatusID = 34 and DeptID = 938 and OfficeID = 27, what happens when you need to re-import the data or move from DEV to PROD? Isn't it a *little* bit more consistent and easier to work with when you have Status="A" and Dept = "ACCT" and Office = "BOS" which also won't randomly be assigning and/or change?

Wtf are you talking about? If you have code doing hard coded ID checks, then you have already committed a WTF far greater than any I have seen on this page.

It's an old debate -- identities everywhere versus "natural" keys.

There's no point in getting into too thoroughly from here, but I can't tell you how many times we have to help people clean up duplicate, redundant and sloppy tables at sqlteam.com every day because people don't assign meaningful primary keys that actual help to logically constrain and clearly define their data.

I have seen plenty *more* bugs dealing with improperly defined unique constraints, and changing uniqueness rules than I ever want to deal with. I have seen tables with literally 5 or 6 column primary keys. Maybe the data I work is just more dynamic than yours, but I hate being constrained with changing a department name, because of a database integrity constraint, for some reason my customer's don't like that either.

As for the debate, if you don't want to continue, I don't really care. My original point is that if you going to use numeric ID column, in Java atleast it needs to be a long.

Agreed, this is why I'm from the school of use both dumb keys and business keys. a dumb primary key makes life so much easier when having the computer look up the data or making foreign keys. At the same time, the business key (alternate key whatever the name you learned it as) is used to constrain the data.

I love it in a system that uses "natural" keys to enforce RI...as the schema evolves you start getting 7 or 8 part keys, and then someday someone comes up with a need for adding a 0..n relationship so now you're looking at an 8 or 9 part key and so one. If every table gets it's own dumb key and business key, then the business key in the descendant table can simply include the foreign key along with whatever else is required.

Second, when dealing with data warehousing...it's a good idea to have both. Otherwise trying to uniquely identify a customer without using a dumb key becomes a real pain...usually something like...fname, lname, suffix, middle initial, street address, city, postal code, state/province, and country. No thanks...define an AK with all that crap defined and provide a customer_id so my code is nice and simple.

Also, if everything in the system uses the same key type...i.e. a long or a UUID it's a whole lot easier to write generic lookup code (i.e. test for existence, retrieve a single record, delete a record etc.).

FrostCat · 2005-07-13 Reply Admin

The way I see it, using meaningful columns is nice because they are, well, meaningful. The downside is if you ever need to change the PK, it's a noghtmare (that I've been through more than once.)

This is sufficient reason in and of itself to use a unique ID for your primary index.

Just wait til you have a database with 100K records in a table where the primary index has 4 fields. The joins alone are obnoxious enough to make it worth skipping, let alone the more serious issue of referential integrity issues.

Alex Papadimoulis · 2005-07-13 Reply Admin

strongarm:

Also, if everything in the system uses the same key type...i.e. a long or a UUID it's a whole lot easier to write generic lookup code (i.e. test for existence, retrieve a single record, delete a record etc.).

I say, if you're going to play CODASYL, you may as well use CODASYL. It becomse so much easier that way because you get all these great things like functions to help you walk pointer-chains ...

FrostCat · 2005-07-13 Reply Admin

Anonymous:
You don't seem to understand, this wasn't designed in the first place like this, it evolved into this horric state over time. When it was first created, I guess the developer thought like you and had one column unique identifier. But then it turns, the business rules changed and we organized everything in to business units, and different business units wanted the same unique code... OUCH.

Geez, there's your WTF right there. The identity field shouldn't be visible to the departments--that way they can have their own non-unique department codes.

FrostCat · 2005-07-13 Reply Admin

dubwai:
Anonymous:

Alex Papadimoulis:
** Three Letter Acronym, which happens to be the best self-describing thing ever.

Wanna know something really geeky?

It's actually 'Three Letter Abbreviation'. An 'acronymn' is an abbreviation that forms a word, like 'laser' or 'radar'.

Grammar geekiness rules! [:D]

Cheers,
mabster

http://merriamwebster.com/cgi-bin/dictionary?va=acronym

http://merriamwebster.com/cgi-bin/dictionary?book=Dictionary&va=abbreviation

examples from these:

acronym: FBI

abbreviation: amt (for amount)

Remember, kids, all acronyms are abbreviations, but not all abbreviations are acronyms. Quick test: in set theory, how do we define that relationship?

Chris F · 2005-07-13 Reply Admin

Jeff S.:
Crazy talk, I know, but keep in mind that if your database has an identity primary key called "ID" on every table, then you what you have is not a database but a complete mess.

The only mess we have here is your inductive reasoning.

Meaningless keys are clearly advantageous over meaningful keys, which you advocate. You are crusading against the superior option. As soon as you couple record identity to some external source, you introduce a dependency which possibly causes you to rearchitect your entire key structure when that source changes.

There is no reason why a database with a meaningless key structure can't be as tightly constrained as your vision. Or do you honestly believe the only way to enforce unique constraints is via a primary key?

2005-07-13 Reply Admin

Anonymous:

Object obj /*= null */; try { ... obj = someFunctionOrAnother(...); } finally { if(obj!=null) doReleaseStuff(obj); }

Would this not be better written as:

Object obj /*= null */;
try
{
 obj = someFunctionOrAnother(...);
} catch {
 obj = null;
} finally {
 if(obj!=null) doReleaseStuff(obj);
}

Or more likely:

Object obj = someFunctionOrAnother(...);
try
{
...
} finally {
doReleaseStuff(obj);
}

I'm not sure about Java, but I cannot think of a good reason to initialize an object to null... ever. Maybe someone could provide a better example?

2005-07-14 Reply Admin

Here is the biggest WTF for me. Quoting from the Java API documentation


public ResultSet executeQuery(String sql)
                       throws SQLException

Executes an SQL statement that returns a single ResultSet object.

Returns:: a ResultSet object that contains the data produced by the given query; never null

so by this, the if (r== null) line and subsequently r = new ClassNotFoundException() will never execute. You're either going to get an empty dataset, or one with records in it. You'll never get the exception object back.

2005-07-14 Reply Admin

Well few things that i figured out were:

<font>1.public Object</font> findRecord(<font>String</font> table, int i) {
2. <font>Object</font> r = <font>null</font>;
3. r = executeQuery(<font>"select * from "</font> + table + <font>" where id = "</font> + i);
4. <font>if</font> (r == <font>null</font>) r = <font>new</font> <font>ClassNotFoundException</font>();
5. <font>return</font> r;
6.}

a)in line 3 you are concatenating Strings, "select * from", and "table" are static strings. What internally is being done is that four objects are created everytime the query string is formed. So if you have 1000 query you would have the overhead of creating 5000 objects , i.e. 1000 for each String + 1000 for each object returned. Using StringBuffer will be a lot less resource intensive.

b) we r returning ClassNotFoundException()... how can the caller figure out whether it is returned by java or the database??

It would be great if the author enlightens us about the exact nature of those >7 errors

Drak · 2005-07-14 Reply Admin

First of all, I didn't read all of the posts.

Secondly, there's nothing wrong in returning an exception, and here's why:

If you use the Invoke method to invoke methods from an assembly you loaded into memory (like a plugin) and this method throws an exception all you will ever get back is 'Invocation threw an exception' or some such message. The original exception gets lost. So you declare the return type as an object, return either what you want, or an exception. Then you can check after your invoke if the return value is of the exception type and throw the right exception.

I've used this in various places, but mostly only to propagate real exceptions ie:

Try
...
Catch ex as Exception
Return ex
End Try
Return RealValue

Drak

Drak · 2005-07-14 Reply Admin

Ofcourse I wrote the above without noticing that everybody seems to agree that the original code is Java. I don't knwo if Java has assemblies, or Invoke methods etc. So if it does not, ignore the above post, unless you want to know why you might want to return an exception in (vb).Net.

Drak

Exception Disfunction

Leave a comment on “Exception Disfunction”