The Daily WTF: Curious Perversions in Information Technology

dubwai · 2005-06-09 Reply Admin

Sean:

I'd say it's a WTF on both ends. You'd think the developer would maybe, just maybe try to either determine the cause of the various forms of strings-containing-nulls that come or at least try to do some sort of simple data cleansing so he/she doesn't have to add a new conditional every time a new issue comes in.

In a perfect world, yes but sometimes the developer knows exactly why this is happening and can't do anything about it. If you are working with a 3rd party API, you might know that the API is causing the issue but you don't have a choice but to deal with it. Or if you are getting this data from another team that can't or won't deal with the issue so you just suck it up and deal.

I think this could be written better, though. In the HashSet example I gave above, you could load the nullStrings from a file or something so that you don't have to change the code everytime a new one of these is found.

triso · 2005-06-09 Reply Admin

Rick:

Arguably the best symphony in the country and a fantastic antique car museum, though

Nonsense! The TCO doesn't hold a candle to the BugTussle String and Jug Philharmonic.

John Smallberries · 2005-06-09 Reply Admin

rogthefrog:
A Wizard A True Star:


rogthefrog:
There are, however, possible explanations. Say the app is tracking tasks, each of which returns some string (which may be null). You could store each task's result in some session variable, and when your user is done write it to your db as a single string separated by _. This makes it extensible: if you add another task, you don't have to add a field to your db.


Storing delimited strings in database fields makes baby Jesus cry.




Baby Jesus deserves it. Sometimes that's actually a good way to do things.

I agree, sometimes. Not this time though.

One time I did it was when implementing a hierarchy in SQL. One of my columns was similar to an XPath expression representing the path to the row's node in the hierarchy. It looked like: "/1/2/3/" but the values were the PK's of other rows (nodes)... "/GrandDad/Dad/Me/".

To find all descendants of an arbitrary node, I could query it like "WHERE path like '%/GrandDad/%'" (replace "GrandDad" with an actual int PK value, of course).

Worked like a charm.

2005-06-09 Reply Admin

rogthefrog:
Rick:
rogthefrog:

Gawd... I think I know where __null and null__ and all other variants comes from.

It looks like a string is used to hold values for several fields at once, like a comma-separated line, except using __ as the delimiter, or like a bitmask.

So if the semantics of the strings is

lastname__firstname__zipcode

you could have

papadimoulis__alex__12345

This would explain why no test for "" or "null" since the input would always have the underscores added before the test.

Correction to my post: the separator is _ not __. The occurrence of __ signifies that all 3 possible values in the string are blank.

If this is right (which it probably is, sadly) this design is pretty moronic.

There are, however, possible explanations. Say the app is tracking tasks, each of which returns some string (which may be null). You could store each task's result in some session variable, and when your user is done write it to your db as a single string separated by _. This makes it extensible: if you add another task, you don't have to add a field to your db. If each task's result uniquely identifies the task, this schema allows you to data-mine your db for questions like "how many users did task X right before task Y and didn't return null for either task"; "how many users stopped the whole process immediately after task X"; etc. You could store one record per task for each user but that would mean a lot more rows in your db and make said data mining possibly harder.

I shudder at the mere thought, yet you seem to be right: /^(null_|_)(null_|_)(null)?$/ seems to indeed match all 8 cases (the number 8 is itself a clue)

My eyes, my eyes

Jan

Miszou · 2005-06-09 Reply Admin

rogthefrog:
A Wizard A True Star:

rogthefrog:
There are, however, possible explanations. Say the app is tracking tasks, each of which returns some string (which may be null). You could store each task's result in some session variable, and when your user is done write it to your db as a single string separated by _. This makes it extensible: if you add another task, you don't have to add a field to your db.

Storing delimited strings in database fields makes baby Jesus cry.

Baby Jesus deserves it. Sometimes that's actually a good way to do things.

I don't believe that is *ever* a good way to do things...

The design of your database should be able to expand to accomodate multiple tasks, without overloading the meaning of a particular column or storing multiple values in a single column. What kind of monstrous SQL statement are you going to write if you need to do something simple such as count the sub-fields of a column? Are you going to load the data into your application and parse it, or would you attempt to write a parser in SQL?

Please submit some of your database designs so that we don't run out of material any time soon...

2005-06-09 Reply Admin

Anonymous:

I shudder at the mere thought, yet you seem to be right: /^(null_|_)(null_|_)(null)?$/ seems to indeed match all 8 cases (the number 8 is itself a clue)

My eyes, my eyes

Jan

/^(null)?_(null)?_(null)?$/ would have been marginally more readable

Bustaz Kool · 2005-06-09 Reply Admin

haveworld:
Bustaz Kool:
Is anyone else surprise that there are at least #126185 issues reported so far?

And how did you come to that conclusion?

The comment of "Added for Issue #126185" is most likely a reference to a problem report number; implying that there were 126,184 previous errors (and counting).

John Bigboote:

Interestingly, an empty string will return false. Not sure if that's an oversight or an f'ed up twist on the rule.

Maybe it's my database centricity but strictly steaking an empty string is not Null. And if this is what's involved in determining if a value is Null, I am dying to see how they go about detecting an empty string.

Anonymous:
Reloacate to Cleveland! +o( Now there is a WTF!

There was a major tornado that hit Cleveland last night... It is estimated that it did $25,000,000.00 worth of improvements... (Sorry, Alex, I just souldn't resist).

Great orchestra though and the Rock'n'Roll Hall of Fame.

El Duderino · 2005-06-09 Reply Admin

rogthefrog:
A Wizard A True Star:

rogthefrog:
There are, however, possible explanations. Say the app is tracking tasks, each of which returns some string (which may be null). You could store each task's result in some session variable, and when your user is done write it to your db as a single string separated by _. This makes it extensible: if you add another task, you don't have to add a field to your db.

Storing delimited strings in database fields makes baby Jesus cry.

Baby Jesus deserves it. Sometimes that's actually a good way to do things.

I'm afraid I'll have to disagree as well. I've yet to see an example of when it would be good and would love to see one!

At my last job, one of my coworkers supported a years old system (which he wrote and only he really understood). One of the key tables used a natural key as its primary key. Inevitably, the business model changed so that the entities defined by the table no longer could be handled by his key. Redesigning the schema was clearly beyond him because his solution was to turn the PK into a concatenated string -- now every join to the table involves parsing the key. Hmmm . . . wonder why the system's running so much slower now?

2005-06-09 Reply Admin

Try looking materialized paths for representing trees in SQL.

Jeff S · 2005-06-09 Reply Admin

rogthefrog:
A Wizard A True Star:

rogthefrog:
There are, however, possible explanations. Say the app is tracking tasks, each of which returns some string (which may be null). You could store each task's result in some session variable, and when your user is done write it to your db as a single string separated by _. This makes it extensible: if you add another task, you don't have to add a field to your db.

Storing delimited strings in database fields makes baby Jesus cry.

Baby Jesus deserves it. Sometimes that's actually a good way to do things.

It is a great way to do things if you have no idea how databases work or any desire to use indexes, constraints, relationships, and so on.

rogthefrog · 2005-06-09 Reply Admin

John Smallberries:

rogthefrog:
A Wizard A True Star:


rogthefrog:
There are, however, possible explanations. Say the app is tracking tasks, each of which returns some string (which may be null). You could store each task's result in some session variable, and when your user is done write it to your db as a single string separated by _. This makes it extensible: if you add another task, you don't have to add a field to your db.


Storing delimited strings in database fields makes baby Jesus cry.




Baby Jesus deserves it. Sometimes that's actually a good way to do things. 



I agree, sometimes. Not this time though.

One time I did it was when implementing a hierarchy in SQL. One of my columns was similar to an XPath expression representing the path to the row's node in the hierarchy. It looked like: "/1/2/3/" but the values were the PK's of other rows (nodes)... "/GrandDad/Dad/Me/".

To find all descendants of an arbitrary node, I could query it like "WHERE path like '%/GrandDad/%'" (replace "GrandDad" with an actual int PK value, of course).

Worked like a charm.

Exactly what I meant.

rogthefrog · 2005-06-09 Reply Admin

Jeff S:
rogthefrog:
A Wizard A True Star:

rogthefrog:
There are, however, possible explanations. Say the app is tracking tasks, each of which returns some string (which may be null). You could store each task's result in some session variable, and when your user is done write it to your db as a single string separated by _. This makes it extensible: if you add another task, you don't have to add a field to your db.

Storing delimited strings in database fields makes baby Jesus cry.

Baby Jesus deserves it. Sometimes that's actually a good way to do things.

It is a great way to do things if you have no idea how databases work or any desire to use indexes, constraints, relationships, and so on.

I'm not saying it's a good way to do things all the time, or even 1% of the time. All I'm saying is that you occasionally run into one ad-hoc problem where this is the optimal solution. See John Dingleberries' example above.

El Duderino · 2005-06-09 Reply Admin

It's A solution. I wouldn't call it an optimal solution -- and for the task at hand it might be the fastest-to-implement solution. I admit that I've taken lots of shortcuts like this when writing throw away code or am prototyping functionality not related to the DB schema but in a production database I'd use a properly indexed xref table.

mizhi · 2005-06-09 Reply Admin

dubwai:
I don't find the single exit point to make anything clearer. It makes the code path merge back on itself artifically (making the code more complex) instead of a clean branch. It keeps the compiler from ensuring that all the code paths actually return something meaningful. I like to be able to follow the code and easily see what it returns. I don't like to have to walk through a bunch of extra code to see whether the returnValue variable is changed later on in the method.

Depends on the code. Sometimes it makes the code more complex and that would be one of those instances where you don't use a single exit point. Other cases, the code is much cleaner if you restrict yourself to one exit point. It's not a hard and fast rule you have to apply to every single function written.

dubwai · 2005-06-09 Reply Admin

rogthefrog:
Jeff S:
rogthefrog:
A Wizard A True Star:

rogthefrog:
There are, however, possible explanations. Say the app is tracking tasks, each of which returns some string (which may be null). You could store each task's result in some session variable, and when your user is done write it to your db as a single string separated by _. This makes it extensible: if you add another task, you don't have to add a field to your db.

Storing delimited strings in database fields makes baby Jesus cry.

Baby Jesus deserves it. Sometimes that's actually a good way to do things.

It is a great way to do things if you have no idea how databases work or any desire to use indexes, constraints, relationships, and so on.

I'm not saying it's a good way to do things all the time, or even 1% of the time. All I'm saying is that you occasionally run into one ad-hoc problem where this is the optimal solution. See John Dingleberries' example above.

Why wouldn't you just create a table that with a parent column which would point to the parent key or null if it was the root?

dubwai · 2005-06-09 Reply Admin

mizhi:

Depends on the code. Sometimes it makes the code more complex and that would be one of those instances where you don't use a single exit point. Other cases, the code is much cleaner if you restrict yourself to one exit point. It's not a hard and fast rule you have to apply to every single function written.

OK, we are of the same mind on this. I thought you were one of those 'multiple exit points are evil' people.

John Smallberries · 2005-06-09 Reply Admin

El Duderino:
It's A solution. I wouldn't call it an optimal solution -- and for the task at hand it might be the fastest-to-implement solution. I admit that I've taken lots of shortcuts like this when writing throw away code or am prototyping functionality not related to the DB schema but in a production database I'd use a properly indexed xref table.

There are many facets to "optimal". Are talking, the most performant? the most scalable? the most maintainable? the most elegent? the most dogmatic?

I modeled and prototyped a bunch of techniques (self-referential tables, nested sets...) and this technique (known as materialized path) was the fastest, most flexible and was implemented with the simplest and least opaque code. In my estimation, it was optimal for this case.

John Smallberries · 2005-06-09 Reply Admin

dubwai:
Why wouldn't you just create a table that with a parent column which would point to the parent key or null if it was the root?

The app had to support multiple hierarchies with arbitrary depths. To build a tree of nodes with the schema you suggest would require n (the depth) number of joins (or worse, calls), but n is not known a priori. Your model makes it very easy to get all the children of a given node, but it's not as easy to find all the descendants or ancestors of an arbitrary node. If anyone is interested, I can dig out the code.

My thinking is: when a row is inserted, we know its position in the hierarchy. Why not record that rather than recomputing it for each subsequent query? I know, I know...don't store computed data in the DB, but this kind of computation is very expensive.

John Smallberries · 2005-06-09 Reply Admin

rogthefrog:
I'm not saying it's a good way to do things all the time, or even 1% of the time. All I'm saying is that you occasionally run into one ad-hoc problem where this is the optimal solution. See John Dingleberries' example above.

Hey! No name calling!

rogthefrog · 2005-06-09 Reply Admin

John Smallberries:
rogthefrog:
I'm not saying it's a good way to do things all the time, or even 1% of the time. All I'm saying is that you occasionally run into one ad-hoc problem where this is the optimal solution. See John Dingleberries' example above.

Hey! No name calling!

That was meant as a compliment. [:D]

Your solution sounds a lot like what I mentioned in my original post that had all the naysayers going WTF at me.

Another advantage to this kind of solution is that you can easily spot long-distance dependencies. For example if you're compiling lists of tasks it's very easy to see who did both task A and task B in that order, adjacent or not, etc.

Doesn't seem to be a lot of data miners on this board. A damn shame.

rogthefrog · 2005-06-09 Reply Admin

dubwai:
rogthefrog:
Jeff S:
rogthefrog:
A Wizard A True Star:

rogthefrog:
There are, however, possible explanations. Say the app is tracking tasks, each of which returns some string (which may be null). You could store each task's result in some session variable, and when your user is done write it to your db as a single string separated by _. This makes it extensible: if you add another task, you don't have to add a field to your db.

Storing delimited strings in database fields makes baby Jesus cry.

Baby Jesus deserves it. Sometimes that's actually a good way to do things.

It is a great way to do things if you have no idea how databases work or any desire to use indexes, constraints, relationships, and so on.

I'm not saying it's a good way to do things all the time, or even 1% of the time. All I'm saying is that you occasionally run into one ad-hoc problem where this is the optimal solution. See John Dingleberries' example above.

Why wouldn't you just create a table that with a parent column which would point to the parent key or null if it was the root?

As Bradford Lingonberries mentioned, your solution doesn't address arbitrary-length sequences (which in my case are the rule rather than the exception) and there's no easy way to traverse the tree/path/hierarchy in both directions.

AJR · 2005-06-09 Reply Admin

Bustaz Kool:
haveworld:
Bustaz Kool:
Is anyone else surprise that there are at least #126185 issues reported so far?

And how did you come to that conclusion?

The comment of "Added for Issue #126185" is most likely a reference to a problem report number; implying that there were 126,184 previous errors (and counting).

But not necessarily all in the one product. If it's a company-wide (or group with the company) bug tracking system, the numbers could well be unique across many different products. Consider, for example, mozilla.org's bugzilla, which has bug numbers up to 297252 (the highest number shown on the "new reports" page atm.) Bug number 297252 is listed as a Firefox bug, but that doesn't mean FF has had 297,252 bugs - that database includes many duplicates, bugs with other Mozilla products (like the Bugzilla system itself, for example), and other stuff, and the numbers cover all of those.
Who knows what's in the 126,184 previous "issues" in the system that the author of the WTF was using?

Bustaz Kool · 2005-06-09 Reply Admin

AJR:
Bustaz Kool:

haveworld:
Bustaz Kool:
Is anyone else surprise that there are at least #126185 issues reported so far?

And how did you come to that conclusion?

The comment of "Added for Issue #126185" is most likely a reference to a problem report number; implying that there were 126,184 previous errors (and counting).

But not necessarily all in the one product. If it's a company-wide (or group with the company) bug tracking system, the numbers could well be unique across many different products. Consider, for example, mozilla.org's bugzilla, which has bug numbers up to 297252 (the highest number shown on the "new reports" page atm.) Bug number 297252 is listed as a Firefox bug, but that doesn't mean FF has had 297,252 bugs - that database includes many duplicates, bugs with other Mozilla products (like the Bugzilla system itself, for example), and other stuff, and the numbers cover all of those.
Who knows what's in the 126,184 previous "issues" in the system that the author of the WTF was using?

You are absolutely correct... (My original remark was just kinda a joke...)

2005-06-09 Reply Admin

It's probably just a bug in the shareware. Maybe the management came up
with a new processor to test the validity of their sql-statement.
On the other hand, it seems likely that the bug is within the grasp
of a properly formatted text-string. It's pretty common. Maybe it could even
be downloaded to a handheld system for greater endurance!!!! Both pros and cons.
It's probably not even sold to thousands of households worldwide :(
Sad but that's how it goes. Anyway, cool code!

foxyshadis · 2005-06-09 Reply Admin

rogthefrog:
dubwai:
rogthefrog:
Jeff S:
rogthefrog:
A Wizard A True Star:

rogthefrog:
There are, however, possible explanations. Say the app is tracking tasks, each of which returns some string (which may be null). You could store each task's result in some session variable, and when your user is done write it to your db as a single string separated by _. This makes it extensible: if you add another task, you don't have to add a field to your db.

Storing delimited strings in database fields makes baby Jesus cry.

Baby Jesus deserves it. Sometimes that's actually a good way to do things.

It is a great way to do things if you have no idea how databases work or any desire to use indexes, constraints, relationships, and so on.

I'm not saying it's a good way to do things all the time, or even 1% of the time. All I'm saying is that you occasionally run into one ad-hoc problem where this is the optimal solution. See John Dingleberries' example above.

Why wouldn't you just create a table that with a parent column which would point to the parent key or null if it was the root?

As Bradford Lingonberries mentioned, your solution doesn't address arbitrary-length sequences (which in my case are the rule rather than the exception) and there's no easy way to traverse the tree/path/hierarchy in both directions.

Oh God, please stop quoting. ;_;
(=p)

El Duderino · 2005-06-10 Reply Admin

John Smallberries:
The app had to support multiple hierarchies with arbitrary depths. To build a tree of nodes with the schema you suggest would require n (the depth) number of joins (or worse, calls), but n is not known a priori. Your model makes it very easy to get all the children of a given node, but it's not as easy to find all the descendants or ancestors of an arbitrary node. If anyone is interested, I can dig out the code.

My thinking is: when a row is inserted, we know its position in the hierarchy. Why not record that rather than recomputing it for each subsequent query? I know, I know...don't store computed data in the DB, but this kind of computation is very expensive.

You mean my textbooks lied to me!?!?!?

[:'(]

'K I'll buy it. I'm certainly one who'll try anything once and I'm a big proponent of making the best decision regardless of conventions. Truth is, this sort of thing flys in the face of all the standards I've ever read about but I'll certainly keep it under my cap for an occasion when it might come in handy -- might even dazzle some DBAs with it! [;)]

Foon · 2005-06-10 Reply Admin

This is caused by using this:

String mungTogether= obj1 + "_" + obj2 + "_" + obj3;

to implement equals() and hashCode().

The correct way to do this kind of thing is to create a small object holding the potential tuples (triples), to treat the objects as objects (with types), and to write equals and hashCode properly. If you do it right, you only need to do it once.

diGriz · 2005-06-10 Reply Admin

Somehow I miss values like "completely_null", "real_full_null", "believe_me_this _is_so_null", "dull_null" or "hyper_null".

2005-06-10 Reply Admin

What scares me the most is that the "null_null_null" thing is obviously a symptom of something in the code being passed a "null_" string, which then looks at it, decides it's not a NULL string, and acts upon it.
Expect a new addition to this function soon, as "null_null_null_null" is bound to arrive as soon as some function is passed a "null_" string 4 times ;)

V. · 2005-06-10 Reply Admin

This time I don't think it's the developer himself , but the previous developers who failed to clean and management who fail to put maintenance money in the project.
I've seen stuff like this before. Main reason is: "no time, project must not crash".

2005-06-10 Reply Admin

John Smallberries:

True, str.equals("blah") || str.equals("blah") really makes me scratch my head trying to figure out the operator precendence. You'd think it would evaluate to the result of the first method call or'd with the result of the second method call.

Not necessarily. What if the || operator has higher precedence to the . operator? Then it could mean (str.equals("blah") || str).equals("blah")

[^o)]

2005-06-10 Reply Admin

AJR:
Bustaz Kool:
haveworld:
Bustaz Kool:
Is anyone else surprise that there are at least #126185 issues reported so far?

And how did you come to that conclusion?

The comment of "Added for Issue #126185" is most likely a reference to a problem report number; implying that there were 126,184 previous errors (and counting).

But not necessarily all in the one product. If it's a company-wide (or group with the company) bug tracking system, the numbers could well be unique across many different products. Consider, for example, mozilla.org's bugzilla, which has bug numbers up to 297252 (the highest number shown on the "new reports" page atm.) Bug number 297252 is listed as a Firefox bug, but that doesn't mean FF has had 297,252 bugs - that database includes many duplicates, bugs with other Mozilla products (like the Bugzilla system itself, for example), and other stuff, and the numbers cover all of those.
Who knows what's in the 126,184 previous "issues" in the system that the author of the WTF was using?

Also its entirely possible they started at bug 100,000 so they could have 900,000 bugs before entering this one:

bug #1,000,000: Bug numbers no longer sort properly as text.

drjava · 2005-06-10 Reply Admin

dubwai:

I thought you were one of those 'multiple exit points are evil' people.

In certain situations gotos are a very elegant solution; emulating them with if/while is a pain. A safe goto could easily be added to Java or C#, too bad it's not there.

2005-06-10 Reply Admin

This reminds me of something I read in a PHP 5 book earlier this year. The issue was how to treat the string "false" inside of XML tags as a boolean in the context of a PHP application. Soon the author was making this much harder than it had to be, applying a regular expression that not only tested for the existence of the string "false", but also of "no", "off", etcetera. Immediately two simple solutions jumped out at me as to how to address the original issue of converting the string "false" to boolean: 1) use eval() on the string, or better yet 2) making the "false" implicit, and only emitting the XML tag when the attribute is true. So it's no wonder people are writing code such as in your example, when "experts" who write books can hardly do better!

2005-06-10 Reply Admin

A Wizard A True Star:
rogthefrog:
There are, however, possible explanations. Say the app is tracking tasks, each of which returns some string (which may be null). You could store each task's result in some session variable, and when your user is done write it to your db as a single string separated by _. This makes it extensible: if you add another task, you don't have to add a field to your db.

Storing delimited strings in database fields makes baby Jesus cry.

another WTF in the making !!

smitty_one_each · 2005-06-10 Reply Admin

Charles Nadolski:
WTF. Why not just use str.Find("null") and str.Find("__"). That would handle all of these stupid cases, including Issue #eventy-billion. That is, unless for some damn reason "null" is a valid string.

//Added for Issue #107724
//Added for Issue #126185

now THAT is priceles. What is exactly issue number X? Reading too much slashdot on the job?

Because somebody might be feeding "annull" or "nullify" to the function, and trigger false positives.
A regular expression to make the test more precise might be worth considering.

2005-06-10 Reply Admin

John Smallberries:
dubwai:
Why wouldn't you just create a table that with a parent column which would point to the parent key or null if it was the root?

The app had to support multiple hierarchies with arbitrary depths. To build a tree of nodes with the schema you suggest would require n (the depth) number of joins (or worse, calls), but n is not known a priori. Your model makes it very easy to get all the children of a given node, but it's not as easy to find all the descendants or ancestors of an arbitrary node. If anyone is interested, I can dig out the code.

My thinking is: when a row is inserted, we know its position in the hierarchy. Why not record that rather than recomputing it for each subsequent query? I know, I know...don't store computed data in the DB, but this kind of computation is very expensive.

This reminds me of one of my very first projects I ever worked on. It was a fairly basic online shopping system written in PHP/MySQL. The category table had three columns, id, pid, and name, with the pid (parent-id) set to zero if it was a "top-level category." Everything was going well until I got to the part where the client added new items to inventory...the select box needed to have a higher level of indentation based on the subcategory (no leading spaces for top-level item, 2 for each level of nested-ness(?) )
Anyways, I ended up doing this:
$maintext.="<select name='cid'>";
 $sql = "SELECT * FROM `categories` WHERE `pid` = \"0\";";
 $toplevelcats = $db->query($sql) or send_error_page($db->error());
 //add extra whitespace for subcategories based on current indentationlevel
 $currentindentlevel=0;
 while($thistoplevelcat = $db->getrow($toplevelcats)){
 //no parents for this level
 $thistoplevelcatname=prepString($thistoplevelcat['name']);
 $thistoplevelcatid = $thistoplevelcat['id'];
 $maintext.="<option value='$thistoplevelcatid'>$thistoplevelcatname</option>";
 $sql = "SELECT * FROM `categories` WHERE `pid` = \"$thistoplevelcatid\" ORDER BY `name` ASC;";
 //store results index in array...increment to go to a new, deeper sublevel
 $subcatresults[++$currentindentlevel]=$db->query($sql) or send_error_page($db->error());
 while($currentindentlevel>0){
 while($thissubcatrow = $db->getrow($subcatresults[$currentindentlevel])){
 $thissubcatname = prepString($thissubcatrow['name']);
 $thissubcatid = $thissubcatrow['id'];
 $maintext.="<option value='$thissubcatid' ";
 if($thissubcatid == $cid){
 $maintext.="SELECTED";
 }
 $maintext.=">";
 for($i=0;$i<$currentindentlevel;$i++){
 $maintext.=" ";
 }
 $maintext.="$thissubcatname</option>";
 $sql = "SELECT * FROM `categories` WHERE `pid` = \"$thissubcatid\" ORDER BY `name` ASC;";
 $subcatresults[++$currentindentlevel]=$db->query($sql) or send_error_page($db->error());
 }
 $currentindentlevel--;
 }
 }
 $maintext.="</select></td></tr>";

Sadburger

dubwai · 2005-06-10 Reply Admin

Bobby Loganberries:
The app had to support multiple hierarchies with arbitrary depths. To build a tree of nodes with the schema you suggest would require n (the depth) number of joins (or worse, calls), but n is not known a priori. Your model makes it very easy to get all the children of a given node, but it's not as easy to find all the descendants or ancestors of an arbitrary node. If anyone is interested, I can dig out the code.

You can build a stored procedure to do this. I belive Oracle has a built in recursive query for this type of thing. As another person also mentioned, you can also create a cross-reference table that does this kind of thing.

Giovanni Huckleberries:

My thinking is: when a row is inserted, we know its position in the hierarchy. Why not record that rather than recomputing it for each subsequent query? I know, I know...don't store computed data in the DB, but this kind of computation is very expensive.

It seems to me that your solution requires retrieving the data, parsing it, and then going back to the DB for the rows specified in the parsed data. Maybe I am misunderstanding the solution.

dubwai · 2005-06-10 Reply Admin

rogthefrog:
As Bradford Lingonberries mentioned, your solution doesn't address arbitrary-length sequences (which in my case are the rule rather than the exception) and there's no easy way to traverse the tree/path/hierarchy in both directions.

I'm not sure how it doesn't address arbitrary length sequences.

To find all the children, search for rows with the node as the parent and recurse. To find the ancestors, you just find the parent row, find it's parent row until there is no parent.

This was just a simple example of the idea, you can get much fancier with a xref table as mentioned by someone else.

2005-06-10 Reply Admin

Why evaluate a boolean to return the same boolean?

return (blah || bleah || yadda || bubba);

should be more than enough.

PstScrpt · 2005-06-10 Reply Admin

mizhi:
dubwai:
I don't find the single exit point to make anything clearer. It makes the code path merge back on itself artifically (making the code more complex) instead of a clean branch. It keeps the compiler from ensuring that all the code paths actually return something meaningful. I like to be able to follow the code and easily see what it returns. I don't like to have to walk through a bunch of extra code to see whether the returnValue variable is changed later on in the method.

Depends on the code. Sometimes it makes the code more complex and that would be one of those instances where you don't use a single exit point. Other cases, the code is much cleaner if you restrict yourself to one exit point. It's not a hard and fast rule you have to apply to every single function written.

I make no effort to avoid multiple exit points. In fact, sometimes I'll move a block of code into its own procedure so that I can use an extra exit point rather than nesting the if's deeper. However, I can nearly always keep those exits within a few lines of the beginning of the procedure, so they're obvious.

dubwai · 2005-06-10 Reply Admin

PstScrpt:

I make no effort to avoid multiple exit points. In fact, sometimes I'll move a block of code into its own procedure so that I can use an extra exit point rather than nesting the if's deeper. However, I can nearly always keep those exits within a few lines of the beginning of the procedure, so they're obvious.

Exactly. People will sit there and tell you that 'return false' is more difficult to understand than an extra level of nesting and setting a variable to false and retuning it at the end of the method.

John Smallberries · 2005-06-10 Reply Admin

dubwai:
You can build a stored procedure to do this. I belive Oracle has a built in recursive query for this type of thing. As another person also mentioned, you can also create a cross-reference table that does this kind of thing.

I'm not sure about Oracle, but in SQL Server, recursive queries are hard to write and don't perform well. There is also a 32 level limit on the recursion. The xref table (nested sets technique) is also difficult to implement and requires many joins for deep hierarchies

dubwai:
It seems to me that your solution requires retrieving the data, parsing it, and then going back to the DB for the rows specified in the parsed data. Maybe I am misunderstanding the solution.

Not at all. The query only has a LIKE clause and no joins or recursion.

2005-06-10 Reply Admin

I love this site. It helps keep me calm when I am in despair, at work as one dirty hack is suggested to sort out the problem caused by another dirty hack. Ok there is a time and a place for a dirty hack but there is also a time and place for actually doing things 'right' in the first place or at least fixing them when they come to light. Anyway I digress.
Currently I am finding solace in the fact that one of my colleagues who I sent a link to this site in the hope that he might learn something useful seems to find it hilarious whilst remaining blissfully unaware that if I started sending his code in you wouldn't need any extra content for at least several months.

2005-06-10 Reply Admin

times new roman??

that's what comes of using word to check your spelling

dubwai · 2005-06-10 Reply Admin

Nigel Ginberries:
dubwai:
You can build a stored procedure to do this. I belive Oracle has a built in recursive query for this type of thing. As another person also mentioned, you can also create a cross-reference table that does this kind of thing.

I'm not sure about Oracle, but in SQL Server, recursive queries are hard to write and don't perform well. There is also a 32 level limit on the recursion. The xref table (nested sets technique) is also difficult to implement and requires many joins for deep hierarchies

OK, what about a stored-procedure?

Wesley Raspberries:
dubwai:
It seems to me that your solution requires retrieving the data, parsing it, and then going back to the DB for the rows specified in the parsed data. Maybe I am misunderstanding the solution.

Not at all. The query only has a LIKE clause and no joins or recursion.

But don't you need to retrieve the rows at some point? Perhaps you are dealing with smaller data sets that I have to work with. Generally, we can't rely on like queries (especially unbounded ones) because they can take hours, if not days to return.

dubwai · 2005-06-10 Reply Admin

Anonymous:
there is also a time and place for actually doing things 'right' in the first place or at least fixing them when they come to light.

Hmm. This is highly unorthodox. Doing things right in the first place? Fixing the root problem? I don't know what planet you are from but this is earth, buddy. We don't do things that way around here.

2005-06-10 Reply Admin

dubwai:
Anonymous:
there is also a time and place for actually doing things 'right' in the first place or at least fixing them when they come to light.

Hmm. This is highly unorthodox. Doing things right in the first place? Fixing the root problem? I don't know what planet you are from but this is earth, buddy. We don't do things that way around here.

sorry that comment was out of order

mizhi · 2005-06-10 Reply Admin

PstScrpt:
I make no effort to avoid multiple exit points. In fact, sometimes I'll move a block of code into its own procedure so that I can use an extra exit point rather than nesting the if's deeper. However, I can nearly always keep those exits within a few lines of the beginning of the procedure, so they're obvious.

That's usually when I feel having an extra exit point is fine, at the beginning of the function where it's obvious. It's usually to check input to make sure that it's kosher. In that case, though, you really haven't gotten to the meat of the function itself.

mizhi · 2005-06-10 Reply Admin

dubwai:

OK, we are of the same mind on this. I thought you were one of those 'multiple exit points are evil' people.

I'm a 'whatever makes the code elegant and easy to understand' person. :-)

There's More Than One Way to Null a String

Leave a comment on “There's More Than One Way to Null a String”