The Daily WTF: Curious Perversions in Information Technology

Cyresse · 2005-10-06

Incidentally, in the sample data, it could occur in any sort of mixture from

var1 = value1

to

var1 = { subvar1 = { subvar1a = { i, i1, i2 } subvar1b = { } } subvar2 = 5 }

This is a syntax I've had to parse before, and I had a choice between a parsing library, a bunch of string methods, or a regex. I tried all three.

Jon Limjap · 2005-10-06

I think this is called "test-driven development" :p

pbounaix · 2005-10-07

lol!

2005-10-07

A Pythonic solution was presented; here's a vaguely improved version.

import re
# Raw string for the backslashes; \Z (unlike $) does not accept a trailing newline.
ALPHA_REGEX = re.compile(r'^[-\w#\. ]+\Z')
def isAlphaNumericOnly(value):
  '''
  @param value: value to check for alphanumericness
  @returns: True if value contains only word characters, '-', '#', '.' or spaces, False otherwise
  '''
  return bool(ALPHA_REGEX.match(value))

2005-10-07

Y'know, if the isAlphaNumeric member is actually referenced in the clipped-out code, or if there is an accessor for it, this is slightly less of a WTF. Note that isAlphaNumeric properly records the status of the last validation performed (except in the null string case; this is just a bug). Setting it in the constructor is still goofy, as is raising an exception when it's false. But one can at least imagine what the member might be for. OK, the name is bad; should be "wasAlphaNumeric". Also, why would you want the object to remember this?

Never mind. WTF.

2005-10-07

stevekj:
Nice. So let's see just how many WTFs there are in this one class:

1) private variable not used (written, but not read)
2) constructor taking meaningless argument
3) boolean-valued method that cannot return "false"
4) incorrect definition of "alphanumeric" (normally does not include punctuation)

It may be that snipped parts of the class make use of the otherwise useless private variable, but that would be a WTF too, since it still doesn't look like the flag is associated with other class-scope data as would normally be expected, but only calculated based on transient arguments to class methods. (In this case the variable should be called "wasLastArgumentTo_IsAlphaNumericOnly_AlphaNumeric", which may or may not make any sense for this application...)

Hmm, not quite a 1:1 line-to-WTF ratio, but a good try nonetheless.

How about
5) Pointless object-instantiation.
(Q. Why didn't you use a static method?
A. I didn't really understand that chapter in the book.)

foxyshadis · 2005-10-07

The string of ten people in a row answering the same dumb question at the beginning was arguably almost as good as the real wtf. But nothing compares to the punchline of reading down and hitting that damn exception. Awesome.

2005-10-07

evnafets:
My guess is that originally instead of throwing the exception, it set the variable to false.

Then they either changed their mind, or someone came along afterwards to change the code.
Either way it is still a very good example of WTF?

Maybe someone didn't like the "break" keyword? Don't tell me you can't picture some clueless manager looking at the code going "What does that 'break' mean?! Are you breaking the code on purpose?! We can't have that!"

2005-10-07

Anonymous:
Looks like a good opportunity to use a regexp... All that looping, what a waste...

What do you think the Pattern and Matcher classes do? That's right: looping. So I say your statement is pretty crap. Unless you meant waste of space.

2005-10-07

And a regexp doesn't loop over all characters?

2005-10-07

Cyresse:

So, I hereby issue a challenge...

I'll make up a random required syntax...

var = { subvar1 = value1 subvar2 = value2 subvar3 = { i, i2, i3, i4 } } \n

Now, who's got the most efficient validation for my syntax?
I would be keen to see efficient, simple, non regex based solutions to this, I really would.

But yes, there are some ugly regexes out there, so just use a regex engine that allows commenting within a regex pattern.

Ehm, well, technically speaking you won't be able to parse the expression with a regular expression, since it involves nested constructions. OK, you're probably a Perl cowboy and you use (R) or whatever the shorthand for a "recursive regexp" (an oxymoron if ever I heard one) is nowadays, but I'm not sure it would help you here.

Instead, you could take a look at the LL(1) (context-free) parser system included in Boost, called Spirit. In just a few rules, you can write down a validating parser that will also build a data structure or perform actions while reading the string. See the Spirit Quick Start.
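To make the parser suggestion concrete without Spirit or a regex, here is a minimal hand-written recursive-descent validator in Java, in the non-regex spirit of Cyresse's challenge above. It is a sketch under stated assumptions: the grammar in the header comment is inferred from the two sample lines in this thread (identifier lists like { i, i2 } and nested name = value entries), and the class name BraceSyntaxValidator is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Recursive-descent validator for the brace syntax in this thread.
// The grammar is an ASSUMPTION inferred from the two sample lines:
//   document := entry+
//   entry    := IDENT '=' value
//   value    := IDENT | '{' body '}'
//   body     := <empty> | entry+ | IDENT (',' IDENT)*
//   IDENT    := one or more letters, digits or '_'
final class BraceSyntaxValidator {
    private final List<String> tokens;
    private int pos = 0;

    private BraceSyntaxValidator(List<String> tokens) { this.tokens = tokens; }

    /** Returns true if the whole input conforms to the assumed grammar. */
    static boolean isValid(String input) {
        List<String> tokens = tokenize(input);
        if (tokens == null) return false;                 // illegal character
        BraceSyntaxValidator p = new BraceSyntaxValidator(tokens);
        try {
            do { p.entry(); } while (p.peek(0) != null);  // entry+ up to end of input
            return true;
        } catch (IllegalStateException syntaxError) {
            return false;
        }
    }

    // Splits input into "{", "}", ",", "=" and identifiers; null on any other char.
    private static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if ("{},=".indexOf(c) >= 0) { out.add(String.valueOf(c)); i++; continue; }
            if (!isIdentChar(c)) return null;
            int start = i;
            while (i < s.length() && isIdentChar(s.charAt(i))) i++;
            out.add(s.substring(start, i));
        }
        return out;
    }

    private static boolean isIdentChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    private String peek(int ahead) {
        int i = pos + ahead;
        return i < tokens.size() ? tokens.get(i) : null;
    }

    private String next() {                 // consume one token or fail
        String t = peek(0);
        if (t == null) throw new IllegalStateException("unexpected end of input");
        pos++;
        return t;
    }

    private void expect(String punct) {
        if (!punct.equals(next())) throw new IllegalStateException("expected " + punct);
    }

    private void expectIdent() {
        if (!isIdentChar(next().charAt(0))) throw new IllegalStateException("expected identifier");
    }

    private void entry() {                  // entry := IDENT '=' value
        expectIdent();
        expect("=");
        value();
    }

    private void value() {                  // value := IDENT | '{' body '}'
        if ("{".equals(peek(0))) {
            next();
            body();
            expect("}");
        } else {
            expectIdent();
        }
    }

    private void body() {                   // peek two tokens to pick the branch
        if ("}".equals(peek(0))) return;                       // empty braces
        if ("=".equals(peek(1))) {                             // entry+
            while (peek(0) != null && !"}".equals(peek(0))) entry();
        } else {                                               // IDENT (',' IDENT)*
            expectIdent();
            while (",".equals(peek(0))) { next(); expectIdent(); }
        }
    }
}
```

The nesting is handled by value() and body() calling each other, which is exactly the part a plain regular expression cannot express. For simplicity body() peeks two tokens ahead instead of left-factoring the grammar into strict LL(1) form.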

2005-10-07

Somehow I have this image in my head of Eek! the Cat coding this.

Maybe it's the title ....

2005-10-07

I love the fact that it operates on a member variable and is thus totally thread-unsafe. So, if any other string you happen to be testing at the same time is not alphanumeric, then this one won't be either!

Alexis de Torquemada · 2005-10-07

Anonymous:
Ehm, well, technically speaking you won't be able to parse the expression with a regular expression, since it involves nested constructions. OK, you're probably a Perl cowboy and you use (R) or whatever the shorthand for a "recursive regexp" (an oxymoron if ever I heard one) is nowadays, but I'm not sure it would help you here.

Instead, you could take a look at the LL(1) (context-free) parser system included in Boost, called Spirit. In just a few rules, you can write down a validating parser that will also build a data structure or perform actions while reading the string. See the Spirit Quick Start.

Perl regexes are Turing complete. As for Spirit, it has the @&#?/! habit of trying to initialize all of your closure variables with boost::spirit::nil_t. WTF?

johnl · 2005-10-07

On regular expressions:

Revision: (?<file_revision>\S+) View: (?<view_name>.+) Branch Revision: (?<branch_revision>\S+).\nAuthor: (?<author_name>.) Date: (?<date_string>.) \w+\r\n(?<change_comment>.*)

(all on one line)

This is an expression I use (not by choice!) in CruiseControl to parse a StarTeam log. It still doesn't work that well, but I really don't feel like trudging through the whole thing again.

As long as a regex is along the lines of [\w.*] (IIRC, any number of word characters) then it's not too much hassle, as long as you have a syntax guide for that language to hand (since some languages have slightly different syntax for regex). It's when you want to do something really really funky that things start to go wrong.
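Java's java.util.regex has supported the same named-group syntax since Java 7, with one catch: group names may contain only ASCII letters and digits, so .NET-style names such as file_revision have to be renamed. The sketch below assumes a hypothetical, simplified one-line log format built from the first three fields of the expression above (it is not real StarTeam output) and shows how named groups plus a lazy quantifier keep such a pattern readable. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Named capture groups make a long log-parsing regex self-documenting.
// The log line format is HYPOTHETICAL, modeled on the fields named above.
final class LogLineParser {
    private static final Pattern LINE = Pattern.compile(
        "Revision: (?<fileRevision>\\S+) "          // revision token, no spaces
        + "View: (?<viewName>.+?) "                 // lazy: stops before " Branch"
        + "Branch Revision: (?<branchRevision>\\S+)");

    private LogLineParser() {}

    /** Returns {fileRevision, viewName, branchRevision}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) return null;              // the whole line must match
        return new String[] {
            m.group("fileRevision"), m.group("viewName"), m.group("branchRevision")
        };
    }
}
```

Pattern.COMMENTS (or an inline (?x)) is the other tool worth knowing here: it lets you spread a pattern over several lines with # comments, at the cost of having to escape literal spaces.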

johnl · 2005-10-07

Weird. For some reason it cut off the last part of that regex.

dubwai · 2005-10-07

Actually it's possible for this method to return false for a non-empty String. It's not thread-safe, which adds to the list of problems with this code (obviously, this may not matter.)

masklinn · 2005-10-07

Anonymous:

A Pythonic solution was presented; here's a vaguely improved version.

import re
# Raw string for the backslashes; \Z (unlike $) does not accept a trailing newline.
ALPHA_REGEX = re.compile(r'^[-\w#\. ]+\Z')
def isAlphaNumericOnly(value):
  '''
  @param value: value to check for alphanumericness
  @returns: True if value contains only word characters, '-', '#', '.' or spaces, False otherwise
  '''
  return bool(ALPHA_REGEX.match(value))

It should be noted that if the test was for true alphanumericness (with none of this -#whatever crap), the python solution would become

def isAlphaNumericOnly(value): return value.isalnum()

(nothing like a useless function call to make me feel real good)

John Smallberries · 2005-10-07

Anonymous:
Anonymous:
Looks like a good opportunity to use a regexp... All that looping, what a waste...

What do you think the Pattern and Matcher classes do? That's right: looping. So I say your statement is pretty crap. Unless you meant waste of space.

Wow, what clear thinking.
Obviously it's better to reimplement the looping logic already in the regex library. Since we're soooo clever, we'll do a much better job. And we get to maintain all that shiny new code too.

emptyset · 2005-10-07

kipthegreat:

Because regular expressions are known for being so easy to read and understand.... :)

"regexes are hard to understand."
"but, there are many perl programmers."
"a contradiction."

emptyset · 2005-10-07

Anonymous:
And a regexp doesn't loop over all characters?

it knows prior to the input which substrings it needs to find and outputs them.

Alexis de Torquemada · 2005-10-07

emptyset:
kipthegreat:

Because regular expressions are known for being so easy to read and understand.... :)

"regexes are hard to understand."
"but, there are many perl programmers."
"a contradiction."

<devils-advocate>
"Perl is a write-only language."
"Oh, I see. No contradiction after all."
</devils-advocate>

Alexis de Torquemada · 2005-10-07

emptyset:
Anonymous:
And a regexp doesn't loop over all characters?

it knows prior to the input which substrings it needs to find and outputs them.

WTF?

Alexis de Torquemada · 2005-10-07

John Smallberries:
Anonymous:
Anonymous:
Looks like a good opportunity to use a regexp... All that looping, what a waste...

What do you think the Pattern and Matcher classes do? That's right: looping. So I say your statement is pretty crap. Unless you meant waste of space.

Wow, what clear thinking.
Obviously it's better to reimplement the looping logic already in the regex library. Since we're soooo clever, we'll do a much better job. And we get to maintain all that shiny new code too.

What is a straw-man argument?

All that Kevin said is that the regex engine also loops over the input, which you haven't disputed.

Mung Kee · 2005-10-07

I have never seen anyone compliment Alex on his witty titles. Consider this one. Sometimes they're even better than the code.

Yeah, I can see how, in some scenarios, it would sting a little bit to ask if(isAlphaNumericOnly()).

2005-10-07

So this is alphanumeric????
---### ...........

Alexis de Torquemada · 2005-10-07

Anonymous:

How about
5) Pointless object-instantiation.
(Q. Why didn't you use a static method?
A. I didn't really understand that chapter in the book.)

I think the author just took James "free functions are evil" Gosling one step further and decided to use only instance methods.

OneFactor · 2005-10-07

emptyset:

kipthegreat:

Because regular expressions are known for being so easy to read and understand.... :)

"regexes are hard to understand."
"but, there are many perl programmers."
"a contradiction."

That is almost a Haiku, maybe we can change it a little:

Few can read regex
But many program in perl
a contradiction? [:O]

Though my favorite Haiku is:

Keyboard not attached
Press F1 to continue
Where's my USB? [:@]

Mung Kee · 2005-10-07

Alexis de Torquemada:
Anonymous:

How about
5) Pointless object-instantiation.
(Q. Why didn't you use a static method?
A. I didn't really understand that chapter in the book.)

I think the author just took James "free functions are evil" Gosling one step further and decided to use only instance methods.

If you strictly believe in OOP principles, you have to say that they ARE evil. They can't be inherited and they aren't polymorphic. Fortunately, most people aren't as uptight about it, but I have run into a couple over the years. They're usually opinionated a$$es in everyday life too.
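For readers wondering what "can't be inherited and aren't polymorphic" looks like in Java: a static method redeclared in a subclass hides the parent's method rather than overriding it, so the compile-time type picks the method. A minimal sketch with hypothetical class names:

```java
// Static methods are hidden, not overridden; instance methods dispatch on
// the runtime type. All class names here are hypothetical.
class BaseValidator {
    static String staticName() { return "BaseValidator.static"; }
    String instanceName() { return "BaseValidator.instance"; }
}

class StrictValidator extends BaseValidator {
    static String staticName() { return "StrictValidator.static"; }   // hides
    @Override
    String instanceName() { return "StrictValidator.instance"; }      // overrides
}

final class DispatchDemo {
    private DispatchDemo() {}

    static String[] callBoth() {
        BaseValidator v = new StrictValidator();
        return new String[] {
            v.instanceName(),   // runtime type wins: StrictValidator's version
            v.staticName()      // legal but misleading: resolved via BaseValidator
        };
    }
}
```

The second call compiles, but the declared type at the call site decides which static method runs, which is why calling statics through instance references is widely discouraged.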

OneFactor · 2005-10-07

Mung Kee:
I have never seen anyone compliment Alex on his witty titles. Consider this one. Sometimes they're even better than the code.

Yeah, I can see how, in some scenarios, it would sting a little bit to ask if(isAlphaNumericOnly()).

Yeah his titles are pretty good. Though it must be hard to top the WTF content like the for-switch paradigm.

Satanicpuppy · 2005-10-07

John Smallberries:
Ytram:
John Boysenberries:
Hey Oliver!
Didja see this? Flame on!

I... don't get it. Should I be scared? Will there be an actual fire, or is "flame" being used as a euphemism of some sort?

I would like to qualify my statement as pertaining only to looping through strings versus regex verification, and not to today's WTF as a whole. I would also like to further state that early optimization is bad.

Oliver and I were just having a discussion on this very topic (parsing/looping vs. regex), and I thought I would goad him into action. The gist is that regex usually simplifies the code so much that the potential performance penalty is well worth it.

Flame as in...well...flame.

I personally don't have much of a preference when it comes to regexes versus parsing, but I deployed this application once, with a nice elegant regex doing all the validation work for a critical value that was being used over and over... Well, the guy who was hired to maintain it decided he didn't like it, and he "improved" it, in a small but very significant way, which (eventually) resulted in a very large data snafu.

They called me in, extremely pissed off, and claimed I'd delivered a broken application. I did a line-by-line comparison, found that almost 40% of the code had been changed, and refused to take any responsibility for the problems.

It went on and on. Eventually we compromised on hiring a third party auditor, who went through the code, agreed with me, and settled the whole thing. I made dick out of it though, and it ate up an insane amount of time.

Anyway, one of the worst errors came from that damn regex "fix", so I decided to stay away from them in the future so when someone hires a mentally handicapped monkey-spawn to support it, they don't break the important bits. Either that, or I don't give them the source code, depending on the language.

2005-10-07

evnafets:
>I'm pretty sure this is Java code.
>If so, regular expressions are a pretty recent (version 5) addition to the language

Actually, the java.util.regex package was added in Java 1.4 (i.e. the version before version 5).
That still doesn't excuse the programmer.

There is still a fair amount of Java 1.3 out there, so looping instead of using a regex is definitely not a WTF in itself. The big WTF here is the use of an exception for a non-exceptional condition.

emptyset · 2005-10-07

Alexis de Torquemada:
emptyset:
Anonymous:
And a regexp doesn't loop over all characters?

it knows prior to the input which substrings it needs to find and outputs them.

WTF?

regexes avoid looping through a string by clever "caching" of the expected output before the input string is read. in fact, the input string isn't 'read' either, not character by character, because that would also be a loop. regexes avoid all this with the "caching" of all the possible substrings it's going to find. a number of random characters are sampled uniformly from a string, (some constant number of samplings based off the length of the string), and matches with the possible substrings are surmised from these sampled characters. in this way, regexes provide O(1) substring matching.

John Smallberries · 2005-10-07

Alexis de Torquemada:

All that Kevin said is that the regex engine also loops over the input, which you haven't disputed.

Oh, please, don't be obtuse.

What program doesn't loop? The poster implied that since both techniques involved looping, that doing it yourself is Ok. My response was in the vein of "don't reinvent the wheel".

2005-10-07

Anonymous:
Looks like a good opportunity to use a regexp... All that looping, what a waste...

Depends. Regex support was added in Java 1.4, and there is a fair amount of Java 1.3 out there (WebLogic 7, for instance). However, even when a regex engine is available, looping might still be better. Regex engines do have a fair amount of overhead. If this is called inside a tight loop, I'd probably roll my own too. However, if this is only called occasionally, then a regex would definitely be an improvement.

Here is my shot at a better method:

public class FieldFormat {

    // snip

    // Returns true if value is non-empty and contains only letters, digits,
    // space characters, '.', '#' or '-'. Throws on null. Java 1.3 compatible.
    public static boolean isAlphaNumericOnly(String value) {
        if (null == value) {
            throw new IllegalArgumentException("null value passed to FieldFormat.isAlphaNumericOnly");
        } else if (value.length() == 0) {
            return false;
        }
        char[] character = value.toCharArray();
        for (int i = 0; i < character.length; i++) {
            if (!(Character.isLetter(character[i])
                    || Character.isDigit(character[i])
                    || Character.isSpaceChar(character[i])
                    || character[i] == '.'
                    || character[i] == '#'
                    || character[i] == '-')) {
                return false; // the first disallowed character settles it
            }
        }
        return true;
    }
}
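On the overhead point in this comment: a good part of the avoidable cost is compiling the pattern, and that can be paid once. A minimal sketch, assuming Java 1.4 or later and a hypothetical class name, that accepts roughly the same characters as the loop version (Unicode letters, decimal digits, separators, '.', '#', '-'); it can differ on supplementary characters, since the loop tests individual chars while the regex works on code points.

```java
import java.util.regex.Pattern;

// Compile once, reuse everywhere: Pattern is immutable and thread-safe,
// and only a cheap Matcher is created per call. Class name is hypothetical.
final class FieldFormatRegex {
    // \p{L} letters, \p{Nd} decimal digits, \p{Z} separators (roughly
    // Character.isLetter / isDigit / isSpaceChar); the trailing '-' is literal.
    private static final Pattern ALLOWED =
            Pattern.compile("[\\p{L}\\p{Nd}\\p{Z}.#-]+");

    private FieldFormatRegex() {}

    static boolean isAlphaNumericOnly(String value) {
        if (value == null) {
            throw new IllegalArgumentException("null value passed to FieldFormatRegex.isAlphaNumericOnly");
        }
        // '+' needs at least one character, so "" is rejected as in the loop version.
        return ALLOWED.matcher(value).matches();
    }

    // For contrast: String.matches(regex) behaves like Pattern.matches(regex, input),
    // which compiles the regex again on every call.
    static boolean isAlphaNumericOnlyRecompiling(String value) {
        return value.matches("[\\p{L}\\p{Nd}\\p{Z}.#-]+");
    }
}
```

Whether the precompiled version is fast enough for a tight loop is something to measure rather than assume; the point is only that String.matches() hides a compile step on every call.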

JohnO · 2005-10-07

emptyset:
Alexis de Torquemada:
emptyset:
Anonymous:
And a regexp doesn't loop over all characters?

it knows prior to the input which substrings it needs to find and outputs them.

WTF?

regexes avoid looping through a string by clever "caching" of the expected output before the input string is read. in fact, the input string isn't 'read' either, not character by character, because that would also be a loop. regexes avoid all this with the "caching" of all the possible substrings it's going to find. a number of random characters are sampled uniformly from a string, (some constant number of samplings based off the length of the string), and matches with the possible substrings are surmised from these sampled characters. in this way, regexes provide O(1) substring matching.

There is no way a general purpose regex engine can provide O(1) performance for random set of inputs.

endergt · 2005-10-07

[\w.*] (IIRC, any number of word characters)

"any number of word characters" would be \w*

I don't think that expression is legal... the [] indicate a character set, and I don't think .* can be in the set.

if it were \w.* without the [] then it would mean "a single word character followed by 0 or more of any character"
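A quick way to settle this is to ask the regex engine. In java.util.regex (as in Perl and Python), '.' and '*' lose their special meaning inside [...], so [\w.*] is in fact a legal character class: it matches exactly one character that is a word character, a dot or an asterisk. The sketch below (hypothetical helper name) checks the three readings discussed:

```java
import java.util.regex.Pattern;

// Tiny helper for comparing regex readings; matches() requires the
// whole input to match. The class name is hypothetical.
final class RegexReadings {
    private RegexReadings() {}

    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }
}
```

So "any number of word characters" is \w*, and \w.* means one word character followed by anything, as stated above.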

whojoedaddy · 2005-10-07

Wow, this is the stupidest piece of code I've seen yet on tdwtf. OMG some people just abuse exceptions to death.

emptyset · 2005-10-07

JohnO:
There is no way a general purpose regex engine can provide O(1) performance for random set of inputs.

modern regexes have come a long way. they used to loop through the string and parse it, so there was no clear advantage to using them. regexes now only take constant time by looking at only a few characters in the string before returning a result.

Quinnum · 2005-10-07

emptyset:
JohnO:
There is no way a general purpose regex engine can provide O(1) performance for random set of inputs.

modern regexes have come a long way. they used to loop through the string and parse it, so there was no clear advantage to using them. regexes now only take constant time by looking at only a few characters in the string before returning a result.

Somehow that doesn't sound correct. In fact it kind of sounds like total BS.

You are going to have to check the whole string at some point to determine whether it satisfies the regex. What if the last char of a 1000-character string is the one that fails the whole expression? Then you will have checked 999 characters before getting to it, and that's more than 'a few characters in the string'.
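The argument can be checked directly. The sketch below (hypothetical names) builds two 1000-character strings that differ only in their last character; since one is alphanumeric and the other is not, any correct validator, loop or regex, has to read that last character, so the work grows with the input length.

```java
// Instrumented loop validator that reports how many characters it read
// before the answer was known. Class and method names are hypothetical.
final class ReadCounter {
    private ReadCounter() {}

    /** Characters read until the first non-alphanumeric char, or the whole length. */
    static int charsReadToDecide(String s) {
        int reads = 0;
        for (int i = 0; i < s.length(); i++) {
            reads++;
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                break;                      // first bad character settles it
            }
        }
        return reads;
    }

    static boolean isAlphanumeric(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) return false;
        }
        return true;
    }

    /** Builds n-1 'a' characters followed by the given last character. */
    static String withLast(int n, char last) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n - 1; i++) sb.append('a');
        return sb.append(last).toString();
    }
}
```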

brazzy · 2005-10-08

Satanicpuppy:

I personally don't have much of a preference when it comes to regexes versus parsing, but I deployed this application once, with a nice elegant regex doing all the validation work for a critical value that was being used over and over... Well, the guy who was hired to maintain it decided he didn't like it, and he "improved" it, in a small but very significant way, which (eventually) resulted in a very large data snafu.

You know, if that was all it might really be considered partially your fault. The problem is that regexps are as close to being write-only as anything. They're extremely flexible and extremely terse, a combination that spells disaster for maintainability unless there are very, very detailed comments. You say the regexp was "elegant" - I say that's impossible because IMO the most important property of elegant code is that it's easy to understand, and regexps of any complexity are NEVER easy to understand.

I'd say that the best way to parse data is to design its structure to be simple so that you don't NEED any complex parsing logic.

2005-10-08

Don't use a regex, an FSM, or whatever. Use an existing method in a quality Java library such as Apache Commons. End of story.

As for how they did it in StringUtils.java:

    public static boolean isAlpha(String str) {
        if (str == null) {
            return false;
        }
        int sz = str.length();
        for (int i = 0; i < sz; i++) {
            if (Character.isLetter(str.charAt(i)) == false) {
                return false;
            }
        }
        return true;
    }

2005-10-08

and the alphanumeric version (above was alpha only), again from Apache Commons StringUtils:

    public static boolean isAlphanumeric(String str) {
        if (str == null) {
            return false;
        }
        int sz = str.length();
        for (int i = 0; i < sz; i++) {
            if (Character.isLetterOrDigit(str.charAt(i)) == false) {
                return false;
            }
        }
        return true;
    }
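One behavioral detail of the code quoted above: because the loop body never runs for an empty string, isAlphanumeric("") returns true, whereas the FieldFormat method posted earlier in this thread returns false for "". Newer Commons Lang releases may differ, so check the javadoc for the version you use. A minimal sketch using a local copy of the loop (hypothetical class name, not the Commons Lang class itself) plus a variant that rejects the empty string:

```java
// Local copy of the StringUtils-style loop, plus a non-empty variant.
// The class name is hypothetical.
final class AlnumCheck {
    private AlnumCheck() {}

    static boolean isAlphanumeric(String str) {
        if (str == null) {
            return false;
        }
        int sz = str.length();
        for (int i = 0; i < sz; i++) {
            if (!Character.isLetterOrDigit(str.charAt(i))) {
                return false;
            }
        }
        return true; // also reached for "", since the loop never runs
    }

    // Treats "" as invalid, matching the FieldFormat behavior.
    static boolean isNonEmptyAlphanumeric(String str) {
        return str != null && str.length() > 0 && isAlphanumeric(str);
    }
}
```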

2005-10-08

I've never seen a site with worse post-formatting rules... anyway.

Alexis de Torquemada · 2005-10-08

Satanicpuppy:
Anyway, one of the worst errors came from that damn regex "fix", so I decided to stay away from them in the future so when someone hires a mentally handicapped monkey-spawn to support it, they don't break the important bits.

You seem to be a very patient person. If said monkey-spawn would break the important bits of my code, I would surely break his important bits.

Alexis de Torquemada · 2005-10-08

emptyset:
regexes avoid looping through a string by clever "caching" of the expected output before the input string is read. in fact, the input string isn't 'read' either, not character by character, because that would also be a loop. regexes avoid all this with the "caching" of all the possible substrings it's going to find. a number of random characters are sampled uniformly from a string, (some constant number of samplings based off the length of the string), and matches with the possible substrings are surmised from these sampled characters. in this way, regexes provide O(1) substring matching.

A small set of regexes can process any input in constant time, e.g. "^foo" or "bar$", where ^ and $ are the start and the end of the string, respectively (this assumes that the string length is known). Most regexes have to look at most of the string (that is, O(n) characters), though. In fact, it is easy to prove that a regex that behaves like the posted function cannot safely accept a string before it has examined all of its characters, which makes it O(n) by necessity.

By the way: "some constant number of samplings based off the length of the string" sounds like an oxymoron to me. What I think you tried to express is that if I am searching all occurrences of "Abba" in an input string, and the fourth character in the string is "y", then I don't even need to look at characters 1 through 3. This solves an entirely different problem, though, and even then it doesn't change the time complexity, only the proportionality constant.
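The skipping trick described here is essentially the Boyer-Moore idea; its simplest variant, Boyer-Moore-Horspool, fits in a few lines. A sketch in Java (hypothetical class name) that finds all occurrences of a pattern: each window is compared right to left, and afterwards it shifts by an amount looked up from the text character under the window's last position. Searching for "Abba" in a text whose fourth character is 'y', the first window fails on that 'y' and jumps four positions, so characters 1 through 3 are never read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Boyer-Moore-Horspool search (a simplified Boyer-Moore). The class name
// is hypothetical; the algorithm is the standard Horspool variant.
final class Horspool {
    private Horspool() {}

    /** Returns the start index of every (possibly overlapping) occurrence. */
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int m = pattern.length(), n = text.length();
        if (m == 0 || m > n) return hits;

        // shift[c] = distance from the last occurrence of c in pattern[0..m-2]
        // to the end of the pattern; characters not in that prefix shift by m.
        Map<Character, Integer> shift = new HashMap<>();
        for (int i = 0; i < m - 1; i++) shift.put(pattern.charAt(i), m - 1 - i);

        int pos = 0;
        while (pos <= n - m) {
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) j--;
            if (j < 0) hits.add(pos);                       // full match at pos
            pos += shift.getOrDefault(text.charAt(pos + m - 1), m);
        }
        return hits;
    }
}
```

As the comment says, this lowers the constant factor in typical cases; the worst case is still proportional to text length times pattern length, and since no shift exceeds the pattern length m, it still reads at least about n/m characters.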

Richard Nixon · 2005-10-08

Alexis de Torquemada:
Satanicpuppy:
Anyway, one of the worst errors came from that damn regex "fix", so I decided to stay away from them in the future so when someone hires a mentally handicapped monkey-spawn to support it, they don't break the important bits.

You seem to be a very patient person. If said monkey-spawn would break the important bits of my code, I would surely break his important bits.

Sure you would. You resort to physical violence often then? When was the last time you punched someone in the mouth? When was the last time you pulled a knife on someone?

You sound quite sad and pathetic when you talk on the Internet as if you're some big tough guy who hurts people; and no one is buying it either. You're a limp-wristed sissy.

2005-10-08

Alexis de Torquemada:

Perl regexes are Turing complete. As for Spirit, it has the @&#?/! habit of trying to initialize all of your closure variables with boost::spirit::nil_t. WTF?

The definition of "regular expression" is and always has been a grammar for a regular language, which can be accepted by an FSM. And Perl has some, eh, interesting extensions to the concept, but that doesn't make them regular.

But I doubt Perl "regexps" are Turing complete, if that means "accept any language a turing machine can accept". I mean, are there goto-labels and look-ahead/move-back operations in Perl's matcher???

In that case, I would love to see your Perl regexp for valid C (type checking and everything). Or, for starters, I would like to see a Perl regexp for checking multiplications, so that "21*3=63" is accepted, but "21*3=36" isn't. And that for all expressions, of course. I'm really curious whether that could be done.

If you can manage, I'm sure that will be a very, very big WTF in itself.

Alexis de Torquemada · 2005-10-08

Mung Kee:
If you strictly believe in OOP principles, you have to say that they ARE evil. They can't be inherited and they aren't polymorphic.

Depends. Not in "normal" C++, but an extension to C++ I co-developed allowed polymorphic behavior for free functions, and, more notably, LISP even allows free functions that are polymorphic in more than one parameter ("multimethods").
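Java itself has no multimethods, but the effect of choosing a method by the runtime types of two arguments can be emulated with double dispatch (two chained virtual calls), which is the usual workaround rather than a language feature. A minimal sketch with hypothetical type names:

```java
// Double dispatch: the first virtual call resolves the receiver's type,
// the second resolves the argument's type. All type names are hypothetical.
interface Shape {
    String collideWith(Shape other);          // dispatch 1: on the receiver
    String collideWithCircle(Circle first);   // dispatch 2: on the argument
    String collideWithSquare(Square first);
}

final class Circle implements Shape {
    public String collideWith(Shape other) { return other.collideWithCircle(this); }
    public String collideWithCircle(Circle first) { return "circle/circle"; }
    public String collideWithSquare(Square first) { return "square/circle"; }
}

final class Square implements Shape {
    public String collideWith(Shape other) { return other.collideWithSquare(this); }
    public String collideWithCircle(Circle first) { return "circle/square"; }
    public String collideWithSquare(Square first) { return "square/square"; }
}

final class Collisions {
    private Collisions() {}

    /** Result depends on the runtime types of both a and b. */
    static String collide(Shape a, Shape b) {
        return a.collideWith(b);
    }
}
```

The cost is visible: every new Shape type needs a method in the interface and an implementation in every existing class, which is exactly the bookkeeping that CLOS-style multimethods avoid.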

It Never Hurts To Ask
