Admin
Incidentally, in the sample data, it could occur in any sort of mixture from
var1 = value1
to
var1 = { subvar1 = { subvar1a = { i, i1, i2 } subvar1b = { } } subvar2 = 5 }
This is a syntax I've had to parse before, and I had a choice between a parsing library, a bunch of string methods, or a regex. I tried all three.
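For anyone curious what the hand-written route looks like, here is a minimal recursive-descent sketch in Java. The grammar is my guess from the two samples above (names, `=`, braces, optional commas), and the class name `NestedConfigParser` is made up for illustration. Bare list items such as `i1` are stored with a null value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the sample syntax; the grammar is an assumption.
final class NestedConfigParser {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private NestedConfigParser(String input) {
        // Tokenize: '{', '}' and '=' are tokens; commas and whitespace only separate words.
        StringBuilder word = new StringBuilder();
        for (char c : input.toCharArray()) {
            boolean punct = c == '{' || c == '}' || c == '=' || c == ',';
            if (punct || Character.isWhitespace(c)) {
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (punct && c != ',') {
                    tokens.add(String.valueOf(c));
                }
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
    }

    static Map<String, Object> parse(String input) {
        NestedConfigParser p = new NestedConfigParser(input);
        Map<String, Object> result = p.parseEntries();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unbalanced '}'");
        }
        return result;
    }

    // entries := ( name [ '=' value ] )*   (stops at '}' or end of input)
    private Map<String, Object> parseEntries() {
        Map<String, Object> entries = new LinkedHashMap<>();
        while (pos < tokens.size() && !tokens.get(pos).equals("}")) {
            String name = tokens.get(pos++);
            if (name.equals("{") || name.equals("=")) {
                throw new IllegalArgumentException("Expected a name, found " + name);
            }
            Object value = null; // bare items have no value
            if (pos < tokens.size() && tokens.get(pos).equals("=")) {
                pos++;
                value = parseValue();
            }
            entries.put(name, value);
        }
        return entries;
    }

    // value := '{' entries '}' | word
    private Object parseValue() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Missing value");
        }
        String tok = tokens.get(pos++);
        if (tok.equals("}") || tok.equals("=")) {
            throw new IllegalArgumentException("Expected a value, found " + tok);
        }
        if (!tok.equals("{")) {
            return tok; // plain word such as value1 or 5
        }
        Map<String, Object> block = parseEntries();
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Missing '}'");
        }
        pos++; // consume the closing '}'
        return block;
    }
}
```

Calling `NestedConfigParser.parse("var1 = value1")` gives a one-entry map; the nested sample gives maps inside maps, in insertion order.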
Admin
I think this is called "test-driven development" :p
Admin
lol!
Admin
Admin
Y'know, if the isAlphaNumeric member is actually referenced in the clipped-out code, or if there is an accessor for it, this is slightly less of a WTF. Note that isAlphaNumeric properly records the status of the last validation performed (except in the null string case; this is just a bug). Setting it in the constructor is still goofy, as is raising an exception when it's false. But one can at least imagine what the member might be for. OK, the name is bad; should be "wasAlphaNumeric". Also, why would you want the object to remember this?
Never mind. WTF.
Admin
How about
5) Pointless object-instantiation.
(Q. Why didn't you use a static method?
A. I didn't really understand that chapter in the book.)
Admin
The string of ten people in a row answering the same dumb question at the beginning was arguably almost as good as the real wtf. But nothing compares to the punchline of reading down and hitting that damn exception. Awesome.
Admin
Maybe someone didn't like the "break" keyword? Don't tell me you can't picture some clueless manager looking at the code going "What does that 'break' mean?! Are you breaking the code on purpose?! We can't have that!"
Admin
What do you think the Pattern and Matcher classes do? That's right: looping. So I say your statement is pretty crap. Unless you meant waste of space.
Admin
And a regexp doesn't loop over all characters?
Admin
Instead, you could take a look at the LL(1) (context-free) parser system included in Boost, called Spirit. In just a few rules, you can write down a validating parser that will also build a data structure and perform actions while reading the string. See Spirit Quick start.
Admin
Somehow I have this image in my head of Eek the cat coding this.
Maybe it's the title ....
Admin
I love the fact that it operates on a member variable and is thus totally thread-unsafe. So, if any other string you happen to be testing at the same time is not alphanumeric, then this one won't be either!
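To make the contrast concrete, here is a sketch of a stateless version. The original class is not quoted in this thread, so the names (`AlphaNumericCheck`, `isAlphaNumeric`) and the choice to treat null and empty strings as not alphanumeric are assumptions. The result lives in the return value only, so concurrent callers cannot overwrite each other, and nothing is thrown for an ordinary "no".

```java
// Hypothetical replacement; names and the null/empty policy are assumptions.
final class AlphaNumericCheck {
    private AlphaNumericCheck() {
        // no instances needed: the check keeps no state
    }

    // Returns true only if s is non-empty and every character is a letter or digit.
    static boolean isAlphaNumeric(String s) {
        if (s == null || s.length() == 0) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                return false; // answer known, stop looking
            }
        }
        return true;
    }
}
```

Usage is just `if (AlphaNumericCheck.isAlphaNumeric(input)) { ... }`, with no object to instantiate and no exception to catch.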
Admin
Perl regexes are Turing complete. As for Spirit, it has the @&#?/! habit of trying to initialize all of your closure variables with boost::spirit::nil_t. WTF?
Admin
On regular expressions:
Revision: (?<file_revision>\S+) View: (?<view_name>.+) Branch Revision: (?<branch_revision>\S+).\nAuthor: (?<author_name>.) Date: (?<date_string>.) \w+\r\n(?<change_comment>.*)
(all on one line)
This is an expression I use (not by choice!) in CruiseControl to parse a Starteam log. Still doesn't work that effectively, but I really don't feel like trudging through the whole thing again.
As long as a regex is along the lines of [\w.*] (IIRC, any number of word characters) then it's not too much hassle, as long as you have a syntax guide for that language to hand (since some languages have slightly different syntax for regex). It's when you want to do something really really funky that things start to go wrong.
Admin
Weird. For some reason it cut off the last part of that regex.
Admin
Actually it's possible for this method to return false for a non-empty String. It's not thread-safe, which adds to the list of problems with this code (obviously, this may not matter.)
Admin
It should be noted that if the test was for true alphanumericness (with none of this -#whatever crap), the python solution would become
(nothing like a useless function call to make me feel real good)
Admin
Wow, what clear thinking.
Obviously it's better to reimplement the looping logic already in the regex library. Since we're soooo clever, we'll do a much better job. And we get to maintain all that shiny new code too.
Admin
"regexes are hard to understand."
"but, there are many perl programmers."
"a contradiction."
Admin
it knows prior to the input which substrings it needs to find and outputs them.
Admin
<devils-advocate>
"Perl is a write-only language."
"Oh, I see. No contradiction after all."
</devils-advocate>
Admin
Sorry for not using the preview.
<devils-advocate>
"Perl is a write-only language."
"Oh, I see. No contradiction after all."
</devils-advocate>
Admin
WTF?
Admin
What is a straw-man argument?
All that Kevin said is that the regex engine also loops over the input, which you haven't disputed.
Admin
I have never seen anyone compliment Alex on his witty titles. Consider this one. Sometimes they're even better than the code.
Yeah, I can see how, in some scenarios, it would sting a little bit to ask if(isAlphaNumericOnly()).
Admin
So this is alphanumeric????
---### ...........
Admin
I think the author just took James "free functions are evil" Gosling one step further and decided to use only instance methods.
Admin
That is almost a Haiku, maybe we can change it a little:
Few can read regex
But many program in perl
a contradiction? [:O]
Though my favorite Haiku is:
Keyboard not attached
Press F1 to continue
Where's my USB? [:@]
Admin
If you strictly believe in OOP principles, you have to say that they ARE evil. They can't be inherited and they aren't polymorphic. Fortunately, most people aren't as uptight about it, but I have run into a couple over the years. They're usually opinionated a$$es in everyday life too.
Admin
Yeah his titles are pretty good. Though it must be hard to top the WTF content like the for-switch paradigm.
Admin
I personally don't have much of a preference when it comes to regexes over parsing, but I deployed this application once, with a nice elegant regex doing all the validation work for a critical value that was being used over and over... Well, the guy who was hired to maintain it decided he didn't like it, and he "improved" it, in a small but very significant way, which (eventually) resulted in a very large data snafu.
They called me in extremely pissed off, and claimed I'd delivered a broken application. I did a line-by-line comparison and found that almost 40% of the code had been changed, and refused to take any responsibility for the problems.
It went on and on. Eventually we compromised on hiring a third party auditor, who went through the code, agreed with me, and settled the whole thing. I made dick out of it though, and it ate up an insane amount of time.
Anyway, one of the worst errors came from that damn regex "fix", so I decided to stay away from them in the future so when someone hires a mentally handicapped monkey-spawn to support it, they don't break the important bits. Either that, or I don't give them the source code, depending on the language.
Admin
There still is a fair amount of Java1.3 out there, so the concept of looping instead of using a regex is definitely not a WTF. The big WTF here is the use of an exception for a non-exceptional condition.
Admin
regexes avoid looping through a string by clever "caching" of the expected output before the input string is read. in fact, the input string isn't 'read' either, not character by character, because that would also be a loop. regexes avoid all this with the "caching" of all the possible substrings it's going to find. a number of random characters are sampled uniformly from a string, (some constant number of samplings based off the length of the string), and matches with the possible substrings are surmised from these sampled characters. in this way, regexes provide O(1) substring matching.
Admin
Oh, please, don't be obtuse.
What program doesn't loop? The poster implied that since both techniques involved looping, that doing it yourself is Ok. My response was in the vein of "don't reinvent the wheel".
Admin
Depends. Regex support was added in Java 1.4, and there is a fair amount of Java 1.3 out there (WebLogic 7 for instance). However, assuming that a regex engine is available, the looping might still be better. Regex engines do have a fair amount of overhead. If this is called inside a tight loop, I'd probably roll my own too. However, if this is only called occasionally, then a regex would definitely be an improvement.
Here is my shot at a better method:
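The poster's own listing did not survive in this copy, so what follows is a separate sketch of the overhead point, not their method. Assuming Java 1.4 or later, a good share of the per-call regex cost is compiling the pattern, and that can be done once. The class name `CompiledAlnum` and the ASCII-only character class are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the ASCII-only character class is an assumption.
final class CompiledAlnum {
    // Compiled once when the class loads, then reused by every call.
    private static final Pattern ALNUM = Pattern.compile("[A-Za-z0-9]+");

    private CompiledAlnum() {
    }

    static boolean isAlphaNumeric(String s) {
        // matches() requires the whole input to fit the pattern, not just a prefix.
        return s != null && ALNUM.matcher(s).matches();
    }
}
```

By contrast, `s.matches("[A-Za-z0-9]+")` recompiles the pattern on every call, which is the part that hurts inside a tight loop.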
Admin
There is no way a general purpose regex engine can provide O(1) performance for random set of inputs.
Admin
"any number of word characters" would be \w*
I don't think that expression is legal... the [] indicate a character set, and I don't think .* can be in the set.
if it were \w.* without the [] then it would mean "a single word character followed by 0 or more of any character"
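A quick way to check the three readings, sketched in Java (the class and constant names are made up for the demo). In java.util.regex at least, `[\w.*]` does compile: inside a character class the `.` and `*` lose their special meaning, so the class matches exactly one character that is a word character, a dot, or an asterisk. Other regex flavors may differ in details.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the three patterns discussed above.
final class WordRegexDemo {
    // Zero or more word characters.
    static final Pattern ANY_WORD_CHARS = Pattern.compile("\\w*");
    // Exactly one character: a word character, a literal '.', or a literal '*'.
    static final Pattern CLASS_OF_ONE = Pattern.compile("[\\w.*]");
    // One word character followed by zero or more of any character (except line breaks).
    static final Pattern WORD_THEN_ANYTHING = Pattern.compile("\\w.*");

    private WordRegexDemo() {
    }

    // True only if the whole string matches the pattern.
    static boolean fullMatch(Pattern p, String s) {
        return p.matcher(s).matches();
    }
}
```

So `fullMatch(ANY_WORD_CHARS, "abc_123")` is true, `fullMatch(CLASS_OF_ONE, "ab")` is false because the class consumes a single character, and `fullMatch(WORD_THEN_ANYTHING, "a!!")` is true.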
Admin
Wow, this is the stupidest piece of code I've seen yet on tdwtf. OMG some people just abuse exceptions to death.
Admin
<FONT face="Courier New" size=2>modern regexes have come a long way. they used to loop through the string and parse it, so there was no clear advantage to using them. regexes now only take constant time by looking at only a few characters in the string before returning a result.</FONT>
Admin
Somehow that doesn't sound correct. In fact it kind of sounds like total bs.
You are going to have to check the whole string at some point to determine if it satisfies the regex or not. Who's to say that last char of a 1000 character string is going to fail the whole expression?? Hence you will have ended up checking 999 before getting to that point, and that's more than 'a few characters in the string'.
Admin
You know, if that was all it might really be considered partially your fault. The problem is that regexps are as close to being write-only as anything. They're extremely flexible and extremely terse, a combination that spells disaster for maintainability unless there are very, very detailed comments. You say the regexp was "elegant" - I say that's impossible because IMO the most important property of elegant code is that it's easy to understand, and regexps of any complexity are NEVER easy to understand.
I'd say that the best way to parse data is to design its structure to be simple so that you don't NEED any complex parsing logic.
Admin
Don't use a regex, a fsm, or whatever. Use an existing method in a quality java library such as Apache Commons. End of Story.
As for how they did it in StringUtils.java:
    public static boolean isAlpha(String str) {
        if (str == null) {
            return false;
        }
        int sz = str.length();
        for (int i = 0; i < sz; i++) {
            if (Character.isLetter(str.charAt(i)) == false) {
                return false;
            }
        }
        return true;
    }
Admin
and the alphanumeric version (above was alpha only), again from Apache Commons StringUtils:
Admin
I've never seen a site with worse posts formatting rules...anyway.
Admin
You seem to be a very patient person. If said monkey-spawn would break the important bits of my code, I would surely break his important bits.
Admin
A small set of regexes can process any input in constant time, e.g. "^foo" or "bar$", where ^ and $ are the start and the end of the string, respectively (this assumes that the string length is known). Most regexes have to look at most of the string (that is, O(n) characters), though. In fact, it is easy to prove that a regex that behaves like the posted function cannot safely accept a string before it has examined all of its characters, which makes it O(n) by necessity.
By the way: "some constant number of samplings based off the length of the string" sounds like an oxymoron to me. What I think you tried to express is that if I am searching all occurrences of "Abba" in an input string, and the fourth character in the string is "y", then I don't even need to look at characters 1 through 3. This solves an entirely different problem, though, and even then it doesn't change the time complexity, only the proportionality constant.
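The O(n) argument is easy to see by counting. This is a sketch with a made-up helper, `ScanCost.charsInspected`, that reports how many characters a left-to-right alphanumeric check has to look at before the verdict is known. Rejection can come early, but acceptance always costs the full length.

```java
// Hypothetical counter for illustration; not taken from the article's code.
final class ScanCost {
    private ScanCost() {
    }

    // Number of characters examined before we know whether s is all letters/digits.
    static int charsInspected(String s) {
        int inspected = 0;
        for (int i = 0; i < s.length(); i++) {
            inspected++;
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                break; // verdict is "no", nothing after this can change it
            }
        }
        return inspected;
    }
}
```

A 1000-character string whose only bad character is the last one costs 1000 inspections; move the bad character to the front and it costs 1. No sampling scheme can accept the first string's all-good cousin without reading every character.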
Admin
Sure you would. You resort to physical violence often then? When was the last time you punched someone in the mouth? When was the last time you pulled a knife on someone?
You sound quite sad and pathetic when you talk on the Internet as if you're some big tough guy who hurts people; and no one is buying it either. You're a limp-wristed sissy.
Admin
The definition of "regular expression" is and always has been a grammar for a regular language, which can be accepted by an FSM. And Perl has some, eh, interesting extensions to the concept, but that doesn't make them regular.
But I doubt Perl "regexps" are Turing complete, if that means "accept any language a Turing machine can accept". I mean, are there goto-labels and look-ahead/move-back operations in Perl's matcher???
In that case, I would love to see your Perl regexp for a valid C (typechecking and everything). Or, for starters, I would like to see a Perl regexp for checking multiplications, so that "21*3=63" is accepted, but "21*3=36" isn't. And that for all expressions, of course. I'm really curious if that could be done.
If you can manage, I'm sure that will be a very, very big WTF in itself.
Admin
Depends. Not in "normal" C++, but an extension to C++ I co-developed allowed polymorphic behavior for free functions, and, more notably, LISP even allows free functions that are polymorphic in more than one parameter ("multimethods").