Admin
Do you really think that having a degree means you will be a good programmer? Is this borne out by your experience?
Admin
You are quite right, this is significantly better than quadratic complexity. Since the number of spaces is reduced by 1/2 each iteration, this function is O(Log n).
Admin
If simple regex doesn't look clear to you, then you don't use regex enough. If you don't use regex enough, you are missing out. What about stripping html tags or finding a date in a string? I'd love to see a loop (or more likely many nested loops) that was nearly as clean as regex for such tasks.
Admin
This is super easy in COBOL:
MOVE STRING_TO_BE_CLEANED TO STRING_WITH_EXTRAS_REMOVED REPLACING EXTRA SPACES WITH " ".
As long as none of your strings ever go over 80 characters, you're golden!
Admin
Except you're only counting iterations; it's also copying the string each call, which is O(n). That gives O(n log n) total. (Really I'd say it's O(n log s), where 's' is the longest sequence of spaces in the string. That provides enough extra information that I think it's useful to make the distinction.)
But if we want to compare to the regex solution, this also doesn't take into account the efficiency of whatever the regex library is doing. There's some effort it's putting into compiling the regex, etc., and matching may also be less efficient. Basically, if you want to argue efficiency, I can construct strings that will be faster with each method, so without knowing the characteristics of the strings in question, it's hard to know which would be faster.
Anyway, all of this is missing the point. I'm one of the earlier people to pull out the "now you have two problems" quote, but even I think that if you have problems reading something like "[ ][ ]+" or " {2,}" or whatever, and prefer the loop based solution that spurred this sequence of comments, then... I strongly disagree.
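For anyone who wants to poke at the O(n log s) argument, here is a minimal sketch in Python. The thread never establishes what language the original snippet was in, so both function names are hypothetical stand-ins: one for the replace-until-clean loop, one for the " {2,}" regex.

```python
import re


def collapse_with_loop(text):
    """Replace double spaces until none remain; return (result, passes)."""
    passes = 0
    while "  " in text:                 # indexOf-style scan, O(n) per pass
        text = text.replace("  ", " ")  # builds a new string, O(n) per pass
        passes += 1
    return text, passes


def collapse_with_regex(text):
    """Collapse every run of two or more spaces in a single call."""
    return re.sub(r" {2,}", " ", text)


sample = "a" + " " * 24 + "b"
result, passes = collapse_with_loop(sample)
print(repr(result), passes)
print(repr(collapse_with_regex(sample)))
```

Each pass roughly halves the longest run (24, 12, 6, 3, 2, 1), so the number of passes tracks log s while each pass still touches the whole string.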
Admin
amischiefr: "Why almost? I think they should. You can't practice Law without a degree <snip/>. Why not institute a similar practice for programming?"
Brian: "Do you really think that having a degree means you will be a good programmer?"
Me: "No, he doesn't. Try again. You are allowed to move your lips and make 'choo choo' noises if it helps this time."
amischiefr: "Yes I know, there are some talented guys out there that don't have a degree, yada yada, some exceptions to the rule, yada yada. <snip/> And we shouldn't let just anybody write code."
Brian: "Is this borne out by your experience?"
Me: "Have you ever been experienced?"
I think amischiefr's point is not that a CompSci degree (or equivalent) sprinkles you with magic pixie dust and "means" that you will be a good programmer. That would be silly. I went to college with a girl who majored in psychology, and when I asked her why, she said "I get to work with rats a lot. I'm not comfortable with people. I prefer rats."
She now has a well-paid job in the City of London. Go figure.
However, as I think amischiefr would agree, you at least have some sort of comparator when you're choosing between CompSci (etc) graduates. 20%, you fail. Fire the useless bastards. 80%, you more or less succeed.
Basement-diving for PHP trolls tends to reverse the numbers; which is not good for anybody.
If I can make an honest plea here, the key is to make the qualification a permit rather than a restriction. I've dealt with enough lawyers to realise that their qualifications essentially give them a pass into the Magic Circle. This may also apply to other, longstanding, professions.
It doesn't necessarily have to apply to a "Computer Science" profession. All such a profession would need to do is to require a minimum bar for entry, and the (TDWTF-sponsored, no doubt) derision of abject failures who are unsuited to the job.
Admin
Most people don't comprehend math. Even fewer use regular expressions.
Admin
Well, you know what they say: if you outlaw programming, only outlaws will program. Think about it.
Admin
I think it's O(n * log n), but not because of the string copy. The n comes from indexOf().
A regex for simple string replacement, on the other hand, is probably closer to O(n + C), where C is regex overhead, or just O(n).
Of course, this is probably being run on relatively short strings, so the value of C might actually be important.
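As a point of comparison for the O(n) figure, a hand-written single pass is also possible. This is only a sketch in Python with a hypothetical function name, not what any regex library does internally:

```python
def collapse_single_pass(text):
    """Drop every space that directly follows another space, in one scan."""
    out = []
    previous_was_space = False
    for ch in text:
        if ch == " " and previous_was_space:
            continue  # skip the extra space in a run
        out.append(ch)
        previous_was_space = (ch == " ")
    return "".join(out)


print(repr(collapse_single_pass("a   b  c d")))
```

Every character is read once and written at most once, so there is no repeated rescanning from the start of the string.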
Admin
Also, I kind of like "[ ]+" better than " +", if only for clarity.
Admin
You know what they say: if you outlaw outlaws, only outlaws will outlaw outlaws outlawing outlaws.
Left recursion, please.
Admin
When you write a regex, you're programming a finite state machine. You should have a pretty good idea of how it will work. Read 'Mastering Regular Expressions' - it's not just for Perl wranglers. It certainly won't go back to the start of the string for each replace, unlike your indexOf().
Sorry, you may consider your code clearer (I don't). It's definitely slower. No 'may not' about it.
The regex examples that almost declaratively say "replace every instance of two-or-more-contiguous-spaces with a single space" seem best to me.
Admin
Make it
... and you're golden.
Admin
[quote user="luis.espinal"]Goto? Dude, what the...? Of all the things considering that DO/WHILE loops have existed since BASIC dialects evolved out of the GW-BASIC/PICK-BASIC/TARD-BASIC decades ago? Goto? Worst fix ever. I hope that was a joke.
As for the code snippet featured in the article, it is not surprising at all. It happens a lot, not only on VB, but on Java and C#. It almost makes me want to see programming licenses being mandatory to perform any programming job, or rigorous across-the-board examinations mandatory for graduating with a CS or MIS degree... almost.
These types of WTFs are not just due to not knowing the language, but they display a fundamental flaw in the way of thinking and problem-solving, completely inexcusable.
-- second try --[/quote]
You must be new here.
Admin
The staggering amount of ignorance in this rant is quite refreshing. I am now really considering creating a new WTF site with ignorant rants of this quality. You sir, could become the next WAFIRAh of the day (what a f* ignorant ranting ah).
Admin
Thank you! Regex is like wiping your butt with a belt sander! Right, I don't like it! The only folks that use this are the guys who think the more complex it looks, the cooler they are!
Admin
No kidding, Alex should hire new commenters.
At least ones that know acsi from unicode.
Admin
[quote] In that case: regexReplace(itemDesc, " +", " "); [/quote]
Won't this find only one group of consecutive spaces rather than up to 24 individual spaces like the original?
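Whether the quoted regexReplace replaces every match or only the first isn't stated anywhere in the thread, so the answer depends on the library. As an illustration only, here is how the same pattern behaves with Python's `re.sub`, which replaces all non-overlapping matches unless you pass a `count` (the sample string is made up):

```python
import re

item_desc = "Widget   deluxe  with      extra   spaces"

# Default: every run of one or more spaces is replaced.
cleaned_all = re.sub(r" +", " ", item_desc)

# With count=1, only the first run is replaced, which is the
# behaviour the question above is worried about.
cleaned_first_only = re.sub(r" +", " ", item_desc, count=1)

print(repr(cleaned_all))
print(repr(cleaned_first_only))
```

So with a global replace, a single call handles every group of spaces no matter how long each group is.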
Admin
Result:
O(n), unbeatable by all your sissi-languages ;-)
Admin
Well, change is change. Right?
Admin
Ignorance. I do not think that word means what you think it means.
Rant. I do not think that word means what you think it means.
It's a point of view, you cretin.
It's possible to disagree, or to agree, with a point of view. (I suppose it's also possible to take a point of view to court on the basis of slander or libel.)
It is, however, personally demeaning to attack it on an ad hominem basis. It's also slightly sad that you can't spell out "fucking ignorant ranting asshole" without self-censorship. I'll give you props. Everybody should defend their own family.
Palin, much?
Admin
Is this one of those cases where the solution is to type 'regex' and then run a rolling pin across the keyboard?
Admin
Because no-one died or lost his liberty or even a significant amount of money from writing this code. Remember that Don's boss would have to get all the way up to board level before anyone outside the company can sack him. Why should Don be any less secure?
Unless you really meant to say that requiring programmers to have a degree (which may or may not have been bought over the Internet) before they can be employed as a programmer would benefit society.
Admin
Actually, a push-down automaton.
Admin
Of course, what they don't say is that the second problem you have when you think "I know! I'll use a regular expression!" is a contest to see how high people can piss up a wall.
Admin
I suppose that depends on exactly how stringReplace works... I'm actually having a hard time even establishing what language the code snippet is in. Regardless, stringReplace returns the replaced string; this suggests to me that it copies the string rather than updating it in place.
That said, either a copy within stringReplace or indexOf will give you the O(n) per-iteration.
(The fact that O(log n) is sub-linear is an indication that it can't be the right complexity, at least in general. If you allow characters in the string to vary independently of one another, then if your algorithm skips any two-character substring of the input, the same input with those two characters changed to spaces would produce the same result. Thus you need to read at least every other character, and O(log n) time isn't enough to do that.)
This certainly isn't true of PCREs in general, which are more powerful than regular expressions and hence more powerful than FSMs. (Backreferences alone get you a class of languages that's strictly more powerful than what FSMs can accept, and in fact contains languages that aren't even context-free. Hence Joel's "Actually, a push-down automaton" also can't describe what PCRE libraries do in general.)
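To make the backreference point concrete, here is a small Python sketch (the names are hypothetical). The pattern accepts exactly the strings made of some non-empty chunk written twice in a row, a "ww" language that is a standard example of something neither regular nor context-free:

```python
import re

# (.+) captures a non-empty chunk; \1 demands the same chunk again.
DOUBLED = re.compile(r"(.+)\1")


def is_doubled(s):
    """True if s consists of some non-empty string repeated exactly twice."""
    return DOUBLED.fullmatch(s) is not None


for candidate in ["abcabc", "abcabd", "aa", "a"]:
    print(candidate, is_doubled(candidate))
```

A matcher for this has to remember an arbitrarily long captured chunk, which is why a plain finite state machine can't be the whole story for such libraries.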
So yes, we're back to internals of your regex library. How are non-regular regexes compiled? Are regular regexes compiled into an FSM, or does it use a more general technique?
I really have no idea about these questions, and considering the familiarity I have with parsing and such, I'd argue it's pretty unreasonable to expect someone to.
That's a pretty bold statement.
How long does compilation take? What's the input? If I give Voice of Reason's loop a string without any "  " (double-space) substrings, I virtually guarantee it will take less time than any regexReplace function that does copying, and would be basically the same as any regexReplace that doesn't.
If the input strings are drawn from a distribution where there are very few strings with double spaces, such strings are typically shorter than those without double spaces, etc., your statement could be wrong.
Admin
Inconceivable!
Admin
ACK! Repeat after me -- HTML and XML are not regular languages. You cannot use a regular expression to parse them. It is pure distilled evil to try. It is more evil than your mother-in-law. Cthulhu will eat your soul if you use code like this. </rant>.
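For the tag-stripping case raised earlier in the thread, a sketch of the parser-based alternative, assuming Python and its standard-library `html.parser` (the class and function names here are made up for illustration):

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text content, ignoring tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


tricky = '<p title="a > b">Hello <b>world</b></p>'
print(repr(strip_tags(tricky)))

# A naive "anything between angle brackets" regex trips over the '>'
# inside the quoted attribute value.
print(repr(re.sub(r"<[^>]*>", "", tricky)))
```

The parser keeps track of where it is inside a tag, which is exactly the bookkeeping a simple pattern has no way to do.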
Admin
Sorry, "Woosh" describes it better...
Admin
It seems pretty obvious who here knows how to create and read a RegEx and who doesn't / refuses to learn.
decet: Blinded deceit . Look ma no I's!
Admin
TRWTF is 5+ different solutions in regular expressions and no mention of xkcd... http://xkcd.com/208/
Luiz Borges captcha: causa
Admin
Still amazed at how few modern programmers actually grok regular expressions, as illustrated by this post and the subsequent comments which are all subtly wrong or over-engineered.
Admin
I know; linking to an encyclopedia instead of a dictionary for a deff! That's like using nested stringReplace calls instead of a loop to get rid of, oh wait...
Admin
I've tried to look up the article that inspired me on the Thompson NFA algorithm, but unfortunately I've failed. The link from Lambda the Ultimate appears to be broken. It's worth looking at the Haskell site.
I'm not sure that compilation time is much of an issue, since any regexp implementation that I can think of is capable of using precompiled expressions.
Execution time? That might be more important. I'd prefer to have a bounded limit on that, and I'm not sure that non-deterministic backtracking implementations would give me one.
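On the precompilation point, a minimal sketch assuming Python's `re` module (the constant and function names are hypothetical): the pattern is compiled once and reused for every string.

```python
import re

# Compiled once at import time, reused for every call.
MULTI_SPACE = re.compile(r" {2,}")


def clean_all(descriptions):
    """Collapse runs of spaces in each description using the shared pattern."""
    return [MULTI_SPACE.sub(" ", d) for d in descriptions]


print(clean_all(["a  b", "c   d  e", "f"]))
```

This particular pattern has no backreferences or nested quantifiers, so it isn't the kind that usually provokes pathological backtracking; whether a given library gives a hard bound on execution time is a separate question that depends on its implementation.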
Admin
Furthermore, I'm fucking certain that it is not spelt the way you think it is.
Admin
Agreed...
Admin
So THAT'S where you find them. I've been looking in the wrong places.
Thank you, thank you, I'll be here all night.
Admin
Yeah...
I work at a newspaper office, and more than once a year we get a report on an ultra-modern 3.5" floppy disk from the county treasurer for a tax report. I watched a well-meaning person spend more than an hour in Quark editing the thing, and asked if I could help, and was handed it...I looked at it long enough to ask for the original text file, fire up TextWrangler, and throw this into find-and-replace (rewritten as a regexp since TW splits the find and replace):
s/\s{2,}/\t/g
TW also has this handy feature named "Zap Gremlins" which proved handy in removing the manual page breaks embedded in the file...yes, the text file was the output for the line printer...
(CAPTCHA: dolor, which seems appropriate for this week so far)
Admin
pshh. The regexes shown here are not clever or unclear. Only someone who hasn't bothered to even try to learn the syntax would claim that. I mean, what's unclear about " +", once you know that "+" means "the preceding, possibly repeated many times"? Seriously, it's about as simple as things get.
That's not to say that all regexes are pretty or understandable. But all of the ones shown here were pretty damn clear.
Admin
Food for thought. Note that Thomas Lord on the aforementioned Lambda the Ultimate states (along with several other interesting observations):
"The linked article disappoints me a bit because it rehearses the Thompson construction, swell, but only mentions in passing how the non-regular features people want (like back-references) make everything harder."
Now, you can go in the direction of theory, and NP-completeness, and try to figure out a set of decent heuristics. (See the LtU article for details of current research.)
Or you can just be a dummy, like me, and ask "Why would anybody need back-references? Even Jamie Zawinski could only deal with two problems at a time."
Admin
Another tragic example of Code Cancer, showing how even the most innocuous line of bad code can, over time, expand to a malignant tumour.
Fortunately the extraction worked in this case, but not all software is so lucky. Let's all hope we eventually find a cure.
Admin
Why throw in the extra copy?
Admin
Assuming a C99 compiler, this is the most compact I can make it.