The Daily WTF: Curious Perversions in Information Technology

2009-12-07 Reply Admin

Yes I know, there are some talented guys out there that don't have a degree, yada yada, some exceptions to the rule, yada yada.

Do you really think that having a degree means you will be a good programmer? Is this borne out by your experience?

2009-12-07 Reply Admin

gallier2:
What I thought at first look, but the complexity of his function is not quadratic (Schlemiel is O(n²)). It's less than that because StringReplace replaces more than 1 double white in each loop.

You are quite right, this is significantly better than quadratic complexity. Since the number of spaces is reduced by 1/2 each iteration, this function is O(Log n).

2009-12-07 Reply Admin

If simple regex doesn't look clear to you, then you don't use regex enough. If you don't use regex enough, you are missing out. What about stripping html tags or finding a date in a string? I'd love to see a loop (or more likely many nested loops) that was nearly as clean as regex for such tasks.

codeReign · 2009-12-07 Reply Admin

eViLegion:
MD:
Boring!

And yet not quite boring enough for you to post a comment, thereby suggesting that your life must be even more so.

yeppers

2009-12-07 Reply Admin

This is super easy in COBOL:

MOVE STRING_TO_BE_CLEANED TO STRING_WITH_EXTRAS_REMOVED REPLACING EXTRA SPACES WITH " ".

As long as none of your strings ever go over 80 characters, you're golden!

EvanED · 2009-12-07 Reply Admin

Lego:
gallier2:
What I thought at first look, but the complexity of his function is not quadratic (Schlemiel is O(n²)). It's less than that because StringReplace replaces more than 1 double white in each loop.

You are quite right, this is significantly better than quadratic complexity. Since the number of spaces is reduced by 1/2 each iteration, this function is O(Log n).

Except you're only counting iterations; it's also copying the string each call, which is O(n). That gives O(n log n) total. (Really I'd say it's O(n log s), where 's' is the longest sequence of spaces in the string. That provides enough extra information that I think it's useful to make the distinction.)
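A rough way to check the iteration count is to instrument the loop. The sketch below is in Python rather than the article's unidentified language, and `collapse_by_loop` is a made-up name. It assumes `stringReplace` behaves like Python's `str.replace`, that is, it replaces non-overlapping matches left to right and returns a new string.

```python
import math


def collapse_by_loop(text):
    """Repeatedly replace double spaces with single ones; return (result, passes)."""
    passes = 0
    while "  " in text:                   # O(n) scan, comparable to indexOf()
        text = text.replace("  ", " ")    # O(n) copy; a run of k spaces shrinks to ceil(k/2)
        passes += 1
    return text, passes


if __name__ == "__main__":
    for run in (2, 3, 16, 24, 1000):
        result, passes = collapse_by_loop("a" + " " * run + "b")
        # The pass count tracks log2 of the longest run, not the string length.
        print(run, passes, math.ceil(math.log2(run)), repr(result))
```

Each pass still scans and copies the whole string, which is where the O(n log s) estimate comes from.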

But if we want to compare to the regex solution, this also doesn't take into account the efficiency of whatever the regex library is doing. There's some effort it's putting into compiling the regex, etc., and matching may also be less efficient. Basically, if you want to argue efficiency, I can construct strings that will be faster with each method, so without knowing the characteristics of the strings in question, it's hard to know which would be faster.

Anyway, all of this is missing the point. I'm one of the earlier people to pull out the "now you have two problems" quote, but even I think that if you have problems reading something like "[ ][ ]+" or " {2,}" or whatever, and prefer the loop based solution that spurred this sequence of comments, then... I strongly disagree.

2009-12-07 Reply Admin

Brian White:
Yes I know, there are some talented guys out there that don't have a degree, yada yada, some exceptions to the rule, yada yada.

Do you really think that having a degree means you will be a good programmer? Is this borne out by your experience?

I may have to get procedural on your butt:

amischiefr: "Why almost? I think they should. You can't practice Law without a degree <snip/>. Why not institute a similar practice for programming?"

Brian: "Do you really think that having a degree means you will be a good programmer?"

Me: "No, he doesn't. Try again. You are allowed to move your lips and make 'choo choo' noises if it helps this time."

amischiefr: "Yes I know, there are some talented guys out there that don't have a degree, yada yada, some exceptions to the rule, yada yada. <snip/> And we shouldn't let just anybody write code."

Brian: "Is this borne out by your experience?"

Me: "Have you ever been experienced?"

I think amischiefr's point is not that a CompSci degree (or equivalent) sprinkles you with magic pixie dust and "means" that you will be a good programmer. That would be silly. I went to college with a girl who majored in psychology, and when I asked her why, she said "I get to work with rats a lot. I'm not comfortable with people. I prefer rats."

She now has a well-paid job in the City of London. Go figure.

However, as I think amischiefr would agree, you at least have some sort of comparator when you're choosing between CompSci (etc) graduates. 20%, you fail. Fire the useless bastards. 80%, you more or less succeed.

Basement-diving for PHP trolls tends to reverse the numbers; which is not good for anybody.

If I can make an honest plea here, the key is to make the qualification a permit rather than a restriction. I've dealt with enough lawyers to realise that their qualifications essentially give them a pass into the Magic Circle. This may also apply to other, longstanding, professions.

It doesn't necessarily have to apply to a "Computer Science" profession. All such a profession would need to do is to require a minimum bar for entry, and the (TDWTF-sponsored, no doubt) derision of abject failures who are unsuited to the job.

2009-12-07 Reply Admin

                                                          fifty-eighth!

coyo · 2009-12-07 Reply Admin

Most people don't comprehend math. Even fewer use regular expressions.

toth · 2009-12-07 Reply Admin

amischiefr:
luis.espinal:
Steenbergh:
He shouda used this:
RemoveSpaces: strItem = StringReplace("  ", " "); if instr("  ") > 0 then goto RemoveSpaces

We don't need no regular expressions!

Goto? Dude, what the...? Of all the things considering that DO/WHILE loops have existed since BASIC dialects evolved out of the GW-BASIC/PICK-BASIC/TARD-BASIC decades ago? Goto? Worst fix ever. I hope that was a joke.

As for the code snippet featured in the article, it is not surprising at all. It happens a lot, not only in VB, but in Java and C#. It almost makes me want to see programming licenses being mandatory to perform any programming job, or rigorous across-the-board exams mandatory for graduating with a CS or MIS degree... almost.

These types of WTFs are not just due to not knowing the language; they display a fundamental flaw in the way of thinking and problem-solving, completely inexcusable.

-- second try --

Why almost? I think they should. You can't practice Law without a degree, you can't perform operations without a medical degree (legally anyway, which goes back to point 1...). Why not institute a similar practice for programming?

Oh right, because 1/2 of the tards out there are lazy, uneducated fucks that think because they write stupid script in their mother's basement that qualifies them as a programmer.

Yes I know, there are some talented guys out there that don't have a degree, yada yada, some exceptions to the rule, yada yada. And I'm pretty sure that there are people who could be good trial lawyers that can't get into Law school because they don't have the money or can't do well on the LSAT. But guess what? We don't let them practice law! And we shouldn't let just anybody write code.

Just my rant.

Well, you know what they say: if you outlaw programming, only outlaws will program. Think about it.

2009-12-07 Reply Admin

EvanED:
Lego:
gallier2:
What I thought at first look, but the complexity of his function is not quadratic (Schlemiel is O(n²)). It's less than that because StringReplace replaces more than 1 double white in each loop.

You are quite right, this is significantly better than quadratic complexity. Since the number of spaces is reduced by 1/2 each iteration, this function is O(Log n).

Except you're only counting iterations; it's also copying the string each call, which is O(n). That gives O(n log n) total. (Really I'd say it's O(n log s), where 's' is the longest sequence of spaces in the string. That provides enough extra information that I think it's useful to make the distinction.)

I think it's O(n * log n), but not because of the string copy. The n comes from indexOf().

A regex for simple string replacement, on the other hand, is probably closer to O(n + C), where C is regex overhead, or just O(n).

Of course, this is probably being run on relatively short strings, so the value of C might actually be important.

2009-12-07 Reply Admin

Also, I kind of like "[ ]+" better than " +", if only for clarity.

2009-12-07 Reply Admin

toth:
Well, you know what they say: if you outlaw programming, only outlaws will program. Think about it.

Very profound.

You know what they say: if you outlaw outlaws, only outlaws will outlaw outlaws outlawing outlaws.

Left recursion, please.

2009-12-07 Reply Admin

Oh, I bow to your in-depth understanding of the implementation of your particular RegEx library. (...) Maybe you're right, but, it'll vary widely, and you certainly can't fault code that's clearer and _may not_ be any slower.

When you write a regex, you're programming a finite state machine. You should have a pretty good idea of how it will work. Read 'Mastering Regular Expressions' - it's not just for Perl wranglers. It certainly won't go back to the start of the string for each replace, unlike your indexOf().

Sorry, you may consider your code clearer (I don't). It's definitely slower. No 'may not' about it.

The regex examples that almost declaratively say "replace every instance of two-or-more contiguous spaces with a single space" seem best to me.
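For reference, a minimal sketch of that declarative form, written in Python because the thread never establishes the article's language. `MULTI_SPACE` and `collapse_spaces` are names invented for this example.

```python
import re

# Compiled once and reused; " {2,}" means "two or more consecutive spaces".
MULTI_SPACE = re.compile(r" {2,}")


def collapse_spaces(text):
    """Replace every run of two or more spaces with a single space."""
    return MULTI_SPACE.sub(" ", text)


if __name__ == "__main__":
    print(repr(collapse_spaces("Item   description    with  gaps")))
```

The input is scanned once from left to right, and single spaces are left untouched because they never match the pattern.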

Maurits · 2009-12-07 Reply Admin

James T.:
Oh, I favor my all-time favorite WTF:
while(itemDesc != stringReplace(itemDesc, "  ", " ")) {
  itemDesc = stringReplace(itemDesc, "  ", " ");
}

Make it

for (
    itemNew = stringReplace(itemDesc, "  ", " ");
    itemDesc != itemNew;
    itemNew = stringReplace(itemDesc, "  ", " ")
) {
    itemDesc = itemNew;
}

... and you're golden.

2009-12-07 Reply Admin

luis.espinal:
Goto? Dude, what the...? Of all the things considering that DO/WHILE loops have existed since BASIC dialects evolved out of the GW-BASIC/PICK-BASIC/TARD-BASIC decades ago? Goto? Worst fix ever. I hope that was a joke.

As for the code snippet featured in the article, it is not surprising at all. It happens a lot, not only in VB, but in Java and C#. It almost makes me want to see programming licenses being mandatory to perform any programming job, or rigorous across-the-board exams mandatory for graduating with a CS or MIS degree... almost.

These types of WTFs are not just due to not knowing the language; they display a fundamental flaw in the way of thinking and problem-solving, completely inexcusable.

-- second try --

You must be new here.

2009-12-07 Reply Admin

amischiefr:
Yes I know, there are some talented guys out there that don't have a degree, yada yada, some exceptions to the rule, yada yada. And I'm pretty sure that there are people who could be good trial lawyers that can't get into Law school because they don't have the money or can't do well on the LSAT. But guess what? We don't let them practice law! And we shouldn't let just anybody write code. Just my rant.

The staggering amount of ignorance in this rant is quite refreshing. I am now really considering creating a new WTF site with ignorant rants of this quality. You, sir, could become the next WAFIRAh of the day (what a f* ignorant ranting a*h*).

2009-12-07 Reply Admin

Thank you! Regex is like wiping your butt with a belt sander! Right, I don't like it! The only folks who use it are the guys who think that the more complex it looks, the cooler they are!

2009-12-07 Reply Admin

[k\|<]

? Really? So you'd say

h4c|

and

h@(<

are valid permutations? ...Are you sure you don't mean

(k|\|<)

?

2009-12-07 Reply Admin

No kidding, Alex should hire new commenters.

At least ones that know acsi from unicode.

2009-12-07 Reply Admin

In that case: regexReplace(itemDesc, " +", " ");

Won't this find only one group of consecutive spaces rather than up-to 24 individual spaces like the original?

replace( "Hello multiple spaces!", " +", " ") => "Hellomultiple spaces!"
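Whether only the first group is replaced depends on the replace function, not on the pattern. As an illustration in Python (an assumption, since the thread never pins down what sits behind `regexReplace`), `re.sub` replaces every non-overlapping match unless it is limited with `count`:

```python
import re

text = "Hello    multiple   spaces!"

# Default: every run of spaces is replaced by one space.
all_runs = re.sub(r" +", " ", text)

# count=1: only the first run is replaced, which is the behaviour the question assumes.
first_only = re.sub(r" +", " ", text, count=1)

if __name__ == "__main__":
    print(repr(all_runs))
    print(repr(first_only))
```

In neither case do the spaces disappear entirely, because each match is replaced by a single space rather than by nothing.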

2009-12-07 Reply Admin

#include <stdio.h>
#include <stdlib.h>

void remove_dblanks(char *s)
{
  char *p = s;
  int   ch=0;

  if(s) {
    do {
      ch = *p++;
      if(ch == ' ') 
        while(*p == ' ') p++; 
      *s++ = ch;
    } while(ch);
  }
}


int main(void)
{
char s1[] = "  Hello    this    is   full        of blanks     .    ";

	printf("String before:'%s'\n", s1);
	remove_dblanks(s1);
	printf("String after :'%s'\n", s1);
	
	return EXIT_SUCCESS;
}

Result:

$ ./a.exe
String before:'  Hello    this    is   full        of blanks     .    '
String after :' Hello this is full of blanks . '

O(n), unbeatable by all your sissi-languages ;-)
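For readers who do not read C, the same single pass can be sketched in Python. This is a port for illustration, not gallier2's code, and `remove_dblanks_py` is a hypothetical name. Like the C version, it keeps one space per run, including a leading or trailing one.

```python
def remove_dblanks_py(text):
    """Copy text once, skipping any space that directly follows another space."""
    out = []
    previous = ""
    for ch in text:
        if ch == " " and previous == " ":
            continue          # second or later space in a run: drop it
        out.append(ch)
        previous = ch
    return "".join(out)


if __name__ == "__main__":
    print(repr(remove_dblanks_py("  Hello    this    is   full   of blanks   .   ")))
```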

2009-12-07 Reply Admin

Bubba:
Clearly, the boss voted for Obama.

Well, change is change. Right?

2009-12-07 Reply Admin

BSAnywhere:
The staggering amount of ignorance in this rant is quite refreshing. I am now really considering creating a new WTF site with ignorant rants of this quality. You sir, could become the next WAFIRAh of the day (what a f* ignorant ranting a*h*).

Consider it, really. The global economy is currently in dire need of morons spending money to no good purpose.

Ignorance. I do not think that word means what you think it means.

Rant. I do not think that word means what you think it means.

It's a point of view, you cretin.

It's possible to disagree, or to agree, with a point of view. (I suppose it's also possible to take a point of view to court on the basis of slander or libel.)

It is, however, personally demeaning to attack it on an ad hominem basis. It's also slightly sad that you can't spell out "fucking ignorant ranting asshole" without self-censorship. I'll give you props. Everybody should defend their own family.

Palin, much?

2009-12-07 Reply Admin

Is this one of those cases where the solution is to type 'regex' and then run a rolling pin across the keyboard?

2009-12-07 Reply Admin

gallier2:
#include <stdio.h>
#include <stdlib.h>
<snip loooong code>
O(n), unbeatable by all your sissi-languages ;-)

unlines . map (unwords . words) . lines
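That Haskell one-liner splits each line into words and rejoins them with single spaces. A rough Python counterpart, for comparison only (the name `squeeze_lines` is made up here), shows how it differs from the C version: leading and trailing blanks disappear entirely, and tabs count as whitespace too.

```python
def squeeze_lines(text):
    """Per line: split on any whitespace, then rejoin the words with single spaces."""
    return "\n".join(" ".join(line.split()) for line in text.splitlines())


if __name__ == "__main__":
    print(repr(squeeze_lines("  Hello    this  is \n two\t lines ")))
```

Unlike `unlines`, this sketch does not append a final newline.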

2009-12-07 Reply Admin

amischiefr:
Why almost? I think they should. You can't practice Law without a degree, you can't perform operations without a medical degree (legally anyway, which goes back to point 1...). Why not institute a similar practice for programming?

Because no-one died or lost his liberty or even a significant amount of money from writing this code. Remember that Don's boss would have to get all the way up to board level before anyone outside the company can sack him. Why should Don be any less secure?

Unless you really meant to say that requiring programmers to have a degree (which may or may not have been bought over the Internet) before they can be employed as a programmer would benefit society.

2009-12-07 Reply Admin

Actually, a push-down automaton.

2009-12-07 Reply Admin

Of course, what they don't say is that the second problem you have when you think "I know! I'll use a regular expression!" is a contest to see how high people can piss up a wall.

EvanED · 2009-12-07 Reply Admin

Jonathan Collins:
I think it's O(n * log n), but not because of the string copy. The n comes from indexOf().

I suppose that depends on exactly how stringReplace works... I'm actually having a hard time even establishing what language the code snippet is in. Regardless, stringReplace returns the replaced string; this suggests to me that it copies the string rather than updating it in place.

That said, either a copy within stringReplace or indexOf will give you the O(n) per-iteration.

(The fact that O(log n) is sub-linear is an indication that it can't be the right complexity, at least in general. If you allow the characters in the string to vary independently of one another, then if your algorithm skips any two-character substring of the input, the same input with those two characters changed to spaces would produce the same result. Thus you need to read at least every other character, and O(log n) time isn't enough to do that.)

ukslim:
Oh, I bow to your in-depth understanding of the implementation of your particular RegEx library. (...) Maybe you're right, but, it'll vary widely, and you certainly can't fault code that's clearer and _may not_ be any slower.

When you write a regex, you're programming a finite state machine. You should have a pretty good idea of how it will work.

This certainly isn't true of PCREs in general, which are more powerful than regular expressions and hence more powerful than FSMs. (Backreferences alone get you a class of languages that's strictly more powerful than what FSMs can accept, and in fact contains languages that aren't even context-free. Hence Joel's "Actually, a push-down automaton" also can't describe what PCRE libraries do in general.)

So yes, we're back to the internals of your regex library. How are non-regular regexes compiled? Are regular regexes compiled into an FSM, or does it use a more general technique?
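A concrete case, sketched with Python's `re` module (chosen for illustration; the same pattern is valid PCRE): the backreference `\1` lets a pattern match any string written twice in a row, a language that neither a finite state machine nor a push-down automaton can recognise. `DOUBLED` and `is_doubled` are names invented for this example.

```python
import re

# (.+) captures some non-empty text; \1 demands exactly the same text again.
DOUBLED = re.compile(r"(.+)\1")


def is_doubled(text):
    """True if text is some non-empty string written twice in a row."""
    return DOUBLED.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("abcabc", "abcabd", "aa", "aba"):
        print(sample, is_doubled(sample))
```

How an engine executes such a pattern is exactly the implementation detail in question; backtracking is the usual approach.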

I really have no idea about these questions, and considering the familiarity I have with parsing and such, I'd argue it's pretty unreasonable to expect someone to.

ukslim:
It's *definitely* slower. No 'may not' about it.

That's a pretty bold statement.

How long does compilation take? What's the input? If I give Voice of Reason's loop a string without any "  " substrings, I virtually guarantee it will take less time than any regexReplace function that does copying, and would be basically the same as any regexReplace that doesn't.

If the input strings are drawn from a distribution where there are very few strings with double spaces, such strings are typically shorter than those without double spaces, etc., your statement could be wrong.

2009-12-07 Reply Admin

Bim Job:
Rant. I do not think that word means what you think it means.

Inconceivable!

2009-12-07 Reply Admin

Complete Moron:
If simple regex doesn't look clear to you, then you don't use regex enough. If you don't use regex enough, you are missing out. What about stripping html tags or finding a date in a string? I'd love to see a loop (or more likely many nested loops) that was nearly as clean as regex for such tasks.

ACK! Repeat after me -- HTML and XML are not regular languages. You cannot use a regular expression to parse them. It is pure distilled evil to try. It is more evil than your mother-in-law. Cthulhu will eat your soul if you use code like this. </rant>.
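One small illustration of how this goes wrong, sketched in Python. Strictly speaking it shows a naive tag-stripping pattern failing rather than proving anything about regular languages, and the sample markup is invented. The standard library's `html.parser` is used for comparison.

```python
import re
from html.parser import HTMLParser

html = '<a title="x > y">link</a>'

# Naive tag stripper: stops at the first ">", even inside a quoted attribute.
naive = re.sub(r"<[^>]*>", "", html)


class TextOnly(HTMLParser):
    """Collects text nodes using the standard library's HTML parser."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


parser = TextOnly()
parser.feed(html)
parser.close()
parsed = "".join(parser.chunks)

if __name__ == "__main__":
    print(repr(naive))   # leftover attribute debris
    print(repr(parsed))  # just the link text
```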

2009-12-07 Reply Admin

Bim Job:
BSAnywhere:
The staggering amount of ignorance in this rant is quite refreshing. I am now really considering creating a new WTF site with ignorant rants of this quality. You sir, could become the next WAFIRAh of the day (what a f* ignorant ranting a*h*).
Consider it, really. The global economy is currently in dire need of morons spending money to no good purpose.
Ignorance. I do not think that word means what you think it means.

Rant. I do not think that word means what you think it means.

It's a point of view, you cretin.

It's possible to disagree, or to agree, with a point of view. (I suppose it's also possible to take a point of view to court on the basis of slander or libel.)

It is, however, personally demeaning to attack it on an ad hominem basis. It's also slightly sad that you can't spell out "fucking ignorant ranting asshole" without self-censorship. I'll give you props. Everybody should defend their own family.

Palin, much?

WTF?

Sorry, "Woosh" describes it better...

2009-12-07 Reply Admin

It seems pretty obvious who here knows how to create and read a RegEx and who doesn't / refuses to learn.

decet: Blinded deceit . Look ma no I's!

2009-12-07 Reply Admin

TRWTF is 5+ different solutions in regular expressions and no mention of xkcd... http://xkcd.com/208/

Luiz Borges captcha: causa

2009-12-07 Reply Admin

Still amazed at how few modern programmers actually grok regular expressions, as illustrated by this post and the subsequent comments which are all subtly wrong or over-engineered.

2009-12-07 Reply Admin

Vizzini:
Bim Job:
Rant. I do not think that word means what you think it means.

Inconceivable!

I know; linking to an encyclopedia instead of a dictionary for a def! That's like using nested stringReplace calls instead of a loop to get rid of, oh wait...

2009-12-07 Reply Admin

EvanED:
How long does compilation take? What's the input? If I give Voice of Reason's loop a string without any "  " substrings, I virtually guarantee it will take less time than any regexReplace function that does copying, and would be basically the same as any regexReplace that doesn't.
If the input strings are drawn from a distribution where there are very few strings with double spaces, such strings are typically shorter than those without double spaces, etc., your statement could be wrong.

Well, back-tracking PCRE systems can take an awfully long (and, afaik, unbounded) time to produce a result. I know it's not what you're discussing, but for the purpose of recursively replacing string X with string X', it might be relevant.

I've tried to look up the article that inspired me on the Thompson NFA algorithm, but unfortunately I've failed. The link from Lambda the Ultimate appears to be broken. It's worth looking at the Haskell site.

I'm not sure that compilation time is much of an issue, since any regexp implementation that I can think of is capable of using precompiled expressions.

Execution time? That might be more important. I'd prefer to have a bounded limit on that, and I'm not sure that non-deterministic backtracking implementations would give me one.

2009-12-07 Reply Admin

BSAnywhere:
Sorry, "Woosh" describes it better...

I do not think that word means what you think it means.

Furthermore, I'm fucking certain that it is not spelt the way you think it is.

2009-12-07 Reply Admin

if (itemDesc.indexOf("  ") != -1)
  throwException("Upstream producing invalid Item Descriptions; Fix the problem, don't patch over it already");

2009-12-07 Reply Admin

Todd Lewis:

if (itemDesc.indexOf("  ") != -1)
  throwException("Upstream producing invalid Item Descriptions; Fix the problem, don't patch over it already");

Agreed...

justsomedude:
TRWTF is allowing free-form field entries on a value that's getting fed to a 3rd party and is expected to match known possibles. Should be limited to list...

toth · 2009-12-07 Reply Admin

Complete Moron:
What about stripping html tags or finding a date in a string?

So THAT'S where you find them. I've been looking in the wrong places.

Thank you, thank you, I'll be here all night.

2009-12-07 Reply Admin

Yeah...

I work at a newspaper office, and more than once a year we get a report on an ultra-modern 3.5" floppy disk from the county treasurer for a tax report. I watched a well-meaning person spend more than an hour in Quark editing the thing, and asked if I could help, and was handed it...I looked at it long enough to ask for the original text file, fire up TextWrangler, and throw this into find-and-replace (rewritten as a regexp since TW splits the find and replace):

s/\s{2,}/\t/g

TW also has this handy feature named "Zap Gremlins" which proved handy in removing the manual page breaks embedded in the file...yes, the text file was the output for the line printer...

(CAPTCHA: dolor, which seems appropriate for this week so far)

2009-12-07 Reply Admin

pshh. The regexes shown here are not clever or unclear. Only someone who hasn't bothered to even try to learn the syntax would claim that. I mean, what's unclear about " +", once you know that "+" means "the preceding, possibly repeated many times"? Seriously, it's about as simple as things get.

That's not to say that all regexes are pretty or understandable. But all of the ones shown here were pretty damn clear.

DaveK · 2009-12-07 Reply Admin

Todd Lewis:

if (itemDesc.indexOf("  ") != -1)
  throwException("Upstream producing invalid Item Descriptions; Fix the problem, don't patch over it already");

You missed the next couple of lines ...

pullDoubleSpaces(itemDesc);
if (itemDesc.indexOf("  ") != -1)
  throwException("Boss code detected!");

FTFY :)

2009-12-07 Reply Admin

EvanED:
ukslim:
Oh, I bow to your in-depth understanding of the implementation of your particular RegEx library. (...) Maybe you're right, but, it'll vary widely, and you certainly can't fault code that's clearer and _may not_ be any slower.

When you write a regex, you're programming a finite state machine. You should have a pretty good idea of how it will work.

This certainly isn't true of PCREs in general, which are more powerful than regular expressions and hence more powerful than FSMs. (Backreferences alone get you a class of languages that's strictly more powerful than what FSMs can accept, and in fact contains languages that aren't even context-free. Hence Joel's "Actually, a push-down automaton" also can't describe what PCRE libraries do in general.)

So yes, we're back to the internals of your regex library. How are non-regular regexes compiled? Are regular regexes compiled into an FSM, or does it use a more general technique?

I really have no idea about these questions, and considering the familiarity I have with parsing and such, I'd argue it's pretty unreasonable to expect someone to.

Just as a theoretical follow-up (and you'll be able to make more sense of it than I can), I've found the Google cache for the Russ Cox article I was searching for. It's essentially a defence of DFAs over NDFAs.

Food for thought. Note that Thomas Lord on the aforementioned Lambda the Ultimate states (along with several other interesting observations):

"The linked article disappoints me a bit because it rehearses the Thompson construction, swell, but only mentions in passing how the non-regular features people want (like back-references) make everything harder."

Now, you can go in the direction of theory, and NP-completeness, and try to figure out a set of decent heuristics. (See the LtU article for details of current research.)

Or you can just be a dummy, like me, and ask "Why would anybody need back-references? Even Jamie Zawinski could only deal with two problems at a time."

2009-12-07 Reply Admin

Another tragic example of Code Cancer, showing how even the most innocuous line of bad code can, over time, expand to a malignant tumour.

Fortunately the extraction worked in this case, but not all software is so lucky. Let's all hope we eventually find a cure.

2009-12-07 Reply Admin

gallier2:

void remove_dblanks(char *s)
{
  char *p = s;
  int   ch=0;
  if(s) {
    do {
      ch = *p++;
      if(ch == ' ')
        while(*p == ' ') p++;
      *s++ = ch;
    } while(ch);
  }
}

Why throw in the extra copy?

void remove_dblanks(char *s) {
    char *p = s++;
    while (*p) {
        if (*p != ' ' || *s != ' ')
            *++p = *s;
        s++;
    }
}

2009-12-07 Reply Admin

Bubba:
Spaces are good. They give the compiler time to think.

XD

2009-12-07 Reply Admin

MG:

void remove_dblanks(char *s) {
    char *p = s++;
    while (*p) {
        if (*p != ' ' || *s != ' ')
            *++p = *s;
        s++;
    }
}

Assuming a C99 compiler, this is the most compact I can make it.

void remove_dblanks3(char *s) {
    for (char *p = s++; *p; s++) (*p != ' ' || *s != ' ') && (*++p = *s);
}

A Spacy Problem
