The Daily WTF: Curious Perversions in Information Technology

2008-02-05 Reply Admin

Not a complete WTF, in the sense that it does more than a simple replace. Several spaces are replaced with only a single underscore. Hey could have used a regexp, maybe.

2008-02-05 Reply Admin

public static String replaceSpaceWithUnderscore(String str) { return str.replace(' ', '_'); }

This obvious solution would however not yield the same results as the function above, as the StringTokenizer handles sequences of token characters as one token, thus a sequence of n spaces will result in one underscore instead of n.

2008-02-05 Reply Admin

Well, you could do a first step in which you replace all sequences of spaces with a single space. ;-)

2008-02-05 Reply Admin

Sebastian:
Not a complete WTF, in the sense that it does more than a simple replace. Several spaces are replaced with only a single underscore. Hey could have used a regexp, maybe.

But I shudder at the implementation anyway, as it is severely lacking in the StringBuffer department. All that complexity, and it's going to be very slow on anything at all non-trivial. I know some people like to say that regexps are never the answer, but in this case they'll give something better (and more efficient) than this for sure. For example...

public static String replaceSpaceWithUnderscore(String str) {
    return (str==null) ? null : str.replaceAll(" +", "_");
}

snoofle · 2008-02-05 Reply Admin

I suppose the amount of WTFness depends on a) if the coder more or less knows the length of the string and the number of spaces it will contain; for short strings, it's probably more efficient to do it like this than with regex's, and b) if the coder even knows that you can use a regex with String.replaceAll

2008-02-05 Reply Admin

Why would one write a function for that?!

tr/ /_/s;

ParkinT · 2008-02-05 Reply Admin

Regexp ROCKS!!

2008-02-05 Reply Admin

dkf:
Sebastian:
Not a complete WTF, in the sense that it does more than a simple replace. Several spaces are replaced with only a single underscore. Hey could have used a regexp, maybe.
But I shudder at the implementation anyway, as it is severely lacking in the StringBuffer department. All that complexity, and it's going to be very slow on anything at all non-trivial. I know some people like to say that regexps are never the answer, but in this case they'll give something better (and more efficient) than this for sure. For example...
public static String replaceSpaceWithUnderscore(String str) {
    return (str==null) ? null : str.replaceAll(" +", "_");
}

Anyone who says regular expressions are never the answer isn't asking very interesting questions.

2008-02-05 Reply Admin

I wouldn't of thought of the regex at first, but I would have used a StringBuffer. If that function gets a large string with many spaces, the whole app may crash. (I learned that the hard way when I was a Java newbie)

Grovesy · 2008-02-05 Reply Admin

shelteredcoder:
I wouldn't of thought of the regex at first, but I would have used a StringBuffer. If that function gets a large string with many spaces, the whole app may crash. (I learned that the hard way when I was a Java newbie)

came across a similar function in a .net production system. It looked something like

while (myString.IndexOf(" ") > 0) { myString = myString.Replace(" ", ""); }

All fine, not a major major WTF except IndexOf thought Nul chars where a space, and Replace didn't.... The net effect was an 8cpu server chugging along at 30% averaging on each CPU, then 7 CPU's would go quite and 1 CPU would go nuts at 100% untill the service was shut down.

(We never quite figured why only the infinite looping thread became the only one running... we guessed it was some kind of wierd thread starvation problem, due to some thread synchronisation going on somewhere up the call stack...After all, the system in question was The Beast)

2008-02-05 Reply Admin

Even if he needed to take away multiple spaces, and didn't know regex he could have just done

while (theString.indexOf("  ") > -1){
   theString = theString.replace("  ",' ');
}
theString = theString.replace(' ','_');

At the very least.

Captcha: abigo (what?)

lmollea · 2008-02-05 Reply Admin

WTF mention for category: "most fancy string handling"

2008-02-05 Reply Admin

dkf:
But I shudder at the implementation anyway, as it is severely lacking in the StringBuffer department.

you mean StringBuilder right?

anyone who ends up calling synchronized methods all the time has no busines calling other peoples code slow.

2008-02-05 Reply Admin

dkf:
I know some people like to say that regexps are never the answer, but in this case they'll give something better (and more efficient) than this for sure. For example... [/code]

There is a word for people like that: moron.

Volmarias · 2008-02-05 Reply Admin

Our code is littered with stuff like this, mostly because regex and elegant replacement methods didn't come around until much later (that and inexperience with Java's core methods). I don't know whether String.replace(char, char) was around for 1.0 or whenever this may have been written, but it's possible that it was not, or that the developer just didn't know about it.

Not using a StringBuffer, or even a char array, though, that's pretty silly.

2008-02-05 Reply Admin

snoofle:
for short strings, it's probably more efficient to do it like this than with regex's

Only idiot would use regexes for replacing single character value with another character.

BTW, the "Think outside of box" ad I get right under the article is funnily appropriate.

2008-02-05 Reply Admin

WTF? He didn't temporarily store the tokens in XML, interspersed with the underscores? That way he would be able to get the final string with one giant, performance crippling swoop of XSLT. That way it's really enterprisey, and customers will better make use of their oversized and overpriced hardware, thereby justifying the cost.

Something like this:

<MyTokenList>
<MyToken>PartA</MyToken>
<MySeparator></MySeparator>
<MyToken>PartB</MyToken>
<MySeparator></MySeparator>
<MyToken>PartC</MyToken>
</MyTokenList>

The XSLT:

xsl:stylesheet
  <!-- Left as an exercise to the reader -->
</xsl:stylesheet>

Captcha: minim. WTF? Maxim would be better here!

2008-02-05 Reply Admin

Grovesy:
not a major major WTF except IndexOf thought Nul chars where a space

I hereby nominate this behavior to be The Real WTF(tm).

2008-02-05 Reply Admin

I agree with Volmarias: our project is chock-full of seeming head-slappers like this from years ago. One's sense of smug superiority gets a bit dented when you realise Java's String functionality in 1.2 was on a par with BBC BASIC. So I call it only a 0.5 WTF for dumb StringTokenizer usage. If my CAPTCHA was almost a rude word, is that a separate WTF?

2008-02-05 Reply Admin

me:
snoofle:
for short strings, it's probably more efficient to do it like this than with regex's

Only idiot would use regexes for replacing single character value with another character.

BTW, the "Think outside of box" ad I get right under the article is funnily appropriate.

i like the "only idiot would" cause it seems that you and the one you speak of have something in common

2008-02-05 Reply Admin

How do you figure? The token between each space will just be a blank string. He will basically be appending the blank string to the end, then adding the underscore. Multiple consecutive spaces mean multiple consecutive underscores.

2008-02-05 Reply Admin

3rd party regex and string util libs became available pretty quickly after Java was released. There really is no excuse for that sort of function.

2008-02-05 Reply Admin

yeah, people say that about StringBuffer all the time. But the compiler fixes it, so chill.

Stylistically, I'd agree that StringBuffer is the correct choice since it makes clear what is happening and doesn't rely on optimization, but it is with a clear conscience that I write readable code such as :

sb.append(S+"...");

powerlord · 2008-02-05 Reply Admin

java.lang.WTFException:
dkf:
But I shudder at the implementation anyway, as it is severely lacking in the StringBuffer department.

you mean StringBuilder right?

anyone who ends up calling synchronized methods all the time has no busines calling other peoples code slow.

Only if you're using Java 1.5 or newer. Some businesses are still on earlier versions.

Volmarias · 2008-02-05 Reply Admin

tickle me emo:
yeah, people say that about StringBuffer all the time. But the compiler fixes it, so chill.
Stylistically, I'd agree that StringBuffer is the correct choice since it makes clear what is happening and doesn't rely on optimization, but it is with a clear conscience that I write readable code such as :

sb.append(S+"...");

You realize that this creates a StringBuffer just to parse S + "...", right? Easier to write sp.append(S).append("...")

Outlaw Programmer · 2008-02-05 Reply Admin

Fortunately, this is one of those rare WTFs that can be refactored away in seconds. The Real WTF here is all of the above people who didn't immediately think of regex. MAYBE I could understand not liking regex because they can get really complicated but for something like this?

If you guys don't know the regex solution, look it up now. I'll be here....waiting....

Volmarias · 2008-02-05 Reply Admin

Outlaw Programmer:
Fortunately, this is one of those rare WTFs that can be refactored away in seconds. The Real WTF here is all of the above people who didn't immediately think of regex. MAYBE I could understand not liking regex because they can get really complicated but for something like this?
If you guys don't know the regex solution, look it up now. I'll be here....waiting....

Fortunately, it can be. Unfortunately, at the time that it was written, the version was likely such that there was no convenient regex method to use.

2008-02-05 Reply Admin

ParkinT:
Regexp ROCKS!!

I'd rather say:

RegexpS ROCK!!

;o) but otherwise d'accord

Outlaw Programmer · 2008-02-05 Reply Admin

Volmarias:
Outlaw Programmer:
Fortunately, this is one of those rare WTFs that can be refactored away in seconds. The Real WTF here is all of the above people who didn't immediately think of regex. MAYBE I could understand not liking regex because they can get really complicated but for something like this?
If you guys don't know the regex solution, look it up now. I'll be here....waiting....

Fortunately, it can be. Unfortunately, at the time that it was written, the version was likely such that there was no convenient regex method to use.

Yeah, I believe that. I don't really blame the original programmer here. The implementation this guy picked is a little wacky but at least he had the foresight to make it a static utility method that's used everywhere instead of copying-and-pasting this logic all over the place. At least, that's what I hope happened.

2008-02-05 Reply Admin

Volmarias:
I don't know whether String.replace(char, char) was around for 1.0 or whenever this may have been written

String.replace has been around since at least Java 1.0.2.

2008-02-05 Reply Admin

Concatenating strings inside a loop like that has quadratic complexity in Java because a copy of the string has to be made on each iteration. This is why Java has StringBuffers and StringBuilders, the latter being added in Java 5.

2008-02-05 Reply Admin

java.lang.WTFException:
dkf:
But I shudder at the implementation anyway, as it is severely lacking in the StringBuffer department.

you mean StringBuilder right?

anyone who ends up calling synchronized methods all the time has no busines calling other peoples code slow.

You appear not to be aware of the scale of the speed difference between the naive implementation vs StringBuffer vs StringBuilder.

Consider that this code:

public class MonkeyMain
{
    public static final int NUMBER_OF_ITERATIONS = 100000;
    
    public static void main( String[] args )
    {
        long startTime = System.currentTimeMillis();
        String buffer = "";
        for ( int i = 0; i < NUMBER_OF_ITERATIONS; i++ )
        {
            buffer += "foo";
        }
        System.out.println( "naive impl() : " + ( System.currentTimeMillis() - startTime ) + "ms." );
        
        startTime = System.currentTimeMillis();
        StringBuffer stringBuffer = new StringBuffer();
        for ( int i = 0; i < NUMBER_OF_ITERATIONS; i++ )
        {
            stringBuffer.append( "foo" );
        }
        System.out.println( "StringBuffer() : " + ( System.currentTimeMillis() - startTime ) + "ms." );
        
        startTime = System.currentTimeMillis();
        StringBuilder stringBuilder = new StringBuilder();
        for ( int i = 0; i < NUMBER_OF_ITERATIONS; i++ )
        {
            stringBuilder.append( "foo" );
        }
        System.out.println( "StringBuilder() : " + ( System.currentTimeMillis() - startTime ) + "ms." );
    }
}

produces the following output on my machine.

naive impl() : 276504ms. StringBuffer() : 11ms. StringBuilder() : 7ms.

This is the difference between a "constant time" algorithm and a quadratic time algorithm. (see "big O notation")

so yeah.. he can call people's code slow.

2008-02-05 Reply Admin

This is a solved problem. Save string to a text file, load an instance of Internet Explorer, navigate to an asp page that reads the text file, scrape the text, replace (the now) single spaces with underscores. That's how we do it here at MoroniTech, anyway.

real_aardvark · 2008-02-05 Reply Admin

"Of all the java beans, in all the regexps, in all the world, s/ /_/ walks into mine..."

Seriously, it's a bit friggin dull, isn't it? I mean, what do you expect of Java drones?

brazzy · 2008-02-05 Reply Admin

tickle me emo:
yeah, people say that about StringBuffer all the time. But the compiler fixes it, so chill.
Stylistically, I'd agree that StringBuffer is the correct choice since it makes clear what is happening and doesn't rely on optimization, but it is with a clear conscience that I write readable code such as :

sb.append(S+"...");

You are wrong. Yes, the compiler MAY (as per Java Language Specification 15.18.1.2 "Optimization of String Concatenation") turn the + operator used on Strings in one expression into calls to StringBuffer/StringBuilder. But it can NOT do that when the operator is used inside a loop as in the WTF (and only then is it really a big problem). The only reason it didn't kill the app's performance is probably that it was only used on very short strings.

This must be the Java issue with the most amount of misconceptions flying around:

Some are blissfully unaware of the problem and write code like in the WTF
Some have heard about it but have not understood, thus believe that "+ is Teh Evil" and will use StringBuffer whenever they concatenate Strings, not understanding when concatenation is a potential performance problem and when not.
Some have heard that compilers "fix it" but have not understood that they can only fix the simple cases that usually don't even cause performance problems.
Very few people actually understand how String concatenation can cause performance problems and in which cases it is or is not an issue.

2008-02-05 Reply Admin

Until version 1.5, Java's regex's were slow, much slower than iterating through each character in a string and checking it. On the other hand, StringTokenizer was even slower than using a regex.

And to Andrew Trumper - be careful of trying to prove a point using microbookmarks (including testing something that may be optimized away - I don't think that's the case here though). The first "native" block involves recreating a new String and copying the older one into it on each iteration. Yes, that's something that you would have to be careful not to do, but this code:

        for (int i = 0; i < 100000; i++) {
            String buffer = "foo" + i;
        }

is actually slightly faster than this:

        for (int i = 0; i < 100000 i++) {
            StringBuffer stringBuffer = new StringBuffer();
            stringBuffer.append("foo").append(i);
        }

and is close to something you're likely to meet, e.g. putting together a response to a HTTP request, than appending the same string together 100000 times.

brazzy · 2008-02-05 Reply Admin

Volmarias:
Outlaw Programmer:
Fortunately, this is one of those rare WTFs that can be refactored away in seconds. The Real WTF here is all of the above people who didn't immediately think of regex. MAYBE I could understand not liking regex because they can get really complicated but for something like this?
If you guys don't know the regex solution, look it up now. I'll be here....waiting....

Fortunately, it can be. Unfortunately, at the time that it was written, the version was likely such that there was no convenient regex method to use.

The really funny part is that a non-regexp method that fulfills the requirements (as per the method name, i.e. assuming that the multipe spaces case is not actually a requirement) WAS present in even very early JDKs.

2008-02-05 Reply Admin

John Doe:
WTF? He didn't temporarily store the tokens in XML, interspersed with the underscores? That way he would be able to get the final string with one giant, performance crippling swoop of XSLT. That way it's really enterprisey, and customers will better make use of their oversized and overpriced hardware, thereby justifying the cost.
Something like this:
<MyTokenList>
<MyToken>PartA</MyToken>
<MySeparator></MySeparator>
<MyToken>PartB</MyToken>
<MySeparator></MySeparator>
<MyToken>PartC</MyToken>
</MyTokenList>
The XSLT:
xsl:stylesheet
  
</xsl:stylesheet>
Captcha: minim. WTF? Maxim would be better here!

return document.getElementsByTagName("MyTokenList")[0].innerText;

brazzy · 2008-02-05 Reply Admin

Paul:
And to Andrew Trumper - be careful of trying to prove a point using microbookmarks (including testing something that may be optimized away - I don't think that's the case here though). The first "native" block involves recreating a new String and copying the older one into it on each iteration.

No, it doesn't. And it's YOUR microbenchmark that is completely broken by creating objects scoped inside the loop and not actually doing anything quadratical.

2008-02-05 Reply Admin

This must be the Java issue with the most amount of misconceptions flying around:

Some are blissfully unaware of the problem and write code like in the WTF

Some have heard about it but have not understood, thus believe that "+ is Teh Evil" and will use StringBuffer whenever they concatenate Strings, not understanding when concatenation is a potential performance problem and when not.

Some have heard that compilers "fix it" but have not understood that they can only fix the simple cases that usually don't even cause performance problems.

Very few people actually understand how String concatenation can cause performance problems and in which cases it is or is not an issue.

The difference between being useful and annoying:

Annoying people will supply you with a list of problems they know the answers too already and make you find them.

Useful people will type the answers with their list of problems.

I'm like you. I'm an annoying person.

brazzy · 2008-02-05 Reply Admin

Maybe Anon:
The difference between being useful and annoying:
Annoying people will supply you with a list of problems they know the answers too already and make you find them.

Useful people will type the answers with their list of problems.

I'm like you. I'm an annoying person.

Well, I sort of had the answer at least hinted at in the part before your quote.

Basically, String concatenation is only a serious problem (with quadratic running time) when you concatenate stuff inside a loop in small increments (i.e. many iterations) onto a String scoped outside the loop. Because in each iteration you copy the entire String, which gets bigger with each iteration. As an added bonus, all those increasingly huge String objects will strain the garbage collector as well.

If there is no loop involved (and of course what goes for loops also goes for recursion), you're very unlikely to reach the number of concatenations and the size of Strings where it is a noticeable problem. Copying a String of size 100 five times rather than once is irrelevant (unless you've identified it as a critical path with a profiler). Copying Strings of sizes growing to 100.000 five thousand times rather than five times IS relevant.

Rick · 2008-02-05 Reply Admin

shelteredcoder:
I wouldn't of thought of the regex at first, but I would have used a StringBuffer. If that function gets a large string with many spaces, the whole app may crash. (I learned that the hard way when I was a Java newbie)

Since we are being picky, it should be StringBuilder today.

2008-02-05 Reply Admin

Nitpicking, perhaps, but it does behave exactly like string.Replace. It doesn't throw an exception if the input string is null. :)

True, it could then have been replaced by two lines of code

2008-02-05 Reply Admin

The person who wrote that had probably never programmed in Java before. They at least had the decency to wrap their code in a logically named function that could be modified later if and when necessary. Of course, the repeated creation of strings does suck, but when you have a deadline to make it's not the kind of thing that would be a top priority.

All in all, it's one of the less-bad things I've seen posted here.

2008-02-05 Reply Admin

me:
Only idiot would use regexes for replacing single character value with another character.

Only idiot would not know how to use indefinite article.

Only idiot would think " +" was matching single character.

(It's quite amusing how Java is advertised as being so much simpler than C++, yet the simple issue of efficient string concatenation spawns large controversial discussions. In C++, you can just use std::string's efficient += operator.)

2008-02-05 Reply Admin

Andrew Trumper:
String buffer = "";
for ( int i = 0; i < NUMBER_OF_ITERATIONS; i++ ) { buffer += "foo"; }

...

produces the following output on my machine.

naive impl() : 276504ms. StringBuffer() : 11ms. StringBuilder() : 7ms.

Which JDK did you use to compile that? Most of the current generation of JDK will optimize that code such that StringBuilder will be used for the first example behind the scenes (I've verified this by decompiling the class files).

2008-02-05 Reply Admin

And they both get owned by this... the proper way to do this:

	StringBuffer stringBuffer = new StringBuffer("foo");
	int l = stringBuffer.length();
	for (int i = 0; i < 100000; i++) {

		stringBuffer.setLength(l);
		stringBuffer.append(i);

	}

string impl() : 237ms. naive impl() : 323ms. real impl() : 117ms.

2008-02-05 Reply Admin

Paul:
Until version 1.5, Java's regex's were slow, much slower than iterating through each character in a string and checking it. On the other hand, StringTokenizer was even slower than using a regex.
And to Andrew Trumper - be careful of trying to prove a point using microbookmarks (including testing something that may be optimized away - I don't think that's the case here though). The first "native" block involves recreating a new String and copying the older one into it on each iteration. Yes, that's something that you would have to be careful not to do, but this code:
        for (int i = 0; i < 100000; i++) {
            String buffer = "foo" + i;
        }
is actually slightly faster than this:
        for (int i = 0; i < 100000 i++) {
            StringBuffer stringBuffer = new StringBuffer();
            stringBuffer.append("foo").append(i);
        }
and is close to something you're likely to meet, e.g. putting together a response to a HTTP request, than appending the same string together 100000 times.

Should've quoted!

And they both get owned by this... the proper way to do this:

StringBuffer stringBuffer = new StringBuffer("foo"); int l = stringBuffer.length(); for (int i = 0; i < 100000; i++) {

stringBuffer.setLength(l); stringBuffer.append(i);

}

string impl() : 237ms. naive impl() : 323ms. real impl() : 117ms.

2008-02-05 Reply Admin

Sorry I'm a little new to programming ... how can string replacement be done with one line? Does .NET provide library functions for that? Or using reg ex?

Hate to be that guy ......

2008-02-05 Reply Admin

ahem ... ok browser refresh didn't work like I'd hoped ... sorry guys.

The Hard Way

Leave a comment on “The Hard Way”