• (cs) in reply to AdT
    AdT:
    (It's quite amusing how Java is advertised as being so much simpler than C++, yet the simple issue of efficient string concatenation spawns large controversial discussions. In C++, you can just use std::string's efficient += operator.)

    Oh snap, now you've done it. You've opened up the can of worms...

  • S11 (unregistered) in reply to John Doe
    John Doe:
    Something like this:
    <MyTokenList>
      <MyToken>PartA</MyToken>
      <MySeparator>_</MySeparator>
      <MyToken>PartB</MyToken>
      <MySeparator>_</MySeparator>
      <MyToken>PartC</MyToken>
    </MyTokenList>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>
      <xsl:template match="MyTokenList">
        <xsl:copy-of select="MyToken/text()|MySeparator/text()"/>
      </xsl:template>
    </xsl:stylesheet>
  • Khim (unregistered) in reply to Volmarias

    Java 1.0 has String.replace - it's easy to see: there is no "Since:" line, and Java's documentation is VERY accurate: toLowerCase - 1.1, compareToIgnoreCase - 1.2, etc. It's all listed there.

  • Khim (unregistered) in reply to Volmarias
    adT:
    (It's quite amusing how Java is advertised as being so much simpler than C++, yet the simple issue of efficient string concatenation spawns large controversial discussions. In C++, you can just use std::string's efficient += operator.)

    This is joke, right ? Pleeease, say you are joking. Please. Don't make me say you are full of bullshit.

    Why ? It's simple: std::string's operator+= is not efficient, sorry. Standard does not say anything and some implementations indeed are optimized for short strings and have O(n) operator += so if you'll try to use it for long strings you'll get atrocious speed. If you want guaranteed efficiency you'll use std::vector<char> and then convert it to std::string... exactly like in Java!

    Not the simplest place in C++...

  • (cs) in reply to Anon
    Anon:
    Sorry I'm a little new to programming ... how can string replacement be done with one line? Does .NET provide library functions for that? Or using reg ex?

    Hate to be that guy ......

    Yes, e.g. in .NET (example in C#) you would do something like

    String myString = "xyz xyz";

    myString = myString.Replace(" ", "_");

    myString should now have a value of "xyz_xyz";

  • dkf (unregistered) in reply to java.lang.WTFException
    java.lang.WTFException:
    you mean StringBuilder right?
    I would have done if I hadn't been busy doing two or three things at the same time as posting here. Or if I'd spent another 5 seconds reading links on the page I referenced. D'oh! (What can I say? I learned Java back in 1.0 days...)
    java.lang.WTFException:
    anyone who ends up calling synchronized methods all the time has no business calling other peoples code slow.
    It's still a linear algorithm, and getting the algorithm right remains the #1 task of anyone optimizing some code, since compilers are almost universally terrible at doing that for you. (A very good Java compiler might possibly improve this specific case, but in that case the compiler author has put in special code to check for this sort of WTF, which would be something of a WTF in itself...)
  • fletch (unregistered) in reply to John Doe

    And just for the hell of it, here it is (pretty much, skimped on the namespaces) :-P

    <xsl:stylesheet>
      <xsl:template match='MyToken'>
        <xsl:value-of select='.' />
      </xsl:template>
      <xsl:template match='MySeparator'>
        <xsl:text>_</xsl:text>
      </xsl:template>
    </xsl:stylesheet>

    There, it wasn't that hard, now was it? May still be performance crippling, but at least it seems pretty darn clear to me what it does at least ;-)

  • (cs) in reply to Khim
    Khim:
    adT:
    (It's quite amusing how Java is advertised as being so much simpler than C++, yet the simple issue of efficient string concatenation spawns large controversial discussions. In C++, you can just use std::string's efficient += operator.)

    This is joke, right ? Pleeease, say you are joking. Please. Don't make me say you are full of bullshit.

    Why ? It's simple: std::string's operator+= is not efficient, sorry. Standard does not say anything and some implementations indeed are optimized for short strings and have O(n) operator += so if you'll try to use it for long strings you'll get atrocious speed. If you want guaranteed efficiency you'll use std::vector<char> and then convert it to std::string... exactly like in Java!

    Not the simplest place in C++...

    Yes. this is joke.

    Not that anybody sane worries too much about efficiency these days, but in those corner cases where you actually need a mutable string (this being, I think, the point of AdT's comment), you always have the option in C++ of dropping down into C.

    Excuse me?

    vector<char> v = <snip/>; string s = v... something ... hmmm ... I'll just look it up.

    Probably involves an STL algorithm. Yup, that's a good idea.

    In what way is this exactly like Java? And why would you want to do it in the first place?

  • (cs) in reply to Anon
    Anon:
    Sorry I'm a little new to programming ... how can string replacement be done with one line? Does .NET provide library functions for that? Or using reg ex?

    Hate to be that guy ......

    (a) by using one line (not that using one line is at all important) (b) yes (c) very probably.

    Please don't address questions like this to comp.wtf.java.howto. We have better things to do here. What you actually want is comp.wtf.java.hownotto. You should know this. We only have a limited amount of patience.

    Anyway, on with the show ...

  • Cronus (unregistered)

    I must be missing something here folks. Now I know that putting in a huge string would slow down the entire process, but for placing an underscore between tokens, this doesn't seem that bad. (Other than the need for "Count")

    Now I've read people voicing regex and Replace, but no examples to back it up. Please tell me that you aren't saying that they should replace that block of code with replace(" ", "_");?

    So if you had a string with " hi there world" and used the replace you would get "___hi__there_world" which isn't what was done for the code block.

    The use of the tokenizer was to make sure that it would remove the spaces between characters. " hi there world" --> "hi_there_world" (As long as the delimiter was " ", which of course it defaulted to).

    So I may be fairly new to the programming scene, but I would love to see professional code to replace said tokenizer. (Just for my own knowledge if nothing more).
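
    A minimal Java sketch of the difference described above. The class and method names are made up for illustration, and tokenizerJoin is only an assumed approximation of the article's code, which is not reproduced in this thread. The input restores the leading and doubled spaces that the forum software stripped from the quoted strings.

```java
import java.util.StringTokenizer;

final class SpaceReplaceDemo {
    // Plain character replacement: every single space becomes an underscore.
    static String plainReplace(String s) {
        return s.replace(' ', '_');
    }

    // Tokenizer-style join: runs of spaces collapse to one underscore,
    // and leading or trailing spaces disappear entirely.
    static String tokenizerJoin(String s) {
        StringTokenizer st = new StringTokenizer(s, " ");
        StringBuilder sb = new StringBuilder();
        while (st.hasMoreTokens()) {
            sb.append(st.nextToken());
            if (st.hasMoreTokens()) {
                sb.append('_');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "   hi  there world";
        System.out.println(plainReplace(input));   // ___hi__there_world
        System.out.println(tokenizerJoin(input));  // hi_there_world
    }
}
```

    So a bare replace is not a drop-in substitute; anything that replaces the tokenizer has to collapse runs of spaces and drop the leading and trailing ones.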

  • Cronus (unregistered) in reply to Cronus
    Cronus:
    So if you had a string with " hi there world" and used the replace you would get "___hi__there_world" which isn't what was done for the code block.

    Seems to have snipped off the beginning 3 spaces that I had put into the quotes... oh well, you get the picture anyway.

  • PSpeed (unregistered) in reply to Cronus
    Cronus:
    So I may be fairly new to the programming scene, but I would love to see professional code to replace said tokenizer. (Just for my own knowledge if nothing more).

    I'll bite... and probably regret it. ;)

    First and easiest way:

    public static String replaceSpaceWithUnderscore(String str){
      return str.replaceAll( " +", "_" );
    }

    Which has been mentioned a few times and has been in since JDK 1.4. It's nice because if tomorrow you decided you wanted all whitespace replaced:

    return str.replaceAll( "\\s+", "_" );

    And if you really really had to implement it manually. Lay off StringTokenizer, lay off string += type stuff in the inner loop, etc..

    This is a quick-and-dirty example which should perform better than the original in all cases but could be optimized further if it were really needed:

    public static String replaceSpaceWithUnderscore(String str)
    {
        if( str == null || str.length() == 0 )
            return str;

        char[] buff = str.toCharArray();

        StringBuffer sb = new StringBuffer();
        // StringBuilder if you can get away with it

        for( int i = 0; i < buff.length; i++ ) {
            if( buff[i] != ' ' ) {
                sb.append( buff[i] );
            } else if( i == 0 || buff[i-1] != ' ' ) {
                sb.append( '_' );
            }
        }
        return sb.toString();
    }

    Or something like that. I haven't compiled or unit-tested it. ;)

  • Emiel (unregistered) in reply to John Doe

    It does NOT do exactly what it says on the tin. The correct function name should be replaceSpacesWithUnderscore! It replaces one or more spaces with one underscore. So this function can only be replaced with a standard String function using a regular expression, which has only been available since Java version 1.4.

  • Anonymous (unregistered)

    That's why they invented something like the java API...

  • (cs) in reply to real_aardvark
    real_aardvark:
    Yes. this is joke.

    Not that anybody sane worries too much about efficiency these days, but in those corner cases where you actually need a mutable string (this being, I think, the point of AdT's comment), you always have the option in C++ of dropping down into C.

    Excuse me?

    vector<char> v = <snip/>; string s = v... something ... hmmm ... I'll just look it up.

    Probably involves an STL algorithm. Yup, that's a good idea.

    In what way is this exactly like Java? And why would you want to do it in the first place?

    It's exactly like Java in that for those corner cases where you actually need a mutable string, you always have the option in Java of using StringBuffer/StringBuilder. Or even a char array.

    The difference is mainly that there are a lot of naive Java programmers who have little understanding of what happens "beneath the hood" and thus aren't aware of the option and when it should be used, whereas it's much harder in C++ to get by without such knowledge, and most C++ programmers have started out with or at least sometimes work with pure C where such knowledge is essential to get anything at all done.
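
    For anyone who hasn't met those options, a rough sketch of both (class and method names are invented for this example). Each version mutates a buffer in place and creates a String only once, at the end.

```java
final class MutableStringDemo {
    // StringBuilder supports in-place edits; no new String per change.
    static String withBuilder(String s) {
        StringBuilder sb = new StringBuilder(s);
        for (int i = 0; i < sb.length(); i++) {
            if (sb.charAt(i) == ' ') {
                sb.setCharAt(i, '_');
            }
        }
        return sb.toString();
    }

    // The same idea with a raw char array.
    static String withCharArray(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == ' ') {
                chars[i] = '_';
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(withBuilder("a b c"));    // a_b_c
        System.out.println(withCharArray("a b c"));  // a_b_c
    }
}
```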

  • Hognoxious (unregistered) in reply to Snoop
    Snoop:
    The person who wrote that had probably never programmed in Java before. They at least had the decency to wrap their code in a logically named function that could be modified later if and when necessary. Of course, the repeated creation of strings does suck, but when you have a deadline to make it's not the kind of thing that would be a top priority.
    I was debugging something recently and I saw ten lines of code used to replace a comma with a period (European decimal -> English). I say ten lines, it was ten lines each time it was used...

    But, the name of the function in this case would have to be changed if the logic changed. So if it had to replace spaces or tabs with underscores, you'd have to change its name. If it had been named in a different way, such as tidyFilename or formatPhonenumber (or whatever it does) you wouldn't.

  • Stu (unregistered) in reply to John Doe
    John Doe:
    WTF? He didn't temporarily store the tokens in XML, interspersed with the underscores? That way he would be able to get the final string with one giant, performance crippling swoop of XSLT. That way it's really enterprisey, and customers will better make use of their oversized and overpriced hardware, thereby justifying the cost.

    Something like this:

    <MyTokenList>
      <MyToken>PartA</MyToken>
      <MySeparator>_</MySeparator>
      <MyToken>PartB</MyToken>
      <MySeparator>_</MySeparator>
      <MyToken>PartC</MyToken>
    </MyTokenList>

    The XSLT:

    <xsl:stylesheet>

    <!-- Left as an exercise to the reader -->

    </xsl:stylesheet>

    Captcha: minim. WTF? Maxim would be better here!

    How about some of this http://aspn.activestate.com/ASPN/Cookbook/XSLT/Recipe/65426

  • (cs)

    There is a subtle difference between this and String.replace(char, char) when it comes to multiple spaces in a row (not the REAL wtf, but still).

  • (cs) in reply to brazzy
    brazzy:
    real_aardvark:
    Yes. this is joke.

    Not that anybody sane worries too much about efficiency these days, but in those corner cases where you actually need a mutable string (this being, I think, the point of AdT's comment), you always have the option in C++ of dropping down into C.

    Excuse me?

    vector<char> v = <snip/>; string s = v... something ... hmmm ... I'll just look it up.

    Probably involves an STL algorithm. Yup, that's a good idea.

    In what way is this exactly like Java? And why would you want to do it in the first place?

    It's exactly like Java in that for those corner cases where you actually need a mutable string, you always have the option in Java of using StringBuffer/StringBuilder. Or even a char array.

    The difference is mainly that there are a lot of naive Java programmers who have little understanding of what happens "beneath the hood" and thus aren't aware of the option and when it should be used, whereas it's much harder in C++ to get by without such knowledge, and most C++ programmers have started out with or at least sometimes work with pure C where such knowledge is essential to get anything at all done.

    Interesting; but how, exactly, would you convert an STL vector into a string? Efficiently, that is. And why would you design a system where this conversion would be necessary?

    C'mon, we've both come across ludicrous designs like this in C++. Those corner cases in Java? They tend to be occupied by loons attempting to convert one container into another, with no Buffer/Builder experience whatsoever. All you can hope for is 20 lines that do the job of one, more or less efficiently. As I say, Java drones. I'm not being rude about Java, I'm just postulating that 90% of practitioners are drones.

    BTW, I always thought that making the default string immutable in Java was a bit of a cop-out. You want a functional language, you design a functional language. Java is not functional.

  • Peter (unregistered)

    Anyone who has taken a college level course in Java could tell you that this is the "right" way to do it. I recently failed a test question which asked us to do almost the same thing (it was actually removing an HTML tag) because I used a regex instead of doing it the "correct" way and removing the tag one character at a time by splitting the string into three parts etc. etc. Apparently our class was only "authorized" to use Java 1.1.8 commands because that was the "most stable" version, even though the current version of the JRE was 1.6

    This code would have garnered a student full credit in an assignment because it shows their willingness to create a completely overcomplicated solution to a simple problem.

  • Orclev (unregistered) in reply to Peter
    Peter:
    Anyone who has taken a college level course in Java could tell you that this is the "right" way to do it. I recently failed a test question which asked us to do almost the same thing (it was actually removing an HTML tag) because I used a regex instead of doing it the "correct" way and removing the tag one character at a time by splitting the string into three parts etc. etc. Apparently our class was only "authorized" to use Java 1.1.8 commands because that was the "most stable" version, even though the current version of the JRE was 1.6

    This code would have garnered a student full credit in an assignment because it shows their willingness to create a completely overcomplicated solution to a simple problem.

    Being forced to use 1.1.8 when 1.6 was out is the real WTF.

    I have yet to see anyone respond to my previous comment. At least as of JDK5 the java compiler will optimize string literal manipulations into StringBuilder calls, so it really doesn't matter all that much how you write it due to the compiler cleaning it up for you. Admittedly relying on the compiler to handle it is probably a WTF in itself, but I still don't see how the test results given previously could possibly be accurate on any current JDK (at least the Sun one does the optimization, not sure about others).

  • (cs) in reply to real_aardvark
    real_aardvark:
    C'mon, we've both come across ludicrous designs like this in C++. Those corner cases in Java? They tend to be occupied by loons attempting to convert one container into another, with no Buffer/Builder experience whatsoever. All you can hope for is 20 lines that do the job of one, more or less efficiently. As I say, Java drones. I'm not being rude about Java, I'm just postulating that 90% of practitioners are drones.
    Sturgeon's Law reigns supreme everywhere.
    real_aardvark:
    BTW, I always thought that making the default string immutable in Java was a bit of a cop-out. You want a functional language, you design a functional language. Java is not functional.
    Well, it's a cop-out in that it avoids a lot of nastiness that can happen otherwise, at IMO a small price. If you need mutable strings, you can have them; it's just not the default.

    A main reason for that design decision is probably that Java was conceived from the beginning as a multi-threaded environment, where the advantages of immutable data types are even more pronounced than otherwise. I really don't want to think about all the extra race condition potential you'd have with mutable strings as the default.

    Peter:
    Apparently our class was only "authorized" to use Java 1.1.8 commands because that was the "most stable" version, even though the current version of the JRE was 1.6
    Stable, my ass. Java 1.1 used to be what one restricted oneself to because that was what came preinstalled with Windows, avoiding the need for users to download a JRE. They most likely just clung to the obsolete restriction without remembering why. Nowadays, there is no Java preinstalled with Windows at all, but most Java apps are not written for end users anyway, and downloading a JRE is a non-issue with broadband connections.
    Orclev:
    I have yet to see anyone respond to my previous comment. At least as of JDK5 the java compiler will optimize string literal manipulations into StringBuilder calls, so it really doesn't matter all that much how you write it due to the compiler cleaning it up for you.
    Wrong. At least as of Sun's JDK 1.6.0_04 (the most current one) and unless you know any secret compiler options, the compiler will only replace the += inside the loop with a StringBuilder - but one that is created temporarily to append the two portions and then abandoned after calling toString() on it. The compiler does NOT (and, I am pretty certain cannot) fix the basic problem of quadratic running time, which requires one StringBuilder created outside the loop and used throughout the iterations.
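
    To make the difference concrete, here is a small sketch of the two shapes being discussed. The names are invented for illustration, and the complexity notes are the usual reasoning about copying, not measurements.

```java
final class ConcatDemo {
    // += inside the loop: every iteration produces a fresh String by copying
    // everything built so far, so total copying grows quadratically.
    static String joinNaive(String[] parts) {
        String result = "";
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                result += "_";
            }
            result += parts[i];
        }
        return result;
    }

    // One StringBuilder created outside the loop and reused throughout:
    // each character is appended once, so total work is linear (amortized).
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append('_');
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = { "hi", "there", "world" };
        System.out.println(joinNaive(parts));        // hi_there_world
        System.out.println(joinWithBuilder(parts));  // hi_there_world
    }
}
```

    Both return the same string; only the amount of copying differs, which is why it goes unnoticed on short inputs.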
  • Cronus (unregistered) in reply to PSpeed
    PSpeed:
    First and easiest way:
    public static String replaceSpaceWithUnderscore(String str){
      return str.replaceAll( " +", "_" );
    }
    

    I'm going to assume that you used

    str.trim();

    (or something similar) prior to sending the string to underscore function, seeing that the original "programmer" wanted only underscores in between tokens and not something like "_Hello_World!_"
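
    A sketch of what that combination might look like (the class name is made up). Two caveats: trim() returns a new string instead of modifying str, so its result has to be used, and it strips every character up to U+0020 (tabs and newlines included), not just spaces.

```java
final class TrimThenReplace {
    // Trim first, then collapse each run of spaces into a single underscore.
    static String replaceSpacesWithUnderscore(String str) {
        if (str == null) {
            return null;
        }
        return str.trim().replaceAll(" +", "_");
    }

    public static void main(String[] args) {
        System.out.println(replaceSpacesWithUnderscore(" Hello  World! "));  // Hello_World!
    }
}
```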
  • Steve (unregistered)

    Isn't String.replaceAll() a relatively new feature of Java? The docs say it came in version 1.4. I seem to recall having to resort to a lot of ugly kluges like that in the early days.

  • PinkFloyd43 (unregistered) in reply to John Doe

    Right code some xsl/regex, would rather see crappy VB code!

  • (cs)

    Of course, the real WTF would be someone replacing the existing inelegant (but working) solution with one of the much tidier ones for no better reason than 'it looks nicer'.

    I've lost count of the number of times someone's done this in some code and managed to introduce a nice new subtle bug.

  • (cs) in reply to hobart
    hobart:
    Of course, the real WTF would be someone replacing the existing inelegant (but working) solution with one of the much tidier ones for no better reason than 'it looks nicer'.
    It's called "refactoring" and generally considered essential for long-term maintainability.
  • (cs) in reply to brazzy
    brazzy:
    real_aardvark:
    BTW, I always thought that making the default string immutable in Java was a bit of a cop-out. You want a functional language, you design a functional language. Java is not functional.
    Well, it's a cop-out in that it avoids a lot of nastiness that can happen otherwise, at IMO a small price. If you need mutable strings, you can have them; it's just not the default.

    A main reason for that design decision is probably that Java was conceived from the beginning as a multi-threaded environment, where the advantages of immutable data types are even more pronounced than otherwise. I really don't want to think about all the extra race condition potential you'd have with mutable strings as the default.

    I can't say I'm terribly impressed by Java's "multi-threaded environment." It's a bit clunky, isn't it?

    And I'm far from persuaded that it's worth picking out strings for immutability -- 20 years of C/C++ (admittedly a disgusting tool for multi-threading) have yet to provide me with a case where mutable strings might cause a race condition. I'm sure your mileage differs.

    All I'm suggesting is that if you want the benefits of a functional programming language, you use a functional programming language. Or possibly a compromise, such as Scala. Preferably not some mutant beast like Java, which is then adopted by nine out of ten of Ted Sturgeon's evil brothers to spread chaos and stupidity through the world.

  • (cs) in reply to Andrew
    Andrew:
    dkf:
    Sebastian:
    Not a complete WTF, in the sense that it does more than a simple replace. Several spaces are replaced with only a single underscore. He could have used a regexp, maybe.
    But I shudder at the implementation anyway, as it is severely lacking in the StringBuffer department. All that complexity, and it's going to be very slow on anything at all non-trivial. I know some people like to say that regexps are never the answer, but in this case they'll give something better (and more efficient) than this for sure. For example...
    public static String replaceSpaceWithUnderscore(String str) {
        return (str==null) ? null : str.replaceAll(" +", "_");
    }
    

    Anyone who says regular expressions are never the answer isn't asking very interesting questions.

    Speaking of which

    Can't believe I'm the first one to link to XKCD :)

  • Ed (unregistered)

    I've even gone so far as to write things like this when heavily used replacements start showing up as a major drain because of the regex processing.

    But in those cases I'd be using a string builder, so god knows.
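
    One intermediate step worth trying before hand-rolling the loop, sketched here with invented names: String.replaceAll compiles its regex on every call, so caching a compiled Pattern removes that part of the cost. Whether it is enough depends on the workload, so treat it as something to measure, not a guaranteed fix.

```java
import java.util.regex.Pattern;

final class PrecompiledReplace {
    // Compiled once and reused; Pattern instances are safe to share between threads.
    private static final Pattern SPACES = Pattern.compile(" +");

    static String replaceSpacesWithUnderscore(String str) {
        return SPACES.matcher(str).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(replaceSpacesWithUnderscore("hi  there world"));  // hi_there_world
    }
}
```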

  • Mike B. (unregistered) in reply to Volmarias

    Regular expressions weren't included in Java (standard) until the 1.4 release.

  • dawn (unregistered) in reply to John Doe
    John Doe:
    WTF? He didn't temporarily store the tokens in XML, interspersed with the underscores? That way he would be able to get the final string with one giant, performance crippling swoop of XSLT. That way it's really enterprisey, and customers will better make use of their oversized and overpriced hardware, thereby justifying the cost.

    Something like this:

    <MyTokenList>
      <MyToken>PartA</MyToken>
      <MySeparator>_</MySeparator>
      <MyToken>PartB</MyToken>
      <MySeparator>_</MySeparator>
      <MyToken>PartC</MyToken>
    </MyTokenList>

    The XSLT:

    <xsl:stylesheet>

    <!-- Left as an exercise to the reader -->

    </xsl:stylesheet>

    Captcha: minim. WTF? Maxim would be better here!

  • Brendan (unregistered) in reply to Volmarias

    Rubbish, even if they implemented String Tokenizing prior to something as trivial as a character replacement function (which I find impossible to believe), there are still ways of doing this in a low-level fashion (i.e. treating the String as an array, which it is) which wouldn't take any more than a few lines.

    This is actually a prime example of somebody who seriously has no idea how to do what they were asked and has simply hacked something crude together.
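
    A rough sketch of that low-level approach, assuming the goal is the tokenizer's behaviour (single underscores between words, nothing at either end). The class name is hypothetical; it relies only on String.toCharArray and StringBuffer, so it needs none of the 1.4 regex support.

```java
final class LowLevelReplace {
    static String replaceSpacesWithUnderscore(String str) {
        char[] chars = str.toCharArray();
        StringBuffer sb = new StringBuffer(chars.length);
        // True once spaces have been seen after at least one word.
        boolean pendingSeparator = false;
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == ' ') {
                // Leading spaces never set the flag because sb is still empty.
                pendingSeparator = sb.length() > 0;
            } else {
                if (pendingSeparator) {
                    sb.append('_');
                    pendingSeparator = false;
                }
                sb.append(chars[i]);
            }
        }
        // A trailing run of spaces leaves the flag set but emits nothing.
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceSpacesWithUnderscore("  hi  there world "));  // hi_there_world
    }
}
```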
