• (cs) in reply to Vollhorst
    Vollhorst:
    Since when does Java use ASCII (by default)?
    Internally, Java uses UTF-16 for the char datatype, which is a superset of ASCII.
  • (cs)

    The biggest WTF in the implementation has not been pointed out yet: assembling the result String one character at a time using the + operator. That's one of the most well-known performance No-Nos in Java, since Strings are immutable, which makes this an O(n^2) operation.

    Also, I bet you 1:10 that 95% of all "homebrew" implementations including the original WTF and those posted in the comments do not correctly handle characters like the German ß (which is transformed into TWO characters when uppercased, thus the comments about the result "growing" in the posted Java API implementation) or the Turkish dotted and dotless i (which each have a lowercase and uppercase version).
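    To make the ß and Turkish i cases concrete, here is a minimal sketch using the standard String.toUpperCase(Locale) call. The class and method names (LocaleUpperDemo, upper) are made up for illustration; only the library calls come from the JDK.

```java
import java.util.Locale;

final class LocaleUpperDemo {
    // Uppercase with an explicit locale instead of relying on the JVM default.
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // German sharp s (U+00DF) expands to two characters, so the result grows.
        System.out.println(upper("stra\u00DFe", Locale.GERMAN));
        // Under Turkish rules, 'i' maps to dotted capital I (U+0130), not 'I'.
        System.out.println(upper("i", turkish).equals("\u0130"));
        // Under locale-neutral rules, 'i' maps to plain 'I'.
        System.out.println(upper("i", Locale.ROOT));
    }
}
```

    A homebrew if-chain over 'a' to 'z' has no way to express either behaviour, which is the point of the bet.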

  • (cs) in reply to Mate
    Mate:
    A translation table would work with any character set. The program can switch between different encodings' tables, if necessary.
    You need to key it by locale as well (Turkish does odd things with “i”) and you might also have to cope with language-specific special rules (e.g. upper-casing “ß” gives “SS” in German IIRC).

    And then there's collation rules...

  • Supermanlovespink (unregistered)

    hmmm reminds me of the days when we used to check if the letter's ASCII value fell between a certain range then added I think 64 to the value to get the uppercase... or was it the other way round...

  • (cs)

    Doing this properly requires you to have the full Unicode specification tables somewhere in memory where you can access them. Java (which this snippet was written in) doesn't need to bother about encoding because Strings are always encoded in UTF-16. All the character conversions are done by following whatever information is available in the Unicode tables (which includes a hell of a lot of transformations: uppercase, lowercase, titlecase (yes this is different in some languages), normalization, numeric value, directionality, etc...).

    So if you really wanted to, you could rewrite the toUpperCase() function with a 7-bit ASCII lookup table and it would be a lot faster than the built-in function. However, you would lose all the benefits of using UTF-16 in the first place...

    Things that seem simple at first sight, are actually quite complicated when you enter the international domain... Java Character documentation

  • Anonymous (unregistered)

    This is utterly retarded. Sure, you can argue that some built in functions can be a little hard to find or are not named as well as they could be. But if you wanted to uppercase a string surely you would immediately look for a upper/toUpper/toUpperCase method on the String class. And what do you know, it's right there. Let's hope this developer never needs to do any calculations - there's no way he'd find a cryptically named class like "Math".

    If you understand English there is no excuse for moronic re-implementations like this.

  • (cs) in reply to Supermanlovespink
    Supermanlovespink:
    hmmm reminds me of the days when we used to check if the letter's ASCII value fell between a certain range then added I think 64 to the value to get the uppercase... or was it the other way round...
    I assume you mean that all the other people who commented saying the exact same thing reminded you of that....

    EDIT: geez, I'm an angry young man today! Sorry everyone...

  • Some dude (unregistered)

    Hashmap anyone... if he wanted to code a wtf... wouldn't a hashmap make for a more elegant wtf?

  • null (unregistered)

    When will people learn how to use the else if statements? WTF!!!

  • (cs) in reply to ed
    ed:
    If we can place ourselves in this programmer's chair and assume the following:
    - We cannot find any builtin on String for converting case
    - We cannot find reliable documentation on how a String is represented internally (which encoding)

    then I think the main wtf (explicitly checking every legal character) is what most of us would resort to.

    or ?

    Is it even possible to know how to program in Java and not know about the Java API?

  • <name value="My Name"> (unregistered)

    Why are 'a' through 'z' and 'A' through 'Z' all hard-coded?

    Why don't they use a data file in XML format to supply the data (as to what lower-case characters there are, and what lower-case character corresponds to what upper-case character)?

  • Old Coder (unregistered) in reply to Guybrush Threepwood
    Guybrush Threepwood:
    There's actually a ton of WTFs in there...
    1. Usage of string concatenation instead of a StringBuilder object
    2. Calling substr() up to 26 times per loop iteration
    3. Usage of else { if (..) {} } syntax
    4. The name of the function parameter (why call it "Account" when it could be used to convert any string?)
    5. Spelling the function parameter with a capital A
    6. Two statements on the same line when there is absolutely no need for it
    7. 2-char indentation
    8. The fact that somebody got paid for writing this mess

    Did I forget something?

    How about calling the unchanging Account.length() function once for every iteration of the loop? Do it once at the beginning instead.

  • Old Coder (unregistered) in reply to Old Coder
    Old Coder:
    Guybrush Threepwood:
    There's actually a ton of WTFs in there...
    1. Usage of string concatenation instead of a StringBuilder object
    2. Calling substr() up to 26 times per loop iteration
    3. Usage of else { if (..) {} } syntax
    4. The name of the function parameter (why call it "Account" when it could be used to convert any string?)
    5. Spelling the function parameter with a capital A
    6. Two statements on the same line when there is absolutely no need for it
    7. 2-char indentation
    8. The fact that somebody got paid for writing this mess

    Did I forget something?

    How about calling the unchanging Account.length() function once for every iteration of the loop? Do it once at the beginning instead.

    Damn, I'm slow. Also, pulling the character to be tested out of the array up to ~26 times per loop. Pull it once into a local variable at the top of the loop.

  • Mark (unregistered)

    The only flaw I see is that it does not order the if statements by the relative frequency we expect to see in the language:)

  • (cs) in reply to Co Der
    Co Der:
    Hey, at least it short circuits after discovering that the letter is, say, 'm' instead of continuing to test if 'n' .. 'z'.

    And, it is nicely indented too!

    Actually, that's very badly indented. It would be much easier to read if each "if" was at the same indentation level.

  • InsanityCubed (unregistered)

    That's terrible, it doesn't even work right (drops characters that are non-lowercase). Hopefully this is from a data structures course and the student received an F.

    Re: the XML lookup table... give it a break guys, I know XML rears its head in the majority of the WTFs round these parts but that's particularly asinine.

  • Anonymous (unregistered) in reply to SuperousOxide
    SuperousOxide:
    Co Der:
    Hey, at least it short circuits after discovering that the letter is, say, 'm' instead of continuing to test if 'n' .. 'z'.

    And, it is nicely indented too!

    Actually, that's very badly indented. It would be much easier to read if each "if" was at the same indentation level.

    I really hate the Java bracing style where the opening brace of a statement is on the same line as the condition. I've come to greatly favour the .NET style of putting it on the next line, then the code block is exactly that - a nice "block" of code that starts and ends at the same indentation level.

    if (someCondition) {
        // this is not great for readability
    }
    
    if (someCondition)
    {
        // this feels a bit better
    }
    
  • (cs) in reply to InsanityCubed
    InsanityCubed:
    That's terrible, it doesn't even work right (drops characters that are non-lowercase).
    I wonder why so many people believe that. Sure, it's not included in the snippet, but is there any reason to consider the absence of the final else clause
    else {
        result += Account.substring(i,i+1);
    }
    
    even likely?
  • Buddy (unregistered) in reply to Anonymous
    Anonymous:
    I really hate the Java bracing style where the opening brace of a statement is on the same line as the condition. I've come to greatly favour the .NET style of putting it on the next line, then the code block is exactly that - a nice "block" of code that starts and ends at the same indentation level.
    if (someCondition) {
        // this is not great for readability
    }
    

    if (someCondition)
    {
        // this feels a bit better
    }

    I like the second way better, too. The first was relevant in the old VT100 days when you tried to conserve screen height to see more of the functions.

  • (cs) in reply to Anonymous
    Anonymous:
    I really hate the Java bracing style where the opening brace of a statement is on the same line as the condition. I've come to greatly favour the .NET style of putting it on the next line, then the code block is exactly that - a nice "block" of code that starts and ends at the same indentation level.
    if (someCondition) {
        // this is not great for readability
    }
    

    if (someCondition)
    {
        // this feels a bit better
    }

    Actually, I find the first style better to read. But it's just a matter of what you're used to between these two styles.

    if (someCondition)
        {
        stuff();
        }
    

    on the other hand really hurts me.

  • Bob (unregistered) in reply to Supermanlovespink
    Supermanlovespink:
    hmmm reminds me of the days when we used to check if the letter's ASCII value fell between a certain range then added I think 64 to the value to get the uppercase... or was it the other way round...

    It is subtract 32. Ascii value of capital A is 65. Ascii value of lowercase A is 97.

    97-65 is 32.

    So to convert lowercase to uppercase, you subtract 32 if it is a lowercase letter.
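    As a rough sketch of that arithmetic in Java (the class and method names are invented here, and it deliberately handles nothing beyond the 26 ASCII letters):

```java
final class AsciiUpper {
    // Uppercases only the ASCII letters a-z; every other char passes through unchanged.
    static char toUpperAscii(char c) {
        if (c >= 'a' && c <= 'z') {
            return (char) (c - 32); // 'a' is 97, 'A' is 65, so the offset is 32
        }
        return c;
    }

    // Applies the per-character rule to a whole string.
    static String toUpperAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            out.append(toUpperAscii(s.charAt(i)));
        }
        return out.toString();
    }
}
```

    The range check matters: subtracting 32 from anything that is not a lowercase letter would turn digits and punctuation into garbage.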

  • K (unregistered)

    In my travels I think I might have bumped into this programmer, except they did a custom IsNumeric checking that each char in the string was not a or not A .....

  • Buddy (unregistered) in reply to dhasenan
    dhasenan:
    The obvious way to write it is:

    auto upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    auto lower = "abcdefghijklmnopqrstuvwxyz";
    auto result = "";
    foreach (c; input) {
        if (contains (lower, c))
            result ~= upper[find(lower, c)];
        else
            result ~= c;
    }

    Lookup tables like this can also support unicode, and it's a lot easier to go in reverse.

    Except those cases where the uppercase doesn't translate into one character. Not only the German SS but those single ligatures for ff, fl, ffl, fi, ffi, and st which translate to multiple characters as uppercase.
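    A small sketch of why a one-character-in, one-character-out table cannot cover those cases. It compares the JDK's per-char Character.toUpperCase with the whole-string String.toUpperCase; the class and method names are invented for illustration.

```java
import java.util.Locale;

final class OneToManyDemo {
    // Per-character mapping: can only ever produce one char per input char.
    static String upperPerChar(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            out.append(Character.toUpperCase(s.charAt(i)));
        }
        return out.toString();
    }

    // Whole-string mapping: may return a longer string than its input.
    static String upperWhole(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String ligature = "\uFB01"; // the single-character "fi" ligature
        System.out.println(upperPerChar(ligature).length()); // still one char
        System.out.println(upperWhole(ligature));            // two plain letters
    }
}
```

    The per-char version leaves the ligature and ß untouched because there is no single uppercase character to return, which is exactly the gap a plain lookup table has.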

  • Herman (unregistered) in reply to Ilya Ehrenburg

    You're not going to tell me people actually do that? Indent the bracket itself?

  • InsanityCubed (unregistered) in reply to Ilya Ehrenburg

    Because it's not in the code snippet. One thing I have learned in 5 years as a salaried programmer is that assumptions / preconditions are rarely true and must be explicitly verified.

  • Ass Blanket (unregistered) in reply to brazzy
    brazzy:
    The biggest WTF in the implementation has not been pointed out yet: assembling the result String one character at a time using the + operator. That's one of the most well-known performance No-Nos in Java, since Strings are immutable, which makes this an O(n^2) operation.
    I believe Java works some magic and uses a StringBuffer behind the scenes for String concatenation, so it's not really that bad.
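    For reference, a minimal sketch of the two loop shapes being argued about. The names are invented and this is not the original WTF code; whether a given compiler or JIT rescues the first form is not something this sketch can settle.

```java
final class ConcatDemo {
    // Builds the result with +=; each pass produces a new String holding the old contents plus one char.
    static String copyWithPlus(String input) {
        String result = "";
        for (int i = 0; i < input.length(); i++) {
            result += input.charAt(i);
        }
        return result;
    }

    // Builds the result in one growable buffer and converts to a String once at the end.
    static String copyWithBuilder(String input) {
        StringBuilder result = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            result.append(input.charAt(i));
        }
        return result.toString();
    }
}
```

    Both return the same text; the difference is only in how many intermediate Strings get created along the way.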
  • (cs) in reply to InsanityCubed
    InsanityCubed:
    Because it's not in the code snippet. One thing I have learned in 5 years as a salaried programmer is that assumptions / preconditions are rarely true and must be explicitly verified.
    Then why assume it is not present? I agree one can't assume it is present, but in my limited experience leaving it out isn't that kind of stupidity I would expect too often. I wouldn't estimate the probability of it missing as more than 15%.
  • C.K. (unregistered)

    I actually prefer the style of

    someFunction()
        {
        // some code
        }

    since it keeps the braces on the same level as the code that they're holding.

    However if it's a short one-liner in the braces, I'll often use

    if (condition) {someFunction();}

    And yes I know that the semi-colon and the braces are redundant in 'C' style languages. I still use them.

  • WinformsC#SQLDevGirl (unregistered) in reply to hikari

    Core? probably not.

    Standard Libraries - most of them.

    The .Net ones are frickin' HUGE (and there are often many even there that do the same thing - e.g. the plethora of ways to convert an object/value of one data type to another).

  • (cs) in reply to Herman
    Herman:
    You're not going to tell me people actually do that? Indent the bracket itself?
    I'm afraid they do: http://en.wikipedia.org/wiki/Indent_style#Whitesmiths_style

    Addendum (2008-12-01 12:24): Oh, it took me a few minutes to find it again, but http://thedailywtf.com/Comments/Argument_About_Argument_Validation.aspx?pg=2#47352 started a nice flame war about indent styles, read and enjoy.

  • (cs) in reply to Buddy
    Buddy:
    Anonymous:
    I really hate the Java bracing style where the opening brace of a statement is on the same line as the condition. I've come to greatly favour the .NET style

    I like the second way better, too. The first was relevant in the old VT100 days when you tried to conserve screen height to see more of the functions.

    Yeh, there was a lot of Java programming going on in the old VT100 days, wasn't there? ;-)

    TRWTF is calling these 'java' and '.net' brace styles when they're a hell of a lot older than either. The first is usually called "K'n'R style", the second is apparently known as the "Allman" or "BSD" style (that's news to me too, I had to look it up).

    Oh, and the hideous style mentioned by Ilya in a follow-up is apparently called "Whitesmiths" style, although I also feel it should be called "urrghyuck" or some similarly guttural noise.

  • (cs) in reply to Zeal_
    Zeal_:
    With Extended Binary Coded Decimal Interchange Code, this may be the only sane approach. http://en.wikipedia.org/wiki/EBCDIC
    Uhh, no, it's not like a look-up table wouldn't work just because the chars A-Z don't occur in consecutive order.

    Actually, if you want to handle different charset encodings and everything, that's all the more reason to say that the only sane approach is to use the standard system libraries. Which handle all that stuff for you automatically.

  • SomeCoder (unregistered) in reply to Ilya Ehrenburg
    Ilya Ehrenburg:
    Anonymous:
    I really hate the Java bracing style where the opening brace of a statement is on the same line as the condition. I've come to greatly favour the .NET style of putting it on the next line, then the code block is exactly that - a nice "block" of code that starts and ends at the same indentation level.
    if (someCondition) {
        // this is not great for readability
    }
    

    if (someCondition)
    {
        // this feels a bit better
    }

    Actually, I find the first style better to read. But it's just a matter of what you're used to between these two styles.

    if (someCondition)
        {
        stuff();
        }
    

    on the other hand really hurts me.

    You may like the first style now, but wait until you inherit some code base that does this:

    if (lots of conditions that stretch clear across the freaking page and don't wrap around at all and just keep going and going and going and going) {
       some code that is also rather long and irritating;
    }
    

    It's hard to illustrate here, but basically that condition stretched off the page so the first { was NOT visible. Just a big blob of text is what it ended up looking like.

    I can see needing a lot of conditionals at times but for god's sake, try and make it readable!!

  • Anonymous (unregistered) in reply to DaveK
    DaveK:
    TRWTF is calling these 'java' and '.net' brace styles
    So the real WTF is that people use different terminology to you? No, I don't think that is a WTF at all. Making a comment about it, maybe.
  • (cs) in reply to Ass Blanket
    Ass Blanket:
    brazzy:
    The biggest WTF in the implementation has not been pointed out yet: assembling the result String one character at a time using the + operator. That's one of the most well-known performance No-Nos in Java, since Strings are immutable, which makes this an O(n^2) operation.
    I believe Java works some magic and uses a StringBuffer behind the scenes for String concatenation, so it's not really that bad.
    It can only do that for concatenation within a statement (where it wouldn't matter anyway), not for something that happens in a loop as in this case - I don't think it would be possible to ensure that semantic equivalence is preserved.
    Anonymous:
    I really hate the Java bracing style where the opening brace of a statement is on the same line as the condition. I've come to greatly favour the .NET style of putting it on the next line
    Both of these styles are far older than Java or .NET: http://en.wikipedia.org/wiki/Brace_style
    Old Coder:
    How about calling the unchanging Account.length() function once for every iteration of the loop? Do it once at the beginning instead.
    Useless premature optimization that pollutes the code with an unnecessary variable, and which any half-decent JIT compiler will handle for you anyway.
  • Anonymous (unregistered) in reply to brazzy
    brazzy:
    Both of these styles are far older than Java or .NET: http://en.wikipedia.org/wiki/Brace_style
    Would someone else like to point out the bloody obvious or are we all done now?
  • (cs) in reply to SomeCoder
    SomeCoder:
    You may like the first style now, but wait until you inherit some code base that does this:
    if (lots of conditions that stretch clear across the freaking page and don't wrap around at all and just keep going and going and going and going) {
       some code that is also rather long and irritating;
    }
    

    It's hard to illustrate here, but basically that condition stretched off the page so the first { was NOT visible. Just a big blob of text is what it ended up looking like.

    I can see needing a lot of conditionals at times but for god's sake, try and make it readable!!

    Okay, but that would require drastic measures regardless of brace style.

    if (lots of conditions
            && (that stretch clear
                  || across the freaking
                  || page and don't wrap)
            && around at all and 
            && just keep going and 
            && going and going and going) {
    

    is bad enough, but I've yet to see a condition that required worse formatting.

  • Ben (unregistered) in reply to Dave G.
    Dave G.:
    ed:
    If we can place ourselves in this programmer's chair and assume the following:
    - We cannot find any builtin on String for converting case
    - We cannot find reliable documentation on how a String is represented internally (which encoding)

    then I think the main wtf (explicitly checking every legal character) is what most of us would resort to.

    or ?

    If one truly had to implement their own method and could assume ASCII characters were being used, they would create a lookup table (array). The lookup table would be indexed by the integer value of the character to be converted to uppercase, and the contents of the lookup table at that position would be either an unmodified character for all characters which are not lowercase letters, or the uppercase character of the lowercase letter.

    Then you can simply do this:

    string[i] = uppercaseLookup[(int)string[i]];

    Characters which are numbers, punctuation etc remain the same, because uppercaseLookup[(int)'3'] = '3' and uppercaseLookup[(int)';'] = ';'. Lowercase letters are changed to uppercase letters because uppercaseLookup[(int)'a'] = 'A'.

    You would have a table of 256 characters to represent all ASCII characters. This uses a minimal amount of memory, operates with O(1) complexity, and is simple as pie to understand.

    If you wanted to be really spiffy, you could encapsulate it in a class, but if you have access to classes, you almost certainly have access to a built-in way of doing this.
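    A minimal sketch of that lookup table in Java, assuming only 8-bit character values need converting. The class name and the pass-through for characters above 255 are assumptions added here, since a Java char can hold values the 256-entry table does not cover.

```java
final class UpperTable {
    // 256 entries: one per 8-bit character value.
    private static final char[] UPPER = new char[256];

    static {
        for (int i = 0; i < UPPER.length; i++) {
            UPPER[i] = (char) i;               // default: every value maps to itself
        }
        for (char c = 'a'; c <= 'z'; c++) {
            UPPER[c] = (char) (c - 'a' + 'A'); // lowercase letters map to uppercase
        }
    }

    static String toUpper(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c < UPPER.length) {            // values outside the table are left unchanged
                chars[i] = UPPER[c];
            }
        }
        return new String(chars);
    }
}
```

    Each character costs one array read, and digits and punctuation fall out for free because their table entries point back at themselves.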

    Here in the real world account numbers don't tend to have language specific latin characters in them.

    If your client wants to include ë or ß or Σ in an account number, they are [blank or blanks].

    Advise them against it.

    If they insist, insist on advising them against it (maybe referring them to this website).

    If they still insist, do it, take the money, run (leaving appropriate apology in the code for the poor soul who has to replace you).

  • (cs) in reply to DaveK
    DaveK:
    Oh, and the hideous style mentioned by Ilya in a follow-up is apparently called "Whitesmiths" style, although I also feel it should be called "urrghyuck" or some similarly guttural noise.
    What a fitting name.
  • (cs) in reply to Anonymous
    Anonymous:
    brazzy:
    Both of these styles are far older than Java or .NET: http://en.wikipedia.org/wiki/Brace_style
    Would someone else like to point out the bloody obvious or are we all done now?
    We're still a long way removed from the bloody flame war this topic demands. Or should we switch to Linux vs. Windows to get more action?
  • SomeCoder (unregistered) in reply to Ilya Ehrenburg
    Ilya Ehrenburg:
    SomeCoder:
    You may like the first style now, but wait until you inherit some code base that does this:
    if (lots of conditions that stretch clear across the freaking page and don't wrap around at all and just keep going and going and going and going) {
       some code that is also rather long and irritating;
    }
    

    It's hard to illustrate here, but basically that condition stretched off the page so the first { was NOT visible. Just a big blob of text is what it ended up looking like.

    I can see needing a lot of conditionals at times but for god's sake, try and make it readable!!

    Okay, but that would require drastic measures regardless of brace style.

    if (lots of conditions
            && (that stretch clear
                  || across the freaking
                  || page and don't wrap)
            && around at all and 
            && just keep going and 
            && going and going and going) {
    

    is bad enough, but I've yet to see a condition that required worse formatting.

    Yeah, I know, but in this case putting the brace on the next line would have done wonders for readability all by itself, never mind the fact that all the conditions should have been separated out better.

    I've managed to show the coder who wrote that originally the light as far as formatting goes though :)

  • (cs) in reply to Anonymous
    Anonymous:
    DaveK:
    TRWTF is calling these 'java' and '.net' brace styles
    So the real WTF is that people use different terminology to you? No, I don't think that is a WTF at all. Making a comment about it, maybe.
    Perhaps English isn't your first language? In English, there is a semantic difference between "The Java brace style" and "The brace style used by Java". The first form has a possessive implication that the brace style in question "belongs to" Java, i.e. that it was originated by/for/in Java. So calling it "the Java brace style" is a bit like saying "In 1492, Christopher Columbus landed in G. W. Bush's America" just because he was the most recent one to run it when you were saying that.
  • (cs) in reply to brazzy
    brazzy:
    Anonymous:
    brazzy:
    Both of these styles are far older than Java or .NET: http://en.wikipedia.org/wiki/Brace_style
    Would someone else like to point out the bloody obvious or are we all done now?
    We're still a long way removed from the bloody flame war this topic demands. Or should we switch to Linux vs. Windows to get more action?
    's OK, I just mentioned politics.

    Flame war starting in 3 ... 2 ... 1 ...

  • Anonymous (unregistered) in reply to DaveK
    DaveK:
    Anonymous:
    DaveK:
    TRWTF is calling these 'java' and '.net' brace styles
    So the real WTF is that people use different terminology to you? No, I don't think that is a WTF at all. Making a comment about it, maybe.
    Perhaps English isn't your first language? In English, there is a semantic difference between "The Java brace style" and "The brace style used by Java". The first form has a possessive implication that the brace style in question "belongs to" Java, i.e. that it was originated by/for/in Java. So calling it "the Java brace style" is a bit like saying "In 1492, Christopher Columbus landed in G. W. Bush's America" just because he was the most recent one to run it when you were saying that.
    I'm just sighing heavily here Dave, wishing cancer upon your family. Kernel Normal Form is the single most common style of bracing used in Java. That means that referring to it as the "Java bracing style" is both accurate and proper English. What you've written about Bush doesn't even apply to this scenario. Now you'd better go check on your family because the cancer will be setting in.
  • (cs) in reply to Code Dependent
    Code Dependent:
    rt:
    What does your function look like?
    Well, back in my Assembler days, it would have been ANDA $DF. But that's for ascii only.
    And English only. And it's not a function. So, it doesn't actually meet the requirements at all. You total cowboy!

    Those requirements again:

    rt:
    write an uppercase transformation function? Come up with some approach.

    Now...

    Make sure it works with non-English characters (e.g. accented letters).

    Make sure it works with non-ascii (ANSI) string encodings (i.e. it works with UTF-xx, UCS-xx, etc).

    Optimize.

    What does your function look like?

    Mine looks like this(*):
    private static String upperCaseIt(String Account) {
      return Account.toUpperCase ();
    }
    
    Let the system libraries take care of that ugly business with locales and with translating charset encodings to/from UTF-/UCS-whatever on input and output - it's what they're there for. Anything else is a WTF composed of 50% NIH syndrome and 50% reinvented wheels.

    (*) - modulo any minor syntax errors, I don't actually speak Java but the intent should be clear enough.

  • (cs) in reply to brazzy
    brazzy:
    We're still a long way removed from the bloody flame war this topic demands. Or should we switch to Linux vs. Windows to get more action?
    No, too artificial. But we have a horrible Java snippet with all kinds of inefficiencies. So perhaps I can start the flames by saying that TRWTF is Java, Java performance sucks?
  • (cs) in reply to Anonymous
    Anonymous:
    I'm just sighing heavily here Dave, wishing cancer upon your family. Kernel Normal Form is the single most common style of bracing used in Java.
    I'm LOLing heavily, at you. I wish your family well. And KNF is an irrelevant red-herring, since it's not what was being referred to as "the Java brace style".
  • CynicalTyler (unregistered) in reply to Ilya Ehrenburg
    Ilya Ehrenburg:
    SomeCoder:
    You may like the first style now, but wait until you inherit some code base that does this:
    if (lots of conditions that stretch clear across the freaking page and don't wrap around at all and just keep going and going and going and going) {
       some code that is also rather long and irritating;
    }
    

    Okay, but that would require drastic measures regardless of brace style.

    if (lots of conditions
            && (that stretch clear
                  || across the freaking
                  || page and don't wrap)
            && around at all and 
            && just keep going and 
            && going and going and going) {
    

    is bad enough, but I've yet to see a condition that required worse formatting.

    Which is why if you have an if test that is that big you should make it its own method.
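    Something like the following sketch, where the big test gets a name of its own. Every identifier and threshold here is hypothetical and invented purely to illustrate the refactoring.

```java
final class OrderRules {
    // Hypothetical predicate: the long condition lives here, one clause per line.
    static boolean isEligibleForDiscount(int itemCount, double total,
                                         boolean isMember, boolean hasCoupon) {
        return itemCount > 0
                && (isMember || hasCoupon)
                && total >= 50.0;
    }

    // The caller's if statement now fits on one short line, whatever the brace style.
    static double priceAfterDiscount(int itemCount, double total,
                                     boolean isMember, boolean hasCoupon) {
        if (isEligibleForDiscount(itemCount, total, isMember, hasCoupon)) {
            return total * 0.9;
        }
        return total;
    }
}
```

    The method name documents what the condition means, so the reader no longer has to decode it at the call site.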

  • MindChild (unregistered) in reply to SImon

    Worse. Code formatting. Ever.

  • Michael B (unregistered) in reply to dhasenan
    dhasenan:
    The obvious way to write it is:

    auto upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    auto lower = "abcdefghijklmnopqrstuvwxyz";
    auto result = "";
    foreach (c; input) {
        if (contains (lower, c))
            result ~= upper[find(lower, c)];
        else
            result ~= c;
    }

    Lookup tables like this can also support unicode, and it's a lot easier to go in reverse.

    let char_to_upper = function
      | 'a' -> 'A'
      | 'b' -> 'B'
      | 'c' -> 'C'
      ...
      | 'y' -> 'Y'
      | 'z' -> 'Z'
      | other -> other

    let string_to_upper s = String.map char_to_upper s

    Good: Very simple and readable, very fast, character-set agnostic. Bad: Wordy.
