• (cs)

    Should the author of this ever come back to look at it again, I wonder whether he'd be ashamed of his handiwork, or proud.

    He'd probably say, "Fist!"

  • captain obvious (unregistered)

    What percentage of code WTF's are re-coding functions that already exist in the langauge's core?

  • (cs)
  • (cs) in reply to captain obvious
    captain obvious:
    What percentage of code WTF's are re-coding functions that already exist in the langauge's core?

    Quite a high percentage, I should think. Maybe even a majority.

  • Kerio (unregistered) in reply to Code Dependent
    Code Dependent:
    Should the author of this ever come back to look at it again, I wonder whether he'd be ashamed of his handiwork, or proud.

    He'd probably say, "Fist!"

    no, he would say "FIST"

  • Co Der (unregistered)

    Hey, at least it short circuits after discovering that the letter is, say, 'm' instead of continuing to test if 'n' .. 'z'.

    And, it is nicely indented too!

  • Bernie (unregistered)

    Was the code really indented like that?

  • FDF (unregistered)

    If I wanted to write such an abomination of a code, I'd write a script :)

  • rt (unregistered)

    That's... interesting. It really is though I think I find it interesting in a different way than you do.

    Let's start in a different way: how would YOU write an uppercase transformation function? Come up with some approach.

    Now...

    Make sure it works with non-English characters (e.g. accented letters).

    Make sure it works with non-ascii (ANSI) string encodings (i.e. it works with UTF-xx, UCS-xx, etc).

    Optimize.

    What does your function look like?

  • (cs)

    If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).

  • Grovesy (unregistered) in reply to hikari
    hikari:
    If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).

    Pretty big assumption, as most modern frameworks support all manner of encodings.

  • Dave G. (unregistered)

    The good thing about this method is that he specifically wrote it to work on account strings. If the account string format ever changes, he has only one place to update the code, and all account strings will be fixed.

    Furthermore, other strings which may need converting to upper case can be converted in their own separate methods, thereby increasing maintanability by reducing coupling. It would be a real mess if address strings and account strings were handled by the same code, for example. Changing one would inadvertently change the other.

    The creation of many string instances is not a technique I use often, but it can be useful if you need to go back in time and get a string that was halfway through processing. This can be used to display updates to the user in a progress bar type display. A flag that activates this behvaiour should be added to this method to improve its usefulness. This is really a minor point though.

  • Guybrush Threepwood (unregistered)

    There's actually a ton of WTFs in there...

    1. Usage of string concatenation instead of a StringBuilder object
    2. Calling substr() up to 26 times per loop iteration
    3. Usage of else { if (..) {} } syntax
    4. The name of the function parameter (why call it "Account" when it could be used to convert any string?)
    5. Spelling the function parameter with a capital A
    6. Two statements on the same line when there is absolutely no need for it
    7. 2-char indentation
    8. The fact that somebody got paid for writing this mess

    Did I forget something?

  • (cs) in reply to rt
    rt:
    What does your function look like?
    Well, back in my Assembler days, it would have been ANDA $DF. But that's for ascii only.
  • Brian (unregistered)

    Obviously this is pretty bad code to start with but doesn't this code also drop any upper case letters in the Account instead of copying them through? or is the .equals() method case-insensitive? or is there another set of .equals("A"),.equals("B")... that was snipped?

  • (cs) in reply to Grovesy
    Grovesy:
    Pretty big assumption, as most modern frameworks support all manner of encodings.

    They only support encoding to / decoding from different encodings and Unicode. The string is represented internally in a Unicode format (usually UTF-16). Which makes it a lot easier.

  • CaRL (unregistered) in reply to hikari
    hikari:
    If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).
    Given a sane character set like ASCII, that's exactly what you'd do.

    If you have to support a bloody gawdawful mess of character sets like seems to be the fad lately (although we don't know that today's WTF actually did that), at least put the characters in two arrays, loop thru the first array, and use it to index the second. Yeah it might be a nanosecond slower, but a few quintillion nanoseconds easier to read, check for accuracy, and update.

  • rt (unregistered) in reply to Guybrush Threepwood
    Guybrush Threepwood:
    2) Calling substr() up to 26 times per loop iteration
    I am scared to think HOW you came up with this "26"...
    Guybrush Threepwood:
    3) Usage of else { if (..) {} } syntax
    What's wrong with that, in this specific case?
    Guybrush Threepwood:
    8) The fact that somebody got paid for writing this mess
    Hmmmm...
    Guybrush Threepwood:
    Did I forget something?
    Quite a lot, it seems.
  • (cs)

    I like the use of a while loop instead of a for loop, just to add insult to injury.

  • A.no.nymous (unregistered) in reply to rt

    [quote user="rt"] [quote user="Guybrush Threepwood"] 3) Usage of else { if (..) {} } syntax[/quote] What's wrong with that, in this specific case? [/quote][/quote]

    The extra levels of curly braces, maybe?

    Captcha: eros

  • ed (unregistered)

    If we can place ourselves in this programmers chair and assume the following:

    • We can not find any builtin on String for converting case
    • We can not find reliable documentation on how a String is represented internally (which encoding)

    then I think the main wtf (explicitly checking every legal character) is what most of us would resort to.

    or ?

  • Bob (unregistered)

    I don't see the WTF. I mean, you have to code it like that rather than using Ascii values if you want to support Unicode

  • Bob (unregistered)

    I don't see the WTF. I mean, you have to code it like that rather than using Ascii values if you want to support Unicode

  • (cs)

    Your build in 'ToUpper' function probably just uses a lookup table. This has the advantage that it works equally well for any (single-byte) character set.

  • Osno (unregistered)

    That code must be horribly slow. A better approach would be to check for frequently used letters first (as in, first test for 'e', then for 'a', etc). On the downside, that would make the code language specific (english in this case), and a little harder to maintain.

  • (cs)

    We had to do an assignment in high school that was converting letters to uppercase without using built in methods. This guy was either too lazy or stupid (or both) to find a chart for ASCII characters. Obviously it would never use Unicode, we'll never ship to a customer that needs Unicode support. only ships to customers that use Unicode

  • Dave G. (unregistered) in reply to ed
    ed:
    If we can place ourselves in this programmers chair and assume the following: - We can not find any builtin on String for converting case - We can not find reliable documentation on how a String is represented internally (which encoding)

    then I think the main wtf (explicitly checking every legal character) is what most of us would resort to.

    or ?

    If one truly had to implement their own method and could assume ASCII characters were being used, they would create a lookup table (array). The lookup table would be indexed by the integer value of the character to be converted to uppercase, and the contents of the lookup table at that position would be either an unmodified character for all characters which are not lowercase letters, or the uppercase character of the lowercase letter.

    Then you can simply do this:

    string[i] = uppercaseLookup[(int)string[i]];

    Characters which are numbers, punctuation etc remain the same, because uppercaseLookup[(int)'3'] = '3' and uppercaseLookup[(int)';'] = ';'. Lowercase letters are changed to uppercase letters because uppercaseLookup[(int)'a'] = 'A'.

    You would have a table of 256 characters to represent all ASCII characters. This uses a minimal amount of memory, operates with O(1) complexity, and is simple as pie to understand.

    If you wanted to be really spiffy, you could encapsulate it in a class, but if you have access to classes, you almost certainly have access to a built-in way of doing this.

  • Charles Shults (unregistered)

    Even if you had no idea of the ASCII table or Unicode, you could still do this in a way that it works for anything (and is expandable).

    Create a string of all potentially convertable characters, create an alias string containing one-for-one what the new character should be, then your input character indexes to its conversion- and you can add more if you must. This works for any character set.

    This is just a conversion table problem, isomorphic to a string pair. So what I see is that the programmer probably did not know the ASCII chart or the Unicode chart, did not know how to make a lookup table (or did not think that way), and did not know how to make a loop that would pick a character out of a lineup and swap it with its counterpart in another lineup (or did not think that way).

    Most likely? This programmer does not think the way a real programmer should. Very common any more.

  • fmobus (unregistered) in reply to Guybrush Threepwood
    Guybrush Threepwood:
    There's actually a ton of WTFs in there...
    1. Usage of string concatenation instead of a StringBuilder object
    2. Calling substr() up to 26 times per loop iteration
    3. Usage of else { if (..) {} } syntax
    4. The name of the function parameter (why call it "Account" when it could be used to convert any string?)
    5. Spelling the function parameter with a capital A
    6. Two statements on the same line when there is absolutely no need for it
    7. 2-char indentation
    8. The fact that somebody got paid for writing this mess

    Did I forget something?

    1. Use of substring() instead of charAt()
    2. weird indentation
    3. use of else { if {} } when elseif {} would be simpler
    CaRL:
    hikari:
    If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).
    Given a sane character set like ASCII, that's exactly what you'd do.

    If you have to support a bloody gawdawful mess of character sets like seems to be the fad lately (although we don't know that today's WTF actually did that), at least put the characters in two arrays, loop thru the first array, and use it to index the second. Yeah it might be a nanosecond slower, but a few quintillion nanoseconds easier to read, check for accuracy, and update.

    Well, just go the whole hog and have a large array, where a pointer to the uppercase version of each character in the UTF-8 set is stored. If space is not an issue, go for it :)

    (yes I know this is an insane solution... real-world implementations, I believe, do not have to cover the whole character set, rather just characters that actually have uppercase and lowercase representations)

  • Vollhorst (unregistered)

    Since when does Java use ASCII (by default)?

  • rt (unregistered) in reply to Osno
    Osno:
    That code must be horribly slow. A better approach would be to check for frequently used letters first (as in, first test for 'e', then for 'a', etc). On the downside, that would make the code language specific (english in this case), and a little harder to maintain.
    Interesting suggestion.
  • Jelly (unregistered)

    This looks wrong.

    Surely given "SomeAccount", it returns "OMECCOUNT"? It seems to skip all uppercase letters when building the result string.

    It's a surprise given the clarity of code that noone else has picked up on this.

  • (cs) in reply to rt
    rt:
    Osno:
    That code must be horribly slow. A better approach would be to check for frequently used letters first (as in, first test for 'e', then for 'a', etc). On the downside, that would make the code language specific (english in this case), and a little harder to maintain.
    Interesting suggestion.
    I did it once in a misguided attempt at optimization (we were stretching to get some speed and were looking at anything and everything), but the maintainability headaches (think: readability) weren't worth it.
  • rt (unregistered) in reply to Charles Shults
    Charles Shults:
    Create a string of all potentially convertable characters, create an alias string containing one-for-one what the new character should be, then your input character indexes to its conversion- and you can add more if you must. This works for any character set.
    How EXACTLY would you use this input character as an index?

    This is also a question to all people proposing the same idea disguised as "two tables".

    Charles Shults:
    This is just a conversion table problem, isomorphic to a string pair. So what I see is that the programmer probably did not know the ASCII chart or the Unicode chart, did not know how to make a lookup table (or did not think that way), and did not know how to make a loop that would pick a character out of a lineup and swap it with its counterpart in another lineup (or did not think that way).
    I am afraid that "this programmer" might know quite a bit about character sets, encodings (fixed-length and prefix code) and the remaining framework functionality.
    Charles Shults:
    Most likely? This programmer does not think the way a real programmer should. Very common any more.
    In this case, that's probably what saved the users from experiencing an utter failure.
  • Da' Man (unregistered) in reply to Bob
    Bob:
    I don't see the WTF. I mean, you have to code it like that rather than using Ascii values if you want to support Unicode
    Yeah! I want to see the Unicode version of toUpper in this style! See you next year :-)
  • Osno (unregistered)

    This may work too:

    static char *upper_string(char* n)
    {
        char *buffer = strdup(n);
        int bufsize = strlen(n);
    
        buffer += bufsize;
        *--buffer = '\0';
    
        do
        {
            *--buffer = *--n ^ 0x20;
    	bufsize--;
        }
        while (bufsize);
    
        return buffer;
    }
    
  • Mr. Fister (unregistered) in reply to Kerio
    Kerio:
    Code Dependent:
    Should the author of this ever come back to look at it again, I wonder whether he'd be ashamed of his handiwork, or proud.

    He'd probably say, "Fist!"

    no, he would say "FIST"

    no, he would bend over a wooden table and somebody else would do the "FIST" and take a picture

  • SImon (unregistered)

    Why reverse engineering the built-in method? At least the sun-java6 sdk is open source... so:

        public String toUpperCase(Locale locale) {
    	if (locale == null) {
    	    throw new NullPointerException();
            }
    
            int     firstLower;
    
    	/* Now check if there are any characters that need to be changed. */
    	scan: {
    	    for (firstLower = 0 ; firstLower < count; ) {
    		int c = (int)value[offset+firstLower];
    		int srcCount;
    		if ((c >= Character.MIN_HIGH_SURROGATE) &&
    		    (c <= Character.MAX_HIGH_SURROGATE)) {
    		    c = codePointAt(firstLower);
    		    srcCount = Character.charCount(c);
    		} else {
    		    srcCount = 1;
    		}
    		int upperCaseChar = Character.toUpperCaseEx(c);
    		if ((upperCaseChar == Character.ERROR) ||
    		    (c != upperCaseChar)) {
    		    break scan;
    		}
    		firstLower += srcCount;
    	    }
    	    return this;
    	}
    
            char[]  result       = new char[count]; /* may grow */
    	int     resultOffset = 0;  /* result may grow, so i+resultOffset
    				    * is the write location in result */
    
    	/* Just copy the first few upperCase characters. */
    	System.arraycopy(value, offset, result, 0, firstLower);
    
    	String lang = locale.getLanguage();
    	boolean localeDependent =
                (lang == "tr" || lang == "az" || lang == "lt");
            char[] upperCharArray;
            int upperChar;
            int srcChar;
            int srcCount;
            for (int i = firstLower; i < count; i += srcCount) {
    	    srcChar = (int)value[offset+i];
    	    if ((char)srcChar >= Character.MIN_HIGH_SURROGATE &&
    	        (char)srcChar <= Character.MAX_HIGH_SURROGATE) {
    		srcChar = codePointAt(i);
    		srcCount = Character.charCount(srcChar);
    	    } else {
    	        srcCount = 1;
    	    }
                if (localeDependent) {
                    upperChar = ConditionalSpecialCasing.toUpperCaseEx(this, i, locale);
                } else {
                    upperChar = Character.toUpperCaseEx(srcChar);
                }
                if ((upperChar == Character.ERROR) ||
                    (upperChar >= Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
                    if (upperChar == Character.ERROR) {
                        if (localeDependent) {
                            upperCharArray =
                                ConditionalSpecialCasing.toUpperCaseCharArray(this, i, locale);
                        } else {
                            upperCharArray = Character.toUpperCaseCharArray(srcChar);
                        }
                    } else if (srcCount == 2) {
    		    resultOffset += Character.toChars(upperChar, result, i + resultOffset) - srcCount;
    		    continue;
                    } else {
                        upperCharArray = Character.toChars(upperChar);
    		}
    
                    /* Grow result if needed */
                    int mapLen = upperCharArray.length;
    		if (mapLen > srcCount) {
                        char[] result2 = new char[result.length + mapLen - srcCount];
                        System.arraycopy(result, 0, result2, 0,
                            i + resultOffset);
                        result = result2;
    		}
                    for (int x=0; x<mapLen; ++x) {
                        result[i+resultOffset+x] = upperCharArray[x];
                    }
                    resultOffset += (mapLen - srcCount);
                } else {
                    result[i+resultOffset] = (char)upperChar;
                }
            }
            return new String(0, count+resultOffset, result);
        }
    </pre>
    
  • Addison (unregistered)

    Phew! Today's WTF is actually a WTF. I foresee considerably less flaming over this post. Very safe.

  • Zeal_ (unregistered)

    With Extended Binary Coded Decimal Interchange Code, this may be the only sane approach. http://en.wikipedia.org/wiki/EBCDIC

  • Herman (unregistered)

    How in God's name is "2-Char indentation" a WTF?

  • Syrup (unregistered) in reply to Jelly
    Jelly:
    This looks wrong.

    Surely given "SomeAccount", it returns "OMECCOUNT"? It seems to skip all uppercase letters when building the result string.

    It's a surprise given the clarity of code that noone else has picked up on this.

    its the 'else' not shown at the farthest level of indentation.

  • (cs)

    Wouldn't this be an infinite loop anyways? The case for the loop is:
    while (Account.length() > i)

    There's no indication, or obvious assumption of Account losing character(s) in every iteration.

  • (cs) in reply to Guybrush Threepwood
    Guybrush Threepwood:
    5) Spelling the function parameter with a capital A 6) Two statements on the same line when there is absolutely no need for it 7) 2-char indentation

    sorry, but style preferences are not WTF's

  • (cs) in reply to akatherder

    did you not see the i++ at the end?

  • (cs) in reply to Guybrush Threepwood
    Guybrush Threepwood:
    5) Spelling the function parameter with a capital A

    How is that in any way a WTF? I personally always capitalize my function parameters.

  • dhasenan (unregistered)

    The obvious way to write it is: auto upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; auto lower = "abcdefghijklmnopqrstuvwxyz"; auto result = ""; foreach (c; input) { if (contains (lower, c)) result ~= upper[find(lower, c)]; else result ~= c; }

    Lookup tables like this can also support unicode, and it's a lot easier to go in reverse.

  • Mate (unregistered) in reply to Zeal_
    Zeal_:
    With Extended Binary Coded Decimal Interchange Code, this may be the only sane approach. http://en.wikipedia.org/wiki/EBCDIC

    A translation table would work with any character set. The program can switch between different encodings' tables, if necessary.

  • Voodoo Coder (unregistered) in reply to dhasenan
    dhasenan:
    The obvious way to write it is: auto upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; auto lower = "abcdefghijklmnopqrstuvwxyz"; auto result = ""; foreach (c; input) { if (contains (lower, c)) result ~= upper[find(lower, c)]; else result ~= c; }

    Lookup tables like this can also support unicode, and it's a lot easier to go in reverse.

    Yes, but shouldn't you store the values for the alphabet in an XML file?

  • (cs) in reply to Mate

    public String toUpperReinventTheWheel(String str) { String temp = ""; for(int i = 0; i < str.length; ++i) { if((int)str.charAt(i) >= 97 && (int)str.charAt(i) <= 122) temp += (char)(str.charAt(i) + 26); else temp += str.charAt(i); }

    Might look like this...

Leave a comment on “The Long Way toUpper”

Log In or post as a guest

Replying to comment #232007:

« Return to Article