The Daily WTF: Curious Perversions in Information Technology

Code Dependent · 2008-12-01 Reply Admin

Should the author of this ever come back to look at it again, I wonder whether he'd be ashamed of his handiwork, or proud.

He'd probably say, "Fist!"

2008-12-01 Reply Admin

What percentage of code WTF's are re-coding functions that already exist in the langauge's core?

Rick · 2008-12-01 Reply Admin

Premature optimization is the root of all evil -- CarHoare

hikari · 2008-12-01 Reply Admin

captain obvious:
What percentage of code WTF's are re-coding functions that already exist in the langauge's core?

Quite a high percentage, I should think. Maybe even a majority.

2008-12-01 Reply Admin

Code Dependent:
Should the author of this ever come back to look at it again, I wonder whether he'd be ashamed of his handiwork, or proud.
He'd probably say, "Fist!"

no, he would say "FIST"

2008-12-01 Reply Admin

Hey, at least it short circuits after discovering that the letter is, say, 'm' instead of continuing to test if 'n' .. 'z'.

And, it is nicely indented too!

2008-12-01 Reply Admin

Was the code really indented like that?

2008-12-01 Reply Admin

If I wanted to write such an abomination of a code, I'd write a script :)

2008-12-01 Reply Admin

That's... interesting. It really is though I think I find it interesting in a different way than you do.

Let's start in a different way: how would YOU write an uppercase transformation function? Come up with some approach.

Now...

Make sure it works with non-English characters (e.g. accented letters).

Make sure it works with non-ascii (ANSI) string encodings (i.e. it works with UTF-xx, UCS-xx, etc).

Optimize.

What does your function look like?

hikari · 2008-12-01 Reply Admin

If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).

2008-12-01 Reply Admin

hikari:
If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).

Pretty big assumption, as most modern frameworks support all manner of encodings.

2008-12-01 Reply Admin

The good thing about this method is that he specifically wrote it to work on account strings. If the account string format ever changes, he has only one place to update the code, and all account strings will be fixed.

Furthermore, other strings which may need converting to upper case can be converted in their own separate methods, thereby increasing maintanability by reducing coupling. It would be a real mess if address strings and account strings were handled by the same code, for example. Changing one would inadvertently change the other.

The creation of many string instances is not a technique I use often, but it can be useful if you need to go back in time and get a string that was halfway through processing. This can be used to display updates to the user in a progress bar type display. A flag that activates this behvaiour should be added to this method to improve its usefulness. This is really a minor point though.

2008-12-01 Reply Admin

There's actually a ton of WTFs in there...

Usage of string concatenation instead of a StringBuilder object
Calling substr() up to 26 times per loop iteration
Usage of else { if (..) {} } syntax
The name of the function parameter (why call it "Account" when it could be used to convert any string?)
Spelling the function parameter with a capital A
Two statements on the same line when there is absolutely no need for it
2-char indentation
The fact that somebody got paid for writing this mess

Did I forget something?

Code Dependent · 2008-12-01 Reply Admin

rt:
What does your function look like?

Well, back in my Assembler days, it would have been ANDA $DF. But that's for ascii only.

2008-12-01 Reply Admin

Obviously this is pretty bad code to start with but doesn't this code also drop any upper case letters in the Account instead of copying them through? or is the .equals() method case-insensitive? or is there another set of .equals("A"),.equals("B")... that was snipped?

SlyEcho · 2008-12-01 Reply Admin

Grovesy:
Pretty big assumption, as most modern frameworks support all manner of encodings.

They only support encoding to / decoding from different encodings and Unicode. The string is represented internally in a Unicode format (usually UTF-16). Which makes it a lot easier.

2008-12-01 Reply Admin

hikari:
If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).

Given a sane character set like ASCII, that's exactly what you'd do.

If you have to support a bloody gawdawful mess of character sets like seems to be the fad lately (although we don't know that today's WTF actually did that), at least put the characters in two arrays, loop thru the first array, and use it to index the second. Yeah it might be a nanosecond slower, but a few quintillion nanoseconds easier to read, check for accuracy, and update.

2008-12-01 Reply Admin

Guybrush Threepwood:
2) Calling substr() up to 26 times per loop iteration

I am scared to think HOW you came up with this "26"...

Guybrush Threepwood:
3) Usage of else { if (..) {} } syntax

What's wrong with that, in this specific case?

Guybrush Threepwood:
8) The fact that somebody got paid for writing this mess

Hmmmm...

Guybrush Threepwood:
Did I forget something?

Quite a lot, it seems.

ThePants999 · 2008-12-01 Reply Admin

I like the use of a while loop instead of a for loop, just to add insult to injury.

2008-12-01 Reply Admin

[quote user="rt"] [quote user="Guybrush Threepwood"] 3) Usage of else { if (..) {} } syntax[/quote] What's wrong with that, in this specific case? [/quote][/quote]

The extra levels of curly braces, maybe?

Captcha: eros

2008-12-01 Reply Admin

If we can place ourselves in this programmers chair and assume the following:

We can not find any builtin on String for converting case
We can not find reliable documentation on how a String is represented internally (which encoding)

then I think the main wtf (explicitly checking every legal character) is what most of us would resort to.

or ?

2008-12-01 Reply Admin

I don't see the WTF. I mean, you have to code it like that rather than using Ascii values if you want to support Unicode

2008-12-01 Reply Admin

I don't see the WTF. I mean, you have to code it like that rather than using Ascii values if you want to support Unicode

joelkatz · 2008-12-01 Reply Admin

Your build in 'ToUpper' function probably just uses a lookup table. This has the advantage that it works equally well for any (single-byte) character set.

2008-12-01 Reply Admin

That code must be horribly slow. A better approach would be to check for frequently used letters first (as in, first test for 'e', then for 'a', etc). On the downside, that would make the code language specific (english in this case), and a little harder to maintain.

Yaos · 2008-12-01 Reply Admin

We had to do an assignment in high school that was converting letters to uppercase without using built in methods. This guy was either too lazy or stupid (or both) to find a chart for ASCII characters. Obviously it would never use Unicode, we'll never ship to a customer that needs Unicode support. only ships to customers that use Unicode

2008-12-01 Reply Admin

ed:
If we can place ourselves in this programmers chair and assume the following: - We can not find any builtin on String for converting case - We can not find reliable documentation on how a String is represented internally (which encoding)
then I think the main wtf (explicitly checking every legal character) is what most of us would resort to.

or ?

If one truly had to implement their own method and could assume ASCII characters were being used, they would create a lookup table (array). The lookup table would be indexed by the integer value of the character to be converted to uppercase, and the contents of the lookup table at that position would be either an unmodified character for all characters which are not lowercase letters, or the uppercase character of the lowercase letter.

Then you can simply do this:

string[i] = uppercaseLookup[(int)string[i]];

Characters which are numbers, punctuation etc remain the same, because uppercaseLookup[(int)'3'] = '3' and uppercaseLookup[(int)';'] = ';'. Lowercase letters are changed to uppercase letters because uppercaseLookup[(int)'a'] = 'A'.

You would have a table of 256 characters to represent all ASCII characters. This uses a minimal amount of memory, operates with O(1) complexity, and is simple as pie to understand.

If you wanted to be really spiffy, you could encapsulate it in a class, but if you have access to classes, you almost certainly have access to a built-in way of doing this.

2008-12-01 Reply Admin

Even if you had no idea of the ASCII table or Unicode, you could still do this in a way that it works for anything (and is expandable).

Create a string of all potentially convertable characters, create an alias string containing one-for-one what the new character should be, then your input character indexes to its conversion- and you can add more if you must. This works for any character set.

This is just a conversion table problem, isomorphic to a string pair. So what I see is that the programmer probably did not know the ASCII chart or the Unicode chart, did not know how to make a lookup table (or did not think that way), and did not know how to make a loop that would pick a character out of a lineup and swap it with its counterpart in another lineup (or did not think that way).

Most likely? This programmer does not think the way a real programmer should. Very common any more.

2008-12-01 Reply Admin

Guybrush Threepwood:
There's actually a ton of WTFs in there...

Usage of string concatenation instead of a StringBuilder object

Calling substr() up to 26 times per loop iteration

Usage of else { if (..) {} } syntax

The name of the function parameter (why call it "Account" when it could be used to convert any string?)

Spelling the function parameter with a capital A

Two statements on the same line when there is absolutely no need for it

2-char indentation

The fact that somebody got paid for writing this mess

Did I forget something?

Use of substring() instead of charAt()
weird indentation
use of else { if {} } when elseif {} would be simpler

CaRL:
hikari:
If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).
Given a sane character set like ASCII, that's exactly what you'd do.
If you have to support a bloody gawdawful mess of character sets like seems to be the fad lately (although we don't know that today's WTF actually did that), at least put the characters in two arrays, loop thru the first array, and use it to index the second. Yeah it might be a nanosecond slower, but a few quintillion nanoseconds easier to read, check for accuracy, and update.

Well, just go the whole hog and have a large array, where a pointer to the uppercase version of each character in the UTF-8 set is stored. If space is not an issue, go for it :)

(yes I know this is an insane solution... real-world implementations, I believe, do not have to cover the whole character set, rather just characters that actually have uppercase and lowercase representations)

2008-12-01 Reply Admin

Since when does Java use ASCII (by default)?

2008-12-01 Reply Admin

Osno:
That code must be horribly slow. A better approach would be to check for frequently used letters first (as in, first test for 'e', then for 'a', etc). On the downside, that would make the code language specific (english in this case), and a little harder to maintain.

Interesting suggestion.

2008-12-01 Reply Admin

This looks wrong.

Surely given "SomeAccount", it returns "OMECCOUNT"? It seems to skip all uppercase letters when building the result string.

It's a surprise given the clarity of code that noone else has picked up on this.

snoofle · 2008-12-01 Reply Admin

rt:
Osno:
That code must be horribly slow. A better approach would be to check for frequently used letters first (as in, first test for 'e', then for 'a', etc). On the downside, that would make the code language specific (english in this case), and a little harder to maintain.
Interesting suggestion.

I did it once in a misguided attempt at optimization (we were stretching to get some speed and were looking at anything and everything), but the maintainability headaches (think: readability) weren't worth it.

2008-12-01 Reply Admin

Charles Shults:
Create a string of all potentially convertable characters, create an alias string containing one-for-one what the new character should be, then your input character indexes to its conversion- and you can add more if you must. This works for any character set.

How EXACTLY would you use this input character as an index?

This is also a question to all people proposing the same idea disguised as "two tables".

Charles Shults:
This is just a conversion table problem, isomorphic to a string pair. So what I see is that the programmer probably did not know the ASCII chart or the Unicode chart, did not know how to make a lookup table (or did not think that way), and did not know how to make a loop that would pick a character out of a lineup and swap it with its counterpart in another lineup (or did not think that way).

I am afraid that "this programmer" might know quite a bit about character sets, encodings (fixed-length and prefix code) and the remaining framework functionality.

Charles Shults:
Most likely? This programmer does not think the way a real programmer should. Very common any more.

In this case, that's probably what saved the users from experiencing an utter failure.

2008-12-01 Reply Admin

Bob:
I don't see the WTF. I mean, you have to code it like that rather than using Ascii values if you want to support Unicode

Yeah! I want to see the Unicode version of toUpper in this style! See you next year :-)

2008-12-01 Reply Admin

This may work too:

static char *upper_string(char* n)
{
    char *buffer = strdup(n);
    int bufsize = strlen(n);

    buffer += bufsize;
    *--buffer = '\0';

    do
    {
        *--buffer = *--n ^ 0x20;
	bufsize--;
    }
    while (bufsize);

    return buffer;
}

2008-12-01 Reply Admin

Kerio:
Code Dependent:
Should the author of this ever come back to look at it again, I wonder whether he'd be ashamed of his handiwork, or proud.
He'd probably say, "Fist!"

no, he would say "FIST"

no, he would bend over a wooden table and somebody else would do the "FIST" and take a picture

2008-12-01 Reply Admin

Why reverse engineering the built-in method? At least the sun-java6 sdk is open source... so:

    public String toUpperCase(Locale locale) {
	if (locale == null) {
	    throw new NullPointerException();
        }

        int     firstLower;

	/* Now check if there are any characters that need to be changed. */
	scan: {
	    for (firstLower = 0 ; firstLower < count; ) {
		int c = (int)value[offset+firstLower];
		int srcCount;
		if ((c >= Character.MIN_HIGH_SURROGATE) &&
		    (c <= Character.MAX_HIGH_SURROGATE)) {
		    c = codePointAt(firstLower);
		    srcCount = Character.charCount(c);
		} else {
		    srcCount = 1;
		}
		int upperCaseChar = Character.toUpperCaseEx(c);
		if ((upperCaseChar == Character.ERROR) ||
		    (c != upperCaseChar)) {
		    break scan;
		}
		firstLower += srcCount;
	    }
	    return this;
	}

        char[]  result       = new char[count]; /* may grow */
	int     resultOffset = 0;  /* result may grow, so i+resultOffset
				    * is the write location in result */

	/* Just copy the first few upperCase characters. */
	System.arraycopy(value, offset, result, 0, firstLower);

	String lang = locale.getLanguage();
	boolean localeDependent =
            (lang == "tr" || lang == "az" || lang == "lt");
        char[] upperCharArray;
        int upperChar;
        int srcChar;
        int srcCount;
        for (int i = firstLower; i < count; i += srcCount) {
	    srcChar = (int)value[offset+i];
	    if ((char)srcChar >= Character.MIN_HIGH_SURROGATE &&
	        (char)srcChar <= Character.MAX_HIGH_SURROGATE) {
		srcChar = codePointAt(i);
		srcCount = Character.charCount(srcChar);
	    } else {
	        srcCount = 1;
	    }
            if (localeDependent) {
                upperChar = ConditionalSpecialCasing.toUpperCaseEx(this, i, locale);
            } else {
                upperChar = Character.toUpperCaseEx(srcChar);
            }
            if ((upperChar == Character.ERROR) ||
                (upperChar >= Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
                if (upperChar == Character.ERROR) {
                    if (localeDependent) {
                        upperCharArray =
                            ConditionalSpecialCasing.toUpperCaseCharArray(this, i, locale);
                    } else {
                        upperCharArray = Character.toUpperCaseCharArray(srcChar);
                    }
                } else if (srcCount == 2) {
		    resultOffset += Character.toChars(upperChar, result, i + resultOffset) - srcCount;
		    continue;
                } else {
                    upperCharArray = Character.toChars(upperChar);
		}

                /* Grow result if needed */
                int mapLen = upperCharArray.length;
		if (mapLen > srcCount) {
                    char[] result2 = new char[result.length + mapLen - srcCount];
                    System.arraycopy(result, 0, result2, 0,
                        i + resultOffset);
                    result = result2;
		}
                for (int x=0; x<mapLen; ++x) {
                    result[i+resultOffset+x] = upperCharArray[x];
                }
                resultOffset += (mapLen - srcCount);
            } else {
                result[i+resultOffset] = (char)upperChar;
            }
        }
        return new String(0, count+resultOffset, result);
    }
</pre>

2008-12-01 Reply Admin

Phew! Today's WTF is actually a WTF. I foresee considerably less flaming over this post. Very safe.

2008-12-01 Reply Admin

With Extended Binary Coded Decimal Interchange Code, this may be the only sane approach. http://en.wikipedia.org/wiki/EBCDIC

2008-12-01 Reply Admin

How in God's name is "2-Char indentation" a WTF?

2008-12-01 Reply Admin

Jelly:
This looks wrong.
Surely given "SomeAccount", it returns "OMECCOUNT"? It seems to skip all uppercase letters when building the result string.

It's a surprise given the clarity of code that noone else has picked up on this.

its the 'else' not shown at the farthest level of indentation.

akatherder · 2008-12-01 Reply Admin

Wouldn't this be an infinite loop anyways? The case for the loop is:
while (Account.length() > i)

There's no indication, or obvious assumption of Account losing character(s) in every iteration.

campkev · 2008-12-01 Reply Admin

Guybrush Threepwood:
5) Spelling the function parameter with a capital A 6) Two statements on the same line when there is absolutely no need for it 7) 2-char indentation

sorry, but style preferences are not WTF's

campkev · 2008-12-01 Reply Admin

did you not see the i++ at the end?

Maxx Delusional · 2008-12-01 Reply Admin

Guybrush Threepwood:
5) Spelling the function parameter with a capital A

How is that in any way a WTF? I personally always capitalize my function parameters.

2008-12-01 Reply Admin

The obvious way to write it is: auto upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; auto lower = "abcdefghijklmnopqrstuvwxyz"; auto result = ""; foreach (c; input) { if (contains (lower, c)) result ~= upper[find(lower, c)]; else result ~= c; }

Lookup tables like this can also support unicode, and it's a lot easier to go in reverse.

2008-12-01 Reply Admin

Zeal_:
With Extended Binary Coded Decimal Interchange Code, this may be the only sane approach. http://en.wikipedia.org/wiki/EBCDIC

A translation table would work with any character set. The program can switch between different encodings' tables, if necessary.

2008-12-01 Reply Admin

dhasenan:
The obvious way to write it is: auto upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; auto lower = "abcdefghijklmnopqrstuvwxyz"; auto result = ""; foreach (c; input) { if (contains (lower, c)) result ~= upper[find(lower, c)]; else result ~= c; }
Lookup tables like this can also support unicode, and it's a lot easier to go in reverse.

Yes, but shouldn't you store the values for the alphabet in an XML file?

amischiefr · 2008-12-01 Reply Admin

public String toUpperReinventTheWheel(String str) { String temp = ""; for(int i = 0; i < str.length; ++i) { if((int)str.charAt(i) >= 97 && (int)str.charAt(i) <= 122) temp += (char)(str.charAt(i) + 26); else temp += str.charAt(i); }

Might look like this...

The Long Way toUpper

Leave a comment on “The Long Way toUpper”