Comment On The Long Way toUpper

Have you ever found yourself writing a function to do something that seems pretty simple, then months or years later you find out there's a built-in function to accomplish exactly what you were doing? Maybe you didn't know where to look or the built-in function's name was confusing. I'd argue that Java's toUpperCase() does not fall in this category. [expand full text]
« PrevPage 1 | Page 2 | Page 3 | Page 4Next »

Re: The Long Way toUpper

2008-12-01 08:05 • by Code Dependent
Should the author of this ever come back to look at it again, I wonder whether he'd be ashamed of his handiwork, or proud.

He'd probably say, "Fist!"

Re: The Long Way toUpper

2008-12-01 08:06 • by captain obvious (unregistered)
What percentage of code WTF's are re-coding functions that already exist in the langauge's core?

Re: The Long Way toUpper

2008-12-01 08:07 • by Rick

Re: The Long Way toUpper

2008-12-01 08:12 • by hikari
231958 in reply to 231954
captain obvious:
What percentage of code WTF's are re-coding functions that already exist in the langauge's core?


Quite a high percentage, I should think. Maybe even a majority.

Re: The Long Way toUpper

2008-12-01 08:12 • by Kerio (unregistered)
231959 in reply to 231953
Code Dependent:
Should the author of this ever come back to look at it again, I wonder whether he'd be ashamed of his handiwork, or proud.

He'd probably say, "Fist!"


no, he would say "FIST"

Re: The Long Way toUpper

2008-12-01 08:15 • by Co Der (unregistered)
Hey, at least it short circuits after discovering that the letter is, say, 'm' instead of continuing to test if 'n' .. 'z'.

And, it is nicely indented too!

Re: The Long Way toUpper

2008-12-01 08:18 • by Bernie (unregistered)
Was the code really indented like that?

Re: The Long Way toUpper

2008-12-01 08:20 • by FDF (unregistered)
If I wanted to write such an abomination of a code, I'd write a script :)

Re: The Long Way toUpper

2008-12-01 08:22 • by rt (unregistered)
That's... interesting. It really is though I think I find it interesting in a different way than you do.

Let's start in a different way: how would YOU write an uppercase transformation function? Come up with some approach.

Now...

Make sure it works with non-English characters (e.g. accented letters).

Make sure it works with non-ascii (ANSI) string encodings (i.e. it works with UTF-xx, UCS-xx, etc).

Optimize.

What does your function look like?

Re: The Long Way toUpper

2008-12-01 08:22 • by hikari
If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).

Re: The Long Way toUpper

2008-12-01 08:25 • by Grovesy (unregistered)
231967 in reply to 231966
hikari:
If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).


Pretty big assumption, as most modern frameworks support all manner of encodings.

Re: The Long Way toUpper

2008-12-01 08:27 • by Dave G. (unregistered)
The good thing about this method is that he specifically wrote it to work on account strings. If the account string format ever changes, he has only one place to update the code, and all account strings will be fixed.

Furthermore, other strings which may need converting to upper case can be converted in their own separate methods, thereby increasing maintanability by reducing coupling. It would be a real mess if address strings and account strings were handled by the same code, for example. Changing one would inadvertently change the other.

The creation of many string instances is not a technique I use often, but it can be useful if you need to go back in time and get a string that was halfway through processing. This can be used to display updates to the user in a progress bar type display. A flag that activates this behvaiour should be added to this method to improve its usefulness. This is really a minor point though.

Re: The Long Way toUpper

2008-12-01 08:29 • by Guybrush Threepwood (unregistered)
There's actually a ton of WTFs in there...

1) Usage of string concatenation instead of a StringBuilder object
2) Calling substr() up to 26 times per loop iteration
3) Usage of else { if (..) {} } syntax
4) The name of the function parameter (why call it "Account" when it could be used to convert *any* string?)
5) Spelling the function parameter with a capital A
6) Two statements on the same line when there is absolutely no need for it
7) 2-char indentation
8) The fact that somebody got paid for writing this mess

Did I forget something?

Re: The Long Way toUpper

2008-12-01 08:30 • by Code Dependent
231971 in reply to 231965
rt:
What does your function look like?
Well, back in my Assembler days, it would have been ANDA $DF. But that's for ascii only.

Re: The Long Way toUpper

2008-12-01 08:31 • by Brian (unregistered)
Obviously this is pretty bad code to start with but doesn't this code also drop any upper case letters in the Account instead of copying them through? or is the .equals() method case-insensitive? or is there another set of .equals("A"),.equals("B")... that was snipped?

Re: The Long Way toUpper

2008-12-01 08:31 • by SlyEcho
231973 in reply to 231967
Grovesy:

Pretty big assumption, as most modern frameworks support all manner of encodings.


They only support encoding to / decoding from different encodings and Unicode. The string is represented internally in a Unicode format (usually UTF-16). Which makes it a lot easier.

Re: The Long Way toUpper

2008-12-01 08:34 • by CaRL (unregistered)
231975 in reply to 231966
hikari:
If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).

Given a sane character set like ASCII, that's exactly what you'd do.

If you have to support a bloody gawdawful mess of character sets like seems to be the fad lately (although we don't know that today's WTF actually did that), at least put the characters in two arrays, loop thru the first array, and use it to index the second. Yeah it might be a nanosecond slower, but a few quintillion nanoseconds easier to read, check for accuracy, and update.

Re: The Long Way toUpper

2008-12-01 08:36 • by rt (unregistered)
231976 in reply to 231969
Guybrush Threepwood:

2) Calling substr() up to 26 times per loop iteration

I am scared to think HOW you came up with this "26"...
Guybrush Threepwood:

3) Usage of else { if (..) {} } syntax

What's wrong with that, in this specific case?
Guybrush Threepwood:

8) The fact that somebody got paid for writing this mess

Hmmmm...
Guybrush Threepwood:

Did I forget something?

Quite a lot, it seems.

Re: The Long Way toUpper

2008-12-01 08:37 • by ThePants999
I like the use of a while loop instead of a for loop, just to add insult to injury.

Re: The Long Way toUpper

2008-12-01 08:39 • by A.no.nymous (unregistered)
231978 in reply to 231976
[quote user="rt"]
[quote user="Guybrush Threepwood"]
3) Usage of else { if (..) {} } syntax[/quote]
What's wrong with that, in this specific case?
[/quote][/quote]

The extra levels of curly braces, maybe?

Captcha: eros

Re: The Long Way toUpper

2008-12-01 08:40 • by ed (unregistered)
If we can place ourselves in this programmers chair and assume the following:
- We can not find any builtin on String for converting case
- We can not find reliable documentation on how a String is represented internally (which encoding)

then I think the main wtf (explicitly checking every legal character) is what most of us would resort to.

or ?

Re: The Long Way toUpper

2008-12-01 08:43 • by Bob (unregistered)
I don't see the WTF. I mean, you have to code it like that rather than using Ascii values if you want to support Unicode

Re: The Long Way toUpper

2008-12-01 08:43 • by Bob (unregistered)
I don't see the WTF. I mean, you have to code it like that rather than using Ascii values if you want to support Unicode

Re: The Long Way toUpper

2008-12-01 08:44 • by joelkatz
Your build in 'ToUpper' function probably just uses a lookup table. This has the advantage that it works equally well for any (single-byte) character set.

Re: The Long Way toUpper

2008-12-01 08:50 • by Osno (unregistered)
That code must be horribly slow. A better approach would be to check for frequently used letters first (as in, first test for 'e', then for 'a', etc). On the downside, that would make the code language specific (english in this case), and a little harder to maintain.

Re: The Long Way toUpper

2008-12-01 08:51 • by Yaos
We had to do an assignment in high school that was converting letters to uppercase without using built in methods. This guy was either too lazy or stupid (or both) to find a chart for ASCII characters. Obviously it would never use Unicode, we'll never ship to a customer that needs Unicode support. *only ships to customers that use Unicode*

Re: The Long Way toUpper

2008-12-01 08:52 • by Dave G. (unregistered)
231986 in reply to 231979
ed:
If we can place ourselves in this programmers chair and assume the following:
- We can not find any builtin on String for converting case
- We can not find reliable documentation on how a String is represented internally (which encoding)

then I think the main wtf (explicitly checking every legal character) is what most of us would resort to.

or ?


If one truly had to implement their own method and could assume ASCII characters were being used, they would create a lookup table (array). The lookup table would be indexed by the integer value of the character to be converted to uppercase, and the contents of the lookup table at that position would be either an unmodified character for all characters which are not lowercase letters, or the uppercase character of the lowercase letter.

Then you can simply do this:

string[i] = uppercaseLookup[(int)string[i]];

Characters which are numbers, punctuation etc remain the same, because uppercaseLookup[(int)'3'] = '3' and uppercaseLookup[(int)';'] = ';'. Lowercase letters are changed to uppercase letters because uppercaseLookup[(int)'a'] = 'A'.

You would have a table of 256 characters to represent all ASCII characters. This uses a minimal amount of memory, operates with O(1) complexity, and is simple as pie to understand.

If you wanted to be really spiffy, you could encapsulate it in a class, but if you have access to classes, you almost certainly have access to a built-in way of doing this.

Re: The Long Way toUpper

2008-12-01 08:53 • by Charles Shults (unregistered)
Even if you had no idea of the ASCII table or Unicode, you could still do this in a way that it works for anything (and is expandable).

Create a string of all potentially convertable characters, create an alias string containing one-for-one what the new character should be, then your input character indexes to its conversion- and you can add more if you must. This works for any character set.

This is just a conversion table problem, isomorphic to a string pair. So what I see is that the programmer probably did not know the ASCII chart or the Unicode chart, did not know how to make a lookup table (or did not think that way), and did not know how to make a loop that would pick a character out of a lineup and swap it with its counterpart in another lineup (or did not think that way).

Most likely? This programmer does not think the way a real programmer should. Very common any more.

Re: The Long Way toUpper

2008-12-01 08:55 • by fmobus (unregistered)
231988 in reply to 231969
Guybrush Threepwood:
There's actually a ton of WTFs in there...

1) Usage of string concatenation instead of a StringBuilder object
2) Calling substr() up to 26 times per loop iteration
3) Usage of else { if (..) {} } syntax
4) The name of the function parameter (why call it "Account" when it could be used to convert *any* string?)
5) Spelling the function parameter with a capital A
6) Two statements on the same line when there is absolutely no need for it
7) 2-char indentation
8) The fact that somebody got paid for writing this mess

Did I forget something?


9) Use of substring() instead of charAt()
10) weird indentation
11) use of else { if {} } when elseif {} would be simpler

CaRL:
hikari:
If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and if it lays between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).

Given a sane character set like ASCII, that's exactly what you'd do.

If you have to support a bloody gawdawful mess of character sets like seems to be the fad lately (although we don't know that today's WTF actually did that), at least put the characters in two arrays, loop thru the first array, and use it to index the second. Yeah it might be a nanosecond slower, but a few quintillion nanoseconds easier to read, check for accuracy, and update.


Well, just go the whole hog and have a large array, where a pointer to the uppercase version of each character in the UTF-8 set is stored. If space is not an issue, go for it :)

(yes I know this is an insane solution... real-world implementations, I believe, do not have to cover the whole character set, rather just characters that actually have uppercase and lowercase representations)

Re: The Long Way toUpper

2008-12-01 08:58 • by Vollhorst (unregistered)
Since when does Java use ASCII (by default)?

Re: The Long Way toUpper

2008-12-01 08:59 • by rt (unregistered)
231992 in reply to 231984
Osno:
That code must be horribly slow. A better approach would be to check for frequently used letters first (as in, first test for 'e', then for 'a', etc). On the downside, that would make the code language specific (english in this case), and a little harder to maintain.

Interesting suggestion.

Re: The Long Way toUpper

2008-12-01 09:01 • by Jelly (unregistered)
This looks wrong.

Surely given "SomeAccount", it returns "OMECCOUNT"? It seems to skip all uppercase letters when building the result string.

It's a surprise given the clarity of code that noone else has picked up on this.

Re: The Long Way toUpper

2008-12-01 09:05 • by snoofle
231994 in reply to 231992
rt:
Osno:
That code must be horribly slow. A better approach would be to check for frequently used letters first (as in, first test for 'e', then for 'a', etc). On the downside, that would make the code language specific (english in this case), and a little harder to maintain.

Interesting suggestion.
I did it once in a misguided attempt at optimization (we were stretching to get some speed and were looking at anything and everything), but the maintainability headaches (think: readability) weren't worth it.

Re: The Long Way toUpper

2008-12-01 09:11 • by rt (unregistered)
231995 in reply to 231987
Charles Shults:
Create a string of all potentially convertable characters, create an alias string containing one-for-one what the new character should be, then your input character indexes to its conversion- and you can add more if you must. This works for any character set.

How EXACTLY would you use this input character as an index?

This is also a question to all people proposing the same idea disguised as "two tables".
Charles Shults:

This is just a conversion table problem, isomorphic to a string pair. So what I see is that the programmer probably did not know the ASCII chart or the Unicode chart, did not know how to make a lookup table (or did not think that way), and did not know how to make a loop that would pick a character out of a lineup and swap it with its counterpart in another lineup (or did not think that way).

I am afraid that "this programmer" might know quite a bit about character sets, encodings (fixed-length and prefix code) and the remaining framework functionality.
Charles Shults:

Most likely? This programmer does not think the way a real programmer should. Very common any more.

In this case, that's probably what saved the users from experiencing an utter failure.

Re: The Long Way toUpper

2008-12-01 09:12 • by Da' Man (unregistered)
231996 in reply to 231981
Bob:
I don't see the WTF. I mean, you have to code it like that rather than using Ascii values if you want to support Unicode
Yeah! I want to see the Unicode version of toUpper in this style! See you next year :-)

Re: The Long Way toUpper

2008-12-01 09:19 • by Osno (unregistered)
This may work too:

static char *upper_string(char* n)
{
char *buffer = strdup(n);
int bufsize = strlen(n);

buffer += bufsize;
*--buffer = '\0';

do
{
*--buffer = *--n ^ 0x20;
bufsize--;
}
while (bufsize);

return buffer;
}

Re: The Long Way toUpper

2008-12-01 09:20 • by Mr. Fister (unregistered)
231999 in reply to 231959
Kerio:
Code Dependent:
Should the author of this ever come back to look at it again, I wonder whether he'd be ashamed of his handiwork, or proud.

He'd probably say, "Fist!"


no, he would say "FIST"


no, he would bend over a wooden table and somebody else would do the "FIST" and take a picture

Re: The Long Way toUpper

2008-12-01 09:22 • by SImon (unregistered)
Why reverse engineering the built-in method? At least the sun-java6 sdk is open source... so:


public String toUpperCase(Locale locale) {
if (locale == null) {
throw new NullPointerException();
}

int firstLower;

/* Now check if there are any characters that need to be changed. */
scan: {
for (firstLower = 0 ; firstLower < count; ) {
int c = (int)value[offset+firstLower];
int srcCount;
if ((c >= Character.MIN_HIGH_SURROGATE) &&
(c <= Character.MAX_HIGH_SURROGATE)) {
c = codePointAt(firstLower);
srcCount = Character.charCount(c);
} else {
srcCount = 1;
}
int upperCaseChar = Character.toUpperCaseEx(c);
if ((upperCaseChar == Character.ERROR) ||
(c != upperCaseChar)) {
break scan;
}
firstLower += srcCount;
}
return this;
}

char[] result = new char[count]; /* may grow */
int resultOffset = 0; /* result may grow, so i+resultOffset
* is the write location in result */

/* Just copy the first few upperCase characters. */
System.arraycopy(value, offset, result, 0, firstLower);

String lang = locale.getLanguage();
boolean localeDependent =
(lang == "tr" || lang == "az" || lang == "lt");
char[] upperCharArray;
int upperChar;
int srcChar;
int srcCount;
for (int i = firstLower; i < count; i += srcCount) {
srcChar = (int)value[offset+i];
if ((char)srcChar >= Character.MIN_HIGH_SURROGATE &&
(char)srcChar <= Character.MAX_HIGH_SURROGATE) {
srcChar = codePointAt(i);
srcCount = Character.charCount(srcChar);
} else {
srcCount = 1;
}
if (localeDependent) {
upperChar = ConditionalSpecialCasing.toUpperCaseEx(this, i, locale);
} else {
upperChar = Character.toUpperCaseEx(srcChar);
}
if ((upperChar == Character.ERROR) ||
(upperChar >= Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
if (upperChar == Character.ERROR) {
if (localeDependent) {
upperCharArray =
ConditionalSpecialCasing.toUpperCaseCharArray(this, i, locale);
} else {
upperCharArray = Character.toUpperCaseCharArray(srcChar);
}
} else if (srcCount == 2) {
resultOffset += Character.toChars(upperChar, result, i + resultOffset) - srcCount;
continue;
} else {
upperCharArray = Character.toChars(upperChar);
}

/* Grow result if needed */
int mapLen = upperCharArray.length;
if (mapLen > srcCount) {
char[] result2 = new char[result.length + mapLen - srcCount];
System.arraycopy(result, 0, result2, 0,
i + resultOffset);
result = result2;
}
for (int x=0; x<mapLen; ++x) {
result[i+resultOffset+x] = upperCharArray[x];
}
resultOffset += (mapLen - srcCount);
} else {
result[i+resultOffset] = (char)upperChar;
}
}
return new String(0, count+resultOffset, result);
}

Re: The Long Way toUpper

2008-12-01 09:26 • by Addison (unregistered)
Phew! Today's WTF is actually a WTF. I foresee considerably less flaming over this post. Very safe.

Re: The Long Way toUpper

2008-12-01 09:27 • by Zeal_ (unregistered)
With Extended Binary Coded Decimal Interchange Code, this may be the only sane approach. http://en.wikipedia.org/wiki/EBCDIC

Re: The Long Way toUpper

2008-12-01 09:29 • by Herman (unregistered)
How in God's name is "2-Char indentation" a WTF?

Re: The Long Way toUpper

2008-12-01 09:30 • by Syrup (unregistered)
232006 in reply to 231993
Jelly:
This looks wrong.

Surely given "SomeAccount", it returns "OMECCOUNT"? It seems to skip all uppercase letters when building the result string.

It's a surprise given the clarity of code that noone else has picked up on this.


its the 'else' not shown at the farthest level of indentation.

Re: The Long Way toUpper

2008-12-01 09:32 • by akatherder
Wouldn't this be an infinite loop anyways? The case for the loop is:
while (Account.length() > i)

There's no indication, or obvious assumption of Account losing character(s) in every iteration.

Re: The Long Way toUpper

2008-12-01 09:34 • by campkev
232008 in reply to 231969
Guybrush Threepwood:

5) Spelling the function parameter with a capital A
6) Two statements on the same line when there is absolutely no need for it
7) 2-char indentation


sorry, but style preferences are not WTF's

Re: The Long Way toUpper

2008-12-01 09:34 • by campkev
232009 in reply to 232007
did you not see the i++ at the end?

Re: The Long Way toUpper

2008-12-01 09:38 • by Maxx Delusional
232010 in reply to 231969
Guybrush Threepwood:

5) Spelling the function parameter with a capital A


How is that in any way a WTF? I personally always capitalize my function parameters.

Re: The Long Way toUpper

2008-12-01 09:39 • by dhasenan (unregistered)
The obvious way to write it is:
auto upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
auto lower = "abcdefghijklmnopqrstuvwxyz";
auto result = "";
foreach (c; input)
{
if (contains (lower, c))
result ~= upper[find(lower, c)];
else
result ~= c;
}

Lookup tables like this can also support unicode, and it's a lot easier to go in reverse.

Re: The Long Way toUpper

2008-12-01 09:45 • by Mate (unregistered)
232012 in reply to 232004
Zeal_:
With Extended Binary Coded Decimal Interchange Code, this may be the only sane approach. http://en.wikipedia.org/wiki/EBCDIC


A translation table would work with any character set. The program can switch between different encodings' tables, if necessary.

Re: The Long Way toUpper

2008-12-01 09:48 • by Voodoo Coder (unregistered)
232014 in reply to 232011
dhasenan:
The obvious way to write it is:
auto upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
auto lower = "abcdefghijklmnopqrstuvwxyz";
auto result = "";
foreach (c; input)
{
if (contains (lower, c))
result ~= upper[find(lower, c)];
else
result ~= c;
}

Lookup tables like this can also support unicode, and it's a lot easier to go in reverse.


Yes, but shouldn't you store the values for the alphabet in an XML file?

Re: The Long Way toUpper

2008-12-01 09:53 • by amischiefr
232015 in reply to 232012
public String toUpperReinventTheWheel(String str) {
String temp = "";
for(int i = 0; i < str.length; ++i) {
if((int)str.charAt(i) >= 97 && (int)str.charAt(i) <= 122)
temp += (char)(str.charAt(i) + 26);
else
temp += str.charAt(i);
}

Might look like this...
« PrevPage 1 | Page 2 | Page 3 | Page 4Next »

Add Comment