Admin
Should the author of this ever come back to look at it again, I wonder whether he'd be ashamed of his handiwork, or proud.
He'd probably say, "Fist!"
Admin
What percentage of code WTFs are re-coding functions that already exist in the language's core?
Admin
Premature optimization is the root of all evil -- C. A. R. Hoare
Admin
Quite a high percentage, I should think. Maybe even a majority.
Admin
no, he would say "FIST"
Admin
Hey, at least it short circuits after discovering that the letter is, say, 'm' instead of continuing to test if 'n' .. 'z'.
And, it is nicely indented too!
Admin
Was the code really indented like that?
Admin
If I wanted to write such an abomination of a code, I'd write a script :)
Admin
That's... interesting. It really is, though I think I find it interesting in a different way than you do.
Let's start in a different way: how would YOU write an uppercase transformation function? Come up with some approach.
Now...
Make sure it works with non-English characters (e.g. accented letters).
Make sure it works with non-ASCII (ANSI) string encodings (i.e. it works with UTF-xx, UCS-xx, etc).
Optimize.
What does your function look like?
Admin
If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and, if it lies between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).
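For what it's worth, here is a minimal Java sketch of that ASCII-only idea. The class and method names are made up for illustration, and nothing here claims to be what any real runtime does internally. Characters outside 0x61..0x7a pass through untouched, which also means accented letters are not converted.

```java
// Hypothetical sketch: ASCII-only uppercase via a range check and XOR.
final class AsciiUpperSketch {

    static String toUpperAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // 'a'..'z' is 0x61..0x7a; every value in that range has bit 0x20 set,
            // so XOR with 0x20 clears it, which is the same as subtracting 0x20.
            if (c >= 0x61 && c <= 0x7a) {
                c = (char) (c ^ 0x20);
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUpperAscii("account-42x")); // prints ACCOUNT-42X
    }
}
```

Note that the XOR is only safe because of the range check in front of it; applied blindly it would flip uppercase letters to lowercase and mangle punctuation.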
Admin
Pretty big assumption, as most modern frameworks support all manner of encodings.
Admin
The good thing about this method is that he specifically wrote it to work on account strings. If the account string format ever changes, he has only one place to update the code, and all account strings will be fixed.
Furthermore, other strings which may need converting to upper case can be converted in their own separate methods, thereby increasing maintainability by reducing coupling. It would be a real mess if address strings and account strings were handled by the same code, for example. Changing one would inadvertently change the other.
The creation of many string instances is not a technique I use often, but it can be useful if you need to go back in time and get a string that was halfway through processing. This can be used to display updates to the user in a progress bar type display. A flag that activates this behaviour should be added to this method to improve its usefulness. This is really a minor point though.
Admin
There's actually a ton of WTFs in there...
Did I forget something?
Admin
Obviously this is pretty bad code to start with, but doesn't this code also drop any upper case letters in the Account instead of copying them through? Or is the .equals() method case-insensitive? Or is there another set of .equals("A"), .equals("B")... that was snipped?
Admin
They only support encoding to / decoding from different encodings and Unicode. The string is represented internally in a Unicode format (usually UTF-16). Which makes it a lot easier.
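As a small illustration of that point in Java (a sketch only; the class and method names are invented here, and ISO-8859-1 is just an example input encoding): the bytes are decoded once into a String, and the case conversion then works on Unicode characters without caring what the original encoding was.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch: decode first, then let the framework do the case mapping.
final class DecodeThenUpperSketch {

    static String upperFromLatin1(byte[] raw) {
        // Decoding turns the encoded bytes into Java's internal UTF-16 representation.
        String decoded = new String(raw, StandardCharsets.ISO_8859_1);
        // Locale.ROOT avoids locale-specific surprises such as the Turkish dotted/dotless i.
        return decoded.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // 0xE9 is a lowercase e with acute accent in ISO-8859-1.
        byte[] raw = {(byte) 0xE9, 'c', 'o', 'l', 'e'};
        System.out.println(upperFromLatin1(raw));
    }
}
```

The same method body would work for any other charset the platform can decode; only the decoding step changes.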
Admin
If you have to support a bloody gawdawful mess of character sets like seems to be the fad lately (although we don't know that today's WTF actually did that), at least put the characters in two arrays, loop thru the first array, and use it to index the second. Yeah it might be a nanosecond slower, but a few quintillion nanoseconds easier to read, check for accuracy, and update.
Admin
I like the use of a while loop instead of a for loop, just to add insult to injury.
Admin
[quote user="rt"][quote user="Guybrush Threepwood"]3) Usage of else { if (..) {} } syntax[/quote] What's wrong with that, in this specific case?[/quote]
The extra levels of curly braces, maybe?
Admin
If we can place ourselves in this programmer's chair and assume the following:
then I think the main wtf (explicitly checking every legal character) is what most of us would resort to.
or ?
Admin
I don't see the WTF. I mean, you have to code it like that rather than using ASCII values if you want to support Unicode.
Admin
Your built-in 'ToUpper' function probably just uses a lookup table. This has the advantage that it works equally well for any (single-byte) character set.
Admin
That code must be horribly slow. A better approach would be to check for frequently used letters first (as in, first test for 'e', then for 'a', etc). On the downside, that would make the code language specific (English in this case), and a little harder to maintain.
Admin
We had to do an assignment in high school that was converting letters to uppercase without using built in methods. This guy was either too lazy or stupid (or both) to find a chart for ASCII characters. Obviously it would never use Unicode, we'll never ship to a customer that needs Unicode support. only ships to customers that use Unicode
Admin
If one truly had to implement their own method and could assume ASCII characters were being used, they would create a lookup table (array). The lookup table would be indexed by the integer value of the character to be converted to uppercase, and the contents of the lookup table at that position would be either an unmodified character for all characters which are not lowercase letters, or the uppercase character of the lowercase letter.
Then you can simply do this:
string[i] = uppercaseLookup[(int)string[i]];
Characters which are numbers, punctuation etc remain the same, because uppercaseLookup[(int)'3'] = '3' and uppercaseLookup[(int)';'] = ';'. Lowercase letters are changed to uppercase letters because uppercaseLookup[(int)'a'] = 'A'.
You would have a table of 256 characters to represent all ASCII characters. This uses a minimal amount of memory, operates with O(1) complexity, and is simple as pie to understand.
If you wanted to be really spiffy, you could encapsulate it in a class, but if you have access to classes, you almost certainly have access to a built-in way of doing this.
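A rough Java version of the lookup table described above might look like this. The names are hypothetical, and the bounds check for characters above 255 is an addition the description does not mention; it is there because a Java char can hold values the 256-entry table does not cover.

```java
// Hypothetical sketch: 256-entry uppercase lookup table indexed by character value.
final class UpperLookupSketch {

    private static final char[] UPPERCASE_LOOKUP = new char[256];

    static {
        // Start with the identity mapping so digits, punctuation etc. stay unchanged.
        for (int i = 0; i < UPPERCASE_LOOKUP.length; i++) {
            UPPERCASE_LOOKUP[i] = (char) i;
        }
        // Overwrite the entries for 'a'..'z' with their uppercase counterparts.
        for (char c = 'a'; c <= 'z'; c++) {
            UPPERCASE_LOOKUP[c] = (char) (c - ('a' - 'A'));
        }
    }

    static String toUpper(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            // Assumption: characters outside the table are passed through as-is.
            if (chars[i] < UPPERCASE_LOOKUP.length) {
                chars[i] = UPPERCASE_LOOKUP[chars[i]];
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(toUpper("acct;3a")); // prints ACCT;3A
    }
}
```

Each character costs one array read, so the whole conversion is linear in the length of the string.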
Admin
Even if you had no idea of the ASCII table or Unicode, you could still do this in a way that it works for anything (and is expandable).
Create a string of all potentially convertible characters, create an alias string containing one-for-one what the new character should be, then your input character indexes to its conversion, and you can add more if you must. This works for any character set.
This is just a conversion table problem, isomorphic to a string pair. So what I see is that the programmer probably did not know the ASCII chart or the Unicode chart, did not know how to make a lookup table (or did not think that way), and did not know how to make a loop that would pick a character out of a lineup and swap it with its counterpart in another lineup (or did not think that way).
Most likely? This programmer does not think the way a real programmer should. Very common any more.
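The string-pair idea from this comment could be sketched in Java roughly as follows. FROM, TO and translate are illustrative names, and the two strings only cover the English alphabet here; extending them is a matter of appending matching characters to both.

```java
// Hypothetical sketch: conversion table expressed as two parallel strings.
final class PairedStringsSketch {

    // Position n in FROM maps to position n in TO; both must have the same length.
    private static final String FROM = "abcdefghijklmnopqrstuvwxyz";
    private static final String TO   = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    static String translate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int pos = FROM.indexOf(c);
            // Characters not listed in FROM are copied through unchanged.
            out.append(pos >= 0 ? TO.charAt(pos) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("SomeAccount")); // prints SOMEACCOUNT
    }
}
```

Swapping FROM and TO gives the lowercase direction for free, at the cost of a linear scan of FROM per character.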
Admin
Well, just go the whole hog and have a large array, where a pointer to the uppercase version of each character in the UTF-8 set is stored. If space is not an issue, go for it :)
(yes I know this is an insane solution... real-world implementations, I believe, do not have to cover the whole character set, rather just characters that actually have uppercase and lowercase representations)
Admin
Since when does Java use ASCII (by default)?
Admin
This looks wrong.
Surely given "SomeAccount", it returns "OMECCOUNT"? It seems to skip all uppercase letters when building the result string.
It's a surprise, given the clarity of the code, that no one else has picked up on this.
Admin
This is also a question to all people proposing the same idea disguised as "two tables".
I am afraid that "this programmer" might know quite a bit about character sets, encodings (fixed-length and prefix code) and the remaining framework functionality. In this case, that's probably what saved the users from experiencing an utter failure.
Admin
This may work too:
Admin
no, he would bend over a wooden table and somebody else would do the "FIST" and take a picture
Admin
Why reverse engineer the built-in method? At least the sun-java6 sdk is open source... so:
Admin
Phew! Today's WTF is actually a WTF. I foresee considerably less flaming over this post. Very safe.
Admin
With Extended Binary Coded Decimal Interchange Code, this may be the only sane approach. http://en.wikipedia.org/wiki/EBCDIC
Admin
How in God's name is "2-Char indentation" a WTF?
Admin
It's the 'else' not shown at the farthest level of indentation.
Admin
Wouldn't this be an infinite loop anyways? The case for the loop is:
while (Account.length() > i)
There's no indication, or obvious assumption of Account losing character(s) in every iteration.
Admin
Sorry, but style preferences are not WTFs.
Admin
did you not see the i++ at the end?
Admin
How is that in any way a WTF? I personally always capitalize my function parameters.
Admin
The obvious way to write it is:
auto upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
auto lower = "abcdefghijklmnopqrstuvwxyz";
auto result = "";
foreach (c; input)
{
    if (contains(lower, c))
        result ~= upper[find(lower, c)];
    else
        result ~= c;
}
Lookup tables like this can also support Unicode, and it's a lot easier to go in reverse.
Admin
A translation table would work with any character set. The program can switch between different encodings' tables, if necessary.
Admin
Yes, but shouldn't you store the values for the alphabet in an XML file?
Admin
public String toUpperReinventTheWheel(String str) {
    String temp = "";
    for (int i = 0; i < str.length(); ++i) {
        char c = str.charAt(i);
        // ASCII 'a'..'z' is 97..122; the uppercase letters sit 32 code points lower.
        if (c >= 97 && c <= 122)
            temp += (char) (c - 32);
        else
            temp += c;
    }
    return temp;
}
Might look like this...