The Daily WTF: Curious Perversions in Information Technology

2008-12-01

Christ! When I use the AS/400 at college, we don't even have lowercase letters.

2008-12-01

Michael B:
let char_to_upper = function | 'A' -> 'a' | 'B' -> 'b' | 'C' -> 'c' ... | 'Y' -> 'y' | 'Z' -> 'z' | other -> other
let string_to_upper s = String.map char_to_upper s

Good: Very simple and readable, very fast, character-set agnostic. Bad: Wordy.

Haskell:

import Data.Char
upperCaseIt = map toUpper

JamesQMurphy · 2008-12-01

This? I come back from Thanksgiving break and read about somebody implementing toUpper? What's tomorrow's WTF? A home-brewed bubblesort implementation?

I guess I'm grouchy.

2008-12-01

Better like this than an over-engineered solution like the following, which needs an Internet connection to work (remember, the Unicode database is not constant; it gets updated from time to time...):

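import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

public class UpperCaseConverter {
	
	// lowercase char -> uppercase char, loaded from the Unicode Character Database
	private Map<Character,Character> conversion = new HashMap<Character, Character>();
	
	private static final String UNICODE_DATA_URL = "http://unicode.org/Public/UNIDATA/UnicodeData.txt";
	
	public UpperCaseConverter() throws IOException {
		BufferedReader br = new BufferedReader(new InputStreamReader(new URL(UNICODE_DATA_URL).openStream()));
		try {
			String lineBuffer;
			while ((lineBuffer = br.readLine()) != null) {
				String[] fields = lineBuffer.split(";");
				// field 12 is the simple uppercase mapping (field 14 is the titlecase mapping)
				if (fields.length > 12 && fields[12].length() > 0) {
					int from = Integer.parseInt(fields[0], 16);
					int to = Integer.parseInt(fields[12], 16);
					// a char only holds BMP code points; skip the others instead of truncating them
					if (from <= Character.MAX_VALUE && to <= Character.MAX_VALUE) {
						conversion.put((char) from, (char) to);
					}
				}
			}
		} finally {
			br.close();
		}
	}
	
	public String convert(String s) {
		char[] chars = s.toCharArray();
		boolean changed = false;
		for (int i = 0; i < chars.length; i++) {
			if (conversion.containsKey(chars[i])) {
				changed = true;
				chars[i] = conversion.get(chars[i]);
			}
		}
		// return the original instance when nothing needed converting
		return changed ? new String(chars) : s;
	}
	
	/** Test method */
	public static void main(String[] args) throws IOException {
		System.out.println(new UpperCaseConverter().convert("¡Hello, World; wé arë løvïñg þôu!"));
	}
}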

Captcha: Not "lorem ipsum dolor sit", but "amet" :)

2008-12-01

I'm working on a compiler for the IBM System/390. It uses EBCDIC. I want to die.

2008-12-01

@ the people on about using "else { if (condition) { } else { if (etc..." instead of "elseif"

i lost a mark for writing this: if (condition) doSomething(); else if (otherCondition) doSomethingElse(); else doAThirdThing();

instead of: if (condition) doSomething(); else if (otherCondition) doSomethingElse(); else doAThirdThing();

does that class as a WTF?

2008-12-01

hmm, it lost my indentation. ach/

2008-12-01

Curious: what compiler would need to be written for that system?

2008-12-01

rt:
That's... interesting. It really is though I think I find it interesting in a different way than you do.
Let's start in a different way: how would YOU write an uppercase transformation function? Come up with some approach.

Now...

Make sure it works with non-English characters (e.g. accented letters).

Make sure it works with non-ascii (ANSI) string encodings (i.e. it works with UTF-xx, UCS-xx, etc).

Optimize.

What does your function look like?

This is java, so non-ascii is a nonissue - chars are 16 bit. My first cut would go like this:

char[] str = string.toCharArray();
for (int i = 0; i < str.length; i++) {
    Character upper = uppercase_map.get(str[i]);  // null when str[i] has no uppercase form
    if (upper != null) str[i] = upper.charValue();
}
return new String(str);

uppercase_map is a Map<Character, Character> that contains all the chars that change when uppercased; I'd optimize this to a map that works on primitive types and see how that helps things (if at all).

undrline · 2008-12-01

Sub CaseLower()

' I can't believe I wrote this block of code. Copy/paste is faster than thinking.
' I'm sure there's a way to do this with LCASE(),
' but I can't think of how to do it just for the cells in the selection without looping through each cell's value,
' using the function, writing it back to the appropriate cell, and continuing through to the end of the selection.

Selection.Replace What:="A", Replacement:="a", LookAt:=xlPart, _
    SearchOrder:=xlByRows, MatchCase:=True
Selection.Replace What:="B", Replacement:="b", LookAt:=xlPart, _
    SearchOrder:=xlByRows, MatchCase:=True
Selection.Replace What:="C", Replacement:="c", LookAt:=xlPart, _
    SearchOrder:=xlByRows, MatchCase:=True

...

danixdefcon5 · 2008-12-01

CaRL:
hikari:
If you reverse engineered the built-in method - assuming it just works on ASCII - you'd probably end up with something that looks at the value of each character and, if it lies between 0x61 and 0x7a (inclusive), subtracts 0x20 from it (probably by doing an XOR with 0x20).
Given a sane character set like ASCII, that's exactly what you'd do.
If you have to support a bloody gawdawful mess of character sets like seems to be the fad lately (although we don't know that today's WTF actually did that), at least put the characters in two arrays, loop thru the first array, and use it to index the second. Yeah it might be a nanosecond slower, but a few quintillion nanoseconds easier to read, check for accuracy, and update.

I remember doing the same thing back in my C and x86 assembly days (it was lowercasing, though):

someString[i] |= 0x20

or

xor byte ptr [bx],20h   ; BX points at the character (DX can't be used for addressing)

This trick might work for Java, since its 16-bit chars (UTF-16 code units) coincide with ASCII, um... well, with ISO-8859-1, for the first 256 values, so at least it is good enough if you aren't using accents.

However, Java's .toUpperCase() and .toLowerCase() are a blessing.

2008-12-01

If it were in Python, that bad code would at least be forced to look better :)

2008-12-01

rt:
That's... interesting. It really is though I think I find it interesting in a different way than you do.
Let's start in a different way: how would YOU write an uppercase transformation function? Come up with some approach.

Now...

Make sure it works with non-English characters (e.g. accented letters).

Make sure it works with non-ascii (ANSI) string encodings (i.e. it works with UTF-xx, UCS-xx, etc).

Optimize.

What does your function look like?

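char srctable[]="abcdefghijklmnopqrstuvwxyzáàâäéèêëíìîïó...";
char dsttable[]="ABCDEFGHIJKLMNOPQRSTUVWXYZÁÀÂÄÉÈÊËÍÌÎÏÓ...";

for (i=0; i<len; i++) {
    tmp=strchr(srctable, s[i]);            /* position of s[i] in the source table, or NULL */
    if (tmp) s[i]=dsttable[tmp-srctable];  /* take the char at the same position in the destination table */
}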

There's probably a better way to handle it, but this comes to mind right away.

2008-12-01

Ah, the days of pay-per-KLOC ... code like this is more valuable than code like { if ($mychr >= 'a' && $mychr <= 'z') $mychr ^= ('a' ^ 'A'); } (lazy and it optimizes out)
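For the curious, the range-checked XOR above translates to Java roughly like this. It is a minimal sketch that assumes plain ASCII letters: anything outside 'a'..'z', accented letters included, passes through untouched, and the class and method names are made up for illustration.

```java
public class XorUpper {

    // 'a' ^ 'A' == 0x20: the single bit that separates ASCII lowercase from uppercase
    private static final int CASE_BIT = 'a' ^ 'A';

    /** Uppercases ASCII letters only; every other char is copied unchanged. */
    public static String toUpperAscii(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c >= 'a' && c <= 'z') {          // range check first
                chars[i] = (char) (c ^ CASE_BIT); // then clear the case bit
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(toUpperAscii("Hello, world {1}!")); // prints "HELLO, WORLD {1}!"
    }
}
```

Because the range check comes first, the braces and digits are never touched, which is what the unchecked "just mask it" variants get wrong.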

2008-12-01

Franz Kafka:
rt:
That's... interesting. It really is though I think I find it interesting in a different way than you do.
Let's start in a different way: how would YOU write an uppercase transformation function? Come up with some approach.

Now...

Make sure it works with non-English characters (e.g. accented letters).

Make sure it works with non-ascii (ANSI) string encodings (i.e. it works with UTF-xx, UCS-xx, etc).

Optimize.

What does your function look like?

This is java, so non-ascii is a nonissue - chars are 16 bit. My first cut would go like this:

char[] str = string.toCharArray();
for (int i = 0; i < str.length; i++) {
    Character upper = uppercase_map.get(str[i]);  // null when str[i] has no uppercase form
    if (upper != null) str[i] = upper.charValue();
}
return new String(str);

uppercase_map is a Map<Character, Character> that contains all the chars that change when uppercased; I'd optimize this to a map that works on primitive types and see how that helps things (if at all).

Looking at the other examples, I think I'll replace the map with an array of lower, upper pairs; to uppercase a letter, do a binary search on the lower letter and, if found, replace it with the next array element. This is nice and compact and is amenable to reversal. Naturally, it doesn't work at all with the Turkish weirdness mentioned. For that, I'd want a base Map<Character, String> and a locale map that overrides the base. This lets me keep the logic in static structures rather than in the code, and is reasonably space-efficient.
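A rough sketch of the sorted-pairs lookup, under a few assumptions: the table is a tiny hand-picked sample (not a real case mapping), and it is stored as two parallel char arrays rather than one interleaved array, so that java.util.Arrays.binarySearch(char[], char) can do the searching. The class and array names are hypothetical.

```java
import java.util.Arrays;

public class PairTableUpper {

    // Hypothetical, deliberately tiny table; LOWER must stay sorted for binarySearch.
    private static final char[] LOWER = {'a', 'b', 'c', 'z', '\u00e4', '\u00f6', '\u00fc'};
    private static final char[] UPPER = {'A', 'B', 'C', 'Z', '\u00c4', '\u00d6', '\u00dc'};

    public static String toUpper(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            int idx = Arrays.binarySearch(LOWER, chars[i]); // negative when not found
            if (idx >= 0) {
                chars[i] = UPPER[idx];                      // partner at the same index
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(toUpper("cab!")); // prints "CAB!"
    }
}
```

In this sample the UPPER column happens to be sorted as well, so searching it instead gives the reverse mapping; a real table would need its own sorted copy for that.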

I wonder what the standard uppercase impl looks like.

2008-12-01

Maybe I am wrong, but the tone of this reply was pretty snarky. How exactly would I use an input character as an index? You derive it from the input character's position in the first string.

The first string in your conversion function is the set of lower case letters, including those with umlauts and tildes. The second string has the upper case equivalents.

You start with a loop for your input string, and for the first character (which I will call the source character), you look it up in your first string and extract the character position. If the function returns a null (or you don't find it) then you simply copy the source character down to your output string. Otherwise, you use the character position to copy its replacement from the second string.

Repeat until done. As you see, the source character generates the index from its position in the first string. Call it an indirect index.

As for the "two tables" solution, take the above explanation and substitute first table for first string, etc. The logic holds. After all, a string serves as a table if you index it.
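Here is the indirect index as a small Java sketch (the class name and table contents are illustrative assumptions): String.indexOf finds the source character's position in the first string, and that position selects the replacement from the second. The two strings must line up character for character.

```java
public class IndirectIndexUpper {

    // Illustrative tables: position i in LOWER corresponds to position i in UPPER.
    private static final String LOWER = "abcdefghijklmnopqrstuvwxyz\u00e4\u00f6\u00fc";
    private static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ\u00c4\u00d6\u00dc";

    public static String toUpper(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char source = input.charAt(i);
            int pos = LOWER.indexOf(source);                   // -1 when not in the first string
            out.append(pos >= 0 ? UPPER.charAt(pos) : source); // otherwise copy the source char
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // true: the u-umlaut is in the table, the sharp s is not and is copied as-is
        System.out.println(toUpper("gr\u00fc\u00dfe, world").equals("GR\u00dc\u00dfE, WORLD"));
    }
}
```

indexOf scans linearly, so each lookup costs time proportional to the table length; for a few dozen characters that hardly matters.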

Looking back at this reply, I wonder if "rt" is the original programmer of this particular WTF. He treats the label "real programmer" as invective. When I started programming, you got 3 kwords per user in a multitasking environment that clocked at 50 kilohertz. And we were thankful!

<humor> Even the bits were too big to fit through a microchip! </humor>

But programmers knew the ASCII table cold and did bit masking and octal/hex in their heads. Today people have to consult a calculator or scribble on a pad to do it. There is nothing like knowing your system inside and out, something that is flat impossible today.

2008-12-01

Anonymous:
... surely you would immediately look for an upper/toUpper/toUpperCase method on the String class. And what do you know, it's right there. Let's hope this developer never needs to do any calculations - there's no way he'd find a cryptically named class like "Math". ...

ROFL! Best comment in a week!

2008-12-01

The real WTF is people still thinking in terms of what character set their string is in. Modern languages provide Unicode string types, where one character in the string is one actual character. Once you have decent strings, a string is plain text and a character set (or "encoding") is only a mapping between byte arrays and plain text.

A string-uppercase algorithm should never need to know what your favorite encoding is. It'd be ridiculous to duplicate basic string algorithms across all encodings. String algorithms should be able to loop over the characters in the string without dealing with crap.

The real question is not what format your string is in. The real question is what locale you want to uppercase the string for. In Turkish, for example, "i" uppercases to "İ" and "ı" uppercases to "I". Programming isn't always easy, but you can make it much easier for yourself by using real strings.
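Java already exposes this locale dependence through String.toUpperCase(Locale). A short demonstration, assuming Java 7 or later for Locale.forLanguageTag: the same "i" uppercases differently under Turkish rules and under the locale-neutral Locale.ROOT, and "ß" shows that one character can uppercase to two.

```java
import java.util.Locale;

public class TurkishUpper {
    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Turkish rules: dotted i becomes capital dotted I (U+0130)
        String dotted = "i".toUpperCase(turkish);
        // Dotless i (U+0131) uppercases to plain I
        String dotless = "\u0131".toUpperCase(turkish);
        // Locale-neutral rules: i becomes plain I
        String root = "i".toUpperCase(Locale.ROOT);

        System.out.println(dotted.equals("\u0130")); // true
        System.out.println(dotless.equals("I"));     // true
        System.out.println(root.equals("I"));        // true

        // One char can become two: the German sharp s uppercases to "SS"
        System.out.println("stra\u00dfe".toUpperCase(Locale.ROOT)); // STRASSE
    }
}
```

The no-argument toUpperCase() uses the JVM's default locale, so the same code can behave differently on a machine configured for Turkish.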

2008-12-01

private static int addOneToIt(int Number) {
    if (Number == 0) { return 1; }
    else if (Number == 1) { return 2; }
    else if (Number == 2) { return 3; }
    else if (Number == 3) { return 4; }
    else if (Number == 4) { return 5; }
    else if (.... ... ... ... } }

mihi · 2008-12-01

I wonder what the standard uppercase impl looks like.

Open Eclipse (or another good IDE), make sure you have configured it to use JDK libraries and not JRE libraries (so that you have debug information and source attachments), and Ctrl+Click on any toUpperCase() call.

Code Dependent · 2008-12-01

Charles Shults:
But programmers knew the ASCII table cold and did bit masking and octal/hex in their heads.

Yep. Like I said earlier (as a voice crying in the bewilderness), just AND the character's ascii value with $DF. You didn't even have to check whether it was within the alphabetic range.

2008-12-01

Just 'cos I wanna get in on the bracket stuff too....

consider (my preferred indenting):

if(condition)
{
  doMagic();
}

If we want to see what happens inside the block without forcing the condition to be true (when you are testing, naturally), you simply comment out the if and voilà: the block gets executed.

if, on the other hand, you use:

if(condition) {
  doMagic();
}

then commenting out the if is a little (only a little) more complex....

I think (in a similar vein to what someone else suggested) that the second style is also a bit less friendly on the eyes because it isn't immediately apparent whether both braces are there (and there's something nice about braces lining up nicely, especially over largish chunks of code).

Incidentally, AFAIK, Emacs' standard indenting (press tab on any line in Emacs and it will align that row to where it feels it should be based on the row above it) involves indenting the brace a little (to half the indenting of the actual statement), viz:

if(condition)
 {
  doMagic(); 
 }

2008-12-01

Mr.'; Drop Database --:
A string-uppercase algorithm should never need to know what your favorite encoding is. It'd be ridiculous to duplicate basic string algorithms across all encodings. String algorithms should be able to loop over the characters in the string without dealing with crap.

I still hate locales.

Anyway, the only generic 'encoding' that is large enough to hold a sufficient character set you can loop over without having to deal with any crap is UTF-32. I'm sure people would be overjoyed if UTF-32 were made obligatory everywhere, especially those who have to deal with gigabytes of string data and just saw their storage requirements quadruple. But that's a small price to pay for having to deal with a tiny bit less crap.

Then again 90% of everything is crap and now there'd be four times more of it. Deal.
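For what it's worth, Java shows a middle ground: strings are stored as UTF-16, but String.codePoints() (Java 8+) lets you loop over whole code points anyway. A sketch, using the Deseret letter U+10428 as an example of a character outside the 16-bit range; the class and method names are made up.

```java
public class CodePointUpper {

    /** Uppercases by full Unicode code point, so a surrogate pair is treated as one character. */
    public static String toUpperByCodePoint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> out.appendCodePoint(Character.toUpperCase(cp)));
        return out.toString();
    }

    public static void main(String[] args) {
        String deseret = new String(Character.toChars(0x10428)); // DESERET SMALL LETTER LONG I

        System.out.println(deseret.length());                             // 2 (UTF-16 code units)
        System.out.println(deseret.codePointCount(0, deseret.length())); // 1 (code point)

        String upper = toUpperByCodePoint(deseret);
        System.out.println(upper.codePointAt(0) == 0x10400);              // true: DESERET CAPITAL LETTER LONG I
    }
}
```

A loop over individual chars would see two surrogate halves here and could not map them; the code-point loop handles the pair as one unit without storing the string as UTF-32.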

2008-12-01

The real fail is using a variable name with an upper case first letter. That naming style is discouraged according to ParaSoft!

2008-12-01

That is a WTF. I can tell you that your teacher's solution wouldn't pass a code review where I work, if that makes you feel better. Your post lost its indentation; I guess the second if/else was indented further than the first?

2008-12-01

ed:
If we can place ourselves in this programmer's chair and assume the following:
- We cannot find any built-in on String for converting case
- We cannot find reliable documentation on how a String is represented internally (which encoding)
then I think the main wtf (explicitly checking every legal character) is what most of us would resort to.

or ?

c'mon... no matter what encoding is internally used - you can at least count on the fact that the character codepoints for all lowercase and uppercase chars respectively are contiguous (if not, take the system in question and throw it away...) and then it's a small step to calculate the difference between 'a' and 'A', use it to add or subtract, and check whether you have to do it or not - something along the lines of:

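char lo = 'a', hi = 'A';

int num = 'z' - 'a' + 1;   // number of letters in the alphabet
int offs = hi - lo;        // distance from the source case to the target case

StringBuilder b = new StringBuilder();

for(int i = 0; i < s.length(); i++)   // s is the input string
{
    int c = s.charAt(i);
    // shift characters inside the source range, copy everything else unchanged
    b.append((c >= lo && c < lo + num) ? (char)(c + offs) : (char)c);
}
String result = b.toString();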

No assumption is made about the code points, only that a-z and A-Z are contiguous - you can even turn around the hi and lo values:

char lo = 'A', hi = 'a';

and it still works, just converts into the other direction.

It gets more complicated for characters where upper and lower do not have a constant distance, but e.g. for German umlauts (ä,ö,ü) the assumption still holds (you'd have to adapt the range to cover those, of course, the above was just quickly hacked in without consulting a char table about the position of non-english chars).

But you still wouldn't handle such special cases like in today's WTF, but e.g. rather use the charcode to index into a pre-compiled conversion array - still much much better than hardcoding everything in explicit if-statements...
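A sketch of the pre-compiled conversion array, with one slot per 16-bit char value, built once when the class loads. Filling it from Character.toUpperCase(char) is just a shortcut for this example (a hand-built table works the same way), and the class name is made up. It only covers one-to-one mappings, so 'ß' and surrogate pairs pass through unchanged.

```java
public class TableUpper {

    // 65,536 entries: TABLE[c] is the uppercase form of char c, or c itself.
    private static final char[] TABLE = new char[Character.MAX_VALUE + 1];

    static {
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            TABLE[c] = Character.toUpperCase((char) c);
        }
    }

    public static String toUpper(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = TABLE[chars[i]];   // one array lookup per char, no branches
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(toUpper("hello, w\u00f6rld").equals("HELLO, W\u00d6RLD")); // true
    }
}
```

The table costs 128 KB (65,536 two-byte chars) in exchange for a branch-free inner loop.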

2008-12-01

Zeal_:
With Extended Binary Coded Decimal Interchange Code, this may be the only sane approach. http://en.wikipedia.org/wiki/EBCDIC

The EBCDIC codepage layout on Wikipedia shows that every uppercase letter is +64 units above its lowercase counterpart. I chose A, J, & S as samples.

foreach letter in word do if letter > 192 and letter < 234 then letter = letter + 64 endif endfor

This is why assembler & machine architecture classes should still be taught.

2008-12-01

Dave G.:
The good thing about this method is that he specifically wrote it to work on account strings. If the account string format ever changes, he has only one place to update the code, and all account strings will be fixed.
Furthermore, other strings which may need converting to upper case can be converted in their own separate methods, thereby increasing maintainability by reducing coupling. It would be a real mess if address strings and account strings were handled by the same code, for example. Changing one would inadvertently change the other.

The creation of many string instances is not a technique I use often, but it can be useful if you need to go back in time and get a string that was halfway through processing. This can be used to display updates to the user in a progress bar type display. A flag that activates this behaviour should be added to this method to improve its usefulness. This is really a minor point though.

Hooray! Welcome back.

captcha: populus

2008-12-01

amischiefr:
public String toUpperReinventTheWheel(String str) { String temp = ""; for(int i = 0; i < str.length; ++i) { if((int)str.charAt(i) >= 97 && (int)str.charAt(i) <= 122) temp += (char)(str.charAt(i) + 26); else temp += str.charAt(i); }
Might look like this...

fail! = Ç{âå!

2008-12-01

Yes, it is that bad.

The compiler will use a StringBuilder (or if you're pre-Java 5, a StringBuffer), but it will create a new Builder/Buffer for every statement that contains a concatenation, which here means once per loop iteration. So you really are copying the string contents N times.

Instead, if you're going to be building up a string iteratively, you should be creating the Builder/Buffer outside the loop and explicitly using it.
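Applied to the toUpperReinventTheWheel attempt quoted earlier, that advice gives something like the following sketch. Besides creating the builder once before the loop, it subtracts 32 instead of adding 26, and it is still ASCII-only by assumption; the class name is hypothetical.

```java
public class BuilderUpper {

    /** Builds the result in a single StringBuilder instead of concatenating inside the loop. */
    public static String toUpperAscii(String str) {
        StringBuilder temp = new StringBuilder(str.length()); // created once, outside the loop
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c >= 'a' && c <= 'z') {
                temp.append((char) (c - ('a' - 'A'))); // 'a' - 'A' == 32 in ASCII
            } else {
                temp.append(c);
            }
        }
        return temp.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUpperAscii("fail!")); // prints "FAIL!"
    }
}
```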

2008-12-01

Bobblehead Troll:
I still hate locales.
Anyway, the only generic 'encoding' that is large enough to hold sufficient character set you can loop over without having to deal with any crap is UTF-32. I'm sure people would be overjoyed if UTF-32 was made obligatory everywhere, especially those who have to deal with gigabytes of string data, and the storage requirements just quadrupled. But that's small price to pay for having to deal with a tiny bit less crap.

Most of your post argues against the idea that UTF-32 should be obligatory everywhere, which seems to be an idea that you raised yourself. What I suggest is for people to use Unicode strings as their preferred string type. This has been sufficient for everything I've needed. Your example of gigabytes of data is a silly one, since it wouldn't all be loaded in memory at once. Load a bit, convert it, save it. Performance suffers a little, like it does with most useful abstractions.

In the few situations where storing Unicode data in memory is impractical (such as the implementation of an SQL database), the programmers can go ahead and use byte arrays. This is because programmers use useful abstractions when they're practical, and don't use them when they're not.

Frankly, your claim that it's only a "tiny bit less crap" indicates that you have little understanding of issues that arise in modern programming. Your hatred of locales compounds that.

Leak · 2008-12-01

MindChild:
Worse. Code formatting. Ever.

Well, it could have been even worse...

Iago · 2008-12-01

Michael B:
let char_to_upper = function | 'A' -> 'a' | 'B' -> 'b' | 'C' -> 'c' ... | 'Y' -> 'y' | 'Z' -> 'z' | other -> other
let string_to_upper s = String.map char_to_upper s

Good: Very simple and readable, very fast, character-set agnostic. Bad: Wordy.

Brillant: function called "string_to_upper" converts string to lowercase.

2008-12-01

Five minutes googling would reveal to even non-Java programmers that:

Java strings are objects, part of the java.lang package (and hence the built-in methods are always going to be available).
The internal representation is not ASCII, it's UTF-16.

http://java.sun.com/javase/6/docs/api/java/lang/String.html

DaveK · 2008-12-01

Andrew:
Zeal_:
With Extended Binary Coded Decimal Interchange Code, this may be the only sane approach. http://en.wikipedia.org/wiki/EBCDIC

The EBCDIC codepage layout on Wikipedia shows that every uppercase letter is +64 units above its lowercase counterpart. I chose A, J, & S as samples.

foreach letter in word do if letter > 192 and letter < 234 then letter = letter + 64 endif endfor

Fail. You just munged all the soft-hyphens, close curly braces and back-slashes.

Andrew:
This is why assembler & machine architecture classes should still be taught.

Not to mention reading tables carefully and paying attention to detail!
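For comparison, a range check that only touches EBCDIC lowercase letters. This sketch assumes the code page 037 letter layout, where a-i, j-r and s-z sit at 0x81-0x89, 0x91-0x99 and 0xA2-0xA9 and each uppercase letter is 0x40 (64) higher. It works on raw EBCDIC bytes rather than a Java String, and the names are made up.

```java
public class EbcdicUpper {

    /** True only for the three runs that hold a-i, j-r and s-z in EBCDIC. */
    static boolean isEbcdicLower(int b) {
        return (b >= 0x81 && b <= 0x89)
            || (b >= 0x91 && b <= 0x99)
            || (b >= 0xA2 && b <= 0xA9);
    }

    /** Uppercases EBCDIC letters in place; braces, backslash, soft hyphen etc. are left alone. */
    public static void toUpperInPlace(byte[] ebcdic) {
        for (int i = 0; i < ebcdic.length; i++) {
            int b = ebcdic[i] & 0xFF;          // Java bytes are signed
            if (isEbcdicLower(b)) {
                ebcdic[i] = (byte) (b + 0x40); // lowercase + 64 = uppercase
            }
        }
    }

    public static void main(String[] args) {
        // 'a', 'j', 's' and '}' (0xD0 in code page 037)
        byte[] data = {(byte) 0x81, (byte) 0x91, (byte) 0xA2, (byte) 0xD0};
        toUpperInPlace(data);
        for (byte b : data) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // C1 D1 E2 D0
    }
}
```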

DaveK · 2008-12-01

MoffDub:
The real fail is using a variable name with an upper case first letter. That naming style is discouraged according to ParaSoft!

Then shouldn't it be

That naming style is discouraged according to paraSoft

... ?

2008-12-01

<name value="My Name">:
Why don't they use a data file in XML format to supply the data (as to what lower-case characters there are, and what lower-case character corresponds to what upper-case character)?

Like this?

and then parse the XML and then parse the Element text!

2008-12-01

Bracer:
Incidentally, AFAIK, Emacs' standard indenting (press tab on any line in Emacs and it will align that row to where it feels it should be based on the row above it) involves indenting the brace a little (to half the indenting of the actual statement), viz:
if(condition)
 {
  doMagic();
 }

That is GNU's silly recommended style. If you write it like this:

if (condition) { doMagic(); }

it works slightly better.

2008-12-01

I like the decision to declare this helper function private and static. I would love to hear the inner monologue on that decision.

2008-12-01

Either the trolls are outnumbering the genuine posters or this site is read mainly by non-programmers. The code is obviously WTF, yet we have many posters defending it and geniuses offering their own solutions to this very tricky problem.

2008-12-01

Given a sane character set like ASCII, that's exactly what you'd do.

I don't think so. The difference between A and a is 32. So, for characters between a and z, just subtract 32 from them. Then you get the uppercase character.

Scarlet Manuka · 2008-12-01

Code Dependent:
Charles Shults:
But programmers knew the ASCII table cold and did bit masking and octal/hex in their heads.
Yep. Like I said earlier (as a voice crying in the bewilderness), just AND the character's ascii value with $DF. You didn't even have to check whether it was within the alphabetic range.

Right, because uppercasing a string that contains numbers and punctuation marks really SHOULD fill it with random control characters instead. And [\]^ is the uppercase version of {|}~ too. Everybody knows that.
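The rebuttal is easy to check. Below is the unchecked mask in Java (a sketch with a made-up class name): 0xDF just clears bit 5, which is only the case bit for letters.

```java
public class MaskPitfall {

    /** The "just AND with 0xDF" approach: no range check at all. */
    public static String naiveUpper(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            out.append((char) (c & 0xDF)); // also wipes every bit above 0xDF in non-ASCII chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(naiveUpper("{|}~"));                 // [\]^
        System.out.println((int) naiveUpper("1").charAt(0));   // 17, the control char DC1
        System.out.println((int) naiveUpper(" ").charAt(0));   // 0, a space becomes NUL
    }
}
```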

2008-12-01

Ilya Ehrenburg:
Herman:
You're not going to tell me people actually do that? Indent the bracket itself?
I'm afraid they do: http://en.wikipedia.org/wiki/Indent_style#Whitesmiths_style
Addendum (2008-12-01 12:24): Oh, it took me a few minutes to find it again, but http://thedailywtf.com/Comments/Argument_About_Argument_Validation.aspx?pg=2#47352 started a nice flame war about indent styles, read and enjoy.

gah. (allman et al. style) whitespace padding is for inferior token processors. like any real programmer, i use a modified 1TBS bracing, with brace-on-construct-line, and only significant [max(1), only when required] whitespace between keywords, operators, constants, variables, equations, etc.

And no concession to you peons that can't keep a piece of logic in your heads without WORD-WRAPPING(?!?!) it. Only Quiche-Eaters word-wrap in code.

I also delight in nested ternary statements. You whinging readability whores can go back to Ops, where you may or may not belong. But know that they have shiny manuals there.

Requisite sample:

void SuckIt(string whitespaceFAIL,object[] whiners){
    if(whiners==null||whiners.length==0)return;
    for(int i=0;i<TDWTF.inferiorMoronathons.length;i++){
        string tellEm=TDWTF.inferiorMoronathons[i].whiteSpacePreference;
        bool toStuffIt=tellEm==whitespaceFAIL;
        if(whiners.contains(TDWTF.inferiorMoronathons[i]))Wish.upon(TDWTF.inferiorMoronathons[i],tellEm!=null?toStuffIt?WishTypes.t3hH1V:WishTypes.Cancer:WishTypes.Chlamydia);
    }
}

The only valid whitespace is a blank line between logical block operations at the same nest level. Get over it.

2008-12-01

Trolls; flame-wars; a crap-load of simultaneous wtfs to mess up...

The genuine readers are simply having a hard time telling which way is up.

Zatanix · 2008-12-02

SomeCoder:
Ilya Ehrenburg:
SomeCoder:
You may like the first style now, but wait until you inherit some code base that does this:
if (lots of conditions that stretch clear across the freaking page and don't wrap around at all and just keep going and going and going and going) {
   some code that is also rather long and irritating;
}
It's hard to illustrate here, but basically that condition stretched off the page so the first { was NOT visible. Just a big blob of text is what it ended up looking like.

I can see needing a lot of conditionals at times but for god's sake, try and make it readable!!
Okay, but that would require drastic measures regardless of brace style.
if (lots of conditions
        && (that stretch clear
              || across the freaking
              || page and don't wrap)
        && around at all and 
        && just keep going and 
        && going and going and going) {
is bad enough, but I've yet to see a condition that required worse formatting.
Yeah, I know, but in this case putting the brace on the next line would have done wonders for readability all by itself, never mind the fact that all the conditions should have been separated out better.

I've managed to show the coder who wrote that originally the light as far as formatting goes though :)

You misunderstand how we, who prefer the bracing style with the brace on the same line as the statement it belongs to, read code. We do not match closing brace with opening brace and then match opening brace with statement. We match the closing brace directly with the statement.

Having a brace on its own line is not superior or more readable (I would say less readable, but you would disagree). It is a simple matter of preference and/or what you are used to.

But this has been flamed about 372819732198 times before. Let the brace-wars (:Ð) end

2008-12-02

You are the wind beneath my wings.

2008-12-02

mihi:

Better like this, than creating an over-engineered solution like the following, which will need an Internet connection to work (remember, the Unicode database is not constant, but will get updated from time to time...):

...

Oops. Clicked the wrong button.

You are the wind beneath my wings (reprise).

Captcha: wisi. Not to wisi to click "reply" instead of "quote"....

2008-12-02

I seem to be the only one who hasn't optimized the original code. Shouldn't the tests be in order of common usage: e, t, a, o, i, n, s... (watch Wheel of Fortune).

Of course the best way is (as mentioned before) to see if the character is alphabetic and in the lower case range, then add the value of 'A' - 'a'. This will work regardless of the character set (ASCII, or EBCDIC), since the difference is calculated at "compile time" which hopefully is the same character set used at "run time" (sometimes it isn't!). Of course this assumes that the difference is constant over the character range desired.

crome · 2008-12-02

hey! I figured out what to code when one feels like not doing any work, yet still doing some piece of programming. let's write programs that generate suchlike source monsters. I already have an endless pool of ideas available: re-inventing core routines. should be simple.

2008-12-02

A fast way would be to use a hash.

E.g., in python2.5:

low = "abcdefghijklmnopqrstuvwxyz"
up = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

m = dict(zip(low, up))

def upper(astr):
    return "".join(m.get(s, s) for s in astr)

Note the actual function is only one line. Creating the hash took 3.

The Long Way toUpper