Admin
Christ! On the AS/400 I use at college, we don't even have lowercase letters.
Admin
This? I come back from Thanksgiving break and read about somebody implementing toUpper? What's tomorrow's WTF? A home-brewed bubblesort implementation?
I guess I'm grouchy.
Admin
Better this than an over-engineered solution like the following, which would need an Internet connection to work (remember, the Unicode database is not constant but gets updated from time to time...):
Captcha: Not "lorem ipsum dolor sit", but "amet" :)
Admin
I'm working on a compiler for the IBM System/390. It uses EBCDIC. I want to die.
Admin
@ the people going on about using "else { if (condition) { } else { if (etc..." instead of "else if":
I lost a mark for writing this: if (condition) doSomething(); else if (otherCondition) doSomethingElse(); else doAThirdThing();
instead of: if (condition) doSomething(); else if (otherCondition) doSomethingElse(); else doAThirdThing();
Does that count as a WTF?
Admin
Hmm, it lost my indentation. Argh.
Admin
Curious: what compiler would need to be written for that system?
Admin
This is Java, so non-ASCII is a non-issue: chars are 16-bit. My first cut would go like this:
char[] str = string.toCharArray();
for (int i = 0; i < str.length; i++) {
    Character upper = uppercase_map.get(str[i]);
    if (upper != null) str[i] = upper.charValue();
}
return new String(str);
uppercase_map is a Map<Character, Character> that contains all the chars that change when uppercased; I'd then optimize this to a map that works on primitive types and see whether that helps at all.
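For reference, here is that loop as a complete, compilable Java class. This is a sketch: the commenter only describes uppercase_map, so filling it from the Latin-1 range (0x00 to 0xFF) with Character.toUpperCase is an assumption made here to keep it short. A char-to-char map also cannot express mappings such as ß to SS, so ß passes through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

class MapUpper {
    // Hypothetical stand-in for the commenter's uppercase_map: only chars
    // whose uppercase form differs are stored. Filled from the Latin-1
    // range (0x00 to 0xFF) purely to keep the sketch short.
    static final Map<Character, Character> UPPERCASE_MAP = new HashMap<>();

    static {
        for (char c = 0; c <= 0xFF; c++) {
            char u = Character.toUpperCase(c);
            if (u != c) {
                UPPERCASE_MAP.put(c, u);
            }
        }
    }

    static String toUpper(String string) {
        char[] str = string.toCharArray();
        for (int i = 0; i < str.length; i++) {
            Character upper = UPPERCASE_MAP.get(str[i]); // autoboxes str[i]
            if (upper != null) {
                str[i] = upper; // auto-unboxes back to char
            }
        }
        return new String(str);
    }

    public static void main(String[] args) {
        // sharp s (U+00DF) stays as-is: it has no one-char uppercase form
        System.out.println(toUpper("stra\u00dfe, caf\u00e9"));
    }
}
```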
Admin
Sub CaseLower()
' I can't believe I wrote this block of code. Copy/paste is faster than thinking.
' I'm sure there's a way to do this with LCASE(),
' but I can't think of how to do it just for the cells in the selection without looping through each cell's value,
' using the function, writing it back to the appropriate cell, and continuing through to the end of the selection.
...
Admin
someString[i] |= 0x20
or
xor [dx],20h
This trick might work for Java, as it internally still uses ASCII, um... well, the ISO-8859-1 character set, but at least it's good enough if you aren't using accents.
However, Java's .toUpperCase() and .toLowerCase() are a blessing.
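A small Java sketch of the bit trick, restricted to ASCII letters. Note that OR-ing 0x20 sets the bit and therefore lowercases; uppercasing clears it (& ~0x20), and XOR toggles it. Here the mask is computed as 'a' ^ 'A' instead of being hardcoded, and the range check is what stops it from mangling digits and punctuation (without it, '@' would turn into '`'). Class and method names are illustrative.

```java
class CaseBit {
    // In ASCII, 'a' (0x61) and 'A' (0x41) differ only in bit 0x20.
    static final int CASE_BIT = 'a' ^ 'A';

    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static char toUpperAscii(char c) {
        return isAsciiLetter(c) ? (char) (c & ~CASE_BIT) : c; // clear the bit
    }

    static char toLowerAscii(char c) {
        return isAsciiLetter(c) ? (char) (c | CASE_BIT) : c;  // set the bit
    }

    static char toggleAscii(char c) {
        return isAsciiLetter(c) ? (char) (c ^ CASE_BIT) : c;  // flip the bit
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (char c : "Hello, World 42!".toCharArray()) {
            sb.append(toUpperAscii(c));
        }
        System.out.println(sb); // HELLO, WORLD 42!
    }
}
```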
Admin
If it were in Python, that bad code would at least be forced to look better :)
Admin
char srctable[] = "abcdefghijklmnopqrstuvwxyzáàâäéèêëíìîïó...";
char dsttable[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZÁÀÂÄÉÈÊËÍÌÎÏÓ...";
for (i = 0; i < len; i++) {
    tmp = strchr(srctable, s[i]);
    if (tmp) s[i] = dsttable[tmp - srctable];
}
There's probably a better way to handle it, but this comes to mind right away.
Admin
Ah, the days of pay-per-KLOC ... code like this is more valuable than code like { if ($mychr >= 'a' && $mychr <= 'z') $mychr ^= ('a' ^ 'A'); } (lazy and it optimizes out)
Admin
Looking at the other examples, I think I'll replace the map with an array of (lower, upper) pairs: to uppercase a letter, do a binary search on the lowercase letters and, if found, replace it with the next array element. This is nice and compact and is amenable to reversal. Naturally, it doesn't work at all with the Turkish weirdness mentioned. For that, I'd want a base Map<Character, String> and a locale map that overrides the base. This keeps the logic in static structures rather than in the code, and is reasonably space-efficient.
I wonder what the standard uppercase impl looks like.
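A sketch of the sorted pair array with binary search, as described above. The flat layout (each lowercase char immediately followed by its uppercase form) and the small sample table are assumptions for illustration; a real table must be sorted by the lowercase value, would cover far more characters, and still cannot express locale rules like the Turkish ones.

```java
class PairTable {
    // Flat array of (lower, upper) pairs, sorted by the lower char:
    // PAIRS[2k] is a lowercase char, PAIRS[2k + 1] its uppercase form.
    // Sample contents only: ASCII a..z plus five Latin-1 letters.
    static final char[] PAIRS = buildPairs();

    private static char[] buildPairs() {
        String lower = "abcdefghijklmnopqrstuvwxyz\u00e0\u00e4\u00e9\u00f6\u00fc";
        String upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ\u00c0\u00c4\u00c9\u00d6\u00dc";
        char[] pairs = new char[lower.length() * 2];
        for (int i = 0; i < lower.length(); i++) {
            pairs[2 * i] = lower.charAt(i);
            pairs[2 * i + 1] = upper.charAt(i);
        }
        return pairs;
    }

    // Binary search over pair indices 0 .. (PAIRS.length / 2 - 1).
    static char toUpper(char c) {
        int lo = 0;
        int hi = PAIRS.length / 2 - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            char key = PAIRS[2 * mid];
            if (key < c) {
                lo = mid + 1;
            } else if (key > c) {
                hi = mid - 1;
            } else {
                return PAIRS[2 * mid + 1]; // found: take the next element
            }
        }
        return c; // not in the table: leave unchanged
    }

    static String toUpper(String s) {
        char[] cs = s.toCharArray();
        for (int i = 0; i < cs.length; i++) {
            cs[i] = toUpper(cs[i]);
        }
        return new String(cs);
    }
}
```

Reversing it means searching the uppercase column instead, which only works directly if that column is also sorted.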
Admin
Maybe I am wrong, but the tone of this reply was pretty snarky. How exactly would I use an input character as an index? You derive it from the input character's position in the first string.
The first string in your conversion function is the set of lower case letters, including those with umlauts and tildes. The second string has the upper case equivalents.
You start with a loop for your input string, and for the first character (which I will call the source character), you look it up in your first string and extract the character position. If the function returns a null (or you don't find it) then you simply copy the source character down to your output string. Otherwise, you use the character position to copy its replacement from the second string.
Repeat until done. As you see, the source character generates the index from its position in the first string. Call it an indirect index.
As for the "two tables" solution, take the above explanation and substitute first table for first string, etc. The logic holds. After all, a string serves as a table if you index it.
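The two-string "indirect index" approach, sketched in Java. String.indexOf returns -1 when the character is absent, which plays the role of the "not found" case described above. The letters in the two strings are sample values; the only requirement is that they line up position for position.

```java
class IndirectIndex {
    // Position i in LOWER corresponds to position i in UPPER.
    // Sample contents only; extend both strings in lockstep.
    static final String LOWER =
        "abcdefghijklmnopqrstuvwxyz\u00e1\u00e0\u00e2\u00e4\u00e9\u00e8\u00ea\u00eb";
    static final String UPPER =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ\u00c1\u00c0\u00c2\u00c4\u00c9\u00c8\u00ca\u00cb";

    static String toUpper(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char source = input.charAt(i);
            int pos = LOWER.indexOf(source); // the "indirect index"
            // Not found: copy the source char; found: copy its replacement.
            out.append(pos < 0 ? source : UPPER.charAt(pos));
        }
        return out.toString();
    }
}
```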
Looking back at this reply, I wonder if "rt" is the original programmer of this particular WTF. He treats the label "real programmer" as invective. When I started programming, you got 3 kwords per user in a multitasking environment that clocked at 50 kilohertz. And we were thankful!
<humor> Even the bits were too big to fit through a microchip! </humor> But programmers knew the ASCII table cold and did bit masking and octal/hex in their heads. Today people have to consult a calculator or scribble on a pad to do it. There is nothing like knowing your system inside and out, something that is flat-out impossible today.
Admin
ROFL! Best comment in a week!
Admin
The real WTF is people still thinking in terms of what character set their string is in. Modern languages provide Unicode string types, where one character in the string is one actual character. Once you have decent strings, a string is plain text and a character set (or "encoding") is only a mapping between byte arrays and plain text.
A string-uppercase algorithm should never need to know what your favorite encoding is. It'd be ridiculous to duplicate basic string algorithms across all encodings. String algorithms should be able to loop over the characters in the string without dealing with crap.
The real question is not what format your string is in. The real question is what locale you want to uppercase the string for. In Turkish, for example, "i" uppercases to "İ" and "ı" uppercases to "I". Programming isn't always easy, but you can make it much easier for yourself by using real strings.
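A short Java illustration of the locale point, using the standard String.toUpperCase(Locale) overload; the sample strings are made up for the example. It also shows a case where the uppercased string is longer than the input, which no char-for-char table can produce.

```java
import java.util.Locale;

class LocaleUpper {
    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Same input, different locale, different result.
        System.out.println("i".toUpperCase(Locale.ROOT)); // plain I
        System.out.println("i".toUpperCase(turkish));     // dotted capital I, U+0130

        // Dotless i (U+0131) uppercases to a plain I.
        System.out.println("\u0131".toUpperCase(Locale.ROOT));

        // The result can be longer than the input: sharp s becomes SS.
        System.out.println("stra\u00dfe".toUpperCase(Locale.ROOT)); // STRASSE
    }
}
```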
Admin
private static int addOneToIt(int Number) {
    if (Number == 0) { return 1; }
    else if (Number == 1) { return 2; }
    else if (Number == 2) { return 3; }
    else if (Number == 3) { return 4; }
    else if (Number == 4) { return 5; }
    else if (.... ... ... ... } }
Admin
Open Eclipse (or another good IDE), make sure you have configured it to use JDK libraries and not JRE libraries (so that you have debug information and source attachments), and Ctrl+Click on any toUpperCase() call.
Admin
Just 'cos I wanna get in on the bracket stuff too....
consider (my preferred indenting):
If we want to see what happens inside the block without forcing the condition to be true (when you are testing, naturally), you simply comment out the if and voilà: the block gets executed.
if, on the other hand, you use:
then commenting out the if is a little (only a little) more complex....
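A Java illustration of the commenting-out argument. The original snippets did not survive, so this assumes the two styles being compared are brace-on-its-own-line (Allman) and brace-on-the-if-line (K&R); the method names are invented for the example.

```java
class BraceStyles {
    static int runAllman(boolean condition) {
        int executed = 0;
        // Allman style with the if commented out: the braces remain as a
        // plain block statement, so the body runs unconditionally.
        // if (condition)
        {
            executed++;
        }
        return executed;
    }

    static int runKandR(boolean condition) {
        int executed = 0;
        // K&R style: the opening brace shares the line with the if, so
        // commenting that line out would also remove the brace, and the
        // closing brace would then have to be commented out as well.
        if (condition) {
            executed++;
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(runAllman(false)); // 1: body ran despite the condition
        System.out.println(runKandR(false));  // 0: body skipped
    }
}
```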
I think (in a similar vein to what someone else suggested) that the second style is also a bit less friendly on the eyes, because it isn't immediately apparent whether both braces are there (and there's something nice about braces lining up, especially over largish chunks of code)....
Incidentally, AFAIK, Emacs' standard indenting (press Tab on any line in Emacs and it will align that line to where it feels it should be, based on the line above it) involves indenting the brace a little (to half the indentation of the actual statement), viz:
Admin
I still hate locales.
Anyway, the only generic 'encoding' large enough to hold a sufficient character set that you can loop over without having to deal with any crap is UTF-32. I'm sure people would be overjoyed if UTF-32 were made obligatory everywhere, especially those who deal with gigabytes of string data and just saw their storage requirements quadruple. But that's a small price to pay for having to deal with a tiny bit less crap.
Then again, 90% of everything is crap, and now there'd be four times more of it. Deal.
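For what it's worth, Java's UTF-16 strings already offer a UTF-32-like view without UTF-32 storage: String.codePoints() yields whole code points. The sketch below contrasts uppercasing char by char with uppercasing code point by code point, using a Deseret letter (U+10428, outside the 16-bit range, stored as a surrogate pair) as the test case; the method names are invented for the example.

```java
class CodePoints {
    // Uppercase one UTF-16 code unit (char) at a time.
    static String upperByChar(String s) {
        char[] cs = s.toCharArray();
        for (int i = 0; i < cs.length; i++) {
            cs[i] = Character.toUpperCase(cs[i]); // surrogates map to themselves
        }
        return new String(cs);
    }

    // Uppercase one code point at a time, so surrogate pairs stay together.
    static String upperByCodePoint(String s) {
        int[] cps = s.codePoints().map(cp -> Character.toUpperCase(cp)).toArray();
        return new String(cps, 0, cps.length);
    }

    public static void main(String[] args) {
        String deseret = "\uD801\uDC28"; // U+10428 DESERET SMALL LETTER LONG I
        System.out.println(deseret.length());                            // 2 chars
        System.out.println(deseret.codePointCount(0, deseret.length())); // 1 code point
        System.out.println(upperByChar(deseret).equals(deseret));        // true: unchanged
        System.out.println(upperByCodePoint(deseret).equals("\uD801\uDC00")); // true: U+10400
    }
}
```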
Admin
The real fail is using a variable name with an upper case first letter. That naming style is discouraged according to ParaSoft!
Admin
That is a WTF. I can tell you that your teacher's solution wouldn't pass a code review where I work, if that makes you feel better. Your post lost its indentation; I guess the second if/else was indented further than the first?
Admin
c'mon... no matter what encoding is used internally, you can at least count on the fact that the code points for all lowercase and all uppercase chars are each contiguous (if not, take the system in question and throw it away...), and then it's a small step to calculate the difference between 'a' and 'A', use that to add or subtract, and check whether you have to do it or not, something along the lines of:
No assumption is made about the code points, only that a-z and A-Z are contiguous - you can even turn around the hi and lo values:
and it still works, just converts into the other direction.
It gets more complicated for characters where upper and lower do not have a constant distance, but e.g. for German umlauts (ä,ö,ü) the assumption still holds (you'd have to adapt the range to cover those, of course, the above was just quickly hacked in without consulting a char table about the position of non-english chars).
But you still wouldn't handle such special cases like in today's WTF, but e.g. rather use the charcode to index into a pre-compiled conversion array - still much much better than hardcoding everything in explicit if-statements...
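The snippets in the comment above were lost, so this is a reconstruction of the idea as described, in Java: shift by the distance between the starts of the two ranges, and swap the ranges to convert the other way. The 256-entry table illustrates the "pre-compiled conversion array" suggestion; its size and the ASCII-only ranges are assumptions. Note that EBCDIC letters are not one contiguous block, so on such a system this would need one range per letter group.

```java
class RangeShift {
    // Shift c from [fromLo, fromHi] into the range starting at toLo.
    // Only assumes each range is contiguous; no code values are hardcoded.
    static char shift(char c, char fromLo, char fromHi, char toLo) {
        if (c >= fromLo && c <= fromHi) {
            return (char) (c + (toLo - fromLo));
        }
        return c;
    }

    static char toUpper(char c) { return shift(c, 'a', 'z', 'A'); }

    static char toLower(char c) { return shift(c, 'A', 'Z', 'a'); } // ranges swapped

    // Precompiled conversion array for the first 256 chars, indexed by char code.
    static final char[] UPPER_TABLE = new char[256];

    static {
        for (char c = 0; c < 256; c++) {
            UPPER_TABLE[c] = toUpper(c);
        }
    }

    static String toUpperTable(String s) {
        char[] cs = s.toCharArray();
        for (int i = 0; i < cs.length; i++) {
            if (cs[i] < UPPER_TABLE.length) { // chars beyond the table pass through
                cs[i] = UPPER_TABLE[cs[i]];
            }
        }
        return new String(cs);
    }
}
```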
Admin
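The EBCDIC code page layout on Wikipedia shows that every uppercase letter is 64 code points above its lowercase counterpart. I checked A, J, and S as samples. The lowercase letters sit in three runs (129 to 137, 145 to 153, and 162 to 169) with non-letters in the gaps, so the test has to cover exactly those runs:
foreach letter in word do
    if (letter >= 129 and letter <= 137) or (letter >= 145 and letter <= 153) or (letter >= 162 and letter <= 169) then
        letter = letter + 64
    endif
endfor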
This is why assembler & machine architecture classes should still be taught.
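A runnable Java version of the EBCDIC approach, operating on raw byte values so that no EBCDIC Charset is required. It assumes the standard EBCDIC letter layout (as in code page 037): lowercase a to i at 0x81 to 0x89, j to r at 0x91 to 0x99, and s to z at 0xA2 to 0xA9, each with its uppercase form 0x40 higher. Testing only those three runs leaves the non-letters in the gaps untouched.

```java
class EbcdicUpper {
    // Assumed EBCDIC layout: lowercase letters occupy three runs, and each
    // uppercase letter sits exactly 0x40 (64) above its lowercase form.
    static final int[][] LOWER_RUNS = { {0x81, 0x89}, {0x91, 0x99}, {0xA2, 0xA9} };

    static int toUpper(int b) {
        for (int[] run : LOWER_RUNS) {
            if (b >= run[0] && b <= run[1]) {
                return b + 0x40;
            }
        }
        return b; // gaps between runs hold non-letters; leave them alone
    }

    static byte[] toUpper(byte[] ebcdic) {
        byte[] out = new byte[ebcdic.length];
        for (int i = 0; i < ebcdic.length; i++) {
            out[i] = (byte) toUpper(ebcdic[i] & 0xFF); // bytes are signed in Java
        }
        return out;
    }
}
```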
Admin
captcha: populus
Admin
fail! = Ç{âå!
Admin
Yes, it is that bad.
The compiler will use a StringBuilder (or, if you're pre-Java 5, a StringBuffer), but it will create a new Builder/Buffer for every statement that contains a concatenation. So you really are copying the string contents N times.
Instead, if you're going to be building up a string iteratively, you should be creating the Builder/Buffer outside the loop and explicitly using it.
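A minimal Java sketch of the difference; the method names and the repeat-a-char task are only illustrations. The first version allocates and copies a new String on every iteration, so its total work grows roughly with N squared; the second creates one StringBuilder outside the loop and reuses it.

```java
class BuildString {
    // Quadratic: each += creates a new String and copies everything so far.
    static String repeatSlow(char c, int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += c;
        }
        return s;
    }

    // Linear (amortized): one builder created outside the loop and reused.
    static String repeatFast(char c, int n) {
        StringBuilder sb = new StringBuilder(n); // presized: final length is known
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }
}
```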
Admin
In the few situations where storing Unicode data in memory is impractical (such as the implementation of an SQL database), programmers can go ahead and use byte arrays. This is because programmers use useful abstractions when they're practical and don't use them when they're not.
Frankly, your claim that it's only a "tiny bit less crap" indicates that you have little understanding of issues that arise in modern programming. Your hatred of locales compounds that.
Admin
Five minutes of googling would reveal, even to non-Java programmers, that:
Java strings are objects; String is part of the java.lang package (and hence its built-in methods are always available).
The internal representation is not ASCII; it's UTF-16.
http://java.sun.com/javase/6/docs/api/java/lang/String.html
Admin
Like this?
<translation> a => A b => B </translation> and then parse the XML, and then parse the element text!
Admin
That is GNU's godawful recommended style. If you write it like this:
if (condition) { doMagic(); }
it works slightly better.
Admin
I like the decision to declare this helper function private and static. I would love to hear the inner monologue behind that decision.
Admin
Either the trolls are outnumbering the genuine posters or this site is read mainly by non-programmers. The code is obviously WTF, yet we have many posters defending it and geniuses offering their own solutions to this very tricky problem.
Admin
Given a sane character set like ASCII, that's exactly what you'd do.
I don't think so. The difference between A and a is 32. So, for characters between a and z, just subtract 32 from them. Then you get the uppercase character.
Admin
gah. (Allman et al. style) whitespace padding is for inferior token processors. like any real programmer, i use a modified 1TBS bracing, with brace-on-construct-line, and only significant [max(1), only when required] whitespace between keywords, operators, constants, variables, equations, etc.
And no concession to you peons that can't keep a piece of logic in your heads without WORD-WRAPPING(?!?!) it. Only Quiche-Eaters word-wrap in code.
I also delight in nested ternary statements. You whinging readability whores can go back to Ops, where you may or may not belong. But know that they have shiny manuals there.
Requisite sample:
Admin
Trolls; flame-wars; a crap-load of simultaneous wtfs to mess up...
The genuine readers are simply having a hard time telling which way is up.
Admin
You misunderstand how we, who prefer the bracing style with the brace on the same line as the statement it belongs to, read code. We do not match the closing brace with the opening brace and then the opening brace with the statement. We match the closing brace directly with the statement.
Having a brace on its own line is not superior or more readable (I would say less readable, but you would disagree). It is simply a matter of preference and/or what you are used to.
But this has been flamed about 372819732198 times before. Let the brace wars (:Ð) end.
Admin
You are the wind beneath my wings.
Admin
Oops. Clicked the wrong button.
You are the wind beneath my wings (reprise).
Captcha: wisi. Not to wisi to click "reply" instead of "quote"....
Admin
I seem to be the only one who hasn't optimized the original code. Shouldn't the tests be in order of common usage: e, t, a, o, i, n, s... (watch Wheel of Fortune)?
Of course the best way is (as mentioned before) to see if the character is alphabetic and in the lowercase range, then add the value of 'A' - 'a'. This will work regardless of the character set (ASCII or EBCDIC), since the difference is calculated at "compile time", which hopefully uses the same character set as "run time" (sometimes it doesn't!). Of course, this assumes that the difference is constant over the desired character range.
Admin
Hey! I figured out what to code when one feels like not doing any work, yet still doing some programming: let's write programs that generate such source monsters. I already have an endless pool of ideas: reinventing core routines. Should be simple.
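In that spirit, a tongue-in-cheek Java sketch of such a generator: it emits the one-if-per-letter uppercase method so nobody has to type it. The generated method name toUpperMonster is made up, and the output covers only a to z.

```java
class MonsterGenerator {
    // Emits Java source for a char-uppercasing method with one if/else
    // branch per letter, in the style of the article's code.
    static String generate() {
        StringBuilder src = new StringBuilder();
        src.append("private static char toUpperMonster(char c) {\n");
        for (char lower = 'a'; lower <= 'z'; lower++) {
            char upper = (char) (lower - 'a' + 'A');
            src.append(lower == 'a' ? "    if" : "    } else if")
               .append(" (c == '").append(lower).append("') {\n")
               .append("        return '").append(upper).append("';\n");
        }
        src.append("    } else {\n        return c;\n    }\n}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate()); // 57 lines of source monster
    }
}
```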
Admin
A fast way would be to use a hash.
E.g., in Python 2.5:
low = "abcdefghijklmnopqrstuvwxyz"
up = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
m = dict(zip(low, up))
def upper(astr):
    return "".join(m.get(s, s) for s in astr)
Note the actual function is only one line. Creating the hash took 3.