Admin
I can actually understand that one somewhat. Back when I worked for the Post Office, our keyboard layout had the home-row keys doubling as digits (context was always enough to tell them apart, in our usage). After I left, it took me several months to completely stop trying to use them for that...
Admin
I agree. It's also possible that, because of the nature of the code the person usually writes, a case-insensitive string compare is just the normal thing to do. This is reasonable code. While I might question it a bit in a code review, I wouldn't make the person rewrite it.
Admin
writing a compiler in Java, now THAT would be WTF. Give me lex and yacc any time.
For those living exclusively in the MS world, these tools generate powerful tokenizers and parsers from rules for writing compilers.
Admin
This is more common than you might think, even without acid. One of my friends got really into Half-Life a while ago. One day, after about 5 hours playing it, he shut off his computer, and pressed the down key. The look of utter confusion on his face when this didn't make him move backwards still worries me a little.
Admin
there's nothing wrong at all with today's code. the biggest wtf so far is your proposed implementation of equalsIgnoreCase. try this guy instead:
Admin
No I get it. I thought it was a joke. But before I made an ass of myself, I decided to look it up.
http://dictionary.reference.com/search?q=cromulent (pop-ups, possibly)
So I made an ass of myself. Thanks, dictionary.
Admin
2 = @
3 = #
Not quite! This is locale dependent.
Admin
Actually, there are both upper-case and lower-case numbers, although they were mostly used in the past. One place where lower-case numbers can still be found is the page numbers of some literary books.
What's the difference between upper- and lower-case numbers you ask? Upper-case numbers are the standard number-glyphs we write today, i.e., 1234567890. For lower-case numbers, 120 have the height of an x, 34579 all go below the line, in the same manner as gjy, and 6 extends upwards, as hkd does. For 8, I can't remember at the moment.
Admin
Not necessarily the implementation - the specification is the WTF here. Java is very internationalized and its String-handling code often contains all sorts of complex special cases to deal with weird characters in some languages. e.g. the German ß char has no uppercase form and must be replaced by SS in uppercase.
Admin
The WTF here is your belief that there's anything wrong with that. Compilers can be (and are) written in pretty much any language; in fact, a compiler for a language is usually the first major application written in that language by its designer. Second WTF: the ignorance expressed in the word "would". Java compilers ARE usually written in Java. Don't denigrate things you don't know.
For you, apparently living exclusively in some C-fundamentalist world (another WTF: your belief that MS has anything to do with the choice of language and excludes C), the concept of lexer and parser generators is not in any way tied to the language or specific implementations; in fact, Java tools that do exactly the same thing are used to write compilers in Java.
Admin
Unicode defines a special case folding for case-insensitive comparisons. Case is not a clear-cut concept when you consider the German double-s, the Greek final sigma, Croatian digraphs, etc.
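A minimal Java sketch of why case is not clear-cut, using only standard `java.lang.String` methods. The class name `CaseFoldingDemo` and the helper `upper` are made up for illustration. It shows that uppercasing ß changes the length of the string, so a char-by-char case-insensitive compare cannot match it against SS, while the two lowercase Greek sigmas do compare equal.

```java
import java.util.Locale;

final class CaseFoldingDemo {
    // Uppercase with a fixed locale so the result does not depend on the
    // machine's default locale.
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String strasse = "stra\u00DFe"; // "strasse" spelled with the sharp s, U+00DF

        // The sharp s expands to two letters when uppercased.
        System.out.println(upper(strasse));

        // equalsIgnoreCase compares char by char, so the length difference
        // between the two spellings already rules out a match.
        System.out.println(strasse.equalsIgnoreCase("STRASSE"));

        // U+03C3 (sigma) and U+03C2 (final sigma) share one uppercase form,
        // so they are treated as equal when case is ignored.
        System.out.println("\u03C3".equalsIgnoreCase("\u03C2"));
    }
}
```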
Admin
Wow. That's wrong on so many levels...
Never mind that those magic numbers should really be constants.
Admin
I agree, his 'proposed implementation' of equalsIgnoreCase is just a complete WTF. Especially given it's basically an abridged version of the equalsIgnoreCase implementation from those well-known half-crazed opium-smoking coding cowboys .. err.. Sun.
god knows how that code snippet's going to come out in the actual post....
Admin
No. He should have used XML and JavaScript, which deal with upper/lower-cased numbers transparently.
dZ.
Admin
OMG! I don't know whether to laugh or beat you over the head with the keyboard!
dZ.
Admin
Yeah, nothing good has come since Monday.
-dZ.
Admin
Oh, I guess we were wrong... *YOU* wrote this WTF then, uh?
dZ.
Admin
I can't believe you've all missed why this is obviously not a WTF. What if, for example, business demands change, and contrary to his original spec it turns out the approval type code has a possible value of zero?
Then all he'd have to do is add a check for equalsIgnoreCase("Z")... and no worries about those pesky NumberFormatExceptions either!
errr.. just in case... </sarcasm>
Admin
I see. So he's getting the getApprovalTypeCode() from scanning old literary works. Maybe this code belongs to the Google Print project and this guy is smarter than we all think...
dZ.
Admin
OMG!! That is the funniest thing I've heard all day!
dZ.
Admin
That's entirely font-dependent, and they aren't known as upper-case and lower-case in the print world; the tradition goes all the way back to the beginning of the printing press, but largely died out after cleaner modern fonts with aligned numerals were widely introduced in the last century.
<font face="Book Antiqua">A few fonts still have them, such as Berling Antiqua and various Euclid (maths) fonts, if you want to give your site an archaic look. 0123456789. I'll try changing the font.</font>
Admin
Actually, in virtually every language I've ever encountered, skipping the case check will speed up any string comparison. From my days of 68000 assembly, when I had to write this sort of thing every once in a while, comparing two strings in a case-insensitive way would involve:
- Get a byte from each string
- Check each to see if it's in the alphabetic range (two comparisons for each char)
- OR a certain bit mask if it is (quicker than checking to see if it's lower case; ASCII codes allow a simple OR or AND to ensure a given case)
- Compare the two resulting bytes
whereas comparing in a case-sensitive way involved:
- Get a byte from each string
- Compare them
UNICODE may be handled differently, but I doubt it's any simpler.
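The two step lists above can be sketched in Java roughly as follows. This is an illustration for plain ASCII bytes only, not how any particular runtime implements it; the class and method names are invented. The range check is needed because OR-ing 0x20 into a non-letter such as `@` would turn it into a different character.

```java
import java.nio.charset.StandardCharsets;

final class AsciiCompare {
    // Case-sensitive: fetch a byte from each string and compare them.
    static boolean equalsExact(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    // Alphabetic range check: two comparisons per case.
    static boolean isAsciiLetter(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Force lower case by OR-ing in 0x20, but only for letters.
    static int foldAscii(int c) {
        return isAsciiLetter(c) ? (c | 0x20) : c;
    }

    // Case-insensitive: range check, mask, then compare.
    static boolean equalsIgnoreAsciiCase(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (foldAscii(a[i]) != foldAscii(b[i])) return false;
        }
        return true;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(equalsExact(ascii("Olle"), ascii("oLLE")));
        System.out.println(equalsIgnoreAsciiCase(ascii("Olle"), ascii("oLLE")));
        System.out.println(equalsIgnoreAsciiCase(ascii("9"), ascii("9")));
    }
}
```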
This could be a simple speed enhancement for code which loops many, many times.. or just that the programmer got into the habit of using the most efficient check.
The real WTF is surely storing a numeric value as a string, which requires significantly more processing, because many compilers will put a string in memory (can't tell how big it is, so it has to go into memory), but a numeric value in a register (for faster access).
But not a WTF at all if the code can contain a non-numeric character.
So.. nothing to see here. Move along, please.
Admin
Well, the myths actually originate from facts about the different implementations of the JVM and Java runtime libraries. A fact turns into a myth the moment someone holds on to it after switching to another brand or version of the JVM.
The fact is that on my SUN 1.5.0_01 on win32 equalsIgnoreCase wins the race only if the strings are of different length (which is probably not the case in the example). There are many ways to twist and speculate why it is faster in this special case, but .equals takes an object as an argument and my jdk source tells me that the java lib makes both an instanceof and a cast to String and this is most probably what's taking the extra time.
Another thing to watch out for when measuring (which a previous poster probably overlooked) is object equality:
The declaration:
String s1 = "Olle";
String s2 = "Olle";
means that there is an object equality between s1 and s2 (s1==s2) when using "modern" compilers. Comparing this with .equals is of course extremely fast.
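A small sketch of that effect, with an invented class name. Identical string literals refer to one shared object, so `==` is true for them; a string built at run time is a separate object until it is interned. For a fair timing test, at least one of the operands should therefore be created at run time.

```java
final class InternDemo {
    // Reference comparison, as opposed to equals().
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = "Olle";
        String s2 = "Olle";
        String s3 = new String("Olle"); // explicitly a new object

        System.out.println(sameObject(s1, s2));          // literals share one object
        System.out.println(sameObject(s1, s3));          // different objects
        System.out.println(s1.equals(s3));               // same contents
        System.out.println(sameObject(s1, s3.intern())); // interning yields the shared object
    }
}
```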
equalsIgnoreCase is on my system slower when the strings are of equal length. Regardless of if they are equal or not. But as said so many times before, performance is probably not the reason for the equalsIgnoreCase anyway.
Admin
JavaCC is roughly equivalent to Lex/YACC in C. I've used JavaCC and Flex/Bison (GNU versions of Lex and YACC) for two University projects, and I have to say I prefer JavaCC. There's much less fuss with setting up JavaCC than there is with Flex/Bison. ANTLR is another parser generator which people have spoken highly of, and, according to its website, has support for C++, C#, Java and Python.
Given the choice, I'd prefer writing a compiler in Java over C.
Admin
I'd like to see any language where this is true ... when you do a case sensitive comparison, all you have to do is compare the bytes. A case insensitive requires checking if the bytes are in a certain range, adding 32 if they are, and then comparing.
While it certainly is true that bit #6 indicates capitalization for ASCII letters, the only possible way I could see a time saving is if you work in 4-bit processors and are dealing only with letters (not numbers or punctuation).
Nonetheless, there is no realistic speed difference between the two types of comparisons. But technically, a case-sensitive comparison will always be faster.
Admin
<joking>
Well, obviously the author wanted to write an application with safe internationalization, but he did it wrong. Instead of equalsIgnoreCase he should have used Integer.parseInt.
</joking>
FYI: There's more than one way to write the number one hundred and twenty-three (123).
<font size="2">package com.thedailywtf;

import java.lang.reflect.Field;

public class IgnoreCaseDemo {
    // "123" written with the decimal digits of various scripts
    final static String FW_123 = "\uFF11\uFF12\uFF13";
    final static String AI_123 = "\u0661\u0662\u0663";
    final static String EAI_123 = "\u06F1\u06F2\u06F3";
    final static String B_123 = "\u09E7\u09E8\u09E9";
    final static String D_123 = "\u0967\u0968\u0969";
    final static String E_123 = "\u1369\u136A\u136B";
    final static String GJ_123 = "\u0AE7\u0AE8\u0AE9";
    final static String GK_123 = "\u0A67\u0A68\u0A69";
    final static String KN_123 = "\u0CE7\u0CE8\u0CE9";
    final static String KM_123 = "\u17E1\u17E2\u17E3";
    final static String L_123 = "\u0ED1\u0ED2\u0ED3";
    final static String MA_123 = "\u0D67\u0D68\u0D69";
    final static String MO_123 = "\u1811\u1812\u1813";
    final static String MY_123 = "\u1041\u1042\u1043";
    final static String OR_123 = "\u0B67\u0B68\u0B69";
    // OS_123 needs surrogate pairs and does not work, bug in Integer.parseInt() ??
    // (digits one to three are U+104A1..U+104A3; U+104A0 is zero)
    final static String OS_123 = "\ud801\udca1\ud801\udca2\ud801\udca3";
    final static String TA_123 = "\u0BE7\u0BE8\u0BE9";
    final static String TE_123 = "\u0C67\u0C68\u0C69";
    final static String TH_123 = "\u0E51\u0E52\u0E53";
    final static String TI_123 = "\u0F21\u0F22\u0F23";

    public static void main(String[] args) throws IllegalArgumentException,
            IllegalAccessException {
        // Check every constant whose name ends in _123
        Class<?> c = IgnoreCaseDemo.class;
        Field[] fa = c.getDeclaredFields();
        for (Field f : fa) {
            if (f.getName().endsWith("_123")) {
                checking(f.getName(), (String) f.get(null));
            }
        }
    }

    // Returns -1 when the string cannot be parsed as an int
    private static int safeParseInt(String test) {
        try {
            return Integer.parseInt(test);
        } catch (NumberFormatException nfe) {
            return -1;
        }
    }

    private static void checking(String name, String test) {
        int parsed = safeParseInt(test);
        System.out.println(name + "[" + test + "] = int(" + parsed
                + ") eqParsedInt(" + (123 == parsed) + ")");
    }
}
</font>
Admin
Oops, forgot to post the output, for those who don't have Java 5 available. (Is there anyone?)
FW_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
AI_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
EAI_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
B_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
D_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
E_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
GJ_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
GK_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
KN_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
KM_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
L_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
MA_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
MO_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
MY_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
OR_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
OS_123[???] = int(-1) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(false)
TA_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
TE_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
TH_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
TI_123[???] = int(123) = lc(???) = uc(???): eqIgnoreCase(false) eqParsedInt(true)
As you can see, all of these Strings except OS_123 are parsed as the number 123.
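One possible way around the OS_123 failure, offered only as a sketch: walk the string by code point instead of by `char`, so digits stored as surrogate pairs are seen whole. `Character.digit(int, int)` and `String.codePointAt` are standard methods; the class `CodePointParse` and the method `parseDecimal` are invented here, and the -1 return value simply mirrors `safeParseInt` from the post. There is no sign handling and no overflow check.

```java
final class CodePointParse {
    // Parses a non-negative decimal number one code point at a time.
    // Returns -1 for empty input or at the first non-digit.
    static int parseDecimal(String s) {
        if (s.length() == 0) return -1;
        int value = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);           // reads a whole surrogate pair if present
            int digit = Character.digit(cp, 10); // -1 if cp is not a decimal digit
            if (digit < 0) return -1;
            value = value * 10 + digit;
            i += Character.charCount(cp);        // advance by 1 or 2 chars
        }
        return value;
    }

    public static void main(String[] args) {
        // Osmanya digits one to three, U+104A1..U+104A3, written as surrogate pairs
        String osmanya123 = "\uD801\uDCA1\uD801\uDCA2\uD801\uDCA3";
        // Arabic-Indic digits one to three
        String arabicIndic123 = "\u0661\u0662\u0663";

        System.out.println(parseDecimal(osmanya123));
        System.out.println(parseDecimal(arabicIndic123));
        System.out.println(parseDecimal("12a"));
    }
}
```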
cu
Admin
Then why use magic numbers instead of using variables to store keys?
Admin
Thanks man! Not only are you the first one to point that out, you are the first to realize that I was being COMPLETELY serious. I feel like quite the ass for being so ignorant as not to know that shift-numeric values are locale dependent.
Thanks for the enlightenment.
Admin
I'll just reiterate my point that in this case, the code will perform flawlessly, albeit without being optimized.
In the case of using a single equals vs equalsIgnoreCase, for an if/else for 0 to 9, the difference in ms is ZERO...
If the method is called ten times for a total of 100 equals, the difference in ms is ZERO
If the method is called a hundred times for a total of 1 000 equals, the difference in ms is ZERO
If the method is called a thousand times for a total of 10 000 equals, the difference in ms is 1
If the method is called ten thousand times for a total of 100 000 equals, the difference in ms is 16
So basically, if the method is called once per second (86 400 calls a day), at the end of the day you'll have 'wasted' CPU time of roughly 0.14 seconds PER DAY! If it's called 10 times per second, then about 1.4 seconds PER DAY. The fact remains that to see any notable difference in execution time, there have to be tens of thousands of calls to the method... Also, compare idle CPU time per day with 1 million calls to equalsIgnoreCase...
Sure, using equals would have been better, but when looking at the actual results, it's not as bad as some make it out to be... In fact, in all but the most demanding situations, equals vs equalsIgnoreCase will yield NO significant speed advantage...
Sure, it's still better to use equals... Is this something worth pointing fingers and laughing at? I doubt it... Especially considering the code is fully functional and will yield differences below 2 tenths of a second for every million comparisons, which most systems will probably never reach... If this is a system where execution speed is absolutely critical and the method will be called constantly, sure, there is a problem, but if it's not (nothing indicates that it is), no one could ever notice any performance problems from it...
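For anyone who wants to repeat the measurement, here is a naive timing loop as a starting point. It is a sketch, not a rigorous benchmark: there is no controlled JIT warm-up, and results will vary by JVM and machine. The class name, the candidate array, and the round count are all assumptions made for illustration.

```java
final class EqualsTiming {
    // Counts matches so the comparisons cannot be discarded as dead code.
    static int countEquals(String code, String[] candidates, int rounds) {
        int hits = 0;
        for (int r = 0; r < rounds; r++) {
            for (String c : candidates) {
                if (code.equals(c)) hits++;
            }
        }
        return hits;
    }

    static int countEqualsIgnoreCase(String code, String[] candidates, int rounds) {
        int hits = 0;
        for (int r = 0; r < rounds; r++) {
            for (String c : candidates) {
                if (code.equalsIgnoreCase(c)) hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] candidates = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"};
        // A new object, so equals() cannot succeed on reference identity alone.
        String code = new String("9");
        int rounds = 100000; // 100 000 method calls, 1 000 000 comparisons

        long t0 = System.nanoTime();
        int a = countEquals(code, candidates, rounds);
        long t1 = System.nanoTime();
        int b = countEqualsIgnoreCase(code, candidates, rounds);
        long t2 = System.nanoTime();

        System.out.println("equals:           " + (t1 - t0) / 1000000 + " ms, hits=" + a);
        System.out.println("equalsIgnoreCase: " + (t2 - t1) / 1000000 + " ms, hits=" + b);
    }
}
```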
Admin
Why does everyone think that
returns a string? Maybe it's an instance of some mysterious class where "equalsIgnoreCase(String)" does the right thing while "equals(Object)" crashes the system? (BTW: doesn't String.equals(Object) have to check the class of the given Object before doing the actual comparison?)
Admin
My misunderstanding; my interest in typography is a hobby I indulge in when I have time, which was at least five years ago :). I'm not a native English speaker, and I confused upper-case with majuscule and lower-case with minuscule, which are the terms (or rather, their almost-equivalents in Swedish) I learned long ago for the two styles of numbers.
Berling Antiqua is actually my favourite typeface :)
Personally, I like using the minuscule numbers in running text, and majuscule numbers in all other places. I know this may not be the standard right now in typography, but one can always hope...
Admin
If the code is a number, what's wrong with getCode() returning an int and the code being:
int code = secApp.getCode();
switch (code) {
    case 1:
        // do something
        break;
    case 2:
        // do something
        break;
    ...
    ...
    default:
        // do something
        break;
}
Admin
Georgia, the much-used web font, has 'baby'-numbers, too.
I don't find the look archaic.
A peculiarity of older print fonts that you can see in early prints of, say, War & Peace, is that the zero is just a little circle, no line-weight, and the 1 is a small uppercase I, or Roman 'one'.
Maybe a real zero and one were too expensive to make. :)
Admin
You're right about people concerning themselves with performance for things that are of little concern, such as this. But your point about MAYBE changing the key binding down the road is ridiculous. If that happens, change the code. If it doesn't happen, you have to live with this pi$$ poor code for the life of the application. All too often the balance between "what-if programming" and practicality tips in the direction of "what-if". I'm so tired of delusional engineers who have this idea that their applications will outlive them and be extended the whole way, when actually, they will likely be rewritten when the next stupid buzzword is coined.
Admin
Dude, it's not the "proposed implementation". It IS the implementation. Check the source code.
Admin
I guess it's possible that equalsIgnoreCase() is defined on some custom class but it seems likely that it's the String.equalsIgnoreCase() method.
Admin
Well, technically, it's the regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len) method, called by equalsIgnoreCase.
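To make that relationship concrete, here is a rough equivalent written against the public `String.regionMatches` method. It is modelled on the description in this thread, not copied from any JDK source, and current JDKs may implement `equalsIgnoreCase` differently. The class and method names are invented.

```java
final class RegionMatchesDemo {
    // A null check, a length check, then a case-insensitive region compare
    // over the whole string.
    static boolean equalsIgnoreCaseViaRegion(String a, String b) {
        return b != null
                && a.length() == b.length()
                && a.regionMatches(true, 0, b, 0, a.length());
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreCaseViaRegion("9", "9"));
        System.out.println(equalsIgnoreCaseViaRegion("Olle", "OLLE"));
        System.out.println(equalsIgnoreCaseViaRegion("Olle", "Ollie"));
    }
}
```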
Admin
Yeah, since when do we give the coder the benefit of the doubt?! haha
Admin
How method lookups are executed is defined in the JVM spec and I don't believe it has changed.
I would think that long Strings of similar or identical length would give equals the edge, as equals need only compare the values of primitives.
Admin
Yep! And I don't even feel bad about it because "the next guy" is the project lead who told me to "use less classes". That's called justice.
Admin
No, that's right. I don't know the implementation of equals or equalsIgnoreCase, but you can test this yourself.
See: "Effective Java programming" by Joshua Bloch.
Admin
Good one. Test what though?
Admin
Lowercase numbers: i, iii, iv, x, ...
Uppercase numbers: I, III, IV, X, ...
8-|
Admin
I took a grad class in refactoring once. We started by writing a program based on a (rather complex) spec. Then the professor changed the spec in a way common to many industry situations. Then he had us add features. Then we deleted features. Then he changed the spec again. Etc., each week. The kicker was you had to leave in the ability, one way or another, to execute under any of the previous specs.
Then we all looked at everyone's code. Some people had to scrap their original designs and start over. Some were convoluted messes to handle all the overlapping conditions. Some were overly complex so that they could handle ANY situation, you know, writing a finite state automaton simulator to just simulate the various cases... Mine just had a few key #ifdefs in the places where the specs diverged from each other, and it all ran beautifully and cleanly.
But the kicker was he ran it like real life -- he wouldn't tell you in advance what part of the spec would be altered, extended, or deleted. You had to think about your original code carefully to make sure you weren't locking yourself into a model that wouldn't work. Your *assumptions* were the things you had to examine the most closely. Will this always be a number? Etc.
For illuminative purposes, he'd diff everyone's source code from week to week to see how much they changed, and how many people had to 'waterfall', or start over from scratch. Great class. Professor Griswold at UC San Diego.
So to come back to the original point: we don't know how this code is being used. It's possible this is just an amusing choice of functions if you assume they will always be numbers. But if it's in a block of code holding key bindings, the *correct* way to think is to assume that key bindings have a high likelihood of changing. Suppose you do what you're suggesting and rewrite them into equals(). Then at some point, you refactor, and pull all the hard-coded key bindings out of the code and put them into a .cfg file (a very common thing to do). Ok, so you've refactored really cleanly. All the .equals("9") lines are now .equals(keybinding_array[SOME_ENUM]). You put all your hardcoded bindings into the .cfg file, retest the code, and move on to your next project.
This is how real bugs get introduced in real life. You've implemented a new feature, and tested it and found the program is behaving exactly as it did before. You might even try changing a '9' in the .cfg file to an 'a'. You release your code. But then one day a client calls up and says he changed a key binding from 9 to an a, but now the key doesn't work at all. You spend hours on the phone debugging with him. Eventually he mails you the config file. Everything looks fine (remember this is years later), and you can't for the life of you figure out why his code didn't work, since it worked for you when you changed it to an a on your system. Finally, you realize it's because it's a capital A in the .cfg file, not a lowercase a, and then you look at your code to fix it. There are 26 .equalsIgnoreCase(keybinding_array[]) lines, and 10 .equals(keybinding_array[]) lines. You curse whatever moron wrote the code.
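The failure described above can be reproduced in a few lines. Everything in this sketch is hypothetical: the `KeyBindings` class, the "fire" action, and the map standing in for values read from a .cfg file. It only shows that a binding stored as a capital A stops matching a lowercase key press once the comparison is case-sensitive.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyBindings {
    // Hypothetical action -> key table, standing in for the parsed .cfg file.
    private final Map<String, String> bindings = new HashMap<String, String>();

    void bind(String action, String key) {
        bindings.put(action, key);
    }

    // Case-sensitive check: "A" in the config does not match a pressed "a".
    boolean matchesExact(String action, String pressed) {
        String bound = bindings.get(action);
        return bound != null && bound.equals(pressed);
    }

    // Case-insensitive check: "A" and "a" are treated as the same key.
    boolean matchesIgnoreCase(String action, String pressed) {
        String bound = bindings.get(action);
        return bound != null && bound.equalsIgnoreCase(pressed);
    }

    public static void main(String[] args) {
        KeyBindings kb = new KeyBindings();
        kb.bind("fire", "A"); // the client typed a capital A in the config

        System.out.println(kb.matchesExact("fire", "a"));
        System.out.println(kb.matchesIgnoreCase("fire", "a"));
    }
}
```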
Of course, you may catch it when you refactor. But there's no consistent way of leaving notes to oneself in case of refactoring (people don't really say /* If you change this in the future, make sure to replace the linked list with a hash table */ very often), so correct coding practice dictates making your code futureproof in rational and reasonable ways.
You either have to do an .equals() or an .equalsIgnoreCase(). It's not costing you anything to do it the second way. So if this is indeed code that has a likelihood of changing in the future, this is the correct way of coding it. If it's not something that will change, then it is indeed overengineered slightly, but not in any meaningful or harmful way.
A bit funny though, and if I was going over the code with someone, and there was an invariant that said it HAD to be a number, then I'd probably laugh a little at it and, maybe, change it.
-Bill Kerney
Admin
This is true, and my first reaction to the supposed WTF here was that it was rather stylish of the programmer to include support for this. However, when I was researching to write a defence, I noticed that there appear to be no Unicode characters for minuscule digits, and since Java strings are UTF-16, there is no need to do case-insensitive digit comparisons, and so the WTF stands.
Admin
This person probably uses words like "definitize" and "definitization" often.
Admin
The WTF to me is the endless repetition of "secApp.getApprovalTypeCode().equalsIgnoreCase" in endless if clauses. I haven't done more than poke at Java to modify some classes, but is there anything wrong with defining a nice, simple temporary variable (even as long as, say, secTypeCode) to hold the value of secApp.getApprovalTypeCode() in all those if statements, just for readability?
Of course, depending on what's done for any of those conditions, I'd also wonder if a nice hash table/dictionary pointing to values to return or methods to execute would be simpler.
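A sketch of both suggestions together: read the code once into a local variable, then look it up in a table instead of walking a chain of if statements. The table contents are placeholders, since the thread never says what the approval type codes mean, and `ApprovalDispatch`, `LABELS`, and `describe` are invented names.

```java
import java.util.HashMap;
import java.util.Map;

final class ApprovalDispatch {
    // Hypothetical code -> label table; the real meanings are not given in the thread.
    private static final Map<String, String> LABELS = new HashMap<String, String>();
    static {
        LABELS.put("1", "placeholder for type 1");
        LABELS.put("2", "placeholder for type 2");
        LABELS.put("9", "placeholder for type 9");
    }

    // One lookup replaces the chain of equalsIgnoreCase checks.
    static String describe(String secTypeCode) {
        String label = LABELS.get(secTypeCode);
        return label != null ? label : "unknown";
    }

    public static void main(String[] args) {
        // Stands in for a single call to secApp.getApprovalTypeCode();
        // the value is read once and then reused.
        String secTypeCode = "9";
        System.out.println(describe(secTypeCode));
        System.out.println(describe("7"));
    }
}
```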
Admin
Yes, a variable would not only make the code cleaner, it might also save you from multi-threading issues, such as the value changing from "9" to "1" halfway through the execution of the statements.
Admin
You laugh, but what about 0x2C, or 0xE3bf? (And yes, I know I'm commenting on a post almost 20 years old. Blame the "random article" feature.)