The Daily WTF: Curious Perversions in Information Technology

2005-05-20

Anonymous:
Aktshully, ASCII was defined so you can do
int flipCase(int c)
{
   return c ^ 0x20;
}
Pretty cool, uh

Yep. And if you assume the thing is an ASCII character, you can even macro that just as well and avoid the overhead of a function call.

#define FLIP_CASE(c) ((c) ^ 0x20)

You can also make some nice macros for forcing upper and lower case:

#define TO_UPPER(c) ((c) & 0xDF)
#define TO_LOWER(c) ((c) | 0x20)

Pretty cool as well.
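The macros above change any value they are given, letter or not. A minimal sketch of a safer alternative, assuming an ASCII execution character set: C99 `static inline` functions are usually inlined by an optimizing compiler, so they avoid the call overhead in practice, evaluate their argument exactly once, and leave non-letters alone. The `ascii_*` names are hypothetical.

```c
/* Hypothetical safer variants, assuming ASCII.
   Each function evaluates its argument exactly once and only
   touches 'A'-'Z' and 'a'-'z'; everything else is returned unchanged. */
static inline int ascii_is_upper(int c) { return c >= 'A' && c <= 'Z'; }
static inline int ascii_is_lower(int c) { return c >= 'a' && c <= 'z'; }

/* Clear bit 0x20, but only for lowercase letters. */
static inline int ascii_to_upper(int c)
{
    return ascii_is_lower(c) ? (c & 0xDF) : c;
}

/* Set bit 0x20, but only for uppercase letters. */
static inline int ascii_to_lower(int c)
{
    return ascii_is_upper(c) ? (c | 0x20) : c;
}

/* Flip bit 0x20, but only for letters of either case. */
static inline int ascii_flip_case(int c)
{
    return (ascii_is_upper(c) || ascii_is_lower(c)) ? (c ^ 0x20) : c;
}
```

Because these are functions, `ascii_to_upper(*p++)` advances `p` exactly once, which no macro that mentions its argument more than once can promise.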

Maurits · 2005-05-20

Anonymous:
And if you assume the thing is an ASCII character, you can

... guarantee yourself a huge number of difficult-to-catch bugs, some of which will happen at compile time and won't show up in a debugger.

2005-05-20

I think all of you people who insist that lowercase -> uppercase conversion is just a matter of XORing a bit (or that sorting strings is a matter of lexicographically comparing the numerical values of characters, for that matter) need to shed assumptions about what characters should be. Thát trïck wõn't wôrk ín mànÿ cases.

Sure, one needs to follow the assumption when dealing with legacy systems. But one under-appreciated WTF is the fact that so many of these "alphabetical sorting" functions that programmers write order all uppercase letters before all lowercase ones.
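To make the ordering point concrete: a raw code-value comparison puts "Zebra" before "apple", because 'Z' is 0x5A and 'a' is 0x61. A minimal sketch of an ASCII-only case-insensitive comparison follows; `ascii_fold` and `ascii_casecmp` are hypothetical names, and this is still not locale-aware collation (it does nothing for accented letters like the ones above). POSIX systems offer a similar `strcasecmp()` in `<strings.h>`.

```c
/* Fold ASCII uppercase to lowercase; leave every other byte alone. */
static int ascii_fold(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Compare two NUL-terminated strings ignoring ASCII case.
   Returns <0, 0 or >0 like strcmp(). Assumes ASCII. */
static int ascii_casecmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    /* Stop at the end of a, or at the first position where the folded bytes differ
       (this also covers b ending first, since fold(0) differs from any letter). */
    while (*p && ascii_fold(*p) == ascii_fold(*q)) {
        p++;
        q++;
    }
    return ascii_fold(*p) - ascii_fold(*q);
}
```

With this comparison "apple" sorts before "Zebra", and "HeLLo" compares equal to "hello".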

Maurits · 2005-05-20

A Unicode to-upper, eh? I wonder how many commercial ToUpper() functions correctly upper-case-ize ß to SS - which requires an extra character!

E++ · 2005-05-20

Anonymous:
Anonymous:
Aktshully, ASCII was defined so you can do
int flipCase(int c)
{
   return c ^ 0x20;
}
Pretty cool, uh
Yep. And if you assume the thing is an ASCII character, you can even macro that just as well and avoid the overhead of a function call.
#define FLIP_CASE(c) ((c) ^ 0x20)
You can also make some nice macros for forcing upper and lower case:
#define TO_UPPER(c) ((c) & 0xDF)
#define TO_LOWER(c) ((c) | 0x20)
Pretty cool as well.

And then you can just put it in the user documentation that ` is lowercase for @ and so forth... :D
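The joke generalizes. A small sketch, assuming ASCII and the default "C" locale, that counts how many printable non-letter characters (0x20 to 0x7E) each posted macro silently changes; the helper names are hypothetical. TO_UPPER is the worse offender: it turns digits into control characters and the space (0x20) into 0x00, which would terminate a C string.

```c
#include <ctype.h>

/* The posted macros, argument parenthesized. */
#define TO_UPPER(c) ((c) & 0xDF)
#define TO_LOWER(c) ((c) | 0x20)

/* Count printable ASCII non-letters (0x20..0x7E) that TO_UPPER changes.
   In the default "C" locale isalpha() matches only A-Z and a-z. */
static int collateral_upper(void)
{
    int n = 0;
    for (int c = 0x20; c <= 0x7E; c++)
        if (!isalpha(c) && TO_UPPER(c) != c)
            n++;
    return n;  /* 37: space, digits, punctuation, and ` { | } ~ */
}

/* Same count for TO_LOWER. */
static int collateral_lower(void)
{
    int n = 0;
    for (int c = 0x20; c <= 0x7E; c++)
        if (!isalpha(c) && TO_LOWER(c) != c)
            n++;
    return n;  /* 6: @ [ \ ] ^ _ */
}
```

For example, TO_LOWER('@') yields '`', and TO_UPPER('1') yields 0x11.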

2005-05-20

>>

if ((c>='a')&&(c<='z')) c -= 'a' - 'A';

It works for every character codification where alphabetic letters are consecutive (so no ASCII/EBCDIC problems).
<<

Alphabetic characters are not consecutive in EBCDIC.

DWalker59 · 2005-05-20

Purplet: Letters are not consecutive in EBCDIC.

(Second attempt at posting.)

DavidW

JamesCurran · 2005-05-20

Anonymous:
Anonymous:

if ((c>='a')&&(c<='z')) c -= 'a' - 'A';

It works for every character codification where alphabetic letters are consecutive (so no ASCII/EBCDIC problems).

Alphabetic characters are not consecutive in EBCDIC.

True, but what he meant was that the conversion will work for every codification where the delta between each uppercase letter and its corresponding lowercase letter is constant, and that is true of EBCDIC.

Schol-R-LEA · 2005-05-20

Anonymous:
Anonymous:
Mark Twain probably would have responded along the lines of:
And zen vinaly ze drem has com tru!

Fabian (who probably should not have suggested knowing what Twain might have said...)

Indeed, you probably should not have suggested it, because it was George Bernard Shaw who proposed the simplified spelling system to which you are alluding.

Actually, Fabian is correct in his allusion to Twain's satire of the Orthographic Reform movement of his time. While Clemens was an advocate of language reform himself, he recognized the pitfalls involved in it, and often denounced what he considered overly naive proposals, specifically Carnegie's 'Simplified Spelling'.

While Shaw also wrote satires about English spelling, Shaw's work on orthography was completely different, and was proposed in all sincerity. He proposed a new alphabet, naturally known as the Shavian Script. Like Benjamin Franklin (who had experimented with a reformed orthography of his own in the 1760s), he was known to use his script in correspondence, and even had one of his plays printed in it.

Maurits · 2005-05-20

JamesCurran:
Anonymous:
Anonymous:

if ((c>='a')&&(c<='z')) c -= 'a' - 'A';

It works for every character codification where alphabetic letters are consecutive (so no ASCII/EBCDIC problems).

Alphabetic characters are not consecutive in EBCDIC.

True, but what he meant was that the conversion will work for every codification where the delta between each uppercase letter and its corresponding lowercase letter is constant, and that is true of EBCDIC.

Unfortunately there are more than 26 characters in the range 'a' to 'z' in EBCDIC. While this expression will work for alphabetic c, it will incorrectly modify certain non-letter symbols.
http://en.wikipedia.org/wiki/EBCDIC
For example, ° becomes }
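A sketch of the point, assuming code page 037 (one common EBCDIC variant). The code values are written as plain numbers so it runs on an ASCII machine; `naive_upper` mirrors the posted range test, `ebcdic_upper` checks the three actual letter runs, and all names are hypothetical. In code page 037 'a' is 0x81 and 'A' is 0xC1, so the posted `c -= 'a' - 'A'` adds 0x40.

```c
/* Code page 037 values (assumption). Lowercase letters occupy three runs:
   a-i 0x81-0x89, j-r 0x91-0x99, s-z 0xA2-0xA9; uppercase is 0x40 higher. */
enum { EBC_a = 0x81, EBC_z = 0xA9, EBC_CASE_DELTA = 0x40 };

/* The posted test: anything between 'a' and 'z' is treated as a letter. */
static int naive_upper(int c)
{
    return (c >= EBC_a && c <= EBC_z) ? c + EBC_CASE_DELTA : c;
}

/* A letter-only test that respects the gaps between the three runs. */
static int ebcdic_is_lower(int c)
{
    return (c >= 0x81 && c <= 0x89) ||
           (c >= 0x91 && c <= 0x99) ||
           (c >= 0xA2 && c <= 0xA9);
}

static int ebcdic_upper(int c)
{
    return ebcdic_is_lower(c) ? c + EBC_CASE_DELTA : c;
}

/* How many code points between 'a' and 'z' are not lowercase letters. */
static int nonletters_in_range(void)
{
    int n = 0;
    for (int c = EBC_a; c <= EBC_z; c++)
        if (!ebcdic_is_lower(c))
            n++;
    return n;  /* 15: 0x8A-0x90 and 0x9A-0xA1 */
}
```

As in the example above, `naive_upper(0x90)` (° in code page 037) gives 0xD0 (}), while `ebcdic_upper` leaves it alone.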

Schol-R-LEA · 2005-05-20

The coder who wrote this should be sued for defamation of char.

sas · 2005-05-20

Anonymous:
Anonymous:
Mark Twain probably would have responded along the lines of:
And zen vinaly ze drem has com tru!

Fabian (who probably should not have suggested knowing what Twain might have said...)

Indeed, you probably should not have suggested it, because it was George Bernard Shaw who proposed the simplified spelling system to which you are alluding.

Hallelujah! We finally have a correspondent that didn't confuse "elude" with "allude"!

2005-05-21

The great thing with this is of course I18N.
I actually know a programmer who managed to squeeze double-byte characters into a single-byte database. He cleverly hid them as pairs of ISO-8859-1 characters. Then he called support and claimed UPPER() didn't work properly on his Chinese data! Never mind that there is no such thing as upper case in Chinese at all. Since it was a relational database, he was in the habit of uppercasing all input and comparing it to the UPPER() result of the column.
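Why UPPER() "didn't work": a single-byte database applies ISO-8859-1 case rules to each byte independently, so any byte that happens to look like a Latin-1 lowercase letter gets changed, damaging the double-byte character it belongs to. A sketch, using the byte pair {0xE4, 0x61} as a stand-in for a hypothetical double-byte character (not a real Chinese code point); the function names are hypothetical.

```c
#include <stddef.h>

/* Uppercase one ISO-8859-1 byte: ASCII a-z and the accented letters
   0xE0-0xFE (except 0xF7, the division sign) move down by 0x20.
   0xDF (sharp s) and 0xFF (y with diaeresis) have no single-byte
   uppercase in ISO-8859-1, so they are left alone. */
static unsigned char latin1_upper(unsigned char b)
{
    if ((b >= 0x61 && b <= 0x7A) || (b >= 0xE0 && b <= 0xFE && b != 0xF7))
        return (unsigned char)(b - 0x20);
    return b;
}

/* Apply it byte by byte, the way a single-byte UPPER() sees the data. */
static void latin1_upper_bytes(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = latin1_upper(buf[i]);
}
```

Both bytes of the stand-in pair change (0xE4 becomes 0xC4, 0x61 becomes 0x41), so the uppercased column no longer holds the original character at all.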

2005-05-23

Anonymous:
Of course, subtracting 32 from the lowercase letter (within appropriate bounds) is far too great of a task to accomplish.

Don't expect that to work with international characters (not that the WTF does either, but still...). In fact, I once had to resort to writing my own version of case conversion because the built-in versions did very strange things with Swedish characters.

Niels · 2005-05-23

Anonymous:

(PS I just tried to sign up here, but my login seems not to work? o_O)

Maybe it's case sensitive? ;)

2005-05-23

Maurits:
A Unicode to-upper, eh? I wonder how many commercial ToUpper() functions correctly upper-case-ize ß to SS - which requires an extra character!

Actually, Java does this correctly. This is one of the many underappreciated features of Java -- all strings are Unicode, and all string functions are international.

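        import java.io.PrintStream;
        import java.io.UnsupportedEncodingException;

        public class SharpS {
            public static void main(String[] args) {
                PrintStream out; // wrap stdout so it emits UTF-8
                try { out = new PrintStream(System.out, true, "UTF-8"); }
                catch (UnsupportedEncodingException e) { throw new RuntimeException(e); }

                String lowercase = "\u00df"; // U+00DF LATIN SMALL LETTER SHARP S
                String uppercase = lowercase.toUpperCase(); // one char in, two out
                out.println(lowercase+" ("+lowercase.length()+")");
                out.println(uppercase+" ("+uppercase.length()+")");
            }
        }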

This prints:

ß (1)
SS (2)

The PrintStream stuff is there because Java (on OS X, anyway) doesn't print Unicode through stdout by default, which is understandable, even though the OS X terminal can display it.

2005-05-23

Anonymous:
Drak:
True, and TO_UPPER("account") would still be "account" because "account" is not a letter from a-z...

Unless the caller gets lucky and the pointer just happens to be (char *) 'a' - then he'd get a mangled pointer.

Although of course it would only be a mangled pointer if it lay between 'a' and 'y' (inclusive) - otherwise it would be ignored by the final clause.

2005-05-24

Maurits:
Anonymous:
And if you assume the thing is an ASCII character, you can

... guarantee yourself a huge number of difficult-to-catch bugs, some of which will happen at compile time and won't show up in a debugger.

Don't be a moron. There are plenty of cases where you know the input and need to mess about with it. Not every single little freakin' piece of code needs to account for every single case.

Go program in the real world sometime, and then you have room to talk, fool.

Maurits · 2005-05-24

Well, sure... but macros are not one of those cases. At least not in my experience of macros.

2005-05-24

Honestly, I don't know what is wrong with some people. This can be defined without any if statements or ternary operators:

#define TO_UPPER(x) (x - 32 * (int) cos((double)(((int) abs(x-109.5))/13)))

This still has the problem with *p++, of course.

2005-05-24

I should have protected x, too:
#define TO_UPPER(x) ((x) - 32 * (int) cos((double)(((int) abs((x)-109.5))/13)))
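Why the cos() trick works, shown as a sketch without the cos() call itself (so no math library is needed): the macro maps x to an integer k, and k is 0 exactly for 'a' through 'z'. (int)cos(k) is 1 only when k == 0, because |cos(k)| < 1 for every non-zero integer k, so 32 is subtracted only from lowercase letters. The helper name is hypothetical.

```c
#include <stdlib.h>

/* The integer the macro passes to cos(): the distance from 109.5
   (midway between 'a' = 97 and 'z' = 122), truncated to int, divided by 13.
   The macro gets the truncation implicitly, because abs() takes an int. */
static int cos_macro_bucket(int x)
{
    int dist = abs((int)(x - 109.5)); /* at most 12 for 'a'..'z', at least 13 outside */
    return dist / 13;                 /* 0 only for 'a'..'z' */
}
```

So the whole expression reduces to "subtract 32 if x is in 'a'..'z'", the same range test as before, dressed up in floating point.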

Mike R · 2005-05-24

Anonymous:
I should have protected x, too:
#define TO_UPPER(x) ((x) - 32 * (int) cos((double)(((int) abs((x)-109.5))/13)))

Smooth.. I think I'll use that everywhere I need to get an uppercase letter.

2005-05-24

Just to finish off the silliness of my funky TO_UPPER macro, here is a DAYS_IN_MONTH macro that at least matches Java for years between 1 and 2100:

#define DAYS_IN_MONTH(m,y) 30+(((abs(13-(m)*2)+1)/2)&1)-(2-(int)cos((double)((y)%4))+(1-(int)cos((double)((y)/1600)))*((int)cos((double)((y)%100))-(int)cos((double)((y)%400))))*(int)cos((double)((m)-1))
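A readable equivalent, as I read the macro (a hedged sketch, not the poster's code): m is zero-based (0 = January, which matches java.util.Calendar's month numbering), the (int)cos(n) terms act as "n == 0" tests, and the (y)/1600 term makes years before 1600 follow the Julian rule, where every fourth year is a leap year. `is_leap` and `days_in_month` are hypothetical names.

```c
/* Leap-year rule the macro appears to encode. */
static int is_leap(int y)
{
    if (y % 4 != 0) return 0;
    if (y < 1600) return 1;                    /* Julian: century years still leap */
    return (y % 100 != 0) || (y % 400 == 0);   /* Gregorian */
}

/* m is zero-based: 0 = January, 1 = February, ..., 11 = December. */
static int days_in_month(int m, int y)
{
    static const int days[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return (m == 1 && is_leap(y)) ? 29 : days[m];
}
```

The (abs(13-2m)+1)/2 & 1 part reproduces the 31/30 alternation, including the two 31-day months in a row (July and August, m = 6 and 7).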

2005-05-25

Anonymous:
if ((c>='a')&&(c<='z')) c -= 'a' - 'A';

It works for every character codification where alphabetic letters are consecutive (so no ASCII/EBCDIC problems).

Except that the letters are NOT consecutive in EBCDIC...

2005-05-26

Won't upper-case a 'z' ...

-CB [H]

tag · 2005-05-31

char toupper (char c) {
return c >= 'a' && c <= 'z' ? c + 'A' - 'a' : c;
}
// ...Amazing.

2005-06-02

Schol-R-LEA:

Actually, Fabian is correct in his allusion to Twain's satire of the Orthographic Reform movement of his time. While Clemens was an advocate of language reform himself, he recognized the pitfalls involved in it, and often denounced what he considered overly naive proposals, specifically Carnegie's 'Simplified Spelling'.

While Shaw also wrote satires about English spelling, Shaw's work on orthography was completely different, and was proposed in all sincerity. He proposed a new alphabet, naturally known as the Shavian Script. Like Benjamin Franklin (who had experimented with a reformed orthography of his own in the 1760s), he was known to use his script in correspondence, and even had one of his plays printed in it.

My apologies. What I was actually remembering was a totally different satire, not by either of these gentlemen but by someone else, which referred to Shaw's (apparently sincere) efforts. And a minute with Google, instead of trusting my brain, would have corrected me. I fall on my sword -- I shriek! -- I sputter -- I die.

Charles Nadolski · 2005-06-09

Anonymous:
Of course, subtracting 32 from the lowercase letter (within appropriate bounds) is far too great of a task to accomplish.

In high school my programming teacher was astonished when I discovered this solution to the upper/lower problem in a larger app... It seemed obvious to me when looking at the ASCII table he handed out to each and every one of us at the beginning of the year. I guess he had gotten so used to seeing WTF solutions from the kids that he forgot what the easy answer was.
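The ASCII table makes the point: 'a' - 'A' is 32, and every lowercase ASCII letter has bit 0x20 set, so subtracting 32, clearing the bit (& 0xDF) and flipping it (^ 0x20) all agree on 'a' through 'z'. A minimal check, assuming ASCII; the function name is hypothetical.

```c
/* Returns 1 if subtract-32, clear-bit-5 and flip-bit-5 give the same
   result for every ASCII lowercase letter, 0 otherwise. */
static int ascii_tricks_agree(void)
{
    for (int c = 'a'; c <= 'z'; c++) {
        int by_sub = c - 32;    /* "subtract 32" */
        int by_and = c & 0xDF;  /* clear bit 0x20 */
        int by_xor = c ^ 0x20;  /* flip bit 0x20 */
        if (by_sub != by_and || by_and != by_xor)
            return 0;
    }
    return 1;
}
```

Outside that range they diverge, which is why the "within appropriate bounds" part matters: for 'A', c - 32 gives '!', c & 0xDF leaves 'A' alone, and c ^ 0x20 gives 'a'.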

Charles Nadolski · 2005-06-09

Anonymous:
Anonymous:
Of course, subtracting 32 from the lowercase letter (within appropriate bounds) is far too great of a task to accomplish.

Aktshully, ASCII was defined so you can do
int flipCase(int c)
{
   return c ^ 0x20;
}
Pretty cool, uh

By extension:

unsigned char ToLower(unsigned char c)
{
	return c | 0x20;
}

unsigned char ToUpper(unsigned char c)
{
	return c & 0xDF;
}

Charles Nadolski · 2005-06-09

Charles Nadolski:
Anonymous:
Anonymous:
Of course, subtracting 32 from the lowercase letter (within appropriate bounds) is far too great of a task to accomplish.

Aktshully, ASCII was defined so you can do
int flipCase(int c)
{
   return c ^ 0x20;
}
Pretty cool, uh
By extension:
unsigned char ToLower(unsigned char c)
{
	return c | 0x20;
}

unsigned char ToUpper(unsigned char c)
{
	return c & 0xDF;
}

Fark me, somebody beat me to it a millennium ago. That's what you get for not reading the second page :-/

2007-08-31

Alex:
The macro also evaluates (x) 26 times. So if you did something like this: TO_UPPER (*(p++)) It would increment p 26 times.

Actually it would increment p until one of the expressions matched.
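This can be checked with a cut-down, hypothetical three-letter version of a chained-ternary macro (the full macro is not shown here). Because ?: has a sequence point after its condition, each comparison performs its own *p++: p advances once per failed comparison, once for the comparison that matches, or once more for the final fallback if nothing matches. Worse, each later comparison looks at a later character.

```c
/* Hypothetical cut-down chained-ternary macro: (x) appears once per
   comparison, plus once more in the final fallback. */
#define TO_UPPER_ABC(x) ((x) == 'a' ? 'A' : (x) == 'b' ? 'B' : (x) == 'c' ? 'C' : (x))

/* Runs TO_UPPER_ABC(*p++) at the start of s, stores the macro's result,
   and returns how many characters p advanced. s needs at least 4 chars. */
static long advance_after(const char *s, int *result)
{
    const char *p = s;
    *result = TO_UPPER_ABC(*p++);
    return (long)(p - s);
}
```

On "xbcd" the first comparison fails on 'x', the second reads 'b' and matches, so the result is 'B' and p has moved two characters; on "xyzw" all three comparisons fail and the fallback reads a fourth character.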

Macro Polo
