• Otto (unregistered) in reply to Jan

    Anonymous:

    Aktshully, ASCII was defined so you can do

    int flipCase(int c)
    {
       return c ^ 0x20;
    }

    Pretty cool, uh

    Yep. And if you assume the thing is an ASCII character, you can even macro that just as well and avoid the overhead of a function call.

    #define FLIP_CASE(c) (c ^ 0x20)

    You can also make some nice macros for forcing upper and lower case:

    #define TO_UPPER(c) (c & 0xDF)
    #define TO_LOWER(c) (c | 0x20)

    Pretty cool as well.

     

  • (cs) in reply to Otto
    Anonymous:
    And if you assume the thing is an ASCII character, you can

    ... guarantee yourself a huge number of difficult-to-catch bugs, some of which will happen at compile time and won't show up in a debugger.
  • em (unregistered)

    I think all of you people who insist that lowercase -> uppercase conversion is just a matter of XORing a bit (or that sorting strings is a matter of lexicographically comparing the numerical values of characters, for that matter) need to shed assumptions about what characters should be. Thát trïck wõn't wôrk ín mànÿ cases.
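    To make that concrete, here is a minimal sketch, assuming the bytes are ISO-8859-1 (Latin-1) rather than plain ASCII. The name `mask_to_upper` is made up for illustration; it just applies the `& 0xDF` trick from earlier in the thread.

```c
/* Hypothetical helper: the "clear bit 0x20" trick applied to one byte. */
unsigned char mask_to_upper(unsigned char c)
{
    return (unsigned char)(c & 0xDF);
}

/*
 * ISO-8859-1 code points (assumed encoding):
 *   0xE9 (e with acute)     -> 0xC9 (E with acute): happens to work
 *   0xF7 (division sign)    -> 0xD7 (multiplication sign): not a letter at all
 *   0xFF (y with diaeresis) -> 0xDF (sharp s): a different letter entirely
 */
```

    So even in a single-byte encoding the mask is only right for some of the accented letters, and it silently rewrites a symbol and one letter into something unrelated.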

    Sure, one needs to follow the assumption when dealing with legacy systems. But one under-appreciated WTF is the fact that so many of these "alphabetical sorting" functions that programmers write order all uppercase letters before all lowercase ones.
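    The sorting complaint is easy to reproduce. The sketch below assumes ASCII and the default "C" locale; `compare_ignore_case` is a hypothetical helper, not a standard function, and it is nowhere near real collation (the standard `strcoll` is the locale-aware route).

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical comparator: compares two strings ignoring ASCII case. */
int compare_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca - cb;
        a++;
        b++;
    }
    /* One or both strings ended; the shorter one sorts first. */
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}
```

    With plain `strcmp`, "Zebra" sorts before "apple" because 'Z' (0x5A) is numerically smaller than 'a' (0x61); the helper above puts "apple" first.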

  • (cs) in reply to em

    A Unicode to-upper, eh?  I wonder how many commercial ToUpper() functions correctly upper-case-ize ß to SS - which requires an extra character!

  • (cs) in reply to Otto
    Anonymous:

    Anonymous:

    Aktshully, ASCII was defined so you can do

    int flipCase(int c)
    {
       return c ^ 0x20;
    }

    Pretty cool, uh

    Yep. And if you assume the thing is an ASCII character, you can even macro that just as well and avoid the overhead of a function call.

    #define FLIP_CASE(c) (c ^ 0x20)

    You can also make some nice macros for forcing upper and lower case:

    #define TO_UPPER(c) (c & 0xDF)
    #define TO_LOWER(c) (c | 0x20)

    Pretty cool as well.

    And then you can just put it in the user documentation that ` is lowercase for @ and so forth... [:D]
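    A small sketch of what goes wrong, assuming ASCII. The macros are the ones quoted above, with the argument parenthesized so that something like TO_LOWER(a + b) expands the way it reads; `to_lower_ascii` is a hypothetical guarded variant added for comparison.

```c
#define FLIP_CASE(c) ((c) ^ 0x20)
#define TO_UPPER(c)  ((c) & 0xDF)
#define TO_LOWER(c)  ((c) | 0x20)

/* Hypothetical guarded variant: only touches 'A'..'Z' (ASCII assumed). */
int to_lower_ascii(int c)
{
    return (c >= 'A' && c <= 'Z') ? (c | 0x20) : c;
}

/*
 * What the unguarded macros do to non-letters in ASCII:
 *   TO_LOWER('@')  gives '`'   (0x40 -> 0x60)
 *   TO_UPPER('{')  gives '['   (0x7B -> 0x5B)
 *   FLIP_CASE('1') gives 0x11  (a control character)
 */
```

    The guard costs two comparisons, and in exchange punctuation and digits pass through unchanged.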

  • Scott McKellar (unregistered) in reply to Purplet

    >>

    if ((c>='a')&&(c<='z')) c -= 'a' - 'A';

    It works for every character codification where alphabetic letters are consecutive (so no ASCII/EBCDIC problems).
    <<

    Alphabetic characters are not consecutive in EBCDIC.

  • (cs) in reply to Purplet
    Purplet: Letters are not consecutive in EBCDIC. 
     
    (Second attempt at posting.)
     
    DavidW
  • (cs) in reply to Scott McKellar
    Anonymous:
    Anonymous:

    if ((c>='a')&&(c<='z')) c -= 'a' - 'A';

    It works for every character codification where alphabetic letters are consecutive (so no ASCII/EBCDIC problems).


    Alphabetic characters are not consecutive in EBCDIC.

    True, but what he meant was that the conversion will work for every codification where the delta between each uppercase letter and its corresponding lowercase letter is constant, and that is true of EBCDIC.

  • (cs) in reply to rpresser
    Anonymous:
    Anonymous:
    Mark Twain probably would have responded along the lines of:
    And zen vinaly ze drem has com tru!

    Fabian (who probably should not have suggested knowing what Twain might have said...)


    Indeed, you probably should not have suggested it, because it was George Bernard Shaw who proposed the simplified spelling system to which you are alluding.


    Actually, Fabian is correct in his allusion to Twain's satire of the Orthographic Reform movement of his time. While Clemens was an advocate of language reform himself, he recognized the pitfalls involved in it, and often denounced what he considered overly naive proposals, specifically Carnegie's 'Simplified Spelling'.

    While Shaw also wrote satires about English spelling, Shaw's work on orthography was completely different, and was proposed in all sincerity. He proposed a new alphabet, naturally known as the Shavian Script. Like Benjamin Franklin (who had experimented with a reformed orthography of his own in the 1760s), he was known to use his script in correspondence, and even had one of his plays printed in it.
  • (cs) in reply to JamesCurran
    JamesCurran:
    Anonymous:
    Anonymous:

    if ((c>='a')&&(c<='z')) c -= 'a' - 'A';

    It works for every character codification where alphabetic letters are consecutive (so no ASCII/EBCDIC problems).


    Alphabetic characters are not consecutive in EBCDIC.

    True, but what he meant was that the conversion will work for every codification where the delta between each uppercase letter and its corresponding lowercase letter is constant, and that is true of EBCDIC.



    Unfortunately there are more than 26 code points in the range 'a' to 'z' in EBCDIC.  While this expression will work when c is alphabetic, it will incorrectly modify certain non-letter symbols.
    http://en.wikipedia.org/wiki/EBCDIC
    For example, ° becomes }
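    A rough sketch of the problem and one possible fix. The code points are written as numbers so it also compiles on an ASCII machine, and the values assume EBCDIC code page 037; the function and constant names are made up for illustration.

```c
/* EBCDIC (code page 037, assumed) lowercase letters sit in three runs. */
enum {
    EBCDIC_LOWER_A = 0x81, EBCDIC_LOWER_I = 0x89,
    EBCDIC_LOWER_J = 0x91, EBCDIC_LOWER_R = 0x99,
    EBCDIC_LOWER_S = 0xA2, EBCDIC_LOWER_Z = 0xA9,
    EBCDIC_CASE_DELTA = 0x40   /* 'A' (0xC1) minus 'a' (0x81) */
};

/* The single range test from the quoted one-liner, in EBCDIC terms. */
int naive_ebcdic_upper(int c)
{
    if (c >= EBCDIC_LOWER_A && c <= EBCDIC_LOWER_Z)
        c += EBCDIC_CASE_DELTA;
    return c;
}

/* Hypothetical fix: test the three letter runs a-i, j-r, s-z separately. */
int ebcdic_upper(int c)
{
    if ((c >= EBCDIC_LOWER_A && c <= EBCDIC_LOWER_I) ||
        (c >= EBCDIC_LOWER_J && c <= EBCDIC_LOWER_R) ||
        (c >= EBCDIC_LOWER_S && c <= EBCDIC_LOWER_Z))
        c += EBCDIC_CASE_DELTA;
    return c;
}
```

    The naive version turns 0x90 (°) into 0xD0 (}), which is the case mentioned above; the three-range version leaves it alone and still maps every letter.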
  • (cs)

    The coder who wrote this should be sued for defamation of char.

  • (cs) in reply to rpresser
    Anonymous:
    Anonymous:
    Mark Twain probably would have responded along the lines of:
    And zen vinaly ze drem has com tru!

    Fabian (who probably should not have suggested knowing what Twain might have said...)


    Indeed, you probably should not have suggested it, because it was George Bernard Shaw who proposed the simplified spelling system to which you are alluding.


    Hallelujah!  We finally have a correspondent that didn't confuse "elude" with "allude"!
  • kiwas (unregistered)

    The great thing with this is of course I18N.
    I actually know a programmer who managed to squeeze double-byte characters into a single-byte database. He cleverly hid them as pairs of ISO-8859-1 characters. Then he called support and claimed UPPER() didn't work properly on his Chinese data! Never mind that there is no such thing as upper case in Chinese at all. Because it was a relational database, he was simply used to uppercasing all input and comparing it to the UPPER() result of the column.

  • Raw (unregistered) in reply to andrey

    Anonymous:
    Of course, subtracting 32 from the lowercase letter (within appropriate bounds) is far too great of a task to accomplish.

     

    Don't expect that to work with international characters (not that the WTF does either, but still...). In fact, I once had to resort to writing my own case conversion because the built-in versions did very strange things with Swedish characters.

  • (cs) in reply to Zahlman
    Anonymous:

    (PS I just tried to sign up here, but my login seems not to work? o_O)


    Maybe it's case sensitive? ;)
  • David Greenspan (unregistered) in reply to Maurits
    Maurits:
    A Unicode to-upper, eh?  I wonder how many commercial ToUpper() functions correctly upper-case-ize ß to SS - which requires an extra character!

    Actually, Java does this correctly. This is one of the many underappreciated features of Java -- all strings are Unicode, and all string functions are international.

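        import java.io.PrintStream;
        import java.io.UnsupportedEncodingException;

        public class SharpS {  // class name is arbitrary
            public static void main(String[] args) {
                // Force UTF-8 output so the non-ASCII character prints correctly.
                PrintStream out;
                try { out = new PrintStream(System.out, true, "UTF-8"); }
                catch (UnsupportedEncodingException e) { throw new RuntimeException(e); }

                String lowercase = "\u00df";  // ß
                String uppercase = lowercase.toUpperCase();
                out.println(lowercase+" ("+lowercase.length()+")");
                out.println(uppercase+" ("+uppercase.length()+")");
            }
        }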
    

    This prints:

    ß (1)
    SS (2)
    

    The PrintStream stuff is because Java (on OS X anyway) doesn't print Unicode through stdout by default, understandably, though the OS X terminal can display it.

  • noone (unregistered) in reply to Granma

    Anonymous:
    Drak:
    True, and TO_UPPER("account") would still be "account" because "account" is not a letter from a-z...

    Unless the caller gets lucky and the pointer just happens to be (char *) 'a' - then he'd get a mangled pointer.

    Although of course it would only be a mangled pointer if it lay between 'a' and 'y' (inclusive) - otherwise it would be ignored by the final clause.

  • Anonymous Coward (unregistered) in reply to Maurits

    Maurits:
    Anonymous:
    And if you assume the thing is an ASCII character, you can

    ... guarantee yourself a huge number of difficult-to-catch bugs, some of which will happen at compile time and won't show up in a debugger.

    Don't be a moron. There's plenty of cases where you know the input and need to mess about with it. Not every single little freakin' piece of code needs to account for every single case.

    Go program in the real world sometime, and then you have room to talk, fool.

  • (cs) in reply to Anonymous Coward

    Well, sure... but macros are not one of those cases.  At least not in my experience of macros.

  • Fred Flintstone (unregistered)

    Honestly, I don't know what is wrong with some people. This can be defined without any if statements or ternary operators:


    #define TO_UPPER(x) (x - 32 * (int) cos((double)(((int) abs(x-109.5))/13)))

    This still has the problem with *p++, of course.

  • Fred Flintstone (unregistered) in reply to Fred Flintstone

    I should have protected x, too:
    #define TO_UPPER(x) ((x) - 32 * (int) cos((double)(((int) abs((x)-109.5))/13)))

  • (cs) in reply to Fred Flintstone
    Anonymous:
    I should have protected x, too:
    #define TO_UPPER(x) ((x) - 32 * (int) cos((double)(((int) abs((x)-109.5))/13)))


    Smooth.. I think I'll use that everywhere I need to get an uppercase letter.
  • Fred Flintstone (unregistered) in reply to Fred Flintstone

    Just to finish off the silliness of my funky TO_UPPER macro, here is a DAYS_IN_MONTH macro that at least matches Java for years between 1 and 2100:

    #define DAYS_IN_MONTH(m,y) 30+(((abs(13-(m)*2)+1)/2)&1)-(2-(int)cos((double)((y)%4))+(1-(int)cos((double)((y)/1600)))*((int)cos((double)((y)%100))-(int)cos((double)((y)%400))))*(int)cos((double)((m)-1))

  • Brazzy (unregistered) in reply to Purplet
    Anonymous:
    if ((c>='a')&&(c<='z')) c -= 'a' - 'A';

    It works for every character codification where alphabetic letters are consecutive (so no ASCII/EBCDIC problems).



    Except that the letters are NOT consecutive in EBCDIC...
  • CodeBubba (unregistered)

    Won't upper-case a 'z' ...

    -CB [H]

  • (cs)

    char toupper (char c) {
        return c >= 'a' && c <= 'z' ? c + 'A' - 'a' : c;
    }
    // ...Amazing.


  • Ross Presser (unregistered) in reply to Schol-R-LEA
    Schol-R-LEA:


    Actually, Fabian is correct in his allusion to Twain's satire of the Orthographic Reform movement of his time. While Clemens was an advocate of language reform himself, he recognized the pitfalls involved in it, and often denounced what he considered overly naive proposals, specifically Carnegie's 'Simplified Spelling'.

    While Shaw also wrote satires about English spelling, Shaw's work on orthography was completely different, and was proposed in all sincerity. He proposed a new alphabet, naturally known as the Shavian Script. Like Benjamin Franklin (who had experimented with a reformed orthography of his own in the 1760s), he was known to use his script in correspondence, and even had one of his plays printed in it.


    My apologies. What I was actually remembering was a totally different satire, not by either of these gentlemen but by someone else, which referred to Shaw's (apparently sincere) efforts.  And a minute with Google, instead of trusting my brain, would have corrected me.  I fall on my sword -- I shriek! -- I sputter -- I die.




  • (cs) in reply to andrey
    Anonymous:
    Of course, subtracting 32 from the lowercase letter (within appropriate bounds) is far too great of a task to accomplish.


    In high school my programming teacher was astonished when I discovered this solution to the upper/lower problem in a larger app...  It seemed obvious to me when looking at the ASCII table he handed out to each and every one of us at the beginning of the year.  I guess he had gotten so used to seeing WTF solutions from the kids that he forgot what the easy answer was.
  • (cs) in reply to Jan
    Anonymous:

    Anonymous:
    Of course, subtracting 32 from the lowercase letter (within appropriate bounds) is far too great of a task to accomplish.

    Aktshully, ASCII was defined so you can do

    int flipCase(int c)
    {
       return c ^ 0x20;
    }

    Pretty cool, uh



    By extension:

    unsigned char ToLower(unsigned char c)
    {
    return c | 0x20;
    }

    unsigned char ToUpper(unsigned char c)
    {
    return c & 0xDF;
    }
  • (cs) in reply to Charles Nadolski
    Charles Nadolski:
    Anonymous:

    Anonymous:
    Of course, subtracting 32 from the lowercase letter (within appropriate bounds) is far too great of a task to accomplish.

    Aktshully, ASCII was defined so you can do

    int flipCase(int c)
    {
       return c ^ 0x20;
    }

    Pretty cool, uh



    By extension:

    unsigned char ToLower(unsigned char c)
    {
    return c | 0x20;
    }

    unsigned char ToUpper(unsigned char c)
    {
    return c & 0xDF;
    }


    Fark me, somebody beat me to it a millennium ago.  That's what you get for not reading the second page :-/
  • Immibis (unregistered) in reply to Alex
    Alex:
    The macro also evaluates (x) 26 times.  So if you did something like this: TO_UPPER (*(p++)) It would increment p 26 times.
    Actually it would increment p until one of the expressions matched.
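    The side-effect problem can be shown with a much shorter macro. This is only a sketch: `TO_UPPER_MACRO` is a hypothetical two-test stand-in, not the macro from the article (which, going by the comments above, chains one comparison per letter), and `fetch` is a made-up helper that counts how often the argument expression runs.

```c
/* Hypothetical two-test macro; its argument appears three times. */
#define TO_UPPER_MACRO(c) (((c) >= 'a' && (c) <= 'z') ? (c) - ('a' - 'A') : (c))

/* Function version: the argument is evaluated exactly once. */
int to_upper_func(int c)
{
    return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

/* Reads one character, advances the pointer, and counts the call. */
int fetch_count = 0;
int fetch(const char **p)
{
    fetch_count++;
    return *(*p)++;
}
```

    Calling TO_UPPER_MACRO(fetch(&p)) on "abcdef" runs fetch three times: 'a' and 'b' are consumed by the two comparisons and the result is computed from 'c'. The function version consumes one character and returns its uppercase form, which is why a plain function (or an inline one) is the safer choice here.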
