• Chris (unregistered) in reply to mkb

    Its great to make them, but the problem is that people come out of the program redesigning very common functions. Its great to learn it at first, to understand it, but if you're supposed to be teaching the next generation of programmers, its more than just teaching the code - its teaching the standards and the thought process. And part of the thought process is - "Has this been done before and if so, can I use it?"
    Once again, I totally agree with learning it at first, but there comes a point to say "Hey, really, don't use that anymore."

  • anonymous (unregistered)

    You can get it to two lines if you use recursion.

    int stricmp(char* s1, char* s2) {
        if (!*s1 || !*s2 || toupper(*s1) != toupper(*s2)) return *s1 - *s2;
        else return stricmp(s1++, s2++);
    }
    
  • PS (unregistered) in reply to rbowes
    rbowes:
    In the programmer's defense, at least he doesn't leak memory!
    Either does the cleaner that comes in at night and dusts this person's desk but does that make them a good programmer too?
  • Chris (unregistered) in reply to snoofle
    snoofle:
    Forgetting that the memory was allocated for this in the first place, I don't really have a problem with sanity checking that pointers are not null before freeing whatever they point to; it's defensive programming.

    Except its not needed. Would you write code like

    if (i != 2) i = 2;

    and call it "defensive programming"?

  • (cs)

    Let me guess... It's called strcmpi because it gives you strings with a butter and garlic sauce?

    (okay, it's weak :-)

  • PS (unregistered) in reply to poochner
    poochner:
    Let me guess... It's called strcmpi because it gives you strings with a butter and garlic sauce?

    (okay, it's weak :-)

    That was cute I thought.

    I believe the classic strcmpi recipe is strings, lemon, butter, garlic, white wine, and parsley.

  • PseudoNoise (unregistered) in reply to AngryRichard
    AngryRichard:
    rbowes:
    In the programmer's defense, at least he doesn't leak memory!

    If the allocation for p2 throws std::bad_alloc, then he leaks p1, so no, he's still a tard.

    Oooh, nice call, I missed that myself.

    Here's a quick re-enactment as I was reading along:

    • null check, ok..
    • hm, i would've returned if both null instead of nesting, but ok..
    • strlen on both...um...ok....
    • malloc jaw hits desk
  • lol (unregistered) in reply to rbowes

    His code isn't memory-leak bulletproof, one exception thrown by the 2nd new[] and bam you have it.

  • (cs) in reply to Tim
    Tim:
    return *s1 == NULL ? !(*s2 == NULL) : -1;
    That has got to be one of the ugliest uses of the ternary operator I've ever seen.
  • NeoMojo (unregistered) in reply to PseudoNoise

    I think code like this is great! Let's rewrite every well known publicly available function to not make sense!

    It's like art. What good is half a cow in formaldehyde? As good as this:

    int strcmp(char * str1, char * str2)
    {
       if(str1 > str2)
          return 1;
       else if (str2 > str1)
          return -1;
       else
          return 0;
    
       delete str1;
       delete str2;
    }
    

    Can you see the beuaty?

  • (cs) in reply to Tim
    Tim:
    I thought it was legal in C to compare characters to NULL i.e. *s1 == '\0' is equivalent to *s1 == NULL, but I'll take your word for it.

    Hey, not everything legal is a good idea :)

    NULL is sometimes defined as (void *)0, which should at least make you rcompiler complain.

  • (cs) in reply to AngryRichard
    AngryRichard:
    If the allocation for p2 throws std::bad_alloc, then he leaks p1, so no, he's still a tard.

    There still environments where you have to force the compiler to NOT throw std::bad_alloc, because that exception will crash whatever you're doing.

    Yes, even if you catch it. :(

  • me (unregistered) in reply to anonymous

    Well i think this function:

    int dontknowstrcmp(const char * s1, const char * s2){
    //since he checked for NULL Pointer we should do this,
    //too:
    if (s1 == s2) return 0;
    if (s1 == NULL) return -1;
    if (s2 == NULL) return 1;
    
    //"compare code":
        for (size_t i = 0; ; ++i){
    	//one string (or both) ended -> must return:
    	if ( (s1[i] == '\0') || (s2[i] == '\0')) 
               return ( toupper(s1[i]) - toupper(s2[i]) );
    
    	//compare the two next characters of the strings:
            if ( (r = toupper(s1[i]) - toupper(s2[i]) ) != 0) 
               return r;
       }
    }
    

    looks nicer than the while(...) stuff posted above, since it's easier to read and comment.

  • totolamoto (unregistered)

    In our libs we have two implementations of stricmp (in fact they are used for Unicode but I have modified it here for standard srings)

    int stricmp(char* s1, char * s2) { for ( ; toupper(*s1) == toupper(*s2); s1++, s2++ ) if ( *s1 == EOS ) return 0;

    return (toupper(*s1) > toupper(*s2)) ?
                (toupper(*s1) - toupper(*s2)) :
                -(int)(toupper(*s2) - toupper(*s1));
    

    }

    The return code looks WTFy but is in fact the real portable way of writing it in Ansi-C.

    int strcmpi(char *sz1, char *sz2) { while(*sz1 && (*sz1==*sz2 || toupper(*sz1)==toupper(*sz2))) { sz1++; sz2++; } return *sz2-*sz1; }

    This one is not bad because our implementation of toupper is quite expensive (turkish I and Greek Sigma problem) so it is a bit faster when the strings start is identical. We do no checking of pointers, because if it gets called with a null pointer it is a bug in the caller.

  • wigwam (unregistered) in reply to Chris
    Chris:
    The best bit about that is that delete on a NULL pointer is safe and just doesn't do anything.
    This is not necessarily true, and you'd have bitten by one of the worst bugs ever. There are several (granted, mostly old and outdated) versions of delete that doesn't handle a NULL argument properly, and some even cause weird behaviors. This is especially true when trying to use some of the C++ libs for embedded systems.
  • Hallvard (unregistered)

    Don't know about C++, but in C, the ctype.h functions only take character arguments in the range of unsigned char, plus EOF. So the toupper calls should be toupper((unsigned char)*str).

  • Julian (unregistered) in reply to Chris
    The best bit about that is that delete on a NULL pointer is safe and just doesn't do anything.

    According to the language standard yes - sometimes in practice no. IIRC, an old Borland compiler would sometimes generate code that crashed on "delete []p;" if p was null.

  • craiga (unregistered) in reply to div
    Standard behaviour of standard library functions (not syscalls) when faced with unexpected NULL pointers is to SEGV, so no, this is not standard.

    Erm ... what? Do you even know what SEGV means? SEGment Violation. You have accessed an invalid memory segment, i.e. one that isn't allocated to your process. This could, in fact, be 0, or it could be 0x000043f1 or any other random number that isn't a valid, allocated address. It does NOT mean "somebody passed in NULL and they weren't supposed to so I'm going to crash now".

  • (cs) in reply to Benanov
    Benanov:
    Honestly?

    Toss it, use library functions. You don't get paid for rewriting libraries.

    I got paid for re-writing the C string library. We'd just switched from CodeWarrior to Xcode, and needed versions of wstrlen and friends that worked with 16-bit wchar_t -- the system libraries only work with 32-bit wchar_t.

  • D-flat (unregistered)

    Even more problems with this one is the potential for SEGV:

    p1= new char [iLen1+1]; // on failure, p1 is NULL
    memset (p1, 0, iLen1+1); // And down you go
    
  • (cs) in reply to mkb
    mkb:
    Indeed, POSIX (but not ANSI, looks like) defines stricmp and wcsicmp and ISO C++ has _stricmp, _wcsicmp, _mbsicmp, _stricmp_l, _wcsicmp_l, and _mbsicmp_l.

    Well, according to Microsoft anyway.

    I've certainly seen a "strcmpi" in the past. Too many people seem to assume that C and C++ were introduced at the same time as their ISO standards.

  • (cs) in reply to mkb
    mkb:
    There still environments where you have to force the compiler to NOT throw std::bad_alloc, because that exception will crash whatever you're doing.

    Yes, even if you catch it. :(

    Most embedded environments disable exceptions. Mostly because exception support often results in noticeably degraded performance, even in code that doesn't use exceptions. It's also sort of annoying to get the runtime initialization right, and to port the exceptions part of the runtime library.

  • Marc (unregistered) in reply to anonymous
    anonymous:
    You can get it to two lines if you use recursion.
    int stricmp(char* s1, char* s2) {
        if (!*s1 || !*s2 || toupper(*s1) != toupper(*s2)) return *s1 - *s2;
        else return stricmp(s1++, s2++);
    }
    

    With this implementation stricmp("a", "B") will return 31. It should return -1.

  • Marc (unregistered) in reply to totolamoto
    totolamoto:
    In our libs we have two implementations of stricmp
    int strcmpi(char *sz1, char *sz2)
    {
      while(*sz1 && (*sz1==*sz2 || toupper(*sz1)==toupper(*sz2)))
        {
            sz1++;
            sz2++;
        }
        return *sz2-*sz1;
    }
    

    Same bug in this one. strcmpi("a", "B") return 31 instead of -1.

  • TooTired (unregistered)
    int iLen1= 0, iLen2= 0; iLen1= strlen (csz1); iLen2= strlen (csz2);
    I realize this isn't the same caliber of problem as the other WTFs mentioned, but am I the only one bothered by samples like this?

    I see this pattern every day, and it drives me nuts. Obviously, someone, somewhere back in their education was told to make sure variables are initialized so they don't have random values, but what's wrong with initializing it to the value you want?

    int iLen1= strlen (csz1); int iLen2= strlen (csz2);
    (Not to mention, I personally, hate definining multiple variables in one statement -- too many opportunities to forget whether the * binds to the variable or the type.)
  • (cs) in reply to Anonymous Tart
    Anonymous Tart:
    snoofle:
    Cyrus:
    if (p1 != NULL) delete [] p1; if (p2 != NULL) delete [] p2;

    Oh...my...word...

    I would murder him; death is not good enough though, so I would have to find something even worse.

    Forgetting that the memory was allocated for this in the first place, I don't really have a problem with sanity checking that pointers are not null before freeing whatever they point to; it's defensive programming. Assuming this code was actually necessary in the first place, down the line, after numerous modifications and the inevitable code bloat, there might be intervening code that sets one of the pointers to null, but the subsequent coder forgot to change the delete[]p at the bottom. It happens.

    Checking whether something is not NULL before deleting isnt called 'defensive programming', its called 'I am a retard and never learnt that deleting a NULL is perfectly sane, legal and allowed'.

    Of course you are right, it is ESSENTIAL that we allow for code bloat in strncasecmp.... I'm always finding I have to go rewrite bzero(), add in some bloat......

    Um, I did say "assuming this code was actually necessary in the first place"!

  • (cs) in reply to Chris
    Chris:
    snoofle:
    Forgetting that the memory was allocated for this in the first place, I don't really have a problem with sanity checking that pointers are not null before freeing whatever they point to; it's defensive programming.

    Except its not needed. Would you write code like

    if (i != 2) i = 2;

    and call it "defensive programming"?

    I personally wouldn't. However, I've worked on some massive projects with 150+ programmers, and it doesn't matter how well YOU code, it's the other idiots down the line. They change something and don't propagate it properly, then the code breaks in YOUR class because something that wasn't supposed to be null was set to null per the above, and you become responsible.

    Ideally, you'd never have to do a check for null since you should always know beforehand. However, I've found that putting in what should be redundant checks and tossing an error saying "such and such should never have happened - find out how you f'd up" usually cuts down on the crap I need to debug.

  • Teh dayli WFT (unregistered) in reply to snoofle
    snoofle:
    Chris:
    snoofle:
    Forgetting that the memory was allocated for this in the first place, I don't really have a problem with sanity checking that pointers are not null before freeing whatever they point to; it's defensive programming.

    Except its not needed. Would you write code like

    if (i != 2) i = 2;

    and call it "defensive programming"?

    I personally wouldn't. However, I've worked on some massive projects with 150+ programmers, and it doesn't matter how well YOU code, it's the other idiots down the line. They change something and don't propagate it properly, then the code breaks in YOUR class because something that wasn't supposed to be null was set to null per the above, and you become responsible.

    Ideally, you'd never have to do a check for null since you should always know beforehand. However, I've found that putting in what should be redundant checks and tossing an error saying "such and such should never have happened - find out how you f'd up" usually cuts down on the crap I need to debug.

    You completely miss the point though. It doesn't matter if it's 0 or not (we C++ programmers use 0, not the weird macro "NULL"), you can delete it either way. It's just like checking if i is 2 before setting it to 2, as per above. It's retarded.

  • Tim (unregistered) in reply to Marc
    Marc:
    anonymous:
    You can get it to two lines if you use recursion.
    int stricmp(char* s1, char* s2) {
        if (!*s1 || !*s2 || toupper(*s1) != toupper(*s2)) return *s1 - *s2;
        else return stricmp(s1++, s2++);
    }
    

    With this implementation stricmp("a", "B") will return 31. It should return -1.

    Not to mention that stricmp("a", "a") (and most other inputs) recurses endlessly.

  • riaa (unregistered) in reply to Teh dayli WFT
    Teh dayli WFT:
    You completely miss the point though. It doesn't matter if it's 0 or not (we C++ programmers use 0, not the weird macro "NULL"), you can delete it either way. It's just like checking if i is 2 before setting it to 2, as per above. It's retarded.
    You are completely missing other points. As pointed out before, not all implementations of C++ safely support deleting NULL (0, address 0x00000000, ....).
  • (cs) in reply to Tim
    Tim:
    Hmm, upon further reflection, maybe I can improve my implementation slightly...
    int stricmp(char* s1, char* s2) {
        while (toupper(*s1) == toupper(*s2) && *s1 && *s2) {
            s1++;
            s2++;
        }
        return *s1 - *s2;
    }
    
    I think this version still works...

    Why is it that everyone assumes they need to check both strings for the zero terminator?

    You already KNOW *s1 == *s2. So you only need to check to see if one of them isn't 0. Guess what value *s2 is when *s1 != 0?

  • EvilTeach (unregistered) in reply to Wintaki

    If your strings are "JAN" and "JAo" it returns positive. If your strings are "JAn" and "JAO" it returns negative.

    That is not the intended behavior.

  • SS (unregistered) in reply to craiga
    Standard behaviour of standard library functions (not syscalls) when faced with unexpected NULL pointers is to SEGV, so no, this is not standard.
    Erm ... what? Do you even know what SEGV means? SEGment Violation. You have accessed an invalid memory segment, i.e. one that isn't allocated to your process. This could, in fact, be 0, or it could be 0x000043f1 or any other random number that isn't a valid, allocated address. It does NOT mean "somebody passed in NULL and they weren't supposed to so I'm going to crash now".

    I think his point was that it's bad to swallow errors, so let the segfault happen and then debug (using electric fence or however the kids are doing it these days) to find the actual problem.

  • AN (unregistered)

    Here's my slightly modified version of one posted above by Tim.

    int stricmp(char* s1, char* s2) { while (toupper(*s1) == toupper(*s2) && *(s1++)) s2++;

    return toupper(*s1) - toupper(*s2);
    

    }

  • Tim Showalter (unregistered) in reply to cant get 2 lines :/

    Incorrect if both strings are not the same length.

  • AN (unregistered) in reply to Tim Showalter

    If they're not the same length, then the while loop will end and it will return toupper(*s1) - toupper(*s2)

    I haven't looked at the internals, but I expect toupper(0) = 0, right? So then at the end of s1, you have 0-(some positive value), which will return negative (correct), and at the end of s2, you have (some positive value)-0, which will return positive.

  • Talchas (unregistered) in reply to Benanov
    Benanov:
    (Of course this probably craps all over i18n; it would return -1 on different representations of the same string that haven't been optimized to the shortest representation in UTF-8.)
    No. In UTF-8 anything but the shortest representation is invalid and should be treated as such by any parser. Besides, the whole thing is broken for anything other than a fixed character size that toupper supports.
  • df (unregistered)
    Derrick Pallas:
    Someone once said if you can't implement any O(n) string function in two lines, you are not a programmer."
    int strcmpi(const char* a, const char* b) {
     for (int i = 0, z = 0; *(a+i) || *(b + i); i++)
      if ((z |= (!!*(a+i) - !!*(b+i))) || (z |= (toupper(*(a+i)) - toupper(*(b+i))))) return z;
    } // implicit return 0
    
  • dkf (unregistered) in reply to Talchas
    Talchas:
    No. In UTF-8 anything but the shortest representation is invalid and should be treated as such by any parser.
    So... you're saying that that code should throw an exception or something like that if someone manages to pitch invalid UTF-8 (henceforth known as "WTF-8" in honor of a greatly missed site name) at it? (Personally I prefer normalizing everything at the point when the string enters my code, whether through I/O or through decoding from something else, and converting everything bad into the official invalid character. Like that, only known good UTF-8 is ever delivered to other code, meaning that code doesn't need to be defensive.)
    Besides, the whole thing is broken for anything other than a fixed character size that toupper supports.
    That's definitely true. What's even better is that a toupper() without a locale argument is actually impossible to get correct; capitalization rules are different in different languages. Thus the real WTF is in the C standard itself (or perhaps the lack of decent locale-aware alternatives).
  • lilo_booters (unregistered) in reply to Marc
    Marc:
    anonymous:
    You can get it to two lines if you use recursion.
    int stricmp(char* s1, char* s2) {
        if (!*s1 || !*s2 || toupper(*s1) != toupper(*s2)) return *s1 - *s2;
        else return stricmp(s1++, s2++);
    }
    

    With this implementation stricmp("a", "B") will return 31. It should return -1.

    return (!*s1 || !*s2 || toupper(*s1) != toupper(*s2)) ? toupper(*s1) - toupper(*s2) : stricmp(s1++, s2++);
  • milo (unregistered) in reply to df
    df:
    Derrick Pallas:
    Someone once said if you can't implement any O(n) string function in two lines, you are not a programmer."
    int strcmpi(const char* a, const char* b) {
     for (int i = 0, z = 0; *(a+i) || *(b + i); i++)
      if ((z |= (!!*(a+i) - !!*(b+i))) || (z |= (toupper(*(a+i)) - toupper(*(b+i))))) return z;
    } // implicit return 0
    

    Well done. And now try implementing suffix tree construction in the same style (its also O(n)).

  • TDM (unregistered)
    int stricmp1(char const* s1, char const* s2)
    {
        while (s1 && s2 && ++s2 && (*s1++ != '\0' || *(s2 - 1) != '\0') && (tolower(*(s1 - 1)) == tolower(*(s2 - 1))));
        return (s1 && s2) ? (int)tolower(*(s1 - 1)) - (int)tolower(*(s2 - 1)) : s1 - s2;
    }
    

    Two lines, accepts NULL for either argument, and it's even readable! Do I get the caek? (Don't give me any crap about repeatedly checking for NULL -- it's two freaking lines!)

  • Tim (unregistered) in reply to lilo_booters
    lilo_booters:
    Marc:
    anonymous:
    You can get it to two lines if you use recursion.
    int stricmp(char* s1, char* s2) {
        if (!*s1 || !*s2 || toupper(*s1) != toupper(*s2)) return *s1 - *s2;
        else return stricmp(s1++, s2++);
    }
    

    With this implementation stricmp("a", "B") will return 31. It should return -1.

    return (!*s1 || !*s2 || toupper(*s1) != toupper(*s2)) ? toupper(*s1) - toupper(*s2) : stricmp(s1++, s2++);

    Your recursive call should be stricmp(s1 + 1, s2 + 1), otherwise it recurses endlessly.

  • sp (unregistered) in reply to RevEng
    RevEng:
    Tim:
    return *s1 == NULL ? !(*s2 == NULL) : -1;
    That has got to be one of the ugliest uses of the ternary operator I've ever seen.

    This is nicer: return *s1 ? !!*s2 : -1;

  • sp (unregistered) in reply to sp
    sp:
    RevEng:
    Tim:
    return *s1 == NULL ? !(*s2 == NULL) : -1;
    That has got to be one of the ugliest uses of the ternary operator I've ever seen.

    This is nicer: return *s1 ? !!*s2 : -1;

    Errr, i meant:

    return *s1 ? -1 : !!*s2;

  • (cs) in reply to mkb
    mkb:
    MaW:
    Yes, yes it is, but this function's ultimate return values aren't useful for sorting, unless your sorting requirements are really really weird.

    It's necessary if you want to use the function for both sorting AND testing for equality. An STL-style less than comparator can't test for equality! Perl sorts are the same way (the <=> and cmp operators).

    You can define equality in terms of the less-than operator, it's a common implementation.

    !(foo < bar) && !(bar < foo)

  • Karrde712 (unregistered)

    For those of you making wiseass statements about null-checking before deleting a character pointer, there are in fact compilers out there that will generate code that attempts to dereference the pointer when deleting it (HP-UX's compiler is notorious for this, as well as several custom compilers for embedded processors).

    It is generally considered good practice to perform a NULL check before deletion when writing vastly cross-platform code. Considering that testing for NULL is only marginally slower than performing a NOOP, it's not likely to hurt anything.

  • (cs)

    I see the usual slew of wrong comments posted, whenever there is a C or C++ snippet. Here's the definitive answers. Note that I may have omitted some -- every time I look at this code I see more problems!!

    Major WTFs

    • If one input is NULL and the other is not, then the function returns 0, indicates a match.
    • If one or both inputs is an empty string, then the function returns -1, indicating csz1 is lesser.
    • Free-store memory is allocated when the function can be easily implemented without it
      • Use of new[] instead of automatically-allocating container
      • Memory is leaked if allocation of p2 fails (p1 is never deleted)
    • Misuse of toupper()
      1. Argument must be in the range 0-255 or EOF, but this function can pass in negative char values.
      2. Return value is in the range 0-255 or EOF, but it is assigned to a char which could be signed, with a maximum value of 127. (1 causes undefined behaviour, 2 causes implementation-defined behaviour). Note that many C library vendors cater for negative argument values, because it is such a common error (and a bad function specification IMHO), so your program is quite likely to work on the mainstream systems despite being wrong. For further discussion of this topic search the comp.lang.c archives for 'toupper'

    Minor WTFs

    • 'int' used for string length (should be 'size_t'): program will fail miserable if a string longer than INT_MAX - 1 is passed as argument
    • Redundant check for NULL before deleting p1 and p2 (the delete expression accepts a NULL pointer)
    • Redundant call to memset()
    • memset() then strcpy() could have been replaced with strncpy() (assuming you wanted to set the rest of the buffer to blank for some reason)
    • iLen1 and iLen2 are set to 0 and then immediately reset to another value
    • csz1 is copied into p1 and then p1 is modified in-place -- the characters could have been modified as they were copied instead
    • Code duplication for the p1 and p2 cases
    • Entire function design not suitable for i18n (Perhaps not a WTF depending on context)

    WTFs in the Comments

    • (Many commentators) There is no standard C function for a case-insensitive comparison. strcasecmp() etc. are POSIX functions.
    • Many commentators also repeated the mistaken use of toupper() in their supposedly 'correct' code
    • softwarebear "I love the way they check for NULL at the bottom... you mean p1 or p2 could be NULL before that?" Yes - if the string length is 0 then the block of code that allocates memory is skipped.
    • rbowes "at least he doesn't leak memory" - memory is leaked if p1's allocation succeeds but p2's fails (the function will throw an exception at that point which will propagate back to the caller)
    • anonymous "Not a WTF to me" - LOL
    • div "Standard behaviour of standard library functions when faced with unexpected NULL pointers is to SEGV"
      1. delete expects NULL (or a valid pointer, of course)
      2. The behaviour is undefined in those cases, no standard I'm aware of says that a SEGV should occur.
    • Ejg "...would crash on 'delete [] p2;' if the NULL check were left out" - would not crash
    • mkb "NULL is sometimes defined as (void *)0" -- this is not allowed in C++, it must be defined as an integral expression evaluating to 0
    • D-flat "p1 = new char[iLen1+1]; // on failure, p1 is NULL" - no it isn't, the new expression either returns a valid pointer or throws an exception
  • lplatypus (unregistered)
    int strcmpi(const char *s1, const char *s2) {
        while(*s1||*s2) if (int i=toupper(*s1++)-toupper(*s2++)) return i; return 0;
    }
  • Ashley Visagie (unregistered) in reply to Anonymous Tart

    Yikes, better hope that none of the following conditions occur: (1) s1 is null (2) s2 is null (3) strlen(s1) < len (4) strlen(s2) < len

Leave a comment on “Enough String to Hang Yourself”

Log In or post as a guest

Replying to comment #:

« Return to Article