The Daily WTF: Curious Perversions in Information Technology

2007-02-27 Reply Admin

Ashley Visagie:
Yikes, better hope that none of the following conditions occur: (1) s1 is null (2) s2 is null (3) strlen(s1) < len (4) strlen(s2) < len

Apologies ... my comments were relating to the following implementation:

int strnicmp(const char * s1, const char * s2, size_t len) { for (ssize_t r = 0, i = 0; i < len; ++i) if ((r = (toupper(s1[i]) - toupper(s2[i]))) != 0) return r; return 0; }

2007-02-28 Reply Admin

Hey, thats Oracle style: http://forums.worsethanfailure.com/forums/thread/26879.aspx

Summary: The empty string is null, and therefore the expression (""="") does not evaluate to true, neither does (""<>"").

2007-02-28 Reply Admin

Karrde712:
For those of you making wiseass statements about null-checking before deleting a character pointer, there are in fact compilers out there that will generate code that attempts to dereference the pointer when deleting it (HP-UX's compiler is notorious for this, as well as several custom compilers for embedded processors).

Don't use a broken compiler. If you can't switch, post-process the assembly to get rid of such WTFs. I used to have a pretty elaborate post-optimizer for Z8 Encore ZDS II compiler. It was a nice stop-gap before I rolled my own compiler (not for C, though).

Earl Purple · 2007-02-28 Reply Admin

toupper takes an int as input and returns an int. The standard behaviour is to not modify the input unless it appears in a list of values, so for a signed char of say -99, it should return -99. The "undefined" behaviour is probably whether it may interpret -99 as that or as 157, but either way it probably should have little effect, especially as you would expect both characters to convert the same way (i.e. characters in both strings) and therefore would fail only if for some reason 157 converted uppercase to a different value and one string had that different value and the conversion didn't happen.

C++ has other case-conversion functions in locales but still lacks a decent case-comparison / case-conversion function.

Note that char_traits has a compare function and Dinkumware's is "broken" when it comes to signed chars because it uses memcmp which does unsigned-char comparison.

There are compiler switches that make char automatically unsigned and it is generally suggested by many of importance that you set this.

It is possible that there was code that already used strcmpi and this was ported to a platform that didn't support it so someone had to write it. Of course if there is such a function but it has a different name, then define the below in terms of that function. Otherwise I would probably write it in pure C anyway, and a reasonable solution would be

int strcmpi( const char * s1, const char * s2 )
{
   while( 1 )
   {
     int diff = toupper( (unsigned char)(*s1) ) 
         - toupper( (unsigned char)(*s2) );

     if ( diff == 0 )
     {
        if ( *s1 )
        {
           ++s1;
           ++s2;
        }
        else
        {
          return 0;
        }
     }
     else
     {
        return ( diff < 0 ? -1 : 1 );
     }
   }
}

Note that behaviour is undefined if either pointer passed in is NULL.

We can assume that any compiler would optimise the check while( 1 ) so no actual comparison there, and 2 comparisons for each matching character in the string.

2007-02-28 Reply Admin

Old Wolf:
I see the usual slew of wrong comments posted, whenever there is a C or C++ snippet. Here's the definitive answers. ...

div "Standard behaviour of standard library functions when faced with unexpected NULL pointers is to SEGV"

delete expects NULL (or a valid pointer, of course)

The behaviour is undefined in those cases, no standard I'm aware of says that a SEGV should occur.

Oh Wise Wolf!

If it is conforming. Not all implementations are conforming. Non-conformance doesn't magically mean that they don't exist, nor that real people don't have to use them to get real work done. It costs practically nothing to check before calling.
Of bloody course not. However what do you think most implementations actually end up doing? Faulting! Standard de facto, not de jure.

2007-02-28 Reply Admin

craiga:
Erm ... what? Do you even know what SEGV means?

Urrrr, what? Did something I wrote in any way suggest that I don't know what causes SIGSEGV/ACCESS_VIOLATION or whatever to be raised? I think you need to drink your morning coffee before reconsidering the practical effect of passing in NULL to a function that, for efficiency as well as fault intolerance, is going to blindly dereference it.

2007-02-28 Reply Admin

Yet another bug-- it's not thread-safe.

The code takes the length of the strings, then a bit later blindly copies them. If the strings have gotten longer in the meantime, kaboom! goes the heap.

2007-02-28 Reply Admin

softwarebear:
I love the way they check for NULL at the bottom before deleting ... you mean p1 or p2 could be NULL before that ?

If one of the strings is just a single '/0' char representing the end of string, then pointers won't be innitialised, thus this actually makes sense. In fact i was quite pleased to see that... condition, since that means that some thought went into that function... no really.

2007-02-28 Reply Admin

[quote user="MaW"][quote]...unless your sorting requirements are really really weird.[/quote]

no, not even then

KenW · 2007-02-28 Reply Admin

anonymous:
Not a WTF to me.
Defensive programming, and maybe poor unoptimiced code. Maybe ugly on the non-standard return values.

Other than that, boring code.

After saying that, I understand why you post as "anonymous". I wouldn't want people to know who I was either.

2007-02-28 Reply Admin

Tim:
I don't think it's possible to reasonably write this function in just "two lines", though.

int stricmp(const char* s1, const char* s2) {
	while(toupper(*s1) == toupper(*s2++)) if(!*s1++) return 0;
	return (tolower(*s1) - tolower(*--s2));
}

2007-02-28 Reply Admin

rbowes:
In the programmer's defense, at least he doesn't leak memory!

Actually, the new operator throws a std::bad_alloc exception if memory can't be allocated, and there's no exception handling in this code. If the first allocation succeeds (p1), but the second fails (p2), p1 is never freed.

Granted, it doesn't always leak, but it does if you run out of memory - probably the last occasion you would want a leak :)

2007-02-28 Reply Admin

KenW:
anonymous:
Not a WTF to me.
Defensive programming, and maybe poor unoptimiced code. Maybe ugly on the non-standard return values.

Other than that, boring code.

After saying that, I understand why you post as "anonymous". I wouldn't want people to know who I was either.

Well the described "defensive programming" (which is admitedtly non-sense) is not that bad IMHO.

What is really bad are our Uber-Coders here (not refering to you).

Just take a look at http://worsethanfailure.com/Articles/Paid_by_the_Line.aspx where 50% of the code which was posted afterwards as "correct" solution for the "WTF" (simple switch-case) was wrong. Sometimes I think the comments here are the biggest WTF...

So let people do their defensive programming thingy if they feel safer - while still producing usable (though sligthly bloated) code. And beware of the UBER-Programmers.

2007-02-28 Reply Admin

Ejg:
Not to mention that if one ot the strings is 0 length, the pointer is not allocated. thus strcmpi ("wtf", ""); would crash on "delete [] p2;" if the NULL check were left out

Wooo, yet another WTF... go stand on the naughty step. This was supposed to be C++ code (even though it looks like dreadful C). Any fule kno that delete on a null pointer is safe.

2007-02-28 Reply Admin

i love how the comments section is full with wtfs worse than the original article, as usual.

does anyone think the site's new name actually refers not to the published articles, but to its user base?

the best comment wtf in my view is how so many "correct" implementations use toupper just for checking if the chars are equal, but still return *s1 - *s2 without toupper once a difference is found. that's just brillant.

i didn't even bother to read all of the comments, perhaps this was noted already.

2007-02-28 Reply Admin

Why "i"?

Imbecilic? Idiotic? INCONCEIVABLE!

Bonus: CAPTCHA was "pirates" -- how can WTF read my mind and pick out the Princess Bride reference BEFORE I EVEN MAKE IT?!?!?

2007-02-28 Reply Admin

Kuba:
Don't use a broken compiler. If you can't switch, post-process the assembly to get rid of such WTFs. I used to have a pretty elaborate post-optimizer for Z8 Encore ZDS II compiler. It was a nice stop-gap before I rolled my own compiler (not for C, though).

Hah, you must work for a nice non-WTF company. You probably even have a taskchair that's newer than 2000. How do I apply?

2007-02-28 Reply Admin

your function returns 'true' if any or both strings are NULL (you may want some assert()s for your own testing). also, you're doing twice as many tests as required. i would think it would be something like:

int strcmpi(const char *s1, const char *s2)
{
  int cmp;
  do
    cmp = toupper(*s1) - toupper(*s2);
  while (!cmp && *s1++ && *s2++);
  return cmp;
}

Old Wolf · 2007-02-28 Reply Admin

div:
Old Wolf:
1. delete expects NULL (or a valid pointer, of course)
2. The behaviour is undefined in those cases, no standard I'm aware of says that a SEGV should occur.
If it is conforming. Not all implementations are conforming.

In that cause you should either switch to a conforming implementation, or write kludges, or add in a comment in your code indicating that this code is necessary to avoid a compiler bug in compiler XX.YYY.ZZZ

2. Of bloody course not. However what do you think most implementations actually end up doing? Faulting! Standard de facto, not de jure.

Or overwriting important system tables, trashing the entire device and forcing you to reload the operating system (Been there, done that)

Old Wolf · 2007-02-28 Reply Admin

Earl Purple:
toupper takes an int as input and returns an int. The standard behaviour is to not modify the input unless it appears in a list of values, so for a signed char of say -99, it should return -99.

I suppose that by 'standard' you mean 'what gcc does'. The C standard is that the behaviour is undefined if the argument is negative (and not EOF). The reason for this is to allow the implementation:

const __tu[256] = { 0, 1, 2, 3, /.../, 255 }; #define toupper(X) __tu[X]

which obviously has trouble if the argument is negative. NB. I have actually used an ANSI-C conforming compiler that did things this way; and fair enough too, it may be faster at runtime than performing some tests and branches and bit twiddles.

Another reason for passing in positive values is because it is the difference between the positive values that determines whether you should return a negative value or a positive value, eg. if one string is { -1, 0 } and the next string is { 3, 0 }, the return value should be +1 (the latter string is lesser because 3 < 255). See the quote right at the end of this message for reference.

The "undefined" behaviour is probably whether it may interpret -99 as that or as 157, but either way it probably should have little effect, especially as you would expect both characters to convert the same way

Well, the difference is that -99 is a character and 157 is not (assuming you are on a compiler where plain char is 8-bit and signed).

(i.e. characters in both strings) and therefore would fail only if for some reason 157 converted uppercase to a different value and one string had that different value and the conversion didn't happen.

Huh? Can you give a code example that would demonstrate what you are trying to say?

C++ has other case-conversion functions in locales but still lacks a decent case-comparison / case-conversion function.

Well, every programming language has the same problem. The problem is that case comparison and conversion is very complicated for some written languages!

Note that char_traits has a compare function and Dinkumware's is "broken" when it comes to signed chars because it uses memcmp which does unsigned-char comparison.

memcmp does not use any sort of typing, let alone unsigned chars. memcmp says the blocks are equal iff they consist of exactly the same sequence of bits. If you think there is a bug in Dink, either report it or post some example code in the comp.lang.c++ newsgroup that shows the problem (the Dinkumware author reads and posts in that NG).

        return ( diff < 0 ? -1 : 1 );

strcmp() is specified as any negative value meaning less-than, etc., there is no need to convert to -1.

Finally, note ISO/IEC 9899:1999 7.21.4#1, specifically the bracketed clause:

The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

2007-02-28 Reply Admin

TDM:
int stricmp1(char const* s1, char const* s2)
{
    while (s1 && s2 && ++s2 && (*s1++ != '\0' || *(s2 - 1) != '\0') && (tolower(*(s1 - 1)) == tolower(*(s2 - 1))));
    return (s1 && s2) ? (int)tolower(*(s1 - 1)) - (int)tolower(*(s2 - 1)) : s1 - s2;
}
Two lines, accepts NULL for either argument, and it's even readable! Do I get the caek? (Don't give me any crap about repeatedly checking for NULL -- it's two freaking lines!)

except you cannot modify s1 and s2 as they are const. no cake for you, sir.

2007-02-28 Reply Admin

arent you supposed to store intermediate strings in a DB? in case there was a failure, you need not start from 0.

2007-03-01 Reply Admin

The writer of this article believes he's a very good programmer and, how he can feel better about it or demonstrate it ? easy ! by putting some code less than good and then criticize it ! That's lame, if you're a good programmer you'll look for challenges, not for some badly written code from which you can feed your ego. Nobody start writing good code, there's something called learn curve and most people goes through that. So average Joe made this kind of code and now some people here say he must die or not ever touch a computer ? Oh yes ! right ! he can't make mistakes ! he can't learn from mistakes either ! he must be perfect, born perfect like all of you here, right ? I bet someone of the criticizers had made that kind of code once. Try to be better persons also, not just better programmers.

PS: http://norvig.com/21-days.html

2007-03-01 Reply Admin

plrf:
TDM:
int stricmp1(char const* s1, char const* s2)
{
    while (s1 && s2 && ++s2 && (*s1++ != '\0' || *(s2 - 1) != '\0') && (tolower(*(s1 - 1)) == tolower(*(s2 - 1))));
    return (s1 && s2) ? (int)tolower(*(s1 - 1)) - (int)tolower(*(s2 - 1)) : s1 - s2;
}
Two lines, accepts NULL for either argument, and it's even readable! Do I get the caek? (Don't give me any crap about repeatedly checking for NULL -- it's two freaking lines!)
except you cannot modify s1 and s2 as they are const. no cake for you, sir.

s1 and s2 aren't const pointers, they are pointers to const chars. Modifying the pointers is allowed, modifying the chars is not.

There is no const problem in that function. Definitely no cake for you!

2007-03-01 Reply Admin

In the programmer's defense, at least he doesn't leak memory!

No, but he does crash when he runs out of memory:

p1 = new char[iLen1 + 1];
memset(p1, 0, iLen1+1);

No points either for first filling an array completely with blanks, then in the next line overwriting them with the string. Which copies the terminating \0 anyway....

2007-03-01 Reply Admin

Obviously, you haven't been programming in C++ very long. Way back in the 90's, before C++ was standardized, there were compilers that generated code such that calling delete on a NULL pointer would crash the system. Basically, they just passed the value through to free(), and let the system do whatever it would with it.

2007-03-01 Reply Admin

div:
Standard behaviour of standard library functions (not syscalls) when faced with unexpected NULL pointers is to SEGV.

No, it's not.

2007-03-01 Reply Admin

div:
2. Of bloody course not. However what do you think most implementations actually end up doing? Faulting! Standard de facto, not de jure.

Baloney... many systems don't even have SIGSEGV, but ok, Windows exceptions are somewhat similiar. However, some operating systems don't use memory protection, e.g. FreeDOS. Some CPU architectures don't even have MMUs. I am co-developing tools for programming embedded systems and it's definitely not unheard of. What use would be a "segment violation" error in these cases, even if implementors cared to check for a case that the real standard says they are not required to check for?

If you think that the fact that 90% of deployed implementations raise a signal makes it a standard then I sincerely hope that no one who's using one of the remaining 10% ever has to use any of your code.

2007-03-01 Reply Admin

One test in loop.

int stricmp(char *str1, char *str2){
        int i;
        assert(str1 && str2);
        for (i=0; ;i++){
                if (toupper(str1[i]) != toupper(str2[i]))
                        return toupper(str1[i]) - toupper(str2[i]);
                else if (! str2[i])
                        return 0;
        }
}

2007-03-02 Reply Admin

WC:

One test in loop.

int stricmp(char *str1, char *str2){
        int i;
        assert(str1 && str2);
        for (i=0; ;i++){
                if (toupper(str1[i]) != toupper(str2[i]))
                        return toupper(str1[i]) - toupper(str2[i]);
                else if (! str2[i])
                        return 0;
        }
}

Using assert() to check for NULL is a bad idea. If compiled with debugging, it will abort() if the condition is false (usually undesirable except during development/testing - certainly not behaviour you want in production code); and without debugging, it does nothing - so your prod code will still go to 'undefined' or 'implementation-defined' behaviour (typically, the aforementioned SEGV on most platforms).

If you feel you must make it handle NULLs, a better choice would be

if(!str1 || !str2) return (str1 - str2)

It still only checks for NULL once, but does so regardless of how it is compiled, and without rudely terminating your application.

2007-03-02 Reply Admin

mkb:
strcmp is intended to be used for sorting strings as well, not just as a binary comparator.

<semiOT> Well, it can be used for sorting in the C locale, but that's it. Trying to sort e.g. Finnish with it would result in an error. And it's not even because of scandinavian letters.

You know, here we treat v and w as equal. So this is correct sorting in Finnish:

wait valid win vista </semiOT>

2007-03-02 Reply Admin

onlytwolines:
if(!str1 || !str2) return (str1 - str2)
It still only checks for NULL once, but does so regardless of how it is compiled, and without rudely terminating your application.

"When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements."

(ISO C99, 6.5.6/9)

"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

(ISO C++ Final Draft, 5.7-5)

2007-03-02 Reply Admin

int stricmp(char* s1, char * s2) { for ( ; toupper(*s1) == toupper(*s2); s1++, s2++ )

What the heck, are you an idiot? That will crash on NULL.

Jesus...

2007-03-03 Reply Admin

AdT:
onlytwolines:
if(!str1 || !str2) return (str1 - str2)
It still only checks for NULL once, but does so regardless of how it is compiled, and without rudely terminating your application.
"When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements."

(ISO C99, 6.5.6/9)

"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

(ISO C++ Final Draft, 5.7-5)

Fair enough, but a) the code as posted works on most platforms b) it's still better than using assert() c) I still think the best approach is leave NULL handling up to the caller (as the standard library string comparison functions do - comparing a string against NULL doesn't really make sense)

To avoid upsetting standards nazis, you could replace that line with something like

if(!str1) return (str2 ? -1 : 0);
if(!str2) return 1;

However I still stand by my first post - I haven't yet seen a better implementation of a case-insensitive string compare than:

int stricmp(const char* s1, const char* s2) 
{

	while(toupper(*s1) == toupper(*s2++)) if(!*s1++) return 0;

	return (toupper(*s1) - toupper(*--s2));

}

2007-03-04 Reply Admin

I'm not a skilled programmer so I thought this would be a good exercise for me, this is how I would write the code, is there any flaw here?

// Case insensitive string compare: int StrCmpInsensitive(const char* a, const char* b) { while(*a || *b) { // While null character's not been reached for a and b simultaneously: if(toupper(*a) != toupper(*b)) // In case the two characters are not the same: return toupper(*a) - toupper(*b);

	a++; b++; 
}

return 0; // Both strings were the same.

}

2007-06-27 Reply Admin

this did not make any sense to me?

Enough String to Hang Yourself

Leave a comment on “Enough String to Hang Yourself”