The Daily WTF: Curious Perversions in Information Technology

2012-12-20 Reply Admin

Omg, somebody is wrong on the Internet.

2012-12-20 Reply Admin

¯\(°_o)/¯ I DUNNO LOL:
Evan:
Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more
You mean 397535?
Do you even know how strlen works? Hint: strlen is an O(n) operation.

What I wrote is nothing more than a version of strlen that remembers the last start of a group of blanks instead of counting the length.

I'm sure EVAN knows that. The thing is, strlen's implementation in most compilers is optimized to death. I've looked at some implementations where it scans 8 bytes in one loop iteration. This means the loop is around 8 times faster as if you scan byte by byte. Then they also took care to limit the number of failed branch predictions, they added prefetch instruction, to limit the number of cache misses. On long strings it makes a hell of a big difference. Furthermore, the compiler has informations about standard functions that it hasn't (generaly) for hand-written one. The compiler can use intrinsics (Microsoft compilers did that already in DOS times), which means they replaced the function call by ad-hoc code. The compiler also knows that the function strlen is pure. That means that 2 consecutive calls to the function will return the same value if fed the same parameters.

 l = strlen(x);
 l2 = strlen(x);

will call strlen only once, an optimisation it cannot perform in the general case (it would be fatal for rand() for example).

2012-12-20 Reply Admin

Hi,

Evan:
OK, in case anyone wants some experimental evaluation for the proposed solutions, I have timing and test data for every full solution I've seen that I didn't think was an obvious joke. (Aufgehaben's may have been a joke too but I included it anyway.)

Wow - I'm impressed!

However, if we're testing speeds then I'd like to make them functionally "more equivalent". Specifically, I'd like to replace "isspace()" with "*str == ' '" (or equivalent as appropriate) in my entry, as I suspect this is causing extra overhead.

Note: I tried g++ version 4.5.4 and it doesn't like range-based for loops at all, so the code wouldn't compile. Newer versions of g++ (that do support range-based for loops) haven't hit repositories yet.

-Brendan

2012-12-20 Reply Admin

I read the other comments you made and I see that you are aware of all my pontifications I made above, sorry for sounding a bit arrogant.

EVAN, however, did the right thing. He tested with various inputs and I'm actually really surprised my solution was that good overall. I am fully aware that it has O(nm) complexity (n number chars, m number of trailing blanks) but as ever the complexity is not the only thing to consider. The constant factor, is imo often more important than the O() class.

2012-12-20 Reply Admin

lunaryorn:
gallier2:
[…] one should not rely on compiler optimizations (not guaranteed by the language standard) to excuse poor algorithmic skills.
Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.

Please differentiate between functional languages that depend on recursion because lack of side effects and guarantee tail recursion optimisation and imperative languages that give no TRO guarantee and can bust the stack with sufficiently deep recursion.

I would like financial transaction systems not to crash because of strings containing 2k spaces, even if encountering such strings seems improbable.

TGV · 2012-12-20 Reply Admin

Evan:
Lex Luthor:
Agreed. This is not a WTF. Nothing wrong with recursion in C, but possibly person who reported it.
Why does everyone seem to think that "the author used recursion" is the wtf instead of "it's buggy and horribly inefficient [not due directly to the recursion]"?

Probably because (nearly) everyone is a total moron? Since (nearly) everyone only spots problems by looking for unusual constructions? Recursion looks odd, strlen doesn't, so they can stop spotting problems. Since tail recursion is not a big WTF, and some are even so clever to remember that it can be optimized away, they think they look so much smarter by saying "that's not a WTF". Bit sad, innit?

2012-12-20 Reply Admin

lunaryorn:
gallier2:
[…] one should not rely on compiler optimizations to excuse poor algorithmic skills.
Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.

Try mentioning Haskell in a discussion about performance and realtime computing... be prepared to be laughed at.

Cbuttius · 2012-12-20 Reply Admin

The recursive trim is worst-case O(N²) as strlen is O(N).

Which is why the trim function would be more efficient to implement in C++ than in C as you can get to the end of a std::string in constant time as the std::string object stores its length so doesn't need to traverse the whole buffer.

Of course you would pass in a non-const reference to your trim function.

string has several find methods including find_last_not_of which will enable you to find where the end of the string is. You can then simply resize your string to the returned value + 1. So something like:

std::string& trim_right( std::string& str )
{
   std::string::size_type pos = str.find_last_not_of(" \r\t\n");
   if( pos != std::string::npos )
          str.resize( pos + 1 );
}

One could also use std::find_if with a not adapter to isspace using rbegin and rend on your iterator. That would return you an iterator to the last non-space character (or rend if there are none). You can then convert this to a regular iterator and use erase to erase these from your string.

Finally, one could use boost::regex, (or std::regex if available). Unfortunately far too many would choose that option. It's bettr than the recursive whitespace removal but far inferior to a basic loop or algorithm use.

2012-12-20 Reply Admin

Scott:
Not recursive, doesn't call any other functions, O(n), simple, works (as far as I can tell in a hot hurry). Apologies if somebody posted a similar solution already.
char* trim_right( char* str )
{
	int i = 0, j = 0;
	while ( str[i] != '\0' )
	{
		if ( str[i] != ' ' )
		{
			j = i;
		}
		++i;
	}
	str[j + 1] = '\0';
	return str;
}

This is similar to I DUNNO LOL's solution and like it, it does bad things to string without spaces.

2012-12-20 Reply Admin

Recursive Reclusive:
Scott:
Not recursive, doesn't call any other functions, O(n), simple, works (as far as I can tell in a hot hurry). Apologies if somebody posted a similar solution already.
char* trim_right( char* str )
{
	int i = 0, j = 0;
	while ( str[i] != '\0' )
	{
		if ( str[i] != ' ' )
		{
			j = i;
		}
		++i;
	}
	str[j + 1] = '\0';
	return str;
}
This is similar to I DUNNO LOL's solution and like it, it does bad things to string without spaces.

Wow, indeed.

DUNNO LOL's truncates the string to length 0, Scott's to length 1. I think EVAN should add that test in his conformity table at http://jsbin.com/ewelah/1

2012-12-20 Reply Admin

It's because his index variable J is set to 0 initially.

For anyone wondering when recursion might come in handy, think about parsing trees with branches of arbitrary depths.

TGV · 2012-12-20 Reply Admin

The Recursion King:
It's because his index variable J is set to 0 initially.
For anyone wondering when recursion might come in handy, think about parsing trees with branches of arbitrary depths.

I do my parsing with an LALR automaton and a stack. In FORTRAN.

2012-12-20 Reply Admin

Thanks Evan for the analysis. Rather informative (I am being sincere, not sarcastic). Now that I have time to follow up...

I was in such a hurry I forgot this forum strips tabs out of submissions--sorry my post is so ugly.
Oops! Boy did I mess up the strings with only spaces. Initialize j to -1 and it works a lot better.
Regardless, I like the implementations with pointers (from I-DUNNO-LOL) better than my index-based solution, even if they very likely optimize down to the same execution speed. So forget the index-based solution--pointers are probably better.
Calling strlen() is still not the best solution because of the scan-out and scan-back situation. The gallier2 solution only beats mine (marginally) in speed for most cases because (as somebody pointed out), strlen() is very optimized. If I were offering an ANSI-C level of solution in terms of hand optimization, I would have simply done the single scan forward the same way strlen() does--by doing it 32 or 64 bits at a time, or detecting architecture and using inline assembly. Without calling strlen() first. But that would have resulted in a function that is a lot harder to understand and not as fit for forum discussion. Some things are more important than raw speed.
It's better to NOT check for NULL input on functions like this, in keeping with the speed of ANSI-C standard string functions. If somebody calls trim_right() a zillion times with strings that the caller knows could not possibly be NULL, it's just a waste of cycles. If the caller suspects the input strings might be NULL, it's the caller's responsibility to check for NULL before calling the function. (Other standard functions like free() are called so frequently with NULL inputs that it makes sense to do the NULL check inside free() to avoid polluting the calling code).
Recursive-Reclusive: Huh? I tested for strings without spaces, and it worked fine (even my broken j=0 version worked for strings without spaces).
Agreed--the WTF is not the recursion (although that's the worse way to do it), but the inefficiency.

And again, thanks Evan for the analysis.

PiisAWheeL · 2012-12-20 Reply Admin

Garrison Fiord:
Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.

Spoken like a man who has never coded in javascript and didn't want to lock up a browser.

2012-12-20 Reply Admin

Brendan:
Wow - I'm impressed!
However, if we're testing speeds then I'd like to make them functionally "more equivalent". Specifically, I'd like to replace "isspace()" with "*str == ' '" (or equivalent as appropriate) in my entry, as I suspect this is causing extra overhead.

Note: I tried g++ version 4.5.4 and it doesn't like range-based for loops at all, so the code wouldn't compile. Newer versions of g++ (that do support range-based for loops) haven't hit repositories yet.

Sorry, 4.6 is what added support.

If I make an update I'll do it.

Cbuttius:
The recursive trim is worst-case O(N²) as strlen is O(N).

Let's be clear; it's only this dumb recursive implementation which is quadratic, and it's not really because of the recursion: a straightforward transformation into a loop won't help (as about 17,000 people have pointed out, the compiler will do that for you), and it would be possible to make a recursive version that wasn't stupid.

(Also I'd add your std::string-based one to the tests, but it has a different signature and so won't work like that. :-))

gallier2:
I think EVAN should add that test in his conformity table at http://jsbin.com/ewelah/1

Actually I meant to have a test for just "abc" and then forgot about it. Oops.

Scott:
* I was in such a hurry I forgot this forum strips tabs out of submissions--sorry my post is so ugly.

It's C, not Python. No biggie. :-)

(Not spam not spam not spam?)

2012-12-20 Reply Admin

Scott:
* I was in such a hurry I forgot this forum strips tabs out of submissions--sorry my post is so ugly.

It's C, not Python. No biggie. :-)

Well then it would have been

'hello   '.rstrip(' ')

:)

And I've thought about it more and I'm starting to turn to the side of the gallier2 solution calling strlen() being better overall. I think I agree that when the compiler sees strlen() it's free to do a lot more optimizations, and it's not like the little trim_right() snippet is unclear for discussions.

Maybe I'm just tainted because my favorite interview question is to ask a candidate to write "strcpy()" and I'm surprised how many set up a for-loop with strlen() in the terminating condition, without understanding how strlen() works.

Good discussion, and it's refreshing to finally see a C-related WTF for a change.

Incidentally, I ran my version and gallier2's on my 8-bit target and for some reason mine beat the socks off gallier2's in every case.

2012-12-20 Reply Admin

Another Lisp programmer enters the workforce.

2012-12-20 Reply Admin

Sean:
This is C; it's going to run pretty darn fast. And no matter how you implement it (or let someone else implement it) string processing by its very nature involves looping over characters until you do or don't find something. So those who say "OMFG what if there's a lot of spaces" are in the same category as those who say "moar XML!" -- they just don't understand what's actually happening inside the machine.
So this boils down to which costs more: the CPU time to execute an allegedly unnecessary stack push and pop, or the developer time to take a solved problem and write a "more elegant" implementation?

If the string passed in consists of n space characters, this function will do about n recursive calls, examine 1.5 n^2 characters, and return a string that ends in a space character. Except when n = 0 where it can crash. So we are not looking for a "more elegant" implementation of a solved problem. We are looking for a solution that is not stupidly complicated, that isn't stupidly slow (if I can pass a million spaces to that function I have a nice DoS attack), and that is actually correct.

2012-12-20 Reply Admin

Xarthaneon the Unclear:
But, while I agree my rewrite is not as graceful, other than the strlen issue, fixed as such:
char* trim_right(char* str)
{
  for(int i = ((int)strlen(str)) - 1; i >= 0; i--)
    if(str[i] == ' ')
        str[i] = '\0';

  return str;
}
how is it wrong?

This will produce an empty string whenever the first character is a space character.

2012-12-20 Reply Admin

Scott:
* Recursive-Reclusive: Huh? I tested for strings without spaces, and it worked fine (even my broken j=0 version worked for strings without spaces).

If there are no spaces in the string, j remains zero. So

str[j + 1] = '\0'

truncates the string to lenght one. Easily to fix, but it's a big bug if you don't.

2012-12-20 Reply Admin

Who would want to eliminate tail recursion? That sounds as dumb as removing white space, which is more pleasant to keep occupied again and again. Recursion, iteration, any traversal of tail is fine with me.

2012-12-20 Reply Admin

Recursive Reclusive:
Scott:
* Recursive-Reclusive: Huh? I tested for strings without spaces, and it worked fine (even my broken j=0 version worked for strings without spaces).
If there are no spaces in the string, j remains zero. So
str[j + 1] = '\0'
truncates the string to lenght one. Easily to fix, but it's a big bug if you don't.

No, it works. Here's the code again (only diff: I fixed j=0 to be j=-1. That fixes the string consisting of only spaces, but makes no difference in the string that contains no spaces).

char* trim_right( char* str )
{
  int i = 0, j = -1;
  while ( str[i] != '\0' )
  {
    if ( str[i] != ' ' )
    {
      j = i;
    }
    ++i;
  }
  str[j + 1] = '\0';
  return str;
}

Feed it "hello" or something. On i == 0, str[i]!=' ' is true and sets j. Compile it and try it. Try it with j=0 instead of -1; it fails the only-spaces case, but not the no-spaces case.

Here's my test stub:

#include <stdio.h>
#include <string.h>

int main()
{
	int i;
	char tests[9][20];
	strcpy( tests[0], "hello there  " );
	strcpy( tests[1], "hello there " );
	strcpy( tests[2], "hello there" );
	strcpy( tests[3], "hello " );
	strcpy( tests[4], "hello" );
	strcpy( tests[5], "h " );
	strcpy( tests[5], "h" );
	strcpy( tests[6], "   " );
	strcpy( tests[7], " " );
	strcpy( tests[8], "" );

	for ( i = 0; i < 9; ++i )
	{
		printf( "|%s| --> ", tests[i] );
		printf( "|%s|\n", trim_right( tests[i] ) );
	}
}

2012-12-20 Reply Admin

Follow up, Recursive Reclusive: I hope you get back to me on this because if there is a case I'm missing (or am just misunderstanding something), I do want to learn. Thanks!

2012-12-21 Reply Admin

Scott:
Incidentally, I ran my version and gallier2's on my 8-bit target and for some reason mine beat the socks off gallier2's in every case.

I didn't put much reflexion when writing my submission on page 1. But it was written in the style that, from my experience, suits best big servers. I work on Oracle M5000 machines (Solaris SPARC 64 bit). Had the routine been written with microcontrollers (those without cache and that are not super-scalar) in mind, I would certainly have written it differently. Probably much more like your version. But as already said, I didn't put much optimisation effort when I wrote that snippet, my biggest emphasizes was to write a correct version.

2012-12-21 Reply Admin

Well you certainly beat me on correctness! (At least the first time--I got it fixed. It's the one test case I forgot to test for).

I was kidding about the 8-bit thing; it was more a suggestion that strlen() probably isn't much more efficient then the super-dumbed down char-by-char for an 8-bit architecture. But for the nominal case (32/64-bit), a good optimized strlen() ended up making your version faster overall.

Anyway, good show!

2012-12-21 Reply Admin

Recursive Reclusive:
Scott:
Not recursive, doesn't call any other functions, O(n), simple, works (as far as I can tell in a hot hurry). Apologies if somebody posted a similar solution already.
char* trim_right( char* str )
{
	int i = 0, j = 0;
	while ( str[i] != '\0' )
	{
		if ( str[i] != ' ' )
		{
			j = i;
		}
		++i;
	}
	str[j + 1] = '\0';
	return str;
}
This is similar to I DUNNO LOL's solution and like it, it does bad things to string without spaces.

Ok, three days of these solutions, and I promise I will get off my hobby horse and leave it be after this.

None of these "fixes" to the original code still do not handle null strings passed into the code. If str is NULL and you are lucky, this line:

while ( str[i] != '\0' )

causes a segfault which will terminate the process. If you're not lucky enough to be on a platform that traps such things, you've just stomped on whatever data lives at memory offset 0, and worse, returned that data to the caller to be mangled further.

It doesn't matter if the algorithm is recursive, iterative, unrolled looping, optimized for word size, etc. This is Defensive Coding 101.

Kuba · 2012-12-21 Reply Admin

gallier2:
one should not rely on compiler optimizations to excuse poor algorithmic skills.

Tail recursion is a valid programming technique, and IMHO it's valid to depend on the compiler doing the right thing.

Kuba · 2012-12-21 Reply Admin

Steve The Cynic:
And C does not have to have a contiguous stack for parameter passing and local variables. Mandating such a thing would make writing C compilers for System/370 difficult, since these machines don't even have a direct concept of a stack. And even on an architecture with a machine stack, it isn't mandatory to put the activation records on the machine stack. Most systems do it that way because it's faster than allocating from the free store / heap for it, but it isn't required behaviour.

Not only the compiler for System/370 would be hard. A lot of code for small microcontrollers is compiled without use of stack for anything but storing return addresses. I have plenty of embedded code with no reentrant functions at all -- in such a case, all parameter passing and local data storage is done via fixed RAM locations. Those locations are overlaid by the linker based on the call tree. On some architectures this is faster and easier than indexing relative to a stack frame.

Never mind that you free the register that would be otherwise used to store the stack frame, and you don't have to keep that register updated on each function entry and exit. Heck, for functions called from one call site only, the compiler automatically inlines as that saves a couple cycles and a couple code space bytes. It does matter when you have 512 bytes for your code, and sometimes it does matter to have only that many bytes when you can put you application into a 6 pin part that costs less than $1 even in small quantities.

2012-12-21 Reply Admin

Mr.Bob:
None of these "fixes" to the original code still do not handle null strings passed into the code.

(Excerpted from above...)

It's better to NOT check for NULL input on functions like this, in keeping with the speed of ANSI-C standard string functions. If somebody calls trim_right() a zillion times with strings that the caller knows could not possibly be NULL, it's just a waste of cycles. If the caller suspects the input strings might be NULL, it's the caller's responsibility to check for NULL before calling the function. (Other standard functions like free() are called so frequently with NULL inputs that it makes sense to do the NULL check inside free() to avoid polluting the calling code).

This is Defensive Coding 101.

And when we graduate to Intelligent Coding 201... :)

Seriously, I know we all have different opinions on the matter of checking inputs, and that's fine, I guess. My opinion is that there are basically three levels of understanding NULL. 1. You don't know about NULL inputs; your code reflects this ignorance because it's completely ignored. 2. You know about NULL and do everything you can to do cover it up* when you get one. 3. You know about all that but choose to ignore the NULL case for speed, code clarity, debugging*, etc. Unfortunately your code looks like #1 so the #2 people just think you're an amateur.

*What does one do when one encounters a NULL? Silently ignore it? printf()? How about this: if I run a program and it segfaults, I fire it up in gdb, and guess what? The real problem is usually two functions back in the backtrace. I say if the program is wrong, it should fail big, fast, and spectacularly--it makes it easier to find the problem. If you keep intercepting bad inputs at the low level, you cover problems and make them hard to find.

Caveat: This doesn't apply to all cases, and if I were writing for a pacemaker or a space shuttle, I'd approach it differently.

2012-12-21 Reply Admin

And please stop quoting the Recursive Reclusive post involving my code that says "it does bad things to string without spaces". He's wrong.

I corrected the j=0 problem in a later post, which fixes the "strings with only spaces" problem (j=-1 works better), but the "strings without spaces" has always worked.

2012-12-21 Reply Admin

Mr.Bob:
causes a segfault which will terminate the process. If you're not lucky enough to be on a platform that traps such things, you've just stomped on whatever data lives at memory offset 0, and worse, returned that data to the caller to be mangled further.
It doesn't matter if the algorithm is recursive, iterative, unrolled looping, optimized for word size, etc. This is Defensive Coding 101.

As someone already pointed out. The design of the function demands not to check for NULL pointer. All string and mem functions (except free) in the standard library do not check for NULL pointers. It is a design choice in line with the whole C idea. If the caller wants to avoid a segfault (or worse) it can check if it wants without burdening the 99.9% of the use cases with an unnecessary check. Segfaulting is often a better choice than propagating an error condition, that no function above, in the call stack, knows how to deal with. If one wants an always checking everything language then they should do java, C# or whatever fancy interpreted language.

2012-12-21 Reply Admin

Kuba:
gallier2:
one should not rely on compiler optimizations to excuse poor algorithmic skills.
Tail recursion is a valid programming technique, and IMHO it's valid to depend on the compiler doing the right thing.

Is this a mayan calendar thing? After 2 pages of comments the same questions and comments come back and back again and again. That point was already answered more than once before, by me and by others.

2012-12-21 Reply Admin

Mr.Bob:
Jeez, it's like griping about how they used Comic Sans to spell "Titanic" on the side of your ship...

That is a perfectly valid gripe.

2012-12-23 Reply Admin

Sheesh! The old cast and too many parens mindset. So juevnile. This is right:

char* trim_right(char* str) { for(size_t i = strlen(str) - 1; i >= 0; i--) if(str[i] == ' ') str[i] = '\0';

return str; }

2012-12-23 Reply Admin

It still has the bug of truncating the string at the first blank from the string. So functionnally completely wrong.

2012-12-27 Reply Admin

Sean:
This is C; it's going to run pretty darn fast. And no matter how you implement it

Yeah, no.

Years ago someone wrote and article about how a well-written program running on a TRS-80 would run faster than a crap program on a Cray.

Better hardware (not C) means you can now write crap that runs fast, but it wuld be nice if you didn't.

2012-12-27 Reply Admin

What's really fun is when you give it a string that is not null terminated.

2013-01-02 Reply Admin

It works fine, as long as you allocate all strings with the proper function:

char *alloc_string(int len) {
  char *ret = calloc(len+2, 1);
  if (ret) *ret++ = ' ';
  return ret;
}

2013-01-03 Reply Admin

gallier2:
The compiler also knows that the function strlen is pure. That means that 2 consecutive calls to the function will return the same value if fed the same parameters.
 l = strlen(x);
 l2 = strlen(x);
will call strlen only once, an optimisation it cannot perform in the general case (it would be fatal for rand() for example).

Now, if it could only optimise the case where you write

 l = strlen(x);
 x[l - 1] = '\0';
 l2 = strlen(x);

...

2013-01-03 Reply Admin

This sort of code hurts my soul:

str[strlen(str)-1] = '\0';

Apart from the under run problem in the above comment, this is vulnerable to buffer overflows.

2013-01-03 Reply Admin

Well, except for the atrocious number of calls to strlen, I don’t see the problem.

2013-01-09 Reply Admin

TGV:
Evan:
Lex Luthor:
Agreed. This is not a WTF. Nothing wrong with recursion in C, but possibly person who reported it.
Why does everyone seem to think that "the author used recursion" is the wtf instead of "it's buggy and horribly inefficient [not due directly to the recursion]"?
Probably because (nearly) everyone is a total moron? Since (nearly) everyone only spots problems by looking for unusual constructions?

How about the fact that recursion is mentioned in the article title?

2013-02-28 Reply Admin

Black Bart:
Someone just learned about recursion, and it became the tool of choice for the next several weeks.
When you have a hammer in your hand, everything becomes a nail.

This is tail recursion. In certain class of languages (and their compilers), this executes in constant space (that is, the recursive call is implemented as a jump back to the start) and is, sometimes, in fact the preferred or only way to express loops.

Actually, modern gcc has learned to detect this condition even in C, or so I've heard.

It would be different if you needed to compute something with the result of the recursive call.

2013-02-28 Reply Admin

Black Bart:
Someone just learned about recursion, and it became the tool of choice for the next several weeks.
When you have a hammer in your hand, everything becomes a nail.

This is tail recursion. In certain class of languages (and their compilers), this executes in constant space (that is, the recursive call is implemented as a jump back to the start) and is, sometimes, in fact the preferred or only way to express loops.

Actually, modern gcc has learned to detect this condition even in C, or so I've heard.

It would be different if you needed to compute something with the result of the recursive call.

Of course, you'd compute the strlen once or not at all for this to also compute in linear time.

2024-06-13 Reply Admin

Recursive Whitespace Removal

Leave a comment on “Recursive Whitespace Removal”