The Daily WTF: Curious Perversions in Information Technology

2012-12-19 Reply Admin

Someone just learned about recursion, and it became the tool of choice for the next several weeks.

When you have a hammer in your hand, everything becomes a nail.

2012-12-19 Reply Admin

Not cool to write a 0 byte before the buffer if the string has length 0 and there is a ' ' before it. This seems quite low odd case but it can happen if you use a pointer to scan a longer string with trailing blanks, and passing that pointer to that function.

The recursion is of course another wtf.

2012-12-19 Reply Admin

Presumably...

unsigned int strlen(char* str) { if (str[0] == '\0') { return 0; } return 1 + strlen(str + 1); }

2012-12-19 Reply Admin

So what?

Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

IMHO this is far from TDWTF material.

2012-12-19 Reply Admin

C: All the power of Assembly with most of the flexibility.

2012-12-19 Reply Admin

It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills. C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).

2012-12-19 Reply Admin

Maybe the coder had some background in Prolog programming.

2012-12-19 Reply Admin

This is C; it's going to run pretty darn fast. And no matter how you implement it (or let someone else implement it) string processing by its very nature involves looping over characters until you do or don't find something. So those who say "OMFG what if there's a lot of spaces" are in the same category as those who say "moar XML!" -- they just don't understand what's actually happening inside the machine.

So this boils down to which costs more: the CPU time to execute an allegedly unnecessary stack push and pop, or the developer time to take a solved problem and write a "more elegant" implementation?

2012-12-19 Reply Admin

Doesn't pretty much all of lisp work this way?

2012-12-19 Reply Admin


[tkyte@localhost ~]$ !cat
cat test.c
#include "string.h"
#include "stdio.h"

char *trim_right(char *str)
{
   int len = strlen(str) - 1;

   printf( "len = %d\n", len );
   if(len == 0 || str[len] != ' ') return str;
   printf( "setting str[%d] to zero\n", strlen(str)-1 );
   str[strlen(str)-1] = '\0';
   return trim_right(str);
}

int main()
{
char str[] = "";
char str1[1];

    str1[0] = ' ';
    trim_right(str);
    printf( "%d\n", strlen(str) );

    return 0;
}
[tkyte@localhost ~]$ ./test
len = -1
setting str[-1] to zero
len = -1
0
[tkyte@localhost ~]$

buffer underwrite.... it doesn't work as intended.

2012-12-19 Reply Admin

Looks like it was coded by someone that had a lot of experience with Prolog or Lisp

2012-12-19 Reply Admin

The real WTF is not searching for a library to handle this for you. Look at what other people have written and do a nice code review and drop in your borrowed solution.

2012-12-19 Reply Admin

Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.

2012-12-19 Reply Admin

the tail recursion is arguably correct. It's the double strlen call and the failure on empty string that make this code bad.

2012-12-19 Reply Admin

gallier2:
[…] one should not rely on compiler optimizations to excuse poor algorithmic skills.

Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.

2012-12-19 Reply Admin

Your Name *:
So what?
Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

IMHO this is far from TDWTF material.

But it isn't working. Did you not read the earlier comment by gallier2?

If passed a zero-length string that happens to be preceded in memory by a byte equal to blank, it will overwrite that byte with null. This is a potentially serious and hard-to-trace bug.

2012-12-19 Reply Admin

lunaryorn:
gallier2:
[…] one should not rely on compiler optimizations to excuse poor algorithmic skills.
Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.

Try checking the algorithm for bugs before claiming that "poor algorithmic skill" implies a criticism of tail recursion. Be prepared to be laughed at.

2012-12-19 Reply Admin

char *trim_right(char *str)
{
  size_t l = strlen(str);
  while(l && str[--l] == ' ')
      str[l] = 0;
  return str;
}

Wasn't that difficult, was it?

2012-12-19 Reply Admin

Well, how much whitespace characters do you expect? One? Two? A billion? The programmer has probably taken into account the kind of data fed into the function and has decided that this approach would work. The function is a bit unusual and sloppy but I would not call it a WTF.

2012-12-19 Reply Admin

The poor skill refers to even using recursion for a simple string manipulation. Recursion is an excellent tool for some problems, strings is normally not one of them.

2012-12-19 Reply Admin

Your Name *:
So what?
Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

IMHO this is far from TDWTF material.

Are you the same guy who thinks that it is pointless to spend time on an algorithm when just can buy a faster CPU?

2012-12-19 Reply Admin

Sean:
the CPU time to execute an allegedly unnecessary stack push and pop

You forgot the CPU time to iterate through the string on every recursion to count the characters in strlen(), while the calling stack instance of the function knows exactly, that it is one less than what strlen() hes returned previously.

Not to mention a potential stack overflow. And wthat tohers have said...

2012-12-19 Reply Admin

"However, this power is limited insomuch that you don't have many of the friendly helper-functions that exist in higher level languages, such as string manipulation for example, unless you go ahead and make them yourself."

Or use one of the gigabytes of libraries that have been written for C, the single most successful programming language of all time.

2012-12-19 Reply Admin

How come no one ever so far was horrified by the quadratic time complexity? Sure, there's a memory cache. Still, it's not easy to get 1G instructions under 1G clock cycles.

captcha - El grasso es verto. El noche es negro.

2012-12-19 Reply Admin

It reminds me of the old contest to implement the slowest possible sort algorithm. That produced some rather mind-bogglingly slow sort routines. Perhaps the author(s) of this knew about that contest, and decided to try the same idea for string trimming. Calling strlen(), which does a scan of the string, twice during each recursive iteration is a clever idea for such a contest. This routine could be made even slower, however, by not saving the length, but calling strlen(str) for each use of len. That would add a third call, and adding a test for strlen(str) != -1 (rather than changing the ==0 test to <=0) would add another loop.

biziclop · 2012-12-19 Reply Admin

Apart from the unnecessary recursion and the serious buffer underrun issue, it also fails completely if whitespace takes any other form than 0x20 (e.g. \t, \n, \r).

2012-12-19 Reply Admin

Could've been written a bit better in APL, like trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵} or trim_right←{⌽{(∨' '≠⍵)/⍵}⌽⍵}

2012-12-19 Reply Admin

Lisa:
Doesn't pretty much all of lisp work this way?

LISP let's you do it however you like, but ML derived languages do work this way. They usually have a dynamic stack precisely to handle functions that do a lot of recursion.

C, OTOH, often has a fixed stack, and function calls are comparably more expensive as it has to save the entire state of the CPU.

Steve The Cynic · 2012-12-19 Reply Admin

F:
Your Name *:
So what?
Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

IMHO this is far from TDWTF material.

But it isn't working. Did you not read the earlier comment by gallier2?

If passed a zero-length string that happens to be preceded in memory by a byte equal to blank, it will overwrite that byte with null. This is a potentially serious and hard-to-trace bug.

The bug is actually worse than that. Any call where the string does not contain non-blanks (e.g. the empty string, or any number of blanks), and the byte immediately before the string is a blank, will destroy that blank. If the byte immediately before the string does not exist (not mapped, etc.) it will crash when checking to see if the character is blank. If that byte is marked read-only, the write will crash. If the system is a DeathStation 9000, several million people will die in a nuclear fireball.

And, of course, it is entirely feasible (as suggested by another poster above) that int is not large enough to hold all possible values of the return value of strlen. The other case is a 16-bit real-mode x86 system using the commonly available "huge" memory model, where data pointers were normalised to always have an offset in the 0-15 range. size_t wasn't normally (at the time) bigger than 16 bits, but you could imagine some nutjob at Borland or Microsoft deciding it should be unsigned long, which causes problems here if you pass exactly 64K of string plus one '\000'.

Addendum (2012-12-19 09:15): Actually, I suck at reading code. If the string contains exactly one character (plus the NUL), that character will be left alone and the function will return more or less harmlessly. So a single blank will not be trimmed, and all strings containing only two or more blanks will be trimmed to one blank.

The nuclear fireball remains the probable response when the string is empty.

RobFreundlich · 2012-12-19 Reply Admin

Snowrat:
Could've been written a bit better in APL, like trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵} or trim_right←{⌽{(∨\' '≠⍵)/⍵}⌽⍵}

Gesundheit

2012-12-19 Reply Admin

gallier2:

char *trim_right(char *str)
{
  size_t l = strlen(str);
  while(l && str[--l] == ' ')
      str[l] = 0;
  return str;
}

Wasn't that difficult, was it?

Well I would love to be able to guarantee that

while(l && str[--l] == ' ')

Would always work as order of evaluation is not guaranteed, do decrement inside loop.

2012-12-19 Reply Admin

Very simple and elegant. I like it.

CAPTCHA: jumentum - don't jumentum it.

2012-12-19 Reply Admin

Does it make me bad coder if I prefer iterative loops or just loops for this sort of stuff?

Also, if there is dynamic memory allocation I surely hope the lengths are fixed or originals are stored somewhere...

Mcoder · 2012-12-19 Reply Admin

Tail optimization won't remove the strlen at the first line that runs through all the (remaining) string for each character on it.

Anyway, one normaly doesn't trim large strings. It is quit possible that his code makes no difference at all.

2012-12-19 Reply Admin

This is the sort of academic BS I had to do in university programming courses. For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.

2012-12-19 Reply Admin

TopTension:

Sean:
the CPU time to execute an allegedly unnecessary stack push and pop

You forgot the CPU time to iterate through the string on every recursion to count the characters in strlen(), while the calling stack instance of the function knows exactly, that it is one less than what strlen() hes returned previously.

Not to mention a potential stack overflow. And wthat tohers have said...

char *trim_right_r(char *str,int len)
{

   if(len == 0 || str[len] != ' ') return str;
   str[len] = '\0';
   return trim_right_r(str,len-1);
}

char *trim_right(char *str)
{
   int len = strlen(str);
   return trim_right_r(str,len-1);
}

FTFY

2012-12-19 Reply Admin

C does guarantee order of evaluation.

Steve The Cynic · 2012-12-19 Reply Admin

Meep:
Lisa:
Doesn't pretty much all of lisp work this way?

LISP let's you do it however you like, but ML derived languages do work this way. They usually have a dynamic stack precisely to handle functions that do a lot of recursion.

C, OTOH, often has a fixed stack, and function calls are comparably more expensive as it has to save the entire state of the CPU.

Only the relevant state, which might not be much at all. An x86 system running this code might have to save ESI, EDI, and EBP, but can freely discard or corrupt EBX, ECX, and EDX, while EAX would be the return value, so not saved at all. Since there is no use of the FPU, the FPU state can easily be ignored.

And C does not have to have a contiguous stack for parameter passing and local variables. Mandating such a thing would make writing C compilers for System/370 difficult, since these machines don't even have a direct concept of a stack. And even on an architecture with a machine stack, it isn't mandatory to put the activation records on the machine stack. Most systems do it that way because it's faster than allocating from the free store / heap for it, but it isn't required behaviour. (However, you can make a case that we shouldn't store activation records on the machine stack, in order to prevent buffer overflows from corrupting return addresses and allowing malware delivery.)

I talk too much sometimes.

2012-12-19 Reply Admin

Garrison Fiord:
Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.

I've used it once or twice when doing graph-theoretical work (network analysis, etc.) It can be a convenient technique for finding a top-level ancestor not, IIRC ... writing it in the form of a loop makes for an unwieldy codebase, whereas recursion makes for simple and easily-comprehended algorithm. The caveat is, of course, that you ought only to implement it on structures which are not more than four or five levels deep - and in the problem domain for which this solution was designed, that condition held.

Steve The Cynic · 2012-12-19 Reply Admin

Techpaul:
Well I would love to be able to guarantee that
while(l && str[--l] == ' ')

Would always work as order of evaluation is not guaranteed, do decrement inside loop.

There's a sequence point at the &&, so the left-hand side will be completely evaluated before anything happens on the right, giving this expression defined behaviour.

2012-12-19 Reply Admin

[quote]

char *trim_right(char *str)
{
int len = strlen(str) - 1;
if(len == 0 || str[len] != ' ') return str;
str[strlen(str)-1] = '\0';
return trim_right(str);
}

Ouch. Evil function. I'm not sure what part is more evil - the part where it's actually recursive (seems a bit unnecessary to me), the bit where it's intended to be replacing ' ' with an explicit NUL '\0'...or the part where it's actually just making the final character of the given string a NUL '\0'. Hope that's not called too often...

Personally, I'd just do something like this:

char* trim_right(char* str)
{
  for(int i = strlen(str) - 1; i >= 0; i--)
    if(str[i] == ' ')
        str[i] = '\0';

  return str;
}

I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

CAPTCHA: causa - the original might causa problem!

sibtrag · 2012-12-19 Reply Admin

If we're micro-optimizing, shouldn't we refrain from writing an end-of-string null character until we've found the first non-blank? The CPU could give you lots of flushes if it lets the loads (to check for blanks) run ahead of the stores (which depend on the values of the loaded bytes) and it doesn't track load/store collisions on a byte basis (perhaps word or cache line). Not to mention the MP effects of all that store traffic hitting the write gathering.

Of course, if we were expecing significant white space, we'd look at the first few bytes to get to word-alignment and then start looking for non-zero words. For even more speed, the SIMD engines on modern processors can even tell us which byte was non-zero while looking at 16 bytes at a time.

2012-12-19 Reply Admin

struct { int important; char stuff[10]; } foo; foo.important = 0x12345678; ... trim_right(foo.stuff);

Steve The Cynic · 2012-12-19 Reply Admin

Xarthaneon the Unclear:
Personally, I'd just do something like this:
char* trim_right(char* str)
{
  for(int i = strlen(str) - 1; i >= 0; i--)
    if(str[i] == ' ')
        str[i] = '\0';

  return str;
}
I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

CAPTCHA: causa - the original might causa problem!

It's not as graceful, and it's also wrong. strlen() does not return int, and is not obliged to return something the same size as int (someone cited 64-bit systems with 32-bit ints as an example). And even if strlen returns an int-sized unsigned value, you have a problem if that value is > INT_MAX+1. This happens at just 32769 characters on a 16-bit system...

C++ is often cited as taking your whole leg when it does succeed in shooting you in the foot. C is quite capable of that, too, and lacks many of the elaborate safeties in C++.

2012-12-19 Reply Admin

¯\(°_o)/¯ I DUNNO LOL:
struct { int important; char stuff[10]; } foo; foo.important = 0x20202020; ... trim_right(foo.stuff);

FTFM

2012-12-19 Reply Admin

while(l && str[--l] == ' ')
Would always work as order of evaluation is not guaranteed, do decrement inside loop.

Order of evaluation is guaranteed, && || are so called sequence points, they force the order of evaluation, or else the short-circuit evaluation wouldn't work.

If I had written

while((l != 0) & (str[--l] == ' '))

Then you would be right? There would be 'undefined behaviour' because of the side-effect and there would be still be the buffer underflow.

2012-12-19 Reply Admin

Black Bart:
Someone just learned about recursion, and it became the tool of choice for the next several weeks.
When you have a hammer in your hand, everything becomes a nail.

Once there was a programmer who had a problem. He decided to use recursion. Then he had a programmer who had a problem. He decided to use recursion...

2012-12-19 Reply Admin

Steve The Cynic:
Xarthaneon the Unclear:
Personally, I'd just do something like this:
char* trim_right(char* str)
{
  for(int i = strlen(str) - 1; i >= 0; i--)
    if(str == ' ')
        str[i] = '\0';

  return str;
}
I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

CAPTCHA: causa - the original might causa problem!
It's not as graceful, and it's also wrong. strlen() does not return int, and is not obliged to return something the same size as int (someone cited 64-bit systems with 32-bit ints as an example). And even if strlen returns an int-sized unsigned value, you have a problem if that value is > INT_MAX+1. This happens at just 32769 characters on a 16-bit system...

C++ is often cited as taking your whole leg when it does succeed in shooting you in the foot. C is quite capable of that, too, and lacks many of the elaborate safeties in C++.

I went and looked up strlen() after reading that. [i]Darn it. I noticed in their example that they perform a cast on the returned size_t to get around the issue of it not necessarily returning an int/uint32/uint64/whatever.

But, while I agree my rewrite is not as graceful, other than the strlen issue, fixed as such:

char* trim_right(char* str)
{
  for(int i = ((int)strlen(str)) - 1; i >= 0; i--)
    if(str[i] == ' ')
        str[i] = '\0';

  return str;
}

how is it wrong?

TGV · 2012-12-19 Reply Admin

For the defenders and nay sayers, the following is wrong in this piece of code:

the repeated call to strlen makes it O(n^2)
it can write to the byte before the start of the string, which might happen e.g. if that's where malloc keeps its block length and the block happens to be 32 bytes The recursion is a very mild WTF in comparison to the above issues, and one that any decent compiler can optimize away.

This is not "the programmer took the requirements bladibla" or "elegant". This. is. crap.

Delve · 2012-12-19 Reply Admin

A developer:
This is the sort of academic BS I had to do in university programming courses. For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.

Recursion can do many things better than other methods. This is not one of them. There are a number of possible reasons this code exists, but none of them excuse it.

Recursion is great. Never use it. Recursion is the Cthulhu of code. Do not wake Cthulhu lest you go mad.

Recursive Whitespace Removal

Leave a comment on “Recursive Whitespace Removal”