• Black Bart (unregistered)

    Someone just learned about recursion, and it became the tool of choice for the next several weeks.

    When you have a hammer in your hand, everything becomes a nail.

  • gallier2 (unregistered)

    Not cool to write a 0 byte before the buffer if the string has length 0 and there is a ' ' before it. This seems quite low odd case but it can happen if you use a pointer to scan a longer string with trailing blanks, and passing that pointer to that function.

    The recursion is of course another wtf.

  • eVil (unregistered)

    Presumably...

    unsigned int strlen(char* str) { if (str[0] == '\0') { return 0; } return 1 + strlen(str + 1); }

  • Your Name * (unregistered)

    So what?

    Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

    IMHO this is far from TDWTF material.

  • Not That Paul (unregistered)

    C: All the power of Assembly with most of the flexibility.

  • gallier2 (unregistered) in reply to Your Name *

    It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills. C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).

  • Tero T. (unregistered)

    Maybe the coder had some background in Prolog programming.

  • Sean (unregistered)

    This is C; it's going to run pretty darn fast. And no matter how you implement it (or let someone else implement it) string processing by its very nature involves looping over characters until you do or don't find something. So those who say "OMFG what if there's a lot of spaces" are in the same category as those who say "moar XML!" -- they just don't understand what's actually happening inside the machine.

    So this boils down to which costs more: the CPU time to execute an allegedly unnecessary stack push and pop, or the developer time to take a solved problem and write a "more elegant" implementation?

  • Lisa (unregistered)

    Doesn't pretty much all of lisp work this way?

  • Thomas Kyte (unregistered) in reply to Your Name *
    
    [tkyte@localhost ~]$ !cat
    cat test.c
    #include "string.h"
    #include "stdio.h"
    
    char *trim_right(char *str)
    {
       int len = strlen(str) - 1;
    
       printf( "len = %d\n", len );
       if(len == 0 || str[len] != ' ') return str;
       printf( "setting str[%d] to zero\n", strlen(str)-1 );
       str[strlen(str)-1] = '\0';
       return trim_right(str);
    }
    
    int main()
    {
    char str[] = "";
    char str1[1];
    
        str1[0] = ' ';
        trim_right(str);
        printf( "%d\n", strlen(str) );
    
        return 0;
    }
    [tkyte@localhost ~]$ ./test
    len = -1
    setting str[-1] to zero
    len = -1
    0
    [tkyte@localhost ~]$ 
    

    buffer underwrite.... it doesn't work as intended.

  • AdamK (unregistered)

    Looks like it was coded by someone that had a lot of experience with Prolog or Lisp

  • Smug Unix User (unregistered)

    The real WTF is not searching for a library to handle this for you. Look at what other people have written and do a nice code review and drop in your borrowed solution.

  • Garrison Fiord (unregistered)

    Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.

  • eak (unregistered)

    the tail recursion is arguably correct. It's the double strlen call and the failure on empty string that make this code bad.

  • lunaryorn (unregistered) in reply to gallier2
    gallier2:
    […] one should not rely on compiler optimizations to excuse poor algorithmic skills.
    Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.
  • F (unregistered) in reply to Your Name *
    Your Name *:
    So what?

    Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

    IMHO this is far from TDWTF material.

    But it isn't working. Did you not read the earlier comment by gallier2?

    If passed a zero-length string that happens to be preceded in memory by a byte equal to blank, it will overwrite that byte with null. This is a potentially serious and hard-to-trace bug.

  • F (unregistered) in reply to lunaryorn
    lunaryorn:
    gallier2:
    […] one should not rely on compiler optimizations to excuse poor algorithmic skills.
    Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.

    Try checking the algorithm for bugs before claiming that "poor algorithmic skill" implies a criticism of tail recursion. Be prepared to be laughed at.

  • gallier2 (unregistered)
    char *trim_right(char *str)
    {
      size_t l = strlen(str);
      while(l && str[--l] == ' ')
          str[l] = 0;
      return str;
    }
    

    Wasn't that difficult, was it?

  • chernobyl (unregistered)

    Well, how much whitespace characters do you expect? One? Two? A billion? The programmer has probably taken into account the kind of data fed into the function and has decided that this approach would work. The function is a bit unusual and sloppy but I would not call it a WTF.

  • gallier2 (unregistered) in reply to lunaryorn

    The poor skill refers to even using recursion for a simple string manipulation. Recursion is an excellent tool for some problems, strings is normally not one of them.

  • DrStat (unregistered) in reply to Your Name *
    Your Name *:
    So what?

    Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

    IMHO this is far from TDWTF material.

    Are you the same guy who thinks that it is pointless to spend time on an algorithm when just can buy a faster CPU?

  • TopTension (unregistered) in reply to Sean
    Sean:
    the CPU time to execute an allegedly unnecessary stack push and pop

    You forgot the CPU time to iterate through the string on every recursion to count the characters in strlen(), while the calling stack instance of the function knows exactly, that it is one less than what strlen() hes returned previously.

    Not to mention a potential stack overflow. And wthat tohers have said...

  • Meep (unregistered)

    "However, this power is limited insomuch that you don't have many of the friendly helper-functions that exist in higher level languages, such as string manipulation for example, unless you go ahead and make them yourself."

    Or use one of the gigabytes of libraries that have been written for C, the single most successful programming language of all time.

  • Honnza (unregistered)

    How come no one ever so far was horrified by the quadratic time complexity? Sure, there's a memory cache. Still, it's not easy to get 1G instructions under 1G clock cycles.

    captcha - El grasso es verto. El noche es negro.

  • John (unregistered)

    It reminds me of the old contest to implement the slowest possible sort algorithm. That produced some rather mind-bogglingly slow sort routines. Perhaps the author(s) of this knew about that contest, and decided to try the same idea for string trimming. Calling strlen(), which does a scan of the string, twice during each recursive iteration is a clever idea for such a contest. This routine could be made even slower, however, by not saving the length, but calling strlen(str) for each use of len. That would add a third call, and adding a test for strlen(str) != -1 (rather than changing the ==0 test to <=0) would add another loop.

  • (cs)

    Apart from the unnecessary recursion and the serious buffer underrun issue, it also fails completely if whitespace takes any other form than 0x20 (e.g. \t, \n, \r).

  • Snowrat (unregistered)

    Could've been written a bit better in APL, like trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵} or trim_right←{⌽{(∨' '≠⍵)/⍵}⌽⍵}

  • Meep (unregistered) in reply to Lisa
    Lisa:
    Doesn't pretty much all of lisp work this way?

    LISP let's you do it however you like, but ML derived languages do work this way. They usually have a dynamic stack precisely to handle functions that do a lot of recursion.

    C, OTOH, often has a fixed stack, and function calls are comparably more expensive as it has to save the entire state of the CPU.

  • (cs) in reply to F
    F:
    Your Name *:
    So what?

    Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

    IMHO this is far from TDWTF material.

    But it isn't working. Did you not read the earlier comment by gallier2?

    If passed a zero-length string that happens to be preceded in memory by a byte equal to blank, it will overwrite that byte with null. This is a potentially serious and hard-to-trace bug.

    The bug is actually worse than that. Any call where the string does not contain non-blanks (e.g. the empty string, or any number of blanks), and the byte immediately before the string is a blank, will destroy that blank. If the byte immediately before the string does not exist (not mapped, etc.) it will crash when checking to see if the character is blank. If that byte is marked read-only, the write will crash. If the system is a DeathStation 9000, several million people will die in a nuclear fireball.

    And, of course, it is entirely feasible (as suggested by another poster above) that int is not large enough to hold all possible values of the return value of strlen. The other case is a 16-bit real-mode x86 system using the commonly available "huge" memory model, where data pointers were normalised to always have an offset in the 0-15 range. size_t wasn't normally (at the time) bigger than 16 bits, but you could imagine some nutjob at Borland or Microsoft deciding it should be unsigned long, which causes problems here if you pass exactly 64K of string plus one '\000'.

    Addendum (2012-12-19 09:15): Actually, I suck at reading code. If the string contains exactly one character (plus the NUL), that character will be left alone and the function will return more or less harmlessly. So a single blank will not be trimmed, and all strings containing only two or more blanks will be trimmed to one blank.

    The nuclear fireball remains the probable response when the string is empty.

  • (cs) in reply to Snowrat
    Snowrat:
    Could've been written a bit better in APL, like trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵} or trim_right←{⌽{(∨\' '≠⍵)/⍵}⌽⍵}
    Gesundheit
  • Techpaul (unregistered) in reply to gallier2
    gallier2:
    char *trim_right(char *str)
    {
      size_t l = strlen(str);
      while(l && str[--l] == ' ')
          str[l] = 0;
      return str;
    }
    
    Wasn't that difficult, was it?
    Well I would love to be able to guarantee that

    while(l && str[--l] == ' ')

    Would always work as order of evaluation is not guaranteed, do decrement inside loop.

  • RakerF1 (unregistered) in reply to gallier2

    Very simple and elegant. I like it.

    CAPTCHA: jumentum - don't jumentum it.

  • Rnd( (unregistered)

    Does it make me bad coder if I prefer iterative loops or just loops for this sort of stuff?

    Also, if there is dynamic memory allocation I surely hope the lengths are fixed or originals are stored somewhere...

  • (cs) in reply to Your Name *

    Tail optimization won't remove the strlen at the first line that runs through all the (remaining) string for each character on it.

    Anyway, one normaly doesn't trim large strings. It is quit possible that his code makes no difference at all.

  • A developer (unregistered)

    This is the sort of academic BS I had to do in university programming courses. For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.

  • Aufgehaben (unregistered) in reply to TopTension
    TopTension:
    Sean:
    the CPU time to execute an allegedly unnecessary stack push and pop

    You forgot the CPU time to iterate through the string on every recursion to count the characters in strlen(), while the calling stack instance of the function knows exactly, that it is one less than what strlen() hes returned previously.

    Not to mention a potential stack overflow. And wthat tohers have said...

    char *trim_right_r(char *str,int len)
    {
    
       if(len == 0 || str[len] != ' ') return str;
       str[len] = '\0';
       return trim_right_r(str,len-1);
    }
    
    char *trim_right(char *str)
    {
       int len = strlen(str);
       return trim_right_r(str,len-1);
    }
    

    FTFY

  • Mordred (unregistered) in reply to Techpaul

    C does guarantee order of evaluation.

  • (cs) in reply to Meep
    Meep:
    Lisa:
    Doesn't pretty much all of lisp work this way?

    LISP let's you do it however you like, but ML derived languages do work this way. They usually have a dynamic stack precisely to handle functions that do a lot of recursion.

    C, OTOH, often has a fixed stack, and function calls are comparably more expensive as it has to save the entire state of the CPU.

    Only the relevant state, which might not be much at all. An x86 system running this code might have to save ESI, EDI, and EBP, but can freely discard or corrupt EBX, ECX, and EDX, while EAX would be the return value, so not saved at all. Since there is no use of the FPU, the FPU state can easily be ignored.

    And C does not have to have a contiguous stack for parameter passing and local variables. Mandating such a thing would make writing C compilers for System/370 difficult, since these machines don't even have a direct concept of a stack. And even on an architecture with a machine stack, it isn't mandatory to put the activation records on the machine stack. Most systems do it that way because it's faster than allocating from the free store / heap for it, but it isn't required behaviour. (However, you can make a case that we shouldn't store activation records on the machine stack, in order to prevent buffer overflows from corrupting return addresses and allowing malware delivery.)

    I talk too much sometimes.

  • QJo (unregistered) in reply to Garrison Fiord
    Garrison Fiord:
    Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.

    I've used it once or twice when doing graph-theoretical work (network analysis, etc.) It can be a convenient technique for finding a top-level ancestor not, IIRC ... writing it in the form of a loop makes for an unwieldy codebase, whereas recursion makes for simple and easily-comprehended algorithm. The caveat is, of course, that you ought only to implement it on structures which are not more than four or five levels deep - and in the problem domain for which this solution was designed, that condition held.

  • (cs) in reply to Techpaul
    Techpaul:
    Well I would love to be able to guarantee that

    while(l && str[--l] == ' ')

    Would always work as order of evaluation is not guaranteed, do decrement inside loop.

    There's a sequence point at the &&, so the left-hand side will be completely evaluated before anything happens on the right, giving this expression defined behaviour.
  • Xarthaneon the Unclear (unregistered)

    [quote]

    char *trim_right(char *str)
    {
    int len = strlen(str) - 1;
    if(len == 0 || str[len] != ' ') return str;
    str[strlen(str)-1] = '\0';
    return trim_right(str);
    }

    Ouch. Evil function. I'm not sure what part is more evil - the part where it's actually recursive (seems a bit unnecessary to me), the bit where it's intended to be replacing ' ' with an explicit NUL '\0'...or the part where it's actually just making the final character of the given string a NUL '\0'. Hope that's not called too often...

    Personally, I'd just do something like this:

    char* trim_right(char* str)
    {
      for(int i = strlen(str) - 1; i >= 0; i--)
        if(str[i] == ' ')
            str[i] = '\0';
    
      return str;
    }

    I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

    CAPTCHA: causa - the original might causa problem!

  • (cs) in reply to gallier2

    If we're micro-optimizing, shouldn't we refrain from writing an end-of-string null character until we've found the first non-blank? The CPU could give you lots of flushes if it lets the loads (to check for blanks) run ahead of the stores (which depend on the values of the loaded bytes) and it doesn't track load/store collisions on a byte basis (perhaps word or cache line). Not to mention the MP effects of all that store traffic hitting the write gathering.

    Of course, if we were expecing significant white space, we'd look at the first few bytes to get to word-alignment and then start looking for non-zero words. For even more speed, the SIMD engines on modern processors can even tell us which byte was non-zero while looking at 16 bytes at a time.

  • ¯\(°_o)/¯ I DUNNO LOL (unregistered)

    struct {   int important;   char stuff[10]; } foo; foo.important = 0x12345678; ... trim_right(foo.stuff);

  • (cs) in reply to Xarthaneon the Unclear
    Xarthaneon the Unclear:
    Personally, I'd just do something like this:
    char* trim_right(char* str)
    {
      for(int i = strlen(str) - 1; i >= 0; i--)
        if(str[i] == ' ')
            str[i] = '\0';
    
      return str;
    }

    I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

    CAPTCHA: causa - the original might causa problem!

    It's not as graceful, and it's also wrong. strlen() does not return int, and is not obliged to return something the same size as int (someone cited 64-bit systems with 32-bit ints as an example). And even if strlen returns an int-sized unsigned value, you have a problem if that value is > INT_MAX+1. This happens at just 32769 characters on a 16-bit system...

    C++ is often cited as taking your whole leg when it does succeed in shooting you in the foot. C is quite capable of that, too, and lacks many of the elaborate safeties in C++.

  • ¯\(°_o)/¯ I DUNNO LOL (unregistered) in reply to ¯\(°_o)/¯ I DUNNO LOL
    ¯\(°_o)/¯ I DUNNO LOL:
    struct {   int important;   char stuff[10]; } foo; foo.important = 0x20202020; ... trim_right(foo.stuff);
    FTFM
  • gallier2 (unregistered) in reply to Techpaul
    while(l && str[--l] == ' ')
    

    Would always work as order of evaluation is not guaranteed, do decrement inside loop.

    Order of evaluation is guaranteed, && || are so called sequence points, they force the order of evaluation, or else the short-circuit evaluation wouldn't work.

    If I had written

    while((l != 0) & (str[--l] == ' '))
    

    Then you would be right? There would be 'undefined behaviour' because of the side-effect and there would be still be the buffer underflow.

  • ¯\(°_o)/¯ I DUNNO LOL (unregistered) in reply to Black Bart
    Black Bart:
    Someone just learned about recursion, and it became the tool of choice for the next several weeks.

    When you have a hammer in your hand, everything becomes a nail.

    Once there was a programmer who had a problem. He decided to use recursion. Then he had a programmer who had a problem. He decided to use recursion...

  • Xarthaneon the Unclear (unregistered) in reply to Steve The Cynic
    Steve The Cynic:
    Xarthaneon the Unclear:
    Personally, I'd just do something like this:
    char* trim_right(char* str)
    {
      for(int i = strlen(str) - 1; i >= 0; i--)
        if(str == ' ')
            str[i] = '\0';
    
      return str;
    }

    I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

    CAPTCHA: causa - the original might causa problem!

    It's not as graceful, and it's also wrong. strlen() does not return int, and is not obliged to return something the same size as int (someone cited 64-bit systems with 32-bit ints as an example). And even if strlen returns an int-sized unsigned value, you have a problem if that value is > INT_MAX+1. This happens at just 32769 characters on a 16-bit system...

    C++ is often cited as taking your whole leg when it does succeed in shooting you in the foot. C is quite capable of that, too, and lacks many of the elaborate safeties in C++.

    I went and looked up strlen() after reading that. [i]Darn it. I noticed in their example that they perform a cast on the returned size_t to get around the issue of it not necessarily returning an int/uint32/uint64/whatever.

    But, while I agree my rewrite is not as graceful, other than the strlen issue, fixed as such:

    char* trim_right(char* str)
    {
      for(int i = ((int)strlen(str)) - 1; i >= 0; i--)
        if(str[i] == ' ')
            str[i] = '\0';
    
      return str;
    }

    how is it wrong?

  • (cs)

    For the defenders and nay sayers, the following is wrong in this piece of code:

    • the repeated call to strlen makes it O(n^2)
    • it can write to the byte before the start of the string, which might happen e.g. if that's where malloc keeps its block length and the block happens to be 32 bytes The recursion is a very mild WTF in comparison to the above issues, and one that any decent compiler can optimize away.

    This is not "the programmer took the requirements bladibla" or "elegant". This. is. crap.

  • (cs) in reply to A developer
    A developer:
    This is the sort of academic BS I had to do in university programming courses. For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.
    Recursion can do many things better than other methods. This is not one of them. There are a number of possible reasons this code exists, but none of them excuse it.

    Recursion is great. Never use it. Recursion is the Cthulhu of code. Do not wake Cthulhu lest you go mad.

Leave a comment on “Recursive Whitespace Removal”

Log In or post as a guest

Replying to comment #:

« Return to Article