Recursive Whitespace Removal

« Return to Article
  • Black Bart 2012-12-19 08:02
    Someone just learned about recursion, and it became the tool of choice for the next several weeks.

    When you have a hammer in your hand, everything becomes a nail.
  • gallier2 2012-12-19 08:08
    Not cool to write a 0 byte before the buffer if the string has length 0 and there is a ' ' before it. This seems quite low odd case but it can happen if you use a pointer to scan a longer string with trailing blanks, and passing that pointer to that function.

    The recursion is of course another wtf.
  • eVil 2012-12-19 08:09
    Presumably...

    unsigned int strlen(char* str)
    {
    if (str[0] == '\0')
    {
    return 0;
    }
    return 1 + strlen(str + 1);
    }
  • Your Name * 2012-12-19 08:09
    So what?

    Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

    IMHO this is far from TDWTF material.
  • Not That Paul 2012-12-19 08:13
    C: All the power of Assembly with most of the flexibility.
  • gallier2 2012-12-19 08:17
    It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills.
    C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).
  • Tero T. 2012-12-19 08:19
    Maybe the coder had some background in Prolog programming.
  • Sean 2012-12-19 08:21
    This is C; it's going to run pretty darn fast. And no matter how you implement it (or let someone else implement it) string processing by its very nature involves looping over characters until you do or don't find something. So those who say "OMFG what if there's a lot of spaces" are in the same category as those who say "moar XML!" -- they just don't understand what's actually happening inside the machine.

    So this boils down to which costs more: the CPU time to execute an allegedly unnecessary stack push and pop, or the developer time to take a solved problem and write a "more elegant" implementation?
  • Lisa 2012-12-19 08:22
    Doesn't pretty much all of lisp work this way?
  • Thomas Kyte 2012-12-19 08:22


    [tkyte@localhost ~]$ !cat
    cat test.c
    #include "string.h"
    #include "stdio.h"

    char *trim_right(char *str)
    {
    int len = strlen(str) - 1;

    printf( "len = %d\n", len );
    if(len == 0 || str[len] != ' ') return str;
    printf( "setting str[%d] to zero\n", strlen(str)-1 );
    str[strlen(str)-1] = '\0';
    return trim_right(str);
    }

    int main()
    {
    char str[] = "";
    char str1[1];

    str1[0] = ' ';
    trim_right(str);
    printf( "%d\n", strlen(str) );

    return 0;
    }
    [tkyte@localhost ~]$ ./test
    len = -1
    setting str[-1] to zero
    len = -1
    0
    [tkyte@localhost ~]$



    buffer underwrite.... it doesn't work as intended.
  • AdamK 2012-12-19 08:23
    Looks like it was coded by someone that had a lot of experience with Prolog or Lisp
  • Smug Unix User 2012-12-19 08:25
    The real WTF is not searching for a library to handle this for you. Look at what other people have written and do a nice code review and drop in your borrowed solution.
  • Garrison Fiord 2012-12-19 08:28
    Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.
  • eak 2012-12-19 08:31
    the tail recursion is arguably correct. It's the double strlen call and the failure on empty string that make this code bad.
  • lunaryorn 2012-12-19 08:34
    gallier2:
    […] one should not rely on compiler optimizations to excuse poor algorithmic skills.

    Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.
  • F 2012-12-19 08:36
    Your Name *:
    So what?

    Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

    IMHO this is far from TDWTF material.


    But it isn't working. Did you not read the earlier comment by gallier2?

    If passed a zero-length string that happens to be preceded in memory by a byte equal to blank, it will overwrite that byte with null. This is a potentially serious and hard-to-trace bug.
  • F 2012-12-19 08:38
    lunaryorn:
    gallier2:
    […] one should not rely on compiler optimizations to excuse poor algorithmic skills.

    Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.


    Try checking the algorithm for bugs before claiming that "poor algorithmic skill" implies a criticism of tail recursion. Be prepared to be laughed at.
  • gallier2 2012-12-19 08:38
    char *trim_right(char *str)
    
    {
    size_t l = strlen(str);
    while(l && str[--l] == ' ')
    str[l] = 0;
    return str;
    }

    Wasn't that difficult, was it?
  • chernobyl 2012-12-19 08:40
    Well, how much whitespace characters do you expect? One? Two? A billion? The programmer has probably taken into account the kind of data fed into the function and has decided that this approach would work. The function is a bit unusual and sloppy but I would not call it a WTF.
  • gallier2 2012-12-19 08:41
    The poor skill refers to even using recursion for a simple string manipulation. Recursion is an excellent tool for some problems, strings is normally not one of them.
  • DrStat 2012-12-19 08:41
    Your Name *:
    So what?

    Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

    IMHO this is far from TDWTF material.


    Are you the same guy who thinks that it is pointless to spend time on an algorithm when just can buy a faster CPU?
  • TopTension 2012-12-19 08:42
    Sean:
    the CPU time to execute an allegedly unnecessary stack push and pop


    You forgot the CPU time to iterate through the string on every recursion to count the characters in strlen(), while the calling stack instance of the function knows exactly, that it is one less than what strlen() hes returned previously.

    Not to mention a potential stack overflow. And wthat tohers have said...
  • Meep 2012-12-19 08:48
    "However, this power is limited insomuch that you don't have many of the friendly helper-functions that exist in higher level languages, such as string manipulation for example, unless you go ahead and make them yourself."

    Or use one of the gigabytes of libraries that have been written for C, the single most successful programming language of all time.
  • Honnza 2012-12-19 08:49
    How come no one ever so far was horrified by the quadratic time complexity? Sure, there's a memory cache. Still, it's not easy to get 1G instructions under 1G clock cycles.

    captcha - El grasso es verto. El noche es negro.
  • John 2012-12-19 08:49
    It reminds me of the old contest to implement the slowest possible sort algorithm. That produced some rather mind-bogglingly slow sort routines. Perhaps the author(s) of this knew about that contest, and decided to try the same idea for string trimming. Calling strlen(), which does a scan of the string, twice during each recursive iteration is a clever idea for such a contest. This routine could be made even slower, however, by not saving the length, but calling strlen(str) for each use of len. That would add a third call, and adding a test for strlen(str) != -1 (rather than changing the ==0 test to <=0) would add another loop.
  • biziclop 2012-12-19 08:49
    Apart from the unnecessary recursion and the serious buffer underrun issue, it also fails completely if whitespace takes any other form than 0x20 (e.g. \t, \n, \r).
  • Snowrat 2012-12-19 08:51
    Could've been written a bit better in APL, like
    trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵}
    or
    trim_right←{⌽{(∨\' '≠⍵)/⍵}⌽⍵}
  • Meep 2012-12-19 08:54
    Lisa:
    Doesn't pretty much all of lisp work this way?


    LISP let's you do it however you like, but ML derived languages do work this way. They usually have a dynamic stack precisely to handle functions that do a lot of recursion.

    C, OTOH, often has a fixed stack, and function calls are comparably more expensive as it has to save the entire state of the CPU.
  • Steve The Cynic 2012-12-19 08:58
    F:
    Your Name *:
    So what?

    Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

    IMHO this is far from TDWTF material.


    But it isn't working. Did you not read the earlier comment by gallier2?

    If passed a zero-length string that happens to be preceded in memory by a byte equal to blank, it will overwrite that byte with null. This is a potentially serious and hard-to-trace bug.

    The bug is actually worse than that. Any call where the string does not contain non-blanks (e.g. the empty string, or any number of blanks), and the byte immediately before the string is a blank, will destroy that blank. If the byte immediately before the string does not exist (not mapped, etc.) it will crash when checking to see if the character is blank. If that byte is marked read-only, the write will crash. If the system is a DeathStation 9000, several million people will die in a nuclear fireball.

    And, of course, it is entirely feasible (as suggested by another poster above) that int is not large enough to hold all possible values of the return value of strlen. The other case is a 16-bit real-mode x86 system using the commonly available "huge" memory model, where data pointers were normalised to always have an offset in the 0-15 range. size_t wasn't normally (at the time) bigger than 16 bits, but you could imagine some nutjob at Borland or Microsoft deciding it should be unsigned long, which causes problems here if you pass exactly 64K of string plus one '\000'.

    Addendum (2012-12-19 09:15):
    Actually, I suck at reading code. If the string contains exactly one character (plus the NUL), that character will be left alone and the function will return more or less harmlessly. So a single blank will not be trimmed, and all strings containing only two or more blanks will be trimmed to one blank.

    The nuclear fireball remains the probable response when the string is empty.
  • RobFreundlich 2012-12-19 08:59
    Snowrat:
    Could've been written a bit better in APL, like
    trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵}
    or
    trim_right←{⌽{(∨\' '≠⍵)/⍵}⌽⍵}

    Gesundheit
  • Techpaul 2012-12-19 09:00
    gallier2:
    char *trim_right(char *str)
    
    {
    size_t l = strlen(str);
    while(l && str[--l] == ' ')
    str[l] = 0;
    return str;
    }

    Wasn't that difficult, was it?

    Well I would love to be able to guarantee that

    while(l && str[--l] == ' ')

    Would always work as order of evaluation is not guaranteed, do decrement inside loop.
  • RakerF1 2012-12-19 09:01
    Very simple and elegant. I like it.

    CAPTCHA: jumentum - don't jumentum it.
  • Rnd( 2012-12-19 09:02
    Does it make me bad coder if I prefer iterative loops or just loops for this sort of stuff?

    Also, if there is dynamic memory allocation I surely hope the lengths are fixed or originals are stored somewhere...
  • Mcoder 2012-12-19 09:04
    Tail optimization won't remove the strlen at the first line that runs through all the (remaining) string for each character on it.

    Anyway, one normaly doesn't trim large strings. It is quit possible that his code makes no difference at all.
  • A developer 2012-12-19 09:05
    This is the sort of academic BS I had to do in university programming courses.
    For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.
  • Aufgehaben 2012-12-19 09:06
    TopTension:


    Sean:

    the CPU time to execute an allegedly unnecessary stack push and pop


    You forgot the CPU time to iterate through the string on every recursion to count the characters in strlen(), while the calling stack instance of the function knows exactly, that it is one less than what strlen() hes returned previously.

    Not to mention a potential stack overflow. And wthat tohers have said...



    char *trim_right_r(char *str,int len)
    {

    if(len == 0 || str[len] != ' ') return str;
    str[len] = '\0';
    return trim_right_r(str,len-1);
    }

    char *trim_right(char *str)
    {
    int len = strlen(str);
    return trim_right_r(str,len-1);
    }

    FTFY
  • Mordred 2012-12-19 09:08
    C does guarantee order of evaluation.
  • Steve The Cynic 2012-12-19 09:10
    Meep:
    Lisa:
    Doesn't pretty much all of lisp work this way?


    LISP let's you do it however you like, but ML derived languages do work this way. They usually have a dynamic stack precisely to handle functions that do a lot of recursion.

    C, OTOH, often has a fixed stack, and function calls are comparably more expensive as it has to save the entire state of the CPU.

    Only the *relevant* state, which might not be much at all. An x86 system running this code might have to save ESI, EDI, and EBP, but can freely discard or corrupt EBX, ECX, and EDX, while EAX would be the return value, so not saved at all. Since there is no use of the FPU, the FPU state can easily be ignored.

    And C does not have to have a contiguous stack for parameter passing and local variables. Mandating such a thing would make writing C compilers for System/370 difficult, since these machines don't even have a direct concept of a stack. And even on an architecture with a machine stack, it isn't mandatory to put the activation records on the machine stack. Most systems do it that way because it's faster than allocating from the free store / heap for it, but it isn't required behaviour. (However, you can make a case that we shouldn't store activation records on the machine stack, in order to prevent buffer overflows from corrupting return addresses and allowing malware delivery.)

    I talk too much sometimes.
  • QJo 2012-12-19 09:17
    Garrison Fiord:
    Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.


    I've used it once or twice when doing graph-theoretical work (network analysis, etc.) It can be a convenient technique for finding a top-level ancestor not, IIRC ... writing it in the form of a loop makes for an unwieldy codebase, whereas recursion makes for simple and easily-comprehended algorithm. The caveat is, of course, that you ought only to implement it on structures which are not more than four or five levels deep - and in the problem domain for which this solution was designed, that condition held.
  • Steve The Cynic 2012-12-19 09:18
    Techpaul:
    Well I would love to be able to guarantee that

    while(l && str[--l] == ' ')

    Would always work as order of evaluation is not guaranteed, do decrement inside loop.

    There's a sequence point at the &&, so the left-hand side will be completely evaluated before anything happens on the right, giving this expression defined behaviour.
  • Xarthaneon the Unclear 2012-12-19 09:19
    [quote]
    char *trim_right(char *str)
    
    {
    int len = strlen(str) - 1;
    if(len == 0 || str[len] != ' ') return str;
    str[strlen(str)-1] = '\0';
    return trim_right(str);
    }


    Ouch. Evil function. I'm not sure what part is more evil - the part where it's actually recursive (seems a bit unnecessary to me), the bit where it's intended to be replacing ' ' with an explicit NUL '\0'...or the part where it's actually just making the final character of the given string a NUL '\0'. Hope that's not called too often...

    Personally, I'd just do something like this:

    char* trim_right(char* str)
    
    {
    for(int i = strlen(str) - 1; i >= 0; i--)
    if(str[i] == ' ')
    str[i] = '\0';

    return str;
    }


    I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

    CAPTCHA: causa - the original might causa problem!
  • sibtrag 2012-12-19 09:20
    If we're micro-optimizing, shouldn't we refrain from writing an end-of-string null character until we've found the first non-blank? The CPU could give you lots of flushes if it lets the loads (to check for blanks) run ahead of the stores (which depend on the values of the loaded bytes) and it doesn't track load/store collisions on a byte basis (perhaps word or cache line). Not to mention the MP effects of all that store traffic hitting the write gathering.

    Of course, if we were expecing significant white space, we'd look at the first few bytes to get to word-alignment and then start looking for non-zero words. For even more speed, the SIMD engines on modern processors can even tell us which byte was non-zero while looking at 16 bytes at a time.
  • ¯\(°_o)/¯ I DUNNO LOL 2012-12-19 09:22
    struct {
      int important;
      char stuff[10];
    } foo;
    foo.important = 0x12345678;
    ...
    trim_right(foo.stuff);
  • Steve The Cynic 2012-12-19 09:26
    Xarthaneon the Unclear:
    Personally, I'd just do something like this:

    char* trim_right(char* str)
    
    {
    for(int i = strlen(str) - 1; i >= 0; i--)
    if(str[i] == ' ')
    str[i] = '\0';

    return str;
    }


    I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

    CAPTCHA: causa - the original might causa problem!

    It's not as graceful, and it's also wrong. strlen() does not return int, and is not obliged to return something the same size as int (someone cited 64-bit systems with 32-bit ints as an example). And even if strlen returns an int-sized unsigned value, you have a problem if that value is > INT_MAX+1. This happens at just 32769 characters on a 16-bit system...

    C++ is often cited as taking your whole leg when it does succeed in shooting you in the foot. C is quite capable of that, too, and lacks many of the elaborate safeties in C++.
  • ¯\(°_o)/¯ I DUNNO LOL 2012-12-19 09:27
    ¯\(°_o)/¯ I DUNNO LOL:
    struct {
      int important;
      char stuff[10];
    } foo;
    foo.important = 0x20202020;
    ...
    trim_right(foo.stuff);
    FTFM
  • gallier2 2012-12-19 09:31


    while(l && str[--l] == ' ')


    Would always work as order of evaluation is not guaranteed, do decrement inside loop.


    Order of evaluation is guaranteed, && || are so called sequence points, they force the order of evaluation, or else the short-circuit evaluation wouldn't work.

    If I had written

    while((l != 0) & (str[--l] == ' '))


    Then you would be right? There would be 'undefined behaviour' because of the side-effect and there would be still be the buffer underflow.
  • ¯\(°_o)/¯ I DUNNO LOL 2012-12-19 09:37
    Black Bart:
    Someone just learned about recursion, and it became the tool of choice for the next several weeks.

    When you have a hammer in your hand, everything becomes a nail.
    Once there was a programmer who had a problem. He decided to use recursion. Then he had a programmer who had a problem. He decided to use recursion...
  • Xarthaneon the Unclear 2012-12-19 09:40
    Steve The Cynic:
    Xarthaneon the Unclear:
    Personally, I'd just do something like this:

    char* trim_right(char* str)
    
    {
    for(int i = strlen(str) - 1; i >= 0; i--)
    if(str == ' ')
    str[i] = '\0';

    return str;
    }


    I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

    CAPTCHA: causa - the original might causa problem!

    It's not as graceful, and it's also wrong. strlen() does not return int, and is not obliged to return something the same size as int (someone cited 64-bit systems with 32-bit ints as an example). And even if strlen returns an int-sized unsigned value, you have a problem if that value is > INT_MAX+1. This happens at just 32769 characters on a 16-bit system...

    C++ is often cited as taking your whole leg when it does succeed in shooting you in the foot. C is quite capable of that, too, and lacks many of the elaborate safeties in C++.


    I went and looked up strlen() after reading that. [i]Darn it.
    I noticed in their example that they perform a cast on the returned size_t to get around the issue of it not necessarily returning an int/uint32/uint64/whatever.

    But, while I agree my rewrite is not as graceful, other than the strlen issue, fixed as such:

    char* trim_right(char* str)
    
    {
    for(int i = ((int)strlen(str)) - 1; i >= 0; i--)
    if(str[i] == ' ')
    str[i] = '\0';

    return str;
    }


    how is it wrong?
  • TGV 2012-12-19 09:43
    For the defenders and nay sayers, the following is wrong in this piece of code:
    - the repeated call to strlen makes it O(n^2)
    - it can write to the byte before the start of the string, which might happen e.g. if that's where malloc keeps its block length and the block happens to be 32 bytes
    The recursion is a very mild WTF in comparison to the above issues, and one that any decent compiler can optimize away.

    This is not "the programmer took the requirements bladibla" or "elegant". This. is. crap.
  • Delve 2012-12-19 09:44
    A developer:
    This is the sort of academic BS I had to do in university programming courses.
    For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.

    Recursion can do many things better than other methods. This is not one of them. There are a number of possible reasons this code exists, but none of them excuse it.

    Recursion is great. Never use it. Recursion is the Cthulhu of code. Do not wake Cthulhu lest you go mad.
  • @Deprecated 2012-12-19 09:45
    Steve The Cynic:
    Xarthaneon the Unclear:
    Personally, I'd just do something like this:

    char* trim_right(char* str)
    
    {
    for(int i = strlen(str) - 1; i >= 0; i--)
    if(str[i] == ' ')
    str[i] = '\0';

    return str;
    }


    I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

    CAPTCHA: causa - the original might causa problem!

    It's not as graceful, and it's also wrong. strlen() does not return int, and is not obliged to return something the same size as int (someone cited 64-bit systems with 32-bit ints as an example). And even if strlen returns an int-sized unsigned value, you have a problem if that value is > INT_MAX+1. This happens at just 32769 characters on a 16-bit system...

    C++ is often cited as taking your whole leg when it does succeed in shooting you in the foot. C is quite capable of that, too, and lacks many of the elaborate safeties in C++.


    Also, for the input "What The F \0", the original code will return "What The F\0\0" whereas this code will return "What\0The\0F\0\0", or just "What" if you print it out or something.
  • jarvis 2012-12-19 09:47
    chernobyl:
    Well, how much whitespace characters do you expect? One? Two? A billion? The programmer has probably taken into account the kind of data fed into the function and has decided that this approach would work.


    And in a nutshell that's why we have functioning exploits in the wild!
  • gallier2 2012-12-19 09:53
    You just made it worse. Now the compiler hasn't even the chance to give you the warning of the downcast.
    The correction to your code is to declare your i as a size_t. That's what it was invented for. It's in the standard, even built-in language operators like sizeof use that type (that is imho one of the few wtf of the C language, an operator returning a type, which has its definition in an external header file "stddef.h").

    As for systems where sizeof (size_t) > sizeof (int) you can count all 64 bit systems (Solaris, Linux, *BSD and Windows).
    Microcontrollers with Harvard architecture also often have int in the size of the RAM address size (16 bits) and size_t in the size of the ROM address size (24 bits)
  • Dee 2012-12-19 09:55
    Apart from the buffer underwrite described above, I think tail recursion optimization would make this an acceptable solution.
  • PleegWat 2012-12-19 09:57
    [code]
    void trim_right( char * str )
    {
    char * p = NULL;
    while( *str )
    {
    if( p && !isspace(*str) )
    p = NULL;
    if( !p && isspace(*str) )
    p = str;
    }

    if( p ) *p = '\0';
    }
  • RichP 2012-12-19 09:59
    Not That Paul:
    C: All the power of Assembly with most of the flexibility.


    and twice the readability!
  • CoderHero 2012-12-19 10:04
    chernobyl:
    Well, how much whitespace characters do you expect? One? Two? A billion? The programmer has probably taken into account the kind of data fed into the function and has decided that this approach would work. The function is a bit unusual and sloppy but I would not call it a WTF.


    It would be problematic if this were coming from a columnar data feed where the data has 300 columns, and the content is the word "hi".
    But then again, as long as the stack is big enough, no problem right?
  • Tim Ward 2012-12-19 10:04
    Maybe they also learned about compilers that do tail recursion optimisation.

    But I would agree that unless you are knowingly coding for a specific compiler for a good reason you shouldn't rely on this, and that this function isn't a good example of why you might want to do so.
  • eVil 2012-12-19 10:06
    I'd just like to point out that tail recursion is fine for Haskell programmers for a number of reasons:

    1) The entire point of Haskell, as a functional language, is to allow recursion and other similarly functional constructs.

    2) Haskell will handle long chains of recursion gracefully, without spitting stack overflows at you.

    3) They chose Haskell, because they're not hugely worried about performance.


    None of these are appropriate reasons to use tail recursion in C++ however.
  • Rick 2012-12-19 10:06
    Mordred:
    C does guarantee order of evaluation.

    a = ++b + func(b); // evaluation order is specified
    a = func(++b, b); // evaluation order is NOT specified
  • gallier2 2012-12-19 10:14
    PleegWat:

    void trim_right( char * str )
    {
    char * p = NULL;
    while( *str )
    {
    if( p && !isspace(*str) )
    p = NULL;
    if( !p && isspace(*str) )
    p = str;
    }

    if( p ) *p = '\0';
    }



    You forgot a -- somewhere and I don't think that overwriting p with NULL after you found your first non blank is a good idea either. Your code to write 0 would then not trigger.

    But in a code review, I would reject it because it is too complicated for what it does. The multiple writes when blanking a lot of trailing spaces will hit the "write coalescing buffers and caches" that every modern CPU has nowaday. Only in the extreme case were writes were order of magnitudes more costly than reads would I consider such an approach.
  • Steve The Cynic 2012-12-19 10:21
    Xarthaneon the Unclear:
    But, while I agree my rewrite is not as graceful, other than the strlen issue, fixed as such:

    char* trim_right(char* str)
    
    {
    for(int i = ((int)strlen(str)) - 1; i >= 0; i--)
    if(str[i] == ' ')
    str[i] = '\0';

    return str;
    }


    how is it wrong?

    It's wrong for three main reasons:

    1. if strlen() returns a wider type than int (e.g. 64-bit size_t, 32-bit int), and the string's length exceeds the width of int (e.g. a 65KB string on 16-bit int), then you only look at the remainder when dividing by 2<<bitsof(int) (e.g. 1K for a 65K string on 16-bit).

    2. Regardless of the width of strlen() and int, if the string is INT_MAX+2 or more (but less than UINT_MAX+1) characters, you will not modify the string at all because i will be immediately negative.

    3. As pointed out above by someone else, your function stomps on all spaces, even the ones with non-spaces to the right, thus turning your function into an inefficient version of truncateToFirstWord().
  • so what 2012-12-19 10:28
    Not sure I see the wtf. This works, and (knowing the financial world fairly well) will be called at most 2-3 times in recursion. Much better than iterating over the entire string to find the first space in the last group of spaces.

    At least he didn't do a mem copy every time or something retarded like that.
  • Steve The Cynic 2012-12-19 10:34
    Rick:
    Mordred:
    C does guarantee order of evaluation.

    a = ++b + func(b); // evaluation order is specified
    a = func(++b, b); // evaluation order is NOT specified

    Actually, in the first case, you are wrong. The compiler is allowed to execute func(b) before or after ++b, depending on what sort of capricious mood it is in.

    The worst "multiple interpretations of undefined code" I ever saw was this:

    unsigned char *bp = /* stuff */;
    
    unsigned char table[16] = { /* values */ };

    loop N times
    {
    *bp++ = (table[*bp & 0x0f] << 4) | table[*bp >> 4];
    }


    All the compilers we had in the company, for various architectures and operating systems, produced the superficially expected result (look up the values, shift and combine them, store the result, increment the pointer).

    All the compilers, except one, that is.

    The x86-hosted to-MIPS cross-compiler I was using to bring this code to the controller CPU of an ATM switch (Asynchronous Transfer Mode, not Automated Teller Machine) evaluated bp++ completely (and stashed the before-increment value) before evaluating anything else, then evaluated the right-hand side using the incremented value of bp, then stored the result using the stashed before-increment value.

    The contents of the table were designed so that the calculation reversed the bit-order of the bytes, so if it was working correctly and you did it twice to the same memory, there was no net result.

    This stuff all added up to something that looked like a symptom of something else, and I was expecting to have that other problem, so it took me three days to track down what was happening.

    Fixed:
    unsigned char *bp = /* stuff */;
    
    unsigned char table[16] = { /* values */ };

    loop N times
    {
    *bp = (table[*bp & 0x0f] << 4) | table[*bp >> 4];
    bp++;
    }


    Undefined behaviour: Just say, "No."
  • Mike 2012-12-19 10:51
    This comments section looks like it has mostly been written by a bunch of non programming former programmer trolls all pointing and laughing while desperately trying to be offended by the slightest little edge case scenario.

    But the fact remains that the world isn't ending and the code is executing with good performance in real time. A minor bug that has probably never manifested has been noticed and should be fixed.

    As for recursion; anyone who thinks pushing to a stack is a slow and inefficient operation... now there is the real wtf.
  • Coyne 2012-12-19 11:02
    Garrison Fiord:
    Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.


    For most RW solutions, yes, because most of them are pretty simple and really don't justify recursion. It's just that recursion is such an amusing hammer.

    Factorial, for instance, is often shown in recursive form but is faster, uses less memory, and isn't really more difficult to understand in simple loop form.

    But there are RW algorithms where recursion is very useful; one that comes to mind right off is quicksort.

    And, of course, there is the inimitable "Towers of Hanoi". A toy problem, yes, but if you want a "head hurts" experience try implementing that without recursion sometime.
  • Anonymous Bob 2012-12-19 11:18
    Your Name *:
    So what?

    Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

    IMHO this is far from TDWTF material.


    Besides the mentioned bugs, they've made a O(n) algorithm into an O(n^2) algorithm.

    Also, if you ported this to a small embedded computer with limited resources, you could easily run into a stack overflow. This type of thing happened to me years ago when I ported some 32-bit Unix code to 16-bit Windows.
  • Rnd( 2012-12-19 11:24
    Anonymous Bob:
    Your Name *:
    So what?

    Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

    IMHO this is far from TDWTF material.


    Besides the mentioned bugs, they've made a O(n) algorithm into an O(n^2) algorithm.

    Also, if you ported this to a small embedded computer with limited resources, you could easily run into a stack overflow. This type of thing happened to me years ago when I ported some 32-bit Unix code to 16-bit Windows.


    One without file system?
  • OldCoder 2012-12-19 11:28
    Steve The Cynic:

    The bug is actually worse than that. Any call where the string does not contain non-blanks (e.g. the empty string, or any number of blanks), and the byte immediately before the string is a blank, will destroy that blank. If the byte immediately before the string does not exist (not mapped, etc.) it will crash when checking to see if the character is blank. If that byte is marked read-only, the write will crash. If the system is a DeathStation 9000, several million people will die in a nuclear fireball.

    Ah! Now I know what Jeff Goldblum did to the alien mothership in Independence Day!

    He changed their trim_right routine!
  • gallier2 2012-12-19 11:29
    Wrong on the bunch of non programming former programmer trolls, I'm a still C programming programmer troll.

    As for the code, first a little context. I work on a project for a big organisation doing translation work, we have a huge system with a lot of components written in several languages (Java, perl, VBA, python, C++) but the core code of the backend was written in plain C. Mostly C90, now modernized in C99 were possible. The codebase was written around 1997 and put in production around 2000. The system runs since then rather successfully.

    Now to my example. Last september we got one document that didn't crash in production but returned an erroneous output file. I tracked the error and it was in the part of the code a colleague wrote several years ago (around 2004). It was exactly the same bug as in the WTF of today. The use of an int instead of a size_t in return of strlen, a loop condition that made the program overwrite str[len-1] with len==0. The result was in some extreme conditions, the output file was a little bit garbled. Had my colleague used the size_t, the writing of str[len-1] would have resulted in a "segmentation fault" directly (because len-1 would have yielded ULONG_MAX) and we would have found the bug in 2004 and not delivered crappy files for 8 years.
    (to counter directly the naysayers who will say that our quality control failed bla bla bla, an erroneous file wouldn't be noticed by the end user, it would be comparable to a missing blank in a book).


  • Garrison Fiord 2012-12-19 11:34
    Coyne:
    Garrison Fiord:
    Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.


    For most RW solutions, yes, because most of them are pretty simple and really don't justify recursion. It's just that recursion is such an amusing hammer.

    Factorial, for instance, is often shown in recursive form but is faster, uses less memory, and isn't really more difficult to understand in simple loop form.

    But there are RW algorithms where recursion is very useful; one that comes to mind right off is quicksort.

    And, of course, there is the inimitable "Towers of Hanoi". A toy problem, yes, but if you want a "head hurts" experience try implementing that without recursion sometime.

    It appears you overlooked the word "almost".
  • Captain Oblivious 2012-12-19 11:35
    Lisa:
    Doesn't pretty much all of lisp work this way?


    Yes, but Lisp and other functional languages do things like "tail call elimination", to turn a recursive scheme into a loop for the machine to run in constant space.

    The Real WTF is that C compilers aren't guaranteed to do tail call elimination. Recursion is too good (expressive) a scheme to ignore.
  • Edmund 2012-12-19 11:48
    Recursion can be more elegant if the domain you're working in has objects defined in a "recursive" way, or at least a way which lends itself to recursive operations.
  • Moe 2012-12-19 11:48
    I wouldn't trust code written by the programmer in ANY language.

    Back, in the .com boom, I was tasked with giving technical interviews to prospective Java programmers. After a half dozen Java questions to judge their skill level, I put in some C questions.

    The guys who were "thanked for their time" usually asked a question like this:

    "That's C! Why are you asking that in a Java interview!?"

    "If you can't write good C, how the hell can you write good Java?"

    The ones who nailed the interview didn't need to ask the question -- they knew why. Three weeks later, I got a memo from the exec in charge of software development observing the technical quality of the hew hires.

    The bad news is that the company was sucked down the toilet in the .com crash -- but my experience is far from unique in that respect....
  • vahokif 2012-12-19 11:51
    It's tail recursive. gcc -O2 compiles it into a loop.
  • Steve The Cynic 2012-12-19 11:54
    Coyne:
    And, of course, there is the inimitable "Towers of Hanoi". A toy problem, yes, but if you want a "head hurts" experience try implementing that without recursion sometime.

    Ah, yes, Arkey-malarkey... No, it's not even hard to do without recursion.

    At any point, there are at most two legal moves. It is always legal to move the smallest disk, because no disk is smaller (and therefore blocking). The other two exposed disks are different sizes, by definition, and you can move the smaller of them onto the larger.

    The non-recursive solution is, therefore:

    Every other move moves the smallest disk. All the other moves make the other legal move.

    The only question, then, is which way to move the smallest disk.

    If the goal is to use peg B to move the stack from peg A to peg C, then:
    1. If the number of disks is odd, the smallest disk moves A->C->B->A
    2. If the number of disks is even, the smallest disk moves A->B->C->A

    Example: Four disks (15 moves)

    A: 4321 B: C: <== move 1 to B only legal move at this point
    A: 432 B: 1 C: <== move 2 to C just moved 1, so make a non-1 move
    A: 43 B: 1 C: 2 <== move 1 to C just moved non-1, so make a 1 move
    A: 43 B: C: 21 <== move 3 to B just moved 1, so make a non-1 move
    A: 4 B: 3 C: 21 <== move 1 to A just moved non-1, so make a 1 move
    A: 41 B: 3 C: 2 <== move 2 to B just moved 1, so make a non-1 move
    A: 41 B: 32 C: <== move 1 to B just moved non-1, so make a 1 move
    A: 4 B: 321 C: <== move 4 to B just moved 1, so make a non-1 move
    A: B: 321 C: 4 <== move 1 to C just moved non-1, so make a 1 move
    A: B: 32 C: 41 <== move 2 to A just moved 1, so make a non-1 move
    A: 2 B: 3 C: 41 <== move 1 to A just moved non-1, so make a 1 move
    A: 21 B: 3 C: 4 <== move 3 to C just moved 1, so make a non-1 move
    A: 21 B: C: 43 <== move 1 to B just moved non-1, so make a 1 move
    A: 2 B: 1 C: 43 <== move 2 to C just moved 1, so make a non-1 move
    A: B: 1 C: 432 <== move 1 to C the only legal move now
    A: B: C: 4321 <== done
  • silent bob 2012-12-19 11:59
    Frankly, my cat loves the dogs breakfast when the it comes up. He really meows in pleasure.
  • Martijn 2012-12-19 12:14
    I see no real problem with that. According to http://llvm.org/demo/index.cgi it produces

    .file "/tmp/webcompile/_17418_0.bc"
    .text
    .globl strlen1
    .align 16, 0x90
    .type strlen1,@function
    strlen1: # @strlen1
    .Ltmp0:
    .cfi_startproc
    # BB#0:
    xorl %eax, %eax
    cmpb $0, (%rdi)
    je .LBB0_3
    # BB#1: # %tailrecurse.preheader
    xorl %eax, %eax
    .align 16, 0x90
    .LBB0_2: # %tailrecurse
    # =>This Inner Loop Header: Depth=1
    cmpb $0, 1(%rdi,%rax)
    leaq 1(%rax), %rax
    jne .LBB0_2
    .LBB0_3: # %tailrecurse._crit_edge
    # kill: EAX<def> EAX<kill> RAX<kill>
    ret
    .Ltmp1:
    .size strlen1, .Ltmp1-strlen1
    .Ltmp2:
    .cfi_endproc
    .Leh_func_end0:


    .section ".note.GNU-stack","",@progbits

    vs

    .file "/tmp/webcompile/_19359_0.bc"
    .text
    .globl strlen1
    .align 16, 0x90
    .type strlen1,@function
    strlen1: # @strlen1
    .Ltmp0:
    .cfi_startproc
    # BB#0:
    xorl %eax, %eax
    movl 4(%esp), %ecx
    cmpb $0, (%ecx)
    je .LBB0_2
    .align 16, 0x90
    .LBB0_1: # %.lr.ph
    # =>This Inner Loop Header: Depth=1
    cmpb $0, 1(%ecx,%eax)
    leal 1(%eax), %eax
    jne .LBB0_1
    .LBB0_2: # %._crit_edge
    ret
    .Ltmp1:
    .size strlen1, .Ltmp1-strlen1
    .Ltmp2:
    .cfi_endproc
    .Leh_func_end0:


    .section ".note.GNU-stack","",@progbits

    generated by

    unsigned int strlen1(char* str)
    {
    unsigned int res = 0;
    while (str[res] != '\0') res++;
    return res;
    }
  • Captain Oblivious 2012-12-19 12:22
    eVil:
    I'd just like to point out that tail recursion is fine for Haskell programmers for a number of reasons:

    1) The entire point of Haskell, as a functional language, is to allow recursion and other similarly functional constructs.

    2) Haskell will handle long chains of recursion gracefully, without spitting stack overflows at you.

    3) They chose Haskell, because they're not hugely worried about performance.


    None of these are appropriate reasons to use tail recursion in C++ however.


    I don't disagree with you exactly, but as a statically typed, compiled language, Haskell is several times faster than dynamic scripting languages, and reaches performance comparable to tuned C.

    I know that specific domains require high performance. But Haskell is finding a lot of use in "high performance" industries like finance specifically because correctness is easy to program, performance is good, and the tight inner loops can be written in C when it isn't good enough.

    GHC is getting very good at performance, even without going to the FFI.

    That said, stay away from Haskell. You guys don't want your minds blown. You want to sit in your cubicle and treat your work like an assembly line. Your bosses want you to type out buggy code which will be fixed "later" (i.e., if it ever becomes a problem customers care about).

    http://lukeplant.me.uk/blog/posts/why-learning-haskell-python-makes-you-a-worse-programmer/
    http://stackoverflow.com/questions/2704652/monad-in-plain-english-for-the-oop-programmer-with-no-fp-background/13656209#13656209
  • Rick 2012-12-19 12:27
    Steve The Cynic:
    Rick:
    Mordred:
    C does guarantee order of evaluation.

    a = ++b + func(b); // evaluation order is specified
    a = func(++b, b); // evaluation order is NOT specified

    Actually, in the first case, you are wrong. The compiler is allowed to execute func(b) before or after ++b, depending on what sort of capricious mood it is in.

    The worst "multiple interpretations of undefined code" I ever saw was this:

    unsigned char *bp = /* stuff */;
    
    unsigned char table[16] = { /* values */ };

    loop N times
    {
    *bp++ = (table[*bp & 0x0f] << 4) | table[*bp >> 4];
    }


    All the compilers we had in the company, for various architectures and operating systems, produced the superficially expected result (look up the values, shift and combine them, store the result, increment the pointer).

    All the compilers, except one, that is.

    The x86-hosted to-MIPS cross-compiler I was using to bring this code to the controller CPU of an ATM switch (Asynchronous Transfer Mode, not Automated Teller Machine) evaluated bp++ completely (and stashed the before-increment value) before evaluating anything else, then evaluated the right-hand side using the incremented value of bp, then stored the result using the stashed before-increment value.

    The contents of the table were designed so that the calculation reversed the bit-order of the bytes, so if it was working correctly and you did it twice to the same memory, there was no net result.

    This stuff all added up to something that looked like a symptom of something else, and I was expecting to have that other problem, so it took me three days to track down what was happening.

    Fixed:
    unsigned char *bp = /* stuff */;
    
    unsigned char table[16] = { /* values */ };

    loop N times
    {
    *bp = (table[*bp & 0x0f] << 4) | table[*bp >> 4];
    bp++;
    }


    Undefined behaviour: Just say, "No."

    The order is definitely defined. MIPS simply got it wrong. (Wow, haven't heard that name in a while.)
  • Mr.Bob 2012-12-19 12:29
    All these comments about recursion is evil, he's calling strlen() twice, blah blah blah, and no one is pointing out the more serious problems like the lack of checking for a null pointer on entry or how passing an empty string causes the routine to write to invalid memory?

    Jeez, it's like griping about how they used Comic Sans to spell "Titanic" on the side of your ship...
  • dnabre 2012-12-19 12:30
    Hopefully they're using a standard version, which is more like


    size_t strlen(const char *s) {
    size_t len = 0;
    while(s[len]!='\0') ++len;
    return len;
    }

    or at least a tail-recursive version.

  • towel 2012-12-19 12:31
    Black Bart:
    When you have a hammer in your hand, everything becomes a nail.


    When you have a nail in your head, everything becomes a unicorn.
  • dnabre 2012-12-19 12:34
    Most C compilers nowadays will do tail/sibling recursion ok, but either need a high enough -O or a specific flag.

    It's reasonable to rely on this, but you have to the right the flags to make sure it happens. Last I looked up it up (back around 2008 I think) the current gcc and microsoft C compiler supported it.

    I would assume that LLVM handles it fine.
  • gallier2 2012-12-19 12:37
    Mr.Bob:
    All these comments about recursion is evil, he's calling strlen() twice, blah blah blah, and no one is pointing out the more serious problems like the lack of checking for a null pointer on entry or how passing an empty string causes the routine to write to invalid memory?

    Jeez, it's like griping about how they used Comic Sans to spell "Titanic" on the side of your ship...


    You should read the comments before posting. I mentioned the invalid memory overwrite in the second comment.
  • gallier2 2012-12-19 12:40
    Rick:
    Steve The Cynic:
    Rick:
    Mordred:
    C does guarantee order of evaluation.

    a = ++b + func(b); // evaluation order is specified
    a = func(++b, b); // evaluation order is NOT specified

    Actually, in the first case, you are wrong. The compiler is allowed to execute func(b) before or after ++b, depending on what sort of capricious mood it is in.

    The worst "multiple interpretations of undefined code" I ever saw was this:

    unsigned char *bp = /* stuff */;
    
    unsigned char table[16] = { /* values */ };

    loop N times
    {
    *bp++ = (table[*bp & 0x0f] << 4) | table[*bp >> 4];
    }


    All the compilers we had in the company, for various architectures and operating systems, produced the superficially expected result (look up the values, shift and combine them, store the result, increment the pointer).

    All the compilers, except one, that is.

    The x86-hosted to-MIPS cross-compiler I was using to bring this code to the controller CPU of an ATM switch (Asynchronous Transfer Mode, not Automated Teller Machine) evaluated bp++ completely (and stashed the before-increment value) before evaluating anything else, then evaluated the right-hand side using the incremented value of bp, then stored the result using the stashed before-increment value.

    The contents of the table were designed so that the calculation reversed the bit-order of the bytes, so if it was working correctly and you did it twice to the same memory, there was no net result.

    This stuff all added up to something that looked like a symptom of something else, and I was expecting to have that other problem, so it took me three days to track down what was happening.

    Fixed:
    unsigned char *bp = /* stuff */;
    
    unsigned char table[16] = { /* values */ };

    loop N times
    {
    *bp = (table[*bp & 0x0f] << 4) | table[*bp >> 4];
    bp++;
    }


    Undefined behaviour: Just say, "No."

    The order is definitely defined. MIPS simply got it wrong. (Wow, haven't heard that name in a while.)


    No, the order in which the lvalue and the rvalue are evaluated are undefined. The only thing defined is that the assignment is done after these evaluations.

    see http://stackoverflow.com/questions/4362501/any-good-reason-why-assignment-operator-isnt-a-sequence-point
    for a detailed explanation.
  • Meower68 2012-12-19 13:20
    Written by somebody who normally writes Scheme code. It is, after all, tail recursive. Doesn't that result in an optimal loop? Most Scheme interpreters or compilers will replace the call with a goto.

    Oh, wait. This isn't Scheme. This is C.
  • /Wegge 2012-12-19 13:27
    Techpaul:

    Well I would love to be able to guarantee that

    while(l && str[--l] == ' ')

    Would always work as order of evaluation is not guaranteed, do decrement inside loop.


    The order of that evaluation is very well defined, for any conforming C compiler.
  • David T 2012-12-19 13:33
    so what:
    Not sure I see the wtf. This works, and (knowing the financial world fairly well) will be called at most 2-3 times in recursion. Much better than iterating over the entire string to find the first space in the last group of spaces.


    It _does_ iterate over the entire string, twice for each recursive call, to find the length. With a single pointer to keep track of the last non-space found, it could be finished after the _first_ iteration over the string.
  • Canuck 2012-12-19 13:39
    Actually . . .

    Once there was a programmer who had a problem. He decided to use recursion. Then he had two midget programmers each whom had a problem. They decided to use recursion...
  • Captcha:quibus 2012-12-19 13:47
    Ha ha! He's using a programming pattern that his language wasn't designed for!

    That's funny.
  • Mr.Bob 2012-12-19 13:54
    gallier2:
    Mr.Bob:
    All these comments about recursion is evil, he's calling strlen() twice, blah blah blah, and no one is pointing out the more serious problems like the lack of checking for a null pointer on entry or how passing an empty string causes the routine to write to invalid memory?

    Jeez, it's like griping about how they used Comic Sans to spell "Titanic" on the side of your ship...


    You should read the comments before posting. I mentioned the invalid memory overwrite in the second comment.


    I do see it there; my apologies, good sir.
  • Ilkka 2012-12-19 14:01
    Also it doesn't trim strings that contain only a single space. Although that might be by... "design", to put it politely.
  • AN AMAZING CODER 2012-12-19 14:07
    A developer:
    This is the sort of academic BS I had to do in university programming courses.
    For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.


    I have a buddy who's taking CS right now. All he's been talking about for the last 3 months is recursion recursion recursion. He said his professor wants him to do things this way.

    I've been wondering WTF is up with that, so I'm glad to hear I'm not the only one.

    I tried to explain to him that "recursion is dead simple, and just one way of solving something that you'll probably not use very often in practice". He responded with a question about recursion.
  • TGV 2012-12-19 14:16
    so what:
    Not sure I see the wtf. This works, and (knowing the financial world fairly well) will be called at most 2-3 times in recursion. Much better than iterating over the entire string to find the first space in the last group of spaces.

    No. The code has the potential to write in memory outside the string and calls strlen, which iterates the whole string, something you don't seem to understand, every iteration twice, except for the last one. Your programming license has been revoked. Please go do management.

    Mike:
    This comments section looks like it has mostly been written by a bunch of non programming former programmer trolls ... the code is executing with good performance in real time.

    Another one that doesn't get it. Please hand in your programming license.

    vahokif:
    It's tail recursive. gcc -O2 compiles it into a loop.

    Which, as pointed out, is a minor WTF in comparison to the other issues with the code.

  • Devastated of Rochdale 2012-12-19 14:19
    Words cannot express the weird gut-wrench I felt when my eyes alighted on this code in my RSS feed. I think it may have given me norovirus.

    I don't have anything constructive to add (sorry) because when I try to focus on the code my vision starts to blur, presumably in self-defense.

    Suffice it to say the real WTF may be that this is the first code snippet ever to make me feel the need to comment on it, even if that comment is little more than a howl of anguish.
  • Delve 2012-12-19 14:35
    Mr.Bob:
    All these comments about recursion is evil, he's calling strlen() twice, blah blah blah, and no one is pointing out the more serious problems like the lack of checking for a null pointer on entry or how passing an empty string causes the routine to write to invalid memory?

    Jeez, it's like griping about how they used Comic Sans to spell "Titanic" on the side of your ship...


    Except that if he had used a simple, straightforward loop it would have obviated the more serious problems he introduced. A string (or another array) is a loop-friendly data structure. Use the right tool for the job. What makes this TDWTF-worthy is the novel (hopefully) use of recursion as an anti-pattern. Everything else is just plain poor quality.
  • ¯\(°_o)/¯ I DUNNO LOL 2012-12-19 14:47
    As long as we're playing Code Golf...

    void trim_right(char *str)
    {
      char *end = str;
      while (*str) {
        if (*str++) != ' ') {
          end = str;
        }
      }
      *end = 0;
    }

    Why worry about size_t?
  • ronpaii 2012-12-19 14:49
    If you are optimizing, why replace all the white space with NUL instead of only the 1st space character on the right? In C a string ends with the 1st NUL. String function don't check the whole buffer looking for another string.
  • Evan 2012-12-19 14:55
    All you people yelling "tail recursion" don't understand what you're talking about.

    The problem isn't the lack of iteration -- TCO solves that nicely. The problem is that every recursive call is going to do another strlen() call. TCO does nothing to eliminate that.
  • Evan 2012-12-19 14:58
    Evan:
    All you people yelling "tail recursion" don't understand what you're talking about.

    The problem isn't the lack of iteration -- TCO solves that nicely. The problem is that every recursive call is going to do another strlen() call. TCO does nothing to eliminate that.

    (And, btw, as someone said earlier, this turns something that should be an O(n) operation into O(n^2) worst-case; in general, O(nm) where n is the length of the string and m is the number of trailing spaces.)
  • TheOptimizer 2012-12-19 14:59
    Thomas Kyte:
    ....

    buffer underwrite.... it doesn't work as intended.


    if it did any decent compiler optimizer would go "trail recursion? convert into loop".
  • TheOptimizer 2012-12-19 15:06
    A developer:
    This is the sort of academic BS I had to do in university programming courses.
    For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.


    try working with trees.
  • Todd 2012-12-19 15:25
    Devastated of Rochdale:
    Words cannot express the weird gut-wrench I felt when my eyes alighted on this code in my RSS feed. I think it may have given me norovirus.

    I don't have anything constructive to add (sorry) because when I try to focus on the code my vision starts to blur, presumably in self-defense.

    Suffice it to say the real WTF may be that this is the first code snippet ever to make me feel the need to comment on it, even if that comment is little more than a howl of anguish.

    I thought it was pretty neat.
  • Adin Falkhoff 2012-12-19 15:50
    Is APL even A Programmin Language?
  • A developer 2012-12-19 15:55
    And a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion......
    LOL
  • Hans 2012-12-19 15:57
    I think all of you need to google for "tail recursion". This is perfectly valid code, and a modern compiler is smart enough to optimize this.
  • Mr.Bob 2012-12-19 16:42
    ¯\(°_o)/¯ I DUNNO LOL:
    As long as we're playing Code Golf...

    void trim_right(char *str)
    {
      char *end = str;
      while (*str) {
        if (*str++) != ' ') {
          end = str;
        }
      }
      *end = 0;
    }

    Why worry about size_t?


    Only two nits to pick with your answer:
    1. Still doesn't handle the possibility that str is NULL. Twice.
    2. Strings are arrays of characters, please don't mix them with integer types.
  • The Dark Lord 2012-12-19 16:43
    Snowrat:
    Could've been written a bit better in APL, like
    trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵}
    or
    trim_right←{⌽{(∨\' '≠⍵)/⍵}⌽⍵}


    Finally some code I can understand on this forum! Thank you Sir. Oh I miss my APL.

    re·cur·sion
    /riˈkərZHən/
    Noun

    1.See "recursion"

  • LK 2012-12-19 17:02
    So why the strlen anyway ? Assembly-wise, strlen (the gallier2 version of course) is more efficient only if whitespace detection is difficult. If we're in 7-bit ASCII, it's not. We can either take 0x20 or take into account tabs and such and say <= 0x20 !

    Following is less elegant than gallier2. Is it more efficient...


    char *trim_right(char *str)
    {
    char *stop = NULL ;
    char *cursor = str ;
    char current = *cursor ;

    while (current != '\0') {
    int printable = (current > ' ') ;

    if (printable) { if (stop) stop=NULL ; }
    else if (!stop) { stop=cursor ; }

    cursor++ ;
    current = *cursor ;
    }

    if (stop) *stop = '\0' ;

    return str;
    }


    Nah, to be really efficient I'll provide trWTF (gag and throw up all of you anti-goto fanatics):


    #define PRINTABLE(x) (x > ' ')

    char *trim_right(char *str)
    {
    char *stop = NULL ;
    char *cursor = str ;
    char current ;

    goto First;

    NextUnstopped:
    cursor++ ;

    First:
    current = *cursor ;
    if (current == '\0') goto End;
    if PRINTABLE(current) goto NextUnstopped;
    stop = cursor ;

    FoundMoreWhitespace:
    cursor++ ;
    current = *cursor ;
    if (current == '\0') goto End;
    if (!PRINTABLE(current)) goto FoundMoreWhitespace ;
    stop = NULL ;
    goto NextUnstopped ;

    End:
    if (stop) *stop = '\0' ;
    return str;
    }


    Don't need no stinkin' optimizer. Unless maybe some modern instruction set... have to investigate that... or not :-)
  • LK 2012-12-19 17:06
    Reading Mr Bob's comment... so if (!str) return NULL; OK...
  • Evan 2012-12-19 17:34
    To everyone saying it's not much of a WTF performancewise: maybe some actual data will convince you. This is how bad it is, based on my own measurements. This is running on Linux, with GCC 4.7.2, with -O3. (-O2 doesn't make much of a difference.)

    The following shows, for strings of various lengths made entirely of spaces (BTW I didn't read the thread, did anyone point out it's buggy in this scenario?), how much longer the submitted version takes to run vs. ¯\(°_o)/¯ I DUNNO LOL's solution (quoted below by Mr. Bob), which I arbitrarily chose.

    length how dumb it is
    -------- --------------
    1,000 ~25x
    10,000 240x
    100,000 2,380x
    500,000 13,500x
    1,000,000 27,200x

    For the million-character string, the submitted version took about 33 seconds, and the reasonable solution took about 1.2 milliseconds.

    Now, maybe you say "well, a string of a million spaces, that's not a realistic input so I'm not going to worry", and that's not a totally unreasonable standpoint to take.

    However, to say "eh it's not dumb because the compiler will just optimize away the tail recursion" is overlooking the dumbness of taking a linear-time algorithm and turning it into a quadratic one. (Well, while not trading out a large constant -- e.g. Fibonacci heaps are required for Dijkstra's algorithm to have it's advertised complexity, but the constant factor of Fibonacci heaps over other kinds of heaps means in practice you'll usually get better performance with a higher-complexity version.)

    Play around with it yourself; you'll have to modify the way you do timing (look at QueryPerformanceCounter() or something) if you want to run it on Windows. http://pastebin.com/M5xZJYdd
  • Evan 2012-12-19 17:50
    Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more -- for the 1,000,000 character string, by another 17x, which gives a whopping 450,000x suckitude factor for the submitted version.

    Also, you don't need to have a string of all spaces for other methods to show performance advantages -- a single space at the end of a 10,000-character string is enough for gallier2's solution to register about a 1.5x improvement over the submitted version (though in that case the submitted version is actually ~4x faster than I DUNNO LOL's).
  • Scott 2012-12-19 17:53
    Not recursive, doesn't call any other functions, O(n), simple, works (as far as I can tell in a hot hurry). Apologies if somebody posted a similar solution already.

    char* trim_right( char* str )
    {
    int i = 0, j = 0;
    while ( str[i] != '\0' )
    {
    if ( str[i] != ' ' )
    {
    j = i;
    }
    ++i;
    }
    str[j + 1] = '\0';
    return str;
    }
  • Evan 2012-12-19 18:03
    Evan:
    Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more -- for the 1,000,000 character string, by another 17x, which gives a whopping 450,000x suckitude factor for the submitted version.

    Actually this is incorrect, somewhat. For the case of a string made entirely of spaces, gallier2's version walks backward over the string character by character and is also "slow". This means that multiplying the 17x improvement of gallier2's version on tests that are NOT all spaces (in that case it was 999,999 'a's followed by a single space) by the 27,000x or whatever improvement from my earlier table was just flat out wrong.

    However, on a string that's not all spaces, gallier2's version is indeed faster.

    (For the same reason, Scott's version he just posted is also slow in comparison to gallier2 -- about the same as I DUNNO LOL, or about 18x slower than gallier2's for the 999,999 'a's then a space test case. "Not calling other functions" (in this case, strlen()), is not a virtue.)
  • metadalek 2012-12-19 18:23
    I would use (assuming str is a valid string and p is a char *):

    if ((p = strrchr(str, ' ')) != NULL) *p = '\0';

  • Evan 2012-12-19 18:40
    metadalek:
    I would use (assuming str is a valid string and p is a char *):

    if ((p = strrchr(str, ' ')) != NULL) *p = '\0';

    That may be the buggiest version yet:
    "abc  " => "abc "
    
    " ab c" => " ab"
    " abc" => " "
  • A developer 2012-12-19 19:14
    Alright it's official.
    YOU'RE ALL A BUNCH OF NERDS!
  • Trollinator 2012-12-19 19:26
    TRWTF is the submitter who doesn't seem to understand tail recursion optimisation.
  • zdfxh 2012-12-19 19:38
    Who was it that said something like:
    "C has all the power and capability of Assembler with the elegance and readability of Assembler"
  • Jim 2012-12-19 19:42
    gallier2:
    Not cool to write a 0 byte before the buffer if the string has length 0 and there is a ' ' before it. This seems quite low odd case but it can happen if you use a pointer to scan a longer string with trailing blanks, and passing that pointer to that function.

    The recursion is of course another wtf.
    Except the line before that takes care of that....

    I wonder why he didn;t reuse len?
  • ase; dl 2012-12-19 19:44
    gallier2:
    It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills.
    C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).
    I can't see no stinkin' buffer underflow....
  • ase;dl 2012-12-19 19:46
    ase; dl:
    gallier2:
    It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills.
    C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).
    I can't see no stinkin' buffer underflow....
    although "len" is not actually len....
  • Norman Diamond 2012-12-19 19:49
    gallier2:
    Rick:
    Steve The Cynic:
    The worst "multiple interpretations of undefined code" I ever saw was this:
    unsigned char *bp = /* stuff */;
    
    unsigned char table[16] = { /* values */ };

    loop N times
    {
    *bp++ = (table[*bp & 0x0f] << 4) | table[*bp >> 4];
    }
    The x86-hosted to-MIPS cross-compiler I was using to bring this code to the controller CPU of an ATM switch (Asynchronous Transfer Mode, not Automated Teller Machine) evaluated bp++ completely (and stashed the before-increment value) before evaluating anything else, then evaluated the right-hand side using the incremented value of bp, then stored the result using the stashed before-increment value.

    Fixed:
    unsigned char *bp = /* stuff */;
    
    unsigned char table[16] = { /* values */ };

    loop N times
    {
    *bp = (table[*bp & 0x0f] << 4) | table[*bp >> 4];
    bp++;
    }
    Undefined behaviour: Just say, "No."
    The order is definitely defined. MIPS simply got it wrong. (Wow, haven't heard that name in a while.)
    No, the order in which the lvalue and the rvalue are evaluated are undefined. The only thing defined is that the assignment is done after these evaluations.

    see http://stackoverflow.com/questions/4362501/any-good-reason-why-assignment-operator-isnt-a-sequence-point
    for a detailed explanation.
    The analyses by MIPS, Steve, and Gallier are correct but understated. One more rule in the C standard is that the old value of bp shall be used only to determine the new value to be stored in bp, but the program sometimes (depending on the order of evaluation which can go either way) uses the old value in computing a different expression. So the behaviour is completely undefined. The assignment never has to be reached. The program could format your hard drives instead.

    P.S. Considering the amount of embedded systems in use now in TV remote controls, cars, and rice cookers, it might be a good idea to be aware of the continued existence of MIPS.
  • metadalek 2012-12-19 20:37
    Sorry, I assumed we have trailing white space.

    You obviously need an extra guard if that assumption is not correct.
  • Exercise 2012-12-19 20:40
    Rnd(:
    Does it make me bad coder if I prefer iterative loops or just loops for this sort of stuff?

    Also, if there is dynamic memory allocation I surely hope the lengths are fixed or originals are stored somewhere...
    there's an intriguing concept, a non-iterative loop....


    boolean x = false;
    /* We want a non-iterative loop */
    while(x==false) x = true;

    /* I guess this would be too */
    //don't loop
    while(x==false) printf("We didn't iterate here\n");
  • ase; dl 2012-12-19 20:40
    ase;dl:
    ase; dl:
    gallier2:
    It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills.
    C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).
    I can't see no stinkin' buffer underflow....
    although "len" is not actually len....
    OOPS - NOW I SEE IT!!!
  • jimmy 2012-12-19 20:46
    Mordred:
    C does guarantee order of evaluation of pre and post increment operators

    FTFY.

    C does guarantee order of evaluation of lots of other things, but.

    Some compilers will assume pre-increment means "before any statement on the line" while some will assume it means "in place, but before it is used for this particular operation".
    (for post :s/before/after/g)

    Pretty sure the Visual Studio one and MinGW are different (which many of you might have so play around with things like:

    A=8;B=8
    printf("%d\n", A++ + ++B);
    printf("%d\n", --A + B--);

    etc and compare the results on both of them....If you yhave multiple ++ (or --) on the same var on a row it becomes particularly interesting....
  • Evan 2012-12-19 21:40
    metadalek:
    Sorry, I assumed we have trailing white space.

    You obviously need an extra guard if that assumption is not correct.

    It doesn't work even in that case: if you have multiple spaces at the end, it will only remove the final one (as illustrated by my first test case).

    jimmy:
    Some compilers will assume pre-increment means "before any statement on the line" while some will assume it means "in place, but before it is used for this particular operation".
    (for post :s/before/after/g)

    Pretty sure the Visual Studio one and MinGW are different (which many of you might have so play around with things like:

    A=8;B=8
    printf("%d\n", A++ + ++B);
    printf("%d\n", --A + B--);

    etc and compare the results on both of them....If you yhave multiple ++ (or --) on the same var on a row it becomes particularly interesting....

    And Norman Diamond's comment is correct. Unlike some kinds of implementation-defined behavior (does foo(bar(), baz()) call bar() or baz() first? the standard doesn't say), if you have multiple increments, decrements, assignments (i = i++ or (i=7) + (i=4)), etc. to the same variable without an intervening sequence point, the entire behavior is undefined. It doesn't even have to pick one.

    For instance, in practice, i = i++ will likely either leave the value of i unchanged or increment it, but it needn't do either. Once a program encounters truly undefined behavior, it can do anything. That could crash your program. It could mail your porn stash to your mother, and it would be totally allowed by the C standard.
  • Evan 2012-12-19 21:50
    jimmy:
    A=8;B=8
    printf("%d\n", A++ + ++B);
    printf("%d\n", --A + B--);

    Actually looking at this code again I think that this has deterministic behavior according to the standard; if I'm correct, that is required to print 17\n17\n and leave A == B == 8 after those three lines.

    You might be looking for something like
    A = 8;
    
    printf("%d %d\n", A++, A);

    which is allowed to print either "8 9" or "8 8" (but is required to do one or the other).
  • ¯\(°_o)/¯ I DUNNO LOL 2012-12-19 22:13
    Evan:
    Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more
    You mean 397535?

    Do you even know how strlen works? Hint: strlen is an O(n) operation.

    What I wrote is nothing more than a version of strlen that remembers the last start of a group of blanks instead of counting the length.
  • Coyne 2012-12-19 22:18
    Garrison Fiord:

    It appears you overlooked the word "almost".


    Ummm..yes, actually I did. Sorry about that; your qualifier was correctly present.
  • Coyne 2012-12-19 22:25
    Steve The Cynic:
    Coyne:
    And, of course, there is the inimitable "Towers of Hanoi". A toy problem, yes, but if you want a "head hurts" experience try implementing that without recursion sometime.

    Ah, yes, Arkey-malarkey... No, it's not even hard to do without recursion.

    At any point, there are at most two legal moves. It is always legal to move the smallest disk, because no disk is smaller (and therefore blocking). The other two exposed disks are different sizes, by definition, and you can move the smaller of them onto the larger.

    The non-recursive solution is, therefore:

    Every other move moves the smallest disk. All the other moves make the other legal move.

    The only question, then, is which way to move the smallest disk.

    If the goal is to use peg B to move the stack from peg A to peg C, then:
    1. If the number of disks is odd, the smallest disk moves A->C->B->A
    2. If the number of disks is even, the smallest disk moves A->B->C->A


    Okay, when I was working this problem, I didn't know about the iterative solution. But I looked at it and the variations on Wikipedia, and it seems to me that any expression of these algorithms would involve tracking arrays and some fairly complex decision logic. Just for example, you have to know how many disks in advance, and what the top disk on each stack is.

    The recursive solver, which I hacked together again, requires one solitary condition (marked below):


    public class TowersOfHanoi {

    static int step = 0;

    /**
    * Output the steps for a Tower of Hanoi stack move from one of 3 pegs to the another of 3 pegs.
    * @param from peg to move a stack from
    * @param to peg to move a stack to
    * @param work peg to use as an intermediate
    * @param count number of disks to move from "from" to "to"
    */
    public static void moveTower(int from, int to, int work, int count)
    {
    if (count == 1)
    {
    System.out.println("Step " + (++step) + ": Move disk from peg " + from + " to peg " + to + ".");
    }
    else
    {
    moveTower(from, work, to, count-1);
    moveTower(from, to, work, 1);
    moveTower(work, to, from, count-1);
    }
    }

    /**
    * @param args
    */
    public static void main(String[] args) {
    moveTower( 1, // move from peg 1
    3, // move to peg 3
    2, // use peg 2 as the initial intermediate
    7); // move 7 disks total
    }

    }


    Hmmm....
  • Lex Luthor 2012-12-19 22:26
    Agreed. This is not a WTF. Nothing wrong with recursion in C, but possibly person who reported it.
  • ¯\(°_o)/¯ I DUNNO LOL 2012-12-19 22:34
    FYI, what you were seeing was the effect of strlen using the x86 string instructions on your specific computer. Try it on an ARM-based system sometime. Optimization is not an all or nothing situation. The same code could run faster on some systems and slower on others.

    And are you seriously implying that someone would (intentionally) use this kind of function on a string of more than about 256 bytes or so? At that point, the advantage of the string instructions is lost.
    1,000 ~25x

    Does that mean it runs in negative time? I don't think that math means what you think it means. You probably meant 1/25x, or 25 times FASTER. Guess what? It's not going to be called to nip off blanks at the end of a 1 megabyte block of memory, unless there's a bug that overwrote the null at the end. There's no point in such an operation. In all sane use cases, my code is... faster!

    Benchmarks are useless unless you know WHY something is faster or slower. Benchmarks are also useless if you're doing the equivalent of feeding mice a year's worth of cyclamates every day.
  • Evan 2012-12-19 22:40
    ¯\(°_o)/¯ I DUNNO LOL:
    Do you even know how strlen works? Hint: strlen is an O(n) operation.

    What I wrote is nothing more than a version of strlen that remembers the last start of a group of blanks instead of counting the length.

    Real strlen() is more efficient than just reading a string character-by-character in a loop. There's a reason that glibc's strlen(), for example, is closer to 50 loc than 5.

    Lex Luthor:
    Agreed. This is not a WTF. Nothing wrong with recursion in C, but possibly person who reported it.

    Why does everyone seem to think that "the author used recursion" is the wtf instead of "it's buggy and horribly inefficient [not due directly to the recursion]"?
  • Evan 2012-12-19 22:49
    In all sane use cases, my code is... faster!

    The post with the chart was comparing your (reasonable) solution with the (dumb) solution from the story submission; your code is faster than that in all cases that I could measure. :-) For 1,000 characters, 25 times faster than the dumb submission. For 1,000,000 characters, 27000 times faster.

    FYI, what you were seeing was the effect of strlen using the x86 string instructions on your specific computer. Try it on an ARM-based system sometime. Optimization is not an all or nothing situation. The same code could run faster on some systems and slower on others.

    This is true, and it's applicable to my later, more informal, comparison to gallier2's strlen-based solution. It's possible that in other architectures, a typical real strlen would be closer to what you did.

    However, the improvements I cited were not due to the x86 string instructions, which are not (to my knowledge) used by glibc. (Aside: perhaps a strlen() in code does not actually call glibc's strlen but invokes an intrinsic which uses them? Perhaps. I'm too lazy to check.)

    And are you seriously implying that someone would (intentionally) use this kind of function on a string of more than about 256 bytes or so?

    I could see someone using it on input, which means that the length of the string is in the hands of the user rather than the developer.

    Does that mean it runs in negative time? I don't think that math means what you think it means. You probably meant 1/25x, or 25 times FASTER.

    Sorry, that ~ is "about"; the execution time of your solution with 1000 bytes is comparable to the frequency of the timer I was using (1 microsecond) and so the difference was usually either around 18x or 36x depending on whether yours took just one microsecond or two, so I just sorta picked something in the middle. :-)
  • Hacky 2012-12-19 23:24
    Could in fact corrupt heap or segv by writing to memory before string I believe.
  • Igel 2012-12-19 23:37
    it's machinations of evil lispers =)
  • ¯\(°_o)/¯ I DUNNO LOL 2012-12-20 00:00
    Ah, that was a tilde at low resolution.

    I thought you were comparing mine vs gallier2's, when you were comparing mine vs the article's. FWIW, gallier2's would strictly be slower than mine when given a buffer full of nothing but blanks, because I never have to backtrack. If you want your benchmark to be meaningful, you have to try both optimal and pessimal cases.

    strlen's speed is not only dependent upon the library quality, it is even possible for a compiler to detect it and generate inline string instructions. But without string instructions, it'll be little better than inlining "*p=str; len=0; while(*p++) len++; return len;"

    Also, using array indexing in a loop may potentially generate much worse code than using a pointer if the compiler can't identify the looping and factor it out into a simple pointer. On some architectures it may be the same speed or faster to use indexing over pointers. (Keep the base address in a register, add the offset in an addressing mode.)

    That would explain the sharp speed difference in the benchmark, which I mistakenly thought was gallier2's and due to the strlen call.

    My code has the advantage of being very easy for a half decent compiler to optimize (aside from not likely being able to use string instructions) into very lean code. Everything can be kept in registers without touching the stack, and even the if statement in the loop would take advantage of ARM's conditional execution.

    But really, if it isn't used more than a few times a second, and for short strings, then validity and correctness is far more important than speed.
  • Evan 2012-12-20 00:10
    ¯\(°_o)/¯ I DUNNO LOL:
    I thought you were comparing mine vs gallier2's, when you were comparing mine vs the article's. FWIW, gallier2's would strictly be slower than mine when given a buffer full of nothing but blanks, because I never have to backtrack. If you want your benchmark to be meaningful, you have to try both optimal and pessimal cases.

    Instead of cleaning my apartment or doing something useful, I'm putting together a benchmark set of all of the proposed solutions to trim_right; it'll involve a few different cases; all spaces, no spaces, and in between.
  • Evan 2012-12-20 00:11
    Also, I apologize about not being as clear as I could have been. I'm much less interested in comparing yours vs strlen-based solutions; mostly I'm trying to point out why everyone going "but compilers have tail call optimization!" is barking up the wrong tree.
  • Curious 2012-12-20 00:21
    What happens when an empty (0 length) string is provided... buffer overflow... errr.. underflow?
  • Name Withheld 2012-12-20 00:31
    I admit this code is horribly inefficient, but I just tested it with 53000 spaces, and it completed in less than a millisecond.

    Don't think they have much to worry about.
  • Brendan 2012-12-20 00:35
    Hi,

    LK:

    #define PRINTABLE(x) (x > ' ')

    char *trim_right(char *str)
    {
    char *stop = NULL ;
    char *cursor = str ;
    char current ;

    goto First;

    NextUnstopped:
    cursor++ ;

    First:
    current = *cursor ;
    if (current == '\0') goto End;
    if PRINTABLE(current) goto NextUnstopped;
    stop = cursor ;

    FoundMoreWhitespace:
    cursor++ ;
    current = *cursor ;
    if (current == '\0') goto End;
    if (!PRINTABLE(current)) goto FoundMoreWhitespace ;
    stop = NULL ;
    goto NextUnstopped ;

    End:
    if (stop) *stop = '\0' ;
    return str;
    }


    Don't need no stinkin' optimizer. Unless maybe some modern instruction set... have to investigate that... or not :-)


    That is inefficient - the "if (stop)" can be avoided:

    char *trim_right(char *str)
    
    {
    char *stop;

    next:
    while(!isspace(*str)) {
    if(*str == '\0') return;
    str++;
    }
    stop = str;
    do {
    str++;
    if(*str == '\0') {
    *stop = '\0';
    return;
    }
    } while(isspace(*str));
    goto next;
    }
  • Name Withheld 2012-12-20 00:43
    Sorry that was 5,000. Not 50,000! ;-)
  • Evan 2012-12-20 02:04
    OK, in case anyone wants some experimental evaluation for the proposed
    solutions, I have timing and test
    data
    for every full solution I've seen that I didn't think was
    an obvious joke. (Aufgehaben's may have been a joke too but I included
    it anyway.)

    There are two tables. The top table shows correctness tests for
    several inputs. Correct outputs are shown as green <OK>s; incorrect
    outputs are shown as the result I got. In both inputs (column
    headings) and outputs, spaces are replaced by hyphens. The empty
    string "" is shown as (empty). For example, the article's
    implementation, when given the string " ", outputs " ", which is
    incorrect.

    The lower table shows timing results, in seconds. In the tests,
    exponents denote repetition. I didn't do it very clearly as I don't
    mark the grouping; the first column is shown as "a-^500"; what that
    means is the string "a-" repeated 500 times; the final column means
    499,995 'a's followed by five spaces. (Spaces are again replaced by
    '-'s, of course.)

    I'm not sure of how stable the numbers are; that's from one run
    only. My numbers are from the latest version of MSVC, running on my
    pretty old box.

    And of course it's TOTALLY possible that I screwed something up, so if you think I'm misrepresenting your solution let me know why. :-) (I did have to make very minor changes to a couple for people who changed the return type to void.)


    ---

    If you want to play around with this, there are three parts. The first is
    C++ code that runs each of the
    tests and reports their times in JSON format. For timing, I use the
    timer class available
    from here
    (direct
    link
    ). The JSON can be converted into a readable HTML file by
    this Python script (it reads
    input from stdin and outputs to stdout).

    I didn't try compiling with anything but MSVC, but I also am unaware
    of any reason it won't compile with GCC 4.5+ with -std=c++0x (I use
    the C++11 for loop). To add a new solution, simply add a new block like
    TRIM_RIGHT("test name", "url")
    
    {
    body;
    }

    and it will automatically run it.
  • Eric 2012-12-20 02:09
    A nice example of schlemiel-the-painter's algorithm (mixed with some buffer boundary ignorance)

    http://en.wikipedia.org/wiki/Schlemiel_the_Painter's_algorithm

    (Damn you askimet, this is getting annoying!)
  • Evan 2012-12-20 02:20
    1) Oh yeah, the big change I had to do was to PleegWat's solution, which just enters an infinite loop because it never increments the loop pointer. I stuck an increment in there but didn't really give it much thought as to where it goes, so that may be why it fails so many correctness tests. Not necessarily PleegWat's fault entirely.

    2) Sorry about the formatting of my previous post; I forgot to remove the hard newlines that were added as I edited it in emacs.

    3) I figured out what prompted the discussion about C evaluation order, and it should have been resolved with Steve the Cynic's and gallier2's posts on the first page.
  • Parasietje 2012-12-20 03:20
    Since this is tail-recursion, a good compiler should optimize this and reuse the stack frame.
    I do not know if C has good compilers though.
  • Konstantin Lopyrev 2012-12-20 04:10
    Omg, somebody is wrong on the Internet.
  • gallier2 2012-12-20 04:21
    ¯\(°_o)/¯ I DUNNO LOL:
    Evan:
    Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more
    You mean 397535?

    Do you even know how strlen works? Hint: strlen is an O(n) operation.

    What I wrote is nothing more than a version of strlen that remembers the last start of a group of blanks instead of counting the length.


    I'm sure EVAN knows that. The thing is, strlen's implementation in most compilers is optimized to death. I've looked at some implementations where it scans 8 bytes in one loop iteration. This means the loop is around 8 times faster as if you scan byte by byte. Then they also took care to limit the number of failed branch predictions, they added prefetch instruction, to limit the number of cache misses. On long strings it makes a hell of a big difference.
    Furthermore, the compiler has informations about standard functions that it hasn't (generaly) for hand-written one.
    The compiler can use intrinsics (Microsoft compilers did that already in DOS times), which means they replaced the function call by ad-hoc code. The compiler also knows that the function strlen is pure. That means that 2 consecutive calls to the function will return the same value if fed the same parameters.

    l = strlen(x);
    l2 = strlen(x);

    will call strlen only once, an optimisation it cannot perform in the general case (it would be fatal for rand() for example).


  • Brendan 2012-12-20 04:39
    Hi,

    Evan:
    OK, in case anyone wants some experimental evaluation for the proposed
    solutions, I have timing and test
    data
    for every full solution I've seen that I didn't think was
    an obvious joke. (Aufgehaben's may have been a joke too but I included
    it anyway.)


    Wow - I'm impressed!

    However, if we're testing speeds then I'd like to make them functionally "more equivalent". Specifically, I'd like to replace "isspace()" with "*str == ' '" (or equivalent as appropriate) in my entry, as I suspect this is causing extra overhead.

    Note: I tried g++ version 4.5.4 and it doesn't like range-based for loops at all, so the code wouldn't compile. Newer versions of g++ (that do support range-based for loops) haven't hit repositories yet.

    -Brendan
  • gallier2 2012-12-20 04:45
    I read the other comments you made and I see that you are aware of all my pontifications I made above, sorry for sounding a bit arrogant.

    EVAN, however, did the right thing. He tested with various inputs and I'm actually really surprised my solution was that good overall.
    I am fully aware that it has O(nm) complexity (n number chars, m number of trailing blanks) but as ever the complexity is not the only thing to consider. The constant factor, is imo often more important than the O() class.
  • Krzysiek 2012-12-20 05:51
    lunaryorn:
    gallier2:
    […] one should not rely on compiler optimizations (not guaranteed by the language standard) to excuse poor algorithmic skills.

    Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.


    Please differentiate between functional languages that depend on recursion because lack of side effects and guarantee tail recursion optimisation and imperative languages that give no TRO guarantee and can bust the stack with sufficiently deep recursion.

    I would like financial transaction systems not to crash because of strings containing 2k spaces, even if encountering such strings seems improbable.
  • TGV 2012-12-20 06:24
    Evan:
    Lex Luthor:
    Agreed. This is not a WTF. Nothing wrong with recursion in C, but possibly person who reported it.

    Why does everyone seem to think that "the author used recursion" is the wtf instead of "it's buggy and horribly inefficient [not due directly to the recursion]"?

    Probably because (nearly) everyone is a total moron? Since (nearly) everyone only spots problems by looking for unusual constructions?
    Recursion looks odd, strlen doesn't, so they can stop spotting problems. Since tail recursion is not a big WTF, and some are even so clever to remember that it can be optimized away, they think they look so much smarter by saying "that's not a WTF".
    Bit sad, innit?
  • katastrofa 2012-12-20 07:37
    lunaryorn:
    gallier2:
    […] one should not rely on compiler optimizations to excuse poor algorithmic skills.

    Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.


    Try mentioning Haskell in a discussion about performance and realtime computing... be prepared to be laughed at.
  • Cbuttius 2012-12-20 09:11
    The recursive trim is worst-case O(N²) as strlen is O(N).

    Which is why the trim function would be more efficient to implement in C++ than in C as you can get to the end of a std::string in constant time as the std::string object stores its length so doesn't need to traverse the whole buffer.

    Of course you would pass in a non-const reference to your trim function.

    string has several find methods including find_last_not_of which will enable you to find where the end of the string is. You can then simply resize your string to the returned value + 1. So something like:

    std::string& trim_right( std::string& str )
    {
    std::string::size_type pos = str.find_last_not_of(" \r\t\n");
    if( pos != std::string::npos )
    str.resize( pos + 1 );
    }


    One could also use std::find_if with a `not` adapter to `isspace` using rbegin and rend on your iterator. That would return you an iterator to the last non-space character (or rend if there are none). You can then convert this to a regular iterator and use erase to erase these from your string.

    Finally, one could use boost::regex, (or std::regex if available). Unfortunately far too many would choose that option. It's bettr than the recursive whitespace removal but far inferior to a basic loop or algorithm use.

  • Recursive Reclusive 2012-12-20 09:36
    Scott:
    Not recursive, doesn't call any other functions, O(n), simple, works (as far as I can tell in a hot hurry). Apologies if somebody posted a similar solution already.

    char* trim_right( char* str )
    {
    int i = 0, j = 0;
    while ( str[i] != '\0' )
    {
    if ( str[i] != ' ' )
    {
    j = i;
    }
    ++i;
    }
    str[j + 1] = '\0';
    return str;
    }


    This is similar to I DUNNO LOL's solution and like it, it does bad things to string without spaces.
  • gallier2 2012-12-20 10:16
    Recursive Reclusive:
    Scott:
    Not recursive, doesn't call any other functions, O(n), simple, works (as far as I can tell in a hot hurry). Apologies if somebody posted a similar solution already.

    char* trim_right( char* str )
    {
    int i = 0, j = 0;
    while ( str[i] != '\0' )
    {
    if ( str[i] != ' ' )
    {
    j = i;
    }
    ++i;
    }
    str[j + 1] = '\0';
    return str;
    }


    This is similar to I DUNNO LOL's solution and like it, it does bad things to string without spaces.


    Wow, indeed.

    DUNNO LOL's truncates the string to length 0, Scott's to length 1.
    I think EVAN should add that test in his conformity table at http://jsbin.com/ewelah/1
  • The Recursion King 2012-12-20 10:52
    It's because his index variable J is set to 0 initially.

    For anyone wondering when recursion might come in handy, think about parsing trees with branches of arbitrary depths.
  • TGV 2012-12-20 11:05
    The Recursion King:
    It's because his index variable J is set to 0 initially.

    For anyone wondering when recursion might come in handy, think about parsing trees with branches of arbitrary depths.

    I do my parsing with an LALR automaton and a stack. In FORTRAN.
  • Scott 2012-12-20 12:28
    Thanks Evan for the analysis. Rather informative (I am being sincere, not sarcastic). Now that I have time to follow up...

    * I was in such a hurry I forgot this forum strips tabs out of submissions--sorry my post is so ugly.

    * Oops! Boy did I mess up the strings with only spaces. Initialize j to -1 and it works a lot better.

    * Regardless, I like the implementations with pointers (from I-DUNNO-LOL) better than my index-based solution, even if they very likely optimize down to the same execution speed. So forget the index-based solution--pointers are probably better.

    * Calling strlen() is still not the best solution because of the scan-out and scan-back situation. The gallier2 solution only beats mine (marginally) in speed for most cases because (as somebody pointed out), strlen() is very optimized. If I were offering an ANSI-C level of solution in terms of hand optimization, I would have simply done the single scan forward the same way strlen() does--by doing it 32 or 64 bits at a time, or detecting architecture and using inline assembly. Without calling strlen() first. But that would have resulted in a function that is a lot harder to understand and not as fit for forum discussion. Some things are more important than raw speed.

    * It's better to NOT check for NULL input on functions like this, in keeping with the speed of ANSI-C standard string functions. If somebody calls trim_right() a zillion times with strings that the caller *knows* could not possibly be NULL, it's just a waste of cycles. If the caller suspects the input strings might be NULL, it's the caller's responsibility to check for NULL before calling the function. (Other standard functions like free() are called so frequently with NULL inputs that it makes sense to do the NULL check inside free() to avoid polluting the calling code).

    * Recursive-Reclusive: Huh? I tested for strings without spaces, and it worked fine (even my broken j=0 version worked for strings without spaces).

    * Agreed--the WTF is not the recursion (although that's the worse way to do it), but the inefficiency.

    And again, thanks Evan for the analysis.
  • PiisAWheeL 2012-12-20 13:51
    Garrison Fiord:
    Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.
    Spoken like a man who has never coded in javascript and didn't want to lock up a browser.
  • Evan 2012-12-20 14:29
    Brendan:
    Wow - I'm impressed!

    However, if we're testing speeds then I'd like to make them functionally "more equivalent". Specifically, I'd like to replace "isspace()" with "*str == ' '" (or equivalent as appropriate) in my entry, as I suspect this is causing extra overhead.

    Note: I tried g++ version 4.5.4 and it doesn't like range-based for loops at all, so the code wouldn't compile. Newer versions of g++ (that do support range-based for loops) haven't hit repositories yet.
    Sorry, 4.6 is what added support.

    If I make an update I'll do it.


    Cbuttius:
    The recursive trim is worst-case O(N²) as strlen is O(N).

    Let's be clear; it's only this dumb recursive implementation which is quadratic, and it's not really because of the recursion: a straightforward transformation into a loop won't help (as about 17,000 people have pointed out, the compiler will do that for you), and it would be possible to make a recursive version that wasn't stupid.

    (Also I'd add your std::string-based one to the tests, but it has a different signature and so won't work like that. :-))

    gallier2:
    I think EVAN should add that test in his conformity table at http://jsbin.com/ewelah/1

    Actually I meant to have a test for just "abc" and then forgot about it. Oops.

    Scott:
    * I was in such a hurry I forgot this forum strips tabs out of submissions--sorry my post is so ugly.

    It's C, not Python. No biggie. :-)

    (Not spam not spam not spam?)
  • Scott 2012-12-20 15:40
    Scott:
    * I was in such a hurry I forgot this forum strips tabs out of submissions--sorry my post is so ugly.

    It's C, not Python. No biggie. :-)

    Well then it would have been
    'hello   '.rstrip(' ')
    :)

    And I've thought about it more and I'm starting to turn to the side of the gallier2 solution calling strlen() being better overall. I think I agree that when the compiler sees strlen() it's free to do a lot more optimizations, and it's not like the little trim_right() snippet is unclear for discussions.

    Maybe I'm just tainted because my favorite interview question is to ask a candidate to write "strcpy()" and I'm surprised how many set up a for-loop with strlen() in the terminating condition, without understanding how strlen() works.

    Good discussion, and it's refreshing to finally see a C-related WTF for a change.

    Incidentally, I ran my version and gallier2's on my 8-bit target and for some reason mine beat the socks off gallier2's in every case.

  • Josh 2012-12-20 16:16
    Another Lisp programmer enters the workforce.
  • gnasher729 2012-12-20 17:37
    Sean:
    This is C; it's going to run pretty darn fast. And no matter how you implement it (or let someone else implement it) string processing by its very nature involves looping over characters until you do or don't find something. So those who say "OMFG what if there's a lot of spaces" are in the same category as those who say "moar XML!" -- they just don't understand what's actually happening inside the machine.

    So this boils down to which costs more: the CPU time to execute an allegedly unnecessary stack push and pop, or the developer time to take a solved problem and write a "more elegant" implementation?


    If the string passed in consists of n space characters, this function will do about n recursive calls, examine 1.5 n^2 characters, and return a string that ends in a space character. Except when n = 0 where it can crash. So we are not looking for a "more elegant" implementation of a solved problem. We are looking for a solution that is not stupidly complicated, that isn't stupidly slow (if I can pass a million spaces to that function I have a nice DoS attack), and that is actually correct.
  • gnasher729 2012-12-20 17:53
    Xarthaneon the Unclear:
    But, while I agree my rewrite is not as graceful, other than the strlen issue, fixed as such:

    char* trim_right(char* str)
    
    {
    for(int i = ((int)strlen(str)) - 1; i >= 0; i--)
    if(str[i] == ' ')
    str[i] = '\0';

    return str;
    }


    how is it wrong?


    This will produce an empty string whenever the first character is a space character.
  • Recursive Reclusive 2012-12-20 18:04
    Scott:

    * Recursive-Reclusive: Huh? I tested for strings without spaces, and it worked fine (even my broken j=0 version worked for strings without spaces).

    If there are no spaces in the string, j remains zero. So
    str[j + 1] = '\0'

    truncates the string to lenght one. Easily to fix, but it's a big bug if you don't.
  • Bill C. 2012-12-20 18:59
    Who would want to eliminate tail recursion? That sounds as dumb as removing white space, which is more pleasant to keep occupied again and again. Recursion, iteration, any traversal of tail is fine with me.
  • Scott 2012-12-20 19:22
    Recursive Reclusive:
    Scott:

    * Recursive-Reclusive: Huh? I tested for strings without spaces, and it worked fine (even my broken j=0 version worked for strings without spaces).

    If there are no spaces in the string, j remains zero. So
    str[j + 1] = '\0'

    truncates the string to lenght one. Easily to fix, but it's a big bug if you don't.


    No, it works. Here's the code again (only diff: I fixed j=0 to be j=-1. That fixes the string consisting of only spaces, but makes no difference in the string that contains *no* spaces).


    char* trim_right( char* str )
    {
    int i = 0, j = -1;
    while ( str[i] != '\0' )
    {
    if ( str[i] != ' ' )
    {
    j = i;
    }
    ++i;
    }
    str[j + 1] = '\0';
    return str;
    }


    Feed it "hello" or something. On i == 0, str[i]!=' ' is true and sets j. Compile it and try it. Try it with j=0 instead of -1; it fails the only-spaces case, but not the no-spaces case.

    Here's my test stub:

    #include <stdio.h>
    #include <string.h>

    int main()
    {
    int i;
    char tests[9][20];
    strcpy( tests[0], "hello there " );
    strcpy( tests[1], "hello there " );
    strcpy( tests[2], "hello there" );
    strcpy( tests[3], "hello " );
    strcpy( tests[4], "hello" );
    strcpy( tests[5], "h " );
    strcpy( tests[5], "h" );
    strcpy( tests[6], " " );
    strcpy( tests[7], " " );
    strcpy( tests[8], "" );

    for ( i = 0; i < 9; ++i )
    {
    printf( "|%s| --> ", tests[i] );
    printf( "|%s|\n", trim_right( tests[i] ) );
    }
    }


  • Scott 2012-12-20 19:27
    Follow up, Recursive Reclusive: I hope you get back to me on this because if there is a case I'm missing (or am just misunderstanding something), I do want to learn. Thanks!
  • gallier2 2012-12-21 03:54
    Scott:

    Incidentally, I ran my version and gallier2's on my 8-bit target and for some reason mine beat the socks off gallier2's in every case.



    I didn't put much reflexion when writing my submission on page 1. But it was written in the style that, from my experience, suits best big servers. I work on Oracle M5000 machines (Solaris SPARC 64 bit). Had the routine been written with microcontrollers (those without cache and that are not super-scalar) in mind, I would certainly have written it differently. Probably much more like your version.
    But as already said, I didn't put much optimisation effort when I wrote that snippet, my biggest emphasizes was to write a correct version.
  • Scott 2012-12-21 09:59
    Well you certainly beat me on correctness! (At least the first time--I got it fixed. It's the one test case I forgot to test for).

    I was kidding about the 8-bit thing; it was more a suggestion that strlen() probably isn't much more efficient then the super-dumbed down char-by-char for an 8-bit architecture. But for the nominal case (32/64-bit), a good optimized strlen() ended up making your version faster overall.

    Anyway, good show!

  • Mr.Bob 2012-12-21 11:37
    Recursive Reclusive:
    Scott:
    Not recursive, doesn't call any other functions, O(n), simple, works (as far as I can tell in a hot hurry). Apologies if somebody posted a similar solution already.

    char* trim_right( char* str )
    {
    int i = 0, j = 0;
    while ( str[i] != '\0' )
    {
    if ( str[i] != ' ' )
    {
    j = i;
    }
    ++i;
    }
    str[j + 1] = '\0';
    return str;
    }


    This is similar to I DUNNO LOL's solution and like it, it does bad things to string without spaces.


    Ok, three days of these solutions, and I promise I will get off my hobby horse and leave it be after this.

    None of these "fixes" to the original code still do not handle null strings passed into the code. If str is NULL and you are lucky, this line:

    while ( str[i] != '\0' )


    causes a segfault which will terminate the process. If you're not lucky enough to be on a platform that traps such things, you've just stomped on whatever data lives at memory offset 0, and worse, returned that data to the caller to be mangled further.

    It doesn't matter if the algorithm is recursive, iterative, unrolled looping, optimized for word size, etc. This is Defensive Coding 101.
  • Kuba 2012-12-21 12:08
    gallier2:
    one should not rely on compiler optimizations to excuse poor algorithmic skills.
    Tail recursion is a valid programming technique, and IMHO it's valid to depend on the compiler doing the right thing.
  • Kuba 2012-12-21 12:27
    Steve The Cynic:
    And C does not have to have a contiguous stack for parameter passing and local variables. Mandating such a thing would make writing C compilers for System/370 difficult, since these machines don't even have a direct concept of a stack. And even on an architecture with a machine stack, it isn't mandatory to put the activation records on the machine stack. Most systems do it that way because it's faster than allocating from the free store / heap for it, but it isn't required behaviour.
    Not only the compiler for System/370 would be hard. A lot of code for small microcontrollers is compiled without use of stack for anything but storing return addresses. I have plenty of embedded code with no reentrant functions at all -- in such a case, all parameter passing and local data storage is done via fixed RAM locations. Those locations are overlaid by the linker based on the call tree. On some architectures this is faster and easier than indexing relative to a stack frame.

    Never mind that you free the register that would be otherwise used to store the stack frame, and you don't have to keep that register updated on each function entry and exit. Heck, for functions called from one call site only, the compiler automatically inlines as that saves a couple cycles and a couple code space bytes. It does matter when you have 512 bytes for your code, and sometimes it does matter to have only that many bytes when you can put you application into a 6 pin part that costs less than $1 even in small quantities.
  • Scott 2012-12-21 12:36
    Mr.Bob:

    None of these "fixes" to the original code still do not handle null strings passed into the code.


    (Excerpted from above...)
    * It's better to NOT check for NULL input on functions like this, in keeping with the speed of ANSI-C standard string functions. If somebody calls trim_right() a zillion times with strings that the caller *knows* could not possibly be NULL, it's just a waste of cycles. If the caller suspects the input strings might be NULL, it's the caller's responsibility to check for NULL before calling the function. (Other standard functions like free() are called so frequently with NULL inputs that it makes sense to do the NULL check inside free() to avoid polluting the calling code).

    This is Defensive Coding 101.


    And when we graduate to Intelligent Coding 201... :)

    Seriously, I know we all have different opinions on the matter of checking inputs, and that's fine, I guess. My opinion is that there are basically three levels of understanding NULL. 1. You don't know about NULL inputs; your code reflects this ignorance because it's completely ignored. 2. You know about NULL and do everything you can to do cover it up* when you get one. 3. You know about all that but choose to ignore the NULL case for speed, code clarity, debugging*, etc. Unfortunately your code looks like #1 so the #2 people just think you're an amateur.

    *What does one do when one encounters a NULL? Silently ignore it? printf()? How about this: if I run a program and it segfaults, I fire it up in gdb, and guess what? The real problem is usually two functions back in the backtrace. I say if the program is wrong, it should fail big, fast, and spectacularly--it makes it easier to find the problem. If you keep intercepting bad inputs at the low level, you cover problems and make them hard to find.

    Caveat: This doesn't apply to all cases, and if I were writing for a pacemaker or a space shuttle, I'd approach it differently.

  • Scott 2012-12-21 12:40
    And please stop quoting the Recursive Reclusive post involving my code that says "it does bad things to string without spaces". He's wrong.

    I corrected the j=0 problem in a later post, which fixes the "strings with only spaces" problem (j=-1 works better), but the "strings without spaces" has always worked.
  • gallier2 2012-12-21 12:42
    Mr.Bob:

    causes a segfault which will terminate the process. If you're not lucky enough to be on a platform that traps such things, you've just stomped on whatever data lives at memory offset 0, and worse, returned that data to the caller to be mangled further.

    It doesn't matter if the algorithm is recursive, iterative, unrolled looping, optimized for word size, etc. This is Defensive Coding 101.


    As someone already pointed out. The design of the function demands not to check for NULL pointer. All string and mem functions (except free) in the standard library do not check for NULL pointers. It is a design choice in line with the whole C idea. If the caller wants to avoid a segfault (or worse) it can check if it wants without burdening the 99.9% of the use cases with an unnecessary check. Segfaulting is often a better choice than propagating an error condition, that no function above, in the call stack, knows how to deal with.
    If one wants an always checking everything language then they should do java, C# or whatever fancy interpreted language.
  • gallier2 2012-12-21 12:45
    Kuba:
    gallier2:
    one should not rely on compiler optimizations to excuse poor algorithmic skills.
    Tail recursion is a valid programming technique, and IMHO it's valid to depend on the compiler doing the right thing.


    Is this a mayan calendar thing? After 2 pages of comments the same questions and comments come back and back again and again. That point was already answered more than once before, by me and by others.
  • JJ 2012-12-21 17:14
    Mr.Bob:
    Jeez, it's like griping about how they used Comic Sans to spell "Titanic" on the side of your ship...

    That is a perfectly valid gripe.
  • Mozzis 2012-12-23 15:29
    Sheesh! The old cast and too many parens mindset. So juevnile. This is right:

    char* trim_right(char* str)
    {
    for(size_t i = strlen(str) - 1; i >= 0; i--)
    if(str[i] == ' ')
    str[i] = '\0';

    return str;
    }
  • gallier2 2012-12-23 15:59
    It still has the bug of truncating the string at the first blank from the string. So functionnally completely wrong.
  • Jay 2012-12-27 03:35
    Sean:
    This is C; it's going to run pretty darn fast. And no matter how you implement it


    Yeah, no.

    Years ago someone wrote and article about how a well-written program running on a TRS-80 would run faster than a crap program on a Cray.

    Better hardware (not C) means you can now write crap that runs fast, but it wuld be nice if you didn't.
  • David 2012-12-27 12:44
    What's really fun is when you give it a string that is not null terminated.
  • ab 2013-01-02 15:45
    It works fine, as long as you allocate all strings with the proper function:

    char *alloc_string(int len) {
    char *ret = calloc(len+2, 1);
    if (ret) *ret++ = ' ';
    return ret;
    }

  • Neil 2013-01-03 05:16
    gallier2:
    The compiler also knows that the function strlen is pure. That means that 2 consecutive calls to the function will return the same value if fed the same parameters.
     l = strlen(x);
    
    l2 = strlen(x);
    will call strlen only once, an optimisation it cannot perform in the general case (it would be fatal for rand() for example).
    Now, if it could only optimise the case where you write
     l = strlen(x);
    
    x[l - 1] = '\0';
    l2 = strlen(x);
    ...
  • bio_end_io_t 2013-01-03 09:20
    This sort of code hurts my soul:
    str[strlen(str)-1] = '\0';

    Apart from the under run problem in the above comment, this is vulnerable to buffer overflows.

  • SciK 2013-01-03 20:02
    Well, except for the atrocious number of calls to strlen, I don’t see the problem.
  • Anone 2013-01-09 01:37
    TGV:
    Evan:
    Lex Luthor:
    Agreed. This is not a WTF. Nothing wrong with recursion in C, but possibly person who reported it.

    Why does everyone seem to think that "the author used recursion" is the wtf instead of "it's buggy and horribly inefficient [not due directly to the recursion]"?

    Probably because (nearly) everyone is a total moron? Since (nearly) everyone only spots problems by looking for unusual constructions?


    How about the fact that recursion is mentioned in the article title?
  • -is 2013-02-28 02:59
    Black Bart:
    Someone just learned about recursion, and it became the tool of choice for the next several weeks.

    When you have a hammer in your hand, everything becomes a nail.


    This is tail recursion. In certain class of languages (and their compilers), this executes in constant space (that is, the recursive call is implemented as a jump back to the start) and is, sometimes, in fact the preferred or only way to express loops.

    Actually, modern gcc has learned to detect this condition even in C, or so I've heard.

    It would be different if you needed to compute something with the result of the recursive call.

  • -is 2013-02-28 03:02
    Black Bart:
    Someone just learned about recursion, and it became the tool of choice for the next several weeks.

    When you have a hammer in your hand, everything becomes a nail.


    This is tail recursion. In certain class of languages (and their compilers), this executes in constant space (that is, the recursive call is implemented as a jump back to the start) and is, sometimes, in fact the preferred or only way to express loops.

    Actually, modern gcc has learned to detect this condition even in C, or so I've heard.

    It would be different if you needed to compute something with the result of the recursive call.

    Of course, you'd compute the strlen once or not at all for this to also compute in linear time.