• Evan (unregistered) in reply to Evan
    Evan:
    All you people yelling "tail recursion" don't understand what you're talking about.

    The problem isn't the lack of iteration -- TCO solves that nicely. The problem is that every recursive call is going to do another strlen() call. TCO does nothing to eliminate that.

    (And, btw, as someone said earlier, this turns something that should be an O(n) operation into O(n^2) worst-case; in general, O(nm) where n is the length of the string and m is the number of trailing spaces.)

  • TheOptimizer (unregistered) in reply to Thomas Kyte
    Thomas Kyte:
    ....

    buffer underwrite.... it doesn't work as intended.

    if it did any decent compiler optimizer would go "trail recursion? convert into loop".

  • TheOptimizer (unregistered) in reply to A developer
    A developer:
    This is the sort of academic BS I had to do in university programming courses. For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.

    try working with trees.

  • Todd (unregistered) in reply to Devastated of Rochdale
    Devastated of Rochdale:
    Words cannot express the weird gut-wrench I felt when my eyes alighted on this code in my RSS feed. I think it may have given me norovirus.

    I don't have anything constructive to add (sorry) because when I try to focus on the code my vision starts to blur, presumably in self-defense.

    Suffice it to say the real WTF may be that this is the first code snippet ever to make me feel the need to comment on it, even if that comment is little more than a howl of anguish.

    I thought it was pretty neat.
  • Adin Falkhoff (unregistered) in reply to RobFreundlich

    Is APL even A Programmin Language?

  • A developer (unregistered) in reply to AN AMAZING CODER

    And a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion...... LOL

  • Hans (unregistered) in reply to Black Bart

    I think all of you need to google for "tail recursion". This is perfectly valid code, and a modern compiler is smart enough to optimize this.

  • Mr.Bob (unregistered) in reply to ¯\(°_o)/¯ I DUNNO LOL
    ¯\(°_o)/¯ I DUNNO LOL:
    As long as we're playing Code Golf...

    void trim_right(char *str) {   char *end = str;   while (*str) {     if (*str++) != ' ') {       end = str;     }   }   *end = 0; }

    Why worry about size_t?

    Only two nits to pick with your answer:

    1. Still doesn't handle the possibility that str is NULL. Twice.
    2. Strings are arrays of characters, please don't mix them with integer types.
  • (cs) in reply to Snowrat
    Snowrat:
    Could've been written a bit better in APL, like trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵} or trim_right←{⌽{(∨\' '≠⍵)/⍵}⌽⍵}

    Finally some code I can understand on this forum! Thank you Sir. Oh I miss my APL.

    re·cur·sion
    /riˈkərZHən/ Noun

    1.See "recursion"

  • LK (unregistered)

    So why the strlen anyway ? Assembly-wise, strlen (the gallier2 version of course) is more efficient only if whitespace detection is difficult. If we're in 7-bit ASCII, it's not. We can either take 0x20 or take into account tabs and such and say <= 0x20 !

    Following is less elegant than gallier2. Is it more efficient...

    char *trim_right(char *str)
    {
       char *stop = NULL ;
       char *cursor = str ;
       char current = *cursor ;
    
       while (current != '\0') {
          int printable = (current > ' ') ;
    
          if (printable) { if (stop) stop=NULL ; }
          else if (!stop) { stop=cursor ; }
    
          cursor++ ;
          current = *cursor ;
       }
    
       if (stop) *stop = '\0' ;
    
       return str;
    }
    

    Nah, to be really efficient I'll provide trWTF (gag and throw up all of you anti-goto fanatics):

    #define PRINTABLE(x) (x > ' ')
    
    char *trim_right(char *str)
    {
       char *stop = NULL ;
       char *cursor = str ;
       char current ;
    
       goto First;
    
       NextUnstopped:
          cursor++ ;
    
       First:
          current = *cursor ;
          if (current == '\0') goto End;
          if PRINTABLE(current) goto NextUnstopped;
          stop = cursor ;
    
       FoundMoreWhitespace:
          cursor++ ;
          current = *cursor ;
          if (current == '\0') goto End;
          if (!PRINTABLE(current)) goto FoundMoreWhitespace ;
          stop = NULL ;
          goto NextUnstopped ;
    
       End:
          if (stop) *stop = '\0' ;
          return str;
    }
    

    Don't need no stinkin' optimizer. Unless maybe some modern instruction set... have to investigate that... or not :-)

  • LK (unregistered) in reply to LK

    Reading Mr Bob's comment... so if (!str) return NULL; OK...

  • Evan (unregistered)

    To everyone saying it's not much of a WTF performancewise: maybe some actual data will convince you. This is how bad it is, based on my own measurements. This is running on Linux, with GCC 4.7.2, with -O3. (-O2 doesn't make much of a difference.)

    The following shows, for strings of various lengths made entirely of spaces (BTW I didn't read the thread, did anyone point out it's buggy in this scenario?), how much longer the submitted version takes to run vs. ¯(°_o)/¯ I DUNNO LOL's solution (quoted below by Mr. Bob), which I arbitrarily chose.

    length       how dumb it is
    --------     --------------
        1,000          ~25x 
       10,000          240x
      100,000        2,380x
      500,000       13,500x
    1,000,000       27,200x
    

    For the million-character string, the submitted version took about 33 seconds, and the reasonable solution took about 1.2 milliseconds.

    Now, maybe you say "well, a string of a million spaces, that's not a realistic input so I'm not going to worry", and that's not a totally unreasonable standpoint to take.

    However, to say "eh it's not dumb because the compiler will just optimize away the tail recursion" is overlooking the dumbness of taking a linear-time algorithm and turning it into a quadratic one. (Well, while not trading out a large constant -- e.g. Fibonacci heaps are required for Dijkstra's algorithm to have it's advertised complexity, but the constant factor of Fibonacci heaps over other kinds of heaps means in practice you'll usually get better performance with a higher-complexity version.)

    Play around with it yourself; you'll have to modify the way you do timing (look at QueryPerformanceCounter() or something) if you want to run it on Windows. http://pastebin.com/M5xZJYdd

  • Evan (unregistered)

    Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more -- for the 1,000,000 character string, by another 17x, which gives a whopping 450,000x suckitude factor for the submitted version.

    Also, you don't need to have a string of all spaces for other methods to show performance advantages -- a single space at the end of a 10,000-character string is enough for gallier2's solution to register about a 1.5x improvement over the submitted version (though in that case the submitted version is actually ~4x faster than I DUNNO LOL's).

  • Scott (unregistered)

    Not recursive, doesn't call any other functions, O(n), simple, works (as far as I can tell in a hot hurry). Apologies if somebody posted a similar solution already.

    char* trim_right( char* str ) { int i = 0, j = 0; while ( str[i] != '\0' ) { if ( str[i] != ' ' ) { j = i; } ++i; } str[j + 1] = '\0'; return str; }

  • Evan (unregistered) in reply to Evan
    Evan:
    Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more -- for the 1,000,000 character string, by another 17x, which gives a whopping 450,000x suckitude factor for the submitted version.
    Actually this is incorrect, somewhat. For the case of a string made entirely of spaces, gallier2's version walks backward over the string character by character and is also "slow". This means that multiplying the 17x improvement of gallier2's version on tests that are NOT all spaces (in that case it was 999,999 'a's followed by a single space) by the 27,000x or whatever improvement from my earlier table was just flat out wrong.

    However, on a string that's not all spaces, gallier2's version is indeed faster.

    (For the same reason, Scott's version he just posted is also slow in comparison to gallier2 -- about the same as I DUNNO LOL, or about 18x slower than gallier2's for the 999,999 'a's then a space test case. "Not calling other functions" (in this case, strlen()), is not a virtue.)

  • metadalek (unregistered)

    I would use (assuming str is a valid string and p is a char *):

    if ((p = strrchr(str, ' ')) != NULL) *p = '\0';

  • Evan (unregistered) in reply to metadalek
    metadalek:
    I would use (assuming str is a valid string and p is a char *):

    if ((p = strrchr(str, ' ')) != NULL) *p = '\0';

    That may be the buggiest version yet:
    "abc  " => "abc "
    " ab c" => " ab"
    "  abc" => " "
  • A developer (unregistered)

    Alright it's official. YOU'RE ALL A BUNCH OF NERDS!

  • Trollinator (unregistered)

    TRWTF is the submitter who doesn't seem to understand tail recursion optimisation.

  • zdfxh (unregistered)

    Who was it that said something like: "C has all the power and capability of Assembler with the elegance and readability of Assembler"

  • Jim (unregistered) in reply to gallier2
    gallier2:
    Not cool to write a 0 byte before the buffer if the string has length 0 and there is a ' ' before it. This seems quite low odd case but it can happen if you use a pointer to scan a longer string with trailing blanks, and passing that pointer to that function.

    The recursion is of course another wtf.

    Except the line before that takes care of that....

    I wonder why he didn;t reuse len?

  • ase; dl (unregistered) in reply to gallier2
    gallier2:
    It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills. C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).
    I can't see no stinkin' buffer underflow....
  • ase;dl (unregistered) in reply to ase; dl
    ase; dl:
    gallier2:
    It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills. C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).
    I can't see no stinkin' buffer underflow....
    although "len" is not actually len....
  • Norman Diamond (unregistered) in reply to gallier2
    gallier2:
    Rick:
    Steve The Cynic:
    The worst "multiple interpretations of undefined code" I ever saw was this:
    unsigned char *bp = /* stuff */;
    unsigned char table[16] = { /* values */ };
    

    loop N times { *bp++ = (table[*bp & 0x0f] << 4) | table[*bp >> 4]; }

    The x86-hosted to-MIPS cross-compiler I was using to bring this code to the controller CPU of an ATM switch (Asynchronous Transfer Mode, not Automated Teller Machine) evaluated bp++ completely (and stashed the before-increment value) before evaluating anything else, then evaluated the right-hand side using the incremented value of bp, then stored the result using the stashed before-increment value.

    Fixed:

    unsigned char *bp = /* stuff */;
    unsigned char table[16] = { /* values */ };
    
    loop N times
    {
      *bp = (table[*bp & 0x0f] << 4) | table[*bp >> 4];
      bp++;
    }
    Undefined behaviour: Just say, "No."
    The order is definitely defined. MIPS simply got it wrong. (Wow, haven't heard that name in a while.)
    No, the order in which the lvalue and the rvalue are evaluated are undefined. The only thing defined is that the assignment is done after these evaluations.

    see http://stackoverflow.com/questions/4362501/any-good-reason-why-assignment-operator-isnt-a-sequence-point for a detailed explanation.

    The analyses by MIPS, Steve, and Gallier are correct but understated. One more rule in the C standard is that the old value of bp shall be used only to determine the new value to be stored in bp, but the program sometimes (depending on the order of evaluation which can go either way) uses the old value in computing a different expression. So the behaviour is completely undefined. The assignment never has to be reached. The program could format your hard drives instead.

    P.S. Considering the amount of embedded systems in use now in TV remote controls, cars, and rice cookers, it might be a good idea to be aware of the continued existence of MIPS.

  • metadalek (unregistered) in reply to Evan

    Sorry, I assumed we have trailing white space.

    You obviously need an extra guard if that assumption is not correct.

  • Exercise (unregistered) in reply to Rnd(
    Rnd(:
    Does it make me bad coder if I prefer iterative loops or just loops for this sort of stuff?

    Also, if there is dynamic memory allocation I surely hope the lengths are fixed or originals are stored somewhere...

    there's an intriguing concept, a non-iterative loop....

    boolean x = false;
    /* We want a non-iterative loop */
    while(x==false) x = true;
    
    /* I guess this would be too */
    //don't loop
    while(x==false) printf("We didn't iterate here\n");
    
  • ase; dl (unregistered) in reply to ase;dl
    ase;dl:
    ase; dl:
    gallier2:
    It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills. C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).
    I can't see no stinkin' buffer underflow....
    although "len" is not actually len....
    OOPS - NOW I SEE IT!!!
  • jimmy (unregistered) in reply to Mordred
    Mordred:
    C does guarantee order of evaluation of pre and post increment operators
    FTFY.

    C does guarantee order of evaluation of lots of other things, but.

    Some compilers will assume pre-increment means "before any statement on the line" while some will assume it means "in place, but before it is used for this particular operation". (for post :s/before/after/g)

    Pretty sure the Visual Studio one and MinGW are different (which many of you might have so play around with things like:

    A=8;B=8 printf("%d\n", A++ + ++B); printf("%d\n", --A + B--);

    etc and compare the results on both of them....If you yhave multiple ++ (or --) on the same var on a row it becomes particularly interesting....

  • Evan (unregistered) in reply to metadalek
    metadalek:
    Sorry, I assumed we have trailing white space.

    You obviously need an extra guard if that assumption is not correct.

    It doesn't work even in that case: if you have multiple spaces at the end, it will only remove the final one (as illustrated by my first test case).

    jimmy:
    Some compilers will assume pre-increment means "before any statement on the line" while some will assume it means "in place, but before it is used for this particular operation". (for post :s/before/after/g)

    Pretty sure the Visual Studio one and MinGW are different (which many of you might have so play around with things like:

    A=8;B=8 printf("%d\n", A++ + ++B); printf("%d\n", --A + B--);

    etc and compare the results on both of them....If you yhave multiple ++ (or --) on the same var on a row it becomes particularly interesting....

    And Norman Diamond's comment is correct. Unlike some kinds of implementation-defined behavior (does foo(bar(), baz()) call bar() or baz() first? the standard doesn't say), if you have multiple increments, decrements, assignments (i = i++ or (i=7) + (i=4)), etc. to the same variable without an intervening sequence point, the entire behavior is undefined. It doesn't even have to pick one.

    For instance, in practice, i = i++ will likely either leave the value of i unchanged or increment it, but it needn't do either. Once a program encounters truly undefined behavior, it can do anything. That could crash your program. It could mail your porn stash to your mother, and it would be totally allowed by the C standard.

  • Evan (unregistered) in reply to jimmy
    jimmy:
    A=8;B=8 printf("%d\n", A++ + ++B); printf("%d\n", --A + B--);
    Actually looking at this code again I think that this has deterministic behavior according to the standard; if I'm correct, that is required to print 17\n17\n and leave A == B == 8 after those three lines.

    You might be looking for something like

    A = 8;
    printf("%d %d\n", A++, A);

    which is allowed to print either "8 9" or "8 8" (but is required to do one or the other).

  • ¯\(°_o)/¯ I DUNNO LOL (unregistered) in reply to Evan
    Evan:
    Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more
    You mean 397535?

    Do you even know how strlen works? Hint: strlen is an O(n) operation.

    What I wrote is nothing more than a version of strlen that remembers the last start of a group of blanks instead of counting the length.

  • (cs) in reply to Garrison Fiord
    Garrison Fiord:
    It appears you overlooked the word "almost".

    Ummm..yes, actually I did. Sorry about that; your qualifier was correctly present.

  • (cs) in reply to Steve The Cynic
    Steve The Cynic:
    Coyne:
    And, of course, there is the inimitable "Towers of Hanoi". A toy problem, yes, but if you want a "head hurts" experience try implementing that without recursion sometime.
    Ah, yes, Arkey-malarkey... No, it's not even hard to do without recursion.

    At any point, there are at most two legal moves. It is always legal to move the smallest disk, because no disk is smaller (and therefore blocking). The other two exposed disks are different sizes, by definition, and you can move the smaller of them onto the larger.

    The non-recursive solution is, therefore:

    Every other move moves the smallest disk. All the other moves make the other legal move.

    The only question, then, is which way to move the smallest disk.

    If the goal is to use peg B to move the stack from peg A to peg C, then:

    1. If the number of disks is odd, the smallest disk moves A->C->B->A
    2. If the number of disks is even, the smallest disk moves A->B->C->A

    Okay, when I was working this problem, I didn't know about the iterative solution. But I looked at it and the variations on Wikipedia, and it seems to me that any expression of these algorithms would involve tracking arrays and some fairly complex decision logic. Just for example, you have to know how many disks in advance, and what the top disk on each stack is.

    The recursive solver, which I hacked together again, requires one solitary condition (marked below):

    public class TowersOfHanoi {
       
       static int step = 0;
    
       /**
        * Output the steps for a Tower of Hanoi stack move from one of 3 pegs to the another of 3 pegs.
        * @param from  peg to move a stack from
        * @param to    peg to move a stack to
        * @param work  peg to use as an intermediate
        * @param count number of disks to move from "from" to "to"
        */
       public static void moveTower(int from, int to, int work, int count)
       {
          if (count == 1)
          {
             System.out.println("Step " + (++step) + ": Move disk from peg " + from + " to peg " + to + ".");
          }
          else
          {
             moveTower(from, work, to, count-1);
             moveTower(from, to, work, 1);
             moveTower(work, to, from, count-1);
          }
       }
    
       /**
        * @param args
        */
       public static void main(String[] args) {
          moveTower( 1,  // move from peg 1
                     3,  // move to peg 3
                     2,  // use peg 2 as the initial intermediate
                     7); // move 7 disks total
       }
    
    }
    

    Hmmm....

  • Lex Luthor (unregistered) in reply to Your Name *

    Agreed. This is not a WTF. Nothing wrong with recursion in C, but possibly person who reported it.

  • ¯\(°_o)/¯ I DUNNO LOL (unregistered)

    FYI, what you were seeing was the effect of strlen using the x86 string instructions on your specific computer. Try it on an ARM-based system sometime. Optimization is not an all or nothing situation. The same code could run faster on some systems and slower on others.

    And are you seriously implying that someone would (intentionally) use this kind of function on a string of more than about 256 bytes or so? At that point, the advantage of the string instructions is lost.

    1,000 ~25x
    Does that mean it runs in negative time? I don't think that math means what you think it means. You probably meant 1/25x, or 25 times FASTER. Guess what? It's not going to be called to nip off blanks at the end of a 1 megabyte block of memory, unless there's a bug that overwrote the null at the end. There's no point in such an operation. In all sane use cases, my code is... faster!

    Benchmarks are useless unless you know WHY something is faster or slower. Benchmarks are also useless if you're doing the equivalent of feeding mice a year's worth of cyclamates every day.

  • Evan (unregistered) in reply to ¯\(°_o)/¯ I DUNNO LOL
    ¯\(°_o)/¯ I DUNNO LOL:
    Do you even know how strlen works? Hint: strlen is an O(n) operation.

    What I wrote is nothing more than a version of strlen that remembers the last start of a group of blanks instead of counting the length.

    Real strlen() is more efficient than just reading a string character-by-character in a loop. There's a reason that glibc's strlen(), for example, is closer to 50 loc than 5.

    Lex Luthor:
    Agreed. This is not a WTF. Nothing wrong with recursion in C, but possibly person who reported it.
    Why does everyone seem to think that "the author used recursion" is the wtf instead of "it's buggy and horribly inefficient [not due directly to the recursion]"?
  • Evan (unregistered) in reply to ¯\(°_o)/¯ I DUNNO LOL
    In all sane use cases, my code is... faster!
    The post with the chart was comparing your (reasonable) solution with the (dumb) solution from the story submission; your code is faster than that in all cases that I could measure. :-) For 1,000 characters, 25 times faster than the dumb submission. For 1,000,000 characters, 27000 times faster.
    FYI, what you were seeing was the effect of strlen using the x86 string instructions on your specific computer. Try it on an ARM-based system sometime. Optimization is not an all or nothing situation. The same code could run faster on some systems and slower on others.
    This is true, and it's applicable to my later, more informal, comparison to gallier2's strlen-based solution. It's possible that in other architectures, a typical real strlen would be closer to what you did.

    However, the improvements I cited were not due to the x86 string instructions, which are not (to my knowledge) used by glibc. (Aside: perhaps a strlen() in code does not actually call glibc's strlen but invokes an intrinsic which uses them? Perhaps. I'm too lazy to check.)

    And are you seriously implying that someone would (intentionally) use this kind of function on a string of more than about 256 bytes or so?
    I could see someone using it on input, which means that the length of the string is in the hands of the user rather than the developer.
    Does that mean it runs in negative time? I don't think that math means what you think it means. You probably meant 1/25x, or 25 times FASTER.
    Sorry, that ~ is "about"; the execution time of your solution with 1000 bytes is comparable to the frequency of the timer I was using (1 microsecond) and so the difference was usually either around 18x or 36x depending on whether yours took just one microsecond or two, so I just sorta picked something in the middle. :-)
  • Hacky (unregistered) in reply to Thomas Kyte

    Could in fact corrupt heap or segv by writing to memory before string I believe.

  • Igel (unregistered)

    it's machinations of evil lispers =)

  • ¯\(°_o)/¯ I DUNNO LOL (unregistered)

    Ah, that was a tilde at low resolution.

    I thought you were comparing mine vs gallier2's, when you were comparing mine vs the article's. FWIW, gallier2's would strictly be slower than mine when given a buffer full of nothing but blanks, because I never have to backtrack. If you want your benchmark to be meaningful, you have to try both optimal and pessimal cases.

    strlen's speed is not only dependent upon the library quality, it is even possible for a compiler to detect it and generate inline string instructions. But without string instructions, it'll be little better than inlining "*p=str; len=0; while(*p++) len++; return len;"

    Also, using array indexing in a loop may potentially generate much worse code than using a pointer if the compiler can't identify the looping and factor it out into a simple pointer. On some architectures it may be the same speed or faster to use indexing over pointers. (Keep the base address in a register, add the offset in an addressing mode.)

    That would explain the sharp speed difference in the benchmark, which I mistakenly thought was gallier2's and due to the strlen call.

    My code has the advantage of being very easy for a half decent compiler to optimize (aside from not likely being able to use string instructions) into very lean code. Everything can be kept in registers without touching the stack, and even the if statement in the loop would take advantage of ARM's conditional execution.

    But really, if it isn't used more than a few times a second, and for short strings, then validity and correctness is far more important than speed.

  • Evan (unregistered) in reply to ¯\(°_o)/¯ I DUNNO LOL
    ¯\(°_o)/¯ I DUNNO LOL:
    I thought you were comparing mine vs gallier2's, when you were comparing mine vs the article's. FWIW, gallier2's would strictly be slower than mine when given a buffer full of nothing but blanks, because I never have to backtrack. If you want your benchmark to be meaningful, you have to try both optimal and pessimal cases.
    Instead of cleaning my apartment or doing something useful, I'm putting together a benchmark set of all of the proposed solutions to trim_right; it'll involve a few different cases; all spaces, no spaces, and in between.
  • Evan (unregistered)

    Also, I apologize about not being as clear as I could have been. I'm much less interested in comparing yours vs strlen-based solutions; mostly I'm trying to point out why everyone going "but compilers have tail call optimization!" is barking up the wrong tree.

  • Curious (unregistered)

    What happens when an empty (0 length) string is provided... buffer overflow... errr.. underflow?

  • Name Withheld (unregistered)

    I admit this code is horribly inefficient, but I just tested it with 53000 spaces, and it completed in less than a millisecond.

    Don't think they have much to worry about.

  • Brendan (unregistered) in reply to LK

    Hi,

    LK:
    #define PRINTABLE(x) (x > ' ')
    

    char *trim_right(char *str) { char *stop = NULL ; char *cursor = str ; char current ;

    goto First;

    NextUnstopped: cursor++ ;

    First: current = *cursor ; if (current == '\0') goto End; if PRINTABLE(current) goto NextUnstopped; stop = cursor ;

    FoundMoreWhitespace: cursor++ ; current = *cursor ; if (current == '\0') goto End; if (!PRINTABLE(current)) goto FoundMoreWhitespace ; stop = NULL ; goto NextUnstopped ;

    End: if (stop) *stop = '\0' ; return str; }

    Don't need no stinkin' optimizer. Unless maybe some modern instruction set... have to investigate that... or not :-)

    That is inefficient - the "if (stop)" can be avoided:

    char *trim_right(char *str)
    {
       char *stop;
    
    next:
       while(!isspace(*str)) {
          if(*str == '\0') return;
          str++;
       }
       stop = str;
       do {
          str++;
          if(*str == '\0') {
             *stop = '\0';
             return;
          }
       } while(isspace(*str));
       goto next;
    }
  • Name Withheld (unregistered) in reply to Name Withheld

    Sorry that was 5,000. Not 50,000! ;-)

  • Evan (unregistered)

    OK, in case anyone wants some experimental evaluation for the proposed solutions, I have timing and test data for every full solution I've seen that I didn't think was an obvious joke. (Aufgehaben's may have been a joke too but I included it anyway.)

    There are two tables. The top table shows correctness tests for several inputs. Correct outputs are shown as green <OK>s; incorrect outputs are shown as the result I got. In both inputs (column headings) and outputs, spaces are replaced by hyphens. The empty string "" is shown as (empty). For example, the article's implementation, when given the string " ", outputs " ", which is incorrect.

    The lower table shows timing results, in seconds. In the tests, exponents denote repetition. I didn't do it very clearly as I don't mark the grouping; the first column is shown as "a-^500"; what that means is the string "a-" repeated 500 times; the final column means 499,995 'a's followed by five spaces. (Spaces are again replaced by '-'s, of course.)

    I'm not sure of how stable the numbers are; that's from one run only. My numbers are from the latest version of MSVC, running on my pretty old box.

    And of course it's TOTALLY possible that I screwed something up, so if you think I'm misrepresenting your solution let me know why. :-) (I did have to make very minor changes to a couple for people who changed the return type to void.)


    If you want to play around with this, there are three parts. The first is C++ code that runs each of the tests and reports their times in JSON format. For timing, I use the timer class available from here (direct link). The JSON can be converted into a readable HTML file by this Python script (it reads input from stdin and outputs to stdout).

    I didn't try compiling with anything but MSVC, but I also am unaware of any reason it won't compile with GCC 4.5+ with -std=c++0x (I use the C++11 for loop). To add a new solution, simply add a new block like

    TRIM_RIGHT("test name", "url")
    {
       body;
    }

    and it will automatically run it.

  • Eric (unregistered)

    A nice example of schlemiel-the-painter's algorithm (mixed with some buffer boundary ignorance)

    http://en.wikipedia.org/wiki/Schlemiel_the_Painter's_algorithm

    (Damn you askimet, this is getting annoying!)

  • Evan (unregistered)
    1. Oh yeah, the big change I had to do was to PleegWat's solution, which just enters an infinite loop because it never increments the loop pointer. I stuck an increment in there but didn't really give it much thought as to where it goes, so that may be why it fails so many correctness tests. Not necessarily PleegWat's fault entirely.

    2. Sorry about the formatting of my previous post; I forgot to remove the hard newlines that were added as I edited it in emacs.

    3. I figured out what prompted the discussion about C evaluation order, and it should have been resolved with Steve the Cynic's and gallier2's posts on the first page.

  • Parasietje (unregistered)

    Since this is tail-recursion, a good compiler should optimize this and reuse the stack frame. I do not know if C has good compilers though.

Leave a comment on “Recursive Whitespace Removal”

Log In or post as a guest

Replying to comment #:

« Return to Article