Comment On Recursive Whitespace Removal

C is a double edged sword. On one hand, it's simple and powerful enough that, given enough effort, you can accomplish just about anything you want. However, this power is limited insomuch that you don't have many of the friendly helper-functions that exist in higher level languages, such as string manipulation for example, unless you go ahead and create them yourself. [expand full text]
« PrevPage 1 | Page 2 | Page 3 | Page 4Next »

Re: Recursive Whitespace Removal

2012-12-19 14:58 • by Evan (unregistered)
397626 in reply to 397625
Evan:
All you people yelling "tail recursion" don't understand what you're talking about.

The problem isn't the lack of iteration -- TCO solves that nicely. The problem is that every recursive call is going to do another strlen() call. TCO does nothing to eliminate that.

(And, btw, as someone said earlier, this turns something that should be an O(n) operation into O(n^2) worst-case; in general, O(nm) where n is the length of the string and m is the number of trailing spaces.)

Re: Recursive Whitespace Removal

2012-12-19 14:59 • by TheOptimizer (unregistered)
397627 in reply to 397526
Thomas Kyte:
....

buffer underwrite.... it doesn't work as intended.


if it did any decent compiler optimizer would go "trail recursion? convert into loop".

Re: Recursive Whitespace Removal

2012-12-19 15:06 • by TheOptimizer (unregistered)
397629 in reply to 397552
A developer:
This is the sort of academic BS I had to do in university programming courses.
For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.


try working with trees.

Re: Recursive Whitespace Removal

2012-12-19 15:25 • by Todd (unregistered)
397630 in reply to 397621
Devastated of Rochdale:
Words cannot express the weird gut-wrench I felt when my eyes alighted on this code in my RSS feed. I think it may have given me norovirus.

I don't have anything constructive to add (sorry) because when I try to focus on the code my vision starts to blur, presumably in self-defense.

Suffice it to say the real WTF may be that this is the first code snippet ever to make me feel the need to comment on it, even if that comment is little more than a howl of anguish.

I thought it was pretty neat.

Re: Recursive Whitespace Removal

2012-12-19 15:50 • by Adin Falkhoff (unregistered)
397631 in reply to 397547
Is APL even A Programmin Language?

Re: Recursive Whitespace Removal

2012-12-19 15:55 • by A developer (unregistered)
397632 in reply to 397619
And a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion......
LOL

Re: Recursive Whitespace Removal

2012-12-19 15:57 • by Hans (unregistered)
397633 in reply to 397516
I think all of you need to google for "tail recursion". This is perfectly valid code, and a modern compiler is smart enough to optimize this.

Re: Recursive Whitespace Removal

2012-12-19 16:42 • by Mr.Bob (unregistered)
397634 in reply to 397623
¯\(°_o)/¯ I DUNNO LOL:
As long as we're playing Code Golf...

void trim_right(char *str)
{
  char *end = str;
  while (*str) {
    if (*str++) != ' ') {
      end = str;
    }
  }
  *end = 0;
}

Why worry about size_t?


Only two nits to pick with your answer:
1. Still doesn't handle the possibility that str is NULL. Twice.
2. Strings are arrays of characters, please don't mix them with integer types.

Re: Recursive Whitespace Removal

2012-12-19 16:43 • by The Dark Lord
397635 in reply to 397544
Snowrat:
Could've been written a bit better in APL, like
trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵}
or
trim_right←{⌽{(∨\' '≠⍵)/⍵}⌽⍵}


Finally some code I can understand on this forum! Thank you Sir. Oh I miss my APL.

re·cur·sion
/riˈkərZHən/
Noun

1.See "recursion"

Re: Recursive Whitespace Removal

2012-12-19 17:02 • by LK (unregistered)
So why the strlen anyway ? Assembly-wise, strlen (the gallier2 version of course) is more efficient only if whitespace detection is difficult. If we're in 7-bit ASCII, it's not. We can either take 0x20 or take into account tabs and such and say <= 0x20 !

Following is less elegant than gallier2. Is it more efficient...


char *trim_right(char *str)
{
char *stop = NULL ;
char *cursor = str ;
char current = *cursor ;

while (current != '\0') {
int printable = (current > ' ') ;

if (printable) { if (stop) stop=NULL ; }
else if (!stop) { stop=cursor ; }

cursor++ ;
current = *cursor ;
}

if (stop) *stop = '\0' ;

return str;
}


Nah, to be really efficient I'll provide trWTF (gag and throw up all of you anti-goto fanatics):


#define PRINTABLE(x) (x > ' ')

char *trim_right(char *str)
{
char *stop = NULL ;
char *cursor = str ;
char current ;

goto First;

NextUnstopped:
cursor++ ;

First:
current = *cursor ;
if (current == '\0') goto End;
if PRINTABLE(current) goto NextUnstopped;
stop = cursor ;

FoundMoreWhitespace:
cursor++ ;
current = *cursor ;
if (current == '\0') goto End;
if (!PRINTABLE(current)) goto FoundMoreWhitespace ;
stop = NULL ;
goto NextUnstopped ;

End:
if (stop) *stop = '\0' ;
return str;
}


Don't need no stinkin' optimizer. Unless maybe some modern instruction set... have to investigate that... or not :-)

Re: Recursive Whitespace Removal

2012-12-19 17:06 • by LK (unregistered)
397637 in reply to 397636
Reading Mr Bob's comment... so if (!str) return NULL; OK...

Re: Recursive Whitespace Removal

2012-12-19 17:34 • by Evan (unregistered)
To everyone saying it's not much of a WTF performancewise: maybe some actual data will convince you. This is how bad it is, based on my own measurements. This is running on Linux, with GCC 4.7.2, with -O3. (-O2 doesn't make much of a difference.)

The following shows, for strings of various lengths made entirely of spaces (BTW I didn't read the thread, did anyone point out it's buggy in this scenario?), how much longer the submitted version takes to run vs. ¯\(°_o)/¯ I DUNNO LOL's solution (quoted below by Mr. Bob), which I arbitrarily chose.

length how dumb it is
-------- --------------
1,000 ~25x
10,000 240x
100,000 2,380x
500,000 13,500x
1,000,000 27,200x

For the million-character string, the submitted version took about 33 seconds, and the reasonable solution took about 1.2 milliseconds.

Now, maybe you say "well, a string of a million spaces, that's not a realistic input so I'm not going to worry", and that's not a totally unreasonable standpoint to take.

However, to say "eh it's not dumb because the compiler will just optimize away the tail recursion" is overlooking the dumbness of taking a linear-time algorithm and turning it into a quadratic one. (Well, while not trading out a large constant -- e.g. Fibonacci heaps are required for Dijkstra's algorithm to have it's advertised complexity, but the constant factor of Fibonacci heaps over other kinds of heaps means in practice you'll usually get better performance with a higher-complexity version.)

Play around with it yourself; you'll have to modify the way you do timing (look at QueryPerformanceCounter() or something) if you want to run it on Windows. http://pastebin.com/M5xZJYdd

Re: Recursive Whitespace Removal

2012-12-19 17:50 • by Evan (unregistered)
Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more -- for the 1,000,000 character string, by another 17x, which gives a whopping 450,000x suckitude factor for the submitted version.

Also, you don't need to have a string of all spaces for other methods to show performance advantages -- a single space at the end of a 10,000-character string is enough for gallier2's solution to register about a 1.5x improvement over the submitted version (though in that case the submitted version is actually ~4x faster than I DUNNO LOL's).

Re: Recursive Whitespace Removal

2012-12-19 17:53 • by Scott (unregistered)
Not recursive, doesn't call any other functions, O(n), simple, works (as far as I can tell in a hot hurry). Apologies if somebody posted a similar solution already.

char* trim_right( char* str )
{
int i = 0, j = 0;
while ( str[i] != '\0' )
{
if ( str[i] != ' ' )
{
j = i;
}
++i;
}
str[j + 1] = '\0';
return str;
}

Re: Recursive Whitespace Removal

2012-12-19 18:03 • by Evan (unregistered)
397642 in reply to 397640
Evan:
Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more -- for the 1,000,000 character string, by another 17x, which gives a whopping 450,000x suckitude factor for the submitted version.

Actually this is incorrect, somewhat. For the case of a string made entirely of spaces, gallier2's version walks backward over the string character by character and is also "slow". This means that multiplying the 17x improvement of gallier2's version on tests that are NOT all spaces (in that case it was 999,999 'a's followed by a single space) by the 27,000x or whatever improvement from my earlier table was just flat out wrong.

However, on a string that's not all spaces, gallier2's version is indeed faster.

(For the same reason, Scott's version he just posted is also slow in comparison to gallier2 -- about the same as I DUNNO LOL, or about 18x slower than gallier2's for the 999,999 'a's then a space test case. "Not calling other functions" (in this case, strlen()), is not a virtue.)

Re: Recursive Whitespace Removal

2012-12-19 18:23 • by metadalek (unregistered)
I would use (assuming str is a valid string and p is a char *):

if ((p = strrchr(str, ' ')) != NULL) *p = '\0';

Re: Recursive Whitespace Removal

2012-12-19 18:40 • by Evan (unregistered)
397644 in reply to 397643
metadalek:
I would use (assuming str is a valid string and p is a char *):

if ((p = strrchr(str, ' ')) != NULL) *p = '\0';

That may be the buggiest version yet:
"abc  " => "abc "

" ab c" => " ab"
" abc" => " "

Re: Recursive Whitespace Removal

2012-12-19 19:14 • by A developer (unregistered)
Alright it's official.
YOU'RE ALL A BUNCH OF NERDS!

Re: Recursive Whitespace Removal

2012-12-19 19:26 • by Trollinator (unregistered)
TRWTF is the submitter who doesn't seem to understand tail recursion optimisation.

Re: Recursive Whitespace Removal

2012-12-19 19:38 • by zdfxh (unregistered)
Who was it that said something like:
"C has all the power and capability of Assembler with the elegance and readability of Assembler"

Re: Recursive Whitespace Removal

2012-12-19 19:42 • by Jim (unregistered)
397648 in reply to 397517
gallier2:
Not cool to write a 0 byte before the buffer if the string has length 0 and there is a ' ' before it. This seems quite low odd case but it can happen if you use a pointer to scan a longer string with trailing blanks, and passing that pointer to that function.

The recursion is of course another wtf.
Except the line before that takes care of that....

I wonder why he didn;t reuse len?

Re: Recursive Whitespace Removal

2012-12-19 19:44 • by ase; dl (unregistered)
397649 in reply to 397522
gallier2:
It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills.
C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).
I can't see no stinkin' buffer underflow....

Re: Recursive Whitespace Removal

2012-12-19 19:46 • by ase;dl (unregistered)
397650 in reply to 397649
ase; dl:
gallier2:
It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills.
C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).
I can't see no stinkin' buffer underflow....
although "len" is not actually len....

Re: Recursive Whitespace Removal

2012-12-19 19:49 • by Norman Diamond (unregistered)
397651 in reply to 397609
gallier2:
Rick:
Steve The Cynic:
The worst "multiple interpretations of undefined code" I ever saw was this:
unsigned char *bp = /* stuff */;

unsigned char table[16] = { /* values */ };

loop N times
{
*bp++ = (table[*bp & 0x0f] << 4) | table[*bp >> 4];
}
The x86-hosted to-MIPS cross-compiler I was using to bring this code to the controller CPU of an ATM switch (Asynchronous Transfer Mode, not Automated Teller Machine) evaluated bp++ completely (and stashed the before-increment value) before evaluating anything else, then evaluated the right-hand side using the incremented value of bp, then stored the result using the stashed before-increment value.

Fixed:
unsigned char *bp = /* stuff */;

unsigned char table[16] = { /* values */ };

loop N times
{
*bp = (table[*bp & 0x0f] << 4) | table[*bp >> 4];
bp++;
}
Undefined behaviour: Just say, "No."
The order is definitely defined. MIPS simply got it wrong. (Wow, haven't heard that name in a while.)
No, the order in which the lvalue and the rvalue are evaluated are undefined. The only thing defined is that the assignment is done after these evaluations.

see http://stackoverflow.com/questions/4362501/any-good-reason-why-assignment-operator-isnt-a-sequence-point
for a detailed explanation.
The analyses by MIPS, Steve, and Gallier are correct but understated. One more rule in the C standard is that the old value of bp shall be used only to determine the new value to be stored in bp, but the program sometimes (depending on the order of evaluation which can go either way) uses the old value in computing a different expression. So the behaviour is completely undefined. The assignment never has to be reached. The program could format your hard drives instead.

P.S. Considering the amount of embedded systems in use now in TV remote controls, cars, and rice cookers, it might be a good idea to be aware of the continued existence of MIPS.

Re: Recursive Whitespace Removal

2012-12-19 20:37 • by metadalek (unregistered)
397652 in reply to 397644
Sorry, I assumed we have trailing white space.

You obviously need an extra guard if that assumption is not correct.

Re: Recursive Whitespace Removal

2012-12-19 20:40 • by Exercise (unregistered)
397653 in reply to 397550
Rnd(:
Does it make me bad coder if I prefer iterative loops or just loops for this sort of stuff?

Also, if there is dynamic memory allocation I surely hope the lengths are fixed or originals are stored somewhere...
there's an intriguing concept, a non-iterative loop....


boolean x = false;
/* We want a non-iterative loop */
while(x==false) x = true;

/* I guess this would be too */
//don't loop
while(x==false) printf("We didn't iterate here\n");

Re: Recursive Whitespace Removal

2012-12-19 20:40 • by ase; dl (unregistered)
397654 in reply to 397650
ase;dl:
ase; dl:
gallier2:
It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills.
C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).
I can't see no stinkin' buffer underflow....
although "len" is not actually len....
OOPS - NOW I SEE IT!!!

Re: Recursive Whitespace Removal

2012-12-19 20:46 • by jimmy (unregistered)
397655 in reply to 397554
Mordred:
C does guarantee order of evaluation of pre and post increment operators

FTFY.

C does guarantee order of evaluation of lots of other things, but.

Some compilers will assume pre-increment means "before any statement on the line" while some will assume it means "in place, but before it is used for this particular operation".
(for post :s/before/after/g)

Pretty sure the Visual Studio one and MinGW are different (which many of you might have so play around with things like:

A=8;B=8
printf("%d\n", A++ + ++B);
printf("%d\n", --A + B--);

etc and compare the results on both of them....If you yhave multiple ++ (or --) on the same var on a row it becomes particularly interesting....

Re: Recursive Whitespace Removal

2012-12-19 21:40 • by Evan (unregistered)
397656 in reply to 397652
metadalek:
Sorry, I assumed we have trailing white space.

You obviously need an extra guard if that assumption is not correct.

It doesn't work even in that case: if you have multiple spaces at the end, it will only remove the final one (as illustrated by my first test case).

jimmy:
Some compilers will assume pre-increment means "before any statement on the line" while some will assume it means "in place, but before it is used for this particular operation".
(for post :s/before/after/g)

Pretty sure the Visual Studio one and MinGW are different (which many of you might have so play around with things like:

A=8;B=8
printf("%d\n", A++ + ++B);
printf("%d\n", --A + B--);

etc and compare the results on both of them....If you yhave multiple ++ (or --) on the same var on a row it becomes particularly interesting....

And Norman Diamond's comment is correct. Unlike some kinds of implementation-defined behavior (does foo(bar(), baz()) call bar() or baz() first? the standard doesn't say), if you have multiple increments, decrements, assignments (i = i++ or (i=7) + (i=4)), etc. to the same variable without an intervening sequence point, the entire behavior is undefined. It doesn't even have to pick one.

For instance, in practice, i = i++ will likely either leave the value of i unchanged or increment it, but it needn't do either. Once a program encounters truly undefined behavior, it can do anything. That could crash your program. It could mail your porn stash to your mother, and it would be totally allowed by the C standard.

Re: Recursive Whitespace Removal

2012-12-19 21:50 • by Evan (unregistered)
397657 in reply to 397655
jimmy:
A=8;B=8
printf("%d\n", A++ + ++B);
printf("%d\n", --A + B--);

Actually looking at this code again I think that this has deterministic behavior according to the standard; if I'm correct, that is required to print 17\n17\n and leave A == B == 8 after those three lines.

You might be looking for something like
A = 8;

printf("%d %d\n", A++, A);

which is allowed to print either "8 9" or "8 8" (but is required to do one or the other).

Re: Recursive Whitespace Removal

2012-12-19 22:13 • by ¯\(°_o)/¯ I DUNNO LOL (unregistered)
397658 in reply to 397640
Evan:
Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more
You mean 397535?

Do you even know how strlen works? Hint: strlen is an O(n) operation.

What I wrote is nothing more than a version of strlen that remembers the last start of a group of blanks instead of counting the length.

Re: Recursive Whitespace Removal

2012-12-19 22:18 • by Coyne
397659 in reply to 397590
Garrison Fiord:

It appears you overlooked the word "almost".


Ummm..yes, actually I did. Sorry about that; your qualifier was correctly present.

Re: Recursive Whitespace Removal

2012-12-19 22:25 • by Coyne
397661 in reply to 397597
Steve The Cynic:
Coyne:
And, of course, there is the inimitable "Towers of Hanoi". A toy problem, yes, but if you want a "head hurts" experience try implementing that without recursion sometime.

Ah, yes, Arkey-malarkey... No, it's not even hard to do without recursion.

At any point, there are at most two legal moves. It is always legal to move the smallest disk, because no disk is smaller (and therefore blocking). The other two exposed disks are different sizes, by definition, and you can move the smaller of them onto the larger.

The non-recursive solution is, therefore:

Every other move moves the smallest disk. All the other moves make the other legal move.

The only question, then, is which way to move the smallest disk.

If the goal is to use peg B to move the stack from peg A to peg C, then:
1. If the number of disks is odd, the smallest disk moves A->C->B->A
2. If the number of disks is even, the smallest disk moves A->B->C->A


Okay, when I was working this problem, I didn't know about the iterative solution. But I looked at it and the variations on Wikipedia, and it seems to me that any expression of these algorithms would involve tracking arrays and some fairly complex decision logic. Just for example, you have to know how many disks in advance, and what the top disk on each stack is.

The recursive solver, which I hacked together again, requires one solitary condition (marked below):


public class TowersOfHanoi {

static int step = 0;

/**
* Output the steps for a Tower of Hanoi stack move from one of 3 pegs to the another of 3 pegs.
* @param from peg to move a stack from
* @param to peg to move a stack to
* @param work peg to use as an intermediate
* @param count number of disks to move from "from" to "to"
*/
public static void moveTower(int from, int to, int work, int count)
{
if (count == 1)
{
System.out.println("Step " + (++step) + ": Move disk from peg " + from + " to peg " + to + ".");
}
else
{
moveTower(from, work, to, count-1);
moveTower(from, to, work, 1);
moveTower(work, to, from, count-1);
}
}

/**
* @param args
*/
public static void main(String[] args) {
moveTower( 1, // move from peg 1
3, // move to peg 3
2, // use peg 2 as the initial intermediate
7); // move 7 disks total
}

}


Hmmm....

Re: Recursive Whitespace Removal

2012-12-19 22:26 • by Lex Luthor (unregistered)
397662 in reply to 397519
Agreed. This is not a WTF. Nothing wrong with recursion in C, but possibly person who reported it.

Re: Recursive Whitespace Removal

2012-12-19 22:34 • by ¯\(°_o)/¯ I DUNNO LOL (unregistered)
FYI, what you were seeing was the effect of strlen using the x86 string instructions on your specific computer. Try it on an ARM-based system sometime. Optimization is not an all or nothing situation. The same code could run faster on some systems and slower on others.

And are you seriously implying that someone would (intentionally) use this kind of function on a string of more than about 256 bytes or so? At that point, the advantage of the string instructions is lost.
1,000 ~25x

Does that mean it runs in negative time? I don't think that math means what you think it means. You probably meant 1/25x, or 25 times FASTER. Guess what? It's not going to be called to nip off blanks at the end of a 1 megabyte block of memory, unless there's a bug that overwrote the null at the end. There's no point in such an operation. In all sane use cases, my code is... faster!

Benchmarks are useless unless you know WHY something is faster or slower. Benchmarks are also useless if you're doing the equivalent of feeding mice a year's worth of cyclamates every day.

Re: Recursive Whitespace Removal

2012-12-19 22:40 • by Evan (unregistered)
397664 in reply to 397658
¯\(°_o)/¯ I DUNNO LOL:
Do you even know how strlen works? Hint: strlen is an O(n) operation.

What I wrote is nothing more than a version of strlen that remembers the last start of a group of blanks instead of counting the length.

Real strlen() is more efficient than just reading a string character-by-character in a loop. There's a reason that glibc's strlen(), for example, is closer to 50 loc than 5.

Lex Luthor:
Agreed. This is not a WTF. Nothing wrong with recursion in C, but possibly person who reported it.

Why does everyone seem to think that "the author used recursion" is the wtf instead of "it's buggy and horribly inefficient [not due directly to the recursion]"?

Re: Recursive Whitespace Removal

2012-12-19 22:49 • by Evan (unregistered)
397665 in reply to 397663
In all sane use cases, my code is... faster!

The post with the chart was comparing your (reasonable) solution with the (dumb) solution from the story submission; your code is faster than that in all cases that I could measure. :-) For 1,000 characters, 25 times faster than the dumb submission. For 1,000,000 characters, 27000 times faster.

FYI, what you were seeing was the effect of strlen using the x86 string instructions on your specific computer. Try it on an ARM-based system sometime. Optimization is not an all or nothing situation. The same code could run faster on some systems and slower on others.

This is true, and it's applicable to my later, more informal, comparison to gallier2's strlen-based solution. It's possible that in other architectures, a typical real strlen would be closer to what you did.

However, the improvements I cited were not due to the x86 string instructions, which are not (to my knowledge) used by glibc. (Aside: perhaps a strlen() in code does not actually call glibc's strlen but invokes an intrinsic which uses them? Perhaps. I'm too lazy to check.)

And are you seriously implying that someone would (intentionally) use this kind of function on a string of more than about 256 bytes or so?

I could see someone using it on input, which means that the length of the string is in the hands of the user rather than the developer.

Does that mean it runs in negative time? I don't think that math means what you think it means. You probably meant 1/25x, or 25 times FASTER.

Sorry, that ~ is "about"; the execution time of your solution with 1000 bytes is comparable to the frequency of the timer I was using (1 microsecond) and so the difference was usually either around 18x or 36x depending on whether yours took just one microsecond or two, so I just sorta picked something in the middle. :-)

Re: Recursive Whitespace Removal

2012-12-19 23:24 • by Hacky (unregistered)
397666 in reply to 397526
Could in fact corrupt heap or segv by writing to memory before string I believe.

Re: Recursive Whitespace Removal

2012-12-19 23:37 • by Igel (unregistered)
it's machinations of evil lispers =)

Re: Recursive Whitespace Removal

2012-12-20 00:00 • by ¯\(°_o)/¯ I DUNNO LOL (unregistered)
Ah, that was a tilde at low resolution.

I thought you were comparing mine vs gallier2's, when you were comparing mine vs the article's. FWIW, gallier2's would strictly be slower than mine when given a buffer full of nothing but blanks, because I never have to backtrack. If you want your benchmark to be meaningful, you have to try both optimal and pessimal cases.

strlen's speed is not only dependent upon the library quality, it is even possible for a compiler to detect it and generate inline string instructions. But without string instructions, it'll be little better than inlining "*p=str; len=0; while(*p++) len++; return len;"

Also, using array indexing in a loop may potentially generate much worse code than using a pointer if the compiler can't identify the looping and factor it out into a simple pointer. On some architectures it may be the same speed or faster to use indexing over pointers. (Keep the base address in a register, add the offset in an addressing mode.)

That would explain the sharp speed difference in the benchmark, which I mistakenly thought was gallier2's and due to the strlen call.

My code has the advantage of being very easy for a half decent compiler to optimize (aside from not likely being able to use string instructions) into very lean code. Everything can be kept in registers without touching the stack, and even the if statement in the loop would take advantage of ARM's conditional execution.

But really, if it isn't used more than a few times a second, and for short strings, then validity and correctness is far more important than speed.

Re: Recursive Whitespace Removal

2012-12-20 00:10 • by Evan (unregistered)
397669 in reply to 397668
¯\(°_o)/¯ I DUNNO LOL:
I thought you were comparing mine vs gallier2's, when you were comparing mine vs the article's. FWIW, gallier2's would strictly be slower than mine when given a buffer full of nothing but blanks, because I never have to backtrack. If you want your benchmark to be meaningful, you have to try both optimal and pessimal cases.

Instead of cleaning my apartment or doing something useful, I'm putting together a benchmark set of all of the proposed solutions to trim_right; it'll involve a few different cases; all spaces, no spaces, and in between.

Re: Recursive Whitespace Removal

2012-12-20 00:11 • by Evan (unregistered)
Also, I apologize about not being as clear as I could have been. I'm much less interested in comparing yours vs strlen-based solutions; mostly I'm trying to point out why everyone going "but compilers have tail call optimization!" is barking up the wrong tree.

Re: Recursive Whitespace Removal

2012-12-20 00:21 • by Curious (unregistered)
What happens when an empty (0 length) string is provided... buffer overflow... errr.. underflow?

Re: Recursive Whitespace Removal

2012-12-20 00:31 • by Name Withheld (unregistered)
I admit this code is horribly inefficient, but I just tested it with 53000 spaces, and it completed in less than a millisecond.

Don't think they have much to worry about.

Re: Recursive Whitespace Removal

2012-12-20 00:35 • by Brendan (unregistered)
397673 in reply to 397636
Hi,

LK:

#define PRINTABLE(x) (x > ' ')

char *trim_right(char *str)
{
char *stop = NULL ;
char *cursor = str ;
char current ;

goto First;

NextUnstopped:
cursor++ ;

First:
current = *cursor ;
if (current == '\0') goto End;
if PRINTABLE(current) goto NextUnstopped;
stop = cursor ;

FoundMoreWhitespace:
cursor++ ;
current = *cursor ;
if (current == '\0') goto End;
if (!PRINTABLE(current)) goto FoundMoreWhitespace ;
stop = NULL ;
goto NextUnstopped ;

End:
if (stop) *stop = '\0' ;
return str;
}


Don't need no stinkin' optimizer. Unless maybe some modern instruction set... have to investigate that... or not :-)


That is inefficient - the "if (stop)" can be avoided:

char *trim_right(char *str)

{
char *stop;

next:
while(!isspace(*str)) {
if(*str == '\0') return;
str++;
}
stop = str;
do {
str++;
if(*str == '\0') {
*stop = '\0';
return;
}
} while(isspace(*str));
goto next;
}

Re: Recursive Whitespace Removal

2012-12-20 00:43 • by Name Withheld (unregistered)
397674 in reply to 397672
Sorry that was 5,000. Not 50,000! ;-)

Re: Recursive Whitespace Removal

2012-12-20 02:04 • by Evan (unregistered)
OK, in case anyone wants some experimental evaluation for the proposed
solutions, I have timing and test
data
for every full solution I've seen that I didn't think was
an obvious joke. (Aufgehaben's may have been a joke too but I included
it anyway.)

There are two tables. The top table shows correctness tests for
several inputs. Correct outputs are shown as green <OK>s; incorrect
outputs are shown as the result I got. In both inputs (column
headings) and outputs, spaces are replaced by hyphens. The empty
string "" is shown as (empty). For example, the article's
implementation, when given the string " ", outputs " ", which is
incorrect.

The lower table shows timing results, in seconds. In the tests,
exponents denote repetition. I didn't do it very clearly as I don't
mark the grouping; the first column is shown as "a-^500"; what that
means is the string "a-" repeated 500 times; the final column means
499,995 'a's followed by five spaces. (Spaces are again replaced by
'-'s, of course.)

I'm not sure of how stable the numbers are; that's from one run
only. My numbers are from the latest version of MSVC, running on my
pretty old box.

And of course it's TOTALLY possible that I screwed something up, so if you think I'm misrepresenting your solution let me know why. :-) (I did have to make very minor changes to a couple for people who changed the return type to void.)


---

If you want to play around with this, there are three parts. The first is
C++ code that runs each of the
tests and reports their times in JSON format. For timing, I use the
timer class available
from here
(direct
link
). The JSON can be converted into a readable HTML file by
this Python script (it reads
input from stdin and outputs to stdout).

I didn't try compiling with anything but MSVC, but I also am unaware
of any reason it won't compile with GCC 4.5+ with -std=c++0x (I use
the C++11 for loop). To add a new solution, simply add a new block like
TRIM_RIGHT("test name", "url")

{
body;
}

and it will automatically run it.

Re: Recursive Whitespace Removal

2012-12-20 02:09 • by Eric (unregistered)
A nice example of schlemiel-the-painter's algorithm (mixed with some buffer boundary ignorance)

http://en.wikipedia.org/wiki/Schlemiel_the_Painter's_algorithm

(Damn you askimet, this is getting annoying!)

Re: Recursive Whitespace Removal

2012-12-20 02:20 • by Evan (unregistered)
1) Oh yeah, the big change I had to do was to PleegWat's solution, which just enters an infinite loop because it never increments the loop pointer. I stuck an increment in there but didn't really give it much thought as to where it goes, so that may be why it fails so many correctness tests. Not necessarily PleegWat's fault entirely.

2) Sorry about the formatting of my previous post; I forgot to remove the hard newlines that were added as I edited it in emacs.

3) I figured out what prompted the discussion about C evaluation order, and it should have been resolved with Steve the Cynic's and gallier2's posts on the first page.

Re: Recursive Whitespace Removal

2012-12-20 03:20 • by Parasietje (unregistered)
Since this is tail-recursion, a good compiler should optimize this and reuse the stack frame.
I do not know if C has good compilers though.
« PrevPage 1 | Page 2 | Page 3 | Page 4Next »

Add Comment