| « Prev | Page 1 | Page 2 | Page 3 | Page 4 | Next » |
Re: Recursive Whitespace Removal
2012-12-19 14:58
•
by
Evan
(unregistered)
|
(And, btw, as someone said earlier, this turns something that should be an O(n) operation into O(n^2) worst-case; in general, O(nm) where n is the length of the string and m is the number of trailing spaces.) |
Re: Recursive Whitespace Removal
2012-12-19 14:59
•
by
TheOptimizer
(unregistered)
|
if it did any decent compiler optimizer would go "trail recursion? convert into loop". |
Re: Recursive Whitespace Removal
2012-12-19 15:06
•
by
TheOptimizer
(unregistered)
|
try working with trees. |
Re: Recursive Whitespace Removal
2012-12-19 15:25
•
by
Todd
(unregistered)
|
I thought it was pretty neat. |
Re: Recursive Whitespace Removal
2012-12-19 15:50
•
by
Adin Falkhoff
(unregistered)
|
|
Is APL even A Programmin Language?
|
Re: Recursive Whitespace Removal
2012-12-19 15:55
•
by
A developer
(unregistered)
|
|
And a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion and a question about recursion......
LOL |
Re: Recursive Whitespace Removal
2012-12-19 15:57
•
by
Hans
(unregistered)
|
|
I think all of you need to google for "tail recursion". This is perfectly valid code, and a modern compiler is smart enough to optimize this.
|
Re: Recursive Whitespace Removal
2012-12-19 16:42
•
by
Mr.Bob
(unregistered)
|
Only two nits to pick with your answer: 1. Still doesn't handle the possibility that str is NULL. Twice. 2. Strings are arrays of characters, please don't mix them with integer types. |
Re: Recursive Whitespace Removal
2012-12-19 16:43
•
by
The Dark Lord
|
Finally some code I can understand on this forum! Thank you Sir. Oh I miss my APL. re·cur·sion /riˈkərZHən/ Noun 1.See "recursion" |
|
So why the strlen anyway ? Assembly-wise, strlen (the gallier2 version of course) is more efficient only if whitespace detection is difficult. If we're in 7-bit ASCII, it's not. We can either take 0x20 or take into account tabs and such and say <= 0x20 !
Following is less elegant than gallier2. Is it more efficient...
Nah, to be really efficient I'll provide trWTF (gag and throw up all of you anti-goto fanatics):
Don't need no stinkin' optimizer. Unless maybe some modern instruction set... have to investigate that... or not :-) |
Re: Recursive Whitespace Removal
2012-12-19 17:06
•
by
LK
(unregistered)
|
|
Reading Mr Bob's comment... so if (!str) return NULL; OK...
|
|
To everyone saying it's not much of a WTF performancewise: maybe some actual data will convince you. This is how bad it is, based on my own measurements. This is running on Linux, with GCC 4.7.2, with -O3. (-O2 doesn't make much of a difference.)
The following shows, for strings of various lengths made entirely of spaces (BTW I didn't read the thread, did anyone point out it's buggy in this scenario?), how much longer the submitted version takes to run vs. ¯\(°_o)/¯ I DUNNO LOL's solution (quoted below by Mr. Bob), which I arbitrarily chose.
For the million-character string, the submitted version took about 33 seconds, and the reasonable solution took about 1.2 milliseconds. Now, maybe you say "well, a string of a million spaces, that's not a realistic input so I'm not going to worry", and that's not a totally unreasonable standpoint to take. However, to say "eh it's not dumb because the compiler will just optimize away the tail recursion" is overlooking the dumbness of taking a linear-time algorithm and turning it into a quadratic one. (Well, while not trading out a large constant -- e.g. Fibonacci heaps are required for Dijkstra's algorithm to have it's advertised complexity, but the constant factor of Fibonacci heaps over other kinds of heaps means in practice you'll usually get better performance with a higher-complexity version.) Play around with it yourself; you'll have to modify the way you do timing (look at QueryPerformanceCounter() or something) if you want to run it on Windows. http://pastebin.com/M5xZJYdd |
|
Two other performance notes. First, turns out that I DUNNO LOL's solution is actually a lot slower than it needs to be due to it scanning the string character-by-character. If you use gallier2's version with strlen(), that will improve the performance even more -- for the 1,000,000 character string, by another 17x, which gives a whopping 450,000x suckitude factor for the submitted version.
Also, you don't need to have a string of all spaces for other methods to show performance advantages -- a single space at the end of a 10,000-character string is enough for gallier2's solution to register about a 1.5x improvement over the submitted version (though in that case the submitted version is actually ~4x faster than I DUNNO LOL's). |
|
Not recursive, doesn't call any other functions, O(n), simple, works (as far as I can tell in a hot hurry). Apologies if somebody posted a similar solution already.
char* trim_right( char* str ) { int i = 0, j = 0; while ( str[i] != '\0' ) { if ( str[i] != ' ' ) { j = i; } ++i; } str[j + 1] = '\0'; return str; } |
Re: Recursive Whitespace Removal
2012-12-19 18:03
•
by
Evan
(unregistered)
|
Actually this is incorrect, somewhat. For the case of a string made entirely of spaces, gallier2's version walks backward over the string character by character and is also "slow". This means that multiplying the 17x improvement of gallier2's version on tests that are NOT all spaces (in that case it was 999,999 'a's followed by a single space) by the 27,000x or whatever improvement from my earlier table was just flat out wrong. However, on a string that's not all spaces, gallier2's version is indeed faster. (For the same reason, Scott's version he just posted is also slow in comparison to gallier2 -- about the same as I DUNNO LOL, or about 18x slower than gallier2's for the 999,999 'a's then a space test case. "Not calling other functions" (in this case, strlen()), is not a virtue.) |
|
I would use (assuming str is a valid string and p is a char *):
if ((p = strrchr(str, ' ')) != NULL) *p = '\0'; |
Re: Recursive Whitespace Removal
2012-12-19 18:40
•
by
Evan
(unregistered)
|
That may be the buggiest version yet: "abc " => "abc " |
Re: Recursive Whitespace Removal
2012-12-19 19:14
•
by
A developer
(unregistered)
|
|
Alright it's official.
YOU'RE ALL A BUNCH OF NERDS! |
Re: Recursive Whitespace Removal
2012-12-19 19:26
•
by
Trollinator
(unregistered)
|
|
TRWTF is the submitter who doesn't seem to understand tail recursion optimisation.
|
|
Who was it that said something like:
"C has all the power and capability of Assembler with the elegance and readability of Assembler" |
Re: Recursive Whitespace Removal
2012-12-19 19:42
•
by
Jim
(unregistered)
|
Except the line before that takes care of that.... I wonder why he didn;t reuse len? |
Re: Recursive Whitespace Removal
2012-12-19 19:44
•
by
ase; dl
(unregistered)
|
I can't see no stinkin' buffer underflow.... |
Re: Recursive Whitespace Removal
2012-12-19 19:46
•
by
ase;dl
(unregistered)
|
although "len" is not actually len.... |
Re: Recursive Whitespace Removal
2012-12-19 19:49
•
by
Norman Diamond
(unregistered)
|
The analyses by MIPS, Steve, and Gallier are correct but understated. One more rule in the C standard is that the old value of bp shall be used only to determine the new value to be stored in bp, but the program sometimes (depending on the order of evaluation which can go either way) uses the old value in computing a different expression. So the behaviour is completely undefined. The assignment never has to be reached. The program could format your hard drives instead. P.S. Considering the amount of embedded systems in use now in TV remote controls, cars, and rice cookers, it might be a good idea to be aware of the continued existence of MIPS. |
Re: Recursive Whitespace Removal
2012-12-19 20:37
•
by
metadalek
(unregistered)
|
|
Sorry, I assumed we have trailing white space.
You obviously need an extra guard if that assumption is not correct. |
Re: Recursive Whitespace Removal
2012-12-19 20:40
•
by
Exercise
(unregistered)
|
there's an intriguing concept, a non-iterative loop....
|
Re: Recursive Whitespace Removal
2012-12-19 20:40
•
by
ase; dl
(unregistered)
|
OOPS - NOW I SEE IT!!! |
Re: Recursive Whitespace Removal
2012-12-19 20:46
•
by
jimmy
(unregistered)
|
FTFY. C does guarantee order of evaluation of lots of other things, but. Some compilers will assume pre-increment means "before any statement on the line" while some will assume it means "in place, but before it is used for this particular operation". (for post :s/before/after/g) Pretty sure the Visual Studio one and MinGW are different (which many of you might have so play around with things like: A=8;B=8 printf("%d\n", A++ + ++B); printf("%d\n", --A + B--); etc and compare the results on both of them....If you yhave multiple ++ (or --) on the same var on a row it becomes particularly interesting.... |
Re: Recursive Whitespace Removal
2012-12-19 21:40
•
by
Evan
(unregistered)
|
It doesn't work even in that case: if you have multiple spaces at the end, it will only remove the final one (as illustrated by my first test case).
And Norman Diamond's comment is correct. Unlike some kinds of implementation-defined behavior (does foo(bar(), baz()) call bar() or baz() first? the standard doesn't say), if you have multiple increments, decrements, assignments (i = i++ or (i=7) + (i=4)), etc. to the same variable without an intervening sequence point, the entire behavior is undefined. It doesn't even have to pick one. For instance, in practice, i = i++ will likely either leave the value of i unchanged or increment it, but it needn't do either. Once a program encounters truly undefined behavior, it can do anything. That could crash your program. It could mail your porn stash to your mother, and it would be totally allowed by the C standard. |
Re: Recursive Whitespace Removal
2012-12-19 21:50
•
by
Evan
(unregistered)
|
Actually looking at this code again I think that this has deterministic behavior according to the standard; if I'm correct, that is required to print 17\n17\n and leave A == B == 8 after those three lines. You might be looking for something like A = 8; which is allowed to print either "8 9" or "8 8" (but is required to do one or the other). |
Re: Recursive Whitespace Removal
2012-12-19 22:13
•
by
¯\(°_o)/¯ I DUNNO LOL
(unregistered)
|
You mean 397535? Do you even know how strlen works? Hint: strlen is an O(n) operation. What I wrote is nothing more than a version of strlen that remembers the last start of a group of blanks instead of counting the length. |
Ummm..yes, actually I did. Sorry about that; your qualifier was correctly present. |
Okay, when I was working this problem, I didn't know about the iterative solution. But I looked at it and the variations on Wikipedia, and it seems to me that any expression of these algorithms would involve tracking arrays and some fairly complex decision logic. Just for example, you have to know how many disks in advance, and what the top disk on each stack is. The recursive solver, which I hacked together again, requires one solitary condition (marked below):
Hmmm.... |
Re: Recursive Whitespace Removal
2012-12-19 22:26
•
by
Lex Luthor
(unregistered)
|
|
Agreed. This is not a WTF. Nothing wrong with recursion in C, but possibly person who reported it.
|
Re: Recursive Whitespace Removal
2012-12-19 22:34
•
by
¯\(°_o)/¯ I DUNNO LOL
(unregistered)
|
|
FYI, what you were seeing was the effect of strlen using the x86 string instructions on your specific computer. Try it on an ARM-based system sometime. Optimization is not an all or nothing situation. The same code could run faster on some systems and slower on others.
And are you seriously implying that someone would (intentionally) use this kind of function on a string of more than about 256 bytes or so? At that point, the advantage of the string instructions is lost.
Does that mean it runs in negative time? I don't think that math means what you think it means. You probably meant 1/25x, or 25 times FASTER. Guess what? It's not going to be called to nip off blanks at the end of a 1 megabyte block of memory, unless there's a bug that overwrote the null at the end. There's no point in such an operation. In all sane use cases, my code is... faster! Benchmarks are useless unless you know WHY something is faster or slower. Benchmarks are also useless if you're doing the equivalent of feeding mice a year's worth of cyclamates every day. |
Re: Recursive Whitespace Removal
2012-12-19 22:40
•
by
Evan
(unregistered)
|
Real strlen() is more efficient than just reading a string character-by-character in a loop. There's a reason that glibc's strlen(), for example, is closer to 50 loc than 5.
Why does everyone seem to think that "the author used recursion" is the wtf instead of "it's buggy and horribly inefficient [not due directly to the recursion]"? |
Re: Recursive Whitespace Removal
2012-12-19 22:49
•
by
Evan
(unregistered)
|
The post with the chart was comparing your (reasonable) solution with the (dumb) solution from the story submission; your code is faster than that in all cases that I could measure. :-) For 1,000 characters, 25 times faster than the dumb submission. For 1,000,000 characters, 27000 times faster.
This is true, and it's applicable to my later, more informal, comparison to gallier2's strlen-based solution. It's possible that in other architectures, a typical real strlen would be closer to what you did. However, the improvements I cited were not due to the x86 string instructions, which are not (to my knowledge) used by glibc. (Aside: perhaps a strlen() in code does not actually call glibc's strlen but invokes an intrinsic which uses them? Perhaps. I'm too lazy to check.)
I could see someone using it on input, which means that the length of the string is in the hands of the user rather than the developer.
Sorry, that ~ is "about"; the execution time of your solution with 1000 bytes is comparable to the frequency of the timer I was using (1 microsecond) and so the difference was usually either around 18x or 36x depending on whether yours took just one microsecond or two, so I just sorta picked something in the middle. :-) |
Re: Recursive Whitespace Removal
2012-12-19 23:24
•
by
Hacky
(unregistered)
|
|
Could in fact corrupt heap or segv by writing to memory before string I believe.
|
|
it's machinations of evil lispers =)
|
Re: Recursive Whitespace Removal
2012-12-20 00:00
•
by
¯\(°_o)/¯ I DUNNO LOL
(unregistered)
|
|
Ah, that was a tilde at low resolution.
I thought you were comparing mine vs gallier2's, when you were comparing mine vs the article's. FWIW, gallier2's would strictly be slower than mine when given a buffer full of nothing but blanks, because I never have to backtrack. If you want your benchmark to be meaningful, you have to try both optimal and pessimal cases. strlen's speed is not only dependent upon the library quality, it is even possible for a compiler to detect it and generate inline string instructions. But without string instructions, it'll be little better than inlining "*p=str; len=0; while(*p++) len++; return len;" Also, using array indexing in a loop may potentially generate much worse code than using a pointer if the compiler can't identify the looping and factor it out into a simple pointer. On some architectures it may be the same speed or faster to use indexing over pointers. (Keep the base address in a register, add the offset in an addressing mode.) That would explain the sharp speed difference in the benchmark, which I mistakenly thought was gallier2's and due to the strlen call. My code has the advantage of being very easy for a half decent compiler to optimize (aside from not likely being able to use string instructions) into very lean code. Everything can be kept in registers without touching the stack, and even the if statement in the loop would take advantage of ARM's conditional execution. But really, if it isn't used more than a few times a second, and for short strings, then validity and correctness is far more important than speed. |
Re: Recursive Whitespace Removal
2012-12-20 00:10
•
by
Evan
(unregistered)
|
Instead of cleaning my apartment or doing something useful, I'm putting together a benchmark set of all of the proposed solutions to trim_right; it'll involve a few different cases; all spaces, no spaces, and in between. |
|
Also, I apologize about not being as clear as I could have been. I'm much less interested in comparing yours vs strlen-based solutions; mostly I'm trying to point out why everyone going "but compilers have tail call optimization!" is barking up the wrong tree.
|
|
What happens when an empty (0 length) string is provided... buffer overflow... errr.. underflow?
|
Re: Recursive Whitespace Removal
2012-12-20 00:31
•
by
Name Withheld
(unregistered)
|
|
I admit this code is horribly inefficient, but I just tested it with 53000 spaces, and it completed in less than a millisecond.
Don't think they have much to worry about. |
Re: Recursive Whitespace Removal
2012-12-20 00:35
•
by
Brendan
(unregistered)
|
|
Hi,
That is inefficient - the "if (stop)" can be avoided: char *trim_right(char *str) |
Re: Recursive Whitespace Removal
2012-12-20 00:43
•
by
Name Withheld
(unregistered)
|
|
Sorry that was 5,000. Not 50,000! ;-)
|
|
OK, in case anyone wants some experimental evaluation for the proposed
solutions, I have timing and test data for every full solution I've seen that I didn't think was an obvious joke. (Aufgehaben's may have been a joke too but I included it anyway.) There are two tables. The top table shows correctness tests for several inputs. Correct outputs are shown as green <OK>s; incorrect outputs are shown as the result I got. In both inputs (column headings) and outputs, spaces are replaced by hyphens. The empty string "" is shown as (empty). For example, the article's implementation, when given the string " ", outputs " ", which is incorrect. The lower table shows timing results, in seconds. In the tests, exponents denote repetition. I didn't do it very clearly as I don't mark the grouping; the first column is shown as "a-^500"; what that means is the string "a-" repeated 500 times; the final column means 499,995 'a's followed by five spaces. (Spaces are again replaced by '-'s, of course.) I'm not sure of how stable the numbers are; that's from one run only. My numbers are from the latest version of MSVC, running on my pretty old box. And of course it's TOTALLY possible that I screwed something up, so if you think I'm misrepresenting your solution let me know why. :-) (I did have to make very minor changes to a couple for people who changed the return type to void.) --- If you want to play around with this, there are three parts. The first is C++ code that runs each of the tests and reports their times in JSON format. For timing, I use the timer class available from here (direct link). The JSON can be converted into a readable HTML file by this Python script (it reads input from stdin and outputs to stdout). I didn't try compiling with anything but MSVC, but I also am unaware of any reason it won't compile with GCC 4.5+ with -std=c++0x (I use the C++11 for loop). To add a new solution, simply add a new block like TRIM_RIGHT("test name", "url")
and it will automatically run it. |
|
A nice example of schlemiel-the-painter's algorithm (mixed with some buffer boundary ignorance)
http://en.wikipedia.org/wiki/Schlemiel_the_Painter's_algorithm (Damn you askimet, this is getting annoying!) |
|
1) Oh yeah, the big change I had to do was to PleegWat's solution, which just enters an infinite loop because it never increments the loop pointer. I stuck an increment in there but didn't really give it much thought as to where it goes, so that may be why it fails so many correctness tests. Not necessarily PleegWat's fault entirely.
2) Sorry about the formatting of my previous post; I forgot to remove the hard newlines that were added as I edited it in emacs. 3) I figured out what prompted the discussion about C evaluation order, and it should have been resolved with Steve the Cynic's and gallier2's posts on the first page. |
|
Since this is tail-recursion, a good compiler should optimize this and reuse the stack frame.
I do not know if C has good compilers though. |
| « Prev | Page 1 | Page 2 | Page 3 | Page 4 | Next » |