| « Prev | Page 1 | Page 2 | Page 3 | Page 4 | Next » |
Re: Recursive Whitespace Removal
2012-12-20 04:10
•
by
Konstantin Lopyrev
(unregistered)
|
|
Omg, somebody is wrong on the Internet.
|
Re: Recursive Whitespace Removal
2012-12-20 04:21
•
by
gallier2
(unregistered)
|
I'm sure EVAN knows that. The thing is, strlen's implementation in most compilers is optimized to death. I've looked at some implementations where it scans 8 bytes in one loop iteration. This means the loop is around 8 times faster as if you scan byte by byte. Then they also took care to limit the number of failed branch predictions, they added prefetch instruction, to limit the number of cache misses. On long strings it makes a hell of a big difference. Furthermore, the compiler has informations about standard functions that it hasn't (generaly) for hand-written one. The compiler can use intrinsics (Microsoft compilers did that already in DOS times), which means they replaced the function call by ad-hoc code. The compiler also knows that the function strlen is pure. That means that 2 consecutive calls to the function will return the same value if fed the same parameters.
will call strlen only once, an optimisation it cannot perform in the general case (it would be fatal for rand() for example). |
Re: Recursive Whitespace Removal
2012-12-20 04:39
•
by
Brendan
(unregistered)
|
|
Hi,
Wow - I'm impressed! However, if we're testing speeds then I'd like to make them functionally "more equivalent". Specifically, I'd like to replace "isspace()" with "*str == ' '" (or equivalent as appropriate) in my entry, as I suspect this is causing extra overhead. Note: I tried g++ version 4.5.4 and it doesn't like range-based for loops at all, so the code wouldn't compile. Newer versions of g++ (that do support range-based for loops) haven't hit repositories yet. -Brendan |
Re: Recursive Whitespace Removal
2012-12-20 04:45
•
by
gallier2
(unregistered)
|
|
I read the other comments you made and I see that you are aware of all my pontifications I made above, sorry for sounding a bit arrogant.
EVAN, however, did the right thing. He tested with various inputs and I'm actually really surprised my solution was that good overall. I am fully aware that it has O(nm) complexity (n number chars, m number of trailing blanks) but as ever the complexity is not the only thing to consider. The constant factor, is imo often more important than the O() class. |
Re: Recursive Whitespace Removal
2012-12-20 05:51
•
by
Krzysiek
(unregistered)
|
Please differentiate between functional languages that depend on recursion because lack of side effects and guarantee tail recursion optimisation and imperative languages that give no TRO guarantee and can bust the stack with sufficiently deep recursion. I would like financial transaction systems not to crash because of strings containing 2k spaces, even if encountering such strings seems improbable. |
Probably because (nearly) everyone is a total moron? Since (nearly) everyone only spots problems by looking for unusual constructions? Recursion looks odd, strlen doesn't, so they can stop spotting problems. Since tail recursion is not a big WTF, and some are even so clever to remember that it can be optimized away, they think they look so much smarter by saying "that's not a WTF". Bit sad, innit? |
Re: Recursive Whitespace Removal
2012-12-20 07:37
•
by
katastrofa
(unregistered)
|
Try mentioning Haskell in a discussion about performance and realtime computing... be prepared to be laughed at. |
|
The recursive trim is worst-case O(N²) as strlen is O(N).
Which is why the trim function would be more efficient to implement in C++ than in C as you can get to the end of a std::string in constant time as the std::string object stores its length so doesn't need to traverse the whole buffer. Of course you would pass in a non-const reference to your trim function. string has several find methods including find_last_not_of which will enable you to find where the end of the string is. You can then simply resize your string to the returned value + 1. So something like:
One could also use std::find_if with a `not` adapter to `isspace` using rbegin and rend on your iterator. That would return you an iterator to the last non-space character (or rend if there are none). You can then convert this to a regular iterator and use erase to erase these from your string. Finally, one could use boost::regex, (or std::regex if available). Unfortunately far too many would choose that option. It's bettr than the recursive whitespace removal but far inferior to a basic loop or algorithm use. |
Re: Recursive Whitespace Removal
2012-12-20 09:36
•
by
Recursive Reclusive
(unregistered)
|
This is similar to I DUNNO LOL's solution and like it, it does bad things to string without spaces. |
Re: Recursive Whitespace Removal
2012-12-20 10:16
•
by
gallier2
(unregistered)
|
Wow, indeed. DUNNO LOL's truncates the string to length 0, Scott's to length 1. I think EVAN should add that test in his conformity table at http://jsbin.com/ewelah/1 |
Re: Recursive Whitespace Removal
2012-12-20 10:52
•
by
The Recursion King
(unregistered)
|
|
It's because his index variable J is set to 0 initially.
For anyone wondering when recursion might come in handy, think about parsing trees with branches of arbitrary depths. |
I do my parsing with an LALR automaton and a stack. In FORTRAN. |
|
Thanks Evan for the analysis. Rather informative (I am being sincere, not sarcastic). Now that I have time to follow up...
* I was in such a hurry I forgot this forum strips tabs out of submissions--sorry my post is so ugly. * Oops! Boy did I mess up the strings with only spaces. Initialize j to -1 and it works a lot better. * Regardless, I like the implementations with pointers (from I-DUNNO-LOL) better than my index-based solution, even if they very likely optimize down to the same execution speed. So forget the index-based solution--pointers are probably better. * Calling strlen() is still not the best solution because of the scan-out and scan-back situation. The gallier2 solution only beats mine (marginally) in speed for most cases because (as somebody pointed out), strlen() is very optimized. If I were offering an ANSI-C level of solution in terms of hand optimization, I would have simply done the single scan forward the same way strlen() does--by doing it 32 or 64 bits at a time, or detecting architecture and using inline assembly. Without calling strlen() first. But that would have resulted in a function that is a lot harder to understand and not as fit for forum discussion. Some things are more important than raw speed. * It's better to NOT check for NULL input on functions like this, in keeping with the speed of ANSI-C standard string functions. If somebody calls trim_right() a zillion times with strings that the caller *knows* could not possibly be NULL, it's just a waste of cycles. If the caller suspects the input strings might be NULL, it's the caller's responsibility to check for NULL before calling the function. (Other standard functions like free() are called so frequently with NULL inputs that it makes sense to do the NULL check inside free() to avoid polluting the calling code). * Recursive-Reclusive: Huh? I tested for strings without spaces, and it worked fine (even my broken j=0 version worked for strings without spaces). * Agreed--the WTF is not the recursion (although that's the worse way to do it), but the inefficiency. And again, thanks Evan for the analysis. |
Re: Recursive Whitespace Removal
2012-12-20 13:51
•
by
PiisAWheeL
|
Spoken like a man who has never coded in javascript and didn't want to lock up a browser. |
Re: Recursive Whitespace Removal
2012-12-20 14:29
•
by
Evan
(unregistered)
|
Sorry, 4.6 is what added support. If I make an update I'll do it.
Let's be clear; it's only this dumb recursive implementation which is quadratic, and it's not really because of the recursion: a straightforward transformation into a loop won't help (as about 17,000 people have pointed out, the compiler will do that for you), and it would be possible to make a recursive version that wasn't stupid. (Also I'd add your std::string-based one to the tests, but it has a different signature and so won't work like that. :-))
Actually I meant to have a test for just "abc" and then forgot about it. Oops.
It's C, not Python. No biggie. :-) (Not spam not spam not spam?) |
Re: Recursive Whitespace Removal
2012-12-20 15:40
•
by
Scott
(unregistered)
|
It's C, not Python. No biggie. :-) Well then it would have been 'hello '.rstrip(' ') :)
And I've thought about it more and I'm starting to turn to the side of the gallier2 solution calling strlen() being better overall. I think I agree that when the compiler sees strlen() it's free to do a lot more optimizations, and it's not like the little trim_right() snippet is unclear for discussions. Maybe I'm just tainted because my favorite interview question is to ask a candidate to write "strcpy()" and I'm surprised how many set up a for-loop with strlen() in the terminating condition, without understanding how strlen() works. Good discussion, and it's refreshing to finally see a C-related WTF for a change. Incidentally, I ran my version and gallier2's on my 8-bit target and for some reason mine beat the socks off gallier2's in every case. |
|
Another Lisp programmer enters the workforce.
|
Re: Recursive Whitespace Removal
2012-12-20 17:37
•
by
gnasher729
(unregistered)
|
If the string passed in consists of n space characters, this function will do about n recursive calls, examine 1.5 n^2 characters, and return a string that ends in a space character. Except when n = 0 where it can crash. So we are not looking for a "more elegant" implementation of a solved problem. We are looking for a solution that is not stupidly complicated, that isn't stupidly slow (if I can pass a million spaces to that function I have a nice DoS attack), and that is actually correct. |
Re: Recursive Whitespace Removal
2012-12-20 17:53
•
by
gnasher729
(unregistered)
|
This will produce an empty string whenever the first character is a space character. |
Re: Recursive Whitespace Removal
2012-12-20 18:04
•
by
Recursive Reclusive
(unregistered)
|
If there are no spaces in the string, j remains zero. So str[j + 1] = '\0' truncates the string to lenght one. Easily to fix, but it's a big bug if you don't. |
|
Who would want to eliminate tail recursion? That sounds as dumb as removing white space, which is more pleasant to keep occupied again and again. Recursion, iteration, any traversal of tail is fine with me.
|
Re: Recursive Whitespace Removal
2012-12-20 19:22
•
by
Scott
(unregistered)
|
No, it works. Here's the code again (only diff: I fixed j=0 to be j=-1. That fixes the string consisting of only spaces, but makes no difference in the string that contains *no* spaces).
Feed it "hello" or something. On i == 0, str[i]!=' ' is true and sets j. Compile it and try it. Try it with j=0 instead of -1; it fails the only-spaces case, but not the no-spaces case. Here's my test stub:
|
|
Follow up, Recursive Reclusive: I hope you get back to me on this because if there is a case I'm missing (or am just misunderstanding something), I do want to learn. Thanks!
|
Re: Recursive Whitespace Removal
2012-12-21 03:54
•
by
gallier2
(unregistered)
|
I didn't put much reflexion when writing my submission on page 1. But it was written in the style that, from my experience, suits best big servers. I work on Oracle M5000 machines (Solaris SPARC 64 bit). Had the routine been written with microcontrollers (those without cache and that are not super-scalar) in mind, I would certainly have written it differently. Probably much more like your version. But as already said, I didn't put much optimisation effort when I wrote that snippet, my biggest emphasizes was to write a correct version. |
Re: Recursive Whitespace Removal
2012-12-21 09:59
•
by
Scott
(unregistered)
|
|
Well you certainly beat me on correctness! (At least the first time--I got it fixed. It's the one test case I forgot to test for).
I was kidding about the 8-bit thing; it was more a suggestion that strlen() probably isn't much more efficient then the super-dumbed down char-by-char for an 8-bit architecture. But for the nominal case (32/64-bit), a good optimized strlen() ended up making your version faster overall. Anyway, good show! |
Re: Recursive Whitespace Removal
2012-12-21 11:37
•
by
Mr.Bob
(unregistered)
|
Ok, three days of these solutions, and I promise I will get off my hobby horse and leave it be after this. None of these "fixes" to the original code still do not handle null strings passed into the code. If str is NULL and you are lucky, this line: while ( str[i] != '\0' ) causes a segfault which will terminate the process. If you're not lucky enough to be on a platform that traps such things, you've just stomped on whatever data lives at memory offset 0, and worse, returned that data to the caller to be mangled further. It doesn't matter if the algorithm is recursive, iterative, unrolled looping, optimized for word size, etc. This is Defensive Coding 101. |
Tail recursion is a valid programming technique, and IMHO it's valid to depend on the compiler doing the right thing. |
Not only the compiler for System/370 would be hard. A lot of code for small microcontrollers is compiled without use of stack for anything but storing return addresses. I have plenty of embedded code with no reentrant functions at all -- in such a case, all parameter passing and local data storage is done via fixed RAM locations. Those locations are overlaid by the linker based on the call tree. On some architectures this is faster and easier than indexing relative to a stack frame. Never mind that you free the register that would be otherwise used to store the stack frame, and you don't have to keep that register updated on each function entry and exit. Heck, for functions called from one call site only, the compiler automatically inlines as that saves a couple cycles and a couple code space bytes. It does matter when you have 512 bytes for your code, and sometimes it does matter to have only that many bytes when you can put you application into a 6 pin part that costs less than $1 even in small quantities. |
Re: Recursive Whitespace Removal
2012-12-21 12:36
•
by
Scott
(unregistered)
|
(Excerpted from above...) * It's better to NOT check for NULL input on functions like this, in keeping with the speed of ANSI-C standard string functions. If somebody calls trim_right() a zillion times with strings that the caller *knows* could not possibly be NULL, it's just a waste of cycles. If the caller suspects the input strings might be NULL, it's the caller's responsibility to check for NULL before calling the function. (Other standard functions like free() are called so frequently with NULL inputs that it makes sense to do the NULL check inside free() to avoid polluting the calling code).
And when we graduate to Intelligent Coding 201... :) Seriously, I know we all have different opinions on the matter of checking inputs, and that's fine, I guess. My opinion is that there are basically three levels of understanding NULL. 1. You don't know about NULL inputs; your code reflects this ignorance because it's completely ignored. 2. You know about NULL and do everything you can to do cover it up* when you get one. 3. You know about all that but choose to ignore the NULL case for speed, code clarity, debugging*, etc. Unfortunately your code looks like #1 so the #2 people just think you're an amateur. *What does one do when one encounters a NULL? Silently ignore it? printf()? How about this: if I run a program and it segfaults, I fire it up in gdb, and guess what? The real problem is usually two functions back in the backtrace. I say if the program is wrong, it should fail big, fast, and spectacularly--it makes it easier to find the problem. If you keep intercepting bad inputs at the low level, you cover problems and make them hard to find. Caveat: This doesn't apply to all cases, and if I were writing for a pacemaker or a space shuttle, I'd approach it differently. |
|
And please stop quoting the Recursive Reclusive post involving my code that says "it does bad things to string without spaces". He's wrong.
I corrected the j=0 problem in a later post, which fixes the "strings with only spaces" problem (j=-1 works better), but the "strings without spaces" has always worked. |
Re: Recursive Whitespace Removal
2012-12-21 12:42
•
by
gallier2
(unregistered)
|
As someone already pointed out. The design of the function demands not to check for NULL pointer. All string and mem functions (except free) in the standard library do not check for NULL pointers. It is a design choice in line with the whole C idea. If the caller wants to avoid a segfault (or worse) it can check if it wants without burdening the 99.9% of the use cases with an unnecessary check. Segfaulting is often a better choice than propagating an error condition, that no function above, in the call stack, knows how to deal with. If one wants an always checking everything language then they should do java, C# or whatever fancy interpreted language. |
Re: Recursive Whitespace Removal
2012-12-21 12:45
•
by
gallier2
(unregistered)
|
Is this a mayan calendar thing? After 2 pages of comments the same questions and comments come back and back again and again. That point was already answered more than once before, by me and by others. |
Re: Recursive Whitespace Removal
2012-12-21 17:14
•
by
JJ
(unregistered)
|
That is a perfectly valid gripe. |
Re: Recursive Whitespace Removal
2012-12-23 15:29
•
by
Mozzis
(unregistered)
|
|
Sheesh! The old cast and too many parens mindset. So juevnile. This is right:
char* trim_right(char* str) { for(size_t i = strlen(str) - 1; i >= 0; i--) if(str[i] == ' ') str[i] = '\0'; return str; } |
Re: Recursive Whitespace Removal
2012-12-23 15:59
•
by
gallier2
(unregistered)
|
|
It still has the bug of truncating the string at the first blank from the string. So functionnally completely wrong.
|
Re: Recursive Whitespace Removal
2012-12-27 03:35
•
by
Jay
(unregistered)
|
Yeah, no. Years ago someone wrote and article about how a well-written program running on a TRS-80 would run faster than a crap program on a Cray. Better hardware (not C) means you can now write crap that runs fast, but it wuld be nice if you didn't. |
Re: Recursive Whitespace Removal
2012-12-27 12:44
•
by
David
(unregistered)
|
|
What's really fun is when you give it a string that is not null terminated.
|
Re: Recursive Whitespace Removal
2013-01-02 15:45
•
by
ab
(unregistered)
|
|
It works fine, as long as you allocate all strings with the proper function:
|
Re: Recursive Whitespace Removal
2013-01-03 05:16
•
by
Neil
(unregistered)
|
Now, if it could only optimise the case where you write l = strlen(x);... |
Re: Recursive Whitespace Removal
2013-01-03 09:20
•
by
bio_end_io_t
(unregistered)
|
|
This sort of code hurts my soul:
str[strlen(str)-1] = '\0'; Apart from the under run problem in the above comment, this is vulnerable to buffer overflows. |
|
Well, except for the atrocious number of calls to strlen, I don’t see the problem.
|
Re: Recursive Whitespace Removal
2013-01-09 01:37
•
by
Anone
(unregistered)
|
How about the fact that recursion is mentioned in the article title? |
Re: Recursive Whitespace Removal
2013-02-28 02:59
•
by
-is
(unregistered)
|
This is tail recursion. In certain class of languages (and their compilers), this executes in constant space (that is, the recursive call is implemented as a jump back to the start) and is, sometimes, in fact the preferred or only way to express loops. Actually, modern gcc has learned to detect this condition even in C, or so I've heard. It would be different if you needed to compute something with the result of the recursive call. |
Re: Recursive Whitespace Removal
2013-02-28 03:02
•
by
-is
(unregistered)
|
This is tail recursion. In certain class of languages (and their compilers), this executes in constant space (that is, the recursive call is implemented as a jump back to the start) and is, sometimes, in fact the preferred or only way to express loops. Actually, modern gcc has learned to detect this condition even in C, or so I've heard. It would be different if you needed to compute something with the result of the recursive call. Of course, you'd compute the strlen once or not at all for this to also compute in linear time. |
| « Prev | Page 1 | Page 2 | Page 3 | Page 4 | Next » |