- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- It Figures
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Omg, somebody is wrong on the Internet.
Admin
I'm sure EVAN knows that. The thing is, strlen's implementation in most compilers is optimized to death. I've looked at some implementations where it scans 8 bytes in one loop iteration. This means the loop is around 8 times faster as if you scan byte by byte. Then they also took care to limit the number of failed branch predictions, they added prefetch instruction, to limit the number of cache misses. On long strings it makes a hell of a big difference. Furthermore, the compiler has informations about standard functions that it hasn't (generaly) for hand-written one. The compiler can use intrinsics (Microsoft compilers did that already in DOS times), which means they replaced the function call by ad-hoc code. The compiler also knows that the function strlen is pure. That means that 2 consecutive calls to the function will return the same value if fed the same parameters.
will call strlen only once, an optimisation it cannot perform in the general case (it would be fatal for rand() for example).
Admin
Hi,
Wow - I'm impressed!
However, if we're testing speeds then I'd like to make them functionally "more equivalent". Specifically, I'd like to replace "isspace()" with "*str == ' '" (or equivalent as appropriate) in my entry, as I suspect this is causing extra overhead.
Note: I tried g++ version 4.5.4 and it doesn't like range-based for loops at all, so the code wouldn't compile. Newer versions of g++ (that do support range-based for loops) haven't hit repositories yet.
-Brendan
Admin
I read the other comments you made and I see that you are aware of all my pontifications I made above, sorry for sounding a bit arrogant.
EVAN, however, did the right thing. He tested with various inputs and I'm actually really surprised my solution was that good overall. I am fully aware that it has O(nm) complexity (n number chars, m number of trailing blanks) but as ever the complexity is not the only thing to consider. The constant factor, is imo often more important than the O() class.
Admin
Please differentiate between functional languages that depend on recursion because lack of side effects and guarantee tail recursion optimisation and imperative languages that give no TRO guarantee and can bust the stack with sufficiently deep recursion.
I would like financial transaction systems not to crash because of strings containing 2k spaces, even if encountering such strings seems improbable.
Admin
Admin
Try mentioning Haskell in a discussion about performance and realtime computing... be prepared to be laughed at.
Admin
The recursive trim is worst-case O(N²) as strlen is O(N).
Which is why the trim function would be more efficient to implement in C++ than in C as you can get to the end of a std::string in constant time as the std::string object stores its length so doesn't need to traverse the whole buffer.
Of course you would pass in a non-const reference to your trim function.
string has several find methods including find_last_not_of which will enable you to find where the end of the string is. You can then simply resize your string to the returned value + 1. So something like:
One could also use std::find_if with a
not
adapter toisspace
using rbegin and rend on your iterator. That would return you an iterator to the last non-space character (or rend if there are none). You can then convert this to a regular iterator and use erase to erase these from your string.Finally, one could use boost::regex, (or std::regex if available). Unfortunately far too many would choose that option. It's bettr than the recursive whitespace removal but far inferior to a basic loop or algorithm use.
Admin
Admin
Wow, indeed.
DUNNO LOL's truncates the string to length 0, Scott's to length 1. I think EVAN should add that test in his conformity table at http://jsbin.com/ewelah/1
Admin
It's because his index variable J is set to 0 initially.
For anyone wondering when recursion might come in handy, think about parsing trees with branches of arbitrary depths.
Admin
Admin
Thanks Evan for the analysis. Rather informative (I am being sincere, not sarcastic). Now that I have time to follow up...
I was in such a hurry I forgot this forum strips tabs out of submissions--sorry my post is so ugly.
Oops! Boy did I mess up the strings with only spaces. Initialize j to -1 and it works a lot better.
Regardless, I like the implementations with pointers (from I-DUNNO-LOL) better than my index-based solution, even if they very likely optimize down to the same execution speed. So forget the index-based solution--pointers are probably better.
Calling strlen() is still not the best solution because of the scan-out and scan-back situation. The gallier2 solution only beats mine (marginally) in speed for most cases because (as somebody pointed out), strlen() is very optimized. If I were offering an ANSI-C level of solution in terms of hand optimization, I would have simply done the single scan forward the same way strlen() does--by doing it 32 or 64 bits at a time, or detecting architecture and using inline assembly. Without calling strlen() first. But that would have resulted in a function that is a lot harder to understand and not as fit for forum discussion. Some things are more important than raw speed.
It's better to NOT check for NULL input on functions like this, in keeping with the speed of ANSI-C standard string functions. If somebody calls trim_right() a zillion times with strings that the caller knows could not possibly be NULL, it's just a waste of cycles. If the caller suspects the input strings might be NULL, it's the caller's responsibility to check for NULL before calling the function. (Other standard functions like free() are called so frequently with NULL inputs that it makes sense to do the NULL check inside free() to avoid polluting the calling code).
Recursive-Reclusive: Huh? I tested for strings without spaces, and it worked fine (even my broken j=0 version worked for strings without spaces).
Agreed--the WTF is not the recursion (although that's the worse way to do it), but the inefficiency.
And again, thanks Evan for the analysis.
Admin
Admin
If I make an update I'll do it.
Let's be clear; it's only this dumb recursive implementation which is quadratic, and it's not really because of the recursion: a straightforward transformation into a loop won't help (as about 17,000 people have pointed out, the compiler will do that for you), and it would be possible to make a recursive version that wasn't stupid.(Also I'd add your std::string-based one to the tests, but it has a different signature and so won't work like that. :-))
Actually I meant to have a test for just "abc" and then forgot about it. Oops. It's C, not Python. No biggie. :-)(Not spam not spam not spam?)
Admin
Well then it would have been
:)And I've thought about it more and I'm starting to turn to the side of the gallier2 solution calling strlen() being better overall. I think I agree that when the compiler sees strlen() it's free to do a lot more optimizations, and it's not like the little trim_right() snippet is unclear for discussions.
Maybe I'm just tainted because my favorite interview question is to ask a candidate to write "strcpy()" and I'm surprised how many set up a for-loop with strlen() in the terminating condition, without understanding how strlen() works.
Good discussion, and it's refreshing to finally see a C-related WTF for a change.
Incidentally, I ran my version and gallier2's on my 8-bit target and for some reason mine beat the socks off gallier2's in every case.
Admin
Another Lisp programmer enters the workforce.
Admin
If the string passed in consists of n space characters, this function will do about n recursive calls, examine 1.5 n^2 characters, and return a string that ends in a space character. Except when n = 0 where it can crash. So we are not looking for a "more elegant" implementation of a solved problem. We are looking for a solution that is not stupidly complicated, that isn't stupidly slow (if I can pass a million spaces to that function I have a nice DoS attack), and that is actually correct.
Admin
This will produce an empty string whenever the first character is a space character.
Admin
Admin
Who would want to eliminate tail recursion? That sounds as dumb as removing white space, which is more pleasant to keep occupied again and again. Recursion, iteration, any traversal of tail is fine with me.
Admin
No, it works. Here's the code again (only diff: I fixed j=0 to be j=-1. That fixes the string consisting of only spaces, but makes no difference in the string that contains no spaces).
Feed it "hello" or something. On i == 0, str[i]!=' ' is true and sets j. Compile it and try it. Try it with j=0 instead of -1; it fails the only-spaces case, but not the no-spaces case.
Here's my test stub:
Admin
Follow up, Recursive Reclusive: I hope you get back to me on this because if there is a case I'm missing (or am just misunderstanding something), I do want to learn. Thanks!
Admin
I didn't put much reflexion when writing my submission on page 1. But it was written in the style that, from my experience, suits best big servers. I work on Oracle M5000 machines (Solaris SPARC 64 bit). Had the routine been written with microcontrollers (those without cache and that are not super-scalar) in mind, I would certainly have written it differently. Probably much more like your version. But as already said, I didn't put much optimisation effort when I wrote that snippet, my biggest emphasizes was to write a correct version.
Admin
Well you certainly beat me on correctness! (At least the first time--I got it fixed. It's the one test case I forgot to test for).
I was kidding about the 8-bit thing; it was more a suggestion that strlen() probably isn't much more efficient then the super-dumbed down char-by-char for an 8-bit architecture. But for the nominal case (32/64-bit), a good optimized strlen() ended up making your version faster overall.
Anyway, good show!
Admin
Ok, three days of these solutions, and I promise I will get off my hobby horse and leave it be after this.
None of these "fixes" to the original code still do not handle null strings passed into the code. If str is NULL and you are lucky, this line:
causes a segfault which will terminate the process. If you're not lucky enough to be on a platform that traps such things, you've just stomped on whatever data lives at memory offset 0, and worse, returned that data to the caller to be mangled further.
It doesn't matter if the algorithm is recursive, iterative, unrolled looping, optimized for word size, etc. This is Defensive Coding 101.
Admin
Admin
Never mind that you free the register that would be otherwise used to store the stack frame, and you don't have to keep that register updated on each function entry and exit. Heck, for functions called from one call site only, the compiler automatically inlines as that saves a couple cycles and a couple code space bytes. It does matter when you have 512 bytes for your code, and sometimes it does matter to have only that many bytes when you can put you application into a 6 pin part that costs less than $1 even in small quantities.
Admin
(Excerpted from above...)
And when we graduate to Intelligent Coding 201... :)
Seriously, I know we all have different opinions on the matter of checking inputs, and that's fine, I guess. My opinion is that there are basically three levels of understanding NULL. 1. You don't know about NULL inputs; your code reflects this ignorance because it's completely ignored. 2. You know about NULL and do everything you can to do cover it up* when you get one. 3. You know about all that but choose to ignore the NULL case for speed, code clarity, debugging*, etc. Unfortunately your code looks like #1 so the #2 people just think you're an amateur.
*What does one do when one encounters a NULL? Silently ignore it? printf()? How about this: if I run a program and it segfaults, I fire it up in gdb, and guess what? The real problem is usually two functions back in the backtrace. I say if the program is wrong, it should fail big, fast, and spectacularly--it makes it easier to find the problem. If you keep intercepting bad inputs at the low level, you cover problems and make them hard to find.
Caveat: This doesn't apply to all cases, and if I were writing for a pacemaker or a space shuttle, I'd approach it differently.
Admin
And please stop quoting the Recursive Reclusive post involving my code that says "it does bad things to string without spaces". He's wrong.
I corrected the j=0 problem in a later post, which fixes the "strings with only spaces" problem (j=-1 works better), but the "strings without spaces" has always worked.
Admin
As someone already pointed out. The design of the function demands not to check for NULL pointer. All string and mem functions (except free) in the standard library do not check for NULL pointers. It is a design choice in line with the whole C idea. If the caller wants to avoid a segfault (or worse) it can check if it wants without burdening the 99.9% of the use cases with an unnecessary check. Segfaulting is often a better choice than propagating an error condition, that no function above, in the call stack, knows how to deal with. If one wants an always checking everything language then they should do java, C# or whatever fancy interpreted language.
Admin
Is this a mayan calendar thing? After 2 pages of comments the same questions and comments come back and back again and again. That point was already answered more than once before, by me and by others.
Admin
Admin
Sheesh! The old cast and too many parens mindset. So juevnile. This is right:
char* trim_right(char* str) { for(size_t i = strlen(str) - 1; i >= 0; i--) if(str[i] == ' ') str[i] = '\0';
return str; }
Admin
It still has the bug of truncating the string at the first blank from the string. So functionnally completely wrong.
Admin
Yeah, no.
Years ago someone wrote and article about how a well-written program running on a TRS-80 would run faster than a crap program on a Cray.
Better hardware (not C) means you can now write crap that runs fast, but it wuld be nice if you didn't.
Admin
What's really fun is when you give it a string that is not null terminated.
Admin
It works fine, as long as you allocate all strings with the proper function:
Admin
Admin
This sort of code hurts my soul:
Apart from the under run problem in the above comment, this is vulnerable to buffer overflows.
Admin
Well, except for the atrocious number of calls to strlen, I don’t see the problem.
Admin
How about the fact that recursion is mentioned in the article title?
Admin
This is tail recursion. In certain class of languages (and their compilers), this executes in constant space (that is, the recursive call is implemented as a jump back to the start) and is, sometimes, in fact the preferred or only way to express loops.
Actually, modern gcc has learned to detect this condition even in C, or so I've heard.
It would be different if you needed to compute something with the result of the recursive call.
Admin
This is tail recursion. In certain class of languages (and their compilers), this executes in constant space (that is, the recursive call is implemented as a jump back to the start) and is, sometimes, in fact the preferred or only way to express loops.
Actually, modern gcc has learned to detect this condition even in C, or so I've heard.
It would be different if you needed to compute something with the result of the recursive call.
Of course, you'd compute the strlen once or not at all for this to also compute in linear time.
Admin
Achat mГ©dicament en ligne fiable https://kamagraenligne.com/# pharmacie en ligne avec ordonnance