- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Someone just learned about recursion, and it became the tool of choice for the next several weeks.
When you have a hammer in your hand, everything becomes a nail.
Admin
Not cool to write a 0 byte before the buffer if the string has length 0 and there is a ' ' before it. This seems quite low odd case but it can happen if you use a pointer to scan a longer string with trailing blanks, and passing that pointer to that function.
The recursion is of course another wtf.
Admin
Presumably...
unsigned int strlen(char* str) { if (str[0] == '\0') { return 0; } return 1 + strlen(str + 1); }
Admin
So what?
Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.
IMHO this is far from TDWTF material.
Admin
C: All the power of Assembly with most of the flexibility.
Admin
It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills. C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).
Admin
Maybe the coder had some background in Prolog programming.
Admin
This is C; it's going to run pretty darn fast. And no matter how you implement it (or let someone else implement it) string processing by its very nature involves looping over characters until you do or don't find something. So those who say "OMFG what if there's a lot of spaces" are in the same category as those who say "moar XML!" -- they just don't understand what's actually happening inside the machine.
So this boils down to which costs more: the CPU time to execute an allegedly unnecessary stack push and pop, or the developer time to take a solved problem and write a "more elegant" implementation?
Admin
Doesn't pretty much all of lisp work this way?
Admin
buffer underwrite.... it doesn't work as intended.
Admin
Looks like it was coded by someone that had a lot of experience with Prolog or Lisp
Admin
The real WTF is not searching for a library to handle this for you. Look at what other people have written and do a nice code review and drop in your borrowed solution.
Admin
Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.
Admin
the tail recursion is arguably correct. It's the double strlen call and the failure on empty string that make this code bad.
Admin
Admin
But it isn't working. Did you not read the earlier comment by gallier2?
If passed a zero-length string that happens to be preceded in memory by a byte equal to blank, it will overwrite that byte with null. This is a potentially serious and hard-to-trace bug.
Admin
Try checking the algorithm for bugs before claiming that "poor algorithmic skill" implies a criticism of tail recursion. Be prepared to be laughed at.
Admin
Wasn't that difficult, was it?
Admin
Well, how much whitespace characters do you expect? One? Two? A billion? The programmer has probably taken into account the kind of data fed into the function and has decided that this approach would work. The function is a bit unusual and sloppy but I would not call it a WTF.
Admin
The poor skill refers to even using recursion for a simple string manipulation. Recursion is an excellent tool for some problems, strings is normally not one of them.
Admin
Are you the same guy who thinks that it is pointless to spend time on an algorithm when just can buy a faster CPU?
Admin
You forgot the CPU time to iterate through the string on every recursion to count the characters in strlen(), while the calling stack instance of the function knows exactly, that it is one less than what strlen() hes returned previously.
Not to mention a potential stack overflow. And wthat tohers have said...
Admin
"However, this power is limited insomuch that you don't have many of the friendly helper-functions that exist in higher level languages, such as string manipulation for example, unless you go ahead and make them yourself."
Or use one of the gigabytes of libraries that have been written for C, the single most successful programming language of all time.
Admin
How come no one ever so far was horrified by the quadratic time complexity? Sure, there's a memory cache. Still, it's not easy to get 1G instructions under 1G clock cycles.
captcha - El grasso es verto. El noche es negro.
Admin
It reminds me of the old contest to implement the slowest possible sort algorithm. That produced some rather mind-bogglingly slow sort routines. Perhaps the author(s) of this knew about that contest, and decided to try the same idea for string trimming. Calling strlen(), which does a scan of the string, twice during each recursive iteration is a clever idea for such a contest. This routine could be made even slower, however, by not saving the length, but calling strlen(str) for each use of len. That would add a third call, and adding a test for strlen(str) != -1 (rather than changing the ==0 test to <=0) would add another loop.
Admin
Apart from the unnecessary recursion and the serious buffer underrun issue, it also fails completely if whitespace takes any other form than 0x20 (e.g. \t, \n, \r).
Admin
Could've been written a bit better in APL, like trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵} or trim_right←{⌽{(∨' '≠⍵)/⍵}⌽⍵}
Admin
LISP let's you do it however you like, but ML derived languages do work this way. They usually have a dynamic stack precisely to handle functions that do a lot of recursion.
C, OTOH, often has a fixed stack, and function calls are comparably more expensive as it has to save the entire state of the CPU.
Admin
And, of course, it is entirely feasible (as suggested by another poster above) that int is not large enough to hold all possible values of the return value of strlen. The other case is a 16-bit real-mode x86 system using the commonly available "huge" memory model, where data pointers were normalised to always have an offset in the 0-15 range. size_t wasn't normally (at the time) bigger than 16 bits, but you could imagine some nutjob at Borland or Microsoft deciding it should be unsigned long, which causes problems here if you pass exactly 64K of string plus one '\000'.
Addendum (2012-12-19 09:15): Actually, I suck at reading code. If the string contains exactly one character (plus the NUL), that character will be left alone and the function will return more or less harmlessly. So a single blank will not be trimmed, and all strings containing only two or more blanks will be trimmed to one blank.
The nuclear fireball remains the probable response when the string is empty.
Admin
Admin
while(l && str[--l] == ' ')
Would always work as order of evaluation is not guaranteed, do decrement inside loop.
Admin
Very simple and elegant. I like it.
CAPTCHA: jumentum - don't jumentum it.
Admin
Does it make me bad coder if I prefer iterative loops or just loops for this sort of stuff?
Also, if there is dynamic memory allocation I surely hope the lengths are fixed or originals are stored somewhere...
Admin
Tail optimization won't remove the strlen at the first line that runs through all the (remaining) string for each character on it.
Anyway, one normaly doesn't trim large strings. It is quit possible that his code makes no difference at all.
Admin
This is the sort of academic BS I had to do in university programming courses. For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.
Admin
FTFY
Admin
C does guarantee order of evaluation.
Admin
And C does not have to have a contiguous stack for parameter passing and local variables. Mandating such a thing would make writing C compilers for System/370 difficult, since these machines don't even have a direct concept of a stack. And even on an architecture with a machine stack, it isn't mandatory to put the activation records on the machine stack. Most systems do it that way because it's faster than allocating from the free store / heap for it, but it isn't required behaviour. (However, you can make a case that we shouldn't store activation records on the machine stack, in order to prevent buffer overflows from corrupting return addresses and allowing malware delivery.)
I talk too much sometimes.
Admin
I've used it once or twice when doing graph-theoretical work (network analysis, etc.) It can be a convenient technique for finding a top-level ancestor not, IIRC ... writing it in the form of a loop makes for an unwieldy codebase, whereas recursion makes for simple and easily-comprehended algorithm. The caveat is, of course, that you ought only to implement it on structures which are not more than four or five levels deep - and in the problem domain for which this solution was designed, that condition held.
Admin
Admin
[quote]
Ouch. Evil function. I'm not sure what part is more evil - the part where it's actually recursive (seems a bit unnecessary to me), the bit where it's intended to be replacing ' ' with an explicit NUL '\0'...or the part where it's actually just making the final character of the given string a NUL '\0'. Hope that's not called too often...
Personally, I'd just do something like this:
I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.
CAPTCHA: causa - the original might causa problem!
Admin
If we're micro-optimizing, shouldn't we refrain from writing an end-of-string null character until we've found the first non-blank? The CPU could give you lots of flushes if it lets the loads (to check for blanks) run ahead of the stores (which depend on the values of the loaded bytes) and it doesn't track load/store collisions on a byte basis (perhaps word or cache line). Not to mention the MP effects of all that store traffic hitting the write gathering.
Of course, if we were expecing significant white space, we'd look at the first few bytes to get to word-alignment and then start looking for non-zero words. For even more speed, the SIMD engines on modern processors can even tell us which byte was non-zero while looking at 16 bytes at a time.
Admin
struct { int important; char stuff[10]; } foo; foo.important = 0x12345678; ... trim_right(foo.stuff);
Admin
C++ is often cited as taking your whole leg when it does succeed in shooting you in the foot. C is quite capable of that, too, and lacks many of the elaborate safeties in C++.
Admin
Admin
Order of evaluation is guaranteed, && || are so called sequence points, they force the order of evaluation, or else the short-circuit evaluation wouldn't work.
If I had written
Then you would be right? There would be 'undefined behaviour' because of the side-effect and there would be still be the buffer underflow.
Admin
Admin
I went and looked up strlen() after reading that. [i]Darn it. I noticed in their example that they perform a cast on the returned size_t to get around the issue of it not necessarily returning an int/uint32/uint64/whatever.
But, while I agree my rewrite is not as graceful, other than the strlen issue, fixed as such:
how is it wrong?
Admin
For the defenders and nay sayers, the following is wrong in this piece of code:
This is not "the programmer took the requirements bladibla" or "elegant". This. is. crap.
Admin
Recursion is great. Never use it. Recursion is the Cthulhu of code. Do not wake Cthulhu lest you go mad.