Comment On Recursive Whitespace Removal

C is a double edged sword. On one hand, it's simple and powerful enough that, given enough effort, you can accomplish just about anything you want. However, this power is limited insomuch that you don't have many of the friendly helper-functions that exist in higher level languages, such as string manipulation for example, unless you go ahead and create them yourself. [expand full text]
« PrevPage 1 | Page 2 | Page 3 | Page 4Next »

Re: Recursive Whitespace Removal

2012-12-19 08:02 • by Black Bart (unregistered)
Someone just learned about recursion, and it became the tool of choice for the next several weeks.

When you have a hammer in your hand, everything becomes a nail.

Re: Recursive Whitespace Removal

2012-12-19 08:08 • by gallier2 (unregistered)
Not cool to write a 0 byte before the buffer if the string has length 0 and there is a ' ' before it. This seems quite low odd case but it can happen if you use a pointer to scan a longer string with trailing blanks, and passing that pointer to that function.

The recursion is of course another wtf.

Re: Recursive Whitespace Removal

2012-12-19 08:09 • by eVil (unregistered)
Presumably...

unsigned int strlen(char* str)
{
if (str[0] == '\0')
{
return 0;
}
return 1 + strlen(str + 1);
}

Re: Recursive Whitespace Removal

2012-12-19 08:09 • by Your Name * (unregistered)
So what?

Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

IMHO this is far from TDWTF material.

Re: Recursive Whitespace Removal

2012-12-19 08:13 • by Not That Paul (unregistered)
C: All the power of Assembly with most of the flexibility.

Re: Recursive Whitespace Removal

2012-12-19 08:17 • by gallier2 (unregistered)
397522 in reply to 397519
It is because of the (potential) buffer underflow, arithmetic overflow (using int instead of size_t can be fatal on 64bit machines), and one should not rely on compiler optimizations to excuse poor algorithmic skills.
C compilers have a very hard time to optimize some stuff that is trivial in other languages, especially when using pointers (the aliasing issue is a bitch).

Re: Recursive Whitespace Removal

2012-12-19 08:19 • by Tero T. (unregistered)
Maybe the coder had some background in Prolog programming.

Re: Recursive Whitespace Removal

2012-12-19 08:21 • by Sean (unregistered)
This is C; it's going to run pretty darn fast. And no matter how you implement it (or let someone else implement it) string processing by its very nature involves looping over characters until you do or don't find something. So those who say "OMFG what if there's a lot of spaces" are in the same category as those who say "moar XML!" -- they just don't understand what's actually happening inside the machine.

So this boils down to which costs more: the CPU time to execute an allegedly unnecessary stack push and pop, or the developer time to take a solved problem and write a "more elegant" implementation?

Re: Recursive Whitespace Removal

2012-12-19 08:22 • by Lisa (unregistered)
Doesn't pretty much all of lisp work this way?

Re: Recursive Whitespace Removal

2012-12-19 08:22 • by Thomas Kyte (unregistered)
397526 in reply to 397519


[tkyte@localhost ~]$ !cat
cat test.c
#include "string.h"
#include "stdio.h"

char *trim_right(char *str)
{
int len = strlen(str) - 1;

printf( "len = %d\n", len );
if(len == 0 || str[len] != ' ') return str;
printf( "setting str[%d] to zero\n", strlen(str)-1 );
str[strlen(str)-1] = '\0';
return trim_right(str);
}

int main()
{
char str[] = "";
char str1[1];

str1[0] = ' ';
trim_right(str);
printf( "%d\n", strlen(str) );

return 0;
}
[tkyte@localhost ~]$ ./test
len = -1
setting str[-1] to zero
len = -1
0
[tkyte@localhost ~]$



buffer underwrite.... it doesn't work as intended.

Re: Recursive Whitespace Removal

2012-12-19 08:23 • by AdamK (unregistered)
Looks like it was coded by someone that had a lot of experience with Prolog or Lisp

Re: Recursive Whitespace Removal

2012-12-19 08:25 • by Smug Unix User (unregistered)
The real WTF is not searching for a library to handle this for you. Look at what other people have written and do a nice code review and drop in your borrowed solution.

Re: Recursive Whitespace Removal

2012-12-19 08:28 • by Garrison Fiord (unregistered)
Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.

Re: Recursive Whitespace Removal

2012-12-19 08:31 • by eak (unregistered)
the tail recursion is arguably correct. It's the double strlen call and the failure on empty string that make this code bad.

Re: Recursive Whitespace Removal

2012-12-19 08:34 • by lunaryorn (unregistered)
397531 in reply to 397522
gallier2:
[…] one should not rely on compiler optimizations to excuse poor algorithmic skills.

Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.

Re: Recursive Whitespace Removal

2012-12-19 08:36 • by F (unregistered)
397533 in reply to 397519
Your Name *:
So what?

Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

IMHO this is far from TDWTF material.


But it isn't working. Did you not read the earlier comment by gallier2?

If passed a zero-length string that happens to be preceded in memory by a byte equal to blank, it will overwrite that byte with null. This is a potentially serious and hard-to-trace bug.

Re: Recursive Whitespace Removal

2012-12-19 08:38 • by F (unregistered)
397534 in reply to 397531
lunaryorn:
gallier2:
[…] one should not rely on compiler optimizations to excuse poor algorithmic skills.

Try and explain to a Haskell programmer that tail recursion is a “poor algorithmic skill”… be prepared to be laughed at.


Try checking the algorithm for bugs before claiming that "poor algorithmic skill" implies a criticism of tail recursion. Be prepared to be laughed at.

Re: Recursive Whitespace Removal

2012-12-19 08:38 • by gallier2 (unregistered)
char *trim_right(char *str)

{
size_t l = strlen(str);
while(l && str[--l] == ' ')
str[l] = 0;
return str;
}

Wasn't that difficult, was it?

Re: Recursive Whitespace Removal

2012-12-19 08:40 • by chernobyl (unregistered)
Well, how much whitespace characters do you expect? One? Two? A billion? The programmer has probably taken into account the kind of data fed into the function and has decided that this approach would work. The function is a bit unusual and sloppy but I would not call it a WTF.

Re: Recursive Whitespace Removal

2012-12-19 08:41 • by gallier2 (unregistered)
397537 in reply to 397531
The poor skill refers to even using recursion for a simple string manipulation. Recursion is an excellent tool for some problems, strings is normally not one of them.

Re: Recursive Whitespace Removal

2012-12-19 08:41 • by DrStat (unregistered)
397538 in reply to 397519
Your Name *:
So what?

Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

IMHO this is far from TDWTF material.


Are you the same guy who thinks that it is pointless to spend time on an algorithm when just can buy a faster CPU?

Re: Recursive Whitespace Removal

2012-12-19 08:42 • by TopTension (unregistered)
397539 in reply to 397524
Sean:
the CPU time to execute an allegedly unnecessary stack push and pop


You forgot the CPU time to iterate through the string on every recursion to count the characters in strlen(), while the calling stack instance of the function knows exactly, that it is one less than what strlen() hes returned previously.

Not to mention a potential stack overflow. And wthat tohers have said...

Re: Recursive Whitespace Removal

2012-12-19 08:48 • by Meep (unregistered)
"However, this power is limited insomuch that you don't have many of the friendly helper-functions that exist in higher level languages, such as string manipulation for example, unless you go ahead and make them yourself."

Or use one of the gigabytes of libraries that have been written for C, the single most successful programming language of all time.

Re: Recursive Whitespace Removal

2012-12-19 08:49 • by Honnza (unregistered)
How come no one ever so far was horrified by the quadratic time complexity? Sure, there's a memory cache. Still, it's not easy to get 1G instructions under 1G clock cycles.

captcha - El grasso es verto. El noche es negro.

Re: Recursive Whitespace Removal

2012-12-19 08:49 • by John (unregistered)
It reminds me of the old contest to implement the slowest possible sort algorithm. That produced some rather mind-bogglingly slow sort routines. Perhaps the author(s) of this knew about that contest, and decided to try the same idea for string trimming. Calling strlen(), which does a scan of the string, twice during each recursive iteration is a clever idea for such a contest. This routine could be made even slower, however, by not saving the length, but calling strlen(str) for each use of len. That would add a third call, and adding a test for strlen(str) != -1 (rather than changing the ==0 test to <=0) would add another loop.

Re: Recursive Whitespace Removal

2012-12-19 08:49 • by biziclop
Apart from the unnecessary recursion and the serious buffer underrun issue, it also fails completely if whitespace takes any other form than 0x20 (e.g. \t, \n, \r).

Re: Recursive Whitespace Removal

2012-12-19 08:51 • by Snowrat (unregistered)
Could've been written a bit better in APL, like
trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵}
or
trim_right←{⌽{(∨\' '≠⍵)/⍵}⌽⍵}

Re: Recursive Whitespace Removal

2012-12-19 08:54 • by Meep (unregistered)
397545 in reply to 397525
Lisa:
Doesn't pretty much all of lisp work this way?


LISP let's you do it however you like, but ML derived languages do work this way. They usually have a dynamic stack precisely to handle functions that do a lot of recursion.

C, OTOH, often has a fixed stack, and function calls are comparably more expensive as it has to save the entire state of the CPU.

Re: Recursive Whitespace Removal

2012-12-19 08:58 • by Steve The Cynic
397546 in reply to 397533
F:
Your Name *:
So what?

Sure, could be more robust, but it is working as intended. Any decent modern C compiler will eliminate the tail recursion anyway.

IMHO this is far from TDWTF material.


But it isn't working. Did you not read the earlier comment by gallier2?

If passed a zero-length string that happens to be preceded in memory by a byte equal to blank, it will overwrite that byte with null. This is a potentially serious and hard-to-trace bug.

The bug is actually worse than that. Any call where the string does not contain non-blanks (e.g. the empty string, or any number of blanks), and the byte immediately before the string is a blank, will destroy that blank. If the byte immediately before the string does not exist (not mapped, etc.) it will crash when checking to see if the character is blank. If that byte is marked read-only, the write will crash. If the system is a DeathStation 9000, several million people will die in a nuclear fireball.

And, of course, it is entirely feasible (as suggested by another poster above) that int is not large enough to hold all possible values of the return value of strlen. The other case is a 16-bit real-mode x86 system using the commonly available "huge" memory model, where data pointers were normalised to always have an offset in the 0-15 range. size_t wasn't normally (at the time) bigger than 16 bits, but you could imagine some nutjob at Borland or Microsoft deciding it should be unsigned long, which causes problems here if you pass exactly 64K of string plus one '\000'.

Addendum (2012-12-19 09:15):
Actually, I suck at reading code. If the string contains exactly one character (plus the NUL), that character will be left alone and the function will return more or less harmlessly. So a single blank will not be trimmed, and all strings containing only two or more blanks will be trimmed to one blank.

The nuclear fireball remains the probable response when the string is empty.

Re: Recursive Whitespace Removal

2012-12-19 08:59 • by RobFreundlich
397547 in reply to 397544
Snowrat:
Could've been written a bit better in APL, like
trim_right←{' '=¯1↑⍵: ∇¯1↓⍵⋄⍵}
or
trim_right←{⌽{(∨\' '≠⍵)/⍵}⌽⍵}

Gesundheit

Re: Recursive Whitespace Removal

2012-12-19 09:00 • by Techpaul (unregistered)
397548 in reply to 397535
gallier2:
char *trim_right(char *str)

{
size_t l = strlen(str);
while(l && str[--l] == ' ')
str[l] = 0;
return str;
}

Wasn't that difficult, was it?

Well I would love to be able to guarantee that

while(l && str[--l] == ' ')

Would always work as order of evaluation is not guaranteed, do decrement inside loop.

Re: Recursive Whitespace Removal

2012-12-19 09:01 • by RakerF1 (unregistered)
397549 in reply to 397535
Very simple and elegant. I like it.

CAPTCHA: jumentum - don't jumentum it.

Re: Recursive Whitespace Removal

2012-12-19 09:02 • by Rnd( (unregistered)
Does it make me bad coder if I prefer iterative loops or just loops for this sort of stuff?

Also, if there is dynamic memory allocation I surely hope the lengths are fixed or originals are stored somewhere...

Re: Recursive Whitespace Removal

2012-12-19 09:04 • by Mcoder
397551 in reply to 397519
Tail optimization won't remove the strlen at the first line that runs through all the (remaining) string for each character on it.

Anyway, one normaly doesn't trim large strings. It is quit possible that his code makes no difference at all.

Re: Recursive Whitespace Removal

2012-12-19 09:05 • by A developer (unregistered)
This is the sort of academic BS I had to do in university programming courses.
For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.

Re: Recursive Whitespace Removal

2012-12-19 09:06 • by Aufgehaben (unregistered)
397553 in reply to 397539
TopTension:


Sean:

the CPU time to execute an allegedly unnecessary stack push and pop


You forgot the CPU time to iterate through the string on every recursion to count the characters in strlen(), while the calling stack instance of the function knows exactly, that it is one less than what strlen() hes returned previously.

Not to mention a potential stack overflow. And wthat tohers have said...



char *trim_right_r(char *str,int len)
{

if(len == 0 || str[len] != ' ') return str;
str[len] = '\0';
return trim_right_r(str,len-1);
}

char *trim_right(char *str)
{
int len = strlen(str);
return trim_right_r(str,len-1);
}

FTFY

Re: Recursive Whitespace Removal

2012-12-19 09:08 • by Mordred (unregistered)
397554 in reply to 397548
C does guarantee order of evaluation.

Re: Recursive Whitespace Removal

2012-12-19 09:10 • by Steve The Cynic
397555 in reply to 397545
Meep:
Lisa:
Doesn't pretty much all of lisp work this way?


LISP let's you do it however you like, but ML derived languages do work this way. They usually have a dynamic stack precisely to handle functions that do a lot of recursion.

C, OTOH, often has a fixed stack, and function calls are comparably more expensive as it has to save the entire state of the CPU.

Only the *relevant* state, which might not be much at all. An x86 system running this code might have to save ESI, EDI, and EBP, but can freely discard or corrupt EBX, ECX, and EDX, while EAX would be the return value, so not saved at all. Since there is no use of the FPU, the FPU state can easily be ignored.

And C does not have to have a contiguous stack for parameter passing and local variables. Mandating such a thing would make writing C compilers for System/370 difficult, since these machines don't even have a direct concept of a stack. And even on an architecture with a machine stack, it isn't mandatory to put the activation records on the machine stack. Most systems do it that way because it's faster than allocating from the free store / heap for it, but it isn't required behaviour. (However, you can make a case that we shouldn't store activation records on the machine stack, in order to prevent buffer overflows from corrupting return addresses and allowing malware delivery.)

I talk too much sometimes.

Re: Recursive Whitespace Removal

2012-12-19 09:17 • by QJo (unregistered)
397556 in reply to 397529
Garrison Fiord:
Even though I was taught recursion in almost every programming class I took, I've found that it's almost always a bad idea. In fact, like the do-while loop, I think I've only found 1 time in my career that recursion was more elegant than a simple loop.


I've used it once or twice when doing graph-theoretical work (network analysis, etc.) It can be a convenient technique for finding a top-level ancestor not, IIRC ... writing it in the form of a loop makes for an unwieldy codebase, whereas recursion makes for simple and easily-comprehended algorithm. The caveat is, of course, that you ought only to implement it on structures which are not more than four or five levels deep - and in the problem domain for which this solution was designed, that condition held.

Re: Recursive Whitespace Removal

2012-12-19 09:18 • by Steve The Cynic
397557 in reply to 397548
Techpaul:
Well I would love to be able to guarantee that

while(l && str[--l] == ' ')

Would always work as order of evaluation is not guaranteed, do decrement inside loop.

There's a sequence point at the &&, so the left-hand side will be completely evaluated before anything happens on the right, giving this expression defined behaviour.

Re: Recursive Whitespace Removal

2012-12-19 09:19 • by Xarthaneon the Unclear (unregistered)
[quote]
char *trim_right(char *str)

{
int len = strlen(str) - 1;
if(len == 0 || str[len] != ' ') return str;
str[strlen(str)-1] = '\0';
return trim_right(str);
}


Ouch. Evil function. I'm not sure what part is more evil - the part where it's actually recursive (seems a bit unnecessary to me), the bit where it's intended to be replacing ' ' with an explicit NUL '\0'...or the part where it's actually just making the final character of the given string a NUL '\0'. Hope that's not called too often...

Personally, I'd just do something like this:

char* trim_right(char* str)

{
for(int i = strlen(str) - 1; i >= 0; i--)
if(str[i] == ' ')
str[i] = '\0';

return str;
}


I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

CAPTCHA: causa - the original might causa problem!

Re: Recursive Whitespace Removal

2012-12-19 09:20 • by sibtrag
397559 in reply to 397535
If we're micro-optimizing, shouldn't we refrain from writing an end-of-string null character until we've found the first non-blank? The CPU could give you lots of flushes if it lets the loads (to check for blanks) run ahead of the stores (which depend on the values of the loaded bytes) and it doesn't track load/store collisions on a byte basis (perhaps word or cache line). Not to mention the MP effects of all that store traffic hitting the write gathering.

Of course, if we were expecing significant white space, we'd look at the first few bytes to get to word-alignment and then start looking for non-zero words. For even more speed, the SIMD engines on modern processors can even tell us which byte was non-zero while looking at 16 bytes at a time.

Re: Recursive Whitespace Removal

2012-12-19 09:22 • by ¯\(°_o)/¯ I DUNNO LOL (unregistered)
struct {
  int important;
  char stuff[10];
} foo;
foo.important = 0x12345678;
...
trim_right(foo.stuff);

Re: Recursive Whitespace Removal

2012-12-19 09:26 • by Steve The Cynic
397561 in reply to 397558
Xarthaneon the Unclear:
Personally, I'd just do something like this:

char* trim_right(char* str)

{
for(int i = strlen(str) - 1; i >= 0; i--)
if(str[i] == ' ')
str[i] = '\0';

return str;
}


I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

CAPTCHA: causa - the original might causa problem!

It's not as graceful, and it's also wrong. strlen() does not return int, and is not obliged to return something the same size as int (someone cited 64-bit systems with 32-bit ints as an example). And even if strlen returns an int-sized unsigned value, you have a problem if that value is > INT_MAX+1. This happens at just 32769 characters on a 16-bit system...

C++ is often cited as taking your whole leg when it does succeed in shooting you in the foot. C is quite capable of that, too, and lacks many of the elaborate safeties in C++.

Re: Recursive Whitespace Removal

2012-12-19 09:27 • by ¯\(°_o)/¯ I DUNNO LOL (unregistered)
397562 in reply to 397560
¯\(°_o)/¯ I DUNNO LOL:
struct {
  int important;
  char stuff[10];
} foo;
foo.important = 0x20202020;
...
trim_right(foo.stuff);
FTFM

Re: Recursive Whitespace Removal

2012-12-19 09:31 • by gallier2 (unregistered)
397563 in reply to 397548


while(l && str[--l] == ' ')


Would always work as order of evaluation is not guaranteed, do decrement inside loop.


Order of evaluation is guaranteed, && || are so called sequence points, they force the order of evaluation, or else the short-circuit evaluation wouldn't work.

If I had written

while((l != 0) & (str[--l] == ' '))


Then you would be right? There would be 'undefined behaviour' because of the side-effect and there would be still be the buffer underflow.

Re: Recursive Whitespace Removal

2012-12-19 09:37 • by ¯\(°_o)/¯ I DUNNO LOL (unregistered)
397564 in reply to 397516
Black Bart:
Someone just learned about recursion, and it became the tool of choice for the next several weeks.

When you have a hammer in your hand, everything becomes a nail.
Once there was a programmer who had a problem. He decided to use recursion. Then he had a programmer who had a problem. He decided to use recursion...

Re: Recursive Whitespace Removal

2012-12-19 09:40 • by Xarthaneon the Unclear (unregistered)
397565 in reply to 397561
Steve The Cynic:
Xarthaneon the Unclear:
Personally, I'd just do something like this:

char* trim_right(char* str)

{
for(int i = strlen(str) - 1; i >= 0; i--)
if(str == ' ')
str[i] = '\0';

return str;
}


I realize it's not as graceful - or concise in terms of lines of code - as the given anti-example, but I can guarantee it will be faster, and those spaces will be replaced with a NUL like the original code intended.

CAPTCHA: causa - the original might causa problem!

It's not as graceful, and it's also wrong. strlen() does not return int, and is not obliged to return something the same size as int (someone cited 64-bit systems with 32-bit ints as an example). And even if strlen returns an int-sized unsigned value, you have a problem if that value is > INT_MAX+1. This happens at just 32769 characters on a 16-bit system...

C++ is often cited as taking your whole leg when it does succeed in shooting you in the foot. C is quite capable of that, too, and lacks many of the elaborate safeties in C++.


I went and looked up strlen() after reading that. [i]Darn it.
I noticed in their example that they perform a cast on the returned size_t to get around the issue of it not necessarily returning an int/uint32/uint64/whatever.

But, while I agree my rewrite is not as graceful, other than the strlen issue, fixed as such:

char* trim_right(char* str)

{
for(int i = ((int)strlen(str)) - 1; i >= 0; i--)
if(str[i] == ' ')
str[i] = '\0';

return str;
}


how is it wrong?

Re: Recursive Whitespace Removal

2012-12-19 09:43 • by TGV
For the defenders and nay sayers, the following is wrong in this piece of code:
- the repeated call to strlen makes it O(n^2)
- it can write to the byte before the start of the string, which might happen e.g. if that's where malloc keeps its block length and the block happens to be 32 bytes
The recursion is a very mild WTF in comparison to the above issues, and one that any decent compiler can optimize away.

This is not "the programmer took the requirements bladibla" or "elegant". This. is. crap.

Re: Recursive Whitespace Removal

2012-12-19 09:44 • by Delve
397567 in reply to 397552
A developer:
This is the sort of academic BS I had to do in university programming courses.
For some reason professors think recursion is the cats meow when in reality it's the dog's breakfast.

Recursion can do many things better than other methods. This is not one of them. There are a number of possible reasons this code exists, but none of them excuse it.

Recursion is great. Never use it. Recursion is the Cthulhu of code. Do not wake Cthulhu lest you go mad.
« PrevPage 1 | Page 2 | Page 3 | Page 4Next »

Add Comment