Admin
If you want a really robust, O-O style implementation of a string, use a linked list of Characters. That way strings could be traversed equally well in either the forward (L2R) or reverse (R2L) direction.
Actually, this would be more of a C++ implementation than C, but humor me....
The String class would include a length attribute and pointer to the first and last Character (head and tail) of the String. It would also have a boolean to indicate if the string is L2R or R2L.
Each character in the string would be represented by a subclass of Character. Character would have the next and prev pointers. It would also have a code-page attribute, and a boolean to indicate if it is holding a valid value in that code-page. Of course, Character would be an abstract base class with all virtual methods.
Character would have several subclasses to represent ASCII characters, 16-bit unicode, and 32-bit unicode characters.
Additional attributes on String and/or Character may be necessary to facilitate various string manipulation functions (substring, indexof, charAt, isUpper, isLower, etc...).
;-)
Admin
I feel like an idiot--I actually had to stare at this for a while. I was thinking "What's the problem? key_p[key_len] is the null terminator."
It was only after thinking about it that I realized that this is useless unless somehow faced with a broken strlen.
Reminds me of some code my partner in a CS class once wrote:
Admin
Yeah... If you don't mind your string taking up several times the amount of space a simple character array would.
Now you're talking about two pointers plus a character for each list element (assuming a doubly-linked list). So in C, with 16-bit pointers, you'd be talking about 40 bits per character, which is 5X larger than just using a character array (and worse with wider pointers). Plus now accessing each successive character requires dereferencing a new pointer which may be to a completely different memory location (I sure hope it hasn't been paged to disk). Plus your "string" is now on the heap and a simple pointer increment to get to the next character is now impossible (and dangerous, I might add).
I'll pass.
Very enterprisey, though.
Admin
That makes doggie's brain hurt.
Admin
BTW, "if (key_p[key_len] != 0x00)" could be written as "if (key_p[key_len])", couldn't it? :)
Admin
That's exactly the point. strlen GIVES you the location of the next null in memory, regardless of whether it exceeds the boundary of that particular array. Thus:
if (key_p[key_len] != 0x00)
Can NEVER be true.
That's the WTF; that the coder didn't know wtf strlen() does.
Admin
x < 1
Huh?
Admin
That was the first thing that came to my mind. Multi-threaded programming and safe concurrency can warp the minds of even excellent software developers, and run-of-the-mill WTF-writing coders could easily end up resorting to voodoo like this to fix a heisenbug.
captcha: bathe. What I want to do to my eyes after seeing code on Daily WTF.
Admin
No, it wouldn't. It would return the length of the buffer, not the length of the string.
Admin
FYI... My post was completely tongue-in-cheek.
I was hoping I wouldn't have to explain that....
Admin
Tongue-in-cheek?!?! I just spent nearly 2 hours starting my implementation of the character linked list! Awww man... another afternoon wasted...
Admin
If C programmers would quit trying to write 70's style K&R code and actually write code with variable names and typing that made sense, there would be far fewer idiotic errors like this.
char name[] = "bonehead";
char* pName = "bonehead"; /* the * goes by the char, it IS part of the pointer TYPE; the little p in front of the var name indicates to the maintenance programmer in bangalore that this is a pointer */
char nameAgain[16] = "bonehead";
size_t len = strlen(name);
int size = sizeof(name);
printf("len: %d size:%d\n", (int)len, size); /* cast: %d expects an int, not a size_t */
len = strlen(pName);
size = sizeof(pName);
printf("len: %d size:%d\n", (int)len, size);
len = strlen(nameAgain);
size = sizeof(nameAgain);
printf("len: %d size:%d\n", (int)len, size);
spews forth:
len: 8 size:9
len: 8 size:4
len: 8 size:16
Just like sharp knives, matches and loaded handguns, C is a powerful and valuable tool that is dangerous in the hands of children.
captcha: cognac, XO is pretty nice...
Admin
i've actually written debug code like that. It was to trace down a memory corruption in a heavily multithreaded application where it was not possible to use a debugger with watch points. Arguably that was a very special situation.
Admin
I always liked the VAX version of string descriptors. The string descriptor would contain, among other information, the length of the string (16 bits) and the memory address where the string is stored.
When I started programming in C, it took me a while to get used to the idea of null terminated strings.
Admin
Actually this is a prime example of the name "worse than failure." And you are falling into the same trap.
The code is "WORSE than failure" because the person writing this is obviously not learning anything. I don't want to see what other mess this person can write. If this person doesn't know that strlen will return the length up to the next 0, what else doesn't he know?
Yeah, this might not be a What The Fuck... but it's definitely Worse Than Failure.
On the other hand, what is the point of printing out the len, and then in debug only printing it out again?
Admin
Eh? Back in my day we used only 7 bits!
Admin
I like mine better:
or, for extra shortness: replace all calls to printStr with the following:
or, for extra primeness: replace all calls to printStr with the following:
Admin
I like to use C-style unts. C-unts for short... duck
Admin
sizeof() is dangerous to use even if you think you know the size of the array. For instance:
wchar_t myarray[16]; int size_of_array = sizeof(myarray);
What is the value of size_of_array in this example?
(this causes us problems all the time when people pick up bad habits from char arrays and try to apply them to wide character arrays)
Admin
At the same time, calculating the length of a null-terminated string takes O(n) time while "calculating" the length of a counted string takes O(1) time. This means that other things, like concatenating strings, are also faster with counted strings.
I suspect that a length calculation is more common than passing a substring around. And if it isn't, you can still get the best of both worlds with a string something like the following:
STRING::beginning (using C++ syntax) is the equivalent of your current character pointer. Length is the size of the string. Buffer is a pointer to the beginning of the buffer the string is stored in, and max_length is the size of said buffer.
If you are willing to forgo the invariant that the buffer pointer points to the beginning of the buffer (and thus can be passed to free() if it was dynamically allocated), then you can drop that field and you're left with something that gives you almost the property you name above. p+1 becomes syntactically more complex because you have to construct a new string for which buffer is one more and the length fields are one less, but it gives you about the same runtime complexity while still keeping the calculation of strlen at a constant time.
(For the record, this three-field version of strings is what Windows uses internally; I'm now convinced it's the "correct" model when working in C.)
Actually, I just wrote a post trashing C string handling yesterday on another message board. In addition to the above strlen and consequent efficiency statements, I also mentioned that even the "safe" versions of the str* functions are not really; strncat and snprintf for instance aren't guaranteed to null-terminate your strings, so you always have to remember to do it yourself. Finally, the type is insufficient; char* doesn't tell you if it's a single character or a string. This is really a problem with the fact that C doesn't actually have arrays rather than strings themselves though.
Admin
Unfortunately, the code didn't build balanced trees and the I/O library managed to produce the most degenerate case of unbalanced tree possible; effectively a list built backwards, but with extra pointers. So iterating through the string that you'd just read required going through the whole (massive) deep tree for each character, shaving one element off the end each time.
To cap it all, this was on a 64-bit system; each character required a total of 32 bytes to store (plus malloc() overhead I suspect) so that there could be a pointer to the left, right and up the tree (the character itself also had to be padded to 8 bytes, but that's not actually surprising). When the files you're dealing with are in the couple-of-megabytes range, that level of blow-up is unwelcome, especially around 10 years ago (when machines were smaller and memories less capacious).
The result was that a simple file parser took hours to process this one poor file, and was sufficiently Worse Than Failure as to merit mentioning here. We trashed that code in favour of a simple C program, and got parsing back to the "much less than a second" range (short enough that we didn't bother to measure). Ever since then, I've never encountered any language that handled strings so disastrously poorly; if you must build trees for strings, at least make 'em balanced!
Admin
Maybe I'm being dense (wouldn't be the first time), but this looks like exactly when you would use sizeof. The code monkey cannot be certain whether a wchar_t is 1 char (unlikely) or 2 (usual) or something else (damn unlikely). So the only way you're going to get the proper memory size of the array is using sizeof.
What'll really cause confusion is when sizeof (struct my_random_t) doesn't return a value padded for alignment. Then you get things like sizeof an array of four not being exactly four times the sizeof a single struct. I don't know whether that's a compliance issue or not.
Admin
But then, what do I know? I'm just a developer and maintainer of it...
Admin
No no no... char is the data type. * is a modifier on the variable. Doing it that way leads to errors like people writing "char* ptr1, ptr2;" and expecting two pointers.
The little p in front idea works, but, you're heading into Hungarian Notation area, which doesn't hold up well over time. Just use names that make sense.
Admin
Languages allow us to do stupid things. For example, English allows us to mention bombs, guns and knives in the security line at the airport, but it's never a good idea.
Similarly, C allows us to declare multiple variables on a single line, but that also is not a good idea.
captcha: pointer. How strangely appropriate.
Admin
But the point is that rarely do you WANT the proper memory size. Typically you want to do something like this:
Admin
That's the first time I've heard that one. Almost makes sense if you really insist on "char* x". Almost. It's very rarely a good idea to go against what the language intended.
Admin
it is time to share the biggest WTF i ever encountered. i am a computer vision specialist. in this field, we are used to representing images as a long string of pixels together with some attributes describing the structure of the image (pixel size, image dimensions, borders and padding, ...)
i once worked with a guy who was new to this field. he told me about an idea he had to simplify the image processing: using a linked list. at first, i thought he was going to use a linked list for storing processing "steps", allowing the processing to be modified on the fly.
some weeks later, i had the opportunity to peek at his code. and yes, he did use a linked list... to store image data! here is roughly what it looked like:
wondering why i was looking at the code? his coworkers were complaining about his processing being "a bit slow" (read: 30 seconds to process a 2000x2000 image) and "not having enough memory" despite a full 1GB of RAM (normal when your 4MB of image data transforms into 80MB of fragmented mess).
now for the extra WTF:
it was shortly after that that i started having nausea each time i saw bad code, something which was not unusual at the time. it was also shortly after that that i switched to another company.
(so, RTL, you thought you were funny ? you just awoke my worst nightmare, you c... !)
Admin
Personally, I got suspicious about this WTF when it was code found by somebody named "Sergey". Can I conclude he's a pinko?
captcha: cognac - pinkos love to drink it.
Admin
How about fixed-length arrays of bits, say eight in a bunch?
Admin
Only a C programmer would be silly enough to put a previous and next pointer into a Character class - which obviously has nothing to do with a Character abstraction. Then try to make a point about it ;-):-p
Admin
I have been bitten many more times by sizeof() than by any string.h function. It uses the symbol table at compile time.
The sizeof() operator accepts a type, a typedef, or a variable name. There are lots of pitfalls. Look at these gems, based on the variable declarations!
char buff[80]; char *str;
struct num { int a; double b; }; struct num values; typedef struct num *my_num_t;
The sizeof() operator is a necessary evil part of the malloc() system C uses. It does nothing to make strings, or anything else, easier to work with.
Admin
No, the '*' is NOT part of the type. Neither is the array subscript, []. It is possible, common, & good style to put pointers and scalars on the same declaration line.
Suppose three char-related variables go together. Why not put them on the same line?
char name[9] = "bonehead", *pName = name, ch;
ch = name[5]; ch = *(pName + 5);
Admin
"char mystring[16] = "hello world"; Will allocate on the stack, so will never segfault, until the function returns anyway."
Not always true- mystring could be a global.
Admin
I think you mis-spelled 'BRILLANT'
;)
Admin
The extra check wouldn't help because the other thread could modify the string immediately AFTER the extra check! Or even during it.
Admin
It could cause a stack overflow. Note that char const *mystring = "hello world";
uses less stack (string literals are stored in static memory).
Admin
There is no such paradigm. This is a misconception amongst people who did not bother to learn C from a good book.
Admin
snprintf is guaranteed to terminate the string unless you pass 0 as the buffer length, or if you pass erroneous parameters and fail to check the return value.
You might be thinking of strncpy, which does not terminate the string in some circumstances (the function is poorly named as well as poorly designed).
ISO/IEC 9899:1999 7.21.3.2/2 The strncat function appends not more than n characters (a null character and characters that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1. The initial character of s2 overwrites the null character at the end of s1. A terminating null character is always appended to the result.
7.19.6.5/2 The snprintf function is equivalent to fprintf, except that the output is written into an array specified by argument s rather than to a stream. If n is zero, nothing is written, and s may be a null pointer. Otherwise, output characters beyond the n-1st are discarded rather than being written to the array, and a null character is written at the end of the characters actually written into the array.
Admin
#define dimof(Array) ( sizeof(Array) / sizeof *(Array) )
The compiler is broken if that is the case.
Admin
sad really, when self-glorified programmers attack their tools when all that's wrong is that their brain can not handle complex ideas.
Admin
Further, it does not use any "symbol-table" at any time, let alone compile-time. sizeof can be a "run-time" operation, eg:
void func(int n) { char array[n]; printf("%u\n", (unsigned int)sizeof array); }
(Note. variable-length arrays were not added to C until 1999 so if your compiler has multiple conformance modes, you may need to select that one, eg. gcc -std=c99).
1. sizeof tells you the size in bytes you have allocated. It isn't a "pitfall" that it doesn't tell you some different fact!
2. Huh? Are you claiming that sizeof(void *) is 80?
3. my_num_t is a pointer type. You asked for its size, and you got the size of a pointer. Why would you expect anything else? If there is a pitfall here, it is that using pointer typedefs leads to obfuscated code. There's never any good reason to use pointer typedefs and it's puzzling that so many people do.
4. The problem is that the '->' operator needs a pointer on the left. Nothing to do with sizeof.
5. sizeof values.a (no brackets required) is correct code and tells you the size taken up by the member 'a' of the object 'values', in this case it will be the same as sizeof(int).
Ridiculous comment. sizeof has nothing to do with the malloc system. It allows you to know how big statically-allocated objects are, in bytes. Most of us find that useful. For example:
char buf[20]; snprintf(buf, sizeof buf, "Hello, world");
memset( &obj, 0, sizeof obj );
memcmp( &obj1, &obj2, sizeof obj1 );
ptr1 = malloc( sizeof *ptr1 ); ptr2 = malloc( 10 * sizeof *ptr2 );
The latter forms are especially useful for avoiding allocation size errors.
Admin
I never understood this argument. Big whoop. So the compiler yells at you, you go "oops" and put in the extra *.
I personally usually like to declare variables on separate lines anyway.
Admin
Interesting, I thought that they weren't. I guess that demonstrates why I should verify things before asserting them.
I wonder why I got that misconception.
(BTW, I don't agree strncpy is poorly named. strcpy seems reasonable, at least by C's str+TLA scheme, and strncpy doesn't seem to do anything odd besides not be guaranteed to null-terminate (and padding the result to n characters), so I think it's an okay name.)
Addendum (2007-04-12 22:41): Actually, I think the misconception about snprintf at least may come from VC++ apparently not providing snprintf, only MS's own _snprintf. (Don't ask me why they did this, I have no clue.) That does not null-terminate if the size is smaller than the string.