The Daily WTF: Curious Perversions in Information Technology

2007-04-12

If you want a really robust, O-O style implementation of a string, use a linked-list of Characters. That way strings could be traversed equally well in either the forward (L2R) or reverse (R2L) direction.

Actually, this would be more of a C++ implementation than C, but humor me....

The String class would include a length attribute and pointers to the first and last Characters (head and tail) of the String. It would also have a boolean to indicate whether the string is L2R or R2L.

Each character in the string would be represented by a subclass of Character. Character would have the next and prev pointers. It would also have a code-page attribute, and a boolean to indicate whether it holds a valid value in that code-page. Of course, Character would be an abstract base class with all virtual methods.

Character would have several subclasses to represent ASCII characters, 16-bit Unicode, and 32-bit Unicode characters.

Additional attributes on String and/or Character may be necessary to facilitate various string manipulation functions (substring, indexof, charAt, isUpper, isLower, etc...).

;-)

2007-04-12

I feel like an idiot--I actually had to stare at this for a while. I was thinking "What's the problem? key_p[key_len] is the null terminator."

It was only after thinking about it that I realized that this is useless unless somehow faced with a broken strlen.

Reminds me of some code my partner in a CS class once wrote:

	for ( int x =0; x < 1 ; x++)
	{
		if ( x < BUFFER_SIZE) //Overflow protection
		if ( buffer[x] == stop_char)
			{
			keep_receive=false;
			}
		
	// Should we store the termination character?
	if (buffer[x] != '\0')
	if ( iter < BIG_BUFFER_SIZE)//Overflow protection
		{
		storage[iter] = buffer[x];
		iter++;
		}
	else
	{
	#ifdef DEBUG
	cout<<"Error, overflowing buffer."<<endl;
	#endif
	break;
	}
That's verbatim, including the formatting.  Don't ask me to try to explain it, because I can't.
captcha: burned

Opie · 2007-04-12

RTL:
If you want a really robust, O-O style implementation of a string, use a linked-list of Characters.

Yeah... If you don't mind your string taking up several times the amount of space a simple character array would.

Now you're talking about two pointers plus a character for each list element (assuming a doubly-linked list). So in C, with 32-bit pointers, you'd be talking about at least 72 bits per character (more once alignment padding is added), which is 9X larger than just using a character array. Plus now accessing each successive character requires dereferencing a new pointer which may point to a completely different memory location (I sure hope it hasn't been paged to disk). Plus your "string" is now on the heap, and a simple pointer increment to get to the next character is now impossible (and dangerous, I might add).

I'll pass.

Very enterprisey, though.
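Opie's objection is easy to measure. The sketch below (struct and function names are hypothetical, not from any post) defines one node of a doubly-linked character list and reports its size next to a plain char. On common ABIs the node is typically 12 bytes with 32-bit pointers and 24 bytes with 64-bit pointers, because the char is padded out to pointer alignment, and every heap-allocated node would also carry malloc() bookkeeping on top of that.

```c
#include <stddef.h>

/* Hypothetical node for a doubly-linked "string": one character plus two links. */
struct char_node {
    char ch;                 /* the character itself: 1 byte */
    struct char_node *prev;  /* link to the previous character */
    struct char_node *next;  /* link to the next character */
};

/* Bytes of node storage per character, not counting malloc() bookkeeping. */
size_t list_bytes_per_char(void)
{
    return sizeof(struct char_node);
}

/* Bytes per character in a plain char array: 1 by definition. */
size_t array_bytes_per_char(void)
{
    return sizeof(char);
}
```

Calling list_bytes_per_char() on your own platform shows the multiplier directly.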

poochner · 2007-04-12

That makes doggie's brain hurt.

2007-04-12

BTW, "if (key_p[key_len] != 0x00)" could be written as "if (key_p[key_len])", couldn't it? :)

2007-04-12

mav:
This could be perfectly valid, really.
// if for some reason this thing isn't null terminated
// then badness. Of course, we just did a strlen on it,
// so i don't know that we'll ever go DOWN this code...
// but its not really invalid of itself...
if (key_p[key_len] != 0x00)
{
    fprintf(stderr, "key termination error %02x\n", key_p[key_len]);
    close(key_fd);
    return;
}
// print the key for more debug fun.
if (debug)
{
    printf("len<%d> key<%s>\n", key_len, key_p);
}

I'm sorry, but this just isn't all that WTFable. And null termination is a good thing, especially when you consider that it is very portable amongst disparate languages, operating systems, etc...

That's exactly the point. strlen GIVES you the location of the next null in memory, regardless of whether it lies beyond the boundary of that particular array. Thus:

if (key_p[key_len] != 0x00)

Can NEVER be true.

That's the WTF; that the coder didn't know wtf strlen() does.
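To see why the quoted check can never fire, here is the check isolated in a small function (the name key_termination_error is hypothetical). Whatever strlen returns is, by definition, the index of the first '\0', so the element at that index is the terminator. If the buffer were not terminated, strlen itself would already have read past the end, which is undefined behavior, and the later check could not help.

```c
#include <string.h>

/* The check from the quoted snippet, isolated. strlen(s) counts the
   characters before the first '\0', so s[strlen(s)] is that '\0' and
   this function returns 0 for every valid (terminated) C string. */
int key_termination_error(const char *key_p)
{
    size_t key_len = strlen(key_p);
    /* Same truth value as: key_p[key_len] != '\0', or simply key_p[key_len]. */
    return key_p[key_len] != 0x00;
}
```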

2007-04-12

Hyuga:
	for ( int x =0; x < 1 ; x++)
That's verbatim, including the formatting. Don't ask me to try to explain it, because I can't.
captcha: burned

x < 1

Huh?

2007-04-12

anthony:
Ok, what about if he's worried about concurrent access? Maybe he's just trying to make his code super-robust?
To wit:

...

// RIGHT HERE another thread comes in and modifies the
// contents of key_p!

That was the first thing that came to my mind. Multi-threaded programming and safe concurrency can warp the minds of even excellent software developers, and run-of-the-mill WTF-writing coders could easily end up resorting to voodoo like this to fix a heisenbug.

captcha: bathe. What I want to do to my eyes after seeing code on Daily WTF.

2007-04-12

annoy:
annony:
mav:
I'm ok with C strings.
Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...

sizeof() would not return the character length of a C String. Assuming a "C string" is a char* memory buffer, the variable itself is a pointer.

Pseudocode:

char* mystring;
mystring = somethingThatReturnsAstring();
int nLen = sizeof(mystring); // nLen would be 4 on a 32-bit system

Replying to myself.

sizeof() will return a string length if you did:

char mystring[16] = "hello world"; int nLen = sizeof(mystring);

I just don't think it's practical using fixed length buffers like that.

captcha: doom (yes we all are)

No, it wouldn't. It would return the length of the buffer, not the length of the string.
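A minimal sketch of the distinction being argued here, reusing the "hello world" literal and the 16-byte buffer from the quoted posts (the helper names are hypothetical): sizeof reports storage, strlen reports the characters before the terminator, and sizeof on a pointer reports only the pointer's own size.

```c
#include <string.h>

char mystring[16] = "hello world";        /* 11 chars + '\0'; remaining bytes zero-filled */
const char *mystring_ptr = "hello world"; /* pointer to a string literal */

size_t buffer_bytes(void)  { return sizeof mystring; }     /* 16: the whole array */
size_t pointer_bytes(void) { return sizeof mystring_ptr; } /* size of a pointer, e.g. 4 or 8 */
size_t string_length(void) { return strlen(mystring); }    /* 11: chars before the first '\0' */
```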

2007-04-12

Opie:
RTL:
If you want a really robust, O-O style implementation of a string, use a linked-list of Characters.

Yeah... If you don't mind your string taking up several times the amount of space a simple character array would.

Now you're talking about two pointers plus a character for each list element (assuming a doubly-linked list). So in C, with 32-bit pointers, you'd be talking about at least 72 bits per character (more once alignment padding is added), which is 9X larger than just using a character array. Plus now accessing each successive character requires dereferencing a new pointer which may point to a completely different memory location (I sure hope it hasn't been paged to disk). Plus your "string" is now on the heap, and a simple pointer increment to get to the next character is now impossible (and dangerous, I might add).

I'll pass.

Very enterprisey, though.

FYI... My post was completely tongue-in-cheek.

I was hoping I wouldn't have to explain that....

kanna · 2007-04-12

RTL:
FYI... My post was completely tongue-in-cheek.
I was hoping I wouldn't have to explain that....

Tongue-in-cheek?!?! I just spent nearly 2 hours starting my implementation of the character linked list! Awww man... another afternoon wasted...

2007-04-12

If C programmers would quit trying to write '70s-style K&R code and actually write code with variable names and typing that made sense, there would be far fewer idiotic errors like this.

char name[] = "bonehead";
char* pName = "bonehead"; /* the * goes by the char, it IS part of the pointer TYPE; the little p in front of the var name indicates to the maintenance programmer in Bangalore that this is a pointer */

char nameAgain[16] = "bonehead";

size_t len = strlen(name);
int size = sizeof(name);
printf("len: %zu size:%d\n", len, size);

len = strlen(pName);
size = sizeof(pName);
printf("len: %zu size:%d\n", len, size);

len = strlen(nameAgain);
size = sizeof(nameAgain);
printf("len: %zu size:%d\n", len, size);

spews forth (with 32-bit pointers):

len: 8 size:9
len: 8 size:4
len: 8 size:16

Just like sharp knives, matches and loaded handguns, C is a powerful and valuable tool that is dangerous in the hands of children.

captcha: cognac, XO is pretty nice...

2007-04-12

I've actually written debug code like that. It was to track down memory corruption in a heavily multithreaded application where it was not possible to use a debugger with watchpoints. Arguably that was a very special situation.

2007-04-12

I always liked the VAX version of string descriptors. The string descriptor would contain, among other information, the length of the string (16 bits) and the memory address where the string is stored.

When I started programming in C, it took me a while to get used to the idea of null-terminated strings.

chrismcb · 2007-04-12

mav:
Kai:
Mav, what's the point of a condition that is never true?
A slightly simplified variant would be the following:

char x = 0; if (x != 0) { ... }

I agree with you, it's a dumb thing to do, but I've seen worse things... I'm sure I've made mistakes similar to this without knowing it; I imagine that every programmer has. So I hate to call down hellfire on this guy for doing it, especially when you consider the fact that it really doesn't adversely affect anything. It certainly isn't worse than failure. It's merely worse than no failure. So really it's a WTNF.

Actually this is a prime example of the name "worse than failure." And you are falling into the same trap.

The code is "WORSE than failure" because the person who wrote this is obviously not learning anything. I don't want to see what other mess this person can write. If this person doesn't know that strlen returns the length up to the next 0, what else doesn't he know?

Yeah, this might not be a What The Fuck... but it's definitely Worse Than Failure.

On the other hand, what is the point of printing out the len, and then in debug only printing it out again?

chrismcb · 2007-04-12

dennis:
> Strings suck. We should all use integers.
I agree, and in fact I always use that approach. So, instead of having the string "FOO" in my code, I have a little array of integers: 70, 79, 79.

To save on memory, I make each integer only 8 bits wide.

I also find it's convenient to mark the end of the list of integers by having an array element containing the integer 0.

Eh? Back in my day we used only 7 bits!
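Dennis's joke is literally true, and a short sketch makes it concrete. Assuming an ASCII-compatible execution character set, an array of 8-bit integers 70, 79, 79 followed by 0 is byte-for-byte the string literal "FOO" (the array and function names here are hypothetical).

```c
#include <string.h>

/* Assumes an ASCII-compatible execution character set ('F' == 70, 'O' == 79).
   Eight-bit integers followed by a 0 element: exactly a C string. */
static const char foo_as_ints[] = { 70, 79, 79, 0 };

int matches_literal(void)
{
    return strcmp(foo_as_ints, "FOO") == 0;
}
```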

2007-04-12

void
print_by_char(const char *s) {
        const char *p;
        if(s) for(p=s; *p; p++) printf("%c", *p);
}

Edowyth · 2007-04-12

maht:
void print_by_char(const char *s) { const char *p; if(s) for(p=s; *p; p++) printf("%c", *p); }

I like mine better:

void printStr(char * s)
{
  printf("%s",s);
}

or, for extra shortness: replace all calls to printStr with the following:

printf("%s",argument)

or, for extra primeness: replace all calls to printStr with the following:

printf("%d",2)

2007-04-12

I like to use C-style unts. C-unts for short... duck

BrownHornet · 2007-04-12

Kai:
Mav, what's the point of a condition that is never true?
A slightly simplified variant would be the following:

char x = 0; if (x != 0) { ... }

I've actually seen code like this at a previous job:

if (someField == someValue && someField != someValue)
{
...
}

2007-04-12

sizeof() is dangerous to use even if you think you know the size of the array. For instance:

wchar_t myarray[16]; int size_of_array = sizeof(myarray);

What is the value of size_of_array in this example?

(this causes us problems all the time when people pick up bad habits from char arrays and try to apply them to wide character arrays)

EvanED · 2007-04-12

Stephen Harris:
Null terminated strings make sense when you realise that a string is really just a pointer to memory, and having it null terminated means you don't need additional temporary variables while scanning it (just update the pointer; p+1 is a perfectly valid substring of a non-empty string p, and is quicker than sub$() or right$() or whatever other function might be required to copy memory blocks around, which a string format with a length indicator would require).

At the same time, calculating the length of a null-terminated string takes O(n) time, while "calculating" the length of a counted string takes O(1) time. This means that other operations, like concatenating strings, are also faster with counted strings.

I suspect that a length calculation is more common than passing a substring around. And if it isn't, you can still get the best of both worlds with a string something like the following:

struct STRING {
    char* beginning;
    int length;
    char* buffer;
    int max_length;
};

STRING::beginning (using C++ syntax) is the equivalent of your current character pointer. Length is the size of the string. Buffer is a pointer to the beginning of the buffer the string is stored in, and max_length is the size of said buffer.

If you are willing to forgo the invariant that the buffer pointer points to the beginning of the buffer (and thus can be passed to free() if it was dynamically allocated), then you can drop that field and you're left with something that gives you almost the property you name above. p+1 becomes syntactically more complex because you have to construct a new string for which buffer is one more and the length fields are one less, but it gives you about the same runtime complexity while still keeping the calculation of strlen at a constant time.

(For the record, this three-field version of strings is what Windows uses internally; I'm now convinced it's the "correct" model when working in C.)
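A minimal sketch of the counted-string idea, cut down to just a pointer and a length (the str_view type and sv_* functions are hypothetical, not a Windows or standard type). Wrapping a C string costs one strlen; after that the length is a field read, and the equivalent of "p+1" is an O(1) adjustment of both fields with no copying.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view (pointer + length), a stripped-down
   relative of the STRING struct above without buffer/max_length. */
typedef struct {
    const char *beginning;  /* first character of the view */
    size_t length;          /* number of characters; no terminator required */
} str_view;

/* One O(n) strlen() when wrapping a C string; afterwards length is O(1). */
str_view sv_from_cstr(const char *s)
{
    str_view v = { s, strlen(s) };
    return v;
}

/* Counterpart of "p+1": drop the first character in O(1) without copying. */
str_view sv_drop_first(str_view v)
{
    if (v.length > 0) {
        v.beginning++;
        v.length--;
    }
    return v;
}
```

Because the view carries its own length, it no longer needs a terminator at the end, which is also what lets a substring be taken from the middle of a buffer.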

Actually, I just wrote a post trashing C string handling yesterday on another message board. In addition to the above strlen and consequent efficiency points, I also mentioned that even the "safe" versions of the str* functions are not really safe; strncat and snprintf, for instance, aren't guaranteed to null-terminate your strings, so you always have to remember to do it yourself. Finally, the type is insufficient; char* doesn't tell you if it's a single character or a string. That is really a problem with how C handles arrays rather than with strings themselves, though.

2007-04-12

RTL:
If you want a really robust, O-O style implementation of a string, use a linked-list of Characters.

I have used (seriously!) a language that implemented strings as a tree of characters. The language designer (who shall remain nameless) apparently thought that like this it would be possible to insert and delete things in the middle of the string really quickly. This was indeed the case.

Unfortunately, the code didn't build balanced trees and the I/O library managed to produce the most degenerate case of unbalanced tree possible; effectively a list built backwards, but with extra pointers. So iterating through the string that you'd just read required going through the whole (massive) deep tree for each character, shaving one element off the end each time.

To cap it all, this was on a 64-bit system; each character required a total of 32 bytes to store (plus malloc() overhead I suspect) so that there could be a pointer to the left, right and up the tree (the character itself also had to be padded to 8 bytes, but that's not actually surprising). When the files you're dealing with are in the couple-of-megabytes range, that level of blow-up is unwelcome, especially around 10 years ago (when machines were smaller and memories less capacious).

The result was that a simple file parser took hours to process this one poor file, and was sufficiently Worse Than Failure as to merit mentioning here. We trashed that code in favour of a simple C program, and got parsing back to the "much less than a second" range (short enough that we didn't bother to measure). Ever since then, I've never encountered any language that handled strings so disastrously poorly; if you must build trees for strings, at least make 'em balanced!

poochner · 2007-04-12

Steve Bush:
sizeof() is dangerous to use even if you think you know the size of the array. For instance:
wchar_t myarray[16]; int size_of_array = sizeof(myarray);

What is the value of size_of_array in this example?

(this causes us problems all the time when people pick up bad habits from char arrays and try to apply them to wide character arrays)

Maybe I'm being dense (wouldn't be the first time), but this looks like exactly when you would use sizeof. The code monkey cannot be certain whether a wchar_t is 1 char (unlikely) or 2 (usual) or something else (damn unlikely). So the only way you're going to get the proper memory size of the array is using sizeof.

What'll really cause confusion is when sizeof (struct my_random_t) doesn't return a value padded for alignment. Then you get things like sizeof an array of four not being exactly four times the sizeof a single struct. I don't know whether that's a compliance issue or not.

2007-04-12

rgz:
An apprentice:
Mohammed:
Strings suck. We should all use integers.
Wrong. We should use strings for everything. Integers are just a special case of strings after all.
I have the thing just for you http://www.tcl.tk/

I'll give you a hint: behind the scenes, Tcl's not all strings. It just pretends to be for the unwashed masses. Its real type system is really quite complex and built on the concept of morphable views and logical immutability. It also uses counted strings of several different kinds.

But then, what do I know? I'm just a developer and maintainer of it...

2007-04-12

android:
If C programmers would quit trying to write 70's style K&R code and actually write code with variable names and typing that made sense there would be far less idiotic errors like this.
char name[] = "bonehead"; char* pName = "bonehead"; /* the * goes by the char, it IS part of the pointer TYPE the little p in front of the var name indicates to the maintenance programmer in bangalore that this is a pointer */

No no no... char is the data type. * is a modifier on the variable. Doing it that way leads to errors like people writing "char* ptr1, ptr2;" and expecting two pointers.

The little p in front idea works, but, you're heading into Hungarian Notation area, which doesn't hold up well over time. Just use names that make sense.
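The trap edwdig describes, in compilable form: in the first declaration below, the '*' belongs to the declarator of ptr1 only, so ptr2 is a plain char. The names ptr1 and ptr2 come from the post; ptr3 and ptr4 are added here purely for contrast.

```c
/* Despite the spacing, the '*' applies to ptr1 only: ptr2 is a plain char. */
char* ptr1, ptr2;

/* One declarator per line (or the '*' next to each name) avoids the trap. */
char *ptr3;
char *ptr4;
```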

2007-04-12

edwdig:
No no no... char is the data type. * is a modifier on the variable. Doing it that way leads to errors like people writing "char* ptr1, ptr2;" and expecting two pointers.

Languages allow us to do stupid things. For example, English allows us to mention bombs, guns and knives in the security line at the airport, but it's never a good idea.

Similarly, C allows us to declare multiple variables on a single line, but that also is not a good idea.

captcha: pointer. How strangely appropriate.

EvanED · 2007-04-12

poochner:
Steve Bush:
sizeof() is dangerous to use even if you think you know the size of the array. For instance:
wchar_t myarray[16]; int size_of_array = sizeof(myarray);

What is the value of size_of_array in this example?

(this causes us problems all the time when people pick up bad habits from char arrays and try to apply them to wide character arrays)

Maybe I'm being dense (wouldn't be the first time), but this looks like exactly when you would use sizeof. The code monkey cannot be certain whether a wchar_t is 1 char (unlikely) or 2 (usual) or something else (damn unlikely). So the only way you're going to get the proper memory size of the array is using sizeof.

But the point is that rarely do you WANT the proper memory size. Typically you want to do something like this:

int len = strlen(str);
for(int i=0 ; i<len ; ++i)
   do_something(str[i]);
About the only time you want the actual size of the buffer holding str is if you're doing something like calling write() or read() passing it the buffer, or when mallocing a new buffer to hold a copy of the old.

2007-04-12

android:
Similarly, C allows us to declare multiple variables on a single line, but that also is not a good idea.

That's the first time I've heard that one. Almost makes sense if you really insist on "char* x". Almost. It's very rarely a good idea to go against what the language intended.

2007-04-12

RTL:
If you want a really robust, O-O style implementation of a string, use a linked-list of Characters. That way strings could be traversed equally well in either the forward (L2R) or reverse (R2L) direction.
Actually, this would be more of a C++ implementation than C, but humor me....

(snipped a good description...)

It is time to share the biggest WTF I ever encountered. I am a computer vision specialist. In this field, we are used to representing images as a long string of pixels together with some attributes describing the structure of the image (pixel size, image dimensions, borders and padding, ...).

I once worked with a guy who was new to this field. He told me about an idea he had to simplify the image processing: using a linked list. At first, I thought he was going to use a linked list to store processing "steps", allowing the processing to be modified on the fly.

Some weeks later, I had the opportunity to peek at his code. And yes, he did use a linked list... to store image data! Here is roughly what it looked like:

class Pixel
{
private:
    int value;
    int x;
    int y;
    Pixel *next;
    Pixel *previous;
public:
    Pixel( int value, int x, int y, Pixel *previous, Pixel *next );
    
    int getValue();
    void setValue( int );
    int getX();
    int getY();
    Pixel * getNext();
    void setNext( Pixel * );
    Pixel * getPrevious();
    void setPrevious( Pixel * );
};

Wondering what I was looking at? His coworkers were complaining about his processing being "a bit slow" (read: 30 seconds to process a 2000x2000 image) and "not having enough memory" despite a full 1GB of RAM (normal when your 4MB of image data turns into 80MB of fragmented mess).

Now for the extra WTF:

Since the frame grabber returned a string of pixel data, he had to convert from the frame grabber's representation to his own, looping through every pixel and allocating and initializing a new Pixel object each time.
The camera shooting the images had a 256-grey-level sensor, thus needing only 8 bits to store one pixel, but somewhere along the way he decided that using an int was more practical (we were using a 32-bit computer).
He was a bit new (as a newbie) to new (as an operator) and delete, so while creating an image was (painful but) easy, destroying one was not automatic, resulting in huge memory leaks that required the software to be restarted often.

It was shortly after that I started feeling nauseous each time I saw bad code, something which was not unusual at the time. It was also shortly after that I switched to another company.

(So, RTL, you thought you were funny? You just awoke my worst nightmare, you c... !)
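For contrast with the Pixel list, here is a minimal sketch of the conventional layout the poster describes: one contiguous buffer of pixels plus a few attributes (the gray_image type and its functions are hypothetical). With 8-bit grey levels a 2000x2000 image is exactly 4,000,000 bytes in a single allocation, pixel (x, y) is found by arithmetic rather than pointer chasing, and one free() releases everything.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flat greyscale image: one contiguous, row-major buffer of
   8-bit pixels plus the attributes describing its layout. */
typedef struct {
    size_t width;
    size_t height;
    uint8_t *pixels;   /* width * height bytes, one grey level per pixel */
} gray_image;

/* Returns 1 on success, 0 if the allocation failed. */
int image_init(gray_image *img, size_t width, size_t height)
{
    img->width = width;
    img->height = height;
    img->pixels = calloc(height, width);  /* one zeroed allocation of height*width bytes */
    return img->pixels != NULL;
}

/* One free() releases the whole image: no per-pixel delete to forget. */
void image_free(gray_image *img)
{
    free(img->pixels);
    img->pixels = NULL;
}

/* Pixel (x, y) lives at offset y * width + x; no pointer chasing. */
uint8_t *pixel_at(const gray_image *img, size_t x, size_t y)
{
    return &img->pixels[y * img->width + x];
}
```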

2007-04-12

Personally, I got suspicious about this WTF when it was code found by somebody named "Sergey". Can I conclude he's a pinko?

captcha: cognac - pinkos love to drink it.

2007-04-12

Piet:
Booleans suck, we should all use arrays of bits

How about fixed-length arrays of bits, say eight in a bunch?

2007-04-12

RTL:
Each character in the string would be represented by a subclass of Character. Character would have the next and prev pointers.

Only a C programmer would be silly enough to put a previous and next pointer into a Character class - which obviously has nothing to do with a Character abstraction. Then try to make a point about it ;-):-p

2007-04-12

Edowyth:
I like mine better:
void printStr(char * s)
{
  printf("%s",s);
}

That looks handy, but I don't like the name. Since it outputs a string, maybe we could call it 'outputs'? No, that's confusing ... how about 'puts'?
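The joke lands because the standard library already has this function, with one wrinkle worth a sketch: puts() appends a newline, so it is not quite printf("%s", s). fputs() is the exact equivalent; the wrapper name print_str is hypothetical.

```c
#include <stdio.h>

/* printf("%s", s) writes s unchanged; puts(s) writes s to stdout and then
   appends '\n'. fputs(s, out) matches the former, with no format parsing. */
int print_str(FILE *out, const char *s)
{
    return fputs(s, out);   /* non-negative on success, EOF on error */
}
```

Passing stdout as the stream gives the behavior of the original printStr.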

2007-04-12

mav:
I'm ok with C strings.
Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...

I have been bitten many more times by sizeof() than any string.h function. It uses the symbol-table at compile-time.

The sizeof() operator accepts either a type, a typedef, or a variable name. There are lots of pitfalls. Look at these gems, based on the variable declarations!

char buff[80]; char *str;

struct num { int a; double b; }; struct num values; typedef struct num *my_num_t;

1. sizeof(buff) is 80, not max string-len of 79.
2. sizeof(buff) == sizeof(void *), measures no useful memory block.
3. sizeof(my_num_t) is a pointer type like #3 not the struct's size.
4. sizeof(my_num_t->a) doesn't even compile in GCC; see #5.
5. sizeof(values.a) will define #4 for you,....or maybe not!

The sizeof() operator is a necessary evil, part of the malloc() system C uses. It does nothing to make strings, or anything else, easier to work with.

2007-04-12

android:
If C programmers would quit trying to write 70's style K&R code and actually write code with variable names and typing that made sense there would be far less idiotic errors like this.
char name[] = "bonehead"; char* pName = "bonehead"; /* the * goes by the char, it IS part of the pointer TYPE the little p in front of the var name indicates to the maintenance programmer in bangalore that this is a pointer */

No, the '*' is NOT part of the type. Neither is the array subscript, []. It's possible, common, and good style to put pointers and scalars on the same declaration line.

Suppose three char-related variables go together. Why not put them on the same line?

char name[9] = "bonehead", *pName = name, ch;

ch = name[5]; ch = *(pName + 5);

2007-04-12

"char mystring[16] = "hello world"; Will allocate on the stack, so will never segfault, until the function returns anyway."

Not always true- mystring could be a global.

2007-04-12

Kinglink:
annoy:
annony:
mav:
I'm ok with C strings.
Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...

sizeof() would not return the character length of a C String. Assuming a "C string" is a char* memory buffer, the variable itself is a pointer.

Pseudocode:

char* mystring;
mystring = somethingThatReturnsAstring();
int nLen = sizeof(mystring); // nLen would be 4 on a 32-bit system

Replying to myself.

sizeof() will return a string length if you did:

char mystring[16] = "hello world"; int nLen = sizeof(mystring);

I just don't think it's practical using fixed length buffers like that.

captcha: doom (yes we all are)

Sure it is very practical. Why use somestring = mystring when you can use strcpy(somestring,mystring); every time.

And then when you want to always have the same data in somestring and mystring then you can just call strcpy(mystring,somestring); every time you make a change. BRILLIANT!

I think you mis-spelled 'BRILLANT'

;)

Old Wolf · 2007-04-12

anthony:
Ok, what about if he's worried about concurrent access? Maybe he's just trying to make his code super-robust?
// RIGHT HERE another thread comes in and modifies the
// contents of key_p!
if (key_p[key_len] != 0x00)

In that case, the WTF is that the program is sharing memory without any sort of locking or other safety mechanism.

The extra check wouldn't help because the other thread could modify the string immediately AFTER the extra check! Or even during it.

Old Wolf · 2007-04-12

Steve:
char mystring[16] = "hello world"; Will allocate on the stack, so will never segfault

It could cause a stack overflow. Note that

char const *mystring = "hello world";

uses less stack (string literals are stored in static memory).

Old Wolf · 2007-04-12

sergio:
It's not C strings that suck, it's the "arrays == pointers" paradigm!

There is no such paradigm. This is a misconception amongst people who did not bother to learn C from a good book.

Old Wolf · 2007-04-12

EvanED:
I also mentioned that even the "safe" versions of the str* functions are not really; strncat and snprintf for instance aren't guaranteed to null-terminate your strings, so you always have to remember to do it yourself.

strncat is guaranteed to terminate the string.

snprintf is guaranteed to terminate the string unless you pass 0 as the buffer length, or if you pass erroneous parameters and fail to check the return value.

You might be thinking of strncpy, which does not terminate the string in some circumstances (the function is poorly named as well as poorly designed).

ISO/IEC 9899:1999 7.21.3.2/2 The strncat function appends not more than n characters (a null character and characters that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1. The initial character of s2 overwrites the null character at the end of s1. A terminating null character is always appended to the result.

7.19.6.5/2 The snprintf function is equivalent to fprintf, except that the output is written into an array specified by argument s rather than to a stream. If n is zero, nothing is written, and s may be a null pointer. Otherwise, output characters beyond the n-1st are discarded rather than being written to the array, and a null character is written at the end of the characters actually written into the array.
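A small sketch of the behaviors quoted from C99 above (the wrapper names are hypothetical; the library calls are standard): strncpy can leave the destination unterminated, snprintf truncates and terminates when the size is non-zero, and strncat always terminates but takes a count of characters to append rather than the destination's size.

```c
#include <stdio.h>
#include <string.h>

/* strncpy writes at most n bytes and adds NO terminator when src has
   n or more characters. */
void copy_with_strncpy(char *dst, size_t n, const char *src)
{
    strncpy(dst, src, n);
}

/* snprintf (C99) always terminates when n > 0, truncating if needed; it
   returns the length the full output would have had. */
int copy_with_snprintf(char *dst, size_t n, const char *src)
{
    return snprintf(dst, n, "%s", src);
}

/* strncat appends at most n characters from src and then always writes a
   terminator, so dst needs room for strlen(dst) + n + 1 bytes. Note that n
   limits the characters appended, not the size of dst. */
void append_with_strncat(char *dst, const char *src, size_t n)
{
    strncat(dst, src, n);
}
```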

Old Wolf · 2007-04-12

poochner:
Steve Bush:
sizeof() is dangerous to use even if you think you know the size of the array. For instance:
wchar_t myarray[16]; int size_of_array = sizeof(myarray);

What is the value of size_of_array in this example?

Maybe I'm being dense (wouldn't be the first time), but this looks like exactly when you would use sizeof. The code monkey cannot be certain whether a wchar_t is 1 char (unlikely) or 2 (usual) or something else (damn unlikely).

sizeof(wchar_t) is 4 on most systems.

So the only way you're going to get the proper memory size of the array is using sizeof.

Well, if you want the number of bytes in the array then you would use sizeof. If you want the number of elements in the array then you would do something else. In fact I use a macro like this:

#define dimof(Array) ( sizeof(Array) / sizeof *(Array) )

What'll really cause confusion is when sizeof (struct my_random_t) doesn't return a value padded for alignment. Then you get things like sizeof an array of four not being exactly four times the sizeof a single struct.

The compiler is broken if that is the case.
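Old Wolf's dimof macro applied to Steve Bush's wchar_t array, as a minimal sketch (the helper function names are hypothetical). The byte count depends on sizeof(wchar_t), commonly 2 on Windows and 4 on Linux and most other Unix-like systems, while the element count is 16 either way.

```c
#include <stddef.h>   /* wchar_t, size_t */

/* Element count = total bytes / bytes per element. Only valid on a real
   array; applied to a pointer (e.g. an array parameter) it silently
   divides the pointer's size instead. */
#define dimof(Array) ( sizeof(Array) / sizeof *(Array) )

wchar_t myarray[16];

/* Bytes in the array: 16 * sizeof(wchar_t). */
size_t myarray_bytes(void)    { return sizeof myarray; }

/* Elements in the array: 16 regardless of sizeof(wchar_t). */
size_t myarray_elements(void) { return dimof(myarray); }
```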

Old Wolf · 2007-04-12

edwdig:
android:
char* pName = "bonehead"; /* the * goes by the char, it IS part of the pointer TYPE */
No no no... char is the data type. * is a modifier on the variable.

The data type of pName is "char *". Pointer to char.

Doing it that way leads to errors like people writing "char* ptr1, ptr2;" and expecting two pointers.

That's just a syntax choice by the inventors of C; they could equally well have made the above code declare two pointers and the rest of the language would be the same. (Presumably they didn't because it would lead to *more convoluted* syntax if you are declaring something tricky like multiple pointers to arrays.)

2007-04-12

Sad, really, when self-glorified programmers attack their tools when all that's wrong is that their brains cannot handle complex ideas.

Old Wolf · 2007-04-12

Andrew:
I have been bitten many more times by sizeof() than any string.h function. It uses the symbol-table at compile-time.

Perhaps you would get bitten less if you would learn what sizeof actually does. It is extremely simple:

sizeof obj tells you how many bytes of storage the object named 'obj' uses.
sizeof(TYPE) tells you how many bytes of storage an object of type TYPE uses.

Further, it does not use any "symbol-table" at any time, let alone compile-time. sizeof can even be a "run-time" operation when its operand is a variable-length array, e.g.:

void func(int n) { char array[n]; printf("%u\n", (unsigned int)sizeof array); }

(Note: variable-length arrays were not added to C until 1999, so if your compiler has multiple conformance modes you may need to select C99 mode, e.g. gcc -std=c99.)

There are lots of pitfalls. Look at these gems, based on the variable declarations!
char buff[80], *str; struct num { int a; double b; } values; typedef struct num *my_num_t;

1. sizeof(buff) is 80, not max string-len of 79.
2. sizeof(buff) == sizeof(void *), measures no useful memory block.
3. sizeof(my_num_t) is a pointer type like #3 not the struct's size.
4. sizeof(my_num_t->a) doesn't even compile in GCC; see #5.
5. sizeof(values.a) will define #4 for you,....or maybe not!

1. sizeof tells you the size in bytes you have allocated. It isn't a "pitfall" that it doesn't tell you some different fact!
2. Huh? Are you claiming that sizeof(void *) is 80?
3. my_num_t is a pointer type. You asked for its size, and you got the size of a pointer. Why would you expect anything else? If there is a pitfall here, it is that using pointer typedefs leads to obfuscated code. There's never any good reason to use pointer typedefs and it's puzzling that so many people do.
4. The problem is that the '->' operator needs a pointer on the left. Nothing to do with sizeof.
5. sizeof values.a (no brackets required) is correct code and tells you the size taken up by the member 'a' of the object 'values'; in this case it will be the same as sizeof(int).

The sizeof() operator is a necessary evil part of the malloc() system C uses.

Ridiculous comment. sizeof has nothing to do with the malloc system.

It does nothing to make strings, or anything else, easier to work with.

It allows you to know how big statically-allocated objects are, in bytes. Most of us find that useful. For example:

char buf[20]; snprintf(buf, sizeof buf, "Hello, world");

memset( &obj, 0, sizeof obj );

memcmp( &obj1, &obj2, sizeof obj1 );

ptr1 = malloc( sizeof *ptr1 ); ptr2 = malloc( 10 * sizeof *ptr2 );

The latter forms are especially useful for avoiding allocation size errors.

EvanED · 2007-04-12

edwdig:
android:
If C programmers would quit trying to write 70's style K&R code and actually write code with variable names and typing that made sense there would be far less idiotic errors like this.
char name[] = "bonehead"; char* pName = "bonehead"; /* the * goes by the char, it IS part of the pointer TYPE the little p in front of the var name indicates to the maintenance programmer in bangalore that this is a pointer */

No no no... char is the data type. * is a modifier on the variable. Doing it that way leads to errors like people writing "char* ptr1, ptr2;" and expecting two pointers.

I never understood this argument. Big whoop. So the compiler yells at you, you go "oops" and put in the extra *.

I personally usually like to declare variables on separate lines anyway.

EvanED · 2007-04-12

Old Wolf:
EvanED:
I also mentioned that even the "safe" versions of the str* functions are not really; strncat and snprintf for instance aren't guaranteed to null-terminate your strings, so you always have to remember to do it yourself.
strncat is guaranteed to terminate the string.
snprintf is guaranteed to terminate the string unless you pass 0 as the buffer length, or if you pass erroneous parameters and fail to check the return value.

You might be thinking of strncpy, which does not terminate the string in some circumstances (the function is poorly named as well as poorly designed).

Interesting, I thought that they weren't. I guess that demonstrates why I should verify things before asserting them.

I wonder why I got that misconception.

(BTW, I don't agree that strncpy is poorly named. strcpy seems reasonable, at least by C's str+TLA scheme, and strncpy doesn't seem to do anything odd besides not being guaranteed to null-terminate (and padding the result to n characters), so I think it's an okay name.)

Addendum (2007-04-12 22:41): Actually, I think the misconception about snprintf at least may come from VC++ apparently not providing snprintf, only MS's own _snprintf. (Don't ask me why they did this, I have no clue.) _snprintf does not null-terminate the buffer if the size is smaller than the output string.

The End of the String as We Know It
