Admin
Yeah, I never understood why they didn't let the MSB of the length decide the size of the length value (similar to how it is actually done with Protobuf). Having a magic value as part of your data payload is never a good idea, even if you ignore the unknown buffer length issue.
However, it's fair to say that we have to remember that C was designed when computers had the processing power of... ehm... actually, I think there's nothing around anymore. My fridge is as powerful as 100x Commodore 64s, so yeah, times have changed. And back in those days, it was the perfect compromise of dealing with strings.
Admin
@MaxiTB maybe an argument can be made that one zero byte takes less memory than 2 bytes for the length (we're talking back in the days of 8-bit and 16-bit CPUs), but even that argument is extremely weak. From a performance perspective, having the length definitely has advantages: strlen becomes an O(1) operation. Yes, random access now needs + 2 behind the scenes, but iteration (the usual expensive operation) is unaffected. I don't think it was a good compromise - it's just that people didn't know any better.
Admin
Trying to add a zero byte to a string which lacks one is like jumping off the roof of a building to save somebody who fell off previously - both people will die.
Admin
The part that irritates me the most is that the string that is being strcatted is, in fact, a string containing a NUL terminator, then followed by the actual NUL terminator. The equivalent of
char anonymous_string_47[2] = { '\000', '\000' };
Admin
"WTF?!\0"
Admin
@Steve - Yes, but fortunately it will have no effect on the output.
Admin
I seem to recall cases where a double null terminator was used to terminate a variable length list of null terminated strings. That allows a list of variable length strings without the padding that would be required for an array of fixed length strings, where the array element size needs to be set for the maximum length string.
If the code is an extract, perhaps that might be the explanation.
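For anyone who has not met that layout, here is a minimal sketch in C of walking a double-null-terminated list. The function names count_multi_string and multi_string_at are made up for illustration, and the code assumes the block really does end with an empty string; it cannot detect a missing final terminator.

```c
#include <stddef.h>
#include <string.h>

/* Counts the strings in a block laid out as "a\0bb\0ccc\0\0".
 * Assumption: the block ends with an empty string. */
size_t count_multi_string(const char *block)
{
    size_t count = 0;
    /* An empty string (first byte is NUL) marks the end of the list. */
    for (const char *p = block; *p != '\0'; p += strlen(p) + 1) {
        count++;
    }
    return count;
}

/* Returns the index-th string in the block, or NULL if the list is shorter. */
const char *multi_string_at(const char *block, size_t index)
{
    const char *p = block;
    while (*p != '\0') {
        if (index == 0) {
            return p;
        }
        index--;
        p += strlen(p) + 1;   /* skip this string and its terminator */
    }
    return NULL;
}
```

One consequence of the layout is that the list cannot hold an empty string as an element, because an empty string is the end marker.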
Admin
There are some Win32 APIs that expect to be passed double null-terminated strings, which represent an array of null-terminated strings, and the double null indicates the end of the array.
One example is the lpstrFilter and lpstrCustomFilter fields of the OPENFILENAME structure used by the GetOpenFileName() and GetSaveFileName() APIs used for presenting an Open or Save As dialog box in pre-Vista applications. Another is the block of environment variables returned by GetEnvironmentStrings(). The program arguments argv passed into main() are also stored contiguously in one block, terminated by a double null, but the program doesn't need to know or care about that.
Admin
Use strcpy_s for a safe(r) replacement for strcpy. The strNcpy function is not, under any normal circumstances, to be used. It's a weird specialty function that is not a safe replacement for strcpy.
See MS LEARN for the correct replacement.
(So what's wrong with strncpy? Well, the same problems that are always pointed out: it doesn't guarantee a NUL char at the end, and if you have a big destination buffer, it will spend a lot of time pointlessly filling the buffer with NUL chars.)
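Both complaints can be checked with a small C sketch. The helper names below are invented for this illustration; the only library calls are the standard strncpy, memset and memchr.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf[0..size-1] contains a NUL terminator, else 0. */
int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Case 1: the source fills the buffer exactly, so strncpy writes no NUL.
 * Returns 1 when the result is left unterminated. */
int strncpy_leaves_unterminated(void)
{
    char buf[4];
    memset(buf, 'x', sizeof buf);
    strncpy(buf, "abcd", sizeof buf);
    return !is_terminated(buf, sizeof buf);
}

/* Case 2: a short source; strncpy pads the rest of the buffer with NULs.
 * Returns how many NUL bytes ended up in a 16-byte buffer. */
size_t strncpy_padding_count(void)
{
    char buf[16];
    size_t zeros = 0;
    memset(buf, 'x', sizeof buf);
    strncpy(buf, "ab", sizeof buf);
    for (size_t i = 0; i < sizeof buf; i++) {
        if (buf[i] == '\0') {
            zeros++;
        }
    }
    return zeros;
}
```

Some compilers warn about the first call because the truncation is visible at compile time, which is rather the point.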
Admin
I'm also pretty sure nobody expected C to be around after half a century.
Admin
To clarify:
Imagine you want to store an unbounded integer in the most efficient way when most of the numbers are in the lower ranges. Well, you do the same thing UTF-8 did. Basically, you take the most significant bit as an indicator that there are more bits following in the next word. You waste 1/(bits of word) of each word, but you can go to infinity with this system. In reality you only have to read a few words most of the time (for short strings most likely only one), so you end up with something as efficient as a magic terminator in the string, without the need for a terminator.
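A minimal C sketch of that idea, using 8-bit words and the low-bits-first order that Protocol Buffers varints use. The names varint_encode and varint_decode are illustrative, and unlike the unbounded scheme described above, this sketch deliberately caps values at 64 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes value as 7 data bits per byte, lowest bits first.
 * A set top bit means "another byte follows".
 * Returns the number of bytes written (1 to 10). */
size_t varint_encode(uint64_t value, unsigned char out[10])
{
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (unsigned char)((value & 0x7F) | 0x80);
        value >>= 7;
    }
    out[n++] = (unsigned char)value;   /* top bit clear: last byte */
    return n;
}

/* Decodes one value from in[0..avail-1].
 * Returns the number of bytes consumed, or 0 if the input ends before a
 * byte with a clear top bit is seen (or after 10 bytes without one). */
size_t varint_decode(const unsigned char *in, size_t avail, uint64_t *value)
{
    uint64_t result = 0;
    for (size_t i = 0; i < avail && i < 10; i++) {
        result |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}
```

With this encoding a string shorter than 128 bytes pays one byte of length, the same overhead as a NUL terminator.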
Admin
If the double null is the explanation, it still shows that they don't understand how it works, since this code won't do that. As far as strcat is concerned, "\0" is the same as "", since the null byte is treated as the terminator of this string.
Admin
Flashbacks to my first dev job out of college where I performed maintenance (and theoretically new enhancements) on C code. 90% of the time it was debugging core dumps to find the place where somebody missed a null terminator at the end of their string.
(The real WTF there was that it was a shop using an Oracle database back end and writing processing code in C. It was the mid-90s but still - it could have at least been C++).
Admin
See, that's why you use a BSTR, which is length prefixed, no +2 required for indexing.
Admin
(Forgive me if everyone already knows this:)
This is known as "in-band signalling." Putting the control info outside the data is "out-of-band signalling." Telephones still use some in-band signalling (touch tones). It's simpler... but it is much more susceptible to problems when the controls look like the data. For example, what if your data needs a null byte? (Well, don't use C strings for that.)
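As a small illustration in C, assuming a made-up byte_span type that carries its length out of band: the same five bytes look two bytes long to anything that treats the zero as a terminator.

```c
#include <stddef.h>
#include <string.h>

/* Out-of-band: the length travels beside the bytes,
 * so a zero byte is just data. (Hypothetical type for this sketch.) */
struct byte_span {
    const unsigned char *data;
    size_t len;
};

/* In-band view of the same bytes: everything after the first zero
 * byte is invisible, which is what a C string function would see. */
size_t in_band_length(struct byte_span s)
{
    const unsigned char *end = memchr(s.data, 0, s.len);
    return end != NULL ? (size_t)(end - s.data) : s.len;
}
```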
Admin
I'm trying to decide if today's WTF is more like "In today's news: the sun rises in the east. the pope is catholic. details at 11." I've seen "string\0" more times than I care to count. It's a lack of fundamental understanding. I recently had to deal with code from a programmer who should have known better: he wrote some JavaScript that dealt with five pages, created 6 of them, and then deliberately ignored element 0 so he had a 1-based list. Ugh. Programmers not grokking basic feature X of language Y are without number.
Admin
Nobody expects the C inquisition! Torture them... with the NUL terminator! Confess!
Admin
You could also prefix the list with the number of elements, just like you could prefix each string with the number of characters. So the list would start with two bytes indicating the length of the list followed by two bytes for the length of the first string, instead of ending with two NUL bytes.
But as others have pointed out, this code doesn't actually construct a doubly-terminated list. It just concatenates the first string with an empty string, producing a result identical to what it started with.
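A short C sketch of the difference, assuming the caller knows the buffer's total capacity. The names strcat_nul_is_noop and add_second_terminator are invented here; writing the extra byte by index is just one way to get a real second terminator.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if strcat(buf, "\0") left the byte after the
 * terminator untouched, i.e. no second NUL was appended. */
int strcat_nul_is_noop(void)
{
    char buf[8];
    memset(buf, 'x', sizeof buf);
    strcpy(buf, "abc");
    strcat(buf, "\0");            /* strcat sees the source as "" */
    return strlen(buf) == 3 && buf[4] == 'x';
}

/* Writes a second NUL after the existing terminator, so the buffer
 * ends with "\0\0". cap is the total size of buf in bytes.
 * Returns 0 on success, -1 if there is no room. */
int add_second_terminator(char *buf, size_t cap)
{
    size_t len = strlen(buf);
    if (len + 2 > cap) {
        return -1;
    }
    buf[len + 1] = '\0';
    return 0;
}
```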
Admin
The problem with size+string is that you aren't communicating buffer size. So your non-NULL terminated string still has all the problems of NULL terminated strings in that you can overflow the buffer. Having the size of the string thus isn't really much of a benefit over NULL termination other than strlen() being O(1) instead of O(n).
NULL termination does however let you implement functions like strtok() trivially easily since they replace the token with NULL and return the substrings that way. This makes it much easier to parse strings and command lines back in the day. If you had to deal with Pascal strings (size+string) you'd have to fake it by replacing the characters with the string length and storing those away.
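To make the strtok() point concrete, here is a minimal C sketch that splits a command line in place. split_words is a hypothetical helper; note that strtok keeps hidden static state, so this is not safe to use from two places at once.

```c
#include <stddef.h>
#include <string.h>

/* Splits line in place on spaces. Each delimiter that ends a token is
 * overwritten with NUL, so every entry in words points into line itself.
 * Stores at most max pointers and returns how many were stored. */
size_t split_words(char *line, char **words, size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(line, " ");
         tok != NULL && n < max;
         tok = strtok(NULL, " ")) {
        words[n++] = tok;
    }
    return n;
}
```

No copying or allocation is needed, which is exactly what a length-prefixed layout cannot offer without extra bookkeeping.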
Admin
Before and after the use of nul terminated strings in C there were many other languages which implemented strings in other ways. C strings are significantly better than most of them. For example, some implementations of Pascal used the initial byte to hold the string length - so it was impossible to have a string of more than 255 characters.
The problem with C strings is that you need to know what you are doing when using them. But that is essential for all programming. If you don't know what you are doing, you shouldn't be programming at all as your efforts will be of the quality of the works so frequently exhibited on this site.
The str*() functions do not have strn*() variants because they are dangerous. These are much newer inventions by people who think that neither attention to detail nor knowing what functions do are important. The oldest strn*() function - strncpy() - is frequently described as a safer alternative to strcpy() when it is no such thing. Despite its name, it has nothing to do with C strings. And it would only be safer if you think that correct behaviour of programs is not important, as the way people advocate strn*() functions be used results in, or is highly likely to result in, unwanted truncation and, ironically enough, something which is not a valid C string because it lacks the nul termination.
Annex K provided a lot of this sort of nonsense and has ended up a failure.
Admin
Also keep in mind that C was designed to be "portable" (insert guffaw here), so if you have a 2-byte character count, you have to account for endianness. Null termination avoids that problem. My big beef is that strlen() doesn't return the length you need to allocate memory for the string; you have to manually account for ... wait for it ... the terminating null.
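The off-by-one in question, as a minimal C sketch. copy_string is a made-up helper name; the caller owns the returned memory and must free it.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of src, or NULL if allocation fails. */
char *copy_string(const char *src)
{
    size_t size = strlen(src) + 1;   /* +1: strlen does not count the NUL */
    char *dst = malloc(size);
    if (dst != NULL) {
        memcpy(dst, src, size);      /* copies the terminator too */
    }
    return dst;
}
```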
Admin
Aaaah, yes. The old "If you were a perfect programmer like I am there'd be no need for safety nets or assistive technology. Just never goof, not once, over the course of your career. Just like me."
Admin
The problem with "assistive technology" as you call it is that it does not get rid of mistakes. It just moves them into a layer that is opaque to you. But you still need a basic understanding of what is happening in there if you want to use them right. Otherwise you just replace one category of mistakes with another.
However, because of the mistaken belief that it removes mistakes people will jump to using them without learning the fundamentals. Thus it is quite possible for technology meant to help you to actually cause more mistakes than it avoids.
And it often comes with the bonus of tying your hands and not letting you manually do what you know needs doing to fix them.
Admin
In other words, there will never be NULL termination.
Admin
If this is meant as a reply to Charles-2, then I think you miss the point. The strn* functions are not a safety net. They only provide protection against one class of error - the buffer overflow (and even then strncat is its own special WTF*). In what circumstances is it acceptable to truncate a string at an arbitrary point? That's always been bad and now that a buffer of char is as likely to contain UTF-8 as plain ASCII, it's even worse.
strncat appends at most n chars from the source to the destination. To use it safely, you have to know how big the destination buffer is and how long the string it contains is.
Addendum 2024-04-11 05:59: The last paragraph should have an asterisk at the front because it's meant to be a footnote about strncat.
Admin
File dialog filters actually require triple-null terminated strings. That's because they're actually lists of pairs of strings, and "two empty strings" is the list terminator.
So you'll need to write stuff like:
lpstrCustomFilter="Text files\0*.txt\0\0"
Now there's three null terminators at the end of this string: One implicit, two explicit.
Admin
Yeah, strncpy looks like it was made not actually for string manipulation, but for storing strings in records (where filling the rest of the buffer with zeroes makes sense, and so does considering a full-but-unterminated buffer an acceptable outcome).
Microsoft's strxxx_s functions are little better since by default, they merely mitigate buffer overflow attacks by turning them into denial-of-service (their default behavior if the buffer size is exceeded is to force a crash). Microsoft's <strsafe.h> functions behave significantly better, notwithstanding the philosophical debate on arbitrarily-truncated strings.
Admin
I don't have a Windows machine available to test on, but I'm pretty sure that only two nulls are needed, not three (the third won't hurt though). I'd expect this to work just fine:
Admin
I'm pretty sure that in this case it wouldn't even append anything to the string, because the string "\0" will stop reading on the first byte (a null terminator). The strcat() is effectively doing nothing, or doing the wrong thing, if the original string had embedded nulls or no null terminator.
Admin
C does not have strings. It merely has some functions which operate on null-terminated arrays of characters- structures which are susceptible to all the usual buffer problems. C does not have strings.
As for double nulls to indicate list termination, it might be better to consider the null as an item separator, not a terminator.
Admin
It's been a while since I did this, but IIRC, the greatest danger in using the strn... functions is not that they truncate the result string without warning. Basically, you either stop the program or truncate the string so it fits, and users generally prefer the latter. If your last name is Nightingale-Fotheringdale, would you prefer software to call you something like "Nightingale-Fot" or to throw an error and refuse to enter you into the system at all because your name is too long?
The huge problem is that when these functions truncate the string to fit the buffer, they go right to the end of the buffer without a null byte. That is, they leave an unterminated string as the result.
It wasn't hard to fix this. Just append a null byte after calling the function. E.g.:
#define BUFFERLENGTH 100
char result[BUFFERLENGTH + 1];
strncpy(result, source, BUFFERLENGTH);
result[BUFFERLENGTH] = 0;
Admin
I would be fairly certain that Mr. Nightingale-Fotheringdale who is denied boarding for his flight because the name on the booking does not match the name on his passport would very much have preferred that the booking system had completely crashed as that would have allowed him to either make a phone booking or, if the booking company had the same flaw for phone bookings, make the booking with a different company.
Your "huge problem" is really one of the programmer not knowing what they are doing, failing to read the documentation for the functions they are calling, and blundering along making reckless assumptions about the data that they have been entrusted with for processing.
Admin
Off: also there is strlcpy and strlcat which are better than strncpy and strncat, but not Posix-standardized.
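For platforms that lack them, the behaviour of strlcpy is easy to sketch in portable C. The function below is named bounded_copy to make clear it is an illustration and not the BSD implementation: it always terminates the destination when size is nonzero and returns the length of the source, so a return value of size or more means the copy was truncated.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst, writing at most size bytes including the NUL.
 * Returns strlen(src); the result was truncated if that is >= size. */
size_t bounded_copy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);
    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';            /* always terminated, never padded */
    }
    return src_len;
}
```

Unlike strncpy, the caller can tell from the return value that truncation happened and decide whether to reject the input or carry on.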