- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
No it isn't: due to a WTF in the definition of the C language, '\0' is not of type 'char', it's of type 'int'. You can verify this, for example, by applying 'sizeof' to it: sizeof('\0') will give 4 (on platforms where an 'int' is of size four), but sizeof((char)'\0') will always give 1.
But either way, assigning zero is correct, if unidiomatic...
Admin
I bet that's just the anonymization.
Admin
Well if this entire website is full of examples of what not to do, I guess the URLs are no exception.
Admin
I think you guys missed what he was doing. That's okay. I think you got wrapped up in the whole "sprintf" thing.
What he's doing is the following:
Note that len is a disposable value.
This whole thing is a string concatenation exercise. It's really badly done. In most embedded system, you want to avoid the use of printf, sprintf, etc. They're heavy functions, and you may not have the time or space to include them. They're not useful for debugging either, since you'll end up with load bearing print statements.
Your first thought, I'm sure, is "Hey, why is he not using the native string concatenation functions, you know, the ones built into C?"
Yes, of course, that's the way to do it:
But where's the fun in that? If you've got a really bad compiler, it might do something awesome like include all the functions in the library. (It is very rare to have this happen.)
Edit: Of course, I'm assuming (and likely incorrectly) that the lengths of all these strings is kosher. Code like this is what's responsible for all these "buffer overflow exploits" that you hear about. If header + value is bigger than buffer, then you start writing arbitrary values into the rest of the code. A properly formatted string can allow code execution of anything you desire.
Addendum (2007-02-14 11:52): Addition:
I just realized he's basically printing this onto a piece of paper, scanning it in, and then reading the value. An embedded twist on an old classic. ;)
Admin
So far, the quoted must be the WTF comment of the day. Doesn't make least bit of sense.
Yeah, you know in C strings are arrays of characters, and they are zero terminated. So if you want to truncate a string that's later accessed by strxxx and such functions, all you need to do is to write a zero where you want to truncate it. This is a security issue if you're dumping full string buffers somewhere, as potentially sensitive content stays there, so you'd then need to write zeros all the way to the end of the buffer. For other uses, though, truncation-by-a-single-zero is perfectly acceptable.
The 2D bit is beyond me, though.
Cheers!
Admin
I love CodeSOD. The comments range from comedy gold to outright schadenfreude. Here, I'll try one:
The real WTF is that he didn't use char[] buffer = (header + ": " + value + "\n\r").ToCharArray();
Admin
Since he's writing a 0 over the \n, it takes out the \r as well. The string in sprintf should be "%s: %s"
Would you trust the person who wrote that snippet to be sure that they wouldn't overstep the string?
Admin
What's most frightening is the people here who have no clue about what the code actually does.
First, a buffer of size N chars is allocated on the stack. sprintf writes to that buffer; the end of the string is marked by a ASCII NUL character, '\0' or the integer 0. If the string including the terminator is larger than N, then bye bye stack. Use snprintf instead.
strlen returns the number of characters not including the NUL. The array is zero-indexed, so there is a NUL at buffer[strlen(buffer)]. sprintf returns the number of characters written, so strlen is unnecessary.
The next line inserts a terminating NUL (which is the integer zero) where the '\n' is in the string; this has the effect of terminating the string.
WTF #1. Using sprintf instead of snprintf. WTF #2. Confusing carraige-return with line-feed. WTF #3. Throwing away the result of sprintf just to go find it again. WTF #4. Inserting data into a string that is just about to be thrown away anyway.
Admin
Your code is valid C, but it has a bug. So, here's a version with the bug fixed. Also, on platforms with broken compilers (ZDS II for Z8 Encore comes to mind) it will yield slightly faster code:
BTW, gcc -O3 generates virtually same code for this and for the fixed version of the OPs code. Other compilers may or may not. ZDS II -- won't.
Cheers!
Admin
I wouldn't say it's unidiomatic, it's the way the language is designed in the first place. Look up the definitions for strchr and memset, and tell me again that it's unidiomatic. (Let me just clear this up for the VB and SQL people here: in the C standard library, many functions that take an argument representing a single character in a string search, etc., specify that the parameter is an int, not a char. The reason for this has to do with efficiency in putting arguments on the stack.)
Unfortunately, the C++ spec manages to mangle things to the point where, for the purposes of definition, 0 != '\0' != NULL, which is just ridiculous. That's a discussion for another day, though.
Admin
No. He should use strcat, like $DEITY intended. Or even roll his own if it's a speed-critical path.
Admin
Disgrundled postal worker: I don't think I'll be reading the comments on this site again
But that's where the real WTFs are!
CAPTCHA: tesla
Shocking!
Admin
Well, thank you. Like I said, I knew there was something wrong with it. ;)
Ah well, that's what testing is for, eh? Catching crap like this before it gets sent out.
Admin
#2 is actually like getting the capacity of the glass to find out how much water you have and finding out that it's three feet below sea level.
Admin
In C, the type of a character constant is int; your cast to char above is superfluous (and incorrect, if you meant it to imply that the type of '\0' is char).
In C, '\0' has exactly the same type and value as 0. Its only purpose is source-code documentation, as a hint to maintainers that the code in question is dealing with string termination, and not any of the other possible uses of 0 (integer 0, null pointer constant, etc).
(In C++, character constants have type char; this is one of the differences between the two languages.)
The same applies to '\x0', and would apply to the equivalent decimal character constant, if there were one. '\0' itself is an octal character constant, of course.
-- Michael Wojcik
Admin
printf requires the format string be terminated (and constant strings are implicitly terminated in C). That termination is copied to the buffer; the resulting string is terminated regardless of what was in memory before.
On a non-embedded system (or a non-embedded system that runs fast enough--I work on a kiosk app where we can pretend we're on a decent machine) I would fail the first line in a code inspection because the buffer isn't initialized:
or orBut on an embedded system with limited clock cycles and possibly limited code space, I'd let it pass. printf is evil, but it does guarrantee a terminated string.
Not knowing/thinking about that termination is the source of one of the common, hard-to-detect, and most evil overruns with printf. People check the length of the string they're inserting against the buffer length and forget that one extra character will be written. Don't do that. Even better, don't use printf. On a beefy machine, enter the 80's and use ostrstream.
Nice job thinking about what was on the stack when buffer was allocated, btw. Can I get you in my team's code reviews?
PS - why does the code tag insert newlines? I want to put variable names within text block in a monospace font!
Admin
This only applies if you use a beaindead compiler, or use a dynamic format string. Most reasonable embedded compilers inspect format strings. Thus, for a static "%s:%s", you'd expect the generated code to be similar to calling strcat three times. Some really clever compilers (I know of only one) will generate the equivalent of a hand-coded loop posted previously here.
Cheers!
Admin
Amen!!!
Admin
NO
Admin
Captcha is pinball -- and you're one. sprintf prints a null-terminated string to a buffer, so if N which is has had to be #define(d) somewhere, is large enough to accommodate the header and value, it's going to find the length.
Can \n\r happen? A text file opened in binary mode will preserve the /r (cr) internally. In text mode, the cr/lf is converted to newline. (\n). So if a file opened in binary and written as text, will result in cr/cr/lf.
So, I suspect that the snippet was mis-typed. sprinf, for one, and \n\r.
sprinf would compile, but not link.
I apologize for the pedantry, but I really hate to see the kind of dumb comments we've been getting today.
Admin
So, what do you recommend the size be set to in the call to snprintf ? N ? snprintf won't save you if N is defined improperly, and it won't really help you if it is defined properly.
Admin
Admin
As sprintf writes a null terminated string to the buffer, this will not be a problem. This is also why the code that has been written to try and terminate the string is a waste of time.
Admin
What?! I don't know how N can be defined properly or improperly. As far as I can tell, if the declaration of the char array compiles, it's defined properly enough. If you use N you prevent buffer overflows. The code might not accomplish what you intend, but I'm pretty sure overflowing a buffer and opening up a vulnerability are not the intent either and is an arguably worse outcome. That being said, I'm not sure about the portability of snprintf
Admin
Don't forget to add this after that line:
snprintf won't null terminate a string if it runs out of buffer space.
Admin
Sorry, I was making an assumption that wasn't necessarily fact (one of those I clicked on submit sure I considered every angle, and I new one popped in my head while it ran).
If N is an integer (int N;) Then char buffer[N] would be invalid code, so I assumed it was pseudo-code with N being defined to strlen (var) + strlen (val) + strelen (": ") + strlen ("\n\r") + 1.
I had not considered that N might be real code which is defined to a constant.
Admin
'You learn something new every day'. Yet I don't like leaving things like that.
Another thing, '\r' is an escape sequence (carriage return) and '\n' is also an escape sequence (new line = line feed + carriage return, changing order depending on OS).
But then if sprintf puts an NULL character at the end, it's also useless to put an '\n' and '\r' in the sprintf at all ...
I'm still capable of learning a few more things today.
Admin
No, you want the carriage return (\r) after the line feed (\n) so that the print head doesn't smear the ink while returning.
Admin
The null terminator is guaranteed to be zero I believe.
Can anyone confirm that the standard actually mandates this (as opposed to it just being common practice)?
Admin
Um, all three of those things will compare equal.
Addendum (2007-02-14 13:59): The only weird C++ artifact in this area as compared to C is the aforementioned thing that '\0' in C and C++ are different types (but since the char would then be promoted to an int it doesn't matter) and that NULL in C is (void*)0 but just 0 in C++.
But you can test it yourself:
Admin
Admin
I've seen code with angle brackets around the CR and LF: sprintf(buf, "foobar<\r><\n>");
I can only assume someone saw <CR><LF> in a spec somewhere.
Admin
Yes it will!
From c99 7.19.6.5, section 2:
From freeBSD snprintf manpage:
I wish people here would learn WTF they're talking about before posting comments.
Admin
Won't the first
also copy the '\0' terminator from src1, so that for all string manipulation purposes, dest will actually contain only src1?Admin
In whose broken implementation of sprintf/strlen does this happen? \r doesn't affect either sprintf or strlen in any implementation I'm aware of...
Admin
Indeed.
$ cat stupid.c #include <stdio.h> int main(void) { char foo[5]; snprintf(foo, sizeof(foo), "hello world"); printf("foo[0] is %02X\n", foo[0]); printf("foo[1] is %02X\n", foo[1]); printf("foo[2] is %02X\n", foo[2]); printf("foo[3] is %02X\n", foo[3]); printf("foo[4] is %02X\n", foo[4]); printf("foo[5] is %02X\n", foo[5]); printf("foo[6] is %02X\n", foo[6]); return 0; } $ gcc stupid.c $ ./a.out foo[0] is 68 foo[1] is 65 foo[2] is 6C foo[3] is 6C foo[4] is 00 foo[5] is FFFFFFFB foo[6] is FFFFFFFF $
Admin
It's easy to get rid of that restriction, but I've rarely seen it done in practice.
Such code examples tend to be typical in many embedded systems. It's not just web and windows apps that bring out the worst programmers. Sometimes it's the hardware guy who ends up writing software; or the software guy who doesn't understand hardware trying to write firmware. Sometimes it's the high level apps people that think they're still on a 2GB 2GHz Pentium system. Often it's the rush to get things done as soon as possible (ie, converting a C string to a C++ string, putting that in a strstringstream, and then parsing integers out of that; all because that's quicker than looking up how sscanf works).
Admin
I always assign 0, idiomatically. Because that's 3 less characters than '\0', and the language guarantees that assigning 0 works. It's just easier to read.
Admin
Yeah, it probably will, but this is just conjecture. We'd catch that during the test phase, and the first place we'd look is in that index increment between the two copy sections.
The point is that the OP is doing string concatenation by printing out two strings. That's the embedded equivalent of printing a picture, putting it on a wooden table, taking a photo, and then scanning the photo.
The right way to do it is probably what I had at first: (The "no fun" way.) #include <string.h> strcat( buffer, header ); strcat( buffer, ":" ); strcat( buffer, value );
(Assuming you check the lengths of everything and the compiler won't include the whole library.)
The most important part of the code from the OP is that it shows that the OP is just not comfortable using pointers. Basically, if you're not using pointers in your embedded system, you're using too much footprint. That means you're costing time and money.
Admin
Not very enterprisey -- What if the value of NULL should in the future change to something other than 0?
Admin
This strikes me as a dumb way to do it since strcat needs to find the END of buffer which takes O(n) time, and it needs to do it three times, two more times (and maybe three more depending on what information you have available beforehand) than necessary.
If you're embedded and have to worry about efficiency, I don't think the standard string functions meet your need.
BTW, if Kuba's post
is true, you might get BETTER efficient code with sprintf than with three calls to strcat, unless the compiler also recognizes that pattern.
Admin
I think the real WTF here is that these idiots are using something as inane as C in an embedded environment. They should be using MUMPS. Or Rexx.
Admin
But once you've used one sprintf or sscanf in the code, the other uses are all cheap. For diagnostics this is incredibly useful, especially if you need to print or parse integers. If all you use are strings then it's overkill. But for many embedded systems with more than 16MB memory it doesn't take up much room.
Many C runtime libraries have lighter weight implementations (ie, leaving out float support and such).
Sadly, our system uses a ton of C++ I/O, which is immensely huge and bloated in comparison. I'd rip it all out if I could. Don't know what they were smoking. Probably a syndrome of too many people today learning C++ without understanding C.
Admin
Oh, I agree. Just pointing out another WTF ;-). Of course, if the first while (*d++ = *s++); didn't terminate the string, then the second wouldn't either, and so the function would create an unterminated string, which isn't much use, is it?
But I'm pretty sure they'll both copy the terminators. If I remember my C correctly, *d won't be tested until after the assignment and increment has been completed. Um, unless doing *d++ unstead of *(++d) (which I think is the only thing that prevents dest[0] and src1[0] being skipped) short-circuits that.
Maybe it's time to drag K&R out of storage.
Admin
So C++ uses 0 instead and deprecates NULL. 0 can be assigned to or compared with any pointer type without type casting first. It's not ridiculous, it's an improvement on the confusion that C implementations had.
Admin
It's more than that though; it's that C++ doesn't allow implicit pointer conversions between unrelated types. In C, you can say char* c = (void*)0; (and thus = NULL), but in C++ you can't.
Defining NULL as (void*)0 would have required C++ to make it so that compilers recognize the literal null pointer "(void*)0" and allowed conversions between that type. Stroustrup/the standards committee decided that this was undesirable, so defined NULL to be just 0.
This is the real reason, rather than inconsistency between what NULL was in C implementations.
(Finally, NULL isn't officially deprecated either. It may be with C++0x which introduces the nullptr keyword though, who knows.)
Admin
Admin
You're absolutely right, I didn't realize that it would have to check "buffer" several times to find the end. The standard libraries may not meet the requirements. In my defence, I just don't use strings that often. It comes up once (maybe twice) a year. That's usually for a unit test diagnostic. I don't think I've used a string all year. :D
As for what the compiler does, that's why there's the "View ASM" option. Sometimes the compiler does some really stupid things, and you have to go check on it.
Admin
I'll preface this with something that says I may know what I'm talking about: I was an electrical and computer engineering major in college, and I specialized in embedded systems. I've worked (as a job) on embedded systems for several years. On the vast majority of the "really small" embedded systems that we seem to be assuming this is (and I hope it is - otherwise it's inexcusable), code memory and data memory are seperate address spaces, so there's no danger of a code execution exploit with a buffer overflow. Also, most of these devices have a hardware stack that is only a few levels deep, so the buffer isn't allocated on the stack at all! The compiler just figures out what pieces of memory are guaranteed to not be use when in that particular section of code and allows the buffer to occupy one of them. As for using a language other than C, usually your choice is assembler (and it's not nearly as pretty as x86) and C. The only truly small embedded systems I know of that use C++ actually interpret the C++, not compile it... And of course you've got your really terrible BASIC variants from some manufacturers, but those are geared towards hobbyists and no one in their right mind would use them.
Admin
The most common implementations I've run across do not put in the final terminator if the size is exceeded. Portable code should always put in the terminator ("buf[SIZE-1] = 0")
Always, beware of the return value from snprintf! In case of overflow, it returns the number of characters that it would have appended if it could have. Not the number of characters that were actually written. Do not use the return value as an index into the end of the string without first testing it for overflow!