Comment On The End of the String as We Know It

I wish that there was something funny I could say about C-style, null-terminated strings. There isn't. "Putting a sentinel value at the end of a variable length sequence instead of a fixed-size count at the beginning: priceless." The fact is, they just aren't funny.

Re: The End of the String as We Know It

2007-04-12 09:03 • by Mohammed (unregistered)
Strings suck. We should all use integers.

Re: The End of the String as We Know It

2007-04-12 09:07 • by DOA (unregistered)
131599 in reply to 131598
Mohammed:
Strings suck. We should all use integers.

integers suck, we should all use booleans

Re: The End of the String as We Know It

2007-04-12 09:08 • by Piet (unregistered)
131600 in reply to 131599
Booleans suck, we should all use arrays of bits

Re: The End of the String as We Know It

2007-04-12 09:09 • by dennis (unregistered)
131601 in reply to 131598
> Strings suck. We should all use integers.

I agree, and in fact I always use that approach. So, instead of having the string "FOO" in my code, I have a little array of integers: 70, 79, 79.

To save on memory, I make each integer only 8 bits wide.

I also find it's convenient to mark the end of the list of integers by having an array element containing the integer 0.
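
For anyone who wants to see the punchline compile, here is a minimal sketch. It assumes an ASCII-compatible character set, and the names `foo_as_integers` and `integers_spell_foo` are made up for illustration. It shows that the "little array of integers" ending in 0 is byte-for-byte the C string "FOO":

```c
#include <stdint.h>
#include <string.h>

/* The "little array of integers": 8-bit values with a 0 marking the end. */
const uint8_t foo_as_integers[] = { 70, 79, 79, 0 };

/* Returns 1 if the integer array compares equal to the C string "FOO".
 * Assumes an ASCII-compatible execution character set. */
int integers_spell_foo(void)
{
    return strcmp((const char *)foo_as_integers, "FOO") == 0;
}
```

Under that assumption, `integers_spell_foo()` returns 1.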

Re: The End of the String as We Know It

2007-04-12 09:12 • by ok (unregistered)
131602 in reply to 131600
Why use bits? Use unary digits...

ok.

CAPTCHA: ewww! (as in C)

Re: The End of the String as We Know It

2007-04-12 09:18 • by fennec
C-strings are, in fact, a very good representation for string data in those situations where your program is not actually, you know, manipulating string data. They're great for printf() formatting messages, error messages, interface messages, and such, though.

Re: The End of the String as We Know It

2007-04-12 09:20 • by bobday
131606 in reply to 131604
fennec:
C-strings are, in fact, a very good representation for string data in those situations where your program is not actually, you know, manipulating string data. They're great for printf() formatting messages, error messages, interface messages, and such, though.

Wicked

Re: The End of the String as We Know It

2007-04-12 09:25 • by mav (unregistered)
I'm ok with C strings.

Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...

Re: The End of the String as We Know It

2007-04-12 09:26 • by Mike5 (unregistered)
131608 in reply to 131606
bobday:
fennec:
C-strings are, in fact, a very good representation for string data in those situations where your program is not actually, you know, manipulating string data. They're great for printf() formatting messages, error messages, interface messages, and such, though.

Wicked

Jungle is massive

captcha: tacos - mmmmm....

Re: The End of the String as We Know It

2007-04-12 09:27 • by An apprentice (unregistered)
131609 in reply to 131598
Mohammed:
Strings suck. We should all use integers.

Wrong. We should use strings for everything. Integers are just a special case of strings after all.

Re: The End of the String as We Know It

2007-04-12 09:29 • by snoofle (unregistered)
131610 in reply to 131607

int i = 0;
switch (i) {
case 0: ...; break;
case 1: <error>; break;
case FILE_NOT_FOUND: ...
}

Re: The End of the String as We Know It

2007-04-12 09:32 • by lazybutregistereduser (unregistered)
131611 in reply to 131604
because printf doesn't need to manipulate anything...

if memcpy() is involved (even hidden away in a standard library function like printf()), nul-terminated strings are not the best option. ie: not the best option, even if standards make them the only option.

but think about the alternative:
VCHAR8, VCHAR16, VCHAR32, VCHAR64, VCHAR128? "How much space do you really need. Choose now!"

And then you might want an extra byte at the beginning to tell you how many lead bytes you're using.

Re: The End of the String as We Know It

2007-04-12 09:36 • by fist-poster
131612 in reply to 131607
mav:
I'm ok with C strings.

Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...


That's right!

The fun is kind of ruined by the explanation of the WTF :(

Re: The End of the String as We Know It

2007-04-12 09:53 • by joe mama (unregistered)
Strings suck we should all use yarn.

Re: The End of the String as We Know It

2007-04-12 09:58 • by DONK (unregistered)
If only there were some way to represent lengths of strings before interrogating the string (char array) itself.

I'm thinking something that could possibly hold different variables with a structure. I'll call it struct, for short.

You could call on a variable like "string_length" within this struct.

You could then use the second item in your struct which would point to the string itself which resides in memory. I'm thinking of calling this a pointer.

All I need to do is add some looping and conditional constructs.

Man, I think I'm on to something.

Look out world, here I come!
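
Sarcasm aside, the struct described above might look roughly like the sketch below. Everything here is hypothetical: `struct counted_string`, `counted_from_cstr`, `counted_length` and `counted_free` are illustrative names, not an existing library API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length lives next to the pointer. */
struct counted_string {
    size_t string_length;  /* number of characters, excluding the terminator */
    char  *data;           /* heap buffer holding the characters */
};

/* Build a counted string from a C string; data is NULL if allocation fails. */
struct counted_string counted_from_cstr(const char *s)
{
    struct counted_string cs = { 0, NULL };
    size_t n = strlen(s);            /* the only scan for the terminator */

    cs.data = malloc(n + 1);
    if (cs.data != NULL) {
        memcpy(cs.data, s, n + 1);   /* keep the 0 so printf("%s") still works */
        cs.string_length = n;
    }
    return cs;
}

/* Length is now a field read, not a scan. */
size_t counted_length(const struct counted_string *cs)
{
    return cs->string_length;
}

/* Release the buffer and reset the fields. */
void counted_free(struct counted_string *cs)
{
    free(cs->data);
    cs->data = NULL;
    cs->string_length = 0;
}
```

The trade-off is the usual one: the length costs a few extra bytes per string and has to be kept in sync by every function that modifies the buffer.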

Re: The End of the String as We Know It

2007-04-12 10:00 • by fennec
131617 in reply to 131615
joe mama:
Strings suck we should all use yarn.

Soap on a rope!

Re: The End of the String as We Know It

2007-04-12 10:01 • by TheRider
131618 in reply to 131601
dennis:
> Strings suck. We should all use integers.

I agree, and in fact I always use that approach. So, instead of having the string "FOO" in my code, I have a little array of integers: 70, 79, 79.

To save on memory, I make each integer only 8 bits wide.

I also find it's convenient to mark the end of the list of integers by having an array element containing the integer 0.

I prefer arrays of characters. But wait, now I'm mixing up Java and C. I think in C this would be arrays of unsigned bytes...

Re: The End of the String as We Know It

2007-04-12 10:01 • by annony (unregistered)
131619 in reply to 131607
mav:
I'm ok with C strings.

Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...


sizeof() would not return the character length of a C String. Assuming a "C string" is a char* memory buffer, the variable itself is a pointer.

Pseudocode:

char* mystring;
mystring = somethingThatReturnsAstring();
int nLen = sizeof(mystring); // nLen would be 4 on a 32-bit system
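
A compilable version of the same point, as a small sketch. The function names `size_via_sizeof` and `length_via_strlen` are hypothetical, and the 4-versus-8 figure depends on the platform's pointer size.

```c
#include <stddef.h>
#include <string.h>

/* sizeof applied to a char* yields the size of the pointer itself. */
size_t size_via_sizeof(const char *mystring)
{
    return sizeof(mystring);   /* typically 4 or 8, whatever the text is */
}

/* strlen counts characters up to, but not including, the terminating 0. */
size_t length_via_strlen(const char *mystring)
{
    return strlen(mystring);
}
```

For "hello world" the first function returns `sizeof(char *)` and the second returns 11.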

Re: The End of the String as We Know It

2007-04-12 10:03 • by Duston (unregistered)
Yeah, but string theory is supposed to explain everything, isn't it?

Re: The End of the String as We Know It

2007-04-12 10:12 • by Evo (unregistered)
131622 in reply to 131599
DOA:
Mohammed:
Strings suck. We should all use integers.

integers suck, we should all use booleans


True...

(False? File Not Found?)

Re: The End of the String as We Know It

2007-04-12 10:16 • by WeatherGod
131623 in reply to 131607
mav:
I'm ok with C strings.

Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...


It is also a pretty common mistake to think that sizeof() returns the size of an array...

Re: The End of the String as We Know It

2007-04-12 10:17 • by annoy (unregistered)
131624 in reply to 131619
annony:
mav:
I'm ok with C strings.

Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...


sizeof() would not return the character length of a C String. Assuming a "C string" is a char* memory buffer, the variable itself is a pointer.

Pseudocode:

char* mystring;
mystring = somethingThatReturnsAstring();
int nLen = sizeof(mystring); // nLen would be 4 on a 32-bit system


Replying to myself.

sizeof() will return a string length if you did:

char mystring[16] = "hello world";
int nLen = sizeof(mystring);

I just don't think it's practical using fixed length buffers like that.

captcha: doom (yes we all are)

Re: The End of the String as We Know It

2007-04-12 10:22 • by anon (unregistered)
g-string I like

Re: The End of the String as We Know It

2007-04-12 10:24 • by ImaLurker (unregistered)
131627 in reply to 131622
Yes! Three-state logic for booleans!

Re: The End of the String as We Know It

2007-04-12 10:27 • by -j (unregistered)
131629 in reply to 131620
String theoretic booleans!

enum boolean {
false = 0,
true = 1,
not_even_wrong = 2,
file_not_found = 3
};

-j

Re: The End of the String as We Know It

2007-04-12 10:31 • by BinkyTheClown (unregistered)
131630 in reply to 131624
To be pedantic, the sizeof in your second example will return the size of the array used to hold the string and not the length of the string itself. The string, as defined in C, does not have to consume the entire array.

Insidious coders have been known to store more than one string in an array. Not that I condone that in regular practice of course. However, you are entirely correct that sizeof should not be used to determine the length of a string as it is not a fundamental type in C.
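
A short sketch of both points, assuming nothing beyond standard C. The `packed` array and the function names are illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* A 16-byte array holding an 11-character string; the rest is zero-filled. */
char mystring[16] = "hello world";

/* Two strings packed into one array (hypothetical example data). */
char packed[] = "key\0value";

/* Size of the array, regardless of what string it currently holds. */
size_t array_size(void)    { return sizeof(mystring); }

/* Length of the string stored at the start of the array. */
size_t string_length(void) { return strlen(mystring); }

/* The second packed string starts just past the first one's terminator. */
const char *second_packed_string(void)
{
    return packed + strlen(packed) + 1;
}
```

Here `array_size()` is 16 while `string_length()` is 11, and `second_packed_string()` points at "value" inside the same array.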

Re: The End of the String as We Know It

2007-04-12 10:33 • by mav (unregistered)
131631 in reply to 131619
This code could be perfectly valid, really.

I'll add comments to show what could be happening...

// set char* key_p to the key offset in buf string.
key_p = &buf[buf_i];
// get the key_len by reading to the next null
key_len = strlen(key_p);

// print key length, for kicks i suppose
printf("key_len<%d>\n", key_len);

// if for some reason this thing isn't null terminated
// then badness. Of course, we just did a strlen on it,
// so i don't know that we'll ever go DOWN this code...
// but its not really invalid of itself...
if (key_p[key_len] != 0x00)
{
    fprintf(stderr, "key termination error %02x\n",
            key_p[key_len]);
    close(key_fd);
    return;
}
// print the key for more debug fun.
if (debug) {
    printf("len<%d> key<%s>\n", key_len, key_p);
}



I'm sorry, but this just isn't all that WTFable. And null termination is a good thing, especially when you consider that it is very portable amongst disparate languages, operating systems, etc...

Re: The End of the String as We Know It

2007-04-12 10:34 • by mav (unregistered)
131632 in reply to 131631
Also, I was incorrect. sizeof would be badness. Sorry, hadn't had my coffee yet. :-(

Re: The End of the String as We Know It

2007-04-12 10:34 • by Cable (unregistered)
131633 in reply to 131627
ImaLurker:
Yes! Three-state logic for booleans!


Ah, I see a representative of the true/false/maybe faction.

Re: The End of the String as We Know It

2007-04-12 10:36 • by Kinglink (unregistered)
131635 in reply to 131624
annoy:
annony:
mav:
I'm ok with C strings.

Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...


sizeof() would not return the character length of a C String. Assuming a "C string" is a char* memory buffer, the variable itself is a pointer.

Pseudocode:

char* mystring;
mystring = somethingThatReturnsAstring();
int nLen = sizeof(mystring); // nLen would be 4 on a 32-bit system


Replying to myself.

sizeof() will return a string length if you did:

char mystring[16] = "hello world";
int nLen = sizeof(mystring);

I just don't think it's practical using fixed length buffers like that.

captcha: doom (yes we all are)


Sure it is very practical. Why use
somestring = mystring
when you can use
strcpy(somestring,mystring);
every time.

And then when you want to always have the same data in somestring and mystring then you can just call
strcpy(mystring,somestring);
every time you make a change. BRILLIANT!

Re: The End of the String as We Know It

2007-04-12 10:41 • by Steve (unregistered)
There are no "strings" in C. Only arrays of char.

Re: The End of the String as We Know It

2007-04-12 10:42 • by Daid
131637 in reply to 131624
annoy:
annony:
mav:
I'm ok with C strings.

Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...


sizeof() would not return the character length of a C String. Assuming a "C string" is a char* memory buffer, the variable itself is a pointer.

Pseudocode:

char* mystring;
mystring = somethingThatReturnsAstring();
int nLen = sizeof(mystring); // nLen would be 4 on a 32-bit system


Replying to myself.

sizeof() will return a string length if you did:

char mystring[16] = "hello world";
int nLen = sizeof(mystring);

I just don't think it's practical using fixed length buffers like that.

captcha: doom (yes we all are)
And the sizeof() will return 16 in your case, and mystring[16] = segfault. Not 0. (In theory; too bad it probably would compile, and run 99% of the time.)

Re: The End of the String as We Know It

2007-04-12 10:42 • by Stephen Harris (unregistered)
Null-terminated strings make sense when you realise that a string is really just a pointer to memory, and having it null-terminated means you don't need additional temporary variables while scanning it: just update the pointer. p+1 is a perfectly valid substring of a non-empty string p, and taking it is quicker than sub$() or right$() or whatever other function might be required to copy memory blocks around (which a string format with a length indicator would require).

The problem is that they're not too intuitive to deal with (remember to allocate an extra byte; if you want to store 100 characters then you need 101 bytes allocated, the characters sit at offsets 0 to 99 and the terminator at offset 100... eep!) and you can easily start overlapping strings (p and p+1 in this case; modify the contents of p and you've automatically changed p+1, which may not be what you want), and this easily leads to bugs. Ah well.

All programming languages suck. All programmers suck.

Rgds
Stephen
(a programmer)
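
The pointer-walking and overlap points above can be sketched in a few lines of C. The helper names `walk_length`, `tail` and `overwrite_second_char` are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>

/* Length by walking a pointer to the terminator; no index variable needed. */
size_t walk_length(const char *p)
{
    const char *start = p;

    while (*p != '\0')
        p++;
    return (size_t)(p - start);
}

/* The tail of a non-empty string is just p + 1: same memory, no copy. */
const char *tail(const char *p)
{
    return p + 1;
}

/* Because the tail shares storage, a write through p shows up in the tail. */
void overwrite_second_char(char *p, char c)
{
    p[1] = c;
}
```

With a writable buffer holding "hello", `tail` yields "ello"; after `overwrite_second_char(buf, 'a')` the same tail reads "allo", which is exactly the aliasing hazard described.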

Re: The End of the String as We Know It

2007-04-12 10:50 • by Kai (unregistered)
131641 in reply to 131631
Mav, what is the point of a condition that is never true?

A slightly simplified variant would be the following:

char x = 0;
if (x != 0) { ... }
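
To make the connection to the original check explicit, here is a sketch (single-threaded, with the hypothetical name `byte_at_strlen`) of the value that test inspects:

```c
#include <string.h>

/* Returns the byte the termination check looks at: the one at index
 * strlen(key_p). strlen stopped counting there precisely because it
 * found a 0, so in single-threaded code this is always 0. */
int byte_at_strlen(const char *key_p)
{
    size_t key_len = strlen(key_p);

    return key_p[key_len];
}
```

If the buffer has no terminator at all, strlen has already read past the end before the check runs, so the check cannot catch that case either.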

Re: The End of the String as We Know It

2007-04-12 10:52 • by KNY (unregistered)
131642 in reply to 131601
dennis:
> Strings suck. We should all use integers.

I agree, and in fact I always use that approach. So, instead of having the string "FOO" in my code, I have a little array of integers: 70, 79, 79.

To save on memory, I make each integer only 8 bits wide.

I also find it's convenient to mark the end of the list of integers by having an array element containing the integer 0.



I find your ideas intriguing and would like to subscribe to your newsletter.

Re: The End of the String as We Know It

2007-04-12 11:01 • by N/k
131644 in reply to 131602
ok:
Why use bits? Use unary digits...
I like to call them "uts".

Re: The End of the String as We Know It

2007-04-12 11:01 • by Saladin
131645 in reply to 131641
Kai:
Mav, what is the point of a condition that is never true?

The opposite of a sanity check?

while (!true) {
// do a shot and then loosen your tie
// things are only going to get worse from here on out
}

Re: The End of the String as We Know It

2007-04-12 11:06 • by mav (unregistered)
131647 in reply to 131641
Kai:
Mav, what is the point of a condition that is never true?

A slightly simplified variant would be the following:

char x = 0;
if (x != 0) { ... }




I agree with you, it's a dumb thing to do, but I've seen worse things... I'm sure I've made mistakes similar to this without knowing it; I imagine that every programmer has. So I hate to call down hellfire on this guy for doing it, especially when you consider the fact that it really doesn't adversely affect anything. It certainly isn't worse than failure. It's merely worse than no failure. So really it's a WTNF.

Re: The End of the String as We Know It

2007-04-12 11:09 • by WIldpeaks
131648 in reply to 131598
Umm, strlen Got Lost

Re: The End of the String as We Know It

2007-04-12 11:15 • by rgz (unregistered)
131649 in reply to 131609
An apprentice:
Mohammed:
Strings suck. We should all use integers.

Wrong. We should use strings for everything. Integers are just a special case of strings after all.


I have just the thing for you: http://www.tcl.tk/

Re: The End of the String as We Know It

2007-04-12 11:16 • by e**2n (unregistered)
The real WTF is using strlen.
Everyone nowadays should know that only
strnlen can save this world from
Annihilation.
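
Joking aside, strnlen is a POSIX function rather than part of ISO C, so it is not available everywhere. A rough portable equivalent can be sketched with memchr; the name `bounded_length` is hypothetical, and the caller is assumed to pass the real size of the buffer as `max`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of s, never looking at more than max bytes.
 * Returns max if no terminator is found within the first max bytes. */
size_t bounded_length(const char *s, size_t max)
{
    const char *end = memchr(s, '\0', max);

    return end != NULL ? (size_t)(end - s) : max;
}
```

A return value equal to `max` means the buffer was not terminated within its bounds, which is the condition a plain strlen cannot report.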

Re: The End of the String as We Know It

2007-04-12 11:19 • by XML Hater (unregistered)
131651 in reply to 131601
dennis:
> Strings suck. We should all use integers.

I agree, and in fact I always use that approach. So, instead of having the string "FOO" in my code, I have a little array of integers: 70, 79, 79.

To save on memory, I make each integer only 8 bits wide.

I also find it's convenient to mark the end of the list of integers by having an array element containing the integer 0.



LOL, now THAT was FUNNY!!!

Re: The End of the String as We Know It

2007-04-12 11:21 • by jimlangrunner
131652 in reply to 131620
Duston:
Yeah, but string theory is supposed to explain everything, isn't it?


Ye Gods, I think he's got it. (0x00)

Re: The End of the String as We Know It

2007-04-12 11:22 • by anthony (unregistered)
Ok, what about if he's worried about concurrent access? Maybe he's just trying to make his code super-robust?

To wit:

key_p = &buf[buf_i];
key_len = strlen(key_p);

printf("key_len<%d>\n", key_len);


// RIGHT HERE another thread comes in and modifies the
// contents of key_p!
if (key_p[key_len] != 0x00)
{
    fprintf(stderr, "key termination error %02x\n",
            key_p[key_len]);
    close(key_fd);
    return;
}
if (debug) {
    printf("len<%d> key<%s>\n", key_len, key_p);
}

A bit of a stretch I know...

Or maybe he's worried about some malicious process coming in and modifying the strlen function in memory, so that it appears to work, but is subtly broken... the nefarious scheme of some cartoon supervillain-esque hacker?

Re: The End of the String as We Know It

2007-04-12 11:37 • by Steve (unregistered)
131659 in reply to 131611
lazybutregistereduser:

if memcpy() is involved (even hidden away in a standard library function like printf()), nul-terminated strings are not the best option. ie: not the best option, even if standards make them the only option.


Er... strcpy?

Re: The End of the String as We Know It

2007-04-12 11:43 • by Charly (unregistered)
131665 in reply to 131601
dennis:
> Strings suck. We should all use integers.

I agree, and in fact I always use that approach. So, instead of having the string "FOO" in my code, I have a little array of integers: 70, 79, 79.

To save on memory, I make each integer only 8 bits wide.

I also find it's convenient to mark the end of the list of integers by having an array element containing the integer 0.



integers suck too, and so do bits, I use arrays of dual-phase silicon semiconductor relays

Re: The End of the String as We Know It

2007-04-12 11:46 • by Steve (unregistered)
131668 in reply to 131637
char mystring[16] = "hello world"; Will allocate on the stack, so will never segfault, until the function returns anyway.

CAPTCHA: doom (it likes doom today)

Re: The End of the String as We Know It

2007-04-12 12:11 • by Faxmachinen
131676 in reply to 131641
Kai:
A slightly simplified variant would be the following:

char x = 0;
if (x != 0) { ... }

Technically speaking, a slightly simplified variant would be more like this:
char x = abs(0);
if (x != 0) { ... }
And in both cases the condition can be true.

But then again, you can just #define 0 std::rand(), so I guess my point is moot.

Re: The End of the String as We Know It

2007-04-12 12:22 • by sergio (unregistered)
131680 in reply to 131637
The only case where you can use sizeof() as a rough equivalent of strlen() is when you let the compiler figure out how much space you need:

char mystring[] = "hello";
printf("%zu %zu\n", sizeof(mystring), strlen(mystring));

6 5

Notice that they are off by 1, because sizeof() counts the zero terminating byte, and strlen() doesn't.
It's not C strings that suck, it's "arrays == pointers" paradigm and the rest of the language, too, thanks, K&R, great job!
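
The off-by-one can be checked with a small sketch; the function names `bytes_in_array` and `chars_in_string` are made up for illustration. Both sizeof and strlen yield a size_t, for which the matching printf conversion is %zu.

```c
#include <stddef.h>
#include <string.h>

/* The compiler sizes the array: 5 characters plus the terminating 0. */
char mystring[] = "hello";

/* Bytes occupied by the array, terminator included. */
size_t bytes_in_array(void)  { return sizeof(mystring); }

/* Characters before the terminator. */
size_t chars_in_string(void) { return strlen(mystring); }
```

This only holds while `mystring` is a true array in scope; once it is passed to a function it decays to a pointer and sizeof reports the pointer size instead.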

Re: The End of the String as We Know It

2007-04-12 12:45 • by Chris (unregistered)
I don't think there's *nothing* funny that can be said about null-terminated strings. I rather like this [paraphrased] quote:

One of the major reasons for the downfall of the Roman Empire was, lacking zero, they had no way to indicate termination of their C strings.