• Skipper (unregistered) in reply to annoy
    annoy:
    sizeof() will return a string length if you did:

    char mystring[16] = "hello world"; int nLen = sizeof(mystring);

    Try that with a wchar_t "string": OOPS!

  • (cs) in reply to Old Wolf
    Old Wolf:
    1. sizeof tells you the size in bytes you have allocated. It isn't a "pitfall" that it doesn't tell you some different fact!
    2. Huh? Are you claiming that sizeof(void *) is 80?
    3. my_num_t is a pointer type. You asked for its size, and you got the size of a pointer. Why would you expect anything else? If there is a pitfall here, it is that using pointer typedefs leads to obfuscated code. There's never any good reason to use pointer typedefs and it's puzzling that so many people do.
    4. The problem is that the '->' operator needs a pointer on the left. Nothing to do with sizeof.
    5. sizeof values.a (no brackets required) is correct code and tells you the size taken up by the member 'a' of the object 'values', in this case it will be the same as sizeof(int).

    I think the point was that using sizeof() to get the size of a string is somewhat dangerous: there are circumstances in which you can certainly use it, but you have to be aware of the context you are in. So, for instance, if you have a declaration

    char x[]="Hello World";
    you could use sizeof(), but this is somewhat brittle because if someone changes it to
    char* x="Hello World\n";
    the size calculation silently breaks.

  • pc (unregistered) in reply to Piet

    So we can use true, false, and FILE_NOT_FOUND

  • edwdig (unregistered) in reply to Old Wolf
    Old Wolf:
    edwdig:
    android:
    char* pName = "bonehead"; /* the * goes by the char, it IS part of the pointer TYPE */
    No no no... char is the data type. * is a modifier on the variable.
    The data type of pName is "char *". Pointer to char.

    The data is of type char. pName happens to be a pointer to the data rather than the actual data.

    That's just a syntax choice by the inventors of C; they could just as well have made the above code declare two pointers, and the rest of the language would be the same. (Presumably they didn't because it would lead to *more convoluted* syntax if you are declaring something tricky like multiple pointers to arrays.)

    It wasn't an arbitrary syntax decision. It was done that way because the type system was designed around the view I mentioned before. It's also why it's "char data[10];" and not "char[10] data;"

    C syntax makes a lot more sense when you understand why it is like it is.

  • Jon (unregistered)

    This is a wank. There is nothing wrong with C strings. There can be a LOT wrong with length-specified strings, especially if the length field is too small, e.g. 8 bits, à la classic Pascal.

    CAPTCHA - putting in your CAPTCHA is also a wank.

  • (cs) in reply to snoofle
    snoofle:
    int i = 0;
    switch (i) {
      case 0: ...; break;
      case 1: <error>; break;
      case FILE_NOT_FOUND: ...
    }
    
    you mean:
    int i = 0;
    switch (i) {
      case 0: <error>; break;
      case 1: ...; break;
      case FILE_NOT_FOUND: profit("!!!");
    }
    
  • cosmicfroggy (unregistered)

    Nothing funny you can say about strings? Pshaw!

    Two strings walk into a bar. The bartender asks them, "What'll you have?" The first string says, "I'll have a bourbon and Coke." The second string says, "I'll have a beer.zxd43ezz 01020304^M\032". So the first string says, very quietly, "You'll have to excuse my friend. He's not null-terminated."

  • Alex (unregistered)

    I still think Delphi has the most elegant way of expressing strings. It doesn't have the speed issues of Java/Lua etc., nor the sentinel of C. For those who don't know how it works: it acts exactly like a C string and is null-terminated, for compatibility with Windows APIs. But the 4 bytes preceding the first character hold the length, and the 4 bytes before that hold a reference count for automatic garbage collection.

  • dkf (unregistered) in reply to Alex
    Alex:
    I still think Delphi has the most elegant way of expressing strings. It doesn't have the speed issues of Java/Lua etc., nor the sentinel of C. For those who don't know how it works: it acts exactly like a C string and is null-terminated, for compatibility with Windows APIs. But the 4 bytes preceding the first character hold the length, and the 4 bytes before that hold a reference count for automatic garbage collection.
    The main problem with this is that it doesn't scale up nicely to 64-bit systems: it's becoming more common to encounter strings 4GB long or more. I know that's a disgustingly long string, but they do happen.

    But in general, counted strings are very good. Two key benefits are that they 1: can contain binary data safely (useful annoyingly often), and 2: allow really efficient processing.

    The second point is important, and stems from the fact that if you're not looking for a zero byte, you can (or rather your compiler can generate the code to) actually move or otherwise process many bytes at once rather than one at a time. It also allows you to avoid the O(n) operation strlen.

  • Anonymous (unregistered) in reply to An apprentice
    An apprentice:
    Mohammed:
    Strings suck. We should all use integers.
    Wrong. We should use strings for everything. Integers are just a special case of strings after all.
    Wrong. We should use XML instead of strings. Strings are just a special case of XML after all.

    :P

  • Bitter Like Quinine (unregistered)
    I wish that there was something funny I could say about C-style, null-terminated strings. There isn't. "Putting a sentinel value at the end of a variable length sequence instead of a fixed-size count at the beginning: priceless."

    Yeah, delimiters suck. Bring back Hollerith constants.

    (Hint: "This is some text" <-- sentinel value; fixed-size count --> 17HThis is some text)

  • Nadav Samet (unregistered)

    It starts scanning from the given character until it finds something unrelated to the start of the string. Then it would know the length without needing a sentinel. Now the code makes sense :)

    Nadav

  • (cs)

    OK, now I'm feeling stupid. I thought C strings were either null-terminated or terminated by the end of the array. However, I can't quickly find any support for this belief (and plenty for the opposite). I know this used to be a security problem.

    If you think about it, this works fine for strings: just read until the 0 or the end of allocated memory. No need to keep track of the size or allocate an extra byte for the 0. I guess I'm wrong.

    However I would like to point out that strings aren't a primitive type in C. So if you don't like the implementation, you can do something about it.

    I would also like to point out that C is really useful for writing programs that need to use the processor and memory efficiently. Considering they used to drop the first 2 digits of the year to save space, one can understand null-terminated strings. If you don't need to do things efficiently, then you can go play with Java or some other high-level language. In fact, you should only use C when it fulfills a requirement.

    Languages are all made for a purpose. Just because I can do everything you do in Java in PostScript doesn't mean I should (well, OK, I can't, but someone can; in fact, I know a guy who used to write PostScript programs so he could use the printer for extra processing power). I can also use a shotgun to swat flies, but then I shouldn't complain about the holes in the wall.

  • Chris (unregistered) in reply to Eric

    I always liked the VAX version of string descriptors. The string descriptor would contain, among other information, the length of the string (16 bits) and the memory address where the string is stored.

    This is similar to the way Pascal and direct ancestors of C (such as BCPL) encode strings. The problem with BCPL is that the length is encoded as a byte, meaning your strings couldn't be longer than 255 characters ...

  • Chris (unregistered) in reply to Chris
    Chris:
    I always liked the VAX version of string descriptors. The string descriptor would contain, among other information, the length of the string (16 bits) and the memory address where the string is stored.

    This is similar to the way Pascal and direct ancestors of C (such as BCPL) encode strings. The problem with BCPL is that the length is encoded as a byte, meaning your strings couldn't be longer than 255 characters ...

    And according to another poster, it's how Pascal (at least some variants) does it as well - could've sworn it was 16 bits.

    Personally, I tend to use a simple string structure when I'm coding in C, something like:

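    #include <stdlib.h>
    #include <string.h>

    typedef struct
    {
        char *buf;        /* heap buffer holding the characters */
        size_t allocated; /* bytes reserved in buf (0 = no buffer yet) */
        size_t used;      /* bytes currently in use */
    } String;
    
    /* Allocate an uninitialized String; returns NULL on failure */
    String *
    string_alloc(void)
    {
        String *s;
    
        s = malloc(sizeof *s);
    
        return s;
    }
    
    /* Zero every field: empty string, no buffer */
    void
    string_init(String *s)
    {
        memset(s, 0, sizeof *s);
    }
    
    /* Allocate and initialize in one step */
    String *
    string_new(void)
    {
        String *s;
    
        s = string_alloc();
        if (s)
            string_init(s);
    
        return s;
    }
    
    /* ... various other string functions ... */
    
    /* Free the buffer and return the String to its empty state */
    void
    string_clear(String *s)
    {
        if (s->allocated) {
            free(s->buf);
            memset(s, 0, sizeof *s);
        }
    }
    
    /* Free the buffer and the String itself; a NULL argument is ignored */
    void
    string_delete(String *s)
    {
        if (s == NULL)
            return;
        string_clear(s);
        free(s);
    }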
    
  • B (unregistered) in reply to mav
    mav:
    I'm ok with C strings.

    Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...

    Not correct. sizeof in the original code will provide you with the size of "pointer to string", which is 4 (on the platform I use most often) irrespective of the length of the string.

  • B (unregistered) in reply to B
    B:
    mav:
    I'm ok with C strings.

    Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...

    Not correct. sizeof in the original code will provide you with the size of "pointer to string", which is 4 (on the platform I use most often) irrespective of the length of the string.

    Make that "pointer to char", please...

  • (cs) in reply to TheJasper
    TheJasper:
    However I would like to point out that strings aren't a primitive type in C. So if you don't like the implementation, you can do something about it.

    Yes, you can do something about it, as long as you stay compatible with the seventeen zillion APIs that use null-terminated strings.

    (Okay, so this isn't actually hard to do, but you would have to keep it in mind.)

    I would also like to point out that C is really useful for writing programs that need to use the processor and memory efficiently. Considering they used to drop the first 2 digits of the year to save space, one can understand null-terminated strings.

    You can't tell me that they weren't ALSO concerned about speed, though. Turning an O(1) operation into an O(n) one in order to save one extra byte (at the time, 2 bytes for the length would have worked fine) in SOME strings (if you assume a random distribution, a word size of two bytes, and that objects are aligned on word boundaries, then half the time) doesn't seem like a good trade-off to me.

  • bbb (unregistered) in reply to Piet
    Piet:
    Booleans suck, we should all use arrays of bits
    Arrays of bits suck, we should all use strings.

  • pdrap (unregistered) in reply to annony

    sizeof is a unary operator, not a function. If you write it sizeof() that makes it look like a function.

    You could just write the line like this: int nLen = sizeof mystring;

    The C grammar has a specific production which recognizes sizeof and then a typename inside of parentheses as a case of a unary expression.

    If you've already got an expression, putting it inside parentheses turns it into a primary expression (and hence a postfix expression and then a unary expression), which matches another production that recognizes sizeof followed by a unary expression.

    So, even when you need the parentheses on a typename [like sizeof (int) does] it's still not a function call.

  • Rhialto (unregistered) in reply to WeatherGod
    WeatherGod:
    mav:
    I'm ok with C strings.

    Looks like our WTF today is using strlen instead of sizeof. A pretty common mistake...

    It is also a pretty common mistake to think that sizeof() returns the size of an array...

    In fact, if you give it an array, it does give you the size of the array. Not if you give it a pointer, though.

  • Rhialto (unregistered) in reply to android
    android:
    If C programmers would quit trying to write 70's style K&R code and actually write code with variable names and typing that made sense, there would be far fewer idiotic errors like this.

    char name[] = "bonehead"; char* pName = "bonehead"; /* the * goes by the char, it IS part of the pointer TYPE the little p in front of the var name indicates to the maintenance programmer in bangalore that this is a pointer */

    So the "*" goes with the TYPE you say? Ever looked up what types

    char* p, q;

    declares?

  • Kuba (unregistered) in reply to fennec
    fennec:
    C-strings are, in fact, a very good representation for string data in those situations where your program is not actually, you know, manipulating string data. They're great for printf() formatting messages, error messages, interface messages, and such, though.

    Not to be rude, but constant printf format strings should be dissected by decent-enough compilers into calls to RTL functions that do the actual work. This is standard practice on embedded platforms, where the combined monstrosity of a printf may not even fit into the code memory.

  • François (unregistered) in reply to Kai
    Kai:
    Mav, what is the point of a condition that is never true?

    A slightly simplified variant would be the following:

    char x = 0; if (x != 0) { ... }

    What if you're halfway through a longjmp that modifies x? In a multithreaded program? Where x could be anywhere and your head's up your rear?

    I've had people ask me before why I've had code that looks much like that but with a static thrown in before the char declaration. To them, I say, "Braaaaaaaaaaaaaaains."

    I have an OCR program that could probably recognise that captcha.

  • François (unregistered) in reply to B
    Not correct. sizeof in the original code will provide you with the size of "pointer to string", which is 4 (on the platform I use most often) irrespective of the length of the string.

    It returns 8 on my system.

  • PAVLVS PHASEOVLVS (unregistered) in reply to Chris
    Chris:
    One of the major reasons for the downfall of the Roman Empire was, lacking zero, they had no way to indicate termination of their C strings.

    I'd say Roman C strings would always be 100 characters?

    -P.P.

  • (cs) in reply to dennis
    dennis:
    > Strings suck. We should all use integers.

    I agree, and in fact I always use that approach. So, instead of having the string "FOO" in my code, I have a little array of integers: 70, 79, 79.

    To save on memory, I make each integer only 8 bits wide.

    I also find it's convenient to mark the end of the list of integers by having an array element containing the integer 0.

    Just get it over with and use Assembler instead.

  • (cs) in reply to EvanED
    EvanED:
    Actually, I think the misconception about snprintf at least may come from VC++ apparently not providing snprintf, only MS's own _snprintf. (Don't ask me why they did this, I have no clue.) That does not null-terminate if the size is smaller than the string.

    snprintf was not added to the C standard until 1999. Before then, there were various implementations of it around with slightly different behaviour. I suppose it is good that MS gave theirs a different name (_snprintf) to show that it might be non-standard. Another point of difference is that some of these older versions of snprintf would return different values than the current standard specifies.

  • M4 (unregistered) in reply to chrismcb
    chrismcb:
    Eh? Back in my day we used only 7 bits!

    Oh horror, that reminds me of the Cyber with 60-bit words and 6-bit characters. Aaaaargh!

    M4

    Captcha: validus. No this kind of madness is never validus!

  • anon (unregistered) in reply to chrismcb

    Back in my day, we only used six (FIELDATA, anyone?).

    Rumor has it that before my day, only five bits were used (punched paper strips).

Leave a comment on “The End of the String as We Know It”
