• (disco)

    But even on null terminated strings, this code is dangerous. Since arrays in C, like any sane language, are zero indexed, this code may attempt to access memory beyond the end of the array, overwriting whatever’s there with a null terminator.

    From what I understand, this is incorrect.

    Let's say we have a char coal[] = "Hello";, the array will be identical to char coal[] = { 'H', 'e', 'l', 'l', 'o', '\0' }; which has a size of 6 chars.

    Now strlen(coal) will return 5, because the string is obviously 5 characters long. Since arrays in C are zero-indexed, coal[5] refers to the 6th element of the array, which is exactly the null-terminator. In this case, coal[strlen(coal)] = '\0'; is actually a no-op. Hell, the compiler may even optimize it out.

    However, if it is char *fail = "Hello";, then fail[strlen(fail)] = '\0'; can possibly result in an attempt to write to read-only memory location, depending on the compiler and execution environment.
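
    For anyone who wants to check the array case, here is a minimal sketch. The function name is made up for illustration, and it deliberately does not attempt the write through a pointer to a string literal, since that is undefined behaviour and may or may not crash.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, for illustration only: returns 1 if writing '\0' at
   coal[strlen(coal)] leaves an already-terminated, writable array unchanged. */
int terminator_write_is_noop(void)
{
    char coal[] = "Hello";          /* writable array of 6 chars, including '\0' */
    char before[sizeof coal];

    assert(sizeof coal == 6);       /* 5 letters plus the terminator */
    assert(strlen(coal) == 5);      /* strlen does not count the terminator */

    memcpy(before, coal, sizeof coal);
    coal[strlen(coal)] = '\0';      /* index 5 is the last valid element */

    return memcmp(before, coal, sizeof coal) == 0;
}
```

    Under those assumptions the store lands on the existing terminator, so the array compares equal before and after.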

    Anyway, the code is a WTF, but so is this article.

    Screen capture in case the article gets modified.

  • (disco) in reply to Anonymous

    The code is a pure WTF for the reasons you mention, but the article is not. Suppose someone builds a string in local variable but doesn't add a NUL for whatever reason - the code above could indeed write to memory that is unrelated to the allocated string space.

  • (disco)

    I looked at the title and went "who the hell pulls out strings out of their nose, that's disgusting".

    lcrawford:
    Suppose someone builds a string in local variable but doesn't add a NUL for whatever reason

    But even on null terminated strings, this code is dangerous.

    If it's null-terminated, it should work (and be a noop). If it's not, it crashes anyway.

  • (disco) in reply to Maciejasjmj

    I was going to say exactly that. @Remy didn’t take into account the pendantic subset of TDWTF readers :stuck_out_tongue:

  • (disco)

    It's not as WTF as some might suspect.

    It's actually a pretty neat "crash early catcher". Especially when used with some type of memory checking tool.

    However, if it is the case, it definitely should have been well documented.

  • (disco) in reply to lcrawford
    lcrawford:
    Suppose someone builds a string in local variable but doesn't add a NUL for whatever reason

    Yeah, and something named buf is there for doing stuff with. It's not going to be a static-ish thing like char coal[] = "Hello";

  • (disco) in reply to HappyCerberus

    It would be a neat fail-fast implementation if it reliably caused the application to crash whenever buf is not NUL terminated. However, it's causing undefined behaviour, so you have no guarantee that your application will crash at this point, or indeed any guarantee about what your application will do.

  • (disco) in reply to Maciejasjmj

    If it's null-terminated, it should work (and be a noop). If it's not, it crashes anyway.

    For the CPU cycle obsessed, if they're not using the str* functions, but instead using a strn* function, they may omit adding a nul terminator to save cycle time or maximize buffer use. But then the buffer space is technically no longer a classic C nul-terminated string.

    (That happens more often with embedded software)
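
    A rough sketch of how that happens with strncpy(), assuming a hypothetical 8-byte field (the function name and field size are invented for the example): strncpy() only pads with '\0' when the source is shorter than the count, so a source of 8 or more characters fills the field with no terminator at all.

```c
#include <string.h>

/* Hypothetical example: copies src into an 8-byte field with strncpy() and
   returns 1 if the field ended up containing a '\0', 0 otherwise. */
int strncpy_leaves_terminator(const char *src)
{
    char field[8];

    /* Pads with '\0' only when src is shorter than sizeof field. */
    strncpy(field, src, sizeof field);

    /* memchr() stays inside the field, unlike strlen(), which would keep reading. */
    return memchr(field, '\0', sizeof field) != NULL;
}
```

    Calling strlen() on such a field reads past its end, which is exactly the situation the article's code would then write into.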

  • (disco)

    when Java programmers

    That was not necessary and totally untrue, because in every sane language with any support for strings, you don't have to think about this sort of problems. So TRWTF is writing string handling ops in C.

  • (disco) in reply to Erik_Nilsen_Haga

    That is what stack/heap protectors are for. They crash the program if you go out of bounds.

    But yes, it isn't guaranteed.

  • (disco) in reply to Eldelshell

    C++ added a string class. So there is that. I don't know what problems arise with that but I would agree that strings (and normal arrays) in C are an interesting topic.

    Filed Under: \0

  • (disco) in reply to lcrawford
    lcrawford:
    Suppose someone builds a string in local variable but **doesn't add a NUL** for whatever reason

    But even on null terminated strings, this code is dangerous. Since arrays in C, like any sane language, are zero indexed, this code may attempt to access memory beyond the end of the array, overwriting whatever’s there with a null terminator.

    Not sure if you are really replying to my reply...

  • (disco)

    @Remy.... you were reading that thread weren't you... or is this just a coincidence? :-D

    http://what.thedailywtf.com/t/a-problem-with-big-numbers/5182/15
    http://what.thedailywtf.com/t/a-problem-with-big-numbers/5182/17
    http://what.thedailywtf.com/t/a-problem-with-big-numbers/5182/18

  • (disco)
    Since arrays in C, like any **sane** language...

    You're saying C is sane?

  • (disco) in reply to Dogsworth
    Dogsworth:
    You're saying C is sane?

    more so than the way VisualBasic has things sometimes 0 indexed and sometimes 1 indexed leading me to invariably rewrite the thing in C# whenever i encounter legacy VB (and even VB.net)

  • (disco) in reply to lcrawford
    lcrawford:
    For the CPU cycle obsessed, if they're not using the str* functions, but instead using a strn* function, they may omit adding a nul terminator to save cycle time or maximize buffer use. But then the buffer space is technically no longer a classic C nul-terminated string.

    It's still strlen, which AFAIK reads until it encounters a NUL. If it doesn't for long enough time and wanders off the program's address space, well...

  • (disco)

    This is almost certainly a NOP, even if the string is not NULL terminated. When strlen() reaches the end of the buffer, it will keep going until it finds a NULL or it reaches unreadable memory. If it finds a NULL in the stack or heap or any writable memory, it will replace the NULL with NULL. It's only if it reaches unreadable memory before finding a NULL or finds the NULL in unwritable memory that it will crash.

  • (disco) in reply to Anonymous
    Anonymous:
    **However**, if it is `char *fail = "Hello";`, then `fail[strlen(fail)] = '\0';` can possibly result in an attempt to write to read-only memory location, depending on the compiler and execution environment.

    If someone calls buf a string literal, you have a bigger WTF on your hands. That said, it often happens by accident in WTF code written by beginners:

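    char* buf = malloc(20);  /* 20 writable bytes on the heap */
    buf = "fail";            /* overwrites the pointer: the 20 bytes leak, and buf now points at a string literal */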
    

    To think that Visual C++'s obsolete C compiler still doesn't have an equivalent to gcc's -Wwrite-strings...

  • (disco) in reply to cyneric
    cyneric:
    If it finds a NULL in the stack or heap or any writable memory, it will replace the NULL with NULL.

    If the program is multithreaded and strlen stops on a null byte in memory used by another thread, you may still end up with memory corruption if the other thread manages to replace the null byte with another value before you write 0 at that address.

  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    It's still strlen, which AFAIK reads until it encounters a NUL. If it doesn't for long enough time and wanders off the program's address space, well...

    strnlen() has two params: a pointer to the string, like strlen(), and a maximum number of chars to read. So using strnlen() with a max length of (buffer_size-1) would actually make sense. Except buffer[buffer_size-1]='\0' would work just as well and be faster, so it'd still be a WTF to write such code.
  • (disco) in reply to VinDuv
    VinDuv:
    I was going to say exactly that. @Remy didn’t take into account the pendantic subset of TDWTF readers :stuck_out_tongue:
    I'll bite by pointing out that you spelled "pedantic" incorrectly.

    I carefully checked the spelling of my first sentence in order not to fall foul of Muphry's Law.

  • (disco) in reply to Steve_The_Cynic
    Steve_The_Cynic:
    Muphry's

    hehehe

  • (disco)

    There's a great deal of pedantry that can be handed out up above concerning the difference between NUL, NULL, null, and '\000'. Only the last of those is suitable for actually terminating a C string.

    • NUL is a synonym for '\000', but requires you to explicitly define it.
    • NULL and null are pointers, and normally can't be used when terminating strings of characters.
    • '\000' can also be written '\0' or 0.
  • (disco) in reply to Steve_The_Cynic
    Steve_The_Cynic:
    I'll bite by pointing out that you spelled "pedantic" incorrectly.

    More like, you've incorrectly spelled "pedantic" correctly.

  • (disco) in reply to aliceif
    aliceif:
    hehehe

    http://en.wikipedia.org/wiki/Muphry%27s_law

  • (disco)

    There isn't nearly enough making fun of Remy here for making the article's image--

    1. Hotlinked from wikimedia.org
    2. A freaking 2560 x 1700, 773K JPG.
  • (disco) in reply to Eldelshell
    Eldelshell:
    > when Java programmers

    That was not necessary and totally untrue, because in every sane language with any support for strings, you don't have to think about this sort of problems. So TRWTF is writing string handling ops in C.

    Even so, a C programmer would be prepared to deal with C's idiosyncrasies. A Java programmer would not be. It's always a WTF to try and code in one language as though it were another, regardless of which one is more reasonable.
  • (disco)

    I gave it a like just for the shoot yourself in the foot link provided.

  • (disco) in reply to accalia
    accalia:
    more so than the way VisualBasic has things sometimes 0 indexed and sometimes 1 indexed leading me to invariably rewrite the thing in C# whenever i encounter legacy VB (and even VB.net)

    Comparing things to VB isn't saying much. :laughing:

    That being said, this looks like band-aid code.

  • (disco) in reply to Steve_The_Cynic
    Steve_The_Cynic:
    I'll bite by pointing out that you spelled "pedantic" incorrectly.

    Pedantry and forum memes don't mix...

  • (disco) in reply to accalia

    VisualBasic has things sometimes 0 indexed and sometimes 1 indexed leading me to invariably rewrite the thing in C# whenever i encounter legacy VB (and even VB.net)

    How does behavior present in VB lead you to rewrite VB.NET into C#, when VB.NET does not have said behavior?

  • (disco) in reply to lcrawford
    lcrawford:
    For the CPU cycle obsessed, if they're not using the str* functions, but instead using a strn* function, they may omit adding a nul terminator to save cycle time or maximize buffer use.

    But if you're going to be CPU-cycle-obsessed, and doing any kind of complex string manipulation, then you'll track the length separately (Pascal-style strings), storing "Hello" as {5, 'H', 'e', 'l', 'l', 'o', '\0'} instead of {'H', 'e', 'l', 'l', 'o', '\0'}, which allows you to skip strlen and the O(n) walk down the string that it has to do to give you a count.
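
    Something along these lines, as a sketch only. The struct, its fixed 32-byte capacity, and the function names are all made up for the example; a real implementation would likely size the storage dynamically.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type: the length travels with the bytes. */
struct counted_str {
    size_t len;        /* number of characters, excluding the '\0' */
    char   data[32];   /* fixed capacity keeps the sketch simple */
};

/* Stores src and its length; returns 0 on success, -1 if src does not fit. */
int counted_set(struct counted_str *s, const char *src)
{
    size_t n = strlen(src);            /* one O(n) walk, done once up front */
    if (n >= sizeof s->data)
        return -1;
    memcpy(s->data, src, n + 1);       /* copy the '\0' too, for C interop */
    s->len = n;
    return 0;
}

/* Appends src without rescanning what is already stored. */
int counted_append(struct counted_str *s, const char *src)
{
    size_t n = strlen(src);
    if (n >= sizeof s->data - s->len)  /* need room for n chars plus '\0' */
        return -1;
    memcpy(s->data + s->len, src, n + 1);
    s->len += n;
    return 0;
}
```

    Keeping the trailing '\0' costs one byte but lets data still be handed to ordinary str* functions, while len answers "how long is it?" in constant time.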

  • (disco) in reply to nmclean
    nmclean:
    How does behavior present in VB lead you to rewrite VB.NET into C#, when VB.NET does not have said behavior?
    1. because AFAIK VB.net still has 1-based indexes for things like arrays, same as VB
    2. because i can't always tell the two apart at a glance because of identical syntaxes
    3. because VS seamlessly mixes C# code and VB code without even being asked to so conversion is simple
    4. because i have a script that does 99% of the conversion for me. i just have to validate its output and tweak a few things here and there if it gets confused
  • (disco) in reply to accalia
    accalia:
    1) because AFAIK VB.net still has 1-based indexes for things like arrays, same as VB 2) because i can't always tell the two apart at a glance because of identical syntaxes 3) because VS seamlessly mixes C# code and VB code without even being asked to so conversion is simple 4) because i have a script that does 99% of the conversion for me. i just have to validate it's output and tweak a few things here and there if it gets confused

    It was a rhetorical question. You said the reason you rewrite VB.NET is because of behavior in VB, but obviously that is not the reason since it's not in VB.NET. The facts that the syntax is similar and that you have a script don't change the fact that arrays in VB.NET are consistently 0-based.

  • (disco) in reply to accalia
    accalia:
    4) because i have a script that does 99% of the conversion for me. i just have to validate **it's** output and tweak a few things here and there if it gets confused

    *twitch*

  • (disco) in reply to nmclean
    nmclean:
    don't change the fact that arrays in VB.NET are consistently 0-based.

    huh. TIL that VB.net isn't as cromulent as i thought it was.

    I still don't want it in any code base i maintain though.

  • (disco) in reply to chubertdev

    hmm? i see no stray' apostrop'he he're....

    do yo'u?

  • (disco) in reply to accalia
    accalia:
    hmm? i see no stray' apostrop'he he're....

    do yo'u?

    Indeed. [image]

  • (disco) in reply to accalia
    accalia:
    huh. TIL that VB.net isn't as cromulent as i thought it was.

    I still don't want it in any code base i maintain though.

    sigh

    It's much, much, much, much closer to C# .NET than VB6.

  • (disco) in reply to chubertdev

    curses! foiled again!

    and i would have gotten away with it too if it weren't for that darned edit pencil!

  • (disco) in reply to chubertdev
    chubertdev:
    It's much, much, much, much closer to C# .NET than VB6.

    hmm... having looked up some articles now that @nmclean pointed that fact out to me i agree.

    it's still a context switch to move from one syntax to the other in the middle of trying to track down a bug or make a change to a system. and for that reason, if for no other, it should be removed.

    Pick ONE language and stick with it. (you are allowed a second if it's something like server side/client side, but no mixing and matching! also you'll probably want someone else to do the client side as i have limited patience for the oddities of IE)

  • (disco) in reply to accalia
    accalia:
    hmm... having looked up some articles now that @nmclean pointed that fact out to me i agree.

    it's still a context switch to move from one syntax to the other in the middle of trying to track down a bug or make a change to a system. and for that reason, if for no other, it should be removed.

    Pick ONE language and stick with it. (you are allowed a second if it's something like server side/client side, but no mixing and matching! also you'll probably want someone else to do the client side as i have limited patience for the oddities of IE)

    I don't really struggle going back and forth. We even have a .NET app that has its four main projects in VB, with a dependency on two DLLs that we bought that source code for that are in C#.

    That being said, I normally point people to this article: http://visualstudiomagazine.com/Articles/2011/05/01/pfcov_Csharp-and-VB.aspx?Page=1

  • (disco) in reply to chubertdev
    chubertdev:
    That being said, I normally point people to this article:

    i didn't mention which language you had to pick did I? :-P

    if VB.NET works for you then fine, but if 90% of the code's already in C# guess which one i'm picking?

  • (disco) in reply to HappyCerberus

    No sale: It won't always crash, even if the string is not null terminated. This is because it will always replace a \0 with a \0. That will fail if the target location is read-only, but otherwise it succeeds even if a memory location outside the bounds of the target string is accessed.

    As a failure detector...it is a failure.

  • (disco) in reply to accalia

    Choose what you want, but it's no excuse for ignorance about what VB is.

  • (disco) in reply to blakeyrat
    blakeyrat:
    Choose what you want, but it's no excuse for ignorance about what VB is.

    TI-BASIC => Devil's jockstrap
    VB => evil icky bad
    VB.NET => now that I've bothered to read up on it a bit more: meh, i'd rather C# but whatevs

    my main point is you shouldn't mix languages within an application if you have any other choice. Pick a language that will work (and if it's VB.NET whatever) and stick with it.

  • (disco) in reply to Steve_The_Cynic
    Steve_The_Cynic:
    I'll bite by pointing out that you spelled "pedantic" incorrectly.

    Brillant!

    Filed under: not sure if trolling or really didn't read the Memes wiki thread

  • (disco) in reply to Gaska
    Gaska:
    strnlen() has two params: a pointer to the string, like strlen(), and a maximum number of chars to read. So using strnlen() with a max length of (buffer_size-1) would actually make sense. Except buffer[buffer_size-1]='\0' would work just as well and be faster, so it'd still be a WTF to write such code.

    It might be kinda useful to actually determine whether the data actually is null-terminated - if you call it with buffer_size and it returns buffer_size, then it's not.

    Still, I was pretty sure this one doesn't exist, because it just barely makes sense.
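
    A sketch of that check, for the curious. strnlen() is a POSIX function rather than part of ISO C, so this uses a memchr()-based stand-in; the names bounded_len and is_terminated are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Portable stand-in for POSIX strnlen(): the length of s, but never reads
   more than max bytes. Returns max if no '\0' was found in that range. */
size_t bounded_len(const char *s, size_t max)
{
    const char *end = memchr(s, '\0', max);
    return end ? (size_t)(end - s) : max;
}

/* A buffer of buffer_size bytes holds a terminated string exactly when the
   bounded length comes back smaller than buffer_size. */
int is_terminated(const char *buffer, size_t buffer_size)
{
    return bounded_len(buffer, buffer_size) < buffer_size;
}
```

    Unlike the strlen() version, this never reads outside the buffer, so the "not terminated" case is detected rather than being undefined behaviour.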

  • (disco) in reply to antiquarian
    antiquarian:

    @Steve_The_Cynic mainly posting in the Article category is a barrier to knowing about the Memes wiki thread

  • (disco) in reply to Anonymous

    Yep. That's right. If buf already points to a null terminated string, then the statement is a no-op regardless of encoding.
