• byro (disco)

    Can someone explain how this can happen?!

  • Anonymous (disco)

    An 8KB read buffer...

    ... or is it a read bugger?

  • DaveK (disco)

    TRWTF is not recognizing 32768 and needing to use a calculator to divide it by 8192.

  • RaceProUK (disco)
    Comment held for moderation.
  • balazs (disco) in reply to byro
    Comment held for moderation.
  • Gaska (disco) in reply to balazs

    Alternatively, it might be some strncpy() or similar to a buffer of size n*8K. In case string size equals buffer size, null terminator isn't put in place.
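
    A minimal sketch of that failure mode, assuming the copy looks like a plain strncpy() into a fixed-size buffer. The helper names here are hypothetical and are not taken from any real product code.

```c
#include <stddef.h>
#include <string.h>

/* Copies src the way the suspected buggy code might have, then reports
 * whether dst ended up NUL-terminated within its cap bytes.
 * Returns 1 if a terminator is present, 0 if not. */
int raw_copy_is_terminated(char *dst, size_t cap, const char *src)
{
    strncpy(dst, src, cap);   /* writes no '\0' when strlen(src) >= cap */
    return memchr(dst, '\0', cap) != NULL;
}

/* Hypothetical safe wrapper: copy at most cap - 1 bytes, then terminate.
 * Requires cap > 0. Longer input is silently truncated. */
void copy_terminated(char *dst, size_t cap, const char *src)
{
    strncpy(dst, src, cap - 1);
    dst[cap - 1] = '\0';
}
```

    With an 8-byte buffer, an 8-character source leaves no terminator, while a 7-character source is fine. Scaled up, that matches the "only at exact multiples of the buffer size" symptom.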

  • Cian (disco)

    There's an elderly HP flatbed scanner which refuses to scan if you have a multiple of 4GB free on C: at the time - and it's rounded off rather than the exact KB or MB free.

    I've never figured out what they could have been checking for that'd make it think that 4 (or 8, or 128) GB is no free space but 3.99/4.01 isn't.

  • fowkc (disco)

    TRWTF is that having found a possible fix they went:

    1. Fix CSS file
    2. Commit CSS file
    3. Test if fix actually worked.
  • flabdablet (disco) in reply to Cian
    Cian:
    I've never figured out what they could have been checking for that'd make it think that 4 (or 8, or 128) GB is no free space but 3.99/4.01 isn't.

    Somebody put the free space byte count in a uint32 before testing it.
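
    A sketch of that guess, assuming the driver narrowed a 64-bit byte count to 32 bits before checking it. Both function names are made up for illustration. Whatever rounding the scanner software applied first is unknown, so this only shows the truncation itself.

```c
#include <stdint.h>

/* Suspected bug: the low 32 bits of any multiple of 4 GiB are all zero,
 * so the narrowed value looks like "no free space". */
int looks_full_buggy(uint64_t free_bytes)
{
    uint32_t truncated = (uint32_t)free_bytes;   /* keeps only the low 32 bits */
    return truncated == 0;
}

/* Same check with the full 64-bit value kept intact. */
int looks_full_fixed(uint64_t free_bytes)
{
    return free_bytes == 0;
}
```

    Exactly 4, 8 or 128 GiB trips the buggy check, while one byte more or less does not.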

  • Patrick_Schluter (disco) in reply to byro

    I had the same bug in one of my programs. It happens when you open a file using a memory map and then use C string functions on the buffer. C strings are defined as a sequence of characters terminated by the sentinel value 0. A memory map always has a size that is a multiple of the memory page size, which is 4K on x86 and ARM processors but can have other values on other systems (on SPARC it's 8K, and the latest Apple ARM chips now use 16K). When the file size is not a multiple of the page size, the rest of the last page is filled with 0 bytes, so there is no problem if you use C string functions. In the rare case of a file size being an exact multiple of the page size, the data fills the last page exactly and there is no 0 byte at the end. When scanning with a C string function (strlen, strstr, strchr etc.), the program will try to read beyond the last page, but in general there is no valid page after it and the program crashes with a "segmentation fault". To avoid the problem, either do not use C functions or open the memory with 1 byte more than the real size of the file.

    In the story it's probably a multiple of 4K that will crash Dreamweaver as it is either a Windows or a Mac program and even PowerPC uses 4K pages. EDIT: I used Unix terminology but the behavior is the same on Windows or Mac (which is Unix) because it's a fundamental aspect of the processors, not of the OS.
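
    The page arithmetic can be sketched as below. This is only an illustration: the function names are invented, the page size is passed in as a parameter (on POSIX systems it can be queried with sysconf(_SC_PAGESIZE)), and the file is assumed to contain no 0 bytes of its own.

```c
#include <stddef.h>

/* Number of zero-filled bytes between the end of the file data and the
 * end of its last mapped page. */
size_t tail_slack(size_t file_size, size_t page_size)
{
    return (page_size - file_size % page_size) % page_size;
}

/* A terminator-seeking scan runs off the mapping only when there is no
 * slack at all, i.e. the size is an exact multiple of the page size. */
int scan_would_overrun(size_t file_size, size_t page_size)
{
    return file_size > 0 && tail_slack(file_size, page_size) == 0;
}
```

    For a 32768-byte file the slack is zero with either 4K or 8K pages, so both guesses about the page size predict the same crash.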

  • Arantor (disco)

    Fun bonus fact, PHP 5.3.10 had the same basic problem, only with multiples of 4KB. Would crash and if plugged into Apache as a module, would also potentially crash Apache.

    But, you know, TRWTF and all that.

  • Zemm (disco)

    This might explain the reason so many files on our web sites have null bytes at the end, and they've stopped multiplying now that the designers have stopped using Dreamweaver!

  • Gaska (disco) in reply to Patrick_Schluter
    Patrick_Schluter:
    To avoid the problem, either do not use C functions or open the memory with 1 byte more than the real size of the file.
    Or don't treat random data blob as null-terminated string.
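
    One way to read that advice, sketched with hypothetical helper names: keep the blob's size around and use the bounded mem* functions, which never look past the length they are given.

```c
#include <stddef.h>
#include <string.h>

/* Length of the text in buf without reading past buf[len - 1].
 * Same result as strlen when a NUL is present, len otherwise. */
size_t bounded_length(const char *buf, size_t len)
{
    const char *nul = memchr(buf, '\0', len);
    return nul ? (size_t)(nul - buf) : len;
}

/* Bounded stand-in for strchr on a blob of known size.
 * Returns a pointer to the first c, or NULL if it is not in the blob. */
const char *bounded_find(const char *buf, size_t len, char c)
{
    return memchr(buf, c, len);
}
```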
  • Eldelshell (disco)

    So, did they fix this in later iterations of Dreamweaver?

    I only took a glance at Dreamweaver once, but it reminded me so much of FrontPage that I ran away. Why Adobe has this power over me is unknown.

  • Patrick_Schluter (disco) in reply to Gaska

    That's what not using C (string) functions means. Thank you for your reformulation.

    Adding one byte to the size of the mmap transforms your "random data block" to a nul terminated string.
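
    The same "one extra byte" idea, sketched for an ordinary heap buffer rather than an mmap. The function name is hypothetical and this is a copy-based variant, not the mapping trick itself. Embedded 0 bytes in the data would still cut the resulting string short.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copies len bytes of data into a fresh buffer that has one extra byte
 * for the terminator. The caller frees the result.
 * Returns NULL if allocation fails or len + 1 would overflow. */
char *blob_to_cstring(const void *data, size_t len)
{
    if (len == SIZE_MAX)
        return NULL;
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, data, len);
    s[len] = '\0';            /* the added byte is what makes it a string */
    return s;
}
```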

  • CarrieVS (disco) in reply to fowkc
    fowkc:
    TRWTF is that having found a possible fix they went: 1. Fix CSS file 2. Commit CSS file 3. Test if fix actually worked.

    Perhaps I'm being really thick, but why is that a WTF and what else should they have done, in this situation?

  • Gaska (disco) in reply to Patrick_Schluter
    Patrick_Schluter:
    That's what not using C (string) functions means. Thank you for your reformulation.
    /hate

    There are cases where you are explicitly guaranteed that a data blob ends with null terminator. This is called a string, and you can use it everywhere where you can use a string without worrying about invalid reads. There are also cases where you are NOT explicitly guaranteed the null terminator, and in this case if you use this data blob as a string, shit happens.

    Patrick_Schluter:
    Adding one byte to the size of the mmap transforms your "random data block" to a nul terminated string.
    Yes. And before adding this one byte, you don't have string but random data block. Sounds like what Cpt. Obvious would say, but there are so many programmers who simply don't get it.
  • Gaska (disco) in reply to CarrieVS
    CarrieVS:
    Perhaps I'm being really thick, but why is that a WTF and what else should they have done, in this situation?
    Test before commit. Although I think that unlike in the article, the actual event happened exactly like this.
  • Medinoc (disco) in reply to Gaska

    That addresses one half of the problem, but leaves the other half: A random data block can't be manipulated as a C string if it contains embedded nulls. It can be manipulated as a C++ std::string though.

  • CarrieVS (disco) in reply to Gaska
    Gaska:
    Test before commit. Although I think that unlike in the article, the actual event happened exactly like this.

    I am familiar with the general principle, but the operative words here were in this situation.

    Perhaps I'm misunderstanding but how do you test before commit when the problem isn't in what you're developing but in what you're developing it with, and it will happen as long as you have a file of that specific size in the repository?

  • Gaska (disco) in reply to Medinoc
    Medinoc:
    That addresses one half of the problem, but leaves the other half: A random data block can't be manipulated as a C string if it contains embedded nulls. It can be manipulated as a C++ std::string though.
    When was the last time you needed a text with nulls inside?

    And if you do (for example, when you DON'T operate on text, but on BINARY DATA), you can always do:

    #include <stdint.h>

    struct data
    {
        uint32_t length;
        char* data;
    };
    

    You can program in C in a perfectly type-safe manner, both literal types and conceptual types. C++ just makes it easier. I'll repeat: C++ doesn't make it possible but easier.
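
    A short sketch of how such a struct might be used, with a hypothetical data_equal() helper. Because the length travels with the pointer, embedded 0 bytes are just ordinary bytes.

```c
#include <stdint.h>
#include <string.h>

struct data
{
    uint32_t length;
    char *data;
};

/* Compares two blobs by length and content. Unlike strcmp, this does not
 * stop at the first 0 byte. Assumes both data pointers are valid. */
int data_equal(const struct data *a, const struct data *b)
{
    return a->length == b->length
        && memcmp(a->data, b->data, a->length) == 0;
}
```

    Two blobs that differ only after an embedded 0 byte compare equal under strcmp but not under data_equal.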

    CarrieVS:
    Perhaps I'm misunderstanding but how do you test before commit when the problem isn't in what you're developing but in what you're developing it with, and it will happen as long as you have a file of that specific size in the repository?
    Testing not as in writing unit tests (because it's stupid and counter-productive to unit-test 3rd-party software), but as in checking if the fix actually fixed the problem. Ie., see if DW still crashes or not.
  • CarrieVS (disco) in reply to Gaska
    Gaska:
    checking if the fix actually fixed the problem. Ie., see if DW still crashes or not.

    Yes I understand what 'test' means. It's crashing as long as the repository has the offending file in, is it not? Am I labouring under a fundamental misinterpretation of what's going on because to my understanding, it would keep crashing until you committed it.

  • Gaska (disco) in reply to CarrieVS
    CarrieVS:
    It's crashing as long as the repository has the offending file in, is it not?
    Not exactly. It crashes as long as the *local copy* of repository has the offending file.
  • boomzilla (disco)

    I could have sworn this bug had been on TDWTF before, but now I can't find it.

  • CarrieVS (disco) in reply to Gaska

    OK, thank you.

    All I know about Dreamweaver is the name and I haven't encountered source control since I'd not entirely got to grips with it at the end of the very hurried crash course the contracting company that owns me put me through before hiring me out to where I'm working now the best part of a year ago (and prior to that I had minimal IT background. I fully admit to being an ignorant n00b).

    I was aware of the concept of a local repository but was led astray by the article saying he updated it after committing and implying that Dreamweaver having the 'fixed' CSS file followed that (it just says 'updated' now, but it definitely did say 'his local repository').

  • Gaska (disco) in reply to CarrieVS

    Just for the record: Dreamweaver is a website creation tool, not a version control system. Also, if you are a programmer and don't use any kind of VCS, you should learn it ASAP. For your own good.

  • CarrieVS (disco) in reply to Gaska

    I did know Dreamweaver and whatever source/version control they're using are two separate things, but thanks anyway. I won't try and explain my job and the various WTFs of it but I didn't come up with the setup I'm expected to follow and it's not in my power to change it.

  • anonymous234 (disco) in reply to Gaska
    Gaska:
    When was the last time you needed a text with nulls inside?
    Regardless of what you want to support, you ALWAYS need to make sure your program won't break (in an insecure way) if someone maliciously injects a \000 in its input. So that's yet another thing you have to keep in mind when using C functions.

    Have you ever wondered why bugs even exist in the first place? Why we can never seem to make software that conforms to some clear specifications? It's because the human brain is ridiculously bad at keeping track of things. Thus the constant need for abstraction and isolation in programming, because we just can't reliably imagine more than 2 systems interacting.

    And it's why anyone who supports using C for desktop software is an idiot. It's a low level language that forces YOU to remember the 9263487 things you need to remember to make complex data structures work (where does this pointer point to, has it been initialized, could it have been deinitialized, can there be any loops here...). Yes, it can be faster than with another language, so what? Computers are fucking fast, most users never get above 5% CPU usage.

    [/offtopic_rant]

    Gaska:
    And if you do (for example, when you DON'T operate on text, but on BINARY DATA), you can always do:

    struct data { uint32_t length; char* data; }

    Or in other words, a length-prefixed string. Which are so obviously superior to null-terminated strings it's not even funny. But I guess the designers of C couldn't afford the extra 4/8 bytes per string.
  • Jaloopa (disco) in reply to CarrieVS
    CarrieVS:
    the very hurried crash course the contracting company that owns me put me through before hiring me out to where I'm working now

    Sounds like the place that got me my start in programming. Rather than paying the silly, imaginary price for their training course, you sold your soul to them for two years while they contracted you out and pocketed the majority of the profits.

  • hungrier (disco) in reply to CarrieVS
    CarrieVS:
    I won't try and explain [...] various WTFs

    If anything, this is the place to do it.

  • Gaska (disco) in reply to anonymous234
    anonymous234:
    Regardless of what you want to support, you ALWAYS need to make sure your program won't break (in an insecure way) if someone maliciously injects a \000 in its input. So that's yet another thing you have to keep in mind when using C functions.
    Usually when this happens, you end up with truncated string. That's far from crash.
    anonymous234:
    Have you ever wondered why bugs even exist in the first place? Why we can never seem to make software that conforms to some clear specifications? It's because the human brain is ridiculously bad at keeping track of things. Thus the constant need for abstraction and isolation in programming, because we just can't reliably imagine more than 2 systems interacting.
    More often it's because of programmers who CBA to look up the documentation. Or to make the documentation. Or wrote wrong documentation. Or have broken API contracts due to e.g. not null-terminating string when it says the return value is string. Or simply doing idiotic things that only by pure chance haven't crashed before.
    anonymous234:
    And it's why anyone who supports using C for desktop software is an idiot.
    Or masochist. Inb4: I'm neither since I write in either C++ or C#, depending on what I'm doing.
    anonymous234:
    It's a low level language that forces YOU to remember the 9263487 things you need to remember to make complex data structures work (where does this pointer point to, has it been initialized, could it have been deinitialized, can there be any loops here...).
    But it's possible. My entire point is that it's possible. Inconvenient, but possible. You just have to RTFM and keep your API contracts.
    anonymous234:
    Yes, it can be faster than with another language, so what? Computers are fucking fast, most users never get above 5% CPU usage.
    It's because of people like you why MS Word needs 2GB RAM to run.
    anonymous234:
    Or in other words, a length-prefixed string.
    IT'S NOT A STRING GODDAMMIT!!!
  • Eldelshell (disco) in reply to CarrieVS
    CarrieVS:
    All I know about Dreamweaver is the name and I haven't encountered source control since I'd not entirely got to grips with it at the end of the very hurried crash course the contracting company that owns me put me through before hiring me out to where I'm working now the best part of a year ago
    Gaska:
    Also, if you are a programmer and don't use any kind of VCS, you should learn it ASAP. For your own good.

    And commas and periods. Good God, that was like reading my mother's WhatsApps.

  • Gaska (disco) in reply to Eldelshell
    Eldelshell:
    And commas and periods. Good God, that was like reading my mother's WhatsApps.
    Well, it was perfectly correct sentence. Lack of commas comes from cleverly connecting dependent and independent clauses in the way that's absolutely unambiguous, mitigating the need for commas.
  • Arantor (disco) in reply to anonymous234

    My stepdad is a fierce advocate of modern software being developed in assembler because 'modern machines are so fast it would literally fly' because modern software is so terribly slow, not written properly and not debugged properly.

    (And writing this in assembler would help, obviously. And of course said dude uses QBASIC thus invalidating every argument he has, but can't see it.)

  • Helix (disco)

    needs more comments on the article author bio.

  • RaceProUK (disco) in reply to Gaska
    Gaska:
    Well, it was perfectly correct sentence. Lack of commas comes from cleverly connecting dependent and independent clauses in the way that's absolutely unambiguous, mitigating the need for commas.
    It may have been grammatically correct, but fuck me, it's hard to read without the commas.
  • CarrieVS (disco) in reply to Jaloopa

    That's

    Jaloopa:
    CarrieVS: the very hurried crash course the contracting company that owns me put me through before hiring me out to where I'm working now

    Sounds like the place that got me my start in programming. Rather than paying the silly, imaginary price for their training course, you sold your soul to them for two years while they contracted you out and pocketed the majority of the profits.

    Yup. I am nearly halfway through my indentured servitude. I wonder if it's the same one. Is yours a three letter acronym that hardly anyone knows what it stands for (it's three names but I can't remember what they are) and they insist on describing their half-trained graduates as consultants? UK-based but have operations in the US and Germany too?

    hungrier:
    CarrieVS: I won't try and explain [...] various WTFs

    If anything, this is the place to do it.

    I'm saving the best one for when we actually fix it instead of hoping the world will end before the timebomb explodes. Or for when the timebomb explodes. One of the two.

  • anonymous234 (disco) in reply to Gaska
    Gaska:
    programmers who CBA to look up the documentation [...] Or have broken API contracts due to e.g. not null-terminating string when it says the return value is string

    Right, but that's also part of it. Every additional contract condition is extra chance of a bug appearing because someone forgot it. And people aren't going to fetch AND carefully read the documentation every single time they use something (although IDEs with "pop-up" function information can help a lot).

    In particular, I'm criticizing the people who expect developers to read and memorize every page of the official documentation before writing a single line of code. "It clearly says in a footnote of page 358 of the C library specification, that you have to set DO_UNICODE_EXCEPTION=25 before opening a file with a name that starts with "e" on a Wednesday. It's your fault you don't follow the contract!".

    Gaska:
    IT'S NOT A STRING GODDAMMIT!!!
    It's a string of bytes. Possibly containing invalid Unicode and ASCII codes, but that's just a minor problem.
  • FrostCat (disco) in reply to Medinoc
    Medinoc:
    A random data block can't be manipulated as a C string if it contains embedded nulls.

    Then you treat it as a BSTR!

  • SirTwist (disco) in reply to CarrieVS
    CarrieVS:
    I was aware of the concept of a local repository but was led astray by the article saying he updated it after committing and implying that Dreamweaver having the 'fixed' CSS file followed that (it just says 'updated' now, but it definitely did say 'his local repository').
    Possibly bad anonymization, possibly Ellis and/or "Tom" don't actually know SVN. The local copy is called a "working copy," not a "local repository." "Local repository" only makes sense if you're talking about a distributed VCS.
  • Gaska (disco) in reply to anonymous234
    anonymous234:
    And people aren't going to fetch AND carefully read the documentation every single time they use something
    They should.
    anonymous234:
    In particular, I'm criticizing the people who expect developers to read and memorize every page of the official documentation before writing a single line of code.
    That's what we have tables of content for.
    anonymous234:
    "It clearly says in a footnote of page 358 of the C library specification, that you have to set DO_UNICODE_EXCEPTION=25 before opening a file with a name that starts with "e" on a Wednesday. It's your fault you don't follow the contract!".
    Global state is another evil. And if it's not global state, this switch should be documented in the particular function reference.
    anonymous234:
    It's a string of bytes. Possibly containing invalid Unicode and ASCII codes, but that's just a minor problem.
    String almost always means text string. When it means byte string that doesn't hold a text value, it's usually named either array, data block, or something like that. Either way, my example wasn't text but any data. You should never treat that char* as text string.
  • antiquarian (disco) in reply to Patrick_Schluter
    Patrick_Schluter:
    To avoid the problem, either do not use C functions or open the memory with 1 byte more than the real size of the file. **use a programming language with sane string handling.**

    FTFY :trollface:

    anonymous234:
    And it's why anyone who supports using C for desktop software is an idiot. It's a low level language that forces YOU to remember the 9263487 things you need to remember to make complex data structures work (where does this pointer point to, has it been initialized, could it have been deinitialized, can there be any loops here...). Yes, it can be faster than with another language, so what? Computers are fucking fast, most users never get above 5% CPU usage.

    What's worse, safer compiled languages have been available for decades.

  • Patrick_Schluter (disco) in reply to antiquarian
    antiquarian:
    To avoid the problem, [s]either do not use C functions or open the memory with 1 byte more than the real size of the file.[/s] use a programming language with sane string handling.

    One can not always choose and when one does not choose, then the one has to know how to handle correctly what he had to use, even if he had preferred using something more fancy.

  • xaade (disco) in reply to Gaska

    He did test before commit. He found the change, opened the file with the change, then committed the change.

    Oh, you wanted him to test a commit whose only difference is one comment?

    I don't test commits where I change the comments only.

    Of course, I could have mistyped something, and introduced a bug. Hmm... I'm trying to remember when that has ever happened.

  • antiquarian (disco) in reply to Patrick_Schluter
    Patrick_Schluter:
    One can not always choose and when one does not choose, then the one has to know how to handle correctly what he had to use, even if he had preferred using something more fancy.

    Sorry, forgot the :trollface:. I'll edit.

  • chubertdev (disco) in reply to boomzilla
    boomzilla:
    I could have sworn this bug had been on TDWTF before, but now I can't find it.

    It's been mentioned relentlessly in the comments on certain articles.

    Is "Nightmareweaver" too much?

  • boomzilla (disco) in reply to chubertdev
    chubertdev:
    It's been mentioned relentlessly in the comments on certain articles.

    Probably not there (where I remember it from, I rarely looked at those), though I thought it might have been in the forum somewhere, too.

  • chubertdev (disco) in reply to boomzilla

    This one, or the old one? I never really read the old forum.

  • boomzilla (disco) in reply to chubertdev

    Old one. I think I remember this from waaay back. Again, I wouldn't be surprised that it showed up in old article comments, just that I probably wouldn't have seen it there unless someone brought it back to the forum.

  • EvanED (disco) in reply to Patrick_Schluter
    Patrick_Schluter:
    Adding one byte to the size of the mmap transforms your "random data block" to a nul terminated string.

    Until you read a file containing NUL.

    Gaska:
    When was the last time you needed a text with nulls inside?

    Doesn't matter. Your user can provide one, so you better behave correctly (even if it's printing an error) in those cases. Operating as if the stuff after the NUL isn't there is a WTF (and in many cases a potential security vulnerability and in others provides a very easy way to delete a bunch of the user's data)... and that means that calling string functions on it is a big smell, at least to me. (Aside from strlen to compare to the actual buffer size so that you can throw an error if it's not there.)
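
    A minimal sketch of that kind of check, with a hypothetical function name. It uses memchr rather than strlen so that it stays within the buffer even when no terminator exists.

```c
#include <stddef.h>
#include <string.h>

/* Validates len bytes of user input before any string function sees it.
 * Returns 0 if the input is free of NUL bytes, -1 if one is embedded,
 * so the caller can report an error instead of silently truncating. */
int reject_embedded_nul(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) == NULL ? 0 : -1;
}
```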
