• Andreas (unregistered) in reply to Anon
    Anon:
    The WTF is that when the local variables are uninitialized, they can be placed in the BSS, which doesn't require "real" allocation, at least in the ELF format used by most Unix-like platforms.

    This is nonsense as local variables are never placed into the BSS segment but dynamically allocated on the stack.

  • Andreas (unregistered)

    The WTF is that Winston was apparently never accepted as a team member and properly introduced into the current practices. Yes, touching all files for "code cleanup" without coordinating this with the other team members was certainly not helpful. But Winston made no secret out of it:

    I seek out and destroy poorly written code.
    Hence it would have been wise to tell him to which extent such code cleanups are welcomed and to be done. In addition, his initial suggestions were not welcomed and brushed off with an unfriendly comment:
    For now though, we have some bigger fish to fry

    Simply put: If you have no time to introduce a new team member into your team, do not employ one. Or tell at least when you have time for him and what he can do or shall not do until a proper introduction takes place. Poor communication is an invitation of disaster or at least a waste of resources.

  • Haynes (unregistered)

    What's the picture of the bashed-up Saxo got to do with anything?

  • (cs) in reply to Haynes
    Haynes:
    What's the picture of the bashed-up Saxo got to do with anything?

    Whoosh...

    Clearly, within the context of a mangled, pointless story, a pointless photo of a mangled car and a passing obscure reference to C makes perfect sense.

    TRWTF is that "Winston" probably submitted this story, not "Chris" or whatever his name was. "I joined a company with no process and horrible code. I fixed some of the horrible code. They had a hissy fit because I didn't follow their unwritten, unstated, bass-ackwards 'process'. WTF?"

  • Alex (unregistered) in reply to Herby

    I am not as good with C as many ppl here appear to be, but I wrote a simple test application that does char a[] = "Short String". It appears that the string is in fact copied to the stack. Nothing like that happens with char *a. The code is as following:

    #include <stdio.h>
    
    void foo(void)
    {                                                                         
      char a[] = "Short string";
      printf(a);      
    }
    

    I compiled this with gcc -S -O0 test.c -o test.asm (GNU C compiler, to assembler only, no optimizations), which resulted in:

    .LC0:
    	.string	"Short string"
    	.text
    .globl foo
    	.type	foo, @function
    foo:
    	pushl	%ebp
    	movl	%esp, %ebp
    	subl	$40, %esp
    	movl	%gs:20, %eax
    	movl	%eax, -4(%ebp)
    	xorl	%eax, %eax
    	movl	.LC0, %eax
    	movl	%eax, -17(%ebp)
    	movl	.LC0+4, %eax
    	movl	%eax, -13(%ebp)
    	movl	.LC0+8, %eax
    	movl	%eax, -9(%ebp)
    	movzbl	.LC0+12, %eax
    	movb	%al, -5(%ebp)
    	leal	-17(%ebp), %eax
    	movl	%eax, (%esp)
    	call	printf
    	movl	-4(%ebp), %eax
    	xorl	%gs:20, %eax
    	je	.L5
    	call	__stack_chk_fail
    

    Notice that space for the string is allocated on the stack (subl $40, %esp), and the value of the string is promptly copied from the heap, one "long" at a time (movl .LC0, %eax; movl %eax, -17(%ebp), etc).

    Comments anyone?

  • Anonymous coward (unregistered)

    Um, so you guy are honestly saying you don't see the WTF in changing every single file in a large application, even if it's trivial, that's still a lot of changes to go trough. SCM history also becomes quite useless. The rule is, if you're not touching something, and it works, don't touch it.

  • (cs)

    Today's lesson in arguing on the Internet:

    Once you realize that you're obviously wrong and everyone else knows it, it's a good idea to stick to your position anyway and try to change the definition of your own statement so that it sounds more correct. You will always look more intelligent and mature this way than if you had simply admitted you were wrong.

  • AndyL (unregistered) in reply to Murdog
    Murdog:
    The REAL WTF is geeks who think they're funny!
    It sure is!
  • illtiz (unregistered) in reply to Anonymous
    Anonymous:
    IIUC, that only applies to globals and statics. The variables here are plain old local variables. If this function was called recursively, there might not be enough space in the .bss to contain all the copies of the string - but there should be on the stack.

    Erm...what? Are you suggesting that a function that is caused recursively will enlarge my binary with multiple copies of the code constants? I will just assume you mixed that sentence up, stating the reverse of what you meant to. Either way, I don't see the point.

  • M. (unregistered) in reply to Anonymous coward
    Anonymous coward:
    Um, so you guy are honestly saying you don't see the WTF in changing every single file in a large application, even if it's trivial, that's still a lot of changes to go trough. SCM history also becomes quite useless. The rule is, if you're not touching something, and it works, don't touch it.

    I must correct this a little bit...

    WTF: doing repository wide changes just before release on the main repo(ie not doing it in a branch).

    WTF: not doing (even repository wide) maintenance at all and letting your huge code decay and rot as changes stack-up. Typical symptoms: strange workarounds to limitations of working, don't touch it subsystems. But hey - if you need workarounds, it's not working anymore! Over time workarounds tend to pile on each other leaving you with a total mess. Then you leave company then and let some poor freshmen to deal with the sh1t.

    Disclaimer: of course it depends on the nature and scale of the project. In a lot of cases hacking type maintenance is just good enough, BUT in this particular story they deal with project where any major technological/design debt will hurt big time later on.

  • M. (unregistered) in reply to Alex
    Alex:
    I am not as good with C as many ppl here appear to be, but I wrote a simple test application that does char a[] = "Short String". It appears that the string is in fact copied to the stack. Nothing like that happens with char *a. The code is as following:
     ... removed...
    

    Notice that space for the string is allocated on the stack (subl $40, %esp), and the value of the string is promptly copied from the heap, one "long" at a time (movl .LC0, %eax; movl %eax, -17(%ebp), etc).

    Comments anyone?

    Yes, congrats to the discovery(but... copied from heap?), but seriously - read the discussion first, there are like two pages of char[] vs char* already :/

  • Buddy (unregistered) in reply to Dascandy
    Dascandy:
    What's the WTF?

    The cardinal rule among S/W developers: If it ain't broke, don't fix it. Lots wasn't broke, and he fucked with it. This is most definitely a WTF.

  • Alex (unregistered) in reply to M.
    M.:
    Alex:
    I am not as good with C as many ppl here appear to be, but I wrote a simple test application that does char a[] = "Short String". It appears that the string is in fact copied to the stack. Nothing like that happens with char *a. The code is as following:
     ... removed...
    

    Notice that space for the string is allocated on the stack (subl $40, %esp), and the value of the string is promptly copied from the heap, one "long" at a time (movl .LC0, %eax; movl %eax, -17(%ebp), etc).

    Comments anyone?

    Yes, congrats to the discovery(but... copied from heap?), but seriously - read the discussion first, there are like two pages of char[] vs char* already :/

    I've lost track of the discussion above, maybe we can make a summary? But first, what is wrong with "copied from the heap"? I thought that assembler code

    .LC0:
            .string "Short string"
            .text
    

    results in .LC0 being a pointer to char[] "Short string", located on the heap, and instructions

            movl    .LC0, %eax
            movl    %eax, -24(%ebp)
            movl    .LC0+4, %eax
            movl    %eax, -20(%ebp)
            movl    .LC0+8, %eax
            movl    %eax, -16(%ebp)
            movzbl  .LC0+12, %eax
    

    moved the contents of that string into the stack. Or is something wrong with that? Where is .LC0 located?

  • Wizard Stan (unregistered)

    char array is effectively a char pointer "under the hood" in the same way that multiplication is effectively addition in a loop "under the hood"

  • Zombywuf (unregistered) in reply to Ash
    Ash:
    Zombywuf:
    <snip>

    I have never seen anyone strcpy into on stack arrays in that manner. The correct way looks more like char string[] = "data data data"; <snip>

    I see it all the time. Have you never needed to change arrays on the stack other than at initialization??

    That would not be the same manner, i.e. doing it for initialisation is just plain weird.

  • nobis (unregistered) in reply to M.
    M.:
    For a) int main (int argc, char **argv) { int x; char data_string[100000]; ... x = 2; strcpy(data_string,"data data data"); ... } b) int main (int argc, char **argv) { int x = 2; char data_string[100000] = "data data data"; ... } Now well we have different situation! Let's ignore it would be entirely stupid to allocate such a huge array on stack, but let's inspect what really happens... Because compiler is obliged to zero initialize rest of array in case b) it calls memset OR actually uses prepared content for this variable. This can be usually triggered when compiler is instructed to optimize for speed. Ups - 100kilos larger (and compared to "a" slower) code. So in this case a) is clearly superior.

    Actually, neither is. Who the hell puts 100k of data on the stack? Can you say stack underflow?

  • M. (unregistered) in reply to nobis
    nobis:
    M.:

    <...>

    Now well we have different situation! Let's ignore it would be entirely stupid to allocate such a huge array on stack, but ...

    <...>

    Actually, neither is. Who the hell puts 100k of data on the stack? Can you say stack underflow?

    Indeed, it's stated in there :o) It's exaggerated example of interesting side effect of stack-based array initialization. It can hurt in similar way in terms of resulting program-size if those arrays are small, but plentiful.

  • Gnurgh (unregistered) in reply to Stig
    Stig:
    EvanED:
    Stig:
    No idea what you mean. "Semantically" they are different because one 'means' a pointer to a character, t'other 'means' and array of characters....however, an array essentially 'decays' (not my choice of word) into a char*, it just happens to have a contiguous bunch of chars next to it in memory.
    That's not what "semantics" means to me in the world of computer languages, where it means "how the program behaves".
    Semantics is all about meaning, not behavior...an understanding of what it is you are expressing with your code.
    And char arrays behave differently than char pointers.
    Technically, neither 'behave' like anything. They're simply memory structures. If you look at char a[10] you will see that a is a pointer....to a char....followed by 9 other chars. This is the "underlying identical nature' I was referring to.

    So, the following:

    file1.c: int foo[100];

    file2.c: extern int *foo;

    int bar() { *foo = 10; }

    Should be OK then, since they are the same? But, no, it is not OK, because they are not the same.

    Just because an array decays to pointer to the first element when (in most cases) used in expressions does not mean they are the same.

    int foo[10]; int *bar;

    foo is not a modifiable lvalue, bar is. foo has a static size 10*sizeof(int) that can not be changed, bar can be pointed to any piece of memory basically and only the size of the pointer remains constant.

    And so on, and so on...

  • SomeGuy (unregistered) in reply to Chris Becke
    Chris Becke:
    In code that has tricky memory leaks, uninitialized variables are a common cause. It was noted that the problem was hard to track down, so a shotgun approach to tracking the problem down is not that inappropriate.

    He probably thought he was pr-activly dealing with an easy to fix class of code error and, unless there were side effects / order considerations the code is better as a result.

    The WTF here is, why is this a WTF? They had version control? Branch. Revert. Reapply the specific bug fix. Get the immediate build out. Then re-apply. This is not a code style difference along the lines of how you arrange your bracers or whitespace - uninitialized variables are actually bad.

    Nice to see you're around Winston :)

  • (cs) in reply to Alex
    Alex:
    I've lost track of the discussion above, maybe we can make a summary?
    Summary: even though you can use the same syntax (most notably [n] to access elements, but also * to access the first element) and arrays decay to pointers when copied to a pointer or passed as a parameter, they are still different. (One such symptom is what you name.)

    The fact that you can treat them almost the same (or said another way, you can usually treat them the same) leads people to say that arrays "are" pointers, "are equivalent to" pointers, and other similar false statements, that these statements lead to confusion about what the differences are, and this confusion leads to sometimes-hard-to-diagnose bugs.

    But first, what is wrong with "copied from the heap"? I thought that assembler code...results in .LC0 being a pointer to char[] "Short string", located on the heap, ... moved the contents of that string into the stack. Or is something wrong with that? Where is .LC0 located?
    LC0 is located in a read-only section of the program image. The heap is specifically where dynamically-allocated memory comes from -- it's the part that malloc touches.

    It would be much more accurate to say that LC0 is stored in the same memory region as the actual program code, even though that's not quite true either.

    Technically speaking, if you're on a Linux system say, it will be in a section called "rodata" (for read-only data). This is in contrast to "data" (used for globals and statics), and "text" (used for program code). (I'm sure Windows's PE format has equivalent sections, but I'm not sure if they have the same names or not.)

  • Jay (unregistered)

    Forget "under the hood". While pointers and arrays have many similarities, they also have many practical differences.

    Like, most obviously, try compiling this code:

    char *a;
    char b[20];
    
    a="Hello";
    b="Hello";
    

    The last statement will not successfully compile. Therefore, I cannot use an array in exactly the same way that I use a pointer. QED. End of story.

    Or:

    char *a;
    char b[20];
    
    strcpy(b,"Hello");
    strcpy(a,"Hello");
    

    The first strcpy is completely valid and legal. The second strcpy will, if you're lucky, generate a runtime error. If you're not lucky, you'll overwrite a random memory location.

    Or:

    char *a="Hello";
    char b[15]="Hello";
    char *c="Hello";
    char *d="Shalom";
    
    strcpy(b,"Goodbye");
    strcpy(a,"Goodbye");
    

    On my system, this compiles successfully, but the second strcpy throws a segmentation fault, because we're trying to copy into the constant pool, which is not allowed. A compiler that did allow it would be setting you up for some nasty bugs. Most compilers share constants, so changing a would also mysteriously change c. And as "Goodbye" is longer than "Hello", we would be overwriting 2 bytes that might well be allocated to some other constant, perhaps mysteriously changing d to "ye". (Depending on how memory was laid out.)

  • JJ (unregistered) in reply to Anonymous
    Anonymous:
    Would any of the professionals who read this site really act like Winston did in this scenario? Does anyone seriously condone this behaviour? The WTF here is the attitude of the likes of Winston, plain and simple.

    Oh gosh, you're right! People who prefer good coding practices are the problem. I'm sure glad that nobody at my last job gave a rat's about coding standards. Consistent, legible formatting? Proper initialization of variables? Why, if everybody had the attitude that these things are important, it would be a disaster! It's much better to stumble along with illegible, uncommented code, blindly trying to fix undocumented spaghetti. If people had cared about code standards, maybe all those bugs I had to track down wouldn't have existed in the first place, and that would definitely be a real WTF.

  • (cs)

    I did some inconclusive tests on things I wasn't completely sure about...

    I compiled without any extra flags, so no optimization nor debug symbols, etc.

    Sample output:

    ---Output begins after this line--- (long)const_str1=0x27ff30 (long)const_str2=0x27ff20 (long)str1=0x27ff10 (long)str2=0x27ff00 (long)const_ptr_str1=0x403000 (long)const_ptr_str2=0x403000 (long)ptr_str1=0x403000 (long)ptr_str2=0x403000

    const_str1="data data data" const_str2="data data data" str1="data data data" str2="data data data" const_ptr_str1="data data data" const_ptr_str2="data data data" ptr_str1="data data data" ptr_str2="data data data" ---Output ended on previous line---

    As you can see, all of the strings point to "data data data", but not all of them point to the same copy. The pointer types all point to the same one, likely the static data in the executable (I'm unfamiliar with executable formats and assembly, but I /think/ the region would be referred to as the data segment). The array types all point to their own copies (each of them on the stack, which according to Wikipedia is not considered the data segment). None of this should be a surprise.

    Nowhere near conclusive, but when I changed the array types to be uninitialized and then strcpy'd TEH_STR into them,...

    ...the executable size grew (only about 200KB, with those 4 variables), suggesting that it was actually more space efficient to initialize them inline. That said, when I added optimization (MinGW/GCC with -O2), the executables from both versions came out the exact same size (albeit, not exactly the same). I'm not really an "engineer" so I don't claim to know the low-level details of what changes, etc.

    I don't really see anything wrong with the changes Winston made, aside from re-factoring code that may have been fine, instead of properly diagnosing the problem. I've been guilty of re-factoring working code as well (particularly when I can't make sense of it in its current form). My instinctual thought when seeing the original code in the OP was probably similar to that of Winston's (albeit, I'm only an intermediate C programmer; and not even in a professional context with it ATM). That doesn't make pre-release time the appropriate time to refactor. :P

    I definitely think that coding standards and commit policies are important to have. In a perfect world, all solutions would be group efforts and all code would be peer reviewed... And intern's should be supervised and reviewed to both guide them to proper solutions and avoid catastrophic losses...

    Addendum (2009-07-31 13:30): My bad. It only grew about 200 bytes. That makes so much more sense. :D

  • Alex (unregistered) in reply to EvanED
    EvanED:
    Alex:
    I've lost track of the discussion above, maybe we can make a summary?
    Summary: even though you can use the same syntax (most notably [n] to access elements, but also * to access the first element) and arrays decay to pointers when copied to a pointer or passed as a parameter, they are still different. (One such symptom is what you name.)

    The fact that you can treat them almost the same (or said another way, you can usually treat them the same) leads people to say that arrays "are" pointers, "are equivalent to" pointers, and other similar false statements, that these statements lead to confusion about what the differences are, and this confusion leads to sometimes-hard-to-diagnose bugs.

    But first, what is wrong with "copied from the heap"? I thought that assembler code...results in .LC0 being a pointer to char[] "Short string", located on the heap, ... moved the contents of that string into the stack. Or is something wrong with that? Where is .LC0 located?
    LC0 is located in a read-only section of the program image. The heap is specifically where dynamically-allocated memory comes from -- it's the part that malloc touches.

    It would be much more accurate to say that LC0 is stored in the same memory region as the actual program code, even though that's not quite true either.

    Technically speaking, if you're on a Linux system say, it will be in a section called "rodata" (for read-only data). This is in contrast to "data" (used for globals and statics), and "text" (used for program code). (I'm sure Windows's PE format has equivalent sections, but I'm not sure if they have the same names or not.)

    Great, thanks for a good explanation Evan. Could just be that you saved me from being fired or put on a dead-end job like poor Winston. So, if I may attempt to sum up the reasoning behind the WTF:

    - char *buf = "String"; 
    buf is pointing to a region of memory called "rodata" on Linux, which can roughly be thought of as a part of actual program code.
    - char buf[] = "String"; 
    buf is an array allocated on the stack, exactly as long as the length of the string plus zero. Chars are automatically copied to the stack from rodata.
    - char buf[10] = "String" 
    same as above, but with exactly 10 bytes allocated on the stack. Due to language specifications, the unused bytes have to be filled with zeros. Sometimes this leads compiler to interpret the line as 'char buf[10] = "String\0\0\0"', adding, in this case, three bytes to rodata and thus to the size of the binary. Which is the reason why Winston's changes increased size of the binaries, presumably enough to fill two CD's. Which is what this WTF is all about.

    Okey, hopefully this clears it up a little.

    PS: For completeness, we also have

    - char buf[10]; 
    Allocates 10 bytes on the stack. Content is not specified. Compiler is likely to leave it untouched, and whatever junk happened to be in that region of memory simply stays there.
  • Stephen (unregistered) in reply to OutlawProgrammer
    OutlawProgrammer:
    Someone (read: the president/CTO) decided it would be a good idea to put this intern in change of internationalizing the entire application. ... It turns out the intern's i18n tools had a bug in it where they would somehow delete certain lines of code. 99% of the time this would result in a compilation error and he would just go back and fix the line he fucked up. However, in that rare 1% of cases, he would delete an important line but the code would still compile. One of these cases was the root cause of the bug I was tracking down. ... The powers-that-be demanded a project-wide rollback of the code, reverting 3 months worth of everyone's work.
    Have you never heard of branches? Here, every major bug/feature gets its own branch. If someone screws up, you just back out their merge, which doesn't affect anyone else's work (which, naturally, they do on other branches).

    Yes, tools where branching/merging difficult would make this a pain in the ass. The solution is to get better tools, not to dumb down the process to accommodate bad tools.

  • Alex (unregistered) in reply to Alex
    Great, thanks for a good explanation Evan. Could just be that you saved me from being fired or put on a dead-end job like poor Winston. So, if I may attempt to sum up the reasoning behind the WTF:
    - char *buf = "String"; 
    buf is pointing to a region of memory called "rodata" on Linux, which can roughly be thought of as a part of actual program code.
    - char buf[] = "String"; 
    buf is an array allocated on the stack, exactly as long as the length of the string plus zero. Chars are automatically copied to the stack from rodata.
    - char buf[10] = "String" 
    same as above, but with exactly 10 bytes allocated on the stack. Due to language specifications, the unused bytes have to be filled with zeros. Sometimes this leads compiler to interpret the line as 'char buf[10] = "String\0\0\0"', adding, in this case, three bytes to rodata and thus to the size of the binary. Which is the reason why Winston's changes increased size of the binaries, presumably enough to fill two CD's. Which is what this WTF is all about.

    Okey, hopefully this clears it up a little.

    PS: For completeness, we also have

    - char buf[10]; 
    Allocates 10 bytes on the stack. Content is not specified. Compiler is likely to leave it untouched, and whatever junk happened to be in that region of memory simply stays there.

    All of the above presumes that declarations are made inside a function, otherwise they are of course going on heap :)

  • Bim Job (unregistered) in reply to Chris Becke
    Chris Becke:
    Jub-Jub:
    The pattern may not be unusual but it's stupid.... Doesn't bool initialise to false? We will always free If in doubt, initialise your bool first....

    We could complicate the code by not using a bool

    Why would a bool auto initialize to false? Its not a complex type.

    I'm beginning to think that there's something on the internet between a troll and a dwarf (not you, Chris). Trolls do this on purpose. Dwarves are too short to see above their own misguided preconceptions.

    Not sure what you mean by a "complex type" (possibly one with a well-formed constructor, ie C++), but: no, bools do not auto initialize. Not on the stack, not on the heap. A shitload of inadequate programmers assume that they initialize to false (why? why? why? why not file_not_found?). These people do not understand how memory works, and should have theirs erased by the Matrix.

    I've been working with these bozos for twenty-plus years, now. Things have not got better. There are dangerous raving maniacs out there.

    It's not difficult, people:

    (1) Understand the basics of a von Neumann architecture. (2) If using C, understand the semantics. Not just the syntax. The compiler will hit you over the head with the syntax, but it has absolutely no clue whatsoever with the semantics.

    That's your effing job.

  • Bob Sinclair (unregistered)

    In my company it's really simpel. An engineer can do anything to the code he wants. The diffs are looked at by another engineer chosen at random. If that engineer agrees, it gets committed in the repository.

    Our code quality is 'reasonably all right', because of this peer review process.

    A lone engineer touching every file because of a pet peeve would not get integrated. Unless it really is an improvement.

    Software engineering isn't a hard job. It just requires to set ego's aside and focus on the group effort. Such is life.

  • Buffled (unregistered) in reply to Chris Becke
    Chris Becke:
    In code that has tricky memory leaks, uninitialized variables are a common cause. It was noted that the problem was hard to track down, so a shotgun approach to tracking the problem down is not that inappropriate.
    If your resume ever crosses my desk, you don't get the interview, nevermind the job.
  • Ornedan (unregistered) in reply to Buddy
    Buddy:
    Dascandy:
    What's the WTF?

    The cardinal rule among S/W developers: If it ain't broke, don't fix it. Lots wasn't broke, and he fucked with it. This is most definitely a WTF.

    Lots was maybe-broke. He cleaned it up, so there's now less noise to hide whatever real brokenness is left. It should be done as soon as possible and the only WTF-candidate here is that he did it sooner than might be wise.

  • Anonymous (unregistered) in reply to Alex
    Alex:
    - char buf[10] = "String" 
    same as above, but with exactly 10 bytes allocated on the stack. Due to language specifications, the unused bytes have to be filled with zeros. Sometimes this leads compiler to interpret the line as 'char buf[10] = "String\0\0\0"', adding, in this case, three bytes to rodata and thus to the size of the binary. Which is the reason why Winston's changes increased size of the binaries, presumably enough to fill two CD's. Which is what this WTF is all about.

    Okey, hopefully this clears it up a little.

    Didn't the story start with Winston railing about people allocating more space than their strings need? It doesn't make a lot of sense for the WTF to be Winston adding to that, unless someone told him that's the code convention there. Nope. That's not at all something Chris might do.
  • Jordan (unregistered)
    Sure, developers were expected to fix bugs as they found them, but this... this all-encompassing carnival of carnage was beyond imagining.

    I hope I'm not the only one who thinks that describing am over-thorough code tidy-up as an "all-encompassing carnival of carnage" is a fantastic overstatement? It sounds like the code was messy; perhaps the TWTF is that the code was able to get into this state in the first place?

  • (cs)
    [Arrays aren't pointers because the compiler doesn't let me do everything I can with a raw pointer.]
    Arrays in C are just constant pointers to stack allocated memory. They behave exactly like any other constant pointer would, with the exception that the sizeof operator returns the total size of the array in bytes instead of the pointer size. That's the only difference I'm aware of. If they weren't pointers, it wouldn't work to explicitly dereference them.
    #define name_size 51
    char name[name_size] = "Sandy Beach";
    putchar(*(name + 6)); // Writes 'B' to stdout.
    

    #define numbers_size 5 int numbers[numbers_size] = {5, 4, 3, 2, 1}; printf("%d", *(numbers + 2)); // Writes '3' to stdout.

    str1 and str2 are the same except for when used with the sizeof operator.

    #define str1_size 51
    char str1[str1_size] = "John Doe";
    // Except for with the sizeof operator, str2 is the same as str1.
    char * const str2 = str1;
    //str1 = NULL; --> "incompatible types in assignment"
    //str2 = NULL; --> "assignment of read-only variable `str2'"
    Arrays in C are just pointers with a few differing semantics.
  • Shinobu (unregistered)

    I think the second CD was needed because the old version was still there. Even with the possible issues that the change might have caused (and I'm not convinced they were severe or not addressed or fixable) you wouldn't expect a full blow-up. The Real WTF® is not having CVS or something similar and lack of oversight. And perhaps patience.

  • Alex (unregistered) in reply to Anonymous
    Anonymous:
    Alex:
    - char buf[10] = "String" 
    same as above, but with exactly 10 bytes allocated on the stack. Due to language specifications, the unused bytes have to be filled with zeros. Sometimes this leads compiler to interpret the line as 'char buf[10] = "String\0\0\0"', adding, in this case, three bytes to rodata and thus to the size of the binary. Which is the reason why Winston's changes increased size of the binaries, presumably enough to fill two CD's. Which is what this WTF is all about.

    Okey, hopefully this clears it up a little.

    Didn't the story start with Winston railing about people allocating more space than their strings need? It doesn't make a lot of sense for the WTF to be Winston adding to that, unless someone told him that's the code convention there. Nope. That's not at all something Chris might do.

    It appears Winston didn't know that char buf[10] = "String" leads to padding and binary size increase with some compilers. Otherwise, I think it is sensible to fix code like

      char data_string[20];
      ...
      strcpy(data_string,"data data data");
    

    because this code obfuscates bugs. He should have changed it to

      char data_string[20];
      strcpy(data_string,"data data data");
      ...
    
  • Roger (unregistered)

    Ok, so management hired a guy because he said he likes to destroy bad code, and when he does that, is removed of his ability to do so. Is that the WTF?

  • John Doe (unregistered)

    There is one thing that can cause problems with memory leaks in the original code.

    Imagine this: void function() { int loopTill = 10; char szFoo[5]; sprintf(szFoo, "12345");

    for(int i=0; i<loopTill, i++) { //initialize memory here or //or structures that say how much you should allocate }

    since the data is on stack, buffer overflow will change the value of loopTill, which in turn can cause havoc somewhere down the road. Of course, endiannes, optimizations, security cookies and other things can change all of this, but the risk is there.

    Of course, before shipping that's not a good thing to change a lot of code.

  • Alex (unregistered) in reply to John Doe
    John Doe:
    char szFoo[5]; sprintf(szFoo, "12345");
    Mind the zero at the end of C strings...
  • John Doe (unregistered) in reply to Alex

    That is exactly the problem, a common mistake will cause variable pushed on stack to be overwritten, in sprintf operation.

    sprintf will fill the buffer and the value of int before array will get changed.

    The code is wrong, but it illustrates the problem.

  • Alex (unregistered) in reply to John Doe
    John Doe:
    The code is wrong, but it illustrates the problem.

    Ah, I see, sorry. Good example.

  • david (unregistered)

    What am I missing here? What exactly is the WTF?

  • kyle (unregistered)

    I'm not sure where the confusion is coming from, but the WTF is clear. You can't just replace "int x; ... ; x = 2" with "int x = 2;" everywhere because it leads to bugs if you have code like this:

    int x; ... while (something) { x = 2;

    // Do something that depends on x initially being 2 // and which also changes the value of x }

  • smasher (unregistered) in reply to @Deprecated
    (registered):
    Yeah, I don't get it either. Where did the second CD come from?

    I think this must have been a patch being sent to the client - just the files that had been changed, which should have fit on on 8 inch floppy if Winston hadn't changed every file.

  • Duke of New York (unregistered)

    Disc two doesn't do anything. That's the beauty of it!

  • (cs) in reply to Stig
    Interestingly, gcc now deprecates code like:
    
    char* blah = "blah";
    
    ...and would prefer that you use:
    
    char blah[] = "blah";
    
    ...and does so to enforce the semantics of the language, not because of the compiler output.
    
    jes:/usr/home/jes$ cat x.c
    #include <stdio.h>
    #include <stdlib.h>
    
    int main (void) {
      char * blah = "foo\n";
    
      printf("%s", blah);
      blah = "bar\n";
      printf("%s", blah);
      exit(0);
    }
    jes:/usr/home/jes$ gcc -Wall -pedantic x.c
    jes:/usr/home/jes$
    

    I am glad that I don't see it being deprecated (gcc 4.1.2). The two declarations aren't the same and have different uses. Only the pointer declaration allows me to re-assign blah to a new char array. Using the char[] blah = "foo\n"; would break things. Another example where pointers and arrays are not the same.

  • (cs)

    A more pointed example of the difference between pointers and arrays - a mistake I've seen more than a few times:

    jes:/usr/home/jes$ cat x.c
    #include <stdio.h>
    #include <stdlib.h>
    
    extern char *zoe;
    extern char *frog;
    
    int main (void) {
      printf("%s%s", zoe, frog);
      exit(0);
    }
    

    zoe and frog are both declared as external character pointers. So this file should work just fine:

    jes:/usr/home/jes$ cat xx.c
    
    char zoe[] = "the point of the pointer ";
    char *frog = "is to make it's point\n";
    

    Compile them with debug, and run under gdb:

    jes:/usr/home/jes$ gcc -g -Wall -pedantic -o x  x.c xx.c
    jes:/usr/home/jes$ gdb x
    GNU gdb 6.8
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-pc-linux-gnu"...
    (gdb) r
    Starting program: /usr/home/jes/x
    
    Program received signal SIGSEGV, Segmentation fault.
    0x00007fdee7f355d0 in strlen () from /lib/libc.so.6
    (gdb) f 3
    #3  0x0000000000400579 in main () at x.c:8
    8         printf("%s%s", zoe, frog);
    (gdb) p frog
    $1 = 0x400681 "is to make it's point\n"
    (gdb) p zoe
    $2 = 0x6e696f7020656874 

    It looks like you need to pay attention to the difference between char * and char[] - they aren't the same. Change the extern declaration of zoe to extern char zoe[]; and things work. Now for all the C experts out there who think pointers and arrays are the same: What's up with that then?

    They aren't the same and pretending they are is a recipe for disaster. Note that the incorrect program compiles without errors, it simply doesn't work like you'd expect if you are under the misapprehension that type * == type[]

  • Vladimir (unregistered)

    There are two major things about his code:

    1. char * a = "aaa aaa aaa"; is pointer to the constant in c++ (you might get a warning).

    2. some compilers might want to optimize and merge all declarations like char * a = "aaa aaa aaa"; char * b = "aaa aaa aaa"; so variable "a" and "b" point to the very same memory. This might cause interesting behaviour like a[4] = 'c'; and b becomes "aaa caa aaa";

  • fdizzle (unregistered)

    TRWTF is that this story made it onto TDWTF.. He improved the code and you're lambasting him for it. Sure it was a waste of time when they're about to deliver, but if the bugs are fixed, what's the problem? You didn't even mention if he fixed the memory leak.

  • CButtius (unregistered)

    Winston sounds just like me. And the way he was treated. The problem was that he was badly managed.

    The manager should have been responsible for prioritising what he was working on rather than just leaving him to "get on with it".

    The changes he made were good but of low priority compared to what he was asked to do.

    If he introduced a bug by some accidental typo that cause the code to build to something bigger, they should be working on fixing it, but they had source control and can label some of the previous version to build the release on that are known to be bug free. It is also possible to move Winston's changes onto a branch whilst restoring the previous version to the main branch and then merge in Winston's changes once they are properly tested.

    The lack of proper regression tests also seems to be an issue here.

    QA actually means Quality Assurance, and it is ensuring proper software engineering are being carried out, including testing. For many companies though QA = testing which it isn't.

  • CButtius (unregistered) in reply to JarFil
    JarFil:
    Let there be two files...

    t1.c

    #include <stdio.h>
    
    int main () {
            char buf[100000] = [1100 character buffer snipped];
            printf("%s\n",buf);
    }
    

    t2.c

    #include <string.h>
    #include <stdio.h>
    
    int main () {
            char buf[100000];
            strcpy(buf, [same 1100 char buffer]);
            printf("%s\n",buf);
    }
    

    If we do:

    $ gcc t1.c -o t1
    $ strip -s t1
    $ gcc t2.c -o t2
    $ strip -s t2
    

    We obtain:

    -rwxr-xr-x 1 user user 108808 2009-07-30 18:26 t1
    -rw-r--r-- 1 user user   1179 2009-07-30 18:26 t1.c
    -rwxr-xr-x 1 user user   6408 2009-07-30 18:26 t2
    -rw-r--r-- 1 user user   1210 2009-07-30 18:26 t2.c
    

    Over 100Kb more for a single variable.

    So, yeah, that could quite well mean an additional CD, or ten.

    In the first case you have initialised your buffer of 100000 chars and when you do that it initialises all of it, not just the first 1100 that you have in your string.

    Try using [] rather than specifying the size.

    You could also use const char * if there is no plan to write into it.

    A common thing one might do is

    char buf[BUFSIZE] = { 0 };

    which puts a 0 into every byte, not just the first one. char buf[ BUFSIZE ] = ""; does the same.

    I'm sure Winston would not have done exactly the code above, because he was careful on good coding practices and would not have used magic numbers like 15.

Leave a comment on “Mister Fix-it”

Log In or post as a guest

Replying to comment #:

« Return to Article