- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- It Figures
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
This is nonsense as local variables are never placed into the BSS segment but dynamically allocated on the stack.
Admin
The WTF is that Winston was apparently never accepted as a team member and properly introduced into the current practices. Yes, touching all files for "code cleanup" without coordinating this with the other team members was certainly not helpful. But Winston made no secret out of it:
Hence it would have been wise to tell him to which extent such code cleanups are welcomed and to be done. In addition, his initial suggestions were not welcomed and brushed off with an unfriendly comment:Simply put: If you have no time to introduce a new team member into your team, do not employ one. Or tell at least when you have time for him and what he can do or shall not do until a proper introduction takes place. Poor communication is an invitation of disaster or at least a waste of resources.
Admin
What's the picture of the bashed-up Saxo got to do with anything?
Admin
Whoosh...
Clearly, within the context of a mangled, pointless story, a pointless photo of a mangled car and a passing obscure reference to C makes perfect sense.
TRWTF is that "Winston" probably submitted this story, not "Chris" or whatever his name was. "I joined a company with no process and horrible code. I fixed some of the horrible code. They had a hissy fit because I didn't follow their unwritten, unstated, bass-ackwards 'process'. WTF?"
Admin
I am not as good with C as many ppl here appear to be, but I wrote a simple test application that does char a[] = "Short String". It appears that the string is in fact copied to the stack. Nothing like that happens with char *a. The code is as following:
I compiled this with gcc -S -O0 test.c -o test.asm (GNU C compiler, to assembler only, no optimizations), which resulted in:
Notice that space for the string is allocated on the stack (subl $40, %esp), and the value of the string is promptly copied from the heap, one "long" at a time (movl .LC0, %eax; movl %eax, -17(%ebp), etc).
Comments anyone?
Admin
Um, so you guy are honestly saying you don't see the WTF in changing every single file in a large application, even if it's trivial, that's still a lot of changes to go trough. SCM history also becomes quite useless. The rule is, if you're not touching something, and it works, don't touch it.
Admin
Today's lesson in arguing on the Internet:
Once you realize that you're obviously wrong and everyone else knows it, it's a good idea to stick to your position anyway and try to change the definition of your own statement so that it sounds more correct. You will always look more intelligent and mature this way than if you had simply admitted you were wrong.
Admin
Admin
Erm...what? Are you suggesting that a function that is caused recursively will enlarge my binary with multiple copies of the code constants? I will just assume you mixed that sentence up, stating the reverse of what you meant to. Either way, I don't see the point.
Admin
I must correct this a little bit...
WTF: doing repository wide changes just before release on the main repo(ie not doing it in a branch).
WTF: not doing (even repository wide) maintenance at all and letting your huge code decay and rot as changes stack-up. Typical symptoms: strange workarounds to limitations of working, don't touch it subsystems. But hey - if you need workarounds, it's not working anymore! Over time workarounds tend to pile on each other leaving you with a total mess. Then you leave company then and let some poor freshmen to deal with the sh1t.
Disclaimer: of course it depends on the nature and scale of the project. In a lot of cases hacking type maintenance is just good enough, BUT in this particular story they deal with project where any major technological/design debt will hurt big time later on.
Admin
Yes, congrats to the discovery(but... copied from heap?), but seriously - read the discussion first, there are like two pages of char[] vs char* already :/
Admin
The cardinal rule among S/W developers: If it ain't broke, don't fix it. Lots wasn't broke, and he fucked with it. This is most definitely a WTF.
Admin
I've lost track of the discussion above, maybe we can make a summary? But first, what is wrong with "copied from the heap"? I thought that assembler code
results in .LC0 being a pointer to char[] "Short string", located on the heap, and instructions
moved the contents of that string into the stack. Or is something wrong with that? Where is .LC0 located?
Admin
char array is effectively a char pointer "under the hood" in the same way that multiplication is effectively addition in a loop "under the hood"
Admin
Admin
Actually, neither is. Who the hell puts 100k of data on the stack? Can you say stack underflow?
Admin
Indeed, it's stated in there :o) It's exaggerated example of interesting side effect of stack-based array initialization. It can hurt in similar way in terms of resulting program-size if those arrays are small, but plentiful.
Admin
So, the following:
file1.c: int foo[100];
file2.c: extern int *foo;
int bar() { *foo = 10; }
Should be OK then, since they are the same? But, no, it is not OK, because they are not the same.
Just because an array decays to pointer to the first element when (in most cases) used in expressions does not mean they are the same.
int foo[10]; int *bar;
foo is not a modifiable lvalue, bar is. foo has a static size 10*sizeof(int) that can not be changed, bar can be pointed to any piece of memory basically and only the size of the pointer remains constant.
And so on, and so on...
Admin
Nice to see you're around Winston :)
Admin
The fact that you can treat them almost the same (or said another way, you can usually treat them the same) leads people to say that arrays "are" pointers, "are equivalent to" pointers, and other similar false statements, that these statements lead to confusion about what the differences are, and this confusion leads to sometimes-hard-to-diagnose bugs.
LC0 is located in a read-only section of the program image. The heap is specifically where dynamically-allocated memory comes from -- it's the part that malloc touches.It would be much more accurate to say that LC0 is stored in the same memory region as the actual program code, even though that's not quite true either.
Technically speaking, if you're on a Linux system say, it will be in a section called "rodata" (for read-only data). This is in contrast to "data" (used for globals and statics), and "text" (used for program code). (I'm sure Windows's PE format has equivalent sections, but I'm not sure if they have the same names or not.)
Admin
Forget "under the hood". While pointers and arrays have many similarities, they also have many practical differences.
Like, most obviously, try compiling this code:
The last statement will not successfully compile. Therefore, I cannot use an array in exactly the same way that I use a pointer. QED. End of story.
Or:
The first strcpy is completely valid and legal. The second strcpy will, if you're lucky, generate a runtime error. If you're not lucky, you'll overwrite a random memory location.
Or:
On my system, this compiles successfully, but the second strcpy throws a segmentation fault, because we're trying to copy into the constant pool, which is not allowed. A compiler that did allow it would be setting you up for some nasty bugs. Most compilers share constants, so changing a would also mysteriously change c. And as "Goodbye" is longer than "Hello", we would be overwriting 2 bytes that might well be allocated to some other constant, perhaps mysteriously changing d to "ye". (Depending on how memory was laid out.)
Admin
Oh gosh, you're right! People who prefer good coding practices are the problem. I'm sure glad that nobody at my last job gave a rat's about coding standards. Consistent, legible formatting? Proper initialization of variables? Why, if everybody had the attitude that these things are important, it would be a disaster! It's much better to stumble along with illegible, uncommented code, blindly trying to fix undocumented spaghetti. If people had cared about code standards, maybe all those bugs I had to track down wouldn't have existed in the first place, and that would definitely be a real WTF.
Admin
I did some inconclusive tests on things I wasn't completely sure about...
I compiled without any extra flags, so no optimization nor debug symbols, etc.
Sample output:
---Output begins after this line--- (long)const_str1=0x27ff30 (long)const_str2=0x27ff20 (long)str1=0x27ff10 (long)str2=0x27ff00 (long)const_ptr_str1=0x403000 (long)const_ptr_str2=0x403000 (long)ptr_str1=0x403000 (long)ptr_str2=0x403000
const_str1="data data data" const_str2="data data data" str1="data data data" str2="data data data" const_ptr_str1="data data data" const_ptr_str2="data data data" ptr_str1="data data data" ptr_str2="data data data" ---Output ended on previous line---
As you can see, all of the strings point to "data data data", but not all of them point to the same copy. The pointer types all point to the same one, likely the static data in the executable (I'm unfamiliar with executable formats and assembly, but I /think/ the region would be referred to as the data segment). The array types all point to their own copies (each of them on the stack, which according to Wikipedia is not considered the data segment). None of this should be a surprise.
Nowhere near conclusive, but when I changed the array types to be uninitialized and then strcpy'd TEH_STR into them,...
...the executable size grew (only about 200KB, with those 4 variables), suggesting that it was actually more space efficient to initialize them inline. That said, when I added optimization (MinGW/GCC with -O2), the executables from both versions came out the exact same size (albeit, not exactly the same). I'm not really an "engineer" so I don't claim to know the low-level details of what changes, etc.
I don't really see anything wrong with the changes Winston made, aside from re-factoring code that may have been fine, instead of properly diagnosing the problem. I've been guilty of re-factoring working code as well (particularly when I can't make sense of it in its current form). My instinctual thought when seeing the original code in the OP was probably similar to that of Winston's (albeit, I'm only an intermediate C programmer; and not even in a professional context with it ATM). That doesn't make pre-release time the appropriate time to refactor. :P
I definitely think that coding standards and commit policies are important to have. In a perfect world, all solutions would be group efforts and all code would be peer reviewed... And intern's should be supervised and reviewed to both guide them to proper solutions and avoid catastrophic losses...
Addendum (2009-07-31 13:30): My bad. It only grew about 200 bytes. That makes so much more sense. :D
Admin
Great, thanks for a good explanation Evan. Could just be that you saved me from being fired or put on a dead-end job like poor Winston. So, if I may attempt to sum up the reasoning behind the WTF:
buf is pointing to a region of memory called "rodata" on Linux, which can roughly be thought of as a part of actual program code. buf is an array allocated on the stack, exactly as long as the length of the string plus zero. Chars are automatically copied to the stack from rodata. same as above, but with exactly 10 bytes allocated on the stack. Due to language specifications, the unused bytes have to be filled with zeros. Sometimes this leads compiler to interpret the line as 'char buf[10] = "String\0\0\0"', adding, in this case, three bytes to rodata and thus to the size of the binary. Which is the reason why Winston's changes increased size of the binaries, presumably enough to fill two CD's. Which is what this WTF is all about.Okey, hopefully this clears it up a little.
PS: For completeness, we also have
Allocates 10 bytes on the stack. Content is not specified. Compiler is likely to leave it untouched, and whatever junk happened to be in that region of memory simply stays there.Admin
Yes, tools where branching/merging difficult would make this a pain in the ass. The solution is to get better tools, not to dumb down the process to accommodate bad tools.
Admin
All of the above presumes that declarations are made inside a function, otherwise they are of course going on heap :)
Admin
Not sure what you mean by a "complex type" (possibly one with a well-formed constructor, ie C++), but: no, bools do not auto initialize. Not on the stack, not on the heap. A shitload of inadequate programmers assume that they initialize to false (why? why? why? why not file_not_found?). These people do not understand how memory works, and should have theirs erased by the Matrix.
I've been working with these bozos for twenty-plus years, now. Things have not got better. There are dangerous raving maniacs out there.
It's not difficult, people:
(1) Understand the basics of a von Neumann architecture. (2) If using C, understand the semantics. Not just the syntax. The compiler will hit you over the head with the syntax, but it has absolutely no clue whatsoever with the semantics.
That's your effing job.
Admin
In my company it's really simpel. An engineer can do anything to the code he wants. The diffs are looked at by another engineer chosen at random. If that engineer agrees, it gets committed in the repository.
Our code quality is 'reasonably all right', because of this peer review process.
A lone engineer touching every file because of a pet peeve would not get integrated. Unless it really is an improvement.
Software engineering isn't a hard job. It just requires to set ego's aside and focus on the group effort. Such is life.
Admin
Admin
Admin
Admin
I hope I'm not the only one who thinks that describing am over-thorough code tidy-up as an "all-encompassing carnival of carnage" is a fantastic overstatement? It sounds like the code was messy; perhaps the TWTF is that the code was able to get into this state in the first place?
Admin
Admin
I think the second CD was needed because the old version was still there. Even with the possible issues that the change might have caused (and I'm not convinced they were severe or not addressed or fixable) you wouldn't expect a full blow-up. The Real WTF® is not having CVS or something similar and lack of oversight. And perhaps patience.
Admin
It appears Winston didn't know that char buf[10] = "String" leads to padding and binary size increase with some compilers. Otherwise, I think it is sensible to fix code like
because this code obfuscates bugs. He should have changed it to
Admin
Ok, so management hired a guy because he said he likes to destroy bad code, and when he does that, is removed of his ability to do so. Is that the WTF?
Admin
There is one thing that can cause problems with memory leaks in the original code.
Imagine this: void function() { int loopTill = 10; char szFoo[5]; sprintf(szFoo, "12345");
for(int i=0; i<loopTill, i++) { //initialize memory here or //or structures that say how much you should allocate }
since the data is on stack, buffer overflow will change the value of loopTill, which in turn can cause havoc somewhere down the road. Of course, endiannes, optimizations, security cookies and other things can change all of this, but the risk is there.
Of course, before shipping that's not a good thing to change a lot of code.
Admin
Admin
That is exactly the problem, a common mistake will cause variable pushed on stack to be overwritten, in sprintf operation.
sprintf will fill the buffer and the value of int before array will get changed.
The code is wrong, but it illustrates the problem.
Admin
Ah, I see, sorry. Good example.
Admin
What am I missing here? What exactly is the WTF?
Admin
I'm not sure where the confusion is coming from, but the WTF is clear. You can't just replace "int x; ... ; x = 2" with "int x = 2;" everywhere because it leads to bugs if you have code like this:
int x; ... while (something) { x = 2;
// Do something that depends on x initially being 2 // and which also changes the value of x }
Admin
I think this must have been a patch being sent to the client - just the files that had been changed, which should have fit on on 8 inch floppy if Winston hadn't changed every file.
Admin
Disc two doesn't do anything. That's the beauty of it!
Admin
I am glad that I don't see it being deprecated (gcc 4.1.2). The two declarations aren't the same and have different uses. Only the pointer declaration allows me to re-assign blah to a new char array. Using the char[] blah = "foo\n"; would break things. Another example where pointers and arrays are not the same.
Admin
A more pointed example of the difference between pointers and arrays - a mistake I've seen more than a few times:
zoe and frog are both declared as external character pointers. So this file should work just fine:
Compile them with debug, and run under gdb:
It looks like you need to pay attention to the difference between char * and char[] - they aren't the same. Change the extern declaration of zoe to extern char zoe[]; and things work. Now for all the C experts out there who think pointers and arrays are the same: What's up with that then?
They aren't the same and pretending they are is a recipe for disaster. Note that the incorrect program compiles without errors, it simply doesn't work like you'd expect if you are under the misapprehension that type * == type[]
Admin
There are two major things about his code:
char * a = "aaa aaa aaa"; is pointer to the constant in c++ (you might get a warning).
some compilers might want to optimize and merge all declarations like char * a = "aaa aaa aaa"; char * b = "aaa aaa aaa"; so variable "a" and "b" point to the very same memory. This might cause interesting behaviour like a[4] = 'c'; and b becomes "aaa caa aaa";
Admin
TRWTF is that this story made it onto TDWTF.. He improved the code and you're lambasting him for it. Sure it was a waste of time when they're about to deliver, but if the bugs are fixed, what's the problem? You didn't even mention if he fixed the memory leak.
Admin
Winston sounds just like me. And the way he was treated. The problem was that he was badly managed.
The manager should have been responsible for prioritising what he was working on rather than just leaving him to "get on with it".
The changes he made were good but of low priority compared to what he was asked to do.
If he introduced a bug by some accidental typo that cause the code to build to something bigger, they should be working on fixing it, but they had source control and can label some of the previous version to build the release on that are known to be bug free. It is also possible to move Winston's changes onto a branch whilst restoring the previous version to the main branch and then merge in Winston's changes once they are properly tested.
The lack of proper regression tests also seems to be an issue here.
QA actually means Quality Assurance, and it is ensuring proper software engineering are being carried out, including testing. For many companies though QA = testing which it isn't.
Admin
In the first case you have initialised your buffer of 100000 chars and when you do that it initialises all of it, not just the first 1100 that you have in your string.
Try using [] rather than specifying the size.
You could also use const char * if there is no plan to write into it.
A common thing one might do is
char buf[BUFSIZE] = { 0 };
which puts a 0 into every byte, not just the first one. char buf[ BUFSIZE ] = ""; does the same.
I'm sure Winston would not have done exactly the code above, because he was careful on good coding practices and would not have used magic numbers like 15.