Admin
Create a branch from the head of two weeks ago, then apply every change made by the other developers from two weeks ago to the present. Manually fix any conflicts that occur. When done, turn the branch into your new head.
All decent revision control tools can do this, even the broken ones that require you to explicitly create and publish a branch before you start doing work. The better tools can automate most of the process. Git's documentation has an example for handling exactly this case (removing all contributions by a specific developer) on the filter-branch man page, although that example was designed more for removing copyright infringements and "oops I accidentally committed all my pr0n" than for cleaning up after overzealous developers of questionable competence.
If the developer did any real work during those two weeks, you'll have to redo it, but in the worst case you only lose two weeks of one developer's work plus any conflicts with other developers' work, not two weeks of the entire development team's work. Testing, of course, is a total loss; you'll have to redo the previous two weeks' testing and possibly more.
Admin
In fact, char[] is equivalent to char*. Each one is a pointer to an 'array' of chars, thus a pointer to the first byte of a number of bytes representing a string. The two different representations are only syntactic sugar.
AFAIR at least. :)
Admin
No, it isn't. data_string is a 15-character entry and will take up 15 bytes on the stack. Want me to prove it to you?
I took this code and compiled it with three different definitions of size, and told Intel CC to output assembly for each.
Here is the beginning of the definition of foo() with "icc -S -DSIZE=5 -O0":
Note that the size of the stack frame is set at 8 (5 bytes aligned up).
Here it is with -DSIZE=16:
Note that the size of the stack frame is set at 16.
Here it is with -DSIZE=1024:
Note that the size of the stack frame is 1032 (some padding/alignment issue? I'm not sure).
Why would the size of the stack frame change if it was a pointer holding an address?
Want another proof? Run this program:
My output (from GCC):
Don't confuse the fact that arrays decay into pointers with the fact that they are still very different creatures.
Addendum (2009-07-30 14:24): I guess for completeness I should do the pointer-based version too. Here's the source:
Here's the beginning of foo with -DSIZE=5:
Here's the beginning of foo with -DSIZE=16:
And with -DSIZE=1024:
Note how they are all the same, with a stack frame size of 12.
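A minimal C sketch of the size difference argued above, assuming any hosted C compiler. Only `data_string` and its 15-byte size come from the comment; `data_ptr` and the two helper functions are hypothetical names for illustration. `sizeof` reports the whole array for the array, but only the pointer's own width for the pointer:

```c
#include <stddef.h>

/* data_string's 15-byte size comes from the comment; the rest is hypothetical. */
static char data_string[15] = "hello";

/* In this initializer the array decays to a pointer to its first element. */
static char *data_ptr = data_string;

/* Size of the array object itself: 15 bytes. */
size_t array_size(void)
{
    return sizeof data_string;
}

/* Size of the pointer variable: typically 4 or 8 bytes, regardless of
   how much storage it points at. */
size_t pointer_size(void)
{
    return sizeof data_ptr;
}
```

Both names can be used to read the same characters, which is where the "they're equivalent" intuition comes from, but the objects they name are different sizes.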
Admin
So even though compilers don't have to do that, it looks like they do.
No, it's not. Stop spreading this lie.
Admin
There are some hardware-portability greybeards that still insist that:
{ int x; x = 2; }
...is the one true path to guarantee divine compilation.
I light their beards and run.
Admin
Not true. Allocating very large buffers on the stack is a BAD idea, since you could blow out the stack. Also, you can't return pointers to stack-allocated arrays out of a function.
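A minimal sketch of the second point, assuming the caller is willing to `free()` the result. The broken version stays in a comment: returning the address of a local array compiles (usually with a warning) but hands back storage that no longer exists once the function returns. `make_greeting` is a hypothetical helper, not code from the article.

```c
#include <stdlib.h>
#include <string.h>

/* The broken version, kept in a comment: buf lives in this function's
 * stack frame, so the returned pointer dangles once the function returns.
 *
 *     char *make_greeting_bad(void)
 *     {
 *         char buf[16];
 *         strcpy(buf, "hello");
 *         return buf;
 *     }
 */

/* Hypothetical helper: allocate on the heap instead. The caller owns the
   result and must free() it. Returns NULL if allocation fails. */
char *make_greeting(const char *name)
{
    const char *prefix = "hello, ";
    size_t len = strlen(prefix) + strlen(name) + 1;
    char *buf = malloc(len);
    if (buf == NULL)
        return NULL;
    strcpy(buf, prefix);
    strcat(buf, name);
    return buf;
}
```

Having the caller pass in its own buffer and length is the other common way around the problem; which one fits depends on who should own the memory.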
Admin
Not true. Allocating very large buffers on the stack is a BAD idea, since you could blow out the stack. Also, you can't return pointers to stack-allocated arrays out of a function. The WTF is clear to me: an overzealous dev going in and wasting company time by changing everything that personally offends him.
Admin
Whether it will blow your stack is definitely a consideration, but it's far from the only one. It may be that increasing your stack size is the right option.
Anyway, this ignores the fact that, if the example in the article is faithful, Winston or whatever his name was changed it from... stack allocation to stack allocation.
Admin
The language semantics are technically different, but under the hood, they are identical. It is not a lie.
char* == char[]
char** == char*[]
Admin
If being a good team member means continuing to write bad code so it fits in with their style - count me out!
Admin
That's not true. Multidimensional arrays are quite different from pointers to pointers, or even arrays of pointers. Multidimensional arrays are allocated as contiguous data. Reproducing that with pure pointers is not easy. Also, pointers are not necessarily constants, while arrays are, by definition.
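A small C sketch of the contiguity point, with hypothetical `ROWS`/`COLS` sizes and helper names: a declared `int grid[ROWS][COLS]` is one block whose rows sit back to back, while an `int **` table built from separate allocations only promises that each row is contiguous on its own.

```c
#include <stdlib.h>

#define ROWS 3   /* hypothetical dimensions */
#define COLS 4

/* A true 2-D array: ROWS * COLS ints in one contiguous block. */
int grid[ROWS][COLS];

/* Returns 1 if every row of grid starts exactly where the previous one ends. */
int rows_are_contiguous(void)
{
    for (int r = 1; r < ROWS; r++)
        if (&grid[r][0] != &grid[r - 1][0] + COLS)
            return 0;
    return 1;
}

/* An int ** "table" built from separate allocations: each row is its own
   block, so nothing guarantees that the rows are adjacent in memory. */
int **make_jagged(void)
{
    int **rows = malloc(ROWS * sizeof *rows);
    if (rows == NULL)
        return NULL;
    for (int r = 0; r < ROWS; r++) {
        rows[r] = calloc(COLS, sizeof **rows);
        if (rows[r] == NULL) {
            while (r-- > 0)        /* release the rows allocated so far */
                free(rows[r]);
            free(rows);
            return NULL;
        }
    }
    return rows;
}

/* Frees a table returned by make_jagged. */
void free_jagged(int **rows)
{
    if (rows == NULL)
        return;
    for (int r = 0; r < ROWS; r++)
        free(rows[r]);
    free(rows);
}
```

Both support `x[r][c]` syntax, but the generated code differs: `grid[r][c]` is a single offset computation, while `rows[r][c]` first loads the row pointer.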
Admin
http://www.cplusplus.com/doc/tutorial/pointers/
Admin
That's like saying "except for the places where they are different, they are identical".
They are allocated differently, they take up different amounts of storage and provide different information to sizeof, they provide different information to C++'s typeid. If you declare a variable of type char[N], there will almost certainly be nowhere that the address of that array is explicitly stored (unless you pass it as a parameter to another function or assign it to a pointer), because the program will access it via the stack or frame pointer.
Admin
Shouldn't the text in the article:
"copying twenty characters into a variable with a length of twenty-five"
be
"copying twenty-five characters into a variable with a length of twenty"
?
Admin
Admin
For those who have not followed the chain, the cause of the bloat is:
The original strcpy(s,"static text") means that "static text" occupies only as much memory as the string plus a null.
The new char s[xx]="static text" requires that "static text" be stored in a buffer of xx characters, null padded; in other words, a heck of a lot of nulls padded into the executable to no advantage.
All the bloat was useless null padding.
OK, granted it's a heck of a headbanger (esp. for those of us who use languages that actually know what a string is); however, I think the real problem is perhaps being ignored here.
The real problem is the unintelligent way the compiler chose to initialize the char arrays. It took the easier, lazy route. So, perhaps this is more a wtf for the compiler's optimizer design.
That said, maybe some people foolishly depend on those nulls in their code (for ugly unwise hacks), and attempts to change the optimizer were quashed long ago.
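A sketch of where that padding comes from, assuming a hypothetical 32-byte buffer: C requires any bytes of a character array that the string literal doesn't cover to be zero. Whether those zeros end up stored verbatim in the executable, or are produced some other way, depends on the compiler and on the array's storage class; the `strcpy` version, by contrast, only needs the 12-byte literal itself.

```c
#include <stddef.h>
#include <string.h>

#define BUF_LEN 32   /* hypothetical size, chosen for illustration */

/* 11 characters plus the terminator come from the literal; the remaining
   20 bytes are zero-filled because the initializer doesn't cover them. */
static char padded[BUF_LEN] = "static text";

/* Counts the zero bytes that follow the string's own terminator. */
size_t padding_bytes(void)
{
    size_t n = 0;
    for (size_t i = strlen(padded) + 1; i < sizeof padded; i++)
        if (padded[i] == '\0')
            n++;
    return n;
}
```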
Admin
And char arrays behave differently than char pointers.
What's the "underlying identical nature" if they are not implemented the same, don't mean the same thing, and don't behave the same way?
Admin
I thought that was going to be the WTF. Then I read more. Now I don't know at all.
Is there a compressing compiler that will change "123456790"x1000 into a tiny zip'd blob?
Admin
Having read everyone's comments, I'm still not seeing much of a WTF. WTFs I see:
Presumably the install CDs did not contain the source code, but the compiled executable. Without knowing more about the code, we can't know whether Winston's changes made the compiled code larger or smaller, but my gut tells me it's essentially the same (within a few MB, anyway).
Having said that, I have a related WTF. I was working on an application that, over the years, had migrated through the following:
As a result, the code had a lot of ancient cruft. Specifically, there were a lot of #define'd constants, and the popup menu code was disgusting.
Soon after I started, I suggested changing these to const variables (after all, that's what the const keyword is for), since we'd get typechecking and such in return for the effort. This went very far down on the to-do list.
Eventually, just after a major release, I was told to do it, so I did. Magically, nothing broke, and I'm sure some bugs disappeared once I fixed the resulting syntax and linker errors. (Undefined variable? Duplicate definition? Say it ain't so!)
That's not the WTF, that's actually a good story. The WTF is that several months later, that same manager fired me from the team for taking some convoluted menu code (~2000 lines over four source files) and rewriting it cleanly (~200 lines over one source file). He claimed it was outside the scope of my assignment (which was to add another menu option in said menu code). Never mind that it took less time to rewrite than it would have taken to add the new menu option to the spaghetti.
So apparently, according to that manager, cleaning up code is only OK when it has been explicitly assigned, even if said cleanups are directly related to one's current assignment.
Also, builds were done in a most horrific manner. Source files were grouped by function, so files related to "foo" were in the "foo" folder. We'd have foo/bar1.cpp, foo/bar2.cpp, foo/bar3.cpp, and so on.
This same manager decided that in order to speed up complete rebuilds, he'd do the following in an all_foo.cpp:
He then excluded the original files from the build. This did result in faster complete rebuilds, at the expense of slower partial rebuilds (i.e. what we do all day) and BREAKING FILE SCOPE RULES.
Yeah... we had lots of fun tracking down compile errors due to programmers assuming that they could trust file scope rules to hold. We never did convince the manager to revert those changes.
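Beyond the type checking cited above as the reason for replacing #define'd constants with const variables, one commonly cited hazard of `#define` constants is that they are plain text substitution. A sketch with hypothetical names follows. Note that in C (unlike C++) a `static const int` is not an integer constant expression, so it can't size a file-scope array; that is one reason C code often uses `enum` for such constants instead.

```c
/* Hypothetical constants for illustration. */
#define MENU_WIDTH_MACRO 10 + 2               /* text substitution: no type, no parentheses */
static const int menu_width_const = 10 + 2;   /* a typed int object */

/* Expands to 10 + 2 * 2, which is 14, not 24. */
int doubled_macro(void)
{
    return MENU_WIDTH_MACRO * 2;
}

/* Uses the value 12, so the result is 24. */
int doubled_const(void)
{
    return menu_width_const * 2;
}
```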
Admin
(Not to say you can't depend on it in unwise ways, just that depending on it doesn't necessarily make it unwise.)
Heck, it's actually a really convenient shorthand. If you want a zero-initialized array, "int a[100] = {0};" is way better than "int a[100] = {0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0};".
That said, compilers probably could make that code better by not emitting all of the zeros in the data segment of the exe, and basically calling memset in its place dynamically. I doubt it'd be appreciably slower.
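A sketch of both approaches mentioned above, in C: the `{0}` initializer relies on the rule that elements without an explicit initializer are zeroed, and the `memset` version fills the array at run time. This only shows that the two produce the same contents; it says nothing about what a particular compiler emits into the executable. The names are hypothetical.

```c
#include <string.h>

/* Only the first element is given; the other four are zeroed. */
int partial[5] = {7};

/* Returns 1 if the initializer-based and memset-based arrays match. */
int zero_init_matches_memset(void)
{
    int a[100] = {0};           /* zero-filled by the initializer rule */
    int b[100];
    memset(b, 0, sizeof b);     /* filled at run time instead */
    return memcmp(a, b, sizeof a) == 0;
}
```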
Admin
Especially since in many (most?) situations like this it could be done at startup, and adding a few ms to an app's startup time is rarely an issue. They could probably make it configurable behavior (a compiler flag).
Admin
Admin
Wow. It's like saying that a pointer and an int are technically different, but under the hood, they are identical. They both occupy 32 bits, after all. On x86, at least. IIRC, early K&R C didn't see a difference.
Go read up on lvalues and rvalues. And also look at the type of a pointer to char* versus a pointer to char[10].
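A sketch of the two pointer types being contrasted, using hypothetical variable names: a `char (*)[10]` points at a whole 10-byte array, so `pa + 1` steps 10 bytes, while a `char **` steps by the size of one `char *`. Assigning `&buf` to a `char **` is a type mismatch the compiler diagnoses.

```c
#include <stddef.h>

char buf[10];          /* an array of 10 chars */
char *str = buf;       /* a pointer to char (buf decays here) */

char (*pa)[10] = &buf; /* pointer to an array of 10 chars */
char **pp = &str;      /* pointer to a pointer to char */
/* char **bad = &buf;     would be diagnosed: char (*)[10] is not char ** */

/* Bytes moved by one step of pointer arithmetic on each type. */
ptrdiff_t step_array_ptr(void)
{
    return (char *)(pa + 1) - (char *)pa;   /* sizeof(char[10]) == 10 */
}

ptrdiff_t step_pointer_ptr(void)
{
    return (char *)(pp + 1) - (char *)pp;   /* sizeof(char *) */
}
```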
Admin
Admin
I don't know what you're saying here, but you can assign a pointer to an array all day long... The article you linked to even says:
The following assignment operation would be valid:
If you were taking a stand for char[] being different than char*, you're on the right side of the argument, but...
Admin
Confusion over what language elements mean, what compilers do with them and underlying memory structures is common among beginners.
Admin
You have my sympathy. We have always avoided needing to do anything more than check out and check in with SourceSafe. Was rolling back that many files as fun as it has always looked?
We are now in the process of moving to TFS. Sadly, SourceSafe contains LOTS of shared files, so a fair bit of re-organisation has to happen to each project when it is moved.
Admin
So in your crazy moon language, assigning an array to a pointer is the same as assigning a pointer to an array?
Admin
Of course that stack variable 'fexternal' is going to cause a random crash somewhere (in release builds). (Initialization is good!)
Admin
This compiles with no warnings.
Admin
Some relevant words bolded. In the world of computer languages, the "meaning" of a language construct is its "behaviour", that's why the term "semantics" was borrowed from linguistics. If you used it in the linguistic sense in your first post, fine, but you should have said that since it's usually used in the CS sense when talking about programming language constructs.
They "behave" differently in the sense that the compiler produces different machine code for either of these structures, e.g. illustrates that. Yes, in many situations an array decays into a pointer, but not in all. Most of the time you can regard arrays and (const) pointers as equivalent, but not always.
Admin
I compiled this with 'gcc -g', loaded the resulting executable in GDB, set a breakpoint at main(), ran it, and stepped over the two initializations. I used "print &x" to get the address of x (0xbf9aad40) and "print &array" to get the address of array (0xbf9aad3a). I took a memory dump around those addresses with "x/20x 0xbf9aad20", and I've pasted the output from that command below. I've highlighted the string "hello\0" (0x68, 0x65, 0x6c, 0x6c, 0x6f, 0) and the value 0xABABABAB:
If you look around these values, there are a lot of pointers to areas around this: 0xbf9aadec, 0xbf0aade4, etc. But you'll notice a conspicuous lack of any pointer to 0xbf9aad3a. That's because there isn't one.
Still not convinced? Think that the initialization is overwriting the value of the pointer or something like that? The program compiles with g++ too, and with -Wall -Wextra the only warnings are about the two unused variables. Still not convinced? Here's a new version of the program:
GDB reports 'array' is at 0xbfdca956. Here's a memory dump:
Where is this mythical pointer that you think 'array' actually is? Again, there isn't one anywhere near that address. Furthermore, the string "hello\0" is now stored in the stack frame. If array were a pointer, the string would be stored at whatever address that pointer held.
Then you're wrong about what you know, or we're actually agreeing on everything except what it means to be "the same" (where for you, 'being the same' apparently means "being the same in everything except how it behaves, how it is implemented, and what it means to the programmer"). I just taught a course on compilers this past semester. Want to have a "who knows more about compilers"-off?
Addendum (2009-07-30 16:32): BTW, in case it's not obvious why "The program compiles with g++ too, and with -Wall -Wextra the only warnings are about the two unused variables" is relevant: C++ doesn't allow implicit conversions between pointer and non-pointer types. So the fact that my program has no casts and doesn't produce an error means that "hello" isn't being treated as a pointer or something nonsensical like that.
Admin
Admin
Ha! I deserved that. I misread what he was saying and then confused my own wording... I thought he was saying that assigning the address of the array to a pointer was somehow impossible. I guess he meant the char[] was like a constant pointer that couldn't be re-assigned to point to something else...
Oh well. I fail for the day!
Admin
Then it's no surprise that it's the only way I could think of to get compiling code that does it ;)
Admin
And there's no way you can know that without checking each case. So if you have a large project with lots of possible usage of uninitialized variables, you can't just add initialization everywhere willy-nilly and claim you fixed the uninitialized variables problem.
Admin
Admin
Having written a compiler would seem to be an advantage.
Admin
template<typename T> boost::shared_ptr<T>;
doesn't deal with the heap fragmentation issue... but then that is generally an issue that should be deferred to the OS memory manager anyway.
does prevent any possibility of a memory leak, as long as all your references are getting cleaned up somewhere. circular references can be a beotch. (that's what boost::weak_ptr<T> is for)
huge stack variables are never a good idea. you can never guarantee you're only "so deep in the stack"
truth
Admin
I really don't believe that you can go ten years without tripping up on this basic semantic difference.
Frankly, I don't believe that you can manage more than a couple of weeks. Which probably brings us back to the OP.
Admin
No, the WTF is the process, because the world is full of Winstons, and if your process can't deal with them, it's too fragile.
Admin
My research into compilers (and constructing my own) and formal language/software development (primarily via Z) has also stood me in good stead for understanding what really happens 'under the hood'.
I think what we really have here, as they say, is a 'failure to communicate' ;)
Admin
Fine. I'll be explicit.
Pointers and arrays differ in:
Meaning to the programmer, in that the array positively indicates that it's actually an array of some size, while a pointer could potentially point to just one element, and you have to know the context to know whether it points to one element or to an array. (It's debatable how big this difference is. After all, an array could be one element large (which is still slightly different from just an atom, but not in ways I can think of that are likely to matter on any reasonable implementation), and to work safely on an array you need to know more context, namely its size, anyway.)
Implementation. I've already asserted with what I feel is quite a bit of evidence that "char a[10]" does not, in fact, introduce a pointer. It definitely allocates the array's storage with automatic storage allocation, whereas with "char * p" you'll need to malloc some space (or find it elsewhere). Accesses to "a[i]" from the containing function are likely to be relative to the stack pointer or frame pointer rather than through a pointer to the array proper. Accesses to "p[i]" are likely to go through the pointer p. (Constant folding might eliminate this if the pointer is assigned to point to a local array, but you'll be able to see this if you don't turn on optimization or if you break the constant propagation by introducing an alternate path.)
Behavior. sizeof() returns different results for pointers and arrays. typeid() in C++ returns different results for pointers and arrays. Same with typeof() in GCC. You will blow your stack faster with "char a[100];" than with "char * p = malloc(100);". The array allocated in the former declaration will be deallocated automatically upon exit from the containing function; the latter won't.
These things are, as far as I'm concerned, not subject to debate. I feel they are a reasonable enough catalog to declare that "saying that char* and char[] are equivalent is a lie". If you feel that, despite these differences, you want to say they are equivalent, fine, go ahead.
Admin
Semantics is the word of the day... Semantics 3. the meaning, or an interpretation of the meaning, of a word, sign, sentence, etc.
I seriously doubt that he meant the compiler was going to create a pointer that then pointed to the memory location of the 10 characters... He was saying that once char a[10] is created, using the variable 'a' is the same as using a constant pointer. You can dereference it with *, you can do pointer addition with it, you can use the [] operator... After creation it functions the same as a pointer. 'a' points to the first char in the array just like a pointer would. 'a+1' points to the second character, etc.
I know it isn't an actual pointer, but you can surely admit that the usage is identical.
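A sketch of where the usage really does coincide, and where it stops, with hypothetical names: subscripting is defined as `*(a + i)`, so all three forms below read the same element, but the array name itself can't be assigned or incremented the way a pointer can (and, as noted elsewhere in the thread, `sizeof` still differs).

```c
char word[] = "hello";

/* word[i] is defined as *(word + i), which is also why the odd-looking
   i[word] compiles and reads the same element. */
int subscript_forms_agree(int i)
{
    return word[i] == *(word + i) && *(word + i) == i[word];
}

/* The array name can't be reseated:
 *     word = "other";   error: arrays are not assignable
 *     word++;           error: likewise
 * A pointer initialized from it can. */
const char *second_char(void)
{
    const char *p = word;
    p++;
    return p;
}
```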
Admin
I don't understand why so many people are saying that Winston's change was an improvement even in coding style, much less in actual practice.
Assigning to a variable on the same line where you declare it, then declaring another variable on the next line, constitutes putting a declaration after a statement.
Mixing declarations and statements that way is bad. (Proof left as an exercise for the reader, because I'm about to go home.)
Therefore, even just by the example given, this was a change for the worse in terms of good coding style.
Not necessarily on the level of a WTF just for that, but still. How people can say that moving from "declare everything first, then start working with it" to "start working with things while you're still declaring them" is an improvement is bewildering to me.
(The change is also bad in that, as someone else pointed out, just because you've initialized a variable doesn't mean you've given it the correct value. If it would have been used uninitialized before the change, you'd have gotten a warning; after the change, the compiler sees the initialization and doesn't issue the warning, and as a result you lose that pointer to the possible cause of a problem.)
Admin
Addendum (2009-07-30 17:17): And compiler diagnostics.
Addendum (2009-07-30 17:25): And sometimes when & is involved:
Addendum (2009-07-30 17:27): Bwuuuu? My last example's wrong... must have pulled the wrong version off the clipboard (ctrl-v vs. middle-click).
Admin
The Real WTF is how this article was written without a WTF. That's right, it was an article that involved ALL of the readers!! The future of the web. This, my friends, is web 3.0. Log in today to experience a REAL wtf!!!
Admin
You are cherry picking your examples. There are instances where char[] and char* are not interchangeable.
If you think none of this matters, I have seen real bugs caused when arrays were changed to pointers, in code such as the following:
Original code:
As long as a is an array, the code works. As soon as a is changed to a pointer, the code stops working.
Broken code:
Of course, the correct code would've been:
To be honest, I have seen this kind of confusion between (&some_array_name and some_array_name) from junior and senior coders alike, unfortunately.
So please don't tell me char [] and char * are interchangeable. Again, my example is taken from a real bug, caused by a programmer changing char[] to char* in real code. (The variable in question was changed from a statically defined string array to a passed-in string pointer.) The bug was caught well before release, though.
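The original code isn't reproduced in the comment, so this is a hypothetical reconstruction of the general pattern it describes, not the actual bug: while `arr` is an array, `&arr` and `arr` name the same address, but once the name refers to a passed-in pointer, `&p` is the address of the pointer variable itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a "statically defined string array". */
char arr[8];

/* &arr and arr differ in type (char (*)[8] vs char *) but hold the same
   address, so code that mistakenly uses &arr still happens to work. */
int array_addresses_match(void)
{
    return (void *)&arr == (void *)arr;
}

/* For a passed-in pointer, &p is the address of the parameter itself,
   which is not where the characters live. */
int pointer_addresses_match(char *p)
{
    return (void *)&p == (void *)p;
}

/* Passing the name itself, not &name, is correct in both cases. The size
   must come from the caller, since sizeof on a pointer only gives the
   pointer's width. */
void fill(char *dst, size_t n)
{
    memset(dst, 'x', n);
}
```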
Admin
The simple code:
...viewed as assembler... highlights how char* and char[] are essentially no different 'under the hood'. Yes, semantically they mean different things (as I said in my original post). Here, they are both pointers, albeit local, where the array declaration resolves to an implicit pointer. This is why the "[n]" syntax is essentially a char* dereference plus an offset. This is why you can happily interchange char* and char[] without problems.
Interestingly, gcc now deprecates code like:
char* blah = "blah";
...and would prefer that you use:
char blah[] = "blah";
...and does so to enforce the semantics of the language, not because of the compiler output.
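A sketch of what the two declarations actually create, in C with hypothetical names: `char blah[] = "blah"` makes a writable 5-byte copy of the literal, while the pointer refers to the literal itself, which must not be modified. For what it's worth, the warning usually seen here comes from g++ in C++ mode, where converting a string literal to non-const `char *` is deprecated (and ill-formed since C++11), so `const char *blah = "blah";` silences it as well as the array form does.

```c
#include <string.h>

/* A writable array holding a copy of the literal, terminator included. */
char blah_array[] = "blah";

/* A pointer to the literal itself; modifying a string literal is
   undefined behavior, so const makes the compiler reject attempts. */
const char *blah_ptr = "blah";

/* Changing the array copy is well defined and leaves the literal alone. */
void capitalize_first(void)
{
    blah_array[0] = 'B';
}
```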
Admin
I worked on some code where the original developer always passed strings around using "&mystr[0]" regardless of whether it was a pointer or an array he was working with. I never did figure out why.