The Daily WTF: Curious Perversions in Information Technology

2009-07-30 Reply Admin

TheCPUWizard:
Regarding those who just said "Roll it back, no biggie"...
Consider if Winston checked in 20% of the files each day during the two weeks.

Each morning every other developer "refreshed", made (functional) changes, and checked in their code.

If you roll back to before Winston touched it, you also lose all of the other developers' work!!!

(Now if Winston had worked on a BRANCH, and then did an atomic merge.......)

Create a branch from the head of two weeks ago, then apply every change made by the other developers from two weeks ago to the present. Fix any conflicts that occur manually. When done, turn the branch into your new head.

All decent revision control tools can do this--even the broken ones that require you to explicitly create and publish a branch before you start doing work. The better tools can automate most of the process. Git has an example in its documentation for handling exactly this case (remove all contributions by a specific developer) on the filter-branch man page, although that was designed more for removing copyright infringements and "oops I accidentally committed all my pr0n" than for cleaning up after overzealous developers of questionable competence.

If the developer did any real work during those two weeks, you'll have to redo it, but in the worst case you only lose two weeks of one developer's work plus any conflicts with other developers' work, not two weeks of the entire development team's work. Testing, of course, is a total loss--you'll have to redo the previous two weeks' testing and possibly more.

2009-07-30 Reply Admin

In fact, char[] is equivalent to char*. Each one is a pointer to an 'array' of chars, thus a pointer to the first byte of a number of bytes representing a string. The two different representations are only syntactic sugar.

AFAIR at least. :)

EvanED · 2009-07-30 Reply Admin

Lupus.Umbrae:
Oxyd:
guy:
char data_string[15] = "data data data";
Doesn't this mean he overwrites the pointer data_string with the pointer to "data data data". Since both are of type char*?
data_string is of type char[15], not char*. In this context, this syntax is a mere syntactical sugar for char data_string[15] = { 'd', 'a', 't', 'a', ' ', 'd', 'a', 't', 'a', ' ','d', 'a', 't', 'a', 0 };. So no, he just initialized an array.
Actually, data_string IS a char*, pointing to data_string[0]. So, he allocated a 15 byte array and gave it the address of... a 15 byte long static string? Weird...

No, it isn't. data_string is a 15 character entry, and will take up 15 bytes on the stack. Want me to prove it to you?

void foo() { char blah[SIZE] = "hi"; }

I took this code and compiled it with three different definitions of size, and told Intel CC to output assembly for each.

Here is the beginning of the definition of foo() with "icc -S -DSIZE=5 -O0":

foo:
..B1.1:                         # Preds ..B1.0
        pushl     %ebp
        movl      %esp, %ebp
        subl      $8, %esp
        movb      $104, -8(%ebp)
        movb      $105, -7(%ebp)
        xorl      %eax, %eax

Note that the size of the stack frame is set at 8 (5 bytes aligned up).

Here it is with -DSIZE=16:

foo:
..B1.1:                         # Preds ..B1.0
        pushl     %ebp
        movl      %esp, %ebp
        subl      $16, %esp

Note that the size of the stack frame is set at 16.

Here it is with -DSIZE=1024:

foo:
..B1.1:                         # Preds ..B1.0
        pushl     %ebp
        movl      %esp, %ebp
        subl      $1032, %esp

Note that the size of the stack frame is 1032 (some padding/alignment issue? I'm not sure).

Why would the size of the stack frame change if it was a pointer holding an address?

Want another proof? Run this program:

#include <stdio.h>

int main() {
  char * p;    
  char arr[15] = "hi";

  printf("p:   %zu\narr: %zu\n", sizeof(p), sizeof(arr));
  return 0;
}

My output (from GCC):

p:   4
arr: 15

Don't confuse the fact that arrays decay into pointers with the fact that they are still very different creatures.

Addendum (2009-07-30 14:24): I guess for completeness I should do the pointer-based version too. Here's the source:

#include <stdlib.h>
#include <string.h>
void foo() { char * p = malloc(SIZE); strcpy(p, "hi"); }

Here's the beginning of foo with -DSIZE=5:

foo:
..B1.1:                         # Preds ..B1.0
        pushl     %ebp
        movl      %esp, %ebp
        subl      $12, %esp
        addl      $0, %esp
        movl      $5, (%esp)
        call      malloc

Here's the beginning of foo with -DSIZE=16:

foo:
..B1.1:                         # Preds ..B1.0
        pushl     %ebp
        movl      %esp, %ebp
        subl      $12, %esp
        addl      $0, %esp
        movl      $16, (%esp)
        call      malloc

And with -DSIZE=1024:

foo:
..B1.1:                         # Preds ..B1.0
        pushl     %ebp
        movl      %esp, %ebp
        subl      $12, %esp
        addl      $0, %esp
        movl      $1024, (%esp)
        call      malloc

Note how they are all the same -- with a stack frame size of 12.

EvanED · 2009-07-30 Reply Admin

Bim Job:
A lot depends upon your choice of platform, compiler, and compiler flags. You aren't using flags, which isn't very enterprisey (or indeed typical). Checking your results with msys and gcc 3.4.2, they seem very plausible.

To be fair, I tried it with (GCC | Intel CC)(-O0 | -O2 | -Os) in all six configurations, and all six exhibited this difference.

So even though compilers don't have to do that, it looks like they do.

ollo:
In fact, char[] is equivalent to char*. Each one is a pointer to an 'array' of chars, ...

No, it's not. Stop spreading this lie.

2009-07-30 Reply Admin

There are some hardware-portability greybeards that still insist that:

{ int x; x = 2; }

...is the one true path to guarantee divine compilation.

I light their beards and run.

2009-07-30 Reply Admin

theCoder:
I must be missing something. Winston's changes, though extensive and probably unnecessary, should be harmless. Of course, I'm not sure what "initiated on the stack" means. Maybe that means "initialized on the stack"? But in the example given, both variables are allocated on the stack. The only other way they could be allocated is on the heap, such as:
  char* data_string = new char[15];
  strcpy(data_string,"data data data");
Changing that sort of allocation to be on the stack could cause problems, though it's generally a better idea to allocate on the stack than on the heap (faster and has automatic cleanup).
So is the WTF the fact that there was no oversight of what developers were doing?

Not true. Allocating very large buffers on the stack is a BAD idea, since you could blow out the stack. Also you can't return pointers to stack allocated arrays out of a function. The WTF is clear to me. An overzealous dev going in and wasting company time by changing everything that personally offends him.

EvanED · 2009-07-30 Reply Admin

cpun:
Not true. Allocating very large buffers on the stack is a BAD idea, since you could blow out the stack.

Saying it's a bad idea is too big of a generalization. There are quite a few benefits to automatic allocation: better compiler diagnostics if you screw up (it can do something dead simple to find out that the array is too small instead of dataflow analysis that I doubt many compilers do), the lack of uninitialized variables, the inability to get a memory leak, allocation and deallocation that's basically free, and allocation that doesn't contribute at all to heap fragmentation. That's off the top of my head and (for the diagnostics reason) from earlier in this thread.

Whether it will blow your stack is definitely a consideration, but it's far from the only one. It may be that increasing your stack size is the right option.

Anyway, this ignores the fact that, if the example in the article is faithful, Winston or whatever his name was changed it from... stack allocation to stack allocation.

2009-07-30 Reply Admin

EvanED:

ollo:
In fact, char[] is equivalent to char*. Each one is a pointer to an 'array' of chars, ...

No, it's not. Stop spreading this lie.

The language semantics are technically different, but under the hood, they are identical. It is not a lie.

char* == char[] char** == *char[]

2009-07-30 Reply Admin

If being a good team member means continuing to write bad code so it fits in with their style - count me out!

2009-07-30 Reply Admin

That's not true. Multidimensional arrays are quite different from pointers to pointers, or even arrays of pointers. Multidimensional arrays are allocated as contiguous data. Reproducing that with pure pointers is not easy. Also, pointers are not necessarily constants while arrays are, by definition.

2009-07-30 Reply Admin

Lupus.Umbrae:
Actually, data_string IS a char*, pointing to data_string[0]. So, he allocated a 15 byte array and gave it the address of... a 15 byte long static string? Weird...

No ... data_string is a CONSTANT pointer to 15 chars that reside on the stack <=> you cannot assign pointers to it. Luckily C/C++ is not an ass (I mean here x_x - otherwise it is as much of an ass as it can be) and does a copy here instead.

http://www.cplusplus.com/doc/tutorial/pointers/

EvanED · 2009-07-30 Reply Admin

Stig:
The language semantics are technically different, but under the hood, they are identical. It is not a lie.
char* == char[] char** == *char[]

That's like saying "except for the places where they are different, they are identical".

They are allocated differently, they take up different amounts of storage and provide different information to sizeof, they provide different information to C++'s typeid. If you declare a variable of type char[N], there will almost certainly be nowhere that the address of that array is explicitly stored (unless you pass it as a parameter to another function or assign it to a pointer), because the program will access it via the stack or frame pointer.

2009-07-30 Reply Admin

Shouldn't the text in the article:

"copying twenty characters into a variable with a length of twenty-five"

be

"copying twenty-five characters into a variable with a length of twenty"

?

2009-07-30 Reply Admin

EvanED:
Stig:
The language semantics are technically different, but under the hood, they are identical. It is not a lie.
char* == char[] char** == *char[]

That's like saying "except for the places where they are different, they are identical".

No idea what you mean. "Semantically" they are different because one 'means' a pointer to a character, t'other 'means' an array of characters....however, an array essentially 'decays' (not my choice of word) into a char*, it just happens to have a contiguous bunch of chars next to it in memory.

EvanED:
They are allocated differently, they take up different amounts of storage and provide different information to sizeof, they provide different information to C++'s typeid.

All true. This is where confusion sets in. I am not talking about how the compiler allocates stuff, merely the underlying identical nature of a char* string and a char[] string.

Xepol · 2009-07-30 Reply Admin

For those who have not followed the chain, the cause of the bloat is:

The original strcpy(s,"static text") means that "static text" occupies only as much memory as the string plus a null.
The new char s[xx]="static text" requires that "static text" be stored in a buffer of xx characters, null padded - in other words, a heck of a lot of nulls padded into the executable to no advantage.

All the bloat was useless null padding.

Ok, granted it's a heck of a headbanger (esp. for those of us who use languages that actually know what a string is) - however I think the real problem is perhaps being ignored here.

The real problem is the unintelligent way the compiler chose to initialize the char arrays. It took the easier, lazy route. So, perhaps this is more a wtf for the compiler's optimizer design.

That said, maybe some people foolishly depend on those nulls in their code (for ugly unwise hacks), and attempts to change the optimizer were quashed long ago.

EvanED · 2009-07-30 Reply Admin

Stig:
No idea what you mean. "Semantically" they are different because one 'means' a pointer to a character, t'other 'means' an array of characters....however, an array essentially 'decays' (not my choice of word) into a char*, it just happens to have a contiguous bunch of chars next to it in memory.

That's not what "semantics" means to me in the world of computer languages, where it means "how the program behaves".

And char arrays behave differently than char pointers.

I am not talking about how the compiler allocates stuff, merely the underlying identical nature of a char* string and a char[] string.

What's the "underlying identical nature" if they are not implemented the same, don't mean the same thing, or behave the same way?

Technical Thug · 2009-07-30 Reply Admin

anon:
Shouldn't the text in the article:
"copying twenty characters into a variable with a length of twenty-five"

be

"copying twenty-five characters into a variable with a length of twenty"

?

I thought that was going to be the WTF. Then I read more. Now I don't know at all.

Is there a compressing compiler that will change "123456790"x1000 into a tiny zip'd blob?

Heron · 2009-07-30 Reply Admin

Having read everyone's comments, I'm still not seeing much of a WTF. WTFs I see:

Winston made lots of changes just before a major release
Winston didn't do this on a branch (though, if they were using Visual SourceSafe, I can't blame him)
The article never explains why there were suddenly two CDs.

Presumably the install CDs did not contain the source code, but the compiled executable. Without knowing more about the code, we can't know whether Winston's changes made the compiled code larger or smaller, but my gut tells me it's essentially the same (within a few MB, anyway).

Having said that, I have a related WTF. I was working on an application that, over the years, had migrated through the following:

C on Unix with a custom-coded GUI
C, cross-platform, with a custom-coded GUI
C, Windows-only, with a custom-coded GUI
C++, Windows-only, using MFC

As a result, the code had a lot of ancient cruft. Specifically, there were a lot of #define'd constants, and the popup menu code was disgusting.

Soon after I started, I suggested changing these to const variables (after all, that's what the const keyword is for), since we'd get typechecking and such in return for the effort. This went very far down on the to-do list.

Eventually, just after a major release, I was told to do it, so I did. Magically, nothing broke, and I'm sure some bugs disappeared once I fixed the resulting syntax and linker errors. (Undefined variable? Duplicate definition? Say it ain't so!)

That's not the WTF, that's actually a good story. The WTF is that several months later, that same manager fired me from the team for taking some convoluted menu code (~2000 lines over four source files) and rewriting it cleanly (~200 lines over one source file). He claimed it was outside the scope of my assignment (which was to add another menu option in said menu code). Never mind that it took less time to rewrite than it would have taken to add the new menu option to the spaghetti.

So apparently, according to that manager, cleaning up code is only OK when it has been explicitly assigned, even if said cleanups are directly related to one's current assignment.

Also, builds were done in a most horrific manner. Source files were grouped by function, so files related to "foo" were in the "foo" folder. We'd have foo/bar1.cpp, foo/bar2.cpp, foo/bar3.cpp, and so on.

This same manager decided that in order to speed up complete rebuilds, he'd do the following in an all_foo.cpp:

#include <bar1.cpp>
#include <bar2.cpp>
#include <bar3.cpp>
// etc

He then excluded the original files from the build. This did result in faster complete rebuilds, at the expense of slower partial rebuilds (i.e. what we do all day) and BREAKING FILE SCOPE RULES.

Yeah... we had lots of fun tracking down compile errors due to programmers assuming that they could trust file scope rules to hold. We never did convince the manager to revert those changes.

EvanED · 2009-07-30 Reply Admin

Xepol:
That said, maybe some people foolishly depend on those nulls in their code (for ugly unwise hacks), and attempts to change the optimizer were quashed long ago.

That's not very foolish if you ask me; compilers are required to zero-initialize the remainder of arrays that are partially initialized. The C spec guarantees it and compilers do it. Why is depending on such behavior unwise?

(Not to say you can't depend on it in unwise ways, just that depending on it doesn't necessarily make it unwise.)

Heck, it's actually a really convenient shorthand. If you want a zero-initialized array, "int a[100] = {0};" is way better than "int a[100] = {0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0};".

That said, compilers probably could make that code better by not emitting all of the zeros in the data segment of the exe, and basically calling memset in its place dynamically. I doubt it'd be appreciably slower.

Heron · 2009-07-30 Reply Admin

EvanED:
That said, compilers probably could make that code better by not emitting all of the zeros in the data segment of the exe, and basically calling memset in its place dynamically. I doubt it'd be appreciably slower.

Especially since in many (most?) situations like this it could be done at startup, and adding a few ms to an app's startup time is rarely an issue. They could probably make it configurable behavior (a compiler flag).

2009-07-30 Reply Admin

EvanED:
Stig:
No idea what you mean. "Semantically" they are different because one 'means' a pointer to a character, t'other 'means' an array of characters....however, an array essentially 'decays' (not my choice of word) into a char*, it just happens to have a contiguous bunch of chars next to it in memory.
That's not what "semantics" means to me in the world of computer languages, where it means "how the program behaves".

Semantics is all about meaning, not behavior...an understanding of what it is you are expressing with your code.

And char arrays behave differently than char pointers.

Technically, neither 'behave' like anything. They're simply memory structures. If you look at char a[10] you will see that a is a pointer....to a char....followed by 9 other chars. This is the "underlying identical nature" I was referring to.

alegr · 2009-07-30 Reply Admin

Stig:
EvanED:

ollo:
In fact, char[] is equivalent to char*. Each one is a pointer to an 'array' of chars, ...

No, it's not. Stop spreading this lie.

The language semantics are technically different, but under the hood, they are identical. It is not a lie.

char* == char[] char** == *char[]

Wow. It's like saying that a pointer and an int are technically different, but under the hood, they are identical. They both occupy 32 bits, after all. On x86, at least. IIRC, early K&R C didn't see a difference.

Go read up on lvalue and rvalue. And also see what the type is of a pointer to char* and of a pointer to char[10].

2009-07-30 Reply Admin

Heron:
EvanED:
That said, compilers probably could make that code better by not emitting all of the zeros in the data segment of the exe, and basically calling memset in its place dynamically. I doubt it'd be appreciably slower.

Especially since in many (most?) situations like this it could be done at startup, and adding a few ms to an app's startup time is rarely an issue. They could probably make it configurable behavior (a compiler flag).

I have a simple C program with a bunch of tests to verify various 'assumed' compiler behaviors. Whenever I hit a different compiler, I compile it, run it, and make sure I'm not going to blindly run headlong into gotcha territory :)

2009-07-30 Reply Admin

Not Quite:
No ... data_string is a CONSTANT pointer to 15 chars that reside on the stack <=> you cannot assign pointers to it. Luckily C/C++ is not an ass (I mean here x_x - otherwise it is as much of an ass as it can be) and does a copy here instead.
http://www.cplusplus.com/doc/tutorial/pointers/

I don't know what you're saying here, but you can assign a pointer to an array all day long... The article you linked to even says:

int numbers [20];
int * p;

The following assignment operation would be valid:

p = numbers;

If you were taking a stand for char[] being different than char*, you're on the right side of the argument, but...

2009-07-30 Reply Admin

Go read up on lvalue and rvalue. And also see what the type is of a pointer to char* and of a pointer to char[10].

No. I know I'm right, and have many years of experience being right about this kind of triviality.

Confusion over what language elements mean, what compilers do with them and underlying memory structures is common among beginners.

2009-07-30 Reply Admin

OutlawProgrammer:
...Good thing we had Visual Source Safe *groan*

You have my sympathy. We have always avoided needing to do anything more than check out, check in with sourcesafe. Was rolling back that many files as fun as it has always looked?

We are now in the process of moving to TFS. Sadly sourcesafe contains LOTS of shared files, so there is a fair bit of re-organisation having to happen to each project when it is moved.

Technical Thug · 2009-07-30 Reply Admin

Observer:
but you can assign a pointer to an array all day long... The article you linked to even says:
int numbers [20];
int * p;
The following assignment operation would be valid:
p = numbers;

So in your crazy moon language, assigning an array to a pointer is the same as assigning a pointer to an array?

2009-07-30 Reply Admin

Chris Becke:
Anon:
Stack variables don't cause memory leaks. Crashes due to buffer over/underflow, yes, but that's not a "memory leak".

This kind of pattern is not unusual
  void func(char const* param){
    bool fexternal;
    if(!param){
      param = malloc();
      fexternal=false;
    }
    DoSomething();
    if(!fexternal)
      free(param);
  }

Of course that stack variable 'fexternal' is going to cause a random (in release) crash somewhere. (initialization is good!)

Heron · 2009-07-30 Reply Admin

So in your crazy moon language, assigning an array to a pointer is the same as assigning a pointer to an array?

void foo(char xyz[]) { /* whatever */ }

int main()
{
  char* bar = "foobar";
  foo(bar);
  return 0;
}

This compiles with no warnings.

Ilya Ehrenburg · 2009-07-30 Reply Admin

Stig:
EvanED:
Stig:
No idea what you mean. "Semantically" they are different because one 'means' a pointer to a character, t'other 'means' an array of characters....however, an array essentially 'decays' (not my choice of word) into a char*, it just happens to have a contiguous bunch of chars next to it in memory.
That's not what "semantics" means to me in the world of computer languages, where it means "how the program behaves".
Semantics is all about meaning, not behavior...an understanding of what it is you are expressing with your code.

Some relevant words bolded. In the world of computer languages, the "meaning" of a language construct is its "behaviour", that's why the term "semantics" was borrowed from linguistics. If you used it in the linguistic sense in your first post, fine, but you should have said that since it's usually used in the CS sense when talking about programming language constructs.

And char arrays behave differently than [recte: from] char pointers.
Technically, neither 'behave' like anything. They're simply memory structures. If you look at char a[10] you will see that a is a pointer....to a char....followed by 9 other chars. This is the "underlying identical nature" I was referring to.

They "behave" differently in the sense that the compiler produces different machine code for either of these structures, e.g.

char a[15];
char *p = malloc(15);
printf("%zu\n%zu\n", sizeof(a), sizeof(p));

illustrates that.

Yes, in many situations an array decays into a pointer, but not in all. Most of the time you can regard arrays and (const) pointers as equivalent, but not always.

EvanED · 2009-07-30 Reply Admin

Stig:
Semantics is all about meaning, not behavior...an understanding of what it is you are expressing with your code.

I do research in programming languages and have a bunch of PL books at my fingertips; I know what semantics means in a PL context. Is it used differently in less academic settings? I don't know. That's why I didn't use that word in the rest of my post.

If you look at char a[10] you will see that a is a pointer....to a char....followed by 9 other chars.

What pointer? There's no pointer with char a[10]. Don't believe me? New test program:

int main()
{
    int x = 0xABABABAB;
    char array[] = "hello";
}

I compiled this with 'gcc -g', loaded the resulting executable in GDB, set a breakpoint at main(), ran it, and stepped over the two initializations. I used "print &x" to get the address of x (0xbf9aad40) and "print &array" to get the address of array (0xbf9aad3a). I took a memory dump around those addresses with "x/20x 0xbf9aad20", and I've pasted the output from that command below. The string "hello\0" (0x68, 0x65, 0x6c, 0x6c, 0x6f, 0) occupies the upper two bytes of 0x6568ad58 and all of 0x006f6c6c (x86 is little-endian, so each word reads back to front), and the value 0xABABABAB follows at 0xbf9aad40:

0xbf9aad20:     0x00b81ff4      0x00b8020c      0xbf9aad58      0x080483b9
0xbf9aad30:     0x00a6ddb5      0xbf9aadec      0x6568ad58      0x006f6c6c
0xbf9aad40:     0xabababab      0xbf9aad60      0xbf9aadb8      0x00a57e8c
0xbf9aad50:     0x00a3aca0      0x080483a0      0xbf9aadb8      0x00a57e8c
0xbf9aad60:     0x00000001      0xbf9aade4      0xbf9aadec      0x00a3b810

If you look around these values, there are a lot of pointers to areas around this: 0xbf9aadec, 0xbf9aade4, etc. But you'll notice a conspicuous lack of any pointer to 0xbf9aad3a. That's because there isn't one.

Still not convinced? Think that the initialization is overwriting the value of the pointer or something like that? The program compiles with g++ too, and with -Wall -Wextra the only warnings are about the two unused variables. Still not convinced? Here's a new version of the program:

#include <string.h>
int main()
{
    int x = 0xABABABAB;
    char array[10];
    strcpy(&array[0], "hello");
}

GDB reports 'array' is at 0xbfdca956. Here's a memory dump:

0xbfdca940:     0x00b81ff4      0x00b8020c      0xbfdca978      0x080483b9
0xbfdca950:     0x00a6ddb5      0x6568aa0c      0x006f6c6c      0x00b81ff4
0xbfdca960:     0xabababab      0xbfdca980      0xbfdca9d8      0x00a57e8c
0xbfdca970:     0x00a3aca0      0x080483a0      0xbfdca9d8      0x00a57e8c
0xbfdca980:     0x00000001      0xbfdcaa04      0xbfdcaa0c      0x00a3b810

Where is this mythical pointer that you think 'array' actually is? Again, there isn't one anywhere near that address. Furthermore, the string "hello\0" is now stored in the stack frame. If array were a pointer, the string would instead be stored at whatever address it pointed to.

Stig:
No. I know I'm right, and have many years of experience being right about this kind of triviality.

Then you're wrong about what you know, or we're actually agreeing on everything except what it means to be "the same" (where for you, "being the same" apparently means "being the same in everything except how it behaves, how it is implemented, and what it means to the programmer").

Confusion over what language elements mean, what compilers do with them and underlying memory structures is common among beginners.

I just taught a course on compilers this past semester. Want to have a "who knows more about compilers"-off?

Addendum (2009-07-30 16:32): BTW, in case it's not obvious, the reason "The program compiles with g++ too, and with -Wall -Wextra the only warnings are about the two unused variables" matters is that C++ doesn't allow implicit conversions between pointer and non-pointer types. So the fact that my program has no casts and doesn't produce an error means that "hello" isn't being treated as a pointer or something nonsensical like that.

Ilya Ehrenburg · 2009-07-30 Reply Admin

Heron:
So in your crazy moon language, assigning an array to a pointer is the same as assigning a pointer to an array?
void foo(char xyz[]) { /* whatever */ }

int main()
{
  char* bar = "foobar";
  foo(bar);
  return 0;
}
This compiles with no warnings.

Of course.

cfaq:
This conversion of array-like declarators into pointers holds only within function formal parameter declarations, nowhere else.

2009-07-30 Reply Admin

Technical Thug:

So in your crazy moon language, assigning an array to a pointer is the same as assigning a pointer to an array?

Ha! I deserved that. I misread what he was saying and then confused my own wording... I thought he was saying that assigning the address of the array to a pointer was somehow impossible. I guess he meant the char[] was like a constant pointer that couldn't be re-assigned to point to something else...

Oh well. I fail for the day!

Heron · 2009-07-30 Reply Admin

Ilya Ehrenburg:
Of course.
cfaq:
This conversion of array-like declarators into pointers holds only within function formal parameter declarations, nowhere else.

Then it's no surprise that it's the only way I could think of to get compiling code that does it ;)

joelkatz · 2009-07-30 Reply Admin

relaxing:
Uninitialized variables are bad, mmmkay?

Sure, but only because they might be *used* before they're initialized. However, changing: int x; to int x=0; doesn't fix anything. If 'x' is used before it's set to the *correct* value, arbitrarily setting it to zero won't help very much, unless you luck out and zero happens to be the correct value in every case where it was previously used without being initialized.

And there's no way you can know that without checking each case. So if you have a large project with lots of possible usage of uninitialized variables, you can't just add initialization everywhere willy-nilly and claim you fixed the uninitialized variables problem.

Ilya Ehrenburg · 2009-07-30 Reply Admin

EvanED:
Want to have a "who knows more about compilers"-off?

Bets now accepted. Currently, EvanED is 7-1 odds on favourite.

2009-07-30 Reply Admin

Having written a compiler would seem to be an advantage.

Kazan · 2009-07-30 Reply Admin

EvanED:
Saying it's a bad idea is too big of a generalization. There are quite a few benefits to automatic allocation: better compiler diagnostics if you screw up (it can do something dead simple to find out that the array is too small instead of dataflow analysis that I doubt many compilers do), the lack of uninitialized variables, the inability to get a memory leak, allocation and deallocation that's basically free, and allocation that doesn't contribute at all to heap fragmentation. That's off the top of my head and (for the diagnostics reason) from earlier in this thread.

template<typename T> boost::shared_ptr<T>;

doesn't deal with the heap frag issue.. but then that is generally an issue that should be deferred to the OS memory manager anyway.

does prevent any possibility of a memory leak, as long as all your references are getting cleaned up somewhere. circular references can be a beotch. (that's what boost::weak_ptr<T> is for)

EvanED:
Whether it will blow your stack is definitely a consideration, but it's far from the only one. It may be that increasing your stack size is the right option.

huge stack variables are never a good idea. you can never guarantee you're only "so deep in the stack"

EvanED:

Anyway, this ignores the fact that, if the example in the article is faithful, Winston or whatever his name was changed it from... stack allocation to stack allocation.

truth

2009-07-30 Reply Admin

Ilya Ehrenburg:
Stig:
EvanED:
Stig:
No idea what you mean. "Semantically" they are different because one 'means' a pointer to a character, t'other 'means' an array of characters....however, an array essentially 'decays' (not my choice of word) into a char*, it just happens to have a contiguous bunch of chars next to it in memory.
That's not what "semantics" means to me in the world of computer languages, where it means "how the program behaves".
Semantics is all about meaning, not behavior...an understanding of what it is you are expressing with your code.

Some relevant words bolded. In the world of computer languages, the "meaning" of a language construct is its "behaviour", that's why the term "semantics" was borrowed from linguistics. If you used it in the linguistic sense in your first post, fine, but you should have said that since it's usually used in the CS sense when talking about programming language constructs.
<sizeof snip/>
Yes, in many situations an array decays into a pointer, but not in all. Most of the time you can regard arrays and (const) pointers as equivalent, but not always.

What, we're still wasting time on this Stig loon? Decades of working with C, and he still hasn't grasped the simple fact that

char buffer [20];
char* ptr = buffer;

merely aliases (invisibly, via syntactic sugar) the lvalue ptr as the rvalue &buffer[0], but has no bearing on the fact that, semantically, a pointer-to-char is an entirely different beast to a fixed-length array of char?
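To make that concrete, here is a small sketch in C (the function names are invented for the example). The array keeps its size in the scope where it is declared, but once it decays, for instance when passed to a function, only a pointer is left.

```c
#include <assert.h>
#include <stddef.h>

/* The callee only receives a pointer; the length 20 is gone. */
size_t size_seen_by_callee(char *ptr)
{
    return sizeof ptr;              /* size of a pointer */
}

void decay_demo(void)
{
    char buffer[20] = "hello";
    char *ptr = buffer;             /* decay: same as &buffer[0] */

    assert(ptr == &buffer[0]);
    assert(sizeof buffer == 20);    /* the array itself */
    assert(size_seen_by_callee(buffer) == sizeof(char *));  /* after decay */
}
```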

I really don't believe that you can go ten years without tripping up on this basic semantic difference.

Frankly, I don't believe that you can manage more than a couple of weeks. Which probably brings us back to the OP.

2009-07-30 Reply Admin

Anonymous:
The WTF here is the attitude of the likes of Winston, plain and simple.

No, the WTF is the process, because the world is full of Winstons, and if your process can't deal with them, it's too fragile.

2009-07-30 Reply Admin

"Then you're wrong about what you know"

Then I'm wondering where all the exploding code I'm guilty of creating is ;) Since I'm rather meticulous about using tools like valgrind and gdb, I suspect that my code is perfectly fine.

My research into compilers (and constructing my own) and formal language/software development (primarily via Z) has also stood me in good stead for understanding what really happens 'under the hood'.

I think what we really have here, as they say, is a 'failure to communicate' ;)

EvanED · 2009-07-30 Reply Admin

Stig:
I think what we really have here, as they say, is a 'failure to communicate' ;)

Fine. I'll be explicit.

Pointers and arrays differ in:

Meaning to the programmer, in that the array positively indicates that it's actually an array of some size, while a pointer could potentially just point to one element, and you have to know the context to know whether it points to one element or an array. (It's debatable how big this difference is. After all, an array could be one element large (which is still slightly different from just an atom, but not in ways I can think of that are likely to matter on any reasonable implementation), and to work safely on an array you need to know more context -- its size -- anyway.)
Implementation. I've already asserted with what I feel is quite a bit of evidence that "char a[10]" does not, in fact, introduce a pointer. It definitely allocates the array's storage with automatic storage allocation, whereas with "char * p" you'll need to malloc some space (or find it elsewhere). Accesses to "a[i]" from the containing function are likely to be relative to the stack pointer or frame pointer rather than through a pointer to the array proper. Accesses to "p[i]" are likely to go through the pointer p. (Constant folding might eliminate this if the pointer is assigned to point to a local array, but you'll be able to see this if you don't turn on optimization or if you break the constant propagation by introducing an alternate path.)
Behavior. sizeof() returns different results for pointers and arrays. typeid() in C++ returns different results for pointers and arrays. Same with typeof() in GCC. You will blow your stack faster with "char a[100];" vs. "char * p = malloc(100);". The array allocated in the former declaration will be deallocated automatically upon exit from the containing function; the latter won't.

These things are, as far as I'm concerned, not subject to debate. That is what I feel is a plenty reasonable catalog to declare that "saying that char* and char[] are equivalent is a lie". If you feel that, despite these differences, you want to say they are equivalent, fine, go ahead.
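The storage and sizeof differences in that list can be sketched in a few lines of C. The helper make_heap_copy is a made-up name for the example, not anything from the article.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Heap storage outlives the call; the caller must free() it. */
char *make_heap_copy(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    if (p != NULL)
        strcpy(p, s);
    return p;
}

void storage_demo(void)
{
    char a[100] = "stack";              /* lives in this frame, released on return */
    char *p = make_heap_copy("heap");   /* lives until free() */

    assert(sizeof a == 100);            /* size of the whole array */
    assert(sizeof p == sizeof(char *)); /* size of the pointer only */
    assert(p != NULL && strcmp(p, "heap") == 0);

    free(p);                            /* forget this and you have a leak */
}
```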

2009-07-30 Reply Admin

EvanED:
Stig:
Semantics is all about meaning...

If you look at char a[10] you will see that a is a pointer....to a char....followed by 9 other chars.
What pointer? There's no pointer with char a[10]...

Semantics is the word of the day... Semantics 3. the meaning, or an interpretation of the meaning, of a word, sign, sentence, etc.

I seriously doubt that he meant the compiler was going to create a pointer that then pointed to the memory location of the 10 characters... He was saying that once char a[10] is created, using the variable 'a' is the same as using a constant pointer. You can dereference it with *, you can do pointer addition with it, you can use the [] operator... After creation it functions the same as a pointer. 'a' points to the first char in the array just like a pointer would. 'a+1' points to the second character, etc.

I know it isn't an actual pointer, but you can surely admit that the usage is identical.
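For what it's worth, the operations named here do behave the same, as this short C sketch shows (the function name is invented). It only covers dereference, addition and subscripting, and makes no claim about the other operations discussed in the thread.

```c
#include <assert.h>

void same_expressions(void)
{
    char a[10] = "abcdefghi";
    const char *p = "abcdefghi";

    assert(*a == 'a' && *p == 'a');              /* dereference */
    assert(*(a + 1) == 'b' && *(p + 1) == 'b');  /* pointer addition */
    assert(a[2] == 'c' && p[2] == 'c');          /* subscript */
    assert(a[2] == *(a + 2));   /* a[i] is defined as *(a + i) */
}
```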

2009-07-30 Reply Admin

I don't understand why so many people are saying that Winston's change was an improvement even in coding style, much less in actual practice.

Assigning to a variable on the same line where you declare it, then declaring another variable on the next line, constitutes putting a declaration after a statement.

Mixing declarations and statements that way is bad. (Proof left as an exercise for the reader, because I'm about to go home.)

Therefore, even just by the example given, this was a change for the worse in terms of good coding style.

Not necessarily on the level of a WTF just for that, but still. How people can say that moving from "declare everything first, then start working with it" to "start working with things while you're still declaring them" is an improvement is bewildering to me.

(The change is also bad in that, as someone else pointed out, just because you've initialized it doesn't automatically mean that you've given it the correct value; if it would have been used uninitialized before the change, you'd have gotten a warning, whereas after the change the compiler sees the initialization and doesn't issue the warning - and as a result, you don't get that pointer to the possible cause of a problem.)

EvanED · 2009-07-30 Reply Admin

Person:
I know it isn't an actual pointer, but you can surely admit that the usage is identical.

I'll admit that -- except for allocation, deallocation, sizeof, typeof, typeid, and worrying about stack overflows -- the usage is identical.

Addendum (2009-07-30 17:17): And compiler diagnostics.

Addendum (2009-07-30 17:25): And sometimes when & is involved:

void foo(char **) {}
int main() { char * p = 0; char a[10] = {0};
foo(&p);  // no error
foo(&a);  // error ("cannot convert char(*)[10] to char**")
}

Addendum (2009-07-30 17:27): Bwuuuu? My last example's wrong... must have pulled the wrong version off the clipboard (ctrl-v vs. middle-click).

10390. ~/temp % cat arr_overload.cc
#include <stdio.h>
void foo(char **) { printf("pointer version\n"); }
void foo(char(*)[10]) { printf("array version\n"); }

int main() {
    char * p = 0;
    char a[10] = {0};
    foo(&p);
    foo(&a);
}
10391. ~/temp % g++ arr_overload.cc
10392. ~/temp % ./a.out
pointer version
array version
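The overloads above are C++. Plain C has no overloading, but the same type difference can be seen through pointer arithmetic. This is only a sketch with an invented function name: &a is a pointer to the whole array, so adding 1 to it steps over all 10 chars.

```c
#include <assert.h>

void address_of_array(void)
{
    char a[10] = {0};
    char (*whole)[10] = &a;     /* pointer to the entire array */
    char *first = a;            /* pointer to its first element */

    /* whole + 1 is the one-past-the-end position, 10 bytes further on */
    assert((char *)(whole + 1) - (char *)whole == 10);

    /* first + 1 is just the next char */
    assert((first + 1) - first == 1);
}
```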

2009-07-30 Reply Admin

The Real WTF is how this article was written without a WTF. That's right, it was an article that involved ALL of the readers!! The future of the web. This, my friends, is web 3.0. Log in today to experience a REAL wtf!!!

2009-07-30 Reply Admin

Semantics is the word of the day... Semantics 3. the meaning, or an interpretation of the meaning, of a word, sign, sentence, etc.
I seriously doubt that he meant the compiler was going to create a pointer that then pointed to the memory location of the 10 characters... He was saying that once char a[10] is created, using the variable 'a' is the same as using a constant pointer. You can dereference it with *, you can do pointer addition with it, you can use the [] operator... After creation it functions the same as a pointer. 'a' points to the first char in the array just like a pointer would. 'a+1' points to the second character, etc.

You are cherry picking your examples. There are instances where char[] and char* are not interchangeable.

char *a = "abcdefgh";
char b[] = "abcdefgh";

// a and b are NOT interchangeable

// sizeof() operator works differently for a and b
// sizeof(a) == 4 (on a 32-bit machine)
// sizeof(b) == 9 

// & (address of) operator works differently for a and b
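// &a != a  (address of the pointer variable vs. address of the string)
// &b == b  (same address, although the types differ)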

If you think none of this matters, I have seen real bugs caused when arrays were changed to pointers, in code such as the following:

Original code:

char a[] = "abcdefgh";

printf("%s", &a);  /* Prints "abcdefgh" */
/* Original coder ignored compiler warning 
about incorrect printf argument type */

As long as a is an array, the code works. As soon as a is changed to a pointer, the code stops working.

Broken code:

char *a = "abcdefgh";

printf("%s", &a);  /* Behaviour is undefined */

Of course, the correct code would've been:

char *a = "abcdefgh";

printf("%s", a);

To be honest, I have seen this kind of confusion between (&some_array_name and some_array_name) from junior and senior coders alike, unfortunately.

So please don't tell me char [] and char * are interchangeable. Again, my example is taken from a real bug, caused by a programmer changing char[] to char* in real code. (The variable in question was changed from a statically defined string array to a passed-in string pointer.) The bug was caught well before release, though.
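Here is a small C sketch of why the buggy call happened to work for the array and not for the pointer. The function names are made up, and the output goes through snprintf only so that the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The correct call: pass the char * itself, never its address. */
void format_name(char *out, size_t outlen, const char *s)
{
    snprintf(out, outlen, "%s", s);
}

void address_check(void)
{
    char arr[] = "abcdefgh";
    const char *ptr = "abcdefgh";

    /* For an array, &arr and arr are the same address (different types),
       which is why printf("%s", &a) appeared to work. */
    assert((const void *)&arr == (const void *)arr);

    /* For a pointer, &ptr is where the pointer variable lives,
       not where the characters are. */
    assert((const void *)&ptr != (const void *)ptr);
}
```

Passing the string by name works the same way whether the variable is an array or a pointer, so it survives the kind of refactoring described above.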

2009-07-30 Reply Admin

The simple code:

#include <stdio.h>

int main(int argc, char** argv)
{
    char a[10] = "abcdefghi";
    char* b = "jklmnopqr";
    char* c = a;

    printf("%s\n", a);
    printf("%s\n", b);
    printf("%s\n", c);
}

...viewed as assembler...

    .file   "main.c"
    .section    .rodata
.LC1:
    .string "jklmnopqr"
.LC0:
    .string "abcdefghi"
    .text
.globl main
    .type   main, @function
main:
.LFB2:
    pushq   %rbp
.LCFI0:
    movq    %rsp, %rbp
.LCFI1:
    subq    $48, %rsp
.LCFI2:
    movl    %edi, -36(%rbp)
    movq    %rsi, -48(%rbp)
    movq    .LC0(%rip), %rax
    movq    %rax, -32(%rbp)
    movzwl  .LC0+8(%rip), %eax
    movw    %ax, -24(%rbp)
    movq    $.LC1, -16(%rbp)
    leaq    -32(%rbp), %rax
    movq    %rax, -8(%rbp)
    leaq    -32(%rbp), %rdi
    call    puts
    movq    -16(%rbp), %rdi
    call    puts
    movq    -8(%rbp), %rdi
    call    puts
    leave
    ret

...highlights how char* and char[] are essentially no different 'under the hood'. Yes, semantically they mean different things (as I said in my original post). Here, they are both pointers, albeit local, where the array declaration resolves to an implicit pointer. This is why the "[n]" syntax is essentially a char* dereference plus an offset. This is why you can happily interchange char* and char[] without problems.

Interestingly, gcc now deprecates code like:

char* blah = "blah";

...and would prefer that you use:

char blah[] = "blah";

...and does so to enforce the semantics of the language, not because of the compiler output.
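As far as I know, the warning in question comes from C++, where converting a string literal to a non-const char* was deprecated. In C the conversion is still allowed, but writing through the pointer is undefined. A short C sketch of the difference (the function name is invented):

```c
#include <assert.h>
#include <string.h>

void literal_vs_array(void)
{
    const char *lit = "blah";   /* points at the literal; writing through it is undefined */
    char arr[] = "blah";        /* a private, writable copy of the literal */

    arr[0] = 'B';               /* fine: modifies only the copy */

    assert(strcmp(arr, "Blah") == 0);
    assert(strcmp(lit, "blah") == 0);   /* the literal is untouched */
    assert(sizeof arr == 5);            /* four characters plus the terminator */
}
```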

Heron · 2009-07-30 Reply Admin

Anonymoose:
To be honest, I have seen this kind of confusion between (&some_array_name and some_array_name) from junior and senior coders alike, unfortunately.

I worked on some code where the original developer always passed strings around using "&mystr[0]" regardless of whether it was a pointer or an array he was working with. I never did figure out why.
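One guess, and it is only a guess at that developer's reasoning: &x[0] is a char * whether x is an array or a pointer, so the habit sidesteps the &array confusion entirely. A sketch in C with an invented function name:

```c
#include <assert.h>

void first_element_address(void)
{
    char arr[] = "hello";
    char *ptr = arr;

    /* &x[0] has type char * for arrays and pointers alike ... */
    char *from_arr = &arr[0];
    char *from_ptr = &ptr[0];
    assert(from_arr == from_ptr);

    /* ... and equals the plain name wherever decay applies. */
    assert(from_arr == arr && from_ptr == ptr);
}
```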
