The Daily WTF: Curious Perversions in Information Technology

bjolling · 2018-10-01 Reply Admin

"Krzysztof is left to play to the gods " ... At first I thought Krzysztof was a Polish-sounding name, but "playing to the gods" sounds more like Chinese?

2018-10-01 Reply Admin

It's undefined behaviour, meaning the next minor compiler update might format your hard drive instead. Practically speaking, it may put the variables in backward order just as likely, resulting in some kind of disaster. "Code that works" this is not!

2018-10-01 Reply Admin

In C, the line between "pointer" and "array" is fuzzy I was under the impression it was non-existent.

2018-10-01 Reply Admin

Try that again...

In C, the line between "pointer" and "array" is fuzzy

I was under the impression it was non-existent.

Note to self (and others): leave a blank line around quotations!

Nutster · 2018-10-01 Reply Admin

Yes, if the next version of your compiler's optimizer ever feels that rearranging the order of those variables makes better use of the stack, they are sooooo screwed.

uint32_t tempHistVal[20];
uint32_t &tempHistVal1 = tempHistVal[0];
uint32_t &tempHistVal2 = tempHistVal[1];
uint32_t &tempHistVal3 = tempHistVal[2];
...
uint32_t &tempHistVal20 = tempHistVal[19];

There. Now the other functions that are expecting to put individual values into tempHistVal8 can do so without messing things up. Of course, this changes the code from C to C++, but for the most part C++ is backward-compatible with C. I guess you could use a bunch of #define lines to do the conversion.

uint32_t tempHistVal[20];
#define tempHistVal1 (tempHistVal[0])
#define tempHistVal2 (tempHistVal[1])
#define tempHistVal3 (tempHistVal[2])
...
#define tempHistVal20 (tempHistVal[19])

That way you can stay in pure C.

2018-10-01 Reply Admin

undefined behavior is so undefined

Nutster · 2018-10-01 Reply Admin

The only difference I know of relates to memory assignment. A pointer is just a few bytes and the array is several more. In reading, yeah, identical operations.

int iArray[27]; /* space on stack = 27 * sizeof(int) */
int *pArray = iArray; /* much smaller.  Probably should be const */

if (iArray[20] == pArray[20]) /* always true */
    ...
if (pArray + 5 == iArray + 5) /* pointer math.  Again always true, unless you futz with pArray */
    ...

Steve_The_Cynic · 2018-10-01 Reply Admin

In reading, yeah, identical operations.

No. In source code, identical operations. Behind the scenes, in the compiled code, it's ... different, and the representation is very different. It's worth remembering that in C, a[i] is nothing more and nothing less than syntactic sugar for (*(a+i)). And yes, because pointer addition is commutative, just like integer or floating point addition, a[i] can be written as i[a].
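
A quick way to convince yourself of that equivalence. This is only a minimal sketch, and same_element is a made-up helper name, not anything from the article:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if all three spellings of "element i of a" name the same object.
 * same_element is a hypothetical helper used only for illustration. */
int same_element(const int *a, size_t i)
{
    assert(a != NULL);
    /* a[i] is defined as *(a + i); addition commutes, so i[a] is *(i + a). */
    return &a[i] == a + i && &i[a] == a + i;
}
```

Calling it on any array with an in-range index should return 1, and an expression such as 2[v] reads the same element as v[2].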

2018-10-01 Reply Admin

The sizeof operator yields different results for arrays and pointers. But arrays can decay into pointers, which can create subtle errors when refactoring code, which then leads to rambling on mailing lists.
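
To make the sizeof difference concrete, here is a small sketch. COUNT_OF and count_via_array are names invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array; gives nonsense if handed a pointer. */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical demo: returns the element count seen through the array name. */
size_t count_via_array(void)
{
    int values[27];
    int *p = values;               /* the array decays to &values[0] */

    /* sizeof sees the whole array in one case, only the pointer in the other. */
    assert(sizeof(values) == 27 * sizeof(int));
    assert(sizeof(p) == sizeof(int *));
    (void)p;                       /* keeps p "used" if asserts are compiled out */

    return COUNT_OF(values);
}
```

The refactoring trap is that COUNT_OF keeps compiling if values later becomes a pointer; it just quietly computes the wrong number.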

2018-10-01 Reply Admin

Well, if a compiler update breaks it there are 5 options:

Stay with the previous version
Ask the company/organisation that writes it to change it back.
Use a different compiler, that does (for now) what you need.
Write your own compiler.
It is now a bug that has to be fixed.

2018-10-01 Reply Admin

No - it's not non-existent.

int foo[20];

will allocate enough memory for 20 ints somewhere. OTOH,

int *foo;

will allocate enough memory for a pointer to one int.

Similarly, if I later write:

foo = &myint;

then if foo was defined as an array I'll get a compile-time error, whilst if it was defined as a pointer then all will be well.

For the second definition I can write "foo++;", but not for the first.

The only similarity between a pointer and an array is in the way you access their "contents". You can think of the name of an array as being a bit like a constant pointer to the beginning of said array. You can't modify it (point it somewhere else). OTOH, it does point to a valid chunk of memory to begin with, whilst a pointer needs to be pointed somewhere.
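
Put together as one compilable sketch (pointer_can_move is a made-up name, and the lines that would not compile are left as comments):

```c
#include <assert.h>

/* Hypothetical demo of what a pointer allows and an array name does not. */
int pointer_can_move(void)
{
    int myint = 7;
    int arr[20] = {0};
    int *ptr = arr;     /* the pointer starts at the first element */

    ptr++;              /* fine: ptr now points at arr[1] */
    assert(ptr == &arr[1]);
    ptr = &myint;       /* fine: ptr can be pointed somewhere else */

    /* arr = &myint;       would not compile: an array is not assignable */
    /* arr++;              would not compile either */

    return *ptr;        /* reads myint through the pointer */
}
```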

2018-10-01 Reply Admin

Isn't there #pragma pack for exactly such cases?

2018-10-01 Reply Admin

No: the compiler knows that (&tempHistVal1)[i] has undefined behaviour if i!=0. Therefore, it is free to ignore the actual value of i and optimise the expression down to a simple access to tempHistVal1.

I've seen this happen in real code. A vendor header file had this pattern, and it failed with optimisation switched on.
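
For contrast, a version where every index is a defined access. This is a sketch under the assumption that the function in question averages the 20 history values; average_history and HIST_LEN are names invented here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_LEN 20  /* assumed to match the 20 tempHistVal variables */

/* Average over a real array: every index in [0, HIST_LEN) is a valid access,
 * so the optimiser has nothing undefined to exploit. */
uint32_t average_history(const uint32_t hist[HIST_LEN])
{
    assert(hist != NULL);
    uint64_t sum = 0;   /* 64-bit so that 20 large samples cannot overflow */
    for (size_t i = 0; i < HIST_LEN; i++) {
        sum += hist[i];
    }
    return (uint32_t)(sum / HIST_LEN);
}
```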

2018-10-01 Reply Admin

I was under the impression it was non-existent.

The value stored in a pointer object is treated as an address.

The value stored in an array object is the value of the first array element.

More simply, an array holds data, a pointer holds the address of data. They are two fundamentally different things.

2018-10-01 Reply Admin

Because all accesses to this "array" are made like this, the variables tempHistVal2 ... 20 never show up in the source code and get optimized away by the compiler. So this is clearly a compiler fault, because in the DEBUG build it worked. Let's use volatile.

alphajbravo · 2018-10-01 Reply Admin

The only similarity between a pointer and an array is in the way you access their "contents".

In the interest of ~~pedantry~~ clarity, the reason for the scare quotes is that an array contains a series of values of identical type stored sequentially in memory, whereas a pointer contains only an address, and that address may refer to the location of a value of the specified type in memory.

Addendum 2018-10-01 15:06: Wait, the article comments don't support the same markup as the forum? RIP my stupid strikeout joke.

Ext3h · 2018-10-01 Reply Admin

Except the compiler isn't going to optimize this down to "just" tempHistVal1. Clang for example is killing the entire function as "undefined", as there is an arithmetic expression involved.

And every calculation which depends on this function's return value is also killed as undefined. It's quite funny: if you ramp up inlining / optimization far enough, you can get Clang to reduce your entire application to a single NOP due to propagation of "undefined" results.

2018-10-01 Reply Admin

We had a problem at work recently where someone had taken some code involving undefined behaviour with unions and moved it line for line into another file, and the compiler produced code that behaved completely differently. C++ UB is fun.

2018-10-01 Reply Admin

additional headaches with highly optimising compilers that put variables into registers, or remove unreferenced variables

2018-10-01 Reply Admin

Yes, I know they're not the same, but there isn't really a difference. char **argv and char *argv[] are both valid declarations for the second argument of main.

Although the other comments about sizeof are news to me.

2018-10-01 Reply Admin

Could a smart compiler be allowed to notice that ptr[1] is always accessed, so there is undefined behaviour if this function is called at all and decide to optimise out all paths where this function is called? Or at least if it's still trying to be nice, simply make the function return 0 or something.

DrOptableUser · 2018-10-02 Reply Admin

"We don't change code that works!"

That shows what "works" means for different people. One of the meanings is "i tested it 3 times when i rote it so i know it works".

Steve_The_Cynic · 2018-10-02 Reply Admin

char **argv and char *argv[] are both valid declarations for the second argument of main.

That's a special case.(1) Because it's unreasonable to pass arrays by value (i.e. copying them to the new activation record), they are passed by address, and that means that an apparently array-typed parameter is in reality a pointer, even if the array type is complete (which argv never is). For amusement value, consider this code:

#include <stdio.h>

void f( int arr[100] )
{
  /* sizeof yields a size_t, which needs %zu rather than %u */
  printf("%zu\n", sizeof(arr));
}

If called, this function will print whatever sizeof(int *) returns, you know, anything from 1 on up(2), and not 100 times whatever sizeof(int) returns.

(1) Your statement of it is also wrong. They are declarations for the second parameter of main, what Pascal calls a "formal parameter", the placeholder variable inside the function that receives the value passed in. The value passed in is an argument ("actual parameter").

(2) I have worked on a machine with a 16-bit word size, on which char, short, int, and something * would all have been 16 bits. Using that schema would have been annoying because the system call interfaces for strings (e.g. filenames when opening files) were "packed" as two 8-bit characters per 16-bit word, but in C, that isn't really a valid layout for char.
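
If you do want the callee to see the length, one option is to pass a pointer to the whole array. A sketch only; both function names are invented for the comparison:

```c
#include <assert.h>
#include <stddef.h>

/* "int arr[100]" as a parameter is adjusted to "int *arr", so it is
 * spelled as a pointer here; the callee only ever sees a pointer. */
size_t size_seen_as_pointer(const int *arr)
{
    assert(arr != NULL);
    return sizeof(arr);         /* the size of a pointer to int */
}

/* A pointer to the whole array keeps the length in the type. */
size_t size_seen_as_array(int (*arr)[100])
{
    assert(arr != NULL);
    return sizeof(*arr);        /* 100 * sizeof(int) */
}
```

With int data[100], the first is called as size_seen_as_pointer(data) and the second as size_seen_as_array(&data); passing an array of any other length to the second is a type mismatch the compiler can diagnose.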

2018-10-02 Reply Admin

I had one of them a while back. Someone in the company where I work wrote a method that supposedly calculates the checksum of an IBAN code, and was so pleased with himself he passed it round the entire developer community as a useful reusable utility.

I politely raised the subject that it may have had problems with it (the guy who wrote it was not part of my management structure and so was not under my authority, so that was as strong as I could get). He replied, "Well, it passed all my test cases, so that proves it's good." I couldn't be bothered to put him right.

	private static final long MAX = 34;
	private static final long MODULUS = 97;

:
:
:

        String reformattedCode = "";

        if (IBANNumberValue.length() > 4)
            reformattedCode = IBANNumberValue.substring(4) + IBANNumberValue.substring(0, 4);
        else
            reformattedCode = IBANNumberValue;

        long total = 0;
        for (int i = 0; i < reformattedCode.length(); i++) {

            int charValue = Character.getNumericValue(reformattedCode.charAt(i));

            if (charValue < 0 || charValue > 35) {
                total = 0;
            }

            total = (charValue > 9 ? total * 100 : total * 10) + charValue;

            if (total > MAX) {
                total = (total % MODULUS);
            }

        }
        total = (total % MODULUS);
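
For comparison, a sketch of the usual mod-97 check in C: rotate the first four characters to the end, map letters to 10..35, and accept only a remainder of 1. The name iban_is_valid is made up, and country-specific length and format rules are deliberately left out:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if iban (upper case, no spaces) passes the mod-97 check, else 0.
 * iban_is_valid is a hypothetical name; per-country rules are not checked. */
int iban_is_valid(const char *iban)
{
    assert(iban != NULL);
    size_t len = strlen(iban);
    if (len < 5 || len > 34) {
        return 0;
    }

    unsigned remainder = 0;
    for (size_t k = 0; k < len; k++) {
        /* Read characters 4..len-1 first, then 0..3 (the rotation). */
        unsigned char c = (unsigned char)iban[(k + 4) % len];
        if (isdigit(c)) {
            remainder = (remainder * 10 + (unsigned)(c - '0')) % 97;
        } else if (c >= 'A' && c <= 'Z') {
            /* Letters map to 10..35, i.e. two decimal digits (assumes ASCII). */
            remainder = (remainder * 100 + (unsigned)(c - 'A') + 10) % 97;
        } else {
            return 0;   /* reject anything else instead of resetting the total */
        }
    }
    return remainder == 1;
}
```

Unlike the method quoted above, this version stops at the first invalid character rather than zeroing the running total and carrying on.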


2018-10-02 Reply Admin

Yes, there really, really is a difference.

StackUpthrow is full of beginner programmers from dubious colleges asking why they can't write to the "array" declared by int* a; (and yes, they'll probably call it either a or my_array, too). Don't be a dubious programmer; don't insist that arrays and pointers are "really" the same thing in C.

2018-10-02 Reply Admin

"certainly isn't how I'd build it if I didn't have full control"

Shouldn't this be "certainly isn't how I'd build it if I did have full control" ?

dkf · 2018-10-02 Reply Admin

I was under the impression it was non-existent.

C most definitely knows the difference between arrays and pointers; put two arrays (of size greater than 1) next to each other in a structure and you'll see the differences clearly. Where you can't see the difference is when passing the array to a function; the array decays to a pointer (i.e., you actually get passed a pointer to the first element of the array). This only removes one level of true array-ness; an int[3][3] becomes an int[][3], with every element of the outer array being an int[3] in both cases.

Array arguments in C are one of those things that trips up many programmers who don't use the language very frequently.
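
Both points fit in a few lines. A sketch with invented names (pair_of_arrays, pair_of_pointers, sum_diagonal):

```c
#include <assert.h>
#include <stddef.h>

/* Two real arrays side by side: the struct stores all six ints inline. */
struct pair_of_arrays {
    int first[3];
    int second[3];
};

/* The same shape with pointers stores two addresses and no ints at all. */
struct pair_of_pointers {
    int *first;
    int *second;
};

/* "int m[3][3]" as a parameter really means "int (*m)[3]": only the
 * outer dimension decays, and each row is still a full int[3]. */
int sum_diagonal(int (*m)[3])
{
    assert(m != NULL);
    assert(sizeof(m[0]) == 3 * sizeof(int));
    return m[0][0] + m[1][1] + m[2][2];
}
```

A caller with int grid[3][3] just writes sum_diagonal(grid); an int[3][4] would be rejected because the row type no longer matches.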

2018-10-02 Reply Admin

I honestly don't know where these "embedded" programmers learned to code. I work in a company which has a distinct "firmware" team from the "software" team. As you'd imagine in recent years, the distinction between "firmware" and "software" has virtually disappeared since even the cheapest ARM chips can run some sort of Linux, meaning that "firmware" is just poorly written "software".

They used to hide behind all these excuses, such as 1.2kb of RAM, or only 128 bytes of stack, or lack of preemptive multitasking, or some shit to do with interrupt service routine time limits (none of which is a concern now that we have an operating system). I've found that "firmware" coders are very set in their ways, usually learning to program in 8051 assembly language as an optional subject in their electrical engineering degrees. Over the years, they have begrudgingly adopted C89, but still they love using these fragile and quirky implementation details "that already work" all over the place (it's bloody 2018 guys, nobody has toggled anything in from the front panel of a computer in my lifetime).

2018-10-03 Reply Admin

I'm under the impression that this code is only compiled for 32-bit. If it were 64-bit, you're likely going to run into your variables being 64-bit aligned, and then the average will be half what it should be (or, well, the data between variables is undefined, but probably zero).

2018-10-04 Reply Admin

Protip about using sizeof:

When doing a memcpy to a pointer, use sizeof on /the variable/ rather than the type. Also, this is one of the little known situations where sizeof does not require parentheses.

So instead of: memcpy(&myMessageStructVariable, buffer, sizeof(MessageStruct));

do: memcpy(&myMessageStructVariable, buffer, sizeof myMessageStructVariable);

Then you have two copies of the same thingy in your code that are always correct, and when you have to change to use a different variable, you're always using the size of the correct type and never have to look up the name of the type!
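
A compilable version of the tip, as a sketch: the fields of MessageStruct and the read_message wrapper are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* MessageStruct's fields are invented for this sketch. */
typedef struct {
    uint8_t  type;
    uint8_t  flags;
    uint16_t length;
} MessageStruct;

/* Copies one message out of buffer, which must hold at least
 * sizeof(MessageStruct) bytes. */
void read_message(MessageStruct *out, const unsigned char *buffer)
{
    assert(out != NULL && buffer != NULL);
    MessageStruct myMessageStructVariable;

    /* sizeof applied to the object, not the type name; no parentheses needed. */
    memcpy(&myMessageStructVariable, buffer, sizeof myMessageStructVariable);

    /* Through a pointer, take sizeof *out; sizeof out would be the pointer's size. */
    memcpy(out, &myMessageStructVariable, sizeof *out);
}
```

The second memcpy is the case to watch: when the destination is itself a pointer, the operand has to be the pointed-to object.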

2018-10-05 Reply Admin

Krzysztof is left to pray to the gods of compilers and hardware platforms and memory alignment that these variables keep getting compiled in order, with no gaps.

I would rather pray that they won't be compiled in order after the very next update. Then the code won't work anymore, giving the opportunity to correct it, plus an extra opportunity to make a point that the braindead management approach is braindead.

2018-10-06 Reply Admin

I do serial communications on flaky lines all the time. Four key things:

Every data packet has a length byte(s) - which should be sanity checked on receive.
Every data packet has a checksum/crc/equivalent. It doesn't have to be fancy.
If you can guarantee bytes in the same packet are back to back that makes things much easier - otherwise you have to add timeout logic.
'flush' bytes are really helpful. If you see N 0xFF (your choice) bytes in a row that resets all communications. Otherwise nodes get stuck in non-synchronized states.

Yeah, this introduces some data overhead, but if we really cared about speed we'd be using some flavor of I2C/SPI/etc.
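
The first two points might look something like this. It is a sketch only: the frame layout, the MAX_PAYLOAD limit, and the additive checksum are all assumptions, and timeouts and flush bytes are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAYLOAD 32u   /* assumed limit for the length sanity check */

/* Simple additive checksum over n bytes; nothing fancy. */
static uint8_t checksum8(const uint8_t *data, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* Frame layout (an assumption): [length][payload ...][checksum].
 * Returns the payload length on success, or -1 if the frame is bad. */
int check_frame(const uint8_t *frame, size_t received)
{
    assert(frame != NULL);
    if (received < 2) {
        return -1;                       /* too short for length + checksum */
    }
    uint8_t length = frame[0];
    if (length > MAX_PAYLOAD || (size_t)length + 2 != received) {
        return -1;                       /* length byte fails the sanity check */
    }
    /* The checksum covers the length byte and the payload. */
    if (checksum8(frame, (size_t)length + 1) != frame[length + 1]) {
        return -1;
    }
    return length;
}
```

A receiver would call check_frame once it believes a whole frame has arrived and drop anything that comes back as -1.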

2018-10-07 Reply Admin

And when some hapless coder has to maintain something completely unrelated to this, and the code is compiled using a new version of the compiler, that coder will be blamed for things no longer working.

2018-10-08 Reply Admin

I've just done a source code review where someone is copying one array of integers of a known, fixed size to another (backing up and restoring). They have a union which combines a 4-byte array with an int, and they go through the array one item at a time, copying from one array into the union, then from the union's "int", casting to an unsigned long (on Windows, so same thing), into the other array.

2018-10-08 Reply Admin

Just to be clear, the "same thing" comment above was ignoring the fact that one is signed and the other is unsigned. Just that they are both 4-byte integers. The union could have been defined as such without the need for the cast, if they wanted to persist with copying items one at a time.
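
For what it's worth, when both arrays have the same element type the whole backup is a single call. A sketch assuming plain int arrays; ITEM_COUNT and backup_items are invented names, since the review only says the size is known and fixed:

```c
#include <assert.h>
#include <string.h>

#define ITEM_COUNT 16   /* assumed; the review only says "known, fixed size" */

/* Backs up one fixed-size int array into another in a single call. */
void backup_items(int dest[ITEM_COUNT], const int src[ITEM_COUNT])
{
    assert(dest != NULL && src != NULL);
    /* Both parameters are really pointers, so the size is spelled out. */
    memcpy(dest, src, ITEM_COUNT * sizeof src[0]);
}
```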

Scarlet_Manuka · 2018-10-14 Reply Admin

Shouldn't this be "certainly isn't how I'd build it if I did have full control" ?

No, he's saying that the only reason he can get away with it is that he has full control over everything using the bus.

Pointed Array Access
