The Daily WTF: Curious Perversions in Information Technology

Steve_The_Cynic · 2021-02-10

That made me think of a thing I found in our code a couple of months ago, where the intent was to see if a was greater than b. Key points: this is in C and both variables were of type int, and neither of them could be less than zero. (They are both size-related, but had long ago been given, for unknowable reasons, a signed type.)

On the face of it, sounds easy, right?

    if ( a > b )

But no. For whatever reason, the person who wrote this line of code did this:

    if ( (a - b) > 0 )

Which works approximately fine until we change (for reasons of the "get rid of signed/unsigned comparison warnings somewhere else" type) the type of both variables from int to size_t, which is unsigned. Now every combination except "exactly equal" triggers the if().
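To make the failure concrete, here is a minimal sketch of the two versions side by side. The function names are made up for illustration, and it assumes size_t is at least as wide as int (true on mainstream platforms, but not guaranteed).

```c
#include <stdbool.h>
#include <stddef.h>

/* The original expression with the original signed type. Fine as long as
   a - b cannot overflow, which holds for two non-negative ints. */
bool greater_by_signed_difference(int a, int b) {
    return (a - b) > 0;
}

/* The same expression after the type change. a - b now wraps modulo
   SIZE_MAX + 1, so it is nonzero (and therefore "> 0") whenever a != b. */
bool greater_by_unsigned_difference(size_t a, size_t b) {
    return (a - b) > 0;
}

/* The plain comparison gives the intended answer for either type. */
bool greater(size_t a, size_t b) {
    return a > b;
}
```

With a = 10 and b = 20, the signed version and the plain comparison both say "no", while the unsigned-difference version says "yes".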

2021-02-10

Um, I'm pretty sure the difference operator returns a signed result even if the operands are unsigned. C is not THAT tricky.

Remy Porter · 2021-02-10

    #include <stdio.h>

    int main() {
        size_t x = 10;
        size_t y = 20;
        if ((x - y) > 0)
            printf("%d", x - y);
    }

The if condition evaluates to true, but the output is -10. C is trickier than you think.

2021-02-10

Or, from (ISO/IEC 9899:1999 (E) §6.2.5/9):

"A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."

Many people have pointed out that this "never overflow" stuff is just there to confuse the reader. But to quote the example in the Stack Overflow item I cribbed:

"As you can see, (unsigned)0 - (unsigned)1 equals -1 modulo UINT_MAX+1, or in other words, UINT_MAX."

Well.

At least it's "defined behaviour." If a little unexpected.
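The quoted rule is easy to check directly. A small sketch (the function names are invented for the example):

```c
#include <limits.h>

/* 0u - 1u cannot be represented in unsigned int, so the result is
   reduced modulo UINT_MAX + 1, which gives UINT_MAX. */
unsigned int zero_minus_one(void) {
    unsigned int zero = 0u;
    unsigned int one = 1u;
    return zero - one;
}

/* The same rule at the other end of the range: UINT_MAX + 1u wraps to 0. */
unsigned int max_plus_one(void) {
    unsigned int max = UINT_MAX;
    return max + 1u;
}
```

Both results are fully defined by the standard; only the signed equivalents would be undefined behaviour.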

2021-02-10

I think the printf result is a consequence of your format specifier. Since %d is signed, I have a horrid suspicion that the result of the subtraction is internally coerced into a signed value. But I could be wrong here.

Remy Porter · 2021-02-10

It's 100% the %d, the compiler even spits out a warning to that effect. I'm mildly surprised that the coercion happens before the operation, but not like, super surprised.

2021-02-10

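It's very tricky. If you mix signed and unsigned in an expression, it converts the signed operand to unsigned. So for example, even if "n" is signed,

    for (size_t i = 0; i < n - 1; i++)

is effectively an infinite loop when n=0: n - 1 is -1, which becomes SIZE_MAX in the comparison, so the loop only stops after SIZE_MAX iterations.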
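Since actually running that loop is impractical, here is a sketch that just evaluates its condition, plus one possible rewrite. The function names are hypothetical, and it assumes size_t is at least as wide as int, so the int operand is converted to size_t.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The loop condition from the comment above, evaluated once.
   n - 1 is computed as an int, then converted to size_t for the
   comparison, so -1 turns into SIZE_MAX. */
bool loop_condition(size_t i, int n) {
    return i < n - 1;
}

/* One possible rewrite: reject non-positive n first, then keep all the
   arithmetic on the unsigned side so nothing can go below zero. */
bool loop_condition_fixed(size_t i, int n) {
    return n > 0 && i + 1 < (size_t)n;
}
```

For n = 0 the original condition is true at i = 0 and only becomes false at i = SIZE_MAX; the rewritten one is false immediately. Compilers typically flag the first function with a sign-compare warning when warnings are enabled.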

2021-02-10

Oh, it's much weirder than that... There's no coercion here; unsigned int subtraction simply does not stop at 0. The unsigned just means that the values are never negative, giving you larger values for the same size, but the actual computation is still 2's complement. So, your unsigned result, if interpreted as signed (by printf), is indeed -10.

So, assuming you have the same sizes as I do (it's implementation dependent after all):

If you use int, the bug doesn't trigger, the result is -10.
If you use unsigned int, the bug triggers and the result is 4294967286.
If you use size_t, which happens to be long unsigned int, the bug triggers and the result is 18446744073709551606.

All of the above will display as -10 if you just use %d. In the last case you get a size warning, but %ld fixes that and still displays -10.

BUT! What happens if you use unsigned short int (size_t COULD be it, it's just a typedef after all)? Same thing, right? NOPE. Bug doesn't trigger, everything is fine.

Why? Any integer type which fits in int gets promoted to int, then truncated if necessary. That's signed int. Unsigned short int is shorter than int, so the subtraction in the conditional is performed on ints, the result is signed and the bug does not show up. The result is -10. If you do display it as %hu, forcing it to be unsigned short int, you get 65526.

// I might be a bit rusty on the standards, I haven't programmed in C for years, but just in case I tested all of the above in gcc...
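The promotion behaviour described above can be sketched like this. The function names are invented for the example, and it assumes int is wider than short, as on typical desktop platforms.

```c
#include <limits.h>
#include <stdbool.h>

/* unsigned short operands are promoted to int before the subtraction
   (when int is wider than short), so the difference can be negative. */
bool diff_positive_ushort(unsigned short a, unsigned short b) {
    return (a - b) > 0;
}

/* unsigned int operands are not promoted; the difference wraps instead. */
bool diff_positive_uint(unsigned int a, unsigned int b) {
    return (a - b) > 0;
}

/* Converting the int result back to unsigned short reduces it modulo
   USHRT_MAX + 1. */
unsigned short diff_stored_ushort(unsigned short a, unsigned short b) {
    return (unsigned short)(a - b);
}
```

With 10 and 20 as arguments, the unsigned short test is false, the unsigned int test is true, and the stored unsigned short difference is USHRT_MAX - 9, which is 65526 when short is 16 bits.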

Remy Porter · 2021-02-10

That makes a lot of sense. I have one, relatively tiny, pile of C code that I maintain, so while I can navigate C well enough not to cause absolute disasters, I still can't do it without accidentally hurting myself or others.

2021-02-10

Isn't that just an example of "it can never overflow?"

I think what's happening in your example is that i eventually reaches UINT_MAX + 1 (which cannot be represented in a size_t, always assuming that size_t is defined "normally", i.e. as an unsigned integer of appropriate size for the platform). At that point the modulo kicks in, and i is set to 0. Again, I'm not sure (some other rule in the standard may apply), but it doesn't seem to me that there's any type promotion implied.

2021-02-10

That's why I think C is a perfect language for education at a certain stage. If you just try to do something non-trivial, you learn the hard way what your variables really are, how and where they are stored, how all of this needs to be managed and what the contract implied in the language is - if you don't follow the rules, don't expect the results to be... anything, really, unspecified behavior is unspecified, prepare to die. This translates to a better understanding of the machine you're working with, of what the smarter languages do for you, and lets you avoid many stupid mistakes.

It's also the reason why it's rarely the right language for almost anything else nowadays.

2021-02-10

Maybe you can't see any kind of promotion implied, but as the natural registers of the CPU in this case are 64-bit, however you try to put a 16- or 32-bit integer into one, it has to work using 64-bit register rules. The compiler could emit a lot of corrective code to cater for this, but it doesn't. But as long as the compiler behaviour, together with the processor architecture behaviour, is known and stable, you should get repeatable results. Maybe just not the results you were looking for.

2021-02-10

I wonder if whoever wrote that started out as an ASM programmer? To me it smacks of "SUB, JNE" (subtract, Jump if Not Equal to Zero). Or maybe I'm just old enough to remember programming in ASM.

2021-02-10

The moral of the story for all you c-nauts out there is: invest the energy you have been using in trying to be clever into being careful instead.

2021-02-10

I always say "worse things happen in C"

2021-02-10

Weird code like this is usually a result of gradually simplifying some expression that used to make sense to the point where it looks dumb.

Steve_The_Cynic · 2021-02-10

@Remy, you should be ashamed of yourself, passing a 64-bit size_t to printf() and formatting it with an int-consuming %d. That's first-tier UB right there.
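For reference, a sketch of how the difference could be formatted without the mismatch. The helper names are hypothetical; %zu is the standard conversion for size_t since C99, and the signed variant assumes the magnitude of the difference fits in long long.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats the (wrapped) unsigned difference with the matching specifier. */
int format_size_difference(char *buf, size_t buflen, size_t x, size_t y) {
    return snprintf(buf, buflen, "%zu", x - y);
}

/* If a signed difference is what you actually want, compute it in a
   signed type explicitly instead of relying on the format specifier.
   Assumes the magnitude of the difference fits in long long. */
int format_signed_difference(char *buf, size_t buflen, size_t x, size_t y) {
    long long d = (x >= y) ? (long long)(x - y) : -(long long)(y - x);
    return snprintf(buf, buflen, "%lld", d);
}
```

For x = 10 and y = 20, the first writes the large wrapped value (18446744073709551606 where size_t is 64 bits) and the second writes -10.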

2021-02-10

... and worst things happen at C++ ...

2021-02-10

Having said which, I wonder what happens in PHP under similar conditions?

Not jumping on the usual band-waggon. Just curious.

Does PHP coerce the result to "File Not Found?" OK, I just jumped on the usual band-waggon.

2021-02-10

It prints -10, because you used %d. If you use %u, you will get 4294967286, because of wraparound.

2021-02-10

The thing with C is that a lot of things are left as undefined behavior, which means the compiler can do anything. Usually the result is sane (modern computers are 2's complement, so underflow/overflow wraps around identically regardless of sign).

But remember there are undefined behaviors, and those mean "computer explodes" is actually a valid response.

It's why C also has a lot of sanity types of late - if you need to cast a pointer to an integer, you have to use an intptr_t nowadays, which is guaranteed to be an integer type big enough to hold a pointer. Big problems were had in the move to 64 bit systems where previously 32-bit systems used 32-bit int and 32-bit pointers, and depending on the 64-bit system, 32-bit int, 64-bit pointer.

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)

Note sizeof(void*) is actually missing in that relation. Also why we have int8_t/int16_t/int32_t/int64_t and the unsigned counterparts uint8_t/uint16_t/uint32_t/uint64_t - which actually let you specify the exact storage you need.

The only thing you need to know is sizeof(void*) <= sizeof(intptr_t)
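As a rough sketch of that guarantee in use: a pointer converted to intptr_t and back compares equal to the original. The function name is made up, and note that intptr_t is an optional type in the standard, so this only compiles where the implementation provides it (mainstream ones do).

```c
#include <stdbool.h>
#include <stdint.h>

/* Compile-time check of the size relation (requires C11). */
_Static_assert(sizeof(void *) <= sizeof(intptr_t),
               "intptr_t must be able to hold a pointer");

/* Round-trips a pointer through intptr_t. The standard guarantees that
   a valid void pointer converted this way compares equal to the original. */
bool pointer_round_trips(void *p) {
    intptr_t bits = (intptr_t)p;
    void *back = (void *)bits;
    return back == p;
}
```

Doing the same through a plain int is where the 32-bit to 64-bit ports went wrong, because the upper half of the pointer is lost.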

2021-02-10

Well, I have two visuals illustrating the funny middle ground C takes between weakly and strongly typed languages.

One is the Mr. Incredible meme. "Bits are bits!"

The other one is a service you implement, where you bring a package and a bird, and it sends the package over a chasm using the bird.

Weakly typed: Come with a pigeon - as designed. Come with a parrot - sure thing. Come with a bat - sure! And it works, too. Somehow the language just doesn't care. Come with a cat... Sure! Just don't be surprised if you come back to find your cat dead and stuffed with a propeller up its butt to make it fly over the chasm.

Strongly typed: Come with a bat... Na-ah! That's not a bird! Ok, reimplement to accept FlyingAnimal. Bats now accepted. Come with a cat... Nope. Tell me explicitly how to make this Animal a FlyingAnimal, or how to send using cats, or whatever. No explanation, no service. Implement a cast, make an overloaded version of the function, use polymorphism... Doesn't matter, but you HAVE to tell me how to do it.

C? Come with a parrot - no problem. Come with a bat... Yeah, ok. It will fly. Come with a cat - nope! You said this function needs a bird, this animal can't be promoted to a bird, no way. So you cast: I'm the programmer, and I tell you this cat is a bird. You walk away, but look back, hearing some weird noises. Just in time to see your function kicking your cat with the package tied to its belly into the chasm, screaming "FLY, YOU B*TCH!"

C gives you a lot of power, and with great power... you know. It will stop you (or at least warn you) from doing something incredibly stupid by mistake, but if you explicitly tell it that it should see the 8-bit bitmap you just loaded as a struct with an array of 8 floats followed by a string... "Yeth, marther!"

What else did you expect from a language which has and can dereference (with a cast) a void *?

2021-02-10

I wish I could vote this up.

2021-02-10

Didn't old-school Java effectively allow a cast from void* to a (Java reference to) Thing? I mean, before Streams (and for all I know, even now), Java collections effectively type-erased the items collected and turned them into the base Object class. Retrieving an object from a collection is not really any different from downcasting from void*.

I'm not objecting to the approach taken (and I might well be wrong). But it's harsh to complain about the way C handles its type system when you look around and see languages like Perl, where a "type" is redefined every time you change context from scalar to whatever.

C does what it does, and when you've spent a fair few years working with it, you know where to avoid (language-specific) stupidities such as mixing signed and unsigned types without really understanding what happens under the hood.

2021-02-11

Not just old-school Java. New-school Scala (which compiles to the same byte-code as Java, and runs on the same JVM) lets you cast a null.

I’m historically a Java programmer, so whenever I see a colleague write “null.asInstanceOf[Boolean]” I have to go and lie down in a darkened room until the monsters stop screaming...

Watson · 2021-02-11

I've seen "if((a-b) > 0)" in a lot of C in a lot of places. I really don't know why so many C programmers think it's the Right Thing to the point where they bend themselves into knots to avoid writing "a > b". The closest I've come is to note that sometimes in the branch the expression "a-b" is used, and — look, a microoptimisation! The compiler is smrt and will be able to save a whole subtraction operation by stashing the first use in a register! And I'm smrt for seeing that!

And it seems I'm not alone. Also a couple of months ago I stumbled across "Why [Static Analyser] PVS-Studio Doesn't Offer Automatic Fixes" (https://www.viva64.com/en/b/0776/) and found the same cleverness and the same bug being discussed, in the context of why the analyser can recognise the problem but not do anything about it.

2021-02-11

I'll be damned. I guess the wrong idea I had about that dates back to bothering with short ints...

2021-02-11

It might serve a purpose (but then certainly deserves a comment): assuming the types of the variables are signed, over- and underflow are undefined behaviour. That means the compiler can deduce quite a bit of information about the two values from assuming that undefined behaviour will not occur.

Of course in practice, actual C code is full of unintended undefined behaviour, to the degree that compilers implement work-around flags or optimization paths for important non-conforming benchmarks.

Steve_The_Cynic · 2021-02-11

Big problems were had in the move to 64 bit systems where previously 32-bit systems used 32-bit int and 32-bit pointers, and depending on the 64-bit system, 32-bit int, 64-bit pointer.

Pfft. I remember seeing that with a fellow student back in the day. He was a mature student, and worked on VAXen in his day job, which were fully 32-bit (even if it was physically impossible to install an actual 4GiB of memory - the racks weren't big enough...), and had acquired a bad habit of casting pointers to int, which didn't work so well on the 286s we were using in the university. Um. Pointers that took 32 bits (16:16, strictly), and int that was only 16 bits.

Casting back to a pointer crashed and burned on those machines, using that well-known Microsoft operating system. You've surely all used it.

Er.

XENIX.

Addendum 2021-02-11 06:58: Key point: XENIX ran in protected mode, and a pointer of the form 0x0000:XXXX is close enough to NULL to explode your program in 286 or 386 protected mode.

2021-02-11

Didn't old-school Java effectively allow a cast from void* to a (Java reference to) Thing?

Not a cast, but yeah, Java generics are compile time only, so if you do untyped stuff with them you can ask it to do crazy things. It will still explode with a ClassCastException at runtime, though, so it won't actually try to do the stupid thing.

But yeah, you can do this:

    int f1() {
        List<Integer> x = new ArrayList<>();
        stupidListCall(x);
        return 1 + x.get(0); // ClassCastException
    }

    void stupidListCall(List x) {
        x.add("wat");
    }

A good IDE will warn you about doing this.

2021-02-15

It does look like the programmer has seen '-=' as a 'decrement-by' and hasn't thought through to it being 'x = x - y' which if written out would have been clear that it was x = x - x + y which even the most maths afflicted would realise is x = y.

Stocking Up
