- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
0, Fu ...
Admin
Actually, the first bug is that the cast to
(signed)
truncates the value to 16 bits... (Well, it might, depending on stuff, but it's still a possibility. Um.signed
is shorthand forsigned int
, and the C standards allowint
to be only 16 bits.) The casts should be to a fixed bit-length type rather than toint
.And of course that funky-chicken machine in the corner that still does one's complement arithmetic will break it as well, since the negation just inverts all the bits, so the bit-and returns zero.
Leaving aside the somewhat esoteric portability issues, it's worth nothing that the biggest WTF in this is that it relies on what amounts to trickery (accurate trickery, but trickery nonetheless) in order to achieve its result, and doesn't have any comments to explain the trickery.
Admin
If you want to really optimize it, use the CTZ instructions of your CPU.
Admin
The trickery is not that hard: it recursively divides the number in two halves. I've seen it on a page of bit operations. Here's one: https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightParallel, although it doesn't really explain it either.
Admin
It's easier to make out how it works if you show the binary values of those masks. This is operating a bit like a binary search, going through 5 iterations to "search" through 2 ** 5 bits.
Adding a comment like this could have been really helpful:
Admin
For many years, I worked with highly optimized code... trying to save 8uS (back when single instructions were often more than 10uS!) was one effort that took a team of 3 of us over 2 weeks to complete (the initial code was pretty darn optimized)...Other times it was to reduce the VARIANCE in execution time based on data (making many loops evil if they did not execute the same number of times always)....
Not is this level of optimization is not necessary, then it is a TRWTF, but there is nothing in the post to indicate that it was not..
Admin
I was thinking more of the trickery to flatten the number just to its least-significant
1
bit.Admin
If the bit representation of x is the same as of INT_MIN, then "undefined behaviour" is the result of "-(signed)x".
Admin
The real WTF is that the developer thought people would just understand it. I did tons of C early in my career, computers were slow, and too often you needed to hand tweak code for performance. Three things always happened: the original simple version stayed (sometimes commented out, other times behind an #ifdef), tests got added, and a huge comment explained in detail how and why the optimized version worked. We never assumed someone would be smart enough to figure it out as usually the dumbass that couldn’t understand that “trivial bit of bit-twiddling” was us, a couple of moths later.
Admin
You mean one lunch break later, don't you. ;)
Admin
Well, maybe I’m the real WTF, but that is the most unique explanation of signed/unsigned casting for C I’ve ever encountered.
Given that number representations are undefined, and that out of range values are allowed to do anything, maybe this works somewhere, but the idiomatic C for isolating the msb assuming 2’s complement is:
X &= ~(x -1)
Admin
A variation of this, using a multiplication by either 0 or 1 instead of the if-clause, would result in some pretty nice real-time code, executing in constant time.
Admin
Nope, not quite. C99 §6.3.1.3/3 says that either the result is implementation-defined or an implementation-defined signal is raised. In practice, on any modern two's complement architecture, this will do the right, expected thing.
Admin
Without knowing what this was used for, it's hard to say if it's premature optimization. What would require you to count zeroes millions of times per second?
Admin
Perhaps the real wtf is that POSIX has a perfectly good ffs() function that can be used for this?
Admin
Gotta chime in. C++ 20 exposes the hardware instructions for doing this stuff. https://en.cppreference.com/w/cpp/numeric/countr_zero Obviously this code existed before C++ 20, but I think we should be loath to dump on this kind of stuff. At least he didn't convert it to a text string then count the '0's.
Admin
Actually my mistake, I was so focused on the unsigned-to-signed conversion that I completely missed the unary negation—that is certainly undefined behavior when the input is a signed INT_MIN (C99 §6.5/5). Plus there's also §6.2.6.2/3 which says that it's implementation-defined whether 0x80000000 is a trap representation or a normal value for a signed integer.
Admin
Probably this is used in a non-POSIX environment and they had to stick to plain ISO C for some reason. Probably compiler intrinsic functions may have been an option as well :-)
Admin
Not all systems support signals and "implementation-defined" makes the situation not better. Even in my recent practice I had to deal with CPU's, which have been designed decades ago. How many developers are checking compiler and CPU documentation to find out, if the "implementation-defined" behaviour matches the intended behaviour? How many developers do recheck their code, when porting to a different system/hardware (or even compiler)? We don't know the submitter's system environment. "implementation-defined" or "undefined behaviour", I wouldn't count on either one. IMHO, betting on any of them is a WTF in itself.
Admin
They almost certainly didn't write this themselves. It's an old bit-twiddling trick that predates CPUs with a built-in instruction.
http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightParallel
Admin
This is Gaudet's algorithm. You can find it, as well a several others to do this job, in Hacker's Delight.
Admin
Arianne 5: The result of relying on 'implementation-defined' error handling.
Admin
I guess, that code was just copied from somewhere and the programmer in question did not know either why or how it worked. It is quite ingenious, so usually you would be proud of something like that and document it, just to show how smart you are.
Admin
Or maybe they know what the deployment architecture is to a reasonable degree. You're not going to be running Windows, Linux, macOS, iOS, or Android on anything that's not doing 2's complement. (Even most things that are more exotic in terms of what OS they use these days are still going to have exactly the same defined behaviour for this stuff.)
But using the relevant intrinsic instead would be far more sensible. Or even the surfacing of that in some platform-specific library.
Admin
BTW, if you want to be really fast, you should avoid all those pesky ifs, that work not very well with modern processors. It might be better to do something like:
c -= 8u * !!(x&0x00FF00FFu);
or even better:
c -= 8u * !(x&0xFF00FF00u);
instead of:
Can't tell without benchmarking, though.
Why is this the same? Remember that x only has one bit set. If x & 0x00FF00FFu is not zero, that means that the set bit is in bits 0-7 or 16-23. At this point c is either 31 or 15, but since we now know it can't be at the most significant part of each 16-bit word, we subtract 8 from c, so now it is either 23 or 7. The simple ! will turn any number into 0 or 1, depending whether it was non-zero or zero before, so the !!(..) part will be 1 if x & 0x00FF00FFu is non-zero, otherwise 0. Multiply that by 8 and you get the same result as with using the if-clause. Since there is only one bit set, it can either be in 0xFF00FF00 or 0x00FF00FF, so if you bitwise-and those values with x, they are always complementary and therefore !(x&0x00FF00FF) is non-zero if and only if (x&0xFF00FF00) is.
Admin
@my name is missing - One (real, though decades old) use-case... presume that each bit represents a harmonic of a signal with the MSB being the base (primary) wavelength. Determining how many zeros are at the end tells you as an integer what the highest harmonic present actually is.
Admin
Am I the only person bothered by the opening sentence? I remember when C WAS a cutting edge language.
Admin
@ksdd - there are many systems developed to work only on one CPU. In a few cases I had (late 1970's early/mid 1980's) even only specific mask versions of CPUs....
Admin
This comment should be featured. Great resource!
Admin
The "c = 15" line looks weird, but it is really "c -= 16". At that line in the code, we know c is 31, so c can only become 15.
Whoever wrote this didn't trust the compiler to figure that out.
Admin
Clearly that. If you want to count bits, let the CPU do what it does the best. Use TZCNT instruction. POSIX has a definition for it, GCC has it, visual studio has it, nvidi has it, c++ has it, rust has it... Even java has it!
Admin
Looks quite similar to the implementation that comes with Java.
https://github.com/openjdk/jdk17/blob/master/src/java.base/share/classes/java/lang/Integer.java#L1678-L1688
They also have a source in the big Javadoc comment at the top of the class.
Admin
This isn't something the author came up with; they took it from Sean Anderson's Bit Twiddling Hacks page:
https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightParallel
Admin
I disagree that its a premature optimisation. Its an optimisation - it has no impact on further design decisions, is completely self contained, and works. It's just the fastest way of doing it, something that I imagine most languages have built into them.
Sure, some may raise an eyebrow, but perhaps that eyebrow raise will turn into a question, which turns into being pointed to resources like Sean Anderson's Bit Twiddling Hacks or Henry S Warren Jr's book Hackers Delight. And hopefully readers of this and other comments will do the same, because it's a lot of fun and it has performance and power consumption benefits (there's an environmental argument if deployed widely enough, but it's a bit tongue-in-cheek).
The only downside here is that the code doesn't contain a link to a resource for intrigued readers - that's bad manners both to the original author and to the reader.
Admin
Yup, "I'm gonna use this code I found in Knuth!1!!1! You can't argue with me, it's KNUTH!1!!eleven"
Admin
There is no such thing as implementation-defined unless you are working with bespoke hardware. In every general purpose enviroment there is only defined, undefined and "I think its defined but it's really not."
Admin
Okay, I just woke up and haven't rebooted the brain yet, but are the lines:
really needed? (Assuming the c = 15u; line becomes c -= 16u; of course. That, rather than c = 16u; so when someone extends the code to handle 64-bit ints they don't miss that line).
Admin
|It's easier to make out how it works if you show the binary values of those masks. This |is operating a bit like a binary search, going through 5 iterations to "search" through 2 |** 5 bits.
|Adding a comment like this could have been really helpful:
|0000FFFF: 00000000000000001111111111111111 |00FF00FF: 00000000111111110000000011111111 |0F0F0F0F: 00001111000011110000111100001111 |33333333: 00110011001100110011001100110011 |55555555: 01010101010101010101010101010101
Go train under a witch! Programmers should speak hex! :)
Admin
@Steve:
I suppose more elegantly, you could do c -=1, instead of c=31u. But you do need it to change from the default of 32, and the x &= stuff line needs to happen.