• Prime Mover (unregistered)

    0, Fu ...

  • (nodebb)

    Actually, the first bug is that the cast to (signed) truncates the value to 16 bits... (Well, it might, depending on stuff, but it's still a possibility. Um. signed is shorthand for signed int, and the C standards allow int to be only 16 bits.) The casts should be to a fixed bit-length type rather than to int.

    And of course that funky-chicken machine in the corner that still does one's complement arithmetic will break it as well, since the negation just inverts all the bits, so the bit-and returns zero.

    Leaving aside the somewhat esoteric portability issues, it's worth nothing that the biggest WTF in this is that it relies on what amounts to trickery (accurate trickery, but trickery nonetheless) in order to achieve its result, and doesn't have any comments to explain the trickery.

  • Paulo Marques (unregistered)

    If you want to really optimize it, use the CTZ instructions of your CPU.

  • Hanzito (unregistered)

    The trickery is not that hard: it recursively divides the number in two halves. I've seen it on a page of bit operations. Here's one: https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightParallel, although it doesn't really explain it either.

  • Matt (unregistered)

    It's easier to make out how it works if you show the binary values of those masks. This is operating a bit like a binary search, going through 5 iterations to "search" through 2 ** 5 bits.

    Adding a comment like this could have been really helpful:

    0000FFFF: 00000000000000001111111111111111
    00FF00FF: 00000000111111110000000011111111
    0F0F0F0F: 00001111000011110000111100001111
    33333333: 00110011001100110011001100110011
    55555555: 01010101010101010101010101010101
  • (nodebb)

    For many years, I worked with highly optimized code... trying to save 8uS (back when single instructions were often more than 10uS!) was one effort that took a team of 3 of us over 2 weeks to complete (the initial code was pretty darn optimized)...Other times it was to reduce the VARIANCE in execution time based on data (making many loops evil if they did not execute the same number of times always)....

    Not is this level of optimization is not necessary, then it is a TRWTF, but there is nothing in the post to indicate that it was not..

  • (nodebb) in reply to Hanzito

    The trickery is not that hard: it recursively divides the number in two halves.

    I was thinking more of the trickery to flatten the number just to its least-significant 1 bit.

  • ksdd (unregistered)

    If the bit representation of x is the same as of INT_MIN, then "undefined behaviour" is the result of "-(signed)x".

  • cdegroot (unregistered)

    The real WTF is that the developer thought people would just understand it. I did tons of C early in my career, computers were slow, and too often you needed to hand tweak code for performance. Three things always happened: the original simple version stayed (sometimes commented out, other times behind an #ifdef), tests got added, and a huge comment explained in detail how and why the optimized version worked. We never assumed someone would be smart enough to figure it out as usually the dumbass that couldn’t understand that “trivial bit of bit-twiddling” was us, a couple of moths later.

  • 516052 (unregistered) in reply to cdegroot

    You mean one lunch break later, don't you. ;)

  • PixelCurmudgeon (unregistered)

    Well, maybe I’m the real WTF, but that is the most unique explanation of signed/unsigned casting for C I’ve ever encountered.

    Given that number representations are undefined, and that out of range values are allowed to do anything, maybe this works somewhere, but the idiomatic C for isolating the msb assuming 2’s complement is:

    X &= ~(x -1)

  • NoLand (unregistered)

    A variation of this, using a multiplication by either 0 or 1 instead of the if-clause, would result in some pretty nice real-time code, executing in constant time.

  • Anonymous') OR 1=1; DROP TABLE wtf; -- (unregistered) in reply to ksdd

    If the bit representation of x is the same as of INT_MIN, then "undefined behaviour" is the result of "-(signed)x".

    Nope, not quite. C99 § says that either the result is implementation-defined or an implementation-defined signal is raised. In practice, on any modern two's complement architecture, this will do the right, expected thing.

  • my name is missing (unregistered)

    Without knowing what this was used for, it's hard to say if it's premature optimization. What would require you to count zeroes millions of times per second?

  • Geoff (unregistered)

    Perhaps the real wtf is that POSIX has a perfectly good ffs() function that can be used for this?

  • Ollie Jones (unregistered)

    Gotta chime in. C++ 20 exposes the hardware instructions for doing this stuff. https://en.cppreference.com/w/cpp/numeric/countr_zero Obviously this code existed before C++ 20, but I think we should be loath to dump on this kind of stuff. At least he didn't convert it to a text string then count the '0's.

  • Anonymous') OR 1=1; DROP TABLE wtf; -- (unregistered) in reply to Anonymous') OR 1=1; DROP TABLE wtf; --

    Actually my mistake, I was so focused on the unsigned-to-signed conversion that I completely missed the unary negation—that is certainly undefined behavior when the input is a signed INT_MIN (C99 §6.5/5). Plus there's also § which says that it's implementation-defined whether 0x80000000 is a trap representation or a normal value for a signed integer.

  • ksdd (unregistered) in reply to Geoff

    Probably this is used in a non-POSIX environment and they had to stick to plain ISO C for some reason. Probably compiler intrinsic functions may have been an option as well :-)

  • ksdd (unregistered) in reply to Anonymous') OR 1=1; DROP TABLE wtf; --

    Not all systems support signals and "implementation-defined" makes the situation not better. Even in my recent practice I had to deal with CPU's, which have been designed decades ago. How many developers are checking compiler and CPU documentation to find out, if the "implementation-defined" behaviour matches the intended behaviour? How many developers do recheck their code, when porting to a different system/hardware (or even compiler)? We don't know the submitter's system environment. "implementation-defined" or "undefined behaviour", I wouldn't count on either one. IMHO, betting on any of them is a WTF in itself.

  • Barry Margolin (github)

    They almost certainly didn't write this themselves. It's an old bit-twiddling trick that predates CPUs with a built-in instruction.


  • HyssingSid (unregistered)

    This is Gaudet's algorithm. You can find it, as well a several others to do this job, in Hacker's Delight.

  • ooOOooGa (unregistered)

    Arianne 5: The result of relying on 'implementation-defined' error handling.

  • some ron (unregistered)

    I guess, that code was just copied from somewhere and the programmer in question did not know either why or how it worked. It is quite ingenious, so usually you would be proud of something like that and document it, just to show how smart you are.

  • (nodebb) in reply to ksdd

    betting on any of them is a WTF in itself

    Or maybe they know what the deployment architecture is to a reasonable degree. You're not going to be running Windows, Linux, macOS, iOS, or Android on anything that's not doing 2's complement. (Even most things that are more exotic in terms of what OS they use these days are still going to have exactly the same defined behaviour for this stuff.)

    But using the relevant intrinsic instead would be far more sensible. Or even the surfacing of that in some platform-specific library.

  • some ron (unregistered)

    BTW, if you want to be really fast, you should avoid all those pesky ifs, that work not very well with modern processors. It might be better to do something like:

    c -= 8u * !!(x&0x00FF00FFu);

    or even better:

    c -= 8u * !(x&0xFF00FF00u);

    instead of:

      if ((x & 0x00FF00FFu) != 0u) 
         c -= 8u;

    Can't tell without benchmarking, though.

    Why is this the same? Remember that x only has one bit set. If x & 0x00FF00FFu is not zero, that means that the set bit is in bits 0-7 or 16-23. At this point c is either 31 or 15, but since we now know it can't be at the most significant part of each 16-bit word, we subtract 8 from c, so now it is either 23 or 7. The simple ! will turn any number into 0 or 1, depending whether it was non-zero or zero before, so the !!(..) part will be 1 if x & 0x00FF00FFu is non-zero, otherwise 0. Multiply that by 8 and you get the same result as with using the if-clause. Since there is only one bit set, it can either be in 0xFF00FF00 or 0x00FF00FF, so if you bitwise-and those values with x, they are always complementary and therefore !(x&0x00FF00FF) is non-zero if and only if (x&0xFF00FF00) is.

  • (nodebb) in reply to my name is missing

    @my name is missing - One (real, though decades old) use-case... presume that each bit represents a harmonic of a signal with the MSB being the base (primary) wavelength. Determining how many zeros are at the end tells you as an integer what the highest harmonic present actually is.

  • Argle (unregistered)

    Am I the only person bothered by the opening sentence? I remember when C WAS a cutting edge language.

  • (nodebb) in reply to ksdd

    @ksdd - there are many systems developed to work only on one CPU. In a few cases I had (late 1970's early/mid 1980's) even only specific mask versions of CPUs....

  • Feature it (unregistered) in reply to Barry Margolin

    This comment should be featured. Great resource!

  • Zygo (unregistered)

    The "c = 15" line looks weird, but it is really "c -= 16". At that line in the code, we know c is 31, so c can only become 15.

    Whoever wrote this didn't trust the compiler to figure that out.

  • Charles (unregistered) in reply to cdegroot
    Comment held for moderation.
  • Tchize (unregistered) in reply to Paulo Marques

    Clearly that. If you want to count bits, let the CPU do what it does the best. Use TZCNT instruction. POSIX has a definition for it, GCC has it, visual studio has it, nvidi has it, c++ has it, rust has it... Even java has it!

  • mihi (unregistered)

    Looks quite similar to the implementation that comes with Java.


    They also have a source in the big Javadoc comment at the top of the class.

  • I'm sorry, but who? (unregistered)

    This isn't something the author came up with; they took it from Sean Anderson's Bit Twiddling Hacks page:


  • Premature Optimisation Pills (unregistered)

    I disagree that its a premature optimisation. Its an optimisation - it has no impact on further design decisions, is completely self contained, and works. It's just the fastest way of doing it, something that I imagine most languages have built into them.

    Sure, some may raise an eyebrow, but perhaps that eyebrow raise will turn into a question, which turns into being pointed to resources like Sean Anderson's Bit Twiddling Hacks or Henry S Warren Jr's book Hackers Delight. And hopefully readers of this and other comments will do the same, because it's a lot of fun and it has performance and power consumption benefits (there's an environmental argument if deployed widely enough, but it's a bit tongue-in-cheek).

    The only downside here is that the code doesn't contain a link to a resource for intrigued readers - that's bad manners both to the original author and to the reader.

  • ANUBHAV GUPTA (unregistered)
    Comment held for moderation.
  • löchleindeluxe (unregistered)

    Yup, "I'm gonna use this code I found in Knuth!1!!1! You can't argue with me, it's KNUTH!1!!eleven"

  • 516052 (unregistered) in reply to ksdd

    There is no such thing as implementation-defined unless you are working with bespoke hardware. In every general purpose enviroment there is only defined, undefined and "I think its defined but it's really not."

  • Steve (unregistered)

    Okay, I just woke up and haven't rebooted the brain yet, but are the lines:

     x &= (unsigned)-(signed)x;
      c = 31u;

    really needed? (Assuming the c = 15u; line becomes c -= 16u; of course. That, rather than c = 16u; so when someone extends the code to handle 64-bit ints they don't miss that line).

  • Loren Pechtel (unregistered)

    |It's easier to make out how it works if you show the binary values of those masks. This |is operating a bit like a binary search, going through 5 iterations to "search" through 2 |** 5 bits.

    |Adding a comment like this could have been really helpful:

    |0000FFFF: 00000000000000001111111111111111 |00FF00FF: 00000000111111110000000011111111 |0F0F0F0F: 00001111000011110000111100001111 |33333333: 00110011001100110011001100110011 |55555555: 01010101010101010101010101010101

    Go train under a witch! Programmers should speak hex! :)

  • CmdrKien (unregistered) in reply to Steve


    I suppose more elegantly, you could do c -=1, instead of c=31u. But you do need it to change from the default of 32, and the x &= stuff line needs to happen.

  • David (unregistered)
    Comment held for moderation.
  • Sam (unregistered)
    Comment held for moderation.
  • zainzulfiqar799 (unregistered)
    Comment held for moderation.
  • zainzulfiqar799 (unregistered)
    Comment held for moderation.

Leave a comment on “Wearing a Mask”

Log In or post as a guest

Replying to comment #:

« Return to Article