Admin
You know, they could have just used Youd's constant to figure that out. For more information visit: http://www.youdzone.com/youds_constant.html
Admin
It'd be better with a switch/case.
Admin
See http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious for a table lookup version (among others) of integer log2
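For reference, the "obvious" version that page lists is just a shift loop; a quick C sketch (hypothetical function name; see the link for the table-lookup and branch-free variants):

static unsigned int ilog2(unsigned int v)
{
    /* floor(log2(v)) for v >= 1: count how many times v can be halved */
    unsigned int r = 0;
    while (v >>= 1)
        r++;
    return r;
}

Add 1 to the result to get the number of bits needed to write v in binary (e.g. ilog2(255) = 7, ilog2(256) = 8).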
Admin
WTF? That's just log_10(2). This constant does nothing!
Admin
He's saying that the difference between a and b when a == b requires zero bits.
a - b + 1 = 1
log_2(1) = 0
Ergo, the number of bits needed to represent (a - b) is zero. Yes, his wording is a bit weird, but that is (I believe) what he meant. In a real application of this, you probably want to reserve at least one bit for a zero value to indicate that there actually IS a value (and that it is zero), but maybe the person programming works for Oracle and doesn't think distinguishing between those two things is very important.
Admin
Since the program processes gigabytes of data using Fortran, assuming x86 would be a major WTF.
Captcha: darwin. If Darwin was right, Fortran would be extinct.
Admin
I'm not sure what you're getting at here. If you're saying the big O of an algorithm doesn't matter, you're an idiot. Just because you don't understand computer science doesn't mean it's not important. Anyway, the previous poster wasn't arguing that the original code was correct, just that the "+ 1" in the correct formula from previous comments was necessary.
Admin
Maybe BCD floating point would solve most accuracy problems. (Yes, I know of at least one "system", actually a calculator, that uses BCD for floating point math.)
Admin
Um, the inputs (min and max) are integers. Therefore the logs only need to be accurate to one part in 2^31, which is 10 significant digits.
So maybe I was a bit hasty in dismissing the need for double precision, but for this application floating point works just fine.
Admin
That's not FORTRAN! Why isn't everything indented to column 7? What's with the :: ? end do?
Sheesh! You kids today...
Admin
Not having Fortran readily available, I just translated the code snippet into Perl, and it seems to give incorrect results (too small by 1) for many possible inputs, starting with f(4) = 2 instead of 3. As was mentioned before, the correct final operation is ceil(), not int() or floor(). I think the biggest WTF is that a routine that obviously does not work properly has been allowed to remain in software used by hundreds of people for years.
Admin
I get the feeling that it's precisely because of WTFs like this that FORTRAN isn't extinct. The system becomes so complicated that a complete rewrite (plus all the testing that entails) becomes such an unappealing task that no one wants to take it on.
Admin
Wow. Lots of the shift-based attempts posted here don't actually work.
static int Bits(long range)
{
    int nbits = 1;
    while ((range >>= 1) > 0) {
        nbits++;
    }
    return nbits;
}
Works for all positives. (You still need 1 bit to represent zero)
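For what it's worth (assuming the function above is used as-is): Bits(255) returns 8, Bits(256) returns 9, and Bits(0) falls straight through the loop and returns 1, which conveniently covers the "one bit for zero" case.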
Admin
There's still the fact that the original code works almost all of the time, so the code ain't that bad. Just because a tiny bit of refactoring of the logic is in order, it seems a bit crazy to enumerate in detail the reasons it needs to be changed in light of some of its now-known failure conditions. Classic case of 20/20 hindsight. Most bugs look like a WTF once you find the root cause.
In general this list seems like some sort of vindictive move for being tasked with tracking this difficult bug down. In my experience this happens when a programmer is asked to help find a bug when he/she normally doesn't do that sort of thing and is used to writing new code most of the time. Bug-fixers (maintainers) see this sort of thing all the time, but going off on the original programmer is a no-no at most shops.
Admin
Yup. Watch out for the 72-character cut-off, too (after all, if that was enough for my grandparents' generation, it's certainly enough for me now).
Admin
Uhm... All you guys pasting C code to solve this problem: You know about the ffs() function, right? (OK, it's UNIX-specific, but still...)
Admin
Anyone under the age of thirty who has used FORTRAN needs to get a life!
Admin
To represent the number 256 you need 9 bits. To represent a range of 256 different numbers, 0 to 255 for example, you only need 8.
Admin
oi! some of us don't get much choice in the matter!
Admin
How is knowing the position of the least significant set bit helpful? ffs(n)=1 for every odd n.
Admin
Since when are integers only allowed to be 32 bits?
Try ceil(log((1UL << 63) + 1)/log(2)) on a 64-bit machine!
Captcha: darwin (survival of the precisest).
Admin
For most languages, (C, Pascal, Java, etc) the loop is the natural approach. However, in my experience Fortran users tend towards mathematical solutions, so here's my attempt:
double precision drange
drange = dabs(max - min)
nbits = int((dlog(drange + 1.0d0) / dlog(2.0d0)) + 1.0d-10)
As others have noted, double precision is accurate to 53 bits, so it will preserve the 31 bit (positive) integer accuracy. You need to stay in double precision (e.g. abs() is single precision). The +1.d-10 rounding constant is bigger than any possible double precision error (~10^-16) but smaller than 2^-31 ~ 2*10^-9.
And, depending on the likely values of max and min, may be faster than the loop solution.
Admin
Once again, the comments to The Daily WTF make it abundantly clear why these WTFs are so common.
Admin
As bad as it is, this overstates the problems.
Using double precision DLOG doesn't really fix the problem, although it does make it less likely.
Adding 1 to a real (instead of 1.0 or even 1.0D0) requires converting integer 1 to real. Any Fortran compiler worth its salt will do this at compile time, so there's no difference there. Same goes for 2.0D0. All of these can be exactly represented in single precision real.
The real problem is using floating-point numbers here instead of the ISHIFT intrinsic or a table lookup.
Admin
Not likely. Express 1/3 exactly using BCD. You can't. The only way to exactly represent some real numbers is to use alternate bases: 1/3 cannot be exactly represented as floating point in base 10, but in base 3 it is exactly 0.1.
Typically BCD is used for arbitrary precision and scale. You'll typically see it used for currency instead of floats (most databases have this datatype as well; check out 'decimal').
Admin
For a really cool solution, see http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
(C only, requires bitwise or and bit shift). Took me a while to understand it :)
For fortran, I'd take the loop or even the unrolled loop over the floating point calculations - because the latter is so much harder to prove correct.
captcha:muhahaha
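For anyone who doesn't want to click through, the trick is roughly this (C sketch for 32-bit unsigned values; the link has the canonical write-up):

unsigned int v = x;   /* round x up to the next power of 2 */
v--;                  /* so an exact power of 2 stays put   */
v |= v >> 1;          /* smear the highest set bit downward */
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;         /* now every bit below the top bit is 1 */
v++;                  /* ...and adding 1 gives the power of 2 */

(Note that v = 0 comes out as 0, an edge case worth special-casing.)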
Admin
Before anyone gets verbally medieval on me for the "C only": I'll change that to "from the two languages given, and assuming that Fortran indeed can't do bitwise or shift operations".
phew.
Admin
Probably a stupid question, but when would one need to determine the number of storage bits programmatically?
Admin
The real WTF to me is why you would need to know the minimum number of bits to store a number. I can see wanting to know the smallest data type that'll hold it, but why the exact number of bits?
Admin
Consider. You have a range of numbers between A and B, you know how many you have, and you need to know the fewest bits to represent each of them.
Now, suppose B=A. How many different values do you have to encode? 1. How many bits are you going to use to represent them..?
As a note, I wouldn't be bothering to post this, but you made such a meal of your derision for the poster you were replying to, despite the fact the only ignorance on show was your own. You might want to think about that next time you indulge in playground point'n'laugh.
Admin
However, you're spot on about maintainers; in fact the feeling I usually get when I manage to track down a weird bug isn't vindictiveness, but euphoria. It's like doing a crossword puzzle. I suspect I'm in a relatively rare group of people who actually like maintenance programming; I'm at my happiest with a big pile of bugs to fix and no external interference, especially as I'm learning my way around a codebase.
Admin
The most common application is in data compression, or at least that's where I've run into it. Suppose you want to store a very large set of numbers between 719 and 782 in the smallest number of bytes.
By doing this sort of analysis, you can determine that 6 bits are all you need for each value. By packing those 6-bit values as closely as possible, you can fit 10 of them into 8 bytes. The more straightforward solution of using a byte for each value requires 25% more space.
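A hedged sketch of that packing in C (hypothetical helper, not from the article): subtract the minimum first so every value fits in the chosen width, then stream the bits into bytes.

#include <stdint.h>
#include <stddef.h>

/* Pack n values, each fitting in `width` bits, MSB-first into `out`.
   Returns the number of bytes written. */
static size_t pack_bits(const uint32_t *vals, size_t n, unsigned width, uint8_t *out)
{
    uint64_t acc = 0;      /* bit accumulator */
    unsigned pending = 0;  /* bits waiting in the accumulator */
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        acc = (acc << width) | (vals[i] & ((1u << width) - 1u));
        pending += width;
        while (pending >= 8) {          /* flush whole bytes */
            pending -= 8;
            out[written++] = (uint8_t)(acc >> pending);
        }
    }
    if (pending > 0)                    /* flush the final partial byte */
        out[written++] = (uint8_t)(acc << (8 - pending));
    return written;
}

With width = 6, ten values (e.g. 719..782 with 719 subtracted off) come out to 8 bytes instead of 10.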
Admin
I find only three problems with this snippet.
The logic of the function is obvious:
Maybe it would have been clearer to write range = max - min + 1 instead of adding one in the call to log().
I guess this function was used for some kind of compression. Then the real WTF is that there were GB of data [intact].
Admin
The important thing to keep in mind is that accuracy degrades whenever you subtract two numbers that are close together and are real numbers that cannot be represented exactly in your hardware. Unfortunately, that covers pretty much any serious number crunching, since it is not merely the transcendentals that fall into that category but the vast majority of decimal fractions, which cannot be represented exactly as binary floating point numbers.
Admin
BCD floating point should be used for all financial calculations and most calculations where the end result is shown to the user. The problem is that there aren't that many good implementations (Java didn't get it right until Java 5), and I don't think there is a hardware BCD floating point implementation even today.
Admin
Some old C++ code of mine that uses binary search to do the same thing (a different kind of WTF, I guess):
And then, to find log2:
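(The code itself didn't survive the copy-and-paste; purely as a hedged sketch, and not the poster's original C++, a binary-search-style bit count usually looks something like this in C:)

static int bits_needed(unsigned int v)   /* hypothetical name */
{
    int bits = 1;                        /* at least one bit, even for zero */
    if (v >> 16) { bits += 16; v >>= 16; }
    if (v >> 8)  { bits += 8;  v >>= 8;  }
    if (v >> 4)  { bits += 4;  v >>= 4;  }
    if (v >> 2)  { bits += 2;  v >>= 2;  }
    if (v >> 1)  { bits += 1;            }
    return bits;
}

Each test halves the range of possible positions for the highest set bit, hence "binary search".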
Admin
Well, my Fortran is non-existent, so I'll throw in some Perl instead; since I only need the POSIX module, it should be possible in C as well.
# find number of bits required to represent a value
# and the max value those bits can represent
use POSIX;
# $Val = some integer value
$log2 = log(2);
$logV = log($Val);
($frac, $int) = modf($logV / $log2);
$bits = $int + ceil($frac);
$max = pow(2, $bits);
The inability of floating point math to properly represent integers is largely irrelevant here. If there is a fractional result from modf() then we have to add 1 to $int to obtain the necessary number of bits. If Fortran can do a modulus operation with real numbers, then you may be able to test for a mod greater than 0 and add 1.
Admin
Why do you people think this is such a WTF? It's not QUITE correct but not off the deep end, either. So it uses floats, which is probably not optimal, but you people seem to be struggling with the basic MATH. You people write code?
To figure the number of bits required to represent a number N:
nbits = ceil(log(N+1)/log(2)).
What's so damn hard? And the +1 is not a kludge, it's what MAKES IT WORK. Find a value of N where it doesn't give the right answer, I dare you.
God you people suck at math.
And negative values of N don't count. Why? Don't be stupid -- a negative value in two's complement always has a 1 in the highest bit, so the number of bits required to represent any negative value is always the same.
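Work it out for N = 8 (1000 in binary, four bits): without the +1 you get ceil(log_2(8)) = 3, which is wrong, while with it you get ceil(log_2(9)) = ceil(3.17) = 4. And N = 1 gives ceil(log_2(2)) = 1 instead of the useless ceil(log_2(1)) = 0.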
Admin
To figure the number of bits required to represent a number N:
nbits = ceil(log(N+1)/log(2)).
??? You must be the guy that wrote the original code!
(1) Why the +1 in there? (2) If you take out the +1, log(N) / log(2) might end up being 7.9999999 or 8.0000001, depending on the precision of the math. Taking the ceiling of those numbers gives different results. So how is this adequate?
Well, they do, as log blows up. It would be nice if the program could tell the user about this, or maybe continue, instead of getting a run-time error.
Admin
Assuming N is a positive integer, the number of bits is either
1 + floor( log(N) / log(2) )
or
ceil( log(N+1) / log(2) )
The former is better suited to bit-twiddling implementations though, so should be considered superior.
Admin
Uh, to make it correct? Think of the case N=1... It's not in there because of rounding issues, it's in there because IT'S NOT RIGHT without it.
So you're saying, doing things the wrong way leads to wrong answers? "Brillant" insight, there.
BTW, I wrote a program which checks every value in the 32 bit integer domain, and the answer is right in ALL CASES, using IEEE floating point math. What more do you want?
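The check itself is trivial; something along these lines (a sketch, not the poster's actual program) does it in C:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Reference answer: number of bits in n's binary representation. */
static int bits_loop(uint32_t n)
{
    int b = 0;
    while (n) { b++; n >>= 1; }
    return b;
}

int main(void)
{
    unsigned long mismatches = 0;
    /* Compare ceil(log(N+1)/log(2)) against the shift loop for every
       positive 32-bit value. Takes a while, but it's a one-off check. */
    for (uint64_t n = 1; n <= 0xFFFFFFFFULL; n++) {
        int formula = (int)ceil(log((double)n + 1.0) / log(2.0));
        if (formula != bits_loop((uint32_t)n))
            mismatches++;
    }
    printf("%lu mismatches\n", mismatches);
    return 0;
}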
Admin
If there is a group of apples numbered 2, 3, 4, 5, 6, 7 and you want to know how many there are, that would be (7 - 2) + 1 = 6.
Taking the +1 out of the log function doesn't change the precision of the results. It changes what problem you are solving. You're still likely to get "rounding errors." However, by taking the ceil()ing of the result, you get an accurate integer power of two which answers the question. The operation isn't reversible if the desired result is a number of bits--since we can't have a fractional bit.
They don't count, since N is a range whose value is returned from abs(), which returns either zero or a positive number. There will be no blowing up of log, as only legal values will ever reach it.
Admin
Ooh, maths... pretty... wouldn't the "a" of a log be an exp?
Come to think of it it's amazing what you can still develop without using either logs or antilogs.
Admin
But if that's the case, the function would be clearly understandable and easily verified as correct, much much better than the original WTF. So what if it takes log(N) operations to get the result for input N? We are talking about very important data here.
Admin
Almost, but you've got a bug in your bit shifting. And it can be done on one line.