• (cs)

    From GRG, the twelve problems with the last line ...

    (1) Floating-point numbers can't give an exact result when the numbers involved are basically transcendental and can't be represented exactly in a finite number of bits, so using the log function is basically foobar.
    (2) FORTRAN has a double-precision function "dlog" which will give better results.
    (3) "range" is a real number, which on most computers can't represent the full range of an integer.
    (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases.
    (5) That should be "1.0" anyway.
    (6) Coercing "range + 1" to real is unnecessary.
    (7) It should be "1.0D0" to ensure a double-precision result.
    (8) In a few instances the argument to log() isn't constrained to be positive. Taking the log of a negative or zero is undefined and results in a run-time error. Kaboom.
    (9) Dividing two inexact numbers will result in even more inexactitude.
    (10) Converting the result to log base two by dividing by log(2) is correct, but evaluating log base e of two will give an inexact result, as that number is transcendental.
    (11) And that should be "2.0" or better yet "2.0D0", or better yet, a variable holding the computed value.
    (12) Taking the int() of the real or double result is WRONG, as the result will sometimes be a few bits shy of the correct value. For example, if you plug in "256" the answer is about 7.99981672, which when you int() it becomes seven, which is off by one.
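
    For illustration, a minimal C sketch of problem (12); logf() stands in here for FORTRAN's single-precision alog(), and whether the quotient really lands below 8.0 depends on the libm, so the truncation may or may not reproduce on a given machine:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* mathematically log(256)/log(2) is exactly 8 */
        float x = logf(256.0f) / logf(2.0f);

        /* if x comes out as, say, 7.99981672, int() truncates it to 7 */
        printf("%.8f -> int() = %d\n", x, (int)x);
        return 0;
    }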

  • Frank (unregistered)
    first;-)
    Only according to the incorrect rounding output of the code above are you correct.
  • Alexander McLeay (unregistered) in reply to Alex Papadimoulis

    (5) That should be "1.0" anyway. (7) It should be "1.0D0" to ensure a double precision result.

    I don’t know Fortran, but ... isn’t that cheating?

  • Welbog (unregistered) in reply to Alex Papadimoulis

    (13) The number of bits it takes to represent a positive integer n >= 1 is floor(log_2(n)) + 1, not floor(log_2(n+1)). So the function is wrong even without floating-point rounding errors.
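
    Illustrating Welbog's point for n = 256 (a throwaway C check; the choice of n is arbitrary):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int n = 256;
        /* floor(log2(256)) + 1 = 9, but floor(log2(257)) = 8 */
        printf("correct: %d, function's formula: %d\n",
               (int)floor(log2((double)n)) + 1,
               (int)floor(log2((double)n + 1.0)));
        return 0;
    }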

  • Jeltz (unregistered)

    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    Expressed in C since I don't know fortran:

    nbits = 0;
    while(range != 0)
    {
       nbits++;
    }
    
  • Jeltz (unregistered) in reply to Jeltz

    Gah, obviously it should have been:

    nbits = 0;

    while(range != 0)
    {
       range >>= range;
       nbits++;
    }
    

    Also, signedness has to be taken into account.

  • Michael (unregistered) in reply to Jeltz
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    Expressed in C since I don't know fortran:

    nbits = 0;
    while(range != 0)
    {
       nbits++;
    }
    
    you mean?

    nbits = 0;
    while (range > 0)
    {
       nbits++;
       range >>= 1;
    }
  • mooo (unregistered) in reply to Alex Papadimoulis
    GRG:
    (6) Coercing "range + 1" to real is unnecessary. (8) In a few instances the argument to log() isnt constrained to be positive. Taking the log of a negateive or zero is undefined and results in a run-time error. Kaboom.

    That's probably why they started saying log(abs(...)+1). Wrong solution, anyone?

  • cory (unregistered) in reply to Alex Papadimoulis

    For someone who confuses the numbers '2022' and '2048', and seems to shift back and forth between expressing percentages as decimals and whole numbers, the original reporter sure is picky about math.

    (I kid, I kid.)

  • steven22 (unregistered)

    ugh, don't make me think.

  • Erwin Bolwidt (unregistered) in reply to Alex Papadimoulis

    Alex didn't get problem #12 right. To represent 256 you need nine bits, not eight, so the answer is off by two.

    Something like this would fix it.

    // Probably not Fortran, but you catch the idea.
    if (range == 0) { nbits = 0; }
    else { nbits = int( alog( real( range ) ) / alog( 2 ) ) + 1 }

  • lab27 (unregistered)

    Bithacks, people!

    http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog

    CAPTCHA : xevious - like a xox!
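
    For reference, the simplest variant in the spirit of that page (paraphrased from memory, not copied verbatim):

    /* the "obvious" loop: r ends up as floor(log2(v)) */
    unsigned int ilog2(unsigned int v)
    {
        unsigned int r = 0;
        while (v >>= 1)
            r++;
        return r;   /* note: returns 0 for v == 0 as well as for v == 1 */
    }

    Add one to r and you have the bit count for v >= 1.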

  • f@ (unregistered) in reply to cory
    cory:
    For someone who confuses the numbers '2022' and '2048', and seems to shift back and forth between expressing percentages as decimals and whole numbers, the original reporter sure is picky about math.

    (I kid, I kid.)

    I hadn't even got a clue about the original problem. I got 2022 needing 11 bits (2^11 = 2048), but I hadn't a clue how to crank my 11 backwards into 2022, instead of any other number between 1024 and 2047.

    Now, getting that 'magic' to happen on one line would be impressive. You've gotta have faith.

  • K Klein (unregistered) in reply to Alex Papadimoulis

    The real WTF is the 12 complaints.

    (1) Yes, floating point numbers can't exactly represent all real numbers. That doesn't make them "foobar".

    (2) When you're rounding the result to an integer, who needs double precision?

    (3) Reals can't exactly represent the full range of an integer, that is true. However, for the purposes here, they are more than adequate.

    (4) The real kludge is ignorance of the ceil() function.

    (5) What compiler today isn't smart enough to figure this out?

    (6) Duh.

    (7) Again the obsession with needless precision.

    (8) This comment is actually not a WTF.

    (9) So what should the programmer do instead? Multiply?

    (10) By this logic, it's a wonder that computers even have a log() function at all. Since they can't exactly represent a transcendental number, what's the point?

    (11) Pedantic obsession with precision again.

    (12) Yep, it's called round(). Or probably better in this application would be ceil().
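
    A sketch of the ceil() version being hinted at here (the guard and the names are illustrative, and, as Tom points out further down, it is still at the mercy of rounding near exact powers of two):

    #include <math.h>

    /* bits needed to store the values 0..range */
    int nbits(unsigned int range)
    {
        if (range == 0)
            return 0;   /* keep log() away from zero */
        return (int)ceil(log((double)range + 1.0) / log(2.0));
    }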

  • (cs)

    (13) The function can be rewritten with two assembly language instructions. Assuming an x86 machine and that the input value is in %eax:

    bsr %eax,%eax
    inc %eax
  • (cs)

    188 separate FORTRAN and C files? Gigabytes of really, really important data? I am going to take a wild stab and guess that this is GEMPAK?

  • (cs) in reply to Jeltz
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    nothing, except FORTRAN doesn't have bit-shift operators, as far as i know...

    Alexander McLeay:
    (5) That should be "1.0" anyway. (7) It should be "1.0D0" to ensure a double precision result.

    I don’t know Fortran, but ... isn’t that cheating?

    the first implies REAL, the second DOUBLE PRECISION. same distinction as between float and double in C.

    bunch of quiche-eaters :p

  • some guy (unregistered) in reply to Alex Papadimoulis

    I don't understand 8... The argument to log is always the sum of the absolute value of some integer and one... the smallest the absolute value could be is zero... even if zero ends up getting rounded to -0.0000001, 1 + -0.0000001 will never be <= zero.

    Someone's [captcha] kungfu is off today.

  • (cs) in reply to Alex Papadimoulis
    Alex Papadimoulis:
    From GRG, the twelve problems with the last line ... (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases. (8) In a few instances the argument to log() isn't constrained to be positive. Taking the log of a negative or zero is undefined and results in a run-time error. Kaboom.

    I don't see how the range can ever be negative. abs() will return a value between 0 (inclusive) and infinity (okay, okay, MAX_INT). And the addition of one is a way of avoiding taking log(0). Admittedly, it's not a good way.

  • Ben Follis (unregistered) in reply to ligne

    You could get the same effect by successively multiplying by 2 until the product is > the number, and then counting the number of multiplications.
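
    A sketch of that idea in C (names are illustrative; the wider product type is there so the doubling can't overflow):

    /* count how many doublings it takes to exceed the input */
    int nbits_by_multiplying(unsigned int number)
    {
        int nbits = 0;
        unsigned long long product = 1;
        while (product <= number)
        {
            product *= 2;
            nbits++;
        }
        return nbits;
    }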

  • Duston (unregistered)

    Could have been worse...could have been:

    if (x == 1) {
        nbits = 1;
    } else {
        if (x == 2 || x == 3) {
            nbits = 2;
        } else {
            if (x >= 4 && x < 8) {
                . . .

  • Jeltz (unregistered) in reply to ligne
    ligne:
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    nothing, except FORTRAN doesn't have bit-shift operators, as far as i know...

    Then you simply divide by two instead. Fortran must have division.

  • (cs)

    Don't all 12 problems boil down to problem 1: never send a float to do a long's job?

  • Tyche (unregistered) in reply to ligne
    ligne:
    nothing, except FORTRAN doesn't have bit-shift operators, as far as i know...

    Fortran 77 didn't, but the US MIL-STD-1753 specified bitwise operations, so most flavours of Fortran 77 ended up supporting them. Fortran 90 does have bitwise operations as standard; ISHFT should do the job.

  • Rafael Larios (unregistered)

    I never programmed math-related stuff... so this post has passed 5 miles above me....

    I'm ashamed :(

    Captcha: Pointer..... all the way To ignorance!

  • AdT (unregistered) in reply to Jeltz
    Jeltz:
    Gah, obviously it should have been:

    nbits = 0;

    while(range != 0)
    {
       range >>= range;
       nbits++;
    }
    

    Wow, even if range is 100000000, then nbits is only 1. Seems you have found a truly brillant compression algorithm.

    Alex Papadimoulis:
    (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases.

    This comment is another WTF altogether: the + 1 is not a "fudge factor" but a fencepost adjustment. Not adding it means an underestimation by one bit whenever range is 2^n - 1. And an underestimation is obviously a lot worse than an overestimation.
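
    A worked example of the fencepost, as a quick C sketch: range = 7 means 8 distinct values (0..7), which need 3 bits, and mathematically log2(8) = 3 while log2(7) is about 2.807.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        printf("with +1:    %d\n", (int)(log(7.0 + 1.0) / log(2.0)));
        printf("without +1: %d\n", (int)(log(7.0)       / log(2.0)));
        /* expect 3 and 2 -- though, as this whole thread shows, trusting
           int() to land on the right side of 3.0 is itself a gamble */
        return 0;
    }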

  • SnapShot (unregistered) in reply to Michael
    Michael:
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    Expressed in C since I don't know fortran:

    nbits = 0;
    while(range != 0)
    {
       nbits++;
    }
    
    you mean?

    nbits = 0;
    while (range > 0)
    {
       nbits++;
       range >>= 1;
    }

    Doesn't that fail if range is negative?

    // while range is not 0, shift right and increment counter
    nbits = 0;
    while (range >>= 1) { ++nbits; }
    return nbits;

  • (cs) in reply to Ben Follis
    Ben Follis:
    You could get the same effect by successively multiplying by 2 until the product is > the number, and then counting the number of multiplications.

    yup. it looks like whoever wrote this copied the formula out of a maths book, and (rather incompetently) converted that to FORTRAN.

    WeatherGod:
    188 separate FORTRAN and C files? Gigabytes of really, really important data? I am going to take a wild stab and guess that this is GEMPAK?

    the application i've been working with (particle physics event simulation) takes the opposite tack. it comes as one 75,000 line file that you compile along with your code. at least i never have to look in there...

  • Marcin (unregistered) in reply to Jeltz

    I believe the correct solution is actually to use C, and to do a binary search for the first non-zero byte before scanning that to find the high bit.
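
    A sketch of that in C for a 32-bit value (my names; the two ifs do the byte-level binary search, the loop scans the surviving byte):

    int nbits(unsigned int v)
    {
        int base = 0;
        if (v >> 16) { v >>= 16; base += 16; }   /* high half non-zero? */
        if (v >> 8)  { v >>= 8;  base += 8;  }   /* high byte of that half? */

        int b = 0;                               /* scan the remaining byte */
        while (v) { v >>= 1; b++; }
        return base + b;
    }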

  • (cs) in reply to ligne
    ligne:
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    nothing, except FORTRAN doesn't have bit-shift operators, as far as i know...

    You could just divide by 2. If the compiler's smart enough it'll compile that to a (most likely arithmetic(*)) bitshift instruction instead of a division.

    (* = Which preserves the sign, as is appropriate for division. But note that a bare arithmetic shift rounds toward negative infinity, so -1 >> 1 stays -1, which is why you'd be advised to abs() first. Or use an unsigned type, which will make the compiler use a logical shift instead.)
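
    A quick illustration of that footnote (what >> does to negative values is implementation-defined in C, so the second result is merely what typical two's-complement targets give):

    #include <stdio.h>

    int main(void)
    {
        /* C99 division truncates toward zero; a bare arithmetic
           shift rounds toward negative infinity */
        printf("-1 / 2  = %d\n", -1 / 2);    /* 0 */
        printf("-1 >> 1 = %d\n", -1 >> 1);   /* usually -1 */
        return 0;
    }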

  • Tom (unregistered) in reply to K Klein
    K Klein:
    The real WTF is the 12 complaints.

    (1) Yes, floating point numbers can't exactly represent all real numbers. That doesn't make them "foobar".

    (2) When you're rounding the result to an integer, who needs double precision?

    (3) Reals can't exactly represent the full range of an integer, that is true. However, for the purposes here, they are more than adequate.

    I don't think you understand the problem with floats for this application. Who needs double precision? This application needs more than double precision. For most cases it doesn't matter, but for cases close to a boundary (e.g. 7.9999...) you could need arbitrarily many digits of precision to get the correct result (e.g. 8 vs 7.999...). Because the value is rounded, it is very important that it falls on the correct side of the boundary. The approach used in this code is fundamentally broken because of this.
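
    To make the boundary sensitivity concrete, a small sketch: a value that is mathematically 8 but off by a single unit in the last place already truncates wrong (nextafter is standard C99):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double nearly8 = nextafter(8.0, 0.0);   /* the double just below 8.0 */
        printf("%.17f -> int() = %d\n", nearly8, (int)nearly8);   /* ... -> 7 */
        return 0;
    }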

  • (cs) in reply to Rafael Larios

    FORTRAN? You need some XML:

    <BitCountData>
      <range min="0" max="1"  value="1"/>
      <range min="2" max="3"  value="2"/>
      <range min="4" max="7"  value="3"/>
      <range min="8" max="15" value="4"/>
      ...
      <range min="Math.NEGINFINITY" max="-1" value="FileNotFound"/>
    </BitCountData>
    
    Then just add some xslt to do the computation...
    
  • KingNetSurfer (unregistered) in reply to Rafael Larios

    Don't feel bad, Rafael. You say you've never programmed math . . . I'm only 24 and have never touched FORTRAN, and don't do much programming, so it's totally out of my range too.

  • Luci (unregistered) in reply to SnapShot
    SnapShot:
    Doesn't that fail if range is negative?

    Anyways, when it's negative, how exactly do you want to determine the number of bits used to represent the number?

    I mean, if you are to store a positive number along with a separate sign bit, you'd have to add at least one bit.

    However, if you want to know how many bits it takes to keep the binary representation of the integer intact, it will always be the number of bits in the used integer type (e.g. 16, 32, 64 or whatever), since the highest bit will always be 1 if it is negative.

    Captcha: kungfu... yeah!

  • Mike (unregistered)

    In C, without bitshifts... It kind of depends on whether you think only negative numbers should account for the sign bit, though; otherwise simply start bits at 1 instead of 0...

    int BitSize ( int Number )
    {
        int bits = 0 ;
        int workingNumber = Number ;

        if ( workingNumber < 0 )
        {
            bits++ ;
            workingNumber = -workingNumber ;
        }
        while ( workingNumber > 0 )
        {
            workingNumber /= 2 ;
            bits++ ;
        }
        return bits ;
    }

  • (cs) in reply to Alex Papadimoulis

    (4) isn't a kludge, it doesn't overestimate, and it isn't a heuristic: it is correct. The number of integers in [a, b] is b - a + 1. The 2log of that number correctly gives the number of bits required for that range. When a == b, b - a + 1 equals 1, which therefore requires 0 bits. When b and a differ by 1, the number of bits needed is 2log(1 + 1) == 1; when they differ by 3, it is 2log(3 + 1) == 2, etc.

    The only WTF is that there is no guard against imprecision in the floating-point computations. And the consequences of that can be truly awful in this case.

  • Remco Gerlich (unregistered)

    I hate people whining about floating point being "inexact." Even though that is technically true, an IEEE 754 float or double is the closest possible approximation in the given number of bits. On modern hardware, the error for a double is never greater than 1 part in 2**53.

    That error can of course grow dramatically (say, if you have a loop that runs 100 times, and every iteration does an operation that increases the error by a factor of 10, your end result will be nonsense).

    But in this case there's a couple of logs and a divide - the imprecision simply isn't a problem here.

  • Remco Gerlich (unregistered) in reply to Remco Gerlich

    Eating my own words immediately - the tiny imprecision can mean the difference between 7 and 8 here, because it's a rounding to a precise int that we need. I am stupid.

  • whicker (unregistered) in reply to AdT
    AdT:
    Jeltz:
    Gah, obviously it should have been:

    nbits = 0;

    while(range != 0)
    {
       range >>= range;
       nbits++;
    }
    

    Wow, even if range is 100000000, then nbits is only 1. Seems you have found a truly brillant compression algorithm.

    Alex Papadimoulis:
    (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases.

    This comment is another WTF altogether: the + 1 is not a "fudge factor" but a fencepost adjustment. Not adding it means an underestimation by one bit whenever range is 2^n - 1. And an underestimation is obviously a lot worse than an overestimation.

    I had my chuckle for the day. You can't possibly be serious. The algorithm is wrong, period.

    Any sort of computer science justification at this point is silly. What next? Remarks about how optimized this is using big O?

  • kbiel (unregistered) in reply to bstorer
    bstorer:
    Alex Papadimoulis:
    From GRG, the twelve problems with the last line ... (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases. (8) In a few instances the argument to log() isn't constrained to be positive. Taking the log of a negative or zero is undefined and results in a run-time error. Kaboom.

    I don't see how the range can ever be negative. abs() will return a value between 0 (inclusive) and infinity (okay, okay, MAX_INT). And the addition of one is a way of avoiding taking log(0). Admittedly, it's not a good way.

    You don't see how range could ever be negative? Since you brought it up, what is the result of adding 1 to MAX_INT?

  • Adrian Milliner (unregistered)

    Right, so I learned fortran for this :-)

    program how_many_bits
    
    integer :: num = 2022
    integer :: nbits = 0
    
    do
      if ( num == 0 ) exit
      nbits = nbits + 1
      num = num / 2
    end do
    
    print *,"nbits = ",nbits
    print *,"max", 2**nbits
    
    end program how_many_bits
    
  • mhendren (unregistered) in reply to Luci
    Luci:
    SnapShot:
    Doesn't that fail if range is negative?

    Anyways, when it's negative, how exactly do you want to determine the number of bits used to represent the number?

    I mean, if you are to store a positive number along with a separate sign bit, you'd have to add at least one bit.

    However, if you want to know how many bits it takes to keep the binary representation of the integer intact, it will always be the number of bits in the used integer type (e.g. 16, 32, 64 or whatever), since the highest bit will always be 1 if it is negative.

    Captcha: kungfu... yeah!

    You can represent a negative number in fewer bits than the integer size. -1 can be represented as 3 using 2 bits (assuming you know there are 2 bits). Two's complement doesn't need a size barrier.

  • Welbog (unregistered) in reply to TGV

    Hahahah, oh wow.

    Please explain how the value 1 can be represented in 0 bits. I could use a laugh today.

    Captcha = gotcha. HOW DOES IT KNOW?

  • Frzr (unregistered)

    On 32-bit PowerPC processors, the CNTLZW opcode counts the leading zero bits in the given number. Subtract that from 32 and you have the answer.
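
    For the non-PowerPC crowd, the same trick via a GCC/Clang builtin (a sketch; __builtin_clz is undefined for zero, hence the guard):

    /* bits needed = word size minus the number of leading zeros */
    int nbits(unsigned int v)
    {
        return v ? 32 - __builtin_clz(v) : 0;
    }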

  • charles bretana (unregistered) in reply to Alex Papadimoulis

    really all you need is

    int x = 1;
    while (number > 1)
    {
        x = x + 1;
        number = number / 2;
    }
    return x;

    Also, regarding comment (1) about floating-point accuracy: any numbering system used in a computer is limited in which numbers it can represent exactly and which it cannot. Floating-point representations can represent some numbers exactly; for standard IEEE floating-point representation, those that are an integer times 2 raised to some positive or negative integral power (within certain limits). All the other numbers must be represented by the closest number among the ones which can be represented.
    But this is true for any representation scheme. If you try to represent .8 using integers, you have the same issue.

    Alex Papadimoulis:
    From GRG, the twelve problems with the last line ...

    (1) Floating-point numbers can't give an exact result when the numbers involved are basically transcendental and can't be represented exactly in a finite number of bits, so using the log function is basically foobar.
    (3) "range" is a real number, which on most computers can't represent the full range of an integer. (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases. (5) That should be "1.0" anyway. (6) Coercing "range + 1" to real is unnecessary. (7) It should be "1.0D0" to ensure a double-precision result. (8) In a few instances the argument to log() isn't constrained to be positive. Taking the log of a negative or zero is undefined and results in a run-time error. Kaboom. (9) Dividing two inexact numbers will result in even more inexactitude. (10) Converting the result to log base two by dividing by log(2) is correct, but evaluating log base e of two will give an inexact result, as that number is transcendental. (11) And that should be "2.0" or better yet "2.0D0", or better yet, a variable holding the computed value. (12) Taking the int() of the real or double result is WRONG, as the result will sometimes be a few bits shy of the correct value. For example, if you plug in "256" the answer is about 7.99981672, which when you int() it becomes seven, which is off by one.

  • Draal (unregistered) in reply to WeatherGod
    WeatherGod:
    188 separate FORTRAN and C files? Gigabytes of really, really important data? I am going to take a wild stab and guess that this is GEMPAK?

    Sounds like AIPS. The source download is in the region of 90 MiB, it's written in FORTRAN and C, it doesn't use Makefiles (just some horrible mess with a gazillion shell scripts to do the linking and compilation), it uses DOS 8.3 file names, and the 'versioning' is based on release dates (always 31DECYY, where YY is the two-digit year). Wooh!

    http://en.wikipedia.org/wiki/AIPS

    I bet there are some WTFs in there.

  • Aysen (unregistered) in reply to msarnoff
    msarnoff:
    (13) The function can be rewritten with two assembly language instructions. Assuming an x86 machine and that the input value is in %eax:
    bsr %eax,%eax
    inc %eax
    Ahh, good to see there are still assembler users out there :)

    However, you need to fix this code, e.g. to:

    bsr %eax, %eax
    jnz .not_zero
    xor %eax, %eax
    .not_zero:
    inc %eax
    because bsr gives undefined results if the source operand is zero. (This code returns 1 if the input is 0; it can easily be changed to return 0 instead.)
  • webrunner (unregistered) in reply to Remco Gerlich
    Remco Gerlich:
    I hate people whining about floating point being "inexact." Even though that is technically true, an IEEE 754 float or double is the closest possible approximation in the given number of bits. On modern hardware, the error for a double is never greater than 1 part in 2**53.

    "closest possible" means "inexact" when it's not possible to be exact.

    Remember, in binary, no .1 for you!

  • Jonathan Allrn (unregistered) in reply to Jeltz
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    Expressed in C since I don't know fortran:

    nbits = 0;
    while(range != 0)
    {
       nbits++;
    }
    

    That's silly. The answer is always going to be in a limited range, so just write a static lookup table with all the values pre-computed. You will get much better performance, it is easy to translate to other languages, and it is trivial to bench test.
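
    A sketch of the table idea (my names; the table covers one byte and the bytes are combined at lookup time, so it stays small):

    static unsigned char tbl[256];

    static void init_tbl(void)
    {
        for (int i = 1; i < 256; i++)
            tbl[i] = tbl[i / 2] + 1;   /* bits(i) = bits(i / 2) + 1 */
    }

    static int nbits(unsigned int v)
    {
        if (v >> 24) return 24 + tbl[v >> 24];
        if (v >> 16) return 16 + tbl[v >> 16];
        if (v >> 8)  return  8 + tbl[v >> 8];
        return tbl[v];   /* tbl[0] stays 0 from static initialization */
    }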
