• (cs)

    From GRG, the twelve problems with the last line ...

    (1) Floating-point numbers can't give an exact result when the numbers involved are basically transcendental and can't be represented exactly in a finite number of bits, so using the log function is basically foobar.
    (2) FORTRAN has a double-precision function "dlog" which will give better results.
    (3) "range" is a real number, which on most computers can't represent the full range of an integer.
    (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases.
    (5) That should be "1.0" anyway.
    (6) Coercing "range + 1" to real is unnecessary.
    (7) It should be "1.0D0" to ensure a double-precision result.
    (8) In a few instances the argument to log() isn't constrained to be positive. Taking the log of a negative or zero is undefined and results in a run-time error. Kaboom.
    (9) Dividing two inexact numbers will result in even more inexactitude.
    (10) Converting the result to log base two by dividing by log(2) is correct, but evaluating log base e of two will give an inexact result, as that number is transcendental.
    (11) And that should be "2.0" or better yet "2.0D0", or better yet, a variable holding the computed value.
    (12) Taking the int() of the real or double result is WRONG, as the result will sometimes be a few bits shy of the correct value. For example, if you plug in "256" the answer is about 7.99981672, which when you int() it becomes seven, which is off by one.
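
    For illustration, a minimal C sketch of problem (12); logf() stands in here for FORTRAN's single-precision alog(), and whether the quotient really lands below 8.0 depends on the libm, so the truncation may or may not reproduce on a given machine:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* mathematically log(256)/log(2) is exactly 8 */
        float x = logf(256.0f) / logf(2.0f);

        /* if x comes out as, say, 7.99981672, int() truncates it to 7 */
        printf("%.8f -> int() = %d\n", x, (int)x);
        return 0;
    }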

  • Frank (unregistered)
    first;-)
    Only according to the incorrect rounding output of the code above are you correct.
  • Alexander McLeay (unregistered) in reply to Alex Papadimoulis

    (5) That should be "1.0" anyway. (7) It should be "1.0D0" to ensure a double precision result.

    I don’t know Fortran, but ... isn’t that cheating?

  • Welbog (unregistered) in reply to Alex Papadimoulis

    (13) The number of bits it takes to represent a positive integer n >= 1 is floor(log_2(n)) + 1, not floor(log_2(n+1)). So the function is wrong even without floating-point rounding errors.
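
    Illustrating Welbog's point for n = 256 (a throwaway C check; the choice of n is arbitrary):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int n = 256;
        /* floor(log2(256)) + 1 = 9, but floor(log2(257)) = 8 */
        printf("correct: %d, function's formula: %d\n",
               (int)floor(log2((double)n)) + 1,
               (int)floor(log2((double)n + 1.0)));
        return 0;
    }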

  • Jeltz (unregistered)

    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    Expressed in C since I don't know fortran:

    nbits = 0;
    while(range != 0)
    {
       nbits++;
    }
    
  • Jeltz (unregistered) in reply to Jeltz

    Gah, obviously it should have been:

    nbits = 0;

    while(range != 0)
    {
       range >>= range;
       nbits++;
    }
    

    Also, signedness has to be taken into account.

  • Michael (unregistered) in reply to Jeltz
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    Expressed in C since I don't know fortran:

    nbits = 0;
    while(range != 0)
    {
       nbits++;
    }
    
    you mean?

    nbits = 0;
    while (range > 0)
    {
       nbits++;
       range >>= 1;
    }
  • mooo (unregistered) in reply to Alex Papadimoulis
    GRG:
    (6) Coercing "range + 1" to real is unnecessary. (8) In a few instances the argument to log() isnt constrained to be positive. Taking the log of a negateive or zero is undefined and results in a run-time error. Kaboom.

    That's probably why they started saying log(abs(...)+1). Wrong solution, anyone?

  • cory (unregistered) in reply to Alex Papadimoulis

    For someone who confuses the numbers '2022' and '2048', and seems to shift back and forth between expressing percentages as decimals and whole numbers, the original reporter sure is picky about math.

    (I kid, I kid.)

  • steven22 (unregistered)

    ugh, don't make me think.

  • Erwin Bolwidt (unregistered) in reply to Alex Papadimoulis

    Alex didn't get problem #12 right. To represent 256 you need nine bits, not eight, so the answer is off by two.

    Something like this would fix it.

    // Probably not Fortran, but you catch the idea.
    if (range == 0) { nbits = 0; }
    else { nbits = int( alog( real( range ) ) / alog( 2 ) ) + 1 }

  • lab27 (unregistered)

    Bithacks, people!

    http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog

    CAPTCHA : xevious - like a xox!
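
    For reference, the simplest variant in the spirit of that page (paraphrased from memory, not copied verbatim):

    /* the "obvious" loop: r ends up as floor(log2(v)) */
    unsigned int ilog2(unsigned int v)
    {
        unsigned int r = 0;
        while (v >>= 1)
            r++;
        return r;   /* note: returns 0 for v == 0 as well as for v == 1 */
    }

    Add one to r and you have the bit count for v >= 1.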

  • f@ (unregistered) in reply to cory
    cory:
    For someone who confuses the numbers '2022' and '2048', and seems to shift back and forth between expressing percentages as decimals and whole numbers, the original reporter sure is picky about math.

    (I kid, I kid.)

    I hadn't even got a clue about the original problem. I got 2022 needing 11 bits (2^11 = 2048), but I hadn't a clue how to crank my 11 backwards into 2022, instead of any other number between 1024 and 2047.

    Now, getting that 'magic' to happen on one line would be impressive. You've gotta have faith.

  • K Klein (unregistered) in reply to Alex Papadimoulis

    The real WTF is the 12 complaints.

    (1) Yes, floating point numbers can't exactly represent all real numbers. That doesn't make them "foobar".

    (2) When you're rounding the result to an integer, who needs double precision?

    (3) Reals can't exactly represent the full range of an integer, that is true. However, for the purposes here, they are more than adequate.

    (4) The real kludge is ignorance of the ceil() function.

    (5) What compiler today isn't smart enough to figure this out?

    (6) Duh.

    (7) Again the obsession with needless precision.

    (8) This comment is actually not a WTF.

    (9) So what should the programmer do instead? Multiply?

    (10) By this logic, it's a wonder that computers even have a log() function at all. Since they can't exactly represent a transcendental number, what's the point?

    (11) Pedantic obsession with precision again.

    (12) Yep, it's called round(). Or probably better in this application would be ceil().
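
    A sketch of the ceil() version being hinted at here (the guard and the names are illustrative, and, as Tom points out further down, it is still at the mercy of rounding near exact powers of two):

    #include <math.h>

    /* bits needed to store the values 0..range */
    int nbits(unsigned int range)
    {
        if (range == 0)
            return 0;   /* keep log() away from zero */
        return (int)ceil(log((double)range + 1.0) / log(2.0));
    }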

  • (cs)

    (13) The function can be rewritten with two assembly language instructions. Assuming an x86 machine and that the input value is in %eax:

    bsr %eax,%eax
    inc %eax
  • (cs)

    188 separate FORTRAN and C files? Gigabytes of really, really important data? I am going to take a wild stab and guess that this is GEMPAK?

  • (cs) in reply to Jeltz
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    nothing, except FORTRAN doesn't have bit-shift operators, as far as i know...

    Alexander McLeay:
    (5) That should be "1.0" anyway. (7) It should be "1.0D0" to ensure a double precision result.

    I don’t know Fortran, but ... isn’t that cheating?

    the first implies REAL, the second DOUBLE PRECISION. same distinction as between float and double in C.

    bunch of quiche-eaters :p

  • some guy (unregistered) in reply to Alex Papadimoulis

    I don't understand 8... The argument to log is always the sum of the absolute value of some integer and one... the smallest the absolute value could be is zero... even if zero ends up getting rounded to -0.0000001, 1 + -0.0000001 will never be <= zero.

    Someone's [captcha] kungfu is off today.

  • (cs) in reply to Alex Papadimoulis
    Alex Papadimoulis:
    From GRG, the twelve problems with the last line ... (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases. (8) In a few instances the argument to log() isn't constrained to be positive. Taking the log of a negative or zero is undefined and results in a run-time error. Kaboom.

    I don't see how the range can ever be negative. abs() will return a value between 0 (inclusive) and infinity (okay, okay, MAX_INT). And the addition of one is a way of avoiding taking log(0). Admittedly, it's not a good way.

  • Ben Follis (unregistered) in reply to ligne

    You could get the same effect by successively multiplying by 2 until the product is > the number, and then counting the number of multiplications.
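
    A sketch of that idea in C (names are illustrative; the wider product type is there so the doubling can't overflow):

    /* count how many doublings it takes to exceed the input */
    int nbits_by_multiplying(unsigned int number)
    {
        int nbits = 0;
        unsigned long long product = 1;
        while (product <= number)
        {
            product *= 2;
            nbits++;
        }
        return nbits;
    }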

  • Duston (unregistered)

    Could have been worse...could have been:

    if (x == 1) {
        nbits = 1;
    } else {
        if (x == 2 || x == 3) {
            nbits = 2;
        } else {
            if (x >= 4 && x < 8) {
                . . .

  • Jeltz (unregistered) in reply to ligne
    ligne:
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    nothing, except FORTRAN doesn't have bit-shift operators, as far as i know...

    Then you simply divide by two instead. Fortran must have division.

  • (cs)

    Don't all 12 problems boil down to problem 1: never send a float to do a long's job?

  • Tyche (unregistered) in reply to ligne
    ligne:
    nothing, except FORTRAN doesn't have bit-shift operators, as far as i know...

    Fortran 77 didn't, but the US MIL-STD-1753 specified bitwise operations, so most flavours of Fortran 77 ended up supporting them. Fortran 90 does have bitwise operations as standard; ISHFT should do the job.

  • Rafael Larios (unregistered)

    I never programmed math-related stuff... so this post has passed 5 miles above me....

    I'm ashamed :(

    Captcha: Pointer..... all the way To ignorance!

  • AdT (unregistered) in reply to Jeltz
    Jeltz:
    Gah, obviously it should have been:

    nbits = 0;

    while(range != 0)
    {
       range >>= range;
       nbits++;
    }
    

    Wow, even if range is 100000000, then nbits is only 1. Seems you have found a truly brillant compression algorithm.

    Alex Papadimoulis:
    (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases.

    This comment is another WTF altogether: the + 1 is not a "fudge factor" but a fencepost adjustment. Not adding it means an underestimation by one bit whenever range is 2^n - 1. And an underestimation is obviously a lot worse than an overestimation.
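
    A worked example of the fencepost, as a quick C sketch: range = 7 means 8 distinct values (0..7), which need 3 bits, and mathematically log2(8) = 3 while log2(7) is about 2.807.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        printf("with +1:    %d\n", (int)(log(7.0 + 1.0) / log(2.0)));
        printf("without +1: %d\n", (int)(log(7.0)       / log(2.0)));
        /* expect 3 and 2 -- though, as this whole thread shows, trusting
           int() to land on the right side of 3.0 is itself a gamble */
        return 0;
    }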

  • SnapShot (unregistered) in reply to Michael
    Michael:
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    Expressed in C since I don't know fortran:

    nbits = 0;
    while(range != 0)
    {
       nbits++;
    }
    
    you mean?

    nbits = 0;
    while (range > 0)
    {
       nbits++;
       range >>= 1;
    }

    Doesn't that fail if range is negative?

    // while range is not 0, shift right and increment counter
    nbits = 0;
    while (range >>= 1) { ++nbits; }
    return nbits;

  • (cs) in reply to Ben Follis
    Ben Follis:
    You could get the same effect by successively multiplying by 2 until the product is > the number, and then counting the number of multiplications.

    yup. it looks like whoever wrote this copied the formula out of a maths book, and (rather incompetently) converted that to FORTRAN.

    WeatherGod:
    188 separate FORTRAN and C files? Gigabytes of really, really important data? I am going to take a wild stab and guess that this is GEMPAK?

    the application i've been working with (particle physics event simulation) takes the opposite tack. it comes as one 75,000 line file that you compile along with your code. at least i never have to look in there...

  • Marcin (unregistered) in reply to Jeltz

    I believe the correct solution is actually to use C, and to do a binary search for the first non-zero byte before scanning that to find the high bit.
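
    A sketch of that in C for a 32-bit value (my names; the two ifs do the byte-level binary search, the loop scans the surviving byte):

    int nbits(unsigned int v)
    {
        int base = 0;
        if (v >> 16) { v >>= 16; base += 16; }   /* high half non-zero? */
        if (v >> 8)  { v >>= 8;  base += 8;  }   /* high byte of that half? */

        int b = 0;                               /* scan the remaining byte */
        while (v) { v >>= 1; b++; }
        return base + b;
    }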

  • (cs) in reply to ligne
    ligne:
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    nothing, except FORTRAN doesn't have bit-shift operators, as far as i know...

    You could just divide by 2. If the compiler's smart enough it'll compile that to a (most likely arithmetic(*)) bitshift instruction instead of a division.

    (* = Which preserves the sign, as is appropriate for division. But note that a bare arithmetic shift rounds toward negative infinity, so -1 >> 1 stays -1, which is why you'd be advised to abs() first. Or use an unsigned type, which will make the compiler use a logical shift instead.)
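
    A quick illustration of that footnote (what >> does to negative values is implementation-defined in C, so the second result is merely what typical two's-complement targets give):

    #include <stdio.h>

    int main(void)
    {
        /* C99 division truncates toward zero; a bare arithmetic
           shift rounds toward negative infinity */
        printf("-1 / 2  = %d\n", -1 / 2);    /* 0 */
        printf("-1 >> 1 = %d\n", -1 >> 1);   /* usually -1 */
        return 0;
    }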

  • Tom (unregistered) in reply to K Klein
    K Klein:
    The real WTF is the 12 complaints.

    (1) Yes, floating point numbers can't exactly represent all real numbers. That doesn't make them "foobar".

    (2) When you're rounding the result to an integer, who needs double precision?

    (3) Reals can't exactly represent the full range of an integer, that is true. However, for the purposes here, they are more than adequate.

    I don't think you understand the problem with floats for this application. Who needs double precision? This application needs more than double precision. For most cases it doesn't matter, but for cases close to a boundary (e.g. 7.9999...) you could need arbitrarily many digits of precision to get the correct result (e.g. 8 vs 7.999...). Because the value is rounded, it is very important that it falls on the correct side of the boundary. The approach used in this code is fundamentally broken because of this.
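
    To make the boundary sensitivity concrete, a small sketch: a value that is mathematically 8 but off by a single unit in the last place already truncates wrong (nextafter is standard C99):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double nearly8 = nextafter(8.0, 0.0);   /* the double just below 8.0 */
        printf("%.17f -> int() = %d\n", nearly8, (int)nearly8);   /* ... -> 7 */
        return 0;
    }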

  • (cs) in reply to Rafael Larios

    FORTRAN? You need some XML:

    <BitCountData>
      <range min="0" max="1"  value="1"/>
      <range min="2" max="3"  value="2"/>
      <range min="4" max="7"  value="3"/>
      <range min="8" max="15" value="4"/>
      ...
      <range min="Math.NEGINFINITY" max="-1" value="FileNotFound"/>
    </BitCountData>
    
    Then just add some xslt to do the computation...
    
  • KingNetSurfer (unregistered) in reply to Rafael Larios

    Don't feel bad, Rafael. You say you've never programmed math . . . I'm only 24 and have never touched FORTRAN, and don't do much programming, so it's totally out of my range too.

  • Luci (unregistered) in reply to SnapShot
    SnapShot:
    Doesn't that fail if range is negative?

    Anyways, when it's negative, how exactly do you want to determine the number of bits used to represent the number?

    I mean, if you are to store a positive number along with a separate sign bit, you'd have to add at least one bit.

    However, if you want to know how many bits it takes to keep the binary representation of the integer intact, it will always be the number of bits in the used integer type (e.g. 16, 32, 64 or whatever), since the highest bit will always be 1 if it is negative.

    Captcha: kungfu... yeah!

  • Mike (unregistered)

    In C, without bitshifts... It kind of depends on whether you think only negative numbers should account for the sign bit, though; otherwise simply start bits at 1 instead of 0...

    int BitSize ( int Number )
    {
        int bits = 0 ;
        int workingNumber = Number ;

        if ( workingNumber < 0 )
        {
            bits++ ;
            workingNumber = -workingNumber ;
        }
        while ( workingNumber > 0 )
        {
            workingNumber /= 2 ;
            bits++ ;
        }
        return bits ;
    }

  • (cs) in reply to Alex Papadimoulis

    (4) isn't a kludge, it doesn't overestimate, and it isn't a heuristic: it is correct. The number of integers in [a, b] is b - a + 1. The 2log of that number correctly gives the number of bits required for that range. When a == b, b - a + 1 equals 1, which therefore requires 0 bits. When b and a differ by 1, the number of bits needed is 2log(1 + 1) == 1; when they differ by 3, it is 2log(3 + 1) == 2, etc.

    The only WTF is that there is no guard against imprecision in the floating-point computations. And the consequences of that can be truly awful in this case.

  • Remco Gerlich (unregistered)

    I hate people whining about floating point being "inexact." Even though that is technically true, an IEEE 754 float or double is the closest possible approximation in the given number of bits. On modern hardware, the error for a double is never greater than 1 part in 2**53.

    That error can of course grow dramatically (say, if you have a loop that runs 100 times, and every iteration does an operation that increases the error by a factor of 10, your end result will be nonsense).

    But in this case there's a couple of logs and a divide - the imprecision simply isn't a problem here.

  • Remco Gerlich (unregistered) in reply to Remco Gerlich

    Eating my own words immediately - the tiny imprecision can mean the difference between 7 and 8 here, because it's a rounding to a precise int that we need. I am stupid.

  • whicker (unregistered) in reply to AdT
    AdT:
    Jeltz:
    Gah, obviously it should have been:

    nbits = 0;

    while(range != 0)
    {
       range >>= range;
       nbits++;
    }
    

    Wow, even if range is 100000000, then nbits is only 1. Seems you have found a truly brillant compression algorithm.

    Alex Papadimoulis:
    (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases.

    This comment is another WTF altogether: the + 1 is not a "fudge factor" but a fencepost adjustment. Not adding it means an underestimation by one bit whenever range is 2^n - 1. And an underestimation is obviously a lot worse than an overestimation.

    I had my chuckle for the day. You can't possibly be serious. The algorithm is wrong, period.

    Any sort of computer science justification at this point is silly. What next? Remarks about how optimized this is using big O?

  • kbiel (unregistered) in reply to bstorer
    bstorer:
    Alex Papadimoulis:
    From GRG, the twelve problems with the last line ... (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases. (8) In a few instances the argument to log() isn't constrained to be positive. Taking the log of a negative or zero is undefined and results in a run-time error. Kaboom.

    I don't see how the range can ever be negative. abs() will return a value between 0 (inclusive) and infinity (okay, okay, MAX_INT). And the addition of one is a way of avoiding taking log(0). Admittedly, it's not a good way.

    You don't see how range could ever be negative? Since you brought it up, what is the result of adding 1 to MAX_INT?

  • Adrian Milliner (unregistered)

    Right, so I learned fortran for this :-)

    program how_many_bits
    
    integer :: num = 2022
    integer :: nbits = 0
    
    do
      if ( num == 0 ) exit
      nbits = nbits + 1
      num = num / 2
    end do
    
    print *,"nbits = ",nbits
    print *,"max", 2**nbits
    
    end program how_many_bits
    
  • mhendren (unregistered) in reply to Luci
    Luci:
    SnapShot:
    Doesn't that fail if range is negative?

    Anyways, when it's negative, how exactly do you want to determine the number of bits used to represent the number?

    I mean, if you are to store a positive number along with a separate sign bit, you'd have to add at least one bit.

    However, if you want to know how many bits it takes to keep the binary representation of the integer intact, it will always be the number of bits in the used integer type (e.g. 16, 32, 64 or whatever), since the highest bit will always be 1 if it is negative.

    Captcha: kungfu... yeah!

    You can represent a negative number in fewer bits than the integer size. -1 can be represented as 3 using 2 bits (assuming you know there are 2 bits). Two's complement doesn't need a size barrier.

  • Welbog (unregistered) in reply to TGV

    Hahahah, oh wow.

    Please explain how the value 1 can be represented in 0 bits. I could use a laugh today.

    Captcha = gotcha. HOW DOES IT KNOW?

  • Frzr (unregistered)

    On 32-bit PowerPC processors, the CNTLZW opcode counts the leading zero bits in the given number. Subtract that from 32 and you have the answer.
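
    For the non-PowerPC crowd, the same trick via a GCC/Clang builtin (a sketch; __builtin_clz is undefined for zero, hence the guard):

    /* bits needed = word size minus the number of leading zeros */
    int nbits(unsigned int v)
    {
        return v ? 32 - __builtin_clz(v) : 0;
    }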

  • charles bretana (unregistered) in reply to Alex Papadimoulis

    really all you need is

    int x = 1;
    while (number > 1)
    {
        x = x + 1;
        number = number / 2;
    }
    return x;

    Also, regarding comment (1) about floating-point accuracy: any numbering system used in a computer is limited in which numbers it can represent exactly and which it cannot. Floating-point representations can represent some numbers exactly; for standard IEEE floating-point representation, those that are an integer times 2 raised to some positive or negative integral power (within certain limits). All the other numbers must be represented by the closest number among the ones which can be represented.
    But this is true for any representation scheme. If you try to represent .8 using integers, you have the same issue.

    Alex Papadimoulis:
    From GRG, the twelve problems with the last line ...

    (1) Floating-point numbers can't give an exact result when the numbers involved are basically transcendental and can't be represented exactly in a finite number of bits, so using the log function is basically foobar.
    (3) "range" is a real number, which on most computers can't represent the full range of an integer. (4) Adding a fudge factor of "1" is a kludge, which helps a bit in some cases but also overestimates the answer by one in other cases. (5) That should be "1.0" anyway. (6) Coercing "range + 1" to real is unnecessary. (7) It should be "1.0D0" to ensure a double-precision result. (8) In a few instances the argument to log() isn't constrained to be positive. Taking the log of a negative or zero is undefined and results in a run-time error. Kaboom. (9) Dividing two inexact numbers will result in even more inexactitude. (10) Converting the result to log base two by dividing by log(2) is correct, but evaluating log base e of two will give an inexact result, as that number is transcendental. (11) And that should be "2.0" or better yet "2.0D0", or better yet, a variable holding the computed value. (12) Taking the int() of the real or double result is WRONG, as the result will sometimes be a few bits shy of the correct value. For example, if you plug in "256" the answer is about 7.99981672, which when you int() it becomes seven, which is off by one.

  • Draal (unregistered) in reply to WeatherGod
    WeatherGod:
    188 separate FORTRAN and C files? Gigabytes of really, really important data? I am going to take a wild stab and guess that this is GEMPAK?

    Sounds like AIPS. The source download is in the region of 90 MiB, it's written in FORTRAN and C, it doesn't use Makefiles (just some horrible mess with a gazillion shell scripts to do the linking and compilation), it uses DOS 8.3 file names, and the 'versioning' is based on release dates (always 31DECYY, where YY is the two-digit year). Wooh!

    http://en.wikipedia.org/wiki/AIPS

    I bet there are some WTFs in there.

  • Aysen (unregistered) in reply to msarnoff
    msarnoff:
    (13) The function can be rewritten with two assembly language instructions. Assuming an x86 machine and that the input value is in %eax:
    bsr %eax,%eax
    inc %eax
    Ahh, good to see there are still assembler users out there :)

    However, you need to fix this code, e.g. to:

    bsr %eax, %eax
    jnz .not_zero
    xor %eax, %eax
    .not_zero:
    inc %eax
    because bsr gives undefined results if the source operand is zero. (This code returns 1 if the input is 0; it can easily be changed to return 0 instead.)
  • webrunner (unregistered) in reply to Remco Gerlich
    Remco Gerlich:
    I hate people whining about floating point being "inexact." Even though that is technically true, an IEEE 754 float or double is the closest possible approximation in the given number of bits. On modern hardware, the error for a double is never greater than 1 part in 2**53.

    "closest possible" means "inexact" when it's not possible to be exact.

    Remember, in binary, no .1 for you!

  • Jonathan Allrn (unregistered) in reply to Jeltz
    Jeltz:
    You shouldn't use floats at all. What is wrong with ordinary shifting to count bits?

    Expressed in C since I don't know fortran:

    nbits = 0;
    while(range != 0)
    {
       nbits++;
    }
    

    That's silly. The answer is always going to be in a limited range, so just write a static lookup table with all the values pre-computed. You will get much better performance, it is easy to translate to other languages, and it is trivial to bench test.
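
    A sketch of the table idea (my names; the table covers one byte and the bytes are combined at lookup time, so it stays small):

    static unsigned char tbl[256];

    static void init_tbl(void)
    {
        for (int i = 1; i < 256; i++)
            tbl[i] = tbl[i / 2] + 1;   /* bits(i) = bits(i / 2) + 1 */
    }

    static int nbits(unsigned int v)
    {
        if (v >> 24) return 24 + tbl[v >> 24];
        if (v >> 16) return 16 + tbl[v >> 16];
        if (v >> 8)  return  8 + tbl[v >> 8];
        return tbl[v];   /* tbl[0] stays 0 from static initialization */
    }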
