Admin
Hmm, a bunch of comments I read before have now gone missing. Has anyone else noticed that people's comments are getting deleted? That to me is the biggest WTF.
Admin
You must be joking, right?
Booth's algorithm was developed in 1951, and even the grade-school shift-and-add requires far fewer instructions than plain repeated addition...
-x
Admin
What now?
CAPTCHA: ingenium
Admin
Heh, nice! I've seen stuff like that before; people compete to write awesome code like that.
Admin
So, long story short, there's actually a 0 and a -0, but they're both equal in arithmetic comparisons (which should be expected).
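(Assuming this refers to IEEE-754 floating point, a minimal C sketch of the two zeros:)
#include <math.h>
#include <stdio.h>

int main(void) {
    double pos = 0.0, neg = -0.0;
    printf("0.0 == -0.0 -> %d\n", pos == neg ? 1 : 0);     /* 1: arithmetically equal */
    printf("signbit(-0.0) -> %d\n", signbit(neg) ? 1 : 0); /* 1: sign bit still set */
    return 0;
}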
Admin
My preferred solution:
int foo = x;
foo = (foo ^ -1) + 1; /* XOR with -1 flips every bit; adding 1 completes the two's-complement negation */
Admin
I tested gcc with: return 0-i; return -i; return i*(-1); All three produced the same code, equally fast, equally good, 'negl %eax'. (And that's with all optimizations turned off and with them all on.)
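(For anyone who wants to repeat the experiment, a minimal sketch; the function names are made up for illustration:)
int neg_sub(int i)   { return 0 - i; }
int neg_unary(int i) { return -i; }
int neg_mul(int i)   { return i * (-1); }
/* Compile with: gcc -S neg.c   (or gcc -O2 -S neg.c)
   and compare the generated assembly; each body should
   come out as a single negl instruction on x86. */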
Admin
OK, I know I'll get pilloried, but there have been some languages which, at some stage in their development, did not handle negative numbers in the ways one might expect. It shouldn't be forgotten that those who write the language implementations are programmers too, and they're just as subject to dumb mistakes as the rest of us.
Admin
Think of it like this: none < mul < div, where < is the set-containment operator: any chip that can divide can also multiply.
Admin
You haven't a clue, lol...
Directly from the IA-32 optimization manual, assuming 32-bit integer instructions:
Core microarchitecture (06_0DH): ADC/SBB reg,imm latency 1; IMUL r32 latency 4.
Four times as many cycles are required to complete the µops of a MUL instruction as an add-with-carry, and half the throughput.
Step back a generation to P4 land and you'll see numbers more like 10 to 20 times slower, but still half the throughput... interesting to see what they optimized...
-x
Admin
consumer-grade hardware != small embedded processors
Admin
All operations are not equally costly. A subtraction is cheaper than a multiplication (in real life, too: try multiplying arbitrarily long numbers by hand and you'll see how much harder it is than subtracting them). Depending on the processor, a multiplication may be 3-100 times slower. On a recent out-of-order processor, fewer execution units are able to handle multiplications than subtractions. In the worst case, on a really simple processor, subtraction is an instruction and multiplication is a jump to a subroutine.
Admin
I guess they're never embedded in consumer electronics.
Admin
return k*(-1);
  0 iload_1
  1 iconst_m1
  2 imul
  3 ireturn
return 0-k;
  0 iconst_0
  1 iload_1
  2 isub
  3 ireturn
return -k;
  0 iload_1
  1 ineg
  2 ireturn
I agree with you that a good optimizing compiler could figure that they're all the same. Unfortunately people are using compilers that are not that good at optimizing.
Admin
Actually, to negate a number in two's complement you flip all the bits and then add 1. For example, 0101 becomes 1010; add 1 and you get 1011. The first bit tells the sign.
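(A quick C check of that rule; x = 5 is just an example value:)
#include <stdio.h>

int main(void) {
    int x = 5;              /* 0101 in the low four bits */
    int negated = ~x + 1;   /* flip all bits, then add 1 */
    printf("%d %d\n", negated, -x);   /* both print -5 */
    return 0;
}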
Admin
because remember, no one ever develops for anything other than a PC... iPhones don't exist or anything...
Admin
I love you guys.
Seriously. I can come here every day and you make me laugh, with all your cute arguing over the "right" way to do something. Every day, without fail, no matter what the article is.
Seriously. Thanks.
Admin
Since computers are basically advanced 'adding machines', this takes place very quickly.
To multiply by 2, you start with 0, add the number, and add it again, taking 2 'cycles'.
To multiply by 1000, you start with 0, and add the number to it 1000 times over, taking 1000 'cycles' (this is called 'unrolling')
The really clever bit is when you multiply by a negative number. Since the number of cycles is equal to the number you are multiplying by, it's obvious that it takes a negative number of cycles, so the result is available before the negative multiplication instruction. However, the CPU hides the result in 'cache' until after the code has reached the point where the negative multiplication was requested; imagine the difficulty debugging otherwise! (this is called 'reordering')
Admin
(x^-1)+1 = 1/x + 1 = (1+x)/x != -x
??
Admin
My above comment is in reply to:
Admin
NEG EBX
simple as pie
Admin
Try it and see. /FAcs in Visual Studio, -S in gcc.
Admin
WTF?
Admin
As I've understood things, programming on embedded processors is quite different from programming on PCs, as embedded processors have much more limited resources at their disposal, forcing the programmer to use all kinds of optimizations. In that sense, small embedded processors really are not equal to consumer-grade hardware.
Admin
That's ones-complement, not negation.
Admin
Of course, that's fair enough then.
Admin
n * (-1) wastes CPU time. It requires at least one multiplication and one LOAD operation (unless the compiler is smart).
Admin
Sounds like FTL Quantum computing, where results are delivered before you even request them!
Admin
The difference between 99 and 100 is 1 unit, though all the digits are different.
Admin
If you read carefully, the 1st function returns -Abs(dblAmount).
Admin
I stumbled across this beauty a while ago:
int sign = vel/fabs(vel);
How long before vel was 0 do you reckon? :-)
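(For the record, a division-free sign function, as a sketch; it returns -1, 0, or 1, and survives vel == 0:)
/* (v > 0) and (v < 0) each evaluate to 1 or 0, so the
   difference is -1, 0, or +1 -- no division, no blow-up at zero. */
int sign_of(double v) {
    return (v > 0) - (v < 0);
}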
Admin
I'm not sure about the latest processors, but even on the Pentium (1993-1996), integer multiplication was about 11 clock cycles, while addition is only 1.
See ftp://download.intel.com/design/pentium/manuals/24143004.pdf, page 25-234
Admin
Maybe I'm just feeding trolls however:
Admin
This is gratuitously inefficient, but it works. For one thing, it has large numbers of needless dependencies. There is no need to wait for the results of one shift before performing the next.
It does demonstrate a typical binary multiplication algorithm though, which is the first step to understanding one. A modern CPU will overlap many, if not all, of these operations. For example, a Pentium2 (I think) will split the input into two 16-bit parts and combine the results.
Kind of like doing this, but in binary:
36 * 24 = (3*10 + 6) * (2*10 + 4) = 3*10*2*10 + 3*10*4 + 6*2*10 + 6*4 = 100*(3*2) + 10*(3*4 + 6*2) + 6*4
Step 1: 3*2, 3*4, 6*2, 6*4
Step 2: 3*4 + 6*2, 6*4
Step 3: 100*(3*2) + 10*(3*4 + 6*2) + (6*4)
(Multiplying by 10 or 100 is free; it's just wiring the digits off by one or two places from step to step.)
If this was splitting a 32-bit number into two 16-bit numbers, it would turn your 64 shifts and 0-32 adds into:
Step 1: 32 shifts, 0-16 adds (these count, since they are done one by one).
Step 2: 1 add.
Step 3: 2 adds (shifts here are free, they're just wiring).
This totals 32 shifts and 3-19 adds. A huge savings over 64 shifts and 0-32 adds.
Of course, the Pentium2 actually had a highly optimized 16x16 bit multiplier, so it didn't need to do the 32 shifts in step 1 one at a time. Multiply took about 50 cycles on an 8086, doing it more or less the way you showed. It took four cycles on the Pentium2, and it even got the right answer ... well, most of the time.
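(A minimal C sketch of the one-bit-at-a-time shift-and-add being described, before any of the splitting tricks:)
#include <stdint.h>

/* Classic shift-and-add: one shift per multiplier bit,
   plus one add for every bit that happens to be set. */
uint32_t shift_add_mul(uint32_t a, uint32_t b) {
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1)
            product += a;   /* this multiplier bit is set */
        a <<= 1;            /* shift the multiplicand up */
        b >>= 1;            /* move to the next multiplier bit */
    }
    return product;
}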
Admin
I realize that "pay per line" is a standard joke, but in the history of software development has ANYONE really been paid by the line? I've never seen this in practice and I've been doing software for 2 decades.
Admin
Those programs are so overcomplicated, I find it difficult to make my brain follow their logic.
Still, I was the same way when I was a kid.
When I was playing around on the Commodore 128, I wanted a little prog to tell the user whether a number was odd or even. The computer didn't have a function for that, so I wrote one that mirrored the sort of logic I would use to figure out the answer. It had a DATA statement containing "2", "4", "6", "8", "0". It would have the user input a number, then convert that number to a string, then try comparing the last character against the list in the data statements. Well, you get the idea.
So then I showed it to my dad. I said, "Look at this. Can you believe this!? I had to go through this much effort just to find out if a number is odd or even." My dad asked me if there was a function in the computer to determine if a given number is divisible by another number. He said if I could do that then I could check for divisibility by 2 and thereby get the even/odd answer I sought. ( Answer: for the C128, use comparison (int(num/2) == num/2) )
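(In C, Dad's divisibility trick collapses to one expression; num is just a placeholder name:)
#include <stdio.h>

int main(void) {
    int num = 42;
    /* Even <=> dividing by 2 leaves no remainder -- the same
       idea as the C128's int(num/2) == num/2 comparison. */
    printf("%d is %s\n", num, (num % 2 == 0) ? "even" : "odd");
    return 0;
}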
It was at that moment that I had the epiphany that math would play a large role in computer programming. Ok, so I was like 12 so I was allowed to be naive. The thing that keeps me up at night is knowing that the goof balls that write the code that gets onto this site haven't yet learned these things.
-- Furry cows moo and decompress.
Admin
I don't know that anyone's ever been paid per line, but I do remember hearing Steve Ballmer once refer to "k-locs of code". I don't know wtf a k-loc is, but I gathered it was a measurement of lines of code. I further gathered that it was used as a rough measure of an individual's or department's productivity.
So, like, I doubt they were paid per line of code, but in a general fuzzy sense, the amount of code they produced over a fixed amount of time may have been used to gauge whether or not they deserved a raise.
-- Furry cows moo and decompress.
Admin
Computers are not adding machines; a modern Intel CPU has hundreds of opcodes, only a few of which have anything to do with adding.
Hardware multiplication units are normally radix-4 or radix-8 Booth-algorithm based. This reduces the number of partial products greatly, thus reducing propagation delay (it makes the circuit run faster). For fixed-length input (i.e., two 32-bit inputs), multiplication time is O(1), dependent only on the propagation delay through the circuit.
Software implementations of multiplication algorithms absolutely do NOT just repeatedly add; that would be a solid WTF in itself. Even grade-school shift-and-add is many times faster. Booth's algorithm can be implemented in software, and for arbitrarily large numbers something like the Schönhage-Strassen algorithm should be used, as it runs in O(n log n log log n) time.
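(For the curious, radix-2 Booth recoding sketched in C; real hardware uses radix-4 or radix-8, but the idea is the same. Illustrative only:)
#include <stdint.h>

/* Radix-2 Booth: scan multiplier bit pairs (bit i, bit i-1).
   A 0->1 transition starts a run of 1s and subtracts the shifted
   multiplicand; a 1->0 transition ends the run and adds it back
   one place higher. A long run of 1s therefore costs two
   operations instead of one add per bit. */
int32_t booth_mul(int32_t m, int32_t r) {
    int64_t product = 0;
    int prev = 0;                         /* implicit bit below bit 0 */
    for (int i = 0; i < 32; i++) {
        int cur = (r >> i) & 1;
        if (cur == 1 && prev == 0)
            product -= (int64_t)m << i;   /* run of 1s begins */
        else if (cur == 0 && prev == 1)
            product += (int64_t)m << i;   /* run of 1s ends */
        prev = cur;
    }
    return (int32_t)product;   /* truncate, like a 32-bit multiply */
}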
-x
Admin
Now that's just ridiculous. I wouldn't even begin to compare that to a programmer who overengineers because they want to have the most amazing creation on the face of their IT department. And who is to judge when something is overengineered? At my company, there is a wonderful symptom where people might expect you to roll out something dirty and quick, which might be perfectly doable (and perhaps boring, but that's part of the game sometimes). HOWEVER... two months later they are trying to bring multiple different accounts under the umbrella of this "dirty and quick" app, the DB design begins to suffer horribly, and ergo: a spaghetti mess. So sometimes I would say overengineering is merely a derogatory term for planning ahead. Then again, I'll defer to the company policy of getting work done on time, so it looks like no "overengineering" for me...
As for the article: that's just ridiculous! Beyond stupidity. Paid by the line, perhaps...
Admin
OH MY FUCKING GOD - Please tell me they don't get money for that code...
Admin
People have a lot of misconceptions about processor math. Generally speaking, some ops are slower than others. This table was gleaned from reading architecture manuals from Intel, AMD, MIPS, and HP PA-RISC in the course of my work over the last 10 years.
Generally, for double-precision floating-point ops on processors made before this year:
Add: 1 cycle
Subtract: 1 cycle
Multiply: 1 cycle
Divide: 32 cycles, using hardware iterative radix-2 division, + FP pipe stall on x86 (no stall on newer PA-RISC)
Integer Modulo: 32 cycles (uses floating-point hardware) + FP pipe stall on x86
Square Root: 32 cycles (executed using the divide unit) + FP pipe stall on x86
Trig Functions: varies, up to 100-200 cycles
Some Intel processors released in the last year use higher radix division algorithms and finish FP double divide in 16 cycles.
Don't believe internet pundits on this topic if it matters to you. Go read the processor architecture manual and read the assembler being generated by your compiler and work it out doing timing tests if there are questions.
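(A crude timing harness of the kind being suggested, sketched in C; the volatile qualifiers are a blunt defense against the optimizer, and the loop count is arbitrary:)
#include <stdio.h>
#include <time.h>

int main(void) {
    volatile double a = 1.2345, b = 6.789, r = 0.0;
    const long N = 100000000L;
    clock_t t0 = clock();
    for (long i = 0; i < N; i++) r = a * b;   /* time multiply */
    clock_t t1 = clock();
    for (long i = 0; i < N; i++) r = a / b;   /* time divide */
    clock_t t2 = clock();
    printf("mul %.2fs, div %.2fs (r=%f)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, r);
    return 0;
}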
Admin
Actually, most CPUs have a machine-language-level instruction to negate a number.
-n is probably the fastest.
If the compiler is smart enough, x * -1 may be just as fast. But that entirely depends on the compiler realizing what the expression is really doing. "-n" is obvious:
load register A with n
negate register A
vs.
load register A with n
load register B with -1
multiply register A * B
store result in register A
Admin
India rulezzz!
Never do programming under strong sun and high humidity!
Admin
LOL! Never knew this! Amma optimize all my code and all my company's code to always use negative numbers right now! They'll love me! :D