Admin
Hmm, a bunch of comments I read before have now gone missing. Has anyone else noticed that people's comments are getting deleted? That to me is the biggest WTF.
Admin
You must be joking, right?
Booth's algorithm was developed in 1951, and even the grade-school shift-and-add requires far fewer instructions than plain repeated addition...
-x
Admin
What now?
CAPTCHA: ingenium
Admin
Heh, nice! I've seen stuff like that before; people compete to write awesome code like that.
Admin
So, long story short, there's actually a 0 and a -0, but they're both equal in arithmetic comparisons (which should be expected).
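(Assuming this refers to IEEE-754 floating point, a minimal C sketch of the two zeros:)
#include <math.h>
#include <stdio.h>

int main(void) {
    double pos = 0.0, neg = -0.0;
    printf("0.0 == -0.0 -> %d\n", pos == neg ? 1 : 0);     /* 1: arithmetically equal */
    printf("signbit(-0.0) -> %d\n", signbit(neg) ? 1 : 0); /* 1: sign bit still set */
    return 0;
}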
Admin
My preferred solution:
int foo = x;
foo = (foo ^ -1) + 1; /* XOR with -1 flips every bit; adding 1 completes the two's-complement negation */
Admin
I tested gcc with: return 0-i; return -i; return i*(-1); All three produced the same code, equally fast, equally good, 'negl %eax'. (And that's with all optimizations turned off and with them all on.)
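(For anyone who wants to repeat the experiment, a minimal sketch; the function names are made up for illustration:)
int neg_sub(int i)   { return 0 - i; }
int neg_unary(int i) { return -i; }
int neg_mul(int i)   { return i * (-1); }
/* Compile with: gcc -S neg.c   (or gcc -O2 -S neg.c)
   and compare the generated assembly; each body should
   come out as a single negl instruction on x86. */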
Admin
OK, I know I'll get pilloried, but there have been some languages which, at some stage in their development, did not handle negative numbers in the ways one might expect. It shouldn't be forgotten that those who write the language implementations are programmers too, and they're just as subject to dumb mistakes as the rest of us.
Admin
Think of it like this: none < mul < div, where < is the set-containment operator: any chip that can divide can also multiply.
Admin
You haven't a clue, lol...
Directly from the IA-32 optimization manual, assuming 32-bit integer instructions:
Core microarchitecture (06_0DH): ADC/SBB reg,imm latency 1; IMUL r32 latency 4.
Four times as many cycles are required to complete the µops of a MUL instruction as an add-with-carry, and half the throughput.
Step back a generation to P4 land and you'll see numbers more like 10 to 20 times slower, but still half the throughput... interesting to see what they optimized...
-x
Admin
consumer-grade hardware != small embedded processors
Admin
All operations are not equally costly. A subtraction is cheaper than a multiplication (in real life, too: try multiplying arbitrarily long numbers by hand and you'll see how much harder it is than subtracting them). Depending on the processor, a multiplication may be 3-100 times slower. On a recent out-of-order processor, fewer execution units are able to handle multiplications than subtractions. In the worst case, on a really simple processor, subtraction is an instruction and multiplication is a jump to a subroutine.
Admin
I guess they're never embedded in consumer electronics.
Admin
return k*(-1);
  0 iload_1
  1 iconst_m1
  2 imul
  3 ireturn
return 0-k;
  0 iconst_0
  1 iload_1
  2 isub
  3 ireturn
return -k;
  0 iload_1
  1 ineg
  2 ireturn
I agree with you that a good optimizing compiler could figure that they're all the same. Unfortunately people are using compilers that are not that good at optimizing.
Admin
Actually, to negate a number in two's complement you flip all the bits and then add 1. For example, 0101 becomes 1010; add 1 and you get 1011. The first bit tells the sign.
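(A quick C check of that rule; x = 5 is just an example value:)
#include <stdio.h>

int main(void) {
    int x = 5;              /* 0101 in the low four bits */
    int negated = ~x + 1;   /* flip all bits, then add 1 */
    printf("%d %d\n", negated, -x);   /* both print -5 */
    return 0;
}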
Admin
because remember, no one ever develops for anything other than a PC... iPhones don't exist or anything...
Admin
I love you guys.
Seriously. I can come here every day and you make me laugh, with all your cute arguing over the "right" way to do something. Every day, without fail, no matter what the article is.
Seriously. Thanks.
Admin
Since computers are basically advanced 'adding machines', this takes place very quickly.
To multiply by 2, you start with 0, add the number, and add it again, taking 2 'cycles'.
To multiply by 1000, you start with 0, and add the number to it 1000 times over, taking 1000 'cycles' (this is called 'unrolling')
The really clever bit is when you multiply by a negative number. Since the number of cycles is equal to the number you are multiplying by, it's obvious that it takes a negative number of cycles, so the result is available before the negative multiplication instruction. However, the CPU hides the result in 'cache' until after the code has reached the point where the negative multiplication was requested; imagine the difficulty debugging otherwise! (this is called 'reordering')
Admin
(x^-1)+1 = 1/x + 1 = (1+x)/x != -x
??
Admin
My above comment is in reply to:
Admin
NEG EBX
simple as pie
Admin
Try it and see. /FAcs in Visual Studio, -S in gcc.
Admin
WTF?
Admin
As I've understood things, programming on embedded processors is quite different from programming on PCs, as embedded processors have much more limited resources at their disposal, forcing the programmer to use all kinds of optimizations. In that sense, small embedded processors really are not equal to consumer-grade hardware.
Admin
That's ones-complement, not negation.
Admin
Of course, that's fair enough then.
Admin
n * (-1) wastes CPU time. It requires at least one multiplication and one LOAD operation (unless the compiler is smart).
Admin
Sounds like FTL Quantum computing, where results are delivered before you even request them!
Admin
The difference between 99 and 100 is 1 unit, though all the digits are different.
Admin
If you read carefully, the 1st function returns -Abs(dblAmount).
Admin
I stumbled across this beauty a while ago:
int sign = vel/fabs(vel);
How long before vel was 0 do you reckon? :-)
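(For the record, a division-free sign function, as a sketch; it returns -1, 0, or 1, and survives vel == 0:)
/* (v > 0) and (v < 0) each evaluate to 1 or 0, so the
   difference is -1, 0, or +1 -- no division, no blow-up at zero. */
int sign_of(double v) {
    return (v > 0) - (v < 0);
}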
Admin
I'm not sure about the latest processors, but even on the Pentium (1993-1996), integer multiplication was about 11 clock cycles, while addition is only 1.
See ftp://download.intel.com/design/pentium/manuals/24143004.pdf, page 25-234
Admin
Maybe I'm just feeding trolls however:
Admin
This is gratuitously inefficient, but it works. For one thing, it has large numbers of needless dependencies. There is no need to wait for the results of one shift before performing the next.
It does demonstrate a typical binary multiplication algorithm though, which is the first step to understanding one. A modern CPU will overlap many, if not all, of these operations. For example, a Pentium2 (I think) will split the input into two 16-bit parts and combine the results.
Kind of like doing this, but in binary:
36 * 24 = (3*10 + 6) * (2*10 + 4) = 3*10*2*10 + 3*10*4 + 6*2*10 + 6*4 = 100*(3*2) + 10*(3*4 + 6*2) + 6*4
Step 1: 3*2, 3*4, 6*2, 6*4
Step 2: 3*4 + 6*2, 6*4
Step 3: 100*(3*2) + 10*(3*4 + 6*2) + (6*4)
(Multiplying by 10 or 100 is free; it's just wiring the digits off by one or two places from step to step.)
If this was splitting a 32-bit number into two 16-bit numbers, it would turn your 64 shifts and 0-32 adds into:
Step 1: 32 shifts, 0-16 adds (these count, since they are done one by one).
Step 2: 1 add.
Step 3: 2 adds (shifts here are free, they're just wiring).
This totals 32 shifts and 3-19 adds. A huge savings over 64 shifts and 0-32 adds.
Of course, the Pentium2 actually had a highly optimized 16x16 bit multiplier, so it didn't need to do the 32 shifts in step 1 one at a time. Multiply took about 50 cycles on an 8086, doing it more or less the way you showed. It took four cycles on the Pentium2, and it even got the right answer ... well, most of the time.
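(A minimal C sketch of the one-bit-at-a-time shift-and-add being described, before any of the splitting tricks:)
#include <stdint.h>

/* Classic shift-and-add: one shift per multiplier bit,
   plus one add for every bit that happens to be set. */
uint32_t shift_add_mul(uint32_t a, uint32_t b) {
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1)
            product += a;   /* this multiplier bit is set */
        a <<= 1;            /* shift the multiplicand up */
        b >>= 1;            /* move to the next multiplier bit */
    }
    return product;
}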
Admin
I realize that "pay per line" is a standard joke, but in the history of software development has ANYONE really been paid by the line? I've never seen this in practice and I've been doing software for 2 decades.
Admin
Those programs are so overcomplicated, I find it difficult to make my brain follow their logic.
Still, I was the same way when I was a kid.
When I was playing around on the Commodore 128, I wanted a little prog to tell the user whether a number was odd or even. The computer didn't have a function for that, so I wrote one that mirrored the sort of logic I would use to figure out the answer. It had a DATA statement containing "2", "4", "6", "8", "0". It would have the user input a number, then convert that number to a string, then try comparing the last character against the list in the data statements. Well, you get the idea.
So then I showed it to my dad. I said, "Look at this. Can you believe this!? I had to go through this much effort just to find out if a number is odd or even." My dad asked me if there was a function in the computer to determine if a given number is divisible by another number. He said if I could do that then I could check for divisibility by 2 and thereby get the even/odd answer I sought. ( Answer: for the C128, use comparison (int(num/2) == num/2) )
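(In C, Dad's divisibility trick collapses to one expression; num is just a placeholder name:)
#include <stdio.h>

int main(void) {
    int num = 42;
    /* Even <=> dividing by 2 leaves no remainder -- the same
       idea as the C128's int(num/2) == num/2 comparison. */
    printf("%d is %s\n", num, (num % 2 == 0) ? "even" : "odd");
    return 0;
}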
It was at that moment that I had the epiphany that math would play a large role in computer programming. Ok, so I was like 12 so I was allowed to be naive. The thing that keeps me up at night is knowing that the goof balls that write the code that gets onto this site haven't yet learned these things.
-- Furry cows moo and decompress.
Admin
I don't know that anyone's ever been paid per line, but I do remember hearing Steve Ballmer once refer to "k-locs of code". I don't know wtf a k-loc is, but I gathered it was a measurement of lines of code. I further gathered that it was used as a rough measure of an individual's or department's productivity.
So, like, I doubt they were paid per line of code, but in a general fuzzy sense, the amount of code they produced over a fixed amount of time may have been used to gauge whether or not they deserved a raise.
-- Furry cows moo and decompress.
Admin
Computers are not adding machines; a modern Intel CPU has hundreds of opcodes, only a few of which have anything to do with adding.
Hardware multiplication units are normally radix-4 or radix-8 Booth-algorithm based. This reduces the number of partial products greatly, thus reducing propagation delay (it makes the circuit run faster). For fixed-length input (i.e., two 32-bit inputs), multiplication time is O(1), dependent only on the propagation delay through the circuit.
Software implementations of multiplication algorithms absolutely do NOT just repeatedly add; that would be a solid WTF in itself. Even grade-school shift-and-add is many times faster. Booth's algorithm can be implemented in software, and for arbitrarily large numbers something like the Schönhage-Strassen algorithm should be used, as it runs in O(n log n log log n) time.
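(For the curious, radix-2 Booth recoding sketched in C; real hardware uses radix-4 or radix-8, but the idea is the same. Illustrative only:)
#include <stdint.h>

/* Radix-2 Booth: scan multiplier bit pairs (bit i, bit i-1).
   A 0->1 transition starts a run of 1s and subtracts the shifted
   multiplicand; a 1->0 transition ends the run and adds it back
   one place higher. A long run of 1s therefore costs two
   operations instead of one add per bit. */
int32_t booth_mul(int32_t m, int32_t r) {
    int64_t product = 0;
    int prev = 0;                         /* implicit bit below bit 0 */
    for (int i = 0; i < 32; i++) {
        int cur = (r >> i) & 1;
        if (cur == 1 && prev == 0)
            product -= (int64_t)m << i;   /* run of 1s begins */
        else if (cur == 0 && prev == 1)
            product += (int64_t)m << i;   /* run of 1s ends */
        prev = cur;
    }
    return (int32_t)product;   /* truncate, like a 32-bit multiply */
}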
-x
Admin
Now that's just ridiculous. I wouldn't even begin to compare that to a programmer who overengineers because they want to have the most amazing creation on the face of their IT department. And who is to judge when something is overengineered? At my company, there is a wonderful symptom where people might expect you to roll out something dirty and quick, which might be perfectly doable (and perhaps boring, but that's part of the game sometimes). HOWEVER... two months later they are trying to bring multiple different accounts under the umbrella of this "dirty and quick" app, the DB design begins to suffer horribly, and ergo: a spaghetti mess. So sometimes I would say overengineering is merely a derogatory term for planning ahead. Then again, I'll defer to the company policy of getting work done on time, so it looks like no "overengineering" for me...
As for the article: that's just ridiculous! Beyond stupidity. Paid by the line, perhaps...
Admin
OH MY FUCKING GOD - Please tell me they don't get money for that code...
Admin
People have a lot of misconceptions about processor math. Generally speaking, some ops are slower than others. This table was gleaned from reading architecture manuals from Intel, AMD, MIPS, and HP PA-RISC in the course of my work over the last 10 years.
Generally, for double-precision floating-point ops on processors made before this year:
Add: 1 cycle
Subtract: 1 cycle
Multiply: 1 cycle
Divide: 32 cycles, using hardware iterative radix-2 division, + FP pipe stall on x86 (no stall on newer PA-RISC)
Integer Modulo: 32 cycles (uses floating-point hardware) + FP pipe stall on x86
Square Root: 32 cycles (executed using the divide unit) + FP pipe stall on x86
Trig Functions: varies, up to 100-200 cycles
Some Intel processors released in the last year use higher radix division algorithms and finish FP double divide in 16 cycles.
Don't believe internet pundits on this topic if it matters to you. Go read the processor architecture manual and read the assembler being generated by your compiler and work it out doing timing tests if there are questions.
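(A crude timing harness of the kind being suggested, sketched in C; the volatile qualifiers are a blunt defense against the optimizer, and the loop count is arbitrary:)
#include <stdio.h>
#include <time.h>

int main(void) {
    volatile double a = 1.2345, b = 6.789, r = 0.0;
    const long N = 100000000L;
    clock_t t0 = clock();
    for (long i = 0; i < N; i++) r = a * b;   /* time multiply */
    clock_t t1 = clock();
    for (long i = 0; i < N; i++) r = a / b;   /* time divide */
    clock_t t2 = clock();
    printf("mul %.2fs, div %.2fs (r=%f)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, r);
    return 0;
}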
Admin
Actually, most CPUs have a machine-language-level instruction to negate a number.
-n is probably the fastest.
If the compiler is smart enough, x * -1 may be just as fast. But that entirely depends on the compiler realizing what the expression is really doing. "-n" is obvious:
load register A with n
negate register A
vs.
load register A with n
load register B with -1
multiply register A * B
store result in register A
Admin
India rulezzz!
Never do programming under strong sun and high humidity!
Admin
LOL! Never knew this! Amma optimize all my code and all my company's code to always use negative numbers right now! They'll love me! :D