ASM Opt.: XOR vs MOV vs INC/DEC

Which is the fastest?

XOR, MOV or INC/DEC

I figure that XOR should be the fastest because
bit-twiddling is a lot faster than MOVing a
part of the memory or a still number to a register.

Correct me if I'm wrong.
Please, reply by MAIL as I don't visit the newsgroup
often.

Emil.
--
--------------------------------------------------------------
/------\/-----\/-----\/-\  /-\/------\/-----\/-----\/---\  /-\
\| /-\ || /-\ || /-\ || | /  /| /\/\ || /-\ || /-\ ||    \ | |
 | | | || \_/ || \_/ || |/ /_ | |||| || | | || | | || |\  \| |
 | | | || /-\ || /-\| | /-\  || |\/| || | | || | | || | \  \ |
/| \-/ || | | || | | \| | |  || |  | || \_/ || \_/ || |  \   |
\______/\_/ \_/\_/ \_/\_/ \__/\_/  \_/\_____/\_____/\_/   \__/
--------- http://connexus.apana.org.au/~mikuto/darky ---------

 

Re:ASM Opt.: XOR vs MOV vs INC/DEC


Quote
Emil Mikulic <darkm...@connexus.apana.org.au> wrote:
>Which is the fastest?

>XOR, MOV or INC/DEC

>I figure that XOR should be the fastest because
>bit-twiddling is a lot faster than MOVing a
>part of the memory or a still number to a register.

A quick look in the Turbo Assembler Quick Reference Guide shows that this would
depend on the CPU (86, 286, 386, 486, etc), the size of the operand (byte, word,
dword), and the type and location of the operand (register, memory, stack-frame,
etc).

I assume that XOR and MOV refer to clearing a register, so let's consider the
various ways to clear a register -

              486  386  286  86   (size)   example
MOV r8,im8     1    2    2    4      2     mov AL,0
MOV r16,im16   1    2    2    4      3     mov AX,0
no flags altered

XOR r8,r8      1    2    2    3      2     xor AL,AL
XOR r16,r16    1    2    2    3      2     xor AX,AX
affects O,S,Z,A,P,C

SUB r8,r8      1    2    2    3      2     sub AL,AL
SUB r16,r16    1    2    2    3      2     sub AX,AX
affects O,S,Z,A,P,C

lea r16,m      1    2    2   10      4     lea AX,[DS:0]
no flags altered

I tossed the last one in for the fun of it.  

Anyway, the difference you assumed exists only on the 8086.  On the 286, 386,
and 486 these instructions take the same number of clock cycles across the board.
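
To make the comparison concrete, here is a small Python sketch that tabulates the cycle counts from the table above and reports on which CPUs the register-clearing idioms differ at all. The names CPUS, CYCLES and cpus_with_difference are made up for illustration; the numbers are copied from the table, with the lea row left out.

```python
# Cycle counts copied from the table above (register-clearing idioms only).
CPUS = ("486", "386", "286", "86")
CYCLES = {
    "mov AL,0":  (1, 2, 2, 4),
    "mov AX,0":  (1, 2, 2, 4),
    "xor AL,AL": (1, 2, 2, 3),
    "xor AX,AX": (1, 2, 2, 3),
    "sub AL,AL": (1, 2, 2, 3),
    "sub AX,AX": (1, 2, 2, 3),
}

def cpus_with_difference(table):
    """Return the CPUs on which the listed instructions do not all take the same time."""
    differing = []
    for column, cpu in enumerate(CPUS):
        # Collect the distinct cycle counts in this CPU's column.
        if len({row[column] for row in table.values()}) > 1:
            differing.append(cpu)
    return differing

print(cpus_with_difference(CYCLES))
```

With the figures as listed, only the 8086 column contains more than one distinct value.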

Actual speed is based upon a lot more.  The above times assume:
1) the instruction has been pre-fetched, decoded, and is ready for execution.
2) Bus cycles do not require wait states.
3) There are no bus HOLD requests delaying the processor.
4) No exceptions are detected during instruction execution.
5) Memory operands are aligned.

A series of fast executing (fewer than two cycles per opcode byte) instructions
can drain the prefetch queue and increase execution time.  With typical
instruction mixes, actual time will be within 5-10% of the sum of the individual
times.  Of course the 8k onboard cache of the 486 would tend to limit penalties
for fast executing instructions.

From the above you can see that most of the time you spend worrying about which
instructions execute faster would be better spent on other activities.  Even the
1-cycle difference in the 8086 XOR and MOV instructions can easily be canceled
when used with too many fast instructions that drain the 4-byte 8088, 6-byte
8086, ... prefetch queue.  Then again, 1 single interrupt can upset the delicate
balance of things.
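
As a rough illustration of the "fewer than two cycles per opcode byte" rule of thumb, the sketch below divides the 8086 cycle counts from the table by the instruction sizes. It only applies that rule; it is not a model of a real prefetch queue, which also depends on bus timing. The names TIMINGS_8086 and drains_queue are hypothetical.

```python
# (cycles on the 8086, size in bytes), copied from the table earlier in this post.
TIMINGS_8086 = {
    "mov AL,0":  (4, 2),
    "mov AX,0":  (4, 3),
    "xor AL,AL": (3, 2),
    "xor AX,AX": (3, 2),
}

def drains_queue(cycles, size):
    """Rule of thumb from the text: fewer than two cycles per opcode byte."""
    return cycles / size < 2

for name, (cycles, size) in TIMINGS_8086.items():
    ratio = cycles / size
    print(f"{name:10} {ratio:.2f} cycles/byte  drains queue: {drains_queue(cycles, size)}")
```

By this measure the 3-cycle xor is exactly the kind of fast instruction that can outrun the queue, which is one way the 1-cycle advantage can be lost.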

So when you see me or others using "xor AL,AL" or "sub AL,AL" to zero a
register it may simply be from many years of habit, or it could really be an
efficient way to clear both the register and the carry flag in one instruction
(i.e. the equivalent of mov AL,0; clc).
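
The flag side effect can be sketched in Python. This is a simplified model that tracks only CF, OF and ZF (the real XOR also affects S and P, as the table notes); the function names mov_al_0 and xor_al_al are invented for the example.

```python
def mov_al_0(flags):
    """Model of mov AL,0: AL becomes 0, flags are left untouched."""
    return 0, dict(flags)

def xor_al_al(al, flags):
    """Model of xor AL,AL: AL becomes 0; CF and OF are cleared, ZF is set."""
    result = (al ^ al) & 0xFF          # any value XORed with itself is 0
    new_flags = dict(flags)
    new_flags.update(CF=0, OF=0, ZF=1 if result == 0 else 0)
    return result, new_flags

before = {"CF": 1, "OF": 1, "ZF": 0}
print(mov_al_0(before))        # carry is still whatever it was
print(xor_al_al(0x5A, before)) # carry is now known to be clear
```

So the two idioms are interchangeable only when the code that follows does not care about the flags.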

Quote

>Correct me if I'm wrong.
>Please, reply by MAIL as I don't visit the newsgroup
>often.

I think not.  Normally I would have cc'd a copy, but the selfish nature of this
request got the better of me today.  Must be a touch of indigestion. ;-)

    ....red

Re:ASM Opt.: XOR vs MOV vs INC/DEC


Quote
R.E.Donais wrote:

> Emil Mikulic <darkm...@connexus.apana.org.au> wrote:

> >Which is the fastest?

> >XOR, MOV or INC/DEC

> >I figure that XOR should be the fastest because
> >bit-twiddling is a lot faster than MOVing a
> >part of the memory or a still number to a register.

> A quick look in the Turbo Assembler Quick Reference Guide shows that this would
> depend on the CPU (86, 286, 386, 486, etc), the size of the operand (byte, word,
> dword), and the type and location of the operand (register, memory, stack-frame,
> etc).

Don't assume that I have a TASM reference guide because I might be using
GNU Pascal.

Quote

> I assume that XOR and MOV refer to clearing a register, so let's consider the
> various ways to clear a register -

No, I know that it's better to use XOR AX,AX than MOV AX,0.
I mean switching between two known values.
I.e.

MOV DX,3c8h ; dx = 3c8
...
XOR DX,1    ; dx = 3c9
...

Quote
>               486  386  286  86   (size)   example
> MOV r8,im8     1    2    2    4      2     mov AL,0
> (SNIP)
> Actual speed is based upon a lot more.  The above times assume:
> 1) the instruction has been pre-fetched, decoded, and is ready for execution.
> 2) Bus cycles do not require wait states.
> 3) There are no bus HOLD requests delaying the processor.
> 4) No exceptions are detected during instruction execution.
> 5) Memory operands are aligned.

Ouch.

Quote

> >Correct me if I'm wrong.
> >Please, reply by MAIL as I don't visit the newsgroup
> >often.

> I think not.  Normally I would have cc'd a copy, but the selfish nature of this
> request got the better of me today.  Must be a touch of indigestion. ;-)

>     ....red

I got it anyways, what do you mean selfish?
You just hit the RE:BOTH button instead of the RE:NEWS button
(if you're lucky enough to have netscape <grin>)

Emil.

Re:ASM Opt.: XOR vs MOV vs INC/DEC


Quote
Emil Mikulic <darkm...@connexus.apana.org.au> wrote:
>R.E.Donais wrote:

>> Emil Mikulic <darkm...@connexus.apana.org.au> wrote:

>> >Which is the fastest?

>> >XOR, MOV or INC/DEC

>> >I figure that XOR should be the fastest because
>> >bit-twiddling is a lot faster than MOVing a
>> >part of the memory or a still number to a register.

Typically an instruction like XOR r16,im16 feeds the register contents and the
immediate operand to the alu (arithmetic/logic unit) and routes the result back
to the register.  A mov r16,im16 instruction could route the immediate value
directly to the register.  My guess is that it would still pass through  the
alu.  INC/DEC would typically pass the register contents to the alu with a
"hard-wired" 1 and route the result back to the register.  Since all information
is "on-chip", it would probably be the fastest.  By the same token, I would
guess xor to be the slowest.

Quote
>> A quick look in the Turbo Assembler Quick Reference Guide shows that this would
>> depend on the CPU (86, 286, 386, 486, etc), the size of the operand (byte, word,
>> dword), and the type and location of the operand (register, memory, stack-frame,
>> etc).

>Don't assume that I have a TASM reference guide because I might be using
>GNU Pascal.

I didn't.  If I had, I would have told you to look it up rather than write the
relevant information. :-)

Quote
>> I assume that XOR and MOV refer to clearing a register, so let's consider the
>> various ways to clear a register -

>No, I know that it's better to use XOR AX,AX than MOV AX,0.
>I mean switching between two known values.
>I.e.

>MOV DX,3c8h ; dx = 3c8
>...
>XOR DX,1    ; dx = 3c9
>...

I would consider clarity first.  Loading a constant and incrementing and
decrementing are somewhat obvious, while using xor hides the purpose.  If xor
did prove to be the best, I would add a comment to explain what was really
happening.  Something like you did "dx = 3c9", or maybe "DX was 3C8, now 3C9"
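
For readers who want to see what the xor in the quoted code actually does, here is a minimal Python model of it. VALUE_A and VALUE_B are placeholder names for the two values in the quoted code, and toggle is a hypothetical helper, not anything from the original program.

```python
VALUE_A = 0x3C8  # value loaded by "MOV DX,3c8h" in the quoted code
VALUE_B = 0x3C9  # value after "XOR DX,1"

def toggle(dx):
    """Model of XOR DX,1: flips bit 0 and leaves the other 15 bits alone."""
    return (dx ^ 1) & 0xFFFF

dx = VALUE_A
dx = toggle(dx)
print(hex(dx))   # 0x3c9
dx = toggle(dx)
print(hex(dx))   # 0x3c8
```

The trick works here because the two values differ only in bit 0. In general the constant to xor with is the xor of the two values (0x3C8 ^ 0x3C9 is 1), which is exactly the kind of detail a comment should spell out.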

You might not think it necessary if you're the only one to see the code and are
comfortable with the method, but before you decide, think about how easy or
difficult things will be when you come back to the code some 5, 10, or 20 years
from now.

The following table lists AX and AL separately since these registers often use a
different machine instruction.

Quote
>>               486  386  286  86   (size)

   mov AX,im16    1    2    2    4      3
   add AX,im16    1    2    3    4      3
   xor AX,im16    1    2    3    4      3
   inc/dec AX     1    2    2    3      1

   mov r16,im16   1    2    2    4      3
   xor r16,im8    1    2    3    4      3
   xor r16,im16   1    2    3    4      4
   add r16,im8    1    2    3    4      3
   add r16,im16   1    2    3    4      4
   inc/dec r16    1    2    2    3      1

   mov AL,im8     1    2    2    4      2
   add AL,im8     1    2    3    4      2
   xor AL,im8     1    2    3    4      2
   inc/dec AL     1    2    2    3      2

   mov r8,im8     1    2    2    4      2
   xor r8,im8     1    2    3    4      3
   add r8,im8     1    2    3    4      3
   inc/dec r8     1    2    2    3      2
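
Applied to the DX example from the question, the r16 rows of the table can be compared with a few lines of Python. This is only a sketch: it uses the 8086 column and the size column, and the names OPTIONS and smallest_and_fastest are made up for the illustration.

```python
# (8086 cycles, size in bytes) for the r16 rows of the table above.
OPTIONS = {
    "mov DX,3C9h": (4, 3),   # mov r16,im16
    "xor DX,1":    (4, 3),   # xor r16,im8
    "add DX,1":    (4, 3),   # add r16,im8
    "inc DX":      (3, 1),   # inc/dec r16
}

def smallest_and_fastest(options):
    """Pick the entry with the fewest cycles, breaking ties by encoded size."""
    return min(options, key=lambda name: options[name])

print(smallest_and_fastest(OPTIONS))
```

On these figures inc DX wins on both counts, while xor, add and mov are indistinguishable from each other.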

Quote
>> Actual speed is based upon a lot more.  The above times assume:
>> 1) the instruction has been pre-fetched, decoded, and is ready for execution.
>> 2) Bus cycles do not require wait states.
>> 3) There are no bus HOLD requests delaying the processor.
>> 4) No exceptions are detected during instruction execution.
>> 5) Memory operands are aligned.

>Ouch.

That's just about the reaction I had.  

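FWIW, I had a real mode program execute 5% faster on a 386 when routines pushed
and popped segment registers (2 bytes, 9 cycles) rather than loading them through
a general purpose register (4 bytes, 4 cycles).  The nominally slower sequence
won, presumably because the shorter code was easier on the prefetch queue.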
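
Putting the two sequences from this anecdote through the cycles-per-byte rule of thumb quoted earlier in the thread gives a hint of why this can happen. The sketch below uses only the byte and cycle figures given above; PUSH_POP, VIA_GPR and cycles_per_byte are names invented for the example, and it is not a timing model of the 386.

```python
# Figures quoted in the post above for a 386 in real mode.
PUSH_POP = {"bytes": 2, "cycles": 9}   # push/pop of a segment register
VIA_GPR = {"bytes": 4, "cycles": 4}    # load through a general purpose register

def cycles_per_byte(entry):
    """Execution cycles available to the prefetcher per byte of code consumed."""
    return entry["cycles"] / entry["bytes"]

print(cycles_per_byte(PUSH_POP))  # 4.5
print(cycles_per_byte(VIA_GPR))   # 1.0
```

The register version falls below two cycles per byte, so it is the one more likely to leave the processor waiting for code to arrive.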

Quote

>> >Correct me if I'm wrong.
>> >Please, reply by MAIL as I don't visit the newsgroup
>> >often.

>> I think not.  Normally I would have cc'd a copy, but the selfish nature of this
>> request got the better of me today.  Must be a touch of indigestion. ;-)

>>     ....red
>I got it anyways, what do you mean selfish?
>You just hit the RE:BOTH button instead of the RE:NEWS button
>(if you're lucky enough to have netscape <grin>)

It has nothing to do with the ease or difficulty involved. It was more the mood
I was in and the way the request was worded.  It probably wouldn't have got
under my skin if you had asked more along the lines of having the reply posted
and emailed.  Interesting questions should be answered to the group so that
everyone might benefit.  

To me, "not visiting" seems to imply that you aren't willing to put that much
effort into obtaining an answer, are unappreciative of the effort spent on your
behalf, and hints that no one here has anything worthwhile to say. :-)

    ....red
cc

Re:ASM Opt.: XOR vs MOV vs INC/DEC


Quote
On Wed, 05 Mar 1997 17:02:01 +1000, Emil Mikulic <darkm...@connexus.apana.org.au> wrote:
>Which is the fastest?

>XOR, MOV or INC/DEC

>I figure that XOR should be the fastest because
>bit-twiddling is a lot faster than MOVing a
>part of the memory or a still number to a register.

>Correct me if I'm wrong.
>Please, reply by MAIL as I don't visit the newsgroup
>often.

Xor is fastest. Actually on anything above a 286 both "mov" and "Xor"
use only 1 cycle.

_____________________________ __ _  _
Norsk / Hoaxers. ( kb...@sn.no & http://home.sn.no/~kblix )
- -
How do you shoot the devil in the back? What if you miss? -Verbal Kint
---------------------------- -- -  -

Re:ASM Opt.: XOR vs MOV vs INC/DEC


Quote
Emil Mikulic (darkm...@connexus.apana.org.au) wrote:

> Don't assume that I have a TASM reference guide because I might be using
> GNU Pascal.

You should get HelpPC from Garbo; it has a listing of all the assembler
opcodes with size, timing, and modified flags. And there are probably more
free sources around.

Quote
> I got it anyways, what do you mean selfish?
> You just hit the RE:BOTH button instead of the RE:NEWS button
> (if you're lucky enough to have netscape <grin>)

To put it simply:

There are more people here who'll be interested in this topic.
There are more topics here you'll be interested in if you read the group.

(That means, if everyone who asks here first read the group for a while,
there'd be far fewer questions, esp. FAQs.)

BTW: Nice to see that you use your real name now. But please cut down
your signature to max. 4 lines. I think most people here don't like
ASCII "graphics", esp. when they see them in every posting.

--
Frank Heckenbach, Erlangen, Germany
heck...@mi.uni-erlangen.de
Turbo Pascal:   http://www.mi.uni-erlangen.de/~heckenb/programs.htm
Internet links: http://www.mi.uni-erlangen.de/~heckenb/links.htm
