Gentle Introduction to x86-64 Assembly

Introduction

This document summarises the differences between x86-64 and i386 assembly, assuming that you already know the i386 gas syntax well. I will try to keep this document up to date until official documentation is available.

Register set extensions

X86-64 defines eight new integer registers named r8-r15. These registers are encoded using the REX prefix, so using them in a non-64-bit instruction makes the instruction one byte longer. They are named as follows:
  rXb for 8 bit register (containing the lowest byte of the 64-bit value)
  rXw for 16 bits
  rXd for 32 bits
  rX  for 64 bits
Where X stands for an integer in the range 8 to 15.
The original integer registers keep their irregular names; the 64-bit versions of the 32-bit registers eax, edx, ecx, ebx, esi, edi, esp and ebp are called rax, rdx, rcx, rbx, rsi, rdi, rsp and rbp respectively.
The new registers can be used in the same places as the old ones, except where a register is used implicitly. Some instructions always use specific fixed registers, e.g. cl as the shift count, or rsi and rdi as the source and destination of string operations, and these cannot be replaced by r8-r15.
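The naming scheme boils down to a small lookup. Below is a minimal Python sketch; `reg_name` is a hypothetical helper, not part of any tool, and it assumes the hardware register numbering 0 = rax, 1 = rcx, 2 = rdx, 3 = rbx, 4 = rsp, 5 = rbp, 6 = rsi, 7 = rdi:

```python
# Legacy names indexed by the 3-bit register number used in ModRM/SIB.
LEGACY = {
    64: ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    # Low-byte names; spl/bpl/sil/dil are only reachable with a REX prefix.
    8: ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"],
}
# Regular suffixes for r8-r15.
SUFFIX = {64: "", 32: "d", 16: "w", 8: "b"}


def reg_name(num: int, bits: int) -> str:
    """Return the gas name of register `num` (0-15) at width `bits`."""
    if bits not in SUFFIX or not 0 <= num <= 15:
        raise ValueError(f"no register {num} at {bits} bits")
    if num < 8:
        return LEGACY[bits][num]          # irregular historical names
    return f"r{num}{SUFFIX[bits]}"        # rX, rXd, rXw, rXb


if __name__ == "__main__":
    for n in (0, 7, 8, 15):
        print([reg_name(n, b) for b in (64, 32, 16, 8)])
```

Registers 8-15 need the extra REX bit precisely because the ModRM and SIB fields only hold three bits.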

Extended 8-bit instructions

In an instruction with a REX prefix, the 8-bit register encodings are interpreted differently, so that every register can be accessed as an 8-bit register. The encodings of the upper halves ah, ch, dh and bh select instead the low bytes of the next four registers, spl, bpl, sil and dil (from rsp, rbp, rsi and rdi), and the new registers use the rXb names described above.
Unfortunately this means that the upper halves cannot be used in any instruction that needs a REX prefix, for example one whose address uses an extended register:
  addb %ah, (%r10)	# Invalid instruction.
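The restriction can be modelled as a check over every register an instruction names, both register operands and the base/index registers of a memory operand. This is a rough Python sketch with hypothetical names; it only models REX prefixes forced by the choice of register, not the REX.W bit forced by 64-bit operand size:

```python
import re

HIGH_BYTE = {"ah", "ch", "dh", "bh"}          # encodable only without REX
REX_ONLY_BYTE = {"spl", "bpl", "sil", "dil"}  # encodable only with REX
EXTENDED = re.compile(r"r(8|9|1[0-5])[dwb]?") # r8..r15 in any width


def needs_rex(reg: str) -> bool:
    """True if naming `reg` anywhere in the instruction forces a REX prefix."""
    return reg in REX_ONLY_BYTE or EXTENDED.fullmatch(reg) is not None


def encodable(regs) -> bool:
    """False if a high-byte register is combined with a REX requirement."""
    uses_high = any(r in HIGH_BYTE for r in regs)
    return not (uses_high and any(needs_rex(r) for r in regs))


print(encodable(["ah", "r10"]))   # addb %ah, (%r10): not encodable
print(encodable(["ah", "rax"]))   # addb %ah, (%rax): fine
```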

64bit instructions

By default most operations remain 32-bit, and their 64-bit counterparts are selected by the fourth bit (REX.W) of the REX prefix. This means that each 32-bit instruction has its natural 64-bit extension, and that the extended registers come for free in 64-bit instructions, since the REX prefix is already there.
To write 64bit instructions, use 'q' as a suffix (q for 'quad-word'):
  movl $1,  %eax	# 32-bit instruction
  movq $1,  %rax	# 64-bit instruction
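The REX prefix has the bit layout 0100WRXB, where W is the operand-size bit just mentioned and R, X and B supply the fourth bit of the register numbers. A small Python sketch (the helper `rex` and the constant names are hypothetical; the bytes are hand-assembled from the opcode tables) shows a 64-bit add as the 32-bit encoding with one prefix byte in front, and r8 costing nothing extra:

```python
def rex(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix byte 0100WRXB.

    W selects 64-bit operand size; R, X and B extend ModRM.reg,
    SIB.index and ModRM.rm/SIB.base to reach r8-r15.
    """
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


# addl %ebx, %eax: opcode 0x01 (add r/m32, r32), ModRM 0xD8 (mod=11, reg=ebx, rm=eax)
ADDL_EBX_EAX = bytes([0x01, 0xD8])
# addq %rbx, %rax: the same bytes behind a REX prefix with only W set (0x48)
ADDQ_RBX_RAX = bytes([rex(w=1)]) + ADDL_EBX_EAX
# addq %r8, %rax: REX.R also set (0x4C), still a single prefix byte
ADDQ_R8_RAX = bytes([rex(w=1, r=1), 0x01, 0xC0])

print(ADDQ_RBX_RAX.hex(" "), "|", ADDQ_R8_RAX.hex(" "))
```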
The exceptions to this rule are the instructions manipulating the stack (push, pop, call, ret, enter and leave), which are implicitly 64-bit: their 32-bit counterparts are no longer available, but their 16-bit counterparts still are. So:
  pushl %eax		# Illegal instruction
  pushq %rax		# 1 byte instruction, encoded as pushl %eax in 32 bits
  pushq %r10		# 2 byte instruction, encoded as pushl preceded by REX
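The push encodings above follow from the short 0x50+reg opcode. A minimal Python sketch (`encode_push` is a hypothetical helper) assuming register numbers 0 = rax ... 15 = r15:

```python
def encode_push(num: int) -> bytes:
    """Encode `pushq %reg` for register number 0-15.

    The opcode is 0x50 plus the low three bits of the register number.
    For r8-r15 the fourth bit goes into REX.B (prefix 0x41). No REX.W is
    needed because push is implicitly 64-bit.
    """
    if not 0 <= num <= 15:
        raise ValueError("register number must be 0-15")
    opcode = bytes([0x50 + (num & 7)])
    return bytes([0x41]) + opcode if num >= 8 else opcode


print(encode_push(0).hex(), encode_push(10).hex(" "))  # pushq %rax, pushq %r10
```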

Implicit zero extend

Results of 32-bit operations are implicitly zero extended to 64-bit values. This differs from 16-bit and 8-bit operations, which leave the upper part of the register unchanged. This can be used to reduce code size in some cases, such as:
  movl $1, %eax                 # two bytes shorter than movq $1, %rax
  xorl %eax, %eax		# two byte equivalent of xorq %rax, %rax
  andl $5, %eax			# equivalent to andq $5, %rax
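The difference between the widths can be modelled as a function from the old 64-bit register value and an operation's result to the new register value. This is a minimal Python sketch (`write_reg` is a hypothetical name; 8 means a low-byte register such as al, not ah):

```python
MASK64 = (1 << 64) - 1


def write_reg(old: int, result: int, bits: int) -> int:
    """New 64-bit register value after a `bits`-wide operation writes `result`."""
    mask = (1 << bits) - 1
    if bits == 64:
        return result & MASK64
    if bits == 32:
        return result & mask                          # upper 32 bits cleared
    if bits in (16, 8):
        return (old & ~mask & MASK64) | (result & mask)  # upper bits kept
    raise ValueError(f"unsupported width {bits}")


old = 0xFFFFFFFFFFFFFFFF
print(hex(write_reg(old, 1, 32)))  # like movl $1, %eax
print(hex(write_reg(old, 1, 16)))  # like movw $1, %ax
```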

Immediates

Immediate values inside instructions remain 32 bits and their value is sign extended to 64 bits before calculation. This means that:
  addq $1, %rax		         # Valid instruction
  addq $0x7fffffff, %rax	 # Valid as well
  addq $0xffffffffffffffff, %rax # Valid as well (this is -1)
  addq $0xffffffff, %rax	 # Invalid instruction
  addl $0xffffffff, %eax	 # Valid instruction
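The rule is a check on bit patterns: a 64-bit immediate is encodable only if sign extending its low 32 bits gives the value back. A Python sketch (`imm_ok` is a hypothetical name) treating operands as unsigned bit patterns and ignoring the short 8-bit immediate forms that some instructions also have:

```python
def sign_extend32(v: int) -> int:
    """Sign-extend the low 32 bits of `v` to a 64-bit pattern."""
    v &= 0xFFFFFFFF
    return (v | 0xFFFFFFFF00000000) if v & 0x80000000 else v


def imm_ok(value: int, bits: int) -> bool:
    """Can `value` be the imm32 of a `bits`-wide (32 or 64) non-mov instruction?"""
    if bits == 32:
        return 0 <= value <= 0xFFFFFFFF
    # 64-bit: the imm32 is sign extended, so the pattern must survive that
    return 0 <= value <= 0xFFFFFFFFFFFFFFFF and sign_extend32(value) == value


for v in (1, 0x7FFFFFFF, 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFF):
    print(hex(v), imm_ok(v, 64))
```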
The only exception to this rule is moving a constant into a register, which has a 64-bit immediate form. This means:
  movl $1, %eax			# 5  byte instruction
  movq $1, %rax			# 7  byte instruction
  movq $0xffffffffffffffff, %rax # 7  byte instruction
  movq $0x1122334455667788, %rax # 10 byte instruction
  movq $0xffffffff, %rax	# 10 byte instruction
  movl $0xffffffff, %eax	# 5  byte instruction equivalent to above
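The sizes in the list follow from three encodings: B8+r with an imm32 for 32 bits, REX.W C7 /0 with a sign-extended imm32, and REX.W B8+r with a full imm64. A Python sketch (hypothetical helper names) assuming a legacy destination register (rax..rdi), since r8-r15 add a REX byte to the 32-bit form:

```python
def fits_simm32(value: int) -> bool:
    """True if the 64-bit pattern equals the sign extension of its low 32 bits."""
    low = value & 0xFFFFFFFF
    return value == ((low | 0xFFFFFFFF00000000) if low & 0x80000000 else low)


def mov_imm_size(value: int, bits: int) -> int:
    """Encoded size in bytes of `mov $value, %reg` at width `bits`."""
    if bits == 32:
        return 5                  # B8+r imm32
    if fits_simm32(value):
        return 7                  # REX.W C7 /0 imm32 (sign extended)
    return 10                     # REX.W B8+r imm64


def best_mov_size(value: int) -> int:
    """Shortest way to get `value` into %rax: movl suffices whenever the upper
    32 bits are zero, thanks to implicit zero extension."""
    return mov_imm_size(value, 32 if value <= 0xFFFFFFFF else 64)


print(mov_imm_size(0xFFFFFFFF, 64), best_mov_size(0xFFFFFFFF))  # 10 5
```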
You may write symbolic expressions as operands to both 64-bit and 32-bit operations. For 32-bit operations they result in zero extending relocations, while in 64-bit operations they result in sign extending ones.
  movl $symb, %eax		# 5 byte instruction
  movq $symb, %rax		# 7 byte instruction
So if you know that the symbol's address fits in 32 bits (it lies in the low 4 GB), use the 32-bit instruction whenever possible.
To load a symbol as a full 64-bit value, you need the movabs instruction, a synonym for mov that only changes the default behaviour by forcing the 64-bit immediate form:
  movabsq $symb, %rax		# 10 byte instruction
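Which of the three forms can hold a given symbol address depends only on that address. A Python sketch (`loaders_for` is a hypothetical name; the relocation names are the standard ELF x86-64 ones) assuming the final address is known as an unsigned 64-bit number:

```python
def loaders_for(addr: int) -> list:
    """Forms that can load the 64-bit address `addr` of a symbol.

    movl:    imm32, zero extended  (R_X86_64_32 relocation)
    movq:    imm32, sign extended  (R_X86_64_32S relocation)
    movabsq: imm64                 (R_X86_64_64 relocation)
    """
    forms = []
    if addr <= 0xFFFFFFFF:
        forms.append("movl")      # low 4 GB
    if addr < 0x80000000 or addr >= 0xFFFFFFFF80000000:
        forms.append("movq")      # low 2 GB or top 2 GB
    forms.append("movabsq")       # always works, but 10 bytes
    return forms


print(loaders_for(0x400000))
print(loaders_for(0x80000000))
```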

Displacements

Like immediates, displacements are sign extended, and pretty much the same rules apply to them. X86-64 defines a special form of the move instruction with a 64-bit displacement (a full absolute address). As with immediates, it is used implicitly when the value is known at assembly time not to fit in 32 bits, and you need to use movabs to force a 64-bit relocation for a symbol:
  movl 0x1, %eax		# load with 32bit sign extended displacement
  movl 0xffffffff, %eax		# load with 64bit displacement
  movl symb, %eax		# load with 32bit sign extended relocation
  movabsl symb, %eax		# load with 64bit relocation
Loads and stores with a 64-bit displacement are available only with the accumulator (al, ax, eax or rax) as the register operand.
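The choice between the two absolute forms can be sketched in Python as follows (`absolute_load` and `fits_disp32` are hypothetical names); it assumes a constant address and raises for the combination that has no encoding:

```python
ACCUMULATORS = {"al", "ax", "eax", "rax"}


def fits_disp32(addr: int) -> bool:
    """A disp32 is sign extended: it reaches only the low and top 2 GB."""
    return addr < 0x80000000 or addr >= 0xFFFFFFFF80000000


def absolute_load(addr: int, reg: str) -> str:
    """Which encoding `mov addr, %reg` (absolute address) can use."""
    if fits_disp32(addr):
        return "disp32"           # ModRM/SIB form, sign-extended displacement
    if reg in ACCUMULATORS:
        return "moffs64"          # opcodes A0-A3 with a 64-bit address
    raise ValueError(f"{addr:#x} needs a 64-bit displacement, but {reg} is not the accumulator")


print(absolute_load(0x1, "eax"), absolute_load(0xFFFFFFFF, "eax"))
```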

RIP relative addressing

X86-64 defines a new instruction pointer relative addressing mode to simplify writing position independent code. The ModRM encoding that used to mean displacement-only addressing now means RIP relative, and displacement-only addressing is instead encoded using one of the redundant SIB forms. This means that RIP relative addressing is actually one byte shorter than displacement-only addressing, because it needs no SIB byte.
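To see the one-byte difference, here is a Python sketch that hand-encodes a 32-bit store of eax both ways (`store_eax` is a hypothetical helper; it assumes the displacement fits in a signed 32-bit value):

```python
import struct


def store_eax(disp: int, rip_relative: bool) -> bytes:
    """Encode `movl %eax, disp(%rip)` or the absolute `movl %eax, disp`.

    Opcode 0x89 is mov r/m32, r32. ModRM mod=00 r/m=101 now means RIP
    relative, so an absolute disp32 needs a SIB byte with base=101 and
    index=100 (no index).
    """
    disp32 = struct.pack("<i", disp)                  # little-endian signed 32-bit
    if rip_relative:
        return bytes([0x89, 0x05]) + disp32           # ModRM 00 000 101
    return bytes([0x89, 0x04, 0x25]) + disp32         # ModRM 00 000 100, SIB 00 100 101


print(store_eax(0x10, True).hex(" "))
print(store_eax(0x10, False).hex(" "))
```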
To encode this addressing, just write rip as yet another register:
  movl $0x1, 0x10(%rip)
will store the value 0x1 16 (0x10) bytes after the end of the instruction.
A symbolic displacement is implicitly made RIP relative, so
  movl $0x1, symb(%rip)
writes 0x1 to the address of the symbol "symb", not to symb bytes past the instruction.
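The address arithmetic behind both examples can be sketched in Python (hypothetical helper names): RIP already points past the current instruction, so for a symbolic operand the displacement that has to end up in the instruction is the distance from the end of the instruction to the symbol. The instruction `movl $0x1, 0x10(%rip)` is 10 bytes long (C7 05, a 4-byte displacement and a 4-byte immediate); the load address 0x401000 is just an assumed example:

```python
def rip_target(insn_addr: int, insn_len: int, disp: int) -> int:
    """Effective address of disp(%rip) for an instruction at insn_addr."""
    return (insn_addr + insn_len + disp) & 0xFFFFFFFFFFFFFFFF


def disp_for_symbol(insn_addr: int, insn_len: int, symb: int) -> int:
    """Displacement needed so that symb(%rip) addresses `symb` itself."""
    disp = symb - (insn_addr + insn_len)
    if not -2**31 <= disp < 2**31:
        raise ValueError("symbol is more than 2 GB away from the instruction")
    return disp


print(hex(rip_target(0x401000, 10, 0x10)))        # 16 bytes past the end
print(hex(disp_for_symbol(0x401000, 10, 0x404000)))
```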
FIXME: This looks particularly confusing in Intel syntax, where [symb+rip] suggests a different location than [symb]. Suggestions for a better syntax with symbols?
Use RIP relative addressing whenever possible to reduce code size.
Branch instructions are still encoded the same way as in 32-bit mode: direct branches are implicitly RIP relative, and "*" is used to switch to the indirect (absolute) form.

R13 addressing limitations

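R13 is the extended-register counterpart of RBP: both have 101 as the low three bits of their register number. In ModRM and SIB encoding, those bits combined with mod=00 do not select a base register but a special form (RIP relative in ModRM, no base register with a 32-bit displacement in SIB). REX.B is ignored when this special case is decoded (so that a REX prefix never changes the instruction length), so R13 inherits the same limitations as RBP addressing. This means that
  (%rbp,index,scale)
  (%r13,index,scale)
have no encoding as written, and:
  0(%rbp,index,scale)
  0(%r13,index,scale)
must be used, at the cost of an extra byte for the zero displacement (GNU as inserts it for you when you write the short form).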
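The rule can be sketched as the choice of the ModRM mod field for a base register and displacement. This Python helper (`choose_mod` is a hypothetical name) assumes the displacement fits in a signed 32-bit value and shows rbp (5) and r13 (13) behaving alike because only the low three bits are consulted:

```python
def choose_mod(base: int, disp: int) -> int:
    """ModRM mod field for base register number `base` (0-15) and `disp`.

    Only the low three bits of the base reach ModRM/SIB, so rbp (5) and
    r13 (13) share the mod=00 special case. (rsp and r12 need a SIB byte,
    which is a separate issue and does not affect mod.)
    """
    if disp == 0 and (base & 7) != 0b101:
        return 0b00        # no displacement bytes
    if -128 <= disp <= 127:
        return 0b01        # disp8, a zero byte for rbp/r13 without displacement
    return 0b10            # disp32


print(choose_mod(13, 0), choose_mod(3, 0))  # r13 needs disp8, rbx does not
```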