Understanding Intel Instruction Sizes

In certain types of programming, such as 256-byte intros, space is severely limited. As a result, size-optimizing code in assembly language is often necessary. This article discusses the machine-code sizes of the common Intel architecture instructions, from the perspective of code optimization. Understanding the size of the machine code produced by an assembler is necessary to make effective optimization decisions. Without this information, it is impossible to choose between different coding options other than by trial and error, which is time-consuming and not very effective.

This article contains two sections. The first section gives a general overview of the Intel instruction format and supplies the background needed to understand the second. The second section gives the encoding details of each common Intel instruction and is meant to serve as a reference.

An important distinction between DOS and Windows applications is the size of a machine word. Intel designed the original 8086 opcodes with 16-bit computing in mind. As a result, they use a single bit to distinguish between 16-bit operands and 8-bit operands. Because one bit has two possible values, this forces the newer 32-bit processor mode to have only two operand sizes as well. 32-bit mode changes the meaning of the size bit to distinguish between 32-bit operands and 8-bit operands. As a result, the size of a "small" operand is 8 bits in both modes, but the size of a "large" operand depends on the mode. Under DOS, the CPU runs in legacy 16-bit mode. This means that the default size of a "large" operand is 16 bits. Windows, however, runs in 32-bit mode, making the default size of a "large" operand 32 bits. To keep things simple, this article uses the term "word" to mean the size of a large operand. If your application runs under DOS, a "word" is 16 bits, but if your application runs under Windows, a "word" is 32 bits.

Intel Instruction Format

Although Intel instructions vary in size from one byte up to fifteen bytes, all Intel instructions have the same six-part structure. Understanding the purpose of each part is the first step to learning the sizes of the different Intel instructions. The parts of an Intel-format instruction are listed below, in the order that they appear in the instruction:

  • Prefixes: 0-4 bytes
  • Opcode: 1-2 bytes
  • ModR/M: 1 byte
  • SIB: 1 byte
  • Displacement: 1 byte or word
  • Immediate: 1 byte or word

Except for the opcode, all of these parts are optional. They are only present when the particular instruction requires them. Simple instructions such as NOP require just the opcode. Complicated instructions, such as ADD [ES: my_data+EBX+ESI*8], WORD 1003H, require all of the fields. The following paragraphs explain how and when each instruction field is used.
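
As a rough illustration of how the fields add up, the following Python sketch totals the bytes for a given combination of fields. The function name and its parameters are invented for this example and do not belong to any assembler; the two non-trivial cases are the worked examples that appear later in this article.

```python
def instruction_size(prefixes=0, opcode=1, modrm=0, sib=0,
                     displacement=0, immediate=0):
    """Add up the byte counts of the six instruction fields."""
    if not 0 <= prefixes <= 4:
        raise ValueError("an instruction carries at most four prefix bytes")
    if opcode < 1:
        raise ValueError("the opcode is the only mandatory field")
    return prefixes + opcode + modrm + sib + displacement + immediate


# NOP needs nothing but a one-byte opcode.
nop_size = instruction_size()

# OR EAX, [ECX + EDX*2 + 406080A0h] in 32-bit mode:
# opcode, ModR/M, SIB and a four-byte displacement.
or_size = instruction_size(modrm=1, sib=1, displacement=4)

# AND SI, 0420h in 16-bit mode: opcode, ModR/M and a two-byte immediate.
and_size = instruction_size(modrm=1, immediate=2)

print(nop_size, or_size, and_size)  # prints: 1 7 4
```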

Prefixes

The optional prefixes are the first part of an Intel instruction. These prefixes modify the instruction's behavior in several different ways. Prefixes can change the default segment of an instruction, override the default size of the machine-word, control looping in string instructions, and control the processor's bus usage. Each prefix adds one byte to the instruction. An instruction can have one prefix from each of the four prefix groups, for a maximum of four prefix bytes:

  • Group 1: LOCK, REPE/REPZ, REP, REPNE/REPNZ
  • Group 2: CS, DS, ES, FS, GS, SS, Branch hints
  • Group 3: Operand-size override (16 bit vs. 32 bit)
  • Group 4: Address-size override (16 bit vs. 32 bit)

Opcode

The operation code, or opcode, comes after any optional prefixes. The opcode tells the processor which instruction to execute. In addition, opcodes contain bit fields describing the size and type of operands to expect. The NOT instruction, for example, has the opcode 1111011w. In this opcode, the w bit determines whether the operand is a byte or a word. The OR instruction has the opcode 000010dw. In this opcode, the d bit determines which operands are the source and destination, and the w bit determines the size again. Some instructions have several different opcodes. For example, when OR is used with the accumulator register (AX or EAX) and a constant, it has the special space-saving opcode 0000110w, which eliminates the need for a separate ModR/M byte. From a size-coding perspective, memorizing exact opcode bits is not necessary. Having a general idea of what type of opcodes are available for a particular instruction is more important.
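
To make the bit fields concrete, here is a small Python sketch that pulls the d and w bits out of the general OR opcode 000010dw described above. The function name is made up for this example.

```python
def decode_or_opcode(opcode):
    """Split the general OR opcode 000010dw into its d and w bits."""
    if opcode & 0b11111100 != 0b00001000:
        raise ValueError("not an OR opcode of the form 000010dw")
    d = (opcode >> 1) & 1  # direction bit: which operand is the destination
    w = opcode & 1         # size bit: 0 = byte operands, 1 = word operands
    return d, w


# 00001011 is the form with both bits set.
print(decode_or_opcode(0b00001011))  # prints: (1, 1)
```

The same masking approach applies to any opcode that carries a w bit in its lowest position, such as NOT (1111011w).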

Not all opcodes are the same size. The original instructions from the 8088 have one-byte opcodes, while new instructions since the 386 generally have two-byte opcodes. Some SSE instructions even have three-byte opcodes. This is because the size of a byte limits the number of possible opcodes. As Intel runs out of unused opcodes, the only way to add more instructions is to give them opcodes larger than one byte.

ModR/M

If the instruction requires it, the ModR/M byte comes after the opcode. This byte tells the processor which registers or memory locations to use as the instruction's operands. The byte has the following structure:

Bits    7-6   5-3    2-0
Field   mod   reg2   reg1

Both the reg1 and reg2 fields take three-bit register codes, indicating which registers to use as the instruction's operands. By default, reg1 is the source operand and reg2 is the destination. Some opcodes, such as the OR opcode mentioned above, contain a direction bit which overrides this default. Other instructions require a single operand. If an instruction requires only one operand, the unused reg2 field holds extra opcode bits rather than a register code. This is especially true for floating-point instructions, which use ST(0) as their implied destination.

The mod field determines the meaning of the reg1 field. It can have the following possible values:

Code   Assembly Syntax   Meaning
00     [reg1]            The operand's memory address is in reg1.
01     [reg1 + byte]     The operand's memory address is reg1 + a byte-sized displacement.
10     [reg1 + word]     The operand's memory address is reg1 + a word-sized displacement.
11     reg1              The operand is reg1 itself.

The meaning of the reg1 field becomes more complicated in 16-bit mode. When mod specifies a memory address (mod = 00, 01, or 10), reg1 does not contain a simple register code. Instead, it specifies one of the following register combinations:

Code   Register Combination
000    BX + SI
001    BX + DI
010    BP + SI
011    BP + DI
100    SI
101    DI
110    BP
111    BX

Both 16-bit and 32-bit modes have an additional complication. In the system above, ModR/M provides no obvious way to specify a fixed memory location as an operand. All of the combinations for mod and reg1 include a register as part of the memory address. To fix this problem, Intel arbitrarily defines the combination mod = 00, reg1 = BP / EBP to mean that the address of the operand is a simple [word] displacement. Because the codes for [BP] and [EBP] have this new meaning, there is no simple way to access memory given by the base pointer register. When the assembler sees one of these operands, it automatically creates the form [BP+00] or [EBP+00], which requires an additional displacement byte.
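
The 16-bit rules, including the [word] exception, can be sketched in a few lines of Python. This is only an illustration: the function and table names are invented, and the bit positions assume mod in the top two bits of the ModR/M byte and reg1 in the bottom three.

```python
# Register combinations selected by reg1 in 16-bit memory operands.
RM16 = ["BX + SI", "BX + DI", "BP + SI", "BP + DI", "SI", "DI", "BP", "BX"]
BP_CODE = 0b110


def describe_operand_16(modrm):
    """Return (description, displacement bytes) for a 16-bit ModR/M byte."""
    mod = (modrm >> 6) & 0b11
    reg1 = modrm & 0b111
    if mod == 0b11:
        return "register", 0          # reg1 is a plain register code
    if mod == 0b00 and reg1 == BP_CODE:
        return "[word]", 2            # the fixed-address exception
    if mod == 0b00:
        return "[" + RM16[reg1] + "]", 0
    if mod == 0b01:
        return "[" + RM16[reg1] + " + byte]", 1
    return "[" + RM16[reg1] + " + word]", 2


print(describe_operand_16(0b00000110))  # prints: ('[word]', 2)
print(describe_operand_16(0b01000110))  # prints: ('[BP + byte]', 1)
```

The second call is the form the assembler falls back to for a plain [BP] operand, which is why that operand costs one byte more than [BX], [SI], or [DI].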

Finally, 32-bit mode has its own complication. When mod indicates a memory address (mod = 00, 01, or 10) and when reg1 indicates the ESP register, an additional byte follows the ModR/M byte. This byte, called the SIB byte, is used instead of reg1 to determine the operand's memory address. The structure of the SIB byte is discussed later.
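
For 32-bit addressing, the number of bytes that follow the ModR/M byte can be estimated with a sketch like the one below. The helper names are hypothetical, the register codes 100 (ESP) and 101 (EBP) are the standard Intel codes, and the SIB byte's own special cases are deliberately left out.

```python
ESP_CODE = 0b100  # in a memory operand this code means "SIB byte follows"
EBP_CODE = 0b101  # with mod = 00 this code means a plain [word] address


def split_modrm(byte):
    """Return the (mod, reg2, reg1) fields of a ModR/M byte."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def extra_bytes_32(modrm):
    """Count the SIB and displacement bytes that follow a 32-bit ModR/M byte.

    Simplification: special cases inside the SIB byte are not handled.
    """
    mod, _, reg1 = split_modrm(modrm)
    if mod == 0b11:
        return 0                      # register operand, nothing follows
    sib = 1 if reg1 == ESP_CODE else 0
    if mod == 0b01:
        disp = 1
    elif mod == 0b10:
        disp = 4
    elif reg1 == EBP_CODE:
        disp = 4                      # mod = 00 with EBP's code: [word]
    else:
        disp = 0
    return sib + disp


print(extra_bytes_32(0b10000100))  # SIB plus a four-byte displacement: 5
```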

Not all opcodes require the ModR/M byte. Some instructions, such as AAM, have fixed sources and destinations. Other instructions, such as PUSH and POP, encode the source or destination directly into the opcode. Knowing which instructions need a ModR/M byte and which instructions do not is the hardest part of learning Intel instruction sizes.

SIB

When ModR/M contains the correct mod and reg1 combination, a SIB byte follows the ModR/M byte. SIB is an acronym which stands for Scale*Index+Base. It is a powerful addressing format available only in 32-bit mode. In SIB, the combination of two registers and a scaling factor replaces reg1 in the operand's address. The SIB byte's format is shown below:

Bits    7-6     5-3     2-0
Field   scale   index   base

In the SIB byte, both index and base are three-bit register codes, and scale is a two-bit number. To compute the SIB value, the processor uses the following formula: (index * 2^scale) + base. (Obviously, the processor uses a bit shift to perform the power-of-two multiplication.) Once the processor finds the SIB value, it uses it in place of the ModR/M byte's reg1 value in the memory address computation.

The SIB byte enables complicated addresses such as [ebx*4 + esi + my_table]. For this example, the ModR/M and SIB bytes' fields have the following values:

  • ModR/M.mod = 10 (In other words, the mode is [reg1 + word].)
  • ModR/M.reg2 = Whatever (Usually the destination register, but depends on the opcode.)
  • ModR/M.reg1 = ESP (Intel redefines ESP's code to mean SIB in 32-bit memory addresses.)
  • SIB.scale = 2 (Because 2^2 = 4)
  • SIB.index = EBX
  • SIB.base = ESI

The SIB byte is ordinarily not present. It is only needed when an instruction uses the Scale*Index+Base addressing format.
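
The field values listed above can be packed into actual bytes with a short Python sketch. The helper names are invented for this example, the register codes are the standard Intel three-bit codes, and EAX is assumed for reg2 purely for illustration.

```python
# Standard three-bit codes for the 32-bit general registers.
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}


def encode_modrm(mod, reg2, reg1):
    """Pack the mod, reg2 and reg1 fields into one ModR/M byte."""
    return (mod << 6) | (reg2 << 3) | reg1


def encode_sib(scale_factor, index, base):
    """Pack scale, index and base into one SIB byte."""
    scale = {1: 0, 2: 1, 4: 2, 8: 3}[scale_factor]  # factor is 2^scale
    if index == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    return (scale << 6) | (REG32[index] << 3) | REG32[base]


# [ebx*4 + esi + my_table], with EAX assumed as the other operand.
modrm = encode_modrm(0b10, REG32["EAX"], REG32["ESP"])
sib = encode_sib(4, "EBX", "ESI")
print(f"{modrm:08b} {sib:08b}")  # prints: 10000100 10011110
```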

Displacement

When the mod is either 01 or 10, a displacement is part of the operand's address. This displacement comes immediately after the ModR/M and optional SIB byte. Depending on the mod field, the displacement is either a byte or a word.

For example, here is the full machine code for the 32-bit instruction OR EAX, [ECX + EDX*2 + 406080A0h]:

Opcode     ModR/M       SIB          Displacement
00001011   10 000 100   01 010 001   10100000 10000000 01100000 01000000

In 32-bit mode, a word-sized displacement takes four bytes. This is an enormous amount of space. When an instruction contains a four-byte displacement, it is usually a good idea to look at other forms of addressing that may be smaller, such as using the stack, or a register plus a smaller displacement.
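
The cost of a large displacement is easy to see by encoding the same instruction twice. The Python sketch below rebuilds the OR example above and then repeats it with a displacement that fits in one byte. The function name is hypothetical, and the sketch assumes that the index is not ESP and the base is not EBP, since those codes have special meanings.

```python
import struct

ECX, EDX = 0b001, 0b010  # standard three-bit register codes


def or_eax_mem(base, index, scale, disp):
    """Encode OR EAX, [base + index*2^scale + disp] for 32-bit mode.

    Assumes index is not ESP and base is not EBP.
    """
    opcode = 0b00001011
    if disp == 0:
        mod, disp_bytes = 0b00, b""
    elif -128 <= disp <= 127:
        mod, disp_bytes = 0b01, struct.pack("<b", disp)
    else:
        mod, disp_bytes = 0b10, struct.pack("<I", disp & 0xFFFFFFFF)
    modrm = (mod << 6) | (0b000 << 3) | 0b100  # reg2 = EAX, reg1 = SIB marker
    sib = (scale << 6) | (index << 3) | base
    return bytes([opcode, modrm, sib]) + disp_bytes


long_form = or_eax_mem(ECX, EDX, 1, 0x406080A0)
short_form = or_eax_mem(ECX, EDX, 1, 0x20)
print(long_form.hex(), len(long_form))    # prints: 0b8451a0806040 7
print(short_form.hex(), len(short_form))  # prints: 0b445120 4
```

Note that the displacement bytes appear in little-endian order, lowest byte first, exactly as in the table above.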

Immediate

If an instruction uses an immediate value as an operand, such as ADD AX, 0xF00F, the immediate value is the last part of the instruction. Like addressing displacements, immediates can be either a byte or a machine word.

To illustrate, here is the machine code for the 16-bit instruction, AND SI, 0420h:

Opcode     ModR/M       Immediate
10000001   11 100 110   00100000 00000100

Just as with addressing displacements, a 32-bit word-sized immediate requires a huge amount of space. Big immediates usually compress better than displacements, however, because immediates usually contain more zero bytes.
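
The AND example above can be reproduced with a few lines of Python. This is a sketch with an invented function name; the opcode 10000001 and the extra opcode bits 100 in the reg2 field are taken from the table above, and 110 is the standard register code for SI.

```python
import struct

SI = 0b110  # standard three-bit register code


def and_reg16_imm16(reg_code, imm):
    """Encode AND reg16, imm16 for 16-bit mode."""
    opcode = 0b10000001
    modrm = (0b11 << 6) | (0b100 << 3) | reg_code  # reg2 holds opcode bits
    return bytes([opcode, modrm]) + struct.pack("<H", imm & 0xFFFF)


code = and_reg16_imm16(SI, 0x0420)
print(" ".join(f"{b:08b}" for b in code))
# prints: 10000001 11100110 00100000 00000100
```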

Detailed Instruction Encodings

Directly memorizing Intel instruction sizes is not really possible, because an instruction's size depends on its operands. Instead, it is better to memorize which fields an instruction contains. By adding the sizes of the different fields, finding the instruction's size is easy. This section lists the opcode sizes, ModR/M requirements, and literal sizes of the common Intel instructions.

Integer Instructions

For simplicity, this section is organized as a table. The first column of the table lists the instructions in alphabetical order. The second column shows the different combinations of operands each instruction can take, while the third column shows the fields required to encode each combination. The table uses the following abbreviations:

  • m - memory
  • r - register
  • * - memory or register
  • i - immediate
  • disp - displacement
  • ac - accumulator (AL, AX, or EAX)
  • cc - condition code
  • op - one opcode byte
  • mod - ModR/M [+ optional SIB] [+ optional disp]

To show the size of each operand, the following suffixes are used:

  • b - byte
  • w - machine word
  • 1, 2, 3, 4, 6, 8 - number of bytes

If the table does not show the size of some operands, the operands can be either a byte or a word, as long as they are the same size. This is because opcodes use a size bit to determine operand sizes.
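
The notation can be turned into byte counts mechanically. The following Python sketch does so under a few stated assumptions: the function name is invented, word_bytes is 2 for DOS and 4 for Windows, mod_bytes is the total size of the ModR/M byte plus any SIB and displacement, and an immediate or displacement without a suffix is taken to be word-sized.

```python
def encoding_size(encoding, word_bytes, mod_bytes=1):
    """Return the size in bytes of an encoding such as 'op op mod i.b'.

    Prefix bytes are not counted. A bare 'i' or 'disp' is assumed to be
    word-sized.
    """
    total = 0
    for field in encoding.split():
        if field == "op":
            total += 1
        elif field == "mod":
            total += mod_bytes
        else:
            kind, _, suffix = field.partition(".")
            if kind not in ("i", "disp"):
                raise ValueError("unknown field: " + field)
            if suffix == "b":
                total += 1
            elif suffix in ("w", ""):
                total += word_bytes
            else:
                total += int(suffix)  # explicit byte count, e.g. i.3
    return total


print(encoding_size("op mod i", 4))    # prints: 6
print(encoding_size("op mod i.b", 4))  # prints: 3
```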

Instruction  Operands        Encoding
AAA          none            op
AAD          none            op i.b
AAM          none            op i.b
AAS          none            op
ADC          *, *            op mod
             *, i            op mod i
             *.w, i.b        op mod i.b
             ac, i           op i
AND          *, *            op mod
             *, i            op mod i
             *.w, i.b        op mod i.b
             ac, i           op i
ADD          *, *            op mod
             *, i            op mod i
             *.w, i.b        op mod i.b
             ac, i           op i
BOUND        r.w, m.w        op mod
BSF          r.w, *.w        op op mod
BSR          r.w, *.w        op op mod
BSWAP        r.w             op op
BT           *.w, r.w        op op mod
             *.w, i.b        op op mod i.b
BTC          *.w, r.w        op op mod
             *.w, i.b        op op mod i.b
BTR          *.w, r.w        op op mod
             *.w, i.b        op op mod i.b
BTS          *.w, r.w        op op mod
             *.w, i.b        op op mod i.b
CALL         disp.w          op disp.w
             *.w             op mod
CBW          none            op
CDQ          none            op
CLC          none            op
CLD          none            op
CLI          none            op
CMC          none            op
CMOVcc       *.w, *.w        op op mod
CMP          *, *            op mod
             *, i            op mod i
             *.w, i.b        op mod i.b
             ac, i           op i
CMPS         none            op
CMPXCHG      *, r            op op mod
CMPXCHG8B    m.8             op op mod
CPUID        none            op op
CWD          none            op
CWDE         none            op
DAA          none            op
DAS          none            op
DEC          *               op mod
             r.w             op
DIV          *               op mod
ENTER        i.2, i.b        op i.3
HLT          none            op
IDIV         *               op mod
IMUL         *               op mod
             r.w, *.w        op op mod
             r.w, i          op mod i
             r.w, *.w, i     op mod i
IN           ac, i.b         op i.b
             ac, DX          op
INC          *               op mod
             r.w             op
INS          none            op
INT          i.b             op i.b
             3               op
INTO         none            op
IRET         none            op
Jcc          disp.b          op disp.b
             disp.w          op op disp.w
JCXZ         disp.b          op disp.b
JMP          disp            op disp
             *.w             op mod
LAHF         none            op
LDS          r.w, m.w        op mod
LEA          r.w, m          op mod
LEAVE        none            op
LES          r.w, m.w        op mod
LFS          r.w, m.w        op op mod
LGS          r.w, m.w        op op mod
LSS          r.w, m.w        op op mod
LODS         none            op
LOOP         disp.b          op disp.b
LOOPZ        disp.b          op disp.b
LOOPNZ       disp.b          op disp.b
MOV          *, *            op mod
             *, i            op mod i
             r, i            op i
             ac, [disp.w]    op disp.w
MOVS         none            op
MOVSX        r.w, *.b        op op mod
MOVZX        r.w, *.b        op op mod
MUL          *               op mod
NEG          *               op mod
NOP          none            op
NOT          *               op mod
OR           *, *            op mod
             *, i            op mod i
             *.w, i.b        op mod i.b
             ac, i           op i
OUT          ac, i.b         op i.b
             ac, DX          op
OUTS         none            op
POP          *               op mod
             r               op
             FS              op op
             GS              op op
POPA         none            op
POPF         none            op
PUSH         *               op mod
             r               op
             i               op i
             FS              op op
             GS              op op
PUSHA        none            op
PUSHF        none            op
RCR          *, 1            op mod
             *, CL           op mod
             *, i.b          op mod i.b
RCL          *, 1            op mod
             *, CL           op mod
             *, i.b          op mod i.b
RET          none            op
             i.2             op i.2
ROL          *, 1            op mod
             *, CL           op mod
             *, i.b          op mod i.b
ROR          *, 1            op mod
             *, CL           op mod
             *, i.b          op mod i.b
SAHF         none            op
SAL          *, 1            op mod
             *, CL           op mod
             *, i.b          op mod i.b
SAR          *, 1            op mod
             *, CL           op mod
             *, i.b          op mod i.b
SBB          *, *            op mod
             *, i            op mod i
             *.w, i.b        op mod i.b
             ac, i           op i
SCAS         none            op
SETcc        *.b             op op mod
SHL          *, 1            op mod
             *, CL           op mod
             *, i.b          op mod i.b
SHLD         *.w, r.w, CL    op op mod
             *.w, r.w, i.b   op op mod i.b
SHR          *, 1            op mod
             *, CL           op mod
             *, i.b          op mod i.b
SHRD         *.w, r.w, CL    op op mod
             *.w, r.w, i.b   op op mod i.b
STC          none            op
STD          none            op
STI          none            op
STOS         none            op
SUB          *, *            op mod
             *, i            op mod i
             *.w, i.b        op mod i.b
             ac, i           op i
TEST         *, r            op mod
             *, i            op mod i
             ac, i           op i
WAIT         none            op
XADD         *, r            op op mod
XCHG         *, r            op mod
             ac, r           op
XLAT         none            op
XOR          *, *            op mod
             *, i            op mod i
             *.w, i.b        op mod i.b
             ac, i           op i

Many instructions in the above list have special space-saving opcodes that do not require an additional ModR/M byte. These instructions are:

  • DEC, INC, POP, or PUSH used with a word-sized general register.
  • ADC, ADD, AND, CMP, OR, SBB, SUB, TEST, or XOR used with the accumulator and an immediate.
  • MOV used with any general register and an immediate.
  • MOV used with the accumulator and a simple word displacement.
  • XCHG used with the accumulator and a word register.

To save space, the binary arithmetic instructions ADC, ADD, AND, CMP, OR, SBB, SUB, and XOR can use a byte-sized immediate with a word-sized destination. To do this, these instructions first sign-extend the literal to the destination's size before using it in the operation. This is especially valuable for 32-bit code, since it saves three bytes per instruction. Unfortunately, NASM, a fairly popular assembler, does not use the sign-extension encoding by default. To use this encoding, prefix the immediate with the BYTE keyword.
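
The saving can be worked out with a short Python sketch. The function names are invented, the destination is assumed to be a word-sized register (so the ModR/M byte has no SIB or displacement after it), and prefix bytes are ignored.

```python
def to_signed(value, word_bytes):
    """Interpret value as a signed number of word_bytes bytes."""
    bits = 8 * word_bytes
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def add_imm_size(value, word_bytes, accumulator=False):
    """Smallest size of ADD reg, imm for a word-sized register destination."""
    if -128 <= to_signed(value, word_bytes) <= 127:
        return 3                    # op mod i.b, sign-extended by the CPU
    if accumulator:
        return 1 + word_bytes       # op i
    return 2 + word_bytes           # op mod i


print(add_imm_size(5, 4))                       # prints: 3
print(add_imm_size(1000, 4))                    # prints: 6
print(add_imm_size(1000, 4, accumulator=True))  # prints: 5
```

A constant such as 0FFFFFFFFh also qualifies for the short form in 32-bit mode, because it is the sign-extension of the byte 0FFh.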

Two notable instructions in the above list are AAD and AAM. In the old Intel manuals, these instructions have two-byte opcodes. New Intel manuals now show these instructions with one-byte opcodes followed by an immediate equal to 0x0A. The AAD instruction multiplies AH by the immediate and adds the product to AL. AAM divides AL by the immediate, and stores the remainder in AL and the quotient in AH. It is possible to change the value of the immediate byte by coding the instructions in machine language, creating two new, nameless instructions for quickly dividing and multiplying a byte by a constant. The opcode for AAD is 0xD5, and the opcode for AAM is 0xD4.
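
A Python model of the two instructions makes the trick easier to follow. This is only a sketch of the arithmetic described above, with invented function names; it ignores the flags, and it also clears AH after AAD, which is what the processor does.

```python
def aam(al, imm=0x0A):
    """Model AAM: divide AL by imm; quotient to AH, remainder to AL."""
    ah, al = divmod(al & 0xFF, imm)
    return ah, al


def aad(ah, al, imm=0x0A):
    """Model AAD: AL = AL + AH * imm (kept to 8 bits), then AH = 0."""
    return 0, (al + ah * imm) & 0xFF


def aam_bytes(imm):
    """Machine code for AAM with a custom immediate."""
    return bytes([0xD4, imm])


def aad_bytes(imm):
    """Machine code for AAD with a custom immediate."""
    return bytes([0xD5, imm])


# Split a byte into its two hexadecimal digits with a two-byte instruction.
print(aam(0x3F, 16))          # prints: (3, 15)
print(aam_bytes(16).hex())    # prints: d410
```

In an assembler, the custom form would typically be emitted as raw data bytes, since the mnemonic itself takes no operand in most syntaxes.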

Note also that ENTER, CALL FAR, JMP FAR, and the immediate form of RET are exceptions to the rule that an instruction's displacement and immediate literals must be either a byte or a word. ENTER takes a three-byte immediate, while CALL FAR and JMP FAR take either four-byte or six-byte displacements, depending on whether the processor is in 16-bit or 32-bit mode. The immediate form of RET requires a two-byte literal, regardless of the machine word's size.

Floating Point Instructions

For historical reasons, all floating-point instructions have a one-byte opcode followed by a ModR/M byte. If a floating point instruction does not access memory, the entire ModR/M byte holds opcode bits, so the encoding is effectively a two-byte opcode.

The original PC processor, the 8088, did not contain floating-point instructions. An optional math coprocessor, the 8087, provided floating point support for the 8088. To communicate with the math coprocessor, the 8088 contained eight escape instructions with ModR/M bytes. When the main processor received an escape instruction, it read the memory identified by the ModR/M byte and then performed a no-operation. Meanwhile, the math coprocessor recorded the contents of the escape opcode, the ModR/M byte, and the address of the memory read. The math coprocessor used the escape opcode and the ModR/M byte to determine the operation to perform, and used the memory address as the operation's target.

All Intel processors since the 486 have integrated floating-point units, so they no longer use the escape mechanism. Nevertheless, the instruction format of the 8087 remains.

Other Instructions

MMX instructions all have two-byte opcodes plus a ModR/M byte, except for shift-by-constant instructions, which include a one-byte immediate as well.

People who plan to use SSE or 3DNow! are probably code gurus already, so they can look up the operation sizes themselves.

There are many issues related to size-optimizing code; writing an article on all of them is impossible. Hopefully, understanding the sizes of Intel instructions provides a useful basis for discovering and better understanding these optimizing techniques.

May the Source be with you.
