Assembly Language

SP FA

Category: CPT. Tags: assembly

Article link: https://blog.csdn.net/SP_FA/article/details/120439175

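Assembly language is any low-level language for electronic computers, microprocessors, and microcontrollers; it is also called symbolic language. In assembly language, mnemonics stand in for the opcodes of machine instructions, and address symbols or labels stand in for the addresses of instructions or operands. Each kind of device has its own machine instruction set, and the assembly process converts assembly source into those machine instructions. A particular assembly language corresponds one-to-one with a particular machine instruction set, so assembly code cannot be ported directly between platforms.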

Contents

0. Inline Assembler
1. General-Purpose Registers
- 1.1 Name of the registers
- 1.2 Some Specialized Register Uses
2. Data Types
3. Comments
4. Instructions
5. Addressing modes
6. Flags Affected by Arithmetic
- 6.1 Zero Flag (ZF)
- 6.2 Sign Flag (SF)
7. Useful Operators
8. Spanning Multiple Lines
9. LABEL Directive
10. Subroutine

0. Inline Assembler

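This article uses Microsoft Visual C++ to learn how to write assembly. Inside a C++ program you can embed assembly code with the _asm keyword (the inline assembler is available only when compiling for 32-bit x86).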

e.g.

#include <stdio.h>

int main(void)
{
	// variables referenced by the assembly below
	int var = 1, var1 = 2, var2 = 3;

	// a single instruction
	_asm MOV EAX, var
	// or a block of several instructions
	_asm {
		MOV EAX, var1
		MOV EAX, var2
	}
	return 0;
}

1. General-Purpose Registers

1.1 Name of the registers

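Most instructions operate on values held in registers, and each register has a predefined name.
EAX, EBX, ECX, and EDX each have 8-bit names (for example AH and AL), a 16-bit name (AX), and a 32-bit name (EAX).

ESI, EDI, EBP, and ESP have only a 16-bit name for their lower half (SI, DI, BP, SP).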
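
The aliasing between the register names can be sketched in portable C++. This is only a model, not inline assembly: the helper names `low16`, `low8`, `high8`, and `write_low8` are hypothetical and simply show which bits of a 32-bit value the names AX, AL, and AH refer to.

```cpp
#include <cstdint>

// AX: the low 16 bits of EAX.
std::uint16_t low16(std::uint32_t eax) {
    return static_cast<std::uint16_t>(eax & 0xFFFFu);
}

// AL: the low 8 bits of EAX.
std::uint8_t low8(std::uint32_t eax) {
    return static_cast<std::uint8_t>(eax & 0xFFu);
}

// AH: bits 8..15 of EAX.
std::uint8_t high8(std::uint32_t eax) {
    return static_cast<std::uint8_t>((eax >> 8) & 0xFFu);
}

// Writing AL replaces bits 0..7 only; the other 24 bits of EAX are unchanged.
std::uint32_t write_low8(std::uint32_t eax, std::uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}
```

For instance, with EAX = 0x12345678 the model gives AX = 0x5678, AH = 0x56, and AL = 0x78.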

1.2 Some Specialized Register Uses

General-Purpose
- EAX : accumulator
- ECX : loop counter
- ESP : stack pointer
- ESI, EDI : index registers
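- EBP : extended frame pointer (stack). ESP always points to the top of the stack, while EBP marks the base of the current stack frame.
Segment
- CS : Code Segment. The program's code occupies a contiguous region of memory called the code segment; CS locates the start of that region.
- DS : Data Segment
- SS : Stack Segment
- ES, FS, GS : additional segment
EIP : instruction pointer. EIP holds the address of the next instruction the CPU will fetch; without it the CPU could not locate the following instruction. As each instruction executes, EIP advances to the next one.
EFLAGS : status and control flags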

2. Data Types

Correspondence between data types and registers:

BYTE, SBYTE : AH, AL, BH, BL, CH, CL, DH, DL
WORD, SWORD : AX, BX, CX, DX, SI, DI
DWORD, SDWORD : EAX, EBX, ECX, EDX, ESI, EDI

3. Comments

Single-line comments : begin with semicolon
Multi-line comments : begin with COMMENT directive and a programmer-chosen character. End with the same programmer-chosen character.

4. Instructions

We use the Intel IA-32 instruction set.

An instruction contains :

Label (optional)
Mnemonic (required)
Operand (depends on the instruction)
Comment (optional)

4.1 Labels

Act as place markers: a label marks the address of code or data.
Follow identifier rules
Data label :
- must be unique
- not followed by colon
Code label :
- target of jump and loop instructions
- followed by colon

4.2 Mnemonics and Operands

Instruction Mnemonics
- memory aid
- e.g. MOV, ADD, SUB, MUL, INC, DEC
Operands
- constant
- constant expression : Constants and constant expressions are often called immediate values
- register
- memory (data label)

4.3 Structure of instructions

The binary codes of almost all instructions contain three pieces of information:

The action or operation of the instruction
The operands involved (where to find the information to operate with)
Where the result is to go

Machine instructions are encoded with distinct bit fields in the prefix to contain information about

The operation required.
The location of operands and results
The data type of the operand

The length of an instruction depends on:

The operation required
The addressing modes employed

(Pentium instructions can be from 1 to 15 bytes long)

4.4 Basic instructions

4.4.1 MOV

MOV destination, source

Move from source to destination.

At most one operand may be a memory operand, so a direct memory-to-memory move is not permitted. CS, EIP, and IP cannot be the destination, and an immediate value cannot be moved into a segment register.

Zero Extension
When you copy a smaller value into a larger destination, the MOVZX instruction fills the upper half of the destination with zeros.

Sign Extension
The MOVSX instruction fills the upper half of the destination with a copy of the source operand’s sign bit.
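
The difference between the two extensions can be sketched in portable C++ for the 8-bit to 32-bit case. The function names `movzx8` and `movsx8` are hypothetical; they model what the instructions compute rather than invoke them.

```cpp
#include <cstdint>

// MOVZX model: the upper 24 bits of the destination become zeros.
std::uint32_t movzx8(std::uint8_t src) {
    return static_cast<std::uint32_t>(src);
}

// MOVSX model: the upper 24 bits become copies of the source's bit 7.
std::uint32_t movsx8(std::uint8_t src) {
    if (src & 0x80u) {
        return 0xFFFFFF00u | src;   // negative source: fill with ones
    }
    return src;                     // non-negative source: fill with zeros
}
```

With the source byte 8Fh (sign bit set), zero extension yields 0000008Fh while sign extension yields FFFFFF8Fh; for 7Fh both give 0000007Fh.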

4.4.2 XCHG Instruction

Exchanges the values of two operands. At least one operand must be a register. No immediate operands are permitted.

4.4.3 INC and DEC Instructions

INC : Add 1 to the destination operand (register | memory)
DEC : Subtract 1 from the destination operand (register | memory)

4.4.4 ADD and SUB Instructions

ADD | SUB destination, source

Same operand rules as for the MOV instruction

4.4.5 NEG Instruction

NEG source

Reverses the sign of an operand. Operand can be a register or memory operand.

NEG Instruction and the Flags
Any nonzero operand causes the Carry flag to be set.

CF : carry flag
OF : overflow flag
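
As a rough illustration, the effect of NEG on an 8-bit operand and on CF and OF can be modelled in C++. The struct `NegResult` and the function `neg8` are hypothetical names; the model assumes the usual IA-32 behaviour that NEG computes 0 minus the operand, sets CF for any nonzero operand, and sets OF only when the operand is the most negative value (80h for a byte), which has no positive counterpart.

```cpp
#include <cstdint>

struct NegResult {
    std::uint8_t value;  // two's complement negation of the operand
    bool cf;             // carry flag
    bool of;             // overflow flag
};

NegResult neg8(std::uint8_t operand) {
    NegResult r;
    r.value = static_cast<std::uint8_t>(0u - operand);  // 0 - operand, low 8 bits
    r.cf = (operand != 0);      // any nonzero operand sets CF
    r.of = (operand == 0x80);   // -128 cannot be negated in 8 bits
    return r;
}
```

For example, negating 1 gives FFh (that is, -1) with CF = 1 and OF = 0, while negating 80h leaves 80h with both CF and OF set.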

4.4.6 Implementing Arithmetic Expressions

HLL compilers translate mathematical expressions into assembly language.
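
A minimal sketch of such a translation, assuming a hypothetical expression r = -x + (y - z): each C++ statement stands for one instruction, and the locals `eax` and `ebx` stand in for the registers. The instruction sequence in the comments is one plausible translation, not the output of any particular compiler.

```cpp
#include <cstdint>

// Evaluates r = -x + (y - z) one operation at a time.
// Assumes the intermediate values fit in 32 bits.
std::int32_t eval_expression(std::int32_t x, std::int32_t y, std::int32_t z) {
    std::int32_t eax = x;   // MOV EAX, x
    eax = -eax;             // NEG EAX
    std::int32_t ebx = y;   // MOV EBX, y
    ebx -= z;               // SUB EBX, z
    eax += ebx;             // ADD EAX, EBX
    return eax;             // MOV r, EAX
}
```

With x = 26, y = 30, z = 40 the steps produce -26, then -10, then -36.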

4.5 Flow Controls

4.5.1 JMP

JMP is an unconditional jump to a label that is usually within the same procedure.

target:
	.
	.
	.
	JMP target

logic : EIP $\leftarrow$ target

4.5.2 JCXZ and JECXZ

There are more than 30 jump instructions. JCXZ and JECXZ are two of them: they are conditional jumps that test whether CX or ECX is zero. The remaining conditional jumps test the status flags. Each one jumps if its condition is true and continues with the next instruction if it is false.

	JCXZ target
	.
	.
	.
target:

4.5.3 Other Jump Instructions

JC / JB : Jump if Carry flag is set
JNC / JNB : Jump if Carry flag is clear
JE / JZ : Jump if Zero flag is set
JNE / JNZ : Jump if Zero flag is clear
JS : Jump if Sign flag is set
JNS : Jump if Sign flag is clear
JO : Jump if Overflow flag is set
JNO : Jump if Overflow flag is clear

4.5.4 CMP

The CMP instruction is the most common way to test for conditional jumps.

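CMP EAX, EBX

CMP subtracts the second operand from the first, discards the result, and updates the flags. In particular, it sets the zero flag (ZF = 1) if EAX and EBX are equal.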

Jumps based on CMP

Assuming execution just after CMP

JE : Jump if the first operand (in CMP) is equal to the second operand.
JNE : Jump if the first and second operands are not equal.
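JGE : Jump if the first operand is greater than or equal to the second (signed comparison)
JG : Jump if the first operand is greater (signed comparison)
JLE : Jump if the first operand is less than or equal to the second (signed comparison)
JL : Jump if the first operand is less (signed comparison)
For unsigned operands, the corresponding jumps are JAE, JA, JBE, and JB.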
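
How the signed jumps read the flags left by CMP can be sketched in C++. The names `Flags`, `cmp32`, `je`, `jl`, `jge`, `jg`, and `jle` are hypothetical; the conditions follow the usual IA-32 definitions (for example, "less" means SF differs from OF).

```cpp
#include <cstdint>

struct Flags { bool zf, sf, cf, of; };

// Model of CMP a, b on 32-bit operands: compute a - b and keep only the flags.
Flags cmp32(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t diff = a - b;
    Flags f;
    f.zf = (diff == 0);
    f.sf = (diff >> 31) != 0;
    f.cf = (a < b);  // unsigned borrow
    // Signed overflow: a and b have different signs and the sign of the
    // result differs from the sign of a.
    f.of = (((a ^ b) & (a ^ diff)) >> 31) != 0;
    return f;
}

bool je(const Flags& f)  { return f.zf; }
bool jl(const Flags& f)  { return f.sf != f.of; }
bool jge(const Flags& f) { return f.sf == f.of; }
bool jg(const Flags& f)  { return !f.zf && f.sf == f.of; }
bool jle(const Flags& f) { return f.zf || f.sf != f.of; }
```

Comparing the bit pattern FFFFFFFFh with 1 shows why the signed jumps matter: as a signed value it is -1, so `jl` is true, even though the unsigned comparison (CF clear) says it is larger.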

4.5.5 LOOP

target:
	.
	.
	.
	LOOP target

logic : ECX $\leftarrow$ ECX-1, if ECX != 0, jump to target
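
The decrement-then-test order can be sketched as a C++ do-while loop. The function `loop_iterations` is a hypothetical helper that counts how often the loop body runs for a given starting value of ECX.

```cpp
#include <cstdint>

// Models "target: <body> LOOP target" and returns how many times the body ran.
std::uint64_t loop_iterations(std::uint32_t ecx) {
    std::uint64_t runs = 0;
    do {
        ++runs;           // the loop body executes first
        --ecx;            // LOOP: ECX <- ECX - 1
    } while (ecx != 0);   // jump back to target if ECX != 0
    return runs;
}
```

Because the test comes after the decrement, the body always runs at least once, and starting with ECX = 0 wraps around to FFFFFFFFh, giving 2^32 iterations rather than none.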

4.5.6 LOOPNE

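LOOPNE decrements ECX and jumps to the target only if ECX != 0 and the zero flag is clear.

e.g. Repeat while EAX is not equal to EBX, for at most 200 iterations:

	MOV ECX, 200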
target:
	.
	.
	.
	CMP EAX, EBX
	LOOPNE target
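
A small C++ model of a CMP followed by LOOPNE, under the assumption that the loop body scans an array for a key. The function `loopne_scan` and its parameters are hypothetical; it returns how many iterations ran before the loop stopped, either because a match set ZF or because the counter reached zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

std::size_t loopne_scan(const std::int32_t* data, std::uint32_t count,
                        std::int32_t key) {
    assert(count != 0);              // like LOOP, the body runs at least once
    std::uint32_t ecx = count;       // MOV ECX, count
    std::size_t i = 0;
    bool zf = false;
    do {
        zf = (data[i] == key);       // CMP sets ZF when the values are equal
        ++i;
        --ecx;                       // LOOPNE decrements ECX, flags unchanged
    } while (ecx != 0 && !zf);       // jump while ECX != 0 and ZF = 0
    return i;
}
```

Scanning {4, 8, 15, 16, 23} for 15 stops after 3 iterations; scanning for a value that is absent runs all 5.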

4.6 Basic Data Structure

4.6.1 Stack

Runtime Stack

Managed by the CPU, using two registers

SS (stack segment)
ESP (stack pointer)

PUSH Operation (入栈 / 压栈)

A 32-bit push operation decrements ESP by 4 and copies a value into the location pointed to by ESP. The stack grows downward. The area below ESP is always available (unless the stack has overflowed)

PUSH EAX

POP Operation (出栈 / 弹栈)

Copies the value at the top of the stack into a register or variable, then adds 2 or 4 to ESP, depending on the size of the operand receiving the data.

POP EAX
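
The way PUSH and POP move ESP can be sketched with a small C++ model. `StackModel` is a hypothetical type: a 64-byte array stands in for the stack segment, and `esp` is an index into it that starts just past the end and moves toward zero as values are pushed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct StackModel {
    std::uint8_t memory[64] = {};
    std::uint32_t esp = sizeof(memory);    // empty stack: ESP just past the end

    void push32(std::uint32_t value) {
        assert(esp >= 4);                  // otherwise the stack would overflow
        esp -= 4;                          // ESP <- ESP - 4
        std::memcpy(&memory[esp], &value, 4);  // store at the new top
    }

    std::uint32_t pop32() {
        assert(esp + 4 <= sizeof(memory)); // something must have been pushed
        std::uint32_t value;
        std::memcpy(&value, &memory[esp], 4);  // read the top of the stack
        esp += 4;                          // ESP <- ESP + 4
        return value;
    }
};
```

Pushing 1 and then 2 lowers `esp` from 64 to 56; the two pops then return 2 and 1 in that order and restore `esp` to 64.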

4.7 IO Instructions

4.7.1 Input

CALL scanf : Takes two parameters from the stack: the address of the format string and the address of the variable that receives the input. The parameters are pushed in reverse order, and the caller removes them afterwards by adding to ESP.

4.7.2 Output

CALL printf : Takes the address of the format string from the stack, followed by the values to print (the values themselves, not their addresses).

e.g.

#include <stdio.h>

int main(void) {
	char message[] = "The input number is %d\n";
	char format[] = "%d";
	int input;
	_asm {
		// scanf(format, &input): push the arguments right to left
		LEA EAX, input
		PUSH EAX
		LEA EAX, format
		PUSH EAX
		CALL scanf
		// the caller removes the two 4-byte arguments
		ADD ESP, 8

		// printf(message, input): input is passed by value
		PUSH input
		LEA EAX, message
		PUSH EAX
		CALL printf
		ADD ESP, 8
	}
	return 0;
}

5. Addressing modes

Addressing modes are the ways in which an instruction forms the address of an operand. Offering a variety of addressing modes better supports HLLs when they need to manipulate large data structures.

Immediate mode
MOV EAX, 104
Part of the binary code here is the value (= 104) of the operand

Data Register Direct
MOV EAX, EBX
This is the fastest to execute

Memory Direct
MOV EAX, a
a is a variable, stored in memory and the instruction contains the address of this variable.

Address Register Direct
LEA EAX, message
The instruction contains the address of the message variable, which is loaded into the EAX register when the instruction executes.

Register Indirect
MOV EAX, [EBX]
The instruction copies to the EAX register the content of a memory location with the address stored in EBX

Indexed Register Indirect with displacement
MOV EAX, [array + ESI]
MOV EAX, array[ESI]

Operating on an array with a loop

.data
array WORD 100h, 200h, 300h, 400h
.code
	MOV EDI, OFFSET array
	MOV ECX, LENGTHOF array
	MOV AX, 0
L1:
	ADD ax, [EDI]
	ADD EDI, TYPE array
	LOOP L1

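Here [EDI] is an indirect reference: instead of using the value in the register directly, the instruction treats that value as an address and fetches the value stored at that address.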
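
For comparison, the same summing loop can be written in C++ with a pointer playing the role of EDI. This is a sketch with the hypothetical name `sum_words`; dereferencing the pointer corresponds to [EDI], and incrementing it corresponds to adding TYPE array.

```cpp
#include <cstddef>
#include <cstdint>

std::uint16_t sum_words(const std::uint16_t* array, std::size_t length) {
    const std::uint16_t* edi = array;                 // MOV EDI, OFFSET array
    std::uint16_t ax = 0;                             // MOV AX, 0
    for (std::size_t ecx = length; ecx != 0; --ecx) { // MOV ECX, LENGTHOF array / LOOP
        ax = static_cast<std::uint16_t>(ax + *edi);   // ADD AX, [EDI]
        ++edi;                                        // ADD EDI, TYPE array
    }
    return ax;
}
```

For the array 100h, 200h, 300h, 400h the result is 0A00h. Unlike LOOP, the for loop here also handles a length of zero without running the body.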

.data
source BYTE "This is the source string", 0
target BYTE SIZEOF source DUP(0)
.code
	MOV ESI, 0
	MOV ECX, SIZEOF source
L1:
	MOV al, source[ESI]
	MOV target[ESI], al
	INC ESI
	LOOP L1

Here source[ESI] selects an element of source, much like array indexing in C++; ESI holds a byte offset from the start of source.

For jump and loop instructions, the assembler calculates the distance between the offset of the following instruction and the offset of the target label. This distance is called the relative offset, and it is added to EIP when the jump is taken.
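
The arithmetic behind the relative offset can be sketched in C++. The function names and the example addresses are hypothetical, and the sketch assumes the distance fits in a signed 32-bit value.

```cpp
#include <cstdint>

// Relative offset = target address - address of the instruction after the jump.
std::int32_t relative_offset(std::uint32_t next_instruction, std::uint32_t target) {
    return static_cast<std::int32_t>(static_cast<std::int64_t>(target) -
                                     static_cast<std::int64_t>(next_instruction));
}

// Taking the jump: EIP already points at the next instruction, and the
// relative offset is added to it (wrapping modulo 2^32).
std::uint32_t apply_jump(std::uint32_t next_instruction, std::int32_t rel) {
    return next_instruction + static_cast<std::uint32_t>(rel);
}
```

A backward jump from an instruction ending at 1009h to a label at 1000h has a relative offset of -9; adding -9 to 1009h gives 1000h again.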

6. Flags Affected by Arithmetic

The ALU has a number of status flags that reflect the outcome of arithmetic (and bitwise) operations based on the contents of the destination operand.
The MOV Instruction never affects the flags

6.1 Zero Flag (ZF)

The Zero Flag is set (ZF = 1) when the result of an operation is zero in the destination operand, and it is clear otherwise.

6.2 Sign Flag (SF)

The Sign Flag is set when the destination operand is negative and clear when it is positive.
The most significant bit of the data is the sign bit; SF is a copy of that bit.
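
Both flags can be read off the result of an operation, as this C++ sketch for an 8-bit ADD shows. `StatusFlags` and `flags_after_add8` are hypothetical names, and only ZF and SF are modelled.

```cpp
#include <cstdint>

struct StatusFlags {
    bool zf;  // zero flag
    bool sf;  // sign flag
};

StatusFlags flags_after_add8(std::uint8_t a, std::uint8_t b) {
    // Keep only the low 8 bits, as an 8-bit destination would.
    const std::uint8_t result = static_cast<std::uint8_t>(a + b);
    StatusFlags f;
    f.zf = (result == 0);            // set when the result is zero
    f.sf = (result & 0x80u) != 0;    // copy of the result's highest bit
    return f;
}
```

Adding FFh and 1 wraps to 0, so ZF is set and SF is clear; adding 7Fh and 1 gives 80h, so SF is set and ZF is clear.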

7. Useful Operators

7.1 OFFSET

Returns the distance in bytes of a label from the beginning of its enclosing segment. In other words, it returns the variable's offset (its address) within that segment, comparable to a pointer in C / C++.

7.2 TYPE

Returns the size in bytes of a single element of a data declaration.

7.3 LENGTHOF

Counts the number of elements in a single data declaration.
x DUP(y) : repeats $y$ $x$ times

7.4 SIZEOF

Returns a value equal to LENGTHOF multiplied by TYPE.
It is somewhat like sizeof in C++.
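
The relationship between TYPE, LENGTHOF, and SIZEOF has a close C++ analogue for built-in arrays. The function templates `type_of`, `length_of`, and `size_of` below are hypothetical helpers written for this illustration, not standard library functions.

```cpp
#include <cstddef>
#include <cstdint>

// TYPE: size in bytes of one element.
template <typename T, std::size_t N>
constexpr std::size_t type_of(const T (&)[N]) { return sizeof(T); }

// LENGTHOF: number of elements in the declaration.
template <typename T, std::size_t N>
constexpr std::size_t length_of(const T (&)[N]) { return N; }

// SIZEOF: LENGTHOF multiplied by TYPE, the total size in bytes.
template <typename T, std::size_t N>
constexpr std::size_t size_of(const T (&)[N]) { return N * sizeof(T); }
```

For an array of four 16-bit words, such as `const std::uint16_t array[] = {0x100, 0x200, 0x300, 0x400};`, these give 2, 4, and 8 respectively, and `size_of(array)` agrees with the built-in `sizeof(array)`.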

8. Spanning Multiple Lines

A data declaration spans multiple lines if each line (except the last) ends with a comma. The LENGTHOF and SIZEOF operators include all lines belonging to the declaration.

9. LABEL Directive

Assigns an alternate label name and type to an existing storage location. LABEL does not allocate any storage of its own.

A detailed explanation of the LABEL directive: https://blog.csdn.net/deniece1/article/details/103213681

10. Subroutine

label PROC
	.
	.
	.
	RET
label ENDP

The procedure can be called with the instruction CALL label.

CALL : Pushes the current value of EIP (the address of the instruction following the CALL) onto the stack as the return address, then places the address of the subroutine into EIP.
RET : Returns control to the caller by popping the most recently pushed address off the stack into EIP, so that execution continues from the point following the CALL.
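
The interplay of CALL, RET, and the stack can be sketched with a tiny C++ model. `Cpu`, `call`, and `ret` are hypothetical names; `eip` here holds the address of the CALL instruction itself, so the model adds the instruction's length to obtain the return address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Cpu {
    std::uint32_t eip = 0;
    std::vector<std::uint32_t> stack;   // back() is the top of the stack

    // CALL: push the address of the following instruction, then jump.
    void call(std::uint32_t target, std::uint32_t instruction_length) {
        stack.push_back(eip + instruction_length);  // return address
        eip = target;
    }

    // RET: pop the return address into EIP.
    void ret() {
        assert(!stack.empty());         // RET needs a return address to pop
        eip = stack.back();
        stack.pop_back();
    }
};
```

If a 5-byte CALL at address 1000h targets 2000h, the model pushes 1005h and sets EIP to 2000h; the matching RET restores EIP to 1005h and leaves the stack empty.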

10.1 Value Parameters

Ordinary pass by value.

e.g. return the larger of two numbers

	MOV EAX, first
	MOV EBX, second
	CALL bigger
	MOV max, EAX

bigger PROC
	MOV save1, EAX
	MOV save2, EBX
	CMP EAX, EBX
	JG first_big
	MOV EAX, save2
	RET
first_big:
	MOV EAX, save1
	RET
bigger ENDP

10.2 Reference Parameters

Pass by reference: the address is passed rather than the value, so the subroutine modifies the original variables directly.

e.g.

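	LEA EAX, first
	LEA EBX, second
	CALL swap

swap PROC
	MOV ECX, [EAX]		; ECX = value of first
	MOV EDX, [EBX]		; EDX = value of second
	MOV [EAX], EDX		; first = old value of second
	MOV [EBX], ECX		; second = old value of first
	RET
swap ENDP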

10.3 Stack Frame

In 10.1 and 10.2 we passed values in registers, but that approach is too limited. A stack frame can be used instead, which makes parameter passing more flexible.

Just before and during the call of a subroutine the following happens:

The parameters are pushed on the stack
The return address is pushed on the stack
The address stored in EBP is pushed on the stack
A new stack frame is created
The current address of the top of the new stack frame is saved in EBP
The local variables are installed on the new stack

Once the subroutine has done its job:

Pop all local variables off the stack
Pop the previous EBP address from the top of the stack and restore it in EBP
Pop the return address and store it in EIP
Clean up the parameters on the stack

The popping order is crucial: items come off the stack in the reverse of the order in which they were pushed.
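
The sequence of pushes and pops can be traced with a simplified C++ model. Everything here is an assumption made for illustration: `Machine` and `call_add` are hypothetical names, the stack is a vector whose back is the top (so it grows toward higher indices, the opposite of the real x86 stack), the return address is a placeholder, and the subroutine simply adds its two parameters using one local variable.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Machine {
    std::vector<std::int32_t> stack;  // back() is the top of the stack
    std::size_t ebp = 0;              // index of the current frame base
};

std::int32_t call_add(Machine& m, std::int32_t a, std::int32_t b) {
    // Before the call: parameters, then the return address.
    m.stack.push_back(b);
    m.stack.push_back(a);
    m.stack.push_back(0);                                  // placeholder return address

    // Entering the subroutine: save EBP and start a new frame.
    m.stack.push_back(static_cast<std::int32_t>(m.ebp));   // PUSH EBP
    m.ebp = m.stack.size() - 1;                            // MOV EBP, ESP
    m.stack.push_back(0);                                  // one local variable

    // Body: parameters and locals sit at fixed distances from EBP.
    m.stack[m.ebp + 1] = m.stack[m.ebp - 2] + m.stack[m.ebp - 3];
    const std::int32_t result = m.stack[m.ebp + 1];

    // Leaving: undo everything in reverse order.
    m.stack.resize(m.ebp + 1);                             // drop the locals
    m.ebp = static_cast<std::size_t>(m.stack.back());      // POP EBP
    m.stack.pop_back();
    m.stack.pop_back();                                    // RET pops the return address
    m.stack.pop_back();                                    // clean up parameter a
    m.stack.pop_back();                                    // clean up parameter b
    return result;
}
```

After `call_add(m, 2, 3)` returns 5, the model's stack is empty again and `ebp` has its old value, which is the property the push and pop order is meant to guarantee.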

10.4 Recursive subroutines

~~Definition of recursion: see recursion.~~

A function that calls itself.

e.g. factorial

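factorial 	PROC				; in: EAX = n, out: EAX = n!
	CMP		EAX, 1
	JLE		finish				; base case: n <= 1
	PUSH	EAX					; save n across the recursive call
	DEC		EAX
	CALL	factorial			; EAX = (n-1)!
	POP		EBX					; EBX = n
	CALL	multiply			; EAX = n * (n-1)!
	RET
finish:
	MOV		EAX, 1
	RET
factorial 	ENDP

multiply 	PROC				; EAX = EAX * EBX (EDX receives the high half)
	MUL		EBX
	RET
multiply	ENDP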
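
For reference, the same recursion in C++ is a minimal sketch like the following. Each recursive call gets its own copy of `n` on the stack, which is the job that saving and restoring the value around the recursive CALL does in assembly.

```cpp
#include <cstdint>

// Recursive factorial on 32-bit unsigned integers.
// The result only fits in 32 bits for n <= 12.
std::uint32_t factorial(std::uint32_t n) {
    if (n <= 1) {
        return 1;                    // base case ends the recursion
    }
    return n * factorial(n - 1);     // n is kept on the stack during the call
}
```

For example, `factorial(5)` evaluates 5 * 4 * 3 * 2 * 1 and returns 120.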
