Assembly Language

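Assembly language is any low-level language for electronic computers, microprocessors, and microcontrollers; it is also called symbolic language. In assembly language, mnemonics stand in for the opcodes of machine instructions, and address symbols or labels stand in for the addresses of instructions and operands. Each device has its own assembly language, matching its machine instruction set, and an assembler translates the source into machine instructions. A particular assembly language corresponds one-to-one with a particular machine instruction set, so assembly code cannot be ported directly between platforms.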




0. Inline Assembler

Here we use Microsoft Visual C++ to learn how to write assembly. Inside a C++ program, you can add assembly code with the _asm keyword (a synonym of __asm; inline assembly is only available when compiling for 32-bit x86).

e.g.

#include <stdio.h>

int main(void)
{
	// variables referenced by the assembly below
	int var = 1, var1 = 2, var2 = 3;

	// a single instruction
	_asm MOV EAX, var
	// or a block of several instructions
	_asm {
		MOV EAX, var1
		MOV EAX, var2
	}
	return 0;
}


1. General-Purpose Registers (通用寄存器)

1.1 Name of the registers

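Most instructions operate on values held in registers, so each register has a predefined name.
EAX, EBX, ECX, and EDX have 8-bit, 16-bit, and 32-bit names: for example, EAX is the full 32-bit register, AX is its lower 16 bits, and AH and AL are the high and low bytes of AX.
ESI, EDI, EBP, and ESP only have a 16-bit name for their lower half (SI, DI, BP, SP).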
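The relationship between the 32-bit, 16-bit, and 8-bit names can be sketched in portable C++ by masking and shifting a 32-bit value. The helper names below are hypothetical and only model how AX, AH, and AL alias parts of EAX; they are not an API of any compiler.

```cpp
#include <cstdint>

// Hypothetical helpers modelling how the sub-register names view EAX.
// AX is the low 16 bits of EAX; AL and AH are the low and high bytes of AX.
std::uint16_t ax_of(std::uint32_t eax) { return static_cast<std::uint16_t>(eax & 0xFFFFu); }
std::uint8_t  al_of(std::uint32_t eax) { return static_cast<std::uint8_t>(eax & 0xFFu); }
std::uint8_t  ah_of(std::uint32_t eax) { return static_cast<std::uint8_t>((eax >> 8) & 0xFFu); }

// Writing AL replaces only the lowest byte and leaves the rest of EAX unchanged.
std::uint32_t write_al(std::uint32_t eax, std::uint8_t value) {
    return (eax & 0xFFFFFF00u) | value;
}
```

With EAX = 12345678h, these helpers give AX = 5678h, AH = 56h, and AL = 78h.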

1.2 Some Specialized Register Uses

  • General-Purpose
    • EAX : accumulator
    • ECX : loop counter
    • ESP : stack pointer
    • ESI, EDI : index registers
    • EBP : extended frame pointer (stack). ESP is a pointer that always points to the top of the stack, while EBP points to the base of the current stack frame.
  • Segment
    • CS : Code Segment. The code is stored in a contiguous range of memory addresses called the Code Segment, and CS identifies where that segment starts.
    • DS : Data Segment
    • SS : Stack Segment
    • ES, FS, GS : additional segment
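  • EIP : instruction pointer. EIP holds the address of the next instruction the CPU will fetch; without it, the CPU could not find the following instruction. After each instruction executes, EIP advances to the next one (jumps and calls load it with a new address instead).
  • EFLAGS : status and control flags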


2. Data Types

The correspondence between data types and registers:

  • BYTE, SBYTE : AH, AL, BH, BL, CH, CL, DH, DL
  • WORD, SWORD : AX, BX, CX, DX, SI, DI
  • DWORD, SDWORD : EAX, EBX, ECX, EDX, ESI, EDI


3. Comments

  • Single-line comments : begin with semicolon
  • Multi-line comments : begin with COMMENT directive and a programmer-chosen character. End with the same programmer-chosen character.


4. Instructions

We use the Intel IA-32 instruction set.

An instruction contains :

  • Label (optional)
  • Mnemonic (required)
  • Operand (depends on the instruction)
  • Comment (optional)

4.1 Labels

  • Act as place markers: a label marks the address of code or data.
  • Follow identifier rules
  • Data label :
    • must be unique
    • not followed by colon
  • Code label :
    • target of jump and loop instructions
    • followed by colon

4.2 Mnemonics and Operands

  • Instruction Mnemonics
    • memory aid
    • e.g. MOV, ADD, SUB, MUL, INC, DEC
  • Operands
    • constant
    • constant expression : Constants and constant expressions are often called immediate values
    • register
    • memory (data label)

4.3 Structure of instructions

The binary codes of almost all instructions contain three pieces of information:

  • The action or operation of the instruction
  • The operands involved (where to find the information to operate with)
  • Where the result is to go

Machine instructions are encoded with distinct bit fields that carry information about:

  • The operation required.
  • The location of operands and results
  • The data type of the operand

The length of an instruction depends on:

  • The operation required
  • The addressing modes employed

(Pentium instructions can be from 1 to 15 bytes long)


4.4 Basic instructions

4.4.1 MOV

MOV destination, source

Move from source to destination.

At most one operand may be in memory, so memory-to-memory moves are not permitted. CS, EIP, and IP cannot be the destination, and an immediate value cannot be moved directly into a segment register.

Zero Extension
When you copy a smaller value into a larger destination, the MOVZX instruction fills the remaining upper bits of the destination with zeros.

Sign Extension
The MOVSX instruction fills the remaining upper bits of the destination with copies of the source operand's sign bit.
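As a rough illustration, the two extensions can be modelled in C++ for an 8-bit source and a 32-bit destination. The function names `movzx8` and `movsx8` are hypothetical; they mimic the instructions' effect and are not compiler intrinsics.

```cpp
#include <cstdint>

// Model of MOVZX r32, r/m8: the upper 24 bits of the destination become zero.
std::uint32_t movzx8(std::uint8_t src) {
    return static_cast<std::uint32_t>(src);
}

// Model of MOVSX r32, r/m8: the upper 24 bits copy the source's sign bit (bit 7).
std::uint32_t movsx8(std::uint8_t src) {
    if (src & 0x80u) {
        return 0xFFFFFF00u | src;   // negative source: fill with ones
    }
    return src;                     // non-negative source: fill with zeros
}
```

For the source byte 8Fh, the zero-extended value is 0000008Fh and the sign-extended value is FFFFFF8Fh.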


4.4.2 XCHG Instruction

Exchanges the values of two operands. At least one operand must be a register. No immediate operands are permitted.


4.4.3 INC and DEC Instructions

  • INC : Add 1 to the destination operand (register | memory)
  • DEC : Subtract 1 from the destination operand (register | memory)

4.4.4 ADD and SUB Instructions

ADD | SUB destination, source

Same operand rules as for the MOV instruction


4.4.5 NEG Instruction

NEG source

Reverses the sign of an operand. Operand can be a register or memory operand.

NEG Instruction and the Flags
Any nonzero operand causes the Carry flag to be set.

  • CF : carry flag
  • OF : overflow flag
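The effect of NEG on these two flags can be sketched with a small C++ model of a 32-bit operand. The struct and function names are hypothetical. The Overflow rule used here (set only when the operand is the most negative value, 80000000h) is the IA-32 behaviour as I understand it, not something stated in the text above.

```cpp
#include <cstdint>

struct NegResult {
    std::uint32_t value; // two's-complement negation of the operand
    bool cf;             // Carry flag
    bool of;             // Overflow flag
};

// Model of NEG on a 32-bit operand.
NegResult neg32(std::uint32_t operand) {
    NegResult r;
    r.value = 0u - operand;           // unsigned arithmetic wraps modulo 2^32
    r.cf = (operand != 0u);           // set for any nonzero operand
    r.of = (operand == 0x80000000u);  // the most negative value has no positive counterpart
    return r;
}
```

Negating 5 yields FFFFFFFBh with CF = 1, while negating 0 leaves CF clear.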

4.4.6 Implementing Arithmetic Expressions

HLL compilers translate mathematical expressions into assembly language.
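As a hedged illustration, take the hypothetical expression r = -x + (y - z). The C++ function below evaluates it one instruction-like step at a time, with plain variables standing in for registers. The instruction sequence in the comments is one plausible translation, not the output of any particular compiler.

```cpp
#include <cstdint>

// Evaluates r = -x + (y - z) step by step.
// eax and ebx are ordinary variables standing in for registers.
std::int32_t eval_expression(std::int32_t x, std::int32_t y, std::int32_t z) {
    std::int32_t eax = x;   // MOV EAX, x
    eax = -eax;             // NEG EAX
    std::int32_t ebx = y;   // MOV EBX, y
    ebx -= z;               // SUB EBX, z
    eax += ebx;             // ADD EAX, EBX
    return eax;             // MOV r, EAX
}
```

With x = 26, y = 30, and z = 40 the function returns -36.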

4.5 Flow Controls

4.5.1 JMP

JMP is an unconditional jump to a label that is usually within the same procedure.

target:
	.
	.
	.
	JMP target

logic : EIP ← target


4.5.2 JCXZ and JECXZ

There are more than 30 jump instructions. JCXZ and JECXZ are two of them: they are conditional jumps that test whether CX or ECX, respectively, is zero, while the remaining conditional jumps test the status flags. Each one jumps if its condition is true and continues with the next instruction if it is false.

	JCXZ target
	.
	.
	.
target:

4.5.3 Other Jump Instructions

  • JC / JB : Jump if Carry flag is set
  • JNC / JNB : Jump if Carry flag is clear
  • JE / JZ : Jump if Zero flag is set
  • JNE / JNZ : Jump if Zero flag is clear
  • JS : Jump if Sign flag is set
  • JNS : Jump if Sign flag is clear
  • JO : Jump if Overflow flag is set
  • JNO : Jump if Overflow flag is clear

4.5.4 CMP

The CMP instruction is the most common way to test for conditional jumps.

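CMP EAX, EBX

CMP subtracts the second operand from the first without storing the result and sets the flags accordingly; for example, the Zero flag is set (ZF = 1) if EAX and EBX are equal.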

Jumps based on CMP

Assuming the jump executes immediately after CMP (JG, JGE, JL, and JLE treat the operands as signed values):

  • JE : Jump if the first operand (in CMP) is equal to the second operand.
  • JNE : Jump if the first and second operands are not equal.
  • JGE : Jump if the first operand is greater than or equal to the second
  • JG : Jump if the first operand is greater than the second
  • JLE : Jump if the first operand is less than or equal to the second
  • JL : Jump if the first operand is less than the second
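How these jumps read the flags left by CMP can be sketched in C++. The model below computes the four status flags for a 32-bit subtraction and expresses each signed jump as a predicate over them. All names are hypothetical. The flag conditions (for example, JL taken when SF differs from OF) follow the IA-32 definitions as I understand them and are not taken from the text above.

```cpp
#include <cstdint>

struct Flags { bool zf, sf, cf, of; };

// Model of CMP a, b: computes a - b, discards the result, keeps the flags.
Flags cmp32(std::uint32_t a, std::uint32_t b) {
    std::uint32_t diff = a - b;                    // wraps modulo 2^32
    Flags f;
    f.zf = (diff == 0u);                           // operands are equal
    f.sf = (diff >> 31) != 0u;                     // sign bit of the result
    f.cf = (a < b);                                // unsigned borrow
    f.of = (((a ^ b) & (a ^ diff)) >> 31) != 0u;   // signed overflow
    return f;
}

// Each predicate answers "would this jump be taken?"
bool je(const Flags& f)  { return f.zf; }
bool jg(const Flags& f)  { return !f.zf && f.sf == f.of; }
bool jge(const Flags& f) { return f.sf == f.of; }
bool jl(const Flags& f)  { return f.sf != f.of; }
bool jle(const Flags& f) { return f.zf || f.sf != f.of; }
```

Because the predicates combine SF and OF, comparing FFFFFFFFh (-1 as a signed value) with 1 takes JL even though the unsigned value is larger.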

4.5.5 LOOP

target:
	.
	.
	.
	LOOP target

logic : ECX ← ECX - 1; if ECX != 0, jump to target
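The decrement-then-test order matters, and a small C++ model makes it visible. The function below is a hypothetical sketch that counts how often the body of a LOOP-terminated block runs for a given initial ECX.

```cpp
#include <cstdint>

// Counts how many times the body of a LOOP-terminated block runs
// for a given initial ECX. LOOP decrements first, then tests.
std::uint64_t loop_iterations(std::uint32_t ecx) {
    std::uint64_t iterations = 0;
    do {
        ++iterations;        // the loop body runs at least once
        --ecx;               // LOOP: ECX <- ECX - 1 (wraps modulo 2^32)
    } while (ecx != 0u);     // jump back to target while ECX != 0
    return iterations;
}
```

Since the body always executes before the first test, starting with ECX = 0 does not skip the loop: the decrement wraps around and the body runs 2^32 times. This is why JECXZ is commonly placed before such a loop.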


4.5.6 LOOPNE

logic : ECX ← ECX - 1; if ECX != 0 and ZF = 0, jump to target

e.g. Repeat while EAX is not equal to EBX, for at most 200 iterations:

	MOV ECX, 200
target:
	.
	.
	.
	CMP EAX, EBX
	LOOPNE target
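A C++ sketch of the same pattern: scan an array for a key, but give up after a fixed number of elements. The function name and parameters are hypothetical. The loop condition mirrors LOOPNE, which decrements ECX and jumps back only while ECX is nonzero and the preceding CMP left ZF clear.

```cpp
#include <cstddef>
#include <cstdint>

// Scans data for key, visiting at most limit elements. Assumes limit >= 1 and
// limit <= the number of elements in data. Returns how many elements were examined.
std::uint32_t scan_until_equal(const std::int32_t* data, std::uint32_t limit,
                               std::int32_t key, bool& found) {
    std::uint32_t ecx = limit;      // MOV ECX, limit
    std::size_t i = 0;
    bool zf = false;
    do {
        zf = (data[i] == key);      // CMP sets ZF when the operands are equal
        ++i;
        --ecx;                      // LOOPNE: ECX <- ECX - 1 ...
    } while (ecx != 0u && !zf);     // ... jump back while ECX != 0 and ZF == 0
    found = zf;
    return limit - ecx;
}
```

After the loop, ZF (here `found`) tells the two exit reasons apart: a match, or an exhausted count.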

4.6 Basic Data Structure

4.6.1 Stack

Runtime Stack

Managed by the CPU, using two registers

  • SS (stack segment)
  • ESP (stack pointer)

PUSH Operation (入栈 / 压栈)

A 32-bit push operation decrements ESP by 4 and copies a value into the location pointed to by ESP. The stack grows downward. The area below ESP is always available (unless the stack has overflowed)

PUSH EAX

POP Operation (出栈 / 弹栈)

Copies the value at the top of the stack into a register or variable, then adds 2 or 4 to ESP, depending on the size of the operand receiving the data.

POP EAX
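The two operations can be modelled in C++ with a byte array and an index that plays the role of ESP. This is a minimal sketch under simplifying assumptions: a hypothetical 64-byte stack, 32-bit pushes only, and no overflow or underflow checks.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kStackSize = 64;   // hypothetical size, for illustration only

// A tiny model of the runtime stack: memory is an array of bytes, and esp is an
// index that starts one past the end (empty stack) and moves toward 0.
struct Stack {
    std::uint8_t memory[kStackSize] = {};
    std::size_t esp = kStackSize;

    void push(std::uint32_t value) {
        esp -= 4;                                  // PUSH: decrement ESP first ...
        for (int i = 0; i < 4; ++i)                // ... then store the value (little-endian)
            memory[esp + i] = static_cast<std::uint8_t>(value >> (8 * i));
    }

    std::uint32_t pop() {
        std::uint32_t value = 0;
        for (int i = 0; i < 4; ++i)                // POP: read the value at ESP ...
            value |= static_cast<std::uint32_t>(memory[esp + i]) << (8 * i);
        esp += 4;                                  // ... then increment ESP
        return value;
    }
};
```

Values come back in the reverse order of the pushes, and ESP returns to its starting position once every push has been matched by a pop.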

4.7 IO Instructions

4.7.1 Input

CALL scanf : Takes two parameters from the stack: the address of the input format string and the address of the variable that receives the input.

4.7.2 Output

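CALL printf : Takes the address of the format string from the stack, followed by the value (not the address) of each variable to print. Parameters are pushed from right to left, and the caller removes them afterwards with ADD ESP.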

e.g.

#include <stdio.h>
#include <iostream>
using namespace std;

int main(void) {
	char message[] = "The input number is %d\n";
	char format[] = "%d";
	int input;
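	_asm {
		LEA EAX, input		// push &input (second scanf argument)
		PUSH EAX
		LEA EAX, format		// push the address of format (first scanf argument)
		PUSH EAX
		CALL scanf
		ADD ESP, 8			// caller removes the two 4-byte arguments
		
		PUSH input			// push the value of input
		LEA EAX, message	// push the address of message
		PUSH EAX
		CALL printf
		ADD ESP, 8			// caller removes the two 4-byte arguments
	}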
	return 0;
}

5. Addressing modes

An addressing mode is a way of forming the address of an operand. Offering a variety of addressing modes better supports HLLs when they need to manipulate large data structures.

Immediate mode
MOV EAX, 104
Part of the binary code here is the value (= 104) of the operand

Data Register Direct
MOV EAX, EBX
This is the fastest to execute

Memory Direct
MOV EAX, a
a is a variable stored in memory, and the instruction contains the address of this variable.

Address Register Direct
LEA EAX, message
The instruction contains the address of the message variable, which is loaded into the EAX register when the instruction executes.

Register Indirect
MOV EAX, [EBX]
The instruction copies to the EAX register the content of a memory location with the address stored in EBX

Indexed Register Indirect with displacement
MOV EAX, [array + ESI]
MOV EAX, array[ESI]

Processing an array in a loop

.data
array WORD 100h, 200h, 300h, 400h
.code
	MOV EDI, OFFSET array
	MOV ECX, LENGTHOF array
	MOV AX, 0
L1:
	ADD ax, [EDI]
	ADD EDI, TYPE array
	LOOP L1

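Here [EDI] is an indirect reference: the value in the register is not used directly but is treated as an address, and the operand is the value stored at that address.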
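For comparison, here is a C++ rendering of the summing loop above, with a pointer playing the role of EDI. The function name is hypothetical. The main difference is that C++ pointer arithmetic scales by the element size automatically, whereas the assembly adds TYPE array explicitly.

```cpp
#include <cstddef>
#include <cstdint>

// Sums 16-bit elements the way the assembly loop does: keep an address,
// dereference it, then advance to the next element.
std::uint16_t sum_words(const std::uint16_t* array, std::size_t count) {
    const std::uint16_t* edi = array;                 // MOV EDI, OFFSET array
    std::uint16_t ax = 0;                             // MOV AX, 0
    for (std::size_t ecx = count; ecx != 0; --ecx) {  // ECX counts down like LOOP
        ax = static_cast<std::uint16_t>(ax + *edi);   // ADD AX, [EDI]
        ++edi;                                        // ADD EDI, TYPE array
    }
    return ax;
}
```

For the four words 100h, 200h, 300h, and 400h the sum is A00h.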

.data
source BYTE "This is the source string", 0
target BYTE SIZEOF source DUP(0)
.code
	MOV ESI, 0
	MOV ECX, SIZEOF source
L1:
	MOV al, source[ESI]
	MOV target[ESI], al
	INC ESI
	LOOP L1

Here source[ESI] selects an element of source, much like array indexing in C++.

For a jump or loop instruction, the assembler calculates the distance between the offset of the instruction that follows it and the offset of the target label. This distance is called the relative offset; at run time it is added to EIP.
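The arithmetic is simple enough to write down as a sketch. The two hypothetical C++ helpers below show the assembler's side (computing the offset) and the CPU's side (applying it); the addresses in the usage note are made up for illustration.

```cpp
#include <cstdint>

// Assembler side: distance from the instruction after the jump to the target label.
std::int32_t relative_offset(std::uint32_t next_instruction, std::uint32_t target) {
    // Convert each address to a signed 64-bit value first, so that a backward
    // jump yields a negative offset.
    return static_cast<std::int32_t>(static_cast<std::int64_t>(target) -
                                     static_cast<std::int64_t>(next_instruction));
}

// CPU side: EIP already points at the next instruction when the offset is added.
std::uint32_t jump_destination(std::uint32_t eip, std::int32_t offset) {
    return eip + static_cast<std::uint32_t>(offset);   // wraps modulo 2^32
}
```

A backward jump from a next-instruction address of 1010h to a label at 1000h has a relative offset of -16, and adding -16 to 1010h gives 1000h again.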



6. Flags Affected by Arithmetic

The ALU has a number of status flags that reflect the outcome of arithmetic (and bitwise) operations based on the contents of the destination operand.
The MOV instruction never affects the flags.


6.1 Zero Flag (ZF)

The Zero Flag is set (ZF = 1) when the result of an operation is zero.


6.2 Sign Flag (SF)

The Sign Flag is set when the result in the destination operand is negative and clear when it is positive.
The most significant bit of the result is the sign bit, and SF is a copy of it.
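Both flags can be illustrated with a C++ model of an 8-bit ADD. The struct and function names are hypothetical, and only ZF and SF are modelled here.

```cpp
#include <cstdint>

struct AddFlags {
    std::uint8_t result;  // low 8 bits of the sum
    bool zf;              // Zero flag
    bool sf;              // Sign flag
};

// Model of ADD on 8-bit operands, tracking only ZF and SF.
AddFlags add8(std::uint8_t a, std::uint8_t b) {
    AddFlags f;
    f.result = static_cast<std::uint8_t>(a + b);  // keep the low 8 bits
    f.zf = (f.result == 0);                       // result is zero
    f.sf = (f.result & 0x80u) != 0;               // copy of the most significant bit
    return f;
}
```

Adding 1 to FFh wraps to 00h and sets ZF, while adding 1 to 7Fh gives 80h and sets SF.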



7. Useful Operators

7.1 OFFSET

Returns the distance, in bytes, of a label from the beginning of its enclosing segment.
In other words, it gives the offset (that is, the address) of a variable within its segment, comparable to a pointer in C / C++.

7.2 TYPE

Returns the size, in bytes, of a single element of a data declaration.


7.3 LENGTHOF

Counts the number of elements in a single data declaration.
x DUP(y) : repeats the value y, x times


7.4 SIZEOF

Returns a value that is equivalent to multiplying LENGTHOF by TYPE.
It is somewhat like sizeof in C++.
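The three size-related operators have close C++ counterparts. The sketch below uses a hypothetical array named `word_array` that mirrors the declaration `array WORD 100h, 200h, 300h, 400h` used earlier.

```cpp
#include <cstddef>
#include <cstdint>

// C++ counterpart of:  array WORD 100h, 200h, 300h, 400h
const std::uint16_t word_array[] = {0x100, 0x200, 0x300, 0x400};

// TYPE: size in bytes of one element
constexpr std::size_t type_of_array = sizeof(word_array[0]);
// LENGTHOF: number of elements
constexpr std::size_t lengthof_array = sizeof(word_array) / sizeof(word_array[0]);
// SIZEOF: total size in bytes, that is LENGTHOF * TYPE
constexpr std::size_t sizeof_array = sizeof(word_array);
```

Here TYPE is 2, LENGTHOF is 4, and SIZEOF is 8.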



8. Spanning Multiple Lines

A data declaration spans multiple lines if each line (except the last) ends with a comma. The LENGTHOF and SIZEOF operators include all lines belonging to the declaration.



9. LABEL Directive

Assigns an alternate label name and type to an existing storage location. LABEL does not allocate any storage of its own.

Detailed explanation of the LABEL directive (https://blog.csdn.net/deniece1/article/details/103213681)


10. Subroutine

label PROC
	.
	.
	.
	RET
label ENDP

The procedure can be called with the instruction CALL label.

  • CALL : Pushes the return address (the current value of EIP, which points to the instruction following the CALL) onto the stack, then places the address of the subroutine into EIP.

  • RET : Returns control to the caller: it pops the return address from the top of the stack into EIP, so execution continues from the point following the CALL.


10.1 Value Parameters

Ordinary pass-by-value.

e.g. Return the larger of two numbers

	MOV EAX, first
	MOV EBX, second
	CALL bigger
	MOV max, EAX

bigger PROC
	MOV save1, EAX
	MOV save2, EBX
	CMP EAX, EBX
	JG first_big
	MOV EAX, save2
	RET
first_big:
	MOV EAX, save1
	RET
bigger ENDP

10.2 Reference Parameters

Pass-by-reference: the address is passed rather than the value, so the subroutine changes the original variable directly.

e.g.

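	LEA EAX, first
	LEA EBX, second
	CALL swap

; in : EAX = address of first, EBX = address of second (ECX and EDX are overwritten)
swap PROC
	MOV ECX, [EAX]		; ECX = first
	MOV EDX, [EBX]		; EDX = second
	MOV [EAX], EDX		; first = old value of second
	MOV [EBX], ECX		; second = old value of first
	RET
swap ENDP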
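The same idea in C++, as a minimal sketch with a hypothetical function name: the caller passes addresses, and the callee reads and writes through them, so the caller's variables change.

```cpp
#include <cstdint>

// Swaps the two values stored at the given addresses.
void swap_by_address(std::int32_t* a, std::int32_t* b) {
    std::int32_t first_value = *a;    // load the value at address a
    std::int32_t second_value = *b;   // load the value at address b
    *a = second_value;                // store through the address: the caller's variable changes
    *b = first_value;
}
```

Both values go through temporaries because, as with MOV, a value cannot be copied from memory to memory in a single step at the instruction level.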

10.3 Stack Frame

In 10.1 and 10.2 we used registers to pass values, but that approach is too limited. The stack (a stack frame) can be used instead, which is more flexible.

Just before and during the call of a subroutine the following happens:

  • The parameters are pushed on the stack
  • The return address is pushed on the stack
  • The address stored in EBP is pushed on the stack
  • A new stack frame is created
  • The current address of the top of the new stack frame is saved in EBP
  • The local variables are installed on the new stack

Once the subroutine has done its job:

  • Pop all local variables off the stack
  • Pop the previous EBP value from the top of the stack and restore it in EBP
  • Pop the return address and load it into EIP
  • Clean up the parameters on the stack

The popping order is crucial: items must come off the stack in the reverse of the order in which they were pushed.
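The life cycle of a stack frame can be traced with a small C++ model. This is a sketch under stated assumptions: the stack is word-addressed (one index per 32-bit value), the return address is a made-up constant, and the caller removes the parameters after the return, as in the cdecl convention used by the printf example earlier. All names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// A word-addressed model of the stack: esp and ebp are indices into memory,
// and the stack grows toward index 0.
struct Machine {
    std::vector<std::uint32_t> memory = std::vector<std::uint32_t>(32, 0u);
    std::uint32_t esp = 32;
    std::uint32_t ebp = 0;

    void push(std::uint32_t v) { memory[--esp] = v; }
    std::uint32_t pop() { return memory[esp++]; }
};

// Calls a two-parameter "subroutine" that returns a + b through a stack frame.
std::uint32_t call_add(Machine& m, std::uint32_t a, std::uint32_t b) {
    const std::uint32_t return_address = 0x401000u;  // hypothetical address
    m.push(b);                   // the parameters are pushed (right to left)
    m.push(a);
    m.push(return_address);      // CALL pushes the return address
    m.push(m.ebp);               // PUSH EBP: save the caller's frame pointer
    m.ebp = m.esp;               // MOV EBP, ESP: the new frame starts here
    m.esp -= 1;                  // SUB ESP, 4: room for one local variable
    // Parameters sit above EBP (a at +2, b at +3 words); the local sits below it.
    m.memory[m.ebp - 1] = m.memory[m.ebp + 2] + m.memory[m.ebp + 3];
    std::uint32_t result = m.memory[m.ebp - 1];
    m.esp = m.ebp;               // MOV ESP, EBP: discard the local variables
    m.ebp = m.pop();             // POP EBP: restore the caller's frame pointer
    m.pop();                     // RET: pop the return address (it would go into EIP)
    m.esp += 2;                  // ADD ESP, 8: the caller removes the parameters
    return result;
}
```

After the call, ESP and EBP hold exactly the values they had before it, which is the property the push and pop ordering is meant to guarantee.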


10.4 Recursive subroutines

Definition: recursion: see recursion.

A function that calls itself.

e.g. Factorial

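; in : EAX = n
; out: EAX = n! (EBX and EDX are overwritten)
factorial 	PROC
	CMP		EAX, 1
	JLE		finish			; base case: 0! = 1! = 1
	PUSH	EAX				; save n across the recursive call
	DEC		EAX
	CALL	factorial		; EAX = (n - 1)!
	POP		EBX				; EBX = n
	CALL	multiply		; EAX = (n - 1)! * n
	RET
finish:
	MOV		EAX, 1
	RET
factorial 	ENDP

; in : EAX, EBX
; out: EAX = low 32 bits of EAX * EBX
multiply 	PROC
	MUL		EBX				; EDX:EAX = EAX * EBX
	RET
multiply	ENDP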
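For reference, the same recursion in C++, as a minimal sketch. The compiler keeps each pending value of n in that call's stack frame, which is the job the explicit PUSH and POP do in hand-written assembly.

```cpp
#include <cstdint>

// Recursive factorial; the result wraps modulo 2^32 for n > 12.
std::uint32_t factorial(std::uint32_t n) {
    if (n <= 1) {
        return 1;                    // base case: 0! = 1! = 1
    }
    return n * factorial(n - 1);     // n stays in this call's frame during the recursive call
}
```

factorial(5) evaluates to 120.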