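CS:APP Chapter 3 Summary (Assembly Language, Machine Code, Registers, Compiler Optimization, How Functions Work Underneath, Floating-Point Instructions)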

rookie19_

Last modified 2022-08-19 15:19:25



Column: Reading · Tags: unix

First published 2020-12-17 21:53:22

Article link: https://blog.csdn.net/weixin_42100211/article/details/111316476


Contents

Advantages of high-level languages over assembly
Compiler optimization options

Advantages of high-level languages over assembly

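1. Higher development efficiency: the IDE and the compiler point out your mistakes. Thanks to compiler optimization, the execution-speed penalty of a high-level language is small.
2. Lower probability of errors.
3. Portability across platforms.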

cc is short for "C compiler".

Compiler optimization options

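-Og makes the structure of the machine code closely follow the source code and avoids heavy transformations, so it is typically used for teaching. In practice, higher optimization levels such as -O1 or -O2 are usually used.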

On invalid-address errors:
At any given time, only limited sub-ranges of virtual addresses are considered valid. For example, x86-64 virtual addresses are represented by 64-bit words. In current implementations of these machines, the upper 16 bits must be set to zero, and so an address can potentially specify a byte over a range of 2^48, or 256 terabytes. The operating system manages this virtual address space, translating virtual addresses into the physical addresses of values in the actual processor memory.
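A minimal C sketch of the arithmetic above. It assumes the book's simplified rule that a valid (user-space) address has its upper 16 bits clear; on real hardware the upper 16 bits must copy bit 47 (the "canonical" form), so kernel addresses have them all set. The function names are hypothetical. Note that 2^48 bytes is 2^8 = 256 terabytes when 1 TB is taken as 2^40 bytes.

```c
#include <stdint.h>

/* Simplified validity check from the text: upper 16 bits must be zero.
   (Real x86-64 requires bits 63..48 to equal bit 47.) */
int upper16_clear(uint64_t addr) {
    return (addr >> 48) == 0;
}

/* Size of a 48-bit address range, in terabytes of 2^40 bytes each. */
uint64_t range_in_terabytes(void) {
    return (UINT64_C(1) << 48) >> 40;   /* 2^48 / 2^40 = 2^8 = 256 */
}
```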

$ gcc -Og -S mstore.c
$ gcc -Og -c mstore.c
$ objdump -d mstore.o
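The first command produces the assembly file mstore.s, the second produces the object file mstore.o, and objdump -d disassembles mstore.o. objdump belongs to GNU binutils (the package that also provides the assembler and linker gcc invokes); it is not part of gcc itself.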

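Linking requires a main function, but a .c file that contains only non-main functions can still be compiled to assembly (.s) and object code (.o). The linked executable is much larger than the .o file, because it contains not just the machine code for the procedures we provided but also code used to start and terminate the program as well as to interact with the operating system.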
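For reference, here is the book's mstore.c together with the mult2 helper that the book places in a separate main.c (that file also supplies main, and the book links the two with gcc -Og -o prog main.c mstore.c). mstore.c alone compiles with -S or -c but cannot be linked, since it neither defines main nor mult2.

```c
/* mstore.c: compiles on its own, but needs mult2 and main at link time. */
long mult2(long, long);

void multstore(long x, long y, long *dest) {
    long t = mult2(x, y);
    *dest = t;
}

/* From the book's main.c, which also defines main(). */
long mult2(long a, long b) {
    long s = a * b;
    return s;
}
```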

What the linker actually does:
1. Shifts the location of the code to a different range of addresses.
2. Matches function calls with the locations of the executable code for those functions (that is, it fills in the target of each call instruction).
3. NOPs have been inserted to grow the code for the function to 16 bytes, enabling a better placement of the next block of code in terms of memory system performance.

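P205: AT&T and Intel assembly syntax differ. My undergraduate course taught Intel syntax, whereas GCC and this book use AT&T syntax by default (GCC can emit Intel syntax with -masm=intel).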

P208: the historical names of the registers and what each one is used for.

P209: how the three kinds of operands (immediate, register, and memory) are written.
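The most general memory operand form is Imm(rb, ri, s), whose effective address is Imm + R[rb] + R[ri]·s with s ∈ {1, 2, 4, 8}. A small C sketch of that formula (the function name is hypothetical; register values are passed in as plain integers):

```c
#include <stdint.h>

/* Effective address of Imm(rb, ri, s) = Imm + R[rb] + R[ri] * s.
   Arithmetic is modulo 2^64, like the hardware's address calculation. */
uint64_t effective_address(int64_t imm, uint64_t rb, uint64_t ri, unsigned s) {
    return (uint64_t)imm + rb + ri * s;
}
```

For example, with %rax = 0x100 and %rdx = 0x3, the operand (%rax,%rdx,4) refers to address 0x100 + 3·4 = 0x10C.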

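P212: when a mov writes part of a 64-bit register, writing the low 1 or 2 bytes leaves the upper bits unchanged, whereas writing the low 4 bytes sets the upper 4 bytes to zero.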
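A C model of this rule, in the spirit of the book's example that starts from %rax = 0x0011223344556677 and then executes movb $-1, %al, movw $-1, %ax, and movl $-1, %eax. The helper names are hypothetical; each returns the new 64-bit register value.

```c
#include <stdint.h>

/* movb into %al: only the low byte changes. */
uint64_t write_byte(uint64_t reg, uint8_t v) {
    return (reg & ~UINT64_C(0xFF)) | v;
}

/* movw into %ax: only the low 2 bytes change. */
uint64_t write_word(uint64_t reg, uint16_t v) {
    return (reg & ~UINT64_C(0xFFFF)) | v;
}

/* movl into %eax: the upper 4 bytes are cleared. */
uint64_t write_long(uint64_t reg, uint32_t v) {
    (void)reg;                 /* old contents are irrelevant */
    return (uint64_t)v;
}
```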

Recall that when performing a cast that involves both a size change and a change of "signedness" in C, the operation should change the size first (Section 2.2.6).
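A short C illustration of why the order matters, using the book's value -12345 and assuming a 16-bit short and a 32-bit unsigned int (the function names are hypothetical). Converting short to unsigned behaves as if the value is first sign-extended to 32 bits and then reinterpreted; doing it the other way round zero-extends instead.

```c
/* C's rule: widen first (sign extension), then change signedness. */
unsigned int short_to_unsigned(short sx) {
    return (unsigned int)sx;                 /* same as (unsigned)(int)sx */
}

/* The opposite order: change signedness first, then widen (zero extension). */
unsigned int short_to_unsigned_other_order(short sx) {
    return (unsigned int)(unsigned short)sx;
}
```

With sx = -12345 (0xCFC7), the first gives 4294954951 (0xFFFFCFC7) and the second gives 53191 (0x0000CFC7).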

Since the stack is contained in the same memory as the program code and other forms of program data, programs can access arbitrary positions within the stack using the standard memory addressing methods. (So, unlike the STL's std::stack, the machine-level stack supports random access.)

In addition, LEA can be used to compactly describe common arithmetic operations.
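The book's example for this point is the function scale below. The comment reproduces, roughly, the assembly the book reports for gcc -Og; exact output may differ across compiler versions. All three multiplications are folded into leaq address arithmetic, with no imul.

```c
/* With gcc -Og the book shows approximately:
 *   leaq (%rdi,%rsi,4), %rax   # x + 4*y
 *   leaq (%rdx,%rdx,2), %rdx   # z + 2*z = 3*z
 *   leaq (%rax,%rdx,4), %rax   # (x + 4*y) + 4*(3*z)
 *   ret
 */
long scale(long x, long y, long z) {
    long t = x + 4 * y + 12 * z;
    return t;
}
```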

Using %cl as the shift amount:
The higher-order bits are ignored. So, for example, when register %cl has hexadecimal value 0xFF, then instruction salb would shift by 7, while salw would shift by 15, sall would shift by 31, and salq would shift by 63.

One reason two's complement is used to represent signed integers:
We see that most of the instructions shown in Figure 3.10 can be used for either unsigned or two’s-complement arithmetic. This is one of the features that makes two’s-complement arithmetic the preferred way to implement signed integer arithmetic.
Where signed and unsigned arithmetic differ:
They use different versions of right shifts, division and multiplication instructions, and different combinations of condition codes.
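Both points can be seen in a few lines of C (hypothetical helper names). Addition is the same bit-level operation for signed and unsigned values, while right shifts differ. Right-shifting a negative signed value is implementation-defined in C before C23; GCC and Clang shift arithmetically, which is what this sketch assumes.

```c
#include <stdint.h>

/* One adder serves both interpretations: -3 + 5 and 0xFFFFFFFD + 5
   produce the same 32-bit pattern. */
uint32_t add_bits(uint32_t a, uint32_t b) {
    return a + b;                    /* like addl */
}

/* Signed: arithmetic shift (sar), copies the sign bit. */
int32_t sar1(int32_t x) {
    return x >> 1;
}

/* Unsigned: logical shift (shr), fills with zeros. */
uint32_t shr1(uint32_t x) {
    return x >> 1;
}
```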

%rdx and %rax are combined into a 128-bit oct word for multiplication and division:
multiplying two 64-bit signed or unsigned integers can yield a product that requires 128 bits to represent.
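A sketch of the full 64×64 → 128-bit product split the way mulq leaves it (high half in %rdx, low half in %rax). It relies on unsigned __int128, a GCC/Clang extension rather than standard C; the function name is hypothetical.

```c
#include <stdint.h>

/* Full unsigned product; hi models %rdx, lo models %rax after mulq. */
void umul_full(uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo) {
    unsigned __int128 p = (unsigned __int128)x * y;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
}
```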

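Conditional jumps rely on the condition-code (flag) registers:
In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.
Combinations of these flag bits tell whether a result is negative, whether it is zero, and which of two operands is larger.
The SET instructions set a single byte (the low byte of a register, or a byte in memory) to 0 or 1 according to a combination of condition codes.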
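A C model of how cmpq b, a sets CF, ZF, SF, and OF (it computes t = a - b and discards t), and of three SET instructions defined in terms of those flags. The struct and function names are hypothetical; the flag formulas are the standard ones for subtraction.

```c
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } flags_t;

/* Flags after cmpq b, a, i.e. after computing t = a - b. */
flags_t cmp_flags(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t t = ua - ub;                 /* wraps like the hardware */
    flags_t f;
    f.cf = ua < ub;                       /* unsigned borrow */
    f.zf = t == 0;
    f.sf = (int)((t >> 63) & 1);
    /* Signed overflow: a and b differ in sign, and t's sign differs from a's. */
    f.of = (int)((((ua ^ ub) & (ua ^ t)) >> 63) & 1);
    return f;
}

int set_e(flags_t f) { return f.zf; }           /* sete: equal */
int set_l(flags_t f) { return f.sf ^ f.of; }    /* setl: signed less than */
int set_b(flags_t f) { return f.cf; }           /* setb: unsigned below */
```

Note how setl and setb disagree for a = -1, b = 1: -1 < 1 as signed, but 0xFFFF...FF is not below 1 as unsigned.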

The label in jmp label (a direct jump) is translated when the .o file is generated:
In generating the object-code file, the assembler determines the addresses of all labeled instructions and encodes the jump targets (the addresses of the destination instructions) as part of the jump instructions.
Jumps can also be indirect, taking the target from a register or from memory; in AT&T syntax these are written jmp *%rax and jmp *(%rax).
Conditional jumps can only be direct.
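The most common encoding of a jump target is PC-relative: the instruction stores a signed offset that is added to the address of the instruction that follows the jump. A sketch with a 1-byte offset (function name hypothetical). For example, a jmp at 0x3 encoded as eb 03 is followed by an instruction at 0x5, so it targets 0x5 + 3 = 0x8; a jg at 0xb encoded as 7f f8 is followed by 0xd, so it targets 0xd - 8 = 0x5.

```c
#include <stdint.h>

/* Target of a PC-relative jump with an 8-bit signed displacement. */
uint64_t jump_target(uint64_t next_insn_addr, int8_t rel8) {
    return next_insn_addr + (uint64_t)(int64_t)rel8;   /* sign-extend, wrap */
}
```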

Why rep ret appears in assembly code:
AMD recommends using the combination of rep followed by ret to avoid making the ret instruction the destination of a conditional jump instruction. According to AMD, their processors cannot properly predict the destination of a ret instruction when it is reached from a jump instruction. The rep instruction serves as a form of no-operation here, and so inserting it as the jump destination does not change behavior of the code, except to make it faster on AMD processors.
For similar oddities, you may need to consult Intel's or AMD's documentation.

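Besides conditional jumps such as jne, there are conditional moves such as cmovge (the full list is on P245); both can implement conditional branches. In the book's example, where the branch outcome is random, the conditional move runs faster: a conditional jump pays a high penalty whenever the branch is mispredicted, while a conditional move needs no branch prediction at all.
The flow of control does not depend on data, and this makes it easier for the processor to keep its pipeline full. (P243)
At the C level, an if whose branches contain only simple assignments can be compiled into a conditional move rather than a conditional jump.
The compiler cannot reliably decide between the two, because it does not know how the branch outcomes are distributed. A conditional move evaluates both branches, so if the branch bodies are expensive, it may well be slower than a conditional jump.
(In my own experiment no conditional move appeared at first, but with -O1 a cmovg showed up. Note that GCC's default optimization level is -O0, not -Og.)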
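The book contrasts a branchy absdiff with cmovdiff, which spells out in C what a conditional move does: compute both candidate results, then select one. Whether a given compiler emits cmov for either function depends on the optimization level.

```c
/* Branchy version: at low optimization levels, typically a conditional jump. */
long absdiff(long x, long y) {
    long result;
    if (x < y)
        result = y - x;
    else
        result = x - y;
    return result;
}

/* Conditional-move style: both results are computed, then one is selected. */
long cmovdiff(long x, long y) {
    long rval = y - x;      /* result if x < y */
    long eval = x - y;      /* result otherwise */
    long ntest = x >= y;
    if (ntest)              /* a compiler can turn this into cmovge */
        rval = eval;
    return rval;
}
```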

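A while loop can be converted into do-while form: the condition is tested once up front to decide whether to jump to done, and the rest is the same as a do-while loop (the "guarded-do" translation).
With -O1, GCC translates while and for loops into this guarded-do form; with -Og, the book shows GCC using the "jump-to-middle" translation instead.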
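The book illustrates the guarded-do translation with factorial. Below, fact_while is the source loop and fact_while_gd_goto is its guarded-do form written with goto, mirroring the structure of the generated assembly.

```c
/* Source form. */
long fact_while(long n) {
    long result = 1;
    while (n > 1) {
        result *= n;
        n = n - 1;
    }
    return result;
}

/* Guarded-do: test once up front, then run a do-while loop. */
long fact_while_gd_goto(long n) {
    long result = 1;
    if (n <= 1)
        goto done;
loop:
    result *= n;
    n = n - 1;
    if (n != 1)
        goto loop;
done:
    return result;
}
```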

Using switch instead of a long chain of if-else:
They are particularly useful when dealing with tests where there can be a large number of possible outcomes. Not only do they make the C code more readable, but they also allow an efficient implementation using a data structure called a jump table.
The advantage of using a jump table over a long sequence of if-else statements is that the time taken to perform the switch is independent of the number of switch cases.
P262 has a good example.
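A sketch of the jump-table translation, following the book's switch_eg example. The translated version uses GCC's "labels as values" extension (&&label and goto *ptr, also supported by Clang), not standard C. The index n - 100 is treated as unsigned, so a single comparison sends both n < 100 and n > 106 to the default case, and dispatch takes the same time regardless of the number of cases.

```c
/* Source form. */
void switch_eg(long x, long n, long *dest) {
    long val = x;
    switch (n) {
    case 100: val *= 13; break;
    case 102: val += 10; /* fall through */
    case 103: val += 11; break;
    case 104:
    case 106: val *= val; break;
    default:  val = 0;
    }
    *dest = val;
}

/* Jump-table form (GCC/Clang extension). */
void switch_eg_impl(long x, long n, long *dest) {
    static void *jt[7] = {
        &&loc_A, &&loc_def, &&loc_B,   /* 100, 101, 102 */
        &&loc_C, &&loc_D, &&loc_def,   /* 103, 104, 105 */
        &&loc_D                        /* 106 */
    };
    unsigned long index = n - 100;
    long val;
    if (index > 6)                     /* one unsigned test covers both ends */
        goto loc_def;
    goto *jt[index];
loc_A:   /* case 100 */
    val = x * 13;
    goto done;
loc_B:   /* case 102 */
    x = x + 10;
    /* fall through */
loc_C:   /* case 103 */
    val = x + 11;
    goto done;
loc_D:   /* cases 104, 106 */
    val = x * x;
    goto done;
loc_def: /* default */
    val = 0;
done:
    *dest = val;
}
```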

How procedures are implemented: passing control, passing data, and allocating memory.
As P calls Q, control and data information are added to the end of the stack.

many procedures have six or fewer arguments, and so all of their parameters can be passed in registers.

The call instruction pushes an address A onto the stack and sets the PC to the beginning of Q. The counterpart instruction ret pops an address A off the stack and sets the PC to A.

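If there are more than six (integer) arguments, the rest are passed on the stack. Note that every argument passed on the stack occupies a slot rounded up to a multiple of 8 bytes.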
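The book's proc function takes eight arguments of mixed sizes. Under the x86-64 System V convention the first six go in registers, and a4 and a4p go on the stack, each in its own 8-byte slot even though a4 is a single char. The register and stack locations in the comment follow the book's discussion and are an assumption about the target ABI.

```c
/* a1 %rdi, a1p %rsi, a2 %edx, a2p %rcx, a3 %r8w, a3p %r9;
   a4 and a4p on the stack (at 8(%rsp) and 16(%rsp) on entry,
   above the return address). */
void proc(long a1, long *a1p,
          int a2, int *a2p,
          short a3, short *a3p,
          char a4, char *a4p) {
    *a1p += a1;
    *a2p += a2;
    *a3p += a3;
    *a4p += a4;
}
```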

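When a caller calls a callee, the caller's own incoming arguments may still be needed later, but %rdi has to be handed over to the callee, so the caller must save the value in %rdi first. Likewise, a value returned by one callee may not be used right away, but %rax will be overwritten by the next callee, so that value must be saved too.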
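The book's example of this situation is P below: x must survive the first call and u must survive the second. According to the book, GCC keeps them in the callee-saved registers %rbp and %rbx (pushing both on entry and popping them before ret) instead of spilling %rdi and %rax. Q's body here is hypothetical; noinline is a GCC/Clang attribute that keeps the calls from being inlined away.

```c
/* Hypothetical callee. */
__attribute__((noinline)) long Q(long v) {
    return v + 1;
}

/* x is live across the first call, u across the second. */
long P(long x, long y) {
    long u = Q(y);     /* x copied to %rbp beforehand; u kept in %rbx */
    long v = Q(x);
    return u + v;
}
```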

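At the machine level, a two-dimensional array is stored in row-major order: the first index is the high-order part of an element's position and the second index is the low-order part.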

For data type T and integer constant N, consider a declaration of the form T A[N]. Let us denote the starting location as x_A. The declaration has two effects. First, it allocates a contiguous region of L · N bytes in memory, where L is the size (in bytes) of data type T. Second, it introduces an identifier A that can be used as a pointer to the beginning of the array. The value of this pointer will be x_A.
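Both address formulas can be checked directly in C: for T A[N], &A[i] = x_A + L·i; for a row-major 2-D array T D[R][C], &D[i][j] = x_D + L·(C·i + j). The helper names and the sizes ROWS and COLS are hypothetical.

```c
#include <stddef.h>

#define ROWS 3
#define COLS 5

/* 1-D: x_A + L * i, with L = sizeof(int). */
char *elem_addr_1d(int *A, size_t i) {
    return (char *)A + sizeof(int) * i;
}

/* 2-D, row-major: x_D + L * (COLS * i + j). */
char *elem_addr_2d(int D[ROWS][COLS], size_t i, size_t j) {
    return (char *)D + sizeof(int) * (COLS * i + j);
}
```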

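When a loop variable is used only as an array index, the compiler may eliminate it and instead update a pointer inside the loop. This saves many multiplications: instead of computing index = N·i + j on every iteration, the pointer is simply advanced with ptr += size or ptr += N·size (and since size is a power of 2, any scaling is done with a shift).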
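A sketch of this optimization for one element of a matrix product, modeled on the book's fix_prod_ele. Unlike the book, it stores the N×N matrices as flat int arrays so that the pointer stepping stays inside a single array object; N = 4 and the function names are assumptions for illustration.

```c
#define N 4

/* Index form: multiplications i*N + j and j*N + k on every iteration. */
int prod_ele_index(const int *A, const int *B, long i, long k) {
    int result = 0;
    for (long j = 0; j < N; j++)
        result += A[i * N + j] * B[j * N + k];
    return result;
}

/* Pointer form: Aptr moves one element, Bptr moves one row (N elements). */
int prod_ele_ptr(const int *A, const int *B, long i, long k) {
    const int *Aptr = A + i * N;      /* &A[i][0] */
    const int *Aend = Aptr + N;       /* one past the end of row i */
    const int *Bptr = B + k;          /* &B[0][k] */
    int result = 0;
    for (;;) {
        result += *Aptr * *Bptr;
        if (++Aptr == Aend)
            break;
        Bptr += N;                    /* next row, same column */
    }
    return result;
}
```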

The struct data type constructor is the closest thing C provides to the objects of C++ and Java. The objects of C++ and Java are more elaborate than structures in C, in that they also associate a set of methods with an object that can be invoked to perform computation.

At compile time, every access to a structure field is turned into "starting address of the structure plus a fixed offset":
The selection of the different fields of a structure is handled completely at compile time. The machine code contains no information about the field declarations or the names of the fields.
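The book's struct rec shows this concretely. The offsets in the comment assume x86-64 (4-byte int, 8-byte pointers); field_j_by_offset is a hypothetical helper that performs r->j by hand as base plus offset, which is all the machine code does.

```c
#include <stddef.h>

/* Offsets on x86-64: i 0, j 4, a 8, p 16; total size 24 bytes. */
struct rec {
    int i;
    int j;
    int a[2];
    int *p;
};

/* Equivalent of &r->j: the field name is gone, only the offset remains. */
int *field_j_by_offset(struct rec *r) {
    return (int *)((char *)r + offsetof(struct rec, j));
}
```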

Pros and cons of unions:
Unions can be useful in several contexts. However, they can also lead to nasty bugs, since they bypass the safety provided by the C type system. One application is when we know in advance that the use of two different fields in a data structure will be mutually exclusive. Then, declaring these two fields as part of a union rather than a structure will reduce the total space allocated.
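Type conversion through a union (P299): converting between an integer type and a floating-point type this way leaves the stored bytes unchanged; the same bits are simply reinterpreted.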
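A sketch modeled on the book's double2bits, using uint64_t in place of unsigned long. The union returns the raw bit pattern of the double, whereas an ordinary cast converts the numeric value.

```c
#include <stdint.h>

/* Reinterpret the 8 bytes of a double as an integer; no conversion. */
uint64_t double2bits(double d) {
    union {
        double d;
        uint64_t u;
    } temp;
    temp.d = d;
    return temp.u;
}
```

For example, double2bits(1.0) is 0x3FF0000000000000, while (uint64_t)1.0 is just 1.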

Alignment: the address of any K-byte primitive object must be a multiple of K (K is the size of the data type).
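The book's struct S1 shows the rule at work, assuming a 4-byte, 4-byte-aligned int (as on x86-64): j must start at a multiple of 4, so 3 bytes of padding follow c and the structure occupies 12 bytes rather than 9.

```c
#include <stddef.h>

struct S1 {
    int  i;   /* offset 0 */
    char c;   /* offset 4, followed by 3 bytes of padding */
    int  j;   /* offset 8, not 5 */
};
```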
On pointers:
Casting from one type of pointer to another changes its type but not its value. Pointers can also point to functions. The value of a function pointer is the address of the first instruction in the machine-code representation of the function.
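A sketch of both points. The declaration int (*fp)(int, int *) follows the book's function-pointer example, but the body of fun and the helper names are hypothetical.

```c
/* Hypothetical body for the book's fun(int, int *). */
int fun(int x, int *p) {
    *p = x;
    return x + 1;
}

/* fp holds the address of fun's first instruction. */
int call_through_pointer(int x, int *p) {
    int (*fp)(int, int *) = fun;
    return fp(x, p);
}

/* A pointer cast changes the type but not the address. */
int cast_keeps_value(int *ip) {
    return (void *)(char *)ip == (void *)ip;
}
```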

DDD is a debugger that provides a graphical front end for GDB. See also gef and pwntools, discussed earlier.

ASLR:
Thus, even if many machines are running identical code, they would all be using different stack addresses. This is implemented by allocating a random amount of space between 0 and n bytes on the stack at the start of a program, for example, by using the allocation function alloca, which allocates space for a specified number of bytes on the stack.
If we set up a 256-byte nop sled, then the randomization over n = 2^23 can be cracked by enumerating 2^15 = 32,768 starting addresses, which is entirely feasible for a determined attacker. For the 64-bit case, trying to enumerate 2^24 = 16,777,216 is a bit more daunting. We can see that stack randomization and other aspects of ASLR can increase the effort required to successfully attack a system, and therefore greatly reduce the rate at which a virus or worm can spread, but it cannot provide a complete safeguard.
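The guess counts above come from one division: each guess covers every start address that lands inside the 256-byte (2^8) nop sled, so the attacker needs the randomization range divided by the sled length. The 64-bit line assumes a range of n = 2^32, which is what the quoted 2^24 implies.

```latex
\text{attempts} = \frac{n}{\text{sled length}}, \qquad
\frac{2^{23}}{2^{8}} = 2^{15} = 32{,}768, \qquad
\frac{2^{32}}{2^{8}} = 2^{24} = 16{,}777{,}216
```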

Stack canary:
Stack protection does a good job of preventing a buffer overflow attack from corrupting state stored on the program stack. It incurs only a small performance penalty, especially because gcc only inserts it when there is a local buffer of type char in the function.

Non-executable stack (NX bit):
Some types of programs require the ability to dynamically generate and execute code. For example, "just-in-time" compilation techniques dynamically generate code for programs written in interpreted languages, such as Java, to improve execution performance. Whether or not the run-time system can restrict the executable code to just that part generated by the compiler in creating the original program depends on the language and the operating system.

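For a variable-size stack frame (for example, an array whose size in the brackets is a variable), %rbp is used to locate the fixed-size local variables. If they were addressed through %rsp, their offsets would depend on the variable in the brackets; addressing them through %rbp keeps the offsets constant.
Not every function uses %rbp. A function that does must remember that %rbp is a callee-saved register: save it at the start and restore it at the end. This is a convention (which registers are callee-saved and what each register is used for are both part of the calling convention).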
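The book's vframe is the standard example: p is a variable-length array whose size depends on n, so the frame size is only known at run time. The book shows GCC saving %rbp, setting %rbp to the stack pointer, and keeping the fixed-size local i at a constant offset from %rbp (-8(%rbp)); the exact layout may vary with compiler version.

```c
/* Variable-size frame: p's size depends on n. */
long vframe(long n, long idx, long *q) {
    long i;              /* fixed-size local, addressed via %rbp */
    long *p[n];          /* VLA allocated on the stack at run time */
    p[0] = &i;
    for (i = 1; i < n; i++)
        p[i] = q;
    return *p[idx];
}
```

Calling vframe(5, 0, &x) returns the final value of i, which is 5 after the loop.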

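Instruction-level support for image, video, and audio processing:
single instruction, multiple data, or SIMD (P322).
The media registers were first the MM registers (MMX); later extensions introduced the XMM (SSE) and YMM (AVX) registers. The XMM and YMM registers hold floating-point data as well as packed integer data.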
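A minimal SIMD sketch using the GCC/Clang vector_size extension (not standard C): v4sf is a 16-byte vector of four floats, the size of one XMM register, and a single source-level + adds all four lanes. Whether the compiler emits one addps for it depends on the target and flags, so treat that as an assumption.

```c
/* Four floats in 16 bytes, the width of an XMM register. */
typedef float v4sf __attribute__((vector_size(16)));

/* Lane-wise addition; on x86-64 this can compile to a single addps. */
v4sf add4(v4sf a, v4sf b) {
    return a + b;
}
```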

When "scalar" is contrasted with "vector":
operations like y = a + r, where y and a are vectors, while r is a real scalar. It essentially adds the scalar r to every element of a.
When "scalar" is contrasted with "compound":
C++, on the other hand, as well as other higher-level languages, supports operations on user-defined types, which are by definition not scalar, or on other types that have no immediate support from hardware. (Built-in types are generally scalar; user-defined types are generally compound.)

the code optimization guidelines recommend that 32-bit memory data satisfy a 4-byte alignment and that 64-bit data satisfy an 8-byte alignment.

Up to eight floating-point arguments can be passed in XMM registers %xmm0–%xmm7. These registers are used in the order the arguments are listed. Additional floating-point arguments can be passed on the stack.

A function that returns a floating-point value does so in register %xmm0.

All XMM registers are caller saved. The callee may overwrite any of these registers without first saving it.
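The book's f1/f2 declarations show that integer and floating-point arguments are assigned registers independently: in both functions below, x is in %edi, y in %xmm0, and z in %rsi, and the double result comes back in %xmm0. The function bodies are hypothetical.

```c
/* x -> %edi, y -> %xmm0, z -> %rsi; result in %xmm0. */
double f1(int x, double y, long z) {
    return x + y + z;
}

/* Same register assignment despite the different parameter order. */
double f2(double y, int x, long z) {
    return x + y + z;
}
```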

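Note: in the book's figure at this point, the ten-digit numbers (floating-point constants emitted as .long values) are written in decimal.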
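A sketch of how to decode such constants. GCC emits a double constant as two .long directives, low-order 4 bytes first (x86-64 is little-endian); for example, 1.8 appears as .long 3435973837 followed by .long 1073532108, that is 0xCCCCCCCD and 0x3FFCCCCC. The helper name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Rebuild a double from its two 32-bit halves (low half first in the .s file). */
double longs_to_double(uint32_t low, uint32_t high) {
    uint64_t bits = ((uint64_t)high << 32) | low;
    double d;
    memcpy(&d, &bits, sizeof d);   /* copy bytes, no numeric conversion */
    return d;
}
```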

The mnemonics of floating-point instructions (moves, comparisons, and so on) are long; look them up in the book as you encounter them.

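Comparing two floating-point values has four possible outcomes: less than, equal, greater than, and unordered (which occurs when either operand is NaN). C code that compares floating-point values should likewise account for all four cases.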
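The book's find_range shows how the unordered case surfaces in C: every ordered comparison involving NaN is false, so a NaN argument falls through all three tests to OTHER.

```c
#include <math.h>   /* NAN, handy for exercising the OTHER case */

typedef enum { NEG, ZERO, POS, OTHER } range_t;

range_t find_range(float x) {
    range_t result;
    if (x < 0)
        result = NEG;
    else if (x == 0)
        result = ZERO;
    else if (x > 0)
        result = POS;
    else
        result = OTHER;   /* reached only when x is NaN (unordered) */
    return result;
}
```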
