3.7 Procedures (Stack Frames and the Procedure Call Process)

你回到了你的家

Last modified 2022-06-16 16:03:44

Category: Computer Architecture. Tags: program execution

First published 2020-12-02 16:26:15

Article link: https://blog.csdn.net/kking_edc/article/details/110482013

Procedures are a key abstraction in software. They provide a way to package code that implements some functionality with a designated set of arguments and an optional return value. This function can then be invoked from different points in a program. Well-designed software uses procedures as an abstraction mechanism, hiding the detailed implementation of some action while providing a clear and concise interface definition of what values will be computed and what effects the procedure will have on the program state. Procedures come in many guises in different programming languages (functions, methods, subroutines, handlers, and so on), but they all share a general set of features.

There are many different attributes that must be handled when providing machine-level support for procedures. As an example, suppose procedure P calls procedure Q, and Q then executes and returns back to P. These actions involve one or more of the following mechanisms:

Passing control: The program counter must be set to the starting address of the code for Q upon entry and then set to the instruction in P following the call to Q upon return.
Passing data: P must be able to provide one or more parameters to Q, and Q must be able to return a value back to P.
Allocating and deallocating memory: Q may need to allocate space for local variables when it begins and then free that storage before it returns.

The x86-64 implementation of procedures involves a combination of special instructions and a set of conventions on how to use the machine resources, such as the registers and the program memory. Great effort has been made to minimize the overhead involved in invoking a procedure. As a consequence, it follows what can be seen as a minimalist strategy, implementing only as much of the above set of mechanisms as is required for each particular procedure. In our presentation, we build up the different mechanisms step by step, first describing control, then data passing, and, finally, memory management.
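The three mechanisms can be seen together in a very small C example. This is only an illustrative sketch: the bodies of P and Q below are hypothetical, and the comments about registers describe typical x86-64 behavior rather than guaranteed compiler output.

```c
/* Q: receives one parameter, uses a local variable, and returns a value. */
long Q(long n) {
    long local = n * 2;   /* local storage: usually a register, stack space if needed */
    return local + 1;     /* return value: conventionally handed back in %rax */
}

/* P: passes data to Q, transfers control to it, then resumes after the call. */
long P(long x) {
    long r = Q(x);        /* control goes to Q and later comes back to this point */
    return r + x;
}
```

With these definitions, P(5) evaluates Q(5) = 11 and returns 16.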

1 Run-Time Stack

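A key feature of the procedure-calling mechanism of C, and of most other languages, is that it can use the stack as its memory management discipline. Continuing the example of P calling Q: while Q is executing, P and every other procedure in the chain of calls leading to P are temporarily suspended. During that time, only Q needs to be able to allocate new storage for its local variables or to set up a call to another procedure. On the other hand, when Q returns, any local storage it has allocated can be freed. Therefore, a program can manage the storage required by its procedures using a stack, where the stack and the program registers store the information required for passing control and data, and for allocating memory. As P calls Q, control and data information are added to the stack. This information is deallocated when P returns.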

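The x86-64 stack grows toward lower addresses, and the stack pointer %rsp points to the top element of the stack. Data can be stored on and retrieved from the stack using the pushq and popq instructions. Space for data with no specified initial value can be allocated on the stack by simply decrementing the stack pointer by an appropriate amount. Similarly, space can be deallocated by incrementing the stack pointer.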
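The push, pop, allocate, and free behavior described here can be modeled with a toy stack in C. This is a minimal sketch under stated assumptions: the 64-byte array, the offset-based stack pointer, and all function names are made up for illustration and do not correspond to any real API.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

static unsigned char stack_mem[STACK_BYTES];  /* toy memory region standing in for the stack */
static size_t rsp = STACK_BYTES;              /* toy stack pointer: offset of the top element */

/* Like pushq: decrement the stack pointer by 8, then store the value there. */
void push64(uint64_t v) {
    rsp -= 8;
    memcpy(&stack_mem[rsp], &v, 8);
}

/* Like popq: read the value at the top, then increment the stack pointer by 8. */
uint64_t pop64(void) {
    uint64_t v;
    memcpy(&v, &stack_mem[rsp], 8);
    rsp += 8;
    return v;
}

/* Allocate n bytes of uninitialized space by moving the stack pointer down. */
void alloc_bytes(size_t n) { rsp -= n; }

/* Release n bytes by moving the stack pointer back up. */
void free_bytes(size_t n) { rsp += n; }

/* Report the current toy stack pointer. */
size_t current_rsp(void) { return rsp; }
```

In this model a push followed by a 16-byte allocation moves the stack pointer from 64 down to 40, and undoing both in reverse order brings it back to 64.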

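When an x86-64 procedure requires storage beyond what it can hold in registers, it allocates space on the stack. This region is referred to as the procedure's stack frame. Figure 3.25 shows the overall structure of the run-time stack, including its partitioning into stack frames, in its most general form.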

[Figure 3.25: overall structure of the run-time stack and its stack frames]
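The frame for the currently executing procedure is always at the top of the stack. When procedure P calls procedure Q, it pushes the return address onto the stack, indicating where within P the program should resume execution once Q returns. (The return address is considered part of P's stack frame, since it holds state relevant to P.) The code for Q allocates the space required for its stack frame by extending the current stack boundary. Within that space, it can save the values of registers, allocate space for local variables, and set up arguments for the procedures it calls. The stack frames for most procedures are of fixed size, allocated at the beginning of the procedure. Some procedures, however, require variable-size frames. This issue is discussed in Section 3.10.5. Procedure P can pass up to six integral values (i.e., pointers and integers) in registers, but if Q requires more arguments, these can be stored by P within its stack frame prior to the call.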

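In the interest of space and time efficiency, x86-64 procedures allocate only the portions of stack frames they require. For example, many procedures have six or fewer arguments, so all of their arguments can be passed in registers. Thus, parts of the stack frame diagrammed in Figure 3.25 may be omitted. Indeed, many functions do not even require a stack frame. This occurs when all of the local variables can be held in registers and the function does not call any other functions.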

2 Control Transfer

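Passing control from function P to function Q involves simply setting the program counter (PC) to the starting address of the code for Q. However, when it later comes time for Q to return, the processor must have some record of the code location where it should resume the execution of P. This information is recorded in x86-64 machines by invoking procedure Q with the instruction call Q. This instruction pushes an address A onto the stack and sets the PC to the beginning of Q. The pushed address A is referred to as the return address and is computed as the address of the instruction immediately following the call instruction. The counterpart instruction ret pops an address A off the stack and sets the PC to A.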

The general forms of the call and ret instructions are as follows:
[Figure: general forms of the call and ret instructions]
(These instructions are referred to as callq and retq in the disassembly outputs generated by the program objdump. The added suffix ‘q’ simply emphasizes that these are x86-64 versions of call and return instructions, not IA32. In x86-64 assembly code, both versions can be used interchangeably.)

The call instruction has a target indicating the address of the instruction where the called procedure starts. Like jumps, a call can be either direct or indirect. In assembly code, the target of a direct call is given as a label, while the target of an indirect call is given by * followed by an operand specifier using one of the formats described in Figure 3.3.
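At the C level, the direct/indirect distinction roughly corresponds to calling a function by name versus calling through a function pointer. The sketch below uses hypothetical function names, and the instructions mentioned in the comments are what a compiler typically emits without inlining, not a guarantee.

```c
/* A small function to serve as the call target. */
long twice(long x) { return 2 * x; }

/* Direct call: the target is named in the source, so the compiler
   can emit a call to a label, such as `call twice`. */
long call_direct(long x) { return twice(x); }

/* Indirect call: the target is a run-time value held in a pointer, so the
   compiler typically emits something like `call *%rax` (the register or
   memory operand actually used depends on the compiler). */
long call_indirect(long (*fp)(long), long x) { return fp(x); }
```

Both call_direct(21) and call_indirect(twice, 21) return 42; only the way the target address is supplied differs.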
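[Image]
The following are excerpts of the disassembled code for the functions multstore and main from Section 3.2.2:

Figure 3.26 illustrates the execution of the call and ret instructions for these two functions:

[Figure 3.26: execution of the call and ret instructions for multstore and main]
In this code, the call instruction at address 0x400563 in main calls the function multstore. This state is shown in Figure 3.26(a), where %rip is the program counter, holding the address of the instruction currently being executed, and %rsp is the stack pointer. The effect of the call is to push the return address 0x400568 onto the stack and to jump to the first instruction of multstore, at address 0x400540 (Figure 3.26(b)). Execution of multstore continues until it reaches the ret instruction at address 0x40054d. This instruction pops the value 0x400568 from the stack and jumps to that address, resuming the execution of main just after the call instruction (Figure 3.26(c)).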

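As a more detailed example, Figure 3.27(a) shows the disassembled code for the functions top and leaf, as well as the portion of main where top is called.

[Figure 3.27(a): disassembled code for leaf, top, and the call to top in main]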

Each instruction is identified by labels L1–L2 (in leaf), T1–T4 (in top), and M1–M2 in main. Part (b) of the figure shows a detailed trace of the code execution, in which main calls top(100), causing top to call leaf(95).

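[Figure 3.27(b): execution trace of the example code]

Function leaf returns 97 to top, which then returns 194 to main. The first three columns describe the instruction being executed, including the instruction label, the address, and the instruction type. The next four columns show the state of the program before the instruction is executed, including the contents of registers %rdi, %rax, and %rsp, as well as the value at the top of the stack.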

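Instruction L1 of leaf sets the return value in %rax to 97. Instruction L2 is the return instruction. It pops 0x40054e from the stack. By setting the PC to this popped value, control transfers back to instruction T3 of top.

Instruction T3 sets the return value in %rax to 194. Instruction T4 then returns: it pops 0x400560 from the stack and sets the PC to instruction M2 of main. At this point the program has completed the call to top and returned to main. The stack pointer has also been restored to 0x7fffffffe820, the value it had before the call to top.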
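The stack-pointer movement in this trace can be reproduced with a toy model of call and ret. This is a minimal sketch, not the real hardware behavior in full: the Machine struct and the do_call and do_ret helpers are hypothetical names, and only the return addresses (not other stack contents) are modeled.

```c
#include <stdint.h>

/* Toy machine state for tracing nested calls. */
typedef struct {
    uint64_t rsp;        /* stack pointer */
    uint64_t rip;        /* program counter */
    uint64_t slots[4];   /* return addresses pushed so far (toy stack contents) */
    int depth;           /* number of valid entries in slots */
} Machine;

/* call: push the return address (rsp drops by 8), then jump to the target. */
void do_call(Machine *m, uint64_t return_addr, uint64_t target) {
    m->rsp -= 8;
    m->slots[m->depth++] = return_addr;
    m->rip = target;
}

/* ret: pop the return address (rsp rises by 8) and jump to it. */
void do_ret(Machine *m) {
    m->rip = m->slots[--m->depth];
    m->rsp += 8;
}
```

Starting from %rsp = 0x7fffffffe820, the call to top leaves %rsp at 0x7fffffffe818 and the nested call to leaf leaves it at 0x7fffffffe810, because each pushed return address occupies 8 bytes. The two ret instructions undo this in reverse order, restoring 0x7fffffffe820.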

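(Two things above are unclear to me: why %rdi no longer has a value at L2, and why the value of %rsp goes from 820 to 818 to 810.)
[Image: practice problem]

Answer:

[Image: answer]

I am not very familiar with the imul instruction. I also have the same question as above: why does rdi have a value the whole time here?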

3 Data Transfer

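Procedure calls may involve passing data as arguments, and returning from a procedure may also involve returning a value. With x86-64, most of this data passing to and from procedures takes place via registers. For example, the earlier function examples have already made extensive use of registers such as %rdi and %rsi to pass arguments, with return values stored in register %rax. When procedure P calls procedure Q, the code for P must first copy the arguments into the proper registers. Similarly, when Q returns back to P, the code for P can access the returned value in register %rax. The details follow.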

With x86-64, up to six integral (i.e., integer and pointer) arguments can be passed via registers. The registers are used in a specified order, with the name used for a register depending on the size of the data type being passed. These are shown in Figure 3.28. Arguments are allocated to these registers according to their ordering in the argument list. Arguments smaller than 64 bits can be accessed using the appropriate subsection of the 64-bit register. For example, if the first argument is 32 bits, it can be accessed as %edi.
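The register assignment can be written down as a small lookup table. The following is a sketch for illustration: the register names follow the standard x86-64 convention for integral arguments, while the table and the arg_register helper are hypothetical names invented here.

```c
#include <stddef.h>

/* Register names for integral arguments 1 through 6.
   Rows: 8-byte, 4-byte, 2-byte, and 1-byte operands. */
static const char *const ARG_REGS[4][6] = {
    {"%rdi", "%rsi", "%rdx", "%rcx", "%r8",  "%r9"},
    {"%edi", "%esi", "%edx", "%ecx", "%r8d", "%r9d"},
    {"%di",  "%si",  "%dx",  "%cx",  "%r8w", "%r9w"},
    {"%dil", "%sil", "%dl",  "%cl",  "%r8b", "%r9b"},
};

/* Returns the register name for argument `index` (1-based) of `size` bytes,
   or NULL if the argument is not passed in a register or the size is unsupported. */
const char *arg_register(int index, int size) {
    int row;
    switch (size) {
        case 8: row = 0; break;
        case 4: row = 1; break;
        case 2: row = 2; break;
        case 1: row = 3; break;
        default: return NULL;
    }
    if (index < 1 || index > 6) return NULL;
    return ARG_REGS[row][index - 1];
}
```

For instance, arg_register(1, 4) yields "%edi", matching the 32-bit first-argument case mentioned above, and arg_register(7, 8) yields NULL because a seventh argument does not get a register.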

[Figure 3.28: registers for passing function arguments]
When a function has more than six integral arguments, the remaining ones are passed on the stack. Assume that procedure P calls procedure Q with n integral arguments, where n > 6. Then the code for P must allocate a stack frame with enough storage for arguments 7 through n, as illustrated in Figure 3.25. It copies arguments 1–6 into the appropriate registers and places arguments 7 through n on the stack, with argument 7 at the top of the stack. When passing parameters on the stack, all data sizes are rounded up to be multiples of eight. With the arguments in place, the program can then execute a call instruction to transfer control to procedure Q. Procedure Q can access its arguments via registers and, where necessary, from the stack. If Q, in turn, calls some function that has more than six arguments, it can allocate space within its stack frame for these, as is illustrated by the area labeled “Argument build area” in Figure 3.25.
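The layout of stack-passed arguments reduces to simple arithmetic. The sketch below assumes every stack-passed argument occupies one 8-byte slot (true for integral arguments after rounding) and measures offsets from %rsp at the moment Q starts executing, when the return address sits at offset 0. Both helper names are hypothetical.

```c
/* Round a size in bytes up to the next multiple of eight. */
long round_up_8(long size) { return (size + 7) / 8 * 8; }

/* Offset of integral argument k from %rsp at entry to the callee.
   The return address occupies bytes 0..7, so argument 7 is at offset 8,
   argument 8 at offset 16, and so on. Returns -1 for arguments 1..6,
   which are passed in registers instead. */
long stack_arg_offset(int k) {
    if (k < 7) return -1;
    return 8 + 8L * (k - 7);
}
```

Under these assumptions a 1-byte argument still takes 8 bytes on the stack, and arguments 7 and 8 are found at offsets 8 and 16.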

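[Image]

As an example of argument passing, consider the C function proc shown in Figure 3.29(a):
[Figure 3.29(a): C code for proc]
This function has eight arguments, including integers of different sizes (8, 4, 2, and 1 bytes) as well as different types of pointers, each of which is 8 bytes.

The assembly code generated for proc is shown in Figure 3.29(b):
[Figure 3.29(b): assembly code for proc]
The first six arguments are passed in registers. The last two are passed on the stack, as shown in Figure 3.30: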

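This figure shows the state of the stack during the execution of proc. We can see that the return address was pushed onto the stack as part of the procedure call. The two stack-passed arguments are therefore at offsets 8 and 16 relative to the stack pointer. Within the code, we can see that different versions of the add instruction are used according to the sizes of the operands: addq for long, addl for int, addw for short, and addb for char. Note that the movl instruction on line 6 reads 4 bytes from memory, while the following addb instruction uses only the low-order byte.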
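Since the figure itself is not reproduced here, the following C sketch is consistent with the description of proc: eight arguments, four integers of decreasing size, each paired with a pointer of the matching type. The parameter names are assumptions made for illustration.

```c
/* Adds each integer argument into the location named by the matching pointer.
   Arguments 1-6 (a1, a1p, a2, a2p, a3, a3p) fit in registers;
   arguments 7 and 8 (a4, a4p) are passed on the stack. */
void proc(long  a1, long  *a1p,
          int   a2, int   *a2p,
          short a3, short *a3p,
          char  a4, char  *a4p)
{
    *a1p += a1;   /* 8-byte add (addq) */
    *a2p += a2;   /* 4-byte add (addl) */
    *a3p += a3;   /* 2-byte add (addw) */
    *a4p += a4;   /* 1-byte add (addb) */
}
```

Each statement updates memory through a pointer, which is why the generated code needs one add instruction per operand size.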

[Image: practice problem]

Answer (I did not work through this problem):

[Image: answer]

4 Local Storage on the Stack

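Most of the procedure examples we have seen so far did not require any local storage beyond what could be held in registers. At times, however, local data must be stored in memory. Common cases of this include the following:

There are not enough registers to hold all of the local data.
The address operator & is applied to a local variable, and hence we must be able to generate an address for it.
Some of the local variables are arrays or structures and hence must be accessed by array or structure references. We will discuss this possibility when we describe how arrays and structures are allocated.

Typically, a procedure allocates space on the stack frame by decrementing the stack pointer. This results in the portion of the stack frame labeled “Local variables” in Figure 3.25.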
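[Image]
As an example of handling the address operator, consider the two functions shown in Figure 3.31(a):

The function swap_add swaps the two values designated by pointers xp and yp and also returns the sum of the two values. The calling function creates pointers to local variables arg1 and arg2 and passes these to swap_add. Figure 3.31(b) shows how the calling function uses a stack frame to implement these local variables.
[Figure 3.31(b): assembly code for the calling function]
The code begins by decrementing the stack pointer by 16, which allocates 16 bytes on the stack. Letting S denote the value of the stack pointer, we can see that the code computes &arg2 as S + 8 (line 5) and &arg1 as S (line 6). We can therefore infer that local variables arg1 and arg2 are stored within the stack frame at offsets 0 and 8 relative to the stack pointer. When the call to swap_add completes, the calling function retrieves the two values from the stack (lines 8–9), computes their difference, and multiplies this by the value returned by swap_add in register %rax (line 10). Finally, the function deallocates its stack frame by incrementing the stack pointer by 16 (line 11). We can see from this example that the run-time stack provides a simple mechanism for allocating local storage when it is required and deallocating it when the function completes.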
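The C code for this example is not reproduced above, so here is a sketch that matches the description. The name caller for the calling function and the initial values 534 and 1057 are assumptions chosen for illustration.

```c
/* Swaps the values at xp and yp and returns their sum. */
long swap_add(long *xp, long *yp) {
    long x = *xp;
    long y = *yp;
    *xp = y;
    *yp = x;
    return x + y;
}

/* Takes the addresses of two locals, so both must live in memory
   (on the stack) rather than only in registers. */
long caller(void) {
    long arg1 = 534;                      /* assumed sample value */
    long arg2 = 1057;                     /* assumed sample value */
    long sum = swap_add(&arg1, &arg2);    /* arg1 and arg2 are swapped here */
    long diff = arg1 - arg2;              /* computed from the swapped values */
    return sum * diff;
}
```

With these sample values, swap_add returns 1591, the swapped locals give a difference of 1057 - 534 = 523, and caller returns 1591 * 523 = 832093.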

As a more complex example, the function call_proc, shown in Figure 3.32, illustrates many aspects of the x86-64 stack discipline.

[Figure 3.32: code for call_proc]
It shows a function that must allocate storage on the stack for local variables, as well as to pass values to the 8-argument function proc (Figure 3.29).

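[Figure 3.29: code for proc]
The function call_proc creates the stack frame shown in Figure 3.33.

Figure 3.32(b) shows the assembly code for call_proc:

We can see that a large portion of the code (lines 2–15) involves preparing to call function proc. This includes setting up the stack frame for the local variables and function parameters, and loading function arguments into registers. As Figure 3.33 shows, local variables x1–x4 are allocated on the stack and have different sizes. Expressed as offsets relative to the stack pointer, they occupy bytes 24–31 (x1), 20–23 (x2), 18–19 (x3), and 17 (x4). Pointers to these locations are generated by leaq instructions (lines 7, 10, 12, and 14). Argument 7 (with value 4) and argument 8 (a pointer to the location of x4) are stored on the stack at offsets 0 and 8 relative to the stack pointer.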

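When proc is called, the program begins executing the code shown in Figure 3.29(b):
[Figure 3.29(b): assembly code for proc]
As shown in Figure 3.30, arguments 7 and 8 are now at offsets 8 and 16 relative to the stack pointer, because the return address was pushed onto the stack.

When the program returns to call_proc, the code retrieves the values of the four local variables (lines 17–20) and performs the final computation. It finishes by incrementing the stack pointer by 32 to deallocate the stack frame.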
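A C-level sketch of call_proc, consistent with the description above, is given below together with a matching proc so that it compiles on its own. The initial values 1, 2, and 3 for x1 through x3 and the exact form of the final expression are assumptions; the text only confirms that argument 7 has the value 4.

```c
/* Same shape as the 8-argument proc discussed earlier (parameter names assumed). */
void proc(long a1, long *a1p, int a2, int *a2p,
          short a3, short *a3p, char a4, char *a4p)
{
    *a1p += a1;
    *a2p += a2;
    *a3p += a3;
    *a4p += a4;
}

/* Every local has its address taken, so each needs a slot in the stack frame. */
long call_proc(void) {
    long  x1 = 1;   /* assumed initial value */
    int   x2 = 2;   /* assumed initial value */
    short x3 = 3;   /* assumed initial value */
    char  x4 = 4;   /* argument 7 is stated to have value 4 */
    proc(x1, &x1, x2, &x2, x3, &x3, x4, &x4);
    return (x1 + x2) * (x3 - x4);   /* assumed final computation */
}
```

Under these assumptions proc doubles each local (2, 4, 6, 8), so call_proc returns (2 + 4) * (6 - 8) = -12.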

5 Local Storage in Registers

The set of program registers acts as a single resource shared by all of the procedures. Although only one procedure can be active at a given time, we must make sure that when one procedure (the caller) calls another (the callee), the callee does not overwrite some register value that the caller planned to use later. For this reason, x86-64 adopts a uniform set of conventions for register usage that must be respected by all procedures, including those in program libraries.

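By convention, registers %rbx, %rbp, and %r12–%r15 are classified as callee-saved registers. When procedure P calls procedure Q, Q must preserve the values of these registers, ensuring that they have the same values when Q returns to P as they did when Q was called. Procedure Q can preserve a register value either by not changing it at all or by pushing the original value onto the stack, altering it, and then popping the old value from the stack before returning. The pushing of register values has the effect of creating the portion of the stack frame labeled “Saved registers” in Figure 3.25. With this convention, the code for P can safely store a value in a callee-saved register (after saving the previous value on the stack, of course), call Q, and then use the value in the register without risk of it having been corrupted.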

All other registers, except for the stack pointer %rsp, are classified as caller-saved registers. This means that they can be modified by any function. The name “caller saved” can be understood in the context of a procedure P having some local data in such a register and calling procedure Q. Since Q is free to alter this register, it is incumbent upon P (the caller) to first save the data before it makes the call.

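As an example, consider the function P shown in Figure 3.34(a):

[Figure 3.34(a): C code for P]
It calls Q twice. During the first call, it must retain the value of x for use later. Similarly, during the second call, it must retain the value computed for Q(y). Figure 3.34(b) shows the assembly code generated by gcc: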

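[Figure 3.34(b): assembly code generated for P]

In Figure 3.34(b), we can see that the code generated by gcc uses two callee-saved registers: %rbp to hold x and %rbx to hold the computed value of Q(y). At the beginning of the function, it saves the values of these two registers on the stack (lines 2–3). It copies argument x to %rbp before the first call to Q (line 5). It then copies the result of that first call to %rbx before the second call to Q (line 8). At the end of the function (lines 13–14), it restores the values of the two callee-saved registers by popping them off the stack.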
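A C function with this shape is sketched below. The body of P follows the description (Q is called on y first, then on x), while the body of Q is a hypothetical stand-in so that the example is self-contained.

```c
/* Hypothetical stand-in for Q; any function of one long would do. */
long Q(long v) { return v * v; }

/* x must survive the first call, and the result of Q(y) must survive the
   second call. A compiler can keep both in callee-saved registers
   (for example %rbp and %rbx), saving the old register contents on entry
   and restoring them before returning. */
long P(long x, long y) {
    long u = Q(y);   /* x has to be kept somewhere Q will not clobber */
    long v = Q(x);   /* u has to be kept somewhere Q will not clobber */
    return u + v;
}
```

With this stand-in Q, P(2, 3) returns 9 + 4 = 13. The register choices in the comment describe one plausible compilation, not the only one.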

[Image]

6 Recursive Procedures

To be added (281)
