x86_64 function calling conventions

Registers galore

x86 has just 8 general-purpose registers available (eax, ebx, ecx, edx, ebp, esp, esi, edi). x64 extended them to 64 bits (prefix "r" instead of "e") and added another 8 (r8, r9, r10, r11, r12, r13, r14, r15). Since some of x86's registers have special implicit meanings and aren't really used as general-purpose (most notably ebp and esp), the effective increase is even larger than it seems.

There's a reason I'm mentioning this in an article focused on stack frames. The relatively large number of available registers influenced some important design decisions for the ABI, such as passing many arguments in registers, thus rendering the stack less useful than before [2].

Argument passing

I'm going to simplify the discussion here on purpose and focus on integer/pointer arguments [3]. According to the ABI, the first 6 integer or pointer arguments to a function are passed in registers. The first is placed in rdi, the second in rsi, the third in rdx, and then rcx, r8 and r9. Only the 7th argument onwards is passed on the stack.
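To make the register assignment concrete, here is a minimal sketch in C. The function name sum8 is hypothetical, and the annotations in the comments only restate the ABI rule described above; once inside the function, the compiler is free to move the values wherever it likes.

```c
/* sum8 is a hypothetical example. The comments restate where the
 * AMD64 ABI places each integer argument at the point of the call. */
long sum8(long a,  /* 1st: rdi */
          long b,  /* 2nd: rsi */
          long c,  /* 3rd: rdx */
          long d,  /* 4th: rcx */
          long e,  /* 5th: r8  */
          long f,  /* 6th: r9  */
          long g,  /* 7th: passed on the stack */
          long h)  /* 8th: passed on the stack */
{
    return a + b + c + d + e + f + g + h;
}
```

For a call such as sum8(1, 2, 3, 4, 5, 6, 7, 8), the caller would load the first six values into the registers listed and store 7 and 8 on the stack before executing the call instruction.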

The stack frame

With the above in mind, let's see how the stack frame for this C function looks:

long myfunc(long a, long b, long c, long d,
            long e, long f, long g, long h)
{
    long xx = a * b * c * d * e * f * g * h;
    long yy = a + b + c + d + e + f + g + h;
    long zz = utilfunc(xx, yy, xx % yy);
    return zz + 20;
}

This is the stack frame:

So the first 6 arguments are passed via registers. But other than that, this doesn't look very different from what happens on x86 [4], except this strange "red zone". What is that all about?

The red zone

First I'll quote the formal definition from the AMD64 ABI:

The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.

Put simply, the red zone is an optimization. Code can assume that the 128 bytes below rsp will not be asynchronously clobbered by signals or interrupt handlers, and thus can use it for scratch data, without explicitly moving the stack pointer. That last part is where the optimization lies - decrementing rsp and restoring it are two instructions that can be saved when using the red zone for data.

However, keep in mind that the red zone will be clobbered by function calls, so it's usually most useful in leaf functions (functions that call no other functions).

Recall how myfunc in the code sample above calls another function named utilfunc. This was done on purpose, to make myfunc non-leaf and thus prevent the compiler from applying the red zone optimization. Looking at the code of utilfunc:

long utilfunc(long a, long b, long c)
{
    long xx = a + 2;
    long yy = b + 3;
    long zz = c + 4;
    long sum = xx + yy + zz;

    return xx * yy * zz + sum;
}

This is indeed a leaf function. Let's see how its stack frame looks when compiled with gcc:

Because utilfunc has only 3 arguments, calling it requires no stack usage: all the arguments fit into registers. In addition, since it's a leaf function, gcc chooses to use the red zone for all its local variables. Thus, rsp need not be decremented (and later restored) to allocate space for this data.
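One way to observe the red zone in practice, sketched here under the assumption of gcc targeting x86-64 Linux, is to compile a small leaf function twice, once normally and once with gcc's -mno-red-zone flag, and compare the assembly produced by -S. The function leaf_product and the file name leaf.c are hypothetical, and the exact output depends on the gcc version and optimization level (with optimizations enabled, locals this small usually stay in registers and never touch memory at all).

```c
/* leaf_product is a hypothetical leaf function: it calls nothing else,
 * so the compiler may keep its locals in the red zone.
 *
 * Assumed workflow (gcc, x86-64, optimizations off):
 *   gcc -O0 -S leaf.c                 locals stored below rsp, typically
 *                                     with no "sub ..., %rsp" in the prologue
 *   gcc -O0 -mno-red-zone -S leaf.c   the prologue should now adjust rsp
 *                                     explicitly to make room for the locals
 */
long leaf_product(long a, long b)
{
    long x = a + 1;   /* two 8-byte temporaries, far below the 128-byte limit */
    long y = b + 2;
    return x * y;
}
```

The -mno-red-zone flag is commonly used for kernel code, where interrupt handlers run on the same stack and the guarantee quoted above does not hold.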

Preserving the base pointer

The base pointer rbp (and its predecessor ebp on x86), being a stable "anchor" to the beginning of the stack frame throughout the execution of a function, is very convenient for manual assembly coding and for debugging [5]. However, some time ago it was noticed that compiler-generated code doesn't really need it (the compiler can easily keep track of offsets from rsp), and the DWARF debugging format provides means (CFI) to access stack frames without the base pointer.

This is why some compilers started omitting the base pointer for aggressive optimizations, thus shortening the function prologue and epilogue, and providing an additional register for general-purpose use (which, recall, is quite useful on x86 with its limited set of GPRs).

gcc keeps the base pointer by default on x86, but allows the optimization with the -fomit-frame-pointer compilation flag. Whether using this flag is advisable is a debated issue - you may do some googling if this interests you.

Anyhow, one other "novelty" the AMD64 ABI introduced is making the base pointer explicitly optional, stating:

The conventional use of %rbp as a frame pointer for the stack frame may be avoided by using %rsp (the stack pointer) to index into the stack frame. This technique saves two instructions in the prologue and epilogue and makes one additional general-purpose register (%rbp) available.

gcc adheres to this recommendation and by default omits the frame pointer on x64, when compiling with optimizations. It gives an option to preserve it by providing the -fno-omit-frame-pointer flag. For clarity's sake, the stack frames shown above were produced without omitting the frame pointer.
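The effect of these flags can be checked with a sketch like the following, again assuming gcc on x86-64. The function add3 and the file name fp.c are hypothetical; the instructions mentioned in the comment are the conventional frame pointer prologue and epilogue, and the actual output may differ between compiler versions.

```c
/* add3 is a hypothetical function used only to compare compiler output.
 *
 * Assumed workflow (gcc, x86-64):
 *   gcc -O2 -S fp.c                          frame pointer typically omitted,
 *                                            rbp is not touched
 *   gcc -O2 -fno-omit-frame-pointer -S fp.c  expect the conventional
 *                                            prologue and epilogue:
 *       push %rbp
 *       mov  %rsp, %rbp
 *       ...
 *       pop  %rbp
 */
long add3(long a, long b, long c)
{
    return a + b + c;
}
```

The push/mov pair in the prologue is the "two instructions" the ABI quote refers to; dropping them also frees rbp for general-purpose use.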

The Windows x64 ABI

Windows on x64 implements an ABI of its own, which is somewhat different from the AMD64 ABI. I will only discuss the Windows x64 ABI briefly, mentioning how its stack frame layout differs from AMD64. These are the main differences:

  1. Only 4 integer/pointer arguments are passed in registers (rcx, rdx, r8, r9).
  2. There is no concept of "red zone" whatsoever. In fact, the ABI explicitly states that the area beyond rsp is considered volatile and unsafe to use. The OS, debuggers or interrupt handlers may overwrite this area.
  3. Instead, a "register parameter area" [6] is provided by the caller in each stack frame. When a function is called, the last thing allocated on the stack before the return address is space for at least 4 registers (8 bytes each). This area is available for the callee's use without explicitly allocating it. It's useful for variable argument functions as well as for debugging (providing known locations for parameters, while registers may be reused for other purposes). Although the area was originally conceived for spilling the 4 arguments passed in registers, these days the compiler uses it for other optimization purposes as well (for example, if the function needs less than 32 bytes of stack space for its local variables, this area may be used without touching rsp).
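The difference in register assignment between the two ABIs can be summarized in a small lookup, sketched below. The helper int_arg_location, the enum names and the constant are all hypothetical; the tables only restate the register orders given in this article, and the helper assumes every argument is an integer or pointer.

```c
#include <stddef.h>

/* Hypothetical names for the two ABIs discussed in the text. */
enum abi { ABI_AMD64, ABI_WINDOWS_X64 };

/* Hypothetical helper: names the register carrying the n-th (0-based)
 * integer/pointer argument, or "stack" once the registers run out.
 * Assumes all arguments are integers or pointers. */
const char *int_arg_location(enum abi abi, size_t n)
{
    static const char *const amd64[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    static const char *const win64[] = { "rcx", "rdx", "r8", "r9" };

    if (abi == ABI_AMD64)
        return n < sizeof amd64 / sizeof amd64[0] ? amd64[n] : "stack";
    return n < sizeof win64 / sizeof win64[0] ? win64[n] : "stack";
}

/* Minimum size of the register parameter area the caller reserves under
 * the Windows x64 ABI: space for 4 registers of 8 bytes each. */
enum { WIN64_MIN_REGISTER_PARAM_AREA = 4 * 8 };
```

For example, int_arg_location(ABI_AMD64, 4) yields "r8", while int_arg_location(ABI_WINDOWS_X64, 4) yields "stack", since the fifth argument no longer fits in a register on Windows.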

Another important change that was made in the Windows x64 ABI is the cleanup of calling conventions. No more cdecl/stdcall/fastcall/thiscall/register/safecall madness - just a single "x64 calling convention". Cheers to that!

For more information on this and other aspects of the Windows x64 ABI, here are some good links:

