Linux Kernel Function Calling Convention

Latest recommended article published on 2022-04-21 14:08:25

papaofdoudou


Original link: http://blog.bytemem.com/post/linux-kernel-function-call-convention


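For C, compilers define several function calling conventions, and each convention is implemented differently on each architecture. GCC supports many of them; the common ones are cdecl, fastcall, thiscall, stdcall, and interrupt. The default for C is cdecl, and it is also what kernel code uses, although the kernel defines its own ABI for interrupts and system calls. (One caveat: the 32-bit x86 kernel is built with -mregparm=3, so its first three integer arguments travel in EAX, EDX, and ECX rather than on the stack.)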

The GCC cdecl convention on x86 in detail

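cdecl is a caller clean-up convention: the caller removes the arguments from the stack after the call returns. When a subroutine (callee) is called, the x87 floating-point registers ST0-ST7 must be empty; when it returns, ST1-ST7 must be empty, and ST0 must also be empty unless it holds a floating-point return value. Since GCC 4.5, the stack must be 16-byte aligned at the point of a function call; earlier versions required only 4-byte alignment.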

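Saving register state:

x86-32: EAX, ECX, and EDX are caller-saved; the callee may change them without restoring them. All other registers are callee-saved.
x86-64: RBX, RBP, and R12–R15 are saved and restored by the callee (RSP is likewise preserved across a call); all other registers are caller-saved.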

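Function return values:

x86-32: integers are returned in EAX, floating-point values in the x87 register ST0.
x86-64: 64-bit values are returned in RAX, and 128-bit values in RAX (low half) and RDX (high half). Floating-point values are returned in XMM0 and, where a second register is needed, XMM1.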
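As a rough illustration of the x86-64 return rules, the sketch below returns a 64-bit integer, a 16-byte struct of two integers, and a double. The names (`struct pair128`, `ret_u64`, `ret_pair`, `ret_double`, `demo_returns`) are made up for this example, and the register comments assume GCC targeting the System V AMD64 ABI; other targets place these values differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte aggregate: two 64-bit integer halves. */
struct pair128 {
    uint64_t lo;
    uint64_t hi;
};

/* 64-bit integer result: returned in RAX (assuming System V AMD64). */
uint64_t ret_u64(uint64_t x)
{
    return x + 1;
}

/* 16-byte integer aggregate: low half in RAX, high half in RDX. */
struct pair128 ret_pair(uint64_t lo, uint64_t hi)
{
    struct pair128 p = { lo, hi };
    return p;
}

/* Floating-point result: returned in XMM0. */
double ret_double(double x)
{
    return x * 2.0;
}

/* Small self-check that exercises the three helpers. */
void demo_returns(void)
{
    struct pair128 p = ret_pair(1, 2);

    assert(ret_u64(41) == 42);
    assert(p.lo == 1 && p.hi == 2);
    assert(ret_double(1.5) == 3.0);
}
```

Compiling this with `gcc -O2 -S` and reading the generated assembly is one way to see which registers carry each result on a given toolchain.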

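Argument passing differs between x86-32 and x86-64:

x86-32: all arguments are passed on the stack and pushed right to left, so the last argument is pushed first and the first argument last.
x86-64: the AMD64 architecture provides more registers, and the compiler uses them for arguments. The first six integer arguments are passed in RDI, RSI, RDX, RCX, R8, and R9, in that order (R10 is used as a static chain pointer in case of nested functions); a function with a single integer argument receives it in RDI. Floating-point arguments are passed in XMM0 through XMM7, in order. Any further arguments are passed on the stack. For variadic functions, the caller also places in AL (the low byte of RAX) an upper bound on the number of vector registers used for floating-point arguments.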
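The x86-64 argument rules can be sketched in C as follows. The function names (`sum7`, `scale_add`, `sum_doubles`, `demo_args`) are hypothetical, and the register assignments in the comments assume GCC on the System V AMD64 ABI; the C code itself behaves the same on any target.

```c
#include <assert.h>
#include <stdarg.h>

/* Seven integer arguments: a..f are expected in RDI, RSI, RDX, RCX, R8, R9,
 * and g, the seventh, is passed on the stack. */
long sum7(long a, long b, long c, long d, long e, long f, long g)
{
    return a + b + c + d + e + f + g;
}

/* Mixed classes: n is expected in RDI, x in XMM0 and y in XMM1. Integer and
 * floating-point arguments are numbered independently. */
double scale_add(long n, double x, double y)
{
    return (double)n * x + y;
}

/* Variadic: the caller additionally sets AL to an upper bound on the number
 * of vector registers used, so the callee knows whether XMM0-XMM7 need to
 * be saved for va_arg. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}

/* Small self-check that exercises the three helpers. */
void demo_args(void)
{
    assert(sum7(1, 2, 3, 4, 5, 6, 7) == 28);
    assert(scale_add(2, 1.5, 0.5) == 3.5);
    assert(sum_doubles(3, 1.0, 2.0, 3.0) == 6.0);
}
```

In the disassembly of a caller of `sum_doubles`, a `mov $0x3,%eax` (or similar) just before the call is the AL setup described above.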

cdecl example

On x86-32, a cdecl call looks like this:

int callee(int, int, int);

int caller(void)
{
    return callee(1, 2, 3) + 5;
}

The corresponding assembly:

caller:
        ; make new call frame (some compilers may produce an 'enter' instruction instead)
        push    ebp       ; save old call frame
        mov     ebp, esp  ; initialize new call frame
        ; push call arguments, in reverse (some compilers may subtract the required space from the
        ; stack pointer, then write each argument directly, see below. The 'enter' instruction can also do something similar)
        ; sub esp, 12 ; 'enter' instruction could do this for us
        ; mov [ebp-12], 3 ; or mov [esp+8], 3
        ; mov [ebp-8], 2  ; or mov [esp+4], 2
        ; mov [ebp-4], 1  ; or mov [esp], 1
        push    3
        push    2
        push    1
        call    callee    ; call subroutine 'callee'
        add     eax, 5    ; modify subroutine result (eax is the return value for our function as well as the callee,
                          ; so we don't have to move it into a local variable)
        ; restore old call frame (some compilers may produce a 'leave' instruction instead)
        ; add   esp, 12   ; remove arguments from frame, ebp - esp = 12.
                          ; compilers will usually produce the following instead, which is just as fast,
                          ; and, unlike the add instruction, also works for variable length arguments
                          ; and variable length arrays allocated on the stack.
        mov     esp, ebp  ; most calling conventions dictate ebp be callee-saved,
                          ; i.e. it's preserved after calling the callee.
                          ; it therefore still points to the start of our stack frame.
                          ; we do need to make sure callee doesn't modify (or restores) ebp, though,
                          ; so we need to make sure it uses a calling convention which does this
        pop     ebp       ; restore old call frame
        ret               ; return


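Notes

1. Because of the compiler's auto-inline optimization, some static functions are optimized away: their code is placed directly in the caller, and no call remains.

2. Even when a function is not inlined, GCC may turn a call to it into a jump (jmp) instruction instead of callq (-foptimize-sibling-calls). The compiler still keeps the function's symbol, so its code has a definite start address, but from the caller's side it no longer behaves like a separate subroutine: no return address is pushed, and it runs as the tail of its parent, much like inlined code. Below is the code for my_main() calling xxx(), where xxx() is marked __attribute__((noinline)) to prevent inlining: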

int __attribute__((noinline)) xxx(int a)
{
    return a+3;
}

int my_main(void)
{
    return xxx(3);
}

Assembly:

0000000000000000 <xxx.constprop.0>:
	   0:   b8 06 00 00 00          mov    $0x6,%eax
	   5:   c3                      retq
	   6:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
	   d:   00 00 00

0000000000000010 <xxx>:
	  10:   8d 47 03                lea    0x3(%rdi),%eax
	  13:   c3                      retq
	  14:   66 90                   xchg   %ax,%ax
	  16:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
	  1d:   00 00 00

0000000000000020 <my_main>:
	  20:   eb de                   jmp    0 <xxx.constprop.0>


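As shown, the call to xxx() has become a jmp, so the retq at the end of the callee returns straight to the caller of my_main(); the code effectively runs as part of my_main(). Note that xxx() was compiled into two bodies, xxx and xxx.constprop.0. The .constprop suffix marks a clone produced by GCC's interprocedural constant propagation: it is specialized for the constant argument 3, which is why it simply returns 6, and it is this clone that my_main() jumps to.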
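If the goal is to see my_main() reach the out-of-line xxx() itself rather than a specialized clone, one option is to add GCC's `noclone` attribute next to `noinline`. The sketch below is a variation on the example above, not the original author's code; the `NO_INLINE_NO_CLONE` macro and `demo_noclone` are hypothetical names, and the fallback branch is an assumption made so the example also builds with compilers that do not know `noclone`.

```c
#include <assert.h>

/* noclone is a GCC attribute. Other compilers may not recognize it, so this
 * sketch falls back to plain noinline there (an assumption for portability). */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_INLINE_NO_CLONE __attribute__((noinline, noclone))
#else
#define NO_INLINE_NO_CLONE __attribute__((noinline))
#endif

/* With noclone, GCC should not emit an xxx.constprop.N copy of this function. */
NO_INLINE_NO_CLONE int xxx(int a)
{
    return a + 3;
}

int my_main(void)
{
    /* Still a candidate for sibling-call optimization (jmp instead of call),
     * but the target should now be xxx itself. */
    return xxx(3);
}

/* Small self-check: the result is the same with or without cloning. */
void demo_noclone(void)
{
    assert(my_main() == 6);
    assert(xxx(0) == 3);
}
```

To keep the callq as well, GCC's -fno-optimize-sibling-calls turns the sibling-call optimization off for the whole translation unit.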

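3. Not every function needs to push ebp and other registers. There are several cases:

1) The function has no local variables, or they have all been optimized away. (With optimization disabled, GCC always emits the standard frame setup.)

2) The first part of the function does not use local variables, so saving ebp and the related setup is deferred until locals are actually needed.

3) Sometimes every function begins with an odd-looking callq that targets its own next instruction, and the normal function body follows. This is code inserted by GCC's profiling option (-pg); after final linking, the callq actually calls mcount or __fentry__. (Note: at compile time, every call to an external function is emitted as a callq to the address of the instruction that follows it, that is, with a zero offset field, and the object file records a relocation for the call. At link time the linker patches the callq so that it points at the real address of the external function. So this peculiar callq shows up only when disassembling a .o file; in the linked program it is an ordinary callq.)

The following objdump -d fragment shows regulator_resolve_supply() calling regulator_dev_lookup(), with inlining of regulator_dev_lookup() disabled.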

0000000000003e50 <regulator_dev_lookup.isra.15>:
	    3e50:       e8 00 00 00 00          callq  3e55 <regulator_dev_lookup.isra.15+0x5>
	    3e55:       31 ff                   xor    %edi,%edi
	    3e57:       e9 00 00 00 00          jmpq   3e5c <regulator_dev_lookup.isra.15+0xc>
	    3e5c:       0f 1f 40 00             nopl   0x0(%rax)

00000000000056a0 <regulator_resolve_supply>:
	    56a0:       e8 00 00 00 00          callq  56a5 <regulator_resolve_supply+0x5>
	    56a5:       48 83 bf 58 06 00 00    cmpq   $0x0,0x658(%rdi)
	    56ac:       00
	    56ad:       74 0a                   je     56b9 <regulator_resolve_supply+0x19>
	    56af:       48 83 bf 50 06 00 00    cmpq   $0x0,0x650(%rdi)
	    ...
	    56c7:       4c 8b a7 88 01 00 00    mov    0x188(%rdi),%r12
	    56ce:       e8 7d e7 ff ff          callq  3e50 <regulator_dev_lookup.isra.15>
	    56d3:       48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
	    56d9:       48 89 c5                mov    %rax,%rbp


The isra suffix is added by the compiler's -fipa-sra optimization: Perform interprocedural scalar replacement of aggregates, removal of unused parameters and replacement of parameters passed by reference by parameters passed by value. Enabled at levels -O2, -O3 and -Os.
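To make the -fipa-sra description more concrete, here is a minimal sketch of the kind of function it can rewrite. Everything in it (`struct device_info`, `id_is_even`, `id_is_even_by_value`, `check_device`, `demo_isra`) is hypothetical and unrelated to the regulator code above; whether GCC actually emits an `.isra.N` clone depends on the compiler version and optimization level.

```c
#include <assert.h>

/* Hypothetical aggregate; the helper below needs only one of its fields. */
struct device_info {
    char name[32];
    int id;
    long flags;
};

/* As written: receives a pointer to the whole struct but reads only ->id.
 * With -fipa-sra, GCC may emit a clone (suffix .isra.N) that receives the
 * int by value instead of the pointer. */
static __attribute__((noinline)) int id_is_even(const struct device_info *info)
{
    return (info->id % 2) == 0;
}

/* Roughly what such a clone amounts to, written out by hand. */
static __attribute__((noinline)) int id_is_even_by_value(int id)
{
    return (id % 2) == 0;
}

/* Both forms must agree for any input. */
int check_device(const struct device_info *info)
{
    return id_is_even(info) == id_is_even_by_value(info->id);
}

/* Small self-check that exercises both forms. */
void demo_isra(void)
{
    struct device_info d = { "regulator0", 4, 0 };

    assert(id_is_even(&d));
    assert(check_device(&d));
    d.id = 7;
    assert(!id_is_even(&d));
    assert(check_device(&d));
}
```

Because the clone's signature differs from the source-level one, a backtrace or kprobe on a `.isra` symbol may show arguments that do not match the C prototype.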

End.
