gcc对if里面调用的函数
We know compilers like gcc can do lots smart optimization to make the program run faster. Regarding functions call optimization, gcc can do tail-call elimination to save the cost of allocating a new stack frame, and tail recursion elimination to turn a recursive function to non-recursive iterative one. gcc can even transform some recursive functions that are not tail-recursive into a tail-recursive one so that the compiler can then do tail recursion elimination. But what will happen if a function can not be optimized using tail recursion elimination because of some non-safe operations inside of the function body? In this post, we analyze one example.
我们知道像gcc这样的编译器可以做很多聪明的优化,以使程序运行更快 。 关于函数调用优化,gcc可以执行尾部调用消除以节省分配新堆栈帧的成本,而尾部消除可以将递归函数转换为非递归迭代函数。 gcc甚至可以将一些不是尾递归的递归函数转换为尾递归函数,以便编译器可以执行尾递归消除。 但是,如果由于函数体内的某些非安全操作而无法使用尾部递归消除来优化函数,将会发生什么? 在这篇文章中,我们分析一个例子。
我们分析的C程序 (The C program we analyze)
The prog.c
program we analyze is as follows.
我们分析的prog.c
程序如下。
#include <stdio.h>
void RecursiveFunction( int recursion_times )
{
printf("stack: %p\n", (void*)&recursion_times);
if (recursion_times <= 1) return;
return RecursiveFunction( --recursion_times );
}
int main(int argc, char* args[])
{
RecursiveFunction(4);
return 0;
}
We pass the &recursion_times
into another function which may change its value. C/C++ require each variable, including multiple instances of the same variable in recursive calls, to have distinct locations. The number of recursion_times
variables are only known during run time. So tail recursion elimination should not be done simply here by the compiler although RecursiveFunction
is tail recursive. So will the compiler just stop here and do nothing to optimize it? Let’s see by running it with different optimization levels.
我们将&recursion_times
传递到另一个函数中,该函数可能会更改其值。 C / C ++要求每个变量(包括递归调用中相同变量的多个实例) 具有不同的位置 。 recursion_times
变量的数量仅在运行时知道。 因此,尽管RecursiveFunction
是尾递归,编译器不应在此处简单地完成尾递归消除。 那么,编译器会停在这里并且不对其进行优化吗? 让我们以不同的优化级别运行它。
执行结果 (The execution results)
We use a script run.sh
to try different cases and disassemble the executable binary files generated by gcc.
我们使用脚本run.sh
尝试不同的情况,并反汇编gcc生成的可执行二进制 文件 。
#!/bin/bash
set -o errexit
for opt in 0 1 2 s 3 ; do
echo "Begin -O$opt"
gcc -fno-stack-protector -O$opt prog.c -o prog.$opt
./prog.$opt
objdump -d ./prog.$opt > prog.$opt.as
# gcc -fno-stack-protector -O$opt -Wa,-adhln -g prog.c > prog.$opt.list
echo "End -O$opt"
done
The -fno-stack-protector
option tells gcc not to generate stack protection code so that assembly code is cleaner to read.
-fno-stack-protector
选项告诉gcc不要生成堆栈保护代码,以便汇编代码更易于阅读。
I tried it on Ubuntu 18.04 with gcc 7.5.0 (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
) and the results are as follows.
我在Ubuntu 18.04上使用gcc 7.5.0( gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
)进行了gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
,结果如下。
$ ./run.sh
Begin -O0
stack: 0x7fff3690668c
stack: 0x7fff3690666c
stack: 0x7fff3690664c
stack: 0x7fff3690662c
End -O0
Begin -O1
stack: 0x7ffde5213d4c
stack: 0x7ffde5213d2c
stack: 0x7ffde5213d0c
stack: 0x7ffde5213cec
End -O1
Begin -O2
stack: 0x7ffd6a09587c
stack: 0x7ffd6a09585c
stack: 0x7ffd6a09583c
stack: 0x7ffd6a09581c
End -O2
Begin -Os
stack: 0x7ffef5b6ce9c
stack: 0x7ffef5b6ce7c
stack: 0x7ffef5b6ce5c
stack: 0x7ffef5b6ce3c
End -Os
Begin -O3
stack: 0x7ffd414fb53c
stack: 0x7ffd414fb54c
stack: 0x7ffd414fb50c
stack: 0x7ffd414fb51c
End -O3
For optimization levels 0,1,2 and s, the results are consistent. For each function calls, the stack frame increases (address decreases) by 0x20
bytes. The interesting one is the result when gcc optimization level is 3. The stack frame decreases by 0x10
bytes and then increases by 0x40
bytes. Why does the stack frame wigwag? Let’s take a look at the assembly code for the RecursiveFunction
.
对于0,1,2和s的优化级别,结果是一致的。 对于每个函数调用,堆栈帧都会增加(地址减少) 0x20
字节。 有趣的是gcc优化级别为3时的结果。堆栈帧减少0x10
字节,然后增加0x40
字节。 为什么堆栈框架棚架? 让我们看一下RecursiveFunction
的汇编代码。
优化级别0生成的代码 (The generated code at optimization level 0)
Let’s first take a look at the generated code in prog.0.as
for optimization level 0. It is quite straightforward and the flow follows the C code’s order. I added annotations.
首先让我们看一下prog.0.as
针对优化级别0生成的代码。这非常简单,流程遵循C代码的顺序。 我添加了注释。
000000000000064a <RecursiveFunction>:
# save old stack frame base in stack, use 0x8 bytes on stack
64a: 55 push %rbp
# new stack frame base at old stack pointer
64b: 48 89 e5 mov %rsp,%rbp
# allocate 0x10 for the new stack
64e: 48 83 ec 10 sub $0x10,%rsp
# store recursion_times into stack
652: 89 7d fc mov %edi,-0x4(%rbp)
# get &recursion_times
655: 48 8d 45 fc lea -0x4(%rbp),%rax
# call printf()
659: 48 89 c6 mov %rax,%rsi
65c: 48 8d 3d d1 00 00 00 lea 0xd1(%rip),%rdi # 734 <_IO_stdin_used+0x4>
663: b8 00 00 00 00 mov $0x0,%eax
668: e8 b3 fe ff ff callq 520 <printf@plt>
# get recursion_times to %eas
66d: 8b 45 fc mov -0x4(%rbp),%eax
# if (resursion_times <= 1) return
670: 83 f8 01 cmp $0x1,%eax
673: 7e 15 jle 68a <RecursiveFunction+0x40>
# do --resursion_times
675: 8b 45 fc mov -0x4(%rbp),%eax
678: 83 e8 01 sub $0x1,%eax
67b: 89 45 fc mov %eax,-0x4(%rbp)
# Prepare arguments and call RecursiveFunction
67e: 8b 45 fc mov -0x4(%rbp),%eax
681: 89 c7 mov %eax,%edi
# callq will store the returning address on stack, using 0x8 bytes
683: e8 c2 ff ff ff callq 64a <RecursiveFunction>
688: eb 01 jmp 68b <RecursiveFunction+0x41>
68a: 90 nop
68b: c9 leaveq
68c: c3 retq
The 0x20
-byte stack frame consists of 0x8 bytes for storing old %rbp
(at 64a), 0x10 bytes for this function’s usage (at 64e), and the 0x8 bytes used by callq
at 683. There is no surprise.
0x20
字节的堆栈帧由0x8字节(用于存储旧的%rbp
(在64a),0x10字节(用于此功能的使用)(在64e)和callq
在683处使用的0x8字节callq
。这并不奇怪。
优化级别3生成的代码 (The generated code at optimization level 3)
Now let’s look at the generated code by gcc under optimization level 3. Annotations are added too.
现在,让我们看看优化级别3下gcc生成的代码。还添加了注释。
0000000000000690 <RecursiveFunction>:
# allocate a stack frame of 0x28 bytes
690: 48 83 ec 28 sub $0x28,%rsp
694: 48 8d 35 e9 00 00 00 lea 0xe9(%rip),%rsi # 784 <_IO_stdin_used+0x4>
69b: 31 c0 xor %eax,%eax
69d: 48 8d 54 24 0c lea 0xc(%rsp),%rdx
# store recursion_times into stack at 0xc(%rsp)
6a2: 89 7c 24 0c mov %edi,0xc(%rsp)
6a6: bf 01 00 00 00 mov $0x1,%edi
# call __printf_chk()
6ab: e8 90 fe ff ff callq 540 <__printf_chk@plt>
# if recursion_times > 1, goto 6c0
# so, if recursion_times <= 1, return
6b0: 8b 44 24 0c mov 0xc(%rsp),%eax
6b4: 83 f8 01 cmp $0x1,%eax
6b7: 7f 07 jg 6c0 <RecursiveFunction+0x30>
6b9: 48 83 c4 28 add $0x28,%rsp
6bd: c3 retq
6be: 66 90 xchg %ax,%ax
# --recursion_times
6c0: 83 e8 01 sub $0x1,%eax
# call __printf_chk()
6c3: 48 8d 54 24 1c lea 0x1c(%rsp),%rdx
6c8: 48 8d 35 b5 00 00 00 lea 0xb5(%rip),%rsi # 784 <_IO_stdin_used+0x4>
# store recursion_times to stack at 0xc(%rsp)
6cf: 89 44 24 0c mov %eax,0xc(%rsp)
# store recursion_times to stack at 0x1c(%rsp)
6d3: 89 44 24 1c mov %eax,0x1c(%rsp)
6d7: bf 01 00 00 00 mov $0x1,%edi
6dc: 31 c0 xor %eax,%eax
6de: e8 5d fe ff ff callq 540 <__printf_chk@plt>
# if recursion_times <= 1, return
6e3: 8b 7c 24 1c mov 0x1c(%rsp),%edi
6e7: 83 ff 01 cmp $0x1,%edi
6ea: 7e cd jle 6b9 <RecursiveFunction+0x29>
# -- recursion_times
6ec: 83 ef 01 sub $0x1,%edi
# store recursion_times into stack at 0x1c(%rsp)
6ef: 89 7c 24 1c mov %edi,0x1c(%rsp)
# call RecursiveFunction
6f3: e8 98 ff ff ff callq 690 <RecursiveFunction>
6f8: eb bf jmp 6b9 <RecursiveFunction+0x29>
6fa: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
Before we get into the RecursiveFunction logic, let’s first check the optimizations applied here.
在进入RecursiveFunction逻辑之前,让我们首先检查此处应用的优化。
%rbp
is not used and only%rsp
is maintained. So the 2 instructionspushq %rbp
andmovq %rsp,%rbp
in the code generated at level 0 are not needed.- 不使用
%rbp
,仅保留%rsp
。 因此不需要在0级生成的代码中的2条指令pushq %rbp
和movq %rsp,%rbp
pushq %rbp
movq %rsp,%rbp
。 __printf_chk()
is directly called instead of__printf_chk()
而不是printf()
because theprintf()
因为printf()
body only containsprintf()
主体仅包含return __printf_chk(...
and the tail-call elimination takes effect here. You may use thereturn __printf_chk(...
,并且尾部调用消除在这里生效。您可以使用gcc -fno-stack-protector -O3 -Wa,-adhln -g prog.c
(as a technique fromgcc -fno-stack-protector -O3 -Wa,-adhln -g prog.c
(作为Generating Mixed Source and Assembly List using GCC) to verify and see the mixed source code and assembly code.使用GCC生成混合源代码和程序集列表的技术 )来验证和查看混合源代码和程序集代码。
Now, let’s take a look at why the &recursion_times
wigwags.
现在,让我们看一下为什么&recursion_times
摆动。
In the body of the generated code of RecursiveFunction
, the __printf_chk()
are called twice and recursion_times
is deducted by 1 for two times. So, 2 RecursiveFunction
functions’ logic from the C code may be executed in one RecursiveFunction
procedure call in the generated optimized code. So the RecursiveFunction
is not the same any more! That is, one RecursiveFunction
C function is inlined into itself. For the 2 recursive_times
variables in one RecursiveFunction
procedure call, they are stored in 0xc(%rsp)
and 0x1c(%rsp)
so their address differs by 0x10 bytes. The stack frame size for a RecursiveFunction
procedure (the generated one) call is actually 0x28+0x8 = 0x30 (by sub
at 690 and callq
at 6f3).
在所生成的RecursiveFunction
代码的主体中, __printf_chk()
被调用两次,而recursion_times
被1减去两次。 因此,可以在生成的优化代码中的一个RecursiveFunction
过程调用中执行来自C代码的2个RecursiveFunction
函数的逻辑。 所以RecursiveFunction
不再一样了! 即,一个RecursiveFunction
C函数被内联到其自身中。 对于一个RecursiveFunction
过程调用中的2个recursive_times
变量,它们存储在0xc(%rsp)
和0x1c(%rsp)
因此它们的地址相差0x10字节。 一个RecursiveFunction
过程(生成的)调用的堆栈帧大小实际上是0x28 + 0x8 = 0x30(由690处的sub
和callq
处的callq组成)。
原因是递归内联优化 (The reason is recursive inlining optimization)
Now it is clear the cause is the recursive inlining optimization by gcc. gcc also has options to control the optimization behavior. From gcc
manual:
现在很清楚,原因是gcc进行了递归内联优化 。 gcc还具有控制优化行为的选项。 从gcc
手册 :
max-inline-insns-recursive
max-inline-insns-recursive-auto
Specifies the maximum number of instructions an out-of-line copy of a
self-recursive inline function can grow into by performing recursive
inlining.
--param max-inline-insns-recursive applies to functions declared inline.
For functions not declared inline, recursive inlining happens only when
-finline-functions (included in -O3) is enabled; --param
max-inline-insns-recursive-auto applies instead. The default value is 450.
max-inline-recursive-depth
max-inline-recursive-depth-auto
Specifies the maximum recursion depth used for recursive inlining.
--param max-inline-recursive-depth applies to functions declared inline.
For functions not declared inline, recursive inlining happens only when
-finline-functions (included in -O3) is enabled; --param
max-inline-recursive-depth-auto applies instead. The default value is 8.
min-inline-recursive-probability
Recursive inlining is profitable only for function having deep recursion in
average and can hurt for function having little recursion depth by
increasing the prologue size or complexity of function body to other
optimizers.
When profile feedback is available (see -fprofile-generate) the actual
recursion depth can be guessed from the probability that function recurses
via a given call expression. This parameter limits inlining only to call
expressions whose probability exceeds the given threshold (in percents).
The default value is 10.
进一步尝试 (Further tries)
Now, let’s use the parameters to tune gcc’s optimization parameters and see what’s the result. Let’s set min-inline-recursive-probability
to 5. To make the results clearer, I changed the initial recursion_times
to 10.
现在,让我们使用这些参数来调整gcc的优化参数并查看结果。 让我们将min-inline-recursive-probability
为5。为使结果更清晰,我将初始recursion_times
更改为10。
$ gcc -fno-stack-protector -O3 --param min-inline-recursive-probability=5 prog.c -o prog.3
$ ./prog.3
stack: 0x7ffdfa315a7c
stack: 0x7ffdfa315a88
stack: 0x7ffdfa315a8c
stack: 0x7ffdfa315a4c
stack: 0x7ffdfa315a58
stack: 0x7ffdfa315a5c
stack: 0x7ffdfa315a1c
stack: 0x7ffdfa315a28
stack: 0x7ffdfa315a2c
stack: 0x7ffdfa3159ec
Now 3 RecursiveFunction
s are merged together. We can verify this by checking assembly code which calls __printf_chk()
3 times.
现在,3个RecursiveFunction
合并在一起。 我们可以通过检查3次调用__printf_chk()
汇编代码来验证这一点。
0000000000000690 <RecursiveFunction>:
690: 48 83 ec 28 sub $0x28,%rsp
694: 48 8d 35 19 01 00 00 lea 0x119(%rip),%rsi # 7b4 <_IO_stdin_used+0x4>
69b: 31 c0 xor %eax,%eax
69d: 48 8d 54 24 0c lea 0xc(%rsp),%rdx
6a2: 89 7c 24 0c mov %edi,0xc(%rsp)
6a6: bf 01 00 00 00 mov $0x1,%edi
6ab: e8 90 fe ff ff callq 540 <__printf_chk@plt>
6b0: 8b 44 24 0c mov 0xc(%rsp),%eax
6b4: 83 f8 01 cmp $0x1,%eax
6b7: 7f 07 jg 6c0 <RecursiveFunction+0x30>
6b9: 48 83 c4 28 add $0x28,%rsp
6bd: c3 retq
6be: 66 90 xchg %ax,%ax
6c0: 83 e8 01 sub $0x1,%eax
6c3: 48 8d 54 24 18 lea 0x18(%rsp),%rdx
6c8: 48 8d 35 e5 00 00 00 lea 0xe5(%rip),%rsi # 7b4 <_IO_stdin_used+0x4>
6cf: 89 44 24 0c mov %eax,0xc(%rsp)
6d3: 89 44 24 18 mov %eax,0x18(%rsp)
6d7: bf 01 00 00 00 mov $0x1,%edi
6dc: 31 c0 xor %eax,%eax
6de: e8 5d fe ff ff callq 540 <__printf_chk@plt>
6e3: 8b 44 24 18 mov 0x18(%rsp),%eax
6e7: 83 f8 01 cmp $0x1,%eax
6ea: 7e cd jle 6b9 <RecursiveFunction+0x29>
6ec: 83 e8 01 sub $0x1,%eax
6ef: 48 8d 54 24 1c lea 0x1c(%rsp),%rdx
6f4: 48 8d 35 b9 00 00 00 lea 0xb9(%rip),%rsi # 7b4 <_IO_stdin_used+0x4>
6fb: 89 44 24 18 mov %eax,0x18(%rsp)
6ff: 89 44 24 1c mov %eax,0x1c(%rsp)
703: bf 01 00 00 00 mov $0x1,%edi
708: 31 c0 xor %eax,%eax
70a: e8 31 fe ff ff callq 540 <__printf_chk@plt>
70f: 8b 7c 24 1c mov 0x1c(%rsp),%edi
713: 83 ff 01 cmp $0x1,%edi
716: 7e a1 jle 6b9 <RecursiveFunction+0x29>
718: 83 ef 01 sub $0x1,%edi
71b: 89 7c 24 1c mov %edi,0x1c(%rsp)
71f: e8 6c ff ff ff callq 690 <RecursiveFunction>
724: eb 93 jmp 6b9 <RecursiveFunction+0x29>
726: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
72d: 00 00 00
It’s great fun, right? You may try more combinations and see how gcc generate different code.
很好玩吧? 您可以尝试更多组合,看看gcc如何生成不同的代码。
gcc对if里面调用的函数