GCC May "Save" You Some Recursive Function Calls: An Analysis of a Function Call Stack Length Example

We know that compilers like gcc can do lots of smart optimizations to make a program run faster. For function calls, gcc can do tail-call elimination to save the cost of allocating a new stack frame, and tail recursion elimination to turn a recursive function into a non-recursive, iterative one. gcc can even transform some recursive functions that are not tail-recursive into tail-recursive ones, so that tail recursion elimination can then be applied. But what happens if a function cannot be optimized by tail recursion elimination because of some unsafe operation inside the function body? In this post, we analyze one example.

The C program we analyze

The prog.c program we analyze is as follows.

#include <stdio.h>

void RecursiveFunction( int recursion_times )
{
  printf("stack: %p\n", (void*)&recursion_times);

  if (recursion_times <= 1) return;

  return RecursiveFunction( --recursion_times );
}

int main(int argc, char* args[])
{
  RecursiveFunction(4);
  return 0;
}

We pass &recursion_times to another function (printf()), which could in principle modify the variable through that pointer. C/C++ require each variable, including the multiple instances of the same variable in recursive calls, to have a distinct location. The number of recursion_times instances is only known at run time. So the compiler cannot simply apply tail recursion elimination here, although RecursiveFunction is tail recursive. Will the compiler just stop here and do nothing to optimize it? Let's see by building and running the program at different optimization levels.

The execution results

We use a script, run.sh, to build the program at each optimization level, run it, and disassemble the executable generated by gcc.

#!/bin/bash

set -o errexit

for opt in 0 1 2 s 3 ; do
  echo "Begin -O$opt"
  gcc -fno-stack-protector -O$opt prog.c -o prog.$opt
  ./prog.$opt
  objdump -d ./prog.$opt > prog.$opt.as
  # gcc -fno-stack-protector -O$opt -Wa,-adhln -g prog.c > prog.$opt.list
  echo "End -O$opt"
done

The -fno-stack-protector option tells gcc not to generate stack protection code, so that the assembly code is cleaner to read.

I tried it on Ubuntu 18.04 with gcc 7.5.0 (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0), and the results are as follows.

$ ./run.sh
Begin -O0
stack: 0x7fff3690668c
stack: 0x7fff3690666c
stack: 0x7fff3690664c
stack: 0x7fff3690662c
End -O0
Begin -O1
stack: 0x7ffde5213d4c
stack: 0x7ffde5213d2c
stack: 0x7ffde5213d0c
stack: 0x7ffde5213cec
End -O1
Begin -O2
stack: 0x7ffd6a09587c
stack: 0x7ffd6a09585c
stack: 0x7ffd6a09583c
stack: 0x7ffd6a09581c
End -O2
Begin -Os
stack: 0x7ffef5b6ce9c
stack: 0x7ffef5b6ce7c
stack: 0x7ffef5b6ce5c
stack: 0x7ffef5b6ce3c
End -Os
Begin -O3
stack: 0x7ffd414fb53c
stack: 0x7ffd414fb54c
stack: 0x7ffd414fb50c
stack: 0x7ffd414fb51c
End -O3

For optimization levels 0, 1, 2 and s, the results are consistent. With each function call, the stack grows by 0x20 bytes, that is, the address of recursion_times decreases by 0x20. The interesting case is optimization level 3: the address first goes up by 0x10 bytes and then down by 0x40 bytes. Why does the address wigwag? Let's take a look at the assembly code of RecursiveFunction.

The generated code at optimization level 0

Let's first take a look at the generated code in prog.0.as for optimization level 0. It is quite straightforward, and the flow follows the order of the C code. I added annotations.

000000000000064a <RecursiveFunction>:
                                # save old stack frame base in stack, use 0x8 bytes on stack
 64a:   55                      push   %rbp
                                # new stack frame base at old stack pointer 
 64b:   48 89 e5                mov    %rsp,%rbp
                                # allocate 0x10 for the new stack
 64e:   48 83 ec 10             sub    $0x10,%rsp
                                # store recursion_times into stack
 652:   89 7d fc                mov    %edi,-0x4(%rbp)
                                # get &recursion_times
 655:   48 8d 45 fc             lea    -0x4(%rbp),%rax
                                # call printf()
 659:   48 89 c6                mov    %rax,%rsi
 65c:   48 8d 3d d1 00 00 00    lea    0xd1(%rip),%rdi        # 734 <_IO_stdin_used+0x4>
 663:   b8 00 00 00 00          mov    $0x0,%eax
 668:   e8 b3 fe ff ff          callq  520 <printf@plt>
                                # get recursion_times to %eax
 66d:   8b 45 fc                mov    -0x4(%rbp),%eax
                                # if (recursion_times <= 1) return
 670:   83 f8 01                cmp    $0x1,%eax
 673:   7e 15                   jle    68a <RecursiveFunction+0x40>
                                # do --recursion_times
 675:   8b 45 fc                mov    -0x4(%rbp),%eax
 678:   83 e8 01                sub    $0x1,%eax
 67b:   89 45 fc                mov    %eax,-0x4(%rbp)
                                # Prepare arguments and call RecursiveFunction
 67e:   8b 45 fc                mov    -0x4(%rbp),%eax
 681:   89 c7                   mov    %eax,%edi
                                # callq will store the returning address on stack, using 0x8 bytes
 683:   e8 c2 ff ff ff          callq  64a <RecursiveFunction>
 688:   eb 01                   jmp    68b <RecursiveFunction+0x41>
 68a:   90                      nop
 68b:   c9                      leaveq 
 68c:   c3                      retq

The 0x20 bytes of stack used per call consist of 0x8 bytes for saving the old %rbp (at 64a), 0x10 bytes for this function's own use (at 64e), and 0x8 bytes for the return address pushed by callq at 683. No surprise there.

The generated code at optimization level 3

Now let's look at the code generated by gcc at optimization level 3. Annotations are added here too.

0000000000000690 <RecursiveFunction>:
                                # allocate a stack frame of 0x28 bytes
 690:   48 83 ec 28             sub    $0x28,%rsp
 694:   48 8d 35 e9 00 00 00    lea    0xe9(%rip),%rsi        # 784 <_IO_stdin_used+0x4>
 69b:   31 c0                   xor    %eax,%eax
 69d:   48 8d 54 24 0c          lea    0xc(%rsp),%rdx
                                # store recursion_times into stack at 0xc(%rsp)
 6a2:   89 7c 24 0c             mov    %edi,0xc(%rsp)
 6a6:   bf 01 00 00 00          mov    $0x1,%edi
                                # call __printf_chk()
 6ab:   e8 90 fe ff ff          callq  540 <__printf_chk@plt>
                                # if recursion_times > 1, goto 6c0
                                # so, if recursion_times <= 1, return
 6b0:   8b 44 24 0c             mov    0xc(%rsp),%eax
 6b4:   83 f8 01                cmp    $0x1,%eax
 6b7:   7f 07                   jg     6c0 <RecursiveFunction+0x30>
 6b9:   48 83 c4 28             add    $0x28,%rsp
 6bd:   c3                      retq   
 6be:   66 90                   xchg   %ax,%ax
                                # --recursion_times
 6c0:   83 e8 01                sub    $0x1,%eax
                                # call __printf_chk()
 6c3:   48 8d 54 24 1c          lea    0x1c(%rsp),%rdx
 6c8:   48 8d 35 b5 00 00 00    lea    0xb5(%rip),%rsi        # 784 <_IO_stdin_used+0x4>
                                # store recursion_times to stack at 0xc(%rsp)
 6cf:   89 44 24 0c             mov    %eax,0xc(%rsp)
                                # store recursion_times to stack at 0x1c(%rsp)
 6d3:   89 44 24 1c             mov    %eax,0x1c(%rsp)
 6d7:   bf 01 00 00 00          mov    $0x1,%edi
 6dc:   31 c0                   xor    %eax,%eax
 6de:   e8 5d fe ff ff          callq  540 <__printf_chk@plt>
                                # if recursion_times <= 1, return
 6e3:   8b 7c 24 1c             mov    0x1c(%rsp),%edi
 6e7:   83 ff 01                cmp    $0x1,%edi
 6ea:   7e cd                   jle    6b9 <RecursiveFunction+0x29>
                                # -- recursion_times
 6ec:   83 ef 01                sub    $0x1,%edi
                                # store recursion_times into stack at 0x1c(%rsp)
 6ef:   89 7c 24 1c             mov    %edi,0x1c(%rsp)
                                # call RecursiveFunction
 6f3:   e8 98 ff ff ff          callq  690 <RecursiveFunction>
 6f8:   eb bf                   jmp    6b9 <RecursiveFunction+0x29>
 6fa:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

Before we get into the RecursiveFunction logic, let's first check the optimizations applied here.

  • %rbp is not used and only %rsp is maintained. So the 2 instructions push %rbp and mov %rsp,%rbp from the code generated at level 0 are not needed.
  • __printf_chk() is called directly instead of printf(). With optimization enabled, the printf() seen by the compiler here is an inline wrapper from the glibc headers whose body only contains return __printf_chk(...), and that wrapper is inlined at the call site. You may use gcc -fno-stack-protector -O3 -Wa,-adhln -g prog.c (a technique from Generating Mixed Source and Assembly List using GCC) to verify this and see the source code mixed with the assembly code.

Now, let's take a look at why &recursion_times wigwags.

In the body of the generated RecursiveFunction code, __printf_chk() is called twice and recursion_times is decremented twice. So one call of the generated RecursiveFunction procedure may execute the logic of 2 RecursiveFunction calls from the C code. The generated procedure is no longer the same as the function in the source: the C function RecursiveFunction has been inlined into itself once. The 2 recursion_times instances handled in one procedure call are stored at 0xc(%rsp) and 0x1c(%rsp), so their addresses differ by 0x10 bytes. Each call of the generated procedure uses 0x28 + 0x8 = 0x30 bytes of stack (the sub at 690 plus the return address pushed by callq at 6f3). Moving from the second instance in one frame to the first instance in the next frame therefore lowers the address by 0x30 + 0x10 = 0x40 bytes, which matches the -O3 output.

The reason is recursive inlining optimization

Now it is clear that the cause is gcc's recursive inlining optimization. gcc also has parameters to control this behavior. From the gcc manual:

max-inline-insns-recursive
max-inline-insns-recursive-auto

    Specifies the maximum number of instructions an out-of-line copy of a
    self-recursive inline function can grow into by performing recursive
    inlining.

    --param max-inline-insns-recursive applies to functions declared inline.
    For functions not declared inline, recursive inlining happens only when
    -finline-functions (included in -O3) is enabled; --param
    max-inline-insns-recursive-auto applies instead.  The default value is 450.

max-inline-recursive-depth
max-inline-recursive-depth-auto

    Specifies the maximum recursion depth used for recursive inlining.

    --param max-inline-recursive-depth applies to functions declared inline.
    For functions not declared inline, recursive inlining happens only when
    -finline-functions (included in -O3) is enabled; --param
    max-inline-recursive-depth-auto applies instead.  The default value is 8.

min-inline-recursive-probability

    Recursive inlining is profitable only for function having deep recursion in
    average and can hurt for function having little recursion depth by
    increasing the prologue size or complexity of function body to other
    optimizers.

    When profile feedback is available (see -fprofile-generate) the actual
    recursion depth can be guessed from the probability that function recurses
    via a given call expression.  This parameter limits inlining only to call
    expressions whose probability exceeds the given threshold (in percents).
    The default value is 10.

Further experiments

Now, let's tune these parameters and see what the result is. Let's set min-inline-recursive-probability to 5. To make the results clearer, I changed the initial recursion_times to 10.

$ gcc -fno-stack-protector -O3 --param min-inline-recursive-probability=5 prog.c -o prog.3
$ ./prog.3 
stack: 0x7ffdfa315a7c
stack: 0x7ffdfa315a88
stack: 0x7ffdfa315a8c
stack: 0x7ffdfa315a4c
stack: 0x7ffdfa315a58
stack: 0x7ffdfa315a5c
stack: 0x7ffdfa315a1c
stack: 0x7ffdfa315a28
stack: 0x7ffdfa315a2c
stack: 0x7ffdfa3159ec

Now 3 RecursiveFunction bodies are merged into one procedure. We can verify this by checking the assembly code, which calls __printf_chk() 3 times.

0000000000000690 <RecursiveFunction>:
 690:   48 83 ec 28             sub    $0x28,%rsp
 694:   48 8d 35 19 01 00 00    lea    0x119(%rip),%rsi        # 7b4 <_IO_stdin_used+0x4>
 69b:   31 c0                   xor    %eax,%eax
 69d:   48 8d 54 24 0c          lea    0xc(%rsp),%rdx
 6a2:   89 7c 24 0c             mov    %edi,0xc(%rsp)
 6a6:   bf 01 00 00 00          mov    $0x1,%edi
 6ab:   e8 90 fe ff ff          callq  540 <__printf_chk@plt>
 6b0:   8b 44 24 0c             mov    0xc(%rsp),%eax
 6b4:   83 f8 01                cmp    $0x1,%eax
 6b7:   7f 07                   jg     6c0 <RecursiveFunction+0x30>
 6b9:   48 83 c4 28             add    $0x28,%rsp
 6bd:   c3                      retq   
 6be:   66 90                   xchg   %ax,%ax
 6c0:   83 e8 01                sub    $0x1,%eax
 6c3:   48 8d 54 24 18          lea    0x18(%rsp),%rdx
 6c8:   48 8d 35 e5 00 00 00    lea    0xe5(%rip),%rsi        # 7b4 <_IO_stdin_used+0x4>
 6cf:   89 44 24 0c             mov    %eax,0xc(%rsp)
 6d3:   89 44 24 18             mov    %eax,0x18(%rsp)
 6d7:   bf 01 00 00 00          mov    $0x1,%edi
 6dc:   31 c0                   xor    %eax,%eax
 6de:   e8 5d fe ff ff          callq  540 <__printf_chk@plt>
 6e3:   8b 44 24 18             mov    0x18(%rsp),%eax
 6e7:   83 f8 01                cmp    $0x1,%eax
 6ea:   7e cd                   jle    6b9 <RecursiveFunction+0x29>
 6ec:   83 e8 01                sub    $0x1,%eax
 6ef:   48 8d 54 24 1c          lea    0x1c(%rsp),%rdx
 6f4:   48 8d 35 b9 00 00 00    lea    0xb9(%rip),%rsi        # 7b4 <_IO_stdin_used+0x4>
 6fb:   89 44 24 18             mov    %eax,0x18(%rsp)
 6ff:   89 44 24 1c             mov    %eax,0x1c(%rsp)
 703:   bf 01 00 00 00          mov    $0x1,%edi
 708:   31 c0                   xor    %eax,%eax
 70a:   e8 31 fe ff ff          callq  540 <__printf_chk@plt>
 70f:   8b 7c 24 1c             mov    0x1c(%rsp),%edi
 713:   83 ff 01                cmp    $0x1,%edi
 716:   7e a1                   jle    6b9 <RecursiveFunction+0x29>
 718:   83 ef 01                sub    $0x1,%edi
 71b:   89 7c 24 1c             mov    %edi,0x1c(%rsp)
 71f:   e8 6c ff ff ff          callq  690 <RecursiveFunction>
 724:   eb 93                   jmp    6b9 <RecursiveFunction+0x29>
 726:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 72d:   00 00 00

It's great fun, right? You may try more combinations and see how gcc generates different code.

Original article: https://www.systutorials.com/gcc-may-save-you-some-recursive-functions-calls-an-analysis-of-a-function-call-stack-length-example/
