Using gdb to Determine Whether a Segmentation fault (core dumped) Was Caused by a Stack Overflow

PerfMan

Last modified 2024-05-07 12:38:41

First published 2024-05-07 12:38:13

Article link: https://blog.csdn.net/2401_84703565/article/details/138522251

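When a program accesses an invalid memory address, it receives a segmentation fault and, if core dumps are enabled, a coredump file is written. This is usually not the outcome we want, so the coredump file has to be analyzed to track down the cause. This article shows how to quickly check whether a coredump was caused by a stack overflow.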

Sample code

#include <iostream>

void test()
{
    char tmp[512];     // reserve 512 bytes of stack space
    test();            // call itself recursively
}

int main()
{
    test();
    return 0;
}

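Operating system: Linux
Compile command: g++ test.cpp -o test -g -O0
Code analysis: The sample defines a function test that allocates a 512-byte char array on the stack. To force a stack overflow, test then calls itself recursively with no termination condition, so every call claims more stack space until the stack overflows and a segmentation fault occurs.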

Running the program

First, set the ulimit parameter so that a coredump file can be generated.

[root@VM-8-2-centos gdb_stack_overflow]# ulimit -c unlimited

Check that the setting took effect:

[root@VM-8-2-centos gdb_stack_overflow]# ulimit -c
unlimited

Run the program:

[root@VM-8-2-centos gdb_stack_overflow]# ./test
Segmentation fault (core dumped)

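The coredump file is usually written to the current directory or to /var/lib/systemd/coredump/ (the actual location is determined by the kernel.core_pattern setting). Copy the core file into the same directory as the executable; if it is stored in a compressed format, decompress it first.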

Debugging the coredump file

Command: gdb <executable> <coredump file>

[root@VM-8-2-centos gdb_stack_overflow]# gdb test core.test.0.b04e6db77aa04f9fbeb741681dae9e12.3265122.1715049796000000
...... (gdb version information omitted)

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test...done.
[New LWP 3265122]
Core was generated by `./test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  test () at test.cpp:6
6           test();
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-251.el8.x86_64 libgcc-8.5.0-21.el8.x86_64 libstdc++-8.5.0-21.el8.x86_64
(gdb)

Use the bt command to view the program's call stack. Command: bt

(gdb) bt
#0  test () at test.cpp:6
#1  0x00000000004006b6 in test () at test.cpp:6
#2  0x00000000004006b6 in test () at test.cpp:6
#3  0x00000000004006b6 in test () at test.cpp:6
#4  0x00000000004006b6 in test () at test.cpp:6
#5  0x00000000004006b6 in test () at test.cpp:6
......
#15867 0x00000000004006b6 in test () at test.cpp:6
#15868 0x00000000004006b6 in test () at test.cpp:6
#15869 0x00000000004006b6 in test () at test.cpp:6
#15870 0x00000000004006b6 in test () at test.cpp:6
#15871 0x00000000004006b6 in test () at test.cpp:6
#15872 0x00000000004006b6 in test () at test.cpp:6
#15873 0x00000000004006b6 in test () at test.cpp:6
#15874 0x00000000004006c2 in main () at test.cpp:11
(gdb)

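The call stack confirms that, as the sample was designed to do, the program kept calling test recursively until the stack overflowed and the segmentation fault occurred, after more than 15,000 nested calls.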

Determining whether the stack overflowed

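The coredump file preserves the stack frames of the entire call chain. Subtracting the value of the stack pointer register rsp in frame 0 from its value in the last frame gives the total amount of stack space in use at the moment the program crashed.
Commands: (1) f <frame number>
Switches to the stack frame with that number.
(2) p $rsp
Prints the value held in the stack pointer register rsp, that is, the address of the top of the stack for the currently selected frame.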

#15868 0x00000000004006b6 in test () at test.cpp:6
#15869 0x00000000004006b6 in test () at test.cpp:6
#15870 0x00000000004006b6 in test () at test.cpp:6
#15871 0x00000000004006b6 in test () at test.cpp:6
#15872 0x00000000004006b6 in test () at test.cpp:6
#15873 0x00000000004006b6 in test () at test.cpp:6
#15874 0x00000000004006c2 in main () at test.cpp:11
(gdb) 
(gdb) f 15874
#15874 0x00000000004006c2 in main () at test.cpp:11
11          test();
(gdb) p $rsp
$1 = (void *) 0x7ffffdeca3f0
(gdb) f 0
#0  test () at test.cpp:6
6           test();
(gdb) p $rsp
$2 = (void *) 0x7ffffd6cbfd0
(gdb) p 0x7ffffdeca3f0 - 0x7ffffd6cbfd0
$3 = 8381472
(gdb)

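In this example, subtracting the rsp value of frame 0 from that of the last frame (frame 15874, the frame of main) gives 8381472 bytes, which is roughly 8 MB.
Next, check the maximum stack size of a single thread in the current environment. Command: ulimit -s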

[root@VM-8-2-centos gdb_stack_overflow]# ulimit -s
8192

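The maximum stack size of a single thread in this environment is 8192 KB, that is, 8 MB. Since the coredump shows that about 8 MB of stack was in use, we can be fairly confident that the segmentation fault was caused by a stack overflow. If a test environment is available, raise the stack limit, for example to 16 MB with ulimit -s 16384, then start the test program again and see whether it now runs normally. A program whose stack use is bounded should then succeed, whereas unbounded recursion like the sample above will still crash, only at a greater depth.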