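Analyzing core dumps with gdb and memory leaks with valgrind

Preface

When writing code we sometimes hit a core dump. Why does it happen? A common cause is that the program accesses a memory address that was never mapped into its address space. The kernel then sends the process a signal (such as SIGSEGV) to report the invalid access, and the default action for that signal is to terminate the process and write a core dump. There is no need to panic when this happens: we have good tools for tracking the problem down.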

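The core dump file

When the process dumps core, a core dump file is written that records the state of the process at the moment of the crash: memory, registers, stack and so on. We can open this file with gdb and locate the problem.

Core dump files are not enabled by default here, so first we check whether the system allows them to be generated: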

root@zhr-workstation:~/test# ulimit -c
0

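ulimit shows the resource limits the system imposes on the user's shell, and -c selects the maximum size of a core dump file. A result of 0 means core dumps are disabled, so we first lift this limit by appending unlimited to the same command: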

root@zhr-workstation:~/test# ulimit -c unlimited

Checking the limit again, we can see that core dump generation is now enabled:

root@zhr-workstation:~/test# ulimit -c 
unlimited
root@zhr-workstation:~/test# 
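
The same limit can also be inspected and raised from inside a program, which helps when the process is not started from a shell where ulimit -c was run. The following is a minimal sketch using the POSIX getrlimit/setrlimit calls with RLIMIT_CORE. The helper names (soft_core_limit, hard_core_limit, raise_core_limit) are made up for this illustration and are not part of the original program.

```cpp
#include <sys/resource.h>

// Current soft limit for core file size (what "ulimit -c" prints, in other units).
rlim_t soft_core_limit() {
    struct rlimit lim{};
    getrlimit(RLIMIT_CORE, &lim);
    return lim.rlim_cur;
}

// Hard limit: the ceiling up to which an unprivileged process may raise the soft limit.
rlim_t hard_core_limit() {
    struct rlimit lim{};
    getrlimit(RLIMIT_CORE, &lim);
    return lim.rlim_max;
}

// Raise the soft limit to the hard limit. Returns true on success.
// Note: if the hard limit itself is 0, core dumps stay disabled.
bool raise_core_limit() {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return false;
    }
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

Calling raise_core_limit() early in main() has roughly the effect of ulimit -c unlimited for that one process, provided the hard limit is unlimited.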

Then we run the program that crashes once more:

root@zhr-workstation:~/test# ./search.o 
Segmentation fault (core dumped)

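Running ls -al shows that there is still no core dump file in the current directory. So what now? On this system the core file name contains the pid of the crashed process, so if we know the pid we can search for the file. We add the following to the crashing program so that it prints its pid:

#include <stdio.h>   /* printf */
#include <unistd.h>  /* getpid */

printf("pid is %d\n", getpid());  /* print the pid so the core file can be found later */

The pid turns out to be 212206, so we can use find to locate the core file directly: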

root@zhr-workstation:~/test# find / -name '*.212206.*'
find: ‘/run/user/1000/gvfs’: Permission denied
find: ‘/run/user/126/gvfs’: Permission denied
/var/lib/apport/coredump/core._root_test_search_o.0.7304495b-f7bd-4c87-a009-f5c63b165ceb.212206.223266434

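We can also determine the core file name from the apport log. apport is Ubuntu's crash reporting system, and on this machine the core dump is generated through it. Besides the core file name, the log also shows which signal caused the program to dump core: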

root@zhr-workstation:~/test# tail  /var/log/apport.log
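
Where the core file ends up is controlled by the kernel setting in /proc/sys/kernel/core_pattern. If the pattern starts with a | character, the kernel pipes the dump to a helper program instead of writing a file in the current directory, which would be consistent with the file landing under /var/lib/apport/coredump above. A small sketch for checking this from code follows. The function names are hypothetical, and the actual pattern on the machine above was not shown in the transcript.

```cpp
#include <fstream>
#include <string>

// Read the kernel's core file naming pattern (Linux-specific path).
// Returns an empty string if the file cannot be read.
std::string read_core_pattern(const std::string& path = "/proc/sys/kernel/core_pattern") {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}

// A pattern whose first character is '|' means the dump is piped to a
// user-space handler rather than written as a file by the kernel.
bool is_piped_to_handler(const std::string& pattern) {
    return !pattern.empty() && pattern[0] == '|';
}
```

If is_piped_to_handler(read_core_pattern()) is true, the handler decides where the core file goes, so it is worth checking that handler's log or directory rather than the working directory.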

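Debugging the core dump file with gdb

Now we debug the core dump file with gdb. Note that the program must be compiled with the -g option (gcc) so that it contains debug information.
The gdb command line has three parts: gdb itself, then the executable binary, and finally the core dump file. (The core file opened below has pid 212237 in its name, so it appears to come from a different run than the one found above.)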

gdb binary_file core_file
root@zhr-workstation:~/test# gdb search.o core._root_test_search_o.0.7304495b-f7bd-4c87-a009-f5c63b165ceb.212237.223362044 
GNU gdb (Ubuntu 11.1-0ubuntu2) 11.1
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from search.o...
[New LWP 212237]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./search.o'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055f27615227b in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=<error reading variable: Cannot access memory at address 0x7fffca3cfffc>) at search.c:16
16      int binary_search(int* data, int key,  int low, int hight){
(gdb) 

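As the output above shows, gdb jumps straight to the place where the problem occurred.
We first run bt in gdb to print the stack at the point of the crash: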

(gdb) bt
#0  0x000055f27615227b in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=<error reading variable: Cannot access memory at address 0x7fffca3cfffc>) at search.c:16
#1  0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#2  0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#3  0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#4  0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#5  0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#6  0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#7  0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#8  0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#9  0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#10 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#11 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#12 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20
#13 0x000055f2761522c2 in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=0) at search.c:20

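backtrace shows the stack frame of the function containing the code that caused the crash (here a segmentation fault), together with the chain of frames that led to it. Suppose the crashing code is in c(), main() called a(), a() called b(), and b() called c(). backtrace then prints the frames for the whole chain from main() down to c() (main()->a()->b()->c()), with the innermost frame listed first.

Because the function is recursive, bt prints far too many frames to list them all here, so we look directly at the innermost one (#0).
We use frame 0 to select the frame where the crash happened and info local to inspect its local variables, which look fine. The problem turns out to be an error when accessing the variable hight: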
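
To make the main()->a()->b()->c() picture concrete, here is a toy model of a call stack. It is only an illustration: gdb reads the real stack frames from the core file, whereas this sketch tracks the frames by hand with a hypothetical frame_guard helper and a global vector, and prints them the way bt numbers them (innermost frame is #0).

```cpp
#include <string>
#include <vector>

// Hand-maintained list of active function names, outermost first.
static std::vector<std::string> g_frames;

// Pushes a frame name on entry to a function and pops it on exit.
struct frame_guard {
    explicit frame_guard(const char* name) { g_frames.emplace_back(name); }
    ~frame_guard() { g_frames.pop_back(); }
};

// Format the active frames like gdb's bt: innermost frame is #0.
std::string format_backtrace() {
    std::string out;
    int index = 0;
    for (auto it = g_frames.rbegin(); it != g_frames.rend(); ++it, ++index) {
        out += "#" + std::to_string(index) + " " + *it + "\n";
    }
    return out;
}

// c() is where the crash would happen, so the snapshot is taken here.
std::string c() { frame_guard g("c"); return format_backtrace(); }
std::string b() { frame_guard g("b"); return c(); }
std::string a() { frame_guard g("a"); return b(); }

// Stands in for main() so the sketch can be called from a test.
std::string run_from_main() { frame_guard g("main"); return a(); }
```

Calling run_from_main() yields the frames c, b, a, main numbered #0 to #3, which is the same ordering as in the bt output above.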

(gdb) frame 0
#0  0x000055f27615227b in binary_search (data=0x7fffcabcd590, key=2, low=0, hight=<error reading variable: Cannot access memory at address 0x7fffca3cfffc>) at search.c:16
16      int binary_search(int* data, int key,  int low, int hight){
(gdb) info local
mid = 0
index = 0
(gdb) p key
$1 = 2
(gdb) p low
$2 = 0
(gdb) p hight
Cannot access memory at address 0x7fffca3cfffc
(gdb) 

Now that we roughly know what is wrong, we start checking the code, using list to show the source around the crash:

(gdb) list
11              int result = binary_search(data, 2, 0, sizeof(data)/4);
12
13              printf("result is %d\n", result);
14      }
15
16      int binary_search(int* data, int key,  int low, int hight){
17              int mid = (low + hight) / 2;
18              int index;
19              if( data[mid] > key ){
20                      index = binary_search(data, key, low, mid);
(gdb) 
21              }else if( data[mid] < key ){
22                      index = binary_search(data, key, mid, hight);
23              }else if( data[mid] == key ){
24                      return mid;
25              }
26              return index;
27
28      }
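
Putting the backtrace and the listing together, one plausible reading is this: every frame has low=0 and hight=0, so mid is 0, data[0] is greater than the key, and line 20 calls binary_search(data, key, 0, 0) again with exactly the same arguments. The recursion never ends, the stack is exhausted, and gdb can no longer read hight in the innermost frame. Below is a sketch of a version with a base case that shrinks the range on every call. The name binary_search_fixed and the -1 "not found" return value are choices made for this example, and high is assumed to be the last valid index (for an array in scope, sizeof(data)/sizeof(data[0]) - 1) rather than the element count.

```cpp
// Recursive binary search over the inclusive index range [low, high].
// Returns the index of key in the sorted array data, or -1 if it is absent.
int binary_search_fixed(const int* data, int key, int low, int high) {
    if (low > high) {
        return -1;  // empty range: this base case stops the recursion
    }
    int mid = low + (high - low) / 2;  // avoids overflow of low + high
    if (data[mid] > key) {
        return binary_search_fixed(data, key, low, mid - 1);   // drop mid itself
    }
    if (data[mid] < key) {
        return binary_search_fixed(data, key, mid + 1, high);  // drop mid itself
    }
    return mid;
}
```

Excluding mid from both recursive calls is what guarantees progress: the original passes mid unchanged, so a range of one element can repeat forever.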

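Debugging memory leaks with valgrind

Memory leaks can be among the hardest problems to debug in C/C++. C++ at least has smart pointers to manage memory, but C has nothing comparable: we can only use pointers carefully and check every place where memory should be freed. If a process keeps leaking more and more memory, Linux eventually triggers the OOM (out of memory) killer, and in bad cases the machine becomes unusable.

First, a user process asks the kernel for a range of memory (virtual addresses, of course). The kernel does not hand over physical memory right away. Only when the process actually writes to an address is a page fault triggered, which tells the kernel to back that address with a physical page (how virtual addresses are translated to physical ones through the TLB, page walks and so on is another story). If no free page is available, the kernel has to reclaim one. A page that is dirty (it holds data that a process has written but that, for performance reasons, has not yet been written to disk) must first be flushed to disk before it can be given to the process that needs it. If there really is no page left to allocate, the kernel declares OOM and kills a process, because memory has run out.

First we write a program that leaks memory: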
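
The lazy allocation described above can be observed from user space. The sketch below maps anonymous memory with mmap and compares the resident page count from /proc/self/statm before and after writing to each page. It is Linux-specific, the struct and function names are invented for this example, and the exact numbers depend on the system, so treat it as an experiment rather than a precise measurement.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <fstream>

// Resident set size of this process in pages (second field of /proc/self/statm).
long resident_pages() {
    std::ifstream statm("/proc/self/statm");
    long total = 0;
    long resident = 0;
    statm >> total >> resident;
    return resident;
}

struct touch_result {
    long pages;        // number of pages in the mapping
    long after_map;    // resident pages right after mmap, before any write
    long after_touch;  // resident pages after writing one byte to every page
};

// Map `bytes` of anonymous memory, then write to each page once.
touch_result map_then_touch(std::size_t bytes) {
    touch_result r{};
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    r.pages = static_cast<long>(bytes / page);
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return r;
    }
    r.after_map = resident_pages();
    volatile char* mem = static_cast<char*>(p);
    for (std::size_t off = 0; off < bytes; off += page) {
        mem[off] = 1;  // first write to the page triggers a page fault
    }
    r.after_touch = resident_pages();
    munmap(p, bytes);
    return r;
}
```

On a typical Linux system, after_map stays close to the value before the mapping, while after_touch grows by roughly the number of pages touched.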

#include <iostream>

class memory_leak{
public:
    memory_leak()
    {
        std::cout << "construct memory leak class" << std::endl;
        init();
    }
    ~memory_leak(){
        std::cout << "destruct memory leak class without delete" << std::endl;
    }

    void init();
private:
    int* leak_data;
};

void memory_leak::init(){
    leak_data = new int[10]; // allocated here and never freed: this is the leak
}

int main(){
    auto ml = new memory_leak();
    delete ml; // runs the destructor, but leak_data is not freed
    return 0;
}

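The program above leaks memory when the ml object is destroyed, because the memory allocated on the heap during construction (new int[10]) is never deleted. Here we can spot the leak at a glance, but in a code base of thousands or tens of thousands of lines that becomes very hard, so we turn to the valgrind tool. First we run the program normally: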

root@zhr-workstation:~/test/gdb# g++ memory_leak.cpp -g 
root@zhr-workstation:~/test/gdb# 
root@zhr-workstation:~/test/gdb# ./a.out 
construct memory leak class
destruct memory leak class without delete

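Run directly, the program reports no error and nothing looks wrong. In a real system the problem only shows up once the leaks have used up the memory (OOM). Now we scan the program with valgrind to see whether there is a problem: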

root@zhr-workstation:~/test/gdb# valgrind ./a.out 
==17759== Memcheck, a memory error detector
==17759== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17759== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==17759== Command: ./a.out
==17759== 
construct memory leak class
destruct memory leak class without delete
==17759== 
==17759== HEAP SUMMARY:
==17759==     in use at exit: 40 bytes in 1 blocks
==17759==   total heap usage: 4 allocs, 3 frees, 73,776 bytes allocated
==17759== 
==17759== LEAK SUMMARY:
==17759==    definitely lost: 40 bytes in 1 blocks
==17759==    indirectly lost: 0 bytes in 0 blocks
==17759==      possibly lost: 0 bytes in 0 blocks
==17759==    still reachable: 0 bytes in 0 blocks
==17759==         suppressed: 0 bytes in 0 blocks
==17759== Rerun with --leak-check=full to see details of leaked memory
==17759== 
==17759== For lists of detected and suppressed errors, rerun with: -s
==17759== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

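valgrind reports that 40 bytes on the heap are definitely lost, which matches the ten 4-byte ints from new int[10]. To see where the leak comes from, we add the --leak-check=full option: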

root@zhr-workstation:~/test/gdb# valgrind --leak-check=full ./a.out 
==17860== Memcheck, a memory error detector
==17860== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17860== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==17860== Command: ./a.out
==17860== 
construct memory leak class
destruct memory leak class without delete
==17860== 
==17860== HEAP SUMMARY:
==17860==     in use at exit: 40 bytes in 1 blocks
==17860==   total heap usage: 4 allocs, 3 frees, 73,776 bytes allocated
==17860== 
==17860== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1
==17860==    at 0x484A2F3: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==17860==    by 0x109243: memory_leak::init() (memory_leak.cpp:20)
==17860==    by 0x10937C: memory_leak::memory_leak() (memory_leak.cpp:8)
==17860==    by 0x109274: main (memory_leak.cpp:24)
==17860== 
==17860== LEAK SUMMARY:
==17860==    definitely lost: 40 bytes in 1 blocks
==17860==    indirectly lost: 0 bytes in 0 blocks
==17860==      possibly lost: 0 bytes in 0 blocks
==17860==    still reachable: 0 bytes in 0 blocks
==17860==         suppressed: 0 bytes in 0 blocks
==17860== 
==17860== For lists of detected and suppressed errors, rerun with: -s
==17860== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

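We can see that the leak comes from the chain main()->memory_leak::memory_leak()->memory_leak::init()->operator new[]. Like gdb's backtrace, valgrind prints the call chain that led to the problem. 17860 is the process id.
Then we change the code: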

#include <iostream>

class memory_leak{
public:
    memory_leak()
    {
        std::cout << "construct memory leak class" << std::endl;
        init();
    }
    ~memory_leak(){
        std::cout << "destruct memory leak class without delete" << std::endl;
        delete leak_data; // frees the block, but memory from new[] should be released with delete[]
    }

    void init();
private:
    int* leak_data;
};

void memory_leak::init(){
    leak_data = new int[10]; // released in the destructor
}

int main(){
    auto ml = new memory_leak();
    delete ml; // the destructor now frees leak_data
    return 0;
}

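Running valgrind again shows that all heap memory is now freed. It does, however, report a new error, Mismatched free() / delete / delete []: leak_data was allocated with new[], so the destructor should release it with delete[] leak_data rather than delete leak_data.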

root@zhr-workstation:~/test/gdb# valgrind   ./a.out 
==18226== Memcheck, a memory error detector
==18226== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==18226== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==18226== Command: ./a.out
==18226== 
construct memory leak class
destruct memory leak class without delete
==18226== Mismatched free() / delete / delete []
==18226==    at 0x484BB6F: operator delete(void*, unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==18226==    by 0x1093D3: memory_leak::~memory_leak() (memory_leak.cpp:12)
==18226==    by 0x109289: main (memory_leak.cpp:26)
==18226==  Address 0x4ddf110 is 0 bytes inside a block of size 40 alloc'd
==18226==    at 0x484A2F3: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==18226==    by 0x109243: memory_leak::init() (memory_leak.cpp:21)
==18226==    by 0x10937C: memory_leak::memory_leak() (memory_leak.cpp:8)
==18226==    by 0x109274: main (memory_leak.cpp:25)
==18226== 
==18226== 
==18226== HEAP SUMMARY:
==18226==     in use at exit: 0 bytes in 0 blocks
==18226==   total heap usage: 4 allocs, 4 frees, 73,776 bytes allocated
==18226== 
==18226== All heap blocks were freed -- no leaks are possible
==18226== 
==18226== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==18226== 
==18226== 1 errors in context 1 of 1:
==18226== Mismatched free() / delete / delete []
==18226==    at 0x484BB6F: operator delete(void*, unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==18226==    by 0x1093D3: memory_leak::~memory_leak() (memory_leak.cpp:12)
==18226==    by 0x109289: main (memory_leak.cpp:26)
==18226==  Address 0x4ddf110 is 0 bytes inside a block of size 40 alloc'd
==18226==    at 0x484A2F3: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==18226==    by 0x109243: memory_leak::init() (memory_leak.cpp:21)
==18226==    by 0x10937C: memory_leak::memory_leak() (memory_leak.cpp:8)
==18226==    by 0x109274: main (memory_leak.cpp:25)
==18226== 
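
The Mismatched free() / delete / delete [] report above arises because memory obtained with new[] is released with plain delete. Changing the destructor to use delete[] leak_data would address that. Another option, in the spirit of the smart pointers mentioned earlier, is to let std::unique_ptr<int[]> own the array so that no manual delete is needed at all. The class below is a sketch of that idea. The name no_leak and its small accessor interface are invented for this example and are not the original class.

```cpp
#include <cstddef>
#include <memory>

class no_leak {
public:
    // The array is value-initialized to zero and owned by the unique_ptr.
    no_leak() : data_(new int[kCount]()) {}

    // No user-written destructor: unique_ptr<int[]> calls delete[] for us.

    static std::size_t size() { return kCount; }

    void set(std::size_t i, int value) { data_[i] = value; }
    int get(std::size_t i) const { return data_[i]; }

private:
    static constexpr std::size_t kCount = 10;
    std::unique_ptr<int[]> data_;
};
```

Because the array specialization of unique_ptr always uses delete[], both the leak and the mismatched delete are avoided by construction. std::vector<int> would be an equally reasonable choice here.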