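Locating a program segfault when no coredump file is available

1. Principle:

Without a coredump, the crash can still be analyzed from the segfault message that the kernel writes to its log.

2. Tools involved

addr2line
Typical usage: addr2line -e yourSegfaultingProgram your_instruction_pointer (ip)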
[root@docker-node1 sbin]# addr2line -h
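Usage: addr2line [option(s)] [addr(s)]
 Convert addresses into line number/file name pairs.
 If no addresses are specified on the command line, they will be read from stdin
 The options are:
  @<file>                Read options from <file>
  -a --addresses         Show addresses
  -b --target=<bfdname>  Set the binary file format
  -e --exe=<executable>  Set the input file name (default is a.out)
  -i --inlines           Unwind inlined functions
  -j --section=<name>    Read section-relative offsets instead of addresses
  -p --pretty-print      Make the output easier to read for humans
  -s --basenames         Strip directory names
  -f --functions         Show function names
  -C --demangle[=style]  Demangle function names
  -h --help              Display this information
objdump
Typical usage: objdump -S your-program > your-program.objdump.txt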

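3. Getting the error message

Run dmesg | grep segfault, or grep segfault /var/log/messages. The message contains the name of the program or shared library in which the fault occurred.

4. Note on program names:

A stripped program or shared library has no symbol table to look up; analyze an unstripped file of the same name instead.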

5. segfault message format:

  • address (after the "at") - the location in memory the code is trying to access (it's likely that 10 and 11 are offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)
  • ip - instruction pointer, i.e. where the code which is trying to do this lives
  • sp - stack pointer
  • error - an error code for page faults; see below for what this means on x86.
/*
* Page fault error code bits:
*
* bit 0 == 0: no page found 1: protection fault
* bit 1 == 0: read access 1: write access
* bit 2 == 0: kernel-mode access 1: user-mode access
* bit 3 == 1: use of reserved bit detected
* bit 4 == 1: fault was an instruction fetch
*/
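The bit table above can be turned into a small decoder. The following is only a sketch: the function name decode_error_code is made up for illustration, and it assumes the kernel prints the error code in hexadecimal, so a logged "error 14" is parsed as 0x14.

```python
# Bit meanings copied from the kernel comment above (x86 page-fault error code).
PAGE_FAULT_BITS = [
    (0, "no page found", "protection fault"),
    (1, "read access", "write access"),
    (2, "kernel-mode access", "user-mode access"),
]
# Bits 3 and 4 only carry meaning when they are set.
PAGE_FAULT_FLAGS = [
    (3, "use of reserved bit detected"),
    (4, "fault was an instruction fetch"),
]


def decode_error_code(code):
    """Return one description per relevant bit of the error code."""
    parts = []
    for bit, if_clear, if_set in PAGE_FAULT_BITS:
        parts.append(if_set if code & (1 << bit) else if_clear)
    for bit, if_set in PAGE_FAULT_FLAGS:
        if code & (1 << bit):
            parts.append(if_set)
    return parts


if __name__ == "__main__":
    # Assumption: the logged value is hexadecimal, hence base 16.
    for text in ("4", "5", "6", "14"):
        print(text, "->", ", ".join(decode_error_code(int(text, 16))))
```

For the "error 4" lines in this article, the decoder reports a user-mode read of an address that has no page mapped, which fits a read through a near-NULL pointer.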
program name / shared library [starting address + size of the mapped virtual memory area]
Example:
segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 error 4 in libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]
Faulting shared library: libQtWebKit.so.4.5.2
ip: instruction address 00007f9bebcca90d
sp: stack address 00007fffb62705f0
Starting address of the mapping: 7f9beb83a000. The address inside the library is ip minus the starting address.
Size of the mapped virtual memory area: f6f000
"[7f9beb83a000+f6f000]" is the starting address and size of the virtual memory area where the offending object was mapped at the time of the crash.

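The field layout described in this section can be parsed mechanically. The sketch below is an illustration only: the regular expression is derived from the sample lines in this article, other kernel versions may print the message differently, and the names SEGFAULT_RE and parse_segfault are hypothetical.

```python
import re

# Assumed layout, taken from the sample log lines in this article.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<address>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+) "
    r"in (?P<object>\S+?)\[\s*(?P<start>[0-9a-f]+)\s*\+\s*(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the fields of a segfault log line as a dict, or None if no match."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    fields = {"object": match.group("object")}
    # All numeric fields are printed in hexadecimal without a 0x prefix.
    for name in ("address", "ip", "sp", "error", "start", "size"):
        fields[name] = int(match.group(name), 16)
    return fields


if __name__ == "__main__":
    line = ("segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 "
            "error 4 in libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]")
    f = parse_segfault(line)
    # Sanity check: the ip must lie inside the reported mapping.
    assert f["start"] <= f["ip"] < f["start"] + f["size"]
    print(f["object"], hex(f["ip"] - f["start"]))
```

Checking that the ip falls inside [start, start + size) is a cheap way to catch a mis-copied log line before passing the offset to addr2line.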
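6. Segfault in the program itself:

addr2line -e yourSegfaultingProgram 00007f9bebcca90d // replace 00007f9bebcca90d with the ip value from the segfault message
This works as-is when the executable is linked at a fixed address. For a position-independent executable (PIE), subtract the starting address first, as for a shared library.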

7. Shared library:

addr2line -Cfi <computed offset> -e <shared library>
segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 error 4 in libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]
segfault at 10 ip 00007fa44d78890d sp 00007fff43f6b720 error 4 in libQtWebKit.so.4.5.2[7fa44d2f8000+f6f000]
segfault at 11 ip 00007f2b0022acee sp 00007fff368ea610 error 4 in libQtWebKit.so.4.5.2[7f2aff9f7000+f6f000]
segfault at 11 ip 00007f24b21adcee sp 00007fff7379ded0 error 4 in libQtWebKit.so.4.5.2[7f24b197a000+f6f000]
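0x00007f9bebcca90d - 0x7f9beb83a000 = 0x49090D
addr2line -e /usr/lib64/qt45/lib/libQtWebKit.so.4.5.2 -fCi 0x49090D
In each line, the bracketed pair, e.g. "[7fa44d2f8000+f6f000]", is the starting address and size of the virtual memory area where the offending object was mapped at the time of the crash. The starting address changes from run to run, so only ip minus the starting address can be compared across crashes.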

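The subtraction can be repeated for all four log lines to see how many distinct crash sites they represent. This is a minimal sketch: the (ip, start) pairs are copied by hand from the lines above, and group_by_offset is a hypothetical helper name.

```python
from collections import defaultdict

# (ip, starting address) pairs copied from the four log lines above.
CRASHES = [
    (0x00007F9BEBCCA90D, 0x7F9BEB83A000),
    (0x00007FA44D78890D, 0x7FA44D2F8000),
    (0x00007F2B0022ACEE, 0x7F2AFF9F7000),
    (0x00007F24B21ADCEE, 0x7F24B197A000),
]


def group_by_offset(crashes):
    """Count crashes per offset inside the library (ip - start)."""
    groups = defaultdict(int)
    for ip, start in crashes:
        groups[ip - start] += 1
    return dict(groups)


if __name__ == "__main__":
    for offset, count in sorted(group_by_offset(CRASHES).items()):
        print(f"{offset:#x}: {count} crash(es)")
```

The four lines reduce to two offsets, 0x49090d and 0x833cee, each seen twice. These match the two faulting addresses 10 and 11, so only two addr2line lookups are needed.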
8. Worked example:

The most straightforward way is to find the message in the kernel log (/var/log/kern.log) or the system log (/var/log/syslog). Its format is:
Apr 27 18:17:55 prod-util-c01 kernel: [32427315.749998] your-program[39902]: segfault at fffffffffffffff3 ip 000000000073442c sp 00007fa141a8b460 error 5 in your-program[400000+1bc0000]
The fields are:
the hostname "prod-util-c01" (the "kernel:" tag that follows marks the message source);
the program name "your-program" and its PID "39902";
the memory address the segfault tried to access "fffffffffffffff3";
the Instruction Pointer (ip) "000000000073442c", which is the address of the faulting assembly instruction;
the Stack Pointer (sp) "00007fa141a8b460";
the error code "5": this is the architectural error code for page faults and is architecture specific. The codes are often documented in arch/*/mm/fault.c in the kernel source.
Note: if the segfault happened in a dynamic library (*.so), subtract the starting address of the mapping from the ip to get the address inside the library. Here the fault is in the program itself, mapped at 400000 (the usual base of a non-PIE x86-64 executable), so the ip "000000000073442c" is looked up as-is.
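The rule in the note can be written down as a tiny helper that picks the address to look up and builds the addr2line command line. Treat it as a sketch under assumptions: lookup_address and addr2line_argv are hypothetical names, and the caller has to know whether the object is position independent (a shared library or a PIE executable).

```python
def lookup_address(ip, start, position_independent):
    """Return the address to hand to addr2line.

    position_independent: True for shared libraries and PIE executables,
    whose load address changes between runs; False for an executable
    linked at a fixed address.
    """
    return ip - start if position_independent else ip


def addr2line_argv(binary, address):
    """Build the argument list for addr2line without running it."""
    # -C demangles, -f prints function names, -i unwinds inlined functions.
    return ["addr2line", "-Cfi", "-e", binary, hex(address)]


if __name__ == "__main__":
    # Values from the log line above; assumed to be a fixed-address executable.
    address = lookup_address(0x73442C, 0x400000, position_independent=False)
    print(" ".join(addr2line_argv("your-program", address)))
```

The resulting list can be passed to subprocess.run on a machine that has the unstripped binary and binutils installed.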

Use objdump

objdump -S your-program > your-program.objdump.txt
This generates a text file containing the assembly code and its addresses, interleaved with your C++ source if the program was compiled with "-g".
Find the ip (000000000073442c) in that file to locate the code that caused the segfault, then trace back through the code to see which functions call it.

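9. Locating the fault after the binary has been stripped:

Build an unstripped binary from the same source and use it for the lookup.
Note: the debug option (-g) does not affect the ip (instruction pointer).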

10. Kernel implementation:

fault.c: show_signal_msg (arch/x86/mm/fault.c on x86)

11. Verifying a segfault in a shared library:

gcc -g -fPIC -c func.c
gcc -g -shared -fPIC -o libfunc.so func.o
gcc -g a.c -L. -lfunc -o lyy_shared
export LD_LIBRARY_PATH=.
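Without the debug option (-g), an unstripped shared library still yields function names, but no line numbers.
Adding or omitting -g does not change the offset of the crash inside the shared library, i.e. ip minus the starting address stays the same. You can therefore run with the stripped library and, after the crash, rebuild with -g (all other compiler options unchanged) and use that build to locate the fault.
Code:
a.c: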
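#include <stdio.h>

int g_global = 0;
int g_test = 1;

/* Both symbols are defined in func.c and resolved from libfunc.so. */
extern int *g_pointer;
extern void func(void);

int main(int argc, char *argv[])
{
    /* Print a few addresses so they can be compared with the kernel log. */
    printf("&g_global = %p\n", (void *)&g_global);
    printf("&g_test = %p\n", (void *)&g_test);
    printf("&g_pointer = %p\n", (void *)&g_pointer);
    printf("g_pointer = %p\n", (void *)g_pointer);
    printf("&func = %p\n", (void *)&func);
    printf("&main = %p\n", (void *)&main);

    func();    /* crashes inside libfunc.so */

    return 0;
}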

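func.c:

#include <stdio.h>

/* Never initialized, so it stays NULL. */
int *g_pointer;

void func(void)
{
    /* Deliberate bug: writing through the NULL pointer triggers the segfault.
       The stored value is irrelevant; the casts only truncate the address
       of the string literal to an int. */
    *g_pointer = (int)(long)"D.T.Software";

    return;
}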

12. References:

  1. "Debugging a core with dmesg when coredump is not enabled", tl_sunshine's blog, CSDN
  2. Introduction to segmentation fault handling  https://www.slideshare.net/noobyahoo/introduction-to-segmentation-fault-handling-5563036
    