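The previous part covered the basic concepts behind segmentation faults; this part summarizes the methods for debugging them.
1. Summary of debugging methods
1) printf
printf is the debugging tool programmers use most often, and it is enough to locate and resolve most segmentation faults. Place printf statements by bisection to narrow down, step by step, where the fault occurs.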
2) Using gdb
To debug a program with gdb, compile it with the -g option.
The source code is as follows:
//test.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    int result = 0;
    int *ptr = NULL;
    *ptr = 8;    /* line 10: writes through a NULL pointer */
    return 0;
}
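Clearly, line 10 of the code above (*ptr = 8;) is the faulty line: it writes through a NULL pointer.
Compile and run:
xzkj@xzkj:~/temp$ gcc -g -o test test.c
xzkj@xzkj:~/temp$ ./test
Segmentation fault (core dumped)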
xzkj@xzkj:~/temp$ gdb test
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from test...done.
(gdb) run
Starting program: /home/xzkj/temp/test
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400504 in main () at test.c:10
10 *ptr = 8;
(gdb)
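As the gdb output above shows, the program is simple enough that gdb pinpoints the location of the segmentation fault right away (test.c, line 10).
gdb can also locate the fault when it occurs inside a shared library.
The test code is as follows: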
//add_sub.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int add(int a, int b)
{
    int *ptr = NULL;
    *ptr = 8;    /* deliberate NULL-pointer write inside the library */
    return (a+b);
}

int sub(int a, int b)
{
    return (a-b);
}
//main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern int add(int a, int b);
extern int sub(int a, int b);

int main()
{
    int a = 10,b = 5;
    int result = 0;
    // int *ptr = NULL;
    // *ptr = 8;
    result = add(a,b);
    printf("a+b = %d\n",result);
    return 0;
}
Compile and run:
xzkj@xzkj:~/temp$ gcc -shared -fPIC add_sub.c -o libadd_sub.so
xzkj@xzkj:~/temp$ gcc -g main.c -o test -L. -ladd_sub
xzkj@xzkj:~/temp$ ./test
Segmentation fault (core dumped)
Debug with gdb (the startup banner is identical to the one shown earlier and is omitted here):
xzkj@xzkj:~/temp$ gdb test
Reading symbols from test...done.
(gdb) run
Starting program: /home/xzkj/temp/test
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bd86ab in add () from ./libadd_sub.so
(gdb)
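The gdb output above shows that the fault occurred in the function add() in the library libadd_sub.so, but it does not identify the line: the library was compiled without -g, so gdb has no line information for it.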
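3) Using the dmesg and addr2line commands
Take the same NULL-pointer write as above (judging by the addr2line output below, the faulting statement was built into main.c for this run). After running the program, use dmesg to view the error information:
[ 148.314968] test[2326]: segfault at 0 ip 0000000000400512 sp 00007ffd33294680 error 6 in test[400000+1000]
The instruction pointer (ip) is 0000000000400512. Use addr2line to convert the address held in ip into a source line number:
xzkj@xzkj:~/temp$ addr2line -e test 0000000000400512
/home/xzkj/temp/main.c:13
As the result shows, this pinpoints the faulty line of code.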
If the fault occurs inside a dynamic library, addr2line can still locate the faulty line; you only need to subtract the library's load address (the first number inside the brackets at the end of the dmesg line) from the ip value.
Take the main.c example above:
xzkj@xzkj:~/temp$ gcc -O3 -g -o libadd_sub.so -shared -fPIC add_sub.c
xzkj@xzkj:~/temp$ gcc -O3 -g -o test main.c -L. -ladd_sub
xzkj@xzkj:~/temp$ ./test
xzkj@xzkj:~/temp$ dmesg
...
[ 2083.947773] test[2747]: segfault at 0 ip 00007f72f37ac6a0 sp 00007ffd9482a938 error 6 in libadd_sub.so[7f72f37ac000+1000]
Argument for addr2line = 7f72f37ac6a0 - 7f72f37ac000 = 6a0
xzkj@xzkj:~/temp$ addr2line -e libadd_sub.so 6a0
/home/xzkj/temp/add_sub.c:9
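4) backtrace
Method 3) requires running dmesg first to get the crash information and then running addr2line to locate the faulty line. Alternatively, the program can combine the backtrace function with a signal handler and print the call stack itself at the moment it crashes.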
//main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <execinfo.h>

void sig_handle(int sig)
{
    void *func[256];
    int size = backtrace(func, 256);      /* collect the return addresses of the call stack */
    backtrace_symbols_fd(func, size, 2);  /* write them to file descriptor 2 (stderr) */
    exit(1);
}

int main()
{
    int result = 0;
    signal(SIGSEGV,sig_handle);           /* run sig_handle when a segmentation fault occurs */
    int *ptr = NULL;
    *ptr = 8;
    return 0;
}
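xzkj@xzkj:~/temp$ gcc -g -o test main.c
xzkj@xzkj:~/temp$ ./test
./test[0x400642]
/lib/x86_64-linux-gnu/libc.so.6(+0x36cb0)[0x7fbc9caa8cb0]
./test[0x400698]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fbc9ca93f45]
./test[0x400559]
xzkj@xzkj:~/temp$ addr2line -e test 400698
/home/xzkj/temp/main.c:20
The first frame is the signal handler itself and the second is the signal return stub inside libc; the frame that follows (./test[0x400698]) is the instruction that faulted, which is why that address is passed to addr2line.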
5) Using the core file
Again take test.c from 2) as the example.
When the segmentation fault occurs, a core file is produced (as the "core dumped" message indicates); debug it with gdb. The startup banner is identical to the one shown earlier and is omitted here.
xzkj@xzkj:~/temp$ gdb test core
Reading symbols from test...done.
[New LWP 3032]
Core was generated by `./test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000400504 in main () at test.c:10
10 *ptr = 8;
(gdb)
6) Using the catchsegv command
xzkj@xzkj:~/temp$ catchsegv ./test
*** Segmentation fault
Register dump:

 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 00007ffd8470f5e8 RSI: 00007ffd8470f5d8 RDI: 0000000000000001
 RBP: 00007ffd8470f4f0 R8 : 00007f2e513e7e80 R9 : 00007f2e51602700
 R10: 00007ffd8470f380 R11: 00007f2e51045e50 R12: 0000000000400400
 R13: 00007ffd8470f5d0 R14: 0000000000000000 R15: 0000000000000000
 RSP: 00007ffd8470f4f0

 RIP: 0000000000400504 EFLAGS: 00010246

 CS: 0033 FS: 0000 GS: 0000

 Trap: 0000000e Error: 00000006 OldMask: 00000000 CR2: 00000000

 FPUCW: 0000037f FPUSW: 00000000 TAG: 00000000
 RIP: 00000000 RDP: 00000000

 ST(0) 0000 0000000000000000 ST(1) 0000 0000000000000000
 ST(2) 0000 0000000000000000 ST(3) 0000 0000000000000000
 ST(4) 0000 0000000000000000 ST(5) 0000 0000000000000000
 ST(6) 0000 0000000000000000 ST(7) 0000 0000000000000000
 mxcsr: 1f80
 XMM0: 00000000000000000000000000000000 XMM1: 00000000000000000000000000000000
 XMM2: 00000000000000000000000000000000 XMM3: 00000000000000000000000000000000
 XMM4: 00000000000000000000000000000000 XMM5: 00000000000000000000000000000000
 XMM6: 00000000000000000000000000000000 XMM7: 00000000000000000000000000000000
 XMM8: 00000000000000000000000000000000 XMM9: 00000000000000000000000000000000
 XMM10: 00000000000000000000000000000000 XMM11: 00000000000000000000000000000000
 XMM12: 00000000000000000000000000000000 XMM13: 00000000000000000000000000000000
 XMM14: 00000000000000000000000000000000 XMM15: 00000000000000000000000000000000

Backtrace:
/home/xzkj/temp/test.c:10(main)[0x400504]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f2e51045f45]
??:?(_start)[0x400429]
7) Disassembling with objdump
First use dmesg to get the segmentation fault information:
[4568.134668] test[3214]: segfault at 0 ip 0000000000400504 sp 00007fff510babb0 error 6 in test[400000+1000]
Then disassemble the executable:
objdump -d ./test > testdump
Finally, search the disassembly for the ip address (400504 here) and map the surrounding instructions back to the source. In this case 400504 is movl $0x8,(%rax), the store that implements *ptr = 8;.
xzkj@xzkj:~/temp$ cat testdump
....
00000000004004ed <main>:
  4004ed: 55                      push %rbp
  4004ee: 48 89 e5                mov %rsp,%rbp
  4004f1: c7 45 f4 00 00 00 00    movl $0x0,-0xc(%rbp)
  4004f8: 48 c7 45 f8 00 00 00    movq $0x0,-0x8(%rbp)
  4004ff: 00
  400500: 48 8b 45 f8             mov -0x8(%rbp),%rax
  400504: c7 00 08 00 00 00       movl $0x8,(%rax)
  40050a: b8 00 00 00 00          mov $0x0,%eax
  40050f: 5d                      pop %rbp
  400510: c3                      retq
  400511: 66 2e 0f 1f 84 00 00    nopw %cs:0x0(%rax,%rax,1)
  400518: 00 00 00
  40051b: 0f 1f 44 00 00          nopl 0x0(%rax,%rax,1)
....
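This article is only a brief summary of methods for debugging segmentation faults. In real projects, tracking one down is far more complicated than in these examples, but the methods remain much the same.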