Linux: Several Ways to Generate a Core File
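1. Summary
In some situations a process produces a core file (a core dump) that records its state and helps us locate the problem quickly.
For example:
- when a process exits abnormally, e.g. with a segmentation fault, analyze the resulting core file and use the call stack to find the null pointer dereference;
- when a process blocks somewhere in its code, force a core dump and use the call stack to find out why it is blocked;
- ……
A core file can be produced in the following ways:
- a bug makes the process crash, most commonly with a segmentation fault;
- the process receives a signal such as SIGABRT or SIGQUIT, exits, and dumps core;
- gcore (or gdb) dumps a core of a running process, which keeps running and is not terminated.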
2. Environment
Operating system:
[test1280@test1280 20210113]$ uname -a
Linux test1280 2.6.32-642.el6.x86_64 #1 SMP Tue May 10 17:27:01 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[test1280@test1280 20210113]$ cat /etc/redhat-release
CentOS release 6.8 (Final)
Core file size limit (shell resource limit):
[test1280@test1280 20210113]$ ulimit -c
unlimited
Note:
ulimit -c must not be 0; ideally set it to unlimited.
If ulimit -c is 0, no core file is generated.
For more details see: https://blog.csdn.net/test1280/article/details/73655994
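ulimit -c is the shell's view of the RLIMIT_CORE resource limit, which a program can also read and raise by itself with getrlimit()/setrlimit(). A minimal sketch, assuming it is acceptable for the process to raise its soft limit up to the hard limit; the helper names enable_core_dumps and print_core_limit are hypothetical:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_CORE (what `ulimit -c` shows)
 * up to the hard limit. An unprivileged process may raise its soft limit
 * up to, but not beyond, the hard limit. Returns 0 on success, -1 on error. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur == rl.rlim_max)
        return 0;               /* already as large as allowed */

    rl.rlim_cur = rl.rlim_max;  /* still 0 if the hard limit is 0 */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}

/* Print the current soft limit, similar to `ulimit -c` but in bytes. */
void print_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("core limit: unlimited\n");
    else
        printf("core limit: %llu bytes\n", (unsigned long long)rl.rlim_cur);
}
```

For example, call enable_core_dumps() at the top of main(), then print_core_limit() to confirm. If the hard limit is 0, the soft limit stays 0 and no core is written; raising the hard limit requires privileges.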
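3. Examples
3.1. Runtime errors
The most common one is the segmentation fault, caused for example by:
- null pointer dereference
- out-of-bounds memory access
- ……
Take a segmentation fault caused by writing through a null pointer as an example: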
demo1.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct Student
{
    char *mName;
    char *mAddr;
    int mAge;
};

int fun3()
{
    struct Student* pStudent = NULL;
    /* Writing through a NULL pointer causes a segmentation fault */
    pStudent->mName = "test1280";
}

int fun2()
{
    fun3();
}

int fun1()
{
    fun2();
}

int main()
{
    fun1();
}
Compile and run:
[test1280@test1280 20210113]$ gcc -o demo1 demo1.c -g
[test1280@test1280 20210113]$ ./demo1
Segmentation fault (core dumped)
Inspect the core file:
[test1280@test1280 20210113]$ gdb -c core.3348 demo1
......
Core was generated by `./demo1'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000400484 in fun3 () at demo1.c:16
16 pStudent->mName = "test1280";
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.192.el6.x86_64
(gdb) bt
#0 0x0000000000400484 in fun3 () at demo1.c:16
#1 0x000000000040049b in fun2 () at demo1.c:21
#2 0x00000000004004ab in fun1 () at demo1.c:26
#3 0x00000000004004bb in main () at demo1.c:31
(gdb) quit
The call stack at the time of the crash is main->fun1->fun2->fun3:
#0 0x0000000000400484 in fun3 () at demo1.c:16
#1 0x000000000040049b in fun2 () at demo1.c:21
#2 0x00000000004004ab in fun1 () at demo1.c:26
#3 0x00000000004004bb in main () at demo1.c:31
The cause of the crash:
Program terminated with signal 11, Segmentation fault.
Note: signal 11 = SIGSEGV
The faulting code (source file and source line):
#0 0x0000000000400484 in fun3 () at demo1.c:16
16 pStudent->mName = "test1280";
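The "Segmentation fault (core dumped)" message is derived from the termination status that the parent (here, the shell) collects with waitpid(). A minimal sketch, assuming a Linux/glibc system: the hypothetical helpers write_through_null() and run_in_child() reproduce fun3() of demo1.c in a forked child and decode the status with WIFSIGNALED, WTERMSIG and WCOREDUMP (the last one is a common extension rather than strict POSIX, hence the #ifdef):

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same idea as fun3() in demo1.c: store through a NULL pointer. The pointer
 * object is volatile so the compiler cannot see the NULL at compile time
 * and must emit a real store, which the kernel turns into SIGSEGV. */
void write_through_null(void)
{
    char *volatile p = NULL;
    *p = 'x';
}

/* Run fn in a forked child and report how the child ended.
 * Returns the terminating signal number, 0 if the child exited normally,
 * or -1 on error. *core_dumped is set from WCOREDUMP when available. */
int run_in_child(void (*fn)(void), int *core_dumped)
{
    int status;
    pid_t pid;

    *core_dumped = 0;
    pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {      /* child */
        fn();
        _exit(0);        /* only reached if fn() did not crash */
    }
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFSIGNALED(status)) {
#ifdef WCOREDUMP
        *core_dumped = WCOREDUMP(status) ? 1 : 0;
#endif
        return WTERMSIG(status);
    }
    return 0;
}
```

run_in_child(write_through_null, &dumped) should return SIGSEGV (11). dumped is 1 only when the kernel reports that a core was written, so it is normally 0 when ulimit -c is 0.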
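Other errors can also produce a core file, for example a division by zero in fun3:
int fun3()
{
    int i = 0/0;
}
Running it prints:
Floating point exception (core dumped)
and gdb reports:
Program terminated with signal 8, Arithmetic exception.
Note: signal 8 = SIGFPE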
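gdb prints the raw signal number ("Program terminated with signal N"), and the notes above map it to a name by hand. A small lookup sketch for the four signals that appear in this article; core_signal_name() and describe_signal() are hypothetical helpers, and the description text comes from the C library's strsignal(), so its exact wording may vary between libc versions:

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Map the signal numbers that appear in this article to their names.
 * The numbers in the comments are the Linux x86-64 values gdb printed. */
const char *core_signal_name(int sig)
{
    switch (sig) {
    case SIGQUIT: return "SIGQUIT";  /* 3:  Ctrl+\ or kill -SIGQUIT */
    case SIGABRT: return "SIGABRT";  /* 6:  abort() */
    case SIGFPE:  return "SIGFPE";   /* 8:  e.g. integer division by zero */
    case SIGSEGV: return "SIGSEGV";  /* 11: e.g. NULL pointer dereference */
    default:      return "other";
    }
}

/* Print "signal N = NAME (description)"; strsignal() supplies the text. */
void describe_signal(int sig)
{
    printf("signal %d = %s (%s)\n", sig, core_signal_name(sig), strsignal(sig));
}
```

On glibc, describe_signal(8) prints something like "signal 8 = SIGFPE (Floating point exception)".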
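3.2. Signals
A signal can be raised by the process itself or sent manually.
3.2.1. abort
When a process reaches an error path, it can call the C library function abort() (declared in stdlib.h) to terminate itself and dump core.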
demo2.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int fun3()
{
    /* Error path: terminate the process */
    abort();
}

int fun2()
{
    fun3();
}

int fun1()
{
    fun2();
}

int main()
{
    fun1();
}
Compile and run:
[test1280@test1280 20210113]$ gcc -o demo2 demo2.c -g
[test1280@test1280 20210113]$ ./demo2
Aborted (core dumped)
Inspect the core file:
[test1280@test1280 20210113]$ gdb -c core.3415 demo2
......
Core was generated by `./demo2'.
Program terminated with signal 6, Aborted.
#0 0x0000003da0e325e5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.192.el6.x86_64
(gdb) bt
#0 0x0000003da0e325e5 in raise () from /lib64/libc.so.6
#1 0x0000003da0e33dc5 in abort () from /lib64/libc.so.6
#2 0x00000000004004cd in fun3 () at demo2.c:8
#3 0x00000000004004db in fun2 () at demo2.c:13
#4 0x00000000004004eb in fun1 () at demo2.c:18
#5 0x00000000004004fb in main () at demo2.c:23
(gdb) quit
The call stack at the time of the crash is main->fun1->fun2->fun3->abort->raise:
#0 0x0000003da0e325e5 in raise () from /lib64/libc.so.6
#1 0x0000003da0e33dc5 in abort () from /lib64/libc.so.6
#2 0x00000000004004cd in fun3 () at demo2.c:8
#3 0x00000000004004db in fun2 () at demo2.c:13
#4 0x00000000004004eb in fun1 () at demo2.c:18
#5 0x00000000004004fb in main () at demo2.c:23
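abort() calls raise(), which sends SIGABRT to the process itself.
The cause of the crash:
Program terminated with signal 6, Aborted.
Note: signal 6 = SIGABRT
The faulting code: frame #0 is inside libc (raise), so look for the first frame in our own source, #2, which points at the abort() call:
#2 0x00000000004004cd in fun3 () at demo2.c:8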
3.2.2. kill
- Ctrl+\
If the process is running in the foreground, for example:
demo3.c
[test1280@test1280 20210113]$ cat demo3.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int fun3()
{
    while (1)
    {
        sleep(1);
    }
}

int fun2()
{
    fun3();
}

int fun1()
{
    fun2();
}

int main()
{
    fun1();
}
Compile and run:
[test1280@test1280 20210113]$ gcc -o demo3 demo3.c -g
[test1280@test1280 20210113]$ ./demo3
[demo3 runs in the foreground; the current shell is blocked until it finishes]
Press Ctrl+\ in this shell to send SIGQUIT to the foreground process:
[test1280@test1280 20210113]$ gcc -o demo3 demo3.c -g
[test1280@test1280 20210113]$ ./demo3
^\Quit (core dumped)
The foreground process terminates and a core file is generated.
Inspect the core file:
[test1280@test1280 20210113]$ gdb -c core.3575 demo3
......
Core was generated by `./demo3'.
Program terminated with signal 3, Quit.
#0 0x0000003da0eacbc0 in __nanosleep_nocancel () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.192.el6.x86_64
(gdb) bt
#0 0x0000003da0eacbc0 in __nanosleep_nocancel () from /lib64/libc.so.6
#1 0x0000003da0eaca50 in sleep () from /lib64/libc.so.6
#2 0x00000000004004d2 in fun3 () at demo3.c:9
#3 0x00000000004004e2 in fun2 () at demo3.c:15
#4 0x00000000004004f2 in fun1 () at demo3.c:20
#5 0x0000000000400502 in main () at demo3.c:25
Note: signal 3 = SIGQUIT
Program terminated with signal 3, Quit.
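Ctrl+\ works because the terminal driver, not demo3 itself, turns the terminal's QUIT character into SIGQUIT for the foreground process group (stty -a shows it as "quit = ^\"). A minimal sketch that reads that character with tcgetattr() and prints it in caret notation; the helper names quit_char, caret_notation and print_quit_char are hypothetical, and quit_char() returns -1 when the descriptor is not a terminal (for example when stdin is redirected):

```c
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Return the terminal's QUIT character (c_cc[VQUIT]), or -1 if fd is not
 * a terminal. With ISIG enabled, typing this character makes the terminal
 * driver send SIGQUIT to the foreground process group. */
int quit_char(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) != 0)
        return -1;
    return (unsigned char)t.c_cc[VQUIT];
}

/* Write a control character in caret notation into buf (at least 3 bytes).
 * Other characters are copied as-is. */
void caret_notation(int c, char buf[3])
{
    if (c >= 0 && c < 0x20) {
        buf[0] = '^';
        buf[1] = (char)(c + '@');  /* 0x1c + 0x40 = 0x5c, a backslash */
        buf[2] = '\0';
    } else {
        buf[0] = (char)c;
        buf[1] = '\0';
    }
}

void print_quit_char(void)
{
    char buf[3];
    int c = quit_char(STDIN_FILENO);

    if (c < 0) {
        printf("stdin is not a terminal\n");
        return;
    }
    caret_notation(c, buf);
    printf("quit = %s\n", buf);
}
```

Run from an interactive terminal with default settings, print_quit_char() should report the same ^\ that stty -a shows.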
- kill
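The kill command (or a similar command) sends a given signal to a given process.
Again taking demo3 as an example, leave it blocked in the foreground and run kill from a second terminal:
[Terminal 1]
[test1280@test1280 20210113]$ ./demo3
[Terminal 1 is blocked...]
[Terminal 2]
[First find the PID of demo3: 3587]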
[test1280@test1280 ~]$ ps aux | grep demo3 | grep -v grep
test1280 3587 0.0 0.0 3920 328 pts/0 S+ 06:41 0:00 ./demo3
[Run kill to send SIGQUIT to process 3587]
[test1280@test1280 ~]$ kill -SIGQUIT 3587
[Terminal 1]
[test1280@test1280 20210113]$ ./demo3
Quit (core dumped)
[demo3 receives the SIGQUIT sent by kill from terminal 2 and exits; terminal 1 is no longer blocked]
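The kill command is a thin front end to the kill(2) system call. A minimal sketch of the two-terminal experiment inside one program: a forked child loops in sleep() like demo3, the parent sends SIGQUIT with kill(), then collects the status with waitpid(). The function name quit_sleeping_child is hypothetical, and the sketch assumes the parent has not changed the SIGQUIT disposition (the child inherits it across fork()):

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a demo3-like child, send it SIGQUIT, and return the signal that
 * terminated it (expected: SIGQUIT), 0 if it exited normally, -1 on error. */
int quit_sleeping_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        for (;;)         /* like fun3() in demo3.c */
            sleep(1);
    }

    /* Equivalent of `kill -SIGQUIT <pid>` typed in the second terminal. */
    if (kill(pid, SIGQUIT) != 0) {
        perror("kill");
        return -1;
    }
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

quit_sleeping_child() should return SIGQUIT (3), the same signal gdb reported for the Ctrl+\ case.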
3.3. gcore
If a process in production hangs and you want a core file without stopping the process, use gcore.
gcore is a shell script that invokes gdb:
[test1280@test1280 20210113]$ which gcore
/usr/bin/gcore
[test1280@test1280 20210113]$ file `which gcore`
/usr/bin/gcore: POSIX shell script text executable
[test1280@test1280 20210113]$ vi `which gcore`
......
The gcore man page says:
Generate a core dump of a running program with process ID pid.
Produced file is equivalent to a kernel produced core file as if the process crashed (and if “ulimit -c” were used to set up an appropriate core dump limit).
Unlike after a crash, after gcore the program remains running without any change.
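Again taking demo3 as an example, leave it blocked in the foreground and run gcore from a second terminal:
[Terminal 1]
[test1280@test1280 20210113]$ ./demo3
[Terminal 1 waits for demo3 to exit and stays blocked]
[Terminal 2]
[First find the PID of demo3: 3644]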
[test1280@test1280 20210113]$ ps aux | grep demo3 | grep -v grep
test1280 3644 0.0 0.0 3920 332 pts/0 S+ 06:52 0:00 ./demo3
[Run gcore <pid> to generate a core]
[test1280@test1280 20210113]$ gcore 3644
0x0000003da0eacbc0 in __nanosleep_nocancel () from /lib64/libc.so.6
Saved corefile core.3644
[Terminal 1]
[test1280@test1280 20210113]$ ./demo3
[Terminal 1 is still blocked: after gcore, demo3 keeps running]
The gcore script implements this with gdb's own gcore command:
[test1280@test1280 20210113]$ gdb
(gdb) help gcore
Save a core file with the current state of the debugged process.
Argument is optional filename. Default filename is 'core.<process_id>'.
(gdb) quit
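gcore obtains its core file by running gdb against the process, as the script above shows. For comparison only, here is a different, self-contained technique for dumping a core of a running program without stopping it: fork() a copy of the process and let the copy call abort(). This is not how gcore works, and it has limits: the core belongs to the child (so a PID in the core file name is the child's), only the thread that called fork() exists in the child, and the usual ulimit -c rules apply. The function name dump_core_snapshot is hypothetical:

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: dump a core of a snapshot of the current process
 * while the process itself keeps running. The forked child has a copy of
 * the parent's memory; abort() in the child dumps that copy.
 * Returns the child's PID (the core is the child's), or -1 on error. */
pid_t dump_core_snapshot(void)
{
    int status;
    pid_t pid;

    fflush(NULL);        /* avoid duplicating buffered stdio output */
    pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0)
        abort();         /* child: SIGABRT, core of the copied memory */

    /* Parent: reap the child so it does not linger as a zombie. */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGABRT)
        return -1;
    return pid;
}
```

The caller simply continues after dump_core_snapshot() returns, which mirrors what gcore achieves from the outside; for a multithreaded program, gcore remains the better tool because it captures every thread.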
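4. Summary
A process can dump core for many reasons, triggered either by its own code or from outside.
Ultimately, though, almost all of these cases come down to the kernel delivering a signal:
* SIGSEGV: segmentation fault
* SIGABRT: abort()
* SIGQUIT: Ctrl+\ or kill -SIGQUIT
Other signals also make a process dump core, e.g. SIGFPE (the division-by-zero example above), SIGILL and SIGTRAP.
The key question is what causes such a signal to be sent to the process.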
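Whether a signal produces a core by default is fixed by its default action; the Linux signal(7) manual page marks these signals as "Core". A minimal lookup sketch based on that list (the function name default_action_dumps_core is hypothetical; a handler installed by the program, or a core limit of 0, changes the actual outcome):

```c
#include <signal.h>
#include <stddef.h>

/* Signals whose default action is "Core" (terminate and dump core),
 * per the Linux signal(7) manual page. */
static const int core_signals[] = {
    SIGQUIT, SIGILL, SIGTRAP, SIGABRT, SIGBUS, SIGFPE,
    SIGSEGV, SIGXCPU, SIGXFSZ, SIGSYS,
};

/* Return 1 if sig's default action dumps core, 0 otherwise. */
int default_action_dumps_core(int sig)
{
    size_t i;

    for (i = 0; i < sizeof core_signals / sizeof core_signals[0]; i++)
        if (core_signals[i] == sig)
            return 1;
    return 0;
}
```

For example, SIGQUIT from Ctrl+\ dumps core, while SIGINT from Ctrl+C only terminates the process.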
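Apart from signals, gcore uses gdb's gcore command to make a process produce a core file without terminating it.
Open question: how does gcore work? Does it also send some particular signal that makes the process dump core without terminating?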