coredump continued: abort and addr2line

Article link: https://blog.csdn.net/pangyemeng/article/details/72877929

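0x01 Motivation

In real project work, when a segmentation fault occurs, finding the offending line of code means either preserving the program's runtime context (a core dump) or mapping the fault address from the Linux kernel log back to a source line.

Sometimes the program detects a problem itself and should exit on its own and release its resources. For example, when it finds that system memory is exhausted, it should exit, and its file streams still have to be flushed to disk.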

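0x02 Introduction to abort

Signal: SIGABRT (6), generated when a program detects an error itself and calls abort.

ISO C requires that if the signal is caught and the signal handler returns, abort still does not return to its caller. If the signal is caught, the only way for the handler to avoid returning is to call exit, _exit, _Exit, longjmp, or siglongjmp. (Section 10.15 discusses the difference between longjmp and siglongjmp.) POSIX.1 also specifies that abort overrides the process's blocking or ignoring of this signal.
The intent of letting a process catch SIGABRT is to allow it to perform any cleanup it wants before it terminates. If the process does not terminate itself from within the signal handler, then per POSIX.1, abort terminates the process when the handler returns.
The ISO C specification of this function leaves it to the implementation whether output streams are flushed and whether temporary files are deleted. POSIX.1 goes further and requires that, if the call to abort terminates the process, the effect on the open standard I/O streams in the process is the same as if the process had called fclose on each stream before terminating.

The glibc implementation of abort follows: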

/* Copyright (C) 1991-2017 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#include <libc-lock.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Try to get a machine dependent instruction which will make the
   program crash.  This is used in case everything else fails.  */
#include <abort-instr.h>
#ifndef ABORT_INSTRUCTION
/* No such instruction is available.  */
# define ABORT_INSTRUCTION
#endif

#include <libio/libioP.h>
#define fflush(s) _IO_flush_all_lockp (0)

/* Exported variable to locate abort message in core files etc.  */
struct abort_msg_s *__abort_msg __attribute__ ((nocommon));
libc_hidden_def (__abort_msg)

/* We must avoid to run in circles.  Therefore we remember how far we
   already got.  */
static int stage;

/* We should be prepared for multiple threads trying to run abort.  */
__libc_lock_define_initialized_recursive (static, lock);


/* Cause an abnormal program termination with core-dump.  */
void
abort (void)
{
  struct sigaction act;
  sigset_t sigs;

  /* First acquire the lock.  */
  __libc_lock_lock_recursive (lock);

  /* Now it's for sure we are alone.  But recursive calls are possible.  */

  /* Unlock SIGABRT.  */
  if (stage == 0)
    {
      ++stage;
      if (__sigemptyset (&sigs) == 0 &&
	  __sigaddset (&sigs, SIGABRT) == 0)
	__sigprocmask (SIG_UNBLOCK, &sigs, (sigset_t *) NULL);
    }

  /* Flush all streams.  We cannot close them now because the user
     might have registered a handler for SIGABRT.  */
  if (stage == 1)
    {
      ++stage;
      fflush (NULL);
    }

  /* Send signal which possibly calls a user handler.  */
  if (stage == 2)
    {
      /* This stage is special: we must allow repeated calls of
	 `abort' when a user defined handler for SIGABRT is installed.
	 This is risky since the `raise' implementation might also
	 fail but I don't see another possibility.  */
      int save_stage = stage;

      stage = 0;
      __libc_lock_unlock_recursive (lock);

      raise (SIGABRT);

      __libc_lock_lock_recursive (lock);
      stage = save_stage + 1;
    }

  /* There was a handler installed.  Now remove it.  */
  if (stage == 3)
    {
      ++stage;
      memset (&act, '\0', sizeof (struct sigaction));
      act.sa_handler = SIG_DFL;
      __sigfillset (&act.sa_mask);
      act.sa_flags = 0;
      __sigaction (SIGABRT, &act, NULL);
    }

  /* Now close the streams which also flushes the output the user
     defined handler might has produced.  */
  if (stage == 4)
    {
      ++stage;
      __fcloseall ();
    }

  /* Try again.  */
  if (stage == 5)
    {
      ++stage;
      raise (SIGABRT);
    }

  /* Now try to abort using the system specific command.  */
  if (stage == 6)
    {
      ++stage;
      ABORT_INSTRUCTION;
    }

  /* If we can't signal ourselves and the abort instruction failed, exit.  */
  if (stage == 7)
    {
      ++stage;
      _exit (127);
    }

  /* If even this fails try to use the provided instruction to crash
     or otherwise make sure we never return.  */
  while (1)
    /* Try for ever and ever.  */
    ABORT_INSTRUCTION;
}
libc_hidden_def (abort)

0x03 abort use cases

When the program detects that some resource is about to run out, it exits on its own after doing the necessary cleanup.
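
One possible shape for the resource-exhaustion case is an allocation wrapper that records the failure, flushes its log stream explicitly, and then calls abort. This is only a sketch. xmalloc_or_abort and its log parameter are hypothetical names, not something the article or glibc provides.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return allocated memory or terminate via abort(). */
void *xmalloc_or_abort(size_t size, FILE *log)
{
    void *p = malloc(size);

    if (p == NULL && size != 0) {
        if (log != NULL) {
            fprintf(log, "out of memory requesting %zu bytes\n", size);
            fflush(log);   /* flush explicitly instead of relying on abort() */
        }
        abort();           /* raises SIGABRT; a core dump may be produced */
    }
    return p;
}
```

Flushing before abort is a deliberate choice here, since ISO C leaves it to the implementation whether abort flushes open streams.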

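When troubleshooting a problem, calling abort produces a core dump of the program's context, so the stack can be inspected. See, for example, http://blog.csdn.net/stpeace/article/details/65937095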
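
A core file is only written if the process's core-size limit allows it. As a sketch, assuming a POSIX system, a program can raise its own soft limit to the hard limit before it might call abort. enable_core_dumps is a hypothetical helper name.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_CORE limit to the hard limit.
   Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* an unprivileged process may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}
```

If the hard limit itself is 0, this call succeeds but still leaves core dumps disabled. Where the core file ends up also depends on system configuration that this sketch does not touch.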

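0x04 Introduction to addr2line

As the name suggests, this tool translates an address into a source line. The binary must be built with debug information (-g). When no file is given with -e, addr2line reads a.out in the current directory. See the example below: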

#include <stdlib.h>
#include <iostream>
#include <cmath>
using namespace std;

struct Point
{
    int x;
    int y;
};

int main() 
{
    Point po; 
    po.x = 1;
    po.y = 2;
    float distance = sqrt(po.x * po.x + po.y * po.y);
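    float result = po.y / 0;   // integer division by zero: raises SIGFPE (line 18 of abrt.cpp)
    abort();                   // never reached in this run: the process dies on the line above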
    return 0;
}

[root@BH test]# ./a.out 
Floating point exception (core dumped)

[root@BH test]# dmesg 
[UHIO] all counters clear!
a.out[14634] trap divide error ip:400772 sp:7fffc81fe400 error:0 in a.out[400000+1000]

[root@BH test]# addr2line 400772
/home/pangyemeng/test/abrt.cpp:18
[root@BH test]#

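0x05 addr2line use cases

Sometimes the system configuration does not help and Linux leaves no evidence behind: no core file is produced. At other times the disk happens to be full and the core file cannot be written. Without a core file, how do you find out where the program failed? This is where addr2line comes in: the instruction pointer (ip) recorded in the kernel log is enough to recover the source line.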
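
When neither a core file nor a kernel log line is available, a program can also print its own stack addresses for later use with addr2line. The article does not use this technique. What follows is a rough sketch that assumes glibc's <execinfo.h> (backtrace and backtrace_symbols_fd). The names dump_backtrace, on_fatal, and install_fatal_handlers are hypothetical.

```c
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

#define MAX_FRAMES 32

/* Write the current call stack to fd, one frame per line.
   Returns the number of frames captured. */
int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    backtrace_symbols_fd(frames, n, fd);   /* writes directly to fd */
    return n;
}

/* Print the stack, then let the default action terminate the process. */
static void on_fatal(int signo)
{
    dump_backtrace(STDERR_FILENO);
    signal(signo, SIG_DFL);
    raise(signo);
}

/* Install the handler for SIGSEGV and SIGFPE. Returns 0 on success. */
int install_fatal_handlers(void)
{
    if (signal(SIGSEGV, on_fatal) == SIG_ERR)
        return -1;
    if (signal(SIGFPE, on_fatal) == SIG_ERR)
        return -1;
    return 0;
}
```

Two caveats apply. backtrace is not guaranteed to be async-signal-safe, so this is a best-effort diagnostic. For position-independent executables, a printed address must first be converted to an offset within the binary before it is passed to addr2line.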

0x06 siglongjmp (usage reproduced from http://blog.chinaunix.net/xmlrpc.php?r=blog/article&uid=14327709&id=2978705)

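sigsetjmp() and siglongjmp() are special jump functions on Linux.
They are especially useful for handling interrupts and errors inside a low-level subroutine.
To use them, first declare a variable of type sigjmp_buf, which holds the stack context as it was at a particular point in the program.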

Prototypes:

// Returns 0 when called directly; returns a nonzero value when returning via siglongjmp.
int sigsetjmp(sigjmp_buf env, int savesigs);
void siglongjmp(sigjmp_buf env, int val);

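sigsetjmp saves the current stack context in the variable env, which is later passed to siglongjmp. Once the function that called sigsetjmp has returned, however, env is no longer valid.
If savesigs is nonzero, the set of blocked signals is also saved in env, and that set is restored when siglongjmp is called. If sigsetjmp returns directly, its return value is 0. If it returns because siglongjmp used env, its return value is nonzero (the val passed to siglongjmp, or 1 if val is 0).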

Its usage is shown in the following example:

#include <stdio.h>
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf jmpbuf;

void sig_fpe(int signo)
{
    siglongjmp(jmpbuf, 1);
}

int main(int argc, char *argv[])
{
    volatile int zero = 0;            /* keeps the compiler from folding or dropping the division */

    signal(SIGFPE, sig_fpe);
    if (sigsetjmp(jmpbuf, 1) == 0)    // try
    {                                 // {
        int ret = 10 / zero;          //     int ret = 10 / 0;
        (void)ret;
    }                                 // }
    else                              // catch
    {                                 // {
        printf("catch exception\n");  //     printf("catch exception\n");
    }                                 // }
    return 0;
}

The output is as follows, which shows that the divide-by-zero signal was caught:

catch exception

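Analysis:

On the first call to sigsetjmp, siglongjmp has not been called yet, so sigsetjmp returns 0 and the division by zero in the first branch executes. This raises SIGFPE, and control enters the SIGFPE handler. The handler calls siglongjmp, which restores env, so execution resumes at the point where env was saved and the if condition is evaluated again. Because this return from sigsetjmp was caused by siglongjmp, the return value is nonzero, and the program prints the message for the caught exception. The effect is similar to a try...catch block in C++.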