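A Brief Analysis of Linux Signal Handling

1. Preface

Given the author's limited expertise, this article may contain mistakes; the author accepts no liability for any loss this may cause readers.

2. Background

This analysis is based on the ARM32 architecture and the Linux 4.14 kernel source.

3. Signal Overview

3.1 Signal Classification

The concept of signals originated in the UNIX operating system and, after a series of changes, evolved into the signals defined today by the POSIX standard. Based on signal number range and on whether delivery is real-time, we simply divide signals into two classes: standard signals and real-time signals.

3.1.1 Standard Signals

Standard signals originate from UNIX and are numbered 1-31. Their numbers are listed in the table below: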

Signal        x86/ARM     Alpha/   MIPS   PARISC   Notes
            most others   SPARC
─────────────────────────────────────────────────────────────────
SIGHUP           1           1       1       1
SIGINT           2           2       2       2
SIGQUIT          3           3       3       3
SIGILL           4           4       4       4
SIGTRAP          5           5       5       5
SIGABRT          6           6       6       6
SIGIOT           6           6       6       6
SIGBUS           7          10      10      10
SIGEMT           -           7       7      -
SIGFPE           8           8       8       8
SIGKILL          9           9       9       9
SIGUSR1         10          30      16      16
SIGSEGV         11          11      11      11
SIGUSR2         12          31      17      17
SIGPIPE         13          13      13      13
SIGALRM         14          14      14      14
SIGTERM         15          15      15      15
SIGSTKFLT       16          -       -        7
SIGCHLD         17          20      18      18
SIGCLD           -          -       18      -
SIGCONT         18          19      25      26
SIGSTOP         19          17      23      24
SIGTSTP         20          18      24      25
SIGTTIN         21          21      26      27
SIGTTOU         22          22      27      28
SIGURG          23          16      21      29
SIGXCPU         24          24      30      12
SIGXFSZ         25          25      31      30
SIGVTALRM       26          26      28      20
SIGPROF         27          27      29      21
SIGWINCH        28          28      20      23
SIGIO           29          23      22      22
SIGPOLL                                            Same as SIGIO
SIGPWR          30         29/-     19      19
SIGINFO          -         29/-     -       -
SIGLOST          -         -/29     -       -
SIGSYS          31          12      12      31
SIGUNUSED       31          -       -       31

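As the table shows, signal numbers differ across hardware architectures, but a signal with a given name must have the same meaning on all of them. Next, here are the meanings and default actions of the standard signals. In the Action column, Term means terminate the process, Ign means ignore the signal, Core means terminate the process and dump core, Stop means stop the process, and Cont means continue the process if it is stopped: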

 Signal      Standard   Action   Comment
────────────────────────────────────────────────────────────────────────
SIGABRT      P1990      Core    Abort signal from abort(3)
SIGALRM      P1990      Term    Timer signal from alarm(2)
SIGBUS       P2001      Core    Bus error (bad memory access)
SIGCHLD      P1990      Ign     Child stopped or terminated
SIGCLD         -        Ign     A synonym for SIGCHLD
SIGCONT      P1990      Cont    Continue if stopped
SIGEMT         -        Term    Emulator trap
SIGFPE       P1990      Core    Floating-point exception
SIGHUP       P1990      Term    Hangup detected on controlling terminal
                                or death of controlling process
SIGILL       P1990      Core    Illegal Instruction
SIGINFO        -                A synonym for SIGPWR
SIGINT       P1990      Term    Interrupt from keyboard
SIGIO          -        Term    I/O now possible (4.2BSD)
SIGIOT         -        Core    IOT trap. A synonym for SIGABRT
SIGKILL      P1990      Term    Kill signal
SIGLOST        -        Term    File lock lost (unused)
SIGPIPE      P1990      Term    Broken pipe: write to pipe with no
                                readers; see pipe(7)
SIGPOLL      P2001      Term    Pollable event (Sys V);
                                synonym for SIGIO
SIGPROF      P2001      Term    Profiling timer expired
SIGPWR         -        Term    Power failure (System V)
SIGQUIT      P1990      Core    Quit from keyboard
SIGSEGV      P1990      Core    Invalid memory reference
SIGSTKFLT      -        Term    Stack fault on coprocessor (unused)
SIGSTOP      P1990      Stop    Stop process
SIGTSTP      P1990      Stop    Stop typed at terminal
SIGSYS       P2001      Core    Bad system call (SVr4);
                                see also seccomp(2)
SIGTERM      P1990      Term    Termination signal
SIGTRAP      P2001      Core    Trace/breakpoint trap
SIGTTIN      P1990      Stop    Terminal input for background process
SIGTTOU      P1990      Stop    Terminal output for background process
SIGUNUSED      -        Core    Synonymous with SIGSYS
SIGURG       P2001      Ign     Urgent condition on socket (4.2BSD)
SIGUSR1      P1990      Term    User-defined signal 1
SIGUSR2      P1990      Term    User-defined signal 2
SIGVTALRM    P2001      Term    Virtual alarm clock (4.2BSD)
SIGXCPU      P2001      Core    CPU time limit exceeded (4.2BSD);
                                see setrlimit(2)
SIGXFSZ      P2001      Core    File size limit exceeded (4.2BSD);
                                see setrlimit(2)
SIGWINCH       -        Ign     Window resize signal (4.3BSD, Sun)
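The Term and Ign entries in the Action column can be observed from user space. Below is a minimal sketch that assumes Linux with glibc. default_action_status() is a hypothetical helper: it forks a child, which unblocks the signal, restores SIG_DFL and raises the signal on itself. The parent then inspects the child's wait status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that raises `sig` on itself under the
 * default disposition and exits with status 0 if it survives.
 * Returns the child's raw wait status, or -1 on error.
 */
static int default_action_status(int sig)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        sigset_t set;

        sigemptyset(&set);
        sigaddset(&set, sig);
        sigprocmask(SIG_UNBLOCK, &set, NULL); /* the signal must not be blocked */
        signal(sig, SIG_DFL);                 /* force the default action */
        raise(sig);
        _exit(0);                             /* reached only if the action is Ign */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

SIGTERM (Term) should terminate the child, so WIFSIGNALED() is true. SIGURG (Ign) should let the child reach _exit(0). Core-action signals are left out of this sketch because whether a core file is written depends on RLIMIT_CORE and the system configuration.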

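3.1.2 Real-Time Signals

Standard signals are not handled in a real-time manner because they are not queued. While one instance of a standard signal is pending, further instances of the same signal are discarded, so only the first one is acted on. Real-time signals were introduced to solve this: multiple instances of the same real-time signal are placed in a signal queue, so every instance gets handled.
Real-time signals are numbered 32-64. glibc's pthread implementation reserves some of them for internal use (32-33 with NPTL, or 32-34 with the older LinuxThreads). It therefore redefines SIGRTMIN, the macro that marks the first real-time signal available to applications, as 34 or 35.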
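The queuing difference can be observed from user space. The sketch below assumes Linux with glibc. count_pending_instances() is a hypothetical helper that blocks a signal, sends it to the process n times with kill(), and then drains the pending instances using sigtimedwait() with a zero timeout:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: block `sig`, send it to this process `n` times,
 * then count how many instances were actually left pending.
 */
static int count_pending_instances(int sig, int n)
{
    sigset_t set, old;
    struct timespec zero = { 0, 0 };
    int count = 0;

    sigemptyset(&set);
    sigaddset(&set, sig);
    /* While blocked, every instance the kernel accepts stays pending. */
    sigprocmask(SIG_BLOCK, &set, &old);

    for (int i = 0; i < n; i++)
        kill(getpid(), sig);

    /* Accept pending instances one by one; with a zero timeout
     * sigtimedwait() fails (EAGAIN) as soon as none is left. */
    while (sigtimedwait(&set, NULL, &zero) == sig)
        count++;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return count;
}
```

With default resource limits, count_pending_instances(SIGUSR1, 3) should return 1 and count_pending_instances(SIGRTMIN, 3) should return 3. SIGRTMIN here is glibc's value (34 or 35), not the kernel's 32.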

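3.2 Sending Signals

A signal can be sent explicitly through the sys_kill() or sys_tgkill() system call, where:

sys_kill() (with a positive pid) sends the signal to a thread group, and any thread in the group may handle it;
sys_tgkill() sends the signal to a specific thread within a thread group, and the signal is handled in that thread's context.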
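Both targets can be observed from user space. Suppose SIGUSR1 is blocked in every thread. A thread-directed signal then stays in the target thread's private pending set, while a process-directed signal goes into the set shared by the thread group. sigpending() reports the union of the calling thread's own set and the shared set. The following minimal sketch assumes Linux with glibc and calls tgkill through syscall(SYS_tgkill, ...), because glibc only added a tgkill() wrapper in 2.30. worker() and demo_directed_signals() are hypothetical names:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_barrier_t barrier;
static pid_t worker_tid;
static int worker_sees;   /* SIGUSR1 found in the worker's pending set? */

/* Hypothetical worker: publishes its kernel thread ID, waits until main
 * has sent the signal, inspects its pending set, then consumes the signal. */
static void *worker(void *arg)
{
    sigset_t set, pending;
    int sig;

    (void)arg;
    worker_tid = (pid_t)syscall(SYS_gettid);
    pthread_barrier_wait(&barrier);        /* (A) tid is published */
    pthread_barrier_wait(&barrier);        /* (B) main has called tgkill */

    sigpending(&pending);
    worker_sees = sigismember(&pending, SIGUSR1);

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigwait(&set, &sig);                   /* consume the thread-directed signal */
    return NULL;
}

/* Reports whether SIGUSR1 appears in sigpending() for: main after a
 * tgkill() aimed at the worker, the worker itself, and main after a
 * process-directed kill(). */
static void demo_directed_signals(int *main_sees, int *worker_saw, int *group_sees)
{
    sigset_t set, old, pending;
    pthread_t th;
    int sig;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &set, &old); /* the worker inherits this mask */

    pthread_barrier_init(&barrier, NULL, 2);
    pthread_create(&th, NULL, worker, NULL);
    pthread_barrier_wait(&barrier);         /* (A) */

    /* Thread-directed: queued on the worker's private pending list. */
    syscall(SYS_tgkill, getpid(), worker_tid, SIGUSR1);
    sigpending(&pending);
    *main_sees = sigismember(&pending, SIGUSR1);

    pthread_barrier_wait(&barrier);         /* (B) */
    pthread_join(th, NULL);
    pthread_barrier_destroy(&barrier);
    *worker_saw = worker_sees;

    /* Process-directed: queued on the shared pending list. */
    kill(getpid(), SIGUSR1);
    sigpending(&pending);
    *group_sees = sigismember(&pending, SIGUSR1);
    sigwait(&set, &sig);                    /* consume it */

    pthread_sigmask(SIG_SETMASK, &old, NULL);
}
```

The expected results are main_sees == 0, worker_saw == 1 and group_sees == 1. This matches the choice between t->pending and t->signal->shared_pending made in the kernel's sending path. On older glibc, link with -pthread.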

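Signals can also be raised implicitly by the kernel when a process runs into certain conditions (such as a NULL pointer access), for example SIGSEGV.
Before analyzing how a signal is sent, let's look at the data structures involved in a process's signal handling:
[Figure: data structures related to process signal handling]
Next, let's walk through the sending path, starting with sending a signal to a thread group: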

sys_kill(pid, sig)
	struct siginfo info;

	info.si_signo = sig;
	info.si_errno = 0;
	info.si_code = SI_USER;
	info.si_pid = task_tgid_vnr(current);
	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
	
	kill_something_info(sig, &info, pid)
		kill_pid_info(sig, info, find_vpid(pid))
			struct task_struct *p = pid_task(pid, PIDTYPE_PID);
			group_send_sig_info(sig, info, p)
				do_send_sig_info(sig, info, p, true) /* send the signal to the thread group */
					/* continues in the common signal-sending path */

Next, the path for sending a signal to a specific thread:

sys_tgkill(tgid, pid, sig)
	do_tkill(tgid, pid, sig)
		struct siginfo info = {};
		
		info.si_signo = sig;
		info.si_errno = 0;
		info.si_code = SI_TKILL;
		info.si_pid = task_tgid_vnr(current);
		info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
		
		do_send_specific(tgid, pid, sig, &info)
			struct task_struct *p = find_task_by_vpid(pid);
			do_send_sig_info(sig, info, p, false) /* send the signal to a specific thread */
				/* continues in the common signal-sending path */
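The si_code values filled in above (SI_USER for kill, SI_TKILL for tgkill) are visible to a handler installed with SA_SIGINFO. The minimal sketch below assumes Linux with glibc. It sends the signal to the process itself, so the handler runs on the return path of kill()/tgkill(), before the call returns to its caller. on_usr1() and install_siginfo_handler() are hypothetical names:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Written only by the handler; sig_atomic_t keeps the stores signal-safe. */
static volatile sig_atomic_t last_code;
static volatile sig_atomic_t last_pid;

/* Records the si_code and si_pid that the kernel put into siginfo. */
static void on_usr1(int signo, siginfo_t *info, void *ucontext)
{
    (void)signo;
    (void)ucontext;
    last_code = info->si_code;
    last_pid = info->si_pid;
}

/* Hypothetical helper: install on_usr1() for SIGUSR1 with SA_SIGINFO. */
static int install_siginfo_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_usr1;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}
```

After kill(getpid(), SIGUSR1), last_code should equal SI_USER and last_pid should equal the caller's PID. After syscall(SYS_tgkill, getpid(), syscall(SYS_gettid), SIGUSR1), last_code should equal SI_TKILL.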

Finally, the path shared by both cases, whether the signal goes to a thread group or to a specific thread within it:

do_send_sig_info(sig, info, p, group)
	send_signal(sig, info, p, group)
		__send_signal(sig, info, t, group, from_ancestor_ns)
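			/* prepare_signal() returns false if the signal should be dropped (e.g. it is ignored) */
			if (!prepare_signal(sig, t,
					from_ancestor_ns || (info == SEND_SIG_PRIV) || (info == SEND_SIG_FORCED)))
				goto ret;

			/*
			 * group == true : put the signal on the pending queue shared by the thread group
			 * group == false: put the signal on the thread's private pending queue
			 */
			pending = group ? &t->signal->shared_pending : &t->pending;

			/* a standard signal that is already pending is not queued again */
			if (legacy_queue(pending, sig))
				goto ret;

			/* allocate a pending-signal queue node */
			q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit);
			/* append the pending signal to the chosen queue */
			list_add_tail(&q->list, &pending->list);
			copy_siginfo(&q->info, info);

			signalfd_notify(t, sig); /* wake up processes waiting for signals on a signalfd */
			sigaddset(&pending->signal, sig); /* set the bit in the pending-signal mask */
			/*
			 * pick the thread that will handle the signal, tell it a signal is
			 * pending (set its TIF_SIGPENDING flag), then wake it up
			 */
			complete_signal(sig, t, group);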
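The way pending->list and the pending->signal bitmask work together can be modeled in a few lines of user-space C. This is a simplified model, not kernel code. toy_pending, toy_send_signal() and toy_dequeue_signal() are hypothetical names, and the model has no locking, no siginfo and no rlimit accounting. The dequeue side assumes, as the kernel's next_signal() does, that the lowest-numbered pending signal is taken first (the real kernel also gives priority to synchronous signals such as SIGSEGV). It clears the mask bit only after no instance remains queued:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SIGRTMIN  32   /* kernel-side SIGRTMIN on ARM/x86 */
#define TOY_QUEUE_MAX 128

struct toy_pending {
    uint64_t signal;             /* like pending->signal: bit (sig - 1) set if pending */
    int list[TOY_QUEUE_MAX];     /* like pending->list: queued instances, FIFO */
    int n;
};

static uint64_t toy_bit(int sig) { return UINT64_C(1) << (sig - 1); }

/* Mirrors legacy_queue(): a standard signal already pending is not queued again. */
static int toy_legacy_queue(const struct toy_pending *p, int sig)
{
    return sig < TOY_SIGRTMIN && (p->signal & toy_bit(sig));
}

/* Mirrors the tail of __send_signal(): queue a node, then set the mask bit. */
static void toy_send_signal(struct toy_pending *p, int sig)
{
    if (toy_legacy_queue(p, sig) || p->n == TOY_QUEUE_MAX)
        return;
    p->list[p->n++] = sig;
    p->signal |= toy_bit(sig);
}

/* Take the lowest-numbered pending signal; clear its bit only when no other
 * instance of it is still queued. Returns 0 if nothing is pending. */
static int toy_dequeue_signal(struct toy_pending *p)
{
    int sig = 0, i, still_queued = 0;

    for (int s = 1; s <= 64; s++)
        if (p->signal & toy_bit(s)) { sig = s; break; }
    if (!sig)
        return 0;

    for (i = 0; i < p->n && p->list[i] != sig; i++)
        ;
    memmove(&p->list[i], &p->list[i + 1], (size_t)(p->n - i - 1) * sizeof p->list[0]);
    p->n--;

    for (i = 0; i < p->n; i++)
        if (p->list[i] == sig)
            still_queued = 1;
    if (!still_queued)
        p->signal &= ~toy_bit(sig);
    return sig;
}
```

Sending signal 10 twice and signal 34 twice leaves three queued nodes. Signal 10 is dequeued first, then 34 twice, and the bit for 34 stays set until its last instance is gone.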

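3.3 Handling Signals

3.3.1 Preparing for Signal Handling

When a program is started (loaded by load_elf_binary() during execve()), the kernel does some preparation for the process's signal handling, as follows: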

load_elf_binary()
	...
	arch_setup_additional_pages(bprm, !!elf_interpreter)
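		if (!signal_page) /* allocated only on first use, then shared by all processes */
			signal_page = get_signal_page()
				page = alloc_pages(GFP_KERNEL, 0); /* allocate one physical page */
				addr = page_address(page); /* kernel virtual address of the page */
				offset = 0x200 + (get_random_int() & 0x7fc); /* random offset within the page */
				signal_return_offset = offset; /* save the in-page offset in @signal_return_offset */
				/* copy the [sigreturn trampoline: code returning from the handler to the kernel] to offset @offset */
				memcpy(addr + offset, sigreturn_codes, sizeof(sigreturn_codes))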
		...
		/* map the page holding the [sigreturn trampoline] into the process's virtual address space */
		hint = sigpage_addr(mm, npages);
		addr = get_unmapped_area(NULL, hint, npages << PAGE_SHIFT, 0, 0);
		vma = _install_special_mapping(mm, addr, PAGE_SIZE,
				VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
				&sigpage_mapping);
		/* record the user virtual address of the [sigreturn trampoline] page in the process's mm_struct */
		mm->context.sigpage = addr;
	...
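The randomized offset chosen in get_signal_page() can be bounded by hand. The mask 0x7fc keeps only bits 2-10 of the random value, so the offset is a multiple of 4 between 0x200 and 0x200 + 0x7fc = 0x9fc. That gives 512 possible positions, and the small trampoline stays well inside the 4 KiB page. On ARM32 the resulting mapping should appear as [sigpage] in /proc/<pid>/maps. The following is a small check of that arithmetic. sigpage_offset() and count_possible_offsets() are hypothetical helpers; the first copies the kernel expression:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper reproducing the expression in get_signal_page(). */
static unsigned int sigpage_offset(uint32_t random_value)
{
    return 0x200u + (random_value & 0x7fcu);
}

/* Count the distinct offsets the expression can produce. Bits above
 * bit 10 are masked off, so random values 0..0x7ff cover every case. */
static unsigned int count_possible_offsets(void)
{
    unsigned char seen[0x1000 / 4] = { 0 };   /* one slot per 4-byte position */
    unsigned int count = 0;

    for (uint32_t r = 0; r <= 0x7ff; r++) {
        unsigned int off = sigpage_offset(r);
        if (!seen[off / 4]) {
            seen[off / 4] = 1;
            count++;
        }
    }
    return count;
}
```

For example, sigpage_offset(0) is 0x200, sigpage_offset(0xffffffff) is 0x9fc, and count_possible_offsets() returns 512.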

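3.3.2 Signal Handling Flow

Pending signals are processed when the kernel returns to user space from an interrupt or from a system call. The flow is as follows: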

/* @arch/arm/kernel/entry-common.S */
ret_fast_syscall:
	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
	tst	r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK	@ check the work flags, including _TIF_SIGPENDING
	...
/* there is pending work: finish it first, then return to user space */
slow_work_pending:
	mov	r0, sp
	mov	r2, why
	@arch/arm/kernel/signal.c
	bl	do_work_pending @ handle pending signals
		do {
			...
			if (thread_flags & _TIF_SIGPENDING) { /* a pending signal may interrupt a system call */
				do_signal(regs, syscall)
					if (get_signal(&ksig)) { /* dequeue one pending signal */
						handle_signal(&ksig, regs); /* handle the dequeued signal */
							setup_frame(ksig, oldset, regs)
								/* allocate space for the sigframe on the user-space stack */
								struct sigframe __user *frame = get_sigframe(ksig, regs, sizeof(*frame));
								setup_sigframe(frame, regs, set)
									context = (struct sigcontext) {
										.arm_r0        = regs->ARM_r0,
										.arm_r1        = regs->ARM_r1,
										.arm_r2        = regs->ARM_r2,
										.arm_r3        = regs->ARM_r3,
										.arm_r4        = regs->ARM_r4,
										.arm_r5        = regs->ARM_r5,
										.arm_r6        = regs->ARM_r6,
										.arm_r7        = regs->ARM_r7,
										.arm_r8        = regs->ARM_r8,
										.arm_r9        = regs->ARM_r9,
										.arm_r10       = regs->ARM_r10,
										.arm_fp        = regs->ARM_fp,
										.arm_ip        = regs->ARM_ip,
										.arm_sp        = regs->ARM_sp,
										.arm_lr        = regs->ARM_lr,
										.arm_pc        = regs->ARM_pc,
										.arm_cpsr      = regs->ARM_cpsr,
								
										.trap_no       = current->thread.trap_no,
										.error_code    = current->thread.error_code,
										.fault_address = current->thread.address,
										.oldmask       = set->sig[0],
									};
									__copy_to_user(&sf->uc.uc_mcontext, &context, sizeof(context)); /* save the user-space context: running the handler will overwrite it */
									...
								setup_return(regs, ksig, frame->retcode, frame)
									/* the signal handler installed by user space */
									unsigned long handler = (unsigned long)ksig->ka.sa.sa_handler;
									...
									if (__put_user(sigreturn_codes[idx],   rc) ||
									    __put_user(sigreturn_codes[idx+1], rc+1))
										return 1;
									/* address of the [sigreturn trampoline] mapped into the process when the program was loaded */
									struct mm_struct *mm = current->mm;
									retcode = mm->context.sigpage + signal_return_offset +
										  (idx << 2) + thumb;
									regs->ARM_r0 = ksig->sig; /* argument 0 of the handler is the signal number */
									regs->ARM_sp = (unsigned long)frame;
									regs->ARM_lr = retcode; /* the handler returns into the sigreturn_codes snippet, which calls sys_sigreturn() to enter the kernel; the kernel then returns to the interrupted user-space code */
									regs->ARM_pc = handler; /* on return to user space, execution resumes in the signal handler */
									regs->ARM_cpsr = cpsr;

									return 0;
							signal_setup_done(ret, ksig, 0)
					}
			}
			...
		} while (thread_flags & _TIF_WORK_MASK);

	/* returning to user space from the interrupt or system call enters the signal handler installed by user space */
	signal_handler()

	/*
	 * When the handler signal_handler() returns, it executes the code snippet at sigreturn_codes:
	 * arch/arm/kernel/sigreturn_codes.S
	 */
sigreturn_codes:
	mov	r7, #(__NR_sigreturn - __NR_SYSCALL_BASE)
	swi	#(__NR_sigreturn)|(__NR_OABI_SYSCALL_BASE)
		/* enter the sys_sigreturn() system call */
		sys_sigreturn()
			frame = (struct sigframe __user *)regs->ARM_sp;
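			/* restore the user-space context that signal handling overwrote */
			restore_sigframe(regs, frame)
				__copy_from_user(&set, &sf->uc.uc_sigmask, sizeof(set))
				set_current_blocked(&set) /* restore the signal mask saved in the frame */
				__copy_from_user(&context, &sf->uc.uc_mcontext, sizeof(context))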
				regs->ARM_r0 = context.arm_r0;
				regs->ARM_r1 = context.arm_r1;
				regs->ARM_r2 = context.arm_r2;
				regs->ARM_r3 = context.arm_r3;
				regs->ARM_r4 = context.arm_r4;
				regs->ARM_r5 = context.arm_r5;
				regs->ARM_r6 = context.arm_r6;
				regs->ARM_r7 = context.arm_r7;
				regs->ARM_r8 = context.arm_r8;
				regs->ARM_r9 = context.arm_r9;
				regs->ARM_r10 = context.arm_r10;
				regs->ARM_fp = context.arm_fp;
				regs->ARM_ip = context.arm_ip;
				regs->ARM_sp = context.arm_sp;
				regs->ARM_lr = context.arm_lr;
				regs->ARM_pc = context.arm_pc;
				regs->ARM_cpsr = context.arm_cpsr;
	
	/* return from sys_sigreturn() to user space and resume execution: signal handling is complete */
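The mask handling around setup_frame() and sys_sigreturn() is visible from user space. While the handler runs, the delivered signal is blocked (unless SA_NODEFER is set). The ucontext passed to an SA_SIGINFO handler holds the mask in effect before delivery, and that mask is back in force once sigreturn completes. The minimal sketch below assumes Linux with glibc and that SIGUSR1 is not already blocked. on_usr1() and blocked_after_return() are hypothetical names:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

static volatile sig_atomic_t blocked_inside;   /* SIGUSR1 blocked while the handler runs? */
static volatile sig_atomic_t blocked_in_saved; /* SIGUSR1 in the mask saved in the frame? */

static void on_usr1(int signo, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;   /* points at the ucontext stored in the signal frame */
    sigset_t now;

    (void)info;
    sigprocmask(SIG_BLOCK, NULL, &now);        /* query the current mask only */
    blocked_inside = sigismember(&now, signo);
    blocked_in_saved = sigismember(&uc->uc_sigmask, signo);
}

/* Hypothetical helper: deliver SIGUSR1 once and report whether it is
 * still blocked after the handler has returned through sigreturn. */
static int blocked_after_return(void)
{
    struct sigaction sa;
    sigset_t after;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_usr1;
    sa.sa_flags = SA_SIGINFO;   /* no SA_NODEFER: signo is blocked during the handler */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    raise(SIGUSR1);             /* the handler runs before raise() returns */
    sigprocmask(SIG_BLOCK, NULL, &after);
    return sigismember(&after, SIGUSR1);
}
```

The expected results are blocked_inside == 1 and blocked_in_saved == 0. blocked_after_return() should return 0 because the saved mask is restored on the way back.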

The following diagram summarizes the signal handling flow:

User mode
        signal                signal handler                   resume interrupted code
----------                  ------------------                ----------------------
          |                |                  |              |
          |                |           sys_sigreturn()       |
          |                |                  |              |
----------V----------------^------------------V--------------^-----------------------> t
          |                |                  |              |
          |                |                  |              | 
          |                |                  |              |
           ----------------                    --------------
             do_signal()                       sys_sigreturn()
Kernel mode

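For the context in which signal handling runs, see also the blog post Linux系统调用实现简析 (A Brief Analysis of Linux System Call Implementation).

4. Example

What is the point of studying the details of signal handling? The first reason, of course, is to understand how signals work. The post Linux应用编程: API基础 (Linux Application Programming: API Basics) also introduced the concept of an async-signal-safe function. Such functions may be called from a signal handler, but calling any other function from a handler can cause deadlocks, corrupted data and similar problems. Understanding the signal mechanism helps us analyze these problems. Let's look at an example in which calling a function inappropriately from a signal handler leads to a deadlock: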

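#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

/* Non-recursive mutex: relocking it from the same thread deadlocks. */
pthread_mutex_t recursive_disallow_mutex;

/* NOT async-signal-safe: takes a lock and sleeps while holding it. */
void async_signal_not_safe(void)
{
    pthread_mutex_lock(&recursive_disallow_mutex);
    sleep(5);
    pthread_mutex_unlock(&recursive_disallow_mutex);
}

/* SIGINT handler: wrongly calls the non-async-signal-safe function above. */
void signal_int(int signo)
{
    (void)signo;
    async_signal_not_safe();
}

int main(void)
{
    pthread_mutexattr_t attr;

    /* Do not allow recursive use of the pthread_mutex_t */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
    pthread_mutex_init(&recursive_disallow_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    if (signal(SIGINT, signal_int) == SIG_ERR) {
        perror("signal(SIGINT)");
        return EXIT_FAILURE;
    }

    /* Print the PID so that gdb can be attached later. */
    printf("pid = %d\n", (int)getpid());
    fflush(stdout);

    /* Press Ctrl+C while this call sleeps with the lock held. */
    async_signal_not_safe();

    return 0;
}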

Compile the program (linking with -pthread), run it, press Ctrl+C, and then use gdb to observe what the program is doing:

bill@bill-virtual-machine:~/Study/app/signal$ sudo gdb attach -p 3560
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
Attaching to process 3560
Reading symbols from /home/bill/Study/app/signal/async-signal-not-safe...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/.build-id/c5/57b8146e8079af46310b549de6912d1fc4ea86.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.23.so...done.
done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.23.so...done.
done.
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135	../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) thread apply all bt

Thread 1 (Thread 0x7f6f08429700 (LWP 3560)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f6f0800bdbd in __GI___pthread_mutex_lock (mutex=0x6010a0 <recursive_disallow_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2  0x0000000000400904 in async_signal_not_safe ()
#3  0x000000000040092b in signal_int ()
#4  <signal handler called>
#5  0x00007f6f07d04370 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:84
#6  0x00007f6f07d042da in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#7  0x000000000040090e in async_signal_not_safe ()
#8  0x00000000004009af in main ()
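Frames #4 to #6 show that the handler was entered while sleep() was still inside its nanosleep system call. The sketch below checks that a caught signal cuts such a sleep short instead of waiting for it to finish. It assumes Linux, and calls nanosleep() directly because POSIX leaves the interaction between alarm() and sleep() unspecified. on_alarm() and seconds_left_after_signal() are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: its only job is to make SIGALRM a caught signal. */
static void on_alarm(int signo)
{
    (void)signo;
}

/* Hypothetical helper: sleep `secs` seconds with nanosleep() while an
 * alarm fires after 1 second. Returns the whole seconds left when the
 * sleep was cut short, or -1 if it ran to completion. */
static long seconds_left_after_signal(time_t secs)
{
    struct sigaction sa;
    struct timespec req = { secs, 0 };
    struct timespec rem = { 0, 0 };

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    alarm(1);
    if (nanosleep(&req, &rem) == 0)
        return -1;
    assert(errno == EINTR);     /* interrupted by the caught SIGALRM */
    return (long)rem.tv_sec;
}
```

Called with 5, the function should return after about one second and report 3 or 4 seconds left, depending on timing, rather than sleeping the full 5 seconds.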

The program is stuck for good in this call chain of the signal handler signal_int():

signal_int()
	async_signal_not_safe()
		pthread_mutex_lock(&recursive_disallow_mutex)

In other words, the program has deadlocked. Based on the signal handling flow analyzed earlier, the handler was entered from the following call chain:

main()
	async_signal_not_safe()
		pthread_mutex_lock(&recursive_disallow_mutex)
			sleep(5)

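While sleep() has put the process to sleep, pressing Ctrl+C generates SIGINT. The sleep is interruptible (glibc's sleep() waits in the nanosleep system call, as frame #5 shows), so the kernel wakes the process immediately instead of waiting for the 5 seconds to expire. The system call is interrupted, and on the way back to user space the kernel finds the pending signal and enters the signal handling flow: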

signal_int()
	async_signal_not_safe()
		pthread_mutex_lock(&recursive_disallow_mutex)

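At this point the lock recursive_disallow_mutex has not been released yet, and recursive locking of it is disallowed, so the program deadlocks.
There are many similar scenarios. For example, if both main() and signal_int() call interfaces such as malloc()/free(), the result can be a deadlock, data corruption or other puzzling errors.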
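A common way out is to do only async-signal-safe work in the handler and defer everything else to normal program flow. The sketch below assumes the deferred work can wait until the program next checks a flag. The handler only sets a volatile sig_atomic_t flag, and the code that needs the mutex runs outside the handler. got_sigint, data_mutex, handled_count, install_sigint_handler() and process_pending_sigint() are hypothetical names, and signal_int() is modeled on the example above:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

static pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;
static volatile sig_atomic_t got_sigint;
static int handled_count;   /* protected by data_mutex */

/* Async-signal-safe handler: it only records that SIGINT arrived. */
static void signal_int(int signo)
{
    (void)signo;
    got_sigint = 1;
}

static int install_sigint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = signal_int;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Called from normal (non-handler) context, e.g. once per main-loop pass. */
static void process_pending_sigint(void)
{
    if (!got_sigint)
        return;
    got_sigint = 0;
    pthread_mutex_lock(&data_mutex);   /* safe here: we are not inside the handler */
    handled_count++;
    pthread_mutex_unlock(&data_mutex);
}
```

With this design, raising SIGINT while data_mutex is held no longer deadlocks, because the handler never touches the lock. Other common choices include a dedicated thread that calls sigwait(), or reading signals from a signalfd.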

5. References

[1] man signal
[2] 关于异步信号安全 (On Async-Signal Safety)
