Linux process Ssl state: Linux process states and signals

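Problem description

Today, creating a cache partition failed in the test environment. The log shows that ceph-disk zap /dev/sdx hung and was killed after it timed out. The relevant log entry is shown below: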

time=2020-02-27T10:08:25+08:00 level=warning module=utils/process.go:123 topic=kernel.external.process msg="Process was killed after 2m0.000139012s: /usr/sbin/ceph-disk [ceph-disk zap /dev/sdg]
out:
err: 1+0 records in
1+0 records out
4194304 bytes (4.2 MB) copied, 0.00448586 s, 935 MB/s
"

Analysis

Listing the related processes shows that several sgdisk processes are still around:

[root@sds2 ~]# ps -ef | grep zap
root 4085 1 0 11:10 ? 00:00:00 /usr/sbin/sgdisk --zap-all -- /dev/sdg
root 23181 1 0 10:06 ? 00:00:00 /usr/sbin/sgdisk --zap-all -- /dev/sdg
root 40867 1 0 Feb26 ? 00:00:00 /usr/sbin/sgdisk --zap-all -- /dev/sdg
root 41064 1 0 Feb26 ? 00:00:00 /usr/sbin/sgdisk --zap-all -- /dev/sdi
root 42785 1 0 Feb26 ? 00:00:00 /usr/sbin/sgdisk --zap-all -- /dev/sdg
root 48840 32585 0 16:24 pts/1 00:00:00 grep --color=auto zap

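The kernel stack of one of these processes shows that it is hung in call_rwsem_down_read_failed, that is, blocked inside sys_sync while waiting to take a read-write semaphore for reading. For background, see the article "读写信号量与实时进程阻塞挂死问题" (read-write semaphores and real-time processes blocking and hanging).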

[root@sds2 ~]# cat /proc/4085/stack
[] call_rwsem_down_read_failed+0x18/0x30
[] iterate_supers+0xaa/0x120
[] sys_sync+0x44/0xb0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

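Next, top shows the process in state D, which stands for uninterruptible sleep. Linux processes have two sleep states. The first is interruptible sleep: a process in this state can be woken by sending it a signal. For example, sending HUP to the nginx master process makes nginx reload its configuration file without restarting. The second is uninterruptible sleep: a process in this state does not act on signals while it stays there, so it generally cannot be killed, whether with "kill", "kill -9" or "kill -15" (plain "kill" sends SIGTERM, the same as "kill -15"). The signal stays pending and only takes effect once the process leaves the D state.

Why does a process end up in uninterruptible sleep? Usually it is waiting for IO: disk IO, network IO, or IO on some other peripheral. If the IO it is waiting for does not complete for a long time, the process stays in D long enough to show up in ps. That usually means something has gone wrong with IO: the device itself may have failed, or a mounted remote file system may have become unreachable.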

[root@sds2 ~]# top -p 4085
top - 16:27:32 up 16 days, 20:22, 3 users, load average: 7.24, 7.25, 7.26
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.1 sy, 0.0 ni, 99.4 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 65758080 total, 37593416 free, 5325808 used, 22838856 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 53136852 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4085 root 20 0 53296 2112 1736 D 0.0 0.0 0:00.08 sgdisk
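Checking one PID at a time with top gets tedious when several processes are stuck. As a rough sketch, assuming a Linux /proc layout, the same state letter can be read from /proc/<pid>/stat for every process at once. The helper names parse_state and pids_in_state are my own, not part of any tool used above.

```python
import os

def parse_state(stat_line):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]

def pids_in_state(wanted="D", proc_root="/proc"):
    """List (pid, command) pairs for processes currently in the given state."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            # The process exited between listdir() and open().
            continue
        if parse_state(line) == wanted:
            comm = line[line.index("(") + 1:line.rindex(")")]
            found.append((int(entry), comm))
    return found

if __name__ == "__main__":
    # Print every process that is in uninterruptible sleep right now.
    for pid, comm in pids_in_state("D"):
        print(pid, comm)
```

On the machine above this would be expected to list the stuck sgdisk processes; their /proc/<pid>/stack can then be inspected as shown earlier.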

(ENV) [root@ceph-2 ~]# ps -axf | grep etcd
7123 pts/1 S+ 0:00 \_ grep --color=auto etcd
17158 ? Ssl 462:16 /opt/sds/bin/etcd --config-file /opt/sds/etcd/etcd.conf
17227 ? Ssl 97:00 /opt/sds/bin/etcd --config-file /opt/sds/etcd/etcd-proxy.conf

The following is taken from the ps man page.

This ps works by reading the virtual files in /proc.

Processes marked <defunct> are dead processes (so-called "zombies") that remain because their parent has not destroyed them properly. These processes will be destroyed by init(8) if the parent process exits.

PROCESS STATE CODES

Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process:

D    uninterruptible sleep (usually IO)
R    running or runnable (on run queue)
S    interruptible sleep (waiting for an event to complete)
T    stopped by job control signal
t    stopped by debugger during the tracing
W    paging (not valid since the 2.6.xx kernel)
X    dead (should never be seen)
Z    defunct ("zombie") process, terminated but not reaped by its parent

For BSD formats and when the stat keyword is used, additional characters may be displayed:

<    high-priority (not nice to other users)
N    low-priority (nice to other users)
L    has pages locked into memory (for real-time and custom IO)
s    is a session leader
l    is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
+    is in the foreground process group
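To make a STAT value such as the Ssl shown for etcd easier to read, here is a small sketch that splits it into the state code and the extra BSD-format characters, using only the descriptions quoted from the man page above. The function name describe_stat is hypothetical.

```python
# Descriptions copied from the ps man page excerpt above.
STATE_CODES = {
    "D": "uninterruptible sleep (usually IO)",
    "R": "running or runnable (on run queue)",
    "S": "interruptible sleep (waiting for an event to complete)",
    "T": "stopped by job control signal",
    "t": "stopped by debugger during the tracing",
    "W": "paging (not valid since the 2.6.xx kernel)",
    "X": "dead (should never be seen)",
    "Z": 'defunct ("zombie") process, terminated but not reaped by its parent',
}

FLAG_CODES = {
    "<": "high-priority (not nice to other users)",
    "N": "low-priority (nice to other users)",
    "L": "has pages locked into memory (for real-time and custom IO)",
    "s": "is a session leader",
    "l": "is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)",
    "+": "is in the foreground process group",
}

def describe_stat(stat):
    """Turn a ps STAT string such as 'Ssl' into a list of descriptions."""
    if not stat or stat[0] not in STATE_CODES:
        raise ValueError("unknown process state: %r" % (stat,))
    # The first character is the state, the rest are modifier flags.
    parts = [STATE_CODES[stat[0]]]
    for ch in stat[1:]:
        parts.append(FLAG_CODES.get(ch, "unknown flag %r" % ch))
    return parts

if __name__ == "__main__":
    for line in describe_stat("Ssl"):
        print(line)
```

So Ssl reads as: interruptible sleep, session leader, multi-threaded, which is what one would expect for an idle daemon like etcd.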

As for the kill command mentioned earlier, kill -l lists the available signals.

[root@sds2 ~]# kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX

Of the signals above, 18 (SIGCONT), 19 (SIGSTOP) and 20 (SIGTSTP) deserve a closer look.

kill -SIGSTOP [pid]

kill -SIGCONT [pid]

About SIGSTOP:

When SIGSTOP is sent to a process, the usual behaviour is to pause that process in its current state. The process will only resume execution if it is sent the SIGCONT signal. SIGSTOP and SIGCONT are used for job control in the Unix shell, among other purposes. SIGSTOP cannot be caught or ignored.

About SIGCONT:

When SIGSTOP or SIGTSTP is sent to a process, the usual behaviour is to pause that process in its current state. The process will only resume execution if it is sent the SIGCONT signal. SIGSTOP and SIGCONT are used for job control in the Unix shell, among other purposes.
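The pause and resume behaviour described here can be observed from a parent process. The following is a minimal sketch, assuming Linux and using a sleeping Python child as a stand-in for a real workload; stop_and_continue is a name I made up for the illustration.

```python
import os
import signal
import subprocess
import sys

def stop_and_continue():
    """Stop a child with SIGSTOP, resume it with SIGCONT, report what happened."""
    # A throwaway child that just sleeps; it stands in for any real process.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid() also report children that were stopped.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        os.kill(child.pid, signal.SIGCONT)
        # WCONTINUED makes waitpid() report children resumed by SIGCONT.
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
        return stopped, continued
    finally:
        # Clean up the helper process whatever happened above.
        child.kill()
        child.wait()

if __name__ == "__main__":
    print(stop_and_continue())
```

While the child is stopped, ps would show it in state T, matching the "stopped by job control signal" entry in the state table.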

In short, SIGSTOP tells a process to "hold on" and SIGCONT tells a process to "pick up where you left off". SIGSTOP cannot be caught or ignored, whereas SIGTSTP can be.

A job running in the foreground can be stopped by typing the suspend character (Ctrl-Z). This sends the "terminal stop" signal (SIGTSTP). However, a process can register a signal handler for or ignore SIGTSTP. A process can also be paused with the "stop" signal (SIGSTOP), which cannot be caught or ignored.

A job running in the foreground can be interrupted by typing the interruption character (Ctrl-C). This sends the "interrupt" signal (SIGINT).
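The difference in catchability is easy to check. In this sketch (assuming Linux and that the code runs in the main thread, since Python only allows installing handlers there), installing a handler for SIGTSTP works, while the same call for SIGSTOP is rejected by the kernel. The helper names are my own.

```python
import os
import signal
import time

def _restore(signum, previous):
    # signal.signal() returns None when the old handler was not set from Python.
    signal.signal(signum, signal.SIG_DFL if previous is None else previous)

def can_install_handler(signum):
    """Return True if this process is allowed to install a handler for signum."""
    try:
        previous = signal.signal(signum, lambda received, frame: None)
    except OSError:
        # The kernel refuses handlers for SIGSTOP (and SIGKILL) with EINVAL.
        return False
    _restore(signum, previous)
    return True

def catch_sigtstp(timeout=2.0):
    """Send SIGTSTP to ourselves and return the signal numbers the handler saw."""
    seen = []
    previous = signal.signal(signal.SIGTSTP,
                             lambda received, frame: seen.append(received))
    try:
        os.kill(os.getpid(), signal.SIGTSTP)
        # Python runs its handlers between bytecodes, so wait briefly.
        deadline = time.monotonic() + timeout
        while not seen and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        _restore(signal.SIGTSTP, previous)
    return seen

if __name__ == "__main__":
    print("SIGTSTP catchable:", can_install_handler(signal.SIGTSTP))
    print("SIGSTOP catchable:", can_install_handler(signal.SIGSTOP))
    print("handler saw:", catch_sigtstp())
```

Because the handler replaces the default action, the process is not suspended when it sends itself SIGTSTP here; with SIGSTOP there is no way to arrange that.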

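One more thing worth noting is kill -0. It only performs error checking and is used to test whether a process ID or process group ID exists. I came across the same usage earlier when looking at how keepalived starts up.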

Jan 8 12:14:36 ceph-2 Keepalived[9288]: Opening file '/opt/sds/keepalived/sds-keepalived-10.252.90.77-8/keepalived.conf'.
Jan 8 12:14:36 ceph-2 Keepalived[9288]: Remove a zombie pid file /opt/sds/keepalived/sds-keepalived-10.252.90.77-8/keepalived.pid
Jan 8 12:14:36 ceph-2 Keepalived[9288]: Remove a zombie pid file /opt/sds/keepalived/sds-keepalived-10.252.90.77-8/vrrp.pid
Jan 8 12:14:36 ceph-2 Keepalived[9289]: Starting VRRP child process, pid=9290
Jan 8 12:14:36 ceph-2 Keepalived_vrrp[9290]: Registering Kernel netlink reflector
Jan 8 12:14:36 ceph-2 Keepalived_vrrp[9290]: Registering Kernel netlink command channel
Jan 8 12:14:36 ceph-2 Keepalived_vrrp[9290]: Registering gratuitous ARP shared channel
Jan 8 12:14:36 ceph-2 Keepalived_vrrp[9290]: Opening file '/opt/sds/keepalived/sds-keepalived-10.252.90.77-8/keepalived.conf'.
Jan 8 12:14:36 ceph-2 Keepalived_vrrp[9290]: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Jan 8 12:14:36 ceph-2 Keepalived_vrrp[9290]: (sds-keepalived-10.252.90.77-8): Cannot start in MASTER state if not address owner
Jan 8 12:14:36 ceph-2 Keepalived_vrrp[9290]: (sds-keepalived-10.252.90.77-8): Unable to set no_accept mode since iptables chain name unset

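The log shows that keepalived still starts normally even after some process ID has been written into its pid files: the stale pid files are removed ("Remove a zombie pid file") and startup continues. The source code shows that keepalived checks the pid files on startup.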

    /* Check if keepalived is already running */
    if (keepalived_running(daemon_mode)) {
        log_message(LOG_INFO, "daemon is already running");
        report_stopped = false;
        goto end;
    }
}

/* Return parent process daemon state */
bool
keepalived_running(unsigned long mode)
{
    if (process_running(main_pidfile))
        return true;
#ifdef _WITH_VRRP_
    if (__test_bit(DAEMON_VRRP, &mode) && process_running(vrrp_pidfile))
        return true;
#endif
#ifdef _WITH_LVS_
    if (__test_bit(DAEMON_CHECKERS, &mode) && process_running(checkers_pidfile))
        return true;
#endif
#ifdef _WITH_BFD_
    if (__test_bit(DAEMON_BFD, &mode) && process_running(bfd_pidfile))
        return true;
#endif
    return false;
}

static int
process_running(const char *pid_file)
{
    FILE *pidfile = fopen(pid_file, "r");
    pid_t pid = 0;
    int ret;

    /* No pidfile */
    if (!pidfile)
        return 0;

    ret = fscanf(pidfile, "%d", &pid);
    fclose(pidfile);
    if (ret != 1) {
        log_message(LOG_INFO, "Error reading pid file %s", pid_file);
        pid = 0;
        pidfile_rm(pid_file);
    }

    /* What should we return - we don't know if it is running or not. */
    if (!pid)
        return 1;

    /* If no process is attached to pidfile, remove it */
    if (kill(pid, 0)) {
        log_message(LOG_INFO, "Remove a zombie pid file %s", pid_file);
        pidfile_rm(pid_file);
        return 0;
    }

    return 1;
}

The man 2 kill page says:

#include <signal.h>

int kill(pid_t pid, int sig);

If sig is 0, then no signal is sent, but error checking is still performed; this can be used to check for the existence of a process ID or process group ID.
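The same existence check can be sketched in a few lines. This is an illustration of the sig = 0 technique, not a port of the keepalived code; pid_exists is a hypothetical helper name.

```python
import os

def pid_exists(pid):
    """Check whether a process with this PID exists, using signal 0."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        # ESRCH: no such process.
        return False
    except PermissionError:
        # EPERM: the process exists but we may not signal it.
        return True
    return True

if __name__ == "__main__":
    print(pid_exists(os.getpid()))
```

One design point: the process_running function quoted above treats any non-zero return from kill as "no process attached". This sketch instead counts EPERM as "exists", which matters when the caller is not root and the PID belongs to another user.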
