Background
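We use top + grep to decide whether a process has exited. If grep no longer finds the process name, the process has exited; otherwise we wait 1 second and check again.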
The test process, process_abc, sleeps for 5 seconds and then exits on its own
#include <stdio.h>
#include <unistd.h>  /* declares sleep() */
int main(void)
{
    sleep(5);  /* stay alive for 5 seconds, then exit */
    return 0;
}
Detection script
velscode@velscode:~/test$ cat check.sh
#!/bin/bash
while true
do
    # grep prints the matching line and sets $? to 0 when the name is found
    top -n 1 -b | grep "process_abc"
    if [ $? -eq 0 ]; then
        echo "process_abc alive, wait it"
        sleep 1
    else
        echo "process_abc exit"
        break
    fi
done
Symptoms
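Starting the container with docker run -it .. to get a bash shell and running the commands by hand works as expected.
The detection script finds the target process 5 times, waiting 1 second each time, and finishes once process_abc has exited.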
# Script that launches the container
velscode@velscode:~/test$ cat run_manual.sh
DOCKER_CONTAINER_TAG=ubuntu/test:1.0
docker build -f Dockerfile . -t ${DOCKER_CONTAINER_TAG}
docker run -it --privileged \
-v $(pwd):/opt/ \
-w /opt/ \
${DOCKER_CONTAINER_TAG} /bin/bash
# Launch the container
velscode@velscode:~/test$ bash run_manual.sh
Sending build context to Docker daemon 24.58kB
Step 1/1 : FROM ubuntu:latest
---> ba6acccedd29
Successfully built ba6acccedd29
Successfully tagged ubuntu/test:1.0
# Run the test process and the detection script
root@a63946704302:/opt# ./process_abc & bash check.sh
[1] 9
9 root 20 0 2352 516 452 S 0.0 0.1 0:00.00 process_abc
process_abc alive, wait it
9 root 20 0 2352 516 452 S 0.0 0.1 0:00.00 process_abc
process_abc alive, wait it
9 root 20 0 2352 516 452 S 0.0 0.1 0:00.00 process_abc
process_abc alive, wait it
9 root 20 0 2352 516 452 S 0.0 0.1 0:00.00 process_abc
process_abc alive, wait it
9 root 20 0 2352 516 452 S 0.0 0.1 0:00.00 process_abc
process_abc alive, wait it
process_abc exit
[1]+ Done ./process_abc
If the container is instead started with docker run ... /bin/bash -c "./process_abc & bash check.sh", the detection script never detects the process
# Script that launches the container
velscode@velscode:~/test$ cat run_docker.sh
DOCKER_CONTAINER_TAG=ubuntu/test:1.0
docker build -f Dockerfile . -t ${DOCKER_CONTAINER_TAG}
docker run --privileged \
-v $(pwd):/opt/ \
-w /opt/ \
${DOCKER_CONTAINER_TAG} /bin/bash -c "export TERM=xterm; ./process_abc & bash check.sh"
# Result
velscode@velscode:~/test$ bash run_docker.sh
Sending build context to Docker daemon 24.58kB
Step 1/1 : FROM ubuntu:latest
---> ba6acccedd29
Successfully built ba6acccedd29
Successfully tagged ubuntu/test:1.0
process_abc exit
Root cause
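When the container is run this way, top formats its output for a narrow terminal, so the process name in the COMMAND column is truncated. The failing run omits -it, so no terminal is allocated and top most likely falls back to a small default width.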
velscode@velscode:~/test$ cat run_docker.sh
DOCKER_CONTAINER_TAG=ubuntu/test:1.0
docker build -f Dockerfile . -t ${DOCKER_CONTAINER_TAG}
docker run --privileged \
-v $(pwd):/opt/ \
-w /opt/ \
${DOCKER_CONTAINER_TAG} /bin/bash -c "export TERM=xterm; ./process_abc & top -n 1 -b"
velscode@velscode:~/test$ bash run_docker.sh
Sending build context to Docker daemon 24.58kB
Step 1/1 : FROM ubuntu:latest
---> ba6acccedd29
Successfully built ba6acccedd29
Successfully tagged ubuntu/test:1.0
top - 11:06:12 up 1:39, 0 users, load average: 0.01, 0.08, 0.08
Tasks: 3 total, 1 running, 2 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni, 93.3 id, 6.7 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 962.1 total, 119.1 free, 256.5 used, 586.6 buff/cache
MiB Swap: 1924.0 total, 1922.5 free, 1.5 used. 551.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 3980 2952 2740 S 0.0 0.3 0:00.00 bash
7 root 20 0 2352 576 516 S 0.0 0.1 0:00.00 process_+
8 root 20 0 5972 3204 2776 R 0.0 0.3 0:00.00 top
The name is printed as process_+, so grep "process_abc" fails to match.
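The mismatch can be reproduced without docker by feeding grep the two forms of the line seen in this post. This is only a minimal sketch: the helper name matches_process is made up for illustration, and the two sample lines are copied from the top outputs shown here.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if a line of top output contains the given name.
matches_process() {
    local line="$1" name="$2"
    printf '%s\n' "$line" | grep -qF -- "$name"
}

# Line as printed in the run without -it (COMMAND column truncated)
truncated_line='7 root 20 0 2352 576 516 S 0.0 0.1 0:00.00 process_+'
# Line as printed in the interactive run (full name)
full_line='9 root 20 0 2352 516 452 S 0.0 0.1 0:00.00 process_abc'

for line in "$truncated_line" "$full_line"; do
    if matches_process "$line" "process_abc"; then
        echo "match:    $line"
    else
        echo "no match: $line"
    fi
done
```

Only the second line matches, which is why the detection loop reports "process_abc exit" on its very first iteration in the non-interactive run.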
Solution
Use top's -w option to set the output width explicitly
velscode@velscode:~/test$ cat run_docker.sh
DOCKER_CONTAINER_TAG=ubuntu/test:1.0
docker build -f Dockerfile . -t ${DOCKER_CONTAINER_TAG}
docker run --privileged \
-v $(pwd):/opt/ \
-w /opt/ \
${DOCKER_CONTAINER_TAG} /bin/bash -c "export TERM=xterm; ./process_abc & top -n 1 -b -w 512"
velscode@velscode:~/test$ bash run_docker.sh
Sending build context to Docker daemon 25.09kB
Step 1/1 : FROM ubuntu:latest
---> ba6acccedd29
Successfully built ba6acccedd29
Successfully tagged ubuntu/test:1.0
top - 11:07:57 up 1:40, 0 users, load average: 0.07, 0.07, 0.08
Tasks: 3 total, 1 running, 2 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni, 93.3 id, 6.7 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 962.1 total, 115.5 free, 258.3 used, 588.4 buff/cache
MiB Swap: 1924.0 total, 1922.5 free, 1.5 used. 549.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 3980 2944 2728 S 0.0 0.3 0:00.00 bash
7 root 20 0 2352 584 520 S 0.0 0.1 0:00.00 process_abc
8 root 20 0 5972 3232 2808 R 0.0 0.3 0:00.00 top
The process name is now displayed in full, so grep works as expected.
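The same option can be carried over to check.sh. The following is a sketch rather than the exact script used here: the function names snapshot, is_alive and wait_for_exit are illustrative, grep is run with -q so the matching line is no longer echoed, and -w 512 is the width used in the run above.

```shell
#!/bin/bash
# Take one batch-mode snapshot, wide enough that COMMAND is not cut off.
snapshot() {
    top -n 1 -b -w 512
}

# Succeeds while the given name still appears in the snapshot.
is_alive() {
    snapshot | grep -qF -- "$1"
}

# Poll once per second until the process is gone.
wait_for_exit() {
    local name="$1"
    while is_alive "$name"; do
        echo "$name alive, wait it"
        sleep 1
    done
    echo "$name exit"
}

# Default to the test process from this post when no name is given.
wait_for_exit "${1:-process_abc}"
```

Keeping the top call in its own function means the width only has to be set in one place, whether the script runs in an interactive shell or under /bin/bash -c.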