I want to write a shell script on the jump host that batch-checks whether the main process is running healthily on a set of remote servers.
First, pick one of the remote machines and look at the state of the main process there:

    [root@two002 tmp]# ps -ef|grep main
    root     23448 23422  0 11:40 pts/0    00:00:00 grep --color=auto main
    [root@two002 tmp]# ps -ef|grep main|grep -v grep|wc -l
    0
The shell check script is as follows:

    [root@two002 tmp]# cat /tmp/main_check.sh
    #!/bin/bash
    NUM=$(ps -ef|grep main|grep -v grep|wc -l)
    if [ $NUM -eq 0 ];then
        echo "It's not good! main is stoped!"
    else
        echo "Don't worry! main is running!"
    fi
Run the script:

    [root@two002 tmp]# sh -x /tmp/main_check.sh
    ++ grep main
    ++ grep -v grep
    ++ wc -l
    ++ ps -ef
    + NUM=2
    + '[' 2 -eq 0 ']'
    + echo 'Don'\''t worry! main is running!'
    Don't worry! main is running!
    [root@two002 tmp]# sh /tmp/main_check.sh
    Don't worry! main is running!
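As the trace shows, NUM is assigned the value 2 while the script runs, yet running ps -ef|grep main|grep -v grep|wc -l by hand clearly returns 0!
The cause is how grep matches. The script itself is named main_check.sh, so while it runs, its own process (typically together with the subshell forked for the $( ) command substitution) appears in the ps -ef output with "main" in its command line, and a plain grep main counts those lines. grep needs to match the whole word, i.e. "grep -w": the underscore counts as a word character, so main_check no longer matches main. The content of main_check.sh therefore has to be changed as follows:

    [root@two002 tmp]# cat /tmp/main_check.sh
    #!/bin/bash
    NUM=$(ps -ef|grep -w main|grep -v grep|wc -l)
    if [ $NUM -eq 0 ];then
        echo "Oh!My God! It's broken! main is stoped!"
    else
        echo "Don't worry! main is running!"
    fi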
Run the check script again, and now the result is correct:

    [root@two002 tmp]# sh -x /tmp/main_check.sh
    ++ grep -w main
    ++ grep -v grep
    ++ wc -l
    ++ ps -ef
    + NUM=0
    + '[' 0 -eq 0 ']'
    + echo 'Oh!My God! It'\''s broken! main is stoped!'
    Oh!My God! It's broken! main is stoped!
    [root@two002 tmp]# sh /tmp/main_check.sh
    Oh!My God! It's broken! main is stoped!
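The difference between substring matching and whole-word matching can be reproduced without touching real processes. The following is only a small sketch: the sample lines are made up to imitate `ps -ef` output, and the path /usr/local/bin/main as well as the function names sample_ps_output, count_loose and count_word are assumptions for illustration, not something from the servers above.

```shell
#!/bin/bash
# Made-up lines imitating `ps -ef` output (illustrative only):
# two lines for the running check script, one for a real "main" process.
sample_ps_output() {
  cat <<'EOF'
root     23460 23422  0 11:41 pts/0    00:00:00 sh -x /tmp/main_check.sh
root     23461 23460  0 11:41 pts/0    00:00:00 sh -x /tmp/main_check.sh
root     23500     1  0 11:42 ?        00:00:03 /usr/local/bin/main --daemon
EOF
}

# Substring match: every line containing "main" is counted,
# including the lines that belong to main_check.sh itself.
count_loose() { sample_ps_output | grep -c main; }

# Whole-word match: "main_check" is skipped because "_" is a word character,
# while "/usr/local/bin/main " still matches ("/" and " " are not).
count_word() { sample_ps_output | grep -cw main; }

echo "grep main:    $(count_loose)"
echo "grep -w main: $(count_word)"
```

With this sample input, grep main counts 3 lines and grep -w main counts only 1, which is the same effect that turned NUM from 2 into 0 above.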
So, on the jump host, the scripts for batch-checking the running state of the main process on the remote servers are:

    [root@tiaoban ~]# cat /usr/bin/main_check
    #!/bin/bash
    NUM=$(ps -ef|grep -w main|grep -v grep|wc -l)
    if [ $NUM -eq 0 ];then
        echo "Oh!My God! It's broken! main is stoped!"
    else
        echo "Don't worry! main is running!"
    fi

    [root@tiaoban ~]# cat /opt/script/main_check.sh
    #!/bin/bash
    for i in $(cat /opt/ip.list)
    do
        /usr/bin/rsync -e "ssh -p22" -avpgolr /usr/bin/main_check $i:/usr/bin/ > /dev/null 2>&1
        ssh -p22 root@$i "echo $i;sh /usr/bin/main_check"
    done
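The loop above treats every word in /opt/ip.list as a host and says nothing when a machine cannot be reached. A slightly more defensive variant might look like the sketch below. The helper names read_hosts, check_host and check_all are hypothetical, and the sketch assumes that the list file holds one host per line and that key-based root login already works. BatchMode and ConnectTimeout are standard OpenSSH options.

```shell
#!/bin/bash
# Hypothetical helper: print the usable entries of a host list file,
# skipping blank lines and lines that start with "#".
read_hosts() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Hypothetical helper: run the remote check on one host and
# report hosts that could not be reached instead of staying silent.
check_host() {
  local host=$1
  # -n stops ssh from swallowing the rest of the host list on stdin;
  # BatchMode/ConnectTimeout avoid hanging on password prompts or dead hosts.
  if ! ssh -n -p22 -o BatchMode=yes -o ConnectTimeout=5 "root@$host" \
       "echo $host; sh /usr/bin/main_check"; then
    echo "$host: ssh failed, main status unknown" >&2
  fi
}

# Hypothetical driver: check every host in the list (default /opt/ip.list).
check_all() {
  local list=${1:-/opt/ip.list}
  read_hosts "$list" | while read -r host; do
    check_host "$host"
  done
}
```

Calling check_all /opt/ip.list would then walk the list. The rsync step that distributes /usr/bin/main_check is left out here and would still have to run before the check.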