Deploying supervisor to monitor a process

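Background: a service's process needs to be monitored and restored promptly whenever it disappears, so supervisor is used to meet this requirement.
I. Installation

1. First download supervisor-3.3.5.tar.gz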

# wget https://files.pythonhosted.org/packages/ba/65/92575a8757ed576beaee59251f64a3287bde82bdc03964b89df9e1d29e1b/supervisor-3.3.5.tar.gz
# tar zxf supervisor-3.3.5.tar.gz
# cd supervisor-3.3.5

2. The installation fails with the error below

# python setup.py install
Processing dependencies for supervisor==3.3.5
Searching for meld3>=0.6.5
Reading http://pypi.python.org/simple/meld3/
Couldn't find index page for 'meld3' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading http://pypi.python.org/simple/
No local packages or download links found for meld3>=0.6.5
error: Could not find suitable distribution for Requirement.parse('meld3>=0.6.5')

3. Install meld3; the version must be at least 0.6.5

# yum install -y python-meld3

4. Install supervisor again

# python setup.py install
......
Installed /usr/lib/python2.6/site-packages/supervisor-3.3.5-py2.6.egg
Processing dependencies for supervisor==3.3.5
Searching for meld3==0.6.7
Best match: meld3 0.6.7
Adding meld3 0.6.7 to easy-install.pth file

Using /usr/lib64/python2.6/site-packages
Finished processing dependencies for supervisor==3.3.5
II. Configuring the service

1. Generate the configuration file

# echo_supervisord_conf > /etc/supervisord.conf

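2. Edit the main configuration file. The main changes made here are listed below.
In the [unix_http_server] section, chown=homework:homework gives the homework user permission to manage processes through supervisorctl. The service is currently deployed by Jenkins, which has to stop the service, publish the code, and then start the service again through supervisorctl.

[unix_http_server]
file=/home/homework/supervisor/supervisor.sock
chown=homework:homework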

[supervisord]
logfile=/home/homework/supervisor/supervisord.log
pidfile=/home/homework/supervisor/supervisord.pid
user=homework

[supervisorctl]
serverurl=unix:///home/homework/supervisor/supervisor.sock

[include]
files = /etc/supervisord.conf.d/*.conf

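3. Add a configuration file for the managed service. Place it under /etc/supervisord.conf.d with a name ending in .conf. Here the file is /etc/supervisord.conf.d/rsrankserver.conf with the following content, where xxx is the service name.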

# cat /etc/supervisord.conf.d/rsrankserver.conf
[program:xxx]
directory=/home/homework/rs/rsrankserver/bin
command=/home/homework/rs/rsrankserver/bin/rsrankserver >error.out
numprocs=1
autostart=true
startsecs=1
autorestart=true
user=homework
redirect_stderr=true
stdout_logfile=/home/homework/supervisor/xxx.log

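Note that the service was originally managed by a shell script that ran /home/homework/rs/rsrankserver/bin/rsrankserver --daemon &>error.out. supervisor cannot manage a process that daemonizes itself, so --daemon and & are dropped from command to make sure supervisor can manage the process. supervisord also does not run command through a shell, so the remaining >error.out is passed to the program as a literal argument (it shows up in the ps output) rather than acting as a redirection. The output is captured in stdout_logfile instead.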
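
As a rough illustration of the per-service file described above, the stanza can be generated from a few parameters. This is only a sketch: make_program_conf is a hypothetical helper, not part of supervisor, and it assumes the command is a bare executable path with no arguments (no --daemon, no &, no redirection).

```shell
#!/bin/sh
# Hypothetical helper (not part of supervisor): print a [program:x] stanza
# for a service that stays in the foreground.
# Assumption: "command" is a plain executable path without arguments,
# so its directory can be used as the working directory.
make_program_conf() {
    name=$1      # program name shown by "supervisorctl status"
    command=$2   # foreground command, without --daemon, "&" or redirections
    user=$3      # account the process runs as
    logdir=$4    # directory for the captured stdout/stderr

    cat <<EOF
[program:${name}]
directory=$(dirname "$command")
command=${command}
numprocs=1
autostart=true
startsecs=1
autorestart=true
user=${user}
redirect_stderr=true
stdout_logfile=${logdir}/${name}.log
EOF
}

# Example: print the stanza for rsrankserver. Once it looks right, redirect
# the output into /etc/supervisord.conf.d/rsrankserver.conf.
make_program_conf rsrankserver \
    /home/homework/rs/rsrankserver/bin/rsrankserver \
    homework /home/homework/supervisor
```

The values mirror the file shown above. Only the program name and log file name are derived from the first argument.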

III. Adding the management script

The management script is /etc/init.d/supervisord. Set its permissions to 755. Its content is as follows

#!/bin/sh
#
# /etc/rc.d/init.d/supervisord
#
# Supervisor is a client/server system that
# allows its users to monitor and control a
# number of processes on UNIX-like operating
# systems.
#
# chkconfig: - 64 36
# description: Supervisor Server
# processname: supervisord

# Source init functions
. /etc/init.d/functions

RETVAL=0
prog="supervisord"
pidfile="/home/homework/supervisor/supervisord.pid"
lockfile="/home/homework/supervisor/lock/subsys/supervisord"

start()
{
        echo -n $"Starting $prog: "
        daemon --pidfile $pidfile supervisord -c /etc/supervisord.conf
        RETVAL=$?
        echo
        [ $RETVAL -eq 0 ] && touch ${lockfile}
}

stop()
{
        echo -n $"Shutting down $prog: "
        killproc -p ${pidfile} /usr/bin/supervisord
        RETVAL=$?
        echo
        if [ $RETVAL -eq 0 ] ; then
                rm -f ${lockfile} ${pidfile}
        fi
}

case "$1" in

  start)
    start
  ;;

  stop)
    stop
  ;;

  status)
        status -p ${pidfile} $prog
  ;;

  restart)
    stop
    start
  ;;

  *)
    echo "Usage: $0 {start|stop|restart|status}"
  ;;

esac
IV. Starting the service
# mkdir -p /home/homework/supervisor/lock/subsys/
# chown -R homework:homework /home/homework/supervisor
# /etc/init.d/supervisord start
Starting supervisord:                                      [  OK  ]
V. Checking the service
# ps -ef|grep rsr
homework  89173  94865  0 Feb14 ?        00:00:49 /home/homework/rs/rsrankserver/bin/rsrankserver >error.out
# supervisorctl status
rsrankserver                     RUNNING   pid 89173, uptime 14 days, 0:50:05

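This confirms that the service is in the RUNNING state. Next, verify that the process is restarted automatically after being killed. Note that if an instance of the service was already running, it must be cleaned up first so that only one healthy service process exists.

VI. Verifying the service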
# ps -ef|grep rsr
homework  88491  87473 91 16:07 ?        00:00:05 /home/homework/rs/rsrankserver/bin/rsrankserver >error.out
# kill 88491
# ps -ef|grep rsr
homework  88638  87473  6 16:08 ?        00:00:07 /home/homework/rs/rsrankserver/bin/rsrankserver >error.out

The PID has changed, which shows that the service is brought back up automatically.
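
The manual check above (note the PID, kill it, compare the new PID) could be scripted roughly as follows. This is a sketch under assumptions: was_restarted and check_autorestart are hypothetical helper names, the 3 second wait is an arbitrary margin on top of startsecs=1, and the script relies on supervisorctl's pid action to read the PID of the managed process.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when new_pid is a non-zero number
# that differs from old_pid, i.e. the process was respawned.
was_restarted() {
    old_pid=$1
    new_pid=$2
    case $new_pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not numeric
    esac
    [ "$new_pid" -ne 0 ] && [ "$new_pid" != "$old_pid" ]
}

# Hypothetical helper: kill the managed process and check that supervisor
# brings it back with a new PID.
check_autorestart() {
    name=$1
    old_pid=$(supervisorctl pid "$name")
    kill "$old_pid"
    sleep 3   # assumed margin: startsecs=1 plus time to respawn
    new_pid=$(supervisorctl pid "$name")
    if was_restarted "$old_pid" "$new_pid"; then
        echo "$name restarted: $old_pid -> $new_pid"
    else
        echo "$name was NOT restarted (pid: $new_pid)" >&2
        return 1
    fi
}

# Usage on the host running supervisord, as the homework user:
#   check_autorestart rsrankserver
```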

VII. Managing the service

supervisorctl status: show the status of all services
supervisorctl stop xxx: stop the xxx service
supervisorctl start xxx: start the xxx service

# supervisorctl status
rsrankserver                     RUNNING   pid 88638, uptime 0:02:41
# supervisorctl stop rsrankserver
rsrankserver: stopped
# supervisorctl start rsrankserver
rsrankserver: started
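
For the Jenkins flow mentioned earlier (stop the service, publish the code, start it again), a wrapper along these lines may be convenient. It is only a sketch: deploy_with_supervisor and publish_code.sh are hypothetical names, and the final check looks at the status text instead of relying on supervisorctl's exit code.

```shell
#!/bin/sh
# Hypothetical wrapper for a deploy step, run as the homework user.
# Usage: deploy_with_supervisor <program-name> <publish command...>
deploy_with_supervisor() {
    name=$1          # program name as shown by "supervisorctl status"
    shift            # the remaining arguments are the publish command

    supervisorctl stop "$name"

    # Run the publish step and remember its exit status.
    rc=0
    "$@" || rc=$?

    # Start the service again even if publishing failed,
    # so that it is not left stopped.
    supervisorctl start "$name"

    # Confirm from the status text that the program is RUNNING.
    if supervisorctl status "$name" | grep -q RUNNING; then
        return "$rc"
    fi
    echo "$name is not RUNNING after deploy" >&2
    return 1
}

# Example (publish_code.sh is a placeholder for the real publish step):
#   deploy_with_supervisor rsrankserver ./publish_code.sh
```

The function returns the publish step's exit status when the service comes back up, so the Jenkins job still fails if publishing failed.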
-------- The old deployment document follows --------
I. Installation
# yum install supervisor
II. Configuring the service

Generate the configuration file

# echo_supervisord_conf > /etc/supervisord.conf

/etc/supervisord.conf is the current configuration file. Its content with the comment lines filtered out is shown below

[unix_http_server]
file=/tmp/supervisor.sock   ; the path to the socket file

[supervisord]
logfile=/var/log/supervisord.log ; main log file; default $CWD/supervisord.log
logfile_maxbytes=50MB        ; max main logfile bytes b4 rotation; default 50MB
logfile_backups=10           ; # of main logfile backups; 0 means none, default 10
# Log level; the default is info
loglevel=info                ; log level; default info; others: debug,warn,trace
pidfile=/var/log/supervisord.pid ; supervisord pidfile; default supervisord.pid
nodaemon=false               ; start in foreground if true; default false
minfds=1024                  ; min. avail startup file descriptors; default 1024
minprocs=200                 ; min. avail process descriptors;default 200

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL  for a unix socket

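# program:xxx is the name of your service, for example nginx
[program:xxx]
# command= is followed by the start command; here the service is started by a script
command=/home/homework/xxx.sh
# Number of processes for the xxx service; the default is 1
numprocs=1
# Whether to start automatically
autostart=true
# 0 means the process does not have to stay up for any length of time to count as started, so the command above is run again as soon as the service fails
startsecs=0
# Restart the service automatically when it exits
autorestart=true
# Start the service as the homework user
user=homework
# Redirect stderr to stdout
redirect_stderr=true
# Log file that receives the output, including errors
stdout_logfile=/home/homework/log/rs/recallsrv/rsrankserver/rsrankserver.log
III. Starting the service
# service supervisord start
or
# /etc/init.d/supervisord start
IV. Checking the service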
# tail -f /var/log/supervisord.log
2019-01-29 20:05:22,951 INFO exited: rsrankserver (exit status 1; not expected)
2019-01-29 20:05:22,951 INFO received SIGCLD indicating a child quit
2019-01-29 20:05:23,954 INFO spawned: 'rsrankserver' with pid 81212
2019-01-29 20:05:23,972 INFO success: rsrankserver entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2019-01-29 20:05:23,972 TRAC rsrankserver output:
already is running. pid: 114669

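1. loglevel in the [supervisord] section was set to trace at the time because a problem was being investigated.
2. Here the exit status comes from the start script: 1 means the service is healthy (the script found it already running), while 0 means the service was missing and the script restored it. 127 means the start script itself is broken or has some other problem, and in that case the automatic recovery does not succeed.
3. Every service is checked once per second, so configure the number of log files and the size of each log file accordingly.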
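
When a log like the one above grows quickly, it can help to pull out only the exit statuses. The following is a small sketch: exit_statuses is a hypothetical helper, and the pattern is written against the "exited:" line format shown in the log excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: read supervisord log lines on stdin and print
# "<program> <exit status>" for every "exited:" line.
# Lines that do not report an exit status are ignored.
exit_statuses() {
    sed -n 's/^.* INFO exited: \([^ ]*\) (exit status \([0-9]*\);.*$/\1 \2/p'
}

# Usage: count how often each program exited with each status, e.g.
#   exit_statuses < /var/log/supervisord.log | sort | uniq -c
```

A sudden run of status 127 in that summary would point at the start script itself, as described in note 2.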

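V. Verification

1. Because the service is started by the homework user, switch to the homework user for testing as well: kill the process or stop the service with the script, and check whether the service is restored.
2. If it is not restored, check the log for the error reported when the service starts. In one case the shell script did not begin with #!/bin/sh, so supervisor could not run it and the service could not be restored. Adding that line fixed it. Many things can cause this kind of problem, so each case needs investigation.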
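
The missing #!/bin/sh case can be caught before supervisor ever runs the script. A minimal sketch, assuming the start script is a regular readable file; has_shebang is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the first line of the file starts
# with "#!", so the kernel knows which interpreter should run it.
has_shebang() {
    [ -r "$1" ] || return 1
    first_line=
    # read fails on a final line without a newline but still fills the
    # variable, so only give up when nothing was read at all.
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    case $first_line in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage, with the start script from the configuration above:
#   has_shebang /home/homework/xxx.sh || echo "add #!/bin/sh as the first line"
```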
