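supervisor: a client/server process control system that lets users monitor and manage processes on UNIX-like systems. It is commonly used to manage the processes that belong to a particular user or project.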
Components
supervisord: the server daemon
supervisorctl: the command-line client
Web Server: a web interface with functionality comparable to supervisorctl
XML-RPC Interface: the XML-RPC interface
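As a rough illustration of the XML-RPC interface, the sketch below lists process names and states using Python's standard `xmlrpc.client`. It assumes a Supervisor 3.x or later server, where `supervisor.getAllProcessInfo()` is served under the `/RPC2` path; older releases may behave differently. The helper names `connect` and `summarize_processes` are hypothetical.

```python
from xmlrpc.client import ServerProxy


def connect(url):
    """Build an XML-RPC proxy; no network traffic happens until a call is made."""
    return ServerProxy(url)


def summarize_processes(proxy):
    """Return (name, state) pairs for every process the server reports."""
    # Assumption: getAllProcessInfo() returns one dict per process and each
    # dict carries the 'name' and 'statename' keys (Supervisor 3.x and later).
    return [(info["name"], info["statename"])
            for info in proxy.supervisor.getAllProcessInfo()]
```

With an HTTP listener enabled, `summarize_processes(connect("http://127.0.0.1:9001/RPC2"))` would return the same names and states that supervisorctl prints; the address here is an assumed example.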
Installation
On CentOS it can be installed directly from the YUM repositories
yum info supervisor
sudo yum install supervisor
sudo chkconfig supervisord on
Starting and stopping the service
sudo /etc/init.d/supervisord {start|stop|status|restart|reload|force-reload|condrestart}
Log
/var/log/supervisor/supervisord.log
Configuration file
sudo vim /etc/supervisord.conf
It typically contains the following configurable sections
[unix_http_server]
[inet_http_server]
[supervisord]
[supervisorctl]
[program:x]
[include]
[group:x]
[fcgi-program:x]
[eventlistener:x]
[rpcinterface:x]
The two sections that matter most are
[program:x]: defines a process to be monitored
[group:x]: groups the monitored processes
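To make the relationship between the two section types concrete, here is a small sketch that reads a config fragment with Python's standard `configparser` and collects the programs and the groups. The fragment in `SAMPLE` and the helper name `read_sections` are illustrative assumptions; supervisord uses its own parser, so this only approximates how the file is read.

```python
import configparser

# Illustrative fragment; the paths and names are examples only.
SAMPLE = """\
[program:worker_for_summary]
command=/home/op1/test_db_monitor/worker_for_summary.py

[program:publisher_for_summary]
command=/home/op1/test_db_monitor/publisher_for_summary.py

[group:dbmonitor_consumer]
programs=worker_for_summary
priority=1 ; trailing comments start with ';'
"""


def read_sections(text):
    """Return ({program: command}, {group: [program, ...]}) from config text."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=(";",),  # strip '; comment' after a value
        interpolation=None,              # treat '%' in values literally
    )
    parser.read_string(text)
    programs, groups = {}, {}
    for section in parser.sections():
        kind, _, name = section.partition(":")
        if kind == "program":
            programs[name] = parser[section]["command"]
        elif kind == "group":
            members = parser[section]["programs"].split(",")
            groups[name] = [member.strip() for member in members]
    return programs, groups
```

Each `[program:x]` section contributes one monitored command, and each `[group:x]` section only refers to programs by name.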
Sample configuration
[supervisord]
http_port=/var/tmp/supervisor.sock ; (default is to run a UNIX domain socket server)
;http_port=127.0.0.1:9001 ; (alternately, ip_address:port specifies AF_INET)
;sockchmod=0700 ; AF_UNIX socketmode (AF_INET ignore, default 0700)
;sockchown=nobody.nogroup ; AF_UNIX socket uid.gid owner (AF_INET ignores)
;umask=022 ; (process file creation umask;default 022)
logfile=/var/log/supervisor/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10 ; (num of main logfile rotation backups;default 10)
loglevel=info ; (logging level;default info; others: debug,warn)
pidfile=/var/run/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false ; (start in foreground if true;default false)
minfds=1024 ; (min. avail startup file descriptors;default 1024)
minprocs=200 ; (min. avail process descriptors;default 200)
;nocleanup=true ; (don't clean up tempfiles at start;default false)
;http_username=user ; (default is no username (open system))
;http_password=123 ; (default is no password (open system))
;childlogdir=/tmp ; ('AUTO' child log dir, default $TEMP)
;user=chrism ; (default is current user, required if root)
;directory=/tmp ; (default is not to cd during start)
;environment=KEY=value ; (key value pairs to add to environment)

[supervisorctl]
serverurl=unix:///var/tmp/supervisor.sock ; use a unix:// URL for a unix socket
;serverurl=http://127.0.0.1:9001 ; use an http:// url to specify an inet socket
;username=chris ; should be same as http_username if set
;password=123 ; should be same as http_password if set
;prompt=mysupervisor ; cmd line prompt (default "supervisor")

; The below sample program section shows all possible program subsection values,
; create one or more 'real' program: sections to be able to control them under
; supervisor.

;[program:example]
;command=/bin/echo; the program (relative uses PATH, can take args)
;priority=999 ; the relative start priority (default 999)
;autostart=true ; start at supervisord start (default: true)
;autorestart=true ; restart at unexpected quit (default: true)
;startsecs=10 ; number of secs prog must stay running (def. 10)
;startretries=3 ; max # of serial start failures (default 3)
;exitcodes=0,2 ; 'expected' exit codes for process (default 0,2)
;stopsignal=QUIT ; signal used to kill process (default TERM)
;stopwaitsecs=10 ; max num secs to wait before SIGKILL (default 10)
;user=chrism ; setuid to this UNIX account to run the program
;log_stdout=true ; if true, log program stdout (default true)
;log_stderr=true ; if true, log program stderr (def false)
;logfile=/var/log/supervisor.log ; child log path, use NONE for none; default AUTO
;logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB)
;logfile_backups=10 ; # of logfile backups (default 10)
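;command=/bin/echo; The program to launch when supervisor starts. Either a relative or an absolute path works; a relative path is looked up in supervisord's $PATH. The command may take arguments.
;priority=999  The order in which processes are started and shut down. A lower value means the process starts earlier and shuts down later; a higher value means it starts later and shuts down earlier.
;autostart=true  Whether the process starts automatically when supervisord starts.
;autorestart=true  Whether the process is restarted automatically after it exits unexpectedly.
;startsecs=10  How long the process must stay running before the start is considered successful.
;startretries=3  The number of consecutive failed start attempts allowed before supervisord gives up.
;exitcodes=0,2  If autorestart is set to unexpected and the process did not exit because supervisord stopped it, supervisord restarts the process when its exit code is not in the exitcodes list.
;stopsignal=QUIT  The signal used to kill the process.
;stopwaitsecs=10  How long to wait, after sending stopsignal to the process, for the OS to return SIGCHLD to supervisord. If this times out, supervisord kills the process with SIGKILL.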
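The interaction between autorestart and exitcodes described above can be sketched as a small decision function. This is a simplified model written for illustration, not supervisord's actual code: it ignores startsecs and startretries, and it assumes the process was not stopped by supervisord itself. The name `should_restart` is hypothetical.

```python
def should_restart(autorestart, exit_code, expected_codes=(0, 2)):
    """Decide whether an exited process would be restarted.

    autorestart is one of the strings "true", "false" or "unexpected";
    expected_codes mirrors the exitcodes setting (0,2 in the sample).
    """
    if autorestart == "true":
        return True   # always restart, whatever the exit code
    if autorestart == "false":
        return False  # never restart
    if autorestart == "unexpected":
        # Restart only when the exit code is NOT one of the expected ones.
        return exit_code not in expected_codes
    raise ValueError(f"unknown autorestart value: {autorestart!r}")
```

For example, with `autorestart="unexpected"` and the exit codes `0,2`, a process that exits with code 1 would be restarted, while one that exits with code 0 or 2 would be left alone.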
[program:worker_for_summary]
command=/home/op1/test_db_monitor/worker_for_summary.py
logfile=/var/log/worker_for_summary.py.supervisor.log

[program:worker_for_detail_all]
command=/home/op1/test_db_monitor/worker_for_detail_all.py
logfile=/var/log/worker_for_detail_all.py.supervisor.log

[program:worker_for_detail_recent]
command=/home/op1/test_db_monitor/worker_for_detail_recent.py
logfile=/var/log/worker_for_detail_recent.py.supervisor.log

[program:publisher_for_summary]
command=/home/op1/test_db_monitor/publisher_for_summary.py
logfile=/var/log/publisher_for_summary.py.supervisor.log

[program:publisher_for_summary_nt]
command=/home/op1/test_db_monitor/publisher_for_summary_nt.py
logfile=/var/log/publisher_for_summary_nt.py.supervisor.log

[program:publisher_for_detail]
command=/home/op1/test_db_monitor/publisher_for_detail.py
logfile=/var/log/publisher_for_detail.py.supervisor.log

[program:publisher_for_detail_nt]
command=/home/op1/test_db_monitor/publisher_for_detail_nt.py
logfile=/var/log/publisher_for_detail_nt.py.supervisor.log

[group:dbmonitor_consumer]
programs=worker_for_summary, worker_for_detail_all, worker_for_detail_recent
priority=1
log_stderr=true
logfile_maxbytes=1MB

[group:dbmonitor_publisher]
programs=publisher_for_summary, publisher_for_summary_nt,publisher_for_detail,publisher_for_detail_nt
priority=999
log_stderr=true
logfile_maxbytes=1MB
The [group:x] sections split the monitored processes into two groups, producers and consumers, and give each group a different priority.
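The effect of the two priorities can be pictured with a short sketch: lower values start first, and shutdown runs in the reverse order. This is a simplified model of priority ordering only; the function names `start_order` and `stop_order` are hypothetical.

```python
# Group priorities as configured: consumers 1, publishers 999.
GROUP_PRIORITIES = {
    "dbmonitor_publisher": 999,
    "dbmonitor_consumer": 1,
}


def start_order(priorities):
    """Names sorted by ascending priority: lower values start first."""
    return sorted(priorities, key=lambda name: priorities[name])


def stop_order(priorities):
    """Shutdown order is the reverse of the start order."""
    return list(reversed(start_order(priorities)))
```

Under this model the consumers come up before the publishers and are shut down after them, so there is always a consumer running while anything is being published.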
Once the configuration is complete, start supervisord
sudo /etc/init.d/supervisord start
If a monitored process is killed or exits unexpectedly, supervisor restarts it right away
Stop supervisor
sudo /etc/init.d/supervisord stop
Use supervisorctl to inspect and manage the monitored processes:
[op1@SVR1631HP360 ~]$ sudo supervisorctl
publisher_for_detail             RUNNING    pid 27557, uptime 0:00:45
publisher_for_detail_nt          RUNNING    pid 27567, uptime 0:00:45
publisher_for_summary            RUNNING    pid 27566, uptime 0:00:45
publisher_for_summary_nt         RUNNING    pid 27568, uptime 0:00:45
worker_for_detail_all            RUNNING    pid 27581, uptime 0:00:45
worker_for_detail_recent         RUNNING    pid 27582, uptime 0:00:45
worker_for_summary               RUNNING    pid 27559, uptime 0:00:45
# Use help to learn more about each command
supervisor> help
Documented commands (type help <topic>):
========================================
EOF    exit  maintail  quit    restart   start   stop
clear  help  open      reload  shutdown  status  tail
supervisor> help stop
stop <processname>                  Stop a process.
stop <processname> <processname>    Stop multiple processes
stop all                            Stop all processes
  When all processes are stopped, they are stopped in
  reverse priority order (see config file)
supervisor> help status
status                  Get all process status info.
status <name>           Get status on a single process by name.
status <name> <name>    Get status on multiple named processes.
# Stop a single process
supervisor> stop publisher_for_summary
publisher_for_summary: stopped
# Check the current status
supervisor> status
publisher_for_detail             RUNNING    pid 27557, uptime 0:05:41
publisher_for_detail_nt          RUNNING    pid 27567, uptime 0:05:41
publisher_for_summary            STOPPED    Feb 27 02:48 PM
publisher_for_summary_nt         RUNNING    pid 27568, uptime 0:05:41
worker_for_detail_all            RUNNING    pid 27581, uptime 0:05:41
worker_for_detail_recent         RUNNING    pid 27582, uptime 0:05:41
worker_for_summary               RUNNING    pid 27559, uptime 0:05:41
# Note that a process stopped through supervisorctl is not restarted automatically
# Start the process that was just stopped
supervisor> start publisher_for_summary
publisher_for_summary: started
supervisor> status
publisher_for_detail             RUNNING    pid 27557, uptime 0:08:02
publisher_for_detail_nt          RUNNING    pid 27567, uptime 0:08:02
publisher_for_summary            RUNNING    pid 3035, uptime 0:00:04
publisher_for_summary_nt         RUNNING    pid 27568, uptime 0:08:02
worker_for_detail_all            RUNNING    pid 27581, uptime 0:08:02
worker_for_detail_recent         RUNNING    pid 27582, uptime 0:08:02
worker_for_summary               RUNNING    pid 27559, uptime 0:08:02
# Stop all processes
supervisor> stop all
worker_for_detail_recent: stopped
worker_for_detail_all: stopped
publisher_for_summary_nt: stopped
publisher_for_detail_nt: stopped
publisher_for_summary: stopped
worker_for_summary: stopped
publisher_for_detail: stopped
supervisor> status
publisher_for_detail             STOPPED    Feb 27 02:51 PM
publisher_for_detail_nt          STOPPED    Feb 27 02:51 PM
publisher_for_summary            STOPPED    Feb 27 02:51 PM
publisher_for_summary_nt         STOPPED    Feb 27 02:51 PM
worker_for_detail_all            STOPPED    Feb 27 02:51 PM
worker_for_detail_recent         STOPPED    Feb 27 02:51 PM
worker_for_summary               STOPPED    Feb 27 02:51 PM
# Start all processes
supervisor> start all
publisher_for_detail: started
worker_for_summary: started
publisher_for_summary: started
publisher_for_detail_nt: started
publisher_for_summary_nt: started
worker_for_detail_all: started
worker_for_detail_recent: started
supervisor> status
publisher_for_detail             RUNNING    pid 5111, uptime 0:00:15
publisher_for_detail_nt          RUNNING    pid 5141, uptime 0:00:15
publisher_for_summary            RUNNING    pid 5135, uptime 0:00:15
publisher_for_summary_nt         RUNNING    pid 5147, uptime 0:00:15
worker_for_detail_all            RUNNING    pid 5153, uptime 0:00:15
worker_for_detail_recent         RUNNING    pid 5159, uptime 0:00:14
worker_for_summary               RUNNING    pid 5112, uptime 0:00:15
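For scripting around supervisorctl, the status lines shown in the session can be turned into structured data. The sketch below is one possible parser, assuming the line layout seen here ("name STATE pid N, uptime H:MM:SS" for running processes, "name STATE date" otherwise); other supervisor versions may print different details. The name `parse_status` is hypothetical.

```python
import re

# name, then an upper-case state, then optional free-form detail text.
STATUS_LINE = re.compile(r"^(?P<name>\S+)\s+(?P<state>[A-Z]+)(?:\s+(?P<detail>.*))?$")
PID_UPTIME = re.compile(r"pid (?P<pid>\d+), uptime (?P<uptime>[\d:]+)")


def parse_status(output):
    """Map each process name to its state, pid and uptime (None if absent)."""
    result = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if not match:
            continue  # prompts, comments and blank lines are skipped
        entry = {"state": match["state"], "pid": None, "uptime": None}
        running = PID_UPTIME.search(match["detail"] or "")
        if running:
            entry["pid"] = int(running["pid"])
            entry["uptime"] = running["uptime"]
        result[match["name"]] = entry
    return result
```

Feeding it the text captured from `supervisorctl status` yields a dictionary that a monitoring script could compare against the expected set of RUNNING processes.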