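Official documentation: Configuration File — Supervisor 4.2.5 documentation
Reference blog post
Have you used Supervisor's monitoring and alerting feature? - Tencent Cloud Developer Community - Tencent Cloud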
Signals
The supervisord program may be sent signals which cause it to perform certain actions while it’s running.
You can send any of these signals to the single supervisord process id. This process id can be found in the file represented by the pidfile parameter in the [supervisord] section of the configuration file (by default it’s $CWD/supervisord.pid).
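In other words, supervisord can be sent signals that make it perform certain actions while it is running.
To send one, you need the supervisord process id, which is stored in a file. Where is that file?
Its location is given by the pidfile parameter in the [supervisord] section of the configuration file.
That file holds the supervisord process id. Signals sent to that process make supervisord act, for example shut down, reload its configuration, or reopen its log files. Alert notifications, by contrast, are built on the event system described next.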
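As a rough illustration of the pidfile lookup described above, here is a minimal Python sketch. The helper names `read_pid` and `signal_supervisord` are hypothetical, and the pidfile path you pass in is whatever your own `[supervisord]` section sets.

```python
import os
import signal


def read_pid(pidfile_path):
    """Return the process id stored in a supervisord pidfile."""
    with open(pidfile_path) as f:
        return int(f.read().strip())


def signal_supervisord(pidfile_path, sig=signal.SIGHUP):
    """Send `sig` to the supervisord process named in the pidfile.

    Per the Supervisor docs: SIGTERM, SIGINT and SIGQUIT shut supervisord
    down, SIGHUP makes it reload its configuration and restart all
    processes, and SIGUSR2 makes it close and reopen its log files.
    """
    pid = read_pid(pidfile_path)
    os.kill(pid, sig)
    return pid
```

For example, `signal_supervisord("/path/to/supervisord.pid", signal.SIGUSR2)` would ask supervisord to reopen its logs; the path here is a placeholder.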
Original text:
Events
Events are an advanced feature of Supervisor introduced in version 3.0. You don’t need to understand events if you simply want to use Supervisor as a mechanism to restart crashed processes or as a system to manually control process state. You do need to understand events if you want to use Supervisor as part of a process monitoring/notification framework.
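Events are an advanced feature introduced in Supervisor 3.0. If you only want Supervisor to restart processes that crash, or to control process state by hand, you do not need to understand events.
If you want to use Supervisor as part of a process monitoring/notification framework, you do.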
Event Listeners and Event Notifications
Supervisor provides a way for a specially written program (which it runs as a subprocess) called an “event listener” to subscribe to “event notifications”. An event notification implies that something happened related to a subprocess controlled by supervisord or to supervisord itself. Event notifications are grouped into types in order to make it possible for event listeners to subscribe to a limited subset of event notifications. Supervisor continually emits event notifications as its running even if there are no listeners configured. If a listener is configured and subscribed to an event type that is emitted during a supervisord lifetime, that listener will be notified.
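Supervisor lets a specially written program, which it runs as a subprocess and calls an event listener, subscribe to event notifications. An event notification means that something has happened, either to a subprocess controlled by supervisord or to supervisord itself.
Event notifications are grouped into types so that a listener can subscribe to only the subset it needs. Supervisor keeps emitting event notifications even when no listener is configured (nobody is listening, so they go nowhere). If a listener is configured and subscribed to an event type that is emitted while supervisord is running, that listener is notified.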
The purpose of the event notification/subscription system is to provide a mechanism for arbitrary code to be run (e.g. send an email, make an HTTP request, etc) when some condition is met. That condition usually has to do with subprocess state. (What does this mean? Supervisor runs our own program as a subprocess, so when something happens to the program, the state of that subprocess changes.) For instance, you may want to notify someone via email when a process crashes and is restarted by Supervisor.
The event notification protocol is based on communication via a subprocess’ stdin and stdout.
Supervisor sends specially-formatted input to an event listener process’ stdin and expects specially-formatted output from an event listener’s stdout, forming a request-response cycle.
A protocol agreed upon between supervisor and the listener’s implementer allows listeners to process event notifications.
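The request-response cycle can be sketched in a few lines of Python. This is a hand-written illustration of the protocol, not Supervisor's own code: the listener writes READY, reads a header line of key:value tokens followed by len characters of payload, and answers with a RESULT line. The function names and the simulated input (including the pool name testgolang) are made up for the example so that it runs without Supervisor.

```python
import io


def parse_tokens(line):
    """Turn 'key:value key:value' into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def read_event(stdin):
    """Read one event: a header line, then `len` characters of payload."""
    headers = parse_tokens(stdin.readline())
    # `len` counts bytes; for ASCII payloads that equals the character count.
    payload = stdin.read(int(headers["len"]))
    return headers, payload


def listen_once(stdin, stdout):
    """Run one request-response cycle of the listener protocol."""
    stdout.write("READY\n")       # tell supervisord we can accept an event
    stdout.flush()
    headers, payload = read_event(stdin)
    stdout.write("RESULT 2\nOK")  # acknowledge: the event was handled
    stdout.flush()
    return headers, payload


# Simulated supervisord input, so the sketch runs on its own.
payload = "processname:test groupname:test from_state:RUNNING expected:0 pid:2766"
header = ("ver:3.0 server:supervisor serial:21 pool:testgolang poolserial:10 "
          "eventname:PROCESS_STATE_EXITED len:%d\n" % len(payload))
fake_stdin = io.StringIO(header + payload)
fake_stdout = io.StringIO()
headers, body = listen_once(fake_stdin, fake_stdout)
print(headers["eventname"], parse_tokens(body)["processname"])
# prints: PROCESS_STATE_EXITED test
```

A real listener repeats this cycle in a loop on sys.stdin and sys.stdout. If it cannot handle an event it can answer "RESULT 4\nFAIL" instead, and Supervisor will send the event again later.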
Event listeners can be written in any language supported by the platform you’re using to run Supervisor. Although event listeners may be written in any language, there is special library support for Python in the form of a supervisor.childutils module, which makes creating event listeners in Python slightly easier than in other languages.
The documentation above shows that Supervisor supports monitoring and event subscription.
Configuring an Event Listener
A supervisor event listener is specified via a [eventlistener:x] section in the configuration file (so we need to edit the configuration file and add an eventlistener section). Supervisor [eventlistener:x] sections are treated almost exactly like supervisor [program:x] section with respect to the keys allowed in their configuration except that Supervisor does not respect “capture mode” output from event listener processes (ie. event listeners cannot be PROCESS_COMMUNICATIONS_EVENT event generators). Therefore it is an error to specify stdout_capture_maxbytes or stderr_capture_maxbytes in the configuration of an eventlistener. Put simply, a listener accepts the same keys as a program, minus the capture settings. There is no artificial constraint on the number of eventlistener sections that can be placed into the configuration file.
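To make the capture-key restriction concrete, the following sketch scans configuration text for eventlistener sections that set either forbidden key. It uses Python's standard configparser rather than Supervisor's own parser, so treat it as an approximate pre-flight check; the function name `capture_key_errors` is hypothetical.

```python
import configparser

# Keys the documentation says must not appear in an [eventlistener:x] section.
FORBIDDEN = ("stdout_capture_maxbytes", "stderr_capture_maxbytes")


def capture_key_errors(config_text):
    """Return (section, key) pairs where a listener sets a capture key."""
    # RawConfigParser leaves %(...)s expansions untouched; ';' starts comments.
    parser = configparser.RawConfigParser(
        inline_comment_prefixes=(";",), strict=False
    )
    parser.read_string(config_text)
    errors = []
    for section in parser.sections():
        if not section.startswith("eventlistener:"):
            continue  # capture keys are allowed in [program:x] sections
        for key in FORBIDDEN:
            if parser.has_option(section, key):
                errors.append((section, key))
    return errors
```

An empty list means no listener section uses a capture key.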
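Write a Python program that sends an HTTP request, using a DingTalk robot to post the alert message
Below is the Go code we used earlier to send the HTTP request; we used ChatGPT to convert it to Python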
// Imports used by this excerpt: bytes, encoding/json, errors, fmt, io/ioutil, net/http.
// ParamCronTask, ResponseSendMessage and getURL are defined elsewhere in the project.

// SendMessage posts a text message to the DingTalk robot webhook.
func (t *DingRobot) SendMessage(p *ParamCronTask) error {
	b := []byte{}
	var err error
	if p.MsgText.Msgtype == "text" {
		atMobileStringArr := make([]string, len(p.MsgText.At.AtMobiles))
		for i, atMobile := range p.MsgText.At.AtMobiles {
			atMobileStringArr[i] = atMobile.AtMobile
		}
		atUserIdStringArr := make([]string, len(p.MsgText.At.AtUserIds))
		for i, atUserId := range p.MsgText.At.AtUserIds {
			atUserIdStringArr[i] = atUserId.AtUserId
		}
		msg := map[string]interface{}{
			"msgtype": "text",
			"text": map[string]string{
				"content": p.MsgText.Text.Content,
			},
		}
		if p.MsgText.At.IsAtAll {
			msg["at"] = map[string]interface{}{
				"isAtAll": p.MsgText.At.IsAtAll,
			}
		} else {
			msg["at"] = map[string]interface{}{
				"atMobiles": atMobileStringArr, // slice of strings
				"atUserIds": atUserIdStringArr,
				"isAtAll":   p.MsgText.At.IsAtAll,
			}
		}
		if b, err = json.Marshal(msg); err != nil {
			return err
		}
	}
	var resp *http.Response
	if t.Type == "1" || t.Secret == "" {
		resp, err = http.Post(t.getURLV2(), "application/json", bytes.NewBuffer(b))
	} else {
		resp, err = http.Post(t.getURL(), "application/json", bytes.NewBuffer(b))
	}
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	data, err := ioutil.ReadAll(resp.Body)
	if err != nil { // the original overwrote this error without checking it
		return err
	}
	r := ResponseSendMessage{}
	if err = json.Unmarshal(data, &r); err != nil {
		return err
	}
	if r.Errcode != 0 {
		fmt.Println(r.Errmsg)
		return errors.New(r.Errmsg)
	}
	return nil
}
// getURLV2 builds the webhook URL by appending the robot token.
func (t *DingRobot) getURLV2() string {
	url := "https://oapi.dingtalk.com/robot/send?access_token=" + t.RobotId
	return url
}

// DingRobot uses gorm.DeletedAt from gorm.io/gorm; DingUser and Task are defined elsewhere.
type DingRobot struct {
	RobotId            string         `gorm:"primaryKey;foreignKey:RobotId" json:"robot_id"` // the robot's token
	Deleted            gorm.DeletedAt `json:"deleted"`                                        // soft-delete field
	Type               string         `json:"type"`                                           // robot type: 1 = internal enterprise robot, 2 = custom webhook robot
	TypeDetail         string         `json:"type_detail"`                                    // specific robot type
	ChatBotUserId      string         `json:"chat_bot_user_id"`                               // encrypted robot id; unused
	Secret             string         `json:"secret"`                                         // id of the user who owns the robot
	UserName           string         `json:"user_name"`                                      // name of the user who owns the robot
	DingUsers          []DingUser     `json:"ding_users" gorm:"many2many:user_robot"`         // a robot can @ many users, and a user can be @-ed by many robots
	ChatId             string         `json:"chat_id"`                                        // chatId of the group chat the robot is in
	OpenConversationID string         `json:"open_conversation_id"`                           // openConversationID of the group chat the robot is in
	Tasks              []Task         `gorm:"foreignKey:RobotId;references:RobotId"`          // a robot owns many tasks
	Name               string         `json:"name"`                                           // the robot's name
	DingToken          `json:"ding_token" gorm:"-"`
	IsShared           int `json:"is_shared"`
}

type DingToken struct {
	Token string `json:"token"`
}
The Python code used with Supervisor
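The listener script itself is not reproduced here, so the following is only a minimal sketch of what such a listener could look like, not the author's actual file. It ports the text branch of the Go SendMessage to Python's standard library and wraps it in the READY/RESULT loop. Assumptions: ACCESS_TOKEN is a placeholder, only the unsigned webhook URL (the getURLV2 case) is covered, and the function names are hypothetical.

```python
import json
import sys
import urllib.request

# Assumption: placeholder token; use the token of your own DingTalk robot.
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
WEBHOOK = "https://oapi.dingtalk.com/robot/send?access_token="


def build_text_message(content, at_mobiles=(), at_user_ids=(), is_at_all=False):
    """Build the same JSON body as the text branch of the Go SendMessage."""
    msg = {"msgtype": "text", "text": {"content": content}}
    if is_at_all:
        msg["at"] = {"isAtAll": True}
    else:
        msg["at"] = {
            "atMobiles": list(at_mobiles),
            "atUserIds": list(at_user_ids),
            "isAtAll": False,
        }
    return msg


def send_message(msg, token=ACCESS_TOKEN):
    """POST the message to the webhook; raise if DingTalk reports an error."""
    request = urllib.request.Request(
        WEBHOOK + token,
        data=json.dumps(msg).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    if reply.get("errcode") != 0:
        raise RuntimeError(reply.get("errmsg"))


def parse_tokens(line):
    """Turn 'key:value key:value' into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def run(stdin, stdout):
    """Event loop: READY, read one event, alert on process events, RESULT."""
    while True:
        stdout.write("READY\n")
        stdout.flush()
        line = stdin.readline()
        if not line:  # supervisord closed our stdin
            return
        headers = parse_tokens(line)
        payload = stdin.read(int(headers["len"]))
        if headers["eventname"].startswith("PROCESS_STATE_"):
            info = parse_tokens(payload)
            text = "%s: process %s" % (headers["eventname"], info.get("processname"))
            try:
                send_message(build_text_message(text))
            except Exception as exc:
                # stderr ends up in the listener's stderr_logfile
                sys.stderr.write("alert failed: %s\n" % exc)
                sys.stderr.flush()
        stdout.write("RESULT 2\nOK")  # acknowledge every event, TICK_60 included
        stdout.flush()
```

To use this as the listener script, add a final line `run(sys.stdin, sys.stdout)`. Nothing except protocol tokens should be written to stdout, so diagnostics go to stderr.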
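The configuration is shown below. The important parts are the program and eventlistener sections; the complete configuration file is attached at the end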
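;Include other configuration files. Alternatively, split the configuration into separate files and include them here
[include]
files = /etc/supervisord.conf
[program:test]
;Command that starts the program; this one is simple
command=/usr/local/goproject/ding_server_v3/test
;Start the program when supervisord starts; we set this to true
autostart=true
;Restart the program automatically after it exits; set to true
autorestart=true
;When stopping the process, send the stop signal to its whole process group, including child processes; set to true
stopasgroup=true
;Send the kill signal to the whole process group, including child processes; set to true
killasgroup=true
;The following lines set the log files, their maximum size, and the number of backups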
stdout_logfile=/var/log/simpleHttp.std.log
stdout_logfile_maxbytes = 50MB
stdout_logfile_backups = 10
stderr_logfile=/var/log/simpleHttp.err.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=10
[eventlistener:testgolang]
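command=/usr/bin/python3 /opt/my_custom_listener_testgolang.py ; the custom listener program; name /usr/bin/python3 explicitly, otherwise python2 may be picked up and the script will not run
events=PROCESS_STATE_EXITED,PROCESS_STATE_FATAL,TICK_60 ; subscribed events: process exited, process failed to start, and a tick every sixty seconds
; the settings below work the same way as in `[program:x]`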
autostart=true
autorestart=true
log_stdout=true
log_stderr=true
stdout_logfile=/opt/supervisor_event_exited-stdout.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=3
buffer_size=100
stderr_logfile=/opt/supervisor_event_exited-stderr.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=3
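Start Supervisor from its installation directory:
./supervisord -c /etc/supervisord.conf
Restart everything (the username and password are set in the configuration file):
./supervisorctl -c /etc/supervisord.conf -u user -p 123 restart all
Kill the process started by the program section
A message is then posted to the DingTalk group
Check the logs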
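Pitfalls
kill -9, kill, a normal exit, and an abnormal exit are not tied to different events, so the listener can simply subscribe to process exit. Whether the exit is normal or abnormal, it is detected and triggers the alert.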
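If an alert should still say which kind of exit happened, the PROCESS_STATE_EXITED payload carries an expected field: 1 when the exit code is one of the program's configured exitcodes, 0 otherwise. A minimal sketch, with a hypothetical helper name and a made-up sample payload:

```python
def parse_tokens(line):
    """Turn 'key:value key:value' into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def describe_exit(payload):
    """Summarize a PROCESS_STATE_EXITED payload for an alert message."""
    info = parse_tokens(payload)
    kind = "expected" if info.get("expected") == "1" else "unexpected"
    return "%s exited (%s, pid %s)" % (info["processname"], kind, info["pid"])


sample = "processname:test groupname:test from_state:RUNNING expected:0 pid:2766"
print(describe_exit(sample))
# prints: test exited (unexpected, pid 2766)
```

The listener can then alert on every exit and include this summary in the message text.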
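How do you check the logs?
There are several logs here: the output log of the Go program, the log of the event listener, and the log of Supervisor itself. The first two are written to the paths we set ourselves; the Supervisor log is at
There is not much to say about the Go program's log: it shows the bugs in your own project
For Supervisor, check the event listener log to see whether the listener is working properly, but also check the Supervisor log, because if Supervisor itself is not healthy, the event listener most likely is not either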