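Right after I first hooked up the DingTalk robot alerts, the messages looked like this:
Later it occurred to me that hardly anyone reads ordinary group messages, so I wanted to add the ability to @-mention specific people and, while I was at it, tidy up the structure of the alert message by reusing a ready-made template.
I found a fairly good template online: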
{{/* Alert List Begin */}}
{{ define "example.__text_alert_list" }}{{ range . }}
**{{ .Annotations.message }}**
[Prometheus](Prometheus URL) | [Alertmanager](Alertmanager URL) | [Grafana](Grafana URL)
{{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
{{ end }}{{ end }}
{{/* Alert List End */}}
{{/* Message Title Begin */}}
{{ define "example.title" }}{{ template "__subject" . }}{{ end }}
{{/* Message Title End */}}
{{/* Message Content Begin */}}
{{ define "example.content" }}
{{ if gt (len .Alerts.Firing) 0 -}}
{{ template "example.__text_alert_list" .Alerts.Firing }}
{{- end }}
{{ if gt (len .Alerts.Resolved) 0 -}}
{{ template "example.__text_alert_list" .Alerts.Resolved }}
{{- end }}
{{- end }}
{{/* Message Content End */}}
The result looked like this:
It failed with an error saying that "example.title" and "example.content" were undefined. I then found this template instead:
{{ define "ding.link.content" }}
{{ if gt (len .Alerts.Firing) 0 -}}
Firing alerts:
-----------
{{ template "__text_alert_list" .Alerts.Firing }}
{{- end }}
{{ if gt (len .Alerts.Resolved) 0 -}}
Resolved alerts:
{{ template "__text_resolve_list" .Alerts.Resolved }}
{{- end }}
{{- end }}
The result looks like this:
1. Create the template at /config/example.tmpl, as shown in the figure below:
2. Edit config.example.yml
templates:
  - /usr/local/prometheus-webhook-dingtalk/config/example.tmpl
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=aceff59d093d2589ff07e2fff33544d48a928dc6ad2b1dbcb42b08669d33a046
    mention:
      mobiles: ['phone number...', 'phone number...']
    message:
      text: |
        @phone number... @phone number...
        {{ template "ding.link.content" . }}
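To make the mention settings above more concrete, here is a minimal sketch of the kind of JSON body a DingTalk custom robot expects for a markdown message with an @-mention, as I understand the robot API. The helper name `build_markdown_message` and the phone number are made up for illustration; prometheus-webhook-dingtalk builds its own request internally, so this only approximates what ends up on the wire.

```python
import json


def build_markdown_message(title, text, mobiles):
    """Build a DingTalk custom-robot markdown payload that @-mentions `mobiles`."""
    # The mention is only rendered if "@<mobile>" also appears in the text,
    # which is why the config above repeats the numbers in message.text.
    mention_line = " ".join(f"@{m}" for m in mobiles)
    return {
        "msgtype": "markdown",
        "markdown": {"title": title, "text": f"{mention_line}\n\n{text}"},
        "at": {"atMobiles": list(mobiles), "isAtAll": False},
    }


payload = build_markdown_message(
    title="Alert",
    text="**disk usage above 90%**",
    mobiles=["13800000000"],  # placeholder number, not a real contact
)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The point to take away is that the numbers appear twice: once in `at.atMobiles` (the `mention.mobiles` list in the config) and once as `@number` in the message text.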
3. Edit alertmanager.yml
global:
  resolve_timeout: 1m
route:
  group_by: ['severity','alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: webhook1
  routes:
  - match_re:
      severity: warning
    receiver: webhook1
receivers:
- name: 'webhook1'
  webhook_configs:
  - url: http://localhost:8060/dingtalk/webhook1/send
    send_resolved: true
route:
  receiver: 'ops_dingding' # default receiver
receivers:
- name: 'ops_dingding'
  webhook_configs:
  - url: 'http://192.19.192.65:8060/dingtalk/webhook1/send'
    send_resolved: true
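2) A more sensible setup here is to configure two receivers, webhook1 and webhook2. The top-level receiver is set to webhook1, severity is added to group_by, and a child route under routes matches severity warning. That value must match the severity label used in the Prometheus alert rule files. webhook2 then receives alerts whose severity is warning, and webhook1 receives all other alerts.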
route:
  group_by: ['severity','alertname']
  group_wait: 10s
  repeat_interval: 1h
  receiver: webhook1
  routes:
  - match_re:
      severity: warning
    receiver: webhook2
receivers:
- name: 'webhook1'
  webhook_configs:
  - &dingtalk_config
    send_resolved: false
    url: http://localhost:8060/dingtalk/webhook1/send
- name: 'webhook2'
  webhook_configs:
  - <<: *dingtalk_config
    url: http://localhost:8060/dingtalk/webhook2/send
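The routing rule described above can be sketched in a few lines of Python. This is a deliberately simplified model, not Alertmanager's actual implementation: it only handles `match_re`, ignores `continue` and receiver inheritance, and the alert names in the usage lines are hypothetical.

```python
import re

# Mirrors the routing tree above: one child route, default receiver webhook1.
ROUTE = {
    "receiver": "webhook1",
    "routes": [
        {"match_re": {"severity": "warning"}, "receiver": "webhook2"},
    ],
}


def pick_receiver(route, labels):
    """Return the receiver for an alert's labels (first matching child wins)."""
    for child in route.get("routes", []):
        matchers = child.get("match_re", {})
        # Regex matchers are anchored, so use fullmatch; a missing label is
        # treated as the empty string.
        if all(
            re.fullmatch(pattern, labels.get(name, ""))
            for name, pattern in matchers.items()
        ):
            return pick_receiver(child, labels)
    # No child matched: fall back to this node's own receiver.
    return route["receiver"]


print(pick_receiver(ROUTE, {"alertname": "HighCPU", "severity": "warning"}))
# webhook2
print(pick_receiver(ROUTE, {"alertname": "InstanceDown", "severity": "critical"}))
# webhook1
```

Alerts that carry no severity label at all also fall through to webhook1 in this model, which matches the "everything else" behaviour described above.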
3) Modify the prometheus-webhook-dingtalk service unit. The start command used to be configured with --ding.profile=ops_dingding=<robot webhook URL>; it now needs to be changed to --config.file=config.example.yml.
[Unit]
Description=https://github.com/timonwong/prometheus-webhook-dingtalk/releases/
After=network-online.target
[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --config.file=/usr/local/prometheus-webhook-dingtalk/config.example.yml
[Install]
WantedBy=multi-user.target
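After restarting the service, one way to check the whole chain without waiting for a real alert is to POST a hand-built, Alertmanager-style webhook body to it. The sketch below assumes the service listens on localhost:8060 with a target named webhook1, as in the config above; the helper names and the `TestAlert` alert are invented for illustration, and I have not verified this exact body against every version of the service.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(alertname="TestAlert", severity="warning"):
    """Build a minimal Alertmanager-style webhook body with one firing alert."""
    labels = {"alertname": alertname, "severity": severity}
    annotations = {"message": "manual test alert"}
    alert = {
        "status": "firing",
        "labels": labels,
        "annotations": annotations,
        "startsAt": datetime.now(timezone.utc).isoformat(),
        "endsAt": "0001-01-01T00:00:00Z",
        "generatorURL": "",
    }
    return {
        "version": "4",
        "status": "firing",
        "receiver": "webhook1",
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "",
        "alerts": [alert],
    }


def send_test_alert(url="http://localhost:8060/dingtalk/webhook1/send"):
    """POST the test body to the webhook and return the HTTP status code."""
    body = json.dumps(build_test_alert()).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling `send_test_alert()` on the host that runs the service should produce a message in the DingTalk group if the template, target, and robot token are wired up correctly.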
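4) If you start the service with --ding.profile=<name>=<robot webhook URL> instead, the profile name in the url of the receiver's webhook_configs (the path segment in /dingtalk/<name>/send) must be the same as the name given to --ding.profile. In the example below the profile is named webhook1, so the url path is /dingtalk/webhook1/send.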
receivers:
- name: 'ops_dingding'
  webhook_configs:
  - url: 'http://192.19.192.65:8060/dingtalk/webhook1/send'
    send_resolved: true
./prometheus-webhook-dingtalk --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=xxx"
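The relationship between the profile name and the Alertmanager url can be written down as a small check. This is only an illustrative sketch: `parse_profile` and `alertmanager_url` are hypothetical helpers, and the port 8060 is taken from the examples above.

```python
def parse_profile(flag_value):
    """Split a --ding.profile value of the form 'name=robot_url'."""
    # Split on the first '=' only, because the robot URL itself contains '='.
    name, _, robot_url = flag_value.partition("=")
    if not name or not robot_url:
        raise ValueError(f"expected 'name=url', got {flag_value!r}")
    return name, robot_url


def alertmanager_url(profile, host="localhost", port=8060):
    """URL that Alertmanager's webhook_configs should point at for `profile`."""
    return f"http://{host}:{port}/dingtalk/{profile}/send"


name, robot_url = parse_profile(
    "webhook1=https://oapi.dingtalk.com/robot/send?access_token=xxx"
)
print(alertmanager_url(name, host="192.19.192.65"))
# http://192.19.192.65:8060/dingtalk/webhook1/send
```

If the printed URL differs from the one in alertmanager.yml, the service will not find a matching profile for the incoming request.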