Prometheus + Alertmanager + PrometheusAlert
Goal: SMS alerts plus resolved (recovery) notifications, with customizable message content.
I. Alibaba Cloud SMS signature and template setup
Four pieces of information are needed:
AccessKey ID : ALY_DX_AccessKeyId=xxxxxxxxxxx
AccessKey Secret : ALY_DX_AccessSecret=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Signature : ALY_DX_SignName=XXXXX
Template ID : ALY_DX_Template=SMS_2XXXXXXXX
II. Prometheus + Alertmanager prerequisites
See the earlier posts:
Prometheus email alerting
https://blog.csdn.net/oToyix/article/details/120160633
Prometheus process monitoring with process-exporter
https://blog.csdn.net/oToyix/article/details/120176825
III. SMS alerting
1. Alertmanager configuration that calls PrometheusAlert
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname','severity','namespace']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10s
  receiver: 'prometheusalert-phone'
  routes:
  - receiver: 'prometheusalert-phone'
    group_wait: 10s
    match:
      level: '1'
    group_interval: 15s
    repeat_interval: 1m
receivers:
- name: 'prometheusalert-phone'
  webhook_configs:
  - url: 'http://192.168.0.59:8080/prometheusalert?type=alydx&tpl=ali-phone&phone=18627967213'
  # - url: 'http://192.168.0.59:8080/prometheus/alert'
    send_resolved: true
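The webhook URL carries three query parameters: the channel (type=alydx), the template name (tpl=ali-phone) and the recipient (phone). The following is a minimal Python sketch of how such a URL could be assembled and inspected using only the standard library. The helper name build_webhook_url is hypothetical and is not part of PrometheusAlert.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_webhook_url(host, port, channel, template, phone):
    """Assemble a PrometheusAlert webhook URL (hypothetical helper)."""
    query = urlencode({"type": channel, "tpl": template, "phone": phone})
    return f"http://{host}:{port}/prometheusalert?{query}"


# Values taken from the Alertmanager configuration in this post
url = build_webhook_url("192.168.0.59", 8080, "alydx", "ali-phone", "18627967213")
print(url)

# Parse the URL back to confirm which template and number it targets
params = parse_qs(urlsplit(url).query)
print(params["tpl"][0], params["phone"][0])
```

Generating the URL this way makes it easier to keep one receiver per phone number or template without hand-editing query strings.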
2. PrometheusAlert configuration
Download and install PrometheusAlert
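# Download and unpack the v4.4.0 Linux release
wget -c https://github.com/feiyu563/PrometheusAlert/releases/download/v4.4.0/linux.zip
unzip linux.zip
mv linux PrometheusAlert
# Back up and edit the configuration file
cd PrometheusAlert/conf/
cp app.conf{,.bak}
vim app.conf
# Return to the install directory and start the service in the background
cd ..
chmod a+x PrometheusAlert
nohup ./PrometheusAlert &
# Open port 8080 in firewalld
firewall-cmd --add-port=8080/tcp --permanent
firewall-cmd --reload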
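After the service is started and the firewall port is opened, it can help to confirm that port 8080 actually accepts connections before wiring up Alertmanager. The following Python sketch does a plain TCP connect; the function name port_open is made up for this example, and the host and port are the ones used in this post.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


if __name__ == "__main__":
    # Assumption: PrometheusAlert listens on 192.168.0.59:8080 as configured here
    print("PrometheusAlert reachable:", port_open("192.168.0.59", 8080))
```

A successful connect only shows that something is listening on the port; it does not verify that the process is PrometheusAlert.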
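PrometheusAlert configuration file
app.conf
#---------------------↓Global settings-----------------------
appname = PrometheusAlert
# Login user name
login_user=prometheusalert
# Login password
login_password=prometheusalert
# Listen address
httpaddr = "0.0.0.0"
# Listen port
httpport = 8080
runmode = dev
# Proxy setting, e.g. proxy = http://123.123.123.123:8080
proxy =
# Enable JSON request bodies
copyrequestbody = true
# Alert message title
title=PrometheusAlert
# URL that links to the alerting platform
GraylogAlerturl=http://graylog.org
# DingTalk alerts: logo URL for firing alerts
logourl=https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png
# DingTalk alerts: logo URL for recovery messages
rlogourl=https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png
# SMS alert level: an SMS is sent when the alert level equals this value (the stock comment says 3; it is set to 1 here). Levels: 0 info, 1 warning, 2 moderately severe, 3 severe, 4 disaster
messagelevel=1
# Phone-call alert level: a voice call is made when the alert level equals this value (4 here). Levels: 0 info, 1 warning, 2 moderately severe, 3 severe, 4 disaster
phonecalllevel=4
# Default phone number (required for testing the SMS and phone-call features from the web page)
defaultphone=18627967217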
#---------------------↓Alibaba Cloud interface-----------------------
# Enable the Alibaba Cloud SMS alert channel; several channels can be enabled at once. 0 = off, 1 = on
open-alydx=1
# AccessKey ID of the Alibaba Cloud SMS main account
ALY_DX_AccessKeyId=xxxxxxxxxxx
# Alibaba Cloud SMS AccessKey secret
ALY_DX_AccessSecret=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Alibaba Cloud SMS signature name
ALY_DX_SignName=otoyix
# Alibaba Cloud SMS template ID
ALY_DX_Template=SMS_224350085
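A missing or empty Alibaba Cloud key is an easy mistake to make when editing app.conf. The following Python sketch checks that the five settings above are present and non-empty. It uses a deliberately simplified key=value parser written for this example, not the loader PrometheusAlert itself uses, and the names parse_conf and missing_keys are hypothetical.

```python
# Settings from the Alibaba Cloud section of app.conf in this post
REQUIRED_KEYS = [
    "open-alydx",
    "ALY_DX_AccessKeyId",
    "ALY_DX_AccessSecret",
    "ALY_DX_SignName",
    "ALY_DX_Template",
]


def parse_conf(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_keys(settings):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


# Example input with the secret left empty on purpose
sample = """
open-alydx=1
ALY_DX_AccessKeyId=xxxxxxxxxxx
ALY_DX_AccessSecret=
ALY_DX_SignName=otoyix
ALY_DX_Template=SMS_224350085
"""
print(missing_keys(parse_conf(sample)))  # ['ALY_DX_AccessSecret']
```

To check a real file, read conf/app.conf into a string and pass it to parse_conf.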
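Add the Alibaba Cloud template in PrometheusAlert
Open the PrometheusAlert web UI and add a custom template (the webhook URL above refers to it as tpl=ali-phone):
http://192.168.0.59:8080/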
{{ range $k,$v:=.alerts }}{{if eq $v.status "resolved"}}
Cloud server - recovery notice
Name: {{$v.labels.alertname}}
Group: {{$v.labels.app}}
Instance: {{$v.labels.instance}}
Service: {{$v.labels.groupname}}
Time: {{$v.endsAt}} (UTC; add 8 hours for local time)
Note: {{$v.annotations.description}}
{{else}}
Cloud server - failure alert
Name: {{$v.labels.alertname}}
Group: {{$v.labels.app}}
Instance: {{$v.labels.instance}}
Service: {{$v.labels.groupname}}
Time: {{$v.endsAt}} (UTC; add 8 hours for local time)
Note: {{$v.annotations.description}}
{{end}}
{{ end }}
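The template loops over the alerts in the payload and picks the recovery or failure wording based on each alert's status. The following Python sketch mirrors that branching so the resulting message can be previewed offline. It is an illustration only: PrometheusAlert renders the Go template itself, and the function names and English wording are choices made for this example.

```python
def render_alert(alert):
    """Mirror the template: choose a heading by status, then list the fields."""
    resolved = alert.get("status") == "resolved"
    heading = "Cloud server - recovery notice" if resolved else "Cloud server - failure alert"
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    return "\n".join([
        heading,
        f"Name: {labels.get('alertname', '')}",
        f"Group: {labels.get('app', '')}",
        f"Instance: {labels.get('instance', '')}",
        f"Service: {labels.get('groupname', '')}",
        f"Time: {alert.get('endsAt', '')}",
        f"Note: {annotations.get('description', '')}",
    ])


def render_payload(payload):
    """Render every alert in the payload, like the template's range loop."""
    return "\n".join(render_alert(alert) for alert in payload.get("alerts", []))


# Trimmed copy of the sample JSON payload shown in this post
payload = {
    "status": "resolved",
    "alerts": [{
        "status": "resolved",
        "labels": {"alertname": "InproessDown", "app": "node-process",
                   "groupname": "map[:mysqld]", "instance": "192.168.0.63:9256"},
        "annotations": {"description": "process has been down for more than 1 m ."},
        "endsAt": "2021-09-15T08:10:18.853Z",
    }],
}
print(render_payload(payload))
```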
JSON payload (a sample of what Alertmanager posts to the webhook)
{"receiver":"prometheusalert-phone","status":"resolved","alerts":[{"status":"resolved","labels":{"alertname":"InproessDown","app":"node-process","groupname":"map[:mysqld]","instance":"192.168.0.63:9256","job":"process","severity":"critical"},"annotations":{"description":"process has been down for more than 1 m .","title":"process down"},"startsAt":"2021-09-15T08:09:48.853Z","endsAt":"2021-09-15T08:10:18.853Z","generatorURL":"http://localhost.localdomain:9090/graph?g0.expr=namedprocess_namegroup_num_procs+%3C+1\u0026g0.tab=1","fingerprint":"02d5be47b4cad419"}],"groupLabels":{"severity":"critical"},"commonLabels":{"alertname":"InproessDown","app":"node-process","groupname":"map[:mysqld]","instance":"192.168.0.63:9256","job":"process","severity":"critical"},"commonAnnotations":{"description":"process has been down for more than 1 m .","title":"process down"},"externalURL":"http://localhost.localdomain:9093","version":"4","groupKey":"{}:{severity=\"critical\"}","truncatedAlerts":0}
IV. Alert test
For example, stop the MySQL service and then start it again.
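Stopping a real service is the end-to-end test. As a lighter alternative, a hand-built payload can be posted straight to the PrometheusAlert webhook, bypassing Alertmanager. The Python sketch below assumes the webhook URL from this post and an Alertmanager-style payload trimmed to the fields the template reads. The helper names are hypothetical, and the actual send is left commented out because it would trigger a real SMS.

```python
import json
import urllib.request

# Assumption: the webhook URL from the Alertmanager configuration in this post
WEBHOOK_URL = ("http://192.168.0.59:8080/prometheusalert"
               "?type=alydx&tpl=ali-phone&phone=18627967213")


def make_payload(status, alertname="TestAlert", instance="192.168.0.63:9256"):
    """Build a minimal Alertmanager-style payload with one alert (test data)."""
    alert = {
        "status": status,
        "labels": {"alertname": alertname, "app": "node-process",
                   "groupname": "mysqld", "instance": instance},
        "annotations": {"description": "test alert sent by hand"},
        "startsAt": "2021-09-15T08:09:48.853Z",
        "endsAt": "2021-09-15T08:10:18.853Z",
    }
    return {"receiver": "prometheusalert-phone", "status": status, "alerts": [alert]}


def build_request(url, payload):
    """Wrap the payload in a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def send(url, payload, timeout=5):
    """POST the payload and return the HTTP status code (needs a reachable server)."""
    with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    request = build_request(WEBHOOK_URL, make_payload("firing"))
    print(request.get_method(), request.full_url)
    # Uncomment on a host that can reach PrometheusAlert (this sends a real SMS):
    # print(send(WEBHOOK_URL, make_payload("firing")))
    # print(send(WEBHOOK_URL, make_payload("resolved")))
```

Sending a "firing" payload followed by a "resolved" one exercises both branches of the template.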
Note:
The template can also include the timestamps:
Start time: {{$v.startsAt}}
End time: {{$v.endsAt}}
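The timestamps in the payload are in UTC (note the trailing Z), which is why the template reminds the reader to add 8 hours. As a worked example of that conversion, here is a small Python sketch. It assumes the local zone is UTC+8 and handles only the fractional-second form shown in the sample payload; the function name to_local is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is UTC+8, as implied by the "+8 hours" remark
LOCAL_ZONE = timezone(timedelta(hours=8))


def to_local(timestamp):
    """Convert a UTC timestamp such as 2021-09-15T08:10:18.853Z to UTC+8 text."""
    utc = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    utc = utc.replace(tzinfo=timezone.utc)
    return utc.astimezone(LOCAL_ZONE).strftime("%Y-%m-%d %H:%M:%S")


# endsAt value from the sample payload in this post
print(to_local("2021-09-15T08:10:18.853Z"))  # 2021-09-15 16:10:18
```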
------------------------end