AlertManager: The Alerting Component



This is day 5 of my participation in the Writing Challenge. For event details, see: Writing Challenge

AlertManager

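This post picks up where the previous one, on customizing Prometheus, left off.

Preface

Once a monitoring stack is in place, an alerting mechanism is the indispensable next step. It pushes messages through whatever channels you need, such as email, SMS, DingTalk, or WeCom, so that operations staff can find and fix problems as quickly as possible.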

1. Set up AlertManager

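As usual, start by lifting the default configuration file straight out of a container. This assumes a container named alertmanager already exists; remove it afterwards, because the run command below reuses the name.

`docker cp alertmanager:/etc/alertmanager/alertmanager.yml .`

Then start AlertManager with the copied file mounted:

`docker run --name alertmanager -d -p 9093:9093 -v /Users/yujian/Documents/prometheus/alertmanager.yml:/etc/alertmanager/alertmanager.yml prom/alertmanager:latest`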

2. Configure how AlertManager sends alerts

Email: edit alertmanager.yml

```yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: xxxxxxx@163.com
  smtp_auth_username: xxxxxx@163.com
  smtp_auth_password: xxxxx
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'mail'
receivers:
  - name: 'mail'
    email_configs:
      - to: xxxxxxxx@qq.com
```

AlertManager's notification settings are now complete.

3. Create alerting rules

Alerting rules define the conditions under which an alert fires. They are evaluated by Prometheus.

```
# Edit prometheus.yml
rule_files:
  - "/etc/prometheus/rules.yml"
  # - "second_rules.yml"

# /etc/prometheus/rules.yml does not exist yet, so create one:
# vi rule.yml
groups:
  - name: node-up
    rules:
      - alert: cpumax   # alert name
        expr: easyprometheussystemcpupercent{job="easy_prometheus"} > 20   # PromQL
        for: 3s   # how long the condition must hold
        annotations:   # threshold lowered to 20% so the alert is easier to trigger
          summary: "{{ $labels.instance }} cpu使用率超过20%!"
      - alert: node-up
        expr: up{job="easy_prometheus"} == 0   # PromQL
        for: 4s
        labels:   # descriptive labels
          severity: 1
          team: node
        annotations:
          summary: "{{ $labels.instance }} 已停止运行!"
```

Recreate the Prometheus container with rule.yml mounted at /etc/prometheus/rules.yml. Once it has started, open the Alerts page to confirm that the rules were loaded.

image.png

Webhook: edit alertmanager.yml

```yaml
route:
  group_by: ['instance']
  group_wait: 10s
  group_interval: 20s
  repeat_interval: 20s
  #repeat_interval: 1h
  receiver: 'webhook'
receivers:
  - name: 'webhook'
    webhook_configs:
      - url: 'http://192.168.31.150:8089/webhook'
```

Message format (the value of the cause label was cut off in the captured output):

```json
{
  "receiver": "webhook",
  "status": "resolved",
  "alerts": [
    {
      "status": "resolved",
      "labels": {
        "action": "Cpu利用率",
        "alertname": "cpumax",
        "application": "easyprometheus",
        "cause": "Cpu利",
        "exported_application": "easyprometheus",
        "instance": "192.168.31.150:8089",
        "job": "easy_prometheus"
      },
      "annotations": {
        "summary": "192.168.31.150:8089 cpu使用率超过20%!"
      },
      "startsAt": "2021-06-19T03:21:56.117Z",
      "endsAt": "2021-06-19T03:22:11.117Z",
      "generatorURL": "http://406161e43292:9090/graph?g0.expr=easyprometheussystemcpupercent%7Bjob%3D%22easy_prometheus%22%7D+%3E+20\u0026g0.tab=1",
      "fingerprint": "1bcf523f0c524538"
    }
  ],
  "groupLabels": {
    "instance": "192.168.31.150:8089"
  },
  "commonLabels": {
    "application": "easyprometheus",
    "instance": "192.168.31.150:8089",
    "job": "easy_prometheus"
  },
  "commonAnnotations": {},
  "externalURL": "http://c731ba69bfca:9093",
  "version": "4",
  "groupKey": "{}:{instance=\"192.168.31.150:8089\"}",
  "truncatedAlerts": 0
}
```

Next, modify the Easy-Prometheus source (already pushed to GitHub) so that it listens for these webhook notifications.

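The access_token is obtained by creating a robot in the DingTalk group.

```go
// Imports are omitted, as in the original post. httpgo is a third-party
// HTTP client whose import path the post does not show.

// Ding captures only the part of the webhook payload that is forwarded.
type Ding struct {
	Alerts []struct {
		Annotations struct {
			Summary string `json:"summary"`
		} `json:"annotations"`
	} `json:"alerts"`
}

func dingding(w http.ResponseWriter, r *http.Request) {
	s, err := ioutil.ReadAll(r.Body)
	if err != nil {
		log.Println(err)
		return
	}
	fmt.Println(string(s))
	ding := &Ding{}
	if err := json.Unmarshal(s, ding); err != nil {
		log.Println(err)
		return
	}
	// Guard against an empty alert list before indexing into it.
	if len(ding.Alerts) == 0 {
		return
	}
	anno := ding.Alerts[0]
	req := &httpgo.Req{}
	x, err := req.Header("Content-Type", "application/json").
		Method(http.MethodPost).
		Url("https://oapi.dingtalk.com/robot/send?access_token=xxxxxxx").
		Params(httpgo.Query{
			"link": map[string]interface{}{
				"title": "AlertManager通知",
				"text":  "通知" + anno.Annotations.Summary,
				// The picture was picked at random from the web.
				"picUrl": "https://photo.16pic.com/00/65/09/16pic6509905_b.png",
				// Clicking the message title jumps straight to Prometheus.
				"messageUrl": "http://localhost:9090/alerts",
			},
			"msgtype": "link",
		}).Go().Body()
	if err != nil {
		log.Println(err)
	}
	fmt.Println(x)
}
```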

4. Test the alert

To test, I started several applications so that CPU utilization reached 20% and stayed there for 3 seconds.

image.png

DingTalk

image.png
