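Email alerting with Prometheus + Alertmanager
Installing Prometheus was covered in detail in the previous article.
Alertmanager can be downloaded from the official website.
1. Prepare the installation package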
[root@server161 ~]# ls
alertmanager-0.26.0.linux-amd64.tar.gz anaconda-ks.cfg original-ks.cfg
2. Extract and install
tar -xf alertmanager-0.26.0.linux-amd64.tar.gz -C /opt/
mv /opt/alertmanager-0.26.0.linux-amd64/ /opt/alertmanager
chown -R prometheus:prometheus /opt/alertmanager/
3. Write the Alertmanager systemd unit file
[root@server161 ~]# vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager Server
After=network.target
[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/opt/alertmanager/alertmanager \
--config.file=/opt/alertmanager/alertmanager.yml \
--storage.path=/opt/alertmanager/data
Restart=always
[Install]
WantedBy=multi-user.target
Start Alertmanager
systemctl daemon-reload
systemctl start alertmanager
systemctl status alertmanager
Open the Alertmanager web UI (port 9093) in a browser to confirm it is running.
Modify the Prometheus configuration file
[root@server160 ~]# vim /opt/prometheus-2.45/prometheus.yml
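The edited prometheus.yml itself is not shown here. As a minimal sketch, the part relevant to alerting would look roughly like the fragment below. It assumes Alertmanager listens on 192.168.121.161:9093 (the address used for the reload request later in this article) and that the rule file is the /opt/prometheus-2.45/alert.yml shown in the next step; the rest of the file, such as scrape_configs, stays unchanged.

```yaml
# Fragment of /opt/prometheus-2.45/prometheus.yml
# (assumed layout, not copied from the original configuration)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "192.168.121.161:9093"   # Alertmanager host:port

# Rule files that Prometheus evaluates to produce alerts
rule_files:
  - "/opt/prometheus-2.45/alert.yml"
```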
The alert rule file is shown below; adjust the rules to the items you monitor.
[root@server160 ~]# cat /opt/prometheus-2.45/alert.yml
groups:
- name: prometheus alert
  rules:
  # Alert on any target that has been unreachable for more than 30s
  - alert: "Service warning"
    expr: up == 0
    for: 30s
    labels:
      severity: critical
    annotations:
      instance: "{{ $labels.instance }}"
      description: "{{ $labels.job }} service is down."
  - alert: "CPU usage alert"
    expr: 1 - avg(irate(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) > 0.75
    for: 30s
    labels:
      level: warning
    annotations:
      summary: "{{ $labels.instance }} CPU load alert"
      description: "{{ $labels.instance }} CPU usage is above 75% (current value: {{ $value }})"
  - alert: "Memory usage alert"
    expr: (1 - node_memory_MemAvailable_bytes{job="node-exporter"} / node_memory_MemTotal_bytes{job="node-exporter"}) * 100 > 80
    for: 30s
    labels:
      level: critical
    annotations:
      summary: "{{ $labels.instance }} low available memory alert"
      description: "{{ $labels.instance }} memory usage has reached 80% (current value: {{ $value }})"
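Then reload the Prometheus configuration with a POST request (this endpoint only works if Prometheus was started with --web.enable-lifecycle):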
curl -X POST http://192.168.121.160:9090/-/reload
Open Prometheus in a browser; the alert rules should now be listed.
Next, configure the notification channel in Alertmanager
[root@server161 ~]# cat /opt/alertmanager/alertmanager.yml
global:
  # If an alert carries no end time and is not updated for 3m, Alertmanager declares it resolved
  resolve_timeout: 3m
  # SMTP mail server settings
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: '********@163.com'
  smtp_auth_username: '**********'
  smtp_auth_password: '**************'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'
# Receivers
receivers:
- name: 'email'
  email_configs:
  # Destination mailbox for alert emails
  - to: '********@qq.com'
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
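With the configuration above, an email is sent when an alert fires, but email receivers do not send a notification on recovery unless send_resolved is enabled (it defaults to false for email_configs). If recovery emails are wanted, the receivers section could be adjusted roughly as in this sketch; it is an optional variation, not part of the original setup, and the address is the same masked placeholder.

```yaml
# Optional variation of the receivers section in /opt/alertmanager/alertmanager.yml
receivers:
  - name: 'email'
    email_configs:
      - to: '********@qq.com'     # destination mailbox (masked, as above)
        send_resolved: true       # also send an email when the alert resolves
```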
Check the configuration for errors
[root@server161 ~]# /opt/alertmanager/amtool check-config /opt/alertmanager/alertmanager.yml
Checking '/opt/alertmanager/alertmanager.yml' SUCCESS
Found:
- global config
- route
- 1 inhibit rules
- 1 receivers
- 0 templates
If the check passes, reload the Alertmanager configuration with a POST request as well
curl -X POST http://192.168.121.161:9093/-/reload
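Check the updated configuration in the Alertmanager web UI
The new configuration has been loaded without problems.
Now test the alerting
I shut down host 151 and watched for the alert to arrive.
The service is down and the alert enters the pending state.
The alert starts firing.
The alert has now been triggered.
My QQ mailbox has also received the alert email.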
That completes the Prometheus + Alertmanager alerting setup.
I hope this helps.