Prometheus + Alertmanager alerting example: TCP port monitoring with WeCom (WeChat Work) notifications
Install blackbox_exporter
```shell
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.15.1/blackbox_exporter-0.15.1.linux-amd64.tar.gz
tar xf blackbox_exporter-0.15.1.linux-amd64.tar.gz
mkdir -p /data/prometheus
mv blackbox_exporter-0.15.1.linux-amd64 /data/prometheus/blackbox_exporter
cd /data/prometheus/blackbox_exporter/
# Run in the background; blackbox_exporter listens on port 9115 by default
nohup /data/prometheus/blackbox_exporter/blackbox_exporter --config.file=/data/prometheus/blackbox_exporter/blackbox.yml &
```
Configure port monitoring
Add a probe job to prometheus.yml:
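```yaml
scrape_configs:
  # TCP port monitoring
  - job_name: 'TCP_PORT'
    scrape_interval: 15s        # probe interval
    metrics_path: '/probe'      # blackbox_exporter probe endpoint
    params:
      module: [tcp_connect]     # use the TCP connect probe module
    static_configs:
      - targets: ['192.168.1.104:40000', '192.168.1.105:50000']  # host:port pairs to check
        labels:
          groups: "端口监控"      # custom label ("port monitoring"), used by the alert rule
    relabel_configs:
      # pass the original target as the ?target= URL parameter
      - source_labels: [__address__]
        target_label: __param_target
      # keep the probed host:port as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # send the actual scrape request to blackbox_exporter
      - target_label: __address__
        replacement: 192.168.1.103:9115  # blackbox_exporter address
```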
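The three `relabel_configs` rules turn each entry in `targets` into a probe request against the exporter: the original address becomes the `target` URL parameter and the `instance` label, and the scrape itself is sent to `192.168.1.103:9115`. The Go sketch below replays those three rules by hand for one target and builds roughly the URL Prometheus ends up scraping. The functions `relabel` and `probeURL` are hypothetical and cover only these three rules, not Prometheus's general relabeling engine.

```go
package main

import (
	"fmt"
	"net/url"
)

// relabel applies, by hand, the three relabel_configs rules of the
// TCP_PORT job to a single target address.
func relabel(address, exporter string) map[string]string {
	labels := map[string]string{"__address__": address}
	// Rule 1: copy __address__ into __param_target (becomes ?target=...).
	labels["__param_target"] = labels["__address__"]
	// Rule 2: copy __param_target into instance so alerts name the probed port.
	labels["instance"] = labels["__param_target"]
	// Rule 3: overwrite __address__ with the blackbox_exporter address.
	labels["__address__"] = exporter
	return labels
}

// probeURL combines metrics_path, params.module and __param_target
// into the URL that is scraped on the exporter.
func probeURL(labels map[string]string, module string) string {
	q := url.Values{}
	q.Set("module", module)
	q.Set("target", labels["__param_target"])
	u := url.URL{Scheme: "http", Host: labels["__address__"], Path: "/probe", RawQuery: q.Encode()}
	return u.String()
}

func main() {
	l := relabel("192.168.1.104:40000", "192.168.1.103:9115")
	fmt.Println("instance:", l["instance"])
	fmt.Println("scrape:", probeURL(l, "tcp_connect"))
}
```

Without rule 2 the `instance` label would be the exporter's address for every target, so alerts could not tell which port failed.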
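Alert notification template
Defining `wechat.default.message` overrides the default message sent by Alertmanager's WeChat receiver; the template file must be listed under `templates` in alertmanager.yml.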
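```
{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts.Firing -}}
{{- if eq $index 0 }}
========= Alert firing =========
Start time: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
Alert name: {{ $alert.Labels.alertname }}
Severity: {{ $alert.Labels.severity }}
Details: {{ $alert.Annotations.description }}
========= = end = =========
{{- end }}
{{- end }}
{{- end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts.Resolved -}}
{{- if eq $index 0 }}
========= Alert resolved =========
Alert name: {{ $alert.Labels.alertname }}
Severity: {{ $alert.Labels.severity }}
Start time: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
Resolved at: {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
========= = end = =========
{{- end }}
{{- end }}
{{- end }}
{{- end }}
```
Go templates have no `#` comment syntax, so inline `#` comments would be sent as part of the message; they are left out here. `alertname` comes from the rule's `alert` field, `severity` from its `labels`, and `description` from its `annotations`. `28800e9` nanoseconds is 8 hours, shifting the UTC timestamps to China Standard Time (UTC+8), and the layout string must use Go's reference date `2006-01-02 15:04:05`. Each section ranges over `.Alerts.Firing` or `.Alerts.Resolved` respectively and prints only the first alert of that kind.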
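Alerting rule
Save the rule in a rules file that prometheus.yml loads through `rule_files`, and make sure prometheus.yml points at Alertmanager under `alerting.alertmanagers`.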
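```yaml
groups:
  - name: tcp_port                    # rule group name
    rules:
      - alert: API service is health  # alert name, shown as the alertname label
        # PromQL expression (adjust to your environment; see the PromQL docs):
        # a failed probe for targets carrying the groups label from prometheus.yml
        expr: probe_success{groups="端口监控"} == 0
        for: 1m                       # the expression must stay true for 1 minute before the alert fires
        labels:
          severity: warning           # severity level
        annotations:
          # Message text; variables such as $labels.instance are expanded,
          # and the notification template reads these annotations.
          description: 'API service: {{$labels.instance}} port check failed, service unavailable, please investigate'
          summary: 'TCP port probe failed on {{$labels.instance}}'
```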
Restart Alertmanager and Prometheus
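```shell
cd /data/prometheus/alertmanager
# Validate alertmanager.yml before restarting
./amtool check-config alertmanager.yml
# Stop the running Alertmanager and start it again in the background
pkill alertmanager
nohup ./alertmanager --config.file=alertmanager.yml --log.level=debug &
# Prometheus runs as a systemd service here; restart it to load the new job and rule
systemctl restart prometheus.service
systemctl status prometheus.service
```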
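Then open the Prometheus web UI at http://localhost:9090: the TCP_PORT targets should be listed as UP on the Status > Targets page, and the rule should appear on the Alerts page.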
Finally, trigger a test alert and confirm that the message arrives in WeCom (the test itself is omitted here).