Configuring WeChat Work (企业微信) Alerts with Prometheus

Author: 运维之美

Last modified: 2025-01-27 14:53:55

Categories: prometheus, K8S. Tags: operations, WeChat

First published: 2021-09-03 21:46:22

Article link: https://blog.csdn.net/m0_37680131/article/details/120090880

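This article describes how to combine Prometheus with WeChat Work (企业微信) to deliver monitoring alert notifications. It walks through configuring Prometheus alert rules and Alertmanager, covering the whole process from installing Alertmanager to setting up WeChat Work to receive alerts, including an explanation of alert states and a test to verify the setup.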

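Prometheus is often described as a next-generation monitoring system and is well suited to monitoring Kubernetes clusters in the cloud. Deployed together with Alertmanager, it can also send alert notifications. This article sets up alert delivery through WeChat Work (企业微信), so the on-call engineer can relax a little.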

Environment: the Prometheus server and Alertmanager run on the same machine. This walkthrough assumes the Prometheus server is already installed.

Operating system: CentOS 7.4

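Alerting in Prometheus is split into two parts. Alert rules are defined on the Prometheus server; when a rule fires, the Prometheus server sends the alert to Alertmanager. Alertmanager then manages those alerts, including silencing, inhibition, and grouping, and sends notifications through email, WeChat Work, DingTalk, and other channels.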
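To make the hand-off concrete: Alertmanager receives alerts over HTTP, and Prometheus is just one possible client. The sketch below posts a hand-written alert to Alertmanager's v2 API, which can be useful later for testing notification routing without waiting for a real rule to fire. It assumes Alertmanager is already running (installation is covered in section 01) and listening on its default port 9093. The function names, the `ManualTest` alert name, and the label values are hypothetical and not part of the original setup.

```shell
#!/usr/bin/env bash
# Sketch only: names and label values below are illustrative assumptions.

# build_test_alert <alertname> <instance>
# Prints a one-element JSON array in the shape accepted by POST /api/v2/alerts.
# The arguments are pasted into JSON as-is, so they must not contain quotes.
build_test_alert() {
  local name="$1" instance="$2"
  printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"warning"},"annotations":{"summary":"manual test alert"}}]\n' \
    "$name" "$instance"
}

# send_test_alert [alertmanager_url]
# Sends the test alert; the URL defaults to Alertmanager's default port.
send_test_alert() {
  local base="${1:-http://localhost:9093}"
  build_test_alert ManualTest demo-host |
    curl -fsS --max-time 5 -X POST \
      -H 'Content-Type: application/json' \
      --data-binary @- "${base}/api/v2/alerts"
}
```

Calling `send_test_alert http://192.168.61.123:9093` should make the alert appear in the Alertmanager web UI, where it stays until it expires or is silenced.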

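The main steps for setting up alerts and notifications are:

Install and start Alertmanager;

Configure Prometheus to talk to Alertmanager, and configure the alert rules;

Set up the WeChat Work admin console, then configure Alertmanager to integrate with WeChat Work and define the alert template;

Adjust a threshold to trigger an alert.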

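01 Install Alertmanager

This example uses the latest release at the time of writing (0.22.2 in the commands below). The Alertmanager package can be downloaded from the official site at https://prometheus.io/download/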

[root@prometheus ~]# mkdir -p /usr/local/alertmanager
[root@prometheus ~]# tar -xvf alertmanager-0.22.2.linux-amd64.tar.gz -C /usr/local/alertmanager --strip-components=1
[root@prometheus ~]# cd /usr/local/alertmanager/
[root@prometheus alertmanager]# nohup ./alertmanager &

Upload the package to the server, then install and start the Alertmanager service with the steps above.
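Before moving on, it is worth confirming that the process actually came up. The following is a minimal sketch that probes Alertmanager's built-in `/-/healthy` and `/-/ready` endpoints with curl. The function name `check_alertmanager` and the `ALERTMANAGER_URL` variable are assumptions introduced here; 9093 is Alertmanager's default listen port.

```shell
#!/usr/bin/env bash
# Sketch only: ALERTMANAGER_URL and check_alertmanager are illustrative names.
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9093}"

# check_alertmanager [base_url]
# Returns 0 only if both /-/healthy and /-/ready answer successfully.
check_alertmanager() {
  local base="${1:-$ALERTMANAGER_URL}"
  local path
  for path in /-/healthy /-/ready; do
    # -f: treat HTTP error codes as failures; -sS: quiet, but still show errors
    if ! curl -fsS --max-time 5 -o /dev/null "${base}${path}"; then
      echo "FAIL ${base}${path}" >&2
      return 1
    fi
    echo "OK   ${base}${path}"
  done
}
```

Run `check_alertmanager` on the Alertmanager host, or pass a base URL such as `http://192.168.61.123:9093` to check it from another machine.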

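02 Configure Prometheus

Add configuration so that Prometheus sends its alerts to the Alertmanager server and also scrapes it as a monitoring target.

Add the following to prometheus.yml: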

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.61.123:9093
rule_files:
  - "rules/*_rules.yml"
  - "rules/*_alerts.yml"

scrape_configs:
  - job_name: 'alertmanager'
    static_configs:
    - targets: ['localhost:9093']

rule_files lists the rule files that define when alerts fire.
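Prometheus only picks up changes to prometheus.yml after a restart or a reload. The sketch below reloads a running Prometheus by sending it SIGHUP, then asks its HTTP API which Alertmanager endpoints it is currently using. It assumes Prometheus listens on its default port 9090 and that the process is named `prometheus`; the function names and the `PROMETHEUS_URL` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: PROMETHEUS_URL and the function names are illustrative.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

# reload_prometheus
# Sends SIGHUP so the running server re-reads prometheus.yml and its rule files.
reload_prometheus() {
  local pids
  pids="$(pidof prometheus)" || { echo "prometheus is not running" >&2; return 1; }
  # pidof may print several PIDs; word splitting is intentional here
  kill -HUP $pids
}

# show_alertmanagers [prometheus_url]
# Prints the JSON list of active and dropped Alertmanager endpoints.
show_alertmanagers() {
  local base="${1:-$PROMETHEUS_URL}"
  curl -fsS --max-time 5 "${base}/api/v1/alertmanagers"
  echo
}
```

After `reload_prometheus`, the `activeAlertmanagers` list printed by `show_alertmanagers` should contain the 192.168.61.123:9093 target configured above.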

Create a rules directory under the Prometheus working directory and add the following configuration files to it:

[root@prometheus prometheus]# cd rules/
[root@prometheus rules]# ls
node_alerts.yml  pod_rules.yml
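Once these files are filled in (their contents follow), a syntax check before reloading Prometheus can catch indentation and templating mistakes early. The sketch below wraps `promtool`, which ships in the Prometheus release tarball next to the `prometheus` binary. It assumes `promtool` is on the PATH and that the directory passed in contains prometheus.yml and the rules directory; the function name `validate_prometheus` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: validate_prometheus is an illustrative helper name.

# validate_prometheus [prometheus_dir]
# Checks prometheus.yml, then each rule file matching the rule_files globs.
validate_prometheus() {
  local dir="${1:-.}"
  local f
  # "check config" also validates the rule files that prometheus.yml references
  promtool check config "${dir}/prometheus.yml" || return 1
  # Checking files one by one gives clearer per-file output
  for f in "${dir}"/rules/*_rules.yml "${dir}"/rules/*_alerts.yml; do
    [ -e "$f" ] || continue   # skip a glob pattern that matched nothing
    promtool check rules "$f" || return 1
  done
}
```

For example, `validate_prometheus /usr/local/prometheus` returns non-zero as soon as any file fails, which makes it easy to chain with a reload command using `&&`. The path here is only an example.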

Node alert configuration

node_alerts.yml: host-level alerts

[root@prometheus rules]# cat node_alerts.yml
groups:
- name: 主机状态-监控告警        # "host status monitoring alerts"
  rules:
  # Host down: up == 0 has held for 15s for a node target.
  - alert: 主机状态
    expr: up{job="kubernetes-nodes"} == 0
    for: 15s
    labels:
      status: 非常严重          # "critical"
    annotations:
      # "server is down" / "server unreachable for more than 15s"
      summary: "{{ $labels.instance }}:服务器宕机"
      description: "{{ $labels.instance }}:服务器延时超过15s"

  # CPU usage: average non-idle CPU per instance above 60% for 1m.
  - alert: CPU使用情况
    expr: 100-(avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by(instance)* 100) > 60
    for: 1m
    labels:
      status: warning
    annotations:
      summary: "{{ $labels.instance }}: High CPU Usage Detected"
      description: "{{ $labels.instance }}: CPU usage is {{ $value }}, above 60%"

  # Filesystem usage: ext4/xfs partition more than 80% full for 1m.
  - alert: NodeFilesystemUsage
    expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      # "partition usage too high" / "partition usage above 80% (current value: ...)"
      summary: "Instance {{ $labels.instance }} : {{ $labels.mountpoint }} 分区使用率过高"
      description: "{{ $labels.instance }}: {{ $labels.mountpoint }} 分区使用大于80% (当前值: {{ $value }})"

  - alert: 内存使用
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100