Prometheus + Grafana (Part 2): Implementing Automatic Alerting

AlertManager


1. Install AlertManager

cd /usr/software
wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz 
tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/prometheus-2.26.0.linux-amd64
./

2. Configure prometheus.yml

cd /usr/local/prometheus-2.26.0.linux-amd64
vim prometheus.yml

######## prometheus.yml configuration file ###########
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 'localhost:19911'

rule_files:
  - "rules.yml"

3. Configure alertmanager.yml

cd /usr/local/alertmanager-0.21.0.linux-amd64
vim alertmanager.yml


######## alertmanager.yml configuration file ###########
global:
  resolve_timeout: 5m

  # SMTP configuration
  smtp_from: "ezrealer@qq.com"
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: "ezrealer@qq.com"
  smtp_auth_password: "123456"
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'ezrealer_email'
receivers:
- name: 'ezrealer_email'
  email_configs:
  - to: '2695138379@qq.com'
    send_resolved: true
    headers:
      from: "Prometheus Alert Center"
      subject: "Alert email"
      to: "ezrealer2"
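To check the route and the e-mail receiver without waiting for a real rule to fire, one option is to post a hand-made alert to Alertmanager's v2 HTTP API (`/api/v2/alerts`). The Python sketch below is only an illustration. The helper names `build_test_alert` and `send_alerts`, the `TestAlert` name, and the sample instance label are assumptions, and the URL assumes Alertmanager listens on the `localhost:19911` target configured in prometheus.yml.

```python
import json
import urllib.request

# Assumed address: the Alertmanager target listed in prometheus.yml.
ALERTMANAGER_URL = "http://localhost:19911/api/v2/alerts"


def build_test_alert(alertname="TestAlert", instance="localhost:9100"):
    """Return a one-element alert list; the label values are illustrative."""
    return [{
        "labels": {"alertname": alertname, "instance": instance},
        "annotations": {"summary": "manual test alert"},
    }]


def send_alerts(alerts, url=ALERTMANAGER_URL):
    """POST the alerts as a JSON array and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Running `send_alerts(build_test_alert())` on the monitoring host should, once `group_wait` has elapsed, result in one e-mail to the configured receiver, provided the SMTP settings are valid.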

4. Configure rules.yml

cd /usr/local/prometheus-2.26.0.linux-amd64
vim rules.yml

######## rules.yml configuration file ###########
groups:
  - name: node_status
    rules:
    - alert: node_status
      expr: probe_success == 0
      for: 1m
      labels:
        status: critical
      annotations:
        summary: "group:{{$labels.group}},instance:{{$labels.instance}} has been down"
        description: "group:{{$labels.group}},instance:{{$labels.instance}} has been down"
        value: "{{$value}}"
  - name: CPU
    rules:
    - alert: CPUUsage
      expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[6m]))) by (instance) * 100 > 80
      for: 1m
      labels:
        status: warning
      annotations:
        summary: "group:{{$labels.group}},instance:{{$labels.instance}}: CPU usage is above 80%"
        value: "{{$value}}"
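Both rules use `for: 1m`, which means the expression has to stay true for a full minute before the alert moves from pending to firing. A single false evaluation resets the timer. The Python sketch below models that behaviour in a simplified way. The function name `alert_states` and the 15-second evaluation interval in the usage note are assumptions chosen for illustration, not values taken from the configuration above.

```python
def alert_states(evaluations, hold_seconds):
    """Map (timestamp, condition_is_true) pairs to alert states.

    A simplified model of a rule's `for` clause: the alert is "pending"
    from the first true evaluation and becomes "firing" once the condition
    has held for at least hold_seconds without interruption.
    """
    active_since = None
    states = []
    for timestamp, condition in evaluations:
        if not condition:
            active_since = None          # any false result resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp     # first true evaluation in this run
        held = timestamp - active_since
        states.append("firing" if held >= hold_seconds else "pending")
    return states
```

With evaluations every 15 seconds, a condition that is true at 0 s through 60 s and false at 75 s yields four pending states, then firing at 60 s, then inactive.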

Server Monitoring and Alerting


Reference: https://mp.weixin.qq.com/s/DILXvkvpS25VJbb3FalBqQ

CPU
Memory
Disk
Availability
Service status
Network

CPU

100-(avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by(instance)* 100) > 60

node_load5 > on (instance) 2 * count by(instance)(node_cpu_seconds_total{mode="idle"})
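The two CPU expressions above reduce to simple arithmetic. The first subtracts the average idle fraction from 100. The second compares the 5-minute load with twice the core count, since `count by(instance)` over the `mode="idle"` series yields one series per core. The Python sketch below reproduces that arithmetic on made-up numbers. The function names and sample values are assumptions for illustration only.

```python
def cpu_usage_percent(idle_fraction_per_core):
    """100 - avg(idle fraction) * 100, as in the first expression."""
    avg_idle = sum(idle_fraction_per_core) / len(idle_fraction_per_core)
    return 100 - avg_idle * 100


def load_is_high(load5, core_count, factor=2):
    """True when the 5-minute load exceeds factor * number of cores."""
    return load5 > factor * core_count


# Hypothetical 4-core host: per-core idle fractions over the rate window.
usage = cpu_usage_percent([0.30, 0.50, 0.20, 0.40])
cpu_alert = usage > 60                  # threshold from the first expression
load_alert = load_is_high(9.5, 4)       # 9.5 > 2 * 4
```

Here the average idle fraction is about 0.35, so usage is roughly 65% and exceeds the 60% threshold. A load of 9.5 on four cores also exceeds the limit of 8.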

Memory

node_memory_MemTotal_bytes: total memory on the host
node_memory_MemFree_bytes: free memory on the host
node_memory_Buffers_bytes: memory in the buffer cache
node_memory_Cached_bytes: memory in the page cache

100 - sum(node_memory_MemFree_bytes{job="node-exporter"} + node_memory_Buffers_bytes{job="node-exporter"} + node_memory_Cached_bytes{job="node-exporter"})by (instance) / sum(node_memory_MemTotal_bytes{job="node-exporter"})by(instance)*100 > 80
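The memory expression treats free, buffer, and page-cache memory as reclaimable and reports everything else as used. A worked example in Python follows. The function name and the 16 GiB host are hypothetical values chosen so the arithmetic is easy to follow.

```python
GIB = 2 ** 30


def memory_usage_percent(total, free, buffers, cached):
    """100 - (free + buffers + cached) / total * 100, all values in bytes."""
    return 100 - (free + buffers + cached) / total * 100


# Hypothetical host: 16 GiB total, 1 GiB free, 0.5 GiB buffers, 1.5 GiB cached.
usage = memory_usage_percent(
    total=16 * GIB,
    free=1 * GIB,
    buffers=GIB // 2,
    cached=3 * GIB // 2,
)
memory_alert = usage > 80   # threshold from the expression
```

Here 3 GiB of 16 GiB is reclaimable (18.75%), so usage comes out at 81.25% and the condition is met.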

Disk

predict_linear(node_filesystem_free_bytes{job="node-exporter",mountpoint!=""}[1h], 4*3600) 

(100 - (node_filesystem_avail_bytes{fstype!="",job="node-exporter"} / node_filesystem_size_bytes{fstype!="",job="node-exporter"} * 100)>80) and (predict_linear(node_filesystem_free_bytes{job="node-exporter",mountpoint!="",device!="rootfs"}[1h],4 * 3600) < 0)
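`predict_linear(v[1h], 4*3600)` fits a straight line to the last hour of samples and extrapolates it four hours ahead. A negative result means the filesystem is projected to run out of free space within that time. The Python sketch below illustrates the idea with an ordinary least-squares fit. It is not Prometheus's implementation: the function name is hypothetical, and it assumes the evaluation time equals the timestamp of the newest sample.

```python
def predict_linear_sketch(samples, seconds_ahead):
    """Extrapolate a least-squares line through (timestamp, value) samples.

    Assumption: the prediction starts from the newest sample's timestamp.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t_ref = samples[-1][0]
    xs = [t - t_ref for t, _ in samples]      # seconds relative to newest sample
    ys = [v for _, v in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("samples must have distinct timestamps")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x                    # bytes per second
    intercept = mean_y - slope * mean_x       # fitted value at t_ref
    return intercept + slope * seconds_ahead


# Hypothetical disk losing 1 MB of free space per second, sampled every 15 min.
history = [(t, 10e9 - 1e6 * t) for t in (0, 900, 1800, 2700, 3600)]
predicted_free = predict_linear_sketch(history, 4 * 3600)
will_fill_up = predicted_free < 0
```

In this example the free space is 6.4 GB at the newest sample and falls by 14.4 GB over the next four hours, so the projection is negative and the second half of the alert condition holds.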

100-(avg(irate(node_disk_io_time_seconds_total[1m])) by(instance)* 100)

Availability

up{job="node-exporter"}==0

Service status

1. docker

node_systemd_unit_state{name="docker.service",state="active"} == 1