Monitoring: Deploying Prometheus Alerting with QQ Mail Integration

jwrrrrrr

Last modified on 2022-03-15 13:23:33

Categories: Linux, Prometheus. Tags: cloud computing operations, prometheus, monitoring, linux

First published on 2022-02-18 17:55:17

Article link: https://blog.csdn.net/oyyy3/article/details/123008488

1. Prometheus alerting

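In Prometheus, metric collection and storage on one side and alerting on the other belong to two independent components: the Prometheus Server and Alertmanager (a general-purpose component). The Prometheus Server only generates alert notifications based on "alerting rules"; the actual alerting actions are carried out by Alertmanager.

Alertmanager processes alert notifications sent by clients. The client is usually the Prometheus server, but Alertmanager also accepts alerts from other tools.
After grouping and deduplicating the notifications, Alertmanager routes them to different receivers according to its routing rules, such as email, SMS, or PagerDuty.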
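The grouping and deduplication step can be illustrated with a minimal Python sketch. It is a simplified model, not Alertmanager's code: each alert is a dict of labels, alerts with identical label sets count as duplicates, and groups are keyed by the values of the `group_by` labels. The `HighLoad` alert name is hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Deduplicate alerts by label set, then group them by the group_by labels.

    Simplified model only: real Alertmanager also tracks timestamps, routes and
    per-group timers.
    """
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        # Alerts with identical label sets are the same alert: keep the first copy.
        fingerprint = frozenset(labels.items())
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        # Alerts sharing the values of the group_by labels end up in one notification.
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


if __name__ == "__main__":
    incoming = [
        {"alertname": "InstanceDown", "instance": "192.168.68.30:9100"},
        {"alertname": "InstanceDown", "instance": "192.168.68.105:9100"},
        {"alertname": "InstanceDown", "instance": "192.168.68.30:9100"},  # duplicate
        {"alertname": "HighLoad", "instance": "192.168.68.30:9100"},  # hypothetical alert
    ]
    for key, members in group_alerts(incoming, ["alertname"]).items():
        print(dict(key), "->", len(members), "alert(s)")
```

Running it prints `{'alertname': 'InstanceDown'} -> 2 alert(s)` and then `{'alertname': 'HighLoad'} -> 1 alert(s)`: the duplicate InstanceDown alert is dropped, and the two remaining ones share a single notification.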

2. Silencing, inhibition, and grouping

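Grouping: merges similar alerts into a single notification. When a widespread failure triggers a flood of alerts, grouping keeps users from being drowned in alert noise and losing sight of the key information.
Inhibition: when a component or service fails and raises an alert, other components or services that depend on it may raise alerts too. Inhibition suppresses such cascading alerts so that users can focus on the actual root cause.
Silencing (silence): during a specific time window, Alertmanager does not send alert messages to users even though it still receives alert notifications. Silences are typically activated during routine system maintenance.
Routing (route): configures how Alertmanager handles specific types of incoming alert notifications. The result of matching a notification against the routing rules determines the path and behavior used to handle it.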
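Inhibition is the least intuitive of these, so here is a small Python sketch of how an equality-only inhibit rule could be evaluated. It mirrors the `inhibit_rules` entry in the default alertmanager.yml (source severity critical, target severity warning, equal on alertname/dev/instance). The function names and example alerts are illustrative, and real Alertmanager also supports regex matchers.

```python
def matches(labels, matchers):
    """True if every matcher label has exactly the given value (equality matching only)."""
    return all(labels.get(name) == value for name, value in matchers.items())


def is_inhibited(target, firing, rule):
    """Return True if `target` is muted by `rule` while the alerts in `firing` are active.

    The target must match target_match, and some other firing alert must match
    source_match and carry the same values for every label listed in `equal`
    (a label missing from both sides counts as equal).
    """
    if not matches(target, rule["target_match"]):
        return False
    return any(
        source is not target
        and matches(source, rule["source_match"])
        and all(source.get(n) == target.get(n) for n in rule["equal"])
        for source in firing
    )


# The same rule as inhibit_rules in the default alertmanager.yml.
RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}

if __name__ == "__main__":
    critical = {"alertname": "InstanceDown", "instance": "192.168.68.30:9100", "severity": "critical"}
    warning = {"alertname": "InstanceDown", "instance": "192.168.68.30:9100", "severity": "warning"}
    other = {"alertname": "InstanceDown", "instance": "192.168.68.105:9100", "severity": "warning"}
    firing = [critical, warning, other]
    print(is_inhibited(warning, firing, RULE))  # True: same alertname and instance as the critical alert
    print(is_inhibited(other, firing, RULE))    # False: different instance
```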

3. Deploying alerting with QQ Mail integration

[root@qq ~]# systemctl stop firewalld.service 
[root@qq ~]# setenforce 0
## Upload alertmanager-0.21.0.linux-amd64.tar.gz to the /opt directory
[root@qq /opt]# ls
alertmanager-0.21.0.linux-amd64.tar.gz
[root@qq /opt]# tar zxf alertmanager-0.21.0.linux-amd64.tar.gz -C /usr/local/
[root@qq /usr/local]# ln -s /usr/local/alertmanager-0.21.0.linux-amd64/ /usr/local/alertmanager

[root@qq /usr/local/alertmanager]# cat /usr/local/alertmanager/alertmanager.yml
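global:                                     # global parameters
  resolve_timeout: 5m

route:                                      # routing tree
  group_by: ['alertname']                   # labels used to group alerts
  group_wait: 30s                           # buffer time before the first notification of a new group
  group_interval: 5m                        # wait before notifying about new alerts added to a group
  repeat_interval: 1h                       # interval before re-sending an unchanged notification
  receiver: 'web.hook'                      # receiver (notification channel)
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'           # webhook endpoint on port 5001
inhibit_rules:                              # inhibition rules
  - source_match:                           # matcher for the source (inhibiting) alert
      severity: 'critical'                  # source severity: critical
    target_match:
      severity: 'warning'                   # target (inhibited) severity: warning
    equal: ['alertname', 'dev', 'instance'] # labels that must be equal in source and target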
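The three timers in `route` interact in a way that is easy to misread. The following Python sketch is a simplified model of one alert group (it ignores clock jitter and Alertmanager restarts): the first flush happens `group_wait` after the first alert, later flushes follow every `group_interval`, and a flush sends a notification only if the group changed or `repeat_interval` has passed since the last notification.

```python
def notification_times(group_wait, group_interval, repeat_interval, changes, horizon):
    """Simplified model of when a single alert group sends notifications.

    changes: sorted times (seconds) at which the group's alert set changes;
             changes[0] is when the first alert of the group arrives.
    The group is flushed at changes[0] + group_wait and then every group_interval.
    A flush notifies if the alert set changed since the last notification, or if
    repeat_interval has elapsed since the last notification.
    """
    sent = []
    last_sent = None
    notified_changes = 0  # how many changes the last notification already covered
    t = changes[0] + group_wait
    while t <= horizon:
        current = sum(1 for c in changes if c <= t)
        if last_sent is None or current > notified_changes or t - last_sent >= repeat_interval:
            sent.append(t)
            last_sent = t
            notified_changes = current
        t += group_interval
    return sent


if __name__ == "__main__":
    # Default timers: group_wait 30s, group_interval 5m, repeat_interval 1h.
    # One alert at t=0 and a second alert joining the same group at t=100.
    print(notification_times(30, 300, 3600, [0, 100], 4000))
```

With these timers the sketch prints `[30, 330, 3930]`: a first notification after 30s, an update at the next `group_interval` tick because a second alert joined, and a repeat only once an hour has passed, rounded up to a `group_interval` tick.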

Modify the configuration file

[root@qq /usr/local/alertmanager]# vim /usr/local/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: 1441596016@qq.com
  smtp_auth_username: 1441596016@qq.com
  smtp_auth_password: '<authorization code>'
  smtp_require_tls: false
  smtp_smarthost: 'smtp.qq.com:465'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email-test'
receivers:
- name: 'email-test'
  email_configs:
  - to: 1441596016@qq.com
    send_resolved: true

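Configure the bound QQ mailbox as both sender and recipient. smtp_auth_password takes the QQ Mail SMTP authorization code, not the account login password.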
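Before relying on Alertmanager, it can help to check the sender address and authorization code directly. Below is a minimal sketch using Python's `smtplib`, assuming QQ Mail's SMTP service is enabled and reachable over implicit TLS on port 465. The environment variable name `QQ_SMTP_AUTH_CODE` is hypothetical, and nothing is sent unless it is set.

```python
import os
import smtplib
from email.message import EmailMessage

SMTP_HOST = "smtp.qq.com"
SMTP_PORT = 465  # implicit TLS (SSL), the same port as smtp_smarthost


def build_test_message(sender, recipient):
    """Build a small plain-text test mail."""
    msg = EmailMessage()
    msg["Subject"] = "Alertmanager SMTP test"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("If this arrives, the SMTP settings for Alertmanager are usable.")
    return msg


def send_test_message(sender, auth_code, recipient):
    """Log in with the SMTP authorization code (not the account password) and send."""
    msg = build_test_message(sender, recipient)
    with smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT, timeout=10) as server:
        server.login(sender, auth_code)
        server.send_message(msg)


if __name__ == "__main__":
    # QQ_SMTP_AUTH_CODE is a hypothetical variable name chosen for this sketch.
    code = os.environ.get("QQ_SMTP_AUTH_CODE")
    if code:
        send_test_message("1441596016@qq.com", code, "1441596016@qq.com")
    else:
        print("Set QQ_SMTP_AUTH_CODE to send a test message.")
```

If the login fails here, Alertmanager will fail the same way, so this separates credential problems from Alertmanager configuration problems.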

Start Alertmanager

[root@qq /usr/local/alertmanager]# ./alertmanager
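Once Alertmanager is running, you can test the email path without Prometheus by posting an alert to Alertmanager's v2 HTTP API (`/api/v2/alerts`). This sketch only builds the request. The `TestAlert` name and its labels are hypothetical, and the address is the one used in alertmanagers.yaml.

```python
import json
import urllib.request

ALERTMANAGER_URL = "http://192.168.68.40:9093"  # same address as in alertmanagers.yaml


def build_alert_request(base_url, labels, annotations=None):
    """Build a POST to Alertmanager's v2 API carrying a single alert."""
    payload = [{"labels": labels, "annotations": annotations or {}}]
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "TestAlert" is a hypothetical alert name used only for this check.
    req = build_alert_request(
        ALERTMANAGER_URL,
        {"alertname": "TestAlert", "severity": "warning"},
        {"description": "Manual test of the email-test receiver"},
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=5)
```

Because the alert carries no `endsAt`, Alertmanager should treat it as resolved after `resolve_timeout` (5m). With `send_resolved: true`, a second "resolved" email is then expected.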

Related configuration files

[root@qq /usr/local]# cd prometheus-2.27.1.linux-amd64/
[root@qq /usr/local/prometheus-2.27.1.linux-amd64]# mkdir alert_config
[root@qq /usr/local/prometheus-2.27.1.linux-amd64]# cd alert_config/
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config]# mkdir alert_rules targets
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config]# cd alert_rules/
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config/alert_rules]# vim instance_down.yaml 
groups:
- name: AllInstances
  rules:
  - alert: InstanceDown
    # Condition for alerting
    expr: up == 0
    for: 1m
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Instance down'
      description: 'Instance has been down for more than 1 minute.'
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'critical'
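The `for: 1m` clause means `up == 0` must hold continuously for a minute before the alert fires; until then it is only pending. Below is a simplified Python model of that state machine. It assumes one `up` sample per 15s rule evaluation (the `evaluation_interval` in prometheus.yml) and ignores how Prometheus keeps resolved alerts around. The function name is hypothetical.

```python
def replay(samples, for_seconds=60, interval=15):
    """Replay `up` values through a simplified model of a rule with `for:`.

    samples: one `up` value per rule evaluation, `interval` seconds apart.
    Returns the alert state after each evaluation.
    """
    states = []
    active_since = None
    for i, up in enumerate(samples):
        now = i * interval
        if up == 0:  # `up == 0` returns a series, so the alert is active
            if active_since is None:
                active_since = now
            states.append("firing" if now - active_since >= for_seconds else "pending")
        else:  # condition cleared: the alert is no longer active
            active_since = None
            states.append("inactive")
    return states


if __name__ == "__main__":
    print(replay([1, 0, 0, 0, 0, 0, 1]))
```

The printed states are `['inactive', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive']`. In practice, the email therefore arrives somewhat later than one minute after the outage, once the scrape and evaluation intervals and Alertmanager's `group_wait` are added.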

[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config]# cd targets/
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config/targets]# vim alertmanagers.yaml 
- targets:
  - 192.168.68.40:9093
  labels:
    app: alertmanager
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config/targets]# vim nodes-linux.yaml 
- targets:
  - 192.168.68.30:9100
  - 192.168.68.105:9100
  labels:
    app: node-exporter
    job: node
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config/targets]# vim prometheus-servers.yaml 
- targets:
  - 192.168.68.40:9090
  labels:
    app: prometheus
    job:  prometheus
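The target files are plain YAML lists of target groups, so a script can also generate them. Below is a minimal Python sketch that renders the same layout and replaces the file with a single rename. The helper names are hypothetical. String values are quoted with `json.dumps` because a JSON string is also a valid YAML scalar.

```python
import json
import os


def render_file_sd(targets, labels=None):
    """Render one file_sd target group in the same layout as the files above."""
    lines = ["- targets:"]
    lines += [f"  - {json.dumps(t)}" for t in targets]
    if labels:
        lines.append("  labels:")
        lines += [f"    {name}: {json.dumps(value)}" for name, value in labels.items()]
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write via a temp file and rename, so a reader never sees a half-written file."""
    tmp = path + ".tmp"  # does not end in .yaml, so the *.yaml globs ignore it
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp, path)


if __name__ == "__main__":
    print(render_file_sd(
        ["192.168.68.30:9100", "192.168.68.105:9100"],
        {"app": "node-exporter", "job": "node"},
    ), end="")
```

Because the scrape jobs use `file_sd_configs`, Prometheus picks up rewritten target files without a restart. As a fallback, it re-reads them at the 2m `refresh_interval`.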

Prometheus startup configuration file

[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config]# vim prometheus.yml 
# my global config
# Author: MageEdu <mage@magedu.com>
# Repo: http://gitlab.magedu.com/MageEdu/prometheus-configs/
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - file_sd_configs:
    - files:
      - "targets/alertmanagers*.yaml"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*.yaml"
  - "alert_rules/*.yaml" 

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    file_sd_configs:
    - files:                                               
      - targets/prometheus-*.yaml  
      refresh_interval: 2m 

  # All nodes
  - job_name: 'nodes'
    file_sd_configs:
    - files:                                               
      - targets/nodes-*.yaml  
      refresh_interval: 2m 

  - job_name: 'alertmanagers'
    file_sd_configs:
    - files:
      - targets/alertmanagers*.yaml
      refresh_interval: 2m
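Relative paths in `rule_files` and `file_sd_configs` are resolved against the directory that contains prometheus.yml (alert_config/ here), not the directory Prometheus is started from. Under that assumption, here is a small Python sketch that checks what each pattern would match. The function name is hypothetical.

```python
import glob
import os


def resolve_patterns(config_path, patterns):
    """Show which files each rule_files / file_sd_configs pattern matches.

    Relative patterns are joined to the directory holding the config file.
    """
    base = os.path.dirname(os.path.abspath(config_path))
    return {
        p: sorted(glob.glob(p if os.path.isabs(p) else os.path.join(base, p)))
        for p in patterns
    }


if __name__ == "__main__":
    config = "./alert_config/prometheus.yml"
    for pattern, files in resolve_patterns(config, [
        "rules/*.yaml",
        "alert_rules/*.yaml",
        "targets/prometheus-*.yaml",
        "targets/nodes-*.yaml",
        "targets/alertmanagers*.yaml",
    ]).items():
        print(pattern, "->", files or "NO MATCH")
```

Run from the Prometheus directory, `rules/*.yaml` reports no match, because only alert_rules/ and targets/ were created under alert_config/.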

Start Prometheus

[root@qq /usr/local/prometheus-2.27.1.linux-amd64]# ./prometheus --config.file=./alert_config/prometheus.yml
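After startup, you can check target health through the Prometheus HTTP API (`/api/v1/targets`) as well as in the web UI. Below is a minimal sketch, assuming Prometheus listens on 192.168.68.40:9090 as in prometheus-servers.yaml. The helper names are hypothetical.

```python
import json
import urllib.request

PROMETHEUS_URL = "http://192.168.68.40:9090"  # address from prometheus-servers.yaml


def fetch_json(path, base_url=PROMETHEUS_URL):
    """GET a Prometheus HTTP API endpoint and decode the JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=5) as resp:
        return json.load(resp)


def target_health(response):
    """Map (job, instance) of every active target to its health: up, down or unknown."""
    return {
        (t["labels"].get("job", ""), t["labels"].get("instance", "")): t["health"]
        for t in response["data"]["activeTargets"]
    }
```

For example, `print(target_health(fetch_json("/api/v1/targets")))` lists every (job, instance) pair. Both node exporters should report `up` before the failure test.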

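Simulate a failure (stop node_exporter)

Stop node_exporter on one of the monitored nodes. Once `up == 0` has held for more than 1 minute, the InstanceDown rule fires, Prometheus forwards the alert to Alertmanager, and Alertmanager sends the notification to the QQ mailbox.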
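To follow the test from the command line, the following sketch polls Prometheus's `/api/v1/alerts` endpoint until InstanceDown is firing (not just pending). The helper names, the 180s timeout and the 15s poll period are arbitrary choices for this sketch.

```python
import json
import time
import urllib.request


def fetch_alerts(base_url="http://192.168.68.40:9090"):
    """GET Prometheus's /api/v1/alerts (alerts currently pending or firing)."""
    with urllib.request.urlopen(base_url + "/api/v1/alerts", timeout=5) as resp:
        return json.load(resp)


def firing_instances(response, alertname="InstanceDown"):
    """Instances for which `alertname` is in the firing (not just pending) state."""
    return sorted(
        a["labels"].get("instance", "")
        for a in response["data"]["alerts"]
        if a["labels"].get("alertname") == alertname and a["state"] == "firing"
    )


def wait_for_firing(fetch=fetch_alerts, alertname="InstanceDown", timeout=180, poll=15):
    """Poll until `alertname` fires for some instance, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        instances = firing_instances(fetch(), alertname)
        if instances or time.monotonic() >= deadline:
            return instances
        time.sleep(poll)
```

Call `wait_for_firing()` right after stopping node_exporter on one node. It should return that node's address after roughly a minute, shortly before the email arrives.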
