Monitoring: Deploying Prometheus Alerting with QQ Mail

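Contents

1. Prometheus alerting

2. Silencing, inhibition, and grouping

3. Deploying alerting with QQ Mail

Modify the configuration file

Configure the bound QQ mailbox

Start Alertmanager

Related configuration files

Prometheus startup configuration

Start Prometheus

Simulate a failure (stop node_exporter)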


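1. Prometheus alerting

In Prometheus, metric collection and storage are separated from alerting. They belong to two independent components: Prometheus Server and Alertmanager (a general-purpose component). Prometheus Server only generates alert notifications based on "alerting rules". Alertmanager carries out the actual alerting actions.

Alertmanager processes alert notifications sent by clients. The client is usually Prometheus Server, but Alertmanager also accepts alerts from other tools.
Alertmanager groups and deduplicates the notifications. It then routes them, according to its routing rules, to different receivers such as email, SMS, or PagerDuty.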
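The deduplicate-then-group step can be made concrete with a minimal Python sketch. It makes these assumptions, which are not from the original:

- Alerts are plain dicts with a `labels` mapping.
- Alerts with identical label sets count as duplicates.
- Grouping uses the values of the `group_by` labels.

The helper names and the `HighLoad` alert are hypothetical. Real Alertmanager works with label fingerprints and also tracks timestamps and alert state.

```python
from collections import defaultdict


def dedupe(alerts):
    """Collapse alerts whose label sets are identical (simplified fingerprinting)."""
    unique = {}
    for alert in alerts:
        key = frozenset(alert["labels"].items())
        unique[key] = alert  # a repeat of the same alert replaces the earlier copy
    return list(unique.values())


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels (a missing label counts as "")."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


incoming = [
    {"labels": {"alertname": "InstanceDown", "instance": "192.168.68.30:9100"}},
    {"labels": {"alertname": "InstanceDown", "instance": "192.168.68.30:9100"}},  # duplicate
    {"labels": {"alertname": "InstanceDown", "instance": "192.168.68.105:9100"}},
    {"labels": {"alertname": "HighLoad", "instance": "192.168.68.30:9100"}},  # hypothetical alert
]

groups = group_alerts(dedupe(incoming), ["alertname"])
for key, members in groups.items():
    print(key, len(members))
```

With `group_by: ['alertname']`, the two distinct InstanceDown alerts end up in one group, so they would go out in a single notification.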

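2. Silencing, inhibition, and grouping

  • Grouping: a mechanism that merges similar alerts into a single notification. When a large-scale outage triggers an alert storm, grouping keeps users from drowning in alert noise that would bury the key information;
  • Inhibition: when a component or service fails and fires an alert, other components or services that depend on it may fire alerts as a consequence. Inhibition suppresses such cascading alerts so users can focus on the real source of the failure;
  • Silence: within a specified time window, Alertmanager does not send notifications to users even when it receives alerts. Silences are typically activated during routine system maintenance;
  • Route: configures how Alertmanager handles specific types of incoming alerts. The routing rules an alert matches determine the path and behavior used to handle it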
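Silences and routes both come down to label matching. Here is a minimal Python sketch with several simplifications:

- Only equality matchers are supported. Real silences and routes also support regex and negative matchers.
- Only one level of child routes is checked, with no `continue`.
- The silence window is treated as `[starts_at, ends_at)`.

The helper names, the maintenance window, and the `oncall-pager` receiver are hypothetical.

```python
from datetime import datetime, timedelta


def labels_match(labels, matchers):
    """True if every matcher label has exactly the given value (equality matchers only)."""
    return all(labels.get(name) == value for name, value in matchers.items())


def is_silenced(labels, silence, now):
    """A silence mutes an alert while `now` is inside its window and all matchers match."""
    in_window = silence["starts_at"] <= now < silence["ends_at"]
    return in_window and labels_match(labels, silence["matchers"])


def pick_receiver(labels, route):
    """Return the receiver of the first matching child route, else the root receiver."""
    for child in route.get("routes", []):
        if labels_match(labels, child["match"]):
            return child["receiver"]
    return route["receiver"]


start = datetime(2021, 6, 1, 2, 0)  # hypothetical maintenance start
maintenance = {
    "matchers": {"instance": "192.168.68.30:9100"},
    "starts_at": start,
    "ends_at": start + timedelta(hours=2),
}
route = {
    "receiver": "email-test",
    "routes": [{"match": {"severity": "critical"}, "receiver": "oncall-pager"}],
}
alert = {"alertname": "InstanceDown", "instance": "192.168.68.30:9100", "severity": "critical"}

print(is_silenced(alert, maintenance, start + timedelta(minutes=30)))  # True: inside the window
print(is_silenced(alert, maintenance, start + timedelta(hours=3)))     # False: window is over
print(pick_receiver(alert, route))                                     # oncall-pager
```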

3. Deploying alerting with QQ Mail

[root@qq ~]# systemctl stop firewalld.service 
[root@qq ~]# setenforce 0
## Upload alertmanager-0.21.0.linux-amd64.tar.gz to the /opt directory
[root@qq /opt]# ls
alertmanager-0.21.0.linux-amd64.tar.gz
[root@qq /opt]# tar zxf alertmanager-0.21.0.linux-amd64.tar.gz -C /usr/local/
[root@qq /usr/local]# ln -s /usr/local/alertmanager-0.21.0.linux-amd64/ /usr/local/alertmanager
[root@qq /usr/local/alertmanager]# cat /usr/local/alertmanager/alertmanager.yml
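global:                                       # global parameters
  resolve_timeout: 5m

route:                                        # routing
  group_by: ['alertname']                     # labels used to group alerts
  group_wait: 30s                             # wait before the first notification of a new group
  group_interval: 5m                          # wait before notifying about new alerts added to an existing group
  repeat_interval: 1h                         # wait before re-sending a notification that is still firing
  receiver: 'web.hook'                        # default receiver
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'             # webhook receiver on port 5001
inhibit_rules:                                # inhibition rules
  - source_match:                             # labels the inhibiting (source) alert must have
      severity: 'critical'                    # critical severity
    target_match:                             # labels the inhibited (target) alert must have
      severity: 'warning'                     # warning severity
    equal: ['alertname', 'dev', 'instance']   # source and target must share these label values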
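The `inhibit_rules` entry above can be read as follows. A firing critical alert mutes a warning alert when both carry the same values for `alertname`, `dev`, and `instance`. Here is a minimal Python sketch of that check, with these assumptions:

- Alerts are plain label dicts.
- The helper names are hypothetical.
- Only equality matching, as in `source_match`/`target_match`, is modeled.

```python
def labels_match(labels, matchers):
    """True if every matcher label has exactly the given value."""
    return all(labels.get(name) == value for name, value in matchers.items())


def is_inhibited(target, firing, rule):
    """True if another firing alert matching source_match shares every `equal` label with target."""
    if not labels_match(target, rule["target_match"]):
        return False
    for source in firing:
        if source is target or not labels_match(source, rule["source_match"]):
            continue
        # A label missing on both sides compares equal (None == None here).
        if all(source.get(name) == target.get(name) for name in rule["equal"]):
            return True
    return False


rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}
critical_a = {"alertname": "InstanceDown", "instance": "192.168.68.30:9100", "severity": "critical"}
warning_a = {"alertname": "InstanceDown", "instance": "192.168.68.30:9100", "severity": "warning"}
warning_b = {"alertname": "InstanceDown", "instance": "192.168.68.105:9100", "severity": "warning"}
firing = [critical_a, warning_a, warning_b]

print(is_inhibited(warning_a, firing, rule))  # True: same alertname and instance, dev absent on both
print(is_inhibited(warning_b, firing, rule))  # False: different instance
```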

Modify the configuration file

[root@qq /usr/local/alertmanager]# vim /usr/local/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
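  smtp_from: 1441596016@qq.com
  smtp_auth_username: 1441596016@qq.com
  smtp_auth_password: '<QQ Mail authorization code>'   # the SMTP authorization code, not the login password
  smtp_require_tls: false
  smtp_smarthost: 'smtp.qq.com:465'                     # QQ Mail SMTP over SSL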

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email-test'
receivers:
- name: 'email-test'
  email_configs:
  - to: 1441596016@qq.com
    send_resolved: true
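The three route timers interact. Consider a simplified model of one alert that keeps firing unchanged:

1. Alertmanager sends the first notification after `group_wait`.
2. It then re-checks the group every `group_interval`.
3. It re-sends only once `repeat_interval` has passed since the last notification.

The sketch below uses a hypothetical helper. It ignores timing jitter, new or resolved alerts joining the group, and restarts. It illustrates that the effective repeat time rounds up to a `group_interval` tick.

```python
def notification_times(group_wait, group_interval, repeat_interval, horizon):
    """Seconds (after the group's first alert) at which a notification goes out
    for a single alert that keeps firing unchanged. Simplified model."""
    times = []
    last_sent = None
    tick = group_wait  # the first flush happens after group_wait
    while tick <= horizon:
        if last_sent is None or tick - last_sent >= repeat_interval:
            times.append(tick)
            last_sent = tick
        tick += group_interval  # afterwards the group is re-checked every group_interval
    return times


# email-test route above: 10s / 10s / 1h, over two hours.
print(notification_times(10, 10, 3600, 7200))
# Default file shown earlier: 30s / 5m / 1h, over two hours.
print(notification_times(30, 300, 3600, 7200))
```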

Configure the bound QQ mailbox
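QQ Mail only accepts SMTP logins after two steps in the mailbox settings:

1. Enable the SMTP service under the POP3/IMAP/SMTP settings.
2. Generate an authorization code. That code, not the account password, is what goes into `smtp_auth_password`.

Before putting the code into alertmanager.yml, you can check it with a small Python sketch. It uses the standard library's `smtplib` over SSL on port 465. The function names are hypothetical, and the address and code must be your own.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(address):
    """Build a plain-text test mail sent from and to the same QQ address."""
    msg = EmailMessage()
    msg["From"] = address
    msg["To"] = address
    msg["Subject"] = "Alertmanager SMTP test"
    msg.set_content("If this arrives, the QQ Mail SMTP settings work.")
    return msg


def send_test_mail(address, auth_code, host="smtp.qq.com", port=465):
    """Log in over implicit SSL with the authorization code and send the test mail."""
    with smtplib.SMTP_SSL(host, port, timeout=10) as server:
        server.login(address, auth_code)
        server.send_message(build_test_message(address))
```

For example, call `send_test_mail("1441596016@qq.com", "<authorization code>")`. An `smtplib.SMTPAuthenticationError` usually means the authorization code is wrong or the SMTP service has not been enabled.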

Start Alertmanager

[root@qq /usr/local/alertmanager]# ./alertmanager 

Related configuration files

[root@qq /usr/local]# cd prometheus-2.27.1.linux-amd64/
[root@qq /usr/local/prometheus-2.27.1.linux-amd64]# mkdir alert_config
[root@qq /usr/local/prometheus-2.27.1.linux-amd64]# cd alert_config/
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config]# mkdir alert_rules targets
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config]# cd alert_rules/
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config/alert_rules]# vim instance_down.yaml 
groups:
- name: AllInstances
  rules:
  - alert: InstanceDown
    # Condition for alerting
    expr: up == 0
    for: 1m
    # Annotation - additional informational labels to store more information
    annotations:
      title: 'Instance down'
      description: 'Instance has been down for more than 1 minute.'
    # Labels - additional labels to be attached to the alert
    labels:
      severity: 'critical'
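The `for: 1m` clause means `up == 0` must hold continuously for a minute before the alert fires. In between, Prometheus reports the alert as pending. Here is a minimal Python sketch of that state machine, with these assumptions:

- Evaluations happen every `evaluation_interval`, which is 15s in the prometheus.yml below.
- The helper name is hypothetical.
- As a simplification, a previously firing alert simply returns to inactive. Real Prometheus reports it as resolved.

```python
def alert_states(conditions, evaluation_interval, for_duration):
    """conditions[i] says whether the rule expression (up == 0) returned a result at the
    i-th evaluation; returns the alert state after each evaluation."""
    states = []
    active_since = None
    for i, condition in enumerate(conditions):
        now = i * evaluation_interval
        if not condition:
            active_since = None  # condition cleared: the pending timer resets
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now  # first evaluation that sees the condition
        held = now - active_since
        states.append("firing" if held >= for_duration else "pending")
    return states


# Target healthy at the first evaluation, down from then on.
print(alert_states([False] + [True] * 6, evaluation_interval=15, for_duration=60))
```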

[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config]# cd targets/
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config/targets]# vim alertmanagers.yaml 
- targets:
  - 192.168.68.40:9093
  labels:
    app: alertmanager
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config/targets]# vim nodes-linux.yaml 
- targets:
  - 192.168.68.30:9100
  - 192.168.68.105:9100
  labels:
    app: node-exporter
    job: node
[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config/targets]# vim prometheus-servers.yaml 
- targets:
  - 192.168.68.40:9090
  labels:
    app: prometheus
    job: prometheus
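Each list entry in these files is a target group: every address under `targets` gets the group's `labels`. Here is a simplified Python sketch of how a scrape job turns a group into per-target labels. The helper is hypothetical, and relabeling is skipped. The sketch models only two defaults that Prometheus fills in when the label is not already set:

- `job` comes from `job_name`.
- `instance` comes from the target address.

```python
def target_labels(group, job_name):
    """Expand one file_sd target group into the label set of each target."""
    result = []
    for address in group["targets"]:
        labels = dict(group.get("labels", {}))
        labels.setdefault("job", job_name)      # a job label set in the file wins over job_name
        labels.setdefault("instance", address)  # instance defaults to the host:port address
        result.append(labels)
    return result


nodes_group = {
    "targets": ["192.168.68.30:9100", "192.168.68.105:9100"],
    "labels": {"app": "node-exporter", "job": "node"},
}
am_group = {"targets": ["192.168.68.40:9093"], "labels": {"app": "alertmanager"}}

for labels in target_labels(nodes_group, "nodes") + target_labels(am_group, "alertmanagers"):
    print(labels)
```

Under this model, the `job: node` label in nodes-linux.yaml takes precedence over the `job_name: 'nodes'` used in prometheus.yml.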


Prometheus startup configuration

[root@qq /usr/local/prometheus-2.27.1.linux-amd64/alert_config]# vim prometheus.yml 
# my global config
# Author: MageEdu <mage@magedu.com>
# Repo: http://gitlab.magedu.com/MageEdu/prometheus-configs/
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - file_sd_configs:
    - files:
      - "targets/alertmanagers*.yaml"

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*.yaml"
  - "alert_rules/*.yaml" 

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    file_sd_configs:
    - files:
      - targets/prometheus-*.yaml
      refresh_interval: 2m

  # All nodes
  - job_name: 'nodes'
    file_sd_configs:
    - files:                                               
      - targets/nodes-*.yaml  
      refresh_interval: 2m 

  - job_name: 'alertmanagers'
    file_sd_configs:
    - files:
      - targets/alertmanagers*.yaml
      refresh_interval: 2m 
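Relative paths in `rule_files` and `file_sd_configs` are resolved against the directory containing prometheus.yml (here alert_config/), not the directory Prometheus is started from. That is why the startup command can pass `./alert_config/prometheus.yml` while the file itself refers to `alert_rules/*.yaml`. A glob that matches nothing should simply contribute no files, so `rules/*.yaml` is harmless even though that directory was never created here.

Here is a small Python sketch of that resolution. The helper names are hypothetical. Prometheus itself uses Go's `filepath.Glob`, which may differ from Python's `glob` in edge cases such as hidden files.

```python
import glob
import os


def resolve_pattern(config_file, pattern):
    """Join a relative pattern onto the config file's directory; absolute patterns pass through."""
    if os.path.isabs(pattern):
        return pattern
    base = os.path.dirname(os.path.abspath(config_file))
    return os.path.join(base, pattern)


def matching_files(config_file, patterns):
    """List files matched by each pattern; a pattern that matches nothing adds nothing."""
    found = []
    for pattern in patterns:
        found.extend(sorted(glob.glob(resolve_pattern(config_file, pattern))))
    return found


config = "/usr/local/prometheus-2.27.1.linux-amd64/alert_config/prometheus.yml"
print(resolve_pattern(config, "alert_rules/*.yaml"))
# /usr/local/prometheus-2.27.1.linux-amd64/alert_config/alert_rules/*.yaml
```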

Start Prometheus

[root@qq /usr/local/prometheus-2.27.1.linux-amd64]# ./prometheus --config.file=./alert_config/prometheus.yml

Simulate a failure (stop node_exporter)
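After node_exporter is stopped, the email does not arrive immediately. Here is a rough back-of-the-envelope Python sketch with these assumptions:

- It uses this walkthrough's settings: 15s scrape and evaluation intervals, `for: 1m`, and `group_wait: 10s`.
- The failed scrape is recorded at once, since a stopped exporter refuses connections. A hung exporter could add up to `scrape_timeout`.
- Notification sending and mail delivery time are ignored.

The function is hypothetical.

```python
import math


def detection_delay(scrape_interval, evaluation_interval, for_duration, group_wait):
    """Rough (best, worst) seconds from the target going down to Alertmanager sending the mail."""
    # The alert can only turn firing on an evaluation tick, so `for` rounds up to a tick.
    hold = math.ceil(for_duration / evaluation_interval) * evaluation_interval
    best = hold + group_wait  # failure caught by the very next scrape and evaluation
    worst = scrape_interval + evaluation_interval + hold + group_wait
    return best, worst


print(detection_delay(15, 15, 60, 10))  # (70, 100)
```

If the InstanceDown group has already sent a notification, a newly failing node joins that group. It is then reported after `group_interval` rather than `group_wait`.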
