Containerized Prometheus deployment, with the Grafana visualization tool for monitoring nodes

Deploy Prometheus

Environment

Hostname        IP address         Services
prometheus      192.168.129.205    prometheus, grafana
node-exporter   192.168.129.33     node_exporter, cAdvisor

Preparation

//Set the hostname
[root@localhost ~]# hostnamectl  set-hostname prometheus
[root@localhost ~]# bash

//Disable the firewall and SELinux
[root@prometheus ~]# systemctl disable --now firewalld.service 
[root@prometheus ~]# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config 
[root@prometheus ~]# setenforce 0
[root@prometheus ~]# reboot
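If the host may be in permissive rather than enforcing mode, a pattern that rewrites whatever value follows SELINUX= is a little more robust. The bash sketch below wraps that in a function; the name disable_selinux_config is hypothetical, and the file path is a parameter so it can be tried on a copy first.

```shell
#!/usr/bin/env bash
# Sketch: set SELINUX=disabled in an SELinux config file, whatever the current
# mode is (enforcing or permissive). disable_selinux_config is a hypothetical name.
disable_selinux_config() {
    local cfg="${1:-/etc/selinux/config}"
    # Anchor on the start of the line so comment lines and SELINUXTYPE= are left alone.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}

# Usage on the real host (then reboot, as above):
#   disable_selinux_config /etc/selinux/config
#   grep '^SELINUX=' /etc/selinux/config
```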

//Configure the base yum repository (Aliyun CentOS 8 mirror)
[root@prometheus ~]# curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-8.repo
[root@prometheus ~]# sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo

//Configure the docker-ce yum repository (Tsinghua mirror)
[root@prometheus ~]# cd /etc/yum.repos.d/
[root@prometheus yum.repos.d]# curl -o docker-ce.repo https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/centos/docker-ce.repo
[root@prometheus yum.repos.d]# sed -i 's@https://download.docker.com@https://mirrors.tuna.tsinghua.edu.cn/docker-ce@g' docker-ce.repo

//Install docker-ce and its components, then start Docker
[root@prometheus ~]# yum -y install docker-ce 
[root@prometheus ~]# systemctl enable --now docker
[root@prometheus ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-12-29 23:32:00 CST; 7min ago
     Docs: https://docs.docker.com
 Main PID: 1002 (dockerd)
    Tasks: 13
   Memory: 128.2M
   CGroup: /system.slice/docker.service
           └─1002 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

//Configure the Aliyun registry mirror to speed up image pulls
[root@prometheus ~]# tee /etc/docker/daemon.json << 'EOF'
{
  "registry-mirrors": ["https://b9pmyelo.mirror.aliyuncs.com"]
}
EOF
[root@prometheus ~]# systemctl restart docker

//Check the Docker version
[root@prometheus ~]# docker version
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:22 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:43:44 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Run the Prometheus container

Pull the official prometheus image

[root@prometheus ~]# docker pull prom/prometheus
Using default tag: latest
latest: Pulling from prom/prometheus
3cb635b06aa2: Pull complete 
34f699df6fe0: Pull complete 
33d6c9635e0f: Pull complete 
f2af7323bed8: Pull complete 
c16675a6a294: Pull complete 
827843f6afe6: Pull complete 
3d272942eeaf: Pull complete 
7e785cfa34da: Pull complete 
05e324559e3b: Pull complete 
170620261a59: Pull complete 
ec35f5996032: Pull complete 
5509173eb708: Pull complete 
Digest: sha256:cb9817249c346d6cfadebe383ed3b3cd4c540f623db40c4ca00da2ada45259bb
Status: Downloaded newer image for prom/prometheus:latest
docker.io/prom/prometheus:latest

Create a directory for the Prometheus configuration file and put the default configuration in it

[root@prometheus ~]# mkdir -p /prometheus/config/
[root@prometheus ~]# vi /prometheus/config/prometheus.yml 
[root@prometheus ~]# cat /prometheus/config/prometheus.yml 
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

Create a container from the official prometheus image

[root@prometheus ~]# docker images
REPOSITORY                                           TAG       IMAGE ID       CREATED         SIZE
prom/prometheus                                      latest    a3d385fc29f9   11 days ago     201MB

[root@prometheus ~]# docker run --name prometheus -d --restart always  -p 9090:9090 -v /prometheus/config/prometheus.yml:/etc/prometheus/prometheus.yml:ro   prom/prometheus:latest
268b771e2b8ad9680cd33ab48d7789de1e0da8f28b37f59acfe1bb32ea0235f2
[root@prometheus ~]# docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED         STATUS         PORTS                                       NAMES
268b771e2b8a   prom/prometheus:latest   "/bin/prometheus --c…"   7 seconds ago   Up 5 seconds   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp   prometheus

//Check the listening ports
[root@prometheus ~]# ss -antl
State         Recv-Q        Send-Q               Local Address:Port                Peer Address:Port        Process        
LISTEN        0             128                        0.0.0.0:9090                     0.0.0.0:*                          
LISTEN        0             128                        0.0.0.0:22                       0.0.0.0:*                          
LISTEN        0             128                           [::]:9090                        [::]:*                          
LISTEN        0             128                           [::]:22                          [::]:* 

Open 192.168.129.205:9090/targets (this host's IP address plus port 9090) in a browser
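Instead of refreshing the browser, the Prometheus readiness endpoint /-/ready can be polled from the shell. The following is a minimal bash sketch, assuming curl is installed; the function name wait_ready and its timeout and interval defaults are illustrative, not part of Prometheus.

```shell
#!/usr/bin/env bash
# Sketch: poll an HTTP endpoint until it answers with a 2xx status or a timeout expires.
# wait_ready and its defaults (60s timeout, 2s interval) are hypothetical.
wait_ready() {
    local url="$1" timeout="${2:-60}" interval="${3:-2}"
    local waited=0
    # curl -f treats HTTP 4xx/5xx as failures; -s hides the progress meter.
    until curl -fs -o /dev/null "$url"; do
        if (( waited >= timeout )); then
            echo "not ready after ${timeout}s: $url" >&2
            return 1
        fi
        sleep "$interval"
        (( waited += interval ))
    done
    echo "ready: $url"
}

# Usage:
#   wait_ready http://192.168.129.205:9090/-/ready 60
```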

Deploy node_exporter

Preparation (package: node_exporter-1.3.0.linux-amd64.tar.gz)

//Set the hostname
[root@localhost ~]# hostnamectl  set-hostname node-exporter
[root@localhost ~]# bash

//Disable the firewall and SELinux
[root@node-exporter ~]# systemctl disable --now firewalld.service 
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@node-exporter ~]# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config 
[root@node-exporter ~]# setenforce 0
[root@node-exporter ~]# reboot

//Download the package
[root@node-exporter ~]# cd /usr/src/
[root@node-exporter src]# wget https://github.com/prometheus/node_exporter/releases/download/v1.3.0/node_exporter-1.3.0.linux-amd64.tar.gz
[root@node-exporter src]# ls
debug  kernels  node_exporter-1.3.0.linux-amd64.tar.gz

//Extract and rename
[root@node-exporter src]# tar -xf node_exporter-1.3.0.linux-amd64.tar.gz  -C /usr/local/
[root@node-exporter src]# mv /usr/local/node_exporter-1.3.0.linux-amd64/ /usr/local/node_exporter
[root@node-exporter src]# ls /usr/local/
bin  etc  games  include  lib  lib64  libexec  node_exporter  sbin  share  src

//Write the systemd .service unit file (section names are case-sensitive: [Unit], not [unit])
[root@node-exporter ~]# cat > /usr/lib/systemd/system/node_exporter.service   <<EOF
[Unit]
Description=The node_exporter Server
After=network.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
RestartSec=15s
SyslogIdentifier=node_exporter

[Install]
WantedBy=multi-user.target
EOF

[root@node-exporter ~]# systemctl daemon-reload

//Start and enable the service
[root@node-exporter ~]# systemctl enable --now node_exporter
Created symlink from /etc/systemd/system/multi-user.target.wants/node_exporter.service to /usr/lib/systemd/system/node_exporter.service.
[root@node-exporter ~]# systemctl status node_exporter
● node_exporter.service
   Loaded: loaded (/usr/lib/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since 四 2021-12-30 00:24:44 CST; 1s ago
 Main PID: 62288 (node_exporter)
    Tasks: 4
   Memory: 6.5M
   CGroup: /system.slice/node_exporter.service
           └─62288 /usr/local/node_exporter/node_exporter

//Check the listening ports
[root@node-exporter ~]# ss -antl
State         Recv-Q        Send-Q               Local Address:Port                Peer Address:Port        Process        
LISTEN        0             128                        0.0.0.0:22                       0.0.0.0:*                          
LISTEN        0             128                              *:9100                           *:*                          
LISTEN        0             128                           [::]:22                          [::]:*  
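To confirm that node_exporter is actually serving metrics (not just listening on 9100), its /metrics output can be checked for a known metric such as node_load1. A small sketch; the function name has_metric is hypothetical and it only understands the Prometheus text exposition format.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a metric name appears in Prometheus text-format output on stdin.
# Sample lines: `node_load1 0.05` or `node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5`.
has_metric() {
    local name="$1"
    # Match the name at line start followed by a label set `{` or a space.
    # HELP/TYPE comment lines start with `#`, so they never match.
    grep -Eq "^${name}(\{| )"
}

# Usage on the node-exporter host:
#   curl -s http://localhost:9100/metrics | has_metric node_load1 && echo "node_exporter OK"
```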

Add the node to Prometheus

Edit the Prometheus configuration file prometheus.yml in the /prometheus/config directory

[root@prometheus ~]# cd /prometheus/config/
[root@prometheus config]# vi prometheus.yml 
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node1"                      # job name for the new node
    static_configs:
      - targets: ["192.168.129.33:9100"]   # node_exporter host IP and port

//Restart the prometheus container
[root@prometheus ~]# docker restart prometheus
prometheus
[root@prometheus ~]# docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED          STATUS         PORTS                                       NAMES
268b771e2b8a   prom/prometheus:latest   "/bin/prometheus --c…"   42 minutes ago   Up 5 seconds   0.0.0.0:9090->9090/tcp, :::9090->9090/tcp   prometheus
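Adding further nodes is the same edit repeated. The sketch below appends a static job to the end of prometheus.yml; it assumes, as in the file above, that scrape_configs is the last top-level section. The helper name add_scrape_job and the node2 host in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: append a static scrape job to prometheus.yml.
# Assumption: scrape_configs is the LAST top-level section of the file, so appending
# at the end adds a new entry to that list. add_scrape_job is a hypothetical helper.
add_scrape_job() {
    local cfg="$1" job="$2" target="$3"
    # Prometheus rejects configs with duplicate job names, so refuse to add one twice.
    if grep -q "job_name: \"${job}\"" "$cfg"; then
        echo "job ${job} already present in ${cfg}" >&2
        return 1
    fi
    cat >> "$cfg" <<EOF

  - job_name: "${job}"
    static_configs:
      - targets: ["${target}"]
EOF
}

# Usage (node2 / 192.168.129.34 are hypothetical; restart the container afterwards):
#   add_scrape_job /prometheus/config/prometheus.yml node2 192.168.129.34:9100
```

Because the function appends in place, the file keeps its inode, so the copy bind-mounted into the container sees the change; in that case docker kill --signal=HUP prometheus should also make Prometheus reload its configuration without a full restart.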

Open 192.168.129.205:9090/targets (the prometheus host's IP address plus port 9090) in a browser
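The same information as the Targets page is available from the HTTP API at /api/v1/targets. A rough sketch that counts healthy targets with grep alone, assuming Prometheus's compact JSON output ("health":"up" with no spaces); a JSON tool such as jq would be more robust. The function name count_up_targets is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count targets reported as healthy in /api/v1/targets JSON read from stdin.
# Assumes compact JSON ("health":"up" without spaces). count_up_targets is hypothetical.
count_up_targets() {
    # grep -o prints each match on its own line; wc -l counts them; tr strips padding.
    grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Usage:
#   curl -s http://192.168.129.205:9090/api/v1/targets | count_up_targets
```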
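View the collected metrics
Open 192.168.129.205:9090/metrics in a browser (this page shows Prometheus's own metrics; the node's raw metrics are served at 192.168.129.33:9100/metrics)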

Deploy the Grafana visualization tool

Pull the official grafana/grafana image

[root@prometheus ~]# docker pull grafana/grafana
Using default tag: latest
latest: Pulling from grafana/grafana
97518928ae5f: Pull complete 
5b58818b7f48: Pull complete 
d9a64d9fd162: Pull complete 
4e368e1b924c: Pull complete 
867f7fdd92d9: Pull complete 
387c55415012: Pull complete 
07f94c8f51cd: Pull complete 
ce8cf00ff6aa: Pull complete 
e44858b5f948: Pull complete 
4000fdbdd2a3: Pull complete 
Digest: sha256:18d94ae734accd66bccf22daed7bdb20c6b99aa0f2c687eea3ce4275fe275062
Status: Downloaded newer image for grafana/grafana:latest
docker.io/grafana/grafana:latest
[root@prometheus ~]# docker images
REPOSITORY                                           TAG       IMAGE ID       CREATED         SIZE
prom/prometheus                                      latest    a3d385fc29f9   11 days ago     201MB
grafana/grafana                                      latest    9b957e098315   2 weeks ago     275MB

Run a container from the official grafana image

[root@prometheus ~]# docker run -d --name grafana -p 3000:3000 --restart always grafana/grafana
32239d6bf896ca59c12df52d070865a007a2ea44de43149d1cacdd7582a5d2b0
[root@prometheus ~]# docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED          STATUS          PORTS                                       NAMES
32239d6bf896   grafana/grafana          "/run.sh"                30 seconds ago   Up 28 seconds   0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   grafana
268b771e2b8a   prom/prometheus:latest   "/bin/prometheus --c…"   54 minutes ago   Up 6 minutes    0.0.0.0:9090->9090/tcp, :::9090->9090/tcp   prometheus
[root@prometheus ~]# ss -antl
State         Recv-Q        Send-Q               Local Address:Port                Peer Address:Port        Process        
LISTEN        0             128                        0.0.0.0:3000                     0.0.0.0:*                          
LISTEN        0             128                        0.0.0.0:9090                     0.0.0.0:*                          
LISTEN        0             128                        0.0.0.0:22                       0.0.0.0:*                          
LISTEN        0             128                           [::]:3000                        [::]:*                          
LISTEN        0             128                           [::]:9090                        [::]:*                          
LISTEN        0             128                           [::]:22                          [::]:* 

Open 192.168.129.205:3000 (the prometheus host's IP address plus port 3000) in a browser

  • Default username: admin, password: admin


Enter a new password when prompted (Grafana asks for one because this is the first login)

You are taken to the home page
Configure the data source
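The data source can also be created through Grafana's HTTP API (POST /api/datasources) instead of the web form. A sketch that builds the request body; the data source name and URL are example values for this setup, and the function name datasource_json is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the JSON body for Grafana's POST /api/datasources that registers
# a Prometheus data source. datasource_json is a hypothetical helper; the name and
# URL must not contain double quotes, since no JSON escaping is done.
datasource_json() {
    local name="$1" url="$2"
    # access "proxy" means the Grafana server (not the browser) queries Prometheus.
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
        "$name" "$url"
}

# Usage (admin:admin is the default login; change it after first login):
#   datasource_json Prometheus http://192.168.129.205:9090 |
#     curl -s -u admin:admin -H 'Content-Type: application/json' \
#          -X POST --data-binary @- http://192.168.129.205:3000/api/datasources
```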
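Import a dashboard template
Dashboard templates are available on the Grafana dashboard download site. There are two ways to import one:

  • Download the JSON file locally, then upload it to import it
  • Enter the dashboard ID directly and click Load; the template is loaded automatically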

[Figure: the resulting dashboard]

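Deploy cAdvisor (on the node-exporter host; Docker must be installed there first, following the same steps as on the prometheus host)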

[root@node-exporter ~]# docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
--privileged \
--device=/dev/kmsg \
google/cadvisor
ac1ef6a06434feaf84272a454a95c64178eeac37b6d57d6aa544f4d0cd744418

Access

Open 192.168.129.33:8080 (the node-exporter host's IP address plus port 8080) in a browser
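cAdvisor's own metrics endpoint (port 8080, path /metrics) can also be used to check which containers it sees. A sketch under the assumption that Docker containers carry a name label on the container_last_seen series (cgroups without a container name are skipped); the function name list_container_names is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the container names cAdvisor reports, reading its /metrics output on stdin.
# Assumption: Docker containers have a non-empty name="..." label on container_last_seen.
# list_container_names is a hypothetical helper.
list_container_names() {
    # Look at one per-container series only, extract the exact `name` label
    # (preceded by `{` or `,`), strip the label syntax, and de-duplicate.
    grep '^container_last_seen{' \
        | grep -o '[{,]name="[^"][^"]*"' \
        | sed 's/^[{,]name="\(.*\)"$/\1/' \
        | sort -u
}

# Usage on the node-exporter host:
#   curl -s http://localhost:8080/metrics | list_container_names
```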
Add the cAdvisor target to prometheus.yml

[root@prometheus ~]# vi /prometheus/config/prometheus.yml 
[root@prometheus ~]# cat /prometheus/config/prometheus.yml 
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node1"                      # job name for the node
    static_configs:
      - targets: ["192.168.129.33:9100"]   # node_exporter host IP and port
  - job_name: "docker"
    static_configs:
      - targets: ["192.168.129.33:8080"]   # cAdvisor host IP and port
 
//Restart prometheus
[root@prometheus ~]# docker restart prometheus
prometheus

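Test

The containers are now being monitored.

However, this dashboard only shows the host's resource information. To view information about the Docker containers themselves, a Docker-specific dashboard is needed.

Find a Docker-related template on the official Grafana site, then import and use it.
A template suited to Docker: dashboard ID 11600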

Configure Alertmanager alerting

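The workflow between Alertmanager and Prometheus is as follows:

  • Prometheus collects the monitoring data.
  • prometheus.yml references the rule files, and the rules define the alert conditions.
  • Prometheus pushes alerts to Alertmanager, whose configuration defines the sender and the recipients.
  • Alertmanager sends the notifications to email, WeChat, and so on.

[Figure: alert severity levels]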

Start and configure AlertManager

[root@prometheus ~]# docker pull prom/alertmanager:latest
latest: Pulling from prom/alertmanager
aa2a8d90b84c: Pull complete 
b45d31ee2d7f: Pull complete 
e64c3c57ffe7: Pull complete 
7665a4a59238: Pull complete 
9a345be9cdfe: Pull complete 
aa42aae1183b: Pull complete 
Digest: sha256:9ab73a421b65b80be072f96a88df756fc5b52a1bc8d983537b8ec5be8b624c5a
Status: Downloaded newer image for prom/alertmanager:latest
docker.io/prom/alertmanager:latest
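[root@prometheus ~]# mkdir /alertmanager
//No alertmanager container exists yet, so create a temporary one just to copy out the default configuration
[root@prometheus ~]# docker create --name alertmanager-tmp prom/alertmanager
[root@prometheus ~]# docker cp alertmanager-tmp:/etc/alertmanager/alertmanager.yml /alertmanager/
[root@prometheus ~]# docker rm alertmanager-tmp
[root@prometheus ~]# ls /alertmanager/
alertmanager.yml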

[root@prometheus ~]# docker run --name alertmanager -d -v /alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml -p 9093:9093 --restart always prom/alertmanager
4afabc63c74ea8372035878ef2ca1f84b289450e585efd3efdb7a87bfcc6c4bb

AlertManager listens on port 9093 by default. Once it has started, browse to http://<IP>:9093 to see its built-in UI. There are no alerts yet, because no alerting rules have been configured to trigger any.
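To see something in the UI before any rule exists, a synthetic alert can be posted to Alertmanager's v2 API (POST /api/v2/alerts). A sketch that builds a single-alert payload; the alert name, instance, and summary are made-up test values, the function name test_alert_json is hypothetical, and because no endsAt is given, Alertmanager should treat the alert as resolved after resolve_timeout.

```shell
#!/usr/bin/env bash
# Sketch: print a one-alert JSON array for Alertmanager's POST /api/v2/alerts.
# test_alert_json is hypothetical; values must not contain double quotes,
# since no JSON escaping is done.
test_alert_json() {
    local alertname="$1" instance="$2" summary="$3"
    printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"warning"},"annotations":{"summary":"%s"}}]\n' \
        "$alertname" "$instance" "$summary"
}

# Usage:
#   test_alert_json TestAlert 192.168.129.33:9100 "manual test" |
#     curl -s -H 'Content-Type: application/json' -X POST --data-binary @- \
#          http://192.168.129.205:9093/api/v2/alerts
```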

Configure Alertmanager email alerts

AlertManager's default configuration file is alertmanager.yml, located at /etc/alertmanager/alertmanager.yml inside the container. The default configuration is:

[root@prometheus ~]# cat /alertmanager/alertmanager.yml 
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

A brief description of what the main sections do:

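  • global: global settings, such as the timeout after which an alert is considered resolved, SMTP settings, and the API URLs of the various notification channels.
  • route: the alert routing policy. It is a tree, and alerts are matched against it depth-first, from left to right.
  • receivers: the alert recipients, using notification methods such as email, WeChat, Slack, or webhooks.
  • inhibit_rules: inhibition rules, which mute alerts matching one set of matchers (the target) while an alert matching another set (the source) exists.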

Now let's configure email notifications, using a QQ mailbox as the example:

global:
  resolve_timeout: 5m
  # mail server
  smtp_from: 'xxxxxxxx@qq.com'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: 'xxxxxxxx@qq.com'
  smtp_auth_password: 'xxxxxxxxxxxxxxx'
  smtp_require_tls: false
  smtp_hello: 'qq.com'
# routing tree
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
# recipients
receivers:
- name: 'email'
  email_configs:
  - to: 'xxxxxxxx@qq.com'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
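
  • smtp_smarthost: the QQ Mail SMTP server. The official address is smtp.qq.com on port 465 or 587; the POP3/SMTP service must also be enabled in the mailbox settings.
  • smtp_auth_password: the authorization code for third-party QQ Mail clients, not the QQ account login password (the login password causes an error). The code is shown when you enable the POP3/SMTP service in the QQ Mail settings.
  • smtp_require_tls: whether to use TLS; enable or disable it depending on your environment. If you get the error email.loginAuth failed: 530 Must issue a STARTTLS command first, set it to true. Note in particular that if TLS is enabled and you get starttls failed: x509: certificate signed by unknown authority, set insecure_skip_verify: true inside tls_config under email_configs to skip TLS verification.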

To configure email alerts, first enable the SMTP service in the mailbox settings and obtain the authorization code.

Edit the alertmanager.yml file

[root@prometheus ~]# cat /alertmanager/alertmanager.yml 
global:
  resolve_timeout: 5m
  smtp_from: '2464146612@qq.com'  # sender address
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '2464146612@qq.com'
  smtp_auth_password: 'rgxloknrxwpyebcc'
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '2464146612@qq.com'
    send_resolved: true 
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
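Before restarting the container, the edited file can be validated with amtool check-config. A sketch assuming the prom/alertmanager image ships amtool on its PATH; the function name check_am_config is hypothetical. Once the file references templates, the template directory also has to be mounted at the same path for the check to find them.

```shell
#!/usr/bin/env bash
# Sketch: validate an alertmanager.yml with amtool from the prom/alertmanager image.
# Assumption: the image includes amtool on its PATH. check_am_config is hypothetical.
check_am_config() {
    local cfg="$1"
    local dir file
    dir="$(cd "$(dirname "$cfg")" && pwd)"
    file="$(basename "$cfg")"
    # Mount the config directory read-only and run amtool instead of the default entrypoint.
    docker run --rm --entrypoint amtool -v "${dir}:/cfg:ro" prom/alertmanager \
        check-config "/cfg/${file}"
}

# Usage:
#   check_am_config /alertmanager/alertmanager.yml
```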

Add Alertmanager alerting rules to Prometheus

Next, configure the Alertmanager address and the alerting rules in Prometheus. Create the alerting rule file node_up.rules as follows:

[root@prometheus ~]# mkdir /prometheus/rules && cd /prometheus/rules
[root@prometheus rules]# vi node_up.rules
[root@prometheus rules]# cat node_up.rules 
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node1"} == 0          # job_name as set in prometheus.yml
    for: 15s
    labels:
      severity: 1 
      team: node
    annotations:
      summary: "{{ $labels.instance }} 已停止运行超过 15s!"

Edit prometheus.yml to add the Alertmanager address and the rule files

[root@prometheus ~]# vi /prometheus/config/prometheus.yml 
...
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 192.168.129.205:9093        # Alertmanager address and port
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/prometheus/rules/*.rules"
  # - "first_rules.yml"
  # - "second_rules.yml"
...

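//Recreate the prometheus container so the host's /prometheus/rules directory is mounted at the path used in rule_files (the original container only mounts prometheus.yml, so the rule files would not be visible inside it)
[root@prometheus ~]# docker rm -f prometheus
[root@prometheus ~]# docker run --name prometheus -d --restart always -p 9090:9090 -v /prometheus/config/prometheus.yml:/etc/prometheus/prometheus.yml:ro -v /prometheus/rules:/usr/local/prometheus/rules:ro prom/prometheus:latest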

Check the corresponding rule in the Prometheus web UI (Alerts page)
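Prometheus alerts can be in one of three states: Inactive, Pending, and Firing.

  • Inactive: the rule is being evaluated, but its condition is not met, so no alert is active.
  • Pending: the alert expression is true but has not yet been true for the duration set by for (15s in the rule above). If it stays true that long, the alert moves to Firing; if it becomes false first, it returns to Inactive.
  • Firing: Prometheus sends the alert to AlertManager, which groups, inhibits, or silences it as configured and notifies all receivers. Once the condition clears, the alert returns to Inactive, and the cycle repeats.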
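While an alert is pending or firing, Prometheus also exposes it as the ALERTS series with an alertstate label, which can be queried through /api/v1/query. A grep-based sketch that extracts the state from the JSON response; the function name alert_states is hypothetical, and an empty result means there is no ALERTS series, i.e. the alert is inactive.

```shell
#!/usr/bin/env bash
# Sketch: print the distinct alert states (pending/firing) found in the JSON returned
# for a query such as ALERTS{alertname="node-up"}. alert_states is hypothetical.
# No output means no ALERTS series exists, i.e. the alert is inactive.
alert_states() {
    grep -o '"alertstate":"[a-z]*"' | sed 's/"alertstate":"\([a-z]*\)"/\1/' | sort -u
}

# Usage:
#   curl -s http://192.168.129.205:9090/api/v1/query \
#        --data-urlencode 'query=ALERTS{alertname="node-up"}' | alert_states
```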

Trigger an alert and send an email

The rule defined above checks whether the job="node1" target is up, so stopping the node_exporter service on the node simulates the node going down, satisfies the alert condition, and triggers the rule.

[root@node-exporter ~]# docker ps
CONTAINER ID   IMAGE             COMMAND                  CREATED       STATUS       PORTS                                       NAMES
ac1ef6a06434   google/cadvisor   "/usr/bin/cadvisor -…"   2 hours ago   Up 2 hours   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp   cadvisor
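//node_exporter was installed as a systemd service, not a container (docker ps lists only cadvisor), so stop it with systemctl
[root@node-exporter ~]# systemctl stop node_exporter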

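After the service stops, within about 15s (one scrape interval) the node1 target on the Prometheus Targets page shows as DOWN. At the next rule evaluation, the alert on the Alerts page changes from the green node-up (0 active) Inactive state to the yellow node-up (1 active) Pending state; once the condition has held for another 15s (the for duration), it turns red and enters the Firing state. Prometheus then sends the alert to AlertManager, which emails the recipients according to its configuration.
Check the email that arrives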

Configure a custom email template for AlertManager

The alerting above already works, but we would like the alert emails to be easier to read.

Alertmanager supports custom email templates.

First, create a template file email.tmpl:

[root@prometheus ~]# mkdir -p /prometheus/alertmanager/template && cd /prometheus/alertmanager/template
[root@prometheus template]# vim email.tmpl
{{ define "email.from" }}2464146612@qq.com{{ end }}
{{ define "email.to" }}2464146612@qq.com{{ end }}
{{ define "email.to.html" }}
{{ range .Alerts }}
=========start==========<br>
Alert source: prometheus_alert <br>
Severity: {{ .Labels.severity }}<br>
Alert name: {{ .Labels.alertname }} <br>
Instance: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
=========end==========<br>
{{ end }}
{{ end }}

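A quick explanation: the template file above defines three named templates, email.from, email.to, and email.to.html, which can be referenced directly in alertmanager.yml. email.to.html is the body of the email to send; both HTML and plain-text bodies are supported, and HTML is used here for a cleaner layout. {{ range .Alerts }} is a loop that iterates over the matching alerts; the fields shown are the same as in the default email, with only the key values extracted. Note that .StartsAt.Format takes a Go time layout, which must be written using Go's reference time 2006-01-02 15:04:05 rather than an arbitrary example date. Next, add a templates entry to alertmanager.yml as follows: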

[root@prometheus ~]# vi /alertmanager/alertmanager.yml 
[root@prometheus ~]# cat  /alertmanager/alertmanager.yml 
global:
  resolve_timeout: 5m
  smtp_from: '{{ template "email.from" . }}'  #定义发送的邮箱
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '2464146612@qq.com'  # auth username is not templated, so give the literal address
  smtp_auth_password: 'rgxloknrxwpyebcc'
  smtp_require_tls: false
  smtp_hello: 'qq.com'
templates:
  - '/etc/alertmanager-tmpl/email.tmpl'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '{{ template "email.to" . }}'
    html: '{{ template "email.to.html" . }}'
    send_resolved: true 
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

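Then change the AlertManager startup command so the local email.tmpl is mounted into the container at the location referenced under templates, and restart it.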

[root@prometheus ~]# docker rm -f alertmanager
[root@prometheus ~]# docker run --name alertmanager -d -v /alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml -v /prometheus/alertmanager/template:/etc/alertmanager-tmpl:ro -p 9093:9093 --restart always prom/alertmanager

After the restart, trigger the alert condition the same way (stop the node_exporter service); the email is now delivered in the template format.
