Prometheus (6): Configuring Email Alerts with Prometheus + Alertmanager and Sending Them with a Template

?abc!

Last modified: 2022-02-11 11:25:39

Category: Prometheus. Tags: linux, centos, operations

First published: 2022-02-10 10:49:45

Article link: https://blog.csdn.net/yyuggjggg/article/details/122842086

1 Synchronize the clocks

Before setting up alerting, synchronize the time on every machine and verify it once more.

ntpdate cn.ntp.org.cn

2 Deploying on Linux

Step 1: Download the package

Package: alertmanager-0.16.2.linux-amd64.tar.gz
Link: https://pan.baidu.com/s/1kRDIZ8zPByhjs11JP30e5A
Extraction code: l3i1

Step 2: Upload the archive and extract it into the target directory

[root@localhost ~]# mv alertmanager-0.16.2.linux-amd64.tar.gz /opt/prometheus/
[root@localhost ~]# cd /opt/prometheus/
[root@localhost prometheus]# ls
alertmanager-0.16.2.linux-amd64.tar.gz  prometheus-2.6.1.linux-amd64
grafana-5.3.4-1.x86_64.rpm              prometheus-2.6.1.linux-amd64.tar.gz
[root@localhost prometheus]# tar -zxvf alertmanager-0.16.2.linux-amd64.tar.gz 
alertmanager-0.16.2.linux-amd64/
alertmanager-0.16.2.linux-amd64/LICENSE
alertmanager-0.16.2.linux-amd64/alertmanager.yml
alertmanager-0.16.2.linux-amd64/alertmanager
alertmanager-0.16.2.linux-amd64/amtool
alertmanager-0.16.2.linux-amd64/NOTICE
[root@localhost prometheus]# 
[root@localhost prometheus]# ls
alertmanager-0.16.2.linux-amd64         prometheus-2.6.1.linux-amd64
alertmanager-0.16.2.linux-amd64.tar.gz  prometheus-2.6.1.linux-amd64.tar.gz
grafana-5.3.4-1.x86_64.rpm
[root@localhost prometheus]# mv alertmanager-0.16.2.linux-amd64 alertmanager
[root@localhost prometheus]# ls
alertmanager                            prometheus-2.6.1.linux-amd64
alertmanager-0.16.2.linux-amd64.tar.gz  prometheus-2.6.1.linux-amd64.tar.gz
grafana-5.3.4-1.x86_64.rpm
[root@localhost prometheus]#

Check that the installation succeeded

[root@localhost alertmanager]# ./alertmanager --version
alertmanager, version 0.16.2 (branch: HEAD, revision: 308b7620642dc147794e6686a3f94d1b6fc8ef4d)
  build user:       root@1e9a48272b38
  build date:       20190405-12:27:40
  go version:       go1.11.6
[root@localhost alertmanager]#

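Step 3: Start Alertmanager

Start Alertmanager so that it can receive the alerts sent by Prometheus and deliver notifications through the configured channels.

Run the following in the Alertmanager installation directory: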

[root@localhost alertmanager]# ./alertmanager --config.file=alertmanager.yml

Alertmanager listens on port 9093 by default. Once it is running, open http://<IP>:9093 in a browser to see the built-in UI. No alerting rules have been configured yet, so no alerts are listed at this point.

3 Configuring alert notifications

Look at the directory layout

[root@localhost prometheus]# cd alertmanager/
[root@localhost alertmanager]# ls
alertmanager  alertmanager.yml  amtool  LICENSE  NOTICE
[root@localhost alertmanager]# ll
total 38964
-rwxr-xr-x. 1 3434 3434 23072841 Apr  5  2019 alertmanager
-rw-r--r--. 1 3434 3434      380 Apr  5  2019 alertmanager.yml
-rwxr-xr-x. 1 3434 3434 16801752 Apr  5  2019 amtool
-rw-r--r--. 1 3434 3434    11357 Apr  5  2019 LICENSE
-rw-r--r--. 1 3434 3434      457 Apr  5  2019 NOTICE
[root@localhost alertmanager]#

3.1 The default configuration

[root@localhost alertmanager]# cat alertmanager.yml 
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://127.0.0.1:5001/'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

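3.2 What the main sections do

global: global settings

These include the timeout after which an alert that has received no further updates is treated as resolved, the SMTP settings, and the API addresses of the various notification channels.

route: defines how alerts are dispatched. It is a tree that is matched depth-first, from left to right.

receivers: defines who receives the alert notifications,

for example through the common channels email, wechat, slack and webhook.

inhibit_rules: inhibition rules. While an alert matching one set of matchers (the source) is firing, notifications for alerts matching another set (the target) are suppressed.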
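
How an inhibition rule decides whether to mute an alert can be pictured with a small Python sketch. This is a simplified model and not Alertmanager's implementation: the function names matches and is_inhibited, the dictionary layout and the address 10.0.0.9:9100 are assumptions made for illustration, and only the exact-match form (source_match, target_match, equal) from the default file is covered.

```python
def matches(labels, matchers):
    """Return True if every matcher key/value pair is present in labels."""
    return all(labels.get(key) == value for key, value in matchers.items())


def is_inhibited(target, firing_alerts, rule):
    """Return True if some other firing alert inhibits `target` under `rule`."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing_alerts:
        if source is target:
            continue
        if not matches(source, rule["source_match"]):
            continue
        # Assumption: a missing label is compared as if it were empty.
        if all(source.get(name, "") == target.get(name, "") for name in rule["equal"]):
            return True
    return False


rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}
critical = {"alertname": "node-up", "instance": "192.168.156.133:9100", "severity": "critical"}
warning = {"alertname": "node-up", "instance": "192.168.156.133:9100", "severity": "warning"}
other = {"alertname": "node-up", "instance": "10.0.0.9:9100", "severity": "warning"}  # hypothetical host

firing = [critical, warning, other]
print(is_inhibited(warning, firing, rule))  # True: same alertname and instance as the critical alert
print(is_inhibited(other, firing, rule))    # False: different instance
```

Note that the rule file used later in this article sets severity to 1, so neither the source nor the target matcher applies to it and this default inhibition rule never takes effect for the node-up alert.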

3.3 Email alert configuration

The alert configuration in detail:

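global:
  resolve_timeout: 5m  # time after which an alert with no further updates is treated as resolved; default 5m
  # QQ Mail SMTP server: the official address is smtp.qq.com, port 465 or 587. The POP3/SMTP service must also be enabled in the mailbox settings.
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: 'xxx@qq.com'
  smtp_auth_username: 'xxx@qq.com'
  smtp_auth_password: 'xxxxxx'  # the mailbox authorization code, not the login password
  smtp_require_tls: false
  # Whether to require TLS (STARTTLS); enable or disable it depending on the environment.
  # If you get "email.loginAuth failed: 530 Must issue a STARTTLS command first", set it to true.
  # If TLS is enabled and you get "starttls failed: x509: certificate signed by unknown authority", set insecure_skip_verify: true in the tls_config block of email_configs to skip certificate verification.
  smtp_hello: 'qq.com'

route:  # how alerts are dispatched
  group_by: ['alertname']  # label(s) used to group alerts
  # How long to wait after the first alert of a new group, so that other alerts of the same group can be sent together
  group_wait: 5s
  group_interval: 5s  # how long to wait before notifying about new alerts added to a group that has already been notified
  repeat_interval: 5m  # how long to wait before repeating a notification that was already sent; limits duplicate emails
  receiver: 'email'  # default receiver

receivers:  # who receives the notifications
- name: 'email'  # receiver name
  email_configs:
  # Address that receives the alert email
  - to: 'xxxxxxxx@qq.com'
    send_resolved: true  # also notify when the alert is resolved
# Inhibition rules: while an alert matching the source matchers is firing, alerts matching the target matchers are suppressed.
inhibit_rules: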
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
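
The effect of group_by can be sketched in a few lines of Python: alerts that share the same values for the listed labels land in one group and are notified together. This is only an illustration of the idea, not Alertmanager code; the function name group_alerts, the second host and the disk-full alert name are made up for the example.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the labels listed in group_by."""
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "node-up", "instance": "192.168.156.133:9100"},
    {"alertname": "node-up", "instance": "192.168.156.134:9100"},    # hypothetical second host
    {"alertname": "disk-full", "instance": "192.168.156.133:9100"},  # hypothetical alert name
]
groups = group_alerts(alerts, ["alertname"])
for key, members in groups.items():
    print(key, len(members))
```

With group_by: ['alertname'], the two node-up alerts would go out in one email and the disk-full alert in another. group_wait, group_interval and repeat_interval then control when each group's notification is sent.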

3.4 Applying the configuration

[root@localhost alertmanager]# vim alertmanager.yml 
global:
  resolve_timeout: 5m
  smtp_from: '15***775@qq.com'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '154***75@qq.com'
  smtp_auth_password: 'y***bhjhi'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '154***5@qq.com'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
~
"alertmanager.yml" 25L, 566C written
[root@localhost alertmanager]#

3.5 Check the configuration with amtool

After editing the configuration file, amtool can be used to validate it:

[root@localhost alertmanager]# ./amtool check-config  alertmanager.yml 
Checking 'alertmanager.yml'  SUCCESS
Found:
 - global config
 - route
 - 1 inhibit rules
 - 1 receivers
 - 0 templates

[root@localhost alertmanager]#

3.6 Restart Alertmanager

[root@localhost alertmanager]# ./alertmanager --config.file=alertmanager.yml
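
To exercise the email path without waiting for a real alert, a synthetic alert can be posted to Alertmanager's HTTP API. The sketch below builds such a payload in Python. The alert name manual-test, the helper names and the default path /api/v2/alerts are assumptions; some releases serve /api/v1/alerts instead, so check which path your version exposes before sending anything.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(instance, minutes=5):
    """Build a one-element alert list with labels, annotations and a time window."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": "manual-test", "instance": instance, "severity": "1"},
        "annotations": {"summary": "manual test alert"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(base_url, alerts, path="/api/v2/alerts"):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_test_alert("192.168.156.133:9100")
print(json.dumps(payload, indent=2))
# To actually send it (requires a running Alertmanager):
# post_alerts("http://192.168.156.133:9093", payload)
```

If the alert is accepted, it should appear in the UI on port 9093 and, after group_wait, arrive as an email at the configured receiver.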

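4 Configuring the Alertmanager address and alerting rules in Prometheus

In Prometheus, configure the Alertmanager address and the alerting rules. Create a new rule file, node-up.rules, as follows:

4.1 The node-up.rules rule definition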

groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node-exporter"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"

4.2 Actual steps

[root@localhost prometheus-2.6.1.linux-amd64]# ls
console_libraries  consoles  data  LICENSE  NOTICE  prometheus  prometheus.yml  promtool
[root@localhost prometheus-2.6.1.linux-amd64]# mkdir rules
[root@localhost prometheus-2.6.1.linux-amd64]# cd rules/
[root@localhost rules]# vim node-up.rules
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="agent1"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"
~
~
"node-up.rules" [New] 11L, 237C written
[root@localhost rules]#

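The purpose of this rule is to monitor whether the node is alive.

expr: a PromQL expression that checks whether the targets of job="agent1" are up; it matches when up is 0.
for: how long the alert stays in the Pending state before it becomes Firing (15s here). Once it is Firing, the alert is sent to Alertmanager.
labels and annotations: attach extra identifying information to the alert. With the default email template, all of these labels and annotations, together with any labels already set on the job in prometheus.yml, appear in the email body.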

4.3 Edit prometheus.yml and add the rule files

[root@localhost ~]# cd /opt/prometheus/prometheus-2.6.1.linux-amd64/
[root@localhost prometheus-2.6.1.linux-amd64]# vim prometheus.yml 
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - 192.168.156.133:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/opt/prometheus/prometheus-2.6.1.linux-amd64/rules/*.rules"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'agent1'
    static_configs:
    - targets: ['192.168.156.133:9100']
"prometheus.yml" 32L, 1074C written

4.4 Restart Prometheus

[root@localhost prometheus-2.6.1.linux-amd64]# pkill prometheus
[root@localhost prometheus-2.6.1.linux-amd64]# lsof -i:9090
[root@localhost prometheus-2.6.1.linux-amd64]# ./prometheus --config.file=prometheus.yml &

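4.5 Check that the configuration took effect

Open the Alerts page of the Prometheus web UI (shown as a screenshot in the original post); the node-up rule should be listed there.
If it is, the configuration has been loaded successfully.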

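4.6 The three alert states

A Prometheus alert is in one of three states: Inactive, Pending or Firing.

Inactive: the rule is being evaluated, but its expression currently matches nothing, so no alert has been triggered.
Pending: the expression matches, but has not yet done so for the full duration set in for. The alert waits in this state and moves to Firing once that duration has elapsed; if the expression stops matching first, it returns to Inactive.
Firing: the expression has matched for at least the for duration. Prometheus sends the alert to Alertmanager, which groups, inhibits or silences it according to its configuration and notifies the receivers. Once the condition clears, the alert returns to Inactive, and the cycle repeats.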
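
The transitions between the three states can be sketched as a tiny state machine in Python. This is a simplified model for illustration, not Prometheus code: the function name next_state and the simulated outage window are assumptions, and time is counted in whole seconds.

```python
def next_state(expr_true, active_since, now, hold):
    """Return (state, active_since) after one rule evaluation.

    expr_true: whether the alert expression matched at time `now`
    active_since: time the expression first matched, or None
    hold: the `for` duration in seconds
    """
    if not expr_true:
        return "inactive", None
    if active_since is None:
        active_since = now
    if now - active_since >= hold:
        return "firing", active_since
    return "pending", active_since


# Evaluate every 15s with for: 15s; the expression matches from t=30 until just before t=75.
active = None
history = []
for t in range(0, 105, 15):
    state, active = next_state(30 <= t < 75, active, t, hold=15)
    history.append((t, state))
print(history)
```

In this run the alert is Pending at t=30, Firing at t=45 and t=60, and Inactive again from t=75, which mirrors the green, yellow and red states on the Alerts page.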

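5 Triggering the alert

The rule checks whether the node of job="agent1" is alive, so stopping the node-exporter service simulates a node going down, which satisfies the alert condition and triggers the rule.

Check the configuration to find the monitored node, its port and so on, then stop the corresponding process. (In the listing below the Alertmanager target appears commented out; for alerts to reach Alertmanager it must be set as in section 4.3.)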

[root@localhost prometheus-2.6.1.linux-amd64]# cat prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/opt/prometheus/prometheus-2.6.1.linux-amd64/rules/*.rules"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'agent1'
    static_configs: 
    - targets: ['192.168.156.133:9100']

Find the process listening on that port


[root@localhost prometheus-2.6.1.linux-amd64]# lsof -i:9100
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
node_expo 67601 root    3u  IPv6 629243      0t0  TCP *:jetdirect (LISTEN)
node_expo 67601 root    5u  IPv6 783547      0t0  TCP localhost.localdomain:jetdirect->localhost.localdomain:47248 (ESTABLISHED)
prometheu 76836 root   15u  IPv4 783055      0t0  TCP localhost.localdomain:47248->localhost.localdomain:jetdirect (ESTABLISHED)

Stop the process of the agent1 node

[root@localhost prometheus-2.6.1.linux-amd64]# kill 67601
[root@localhost prometheus-2.6.1.linux-amd64]# lsof -i:9100
[root@localhost prometheus-2.6.1.linux-amd64]#

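After the service is stopped:

Within about 15s, the node-exporter target on the Prometheus Targets page is shown as unhealthy.
About 15s later, the entry on the Alerts page changes from the green node-up (0 active) Inactive state to the yellow node-up (1 active) Pending state.
After a further 15s it turns red (Firing) and the alert is sent to Alertmanager, which then emails the receivers according to its configuration.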
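
The waiting times above add up to a rough upper bound on how long it takes from the failure to the first email being queued. The Python sketch below does that arithmetic with the values used in this article (scrape_interval 15s, evaluation_interval 15s, for 15s, group_wait 5s). It is an estimate under simplifying assumptions: scrape timeouts, network latency and SMTP delivery time are ignored, and the function name is made up.

```python
import math


def worst_case_delay(scrape_interval, evaluation_interval, hold, group_wait):
    """Rough upper bound in seconds from target failure to first notification."""
    # Up to one scrape interval until a failed scrape records up == 0.
    detect = scrape_interval
    # Up to one evaluation interval until the rule sees it (Pending).
    pending = evaluation_interval
    # Firing happens at the first evaluation at least `hold` seconds later.
    firing = math.ceil(hold / evaluation_interval) * evaluation_interval
    # Alertmanager then waits group_wait before the first notification.
    return detect + pending + firing + group_wait


print(worst_case_delay(15, 15, 15, 5))  # 50
```

So with this configuration the first notification should be sent at most about 50 seconds after node_exporter stops; raising for or the intervals lengthens the delay accordingly.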

Check the mailbox for the alert email.

Restart node_exporter

[root@localhost node_export]# nohup ./node_exporter &
[2] 81062
[1]   Terminated              nohup ./node_exporter
[root@localhost node_export]# nohup: ignoring input and appending output to 'nohup.out'

[root@localhost node_export]#

Alertmanager then sends another email, this time for the resolved alert (send_resolved is true).

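5 Sending with a custom template

5.1 Write the template file

Create a template directory inside the Alertmanager installation directory and write the template file there.
The template file looks like this: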

{{ define "email.from" }}xxxxxxxx@qq.com{{ end }}
{{ define "email.to" }}xxxxxxxx@qq.com{{ end }}
{{ define "email.to.html" }}
{{ range .Alerts }}
=========start==========<br>
Alert source: prometheus_alert <br>
Severity: level {{ .Labels.severity }} <br>
Alert type: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
=========end==========<br>
{{ end }}
{{ end }}

Actual steps

[root@localhost alertmanager]# 
[root@localhost alertmanager]# mkdir template
[root@localhost alertmanager]# ls
alertmanager  alertmanager.yml  amtool  data  LICENSE  NOTICE  template
[root@localhost alertmanager]# cd template/
[root@localhost template]# vim email1.tepl
{{ define "email.from" }}xxxxxxxx@qq.com{{ end }}
{{ define "email.to" }}xxxxxxxx@qq.com{{ end }}
{{ define "email.to.html" }}
{{ range .Alerts }}
=========start==========<br>
Alert source: prometheus_alert <br>
Severity: level {{ .Labels.severity }} <br>
Alert type: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Summary: {{ .Annotations.summary }} <br>
Details: {{ .Annotations.description }} <br>
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>
=========end==========<br>
{{ end }}
{{ end }}
~
"email1.tepl" [New] 15L, 550C written
[root@localhost template]#

5.2 Create a new Alertmanager configuration file for testing

global:
  resolve_timeout: 5m
  smtp_from: '{{ template "email.from" . }}'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '{{ template "email.from" . }}'
  smtp_auth_password: 'ymbww****xbhjhi'
  smtp_require_tls: false
  smtp_hello: 'qq.com'
templates:
   -  '/opt/prometheus/alertmanager/template/email1.tmpl'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '{{ template "email.to" . }}'
    html: '{{ template "email.to.html" . }}'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

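email.from, email.to and email.to.html are three named templates defined in the template file; alertmanager.yml can reference them directly.
email.to.html is the body of the email to be sent. Both HTML and plain text are supported; HTML is used here so that the information is displayed more readably.
{{ range .Alerts }} is a loop that iterates over the matching alerts. The fields shown are the same as in the default email above, reduced to the core values.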
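
What the range loop produces can be pictured with a rough Python analogue. This is not how Alertmanager renders templates (it uses Go templates); the function name render_email, the dictionary layout and the timestamp are assumptions chosen to mirror the fields the template reads. In the Go template, Format takes a layout written in terms of Go's reference time 2006-01-02 15:04:05; the strftime pattern below plays the same role.

```python
from datetime import datetime, timezone


def render_email(alerts):
    """Build one HTML block per alert, like the body of `range .Alerts`."""
    blocks = []
    for alert in alerts:
        blocks.append(
            "Alert type: {alertname} <br>\n"
            "Affected host: {instance} <br>\n"
            "Summary: {summary} <br>\n"
            "Triggered at: {started} <br>".format(
                alertname=alert["labels"].get("alertname", ""),
                instance=alert["labels"].get("instance", ""),
                summary=alert["annotations"].get("summary", ""),
                started=alert["starts_at"].strftime("%Y-%m-%d %H:%M:%S"),
            )
        )
    return "\n".join(blocks)


alerts = [{
    "labels": {"alertname": "node-up", "instance": "192.168.156.133:9100"},
    "annotations": {"summary": "192.168.156.133:9100 has been down for more than 15s!"},
    "starts_at": datetime(2022, 2, 10, 3, 30, 0, tzinfo=timezone.utc),  # made-up timestamp
}]
print(render_email(alerts))
```

Each alert in the group contributes one block, so an email for a group of several alerts contains several blocks one after another.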

Actual steps:

[root@localhost alertmanager]# ls
alertmanager  alertmanager.yml  amtool  data  LICENSE  NOTICE  template
[root@localhost alertmanager]# vim alertmanager1.yml
global:
  resolve_timeout: 5m
  smtp_from: '{{ template "email.from" . }}'
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '{{ template "email.from" . }}'
  smtp_auth_password: 'ymbww****xbhjhi'
  smtp_require_tls: false
  smtp_hello: 'qq.com'
templates:
  - '/etc/alertmanager-tmpl/email.tmpl'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '{{ template "email.to" . }}'
    html: '{{ template "email.to.html" . }}'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

"alertmanager1.yml" [New] 29L, 719C written
[root@localhost alertmanager]# ls
alertmanager  alertmanager1.yml  alertmanager.yml  amtool  data  LICENSE  NOTICE  template
[root@localhost alertmanager]#

5.3 Check that the configuration file is valid

[root@localhost alertmanager]# ./amtool check-config  alertmanager1.yml 
Checking 'alertmanager1.yml'  SUCCESS
Found:
 - global config
 - route
 - 1 inhibit rules
 - 1 receivers
 - 1 templates
  SUCCESS

[root@localhost alertmanager]#
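
amtool reports one templates entry here, but the transcript does not show whether a file actually exists at the configured path. A small pre-flight check, sketched in Python, lists the entries that match no file on disk. The function name is made up, and the pattern is copied by hand from the templates: section of alertmanager1.yml rather than parsed from the file.

```python
import glob


def unmatched_template_patterns(patterns):
    """Return the patterns from `templates:` that match no file on disk."""
    return [pattern for pattern in patterns if not glob.glob(pattern)]


# Patterns copied by hand from the `templates:` section of alertmanager1.yml.
missing = unmatched_template_patterns(["/etc/alertmanager-tmpl/email.tmpl"])
for pattern in missing:
    print("no template file matches:", pattern)
```

If anything is printed, either move the template file to that location or change the templates: entry to the path where the file was saved.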

5.4 Start Alertmanager

[root@localhost alertmanager]# ./alertmanager --config.file=alertmanager1.yml
level=info ts=2022-02-10T03:24:22.679397949Z caller=main.go:177 msg="Starting Alertmanager" version="(version=0.16.2, branch=HEAD, revision=308b7620642dc147794e6686a3f94d1b6fc8ef4d)"
level=info ts=2022-02-10T03:24:22.679510727Z caller=main.go:178 build_context="(go=go1.11.6, user=root@1e9a48272b38, date=20190405-12:27:40)"
level=info ts=2022-02-10T03:24:22.68530334Z caller=cluster.go:161 component=cluster msg="setting advertise address explicitly" addr=192.168.156.133 port=9094
level=info ts=2022-02-10T03:24:22.689931066Z caller=cluster.go:632 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2022-02-10T03:24:22.703779166Z caller=main.go:334 msg="Loading configuration file" file=alertmanager1.yml
level=info ts=2022-02-10T03:24:22.707237841Z caller=main.go:428 msg=Listening address=:9093
level=info ts=2022-02-10T03:24:24.690305758Z caller=cluster.go:657 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000287352s
level=info ts=2022-02-10T03:24:32.693591832Z caller=cluster.go:649 component=cluster msg="gossip settled; proceeding" elapsed=10.003586882s

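5.5 Update node-up.rules

The template references {{ .Annotations.description }}, but the earlier node-up.rules does not define a description annotation, so the template would have no value for it.

The rule file created earlier in the Prometheus installation directory therefore needs to be updated: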

[root@localhost prometheus-2.6.1.linux-amd64]# ls
console_libraries  consoles  data  LICENSE  NOTICE  prometheus  prometheus.yml  promtool  rules
[root@localhost prometheus-2.6.1.linux-amd64]# cd rules/
[root@localhost rules]# ls
node-up.rules
[root@localhost rules]# vim node-up.rules 
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="agent1"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"
      description: "{{ $labels.instance }} stopped unexpectedly. Please investigate!"
~
"node-up.rules" 12L, 323C written
[root@localhost rules]#
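
A mismatch like the missing description annotation can be caught before restarting anything. The Python sketch below scans template text for .Annotations.<name> references and reports those the rule does not define. It is a plain text scan under stated assumptions, not a Go template parser; the function name and the inline sample strings are made up for the example.

```python
import re


def missing_annotations(template_text, rule_annotations):
    """Return annotation names used in the template but absent from the rule."""
    used = set(re.findall(r"\.Annotations\.(\w+)", template_text))
    return sorted(used - set(rule_annotations))


template_text = """
{{ .Annotations.summary }}
{{ .Annotations.description }}
"""
print(missing_annotations(template_text, {"summary": "..."}))                        # ['description']
print(missing_annotations(template_text, {"summary": "...", "description": "..."}))  # []
```

The first call corresponds to the rule file before this edit, the second to the rule file after it.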

5.6 Restart the Prometheus service

[root@localhost rules]# 
[root@localhost rules]# cd ..
[root@localhost prometheus-2.6.1.linux-amd64]# ls
console_libraries  consoles  data  LICENSE  NOTICE  prometheus  prometheus.yml  promtool  rules
[root@localhost prometheus-2.6.1.linux-amd64]# pkill prometheus
level=warn ts=2022-02-10T03:28:40.638273674Z caller=main.go:405 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2022-02-10T03:28:40.638327573Z caller=main.go:430 msg="Stopping scrape discovery manager..."
level=info ts=2022-02-10T03:28:40.638335586Z caller=main.go:444 msg="Stopping notify discovery manager..."
level=info ts=2022-02-10T03:28:40.63834017Z caller=main.go:466 msg="Stopping scrape manager..."
level=info ts=2022-02-10T03:28:40.638359536Z caller=main.go:426 msg="Scrape discovery manager stopped"
level=info ts=2022-02-10T03:28:40.638369808Z caller=main.go:440 msg="Notify discovery manager stopped"
level=info ts=2022-02-10T03:28:40.638431616Z caller=manager.go:664 component="rule manager" msg="Stopping rule manager..."
level=info ts=2022-02-10T03:28:40.638478552Z caller=manager.go:670 component="rule manager" msg="Rule manager stopped"
level=info ts=2022-02-10T03:28:40.638521662Z caller=main.go:460 msg="Scrape manager stopped"
[root@localhost prometheus-2.6.1.linux-amd64]# level=info ts=2022-02-10T03:28:40.640008618Z caller=notifier.go:521 component=notifier msg="Stopping notification manager..."
level=info ts=2022-02-10T03:28:40.640035125Z caller=main.go:615 msg="Notifier manager stopped"
level=info ts=2022-02-10T03:28:40.640192411Z caller=main.go:627 msg="See you next time!"

[1]+  Done                    ./prometheus --config.file=prometheus.yml
[root@localhost prometheus-2.6.1.linux-amd64]# lsof -i:9090
[root@localhost prometheus-2.6.1.linux-amd64]# ./prometheus --config.file=prometheus.yml & 
[1] 81615
[root@localhost prometheus-2.6.1.linux-amd64]# level=info ts=2022-02-10T03:28:53.958420258Z caller=main.go:243 msg="Starting Prometheus" version="(version=2.6.1, branch=HEAD, revision=b639fe140c1f71b2cbad3fc322b17efe60839e7e)"
level=info ts=2022-02-10T03:28:53.95851453Z caller=main.go:244 build_context="(go=go1.11.4, user=root@4c0e286fe2b3, date=20190115-19:12:04)"
level=info ts=2022-02-10T03:28:53.958534672Z caller=main.go:245 host_details="(Linux 3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021 x86_64 localhost.localdomain (none))"
level=info ts=2022-02-10T03:28:53.958548683Z caller=main.go:246 fd_limits="(soft=1024, hard=4096)"
level=info ts=2022-02-10T03:28:53.95855905Z caller=main.go:247 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2022-02-10T03:28:53.959002719Z caller=main.go:561 msg="Starting TSDB ..."
level=info ts=2022-02-10T03:28:53.959671934Z caller=web.go:429 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2022-02-10T03:28:53.959878293Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1644301801123 maxt=1644364800000 ulid=01FVEDMKCQGGJ3F9NDEETVAZW0
level=info ts=2022-02-10T03:28:53.959919384Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1644364800000 maxt=1644429600000 ulid=01FVGFR6499R9A354RPZ3BC6ET
level=info ts=2022-02-10T03:28:53.95993753Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1644451200000 maxt=1644458400000 ulid=01FVGS5JZ95MA7N14KF461PTZ5
level=info ts=2022-02-10T03:28:53.959958412Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1644429600000 maxt=1644451200000 ulid=01FVGS5K4K55VJQEH45W337PQK
level=warn ts=2022-02-10T03:28:54.114211565Z caller=head.go:434 component=tsdb msg="unknown series references" count=320781
level=info ts=2022-02-10T03:28:54.116993838Z caller=main.go:571 msg="TSDB started"
level=info ts=2022-02-10T03:28:54.117041776Z caller=main.go:631 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2022-02-10T03:28:54.11854499Z caller=main.go:657 msg="Completed loading of configuration file" filename=prometheus.yml
level=info ts=2022-02-10T03:28:54.118568236Z caller=main.go:530 msg="Server is ready to receive web requests."

[root@localhost prometheus-2.6.1.linux-amd64]# lsof -i:9090
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
prometheu 81615 root    3u  IPv6 838619      0t0  TCP *:websm (LISTEN)
prometheu 81615 root    7u  IPv4 838620      0t0  TCP localhost:43908->localhost:websm (ESTABLISHED)
prometheu 81615 root    8u  IPv6 838621      0t0  TCP localhost:websm->localhost:43908 (ESTABLISHED)
[root@localhost prometheus-2.6.1.linux-amd64]#

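5.7 Testing

The configuration above has some problems: in the test, the email did not seem to pick up the template content as expected. One likely cause, judging from the transcripts above, is that the template was saved as /opt/prometheus/alertmanager/template/email1.tepl while alertmanager1.yml loads /etc/alertmanager-tmpl/email.tmpl; the path under templates: has to point at the file that was actually written.
The steps and the resulting email are kept here for reference: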

[root@localhost node_export]# lsof -i:9100
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
prometheu 83165 root   23u  IPv4 870803      0t0  TCP localhost.localdomain:55416->localhost.localdomain:jetdirect (ESTABLISHED)
node_expo 84543 root    3u  IPv6 870789      0t0  TCP *:jetdirect (LISTEN)
node_expo 84543 root    5u  IPv6 870804      0t0  TCP localhost.localdomain:jetdirect->localhost.localdomain:55416 (ESTABLISHED)
[root@localhost node_export]# kill 84543
[root@localhost node_export]#

After node_exporter is restarted, another email is sent as well.

[root@localhost node_export]# nohup ./node_exporter &
[1] 84685
[root@localhost node_export]# nohup: ignoring input and appending output to 'nohup.out'

[root@localhost node_export]#
