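Spring Boot + Prometheus + Grafana + Alertmanager + QQ Mail: system monitoring and alerting
Prerequisites: a local Windows 11 environment, JDK 8, and a VPN/proxy tool (optional; the downloads work without one, but they are painfully slow).
1. Spring Boot project
The Spring Boot project is not mandatory here; any runnable Java program will do. A quickly scaffolded Spring Boot project is used only to keep the demo short.
Initializing a Spring Boot project is basic, so it is skipped here.
The figure below shows the test project I created locally:
**** Important: add the following bean to the main application class. It registers applicationName with Prometheus as a common tag so that applications can be told apart.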
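// Imports needed: org.springframework.context.annotation.Bean,
// org.springframework.beans.factory.annotation.Value, io.micrometer.core.instrument.MeterRegistry,
// org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer
@Bean
MeterRegistryCustomizer<MeterRegistry> configurer(
        @Value("${spring.application.name}") String applicationName) {
    // Attach application=<name> to every metric this service exports
    return registry -> registry.config().commonTags("application", applicationName);
}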
Below is this project's pom file; for testing you can copy and paste it directly.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.5.12</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.plmxs</groupId>
    <artifactId>plmxs</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>plmxs</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
** Note: the two dependencies that matter most here are:
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
Below is the project's application.yml:
server:
  port: 4562
  tomcat:
    mbeanregistry:
      enabled: true
spring:
  application:
    name: my-plmxs
management:
  endpoints:
    web:
      exposure:
        include: "*"
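Before wiring up Prometheus, it can help to confirm that the application really serves metrics and that the application tag is attached. The following is a minimal sketch using only the JDK 8 standard library; the class name ActuatorTagCheck and its helper methods are hypothetical, and the URL assumes the port (4562) and application name (my-plmxs) from the application.yml above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the demo project.
class ActuatorTagCheck {

    /** Counts metric sample lines (not blank, not # comments) carrying application="<appName>". */
    static int countTaggedSamples(String body, String appName) {
        String tag = "application=\"" + appName + "\"";
        int count = 0;
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            if (trimmed.contains(tag)) {
                count++;
            }
        }
        return count;
    }

    /** Downloads the text served at the given URL. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://localhost:4562/actuator/prometheus"; // port from application.yml
        try {
            int tagged = countTaggedSamples(fetch(url), "my-plmxs");
            System.out.println("Samples tagged with application=my-plmxs: " + tagged);
        } catch (IOException e) {
            System.out.println("Could not reach " + url + ": " + e.getMessage());
        }
    }
}
```

If the count is zero while the endpoint responds, the MeterRegistryCustomizer bean is probably not being picked up.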
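2. Prometheus
Download: https://prometheus.io/download/
At the time of writing, the official site looked like the figure below:
After downloading, extract the archive; the directory looks like this:
Double-click prometheus.exe to run it, as shown below:
As the figure shows, Prometheus has started.
Open 127.0.0.1:9090 in a browser to see the Prometheus dashboard.
At this point the basic Prometheus setup is done. Default address: 127.0.0.1:9090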
3. Grafana download and installation
Download: https://grafana.com/grafana/download (just click the download button).
Directory after extraction:
Run grafana-server.exe. Default address: 127.0.0.1:3000
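With two services now started by double-clicking executables, a quick check that both are actually listening can save some confusion later. This is only a sketch with a hypothetical class name (PortCheck); it assumes the default ports mentioned above, 9090 for Prometheus and 3000 for Grafana, and merely tests whether a TCP connection succeeds.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of any of the tools.
class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unreachable
        }
    }

    public static void main(String[] args) {
        // Default ports named in this article
        System.out.println("Prometheus (9090): " + isListening("127.0.0.1", 9090, 1000));
        System.out.println("Grafana    (3000): " + isListening("127.0.0.1", 3000, 1000));
    }
}
```

An open port only shows that something is listening there; opening the address in a browser remains the real confirmation.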
****** Next comes the more important configuration file, prometheus.yml:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 127.0.0.1:9093 ## the Alertmanager address
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "./rule/host_monitor.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# here it is the Spring Boot application (the job name is kept from the default file).
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    metrics_path: '/actuator/prometheus' # the metrics endpoint path exposed by your project
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:4562'] ## your project's address; remember to change it
Restart Prometheus for the configuration to take effect; the same applies on Linux.
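After the restart, the up metric shows whether Prometheus can scrape the application. Besides the web UI, it can be read through the Prometheus HTTP API endpoint /api/v1/query. The sketch below is one way to do that from Java 8; the class PromQuery and its methods are hypothetical names, and the base URL assumes the default address used in this article.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for running an instant query against Prometheus.
class PromQuery {

    /** Builds the instant-query URL for a PromQL expression. */
    static String buildQueryUrl(String baseUrl, String promql) {
        try {
            return baseUrl + "/api/v1/query?query=" + URLEncoder.encode(promql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    /** Performs a GET request and returns the response body as text. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = buildQueryUrl("http://127.0.0.1:9090", "up");
        try {
            // The JSON result lists one sample per target; "1" means the last scrape succeeded
            System.out.println(get(url));
        } catch (IOException e) {
            System.out.println("Prometheus not reachable at " + url + ": " + e.getMessage());
        }
    }
}
```

For the configuration above, the result should contain a sample with instance="localhost:4562" and job="prometheus" while the application is running.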
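4. Alertmanager
Download: https://prometheus.io/download/
See the figure below:
Just download and install it, then modify the configuration file (alertmanager.yml) as follows: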
global:
  resolve_timeout: 5m
  smtp_from: 151xxxxxx276@qq.com
  smtp_auth_username: 1519xxx276@qq.com
  smtp_auth_password: 'your-authorization-code' ## enable the SMTP service in QQ Mail to obtain this authorization code
  smtp_require_tls: false
  smtp_smarthost: 'smtp.qq.com:465'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 1h
  receiver: 'email-test'
receivers:
  - name: 'email-test'
    email_configs:
      - to: 15xxxxxxxxx3276@qq.com
        send_resolved: true
inhibit_rules: # inhibition rules
  - source_match: # labels of the source (inhibiting) alert
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
After editing, remember to restart Alertmanager. The startup screen looks like this:
Default address: 127.0.0.1:9093
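The email settings can be exercised on their own, before any Prometheus rule exists, by pushing a hand-made alert to Alertmanager's v2 API (POST /api/v2/alerts). The following is a sketch under assumptions: the class TestAlert, the alert name smoke-test, and the summary text are all made up for illustration, and the URL assumes the default address above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that sends one hand-made alert to Alertmanager.
class TestAlert {

    /** Escapes backslashes and double quotes; enough for the simple values used here. */
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the JSON array expected by POST /api/v2/alerts, containing a single alert. */
    static String buildAlertJson(String alertname, String instance, String summary) {
        return "[{\"labels\":{\"alertname\":\"" + esc(alertname)
                + "\",\"instance\":\"" + esc(instance)
                + "\"},\"annotations\":{\"summary\":\"" + esc(summary) + "\"}}]";
    }

    /** Posts the JSON body and returns the HTTP status code. */
    static int post(String url, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        int code = conn.getResponseCode();
        conn.disconnect();
        return code;
    }

    public static void main(String[] args) {
        String url = "http://127.0.0.1:9093/api/v2/alerts"; // default Alertmanager address
        String json = buildAlertJson("smoke-test", "localhost:4562", "manual test alert");
        try {
            System.out.println("Alertmanager answered HTTP " + post(url, json));
        } catch (IOException e) {
            System.out.println("Alertmanager not reachable at " + url + ": " + e.getMessage());
        }
    }
}
```

If the SMTP settings are right, an email should arrive shortly after group_wait elapses. Since the alert is not re-sent, Alertmanager should treat it as resolved once resolve_timeout passes, and with send_resolved: true a second "resolved" email is likely to follow.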
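5. Alerting with Alertmanager
The rule file referenced in prometheus.yml above simulates an outage: once the service has been down for 5 seconds, an email alert is sent.
rule_files:
  - "./rule/host_monitor.yml"
So, in the same directory as prometheus.yml, create a rule directory and, inside it, a host_monitor.yml file.
The file contents are as follows: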
groups:
  - name: node-up
    rules:
      - alert: node-up
        expr: up == 0
        for: 5s # the alert fires once the service has been down for more than 5 seconds
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}} Instance has been down for more than 5 seconds"
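One detail worth keeping in mind: the for clause is only checked when rules are evaluated, and prometheus.yml above sets evaluation_interval to 15s. The sketch below is a simplified model of that behaviour as I understand it, not Prometheus code. An alert becomes pending at the first evaluation where the expression is true and fires at the first later evaluation where it has been pending for at least the for duration. The class and method names are hypothetical, and scrape delay and Alertmanager's group_wait are ignored.

```java
// Simplified, hypothetical model of how a `for` clause interacts with the evaluation interval.
class ForClauseSketch {

    /**
     * Returns the time (in seconds) of the first rule evaluation at which the alert fires,
     * or -1 if it does not fire within the horizon.
     * Evaluations happen at t = 0, intervalSec, 2 * intervalSec, ...
     * The expression is assumed true from downSinceSec onwards.
     */
    static long firstFiringEval(long downSinceSec, long intervalSec, long forSec, long horizonSec) {
        long activeAt = -1; // evaluation time at which the alert became pending
        for (long t = 0; t <= horizonSec; t += intervalSec) {
            boolean exprTrue = t >= downSinceSec;
            if (!exprTrue) {
                activeAt = -1; // condition not met: no pending alert
                continue;
            }
            if (activeAt < 0) {
                activeAt = t; // first evaluation that sees the condition: pending
            }
            if (t - activeAt >= forSec) {
                return t; // pending long enough: firing
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Service goes down at t = 20 s; values from this article: interval 15 s, for 5 s
        System.out.println("fires at t = " + firstFiringEval(20, 15, 5, 120) + " s");
    }
}
```

In this model the alert fires at t = 45 s, that is 25 seconds after the outage rather than 5, so the alert email can be expected somewhat later than the rule's wording suggests.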
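6. Test
Start the project, wait until it is running stably, then stop it and wait for the alert email.
That completes the smoke-test integration. Alert rules, email templates, and message content can all be extended; see the documentation: https://prometheus.io/docs/alerting/latest/alertmanager/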