Resource download (extraction code: i97g)
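1. Overview
This walkthrough installs the following components. The redis and springboot parts are optional.
- prometheus: the data source
- node_exporter: monitors CPU, memory, and other resources on a Linux server
- redis_exporter: monitors redis
- springboot: needs no extra component; adding the dependencies to the project is enough
- grafana: the visualization UI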
2. Install node_exporter
- Extract
tar -xvf node_exporter-1.7.0.linux-amd64.tar.gz
- Start node_exporter
# Run in the foreground
./node_exporter
# Run in the background
nohup ./node_exporter &
- Open the page
http://172.100.200.243:9100/
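The page on port 9100 links to /metrics, which serves samples as plain text in the Prometheus exposition format. As a rough sketch of what that output looks like to a program, the Python below parses a few such lines. The sample text and the helper name parse_samples are assumptions made for illustration, and the parser only handles simple `name{labels} value` lines, not the full format.

```python
import re

# Matches simple sample lines such as: name{label="v"} 1.5   or   name 1.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_samples(text):
    """Return a list of (metric_name, label_text, value) tuples.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples.append((name, labels or "", float(value)))
    return samples


# Hypothetical excerpt of node_exporter output, for illustration only.
EXAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""

if __name__ == "__main__":
    for sample in parse_samples(EXAMPLE):
        print(sample)
```

To try it against the real exporter, the text could be fetched with urllib.request.urlopen("http://172.100.200.243:9100/metrics") and decoded before being passed to parse_samples.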
3. Install redis_exporter (optional)
- Extract
tar -xvf redis_exporter-v1.29.0.linux-amd64.tar.gz
- Start redis_exporter
# Run in the foreground
./redis_exporter -redis.addr 172.100.200.243:6379 -redis.password 123546
# Run in the background
nohup ./redis_exporter -redis.addr 172.100.200.243:6379 -redis.password 123546 &
- Open the page
http://172.100.200.243:9121/
4. SpringBoot (optional)
- Add the dependencies
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
- Add the configuration to the yml file
management:
  endpoints:
    web:
      exposure:
        include: '*'
  metrics:
    tags:
      application: ${spring.application.name}
Then start the application as usual. The metrics are served at /actuator/prometheus.
5. Install prometheus
- Extract
tar -xvf prometheus-2.37.1.linux-amd64.tar.gz
- Edit the configuration file prometheus.yml
- Start prometheus
# Run in the foreground
./prometheus --web.listen-address=:19090
# Run in the background
nohup ./prometheus --web.listen-address=:19090 &
- Open the page
http://172.100.200.243:19090/
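The edited prometheus.yml is not reproduced in the text. As a hedged sketch, the Python below renders the kind of scrape_configs section this setup would need, one job per endpoint. The job names prometheus, node, and redis are assumptions; app-nms-acs-fttx-server and port 7009 are taken from the alert payload shown later in this article; /actuator/prometheus is the path Spring Boot Actuator uses for its Prometheus endpoint by default.

```python
# Each entry: (job_name, metrics_path, targets). Job names other than
# "app-nms-acs-fttx-server" are assumptions made for this sketch.
JOBS = [
    ("prometheus", "/metrics", ["172.100.200.243:19090"]),
    ("node", "/metrics", ["172.100.200.243:9100"]),
    ("redis", "/metrics", ["172.100.200.243:9121"]),
    ("app-nms-acs-fttx-server", "/actuator/prometheus", ["172.100.200.243:7009"]),
]


def render_scrape_configs(jobs):
    """Render a scrape_configs section as YAML text."""
    lines = ["scrape_configs:"]
    for job_name, metrics_path, targets in jobs:
        lines.append(f'  - job_name: "{job_name}"')
        lines.append(f'    metrics_path: "{metrics_path}"')
        lines.append("    static_configs:")
        quoted = ", ".join(f'"{t}"' for t in targets)
        lines.append(f"      - targets: [{quoted}]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the section so it can be pasted into prometheus.yml.
    print(render_scrape_configs(JOBS), end="")
```

The printed text would replace the scrape_configs section of prometheus.yml; the rest of the file (for example the global block) is left as shipped.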
6. Check the Targets page in prometheus
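The Targets page lists every scrape target together with its state. The same information is available from the Prometheus HTTP API at /api/v1/targets, so a small script can check it as well. The Python below is a minimal sketch: the function names and the example response are assumptions made for illustration, and the response is trimmed to the two fields the sketch reads.

```python
import json
import urllib.request


def summarize_targets(api_response):
    """Map each active target's scrape URL to its health ("up", "down", "unknown")."""
    targets = api_response["data"]["activeTargets"]
    return {t["scrapeUrl"]: t["health"] for t in targets}


def fetch_targets(base_url):
    """Call GET /api/v1/targets on a running Prometheus server."""
    with urllib.request.urlopen(base_url + "/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Hypothetical response, trimmed to the fields used above.
EXAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://172.100.200.243:9100/metrics", "health": "up"},
            {"scrapeUrl": "http://172.100.200.243:9121/metrics", "health": "down"},
        ],
        "droppedTargets": [],
    },
}

if __name__ == "__main__":
    print(summarize_targets(EXAMPLE_RESPONSE))
```

Against the server from this article, summarize_targets(fetch_targets("http://172.100.200.243:19090")) would return the live state.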
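7. Install grafana
- Extract
tar -xvf grafana-enterprise-11.1.3.linux-amd64.tar.gz
- Edit the configuration file defaults.ini to switch the UI language to Chinese
#default_language = en-US
default_language = zh-Hans
- Start
nohup ./grafana-server &
- Open the page
http://172.100.200.243:3000/
The username and password are both admin. After logging in, add a data source.
The same steps apply to redis and springboot.
Dashboard template download page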
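8. Install alertmanager (optional), the alert handling module
This demo handles the case of a service going offline.
Original article link
- Extract
tar -xvf alertmanager-0.27.0.linux-amd64.tar.gz
- Edit the configuration file alertmanager.yml
- Start
nohup ./alertmanager --cluster.advertise-address=0.0.0.0:9093 &
- Open the page
http://172.100.200.243:9093/
- Add the corresponding receiver method to the microservice
- Create a file named first_rules.yml in the same directory as prometheus.yml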
groups:
- name: example
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      # The text says 5 minutes although `for` above is 1m; align the two in real use.
      summary: Instance has been down for more than 5 minutes
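- Edit prometheus.yml, then restart prometheus
- After prometheus restarts
- Test: shut down a healthy service and check that the alert is received
Shut the service down.
After a while the alert turns red.
It also shows up in alertmanager.
The microservice receives the alert message as well.
Alert payload for the service going offline: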
{
  "receiver": "web\\.hook",
  "status": "firing",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "InstanceDown",
        "instance": "172.100.200.243:7009",
        "job": "app-nms-acs-fttx-server",
        "severity": "critical"
      },
      "annotations": {
        "summary": "Instance has been down for more than 5 minutes"
      },
      "startsAt": "2024-08-07T14:57:10.05Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": "http://localhost.localdomain:19090/graph?g0.expr=up+%3D%3D+0\u0026g0.tab=1",
      "fingerprint": "5320c4e6431679ba"
    }
  ],
  "groupLabels": {
    "alertname": "InstanceDown"
  },
  "commonLabels": {
    "alertname": "InstanceDown",
    "instance": "172.100.200.243:7009",
    "job": "app-nms-acs-fttx-server",
    "severity": "critical"
  },
  "commonAnnotations": {
    "summary": "Instance has been down for more than 5 minutes"
  },
  "externalURL": "http://localhost.localdomain:9093",
  "version": "4",
  "groupKey": "{}:{alertname=\"InstanceDown\"}",
  "truncatedAlerts": 0
}
Start the service again and the alert disappears.
Alert recovery payload:
{
  "receiver": "web\\.hook",
  "status": "resolved",
  "alerts": [
    {
      "status": "resolved",
      "labels": {
        "alertname": "InstanceDown",
        "instance": "172.100.200.243:7009",
        "job": "app-nms-acs-fttx-server",
        "severity": "critical"
      },
      "annotations": {
        "summary": "Instance has been down for more than 5 minutes"
      },
      "startsAt": "2024-08-08T02:20:55.05Z",
      "endsAt": "2024-08-08T02:28:40.05Z",
      "generatorURL": "http://localhost.localdomain:19090/graph?g0.expr=up+%3D%3D+0\u0026g0.tab=1",
      "fingerprint": "5320c4e6431679ba"
    }
  ],
  "groupLabels": {
    "alertname": "InstanceDown"
  },
  "commonLabels": {
    "alertname": "InstanceDown",
    "instance": "172.100.200.243:7009",
    "job": "app-nms-acs-fttx-server",
    "severity": "critical"
  },
  "commonAnnotations": {
    "summary": "Instance has been down for more than 5 minutes"
  },
  "externalURL": "http://localhost.localdomain:9093",
  "version": "4",
  "groupKey": "{}:{alertname=\"InstanceDown\"}",
  "truncatedAlerts": 0
}
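The firing and resolved payloads have the same shape and differ mainly in the status and endsAt fields. The receiver method added to the microservice is not shown in the text, so the Python below is only a stand-in sketch of what such a handler might do with the body once it is parsed: it reduces each alert to one readable line. The function name summarize_alerts and the trimmed example payload are assumptions made for illustration, not the author's Controller code.

```python
def summarize_alerts(payload):
    """Return one short line per alert in an Alertmanager webhook payload."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append(
            "{status}: {name} on {instance} (job={job})".format(
                status=alert.get("status", "unknown"),
                name=labels.get("alertname", "?"),
                instance=labels.get("instance", "?"),
                job=labels.get("job", "?"),
            )
        )
    return lines


# Trimmed copy of the firing payload, keeping only the fields read above.
FIRING = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertname": "InstanceDown",
                "instance": "172.100.200.243:7009",
                "job": "app-nms-acs-fttx-server",
                "severity": "critical",
            },
            "startsAt": "2024-08-07T14:57:10.05Z",
            "endsAt": "0001-01-01T00:00:00Z",
        }
    ],
}

if __name__ == "__main__":
    for line in summarize_alerts(FIRING):
        print(line)
```

Reading the status of each entry in alerts, rather than only the top-level status, keeps the handler correct when one notification groups several alerts.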
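During testing, after the microservice was shut down, the Alerts page in prometheus showed a yellow warning (pending), which turned red (firing) after 1 minute. Once it turned red, the Controller received the alert message (sometimes with a delay of about 30 seconds). When the microservice came back online, the red warning disappeared immediately, but the Controller did not receive the recovery message until 2 to 4 minutes later.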