Prometheus + Grafana + WeCom Robot Alerting
An open-source monitoring and alerting system
1.Prometheus installation and configuration
Prometheus download link
Upload to the server
Extract
tar -zxvf prometheus-2.44.0.linux-amd64.tar.gz
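Enter the directory and start it
# change the paths to your own Prometheus installation directory
nohup /usr/local/prometheus-2.44.0/prometheus --config.file="/usr/local/prometheus-2.44.0/prometheus.yml" > ./prometheus.log 2>&1 &
Test in a browser: http://localhost:9090 (Prometheus's default port)
If the page loads, Prometheus started successfully.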
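Instead of refreshing the page by hand, readiness can also be polled from the shell, which is handy after each restart in this guide. The sketch below assumes `curl` is installed; the function name `wait_for_url` and the default of 30 one-second attempts are illustrative choices, while `/-/ready` is Prometheus's own readiness endpoint.

```shell
# wait_for_url URL [TRIES]: poll URL once per second until it answers
# successfully, giving up after TRIES attempts (default 30).
# Returns 0 on success, 1 on timeout. Hypothetical helper.
wait_for_url() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: treat HTTP error codes as failures; -s: no progress output
    if curl -fs -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Usage, assuming the default port: `wait_for_url http://localhost:9090/-/ready && echo "Prometheus is ready"`. The same helper works for the exporters' `/metrics` pages below.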
Enable start on boot
vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=prometheus
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus-2.44.0/prometheus --config.file="/usr/local/prometheus-2.44.0/prometheus.yml"
[Install]
WantedBy=multi-user.target
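Reload the systemd configuration
systemctl daemon-reload
Start the service (stop the nohup instance started above first, otherwise port 9090 is already in use; the same applies to every exporter below)
systemctl start prometheus
Check its status
systemctl status prometheus
Enable start on boot
systemctl enable prometheus
Restart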
systemctl restart prometheus
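Every component in this guide gets an almost identical unit file, so the file can also be generated. This is a hypothetical helper, not part of any tool: `write_unit` writes the same [Unit]/[Service]/[Install] layout shown above, with `Description` set to the service name.

```shell
# write_unit NAME EXEC_START [DIR]
# Writes DIR/NAME.service (DIR defaults to /usr/lib/systemd/system)
# with the same layout as the prometheus.service unit above.
# Hypothetical helper; run it as root when DIR is a system path.
write_unit() {
  name=$1
  exec_start=$2
  dir=${3:-/usr/lib/systemd/system}
  cat > "$dir/$name.service" <<EOF
[Unit]
Description=$name
After=network.target

[Service]
Restart=on-failure
ExecStart=$exec_start

[Install]
WantedBy=multi-user.target
EOF
}
```

For example, as root: `write_unit node_exporter /usr/local/node_exporter-1.5.0/node_exporter`, followed by `systemctl daemon-reload` and `systemctl enable --now node_exporter`. Passing a temporary directory as the third argument lets you inspect the result before touching /usr/lib/systemd/system.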
1.1.node_exporter
node_exporter-1.5.0 download link
Alternatively, choose a version on the official website.
Upload to the server
Extract
tar -zxvf node_exporter-1.5.0.linux-amd64.tar.gz
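Enter the directory and start it
nohup /usr/local/node_exporter-1.5.0/node_exporter > ./node_exporter.log 2>&1 &
Open in a browser: http://localhost:9100/metrics (node_exporter's default port)
If the metrics page loads, node_exporter started successfully.
Enable start on boot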
vi /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://github.com/prometheus/node_exporter
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter-1.5.0/node_exporter
[Install]
WantedBy=multi-user.target
Reload the systemd configuration
systemctl daemon-reload
Start the service
systemctl start node_exporter
Check its status
systemctl status node_exporter
Enable start on boot
systemctl enable node_exporter
Restart
systemctl restart node_exporter
In the Prometheus directory, edit prometheus.yml:
    static_configs:
      - targets: ["localhost:9090"]
  # add the node_exporter job
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100'] # replace localhost with your server's IP; 9100 is the default port
Save, exit, and restart Prometheus
systemctl restart prometheus
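The jobs added to prometheus.yml in this guide all share one shape, and most mistakes come from YAML indentation. A minimal sketch that prints a correctly indented entry; `scrape_job` is a hypothetical helper name, and the output assumes the entry goes under the top-level `scrape_configs:` key with the two-space indentation of the default prometheus.yml.

```shell
# scrape_job NAME TARGET [METRICS_PATH]
# Prints one scrape_configs entry indented for the default prometheus.yml.
# Hypothetical helper; it only prints, it does not edit any file.
scrape_job() {
  printf "  - job_name: '%s'\n" "$1"
  # metrics_path is optional; Prometheus defaults to /metrics
  if [ -n "$3" ]; then
    printf "    metrics_path: '%s'\n" "$3"
  fi
  printf '    static_configs:\n'
  printf "      - targets: ['%s']\n" "$2"
}
```

For example, `scrape_job node_exporter localhost:9100 >> prometheus.yml` appends the job, which is only correct while `scrape_configs` is the last section of the file, as it is in the default configuration. Running `./promtool check config prometheus.yml` (promtool ships in the Prometheus tarball) before restarting catches indentation errors.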
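Open the Prometheus web UI (Status > Targets)
If the node_exporter target is listed, the configuration succeeded.
Query the collected data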
1.2.process_exporter
process-exporter download link
Upload to the server
Extract
tar -zxvf process-exporter-0.7.10.linux-amd64.tar.gz
Enter the directory and create the configuration file
vi process-exporter.yaml
process_names:
  - name: '{{.Comm}}'
    cmdline:
      - '.+'
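With `{{.Comm}}`, process-exporter names each group after the process's short executable name (the comm field in /proc). Assuming the `'.+'` cmdline pattern is matched against the joined argument list, it matches any process with a non-empty command line, which leaves out kernel threads. Under that reading, the groups you will see can be previewed roughly from /proc. This is an approximation for orientation, not the exporter's actual matching code, and `preview_comm_groups` is a hypothetical name.

```shell
# preview_comm_groups [N]: list the N (default 20) largest groups that
# name '{{.Comm}}' with cmdline '.+' would roughly produce.
# Hypothetical helper that approximates process-exporter's grouping.
preview_comm_groups() {
  n=${1:-20}
  for d in /proc/[0-9]*; do
    # Strip the NUL separators from argv; kernel threads have an empty
    # cmdline, which '.+' cannot match, so skip them.
    cmd=$(tr -d '\000' 2>/dev/null < "$d/cmdline") || continue
    [ -n "$cmd" ] || continue
    # comm is the short executable name that {{.Comm}} refers to
    cat "$d/comm" 2>/dev/null || true
  done | sort | uniq -c | sort -rn | head -n "$n"
}
```

`preview_comm_groups 10` prints the ten largest groups as `count name` pairs.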
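Start it for a test
nohup /usr/local/process-exporter-0.7.10/process-exporter -config.path=/usr/local/process-exporter-0.7.10/process-exporter.yaml > ./process_exporter.log 2>&1 &
Open in a browser: http://localhost:9256/metrics (process-exporter's default port)
If the metrics page loads, process-exporter started successfully.
Enable start on boot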
vim /usr/lib/systemd/system/process-exporter.service
[Unit]
Description=process-exporter
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/local/process-exporter-0.7.10/process-exporter -config.path=/usr/local/process-exporter-0.7.10/process-exporter.yaml
[Install]
WantedBy=multi-user.target
Reload the systemd configuration
systemctl daemon-reload
Start the service
systemctl start process-exporter
Check its status
systemctl status process-exporter
Enable start on boot
systemctl enable process-exporter
Restart
systemctl restart process-exporter
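In the Prometheus directory, add the following job to prometheus.yml
  - job_name: process_exporter
    static_configs:
      - targets: ['localhost:9256'] # 9256 is the default port
Restart Prometheus and open its web UI
If the process_exporter target is listed, the configuration succeeded.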
1.3.mysqld_exporter
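mysqld_exporter-0.14.0 download link
Alternatively, choose a version on the official website.
Upload to the server
Extract
tar -zxvf mysqld_exporter-0.14.0.linux-amd64.tar.gz
Enter the directory and add a configuration file
vi /usr/local/mysqld_exporter-0.14.0/.my.cnf
[client]
# MySQL account
user=root
# MySQL password
password=root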
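.my.cnf stores the MySQL password in plain text, so it is worth making it readable by its owner only. A minimal sketch that writes the same [client] section and restricts permissions; `write_my_cnf` is a hypothetical helper, and it assumes the password contains no newline.

```shell
# write_my_cnf FILE USER PASSWORD
# Writes a [client] section for mysqld_exporter and makes FILE
# readable and writable by its owner only. Hypothetical helper.
write_my_cnf() {
  (
    umask 077  # a newly created file starts out as -rw-------
    cat > "$1" <<EOF
[client]
user=$2
password=$3
EOF
  )
  chmod 600 "$1"  # also tighten a file that already existed
}
```

For example: `write_my_cnf /usr/local/mysqld_exporter-0.14.0/.my.cnf exporter 'your-password'`.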
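If you cannot log in to MySQL 5.7 because you have forgotten the password:
vim /etc/my.cnf
# add under [mysqld] (skips privilege checks at login)
skip-grant-tables
# restart the MySQL service
sudo systemctl restart mysqld
# log in
mysql -uroot -p # just press Enter at the password prompt
# set a new password
set password for 'root'@'localhost'=password('root');
If this fails with: ERROR 1290 (HY000): The MySQL server is running with the --skip-grant-tables option so it cannot execute this statement
# reload the grant tables
flush privileges;
# set the password again
set password for 'root'@'localhost'=password('root');
# grant all privileges; in production, create a separate account with only the privileges it needs
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION;
flush privileges;
# exit
exit;
Finally, comment out skip-grant-tables in /etc/my.cnf again and restart MySQL (sudo systemctl restart mysqld).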
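The comment above recommends a dedicated account instead of root. The mysqld_exporter README suggests an account limited to the PROCESS, REPLICATION CLIENT and SELECT privileges with a small connection cap; the sketch below prints that SQL. The account name `exporter` in the usage line is only an example, and the password must not contain single quotes because nothing is escaped.

```shell
# exporter_user_sql USER PASSWORD
# Prints SQL that creates a limited-privilege account for mysqld_exporter,
# following the grants listed in the mysqld_exporter README.
# PASSWORD must not contain single quotes (no escaping is done here).
exporter_user_sql() {
  cat <<EOF
CREATE USER '$1'@'localhost' IDENTIFIED BY '$2' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO '$1'@'localhost';
EOF
}
```

For example, `exporter_user_sql exporter 'S3cret-pass' | mysql -uroot -p`, then put those credentials into .my.cnf instead of root's.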
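Enter the directory and start it
nohup /usr/local/mysqld_exporter-0.14.0/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter-0.14.0/.my.cnf > ./mysqld_exporter.log 2>&1 &
Open in a browser: http://localhost:9104/metrics (mysqld_exporter's default port)
If the metrics page loads, mysqld_exporter started successfully.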
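Beyond checking that /metrics loads, each exporter reports whether it can actually reach its service: `mysql_up` for mysqld_exporter, `nginx_up` for nginx-prometheus-exporter and `redis_up` for redis_exporter, with 1 meaning reachable. A small awk filter, `metric_value` (a hypothetical name), reads the Prometheus text format and prints the value of an unlabeled metric.

```shell
# metric_value NAME
# Reads Prometheus text-format metrics on stdin and prints the value of
# every unlabeled sample whose metric name is exactly NAME.
# "# HELP" / "# TYPE" lines never match because their first field is "#".
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}
```

For example, `curl -s http://localhost:9104/metrics | metric_value mysql_up` should print 1; 0 usually means the credentials in .my.cnf are wrong.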
Enable start on boot
vim /usr/lib/systemd/system/mysqld_exporter.service
[Unit]
Description=mysqld_exporter
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/local/mysqld_exporter-0.14.0/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter-0.14.0/.my.cnf
[Install]
WantedBy=multi-user.target
Reload the systemd configuration
systemctl daemon-reload
Start the service
systemctl start mysqld_exporter
Check its status
systemctl status mysqld_exporter
Enable start on boot
systemctl enable mysqld_exporter
Restart
systemctl restart mysqld_exporter
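In the Prometheus directory, add the following job to prometheus.yml
  - job_name: 'mysqld_exporter' # collects MySQL metrics
    static_configs:
      - targets: ['localhost:9104'] # IP and port of the mysqld_exporter service
Restart Prometheus and open its web UI
If the mysqld_exporter target is listed, the configuration succeeded.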
1.4.nginx_exporter
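Go to the nginx source directory and rebuild nginx with the stub_status module (keep the configure options your current build uses; nginx -V lists them)
./configure --prefix=/usr/local/nginx/ --with-http_stub_status_module --add-module=../nginx-http-flv-module
make
sudo make install
Check that the module is compiled in (run from nginx's sbin directory):
./nginx -V 2>&1 | grep -o with-http_stub_status_module
If the terminal prints with-http_stub_status_module, nginx has the stub_status module enabled.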
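Edit nginx.conf, then reload nginx (./nginx -s reload)
server {
    listen 80;
    # change to your own port; it must match the -nginx.scrape-uri given to the exporter
    location /nginx_status {
        stub_status on;
        access_log off;
        # allow takes an address or CIDR, not a hostname
        allow 127.0.0.1;
        deny all;
    }
}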
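Before starting the exporter, you can check the stub_status page directly. Its plain-text output has a fixed layout: active connections on the first line, the accepts/handled/requests counters on the third, and the reading/writing/waiting counts on the fourth. This sketch turns it into key=value pairs; `parse_stub_status` is a hypothetical name, and it assumes exactly that four-line layout.

```shell
# parse_stub_status: read nginx stub_status output on stdin and print
# key=value pairs. Assumes the standard four-line layout:
#   Active connections: N
#   server accepts handled requests
#    A H R
#   Reading: r Writing: w Waiting: q
parse_stub_status() {
  awk '
    /^Active connections:/ { print "active=" $3 }
    NR == 3 { print "accepts=" $1; print "handled=" $2; print "requests=" $3 }
    /^Reading:/ { print "reading=" $2; print "writing=" $4; print "waiting=" $6 }
  '
}
```

For example, `curl -s http://127.0.0.1:80/nginx_status | parse_stub_status` (use the port from your `listen` directive).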
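Upload nginx-prometheus-exporter to the server
# extract
tar -zxvf nginx_exporter-0.11.0.tar.gz
# start nginx_exporter; the scrape URI must point at the stub_status location configured above
nohup /usr/local/nginx_exporter-0.11.0/nginx-prometheus-exporter -nginx.scrape-uri http://localhost:8080/nginx_status > ./nginx_exporter.log 2>&1 &
Open in a browser: http://localhost:9113/metrics (nginx_exporter's default port)
If the metrics page loads, nginx_exporter started successfully.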
Enable start on boot
vim /usr/lib/systemd/system/nginx_exporter.service
[Unit]
Description=nginx_exporter
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/local/nginx_exporter-0.11.0/nginx-prometheus-exporter -nginx.scrape-uri http://172.16.11.10:7006/nginx_status
[Install]
WantedBy=multi-user.target
Reload the systemd configuration
systemctl daemon-reload
Start the service
systemctl start nginx_exporter
Check its status
systemctl status nginx_exporter
Enable start on boot
systemctl enable nginx_exporter
Restart
systemctl restart nginx_exporter
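In the Prometheus directory, add the following job to prometheus.yml
  - job_name: 'nginx_status' # collects nginx metrics
    metrics_path: '/metrics' # path of the metrics endpoint
    scrape_interval: 5s # scrape interval
    static_configs:
      - targets: ['localhost:9113'] # IP and port of the nginx-prometheus-exporter service
Restart Prometheus; if the nginx_status target is listed, the configuration succeeded.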
1.5.redis_exporter
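redis_exporter download link
Upload to the server
# extract
tar -zxvf redis_exporter-1.50.0.tar.gz
# start (connects to redis://localhost:6379 by default; pass -redis.addr / -redis.password if your Redis differs or requires a password)
nohup /usr/local/redis_exporter-v1.50.0/redis_exporter > ./redis_exporter.log 2>&1 &
Open in a browser: http://localhost:9121/metrics (redis_exporter's default port)
If the metrics page loads, redis_exporter started successfully.
Enable start on boot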
vim /usr/lib/systemd/system/redis_exporter.service
[Unit]
Description=redis_exporter
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/local/redis_exporter-v1.50.0/redis_exporter
[Install]
WantedBy=multi-user.target
Reload the systemd configuration
systemctl daemon-reload
Start the service
systemctl start redis_exporter
Check its status
systemctl status redis_exporter
Enable start on boot
systemctl enable redis_exporter
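In the Prometheus directory, add the following job to prometheus.yml
  - job_name: 'redis_exporter' # collects Redis metrics
    static_configs:
      - targets: ['localhost:9121'] # IP and port of the redis_exporter service
Restart Prometheus and open its web UI
If the redis_exporter target is listed, the configuration succeeded.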
1.6.Monitoring Spring Boot 2.x
Add the dependencies to pom.xml
<!-- spring-boot-actuator dependency -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- https://mvnrepository.com/artifact/io.micrometer/micrometer-registry-prometheus -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
Add to application.yml
management:
  endpoints:
    # web endpoint settings
    web:
      # URL prefix of the endpoints; defaults to /actuator
      base-path: /actuator
      exposure:
        # IDs of the endpoints to expose (e.g. ['health','info','beans','env']); '*' exposes all of them. For security, expose only prometheus and health
        include: '*'
  metrics:
    tags:
      application: ${spring.application.name}
Test URL (change the port to your application's port):
http://localhost:80/actuator/prometheus
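To confirm that the `application` tag from application.yml actually reaches the endpoint, count the samples carrying it. A minimal sketch; `app_sample_count` is a hypothetical helper and `demo` in the usage line stands for your `spring.application.name`.

```shell
# app_sample_count APP
# Reads Prometheus text-format metrics on stdin and prints how many
# samples carry the label application="APP" (comment lines are skipped).
app_sample_count() {
  # grep -c prints 0 but exits 1 when nothing matches; keep status 0
  grep -v '^#' | grep -c "application=\"$1\"" || true
}
```

For example, `curl -s http://localhost:80/actuator/prometheus | app_sample_count demo` should print a number greater than 0.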
Add to prometheus.yml:
  - job_name: 'java'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8080','localhost:8081','localhost:8082','localhost:8083'] # one entry per service instance
Restart Prometheus and open its web UI
1.7.alertmanager
alertmanager-0.25.0 download link
Alternatively, choose a version on the official website.
Edit Prometheus's configuration file prometheus.yml
# Alertmanager configuration
# point this at the Alertmanager address
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
# rule files to load
rule_files:
  - "/usr/local/prometheus-2.44.0/rules/*.yml"
# create a rules directory in the Prometheus directory
mkdir rules
1.7.1.Adding alert rules
vi rules/node_alived.yml
groups:
  - name: Instance liveness alert rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          user: prometheus
          severity: warning
        annotations:
          summary: "Host is down!!!"
          description: "This instance has been down for more than one minute."
vi rules/memory_over.yml
groups:
  - name: Memory alert rules
    rules:
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / (node_memory_MemTotal_bytes))) * 100 > 50
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Server is running low on available memory."
          description: "Memory usage is above 50% (current value: {{ $value }}%)"
vi rules/cpu_over.yml
groups:
  - name: CPU alert rules
    rules:
      - alert: HighCpuUsage
        expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]) )) * 100 > 50
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage is spiking."
          description: "CPU usage is above 50% (current value: {{ $value }}%)"
vi rules/disk_over.yml
groups:
  - name: Disk usage alert rules
    rules:
      - alert: HighDiskUsage
        expr: 100 - node_filesystem_free_bytes{fstype=~"xfs|ext4"} / node_filesystem_size_bytes{fstype=~"xfs|ext4"} * 100 > 80
        for: 20m
        labels:
          severity: warning
        annotations:
          summary: "Disk partition usage is too high"
          description: "Partition usage is above 80% (current value: {{ $value }}%)"
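The memory rule's expression is simply the share of memory not reported as available. node_exporter reads both values from /proc/meminfo, so the same percentage can be computed locally to sanity-check what the rule will see. `mem_usage_pct` is a hypothetical helper; the kB units in /proc/meminfo cancel out in the ratio.

```shell
# mem_usage_pct [FILE]: compute (1 - MemAvailable / MemTotal) * 100 from a
# meminfo-format file (default /proc/meminfo), i.e. the same ratio the
# memory rule evaluates on node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes.
mem_usage_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END { if (total > 0) printf "%.2f\n", (1 - avail / total) * 100 }
  ' "${1:-/proc/meminfo}"
}
```

Compare the output of `mem_usage_pct` with the rule's expression on the Prometheus query page. The rule files themselves can be validated with `./promtool check rules rules/*.yml` before restarting Prometheus.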
Upload Alertmanager to the server
# extract
tar -zxvf alertmanager-0.25.0.linux-amd64.tar.gz
# enter the extracted directory and edit the configuration file
vi alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://localhost:8089/adapter/wx' # must be the host port published by the webhook-adapter container (docker run -p <port>:80)
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
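# check the version
/usr/local/alertmanager-0.25.0/alertmanager --version
# validate the configuration file
/usr/local/alertmanager-0.25.0/amtool check-config /usr/local/alertmanager-0.25.0/alertmanager.yml
# start
nohup /usr/local/alertmanager-0.25.0/alertmanager --config.file=/usr/local/alertmanager-0.25.0/alertmanager.yml > ./alertmanager.log 2>&1 &
Open in a browser: http://localhost:9093/metrics (Alertmanager's default port)
If the metrics page loads, Alertmanager started successfully.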
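In the Prometheus directory, add the following job to prometheus.yml
  - job_name: 'alertmanager_exporter'
    static_configs:
      - targets: ['localhost:9093']
Restart Prometheus and open its web UI
If the alertmanager_exporter target is listed, the configuration succeeded.
Enable start on boot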
vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=https://prometheus.io
After=network.target
[Service]
Restart=on-failure
ExecStart=/usr/local/alertmanager-0.25.0/alertmanager --config.file /usr/local/alertmanager-0.25.0/alertmanager.yml --storage.path="/usr/local/alertmanager-0.25.0/data/" --data.retention=120h
[Install]
WantedBy=multi-user.target
After saving (stop the nohup Alertmanager started above first, otherwise port 9093 is already in use):
systemctl daemon-reload
systemctl enable alertmanager
systemctl start alertmanager
1.7.2.Installing Docker
yum install -y yum-utils
# set the repository mirror
yum-config-manager \
  --add-repo \
  http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum makecache fast
yum install -y docker-ce docker-ce-cli containerd.io
# start Docker
systemctl start docker
# check the Docker version
docker version
# test
docker run hello-world
docker images
docker ps
systemctl enable docker
Add a registry mirror (the address below is an Aliyun accelerator; replace it with your own)
vi /etc/docker/daemon.json
{
  "registry-mirrors": ["https://78q96cy9.mirror.aliyuncs.com"]
}
systemctl daemon-reload # reload systemd configuration
systemctl start docker # start the Docker service
systemctl stop docker # stop the Docker service
systemctl restart docker # restart the Docker service (needed for daemon.json changes to take effect)
1.7.3.WeCom robot configuration and startup
In WeCom, add a robot to a group chat
Copy the robot's webhook URL
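Before wiring the robot into Alertmanager, you can check the webhook URL on its own. The WeCom group-robot API accepts a JSON POST such as `{"msgtype":"text","text":{"content":"..."}}`; the helpers below build and send one. The function names are hypothetical, `xxxx` stands for your robot key, and the text is assumed to be a single line (only backslashes and double quotes are escaped).

```shell
# wecom_payload TEXT: build a WeCom robot "text" message body.
# Escapes backslashes and double quotes; TEXT is assumed to be one line.
wecom_payload() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"msgtype":"text","text":{"content":"%s"}}\n' "$escaped"
}

# wecom_send KEY TEXT: post TEXT to the robot identified by KEY.
wecom_send() {
  curl -s -H 'Content-Type: application/json' \
    -d "$(wecom_payload "$2")" \
    "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=$1"
}
```

For example, `wecom_send xxxx "Prometheus test message"`; a reply of `{"errcode":0,"errmsg":"ok"}` means the key works.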
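# run
docker run -d --name wechat \
  --restart always -p 8089:80 \
  guyongquan/webhook-adapter \
  --adapter='/app/prometheusalert/wx.js=/wx=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxx'
# replace xxxx with your own robot key; the host port (8089 here) must match the webhook URL in alertmanager.yml
Once it is running, temporarily lower the threshold of one of the alert rules above (e.g. change > 50 to > 10 in memory_over.yml) and restart Prometheus; after a short wait, a firing alert should arrive in WeCom.
Change it back to 50 and restart Prometheus again; a resolved notification should arrive.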
2.Grafana
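# download Grafana (the standalone Linux tarball, matching the extract-and-run steps below)
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-9.5.2.linux-amd64.tar.gz
Or download the version you need from the official website.
Extract, enter the extracted directory, and start it
tar -zxvf grafana-enterprise-9.5.2.linux-amd64.tar.gz
nohup ./bin/grafana-server web > ./grafana.log 2>&1 &
Open http://localhost:3000 in a browser; the default username and password are admin/admin.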
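Grafana still needs Prometheus as a data source, which can be added in the UI or through Grafana's HTTP API (`POST /api/datasources`). A minimal sketch of the API route; the function names are hypothetical, and it assumes Grafana on port 3000 with the default admin account.

```shell
# grafana_ds_payload PROM_URL: JSON body for Grafana's data source API,
# registering Prometheus at PROM_URL as the default data source.
grafana_ds_payload() {
  printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1"
}

# grafana_add_ds GRAFANA_URL USER:PASS PROM_URL
# Creates the data source via POST /api/datasources using basic auth.
grafana_add_ds() {
  curl -s -u "$2" -H 'Content-Type: application/json' \
    -d "$(grafana_ds_payload "$3")" "$1/api/datasources"
}
```

For example, `grafana_add_ds http://localhost:3000 admin:admin http://localhost:9090` (use the new password once you have changed the default one).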