Prometheus + Grafana + WeCom (企业微信) Robot Alerting

Author: 顾优秀

Last modified: 2023-05-29 15:45:39

Tags: prometheus, grafana, WeCom (企业微信)

First published: 2023-05-25 18:15:37

Article link: https://blog.csdn.net/weixin_47167816/article/details/130870149

An open-source monitoring and alerting system.

Contents
1.Prometheus installation and configuration
2.Grafana

1.Prometheus installation and configuration

Download the release archive from the Prometheus download page and upload it to the server.

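Extract the archive:
tar -zxvf prometheus-2.44.0.linux-amd64.tar.gz
Start Prometheus. Adjust the paths to your own installation directory; the commands in this article assume the extracted directory was moved to /usr/local/prometheus-2.44.0:
nohup /usr/local/prometheus-2.44.0/prometheus --config.file="/usr/local/prometheus-2.44.0/prometheus.yml" > ./prometheus.log 2>&1 &
Test in a browser: http://localhost:9090 (the default Prometheus port)

If the page loads, Prometheus started successfully.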
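
The browser test can also be scripted. The following is a minimal Python sketch, assuming Prometheus listens on localhost:9090 as above. /-/healthy is the health endpoint Prometheus exposes; the function name is_healthy and its fetch parameter are illustrative names chosen here, not part of Prometheus.

```python
from urllib.request import urlopen


def is_healthy(base_url, fetch=None, timeout=3.0):
    """Return True if GET <base_url>/-/healthy answers with HTTP 200.

    fetch is a hypothetical hook: a callable that takes a URL and returns
    an HTTP status code. It defaults to a real request via urllib.
    """
    url = base_url.rstrip("/") + "/-/healthy"
    if fetch is None:
        def fetch(target):
            with urlopen(target, timeout=timeout) as response:
                return response.status
    try:
        return fetch(url) == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so refused
        # connections, timeouts and non-2xx answers all end up here.
        return False


# Usage (performs a real HTTP request):
#   is_healthy("http://localhost:9090")
```

Passing the fetch hook makes the check easy to try without a running server, for example is_healthy("http://localhost:9090", fetch=lambda url: 200).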

Create a systemd unit so that Prometheus starts on boot:

vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=prometheus
After=network.target

[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus-2.44.0/prometheus --config.file="/usr/local/prometheus-2.44.0/prometheus.yml"

[Install]
WantedBy=multi-user.target

Reload the systemd configuration:
systemctl daemon-reload
Start the service:
systemctl start prometheus
Check its status:
systemctl status prometheus
Enable start on boot:
systemctl enable prometheus
Restart it when needed:
systemctl restart prometheus
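
Every exporter in this article gets a unit file of the same shape, so the file can be generated rather than typed each time. A minimal Python sketch follows; render_unit and UNIT_TEMPLATE are names made up for this illustration, and the template simply mirrors the unit shown above.

```python
UNIT_TEMPLATE = """[Unit]
Description={name}
After=network.target

[Service]
Restart=on-failure
ExecStart={exec_start}

[Install]
WantedBy=multi-user.target
"""


def render_unit(name, exec_start):
    """Return the text of a minimal systemd service unit."""
    # systemd expects the command in ExecStart to be an absolute path.
    if not exec_start.startswith("/"):
        raise ValueError("ExecStart must begin with an absolute path")
    return UNIT_TEMPLATE.format(name=name, exec_start=exec_start)


print(render_unit(
    "prometheus",
    '/usr/local/prometheus-2.44.0/prometheus '
    '--config.file="/usr/local/prometheus-2.44.0/prometheus.yml"',
))
```

The printed text can be saved as /usr/lib/systemd/system/<name>.service, followed by systemctl daemon-reload.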

1.1.node_exporter

Download node_exporter 1.5.0 from its download page, or pick another version on the official website.

Upload the archive to the server.

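Extract the archive:
tar -zxvf node_exporter-1.5.0.linux-amd64.tar.gz
Start node_exporter (the path assumes the extracted directory was moved to /usr/local/node_exporter-1.5.0):
nohup /usr/local/node_exporter-1.5.0/node_exporter > ./node_exporter.log 2>&1 &
Open in a browser: http://localhost:9100/metrics (the default node_exporter port)

If metrics are shown, node_exporter started successfully.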

Create a systemd unit so that node_exporter starts on boot:

vi /usr/lib/systemd/system/node_exporter.service

[Unit]
Description=node_exporter
Documentation=https://github.com/prometheus/node_exporter
After=network.target

[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter-1.5.0/node_exporter


[Install]
WantedBy=multi-user.target

Reload the systemd configuration:
systemctl daemon-reload
Start the service:
systemctl start node_exporter
Check its status:
systemctl status node_exporter
Enable start on boot:
systemctl enable node_exporter
Restart it when needed:
systemctl restart node_exporter

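Edit prometheus.yml in the Prometheus directory and add a node_exporter job under scrape_configs, after the existing prometheus job:

    static_configs:
      - targets: ["localhost:9090"]
  # add the node_exporter job
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100'] # change to your own IP address; 9100 is the default port

Save the file and restart Prometheus:
systemctl restart prometheus

Open the Prometheus web page. If the node_exporter target is listed (for example under Status > Targets), the configuration succeeded.

The collected data can now be queried from the Prometheus web page.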
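
The same query can be issued programmatically. Below is a minimal Python sketch that builds an instant-query URL for the Prometheus HTTP API (/api/v1/query) and reads the values out of the JSON answer. The helper names instant_query_url and extract_values are illustrative, and the sketch only handles instant-vector results.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build the URL of an instant query against the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def extract_values(response_text):
    """Map each series' instance label to its sample value.

    Assumes the response is an instant-vector result, where every series
    carries a [timestamp, "value"] pair.
    """
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("query failed: %s" % body.get("error"))
    values = {}
    for series in body["data"]["result"]:
        instance = series["metric"].get("instance", "")
        values[instance] = float(series["value"][1])
    return values


print(instant_query_url("http://localhost:9090", "up"))
```

Fetching that URL (with a browser, curl, or urllib) and passing the body to extract_values gives, for the up metric, 1.0 for every target that is reachable and 0.0 for every target that is down.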

1.2.process_exporter

Download process-exporter from its download page and upload the archive to the server.

Extract the archive:
tar -zxvf process-exporter-0.7.10.linux-amd64.tar.gz
In the extracted directory, create the configuration file:
vi process-exporter.yaml
process_names:
  - name: '{{.Comm}}'
    cmdline:
    - '.+'

Start it for a test:
nohup /usr/local/process-exporter-0.7.10/process-exporter -config.path=/usr/local/process-exporter-0.7.10/process-exporter.yaml > ./process_exporter.log 2>&1 &

Open in a browser: http://localhost:9256/metrics

If metrics are shown, the exporter started successfully.

Create a systemd unit so that it starts on boot:

vim /usr/lib/systemd/system/process-exporter.service

[Unit]
Description=process-exporter
After=network.target

[Service]
Restart=on-failure
ExecStart=/usr/local/process-exporter-0.7.10/process-exporter -config.path=/usr/local/process-exporter-0.7.10/process-exporter.yaml

[Install]
WantedBy=multi-user.target

Reload the systemd configuration:
systemctl daemon-reload
Start the service:
systemctl start process-exporter
Check its status:
systemctl status process-exporter
Enable start on boot:
systemctl enable process-exporter
Restart it when needed:
systemctl restart process-exporter

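Add the job to prometheus.yml in the Prometheus directory (under scrape_configs):

  - job_name: 'process_exporter'
    static_configs:
      - targets: ['localhost:9256'] # 9256 is the default port

Restart Prometheus and open its web page.

If the process_exporter target is listed, the configuration succeeded.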

1.3.mysqld_exporter

Download mysqld_exporter 0.14.0 from its download page, or pick another version on the official website.
Upload the archive to the server.

Extract the archive:
tar -zxvf mysqld_exporter-0.14.0.linux-amd64.tar.gz

In the extracted directory, add a configuration file:
vi /usr/local/mysqld_exporter-0.14.0/.my.cnf

[client]
# MySQL account
user=root
# MySQL password
password=root

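If you cannot log in to MySQL (5.7) and have forgotten the password:
vim /etc/my.cnf
# add under [mysqld]
skip-grant-tables # skip privilege checks at login
# restart the MySQL service
sudo systemctl restart mysqld
# test the login
mysql -uroot -p # just press Enter at the password prompt
# set a new password
set password for 'root'@'localhost'=password('root');
# if this fails with: ERROR 1290 (HY000): The MySQL server is running with the --skip-grant-tables option so it cannot execute this statement
# reload the grant tables
flush privileges;
# set the password again
set password for 'root'@'localhost'=password('root');
# grant all privileges; in a real deployment, create a dedicated account and grant it only the privileges it needs
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'root' WITH GRANT OPTION;
flush privileges;
# exit
exit;
Then comment out skip-grant-tables in /etc/my.cnf again and restart MySQL.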
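
For the dedicated account recommended above, the mysqld_exporter README suggests a user limited to the PROCESS, REPLICATION CLIENT and SELECT privileges. The Python sketch below only prints those two statements so they can be reviewed and pasted into the mysql client. The account name exporter, the placeholder password and the helper name exporter_account_sql are assumptions for illustration.

```python
def exporter_account_sql(user, password, host="localhost"):
    """Return SQL statements that create a least-privilege monitoring account."""
    def quote(value):
        # Escape backslashes and single quotes for a MySQL string literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

    account = quote(user) + "@" + quote(host)
    return [
        "CREATE USER %s IDENTIFIED BY %s WITH MAX_USER_CONNECTIONS 3;"
        % (account, quote(password)),
        "GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO %s;" % account,
    ]


for statement in exporter_account_sql("exporter", "change-me"):
    print(statement)
```

With such an account in place, .my.cnf would carry its user name and password instead of root's.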

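Start the exporter:

nohup /usr/local/mysqld_exporter-0.14.0/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter-0.14.0/.my.cnf > ./mysqld_exporter.log 2>&1 &
Open in a browser: http://localhost:9104/metrics

If metrics are shown, the exporter started successfully.

Create a systemd unit so that it starts on boot: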

vim /usr/lib/systemd/system/mysqld_exporter.service
[Unit]
Description=mysqld_exporter
After=network.target

[Service]
Restart=on-failure
ExecStart=/usr/local/mysqld_exporter-0.14.0/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter-0.14.0/.my.cnf

[Install]
WantedBy=multi-user.target

Reload the systemd configuration:
systemctl daemon-reload
Start the service:
systemctl start mysqld_exporter
Check its status:
systemctl status mysqld_exporter
Enable start on boot:
systemctl enable mysqld_exporter
Restart it when needed:
systemctl restart mysqld_exporter

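Add the job to prometheus.yml in the Prometheus directory (under scrape_configs):

  - job_name: 'mysqld_exporter' # collects MySQL metrics
    static_configs:
      - targets: ['localhost:9104'] # IP and port of the mysqld_exporter service
Restart Prometheus and open its web page.

If the mysqld_exporter target is listed, the configuration succeeded.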

1.4.nginx_exporter

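Download nginx_exporter from its download page.

In the nginx source directory, recompile and install nginx with the stub_status module (the --add-module option belongs to the author's own build; omit it if you do not use that module):
./configure --prefix=/usr/local/nginx/ --with-http_stub_status_module --add-module=../nginx-http-flv-module
make
sudo make install
Check the build options: ./nginx -V 2>&1 | grep -o with-http_stub_status_module
If the terminal prints with-http_stub_status_module, nginx has the stub_status module enabled.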

Edit the nginx.conf configuration file:
 server {
        listen   80;
        # change the port to your own
        location /nginx_status {
            stub_status on;
            access_log off;
            # allow takes an address or CIDR range, not a host name
            allow 127.0.0.1;
            deny all;
        }
}
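
The /nginx_status page returns a short plain-text report, which is what nginx_exporter turns into metrics. As an illustration of that format, here is a minimal Python sketch that parses such a report; parse_stub_status is a name chosen here and the sample numbers are made up.

```python
import re


def parse_stub_status(text):
    """Parse the plain-text report produced by nginx's stub_status module."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The totals sit on their own line: accepts, handled, requests.
    totals = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    states = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if not (active and totals and states):
        raise ValueError("not a stub_status report")
    return {
        "active": int(active.group(1)),
        "accepts": int(totals.group(1)),
        "handled": int(totals.group(2)),
        "requests": int(totals.group(3)),
        "reading": int(states.group(1)),
        "writing": int(states.group(2)),
        "waiting": int(states.group(3)),
    }


SAMPLE = (
    "Active connections: 2 \n"
    "server accepts handled requests\n"
    " 10 10 25 \n"
    "Reading: 0 Writing: 1 Waiting: 1 \n"
)
print(parse_stub_status(SAMPLE))
```

If curl against /nginx_status does not return text of this shape, the exporter will not be able to scrape it either.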

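Upload the nginx_exporter archive to the server.

# extract
tar -zxvf nginx_exporter-0.11.0.tar.gz
# start nginx_exporter; the scrape URI must point at the host and port where nginx serves /nginx_status
nohup /usr/local/nginx_exporter-0.11.0/nginx-prometheus-exporter -nginx.scrape-uri http://localhost:8080/nginx_status > ./nginx_exporter.log 2>&1 &
Open in a browser: http://localhost:9113/metrics (the default nginx_exporter port)

If metrics are shown, the exporter started successfully.

Create a systemd unit so that it starts on boot (use your own nginx address in -nginx.scrape-uri):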

vim /usr/lib/systemd/system/nginx_exporter.service
[Unit]
Description=nginx_exporter
After=network.target

[Service]
Restart=on-failure
ExecStart=/usr/local/nginx_exporter-0.11.0/nginx-prometheus-exporter -nginx.scrape-uri http://172.16.11.10:7006/nginx_status

[Install]
WantedBy=multi-user.target

Reload the systemd configuration:
systemctl daemon-reload
Start the service:
systemctl start nginx_exporter
Check its status:
systemctl status nginx_exporter
Enable start on boot:
systemctl enable nginx_exporter
Restart it when needed:
systemctl restart nginx_exporter

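Edit prometheus.yml in the Prometheus directory and add the job (under scrape_configs):

  - job_name: 'nginx_status' # collects nginx metrics
    metrics_path: '/metrics' # path from which metrics are pulled
    scrape_interval: 5s # scrape interval
    static_configs:
      - targets: ['localhost:9113'] # IP and port of the nginx-prometheus-exporter service

Restart Prometheus. If the nginx_status target is listed, the configuration succeeded.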

1.5.redis_exporter

Download redis_exporter from its download page and upload the archive to the server.

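# extract
tar -zxvf redis_exporter-1.50.0.tar.gz
# start
nohup /usr/local/redis_exporter-v1.50.0/redis_exporter > ./redis_exporter.log 2>&1 &
Open in a browser: http://localhost:9121/metrics

If metrics are shown, the exporter started successfully.

Create a systemd unit so that it starts on boot: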

vim /usr/lib/systemd/system/redis_exporter.service
[Unit]
Description=redis_exporter
After=network.target

[Service]
Restart=on-failure
ExecStart=/usr/local/redis_exporter-v1.50.0/redis_exporter

[Install]
WantedBy=multi-user.target

Reload the systemd configuration:
systemctl daemon-reload
Start the service:
systemctl start redis_exporter
Check its status:
systemctl status redis_exporter
Enable start on boot:
systemctl enable redis_exporter

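Add the job to prometheus.yml in the Prometheus directory (under scrape_configs):

  - job_name: 'redis_exporter' # collects Redis metrics; every job_name must be unique
    static_configs:
      - targets: ['localhost:9121'] # IP and port of the redis_exporter service
Restart Prometheus and open its web page.

If the redis_exporter target is listed, the configuration succeeded.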
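
The exporter jobs added so far all follow the same pattern, and Prometheus refuses to load a configuration in which two jobs share a job_name. A minimal Python sketch that renders the job entries and rejects duplicates before they reach prometheus.yml is shown below. The function name render_scrape_jobs is illustrative; the ports are the default ones used in this article.

```python
def render_scrape_jobs(jobs):
    """Render scrape_configs entries from (job_name, target) pairs.

    Raises ValueError if a job name occurs twice.
    """
    seen = set()
    lines = []
    for job_name, target in jobs:
        if job_name in seen:
            raise ValueError("duplicate job_name: %s" % job_name)
        seen.add(job_name)
        lines.append("  - job_name: '%s'" % job_name)
        lines.append("    static_configs:")
        lines.append("      - targets: ['%s']" % target)
    return "\n".join(lines)


EXPORTERS = [
    ("node_exporter", "localhost:9100"),
    ("process_exporter", "localhost:9256"),
    ("mysqld_exporter", "localhost:9104"),
    ("nginx_status", "localhost:9113"),
    ("redis_exporter", "localhost:9121"),
]
print(render_scrape_jobs(EXPORTERS))
```

The output is indented to sit directly under the scrape_configs key.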

1.6.Monitoring Spring Boot 2.x

Add the dependencies to pom.xml:
<!-- spring-boot-actuator dependency -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<!-- https://mvnrepository.com/artifact/io.micrometer/micrometer-registry-prometheus -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Add the YAML configuration to the application's configuration file:

management:
  endpoints:
    # configuration properties of the web endpoints
    web:
      # URL prefix of the endpoints, /actuator by default
      base-path: /actuator
      exposure:
        # IDs of the endpoints to expose (e.g. ['health','info','beans','env']); '*' exposes all of them. For security, expose only prometheus and health.
        include: '*'
  metrics:
    tags:
      application: ${spring.application.name}

# test URL (change the port to your application's port)
http://localhost:80/actuator/prometheus

# prometheus.yml (under scrape_configs)
  - job_name: 'java'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8080','localhost:8081','localhost:8082','localhost:8083'] # several services in one job
# restart Prometheus and open its web page
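
The /actuator/prometheus endpoint answers in the Prometheus text exposition format: comment lines starting with #, then one sample per line. The Python sketch below reads that format in a simplified way to show what Prometheus scrapes. parse_samples is an illustrative name, the sample text is invented, and the sketch assumes lines carry no trailing timestamp.

```python
def parse_samples(text):
    """Return (metric_name, series, value) tuples from exposition text.

    Simplification: the last space-separated token of each line is taken
    as the value, so lines with an optional timestamp are not supported.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(" ", 1)
        name = series.split("{", 1)[0]
        samples.append((name, series, float(value)))
    return samples


SAMPLE = """# HELP jvm_threads_live_threads The current number of live threads
# TYPE jvm_threads_live_threads gauge
jvm_threads_live_threads{application="demo",} 23.0
process_uptime_seconds{application="demo",} 12.5
"""
for name, _series, value in parse_samples(SAMPLE):
    print(name, value)
```

The application label in the sample corresponds to the management.metrics.tags.application setting above.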

1.7.alertmanager

Download Alertmanager 0.25.0 from its download page, or pick another version on the official website.
Edit the Prometheus configuration file prometheus.yml:

# Alertmanager configuration
# change to the address of your Alertmanager
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
# rule files to load
rule_files:
  - "/usr/local/prometheus-2.44.0/rules/*.yml"

# create a rules directory inside the Prometheus directory
mkdir rules

1.7.1.Adding alert rules

Create node_alived.yml in the rules directory (vi node_alived.yml):
groups:
- name: Instance liveness alert rules
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      user: prometheus
      severity: warning
    annotations:
      summary: "Host is down!"
      description: "This instance has been down for more than one minute."

Create memory_over.yml (vi memory_over.yml):
groups:
- name: Memory alert rules
  rules:
  - alert: HighMemoryUsage
    expr: (1 - (node_memory_MemAvailable_bytes / (node_memory_MemTotal_bytes))) * 100 > 50
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "The server is running low on available memory."
      description: "Memory usage has exceeded 50% (current value: {{ $value }}%)"

Create cpu_over.yml (vi cpu_over.yml):
groups:
- name: CPU alert rules
  rules:
  - alert: HighCpuUsage
    expr: 100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[1m]) )) * 100 > 50
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "CPU usage is spiking."
      description: "CPU usage exceeds 50% (current value: {{ $value }}%)"

Create disk_over.yml (vi disk_over.yml):
groups:
- name: Disk usage alert rules
  rules:
  - alert: HighDiskUsage
    expr: 100 - node_filesystem_free_bytes{fstype=~"xfs|ext4"} / node_filesystem_size_bytes{fstype=~"xfs|ext4"} * 100 > 80
    for: 20m
    labels:
      severity: warning
    annotations:
      summary: "Disk partition usage is too high"
      description: "Partition usage is above 80% (current value: {{ $value }}%)"
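
To make the arithmetic in the memory and disk expressions concrete, the Python sketch below recomputes them for made-up numbers. The function names are illustrative. The for: clause is not modelled: in Prometheus the condition must stay true for that whole duration before the alert fires.

```python
def memory_usage_percent(mem_available_bytes, mem_total_bytes):
    """(1 - MemAvailable / MemTotal) * 100, as in the memory rule."""
    return (1 - mem_available_bytes / mem_total_bytes) * 100


def disk_usage_percent(free_bytes, size_bytes):
    """100 - free / size * 100, as in the disk rule."""
    return 100 - free_bytes / size_bytes * 100


def exceeds(value, threshold):
    """The comparison at the end of each rule expression."""
    return value > threshold


GIB = 1024 ** 3
memory = memory_usage_percent(6 * GIB, 16 * GIB)   # 6 GiB available of 16 GiB
disk = disk_usage_percent(50 * GIB, 200 * GIB)     # 50 GiB free of 200 GiB
print(memory, exceeds(memory, 50))  # 62.5 True
print(disk, exceeds(disk, 80))      # 75.0 False
```

With these numbers the memory condition is true while the disk condition is not.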

Upload the Alertmanager archive to the server.

# extract
tar -zxvf alertmanager-0.25.0.linux-amd64.tar.gz

# in the extracted directory, edit the configuration file

vi alertmanager.yml

global:
  resolve_timeout: 5m
 
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
 
receivers:
- name: 'web.hook'
  webhook_configs:
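  - url: 'http://localhost:8089/adapter/wx'   # address of the webhook adapter; the port must be the host port published by the container started in 1.7.3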
    send_resolved: true
 
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance'] 
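
With webhook_configs, Alertmanager sends an HTTP POST with a JSON body to the configured URL, and the adapter started later turns that body into a chat message. As a rough illustration of what arrives at the URL, the Python sketch below reduces such a body to one line per alert. summarize_webhook is a name made up here and the sample payload is abbreviated and invented; only the fields it reads (status, alerts, labels, annotations) are relied on.

```python
import json


def summarize_webhook(body):
    """Turn an Alertmanager webhook body into one text line per alert."""
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append("[%s] %s on %s: %s" % (
            alert.get("status", payload.get("status", "unknown")),
            labels.get("alertname", "?"),
            labels.get("instance", "?"),
            annotations.get("summary", ""),
        ))
    return lines


SAMPLE = json.dumps({
    "version": "4",
    "status": "firing",
    "receiver": "web.hook",
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "InstanceDown", "instance": "localhost:9100",
                   "severity": "warning"},
        "annotations": {"summary": "Host is down!"},
    }],
})
print(summarize_webhook(SAMPLE))
```

Because send_resolved is true, the same structure arrives again with status resolved once the condition clears.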

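# show the version
/usr/local/alertmanager-0.25.0/alertmanager --version
# check the configuration file
/usr/local/alertmanager-0.25.0/amtool check-config /usr/local/alertmanager-0.25.0/alertmanager.yml
# start
nohup /usr/local/alertmanager-0.25.0/alertmanager --config.file=/usr/local/alertmanager-0.25.0/alertmanager.yml > ./alertmanager.log 2>&1 &
Open in a browser: http://localhost:9093/metrics

If metrics are shown, Alertmanager started successfully.
Add the job to prometheus.yml in the Prometheus directory (under scrape_configs):

  - job_name: 'alertmanager_exporter'
    static_configs:
      - targets: ['localhost:9093']
Restart Prometheus and open its web page.

If the alertmanager_exporter target is listed, the configuration succeeded.

Create a systemd unit so that Alertmanager starts on boot:

vim /usr/lib/systemd/system/alertmanager.service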
[Unit]
Description=https://prometheus.io
  
[Service]
Restart=on-failure
ExecStart=/usr/local/alertmanager-0.25.0/alertmanager --config.file /usr/local/alertmanager-0.25.0/alertmanager.yml --storage.path="/usr/local/alertmanager-0.25.0/data/" --data.retention=120h
 
[Install]                      
WantedBy=multi-user.target

After saving:
systemctl daemon-reload
systemctl enable alertmanager
systemctl start alertmanager

1.7.2.Installing Docker

yum install -y yum-utils
# set the package repository
yum-config-manager \
  --add-repo \
  http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

yum makecache fast

yum install docker-ce docker-ce-cli containerd.io
# start Docker
systemctl start docker
# show the Docker version
docker version
# test
docker run hello-world
docker images
docker ps
# enable start on boot
systemctl enable docker

Add a registry mirror to the JSON configuration:
vi /etc/docker/daemon.json
{
  "registry-mirrors": ["https://78q96cy9.mirror.aliyuncs.com"]
}

systemctl daemon-reload # reload the configuration
systemctl start docker  # start the Docker service
systemctl stop docker  # stop the Docker service
systemctl restart docker  # restart the Docker service

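1.7.3.WeCom robot configuration and startup

Open WeCom (企业微信), add a robot, and copy its webhook URL.

# run the adapter (replace xxxx with your own robot key)
docker run -d --name wechat \
--restart always -p 8080:80 \
guyongquan/webhook-adapter \
"--adapter=/app/prometheusalert/wx.js=/wx=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxx"

After the container starts, lower the threshold in one of the alert rules configured earlier to > 10, restart Prometheus, and wait a moment; an alert message arrives in the chat.
Change the threshold back to 50 and a resolved message arrives.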
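
The adapter container hides the last step, which is a POST of a small JSON document to the robot's webhook URL. A minimal Python sketch of that step follows, assuming the robot's markdown message type. The function names are illustrative, post_message is defined but not called here because it needs a real key, and this is not the adapter's actual implementation.

```python
import json
from urllib.request import Request, urlopen


def wecom_markdown_message(lines):
    """Build the JSON body of a WeCom robot markdown message."""
    content = "\n".join(lines)
    return json.dumps(
        {"msgtype": "markdown", "markdown": {"content": content}},
        ensure_ascii=False,
    )


def post_message(webhook_url, body, timeout=5.0):
    """POST the body to the robot webhook and return the decoded JSON reply."""
    request = Request(
        webhook_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(wecom_markdown_message(["**HighMemoryUsage**", "instance: localhost:9100"]))
```

Calling post_message with the webhook URL copied from WeCom and the printed body would deliver the message to the chat.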

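2.Grafana

# download Grafana
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_9.5.2_amd64.deb

Or download the version you need from the official website.

Extract the archive, enter the directory, and start Grafana. These steps apply to the standalone tar.gz build; a .deb package such as the one above is installed with the package manager and run as a service instead.

nohup ./bin/grafana-server web > ./grafana.log 2>&1 &
Open in a browser: http://localhost:3000. The default username and password are admin/admin.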
