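Installing Prometheus
Install wget, which is used to download Prometheus
// Stop the firewall, or open the required ports instead. For simplicity, the firewall is simply stopped here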
sudo systemctl stop firewalld
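Stopping firewalld is the shortcut taken here; the alternative mentioned in the comment is to open only the ports that are needed. A minimal sketch of that alternative, assuming firewalld's `firewall-cmd` is available and that the ports are the ones used in this post (9090 for Prometheus, 9100 for node_exporter). The `run` and `open_ports` helpers and the `DRY_RUN` switch are names made up for this illustration:

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Open each given TCP port permanently, then reload firewalld to apply the change.
open_ports() {
  for port in "$@"; do
    run firewall-cmd --permanent --add-port="${port}/tcp" || return 1
  done
  run firewall-cmd --reload
}
```

Calling `DRY_RUN=1 open_ports 9090 9100` only prints the commands; without `DRY_RUN` they are executed and need root privileges.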
yum install wget
// Download Prometheus
wget https://githubfast.com/prometheus/prometheus/releases/download/v2.37.2/prometheus-2.37.2.linux-amd64.tar.gz
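Because the tarball is fetched through a mirror, it may be worth comparing its SHA-256 hash with the checksum published on the upstream release page before extracting it. A small sketch, assuming `sha256sum` (coreutils) and `awk` are installed; `verify_sha256` is a hypothetical helper, and the expected hash has to be copied from the release page by hand:

```shell
# Hypothetical helper: succeed only if FILE's SHA-256 hash equals EXPECTED.
# $1 = file to check, $2 = expected hash in hex.
verify_sha256() {
  file=$1
  expected=$2
  # sha256sum prints "<hash>  <file>"; keep only the hash column.
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}
```

Usage would look like `verify_sha256 prometheus-2.37.2.linux-amd64.tar.gz <hash-from-release-page>`, continuing with the extraction only if it prints `OK`.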
Extract Prometheus, then move and rename it
tar -xzvf prometheus-2.37.2.linux-amd64.tar.gz -C /opt
Rename
mv /opt/prometheus-2.37.2.linux-amd64 /opt/prometheus
// Check the version
cd /opt/prometheus
./prometheus --version
After every change to the configuration file, be sure to validate it with promtool
cd /opt/prometheus
./promtool check config prometheus.yml
Create the local TSDB data directory for Prometheus
mkdir -p /data/prometheus
Set up Prometheus as a service managed by systemctl
vi /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=root
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus --web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
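Enable Prometheus to start on boot
// Reload systemd so that it picks up the new unit file
systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
// Check the status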
systemctl status prometheus
Open Prometheus in a browser at [your-ip-address]:9090 (9090 is the default port)
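Before opening the browser, the same check can be made from the shell: Prometheus exposes a readiness endpoint at `/-/ready`. A minimal polling sketch, assuming `curl` is installed and Prometheus listens on 127.0.0.1:9090. `wait_ready`, `PROM_URL`, `CURL`, and `SLEEP_SECS` are names invented for this example:

```shell
# Assumed base URL of the local Prometheus instance.
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

# Hypothetical helper: poll the readiness endpoint up to $1 times (default 10).
wait_ready() {
  tries=${1:-10}
  while [ "$tries" -gt 0 ]; do
    # -f makes curl fail on HTTP errors, e.g. while Prometheus is still starting.
    if "${CURL:-curl}" -fsS "$PROM_URL/-/ready" >/dev/null 2>&1; then
      echo "Prometheus is ready"
      return 0
    fi
    tries=$((tries - 1))
    sleep "${SLEEP_SECS:-1}"
  done
  echo "Prometheus did not become ready" >&2
  return 1
}
```

Running `wait_ready 30` right after `systemctl start prometheus` gives the server up to about 30 seconds to come up.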
Installing node_exporter
// Download node_exporter
wget https://githubfast.com/prometheus/node_exporter/releases/download/v1.4.0/node_exporter-1.4.0.linux-amd64.tar.gz
// Extract
tar -zvxf node_exporter-1.4.0.linux-amd64.tar.gz -C /opt
// Rename
mv /opt/node_exporter-1.4.0.linux-amd64 /opt/node_exporter
Manage node_exporter with systemctl
vi /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=root
ExecStart=/opt/node_exporter/node_exporter
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
Enable node_exporter to start on boot
// Reload systemd so that it picks up the new unit file
systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
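Once the service is running, a quick way to confirm that it actually serves data is to fetch its `/metrics` endpoint and count the `node_` samples. A small sketch, assuming `curl` and `grep` are available; `count_node_metrics` is a hypothetical helper that reads the exposition text from stdin:

```shell
# Hypothetical helper: count node_exporter samples in exposition text on stdin.
# "# HELP" and "# TYPE" comment lines are skipped because they do not start with "node_".
count_node_metrics() {
  grep -c '^node_'
}
```

For example, `curl -s http://127.0.0.1:9100/metrics | count_node_metrics` should print a number well above zero on a healthy exporter.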
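Edit the Prometheus configuration file. The dashboard I use here is one of the official templates
Official templates: https://grafana.com/grafana/dashboards/
Set targets to ['[your-ip-address]:9100'] in the scrape configuration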
cd /opt/prometheus
vi prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files: ['/opt/prometheus/rules/*.yml']
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's node_exporter.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: node
    static_configs:
      - targets: ['10.121.50.32:9100']
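node_exporter settings. For a detailed explanation see https://blog.csdn.net/ygq13572549874/article/details/129115350
cd /opt/node_exporter
// Change the listen address. A listener bound to 127.0.0.1 only accepts local connections, so use the host's IP instead if Prometheus scrapes this node by IP
./node_exporter --web.listen-address="127.0.0.1:9100" --log.level=warn &
// Enable the processes collector
./node_exporter --collector.processes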
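Flags passed on the command line like this apply only to the manually started process, not to the instance managed by systemd. One way to make them persistent is a systemd drop-in that overrides `ExecStart`. The sketch below assumes the unit is named `node_exporter.service` and the binary lives in /opt/node_exporter, as set up earlier; `write_override` and its arguments are hypothetical:

```shell
# Hypothetical helper: write a drop-in that replaces ExecStart with extra flags.
# $1 = drop-in directory, remaining arguments = node_exporter flags.
write_override() {
  dir=$1
  shift
  mkdir -p "$dir" || return 1
  {
    echo "[Service]"
    # An empty ExecStart= clears the original command before a new one is set.
    echo "ExecStart="
    echo "ExecStart=/opt/node_exporter/node_exporter $*"
  } > "$dir/override.conf"
}
```

A possible invocation is `write_override /etc/systemd/system/node_exporter.service.d --collector.processes --log.level=warn`, followed by `systemctl daemon-reload` and `systemctl restart node_exporter`.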
Configure Prometheus alerting rules
mkdir -p /opt/prometheus/rules/
// Edit the rules file
vi /opt/prometheus/rules/rules.yml
groups:
  - name: http_status_code
    rules:
      - alert: probe_http_status_code
        expr: probe_http_status_code != 200
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} abnormal status code"
          description: "{{ $labels.instance }} site is not responding normally (value: {{ $value }})"
  - name: icmp_ping_status
    rules:
      - alert: icmp_ping_status
        expr: probe_icmp_duration_seconds{phase="rtt"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Host {{ $labels.instance }} ICMP failure"
          description: "{{ $labels.instance }} ICMP failure (value: {{ $value }})"
          value: '{{ $value }}'
  # High latency
  - name: link_delay_high
    rules:
      - alert: link_delay_high
        expr: probe_icmp_duration_seconds{phase="rtt"} > 0.005
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} high latency"
          description: "{{ $labels.instance }} high latency (value: {{ $value }})"
Check the rules file
cd /opt/prometheus/rules
/opt/prometheus/promtool check rules rules.yml
Hot-reload Prometheus
curl -X POST http://127.0.0.1:9090/-/reload
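The reload endpoint is available because the unit file starts Prometheus with `--web.enable-lifecycle`. Since the configuration should be validated after every change, the check and the reload can be combined so that a broken file is never reloaded. A sketch under the paths used in this post; `validate_and_reload`, `PROM_DIR`, `PROM_URL`, `PROMTOOL`, and `CURL` are names introduced only for this example:

```shell
# Assumed install directory and base URL, matching the earlier steps.
PROM_DIR="${PROM_DIR:-/opt/prometheus}"
PROM_URL="${PROM_URL:-http://127.0.0.1:9090}"

# Hypothetical helper: reload Prometheus only if promtool accepts the config.
validate_and_reload() {
  "${PROMTOOL:-$PROM_DIR/promtool}" check config "$PROM_DIR/prometheus.yml" || {
    echo "config invalid, not reloading" >&2
    return 1
  }
  # -f turns an HTTP error response from the reload endpoint into a non-zero exit code.
  "${CURL:-curl}" -fsS -X POST "$PROM_URL/-/reload"
}
```

`promtool check config` also checks the rule files referenced by `rule_files`, so one call covers both prometheus.yml and rules.yml.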
Open a browser and check the node's status (Status > Targets in the Prometheus UI)
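The same information is available without a browser through the HTTP API at `/api/v1/targets`, where each active target carries a `health` field. A rough sketch that counts healthy targets, assuming the API returns compact JSON (no spaces around the colon); `count_up_targets` is a hypothetical helper, and a JSON-aware tool such as jq would be more robust:

```shell
# Hypothetical helper: count targets reported as healthy in the
# /api/v1/targets JSON read from stdin.
count_up_targets() {
  grep -o '"health":"up"' | wc -l | tr -d ' '
}
```

For instance, `curl -s http://127.0.0.1:9090/api/v1/targets | count_up_targets` should print 1 when the single `node` job is being scraped successfully.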
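Preview of the result: the dashboards are rendered with Grafana, which is not covered here
Official templates: https://grafana.com/grafana/dashboards/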