1. Installation overview diagram
2. Download locations
Prometheus:
Program | Official URL |
---|---|
Prometheus, alertmanager, mysqld_exporter, node_exporter, pushgateway | https://prometheus.io/download/ |
redis_exporter | https://github.com/oliver006/redis_exporter/releases |
process-exporter | https://github.com/ncabatoff/process-exporter/releases/ |
nginx-module-vts | https://github.com/vozlt/nginx-module-vts#installation |
nginx-vts-exporter | https://github.com/hnlq715/nginx-vts-exporter/releases/ |
cadvisor | https://github.com/google/cadvisor |
PrometheusAlert | https://github.com/feiyu563/PrometheusAlert |
Grafana download | https://grafana.com/grafana/download?pg=get&plcmt=selfmanaged-box1-cta1 |
Grafana dashboard templates | https://grafana.com/grafana/dashboards/ |
3. Installation
3.1 Installing the Prometheus server
3.1.1 Extract the archive to install (prebuilt Go binary, no compilation needed)
tar -zxvf prometheus-2.37.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/ && ln -s prometheus-2.37.0.linux-amd64 prometheus
3.1.2 Manage Prometheus with systemd
vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/usr/local/prometheus/data --web.listen-address=:9090 --web.enable-lifecycle
ExecStop=/usr/bin/pkill -f prometheus
[Install]
WantedBy=multi-user.target
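3.1.3 Starting the service
# Reload the systemd configuration; changes to unit files only take effect after a reload.
systemctl daemon-reload
# Enable the service at boot
systemctl enable prometheus
# Start the service
systemctl start prometheus
# Check the service status
systemctl status prometheus
Open http://ip:9090 in a browser; if the page loads, the service has started normally.
Hot reload (available because the unit file passes --web.enable-lifecycle)
curl -XPOST http://host:9090/-/reload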
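A reload request is only useful if the new configuration is valid, so it can help to check the file first. The following is a minimal sketch, assuming the `promtool` binary shipped in the Prometheus tarball sits next to the server binary; the function name `safe_reload` and the three variables are illustrative and not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: validate prometheus.yml, then trigger a hot reload.
# PROMTOOL, PROM_CONFIG and PROM_URL are assumed defaults; adjust them to your host.
PROMTOOL="${PROMTOOL:-/usr/local/prometheus/promtool}"
PROM_CONFIG="${PROM_CONFIG:-/usr/local/prometheus/prometheus.yml}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

safe_reload() {
    # promtool exits non-zero when the configuration is invalid.
    if ! "$PROMTOOL" check config "$PROM_CONFIG"; then
        echo "config check failed; reload skipped" >&2
        return 1
    fi
    # The reload endpoint only responds when --web.enable-lifecycle is set.
    curl -fsS -X POST "$PROM_URL/-/reload"
}
```

After sourcing the file, `safe_reload && echo reloaded` runs the check and sends the reload request only when the check passes.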
3.2 Installing Grafana
3.2.1 Extract the archive to install Grafana
tar -zxvf grafana-enterprise-9.0.6.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/ && ln -s grafana-9.0.6 grafana
3.2.2 Manage Grafana with systemd
vim /etc/systemd/system/grafana.service
[Unit]
Description=grafana
[Service]
ExecStart=/usr/local/grafana/bin/grafana-server -homepath=/usr/local/grafana
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
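3.2.3 Start Grafana
systemctl daemon-reload
systemctl start grafana
systemctl enable grafana
# Stop the service (only when needed)
systemctl stop grafana
3.2.4 Log in to Grafana
Log in at http://ip:3000; the default username and password are admin/admin.
Then add a data source.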
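The data source can also be registered from the command line through Grafana's HTTP API instead of the web UI. This is a rough sketch, assuming Grafana and Prometheus run on the same host and the admin password is still the default; the function name `add_prometheus_datasource`, the variables, and the data source name "Prometheus" are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: register Prometheus as a Grafana data source via POST /api/datasources.
# GRAFANA_URL, GRAFANA_AUTH and PROM_URL are assumed defaults, not values fixed by the guide.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
GRAFANA_AUTH="${GRAFANA_AUTH:-admin:admin}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

add_prometheus_datasource() {
    # "access":"proxy" makes the Grafana server (not the browser) query Prometheus.
    payload=$(printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}' "$PROM_URL")
    curl -fsS -u "$GRAFANA_AUTH" \
         -H 'Content-Type: application/json' \
         -X POST "$GRAFANA_URL/api/datasources" \
         -d "$payload"
}
```

If the admin password was changed at first login, set `GRAFANA_AUTH` accordingly before calling the function.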
3.3 Installing alertmanager
3.3.1 Extract the archive to install
tar -zxvf alertmanager-0.24.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local && ln -s alertmanager-0.24.0.linux-amd64 alertmanager
3.3.2 Manage alertmanager with systemd
vim /etc/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/alertmanager/alertmanager --web.listen-address=:9093 --config.file=/usr/local/alertmanager/alertmanager.yml --storage.path=/usr/local/alertmanager/data/
ExecStop=/usr/bin/pkill -f alertmanager
[Install]
WantedBy=multi-user.target
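3.3.3 Start alertmanager
# Reload the systemd configuration; changes to unit files only take effect after a reload.
systemctl daemon-reload
# Enable the service at boot
systemctl enable alertmanager
# Start the service
systemctl start alertmanager
# Check the service status
systemctl status alertmanager
The alertmanager web UI is available at http://ip:9093/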
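To confirm that alertmanager accepts alerts before any Prometheus rule fires, a test alert can be sent by hand. The sketch below assumes the `amtool` binary from the alertmanager tarball is used and that alertmanager listens on localhost; the function name `send_test_alert`, the alert name `InstallSmokeTest`, and the annotation text are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: push one manual alert into alertmanager with amtool.
# AMTOOL and AM_URL are assumed defaults; adjust them to your host.
AMTOOL="${AMTOOL:-/usr/local/alertmanager/amtool}"
AM_URL="${AM_URL:-http://localhost:9093}"

send_test_alert() {
    # Labels are passed as name=value pairs; the annotation is optional.
    "$AMTOOL" --alertmanager.url="$AM_URL" alert add \
        alertname=InstallSmokeTest severity=warning \
        --annotation=summary="manual test alert"
}
```

After calling `send_test_alert`, the alert should be listed in the web UI on port 9093 until it expires.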
3.4 Installing node_exporter (host monitoring)
3.4.1 Extract the archive to install
tar -zxvf node_exporter-1.4.0-rc.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/ && ln -s node_exporter-1.4.0-rc.0.linux-amd64 node_exporter
3.4.2 Manage node_exporter with systemd
vim /etc/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/node_exporter/node_exporter --web.listen-address=:9100
ExecStop=/usr/bin/pkill -f node_exporter
[Install]
WantedBy=multi-user.target
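3.4.3 Start node_exporter
# Reload the systemd configuration; changes to unit files only take effect after a reload.
systemctl daemon-reload
# Enable the service at boot
systemctl enable node_exporter
# Start the service
systemctl start node_exporter
# Check the service status
systemctl status node_exporter
3.4.4 Update the Prometheus configuration
Edit the configuration: vi /usr/local/prometheus/prometheus.yml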
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - rules/*.yml
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: 'node exporter'
    static_configs:
      - targets: ['192.168.1.134:9100']
        labels:
          instance: node_01
      - targets: ['192.168.1.132:9100']
        labels:
          instance: node_02
Log in to the Prometheus web UI and check the targets; node targets shown as UP mean node_exporter is running.
3.4.5 Add monitoring dashboards
Log in to Grafana and import a dashboard template.
Dashboard 8919 or 12633 can be used here (this guide uses the latter).
3.4.6 Add alerting rules
cd /usr/local/prometheus/ && mkdir rules
CPU alert
cat cpu.yml
groups:
- name: CPU alert rules
  rules:
  - alert: Server-CPUUsageAlert
    expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[1m]))) * 100 > 85
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "CPU usage is spiking."
      description: "CPU usage is above 85% (current value: {{ $value }}%)"
Memory alert
cat memory.yml
groups:
- name: Memory alert rules
  rules:
  - alert: Server-MemoryUsageAlert
    expr: (1 - (node_memory_MemAvailable_bytes{job="node exporter"} / (node_memory_MemTotal_bytes{job="node exporter"}))) * 100 > 85
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "The server is running low on available memory."
      description: "Memory usage is above 85% (current value: {{ $value }}%)"
Disk alert
cat disk.yml
groups:
- name: Disk usage alert rules
  rules:
  - alert: Server-DiskUsageAlert
    expr: 100 - node_filesystem_free_bytes{fstype=~"xfs|ext4"} / node_filesystem_size_bytes{fstype=~"xfs|ext4"} * 100 > 95
    for: 30m
    labels:
      severity: warning
    annotations:
      summary: "Disk partition usage is too high."
      description: "Partition usage is above 95% (current value: {{ $value }}%)"
Check that the rules have loaded correctly.
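Rule files can be syntax-checked on the command line before Prometheus loads them. The following is a small sketch, assuming the `promtool` binary from the Prometheus tarball and the rules directory created above; the function name `check_rules` and both variables are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: run `promtool check rules` on every rule file and count failures.
# PROMTOOL and RULES_DIR are assumed defaults; adjust them to your host.
PROMTOOL="${PROMTOOL:-/usr/local/prometheus/promtool}"
RULES_DIR="${RULES_DIR:-/usr/local/prometheus/rules}"

check_rules() {
    failed=0
    for f in "$RULES_DIR"/*.yml; do
        # With no rule files the glob stays unexpanded, so skip non-existent paths.
        [ -e "$f" ] || continue
        if ! "$PROMTOOL" check rules "$f"; then
            echo "invalid rule file: $f" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only when every file passed.
    [ "$failed" -eq 0 ]
}
```

Checking file by file keeps the failing file name visible when several rule files are edited at once.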
3.5 Installing Pushgateway
3.5.1 Extract the archive to install (prebuilt Go binary, no compilation needed)
tar -zxvf pushgateway-1.4.3.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/ && ln -s pushgateway-1.4.3.linux-amd64 pushgateway
3.5.2 Manage Pushgateway with systemd
vim /etc/systemd/system/pushgateway.service
[Unit]
Description=pushgateway
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/pushgateway/pushgateway --web.listen-address=:9091
ExecStop=/usr/bin/pkill -f pushgateway
[Install]
WantedBy=multi-user.target
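3.5.3 Start Pushgateway
# Reload the systemd configuration; changes to unit files only take effect after a reload.
systemctl daemon-reload
# Enable the service at boot
systemctl enable pushgateway
# Start the service
systemctl start pushgateway
# Check the service status
systemctl status pushgateway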
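3.5.4 Update the Prometheus configuration
Edit the configuration: vi /usr/local/prometheus/prometheus.yml
Add the following under scrape_configs:
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['192.168.1.134:9091']
        labels:
          instance: pushgateway
Restart Prometheus, then check the web UI; the pushgateway target appearing there confirms the setup works.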
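Once the target is up, a metric can be pushed by hand to see the full path from Pushgateway to Prometheus. This is a minimal sketch using the Pushgateway address from the scrape config above; the function name `push_metric` and the job and metric names in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: send one untyped sample to the Pushgateway.
# PUSHGATEWAY_URL defaults to the address used in the scrape config; adjust as needed.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://192.168.1.134:9091}"

# Usage: push_metric JOB METRIC_NAME VALUE
push_metric() {
    job="$1"; name="$2"; value="$3"
    # The body uses the Prometheus text format and must end with a newline.
    printf '%s %s\n' "$name" "$value" |
        curl -fsS --data-binary @- "$PUSHGATEWAY_URL/metrics/job/$job"
}
```

For example, `push_metric demo_job demo_metric 3.14` pushes a sample that should then be visible in the Pushgateway web UI and, after the next scrape, in Prometheus.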
3.6 Process monitoring (process-exporter)
3.6.1 Extract the archive to install
tar -zxvf process-exporter-0.7.10.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/ && ln -s process-exporter-0.7.10.linux-amd64 process-exporter
Write the configuration file
cd /usr/local/process-exporter && vi process-name.yaml
process_names:
  # Monitor specific processes
  - name: "{{.Matches}}"
    cmdline:
    - 'nginx'
  - name: "{{.Matches}}"
    cmdline:
    - 'redis-server'
  - name: "{{.Matches}}"
    cmdline:
    - 'mysql'
  # Monitor all remaining processes
  - name: "{{.Matches}}"
    cmdline:
    - '.+'
3.6.2 Manage process-exporter with systemd
vim /etc/systemd/system/process-exporter.service
[Unit]
Description=process-exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/process-exporter/process-exporter -config.path /usr/local/process-exporter/process-name.yaml
ExecStop=/usr/bin/pkill -f process-exporter
[Install]
WantedBy=multi-user.target
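3.6.3 Start process-exporter
# Reload the systemd configuration; changes to unit files only take effect after a reload.
systemctl daemon-reload
# Enable the service at boot
systemctl enable process-exporter
# Start the service
systemctl start process-exporter
# Check the service status
systemctl status process-exporter
3.6.4 Update the Prometheus configuration
Edit prometheus.yml (vi prometheus.yml) and add the following under scrape_configs:
  - job_name: 'process'
    static_configs:
      - targets: ['192.168.1.132:9256','192.168.1.134:9256']
Restart Prometheus and check the targets in the web UI.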
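The exporter output can also be inspected directly, which helps when choosing thresholds for process alerts. Below is a rough sketch that sums the `namedprocess_namegroup_num_procs` samples whose line matches a pattern; the function name `count_procs` is hypothetical, and the pattern is matched against the whole metric line rather than only the `groupname` label.

```shell
#!/bin/sh
# Hypothetical helper: read process-exporter metrics on stdin and sum
# namedprocess_namegroup_num_procs over all lines matching the given pattern.
# Usage: curl -s http://192.168.1.134:9256/metrics | count_procs nginx
count_procs() {
    awk -v pat="$1" '
        # Only sample lines of this metric; HELP/TYPE comments start with "#".
        /^namedprocess_namegroup_num_procs\{/ && $0 ~ pat { total += $NF }
        END { print total + 0 }
    '
}
```

The printed number is the per-host equivalent of a `sum(namedprocess_namegroup_num_procs{...})` query, so it gives a quick view of how many processes a group normally has.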
3.6.5 Add monitoring dashboards
In Grafana, import one of the dashboard templates with ID 4202 (used here), 249, or 8378.
3.6.6 Add alerting rules
[root@localhost rules]# vi /usr/local/prometheus/rules/134process.yml
groups:
- name: Process alerts
  rules:
  - alert: Process-php-fpm
    expr: sum(namedprocess_namegroup_num_procs{groupname=~".*php-fpm.*", instance=~"192.168.1.134.*"}) < 5
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "php-fpm processes are down."
      description: "php-fpm process count is below the threshold (current value: {{ $value }})"
  - alert: Process-nginx
    expr: sum(namedprocess_namegroup_num_procs{groupname=~".*nginx.*", instance=~"192.168.1.134.*"}) < 2
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "nginx processes are down."
      description: "nginx process count is below the threshold (current value: {{ $value }})"
  - alert: Process-mysql
    expr: sum(namedprocess_namegroup_num_procs{groupname=~".*mysql.*", instance=~"192.168.1.134.*"}) < 2
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "mysql processes are down."
      description: "mysql process count is below the threshold (current value: {{ $value }})"
  - alert: Process-redis
    expr: sum(namedprocess_namegroup_num_procs{groupname=~".*redis.*", instance=~"192.168.1.134.*"}) < 2
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: "redis processes are down."
      description: "redis process count is below the threshold (current value: {{ $value }})"
Restart Prometheus.