Environment:
- Operating system: CentOS 7-2009
- Make sure the network connection and the yum repositories are working
Item | Value |
---|---|
Monitoring host | 11.0.1.137 |
Monitored host | 11.0.1.134 |
node_exporter version | v1.5.0 |
Steps:
1. How it works
node_exporter exposes host metrics at **ip:9100/metrics**, and Prometheus scrapes that endpoint.
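The endpoint returns plain text, one sample per line, with `# HELP` and `# TYPE` comment lines in between. As a rough sketch of how that output can be inspected from the command line, the helper below (the name `metric_value` is made up for this example) prints the value of one series. It assumes the series name, including any labels, contains no spaces.

```shell
# Read Prometheus text exposition format on stdin and print the value of the
# series whose name (first field) equals $1. Comment lines are skipped.
# Hypothetical helper, not part of node_exporter or Prometheus.
metric_value() {
  awk -v name="$1" '$0 !~ /^#/ && $1 == name { print $2 }'
}

# Usage against a live exporter (address taken from this guide):
#   curl -s http://11.0.1.134:9100/metrics | metric_value node_load1
```

Series whose label values contain spaces need a real parser; this is only meant for quick checks.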
2. Install node_exporter on the monitored host
:::tips
Exporter list: https://prometheus.io/docs/instrumenting/exporters
:::
Getting node_exporter:
From the official site: https://prometheus.io/download/
From GitHub:
1. Open the exporter list
https://prometheus.io/docs/instrumenting/exporters/
2. Follow the link to GitHub
https://github.com/prometheus/node_exporter
3. Download the package from the releases page
Install node_exporter:
mkdir /opt/node-exporter
cd /opt/node-exporter/
wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
tar xzvf node_exporter-1.5.0.linux-amd64.tar.gz
mv node_exporter-1.5.0.linux-amd64 node_exporter-1.5.0
# Test run in the foreground
cd node_exporter-1.5.0/
./node_exporter
.........
ts=2023-02-28T15:07:53.817Z caller=tls_config.go:232 level=info msg="Listening on" address=[::]:9100
ts=2023-02-28T15:07:53.817Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=[::]:9100
Access test: http://11.0.1.134:9100/
Configure the systemd service
# vi /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
[Service]
ExecStart=/opt/node-exporter/node_exporter-1.5.0/node_exporter
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
# Reload systemd, start the service, and enable it at boot
systemctl daemon-reload
systemctl start node_exporter
systemctl enable node_exporter
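The unit file above can also be written without opening an editor, which helps when the same setup is repeated on several hosts. The function below is a sketch only: `render_node_exporter_unit` is a name invented here, and it simply prints the same unit content with the binary path and optional extra flags filled in.

```shell
# Print a systemd unit for node_exporter on stdout.
# $1 = absolute path to the node_exporter binary
# $2 = optional extra flags appended to ExecStart
# Hypothetical helper that mirrors the unit file shown in this guide.
render_node_exporter_unit() {
  cat <<EOF
[Unit]
Description=node_exporter
[Service]
ExecStart=$1${2:+ $2}
ExecReload=/bin/kill -HUP \$MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
}

# Usage (run as root on the monitored host):
#   render_node_exporter_unit /opt/node-exporter/node_exporter-1.5.0/node_exporter \
#     > /usr/lib/systemd/system/node_exporter.service
#   systemctl daemon-reload
```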
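3. On the monitoring host, add the monitored host to the Prometheus configuration
vi prometheus.yml
# Add this scrape job under scrape_configs
  - job_name: "exp"
    static_configs:
      - targets: ["11.0.1.134:9100"]
# Check the configuration; the output below means it is valid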
./promtool check config ./prometheus.yml
Checking ./prometheus.yml
SUCCESS: ./prometheus.yml is valid prometheus config file syntax
# Hot-reload the configuration
## Syntax: kill -HUP <pid>
ps -ef |grep prometheus
root 10306 1 0 20:30 ? 00:00:05 /opt/monitor/prometheus/prometheus --config.file=/opt/monitor/prometheus/prometheus.yml
root 10709 1931 0 23:19 pts/1 00:00:00 grep --color=auto prometheus
kill -HUP 10306
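Looking up the PID by eye works, but it is easy to pick the `grep` line by mistake. As one possible shortcut, the sketch below extracts the PID from `ps -ef` output; `prometheus_pid` is a name made up for this example, and it assumes the server binary is called `prometheus`.

```shell
# Read `ps -ef` output on stdin and print the PID (column 2) of every process
# whose command (column 8) is the prometheus binary. The grep process itself
# is skipped because its command is "grep".
# Hypothetical helper for illustration.
prometheus_pid() {
  awk '$8 ~ /(^|\/)prometheus$/ { print $2 }'
}

# Usage on the monitoring host:
#   kill -HUP "$(ps -ef | prometheus_pid)"
```

Where `pgrep` is available, `pgrep -x prometheus` gives the same PID in one step.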
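Web page test:
4. Display the data in Grafana:
Prerequisite: the Prometheus data source has already been added to Grafana.
Import a dashboard from its JSON or by its ID.
Dashboards are available from the official site: https://grafana.com/grafana/dashboards/
5. Enable HTTP authentication on the metrics endpoint
1. On the monitored host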
cd /opt/node-exporter/node_exporter-1.5.0/
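# Enable HTTP basic authentication:
vi config.yml
basic_auth_users:
  prometheus: $2y$12$8uSetX/PDmYcBOFGRYxBauz8KaCZhHsZz0yf7GWn8DCxVlWMfB5nW
# Format: username: bcrypt hash of the password
## Generate the hash above with the following commands:
yum install httpd-tools -y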
htpasswd -nBC 12 '' | tr -d ':\n'
New password:
Re-type new password:
$2y$12$8uSetX/PDmYcBOFGRYxBauz8KaCZhHsZz0yf7GWn8DCxVlWMfB5nW
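Because `tr -d ':\n'` strips the trailing newline, the hash is printed right up against whatever the terminal shows next, so it is easy to copy one character too many or too few. A bcrypt hash is always 60 characters long: a prefix such as `$2y$`, a two-digit cost, `$`, then 53 characters. The check below is a small sketch for catching copy mistakes before the value goes into config.yml; `is_bcrypt_hash` is a name invented for this example.

```shell
# Succeed (exit 0) if $1 has the shape of a bcrypt hash:
# "$2a$", "$2b$" or "$2y$", a two-digit cost, "$", then 53 characters
# from the bcrypt base64 alphabet. It checks the shape only, not the password.
# Hypothetical helper for illustration.
is_bcrypt_hash() {
  printf '%s' "$1" | grep -Eq '^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
}

# Usage:
#   is_bcrypt_hash '<pasted hash>' && echo "looks valid" || echo "check the copy"
```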
# vi /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
[Service]
ExecStart=/opt/node-exporter/node_exporter-1.5.0/node_exporter --web.config.file=/opt/node-exporter/node_exporter-1.5.0/config.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
# Reload systemd and restart the service
systemctl daemon-reload
systemctl restart node_exporter
# Check the service status
systemctl status node_exporter.service
Refresh the page at http://11.0.1.134:9100/; it should now ask for a username and password.
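The same check can be made from a terminal instead of a browser. The sketch below wraps `curl` to print only the HTTP status code; `metrics_status` is a hypothetical helper name. With basic authentication enabled, a request without credentials is expected to return 401 and one with the right credentials 200.

```shell
# Print the HTTP status code returned by a URL.
# $1 = URL, $2 = optional "user:password" sent as HTTP basic auth.
# Hypothetical helper; relies only on standard curl options (-s, -o, -w, -u).
metrics_status() {
  if [ -n "$2" ]; then
    curl -s -o /dev/null -w '%{http_code}' -u "$2" "$1"
  else
    curl -s -o /dev/null -w '%{http_code}' "$1"
  fi
}

# Usage (address and credentials taken from this guide):
#   metrics_status http://11.0.1.134:9100/metrics
#   metrics_status http://11.0.1.134:9100/metrics prometheus:123456
```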
2. On the monitoring host
vi prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "exp"
    basic_auth:
      username: prometheus
      password: 123456
    static_configs:
      - targets: ["11.0.1.134:9100"]
Check the configuration and the monitoring status
./promtool check config ./prometheus.yml
Checking ./prometheus.yml
SUCCESS: ./prometheus.yml is valid prometheus config file syntax
# Hot-reload the configuration
## Syntax: kill -HUP <pid>
ps -ef |grep prometheus
root 10803 1 0 Feb28 ? 00:00:04 /opt/monitor/prometheus/prometheus --config.file=/opt/monitor/prometheus/prometheus.yml
kill -HUP 10803
6. Monitor system services
Configure on the monitored host
# vi /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
[Service]
ExecStart=/opt/node-exporter/node_exporter-1.5.0/node_exporter --web.config.file=/opt/node-exporter/node_exporter-1.5.0/config.yml --collector.systemd --collector.systemd.unit-include=(sshd|network).service
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target
# Reload systemd and restart the service
systemctl daemon-reload
systemctl restart node_exporter
# Check that the service is running normally
systemctl status node_exporter.service
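Verification
Monitored host: http://11.0.1.134:9100/metrics
Monitoring host: http://11.0.1.137:9090/graph
Enter node_systemd_unit_state to display the state of the matched services.
The sshd and network services are now being monitored.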
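The systemd collector emits one `node_systemd_unit_state` series per unit and state, with value 1 for the state the unit is currently in and 0 for the others. As a rough command-line counterpart to the graph page, the sketch below lists the units whose `active` state is 1. `active_units` is a name made up for this example, and the label layout in the comment is an assumption about typical output.

```shell
# Read node_exporter metrics text on stdin and print the unit names whose
# node_systemd_unit_state series for state="active" has the value 1.
# Assumed line shape:
#   node_systemd_unit_state{name="sshd.service",state="active",type="notify"} 1
# Hypothetical helper for illustration.
active_units() {
  grep '^node_systemd_unit_state{' \
    | grep 'state="active"' \
    | awk '$NF == 1' \
    | sed -E 's/.*name="([^"]+)".*/\1/'
}

# Usage (basic auth credentials taken from this guide):
#   curl -s -u prometheus:123456 http://11.0.1.134:9100/metrics | active_units
```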