Official node_exporter documentation: https://prometheus.io/docs/guides/node-exporter/
Host               IP
Monitoring server  192.168.1.155
Monitored host     192.168.1.155
Monitored host     192.168.1.156
Deploy node_exporter (on all hosts)
wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz
tar -zxvf node_exporter-0.17.0.linux-amd64.tar.gz
mv node_exporter-0.17.0.linux-amd64 /usr/local/node_exporter
cat > /usr/lib/systemd/system/node_exporter.service << EOF
[Unit]
Description=node_exporter
[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start node_exporter
systemctl enable node_exporter
ps -ef |grep node_exporter|grep -v grep
netstat -tunlp |grep 9100
Access the node_exporter metrics endpoint
http://192.168.1.155:9100/metrics
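The endpoint returns plain text in the Prometheus exposition format, one sample per line. As a rough sketch of how to pull a single value out of that text from the command line, the helper below filters by metric name. The function name metric_value is hypothetical, and the sketch assumes samples carry no trailing timestamp, so the value is the last field on the line.

```shell
#!/usr/bin/env bash
# metric_value NAME: read exposition-format text on stdin and print the value
# of every sample whose metric name is exactly NAME (labels are ignored).
# Hypothetical helper for illustration; not part of node_exporter.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                      # skip "# HELP" and "# TYPE" lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # drop the {label="..."} part, if any
      if (metric == name) print $NF    # assumed: value is the last field
    }'
}

# Example against a live exporter (requires curl and network access):
# curl -s http://192.168.1.155:9100/metrics | metric_value node_load1
```

Metrics with several label sets, such as node_cpu_seconds_total, print one value per series.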
Configure automatic discovery of Linux servers (on the monitoring server)
cd /usr/local/prometheus/
vim /usr/local/prometheus/prometheus.yml   # append at the bottom, under scrape_configs
  - job_name: 'linux-node'    # added line
    file_sd_configs:    # added line
      - files: ['/usr/local/prometheus/sd_config/linux-node.yml']    # added line
        refresh_interval: 3s    # added line
# Create and edit the linux-node.yml file
mkdir /usr/local/prometheus/sd_config/
vim /usr/local/prometheus/sd_config/linux-node.yml
- targets:
    - 192.168.1.155:9100
    - 192.168.1.156:9100
  labels:
    idc: bj
    os: linux
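When the host list grows, the target file can be generated instead of edited by hand. The following is a minimal sketch, not part of Prometheus: the function name gen_targets is hypothetical, and it assumes every host runs node_exporter on the default port 9100 and shares the same idc label.

```shell
#!/usr/bin/env bash
# gen_targets IDC HOST...: print one file_sd target group for the given hosts.
# Hypothetical helper; assumes port 9100 and the label os=linux for all hosts.
gen_targets() {
  local idc="$1"
  shift
  printf '%s\n' "- targets:"
  local host
  for host in "$@"; do
    printf '    - %s:9100\n' "$host"
  done
  printf '%s\n' "  labels:"
  printf '    idc: %s\n' "$idc"
  printf '%s\n' "    os: linux"
}

# Example: regenerate the target file for the two hosts in this guide.
# gen_targets bj 192.168.1.155 192.168.1.156 > /usr/local/prometheus/sd_config/linux-node.yml
```

Because Prometheus re-reads the file at every refresh_interval, no reload is needed after the file changes.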
# Check that the Prometheus configuration file is valid
/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
# Reload the Prometheus configuration (pgrep -x matches only the process named exactly "prometheus")
kill -HUP $(pgrep -x prometheus)
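Verify that the linux-node targets were discovered automatically (Status > Targets in the Prometheus web UI)
Query CPU information with PromQL
CPU usage: usage = 100 - idle percentage. The query below returns one series per CPU core; wrap the irate(...) term in avg by (instance) (...) to get a single value per host.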
100 - irate(node_cpu_seconds_total{instance="192.168.1.155:9100",job="linux-node",mode="idle"}[5m]) * 100
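To make the arithmetic behind this query concrete: irate() takes the last two samples in the range and divides the counter increase by the time between them, which gives the fraction of each second the CPU spent idle. The sketch below repeats that calculation for two hand-picked samples. The function name cpu_usage_pct and the sample numbers are illustrative assumptions, not output from a real host.

```shell
#!/usr/bin/env bash
# cpu_usage_pct IDLE_T1 IDLE_T2 SECONDS
# IDLE_T1, IDLE_T2: two readings of the idle-seconds counter for one core.
# SECONDS: wall-clock time between the two readings.
# Hypothetical helper mirroring "100 - irate(idle) * 100" for a single core.
cpu_usage_pct() {
  awk -v a="$1" -v b="$2" -v dt="$3" \
    'BEGIN { printf "%.2f\n", 100 - (b - a) / dt * 100 }'
}

# Idle counter grew by 4 s over a 5 s window: the core was idle 80% of the time.
cpu_usage_pct 100 104 5
```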
Query memory information with PromQL
Memory usage: 100 - (free memory + page cache + buffers) / total memory * 100
100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes)/node_memory_MemTotal_bytes * 100
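The same formula can be checked by hand against known numbers. The helper below is a small sketch with a hypothetical name, mem_usage_pct; it takes the four values in any consistent unit (bytes, as in the metrics, or kB as in /proc/meminfo).

```shell
#!/usr/bin/env bash
# mem_usage_pct TOTAL FREE CACHED BUFFERS
# Applies: 100 - (free + cached + buffers) / total * 100
# Hypothetical helper; all four arguments must use the same unit.
mem_usage_pct() {
  awk -v total="$1" -v free="$2" -v cached="$3" -v buffers="$4" \
    'BEGIN { printf "%.2f\n", 100 - (free + cached + buffers) / total * 100 }'
}

# 8000 total, of which 2000 free, 1500 cached, 500 buffers: half is in use.
mem_usage_pct 8000 2000 1500 500
```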
Query disk information with PromQL
Usage of the / partition:
100 - (node_filesystem_free_bytes{mountpoint="/",fstype=~"ext4|xfs"}/node_filesystem_size_bytes{mountpoint="/",fstype=~"ext4|xfs"} * 100)
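Get the running state of system services
# Add the systemd collector options to the node_exporter unit file (node_exporter 1.0 and later name this flag --collector.systemd.unit-include)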
cat > /usr/lib/systemd/system/node_exporter.service << EOF
[Unit]
Description=node_exporter
[Service]
Restart=on-failure
ExecStart=/usr/local/node_exporter/node_exporter --collector.systemd \
--collector.systemd.unit-whitelist=(sshd|grafana-server|prometheus|node_exporter).service
[Install]
WantedBy=multi-user.target
EOF
# Restart the node_exporter service
systemctl daemon-reload
systemctl restart node_exporter
ps -ef |grep node_exporter|grep -v grep
Check the state of the sshd service
node_systemd_unit_state{instance="192.168.1.155:9100",job="linux-node",name="sshd.service"}
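The systemd collector exposes one series per unit and state, and the series for the unit's current state carries the value 1 while the others carry 0. As a sketch under that assumption, the helper below reads the raw text from the exporter's /metrics endpoint and prints the current state of one unit. The function name unit_state is hypothetical.

```shell
#!/usr/bin/env bash
# unit_state UNIT: read exposition-format text on stdin and print the state
# label of the node_systemd_unit_state series whose value is 1 for UNIT.
# Hypothetical helper; assumes the collector's name="..." and state="..." labels.
unit_state() {
  awk -v unit="$1" '
    index($0, "node_systemd_unit_state{") == 1 &&
    index($0, "name=\"" unit "\"") > 0 &&
    $NF == 1 {
      # state="..." is 7 characters of prefix plus a closing quote
      if (match($0, /state="[^"]*"/))
        print substr($0, RSTART + 7, RLENGTH - 8)
    }'
}

# Example against a live exporter (requires curl and network access):
# curl -s http://192.168.1.155:9100/metrics | unit_state sshd.service
```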
Deploy and integrate Grafana
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.3.4-1.x86_64.rpm
yum localinstall -y grafana-5.3.4-1.x86_64.rpm
systemctl start grafana-server
iptables -I INPUT -p tcp --dport 3000 -j ACCEPT
Log in to Grafana
URL: http://192.168.1.155:3000/login
Default username: admin
Default password: admin
Integrate Prometheus (add it as a Grafana data source)
Import a dashboard template
Template URL: https://grafana.com/dashboards/8919
ID: 8919