一、前沿
通过上级篇文章我们已经对 Prometheus监控服务器基础资源做了记录,这节课主要记录一下监控服务器上的进程
二、实现步骤
Prometheus机器IP | process-exporter机器IP |
---|---|
192.168.1.3 | 192.168.1.4 |
1、安装 process-exporter 下载地址(在192.168.1.4机器上操作)
cd /usr/local/src
wget https://github.com/ncabatoff/process-exporter/releases/download/v0.7.10/process-exporter-0.7.10.linux-amd64.tar.gz
tar -xf process-exporter-0.7.10.linux-amd64.tar.gz
mv process-exporter-0.7.10.linux-amd64 ../
cd ../
ln -s /usr/local/process-exporter-0.7.10.linux-amd64 process_exporter
cd /usr/local/process_exporter/
vim process-conf.yaml
process_names:
- name: "{{.Matches}}"
cmdline:
- 'squid' # 需要监控的服务名字
- name: "{{.Matches}}"
cmdline:
- 'medusa' # 需要监控的服务名字
vim /usr/lib/systemd/system/process_exporter.service
[Unit]
Description=Prometheus Process Exporter
After=network.target
[Service]
ExecStart=/usr/local/process_exporter/process-exporter -config.path /usr/local/process_exporter/process-conf.yaml
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start process_exporter.service
systemctl enable process_exporter.service
至此整个 process_exporter 已经启动 并且设置了开启自启,监控了 squid的服务名。
2、配置 Prometheus 监控(在 192.168.1.3上操作)
一、配置进程监控规则
cd /usr/local/prometheus/rules && vim process.yml
groups:
- name: process_rule
rules:
- alert: 鱿鱼服务 # 告警名称
expr: (namedprocess_namegroup_num_procs) == 0
for: 30s # 满足告警条件持续时间多久后,才会发送告警
labels: #标签项
severity: error
annotations: # 解析项,详细解释告警信息
summary: "{{ $labels.instance }}: squid进程服务挂了,已经超过30秒"
value: "{{ $value }}"
注意:
主要是 namedprocess_namegroup_num_procs{groupname="map[:squid]"} 要和 process-conf.yaml保持一致。
二、配置 监控机器
vim /usr/local/prometheus/sd_config/squid/process.yml
- targets:
- 192.168.1.4:9256
三、配置 prometheus.yml
# 进程监控
- job_name: 'jd-squid-process'
file_sd_configs:
- files: ['/usr/local/prometheus/sd_config/squid/process.yml']
refresh_interval: 20s
四、重启 prometheus 即可
systemctl restart prometheus.service
3、配置 grafana 官方地址
这里配置编号 249 即可,其他如何配置可以看之前文章
三、总结
简单记录一下,防止以后用