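I. Lab requirements
1. This walkthrough uses a single Alibaba Cloud server as the test host.
1) Deploy Prometheus for real-time monitoring and add Grafana for a better graphical interface.
2) Send alerts to a DingTalk robot so that server performance and processes are monitored 24/7.
2. Prerequisites
1) A cloud server or virtual machine
2) The webhook URL and signing secret (加签) of a DingTalk robot
II. Deploying the Prometheus monitoring system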
[root@yun2 ~]# wget https://github.com/prometheus/prometheus/releases/download/v2.50.1/prometheus-2.50.1.linux-amd64.tar.gz
[root@yun2 ~]# tar -xf prometheus-2.50.1.linux-amd64.tar.gz
[root@yun2 ~]# mv prometheus-2.50.1.linux-amd64 /usr/local/prometheus
[root@yun2 local]# useradd -M -s /sbin/nologin prometheus   # create the service user
[root@yun2 local]# chown -R prometheus: /usr/local/prometheus
[root@yun2 ~]# vim /usr/lib/systemd/system/prometheus.service   # add a systemd unit
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target
[root@yun2 ~]# systemctl start prometheus
[root@yun2 ~]# ss -anpt | grep 9090   # check that the service port is listening
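Because grep 9090 also matches any line that merely contains the digits 9090 (a PID or a peer port, for instance), a stricter check can be useful. The following is a minimal sketch and not part of the original setup: the helper name is_listening is hypothetical, and it assumes the usual `ss -lnt` column layout in which the local address is the fourth field.

```shell
# Hypothetical helper: succeed if a listening socket on the given port
# appears in `ss -lnt` output read from stdin.
# Assumes the local "address:port" is the 4th whitespace-separated column.
is_listening() {
    awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}
```

It could be used as `ss -lnt | is_listening 9090 && echo "port 9090 is listening"`. The same helper works for the other ports used later in this article.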
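III. Monitoring server performance with node_exporter
node_exporter is a component that collects and exposes host-level metrics and integrates with the Prometheus monitoring system.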
[root@yun2 ~]# wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
[root@yun2 ~]# tar -xf node_exporter-1.7.0.linux-amd64.tar.gz
[root@yun2 ~]# mv node_exporter-1.7.0.linux-amd64 /usr/local/node_exporter
[root@yun2 ~]# vim /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
After=network.target
[Service]
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
RestartSec=20
[Install]
WantedBy=multi-user.target
[root@yun2 ~]# systemctl start node_exporter.service
[root@yun2 ~]# systemctl status node_exporter.service   # confirm the service is running
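Beyond the service status, it can help to confirm that node_exporter is actually serving metrics. The sketch below is an illustration under stated assumptions: metric_value is a hypothetical helper name, and it only handles metrics without labels in the Prometheus text exposition format.

```shell
# Hypothetical helper: print the value of an unlabelled metric from
# Prometheus text-format output read on stdin; fail if it is absent.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; found = 1; exit } END { exit !found }'
}
```

For example, `curl -s http://localhost:9100/metrics | metric_value node_load1` should print the one-minute load average if the exporter is reachable on its default port.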
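Edit the main Prometheus configuration file so that Prometheus scrapes the data collected by node_exporter.
[root@yun2 ~]# vim /usr/local/prometheus/prometheus.yml
  - job_name: "主机监控"    # "host monitoring": add a scrape job under scrape_configs
    static_configs:
      - targets: ["<server-ip>:9100"]    # 9100 is node_exporter's port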
[root@yun2 ~]# systemctl daemon-reload
[root@yun2 ~]# systemctl restart prometheus
Open port 9090 in a browser. Under Status > Targets you can see that Prometheus is now scraping the node_exporter data. Ports 9090 and 9100 must be allowed in the cloud server's security group.
IV. Visualizing server performance with Grafana
[root@yun2 ~]# yum -y localinstall grafana-7.1.5-1.x86_64.rpm   # RPM package downloaded in advance
[root@yun2 ~]# systemctl start grafana-server.service
[root@yun2 ~]# ss -anpt | grep 3000
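Add port 3000 to the security group's allow rules.
Open port 3000 over HTTP. The default username and password for the first login are both admin; change the password after logging in.
Add a Prometheus data source.
Enter the Prometheus URL.
Create a dashboard; dashboards can be customized.
Select the Prometheus data source.
Host performance can then be monitored under "主机监控" (host monitoring).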
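V. Monitoring the mysqld process with process-exporter
This section uses MySQL as an example and monitors the mysqld service in real time. Process monitoring requires the process-exporter component.
1. Install process-exporter, configure it to monitor the mysqld process, and add a systemd unit.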
[root@yun2 ~]# tar -zxf process-exporter-0.7.5.linux-amd64.tar.gz
[root@yun2 ~]# mv process-exporter-0.7.5.linux-amd64 /usr/local/process_exporter
[root@yun2 ~]# vim /usr/local/process_exporter/process.yaml   # create the process-exporter config file; the monitored process is mysqld
process_names:
  - name: "{{.Matches}}"
    cmdline:
      - 'mysqld'
[root@yun2 ~]# vim /usr/lib/systemd/system/process_exporter.service   # add a systemd unit
[Unit]
Description=Prometheus exporter for processors metrics, written in Go with pluggable metric collectors.
After=network.target
[Service]
Type=simple
User=root
WorkingDirectory=/usr/local/process_exporter
ExecStart=/usr/local/process_exporter/process-exporter -config.path=/usr/local/process_exporter/process.yaml
Restart=on-failure
[Install]
WantedBy=multi-user.target
[root@yun2 ~]# systemctl start process_exporter.service
[root@yun2 ~]# systemctl status process_exporter.service   # confirm that it started correctly
2. Edit the Prometheus configuration file so that Prometheus scrapes the process-exporter data.
[root@yun2 ~]# vim /usr/local/prometheus/prometheus.yml
  - job_name: "mysqld进程监控"    # "mysqld process monitoring": add a scrape job; 9256 is process-exporter's default port
    static_configs:
      - targets: ["<server-ip>:9256"]
[root@yun2 ~]# systemctl restart prometheus
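Allow ports 9256 and 3306 in the security group, then confirm that the mysqld process monitoring job has been picked up by Prometheus.
Follow the target link to open the process-exporter metrics page and search (Ctrl+F) for mysqld. A match means the mysqld process is being monitored.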
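The exact groupname label matters, because the alerting rule later in this article matches on it literally. As a rough sketch (list_groupnames is a hypothetical helper, and it assumes groupname is the first label on the series), the group names that process-exporter reports can be extracted from the metrics text like this:

```shell
# Hypothetical helper: list the groupname label of every
# namedprocess_namegroup_num_procs series in metrics text read on stdin.
list_groupnames() {
    sed -n 's/^namedprocess_namegroup_num_procs{groupname="\([^"]*\)".*/\1/p'
}
```

Running `curl -s http://localhost:9256/metrics | list_groupnames` on the server shows the names to use in PromQL. The alerting rule in this article expects map[:mysqld], which is what the "{{.Matches}}" name template produces for the mysqld pattern.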
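VI. mysqld process monitoring with DingTalk alerts
Alertmanager is the component of the Prometheus monitoring system that receives, processes, and sends alert notifications.
1. Install Alertmanager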
[root@yun2 ~]# wget https://github.com/prometheus/alertmanager/releases/download/v0.17.0/alertmanager-0.17.0.linux-amd64.tar.gz
[root@yun2 ~]# tar -zxf alertmanager-0.17.0.linux-amd64.tar.gz
[root@yun2 ~]# mv alertmanager-0.17.0.linux-amd64 /usr/local/alertmanager
[root@yun2 ~]# vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=Prometheus-Server
After=network.target
[Service]
ExecStart=/usr/local/alertmanager/alertmanager --cluster.advertise-address=0.0.0.0:9093 --config.file=/usr/local/alertmanager/alertmanager.yml
User=root
[Install]
WantedBy=multi-user.target
[root@yun2 ~]# systemctl start alertmanager.service
2. Add an alerting rule. If the mysqld service is detected as stopped, the alert below is sent.
[root@yun2 ~]# mkdir /usr/local/prometheus/rule
[root@yun2 ~]# vim /usr/local/prometheus/rule/process.yml
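groups:
  - name: mysqld进程监控    # "mysqld process monitoring"
    rules:
      - alert: mysqld进程监控
        expr: namedprocess_namegroup_num_procs{groupname="map[:mysqld]"} == 0
        for: 3s
        labels:
          severity: 严重告警    # "critical alert"
          service: MYSQLD
        annotations:
          summary: "MYSQLD on instance {{ $labels.instance }} has been down for more than 3 seconds"
          description: "The MYSQLD process count on this instance has been 0 for the past 3 seconds. Investigate and restore the service as soon as possible."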
[root@iZ2zecshuv4pu6h9txneadZ rule]# systemctl restart prometheus
[root@iZ2zecshuv4pu6h9txneadZ rule]# systemctl status prometheus
3. Make Prometheus load the rule file
Add the following to the main Prometheus configuration file:
[root@yun2 rule]# vim /usr/local/prometheus/prometheus.yml
rule_files:
  - "/usr/local/prometheus/rule/*.yml"
[root@yun2 ~]# systemctl restart prometheus
Check that the rule now appears in Prometheus.
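A YAML mistake in prometheus.yml or in a rule file will stop Prometheus from starting, so it may be worth validating the files before each restart. The sketch below wraps promtool, which ships in the Prometheus tarball. The function name check_prometheus_config is hypothetical, and the default path assumes the install location used in this article.

```shell
# Hypothetical wrapper around promtool (bundled with Prometheus).
# $1 = Prometheus install directory; defaults to the path used in this article.
check_prometheus_config() {
    dir="${1:-/usr/local/prometheus}"
    if [ ! -x "$dir/promtool" ]; then
        echo "promtool not found in $dir" >&2
        return 2
    fi
    # Validates prometheus.yml together with the rule files it references.
    "$dir/promtool" check config "$dir/prometheus.yml"
}
```

One way to use it is `check_prometheus_config && systemctl restart prometheus`, so that the restart only happens when the configuration parses cleanly.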
4. Connect Prometheus to Alertmanager so that Prometheus pushes alerts to Alertmanager
[root@iZ2zecshuv4pu6h9txneadZ ~]# vim /usr/local/prometheus/prometheus.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 47.93.222.225:9093
Stop MySQL to simulate a failure, then open the Alertmanager web UI and check whether Prometheus pushed the alert.
[root@iZ2zecshuv4pu6h9txneadZ ~]# systemctl stop mysqld
5. Deploy prometheus-webhook-dingtalk
prometheus-webhook-dingtalk is a webhook service that forwards Prometheus alert notifications to a DingTalk group.
[root@iZ2zecshuv4pu6h9txneadZ ~]# tar -zxf prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz -C /usr/local/
[root@iZ2zecshuv4pu6h9txneadZ ~]# cd /usr/local
[root@iZ2zecshuv4pu6h9txneadZ local]# mv prometheus-webhook-dingtalk-2.1.0.linux-amd64/ webhook
[root@iZ2zecshuv4pu6h9txneadZ local]# vim /lib/systemd/system/webhook.service
[Unit]
Description=Prometheus-Server
After=network.target
[Service]
ExecStart=/usr/local/webhook/prometheus-webhook-dingtalk --config.file=/usr/local/webhook/config.yml
User=root
[Install]
WantedBy=multi-user.target
6. Edit the webhook configuration file to point at the DingTalk alert robot
[root@iZ2zecshuv4pu6h9txneadZ local]# cd /usr/local/webhook
[root@iZ2zecshuv4pu6h9txneadZ webhook]# cp -p config.example.yml config.yml
[root@iZ2zecshuv4pu6h9txneadZ webhook]# vim config.yml
Set the DingTalk robot's URL and signing secret.
7. Edit the main Alertmanager configuration file to connect Alertmanager to the webhook and define the alert routing
[root@iZ2zecshuv4pu6h9txneadZ webhook]# vim /usr/local/alertmanager/alertmanager.yml
route:
  group_by: ['dingding']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 3m
  receiver: 'dingding.webhook1'
  routes:
    - receiver: 'dingding.webhook1'
      match_re:
        alertname: ".*"
receivers:
  - name: 'dingding.webhook1'
    webhook_configs:
      - url: 'http://<host-ip>:8060/dingtalk/webhook1/send'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
[root@iZ2zecshuv4pu6h9txneadZ webhook]# systemctl restart prometheus
[root@iZ2zecshuv4pu6h9txneadZ webhook]# systemctl restart webhook
[root@iZ2zecshuv4pu6h9txneadZ webhook]# systemctl restart alertmanager.service
8. The alert is sent
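To exercise the Alertmanager route and the DingTalk webhook without stopping mysqld, a synthetic alert can be posted straight to Alertmanager. This is only a sketch under assumptions: test_alert_json is a hypothetical helper, the alert name and severity are made-up test values, and it presumes that the Alertmanager version in use exposes the v2 HTTP API on port 9093.

```shell
# Hypothetical helper: print a one-alert JSON payload for Alertmanager's
# POST /api/v2/alerts endpoint. No JSON escaping is performed, so the
# arguments must not contain double quotes or backslashes.
# $1 = alertname label, $2 = severity label
test_alert_json() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"}}]\n' "$1" "$2"
}
```

A possible invocation is `test_alert_json route-test warning | curl -s -XPOST -H 'Content-Type: application/json' -d @- http://localhost:9093/api/v2/alerts`. If the routing is correct, the test alert should reach the DingTalk group after the configured group_wait.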