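Step 1: Prepare the following packages (Alertmanager for alerting, Grafana for visualization, node_exporter, and Prometheus for monitoring) and upload them to the virtual machine one by one.
Step 2: Upload the Prometheus tarball and place it under /usr/local.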
[root@prometheus ~]# cd /usr/local
[root@prometheus local]# rz -E
rz waiting to receive.
[root@prometheus local]# ls
bin etc games include lib lib64 libexec prometheus-2.53.1.linux-amd64.tar.gz sbin share src
[root@prometheus local]#
Step 3: Extract the tarball and rename the extracted directory.
[root@prometheus local]# tar zxvf prometheus-2.53.1.linux-amd64.tar.gz
prometheus-2.53.1.linux-amd64/
prometheus-2.53.1.linux-amd64/prometheus.yml
prometheus-2.53.1.linux-amd64/prometheus
prometheus-2.53.1.linux-amd64/consoles/
prometheus-2.53.1.linux-amd64/consoles/node-disk.html
prometheus-2.53.1.linux-amd64/consoles/node-overview.html
prometheus-2.53.1.linux-amd64/consoles/node-cpu.html
prometheus-2.53.1.linux-amd64/consoles/node.html
prometheus-2.53.1.linux-amd64/consoles/prometheus-overview.html
prometheus-2.53.1.linux-amd64/consoles/index.html.example
prometheus-2.53.1.linux-amd64/consoles/prometheus.html
prometheus-2.53.1.linux-amd64/LICENSE
prometheus-2.53.1.linux-amd64/promtool
prometheus-2.53.1.linux-amd64/console_libraries/
prometheus-2.53.1.linux-amd64/console_libraries/menu.lib
prometheus-2.53.1.linux-amd64/console_libraries/prom.lib
prometheus-2.53.1.linux-amd64/NOTICE
[root@prometheus local]# mv prometheus-2.53.1.linux-amd64 prometheus
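Step 4: Change into the prometheus directory, back up prometheus.yml (just in case), then edit prometheus.yml and replace the address marked in red in the original screenshot with the VM's IP address.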
[root@prometheus local]# cd prometheus/
[root@prometheus prometheus]# ls
console_libraries consoles LICENSE NOTICE prometheus prometheus.yml promtool
[root@prometheus prometheus]# cp prometheus.yml prometheus.bak
[root@prometheus prometheus]# ls
console_libraries consoles LICENSE NOTICE prometheus prometheus.bak prometheus.yml promtool
[root@prometheus prometheus]# vim prometheus.yml
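Since the screenshot of the edit is not reproduced here, a rough sketch of what the change might look like follows. It assumes the entry being edited is the targets line of the default prometheus job and that Prometheus listens on port 9090; the function name render_self_scrape is made up for illustration, and the layout only approximates the stock file.

```shell
#!/bin/sh
# Illustrative only: print a scrape_configs fragment for a given address.
# The layout approximates the stock prometheus.yml; it is not a copy of it.
render_self_scrape() {
    # $1: IP address of the VM (assumption: Prometheus listens on port 9090)
    cat <<EOF
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["$1:9090"]
EOF
}

# 192.168.243.36 is the address used later in this guide; substitute your own.
render_self_scrape "192.168.243.36"
```

Compare the printed fragment with the matching section of prometheus.yml and change only the address there, rather than overwriting the file.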
Set both the owner and the group of the prometheus directory to root.
[root@prometheus local]# chown -R root:root ./prometheus
[root@prometheus local]# ll
总用量 101752
drwxr-xr-x. 2 root root 134 6月 29 14:57 bin
drwxr-xr-x. 2 root root 6 4月 11 2018 etc
drwxr-xr-x. 2 root root 6 4月 11 2018 games
drwxr-xr-x. 2 root root 6 4月 11 2018 include
drwxr-xr-x. 2 root root 6 4月 11 2018 lib
drwxr-xr-x. 2 root root 6 4月 11 2018 lib64
drwxr-xr-x. 2 root root 6 4月 11 2018 libexec
drwxr-xr-x. 4 root root 154 7月 24 09:16 prometheus
-rw-r--r--. 1 root root 104191695 7月 23 15:52 prometheus-2.53.1.linux-amd64.tar.gz
drwxr-xr-x. 2 root root 6 4月 11 2018 sbin
drwxr-xr-x. 5 root root 49 6月 29 17:49 share
drwxr-xr-x. 3 root root 51 6月 29 14:55 src
[root@prometheus local]#
Step 5: Set up start on boot. Create prometheus.service with the following content, then save and exit.
[root@prometheus prometheus]# cd /usr/lib/systemd/system
[root@prometheus system]# vim prometheus.service
[Unit]
Description=Prometheus server
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/usr/local/prometheus/data --web.external-url=http://0.0.0.0:9090
[Install]
WantedBy=multi-user.target
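Reload the systemd unit files and start the service; the status output shows that the service is already running. Starting on boot also requires systemctl enable prometheus.service, which the "enabled" flag in the output below indicates was already done on this machine.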
[root@prometheus system]# systemctl daemon-reload
[root@prometheus system]# systemctl start prometheus.service
[root@prometheus system]# systemctl status prometheus.service
● prometheus.service - Prometheus server
Loaded: loaded (/usr/lib/systemd/system/prometheus.service; enabled; vendor preset: disabled)
Active: active (running) since 二 2024-07-23 19:46:00 CST; 13h ago
Main PID: 3319 (prometheus)
CGroup: /system.slice/prometheus.service
└─3319 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --...
7月 24 07:00:22 prometheus prometheus[3319]: ts=2024-07-23T23:00:22.376Z caller=compact.go:576 leve...alse
7月 24 07:00:22 prometheus prometheus[3319]: ts=2024-07-23T23:00:22.379Z caller=head.go:1355 level=...96ms
7月 24 07:00:22 prometheus prometheus[3319]: ts=2024-07-23T23:00:22.422Z caller=compact.go:514 leve...36ms
7月 24 07:00:22 prometheus prometheus[3319]: ts=2024-07-23T23:00:22.423Z caller=db.go:1712 level=in...SNHC
7月 24 07:00:22 prometheus prometheus[3319]: ts=2024-07-23T23:00:22.425Z caller=db.go:1712 level=in...R0BP
7月 24 07:00:22 prometheus prometheus[3319]: ts=2024-07-23T23:00:22.426Z caller=db.go:1712 level=in...H1Y0
7月 24 09:00:22 prometheus prometheus[3319]: ts=2024-07-24T01:00:22.383Z caller=compact.go:576 leve...alse
7月 24 09:00:22 prometheus prometheus[3319]: ts=2024-07-24T01:00:22.385Z caller=head.go:1355 level=...73ms
7月 24 09:00:22 prometheus prometheus[3319]: ts=2024-07-24T01:00:22.386Z caller=checkpoint.go:101 l...0000
7月 24 09:00:22 prometheus prometheus[3319]: ts=2024-07-24T01:00:22.403Z caller=head.go:1317 level=...51ms
Hint: Some lines were ellipsized, use -l to show in full.
Step 6: Open http://IP:9090 in a browser. The target should be shown in the UP state. If the page cannot be reached, the firewall may not have been turned off.
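If turning the firewall off entirely is not desirable, opening the single port is an alternative. The sketch below is not part of the original steps: it assumes firewalld is the firewall in use and that curl is installed, and it only prints the firewall-cmd commands instead of running them. The function names is_up and firewall_hint are hypothetical. The /-/healthy path is the Prometheus health endpoint; node_exporter on port 9100 would need /metrics instead.

```shell
#!/bin/sh
# Sketch: check whether the Prometheus health endpoint answers, and print the
# firewalld commands that would open its port. Nothing is changed on the system.
HOST="${HOST:-127.0.0.1}"   # assumption: run on the monitoring host itself
PORT="${PORT:-9090}"

# Return 0 if the health endpoint responds, non-zero otherwise.
is_up() {
    command -v curl >/dev/null 2>&1 || return 2
    curl -fsS --max-time 3 "http://$1:$2/-/healthy" >/dev/null 2>&1
}

# Print (do not run) the firewalld commands for one TCP port.
firewall_hint() {
    echo "firewall-cmd --permanent --add-port=$1/tcp"
    echo "firewall-cmd --reload"
}

if is_up "$HOST" "$PORT"; then
    echo "http://$HOST:$PORT is reachable"
else
    echo "http://$HOST:$PORT is not reachable; if firewalld is running, try:"
    firewall_hint "$PORT"
fi
```

Run it on the monitoring host first; if the endpoint answers locally but not from the browser, the firewall is the likely cause.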
Step 7: Install node_exporter. The procedure is much the same as before, so the steps are only summarized here.
[root@prometheus local]# ls    # after uploading the tarball
bin games lib libexec prometheus share
etc include lib64 node_exporter-1.2.2.linux-amd64.tar.gz sbin src
[root@prometheus local]# tar zxvf node_exporter-1.2.2.linux-amd64.tar.gz    # extract the tarball
[root@prometheus local]# mv node_exporter-1.2.2.linux-amd64 node_exporter    # rename
[root@prometheus local]# chown -R root:root ./node_exporter/    # set owner and group
[root@prometheus local]# cd node_exporter/
[root@prometheus node_exporter]# ls
LICENSE node_exporter NOTICE
Create node.service under /usr/lib/systemd/system/ so that node_exporter can run in the background, and add the following content to it.
[root@prometheus node_exporter]# cat /usr/lib/systemd/system/node.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network-online.target
[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
Reload the systemd unit files and start node.service.
[root@prometheus node_exporter]# systemctl daemon-reload
[root@prometheus node_exporter]# systemctl start node.service
[root@prometheus node_exporter]# systemctl status node.service
● node.service - node_exporter
Loaded: loaded (/usr/lib/systemd/system/node.service; disabled; vendor preset: disabled)
Active: active (running) since 三 2024-07-24 17:57:47 CST; 6s ago
Main PID: 2660 (node_exporter)
Tasks: 5
CGroup: /system.slice/node.service
└─2660 /usr/local/node_exporter/node_exporter
7月 24 17:57:47 prometheus node_exporter[2660]: level=info ts=2024-07-24T09:57:47.435Z ca...ne
7月 24 17:57:47 prometheus node_exporter[2660]: level=info ts=2024-07-24T09:57:47.435Z ca...me
7月 24 17:57:47 prometheus node_exporter[2660]: level=info ts=2024-07-24T09:57:47.435Z ca...ex
7月 24 17:57:47 prometheus node_exporter[2660]: level=info ts=2024-07-24T09:57:47.435Z ca...es
7月 24 17:57:47 prometheus node_exporter[2660]: level=info ts=2024-07-24T09:57:47.435Z ca...me
7月 24 17:57:47 prometheus node_exporter[2660]: level=info ts=2024-07-24T09:57:47.435Z ca...at
7月 24 17:57:47 prometheus node_exporter[2660]: level=info ts=2024-07-24T09:57:47.435Z ca...fs
7月 24 17:57:47 prometheus node_exporter[2660]: level=info ts=2024-07-24T09:57:47.435Z ca...fs
7月 24 17:57:47 prometheus node_exporter[2660]: level=info ts=2024-07-24T09:57:47.435Z ca...00
7月 24 17:57:47 prometheus node_exporter[2660]: level=info ts=2024-07-24T09:57:47.436Z ca...se
Hint: Some lines were ellipsized, use -l to show in full.
Open http://IP:9100 in a browser.
Step 8: Add the node_exporter target. Edit prometheus.yml, then save and exit.
Restart Prometheus.
[root@prometheus prometheus]# vim prometheus.yml
[root@prometheus prometheus]# ./promtool check config prometheus.yml    # check the yml file for errors
Checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file syntax
[root@prometheus prometheus]# systemctl restart prometheus.service
Open http://IP:9090 again; the node_exporter target now appears among the monitored targets.
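The edit to prometheus.yml in this step is not shown in the text, so here is a minimal sketch of what the added job could look like. The job name node_exporter, the function name render_node_job, and the reuse of one IP for both targets are assumptions; port 9100 is the one node_exporter is reached on above.

```shell
#!/bin/sh
# Illustrative only: print a scrape_configs section with a second job
# that scrapes node_exporter on port 9100.
render_node_job() {
    # $1: IP address of the host running Prometheus and node_exporter
    cat <<EOF
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["$1:9090"]

  - job_name: "node_exporter"
    static_configs:
      - targets: ["$1:9100"]
EOF
}

# 192.168.243.36 is the address used in rules.yml later; substitute your own.
render_node_job "192.168.243.36"
```

Only the second job is new; after adding it, the promtool check shown above confirms that the file still parses.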
Step 9: Install Alertmanager and configure QQ Mail alerting (uploading the package is not covered in detail).
[root@prometheus local]# ll
总用量 30144
-rw-r--r--. 1 root root 30866868 7月 23 17:44 alertmanager-0.27.0.linux-amd64.tar.gz
Upload the package.
[root@prometheus local]# tar zxvf alertmanager-0.27.0.linux-amd64.tar.gz    # extract
[root@prometheus local]# mv alertmanager-0.27.0.linux-amd64/ alertmanager    # rename
[root@prometheus local]# chown -R root:root ./alertmanager/    # set owner and group
Create alertmanager.service under /usr/lib/systemd/system/ so that Alertmanager can run in the background, and add the following content to it.
[Unit]
Description=Alertmanager
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
Reload the systemd unit files and start alertmanager.service.
[root@prometheus alertmanager]# systemctl daemon-reload
[root@prometheus alertmanager]# systemctl start alertmanager.service
[root@prometheus alertmanager]# systemctl status alertmanager.service
● alertmanager.service - Alertmanager
Loaded: loaded (/usr/lib/systemd/system/alertmanager.service; disabled; vendor preset: disabled)
Active: active (running) since 三 2024-07-24 18:25:02 CST; 6min ago
Main PID: 3157 (alertmanager)
CGroup: /system.slice/alertmanager.service
└─3157 /usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager...
7月 24 18:25:02 prometheus alertmanager[3157]: ts=2024-07-24T10:25:02.663Z caller=main.go...)"
7月 24 18:25:02 prometheus alertmanager[3157]: ts=2024-07-24T10:25:02.663Z caller=main.go...)"
7月 24 18:25:02 prometheus alertmanager[3157]: ts=2024-07-24T10:25:02.670Z caller=cluster...94
7月 24 18:25:02 prometheus alertmanager[3157]: ts=2024-07-24T10:25:02.672Z caller=cluster...2s
7月 24 18:25:02 prometheus alertmanager[3157]: ts=2024-07-24T10:25:02.718Z caller=coordin...ml
7月 24 18:25:02 prometheus alertmanager[3157]: ts=2024-07-24T10:25:02.719Z caller=coordin...ml
7月 24 18:25:02 prometheus alertmanager[3157]: ts=2024-07-24T10:25:02.722Z caller=tls_con...93
7月 24 18:25:02 prometheus alertmanager[3157]: ts=2024-07-24T10:25:02.722Z caller=tls_con...93
7月 24 18:25:04 prometheus alertmanager[3157]: ts=2024-07-24T10:25:04.674Z caller=cluster...4s
7月 24 18:25:12 prometheus alertmanager[3157]: ts=2024-07-24T10:25:12.676Z caller=cluster...6s
Hint: Some lines were ellipsized, use -l to show in full.
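Open http://IP:9093 in a browser.
Step 10: Configure alerting
1. Edit alertmanager.yml and add a notification method (QQ Mail in this case).
The modified content is shown below. If any field is unclear, you can paste it into the Doubao AI assistant, which will explain what each field means.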
[root@prometheus alertmanager]# cat alertmanager.yml
global:
  resolve_timeout: 5s
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '2678903673@qq.com'
  smtp_auth_username: '2678903673@qq.com'
  smtp_auth_password: 'skobfgldtttrdjfh'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 3s
  group_interval: 10s
  repeat_interval: 10s
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: '2678903673@qq.com'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname','dev','instance']
Check that alertmanager.yml is valid.
[root@prometheus alertmanager]# ./amtool check-config alertmanager.yml
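2. Add the host and target to alert on (here, the node_exporter target on the local machine).
[root@prometheus local]# cd prometheus/    # switch to the prometheus directory and edit prometheus.yml
In prometheus.yml, two entries need editing (marked 1 and 2 in the original screenshot):
1 is the Alertmanager address
2 is the path to the alert rule file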
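As the screenshot is not reproduced here, the two entries can be sketched roughly as follows. The Alertmanager address 192.168.243.36:9093 is an assumption built from the IP used in rules.yml and the port used to open Alertmanager; the absolute rule file path matches the one promtool reports in its output, though a relative path may have been used in the original. The function name render_alerting is hypothetical.

```shell
#!/bin/sh
# Illustrative only: print the two prometheus.yml sections that connect
# Prometheus to Alertmanager and load an alert rule file.
render_alerting() {
    # $1: Alertmanager address as host:port
    # $2: path to the rule file
    cat <<EOF
# 1: where Prometheus sends alerts
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["$1"]

# 2: which rule files to load
rule_files:
  - "$2"
EOF
}

render_alerting "192.168.243.36:9093" "/usr/local/prometheus/rules.yml"
```

Both sections sit at the top level of prometheus.yml, alongside scrape_configs.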
[root@prometheus prometheus]# vim rules.yml    # create the rules.yml file
[root@prometheus prometheus]# cat rules.yml    # file content
groups:
  - name: node_down_rules
    rules:
      - alert: NodeDown
        expr: up{instance="192.168.243.36:9100"} == 0
        for: 5s
        labels:
          node: 192.168.243.36
          severity: critical
        annotations:
          summary: "Node is down"
          description: "The node has been down for more than 5 seconds."
Check that the yml files are valid; no errors are reported.
[root@prometheus prometheus]# ./promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 1 rule files found
SUCCESS: prometheus.yml is valid prometheus config file syntax
Checking /usr/local/prometheus/rules.yml
SUCCESS: 1 rules found
Restart all services.
[root@prometheus prometheus]# systemctl restart prometheus.service
[root@prometheus prometheus]# systemctl restart node.service
[root@prometheus prometheus]# systemctl restart alertmanager.service
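Open the web page again; the alert rule that was added is now listed.
Step 11: Test
Stop the node_exporter service, watch the status on each web page, and check whether QQ Mail receives the alert notification.
The notification arrived in QQ Mail, so alerting is configured correctly.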
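The test above can also be followed from the command line. The sketch below only prints the commands, so nothing is stopped by accident. It assumes the IP 192.168.243.36 from rules.yml, the ports used earlier in this guide, and the standard alert-listing endpoints of Prometheus (/api/v1/alerts) and Alertmanager (/api/v2/alerts); the helper function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of the test: print the commands to run and the API URLs to inspect.
# Nothing is stopped or started by this script itself.
IP="${IP:-192.168.243.36}"   # assumption: address of the monitored host

prom_alerts_url() { echo "http://$1:9090/api/v1/alerts"; }
am_alerts_url()   { echo "http://$1:9093/api/v2/alerts"; }

echo "systemctl stop node.service    # take the target down"
echo "curl -s $(prom_alerts_url "$IP")    # NodeDown should become pending, then firing"
echo "curl -s $(am_alerts_url "$IP")    # the alert should appear in Alertmanager"
echo "systemctl start node.service    # bring the target back"
```

With send_resolved set to true in alertmanager.yml, a second mail should arrive once the target is back up.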
Step 12: Install the Grafana graphical interface
1. Upload the rpm and install it.
[root@prometheus prometheus]# yum install -y grafana-enterprise-10.1.0-1.x86_64.rpm
2. Start the Grafana service.
[root@prometheus prometheus]# systemctl start grafana-server
[root@prometheus prometheus]# systemctl status grafana-server
● grafana-server.service - Grafana instance
Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; disabled; vendor preset: disabled)
Active: active (running) since 三 2024-07-24 19:17:13 CST; 7s ago
Docs: http://docs.grafana.org
Main PID: 4123 (grafana)
Tasks: 9
CGroup: /system.slice/grafana-server.service
└─4123 /usr/share/grafana/bin/grafana server --config=/etc/grafana/grafana.ini --...
7月 24 19:17:13 prometheus grafana[4123]: logger=ngalert.state.manager t=2024-07-24T19:1...up"
7月 24 19:17:13 prometheus grafana[4123]: logger=ngalert.state.manager t=2024-07-24T19:1…114µs
7月 24 19:17:13 prometheus grafana[4123]: logger=ngalert.scheduler t=2024-07-24T19:17:13...10s
7月 24 19:17:13 prometheus grafana[4123]: logger=ticker t=2024-07-24T19:17:13.502414485+...:00
7月 24 19:17:13 prometheus grafana[4123]: logger=report t=2024-07-24T19:17:13.503575548+...e."
7月 24 19:17:13 prometheus systemd[1]: Started Grafana instance.
7月 24 19:17:13 prometheus grafana[4123]: logger=caching.service t=2024-07-24T19:17:13.5...ed"
7月 24 19:17:13 prometheus grafana[4123]: logger=ngalert.multiorg.alertmanager t=2024-07...er"
7月 24 19:17:14 prometheus grafana[4123]: logger=grafana.update.checker t=2024-07-24T19:...6ms
7月 24 19:17:14 prometheus grafana[4123]: logger=plugins.update.checker t=2024-07-24T19:...2ms
Hint: Some lines were ellipsized, use -l to show in full.
[root@prometheus prometheus]#
Open http://IP:3000 in a browser; the username and password are both admin.
Login page
Add the data source
Import dashboards
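The data source above is added through the web UI. As an alternative that this guide does not use, Grafana can also load data sources from a provisioning file placed under /etc/grafana/provisioning/datasources/. The sketch below prints such a file; the data source name Prometheus, the URL, and the function name render_datasource are assumptions for illustration.

```shell
#!/bin/sh
# Illustrative only: print a Grafana data source provisioning file that
# points at a Prometheus server.
render_datasource() {
    # $1: URL of the Prometheus server
    cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $1
    isDefault: true
EOF
}

# Assumption: Prometheus runs on the same host as in the earlier steps.
render_datasource "http://192.168.243.36:9090"
```

Saving the output as a .yaml file in that directory and restarting grafana-server should make the data source appear without any clicks in the UI.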
That's all.