Deploying Prometheus and Monitoring Services from Binaries

Latest recommended article published on 2024-08-08 08:00:27

运维那些事~

Category: Operations. Tags: docker, operations, jenkins

Article link: https://blog.csdn.net/ljx1528/article/details/120101398

I. Deploy Prometheus
1. Download the binary

https://github.com/prometheus/prometheus/releases/download/v2.28.0/prometheus-2.28.0.linux-amd64.tar.gz

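2. Extract the archive; the binaries are ready to use once extracted. The unit file in step 3 expects them under /opt/monitor/prometheus, so move or rename the extracted prometheus-2.28.0.linux-amd64 directory accordingly.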

tar xf prometheus-2.28.0.linux-amd64.tar.gz
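The release tarball unpacks into a versioned directory, while the systemd unit in step 3 points at /opt/monitor/prometheus. The following is a minimal sketch of extracting and renaming in one go. The helper name install_release and its arguments are hypothetical and not part of Prometheus; it assumes a tarball with a single top-level directory.

```shell
# install_release TARBALL DEST_ROOT NAME  (hypothetical helper)
# Extracts TARBALL under DEST_ROOT, then renames the versioned top-level
# directory (e.g. prometheus-2.28.0.linux-amd64) to NAME.
install_release() {
    local tarball="$1"
    local dest_root="$2"
    local name="$3"
    local top_dir

    # The first path component of the first archive entry is the versioned directory.
    top_dir="$(tar tf "$tarball" | head -n 1 | cut -d/ -f1)"
    [ -n "$top_dir" ] || return 1

    # Refuse to overwrite an existing installation.
    if [ -e "$dest_root/$name" ]; then
        echo "refusing to overwrite $dest_root/$name" >&2
        return 1
    fi

    mkdir -p "$dest_root" || return 1
    tar xf "$tarball" -C "$dest_root" || return 1
    if [ "$top_dir" != "$name" ]; then
        mv "$dest_root/$top_dir" "$dest_root/$name" || return 1
    fi
}

# Example call, matching the paths used in this article (not executed here):
# install_release prometheus-2.28.0.linux-amd64.tar.gz /opt/monitor prometheus
```

The same call shape would apply to the other tarballs in this article, with the target name changed to match each unit file.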

3. Manage it with systemd

[root@prometheus ~]# cat /usr/lib/systemd/system/prometheus.service 
[Unit]
Description=prometheus
[Service]
ExecStart=/opt/monitor/prometheus/prometheus --config.file=/opt/monitor/prometheus/prometheus.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target

4. Reload systemd and start the service

systemctl daemon-reload
systemctl start prometheus.service
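systemctl start returns as soon as the process is launched, which can be before Prometheus is ready to serve requests. A hedged sketch of waiting for readiness follows: it polls the /-/ready endpoint that Prometheus exposes on its web port. The helper name wait_ready and the retry counts are illustrative, and it assumes curl is installed and Prometheus listens on the default port 9090.

```shell
# wait_ready URL [TRIES]  (hypothetical helper)
# Polls URL once per second until it answers with an HTTP success status,
# giving up after TRIES attempts (default 10).
wait_ready() {
    local url="$1"
    local tries="${2:-10}"
    local i=1
    while [ "$i" -le "$tries" ]; do
        # -f makes curl fail on HTTP errors; -sS stays quiet but still prints errors.
        if curl -fsS -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        # Pause only if another attempt remains.
        if [ "$i" -le "$tries" ]; then
            sleep 1
        fi
    done
    return 1
}

# Example (not executed here): also enable the service at boot, then wait.
# systemctl enable --now prometheus.service
# wait_ready http://localhost:9090/-/ready 30 && echo "Prometheus is ready"
```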

5. Edit the Prometheus configuration file as follows

[root@prometheus prometheus]# cat prometheus.yml 
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093    # uncomment this line to enable Alertmanager alerting

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"      # alerting and recording rule files that Prometheus loads
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node server'
    static_configs:
     - targets: ['192.168.33.145:9100','192.168.33.142:9100']    # node_exporter endpoints: host-level metrics (memory, CPU, load, etc.)

  - job_name: 'docker'
    static_configs:
     - targets: ['192.168.33.145:8080']       # cAdvisor endpoint: Docker container metrics

6. Hot-reload the Prometheus configuration

[root@prometheus prometheus]# ps -ef|grep prometheus
root       1081      1  0 13:25 ?        00:00:10 /opt/monitor/prometheus/prometheus --config.file=/opt/monitor/prometheus/prometheus.yml
root       3123   2619  0 14:10 pts/0    00:00:00 grep --color=auto prometheus
[root@prometheus prometheus]# kill -HUP 1081
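Because the unit file defines ExecReload=/bin/kill -HUP $MAINPID, systemctl reload prometheus.service sends the same signal without looking up the PID by hand. The sketch below additionally validates the file first with promtool check config. It assumes promtool sits next to the prometheus binary (it ships in the same release tarball); the helper name safe_reload and the two variables are hypothetical.

```shell
# Paths are assumptions based on the layout used in this article.
PROMTOOL="${PROMTOOL:-/opt/monitor/prometheus/promtool}"
PROM_CONFIG="${PROM_CONFIG:-/opt/monitor/prometheus/prometheus.yml}"

# safe_reload  (hypothetical helper)
# Reloads Prometheus only if the configuration file passes validation.
safe_reload() {
    "$PROMTOOL" check config "$PROM_CONFIG" || {
        echo "config check failed; not reloading" >&2
        return 1
    }
    # Triggers the unit's ExecReload, i.e. SIGHUP to the main process.
    systemctl reload prometheus.service
}

# Example (not executed here):
# safe_reload
```

Validating first avoids sending a reload signal for a file that Prometheus would reject.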

7. Prometheus ships with a built-in web UI:
Open the Prometheus host address on port 9090 (192.168.33.139:9090); port 9100 belongs to node_exporter.
[Screenshot: Prometheus web UI]
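The state of the scrape targets can also be checked from the command line through the HTTP API endpoint /api/v1/targets, which reports a health field for every active target. The sketch below counts targets in a given state with grep. It assumes the compact JSON that the API normally returns; a JSON-aware tool such as jq would be more robust. The helper name count_health is hypothetical.

```shell
# count_health STATE  (hypothetical helper)
# Reads a /api/v1/targets JSON response on stdin and prints how many
# targets report the given health state ("up", "down" or "unknown").
count_health() {
    grep -o "\"health\":\"$1\"" | wc -l | tr -d ' '
}

# Example (not executed here):
# curl -s http://192.168.33.139:9090/api/v1/targets | count_health up
```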

II. Deploy node_exporter
1. Download the binary

https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz

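2. Extract the archive (the unit file in step 3 expects the extracted directory to be renamed to /opt/monitor/node_exporter)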

tar xf node_exporter-1.2.2.linux-amd64.tar.gz -C /opt/monitor

3. Manage it with systemd

[root@prometheus ~]# cat /usr/lib/systemd/system/node_exporter.service 
[Unit]
Description=node_exporter
[Service]
ExecStart=/opt/monitor/node_exporter/node_exporter  --collector.systemd --collector.systemd.unit-include=(docker|sshd|nginx).service
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target

4. Reload systemd and start the service

systemctl daemon-reload
systemctl start node_exporter.service
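With the systemd collector enabled, node_exporter exposes node_systemd_unit_state samples for the units matched by --collector.systemd.unit-include. A small sketch for checking them from the shell follows: it filters the metrics text for units whose active state equals 1. The helper name active_units is hypothetical, and the host address in the example is the one used in this article.

```shell
# active_units  (hypothetical helper)
# Reads node_exporter metrics text on stdin and prints the names of the
# systemd units whose state="active" sample has the value 1.
active_units() {
    grep '^node_systemd_unit_state{' \
        | grep 'state="active"' \
        | grep ' 1$' \
        | sed -E 's/.*name="([^"]+)".*/\1/'
}

# Example (not executed here):
# curl -s http://192.168.33.145:9100/metrics | active_units
```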

III. Deploy Grafana
1. Download the binary

wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.0.3.linux-amd64.tar.gz

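2. Extract the archive (the unit file in step 3 expects the extracted directory to be renamed to /opt/monitor/grafana)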

tar -zxvf grafana-enterprise-8.0.3.linux-amd64.tar.gz -C  /opt/monitor

3. Manage it with systemd

[root@prometheus ~]# cat /usr/lib/systemd/system/grafana.service 
[Unit]
Description=grafana
[Service]
ExecStart=/opt/monitor/grafana/bin/grafana-server -homepath=/opt/monitor/grafana
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target

4. Reload systemd and start the service

systemctl daemon-reload
systemctl start grafana.service
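Before any dashboard can show data, Grafana needs a Prometheus data source. It can be added in the web UI, or through the HTTP API endpoint /api/datasources. The sketch below only builds the JSON payload; the helper name datasource_payload is hypothetical, the example assumes the default admin:admin credentials are still in place, and the addresses are the ones used in this article.

```shell
# datasource_payload NAME PROMETHEUS_URL  (hypothetical helper)
# Prints the JSON body for creating a Prometheus data source in Grafana.
datasource_payload() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1" "$2"
}

# Example (not executed here), assuming default admin:admin credentials:
# datasource_payload Prometheus http://192.168.33.139:9090 |
#   curl -fsS -u admin:admin -H 'Content-Type: application/json' \
#        -X POST --data @- http://192.168.33.145:3000/api/datasources
```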

5. Grafana dashboard templates

https://grafana.com/grafana/dashboards

Commonly used dashboards (by ID)
193  Docker monitoring dashboard
9276  node monitoring dashboard
7362  MySQL monitoring dashboard

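6. Grafana UI (192.168.33.145:3000)
6.1. Monitoring node hosts
[Screenshot: Grafana dashboard for node hosts]

6.2. Monitoring a Kubernetes cluster
[Screenshot: Grafana dashboard for a Kubernetes cluster]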

IV. Deploy Alertmanager
1. Download the Alertmanager binary package

wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz

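2. Extract the archive (the unit file in step 3 expects the extracted directory to be renamed to /opt/monitor/alertmanager)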

tar xf alertmanager-0.23.0.linux-amd64.tar.gz -C /opt/monitor/

3. Manage it with systemd

[root@prometheus alertmanager]# cat /usr/lib/systemd/system/alertmanager.service 
[Unit]
Description=alertmanager
[Service]
ExecStart=/opt/monitor/alertmanager/alertmanager --config.file=/opt/monitor/alertmanager/alertmanager.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
[Install]
WantedBy=multi-user.target

4. Reload systemd and start the service

systemctl daemon-reload
systemctl start alertmanager.service

5. Edit the Alertmanager configuration (DingTalk alerting)

[root@prometheus alertmanager]# cat alertmanager.yml
global:
  resolve_timeout: 5m

templates:
  - '/opt/monitor/alertmanager/template/*.tmpl'

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 2m
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://localhost:8060/dingtalk/webhook1/send'
    send_resolved: true
inhibit_rules:
  - source_match:
      alertname: 'ApplicationDown'
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname',"target","job","instance"]
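One way to check this routing without waiting for a real incident is to post a hand-made alert to the Alertmanager v2 API at /api/v2/alerts. The sketch below builds a minimal payload. The helper name test_alert_payload and the alert name are hypothetical; the example assumes Alertmanager listens on its default port 9093 and that a DingTalk webhook bridge is already listening on localhost:8060, as the configuration above implies.

```shell
# test_alert_payload NAME SEVERITY  (hypothetical helper)
# Prints a one-element JSON array describing a manual test alert.
test_alert_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"manual test alert"}}]\n' "$1" "$2"
}

# Example (not executed here):
# test_alert_payload TestAlert warning |
#   curl -fsS -H 'Content-Type: application/json' \
#        -X POST --data @- http://localhost:9093/api/v2/alerts
```

With group_wait set to 30s, the notification for such an alert should reach the web.hook receiver roughly half a minute after it is posted.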

6. Edit the Alertmanager configuration (email alerting)

[root@prometheus alertmanager]# cat alertmanager.yml.bak20210830 
global:
  resolve_timeout: 5m
  # SMTP server
  smtp_smarthost: 'smtp.126.com:25'
  smtp_from: 'liujixiao6@126.com'
  smtp_auth_username: 'liujixiao6@126.com'
  smtp_auth_password: 'BBELDJWBPLMLIMUR' 
  smtp_require_tls: false

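# Routing tree
route:
  group_by: ['alertname'] # group alerts by the alertname label
  group_wait: 10s # wait before the first notification for a new group; alerts arriving within 10s are merged into one notification
  group_interval: 10s # wait before notifying about new alerts added to an existing group
  repeat_interval: 1h # wait before re-sending a notification for alerts that are still firing
  receiver: 'mail'

# Receivers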
receivers:
- name: 'mail'
  email_configs:
  - to: '1665111913@qq.com'

7. Restart Alertmanager
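A minimal sketch of the restart follows, with a validation step in front so that a broken file does not take the service down. It assumes amtool sits next to the alertmanager binary (it ships in the same release tarball) and uses its check-config subcommand; the helper name restart_alertmanager and the two variables are hypothetical.

```shell
# Paths are assumptions based on the layout used in this article.
AMTOOL="${AMTOOL:-/opt/monitor/alertmanager/amtool}"
AM_CONFIG="${AM_CONFIG:-/opt/monitor/alertmanager/alertmanager.yml}"

# restart_alertmanager  (hypothetical helper)
# Restarts Alertmanager only if the configuration file passes validation.
restart_alertmanager() {
    "$AMTOOL" check-config "$AM_CONFIG" || {
        echo "config check failed; not restarting" >&2
        return 1
    }
    systemctl restart alertmanager.service
}

# Example (not executed here):
# restart_alertmanager
```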