Prometheus: Cluster Monitoring

谁能凭爱意将富士山私有丶

Published 2024-10-10 11:08:36

Tags: prometheus

Article link: https://blog.csdn.net/qq_18138507/article/details/142816378

I. Introduction

Prometheus Server

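The core component of Prometheus. It collects and stores time series data and provides the PromQL query language. It also ships a built-in Expression Browser UI, where data can be queried with PromQL and visualized directly.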

Exporters

Exporters expose monitoring data to Prometheus Server over HTTP. Prometheus Server scrapes the endpoint that an exporter provides to obtain the metrics it needs to collect.
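To make the pull model more concrete, the sketch below mimics the text an exporter endpoint returns and picks one sample out of it. The metric name `demo_requests_total` and its values are made up for illustration; with a real exporter the text would come from `curl http://<host>:<port>/metrics` rather than a here-document.

```shell
# Sample payload in the Prometheus text exposition format.
# demo_requests_total and its values are hypothetical.
metrics=$(cat << "EOF"
# HELP demo_requests_total Total number of handled requests.
# TYPE demo_requests_total counter
demo_requests_total{method="get"} 1027
demo_requests_total{method="post"} 3
EOF
)

# Skip the comment lines, then print the value of the series labelled method="get".
get_value=$(printf '%s\n' "$metrics" | awk '!/^#/ && /method="get"/ {print $2}')
echo "demo_requests_total{method=\"get\"} = $get_value"
```

Each non-comment line is one sample: a metric name, optional labels in braces, and a value. Prometheus Server parses exactly this kind of text on every scrape.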

PushGateway

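Pushgateway receives metrics pushed by clients, and Prometheus Server then scrapes them from it at the configured interval. Because Prometheus collects data with a pull model, the network must allow Prometheus Server to reach each exporter directly. When that requirement cannot be met, Pushgateway can act as a relay: jobs inside the internal network push their metrics to the gateway, and Prometheus Server pulls them from Pushgateway in the same way it scrapes any other target.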
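As a rough sketch of the push side, the snippet below builds a push URL and a one-line payload for Pushgateway. The host, the job name `backup_job`, and the metric name are assumptions for illustration only; this article does not install Pushgateway. The request is sent only when `DO_PUSH=1` is set, so the snippet is safe to run as-is.

```shell
# Hypothetical values: adjust host, job and metric to your environment.
PUSHGATEWAY_ADDR="localhost:9091"   # 9091 is the Pushgateway default port
JOB_NAME="backup_job"

# Pushgateway accepts pushes on /metrics/job/<job_name>.
push_url="http://${PUSHGATEWAY_ADDR}/metrics/job/${JOB_NAME}"

# One sample in the text exposition format (the metric name is made up).
payload="backup_last_success_timestamp_seconds $(date +%s)"

# Only push when explicitly requested.
if [ "${DO_PUSH:-0}" = "1" ]; then
  printf '%s\n' "$payload" | curl --silent --show-error --data-binary @- "$push_url"
else
  echo "would push '$payload' to $push_url"
fi
```

Prometheus Server would then scrape the gateway like any other target, for example with a static target of `localhost:9091`.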

Alertmanager

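Alertmanager manages alerts and is responsible for delivering notifications. Prometheus Server supports alerting rules written in PromQL; when a rule's condition is met, an alert is generated, and its subsequent handling is managed by Alertmanager. Alertmanager integrates with built-in notification channels such as email and Slack, and custom handling can be added through webhooks. It is the alert processing center of the Prometheus ecosystem.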
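For reference, a minimal Alertmanager configuration that forwards every alert to a webhook could look like the sketch below. The receiver name `default-webhook` and the URL `http://127.0.0.1:5001/` are placeholders, not values from this setup. The file is written to a scratch directory so that nothing under the real install path is overwritten.

```shell
# Write the sketch to a scratch directory; copy it into place only after review.
workdir=$(mktemp -d)

cat > "$workdir/alertmanager.yml" << "EOF"
route:
  receiver: default-webhook      # hypothetical receiver name
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
receivers:
  - name: default-webhook
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'   # placeholder webhook endpoint
EOF

# amtool ships in the Alertmanager tarball; validate only if it is on PATH.
if command -v amtool >/dev/null 2>&1; then
  amtool check-config "$workdir/alertmanager.yml"
fi
echo "wrote $workdir/alertmanager.yml"
```

The `route` block decides which receiver gets an alert and how alerts are grouped; the `receivers` block defines where the notification is actually sent.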

II. Installation

Official download page: Download | Prometheus

1. Installing Prometheus

## Download
wget https://github.com/prometheus/prometheus/releases/download/v2.54.1/prometheus-2.54.1.linux-amd64.tar.gz
## Extract
tar xvf prometheus-2.54.1.linux-amd64.tar.gz
## Move into place (create the target directory first if it does not exist)
mkdir -p /portal/monitor
mv prometheus-2.54.1.linux-amd64 prometheus
mv prometheus /portal/monitor/
## Add the prometheus user
useradd -M -s /usr/sbin/nologin prometheus
## Check the user
id prometheus
## Set ownership
chown prometheus:prometheus -R /portal/monitor/prometheus

## Configure systemd

cat > /etc/systemd/system/prometheus.service << "EOF"
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=on-failure
ExecStart=/portal/monitor/prometheus/prometheus \
  --config.file=/portal/monitor/prometheus/prometheus.yml \
  --storage.tsdb.path=/portal/monitor/prometheus/data \
  --storage.tsdb.retention.time=60d \
  --web.enable-lifecycle

[Install]
WantedBy=multi-user.target
EOF

## Reload systemd and start
systemctl daemon-reload
systemctl start prometheus

Access: http://192.168.1.41:9090/

2. Installing Alertmanager

## Download
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
## Extract
tar zxvf alertmanager-0.27.0.linux-amd64.tar.gz
## Move into place
mv alertmanager-0.27.0.linux-amd64 alertmanager
mv alertmanager /portal/monitor/
## Set ownership
chown -R prometheus:prometheus /portal/monitor/


## Configure systemd

cat > /etc/systemd/system/alertmanager.service << "EOF"
[Unit]
Description=Alert Manager
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
Restart=always
ExecStart=/portal/monitor/alertmanager/alertmanager \
  --config.file=/portal/monitor/alertmanager/alertmanager.yml \
  --storage.path=/portal/monitor/alertmanager/data \
  --web.listen-address=:9190

[Install]
WantedBy=multi-user.target
EOF

## Reload systemd and start
systemctl daemon-reload
systemctl start alertmanager

Update the Prometheus configuration (/portal/monitor/prometheus/prometheus.yml)

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Must match the --web.listen-address port of Alertmanager (9190 above)
            - localhost:9190

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

Add the alert rule file

cat > /portal/monitor/prometheus/alert.yml << "EOF"
groups:
- name: Prometheus alert
  rules:
  # Alert on any instance that has been unreachable for more than 30s
  - alert: ServiceDown
    expr: up == 0
    for: 30s
    labels:
      severity: critical
    annotations:
      instance: "{{ $labels.instance }}"
      description: "{{ $labels.job }} service is down"

EOF

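Validate the configuration file (run from /portal/monitor/prometheus)

./promtool check config prometheus.yml

Hot reload was enabled earlier with the --web.enable-lifecycle flag, so the new configuration can be applied with curl

curl -X POST http://localhost:9090/-/reload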
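The two steps above can be combined so that a broken file never reaches the running server. The function below is only a sketch: `reload_prometheus`, `PROM_HOME`, `PROM_URL`, and `PROMTOOL` are names introduced here, and the paths assume the install layout used in this article.

```shell
# Paths follow the install layout used above; override them if yours differs.
PROM_HOME="/portal/monitor/prometheus"
PROM_URL="http://localhost:9090"
PROMTOOL="${PROMTOOL:-$PROM_HOME/promtool}"

# Validate prometheus.yml (promtool also checks the rule files it references),
# then ask the running server to reload. Returns non-zero without reloading
# if validation fails.
reload_prometheus() {
  "$PROMTOOL" check config "$PROM_HOME/prometheus.yml" || return 1
  curl --fail --silent --show-error -X POST "$PROM_URL/-/reload"
}

# Usage:
#   reload_prometheus && echo "configuration reloaded"
```

Because `curl` is called with `--fail`, an HTTP error from the reload endpoint (for example when the lifecycle API is not enabled) also makes the function return non-zero.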

Access Alertmanager: http://192.168.1.41:9190/

3. Installing Grafana

## Download
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-11.2.0.linux-amd64.tar.gz
## Extract
tar zxvf grafana-enterprise-11.2.0.linux-amd64.tar.gz
## Move into place
mv grafana-v11.2.0 grafana
mv grafana /portal/monitor/
## Set ownership
chown prometheus:prometheus -R /portal/monitor/

## systemd unit file

cat > /etc/systemd/system/grafana-server.service << "EOF"
[Unit]
Description=Grafana server
Documentation=http://docs.grafana.org

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/portal/monitor/grafana/bin/grafana-server \
  --config=/portal/monitor/grafana/conf/defaults.ini \
  --homepath=/portal/monitor/grafana
  
[Install]
WantedBy=multi-user.target
EOF

## Reload systemd
systemctl daemon-reload
## Start
systemctl start grafana-server

Access

http://192.168.1.41:3000/
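Once Grafana is up, Prometheus still has to be registered as a data source. This can be done in the web UI, or through a provisioning file such as the sketch below. The file name `prometheus.yaml` is arbitrary, and the target directory assumes Grafana's default `conf/provisioning` path under the home path used above. The file is written to a scratch directory first.

```shell
# Write the data source definition to a scratch directory.
workdir=$(mktemp -d)

cat > "$workdir/prometheus.yaml" << "EOF"
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090   # the Prometheus Server installed earlier
    isDefault: true
EOF

echo "copy $workdir/prometheus.yaml to /portal/monitor/grafana/conf/provisioning/datasources/"
echo "then run: systemctl restart grafana-server"
```

Grafana reads provisioning files at startup, so the data source should appear after the restart without any manual steps in the UI.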

4. Installing node_exporter

## Download
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
## Extract
tar zxvf node_exporter-1.8.2.linux-amd64.tar.gz
## Move into place
mv node_exporter-1.8.2.linux-amd64 node_exporter
mv node_exporter /portal/monitor/
## Set ownership
chown -R prometheus:prometheus /portal/monitor

## systemd unit file
cat > /etc/systemd/system/node_exporter.service << "EOF"
[Unit]
Description=node_exporter
Documentation=http://prometheus.io/
After=network.target
[Service]
User=prometheus
Group=prometheus
ExecStart=/portal/monitor/node_exporter/node_exporter
Restart=on-failure
  
[Install]
WantedBy=multi-user.target
EOF

## Reload systemd
systemctl daemon-reload
## Start
systemctl start node_exporter
systemctl status node_exporter

## Enable at boot
systemctl enable node_exporter
systemctl enable grafana-server
systemctl enable alertmanager
systemctl enable prometheus
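The prometheus.yml shown earlier only scrapes Prometheus itself, so node_exporter metrics will not show up until a scrape job is added for it. The fragment below is a sketch of such a job. The job name `node` is an arbitrary choice, and `localhost:9100` assumes node_exporter runs on the same host with its default port, which the unit file above does not change.

```shell
# Write the scrape job fragment to a scratch directory for review.
workdir=$(mktemp -d)

cat > "$workdir/node_job.yml" << "EOF"
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]   # node_exporter default listen port
EOF

echo "append the contents of $workdir/node_job.yml under scrape_configs in"
echo "/portal/monitor/prometheus/prometheus.yml, then validate and reload:"
echo "  ./promtool check config prometheus.yml"
echo "  curl -X POST http://localhost:9090/-/reload"

# To confirm the exporter itself answers, something like this can be used:
#   curl -s http://localhost:9100/metrics | head
```

After the reload, the new target should be listed on the Targets page of the Prometheus web UI, and the `up == 0` rule from alert.yml will cover it as well.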