Using the Prometheus Monitoring Platform and the Grafana Visualization Platform

penngo

Last modified 2022-09-25 17:00:28

Category: Continuous Delivery. Tags: prometheus, java, monitoring

First published 2022-09-17 23:58:07

Article link: https://blog.csdn.net/penngo/article/details/126912702

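Prometheus is an open-source monitoring and alerting system for containers and microservices, combined with a time-series database. It provides rich metrics and is a high-performance, highly customizable, cloud-native monitoring system.
Prometheus collects data mainly by pulling (scraping) it. The components that gather metrics and expose them for Prometheus to pull are called Exporters.

Check which ports are already in use to avoid port conflicts: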

netstat -ntlp
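
The same check can be scripted. The following is a minimal sketch rather than part of the original setup: port_in_use is a hypothetical helper name, and it assumes netstat prints the local address in its fourth column, as the net-tools version on Linux does.

```shell
# port_in_use PORT
# Succeeds (exit status 0) when some TCP socket is listening on PORT.
# Hypothetical helper; assumes the net-tools netstat column layout,
# where column 4 is the local address, e.g. 0.0.0.0:9090 or :::9100.
port_in_use() {
  netstat -ntl | awk -v port="$1" '
    NF >= 4 { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
    END { exit !found }
  '
}

# Check the ports used in this article before installing anything.
for p in 9090 9100 3000; do
  if port_in_use "$p"; then
    echo "port $p is already in use"
  else
    echo "port $p is free"
  fi
done
```

Ports 9090, 9100 and 3000 are the defaults for Prometheus, node_exporter and Grafana used later in this article.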

Open the port in the firewall (9100 is the port node_exporter listens on later in this article):

firewall-cmd --zone=public --add-port=9100/tcp --permanent
firewall-cmd --reload

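1. Prometheus Installation

1.1. Download

Download page: https://prometheus.io/download/
GitHub: https://github.com/prometheus/prometheus
This article installs from the binary release; download prometheus-2.37.0.linux-amd64.tar.gz

1.2. Install and run

Installation is simple: extract the archive into the deployment directory and run the binary directly.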

tar -zxvf prometheus-2.37.0.linux-amd64.tar.gz -C /usr/local
ln -sv /usr/local/prometheus-2.37.0.linux-amd64 /usr/local/prometheus
cd /usr/local/prometheus
./prometheus  # run

Then open http://192.168.28.131:9090/ in a browser (192.168.28.131 is the Prometheus host used in this article).
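
When the start-up is scripted, it can help to wait until the server reports that it is ready before going on. The sketch below polls the /-/ready endpoint that Prometheus exposes; wait_for_prometheus is a hypothetical helper name, and the one-second interval and 30 attempts are arbitrary choices.

```shell
# wait_for_prometheus BASE_URL [TRIES]
# Polls the readiness endpoint once per second until it answers with a
# success status or TRIES attempts (default 30) have been used up.
# Hypothetical helper name; /-/ready is the Prometheus readiness endpoint.
wait_for_prometheus() {
  base_url="$1"
  tries="${2:-30}"
  attempt=0
  while [ "$attempt" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors such as 503 (not ready yet)
    if curl -fsS -o /dev/null "$base_url/-/ready" 2>/dev/null; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  return 1
}

# Usage (not executed here):
#   ./prometheus &
#   wait_for_prometheus http://192.168.28.131:9090 && echo "Prometheus is ready"
```

Once basic auth is enabled as described in section 1.4, the curl call would also need credentials (-u user:password).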

1.3. Available startup flags

The available startup flags can be listed with the following command:

./prometheus -h

usage: prometheus [<flags>]

The Prometheus monitoring server

Flags:
  -h, --help                     Show context-sensitive help (also try
                                 --help-long and --help-man).
      --version                  Show application version.
      --config.file="prometheus.yml"  
                                 Prometheus configuration file path.
      --web.listen-address="0.0.0.0:9090"  
                                 Address to listen on for UI, API, and
                                 telemetry.
      --web.config.file=""       [EXPERIMENTAL] Path to configuration file that
                                 can enable TLS or authentication.
      --web.read-timeout=5m      Maximum duration before timing out read of the
                                 request, and closing idle connections.
      --web.max-connections=512  Maximum number of simultaneous connections.
      --web.external-url=<URL>   The URL under which Prometheus is externally
                                 reachable (for example, if Prometheus is served
                                 via a reverse proxy). Used for generating
                                 relative and absolute links back to Prometheus
                                 itself. If the URL has a path portion, it will
                                 be used to prefix all HTTP endpoints served by
                                 Prometheus. If omitted, relevant URL components
                                 will be derived automatically.
      --web.route-prefix=<path>  Prefix for the internal routes of web
                                 endpoints. Defaults to path of
                                 --web.external-url.
      --web.user-assets=<path>   Path to static asset directory, available at
                                 /user.
      --web.enable-lifecycle     Enable shutdown and reload via HTTP request.
      --web.enable-admin-api     Enable API endpoints for admin control actions.
      --web.enable-remote-write-receiver  
                                 Enable API endpoint accepting remote write
                                 requests.
      --web.console.templates="consoles"  
                                 Path to the console template directory,
                                 available at /consoles.
      --web.console.libraries="console_libraries"  
                                 Path to the console library directory.
      --web.page-title="Prometheus Time Series Collection and Processing Server"  
                                 Document title of Prometheus instance.
      --web.cors.origin=".*"     Regex for CORS origin. It is fully anchored.
                                 Example: 'https?://(domain1|domain2)\.com'
      --storage.tsdb.path="data/"  
                                 Base path for metrics storage. Use with server
                                 mode only.
      --storage.tsdb.retention=STORAGE.TSDB.RETENTION  
                                 [DEPRECATED] How long to retain samples in
                                 storage. This flag has been deprecated, use
                                 "storage.tsdb.retention.time" instead. Use with
                                 server mode only.
      --storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME  
                                 How long to retain samples in storage. When
                                 this flag is set it overrides
                                 "storage.tsdb.retention". If neither this flag
                                 nor "storage.tsdb.retention" nor
                                 "storage.tsdb.retention.size" is set, the
                                 retention time defaults to 15d. Units
                                 Supported: y, w, d, h, m, s, ms. Use with
                                 server mode only.
      --storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE  
                                 Maximum number of bytes that can be stored for
                                 blocks. A unit is required, supported units: B,
                                 KB, MB, GB, TB, PB, EB. Ex: "512MB". Based on
                                 powers-of-2, so 1KB is 1024B. Use with server
                                 mode only.
      --storage.tsdb.no-lockfile  
                                 Do not create lockfile in data directory. Use
                                 with server mode only.
      --storage.tsdb.allow-overlapping-blocks  
                                 Allow overlapping blocks, which in turn enables
                                 vertical compaction and vertical query merge.
                                 Use with server mode only.
      --storage.tsdb.head-chunks-write-queue-size=0  
                                 Size of the queue through which head chunks are
                                 written to the disk to be m-mapped, 0 disables
                                 the queue completely. Experimental. Use with
                                 server mode only.
      --storage.agent.path="data-agent/"  
                                 Base path for metrics storage. Use with agent
                                 mode only.
      --storage.agent.wal-compression  
                                 Compress the agent WAL. Use with agent mode
                                 only.
      --storage.agent.retention.min-time=STORAGE.AGENT.RETENTION.MIN-TIME  
                                 Minimum age samples may be before being
                                 considered for deletion when the WAL is
                                 truncated Use with agent mode only.
      --storage.agent.retention.max-time=STORAGE.AGENT.RETENTION.MAX-TIME  
                                 Maximum age samples may be before being
                                 forcibly deleted when the WAL is truncated Use
                                 with agent mode only.
      --storage.agent.no-lockfile  
                                 Do not create lockfile in data directory. Use
                                 with agent mode only.
      --storage.remote.flush-deadline=<duration>  
                                 How long to wait flushing sample on shutdown or
                                 config reload.
      --storage.remote.read-sample-limit=5e7  
                                 Maximum overall number of samples to return via
                                 the remote read interface, in a single query. 0
                                 means no limit. This limit is ignored for
                                 streamed response types. Use with server mode
                                 only.
      --storage.remote.read-concurrent-limit=10  
                                 Maximum number of concurrent remote read calls.
                                 0 means no limit. Use with server mode only.
      --storage.remote.read-max-bytes-in-frame=1048576  
                                 Maximum number of bytes in a single frame for
                                 streaming remote read response types before
                                 marshalling. Note that client might have limit
                                 on frame size as well. 1MB as recommended by
                                 protobuf by default. Use with server mode only.
      --rules.alert.for-outage-tolerance=1h  
                                 Max time to tolerate prometheus outage for
                                 restoring "for" state of alert. Use with server
                                 mode only.
      --rules.alert.for-grace-period=10m  
                                 Minimum duration between alert and restored
                                 "for" state. This is maintained only for alerts
                                 with configured "for" time greater than grace
                                 period. Use with server mode only.
      --rules.alert.resend-delay=1m  
                                 Minimum amount of time to wait before resending
                                 an alert to Alertmanager. Use with server mode
                                 only.
      --alertmanager.notification-queue-capacity=10000  
                                 The capacity of the queue for pending
                                 Alertmanager notifications. Use with server
                                 mode only.
      --query.lookback-delta=5m  The maximum lookback duration for retrieving
                                 metrics during expression evaluations and
                                 federation. Use with server mode only.
      --query.timeout=2m         Maximum time a query may take before being
                                 aborted. Use with server mode only.
      --query.max-concurrency=20  
                                 Maximum number of queries executed
                                 concurrently. Use with server mode only.
      --query.max-samples=50000000  
                                 Maximum number of samples a single query can
                                 load into memory. Note that queries will fail
                                 if they try to load more samples than this into
                                 memory, so this also limits the number of
                                 samples a query can return. Use with server
                                 mode only.
      --enable-feature= ...      Comma separated feature names to enable. Valid
                                 options: agent, exemplar-storage,
                                 expand-external-labels,
                                 memory-snapshot-on-shutdown,
                                 promql-at-modifier, promql-negative-offset,
                                 promql-per-step-stats, remote-write-receiver
                                 (DEPRECATED), extra-scrape-metrics,
                                 new-service-discovery-manager, auto-gomaxprocs.
                                 See
                                 https://prometheus.io/docs/prometheus/latest/feature_flags/
                                 for more details.
      --log.level=info           Only log messages with the given severity or
                                 above. One of: [debug, info, warn, error]
      --log.format=logfmt        Output format of log messages. One of: [logfmt,
                                 json]

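1.4. Set a login username and password

By default, the Prometheus Web UI and the related Exporter components can be accessed directly by anyone.
Generate a username and password hash:

# Install httpd-tools
[root]# yum install -y httpd-tools
# Use htpasswd (from httpd-tools) to generate a bcrypt hash
[root]# htpasswd -nbBC 12 penngo 123456

penngo:$2y$12$HBw06HgxQlm3z6I85OPH.eNqeUCbqP.w7xFnb0ch60RcK9p3ZFLea # bcrypt hash of the password 123456, used in config.yml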

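Create the user configuration file:

vi /usr/local/prometheus-2.37.0.linux-amd64/config.yml

# Contents of config.yml
basic_auth_users:
# Multiple users can be configured.
# bcrypt salts are random, so every htpasswd run prints a different hash for the same password.
  penngo: $2y$12$PoUH4HDg3hxWqqcrfWDUB.f52O/oW0J6wRP5/Epwf5k2qd0XNhFVe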

Add the flag --web.config.file=/usr/local/prometheus/config.yml to the startup flags so that a login is required to access the Prometheus Web UI.
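
The file can also be written from a script instead of an editor. This is only a sketch: write_web_config is a hypothetical helper name, and the hash passed to it is whatever htpasswd printed after the "user:" prefix.

```shell
# write_web_config USER HASH FILE
# Writes a Prometheus web configuration file with one basic-auth user.
# Hypothetical helper; HASH is the bcrypt string printed by htpasswd
# (the part after "USER:").
write_web_config() {
  (
    umask 077   # a newly created file is readable by its owner only
    printf 'basic_auth_users:\n  %s: "%s"\n' "$1" "$2" > "$3"
  )
}

# Usage (not executed here). Single quotes keep the shell from
# expanding the "$" characters inside the hash:
#   write_web_config penngo \
#     '$2y$12$HBw06HgxQlm3z6I85OPH.eNqeUCbqP.w7xFnb0ch60RcK9p3ZFLea' \
#     /usr/local/prometheus/config.yml
```

After Prometheus is restarted with the flag, a request without credentials should be rejected with 401, while curl -u penngo:123456 http://127.0.0.1:9090/-/ready should succeed. The promtool binary shipped in the same tarball also has a check web-config subcommand that can validate the file first.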

1.5. Set up as a system service

vi /usr/lib/systemd/system/prometheus.service

[Unit]
Description=Prometheus server daemon
After=network.target

[Service]
Type=simple
User=root
Group=root
# With --web.enable-lifecycle the configuration can be reloaded with
#   curl -X POST http://127.0.0.1:9090/-/reload
# (add -u user:password once basic auth is enabled)
ExecStart=/usr/local/prometheus/prometheus \
    --config.file=/usr/local/prometheus/prometheus.yml \
    --web.config.file=/usr/local/prometheus/config.yml \
    --web.enable-lifecycle \
    --storage.tsdb.path=/usr/local/prometheus/data \
    --storage.tsdb.retention.time=15d \
    --web.console.templates=/usr/local/prometheus/consoles \
    --web.console.libraries=/usr/local/prometheus/console_libraries \
    --web.max-connections=512 \
    --web.external-url=http://192.168.28.131:9090 \
    --web.listen-address=0.0.0.0:9090
Restart=on-failure
[Install]
WantedBy=multi-user.target

Prometheus service commands

systemctl daemon-reload       # tell systemd to reload its unit files
systemctl enable prometheus   # start at boot
systemctl disable prometheus  # do not start at boot
systemctl start prometheus    # start the service
systemctl restart prometheus  # restart the service
systemctl stop prometheus     # stop the service
systemctl status prometheus   # show the status

2. Grafana Installation

2.1. Download and install

Download page: https://grafana.com/grafana/download
Installation on Linux
Ubuntu and Debian(64 Bit)

sudo apt-get install -y adduser libfontconfig1
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_8.5.11_amd64.deb
sudo dpkg -i grafana-enterprise_8.5.11_amd64.deb
Read the Ubuntu / Debian installation guide for more information. We also provide an APT package repository.

Standalone Linux Binaries(64 Bit)

wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.11.linux-amd64.tar.gz
tar -zxvf grafana-enterprise-8.5.11.linux-amd64.tar.gz

Red Hat, CentOS, RHEL, and Fedora(64 Bit)

wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.11-1.x86_64.rpm
sudo yum install grafana-enterprise-8.5.11-1.x86_64.rpm

OpenSUSE and SUSE

wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.11-1.x86_64.rpm
sudo rpm -i --nodeps grafana-enterprise-8.5.11-1.x86_64.rpm


2.2. Managing the Grafana service with systemd

systemctl daemon-reload           # tell systemd to reload its unit files
systemctl enable grafana-server   # start at boot
systemctl disable grafana-server  # do not start at boot
systemctl start grafana-server    # start the service
systemctl stop grafana-server     # stop the service
systemctl status grafana-server   # show the status

ps -ef | grep grafana  # check that the process is running

2.3. File locations

Binary: /usr/sbin/grafana-server
Init script: /etc/init.d/grafana-server
Default environment file: /etc/sysconfig/grafana-server
Default configuration file: /etc/grafana/grafana.ini
systemd service name: grafana-server.service
Default log file: /var/log/grafana/grafana.log
Default sqlite3 database file: /var/lib/grafana/grafana.db

2.4. Default URL

Open http://127.0.0.1:3000 and log in with the default username/password admin/admin.

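3. Common Prometheus Exporters

Prometheus offers two ways to integrate:
Client libraries: https://prometheus.io/docs/instrumenting/clientlibs/
Client libraries for different languages make it easy to connect all kinds of applications to Prometheus monitoring.

Exporters: https://prometheus.io/docs/instrumenting/exporters/
Ready-made monitoring components covering databases, hardware, message queues, storage, HTTP services, API services, logs, and more.

Both kinds of integration are available from the official project as well as from the community.

3.1. Host monitoring with node_exporter

3.1.1. Linux host monitoring

3.1.1.1. Download and install

Download page: https://prometheus.io/download/
GitHub: https://github.com/prometheus/node_exporter
This article uses node_exporter-1.3.1.linux-amd64.tar.gz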

# Extract to the target directory
tar -zxvf node_exporter-1.3.1.linux-amd64.tar.gz -C /usr/local

# Start
/usr/local/node_exporter-1.3.1.linux-amd64/node_exporter --web.listen-address=":9100"

3.1.1.2. Set up as a system service

Create the system service

vi /usr/lib/systemd/system/node_exporter.service

# Contents of node_exporter.service

[Unit]
Description=node_exporter
Wants=network-online.target
After=network-online.target

[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/node_exporter-1.3.1.linux-amd64/node_exporter

[Install]
WantedBy=multi-user.target

node_exporter service commands

systemctl daemon-reload          # tell systemd to reload its unit files
systemctl enable node_exporter   # start at boot
systemctl disable node_exporter  # do not start at boot
systemctl start node_exporter    # start the service
systemctl stop node_exporter     # stop the service
systemctl status node_exporter   # show the status

View the metrics locally

curl http://127.0.0.1:9100/metrics
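
The endpoint returns plain text with one sample per line, so a single value can be picked out with standard tools. A minimal sketch, assuming an unlabeled metric such as node_load1; metric_value is a hypothetical helper name.

```shell
# metric_value NAME
# Reads Prometheus text-format metrics on stdin and prints the value of
# the first sample whose name is exactly NAME. Hypothetical helper; it
# only handles samples without labels, such as node_load1.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage (not executed here):
#   curl -s http://127.0.0.1:9100/metrics | metric_value node_load1
```

Lines starting with "# HELP" or "# TYPE" are skipped automatically because their first field is "#", not the metric name.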

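3.1.1.3. Integrate with Prometheus

Edit prometheus.yml

vi /usr/local/prometheus-2.37.0.linux-amd64/prometheus.yml

prometheus.yml configuration

# Global configuration
global:
  scrape_interval: 15s # Scrape every 15 seconds. The default is 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is 1 minute.
  # scrape_timeout is left at the global default (10s).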

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    basic_auth:
      username: penngo
      password: 123456
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  # Add the following job to scrape node_exporter
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.28.136:9100']
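
After editing the file, it is worth validating it before asking the server to pick it up. The sketch below chains the two steps; reload_prometheus is a hypothetical helper name, promtool is the checker shipped in the Prometheus tarball, and the reload call assumes the server runs with --web.enable-lifecycle as in section 1.5.

```shell
# reload_prometheus CONFIG BASE_URL [USER:PASSWORD]
# Validates CONFIG with promtool and, only if it is valid, asks the
# running server to reload it. Hypothetical helper; requires the server
# to be started with --web.enable-lifecycle.
reload_prometheus() {
  config="$1"
  base_url="$2"
  auth="${3:-}"
  promtool check config "$config" || return 1
  if [ -n "$auth" ]; then
    curl -fsS -X POST -u "$auth" "$base_url/-/reload"
  else
    curl -fsS -X POST "$base_url/-/reload"
  fi
}

# Usage (not executed here):
#   export PATH="/usr/local/prometheus:$PATH"   # promtool lives next to prometheus
#   reload_prometheus /usr/local/prometheus/prometheus.yml \
#       http://127.0.0.1:9090 penngo:123456
```

Checking first means a typo in prometheus.yml is reported by promtool instead of surfacing as a failed reload on the running server.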

3.1.2. Windows host monitoring

3.1.2.1. Download

GitHub: https://github.com/prometheus-community/windows_exporter
Not covered in this article; if you need it, see the documentation on the windows_exporter project page.

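3.1.3. Viewing in Prometheus

Once Prometheus has loaded the new configuration, the node_exporter job is listed on the Status > Targets page of the Prometheus Web UI.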

3.1.4. Visualizing the monitoring data in Grafana

Use this host monitoring dashboard template: https://grafana.com/grafana/dashboards/16098-1-node-exporter-for-prometheus-dashboard-cn-0417-job/
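
The dashboard needs a Prometheus data source in Grafana, a step the article does not show. One possible way is a provisioning file, sketched below. write_grafana_datasource is a hypothetical helper name, and the target directory /etc/grafana/provisioning/datasources is the usual location for package installs, which is an assumption about this setup.

```shell
# write_grafana_datasource FILE PROMETHEUS_URL USER PASSWORD
# Writes a Grafana provisioning file that registers Prometheus as a data
# source protected by basic auth. Hypothetical helper name; values that
# contain double quotes would need escaping, which this sketch skips.
write_grafana_datasource() {
  cat > "$1" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $2
    basicAuth: true
    basicAuthUser: $3
    secureJsonData:
      basicAuthPassword: "$4"
    isDefault: true
EOF
}

# Usage (not executed here):
#   write_grafana_datasource /etc/grafana/provisioning/datasources/prometheus.yaml \
#       http://192.168.28.131:9090 penngo 123456
#   systemctl restart grafana-server
```

The data source can equally be added by hand in the Grafana UI; the file form is convenient when the host is rebuilt often.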
