Prometheus Installation and Configuration - CSDN Blog

Original article: https://blog.csdn.net/weixin_49278803/article/details/120541214

Overview

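Prometheus is an open-source systems monitoring and alerting toolkit, and it is commonly used to monitor Kubernetes container platforms. Prometheus can collect data through a wide range of exporters and can also accept data pushed via the Pushgateway. In terms of performance, it can handle clusters of tens of thousands of machines.
Official website: https://prometheus.io
Prometheus has many advantages: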

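Efficient: a single Prometheus server can handle millions of metrics and process hundreds of thousands of data points per second.
Easy to scale: Prometheus can be scaled out into a logical cluster through functional partitioning (sharding) plus federation. Prometheus also offers client SDKs in many languages, which make it quick to bring an application under Prometheus monitoring.
Good visualization: besides the built-in Prometheus UI, there was a standalone dashboard solution built on Ruby on Rails, PromDash (since deprecated in favor of Grafana). Grafana provides full Prometheus support, and you can also build your own monitoring UI on top of the Prometheus API.
...and more.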

I. Prometheus architecture

[Figure: Prometheus architecture diagram]
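As shown in the figure above:
1. Prometheus Server: scrapes and stores time series data. By default, data is kept for only 15 days, so the official recommendation is to store important data by other means.
2. Client Library: instruments application code. When Prometheus scrapes the instance's HTTP endpoint, the client library responds with the current state of all tracked metrics.
3. Exporters: Prometheus supports many exporters. An exporter collects metrics and exposes them to the Prometheus server. It does not send data to the central server itself; it waits for the server to come and scrape it.
4. Alertmanager: after receiving alerts from the Prometheus server, it deduplicates and groups them, routes them to the appropriate receivers, and sends notifications. Common receivers include email, WeChat, DingTalk, Slack, etc.
5. Grafana: a cross-platform, open-source metrics analysis and visualization tool (monitoring dashboards). It queries the collected data, visualizes it, and can send timely notifications.
6. Pushgateway: target hosts can push data to the Pushgateway, and the Prometheus server then pulls all of that data from the Pushgateway.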
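The pull model described above comes down to an HTTP endpoint that returns plain text in the Prometheus exposition format. The sketch below is a minimal illustration that uses only the Python standard library. The metric names, the in-memory `METRICS` dict and port 8000 are hypothetical. A real application would normally use a client library such as `prometheus_client`, which is used in Part V.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metric store: name -> (type, help text, current value)
METRICS = {
    "demo_requests_total": ("counter", "Total demo requests handled.", 0.0),
    "demo_temperature_celsius": ("gauge", "Current demo temperature.", 21.5),
}


def render_exposition(metrics):
    """Render unlabelled samples in the Prometheus text exposition format."""
    lines = []
    for name, (mtype, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {mtype}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes this path; anything else is a 404
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_exposition(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

After you call `serve()`, http://localhost:8000/metrics should show the `# HELP` and `# TYPE` lines followed by one sample per metric. An exporter works the same way, but it fills the values from the system it observes.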

II. Installation (Linux)

1. Components

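File	Version	Download URL	Description
prometheus	2.28.1	https://github.com/prometheus/prometheus/releases/download/v2.28.1/prometheus-2.28.1.linux-amd64.tar.gz	Prometheus server
go	1.15.14	https://golang.google.cn/dl/go1.15.14.linux-amd64.tar.gz	Go toolchain (only needed to build from source; the Prometheus release binaries do not require Go)
grafana	7.3.7	https://dl.grafana.com/enterprise/release/grafana-enterprise_7.3.7_amd64.deb	Dashboard UI on top of the Prometheus API (.deb package for Debian/Ubuntu)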

2. Installation steps
2.1 Install Prometheus

tar -C /usr/local -xvf prometheus-2.28.1.linux-amd64.tar.gz
ln -sv /usr/local/prometheus-2.28.1.linux-amd64/ /usr/local/Prometheus
/usr/local/Prometheus/prometheus --config.file=/usr/local/Prometheus/prometheus.yml &  # & runs it in the background

2.2 Install Go (optional; only needed to build Prometheus or exporters from source)

tar -C /usr/local -xvf go1.15.14.linux-amd64.tar.gz
gedit /etc/profile  # append: export PATH=$PATH:/usr/local/go/bin
source /etc/profile
go version  # check the Go version

2.3 Install the Grafana dashboard

sudo dpkg -i grafana-enterprise_7.3.7_amd64.deb
# enable start at boot, then start the service now
/bin/systemctl daemon-reload
/bin/systemctl enable grafana-server.service
/bin/systemctl start grafana-server.service

3. Check that everything started
3.1 Prometheus (default port 9090)
Prometheus version: /usr/local/Prometheus/prometheus --version
Prometheus main page: http://localhost:9090/
Prometheus metrics page: http://localhost:9090/metrics
Prometheus alerts page: http://localhost:9090/alerts
Prometheus HTTP API: http://localhost:9090/api/v1/query?query=process_cpu_seconds_total  # process_cpu_seconds_total is the metric being queried
For more on metrics, see the link at the end of this article.
Help: /usr/local/Prometheus/prometheus --help
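You can also script this check instead of opening the pages in a browser. Prometheus serves `/-/healthy` and `/-/ready` endpoints, which return HTTP 200 once the server is up and ready to serve traffic. The following is a minimal standard-library sketch that assumes the default address http://localhost:9090:

```python
import urllib.error
import urllib.request


def endpoint(base_url, path):
    """Join a base URL such as http://localhost:9090/ with an absolute path."""
    return base_url.rstrip("/") + path


def is_ready(base_url="http://localhost:9090", timeout=3.0):
    """Return True if GET <base_url>/-/ready answers with HTTP 200, otherwise False."""
    try:
        with urllib.request.urlopen(endpoint(base_url, "/-/ready"), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # connection refused, timeout, or a non-2xx status (HTTPError is a URLError)
        return False
```

`is_ready()` returns True once Prometheus has finished starting. The same pattern works for `/-/healthy`.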

3.2 Grafana (default port 3000; default username/password admin/admin)
Grafana main page: http://localhost:3000/
Restart Grafana: service grafana-server restart
4. Run Prometheus as a systemd service

# 1. Create a systemd unit file for Prometheus
cd /usr/lib/systemd/system
# note: the copied file's name must end in .service
cp sshd.service prometheus.service

# 2. Edit the file
vi prometheus.service
# replace its contents with the following
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
# no trailing &: systemd manages the process itself
ExecStart=/usr/local/Prometheus/prometheus --config.file=/usr/local/Prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target

# 3. Reload systemd so the new unit takes effect
systemctl daemon-reload
# start Prometheus: systemctl start prometheus
# stop Prometheus: systemctl stop prometheus
# start at boot: systemctl enable prometheus

5. Configuration parameters (if needed)
As noted above, /usr/local/Prometheus/prometheus.yml is the configuration file. Some of its parameters are worth knowing:

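Parameter	Meaning
global	Global settings
alerting	Alertmanager settings
rule_files	Alerting rule files
scrape_interval	How often targets are scraped (set under global); default 1m. Units: ms, s, m, h, d, w, y. Recommended: 15s
scrape_timeout	Scrape timeout (set under global); default 10s
scrape_configs	Data sources, called targets, grouped into jobs named by job_name. Targets are configured either statically or via service discovery
evaluation_interval	How often rules are evaluated (set under global); default 1m. Recommended: 15s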
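The duration values in the table (`15s`, `1m`, `10s`) use Prometheus's duration syntax: an integer followed by a unit, optionally chained as in `1h30m`. The sketch below shows how such strings map to seconds. It assumes the unit sizes from the Prometheus documentation (`w` = 7 days, `y` = 365 days). For brevity, it does not enforce the rule that units must appear from largest to smallest.

```python
import re

# Size of each duration unit in seconds
_UNITS = {
    "ms": 0.001,
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "w": 7 * 86400,
    "y": 365 * 86400,
}
# "ms" is tried before "m" so that "500ms" is read as milliseconds, not minutes
_PART = re.compile(r"(\d+)(ms|y|w|d|h|m|s)")


def parse_duration(text):
    """Convert a Prometheus-style duration such as '15s' or '1h30m' to seconds."""
    if not text:
        raise ValueError("empty duration")
    total = 0.0
    pos = 0
    for match in _PART.finditer(text):
        # every character must belong to a number+unit pair, with no gaps
        if match.start() != pos:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total
```

For example, `parse_duration("15s")` gives 15 and `parse_duration("1m")` gives 60. This is why the recommended 15s scrape interval means four scrapes per target per minute, instead of one at the default.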

III. Monitoring clients, using node_exporter as an example

1. Download the package

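File	Version	Download URL	Description
node_exporter	1.2.0	https://github.com/prometheus/node_exporter/releases/download/v1.2.0/node_exporter-1.2.0.linux-arm64.tar.gz	Exporter for the client node (this is the arm64 build; use the linux-amd64 archive on x86_64 machines)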

2. Start the exporter on the client node (default port 9100)

sudo tar -C /usr/local -zxvf node_exporter-1.2.0.linux-arm64.tar.gz
ln -s /usr/local/node_exporter-1.2.0.linux-arm64/ /usr/local/node_exporter
nohup /usr/local/node_exporter/node_exporter &  # nohup keeps it running after you log out; & runs it in the background
# for debugging, run it in the foreground instead: /usr/local/node_exporter/node_exporter

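3. Verify that the exporter on the client node is running
Exporter main page: curl 127.0.0.1:9100 or http://localhost:9100/
Exporter metrics: http://localhost:9100/metrics
4. Add the client node to the Prometheus server
Edit the Prometheus configuration file on the server host, which is /usr/local/Prometheus/prometheus.yml as noted above:
sudo gedit /usr/local/Prometheus/prometheus.yml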
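The /metrics page is plain text, so a short script can pick out individual values. Below is a simplified parser for the text format. It assumes samples of the form `name{labels} value [timestamp]` and does not handle label values that contain `}` or escaped characters. `node_load1` and `node_cpu_seconds_total` are examples of metrics that node_exporter exposes.

```python
import re
import urllib.request

# metric name, optional {label set}, then the sample value (a trailing timestamp is ignored)
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_metrics(text):
    """Map 'name{labels}' -> float for every sample line in exposition-format text."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _SAMPLE.match(line)
        if match:
            name, labels, value = match.group(1), match.group(2) or "", match.group(3)
            samples[name + labels] = float(value)  # float() also accepts NaN, +Inf, -Inf
    return samples


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5.0):
    """Download the raw text from an exporter's /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Run on the client node, `parse_metrics(fetch_metrics())["node_load1"]` would return its 1-minute load average.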

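# add under the existing scrape_configs: section, aligned with the other - job_name entries
- job_name: 'node-01_agent'
  static_configs:
  - targets: ['<client-ip>:9100']
    labels:  # optional; without it, instance defaults to the target address (ip:port)
      instance: node-01_agent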

Check that the configuration file is valid: /usr/local/Prometheus/promtool check config /usr/local/Prometheus/prometheus.yml
5. Reload the Prometheus configuration file. Prometheus then starts monitoring the client node.
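Prometheus re-reads its configuration when the process receives SIGHUP. It also reloads on an HTTP POST to `/-/reload`, but only if it was started with the `--web.enable-lifecycle` flag. The startup commands above do not use that flag, so you would have to add it. Here is a minimal sketch of both options:

```python
import os
import signal
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """POST <base_url>/-/reload; only honoured when Prometheus runs with --web.enable-lifecycle."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", method="POST")


def reload_via_http(base_url="http://localhost:9090", timeout=5.0):
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status


def reload_via_signal(pid):
    """Ask the Prometheus process with this PID to reload its configuration (POSIX only)."""
    os.kill(pid, signal.SIGHUP)
```

If the new file is invalid, Prometheus logs an error and keeps running with the previous configuration. Running `promtool check config` first is therefore still worthwhile.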

IV. Visualizing monitoring data with Grafana and Prometheus

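1. This part builds on the client node set up in Part III. The Prometheus server host itself is configured in the same way as a client node.
2. Add Prometheus as a Grafana data source

On the Grafana main page, open Configuration and click "Data Sources"
Click "Add data source"
Choose Prometheus
On the Settings tab, enter the address of the Prometheus server, http://localhost:9090/ (not the node_exporter address on port 9100), then click "Save & Test"
On the Dashboards tab, import "Prometheus 2.0 Stats"
Open the newly imported "Prometheus 2.0 Stats" dashboard to see the monitoring page. Configuration is complete.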
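You can also create the same data source through Grafana's HTTP API (`POST /api/datasources`), which is convenient for scripted setups. This sketch assumes Grafana on http://localhost:3000 and Prometheus on http://localhost:9090. It uses the default admin/admin credentials. Grafana asks you to change them on first login, so pass the new password if you did. The data source name "Prometheus" is an arbitrary choice.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url="http://localhost:3000",
                             prometheus_url="http://localhost:9090",
                             user="admin", password="admin"):
    """Build a POST /api/datasources request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",     # display name in Grafana (arbitrary)
        "type": "prometheus",     # built-in Prometheus data source plugin
        "url": prometheus_url,    # the Prometheus server, not node_exporter
        "access": "proxy",        # Grafana's backend queries Prometheus for the browser
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
        method="POST",
    )


def create_datasource(**kwargs):
    """Send the request and return Grafana's JSON response."""
    with urllib.request.urlopen(build_datasource_request(**kwargs), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If a data source with the same name already exists, Grafana rejects the request. In that case, either rename it in the payload or reuse the existing one.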

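3. Import a dashboard template for the data source
One official dashboard is recommended; more are available on the official Grafana dashboards site.

From the Dashboards menu, choose Manage
Choose Import
Click "Upload .json File" and select the downloaded dashboard template
Confirm the name (can be customized) and select the Prometheus data source created above
Refresh to see the monitoring panels defined in the template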

V. Custom Prometheus metrics

Prometheus has four metric types:

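Counter: can only increase, and is reset to 0 when the program restarts. It is typically used for values that only go up, such as request counts, task counts, total processing time, and error counts.
Gauge: can go up and down, e.g. current CPU temperature, memory usage, disk usage, or network traffic.
Summary: samples observations (such as request durations) and exposes their running count and sum, plus client-side quantiles where the client library supports them.
Histogram: samples observations and counts them into configurable buckets, exposing cumulative per-bucket counts plus a sum and count. Quantiles can then be computed on the server with histogram_quantile().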
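To make the Histogram description concrete, each bucket carries an upper bound `le` and counts every observation less than or equal to that bound. The counts are therefore cumulative, and a final `+Inf` bucket equals the total count. The pure-Python sketch below only illustrates that counting rule; it is not how `prometheus_client` is implemented, and the bucket bounds are arbitrary examples.

```python
import bisect


def cumulative_buckets(observations, upper_bounds):
    """Return ({le: count}, sum, count) the way a Prometheus histogram would expose them."""
    bounds = sorted(upper_bounds)
    counts = [0] * len(bounds)
    for value in observations:
        # first bucket whose upper bound is >= value; it and every larger bucket include the value
        first = bisect.bisect_left(bounds, value)
        for j in range(first, len(bounds)):
            counts[j] += 1
    buckets = {str(bound): count for bound, count in zip(bounds, counts)}
    buckets["+Inf"] = len(observations)
    return buckets, sum(observations), len(observations)
```

For instance, `cumulative_buckets([0.05, 0.3, 0.6, 2.0], [0.1, 0.5, 1.0])` yields the buckets `{"0.1": 1, "0.5": 2, "1.0": 3, "+Inf": 4}` with a count of 4. With the client library's default buckets, the `h.observe(.6)` call in the demo below would likewise be counted in every bucket from `le="0.75"` upward.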

The following demo creates one metric of each type. A Gauge, for example, can only hold an integer or floating-point value:


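import time

# requires: pip install prometheus_client
from prometheus_client import Counter, Gauge, Summary, Histogram, start_http_server

if __name__ == '__main__':
    # Counter: only goes up (exposed as cc_total)
    c = Counter('cc', 'A counter')
    c.inc()

    # Gauge: can be set to any integer or float
    g = Gauge('gg', 'A gauge')
    g.set(17)

    # Summary with two label names, a and b
    s = Summary('ss', 'A summary', ['a', 'b'])
    s.labels('c', 'd').observe(17)

    # Histogram with the client library's default buckets
    h = Histogram('hh', 'A histogram')
    h.observe(.6)

    # expose the metrics at http://localhost:8001/metrics
    start_http_server(8001)

    # keep the process alive so Prometheus can scrape it
    while True:
        time.sleep(1)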

VI. API

Prometheus provides an HTTP API, and all stable endpoints live under the /api/v1 path. When you need to query data, you can request monitoring data through the query API.

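API	Description	Method
/api/v1/query	Instant query	GET/POST
/api/v1/query_range	Range query	GET/POST
/api/v1/series	Series query	GET/POST
/api/v1/labels	Label names query	GET/POST
/api/v1/label/<label_name>/values	Label values query	GET
/api/v1/write	Remote write data submission	POST; rarely used, and must be explicitly enabled on the Prometheus server. It suits cases such as Flink jobs reporting data: such jobs may be too short-lived to wait for Prometheus to scrape them, so their data has to be written directly. This can be done with the Remote Write protocol or via the Pushgateway.
Pushgateway	Data submission via the Pushgateway	Rarely used.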
VII. Alerting

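Alertmanager and Prometheus are two separate components. The Prometheus server evaluates alerting rules and sends the resulting alerts to Alertmanager. Alertmanager then applies silencing, inhibition, and aggregation and sends notifications through channels such as email, DingTalk, and HipChat.
Alertmanager handles alerts sent by clients such as the Prometheus server. It deduplicates and groups them and routes them to the correct receiver, such as email, Slack, or DingTalk. Alertmanager also supports grouping, silencing, and inhibition of alerts.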

1. Download the package

File	Version	Download URL
alertmanager	0.22.2	https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz

2. Extract and start Alertmanager (default port 9093)

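sudo tar -C /usr/local -zxvf alertmanager-0.22.2.linux-amd64.tar.gz
ln -s /usr/local/alertmanager-0.22.2.linux-amd64/ /usr/local/alertmanager
# the Alertmanager configuration file is /usr/local/alertmanager/alertmanager.yml (via the symlink)
# --config.file is given explicitly because the default is alertmanager.yml in the current directory
nohup /usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml &  # nohup keeps it running after you log out; & runs it in the background
# for debugging, run it in the foreground: /usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml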

3. Verify that Alertmanager started
Run curl 127.0.0.1:9093 or open http://localhost:9093/ to see the built-in UI. It shows no alerts yet, because no alerting rules have been configured to trigger any.
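To see something in the UI before any rules exist, you can post a hand-made test alert to Alertmanager's v2 API. The endpoint is `POST /api/v2/alerts`, and it takes a JSON list of alerts. The following minimal sketch assumes Alertmanager on http://localhost:9093. The alert name `TestAlert` and its labels are hypothetical.

```python
import json
import urllib.request


def build_test_alert_request(base_url="http://localhost:9093",
                             alertname="TestAlert", instance="node-01_agent"):
    """Build a POST /api/v2/alerts request carrying a single hand-made alert."""
    alerts = [{
        "labels": {"alertname": alertname, "instance": instance, "severity": "warning"},
        "annotations": {"summary": "test alert sent by hand"},
    }]
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_alert(**kwargs):
    """Send the alert and return the HTTP status (200 on success)."""
    with urllib.request.urlopen(build_test_alert_request(**kwargs), timeout=5) as resp:
        return resp.status
```

The alert carries no `endsAt`, so Alertmanager treats it as resolved once its `resolve_timeout` has passed (5 minutes by default).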
4. Configure alertmanager.yml

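The goal is to send alert notifications to the appropriate email address.
To send notifications via email, WeChat, DingTalk, Slack, etc., configure the corresponding receivers in alertmanager.yml (see the Alertmanager configuration documentation). Otherwise, no changes are needed.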

5. Configure alerting rules (Prometheus evaluates them and sends firing alerts to Alertmanager)

I like to keep the alerting rules in the Alertmanager directory, /usr/local/alertmanager/rules/
Create node.yml (node.rules also works) with the following content:

groups:
  - name: node-alerts  # custom group name
    rules:
    - alert: node-01_agent  # matches the job_name configured in Prometheus earlier
      expr: up{job="node-01_agent"} == 0  # up is 0 when the last scrape of a target in job node-01_agent failed, i.e. the node is down
      for: 15s  # the alert stays pending for 15s before becoming firing; once firing, it is sent to Alertmanager
      labels:
        severity: warning  # severity level
        team: node
      annotations:
        summary: "{{ $labels.instance }} has been down for more than 15s!"
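The `for: 15s` clause is easiest to understand as a small state machine. On each evaluation (every `evaluation_interval`), an alert whose expression returns a result becomes pending. It turns firing once it has been active for at least the `for` duration. It goes back to inactive as soon as the expression returns nothing. The sketch below replays that rule for a single series. It is a simplified model, not Prometheus's code, and it ignores details such as resolved notifications.

```python
def alert_states(evaluations, for_seconds):
    """Replay (timestamp, expression_returned_a_result) pairs through the pending/firing rule."""
    states = []
    active_since = None
    for ts, active in evaluations:
        if not active:
            active_since = None  # condition cleared: back to inactive
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts  # first evaluation where the condition holds
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states
```

Suppose a node goes down at t=15 and comes back at t=60, with evaluations every 15s. Then `alert_states([(0, False), (15, True), (30, True), (45, True), (60, False)], 15)` gives `["inactive", "pending", "firing", "firing", "inactive"]`. Only the firing evaluations are forwarded to Alertmanager.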

6. Register the rule file in the Prometheus configuration and restart Prometheus
sudo gedit /usr/local/Prometheus/prometheus.yml  # edit the Prometheus configuration file as follows

alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]  # address of the Alertmanager that Prometheus sends alerts to (the Alertmanager UI address)
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files: ["/usr/local/alertmanager/rules/node.yml"]  # rule file location; node.rules or a glob such as /usr/local/alertmanager/rules/*.yml also works

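/usr/local/Prometheus/prometheus --config.file=/usr/local/Prometheus/prometheus.yml  # restart Prometheus: stop the running instance first (or run systemctl restart prometheus if it runs as the systemd service set up earlier)
7. Verify
Alert rule status: http://localhost:9090/alerts
Configured rules: http://localhost:9090/rules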
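The same information is available as JSON from `/api/v1/alerts`. It lists active alerts with their labels, annotations and state (`pending` or `firing`). Here is a small sketch that extracts the firing ones, assuming Prometheus on http://localhost:9090:

```python
import json
import urllib.request


def firing_alerts(body):
    """Return (alertname, instance, summary) for every firing alert in an /api/v1/alerts response."""
    doc = json.loads(body)
    return [
        (a["labels"].get("alertname"), a["labels"].get("instance"),
         a["annotations"].get("summary", ""))
        for a in doc["data"]["alerts"]
        if a["state"] == "firing"
    ]


def fetch_alerts(base_url="http://localhost:9090", timeout=5.0):
    """Download the raw JSON from the Prometheus alerts endpoint."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/v1/alerts", timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Stopping node_exporter on the client and then calling `firing_alerts(fetch_alerts())` should, after the 15s `for` period, return an entry for `node-01_agent`.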

Reference
https://blog.csdn.net/qq_42684940/article/details/116303861