Using Prometheus

Official site: https://prometheus.io/

Download and Installation

Download page: https://prometheus.io/download/

  • Download prometheus, alertmanager, node_exporter, and mysqld_exporter

Prometheus Server

Extract the prometheus tarball and start it:

tar -zxvf prometheus-2.36.0.linux-amd64.tar.gz
cd prometheus-2.36.0.linux-amd64
./prometheus

Open 192.168.10.129:9090 in a browser (192.168.10.129 is the VM's address).
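
As a quick sanity check from the shell, Prometheus also exposes built-in health endpoints (adjust the host to your environment):

curl -s http://192.168.10.129:9090/-/healthy
curl -s http://192.168.10.129:9090/-/ready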

Visit http://192.168.10.129:9090/metrics to see the raw metric data.

Exporter

Exporters produce the monitoring data; Prometheus Server pulls (scrapes) it from each exporter.

Full list of exporters: https://prometheus.io/docs/instrumenting/exporters/

Install node_exporter, which mainly exposes host- and operating-system-level metrics:

tar -zxvf node_exporter-1.3.1.linux-amd64.tar.gz
cd node_exporter-1.3.1.linux-amd64
./node_exporter

Visit http://192.168.10.129:9100/
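
You can also pull the raw metrics directly from the shell, which is exactly what Prometheus Server does on every scrape (the grep filter is only there to shorten the output):

curl -s http://192.168.10.129:9100/metrics | grep ^node_cpu | head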

node_cpu_xxx: CPU metrics

node_disk_xxx: disk metrics

node_filesystem_xxx: filesystem metrics

node_memory_xxx: memory metrics

and so on.

node_exporter GitHub repository: https://github.com/prometheus/node_exporter
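
Once Prometheus is scraping node_exporter (configured below), these metrics can be combined in PromQL; for example, an approximate per-host CPU utilisation (illustrative query, run in the Prometheus web UI):

100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100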

To disable a collector, use --no-collector.<name>; to enable one, use --collector.<name>:

# disable the cpu collector
./node_exporter --no-collector.cpu

Edit the Prometheus Server's prometheus.yml file:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node_exporter"
    static_configs:
      # static service discovery: tells the server where to find the exporter
      - targets: ["localhost:9100"]

Restart the prometheus server; the node_exporter metrics now show up.

AlertManager Alerting

Alerting rules documentation: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

Configure an alerting rule in alert.yaml:

groups:
# name of the rule group
- name: test-group
  rules:
  # name of the alerting rule
  - alert: TestRule
    # PromQL expression; the alert is triggered when the expression holds
    expr: node_disk_read_bytes_total{device="sda", instance="localhost:9100", job="node_exporter"} > 20
    # how long the condition must keep holding before the alert actually fires
    for: 10s
    # custom labels
    labels:
      node_disk_read_bytes_total: node_disk_read_bytes_total
    # annotations carry additional information, e.g. a detailed description of the alert
    annotations:
      # short summary of the alert
      summary: "disk metric abnormal, custom label: {{ $labels.node_disk_read_bytes_total }}, instance: {{ $labels.instance }}"
      # detailed description of the alert
      description: "disk metric on {{ $labels.instance }} is abnormal, value is {{ $value }}"
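
The rule file can be validated with promtool before it is loaded (assuming alert.yaml sits next to the prometheus binary):

./promtool check rules alert.yaml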

Add the rule file to prometheus.yml:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert.yaml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node_exporter"
    static_configs:
      # static service discovery: tells the server where to find the exporter
      - targets: ["localhost:9100"]

Restart the Prometheus server.
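
Instead of a full restart, Prometheus can also reload its configuration in place: send the process a SIGHUP, or POST to the /-/reload endpoint if the server was started with --web.enable-lifecycle.

# option 1: signal the running process
kill -HUP $(pidof prometheus)
# option 2: HTTP reload (requires --web.enable-lifecycle at startup)
curl -X POST http://localhost:9090/-/reload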

AlertManager

When Prometheus Server finds that an alerting rule's condition is met, it pushes the alert to AlertManager.

AlertManager configuration docs: https://prometheus.io/docs/alerting/latest/configuration/

Extract the alertmanager tarball:

tar -zxvf alertmanager-0.24.0.linux-amd64.tar.gz 
cd alertmanager-0.24.0.linux-amd64

Edit alertmanager.yml:

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
  - name: 'email'
    email_configs:
    - to: xxx@qq.com
      from: yyy@qq.com
      smarthost: smtp.qq.com:465
      auth_username: yyy@qq.com 
      # authorization code (QQ mail app password)
      auth_password: uinynfsegdlibage
      # with port 465 this must be false, otherwise startup fails
      require_tls: false
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
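
The configuration can be checked with amtool, which is included in the alertmanager tarball:

./amtool check-config alertmanager.yml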

Start AlertManager:

./alertmanager 

Edit the prometheus.yml file:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

Restart the prometheus server.

Visit the AlertManager UI (port 9093).

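To verify the AlertManager pipeline independently of Prometheus, an alert can also be pushed by hand to its v2 API (the label values below are arbitrary test data):

curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{"labels":{"alertname":"ManualTest","severity":"warning"},"annotations":{"summary":"manually injected test alert"}}]'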

PromQL

PromQL is Prometheus's built-in query language; it can perform all kinds of computation and aggregation over the data stored in Prometheus.

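For example, an instant vector selector returns the latest sample of each matching series (illustrative query against the node_exporter metrics above):

node_memory_MemAvailable_bytes{job="node_exporter"}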

Range vectors

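A range vector selector adds a [duration] and returns all samples within that window; it is typically fed into functions such as rate() (illustrative):

# bytes received per second, averaged over the last 5 minutes
rate(node_network_receive_bytes_total[5m])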

Grafana

A visualization tool for displaying monitoring metrics.

Official site: https://grafana.com/

Download page: https://grafana.com/grafana/download

wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.3.linux-amd64.tar.gz
tar -zxvf grafana-enterprise-8.5.3.linux-amd64.tar.gz

cd grafana-8.5.3/bin
./grafana-server web

Add a data source: open Grafana at http://192.168.10.129:3000 (default login admin/admin) and add Prometheus as a data source pointing to http://192.168.10.129:9090.

Official dashboards: https://grafana.com/dashboards

Cluster

192.168.10.129: Prometheus Server, node_exporter

192.168.10.130: node_exporter

192.168.10.131: node_exporter

1. Building on the setup above, add host-monitoring nodes (node_exporter cluster setup: every server to be monitored needs its own node_exporter).

Install and start node_exporter on the 192.168.10.130 and 192.168.10.131 VMs.

2. Configure the Prometheus Server's prometheus.yml and restart:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert.yaml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node_exporter"
    static_configs:
      # static service discovery: tells the server where to find the exporters
      - targets: ["192.168.10.129:9100", "192.168.10.130:9100", "192.168.10.131:9100"]

Grafana dashboard ID 8919 (import via Dashboards → Import by entering the ID):

Grafana dashboard ID 1860 (Node Exporter Full):

HTTP API

HTTP API documentation:

https://prometheus.io/docs/prometheus/latest/querying/api/

Instant queries

Using the query API we can evaluate a PromQL expression at a single point in time.

GET /api/v1/query

URL query parameters:

query=<string>: the PromQL expression.

time=<rfc3339 | unix_timestamp>: the evaluation timestamp. Optional; if omitted, the current server time is used.

timeout=<duration>: evaluation timeout. Optional; defaults to the value of the --query.timeout flag.

Query CPU info:

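For example, an instant query from the shell (the expression and host are illustrative):

curl -sG 'http://192.168.10.129:9090/api/v1/query' \
  --data-urlencode 'query=node_cpu_seconds_total{mode="idle"}'
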
Range queries

GET /api/v1/query_range

URL query parameters:

query=<string>: the PromQL expression.

start=<rfc3339 | unix_timestamp>: start timestamp.

end=<rfc3339 | unix_timestamp>: end timestamp.

step=<duration | float>: query resolution step; the expression is evaluated once every step seconds within the range.

timeout=<duration>: evaluation timeout. Optional; defaults to the value of the --query.timeout flag.
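
A range query works the same way but also needs start, end, and step (the timestamps below are placeholder Unix-second values):

curl -sG 'http://192.168.10.129:9090/api/v1/query_range' \
  --data-urlencode 'query=rate(node_cpu_seconds_total{mode="idle"}[5m])' \
  --data-urlencode 'start=1654000000' \
  --data-urlencode 'end=1654003600' \
  --data-urlencode 'step=60'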
