Prometheus Monitoring Metrics

Latest recommended article published on 2024-07-17 21:40:42

惠机智

Article tags: learning, programmer life, prometheus

Article link: https://blog.csdn.net/m0_54424196/article/details/132107194

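Any program that can provide monitoring sample data to Prometheus can be called an Exporter.

How Exporters run

Integrated into the application

With the client libraries that Prometheus provides, monitoring code is easy to add to an application. This approach exposes the program's internal runtime state to Prometheus and suits projects that need many custom metrics. Some open-source projects already ship native support for Prometheus monitoring, such as Kubernetes, whose metrics mainly include: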

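Node metrics: CPU, memory, network, and so on
Pod metrics: status, resource usage, and so on
Component metrics: metrics from components such as etcd (which stores the state of the whole cluster), kubelet (which runs on every node and manages the pod lifecycle), and scheduler (which assigns each pod to a suitable node)
apiserver metrics: HTTP request counts, response latency, and so on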

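Running standalone

In many cases the monitored target cannot expose a monitoring endpoint itself. Possible reasons: 1. the project was released before Prometheus existed and does not support its monitoring interface, for example MySQL and Redis; 2. the target cannot serve HTTP directly, for example the state of a Linux system. In these cases you can run a standalone Exporter. Besides writing your own, you can use one of the many standalone Exporters that the Prometheus community provides; common ones are Node Exporter and MySQL Server Exporter. For more details see the official site: https://prometheus.io/docs/instrumenting/exporters/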
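The idea of a standalone Exporter can be sketched with the Go standard library alone: read a source that does not speak HTTP, translate it into metric text, and serve it. This is a minimal sketch under stated assumptions, not how Node Exporter is implemented. The metric name `demo_load1`, the helpers `renderLoad` and `metricsHandler`, and the fixed sample input (standing in for the contents of `/proc/loadavg`) are all hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// renderLoad turns loadavg-style content ("0.42 0.30 0.25 1/123 4567")
// into metric text. demo_load1 is a hypothetical metric name.
func renderLoad(raw string) (string, error) {
	fields := strings.Fields(raw)
	if len(fields) == 0 {
		return "", fmt.Errorf("empty loadavg input")
	}
	load1, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return "", fmt.Errorf("parse 1-minute load: %w", err)
	}
	var b strings.Builder
	b.WriteString("# HELP demo_load1 1-minute load average\n")
	b.WriteString("# TYPE demo_load1 gauge\n")
	b.WriteString("demo_load1 " + strconv.FormatFloat(load1, 'g', -1, 64) + "\n")
	return b.String(), nil
}

// metricsHandler re-reads the source on every scrape, so each request
// returns the current value rather than a cached one.
func metricsHandler(read func() (string, error)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		raw, err := read()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		text, err := renderLoad(raw)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
		io.WriteString(w, text)
	})
}

func main() {
	// Assumption: a fixed sample replaces os.ReadFile("/proc/loadavg")
	// so that the sketch runs on any operating system.
	read := func() (string, error) { return "0.42 0.30 0.25 1/123 4567\n", nil }

	// A throwaway test server keeps the demo from blocking.
	srv := httptest.NewServer(metricsHandler(read))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(body))
}
```

In a long-running Exporter the handler would instead be registered with `http.Handle("/metrics", ...)` and served with `http.ListenAndServe`.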

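Exporter data format

An Exporter exposes sample data to Prometheus as plain text over an HTTP endpoint. The format is simple, flat (no nesting), and easy to read. The text for each metric looks like this: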

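# HELP <metric name> <metric description>
# TYPE <metric name> <metric type>
<metric name>{<label name>="<label value>",<label name>="<label value>",...} <sample value 1> [<timestamp>]
<metric name>{<label name>="<label value>",<label name>="<label value>",...} <sample value 2> [<timestamp>]
...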

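Example (the values are placeholders; each bucket counts all observations less than or equal to its le bound):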

# HELP x The temperature of cpu
# TYPE x histogram
x_bucket{le="20"} value1
x_bucket{le="50"} value2
x_bucket{le="70"} value3
x_bucket{le="+Inf"} count(values)
x_sum sum(values)
x_count count(values)
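To make the placeholders in the example concrete, the sketch below computes the bucket lines for a set of observations. It assumes the usual histogram semantics, where buckets are cumulative and the `+Inf` bucket equals the total count. The function `renderHistogram` is a hypothetical helper written for illustration and is not part of any Prometheus library.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// renderHistogram prints the bucket, sum, and count lines for a histogram
// named name with the given upper bounds and raw observations.
func renderHistogram(name string, bounds, observations []float64) string {
	sorted := append([]float64(nil), bounds...)
	sort.Float64s(sorted)

	counts := make([]int, len(sorted))
	sum := 0.0
	for _, v := range observations {
		sum += v
		// Cumulative: an observation is counted in every bucket
		// whose upper bound is greater than or equal to it.
		for i, le := range sorted {
			if v <= le {
				counts[i]++
			}
		}
	}

	var b strings.Builder
	for i, le := range sorted {
		fmt.Fprintf(&b, "%s_bucket{le=%q} %d\n", name, strconv.FormatFloat(le, 'g', -1, 64), counts[i])
	}
	fmt.Fprintf(&b, "%s_bucket{le=\"+Inf\"} %d\n", name, len(observations))
	fmt.Fprintf(&b, "%s_sum %s\n", name, strconv.FormatFloat(sum, 'g', -1, 64))
	fmt.Fprintf(&b, "%s_count %d\n", name, len(observations))
	return b.String()
}

func main() {
	// Six temperature readings against the bounds 20, 50, and 70.
	fmt.Print(renderHistogram("x", []float64{20, 50, 70}, []float64{30, 43, 56, 58, 65, 70}))
}
```

With these inputs no reading is at or below 20, two are at or below 50, and all six are at or below 70, so the bucket values are 0, 2, 6, and 6, with a sum of 322 and a count of 6.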

How to implement an Exporter

Using the prometheus/client_golang package

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// A GaugeVec lets a metric carry labels; this one has a single label, "device".
	diskAvailable = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "disk_available_bytes",
		Help: "Disk space available in bytes",
	}, []string{"device"})

	// A Counter only ever increases.
	tasksTotal = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "test_tasks_total",
		Help: "Total number of test tasks",
	})

	taskDuration = prometheus.NewSummary(prometheus.SummaryOpts{
		Name: "task_duration_seconds",
		Help: "Duration of task in seconds",
		// A Summary needs its quantiles, each mapped to an allowed error.
		Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
	})

	cpuTemperature = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "cpu_temperature",
		Help: "The temperature of cpu",
		// A Histogram needs its bucket upper bounds; +Inf is added automatically.
		Buckets: []float64{20, 50, 70, 80},
	})
)

func init() {
	// Register the metrics with the default registry.
	prometheus.MustRegister(diskAvailable)
	prometheus.MustRegister(tasksTotal)
	prometheus.MustRegister(taskDuration)
	prometheus.MustRegister(cpuTemperature)
}

func main() {
	// Simulate collecting monitoring data (runs once, before serving).
	fakeData()

	// promhttp.Handler() exposes the registered metrics.
	// By default Prometheus scrapes samples from the "/metrics" path.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":10000", nil))
}

func fakeData() {
	tasksTotal.Inc()
	// Set the "device" label of this sample to "/dev/sda".
	diskAvailable.With(prometheus.Labels{"device": "/dev/sda"}).Set(82115880)

	taskDuration.Observe(10)
	taskDuration.Observe(20)
	taskDuration.Observe(30)
	taskDuration.Observe(45)
	taskDuration.Observe(56)
	taskDuration.Observe(80)

	cpuTemperature.Observe(30)
	cpuTemperature.Observe(43)
	cpuTemperature.Observe(56)
	cpuTemperature.Observe(58)
	cpuTemperature.Observe(65)
	cpuTemperature.Observe(70)
}
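Each distinct combination of label values on a labeled metric becomes its own line in the exposed text. The standard-library sketch below shows roughly how one such line is put together: labels sorted by name, values quoted, and backslash, double quote, and newline escaped. `formatSample` is a hypothetical helper for illustration only; client_golang does this work internally and its exact output may differ in details.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// labelEscaper escapes the three characters that must not appear raw
// inside a quoted label value.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatSample renders one sample line without a timestamp.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output regardless of map iteration order

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+`="`+labelEscaper.Replace(labels[k])+`"`)
	}

	line := name
	if len(pairs) > 0 {
		line += "{" + strings.Join(pairs, ",") + "}"
	}
	return line + " " + strconv.FormatFloat(value, 'g', -1, 64)
}

func main() {
	fmt.Println(formatSample("disk_available_bytes", map[string]string{"device": "/dev/sda"}, 82115880))
	fmt.Println(formatSample("test_tasks_total", nil, 1))
}
```

A second call with `"device": "/dev/sdb"` would produce a second line, that is, a second time series under the same metric name.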

Common Prometheus queries

Query a metric's value: promhttp_metric_handler_requests_total{code="200"}
Filter by label with a regex matcher: process_start_time_seconds{job="prometheus",instance=~".+"}
Arithmetic across time series: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
Range query (range vector selector): node_cpu_seconds_total{instance="1.2.3.4:9100",mode="idle"}[5m]
Aggregation: sum(rate(node_network_receive_bytes_total[5m])) by (instance)
Histogram quantile: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
One-to-one vector matching (inner join): node_filesystem_avail_bytes / on(instance, device, mountpoint) node_filesystem_size_bytes
Many-to-one vector matching (left join with group_left): sum(ALERTS{alertstate="firing"}) by (severity) / ignoring(severity) group_left sum(ALERTS)
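The histogram quantile query estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below is a simplified version of that idea, not Prometheus's actual implementation: it ignores rates, native histograms, and most edge cases. The names `bucket`, `bucketQuantile`, and `exampleBuckets` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// bucket is one cumulative histogram bucket.
type bucket struct {
	upperBound float64 // the "le" label
	count      float64 // observations <= upperBound
}

// exampleBuckets holds the cumulative counts for the readings
// 30, 43, 56, 58, 65, 70 with bounds 20, 50, 70, and +Inf.
func exampleBuckets() []bucket {
	return []bucket{{20, 0}, {50, 2}, {70, 6}, {math.Inf(1), 6}}
}

// bucketQuantile estimates the q-quantile. Buckets must be sorted by
// upperBound and end with the +Inf bucket. Assumption of this sketch:
// out-of-range q and empty histograms simply yield NaN.
func bucketQuantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 || q < 0 || q > 1 {
		return math.NaN()
	}
	last := len(buckets) - 1
	total := buckets[last].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total

	// Find the first bucket whose cumulative count reaches the rank.
	i := 0
	for i < last && buckets[i].count < rank {
		i++
	}
	if i == last {
		// The rank falls into +Inf: report the highest finite bound.
		return buckets[last-1].upperBound
	}

	lower, below := 0.0, 0.0 // the first bucket is assumed to start at 0
	if i > 0 {
		lower = buckets[i-1].upperBound
		below = buckets[i-1].count
	}
	inBucket := buckets[i].count - below
	if inBucket == 0 {
		return lower
	}
	// Linear interpolation within the bucket.
	return lower + (buckets[i].upperBound-lower)*(rank-below)/inBucket
}

func main() {
	fmt.Println(bucketQuantile(0.5, exampleBuckets()))
}
```

For the median, the rank is 0.5 × 6 = 3, which falls in the (50, 70] bucket holding 4 observations after 2 below it, so the estimate is 50 + 20 × (3 − 2) / 4 = 55. The estimate depends on the bucket layout, which is why bucket bounds should bracket the values of interest.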