About Prometheus Quantiles

discsthnew

Article link: https://blog.csdn.net/mailjoin/article/details/143512572

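Preface

Why write this article? It started with my note [[avg vs avg_over_time]], which mentions a PromQL query for average bandwidth. That reminded me that cloud providers offer a billing mode based on 95th-percentile bandwidth. So can 95th-percentile billing be expressed in PromQL?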

[!NOTE]

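What is 95th-percentile bandwidth billing?

Billing is settled once a month. One sample is taken every 5 minutes: 12 samples per hour, 12 × 24 = 288 samples per day, and 12 × 24 × 30 = 8640 samples for a 30-day month. The samples are sorted from high to low, the highest 5% are discarded, and the highest remaining bandwidth is the 95th-percentile billing value.
8208 samples count toward billing.
The other 432 samples are not billed. Abnormal traffic lasting up to 432 × 5 / 60 = 36 hours, i.e. no more than 1.5 days of unusually high bandwidth, does not affect the month's bill.
Fourth-peak billing takes each day's maximum, sorts the 30 daily maxima from high to low, and uses the fourth highest as the month's billed bandwidth.
Average billing sums each day's maximum and divides by the number of days in the month; that average is the billed bandwidth.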
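
The 95th-percentile rule in the note can be sketched in Go. This is only a minimal illustration, not any provider's actual billing code: the names `billing95` and `discardCount`, the integer rounding of the 5% cut, and the sample values are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// discardCount returns how many of n samples fall into the discarded top 5%.
// Integer arithmetic is an assumption of this sketch (rounds down).
func discardCount(n int) int {
	return n * 5 / 100
}

// billing95 sorts the samples from high to low, skips the top 5%,
// and returns the highest remaining sample.
func billing95(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	// Copy first so the caller's slice is left untouched.
	sorted := append([]float64(nil), samples...)
	sort.Sort(sort.Reverse(sort.Float64Slice(sorted)))
	return sorted[discardCount(len(sorted))]
}

func main() {
	// A 30-day month of 5-minute samples: 8640 points, 432 discarded.
	fmt.Println(discardCount(8640))

	// 20 hypothetical samples in Mbps; 5% of 20 is one discarded sample,
	// so the single spike of 95 does not affect the billed value.
	samples := []float64{
		10, 12, 11, 9, 95, 13, 12, 14, 11, 10,
		15, 13, 12, 11, 10, 9, 14, 16, 12, 11,
	}
	fmt.Println(billing95(samples))
}
```

With the hypothetical data above, the spike is dropped and the billed value is the next-highest sample.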

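An unavoidable error

After reading some documentation, I found that Prometheus provides quantile functions for computing quantiles.
So the PromQL for 95th-percentile bandwidth could be written like this: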

quantile_over_time(0.95, network_bandwidth{provider="huawei", product="eip", instance_id="xxxx"}[30d])

This looks fine, but the result still deviates from the actual billed value. The Prometheus source code gives a hint as to why:

// quantile calculates the given quantile of a vector of samples.
//
// The Vector will be sorted.
// If 'values' has zero elements, NaN is returned.
// If q==NaN, NaN is returned.
// If q<0, -Inf is returned.
// If q>1, +Inf is returned.
func quantile(q float64, values vectorByValueHeap) float64 {
	if len(values) == 0 || math.IsNaN(q) {
		return math.NaN()
	}
	if q < 0 {
		return math.Inf(-1)
	}
	if q > 1 {
		return math.Inf(+1)
	}
	sort.Sort(values)

	n := float64(len(values))
	// When the quantile lies between two samples,
	// we use a weighted average of the two samples.
	rank := q * (n - 1)

	lowerIndex := math.Max(0, math.Floor(rank))
	upperIndex := math.Min(n-1, lowerIndex+1)

	weight := rank - math.Floor(rank)
	return values[int(lowerIndex)].F*(1-weight) + values[int(upperIndex)].F*weight
}

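As the code shows, the quantile function approximates the true quantile by linearly interpolating between the two adjacent values in the sorted samples.

Reproducing the quantile calculation

Here is an example that walks through the calculation. First, take 100 samples of a single time series, shown below: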

prometheus_remote_storage_samples_pending[25m]
---
prometheus_remote_storage_samples_pending       74664 @1726039724.228
												69789 @1726039739.228
												70338 @1726039754.228
												56597 @1726039769.228
												59153 @1726039784.228
												64275 @1726039799.228
												57623 @1726039814.228
												61067 @1726039829.228
												72406 @1726039844.228
												67544 @1726039859.228
												63796 @1726039874.228
												63859 @1726039889.228
												56883 @1726039904.228
												61610 @1726039919.228
												60480 @1726039934.228
												58550 @1726039949.228
												61659 @1726039964.228
												55914 @1726039979.231
												56932 @1726039994.228
												57431 @1726040009.228
												68587 @1726040024.228
												58490 @1726040039.228
												49399 @1726040054.228
												53553 @1726040069.228
												55815 @1726040084.228
												60069 @1726040099.228
												65461 @1726040114.228
												66089 @1726040129.228
												67256 @1726040144.228
												54751 @1726040159.228
												63533 @1726040174.228
												64701 @1726040189.228
												65144 @1726040204.228
												66099 @1726040219.228
												61871 @1726040234.228
												50786 @1726040249.228
												61495 @1726040264.228
												49220 @1726040279.228
												56116 @1726040294.228
												50721 @1726040309.228
												56058 @1726040324.228
												50382 @1726040339.228
												48614 @1726040354.228
												48424 @1726040369.228
												51124 @1726040384.228
												58235 @1726040399.228
												58793 @1726040414.228
												58777 @1726040429.228
												66247 @1726040444.228
												65683 @1726040459.228
												76727 @1726040474.228
												68741 @1726040489.228
												53780 @1726040504.228
												65821 @1726040519.243
												61602 @1726040534.228
												64352 @1726040549.228
												71809 @1726040564.228
												71848 @1726040579.228
												70033 @1726040594.228
												64793 @1726040609.228
												70076 @1726040624.228
												57435 @1726040639.228
												63395 @1726040654.228
												58971 @1726040669.228
												70079 @1726040684.228
												57441 @1726040699.228
												62889 @1726040714.228
												66039 @1726040729.228
												67352 @1726040744.228
												62586 @1726040759.228
												58175 @1726040774.228
												64304 @1726040789.228
												65043 @1726040804.235
												64696 @1726040819.228
												61280 @1726040834.228
												71702 @1726040849.228
												69022 @1726040864.228
												59478 @1726040879.228
												62877 @1726040894.228
												70810 @1726040909.231
												63527 @1726040924.228
												55592 @1726040939.228
												46133 @1726040954.228
												55803 @1726040969.228
												58565 @1726040984.228
												55668 @1726040999.228
												53802 @1726041014.228
												67046 @1726041029.228
												71582 @1726041044.228
												71250 @1726041059.228
												56885 @1726041074.228
												71996 @1726041089.228
												65049 @1726041104.228
												61936 @1726041119.228
												62577 @1726041134.228
												65395 @1726041149.228
												75166 @1726041164.228
												52898 @1726041179.228
												54030 @1726041194.239
												58668 @1726041209.228

The metric above is scraped every 15s, so 25 minutes yields 100 samples.

First, look at the result of quantile_over_time:

quantile_over_time(0.95, prometheus_remote_storage_samples_pending[25m])
---
71855.40000000001

Now let's reproduce the calculation.

Sort: the function sorts the input sample values from low to high. The sorted result is:

values = [46133, 48424, 48614, 49220, 49399, 50382, 50721, 50786, 51124, 52898, 53553, 53780, 53802, 54030, 54751, 55592, 55668, 55803, 55815, 55914, 56058, 56116, 56597, 56883, 56885, 56932, 57431, 57435, 57441, 57623, 58175, 58235, 58490, 58550, 58565, 58668, 58777, 58793, 58971, 59153, 59478, 60069, 60480, 61067, 61280, 61495, 61602, 61610, 61659, 61871, 61936, 62577, 62586, 62877, 62889, 63395, 63527, 63533, 63796, 63859, 64275, 64304, 64352, 64696, 64701, 64793, 65043, 65049, 65144, 65395, 65461, 65683, 65821, 66039, 66089, 66099, 66247, 67046, 67256, 67352, 67544, 68587, 68741, 69022, 69789, 70033, 70076, 70079, 70338, 70810, 71250, 71582, 71702, 71809, 71848, 71996, 72406, 74664, 75166, 76727]

Compute rank from the requested quantile, together with the indexes in values of the two samples around it:
```
rank = 0.95 * 99 # 94.05
lowerIndex = 94
upperIndex = 95
```

Compute the result from the linear weight and the two values around the quantile:

weight = 94.05 - 94 # 0.05
lower = values[94] # 71848
upper = values[95] # 71996
lower*(1-0.05) + upper*(0.05) # 71855.4

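Under the 95th-percentile billing algorithm, the 5 highest of the 100 samples are discarded and the highest remaining value, 71848 (values[94]), is billed. The quantile function returns 71855.4, so it only approximates the billed value, and the larger the gap between lower and upper, the larger the error.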

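95th-percentile bandwidth in PromQL (Summary type)

Back to the original question: since quantile carries an error, how can PromQL return an accurate 95th-percentile bandwidth value?

This is where the Summary type provided by Prometheus comes in handy. It computes the quantiles on the client side over a sliding time window and exposes their values directly, each within its configured error. First, here is how a Summary metric is defined: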

s := prometheus.NewSummary(prometheus.SummaryOpts{
	Name: "alicloud_eip_xxx_bandwidth",
	Help: "A summary of the network product bandwidth.",
	// Objectives lists the quantiles to track, each with its allowed absolute error.
	Objectives: map[float64]float64{
		0.5:  0.05,  // 50th percentile, maximum absolute error 0.05
		0.95: 0.005, // 95th percentile, maximum absolute error 0.005
		0.99: 0.001, // 99th percentile, maximum absolute error 0.001
	},
})

Assuming the current bandwidth is 10 Mbps, record it with the Observe() method:

s.Observe(10)

This produces Summary-type metrics like the following:

# HELP alicloud_eip_xxx_bandwidth A summary of the network product bandwidth.  
# TYPE alicloud_eip_xxx_bandwidth summary  
alicloud_eip_xxx_bandwidth{quantile="0.5"} 8.17532  
alicloud_eip_xxx_bandwidth{quantile="0.95"} 10.56477 
alicloud_eip_xxx_bandwidth{quantile="0.99"} 12.37251  
alicloud_eip_xxx_bandwidth_sum 88364.234  
alicloud_eip_xxx_bandwidth_count 227420

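A thought experiment: if Prometheus provided a topk_over_time function, the PromQL could also be written as follows, where k is the number of discarded samples plus one (433 for 8640 samples):
bottomk(1, topk_over_time(433, network_bandwidth{provider="huawei", product="eip", instance_id="xxxx"}[30d]))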

Histogram vs Summary

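Since quantile cannot avoid this error, is it still worth using?

Yes, it is. Prometheus was never designed for high-precision monitoring. Pull-based scraping happens at intervals, and a sample cannot represent what really happened during the whole scrape interval, so there is little point in obsessing over precision. That said, if the sample set is small and the gaps between values are large, quantile really is unusable.

If the sample set is large enough and evenly distributed, so that adjacent sorted values lie close together (large sample sets are often roughly bell-shaped, though this is not guaranteed), computing quantiles with quantile works fine. The same goes for Histogram: with sensibly chosen buckets (which requires some knowledge of the data distribution), quantiles can be estimated with good accuracy.

While on the subject, here is a note on the differences between Histogram and Summary. This content can be found in the official documentation: Histograms and summaries.

Histogram: buckets are configured in advance and observations are counted per bucket. Quantiles are computed on the server side (Prometheus) with the histogram_quantile() function.
Summary: quantiles are computed on the client side (exporter) and then scraped by Prometheus.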

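| | Histogram | Summary |
| --- | --- | --- |
| Required configuration | Pick buckets suitable for the expected range of observed values. | Pick the desired φ-quantiles and sliding window. Other φ-quantiles and sliding windows cannot be calculated later. |
| Client performance | Observations are cheap, as they only need to increment counters. | Observations are expensive due to the streaming quantile calculation. |
| Server performance | The server has to calculate quantiles. You can use recording rules if the ad-hoc calculation takes too long (e.g. in a large dashboard). | Low server-side cost. |
| Number of time series (in addition to the `_sum` and `_count` series) | One time series per configured bucket. | One time series per configured quantile. |
| Quantile error (see the official documentation for details) | Error is limited in the dimension of observed values by the width of the relevant bucket. | Error is limited in the dimension of φ by a configurable value. |
| Specification of φ-quantile and sliding time window | Ad-hoc with Prometheus expressions. | Preconfigured by the client. |
| Aggregation | Ad-hoc with Prometheus expressions. | In general not aggregatable. |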