Why the Prometheus increase function returns a non-integer value

qq_16899143

Article link: https://blog.csdn.net/qq_16899143/article/details/128966021

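Today I noticed that Prometheus's increase function returned a non-integer value, so I dug into the source code. Below is the function that does the calculation for rate/increase/delta: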

https://github.com/prometheus/prometheus/blob/d77b56e88e3d554a499e22d2073812b59191256c/promql/functions.go#L55

// extrapolatedRate is a utility function for rate/increase/delta.
// It calculates the rate (allowing for counter resets if isCounter is true),
// extrapolates if the first/last sample is close to the boundary, and returns
// the result as either per-second (if isRate is true) or overall.
func extrapolatedRate(vals []parser.Value, args parser.Expressions, enh *EvalNodeHelper, isCounter bool, isRate bool) Vector {
    ms := args[0].(*parser.MatrixSelector)
    vs := ms.VectorSelector.(*parser.VectorSelector)

    var (
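        // the series (struct) holding the samples that fall inside the range
        samples    = vals[0].(Matrix)[0]
        // start and end timestamps of the range window, in milliseconds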
        rangeStart = enh.ts - durationMilliseconds(ms.Range+vs.Offset)
        rangeEnd   = enh.ts - durationMilliseconds(vs.Offset)
    )

    // No sense in trying to compute a rate without at least two points. Drop
    // this Vector element.
    // with only 0 or 1 samples there is nothing to compute an increase from
    if len(samples.Points) < 2 {
        return enh.out
    }
    var (
        counterCorrection float64
        lastValue         float64
    )
    for _, sample := range samples.Points {
        if isCounter && sample.V < lastValue {
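            // Counter reset handling: a counter normally never decreases, so
            // sample.V < lastValue means it was reset (for example the target
            // restarted and began counting from 0 again). Adding the value seen
            // just before the reset keeps the total increase correct. Without
            // resets this branch never runs and counterCorrection stays 0.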
            counterCorrection += lastValue
        }
        lastValue = sample.V
    }
    // raw difference between the last and the first sample value (before extrapolation)
    resultValue := lastValue - samples.Points[0].V + counterCorrection

    // Duration between first/last samples and boundary of range.
    // seconds between the start of the range and the first sample
    durationToStart := float64(samples.Points[0].T-rangeStart) / 1000
    // seconds between the last sample and the end of the range
    durationToEnd := float64(rangeEnd-samples.Points[len(samples.Points)-1].T) / 1000

    // seconds between the first and the last sample
    sampledInterval := float64(samples.Points[len(samples.Points)-1].T-samples.Points[0].T) / 1000
    // average spacing between consecutive samples
    averageDurationBetweenSamples := sampledInterval / float64(len(samples.Points)-1)

    if isCounter && resultValue > 0 && samples.Points[0].V >= 0 {
        // Counters cannot be negative. If we have any slope at
        // all (i.e. resultValue went up), we can extrapolate
        // the zero point of the counter. If the duration to the
        // zero point is shorter than the durationToStart, we
        // take the zero point as the start of the series,
        // thereby avoiding extrapolation to negative counter
        // values.
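        // durationToZero is how far back before the first sample the counter
        // would have been at 0 if it had grown at the observed slope.
        // If durationToZero < durationToStart, extrapolating all the way back to
        // the range start would imply a negative counter value, so
        // durationToStart is shortened to durationToZero (see the comment above).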
        durationToZero := sampledInterval * (samples.Points[0].V / resultValue)
        if durationToZero < durationToStart {
            durationToStart = durationToZero
        }
    }

    // If the first/last samples are close to the boundaries of the range,
    // extrapolate the result. This is as we expect that another sample
    // will exist given the spacing between samples we've seen thus far,
    // with an allowance for noise.
    extrapolationThreshold := averageDurationBetweenSamples * 1.1
    extrapolateToInterval := sampledInterval

    // This condition is usually true: extrapolationThreshold > averageDurationBetweenSamples,
    // and with regular scrapes durationToStart <= averageDurationBetweenSamples.
    if durationToStart < extrapolationThreshold {
        extrapolateToInterval += durationToStart
    } else {
        extrapolateToInterval += averageDurationBetweenSamples / 2
    }
    // same reasoning as for durationToStart
    if durationToEnd < extrapolationThreshold {
        extrapolateToInterval += durationToEnd
    } else {
        extrapolateToInterval += averageDurationBetweenSamples / 2
    }
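    // Using the values computed above, the result is extrapolated linearly to
    // cover the parts of the range not spanned by samples.
    // This is where the fraction comes from: for an integer-valued counter
    // resultValue starts out as an integer, but the extrapolation factor
    // generally turns it into a non-integer.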
    resultValue = resultValue * (extrapolateToInterval / sampledInterval)
    if isRate {
        resultValue = resultValue / ms.Range.Seconds()
    }

    return append(enh.out, Sample{
        Point: Point{V: resultValue},
    })
}
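
The counterCorrection loop is easier to follow in isolation. The following is a minimal sketch of the same idea, not the Prometheus code itself: the function name `increaseWithResets` and the sample values are made up for illustration, and any drop in value is assumed to be a counter reset.

```go
package main

import "fmt"

// increaseWithResets returns the total increase across values, treating any
// drop as a counter reset (the same idea as counterCorrection above).
// Hypothetical helper for illustration only.
func increaseWithResets(values []float64) float64 {
	if len(values) < 2 {
		return 0
	}
	var correction, last float64
	for _, v := range values {
		if v < last {
			// The counter restarted: remember how far it had got before the reset.
			correction += last
		}
		last = v
	}
	return last - values[0] + correction
}

func main() {
	// 10 -> 12 (+2), reset, 0 -> 3 (+3), 3 -> 5 (+2): total increase 7
	fmt.Println(increaseWithResets([]float64{10, 12, 3, 5}))
}
```

Without the correction the result would be 5 - 10 = -5; adding back the 12 that was lost at the reset gives 7.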

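The source shows that the raw difference between the first and the last sample goes through an extrapolation step before it is returned. Typically, with a 15s scrape interval and a 60s range, each evaluation picks up 4 samples (not 5), which means only three intervals. Because the samples do not land exactly on the start and end of the range, there is a gap at each end, durationToStart and durationToEnd, with durationToStart + durationToEnd = 15s (the original post illustrates this with a figure).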
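
To check the sample count and the two gaps, here is a small sketch. It assumes perfectly regular scrapes and a first sample that lands strictly inside the first scrape interval of the window; `windowGaps` and its parameters are hypothetical names, not part of Prometheus.

```go
package main

import "fmt"

// windowGaps counts the samples inside a range window and returns the gaps at
// both ends. Assumptions: samples arrive exactly every stepMs, and the first
// one lands firstOffsetMs after the range start (0 < firstOffsetMs < stepMs).
func windowGaps(rangeMs, stepMs, firstOffsetMs int64) (numSamples int, toStartMs, toEndMs int64) {
	last := firstOffsetMs
	for t := firstOffsetMs; t <= rangeMs; t += stepMs {
		numSamples++
		last = t
	}
	return numSamples, firstOffsetMs, rangeMs - last
}

func main() {
	// 60s range, 15s scrape interval, first sample 4s after the range start.
	n, toStart, toEnd := windowGaps(60000, 15000, 4000)
	fmt.Println(n, toStart, toEnd, toStart+toEnd)
}
```

With these inputs the window holds 4 samples, and the two gaps (4s and 11s) add up to one scrape interval, 15s.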

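Back to my problem: during a certain period my table gained exactly one row, so the counter rose by 1. With a 15s scrape interval (and a first sample of at least 1, so that the durationToZero clamp leaves durationToStart unchanged):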

1. With a 30s range, each evaluation covers only two samples (one interval), and the single increment falls exactly between them. So initially resultValue = 1, extrapolateToInterval = 30s and sampledInterval = 15s; after resultValue = resultValue * (extrapolateToInterval / sampledInterval) we get resultValue = 2.

2. With a 1min range, four samples (three intervals) are covered. So initially resultValue = 1, extrapolateToInterval = 60s and sampledInterval = 45s; after resultValue = resultValue * (extrapolateToInterval / sampledInterval) we get resultValue = 1.33, that is 4/3.

3. With a 2min range, 8 samples (7 intervals) are covered. So initially resultValue = 1, extrapolateToInterval = 120s and sampledInterval = 105s; after resultValue = resultValue * (extrapolateToInterval / sampledInterval) we get resultValue = 1.143, that is 8/7.
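
The three cases can be reproduced with a stripped-down version of the extrapolation step. This is only a sketch under stated assumptions: `extrapolate` is a hypothetical helper that mirrors the threshold logic of extrapolatedRate but skips the durationToZero clamp, and the first sample is assumed to sit 5s after the range start (so durationToEnd is 10s).

```go
package main

import "fmt"

// extrapolate mirrors the final scaling step of extrapolatedRate for increase.
// Simplification (assumption): the durationToZero clamp is left out, i.e. the
// counter is taken to be far enough from zero that it never applies.
// All durations are in seconds.
func extrapolate(rawIncrease, sampledInterval, durationToStart, durationToEnd float64, numSamples int) float64 {
	avg := sampledInterval / float64(numSamples-1)
	threshold := avg * 1.1
	extrapolateTo := sampledInterval
	if durationToStart < threshold {
		extrapolateTo += durationToStart
	} else {
		extrapolateTo += avg / 2
	}
	if durationToEnd < threshold {
		extrapolateTo += durationToEnd
	} else {
		extrapolateTo += avg / 2
	}
	return rawIncrease * (extrapolateTo / sampledInterval)
}

func main() {
	fmt.Println(extrapolate(1, 15, 5, 10, 2))  // 30s range, 2 samples
	fmt.Println(extrapolate(1, 45, 5, 10, 4))  // 1min range, 4 samples
	fmt.Println(extrapolate(1, 105, 5, 10, 8)) // 2min range, 8 samples
}
```

The three calls give 2, about 1.33 and about 1.14, matching the hand calculations above. The longer the range relative to the scrape interval, the closer the factor extrapolateToInterval / sampledInterval gets to 1.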