How to Use the Summary Type


puppycuty


Article link: https://blog.csdn.net/qq_38125626/article/details/114627375


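1 Background

In a microservice project, we usually need to monitor how long client requests take so that we can understand the overall performance of the system.

If some requests turn out to be very slow, the user experience suffers.

A slow service also tends to become the bottleneck of the whole system. Under high concurrency it can trigger a cascading failure (the microservice "avalanche" effect) and eventually make the entire service unavailable.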

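2 How do we monitor request latency in a microservice project?

Two common measurements are:

The maximum latency of a request. (The shortest stave in the barrel, as in the barrel effect.)
The latency percentiles of a request. (The overall distribution of request latency.)

For example:

Request: http://127.0.0.1/hello

Maximum latency: 300ms [deserves close attention: find out what causes a latency this high and optimize it away]

Latency percentiles:

50th percentile (50%): 100ms (50% of requests take less than 100ms) [very good performance, low latency]
90th percentile (90%): 230ms (90% of requests take less than 230ms) [230ms, acceptable performance]
95th percentile (95%): 260ms (95% of requests take less than 260ms) [260ms, performance needs optimizing]
99th percentile (99%): 270ms (99% of requests take less than 270ms) [270ms, affects the user experience]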
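To make the two measurements concrete, here is a minimal sketch that computes the maximum and a few percentiles from a batch of latencies. The sample values and the helper names `percentile` and `maxOf` are made up for illustration, and the nearest-rank definition used here is only one of several common percentile definitions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of
// latencies given in milliseconds. It sorts a copy, so the input is untouched.
func percentile(durationsMs []float64, p float64) float64 {
	if len(durationsMs) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), durationsMs...)
	sort.Float64s(sorted)
	// Nearest rank: the smallest value with at least p% of the data at or below it.
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// maxOf returns the largest latency, or NaN for an empty slice.
func maxOf(durationsMs []float64) float64 {
	if len(durationsMs) == 0 {
		return math.NaN()
	}
	m := durationsMs[0]
	for _, d := range durationsMs[1:] {
		if d > m {
			m = d
		}
	}
	return m
}

func main() {
	// Hypothetical latencies (ms) for a single endpoint.
	samples := []float64{80, 90, 100, 120, 150, 200, 230, 250, 270, 300}
	fmt.Println("max:", maxOf(samples))
	for _, p := range []float64{50, 90, 95, 99} {
		fmt.Printf("p%v: %v ms\n", p, percentile(samples, p))
	}
}
```

Sorting every sample like this is exact but needs memory proportional to the number of requests, which is why a long-running service normally relies on an approximate, streaming structure instead.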

3 Using the Prometheus Summary type to measure HTTP request latency

3.1 Practice: how do we use a Summary metric?

Example code:


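      import "github.com/prometheus/client_golang/prometheus"

      // httpRequestDuration tracks HTTP request latency per endpoint.
      var httpRequestDuration = prometheus.NewSummaryVec(
      	prometheus.SummaryOpts{
      		Name: "http_request_duration",
      		Help: "http request duration",
      		// Target quantile -> allowed absolute error for that quantile.
      		Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.95: 0.005, 0.99: 0.001},
      	},
      	[]string{"endpoint"},
      )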

Sample output:


      http_request_duration{endpoint="/hello/2",quantile="0.5"} 35
      http_request_duration{endpoint="/hello/2",quantile="0.9"} 94
      http_request_duration{endpoint="/hello/2",quantile="0.95"} 97
      http_request_duration{endpoint="/hello/2",quantile="0.99"} 98
      http_request_duration_sum{endpoint="/hello/2"} 1172
      http_request_duration_count{endpoint="/hello/2"} 28

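A Summary metric produces three kinds of series:

xxx_sum: the total latency of all requests to "/hello/2".
xxx_count: the number of requests to "/hello/2".
xxx{xxxx, quantile="0.5"}: the 50th-percentile latency of requests to "/hello/2". In the sample above the value is 35, meaning that 50% of the requests to this URL took less than 35ms.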
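The _sum and _count series are also enough to derive the average latency, which the quantiles alone do not give. A minimal sketch using the numbers from the sample output above (the helper name `meanLatency` is made up, and the unit is assumed to be milliseconds):

```go
package main

import "fmt"

// meanLatency derives the average from a summary's _sum and _count values.
// It returns 0 when nothing has been observed, avoiding a division by zero.
func meanLatency(sum float64, count uint64) float64 {
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

func main() {
	// http_request_duration_sum = 1172, http_request_duration_count = 28.
	fmt.Printf("mean: %.2f ms\n", meanLatency(1172, 28)) // prints "mean: 41.86 ms"
}
```

Here the mean (about 41.86ms) is higher than the median (35ms), which suggests that a few slow requests are pulling the average up.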

3.2 Source code analysis: how does Summary compute quantiles?

First, look at the definition of Summary:


      type Summary interface {
      	Metric
      	Collector
      	// Observe adds a single observation to the summary.
      	Observe(float64)
      }

The core method of Summary is Observe(), which records one observation locally, inside the process.

Next, look at the implementation of Summary:


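      // summary is the implementation of the Summary interface.
      type summary struct {
      	selfCollector
      	// omitted
      	objectives map[float64]float64 // target quantiles: tells the Summary which quantiles to track
      	sortedObjectives []float64 // the target quantiles in ascending order (a map has no defined order)
      	labelPairs []*dto.LabelPair
      	sum float64 // sum of all observed values
      	cnt uint64 // number of observations
      	// omitted
      	streams []*quantile.Stream
      	streamDuration time.Duration
      	headStream *quantile.Stream // the current (head) stream holding the observed data
      	headStreamIdx int
      	// omitted
      }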
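To illustrate how Observe() relates to the sum and cnt fields and to quantile queries, here is a deliberately naive sketch. It is not the library's algorithm: `naiveSummary` (a hypothetical type) keeps every observation and sorts on demand, whereas the real summary delegates to quantile.Stream to stay within bounded memory. It is also not safe for concurrent use.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// naiveSummary is a toy stand-in for the real summary. It stores every
// observation, so its memory use grows without bound.
type naiveSummary struct {
	sum    float64   // running total of observed values
	cnt    uint64    // number of observations
	values []float64 // all observations, for illustration only
}

// Observe records one value and updates sum and cnt.
func (s *naiveSummary) Observe(v float64) {
	s.sum += v
	s.cnt++
	s.values = append(s.values, v)
}

// Quantile returns the exact nearest-rank q-quantile (0 < q <= 1),
// or NaN if nothing has been observed yet.
func (s *naiveSummary) Quantile(q float64) float64 {
	if len(s.values) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), s.values...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	s := &naiveSummary{}
	for _, v := range []float64{35, 12, 94, 20, 98} {
		s.Observe(v)
	}
	for _, q := range []float64{0.5, 0.9, 0.99} {
		fmt.Printf("quantile=%v value=%v\n", q, s.Quantile(q))
	}
	fmt.Println("sum:", s.sum, "count:", s.cnt)
}
```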

The observed data is stored in quantile.Stream:


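      type Stream struct {
      	*stream // previously observed data lives here; under certain conditions the values in b are merged into stream
      	b Samples // Samples is essentially []Sample; it holds the most recently observed data
      	sorted bool // whether b is already sorted
      }
      // The stream structure.
      type stream struct {
      	n float64 // count
      	l []Sample // all observed data
      	ƒ invariant
      }
      // The data captured by a single observation.
      type Sample struct {
      	Value float64 `json:",string"`
      	Width float64 `json:",string"`
      	Delta float64 `json:",string"`
      }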

As this shows, each observed value is wrapped in a Sample, and the samples are kept in a slice.
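The buffer-then-merge layout described in the struct comments (new values land in b and are later merged into the main list) can be sketched as follows. This is a simplified model under stated assumptions: `bufferedStream` is a hypothetical type that stores plain float64 values instead of Sample, flushes only when the buffer is full, and never compresses the merged list.

```go
package main

import (
	"fmt"
	"sort"
)

// bufferedStream is a toy model of the buffer-then-merge layout.
// Unlike the real stream, it never compresses l, so memory grows unbounded.
type bufferedStream struct {
	l []float64 // merged observations, kept sorted
	b []float64 // recent observations, unsorted
}

// newBufferedStream allocates a buffer with the given capacity (> 0).
func newBufferedStream(bufCap int) *bufferedStream {
	return &bufferedStream{b: make([]float64, 0, bufCap)}
}

// Insert buffers v and flushes once the buffer is full.
func (s *bufferedStream) Insert(v float64) {
	s.b = append(s.b, v)
	if len(s.b) == cap(s.b) {
		s.flush()
	}
}

// flush sorts the buffer and merges it into l in one linear pass.
func (s *bufferedStream) flush() {
	sort.Float64s(s.b)
	merged := make([]float64, 0, len(s.l)+len(s.b))
	i, j := 0, 0
	for i < len(s.l) && j < len(s.b) {
		if s.l[i] <= s.b[j] {
			merged = append(merged, s.l[i])
			i++
		} else {
			merged = append(merged, s.b[j])
			j++
		}
	}
	merged = append(merged, s.l[i:]...)
	merged = append(merged, s.b[j:]...)
	s.l = merged
	s.b = s.b[:0] // keep the capacity, drop the contents
}

func main() {
	s := newBufferedStream(3)
	for _, v := range []float64{35, 94, 12, 98, 20, 97, 50} {
		s.Insert(v)
	}
	fmt.Println("merged:", s.l, "buffered:", s.b)
	s.flush()
	fmt.Println("after flush:", s.l)
}
```

Buffering keeps the hot path cheap: an insert is just an append, and the sorting and merging cost is paid once per batch rather than once per observation.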