Investigating Slow Inserts in VictoriaMetrics

lixiaoer666

Tags: time-series database, prometheus, go

Article link: https://blog.csdn.net/qq_35200943/article/details/135818536

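This article examines how the slow-insert alert in VictoriaMetrics works: it is the ratio of the vmstorage metric vm_slow_row_inserts_total to the vminsert metric vm_rows_inserted_total. It traces the causes of slow inserts through the source code, such as insufficient memory and too many active time series, and describes how to improve performance by adjusting the memory configuration, improving metric design, and tuning retentionPeriod.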

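Preface

I recently took over the time-series database VictoriaMetrics and keep receiving slow-insert alerts. I did not know what a slow insert actually is, so I decided to dig into the source code and find out.
The source code in this article is from VM version 1.76.1.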

Alerting rule

sum(rate(vm_slow_row_inserts_total{job=~"vmstorage"}[5m])) / sum(rate(vm_rows_inserted_total{job=~"vminsert"}[5m]))

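I first looked at the alerting rule for slow inserts. Put simply, it is the ratio of the vm_slow_row_inserts_total metric from the vmstorage component to the vm_rows_inserted_total metric from the vminsert component. Our current alert threshold is 30%: once the ratio reaches 30% someone needs to look at it, because a further rise may affect cluster stability.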
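
To make the arithmetic behind the rule concrete, here is a minimal Go sketch of the same ratio and threshold check. The function names and the sample rates are hypothetical, and returning 0 when nothing is ingested is a choice made for this sketch, not how PromQL treats division by zero.

```go
package main

import "fmt"

// slowInsertRatio returns slow inserts as a fraction of all inserted rows.
// Both arguments are per-second rates, such as the results of the two
// sum(rate(...[5m])) expressions in the alerting rule.
func slowInsertRatio(slowRowsPerSec, insertedRowsPerSec float64) float64 {
	if insertedRowsPerSec <= 0 {
		// No ingestion at all: report 0 instead of dividing by zero.
		return 0
	}
	return slowRowsPerSec / insertedRowsPerSec
}

// shouldAlert reports whether the ratio has reached the threshold.
func shouldAlert(ratio, threshold float64) bool {
	return ratio >= threshold
}

func main() {
	const threshold = 0.30 // the 30% threshold mentioned above

	// Hypothetical rates: 25K slow rows/s out of 100K inserted rows/s.
	ratio := slowInsertRatio(25_000, 100_000)
	fmt.Printf("ratio=%.2f alert=%v\n", ratio, shouldAlert(ratio, threshold))
}
```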

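As the names suggest, the ratio is the number of slowly inserted rows divided by the total number of written rows. That leaves three questions to investigate:

What is a slow insert?
Why do slow inserts affect cluster stability?
How can slow inserts be optimized?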

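What is a slow insert

Before optimizing slow inserts, we need to find out what a slow insert is.
From the above we know that the vm_slow_row_inserts_total metric counts the slow inserts of the VM cluster, so let's use the source code to see when this metric is incremented.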

* If VictoriaMetrics works slowly and eats more than a CPU core per 100K ingested data points per second,
then it is likely you have too many [active time series](https://docs.victoriametrics.com/FAQ.html#what-is-an-active-time-series) for the current amount of RAM.
VictoriaMetrics [exposes](#monitoring)  `vm_slow_*` metrics such as `vm_slow_row_inserts_total` and `vm_slow_metric_name_loads_total`, which could be used
as an indicator of low amounts of RAM.  It is recommended increasing the amount of RAM on the node with VictoriaMetrics in order to improve
ingestion and query performance in this case.

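First, the README.md file has a passage about slow inserts. In short: if VictoriaMetrics runs slowly and uses more than one CPU core per 100K ingested data points per second, something is probably wrong.
The likely cause is that there are too many active time series for the available RAM, and the recommended remedy is to give the node more RAM.
VM exposes several vm_slow_* metrics for monitoring this,
such as vm_slow_row_inserts_total (number of slowly inserted rows) and vm_slow_metric_name_loads_total (number of slow metric name loads, not covered in detail here).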

Reading the source code

Start with a global search for slowRowInserts to see when this metric is incremented. The search leads to storage.go:

atomic.AddUint64(&s.slowRowInserts, slowInsertsCount)

It appears in the add function in storage.go. The complete function is as follows:


// add inserts time series rows into the storage.
func (s *Storage) add(rows []rawRow, dstMrs []*MetricRow, mrs []MetricRow, precisionBits uint8) error {
	// Get the current index database of the storage
	idb := s.idb()
	j := 0
	var (
		// These vars are used for speeding up bulk imports of multiple adjacent rows for the same metricName.
		// Rows with the same TSID are imported in bulk
		prevTSID          TSID
		prevMetricNameRaw []byte
	)
	var pmrs *pendingMetricRows
	// Allowed timestamp range of the table: from the retention boundary (35 days ago here) to 2 days ahead
	minTimestamp, maxTimestamp := s.tb.getMinMaxTimestamps()

	var genTSID generationTSID

	// Return only the first error, since it has no sense in returning all errors.
	// Only the first warning is kept
	var firstWarn error
	// Loop over all rows
	for i := range mrs {
		mr := &mrs[i]
		// Skip NaN values, unless the value is a Prometheus staleness marker
		if math.IsNaN(mr.Value) {
			if !decimal.IsStaleNaN(mr.Value) {
				// Skip NaNs other than Prometheus staleness marker, since the underlying encoding
				// doesn't know how to work with them.
				continue
			}
		}
		// Skip rows whose timestamp is older than minTimestamp (35 days ago here)
		if mr.Timestamp < minTimestamp {
			// Skip rows with too small timestamps outside the retention.
			if firstWarn == nil {
				metricName := getUserReadableMetricName(mr.MetricNameRaw)
				firstWarn = fmt.Errorf("cannot insert row with too small timestamp %d outside the retention; minimum allowed timestamp is %d; "+
					"probably you need updating -retentionPeriod command-line flag; metricName: %s",
					mr.Timestamp, minTimestamp, metricName)
			}
			atomic.AddUint64(&s.tooSmallTimestampRows, 1)
			continue
		}
		// Skip rows whose timestamp is later than maxTimestamp (2 days ahead)
		if mr.Timestamp > maxTimestamp {
			// Skip rows with too big timestamps significantly exceeding the current time.
			if firstWarn == nil {
				metricName := getUserReadableMetricName(mr.MetricNameRaw)
				firstWarn = fmt.Errorf("cannot insert row with too big timestamp %d exceeding the current time; maximum allowed timestamp is %d; metricName: %s",
					mr.Timestamp, maxTimestamp, metricName)
			}
			atomic.AddUint64(&s.tooBigTimestampRows, 1)
			continue
		}
		dstMrs[j] = mr
		r := &rows[j]
		j++
		r.Timestamp = mr.Timestamp
		r.Value = mr.Value
		r.PrecisionBits = precisionBits
		// If metricNameRaw equals prevMetricNameRaw from the last cache hit, reuse the previous row's TSID
		if string(mr.MetricNameRaw) == string(prevMetricNameRaw) {
			// Fast path - the current mr contains the same metric name as the previous mr, so it contains the same TSID.
			// This path should trigger on bulk imports when many rows contain the same MetricNameRaw.
			r.TSID = prevTSID
			continue
		}
		// The TSID was found in the cache
		if s.getTSIDFromCache(&genTSID, mr.MetricNameRaw) {
			r.TSID = genTSID.TSID
			// Skip the row if the limit on unique series has been exceeded
			if s.isSeriesCardinalityExceeded(r.TSID.MetricID, mr.MetricNameRaw) {
				// Skip the row, since the limit on the number of unique series has been exceeded.
				// After j--, the next accepted row overwrites this slot, which effectively drops the row
				j--
				continue
			}
			// Fast path - the TSID for the given MetricNameRaw has been found in cache and isn't deleted.
			// There is no need in checking whether r.TSID.MetricID is deleted, since tsidCache doesn't
			// contain MetricName->TSID entries for deleted time series.
			// See Storage.DeleteMetrics code for details.
			// These two values serve the same-name fast path on the next iteration
			prevTSID = r.TSID
			prevMetricNameRaw = mr.MetricNameRaw

			// Keep the current indexdb up to date, so that an expiring index does not cause a cache avalanche
			if genTSID.generation != idb.generation {
				// The found entry is from the previous cache generation
				// so attempt to re-populate the current generation with this entry.
				// This is needed for https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1401
				created, err := idb.maybeCreateIndexes(&genTSID.TSID, mr.MetricNameRaw)
				if err != nil {
					return fmt.Errorf("cannot create indexes in the current indexdb: %w", err)
				}
				if created {
					genTSID.generation = idb.generation
					s.putTSIDToCache(&genTSID, mr.MetricNameRaw)
				}
			}
			continue
		}

		// Slow path - the TSID is missing in the cache.
		// Postpone its search in the loop below.
		// Cache miss (slow insert): park the row in pendingMetricRows first
		j--
		if pmrs == nil {
			pmrs = getPendingMetricRows()
		}
		if err := pmrs.addRow(mr); err != nil {
			// Do not stop adding rows on error - just skip invalid row.
			// This guarantees that invalid rows don't prevent
			// from adding valid rows into the storage.
			if firstWarn == nil {
				firstWarn = err
			}
			continue
		}
	}
	// Start processing the slow inserts
	if pmrs != nil {
		// Sort pendingMetricRows by canonical metric name in order to speed up search via `is` in the loop below.
		pendingMetricRows := pmrs.pmrs
		// Sort by metric name first
		sort.Slice(pendingMetricRows, func(i, j int) bool {
			return string(pendingMetricRows[i].MetricName) < string(pendingMetricRows[j].MetricName)
		})
		is := idb.getIndexSearch(0, 0, noDeadline)
		prevMetricNameRaw = nil
		var slowInsertsCount uint64
		for i := range pendingMetricRows {
			pmr := &pendingMetricRows[i]
			mr := pmr.mr
			dstMrs[j] = mr
			r := &rows[j]
			j++
			r.Timestamp = mr.Timestamp
			r.Value = mr.Value
			r.PrecisionBits = precisionBits
			// Two consecutive rows carry the same metric name: reuse the TSID directly
			if string(mr.MetricNameRaw) == string(prevMetricNameRaw) {
				// Fast path - the current mr contains the same metric name as the previous mr, so it contains the same TSID.
				// This path should trigger on bulk imports when many rows contain the same MetricNameRaw.
				r.TSID = prevTSID
				if s.isSeriesCardinalityExceeded(r.TSID.MetricID, mr.MetricNameRaw) {
					// Skip the row, since the limit on the number of unique series has been exceeded.
					j--
					continue
				}
				continue
			}
			// Count the slow insert
			slowInsertsCount++
			// The key call is GetOrCreateTSIDByName, which obtains the TSID; on error the row is skipped
			if err := is.GetOrCreateTSIDByName(&r.TSID, pmr.MetricName); err != nil {
				// Do not stop adding rows on error - just skip invalid row.
				// This guarantees that invalid rows don't prevent
				// from adding valid rows into the storage.
				if firstWarn == nil {
					firstWarn = fmt.Errorf("cannot obtain or create TSID for MetricName %q: %w", pmr.MetricName, err)
				}
				j--
				continue
			}
			genTSID.generation = idb.generation
			genTSID.TSID = r.TSID
			// Once obtained, put the TSID and the metric name into the cache
			s.putTSIDToCache(&genTSID, mr.MetricNameRaw)
			prevTSID = r.TSID
			prevMetricNameRaw = mr.MetricNameRaw
			if s.isSeriesCardinalityExceeded(r.TSID.MetricID, mr.MetricNameRaw) {
				// Skip the row, since the limit on the number of unique series has been exceeded.
				j--
				continue
			}
		}
		idb.putIndexSearch(is)
		putPendingMetricRows(pmrs)
		atomic.AddUint64(&s.slowRowInserts, slowInsertsCount)
	}
	if firstWarn != nil {
		logger.WithThrottler("storageAddRows", 5*time.Second).Warnf("warn occurred during rows addition: %s", firstWarn)
	}
	dstMrs = dstMrs[:j]
	rows = rows[:j]

	var firstError error
	if err := s.tb.AddRows(rows); err != nil {
		firstError = fmt.Errorf("cannot add rows to table: %w", err)
	}
	if err := s.updatePerDateData(rows, dstMrs); err != nil && firstError == nil {
		firstError = fmt.Errorf("cannot update per-date data: %w", err)
	}
	if firstError != nil {
		return fmt.Errorf("error occurred during rows addition: %w", firstError)
	}
	return nil
}

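Here is what the code above means.
VM keeps a tsidCache in memory. By default it takes 37% of the memory allowed for the storage, and the size is configurable.
Every write first looks up the TSID (TS = time series) in this cache.
If the TSID is found in the cache, the write proceeds quickly.
If the TSID is not in the cache, it is looked up in the LSM tree, which is slower than the cache.
Writes that miss the cache are counted as slow inserts and recorded in the metric.
A cache miss can have several causes:

A brand-new metric is never in the cache, so its first write is always a slow insert.
The write volume is too large or the cache is too small; either way, cache space is limited.
There are too many series. If metrics are poorly designed and produce too many TSIDs, the cache cannot hold them all.
The written samples have old timestamps. The cache uses LRU eviction and has limited space, so writes that span a wide time range can also miss the cache.
The number of series written in the last hour or day exceeds the limit (unlimited by default, configurable).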
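
The fast path and slow path described above can be reduced to a small Go sketch. Everything in it (toyStorage, its fields, getOrCreateTSID) is hypothetical and heavily simplified; it is not VictoriaMetrics code. Unlike the real add function, it does not postpone and sort the missed rows, it only shows when the slow-insert counter moves.

```go
package main

import "fmt"

// toyStorage is a hypothetical, simplified stand-in for Storage.
type toyStorage struct {
	tsidCache      map[string]uint64 // metric name -> TSID, stands in for tsidCache
	index          map[string]uint64 // stands in for the slower on-disk index
	nextTSID       uint64
	slowRowInserts uint64
}

func newToyStorage() *toyStorage {
	return &toyStorage{
		tsidCache: make(map[string]uint64),
		index:     make(map[string]uint64),
	}
}

// getOrCreateTSID plays the role of GetOrCreateTSIDByName:
// look the name up in the index and create a new TSID on a miss.
func (s *toyStorage) getOrCreateTSID(name string) uint64 {
	if id, ok := s.index[name]; ok {
		return id
	}
	s.nextTSID++
	s.index[name] = s.nextTSID
	return s.nextTSID
}

// add resolves a TSID for every metric name in the batch, in order.
func (s *toyStorage) add(names []string) []uint64 {
	tsids := make([]uint64, 0, len(names))
	for _, name := range names {
		if id, ok := s.tsidCache[name]; ok {
			tsids = append(tsids, id) // fast path: cache hit
			continue
		}
		// Slow path: cache miss, so count it and go to the index.
		s.slowRowInserts++
		id := s.getOrCreateTSID(name)
		s.tsidCache[name] = id
		tsids = append(tsids, id)
	}
	return tsids
}

func main() {
	s := newToyStorage()
	batch := []string{`cpu{host="a"}`, `cpu{host="b"}`, `cpu{host="a"}`}

	s.add(batch) // two new names: two slow inserts
	fmt.Println("after first batch:", s.slowRowInserts)

	s.add(batch) // everything is cached now: no new slow inserts
	fmt.Println("after second batch:", s.slowRowInserts)

	delete(s.tsidCache, `cpu{host="a"}`) // simulate an eviction
	s.add(batch)                         // the evicted name misses once more
	fmt.Println("after eviction:", s.slowRowInserts)
}
```

The eviction step is the case that matters in production: the series already exists in the index, yet the write still takes the slow path because the cache could not keep its entry.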

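Why do slow inserts affect cluster stability

Based on the source analysis above, when a large share of writes misses the cache, each of them goes through the slower LSM tree lookup. Write latency inevitably grows and affects upstream components, and the lookups consume a lot of system resources.
If the cluster has too little spare capacity and a burst of writes suddenly turns into slow inserts, subsequent writes pile up and make ingestion worse, which can easily lead to a cache avalanche and bring the cluster down.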

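How to optimize slow inserts

To optimize slow inserts, first find out what causes them and then apply the remedy that fits the actual situation:

Brand-new metrics: nothing to do; the slow inserts stop on their own after a short while.
Write volume too large or cache too small: increase storage.cacheSizeStorageTSID (default: 37% of the storage memory) or add more nodes.
Too many series because of poor metric design: work with the team that writes the metrics to fix the design. Not every problem is the VM database's fault, and with solid evidence in hand the business team will cooperate.
Samples spanning a wide time range: this can be addressed at two levels. 1. Ask the business team to filter out historical data upstream. 2. Reduce retentionPeriod, which is the time range for which VM keeps data (default 31 days, minimum 1 day).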
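
For the case of too many series, the evidence usually comes down to showing which label drives the cardinality. The Go sketch below counts distinct values per label for one metric. The series type, the function name, and the sample data are all hypothetical; in practice the input would come from whatever series listing is available.

```go
package main

import (
	"fmt"
	"sort"
)

// series is a hypothetical in-memory representation of one time series.
type series struct {
	Name   string
	Labels map[string]string
}

// labelCardinality counts the distinct values of every label
// across all series of the given metric.
func labelCardinality(all []series, metric string) map[string]int {
	seen := make(map[string]map[string]struct{})
	for _, s := range all {
		if s.Name != metric {
			continue
		}
		for k, v := range s.Labels {
			if seen[k] == nil {
				seen[k] = make(map[string]struct{})
			}
			seen[k][v] = struct{}{}
		}
	}
	out := make(map[string]int, len(seen))
	for k, vals := range seen {
		out[k] = len(vals)
	}
	return out
}

func main() {
	// Made-up sample data: user_id is the label that explodes.
	all := []series{
		{Name: "http_requests_total", Labels: map[string]string{"path": "/a", "user_id": "1"}},
		{Name: "http_requests_total", Labels: map[string]string{"path": "/a", "user_id": "2"}},
		{Name: "http_requests_total", Labels: map[string]string{"path": "/b", "user_id": "3"}},
	}
	card := labelCardinality(all, "http_requests_total")

	// Print labels from highest to lowest cardinality.
	labels := make([]string, 0, len(card))
	for k := range card {
		labels = append(labels, k)
	}
	sort.Slice(labels, func(i, j int) bool {
		if card[labels[i]] != card[labels[j]] {
			return card[labels[i]] > card[labels[j]]
		}
		return labels[i] < labels[j]
	})
	for _, k := range labels {
		fmt.Printf("%s: %d distinct values\n", k, card[k])
	}
}
```

A label whose distinct-value count grows with users, requests, or timestamps is the typical design problem to raise with the metric's owners.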