Index Bloat After Mass Updates: Exploring How Segment Merging Works

夜月行者

Article link: https://blog.csdn.net/u013200380/article/details/114117181

Problem

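Last week, the upstream data team suddenly ran an update over almost the entire table behind our index post, producing a huge number of document updates. With the backlog, the updates ran for nearly an hour. Strangely, once they had finished, the index had grown by about 80%, from 11 GB to 20 GB.
At first I suspected segment merging, yet the segment count had barely changed, which was puzzling. Half a day later the size still had not come down, so I planned to simply rebuild the full index on Monday if it was still that large.
By Monday, however, the size had already dropped back to 12 GB: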

GET _cat/indices/post?v

health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   post_two vVGUNqf9RfyqPwIei7VD4A   5   2   10011876      2774293     12.2gb          3.9gb
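As a quick sanity check on these numbers, the small Java snippet below (an illustration added here, not part of the original investigation) computes the share of documents that are marked deleted but not yet reclaimed, and the ratio of total to primary store size. The class and method names are made up for the example.

```java
public class IndexStatsCheck {
    // Fraction of documents that are marked deleted but not yet reclaimed by a merge
    static double deletedShare(long docsCount, long docsDeleted) {
        return (double) docsDeleted / (docsCount + docsDeleted);
    }

    public static void main(String[] args) {
        // Values copied from the _cat/indices output above
        double share = deletedShare(10_011_876L, 2_774_293L);
        // store.size / pri.store.size should be close to 1 primary + 2 replicas = 3
        double copies = 12.2 / 3.9;
        System.out.printf("deleted share = %.1f%%, store/pri = %.2f%n", share * 100, copies);
    }
}
```

It prints about 21.7% and 3.13: even after the size came back down, roughly one document in five is still a deleted copy waiting for a merge, and the total size is about three times the primary size, consistent with one primary plus two replicas.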

Investigation

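This was puzzling, and by now I was sure segment merging was responsible, so I took the opportunity to learn how segment merging works.
There is not much material about it online, but with some digging it can be found; in the end, though, really understanding it required reading the source code.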

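Here I only cover how ES indexes documents. In ES, indexing continually produces new segments and new deletion records. Lucene never removes data from an existing segment in place; instead it records the deletion in a separate file associated with the original segment (the segment's live-docs bitmap). An update is a delete plus an add, so it also writes the new version of the document into a new segment.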
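The toy model below sketches that behavior. UpdateModel and Segment are hypothetical classes, not Lucene's API: a BitSet stands in for the per-segment deletion file, and for simplicity every update gets its own new segment, whereas Lucene buffers many documents into each flushed segment.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class UpdateModel {
    // Hypothetical segment: written once, never modified except for its deletion bitmap
    static final class Segment {
        final List<String> ids = new ArrayList<>();  // doc ids stored in this segment
        final BitSet deleted = new BitSet();         // stands in for the per-segment deletion file

        int liveDocs() {
            return ids.size() - deleted.cardinality();
        }
    }

    final List<Segment> segments = new ArrayList<>();

    // An update = mark the old copy deleted + write the new copy into a new segment
    void update(String id) {
        for (Segment s : segments) {
            int pos = s.ids.indexOf(id);
            if (pos >= 0) {
                s.deleted.set(pos);  // the old bytes stay on disk until a merge rewrites the segment
            }
        }
        Segment fresh = new Segment();
        fresh.ids.add(id);
        segments.add(fresh);
    }

    public static void main(String[] args) {
        UpdateModel index = new UpdateModel();
        Segment first = new Segment();
        for (int i = 0; i < 4; i++) {
            first.ids.add("doc" + i);
        }
        index.segments.add(first);

        index.update("doc0");
        index.update("doc1");
        // 3 segments now; the first still stores 4 docs but only 2 are live
        System.out.println(index.segments.size() + " segments, first has "
                + first.ids.size() + " stored / " + first.liveDocs() + " live docs");
    }
}
```

The point of the model is that the disk footprint only shrinks when a merge rewrites a segment without its deleted documents; until then every update adds bytes.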

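In our case, the bulk of the operations were updates. Although some old segments already contained many deleted documents, Lucene's merge policy kept passing them over, and that caused the problem.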

Lucene's segment merging mechanism (TieredMergePolicy)

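1. Sort all segments in descending order of their size after deletes, bytesize * (1.0 - delRatio).
2. Exclude the segments whose size is at least 2.5 GB (maxMergedSegmentBytes / 2). maxMergedSegmentBytes is a setting, the maximum size of a merged segment, 5 GB by default.
3. From the segments left after step 2, compute allowedSegCountInt: how many segments the index should have for its data volume in an idealized layout.
4. Exclude the segments that are currently being merged.
5. The result of the steps above is the candidate set eligible[].
6. If the number of candidates (eligible.size()) is not greater than allowedSegCountInt, return without merging; otherwise continue with step 7.
7. Starting from each position in the descending-sorted eligible list, build candidate combinations, each one a subset of eligible.
8. Score the combinations; the lowest-scoring group of segments is selected for merging.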
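Steps 1 and 2 can be sketched as follows. SegInfo is a hypothetical stand-in for Lucene's SegmentCommitInfo, and the threshold check follows the description above (a segment counts as too big once its size after deletes reaches half of maxMergedSegmentBytes); this is an illustration, not the Lucene source.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TooBigFilter {
    // Hypothetical stand-in for Lucene's SegmentCommitInfo
    record SegInfo(String name, long sizeInBytes, double delRatio) {
        // Size the policy works with: bytes that would remain after dropping deleted docs
        long sizeAfterDeletes() {
            return (long) (sizeInBytes * (1.0 - delRatio));
        }
    }

    // Step 1: sort by size after deletes, largest first.
    // Step 2: drop the leading segments that are at least maxMergedSegmentBytes / 2.
    static List<SegInfo> dropTooBig(List<SegInfo> infos, long maxMergedSegmentBytes) {
        List<SegInfo> sorted = new ArrayList<>(infos);
        sorted.sort(Comparator.comparingLong(SegInfo::sizeAfterDeletes).reversed());
        int tooBigCount = 0;
        while (tooBigCount < sorted.size()
                && sorted.get(tooBigCount).sizeAfterDeletes() >= maxMergedSegmentBytes / 2.0) {
            tooBigCount++;
        }
        return new ArrayList<>(sorted.subList(tooBigCount, sorted.size()));
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        List<SegInfo> segs = List.of(
                new SegInfo("a", 4 * gb, 0.0),        // 4 GB live: excluded from normal merges
                new SegInfo("b", 4 * gb, 0.5),        // 4 GB on disk, half deleted: counts as 2 GB
                new SegInfo("c", 100L << 20, 0.0));   // 100 MB
        for (SegInfo s : dropTooBig(segs, 5 * gb)) {
            System.out.println(s.name() + " " + s.sizeAfterDeletes());
        }
    }
}
```

Because the size is measured after deletes, a 4 GB segment that is half deleted counts as 2 GB and becomes a merge candidate again, while a 4 GB segment without deletes stays excluded.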

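1. How allowedSegCountInt is computed (step 3)

Choose the starting tier size: the size of the smallest segment, raised to at least floorSegmentBytes by the floorSize() function shown below.
- Then count tier by tier, from small to large: each tier holds segsPerTier segments of levelSize bytes, and levelSize is multiplied by maxMergeAtOnce from one tier to the next, until the remaining bytes can no longer fill segsPerTier segments of the current tier. allowedSegCountInt is the number of segments this data would have in that idealized layout. It measures whether the current segment count is reasonable; if it is, no merge is started.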

    minSegmentBytes = floorSize(minSegmentBytes);

    // Compute max allowed segs in the index
    long levelSize = minSegmentBytes;
    // totIndexBytes is the total size of all segments, excluding the too-big ones
    long bytesLeft = totIndexBytes;
    double allowedSegCount = 0;
    while(true) {
      final double segCountLevel = bytesLeft / (double) levelSize;
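      // segsPerTier is a setting: the number of segments per tier, 10 by default.
      // When the remaining bytesLeft can no longer fill segsPerTier segments at the
      // current level, everything has been counted and the loop stops.
      // This is the only exit from the loop.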
      if (segCountLevel < segsPerTier) {
        allowedSegCount += Math.ceil(segCountLevel);
        break;
      }
      // add up the segment counts of each tier
      allowedSegCount += segsPerTier;
      bytesLeft -= segsPerTier * levelSize;
      levelSize *= maxMergeAtOnce;
    }
    int allowedSegCountInt = (int) allowedSegCount;

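  // floorSegmentBytes is the configured floor segment size: every segment below it is
  // treated as a small segment and its size is rounded up to this value. Default: 2 MB.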
  private long floorSize(long bytes) {
    return Math.max(floorSegmentBytes, bytes);
  }
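To see what budget this loop produces, here is a standalone version of the code above. The parameter values (floor 2 MB, segsPerTier 10, maxMergeAtOnce 10) are the defaults quoted in this article and are passed in explicitly; the class name is illustrative.

```java
public class AllowedSegCount {
    static final long MB = 1024L * 1024L;

    static int allowedSegCount(long totIndexBytes, long minSegmentBytes,
                               long floorSegmentBytes, double segsPerTier, int maxMergeAtOnce) {
        long levelSize = Math.max(floorSegmentBytes, minSegmentBytes);  // floorSize()
        long bytesLeft = totIndexBytes;
        double allowed = 0;
        while (true) {
            double segCountLevel = bytesLeft / (double) levelSize;
            if (segCountLevel < segsPerTier) {      // this tier cannot be filled: last tier
                allowed += Math.ceil(segCountLevel);
                break;
            }
            allowed += segsPerTier;                 // one full tier of segsPerTier segments
            bytesLeft -= (long) (segsPerTier * levelSize);
            levelSize *= maxMergeAtOnce;            // next tier: segments maxMergeAtOnce times larger
        }
        return (int) allowed;
    }

    public static void main(String[] args) {
        // 1 GB of mergeable data whose smallest segment is 1 MB
        System.out.println(allowedSegCount(1024 * MB, 1 * MB, 2 * MB, 10, 10));
    }
}
```

For 1 GB of mergeable data whose smallest segment is 1 MB, the tiers are 2 MB, 20 MB and 200 MB: two full tiers of 10 segments, then 804 MB / 200 MB = 4.02, rounded up to 5, so the budget is 25 segments. Only when there are more than 25 eligible segments does the policy look for a merge at all.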

2. Steps 4 and 5: excluding segments that are already being merged

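      final List<SegmentCommitInfo> eligible = new ArrayList<>();
      // infosSorted is sorted by size, descending; the first tooBigCount entries
      // are the too-big segments excluded in step 2
      for(int idx = tooBigCount; idx<infosSorted.size(); idx++) {
        final SegmentCommitInfo info = infosSorted.get(idx);
        if (merging.contains(info)) {
          // already being merged: not a candidate, but its size is tracked
          // (used to tell whether a max-sized merge is already running)
          mergingBytes += size(info, writer);
        } else if (!toBeMerged.contains(info)) {
          // skip segments already picked for a merge earlier in this pass
          eligible.add(info);
        }
      }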
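Steps 4 to 6 can be condensed into the sketch below. It uses plain strings as hypothetical segment names instead of SegmentCommitInfo objects and leaves out the mergingBytes bookkeeping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class EligibleCheck {
    // Steps 4-5: keep segments that are neither being merged nor already picked in this pass.
    // Input is assumed to be the size-sorted list with the too-big segments already removed.
    static List<String> eligible(List<String> candidates, Set<String> merging, Set<String> toBeMerged) {
        List<String> result = new ArrayList<>();
        for (String seg : candidates) {
            if (!merging.contains(seg) && !toBeMerged.contains(seg)) {
                result.add(seg);
            }
        }
        return result;
    }

    // Step 6: only search for a merge when there are more candidates than the budget
    static boolean overBudget(List<String> eligible, int allowedSegCountInt) {
        return eligible.size() > allowedSegCountInt;
    }

    public static void main(String[] args) {
        List<String> segs = List.of("s1", "s2", "s3", "s4", "s5");
        List<String> e = eligible(segs, Set.of("s2"), Set.of());
        System.out.println(e + " overBudget(allowed=4)=" + overBudget(e, 4));
    }
}
```

The budget check matters for the symptom in this article: in the code shown here, as long as eligible.size() stays at or below allowedSegCountInt, no merge is selected at all, however many deleted documents the segments carry.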

3. Step 7: enumerating the combinations

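Two nested for loops build the combinations from large segments to small: the outer loop picks a start position in the descending-sorted eligible list, and the inner loop adds segments from there, with at most maxMergeAtOnce (10 by default) segments per combination and a total size of at most maxMergedSegmentBytes.
If adding a segment during the walk would exceed maxMergedSegmentBytes, that segment is skipped and the hitTooLarge flag is set, which affects the score.
The lower the score, the higher the priority; the result is the lowest-scoring combination of segments.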

      if (eligible.size() > allowedSegCountInt) {

        // OK we are over budget -- find best merge!
        MergeScore bestScore = null;
        List<SegmentCommitInfo> best = null;
        boolean bestTooLarge = false;
        long bestMergeBytes = 0;

        // Consider all merge starts:
        // two nested for loops enumerate every start position, from the largest segments to the smallest
        for(int startIdx = 0;startIdx <= eligible.size()-maxMergeAtOnce; startIdx++) {

          long totAfterMergeBytes = 0;

          // candidate holds the segments of one combination
          final List<SegmentCommitInfo> candidate = new ArrayList<>();
          boolean hitTooLarge = false;
          for(int idx = startIdx;idx<eligible.size() && candidate.size() < maxMergeAtOnce;idx++) {
            final SegmentCommitInfo info = eligible.get(idx);
            final long segBytes = size(info, writer);
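            // If adding this segment (info) would push the total over the configured maximum
            // merged segment size, skip it but set hitTooLarge: the combination is then close to
            // the maximum size, which makes for an efficient merge whose result will not need
            // merging again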
            if (totAfterMergeBytes + segBytes > maxMergedSegmentBytes) {
              hitTooLarge = true;
              // NOTE: we continue, so that we can try
              // "packing" smaller segments into this merge
              // to see if we can get closer to the max
              // size; this in general is not perfect since
              // this is really "bin packing" and we'd have
              // to try different permutations.
              // as the Lucene comment above admits, this packing strategy is not perfect
              continue;
            }
            candidate.add(info);
            totAfterMergeBytes += segBytes;
          }

          // We should never see an empty candidate: we iterated over maxMergeAtOnce
          // segments, and already pre-excluded the too-large segments:
          assert candidate.size() > 0;

          // score this combination
          final MergeScore score = score(candidate, hitTooLarge, mergingBytes, writer);
          if (verbose(writer)) {
            message("  maybe=" + writer.segString(candidate) + " score=" + score.getScore() + " " + score.getExplanation() + " tooLarge=" + hitTooLarge + " size=" + String.format(Locale.ROOT, "%.3f MB", totAfterMergeBytes/1024./1024.), writer);
          }

          // If we are already running a max sized merge
          // (maxMergeIsRunning), don't allow another max
          // sized merge to kick off:
          
          // a lower score means a higher priority
          if ((bestScore == null || score.getScore() < bestScore.getScore()) && (!hitTooLarge || !maxMergeIsRunning)) {
            best = candidate;
            bestScore = score;
            bestTooLarge = hitTooLarge;
            bestMergeBytes = totAfterMergeBytes;
          }
        }
        
        if (best != null) {
          if (spec == null) {
            spec = new MergeSpecification();
          }
          final OneMerge merge = new OneMerge(best);
          spec.add(merge);
          for(SegmentCommitInfo info : merge.segments) {
            toBeMerged.add(info);
          }

          if (verbose(writer)) {
            message("  add merge=" + writer.segString(merge.segments) + " size=" + String.format(Locale.ROOT, "%.3f MB", bestMergeBytes/1024./1024.) + " score=" + String.format(Locale.ROOT, "%.3f", bestScore.getScore()) + " " + bestScore.getExplanation() + (bestTooLarge ? " [max merge]" : ""), writer);
          }
        } else {
          return spec;
        }
      } else {
        return spec;
      }
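The enumeration loop can be run in isolation with the simplified sketch below. It works on plain sizes already sorted in descending order rather than SegmentCommitInfo objects, ignores maxMergeIsRunning, and returns every candidate instead of keeping only the best-scoring one; Candidate and enumerate are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

public class CandidateEnumerator {
    record Candidate(List<Long> sizes, boolean hitTooLarge) {}

    static List<Candidate> enumerate(List<Long> eligibleDesc, int maxMergeAtOnce, long maxMergedSegmentBytes) {
        List<Candidate> out = new ArrayList<>();
        // outer loop: every start position that still leaves maxMergeAtOnce segments
        for (int start = 0; start <= eligibleDesc.size() - maxMergeAtOnce; start++) {
            List<Long> candidate = new ArrayList<>();
            long total = 0;
            boolean hitTooLarge = false;
            // inner loop: greedily pack segments from the start position onwards
            for (int i = start; i < eligibleDesc.size() && candidate.size() < maxMergeAtOnce; i++) {
                long seg = eligibleDesc.get(i);
                if (total + seg > maxMergedSegmentBytes) {
                    hitTooLarge = true;  // skip this one, keep trying to pack smaller segments
                    continue;
                }
                candidate.add(seg);
                total += seg;
            }
            out.add(new Candidate(candidate, hitTooLarge));
        }
        return out;
    }

    public static void main(String[] args) {
        // sizes in MB; at most 3 segments per merge and at most 100 MB per merged segment
        for (Candidate c : enumerate(List.of(60L, 50L, 30L, 20L, 10L), 3, 100)) {
            System.out.println(c);
        }
    }
}
```

With sizes 60, 50, 30, 20 and 10 MB, at most 3 segments per merge and a 100 MB cap, it yields [60, 30, 10] with hitTooLarge=true (50 was skipped because 60 + 50 > 100), then [50, 30, 20] and [30, 20, 10] without the flag.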

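4. The scoring model of the score() function

Remember: the lower the score, the higher the priority.
1. Start with skew. If hitTooLarge from section 3 is true, skew is 1/maxMergeAtOnce (0.1 by default); otherwise it is floorSize(largest segment in the combination) / sum of floorSize over all segments in the combination. floorSize(x) is the segment size raised to at least floorSegmentBytes. The more evenly sized the segments, the smaller the skew, so merges of similarly sized segments are preferred.
2. score = skew * Math.pow(totAfterMergeBytes, 0.05). The smaller the merged size, the lower the score, so smaller merges are mildly favored.
3. score = score * Math.pow(nonDelRatio, reclaimDeletesWeight), where nonDelRatio is the fraction of bytes that survive the merge and reclaimDeletesWeight (default 2) can be tuned. The higher the deletion ratio, the lower the score.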
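Put together, the three steps give the formula below, read off the score() code that follows. Here s_i is a segment's size after deletes, b_i its size on disk including deleted documents, s_max the largest s_i, and the sums run over the segments of one candidate.

$$
\mathrm{skew} =
\begin{cases}
\dfrac{1}{\mathrm{maxMergeAtOnce}} & \text{if hitTooLarge} \\[6pt]
\dfrac{\mathrm{floorSize}(s_{\max})}{\sum_i \mathrm{floorSize}(s_i)} & \text{otherwise}
\end{cases}
\qquad
\mathrm{floorSize}(x) = \max(\mathrm{floorSegmentBytes},\, x)
$$

$$
\mathrm{score} = \mathrm{skew} \cdot \Big(\sum_i s_i\Big)^{0.05} \cdot \left(\frac{\sum_i s_i}{\sum_i b_i}\right)^{\mathrm{reclaimDeletesWeight}}
$$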


  /** Expert: scores one merge; subclasses can override. */
  protected MergeScore score(List<SegmentCommitInfo> candidate, boolean hitTooLarge, long mergingBytes, IndexWriter writer) throws IOException {
    long totBeforeMergeBytes = 0;
    long totAfterMergeBytes = 0;
    long totAfterMergeBytesFloored = 0;
    for(SegmentCommitInfo info : candidate) {
      final long segBytes = size(info, writer);
      totAfterMergeBytes += segBytes;
      totAfterMergeBytesFloored += floorSize(segBytes);
      totBeforeMergeBytes += info.sizeInBytes();
    }

    // Roughly measure "skew" of the merge, i.e. how
    // "balanced" the merge is (whether the segments are
    // about the same size), which can range from
    // 1.0/numSegsBeingMerged (good) to 1.0 (poor). Heavily
    // lopsided merges (skew near 1.0) is no good; it means
    // O(N^2) merge cost over time:
    final double skew;
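    // hitTooLarge means this combination once exceeded the maximum size; the offending segment
    // was skipped, but the total should be close to the maximum, so the result will not need
    // merging again. It therefore gets the best possible skew, i.e. a higher priority.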
    if (hitTooLarge) {
      // Pretend the merge has perfect skew; skew doesn't
      // matter in this case because this merge will not
      // "cascade" and so it cannot lead to N^2 merge cost
      // over time:
      skew = 1.0/maxMergeAtOnce;
    } else {
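      // Otherwise: floored size of the largest segment (candidate.get(0), since eligible is sorted
      // descending) divided by the floored total. skew is at least 1/candidate.size(), i.e. at least
      // 0.1 for a full 10-segment merge, and equals that only when all segments are the same size;
      // the more the sizes differ, the larger skew gets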
      skew = ((double) floorSize(size(candidate.get(0), writer)))/totAfterMergeBytesFloored;
    }

    // Strongly favor merges with less skew (smaller
    // mergeScore is better):
    // a smaller mergeScore is better
    double mergeScore = skew;

    // Gently favor smaller merges over bigger ones.  We
    // don't want to make this exponent too large else we
    // can end up doing poor merges of small segments in
    // order to avoid the large merges:
    // Raise the merged size to a small power: a larger merged size gives a larger mergeScore
    // and thus a lower priority. Because the exponent is small the effect is mild; it only
    // slightly favors merging small segments first
    mergeScore *= Math.pow(totAfterMergeBytes, 0.05);

    // Strongly favor merges that reclaim deletes:
    // retention ratio: merged size after dropping deletes / size before dropping them
    final double nonDelRatio = ((double) totAfterMergeBytes)/totBeforeMergeBytes;
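    // Reclaiming deletes is weighted heavily: the lower the retention ratio, the better.
    // reclaimDeletesWeight controls this factor's weight; the default is 2, values above about 3
    // are not recommended, and 0 means deletes do not affect the score at all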
    mergeScore *= Math.pow(nonDelRatio, reclaimDeletesWeight);

    final double finalMergeScore = mergeScore;

    return new MergeScore() {

      @Override
      public double getScore() {
        return finalMergeScore;
      }

      @Override
      public String getExplanation() {
        return "skew=" + String.format(Locale.ROOT, "%.3f", skew) + " nonDelRatio=" + String.format(Locale.ROOT, "%.3f", nonDelRatio);
      }
    };
  }
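To get a feel for how the factors interact, the sketch below re-implements the same formula on plain long sizes. MergeScoreDemo and its helpers are illustrative, and the parameter values are the defaults quoted in this article. It compares a balanced merge (10 x 100 MB), a lopsided one with the same 1000 MB total (one 910 MB segment plus nine 10 MB ones), and the balanced one again with half of each segment's bytes deleted.

```java
import java.util.Arrays;

public class MergeScoreDemo {
    static final long MB = 1024L * 1024L;

    // Same formula as score(): skew * totAfter^0.05 * nonDelRatio^reclaimDeletesWeight.
    // afterDeletes[i] plays the role of size(info), onDisk[i] of info.sizeInBytes().
    static double score(long[] afterDeletes, long[] onDisk, boolean hitTooLarge,
                        int maxMergeAtOnce, long floorSegmentBytes, double reclaimDeletesWeight) {
        long totAfter = 0;
        long totAfterFloored = 0;
        long totBefore = 0;
        long largestFloored = 0;
        for (int i = 0; i < afterDeletes.length; i++) {
            long floored = Math.max(floorSegmentBytes, afterDeletes[i]);  // floorSize()
            totAfter += afterDeletes[i];
            totAfterFloored += floored;
            totBefore += onDisk[i];
            // Lucene reads candidate.get(0), which is the largest because the list is sorted
            largestFloored = Math.max(largestFloored, floored);
        }
        double skew = hitTooLarge ? 1.0 / maxMergeAtOnce : (double) largestFloored / totAfterFloored;
        double nonDelRatio = (double) totAfter / totBefore;
        return skew * Math.pow(totAfter, 0.05) * Math.pow(nonDelRatio, reclaimDeletesWeight);
    }

    static long[] repeat(long value, int n) {
        long[] a = new long[n];
        Arrays.fill(a, value);
        return a;
    }

    public static void main(String[] args) {
        long[] balanced = repeat(100 * MB, 10);          // 10 x 100 MB
        long[] lopsided = repeat(10 * MB, 10);
        lopsided[0] = 910 * MB;                          // 910 MB + 9 x 10 MB, also 1000 MB
        long[] halfDeletedOnDisk = repeat(200 * MB, 10); // balanced, but half of each segment deleted

        System.out.printf("balanced     %.4f%n", score(balanced, balanced, false, 10, 2 * MB, 2.0));
        System.out.printf("lopsided     %.4f%n", score(lopsided, lopsided, false, 10, 2 * MB, 2.0));
        System.out.printf("half deleted %.4f%n", score(balanced, halfDeletedOnDisk, false, 10, 2 * MB, 2.0));
    }
}
```

The half-deleted merge scores exactly a quarter of the balanced one (0.5 squared), while the lopsided one scores 9.1 times worse than the balanced one because its skew is 0.91 instead of 0.1; the size term is identical for all three.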

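Summary

In short:

If the 10 segments add up to close to 5 GB after merging (hitTooLarge), the merge gets a higher priority.
Otherwise, the more evenly sized the 10 segments are, the higher the priority.
The smaller the total size of the 10 segments, the slightly higher the priority.
The higher the deletion ratio, the higher the priority.
The final choice weighs all of these factors together.
While the update was running, the policy at first preferred merging small segments: they score better, and the heavy indexing kept producing new small segments. The older, fairly stable mid-sized segments never got a chance to be merged, so the deleted documents in them could not be reclaimed in time. Only after the small segments had mostly been dealt with did the mid-sized segments get merged, and the storage size gradually came back down.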

References
https://blog.csdn.net/duanduanpeng/article/details/72633217
https://blog.csdn.net/jollyjumper/article/details/24786147
https://blog.csdn.net/zhengxgs/article/details/78971141
https://blog.csdn.net/kimichen123/article/details/77477251
https://www.jianshu.com/p/9b872a41d5bb
How document deletion works in Lucene:
https://blog.csdn.net/liujava621/article/details/40948417