Elasticsearch Merge Operations and Configuration

Article link: https://blog.csdn.net/likui1314159/article/details/53224132

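In Elasticsearch, each shard is a complete Lucene index, and a Lucene index is in turn broken down into many segments. Segments are the underlying storage unit of the index and are immutable. To keep the number of segments within bounds and to physically remove documents that have been marked as deleted, smaller segments are periodically merged into larger ones. The merge threads automatically balance merging against other operations (such as search) according to the hardware configuration.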
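The effect of a merge described above can be sketched with a toy in-memory model. This is only an illustration: `SegmentMergeSketch`, `Segment`, and `merge` are hypothetical names, and real Lucene segments are on-disk structures rather than lists of IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: real Lucene segments are on-disk structures, not in-memory lists.
final class SegmentMergeSketch {

    /** An immutable segment: its document IDs plus the IDs marked as deleted. */
    static final class Segment {
        private final List<String> docIds;
        private final Set<String> deletedIds;

        Segment(List<String> docIds, Set<String> deletedIds) {
            this.docIds = List.copyOf(docIds);
            this.deletedIds = Set.copyOf(deletedIds);
        }

        List<String> docIds() { return docIds; }
        Set<String> deletedIds() { return deletedIds; }
    }

    /** Builds one new segment from several, dropping documents marked as deleted. */
    static Segment merge(List<Segment> segments) {
        List<String> live = new ArrayList<>();
        for (Segment segment : segments) {
            for (String id : segment.docIds()) {
                if (!segment.deletedIds().contains(id)) {
                    live.add(id);
                }
            }
        }
        // The merged segment starts without any delete markers.
        return new Segment(live, Set.of());
    }

    public static void main(String[] args) {
        Segment a = new Segment(List.of("d1", "d2"), Set.of("d2"));
        Segment b = new Segment(List.of("d3"), Set.of());
        System.out.println(merge(List.of(a, b)).docIds()); // prints [d1, d3]
    }
}
```

The input segments are never modified; the merge produces a new segment, which matches the immutability described above.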

Merge scheduling

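The merge scheduler (ConcurrentMergeScheduler) controls how merge operations are executed. Merges run in separate threads; once the maximum number of merge threads is reached, any further merge has to wait until one of the running threads finishes and becomes available.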
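The waiting behavior can be illustrated with a fixed-size thread pool. This is a minimal sketch and not how ConcurrentMergeScheduler is actually implemented; `BoundedMergeSketch` and `runMerges` are hypothetical names, and the sleep is a stand-in for merge work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: this is not how ConcurrentMergeScheduler is implemented.
final class BoundedMergeSketch {

    /**
     * Runs mergeCount simulated merges on at most maxThreadCount threads
     * and returns the highest number of merges observed running at once.
     */
    static int runMerges(int mergeCount, int maxThreadCount) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreadCount);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < mergeCount; i++) {
            // Tasks beyond maxThreadCount queue up until a thread is free.
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // stand-in for merge work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent merges: " + runMerges(6, 2));
    }
}
```

However many merges are submitted, the observed peak never exceeds the configured thread count.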

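The merge scheduler supports dynamically configuring the maximum number of threads through the index.merge.scheduler.max_thread_count setting. Its default value is Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2)). Spinning disks have slow read and write speeds, so a value that is too high consumes too much I/O and can affect other operations; on spinning disks it is best to set it to 1.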
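To make the default formula concrete, here is a small worked example that evaluates it for a few core counts. `MergeThreadDefaults` and `defaultMaxThreadCount` are hypothetical names used only for this illustration; the core count is passed in as a parameter instead of being read from the runtime.

```java
// Worked example of the default formula; the class and method names are illustrative.
final class MergeThreadDefaults {

    /** Half the cores, clamped to the range [1, 4]. */
    static int defaultMaxThreadCount(int coreCount) {
        return Math.max(1, Math.min(4, coreCount / 2));
    }

    public static void main(String[] args) {
        int[] coreCounts = {1, 2, 4, 8, 16, 32};
        for (int cores : coreCounts) {
            System.out.println(cores + " cores -> " + defaultMaxThreadCount(cores) + " merge threads");
        }
        // 1 -> 1, 2 -> 1, 4 -> 2, 8 -> 4, 16 -> 4, 32 -> 4
    }
}
```

The default never goes above 4, so adding cores beyond 8 does not increase the number of merge threads.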

The corresponding source code is:

Class file: org.apache.lucene.index.ConcurrentMergeScheduler.java

 /** Used for testing.
   * @lucene.internal 
   */
  public static final String DEFAULT_CPU_CORE_COUNT_PROPERTY = "lucene.cms.override_core_count";

 /** Sets max merges and threads to proper defaults for rotational
   *  or non-rotational storage.
   * (Sets the maximum number of merge threads and the maximum number of concurrent merges according to the disk type.)
   * @param spins true to set defaults best for traditional
   *        rotational storage (spinning disks),
   *        else false (e.g. for solid-state disks)
   */
  public synchronized void setDefaultMaxMergesAndThreads(boolean spins) {
    if (spins) { // defaults for spinning disks
      maxThreadCount = 1; // maximum number of merge threads
      maxMergeCount = 6;
    } else { // solid-state disks
      int coreCount = Runtime.getRuntime().availableProcessors();

      // Let tests override this to help reproducing a failure on a machine that has a different
      // core count than the one where the test originally failed:
      try {
        String value = System.getProperty(DEFAULT_CPU_CORE_COUNT_PROPERTY);
        // If the system property named by DEFAULT_CPU_CORE_COUNT_PROPERTY is set, use its value; otherwise keep coreCount
        if (value != null) {
          coreCount = Integer.parseInt(value);
        }
      } catch (Throwable ignored) {
      }

      maxThreadCount = Math.max(1, Math.min(4, coreCount/2));
      maxMergeCount = maxThreadCount+5;
    }
  }

org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.java

 void refreshConfig() {
        if (this.getMaxMergeCount() != config.getMaxMergeCount() || this.getMaxThreadCount() != config.getMaxThreadCount()) {
            this.setMaxMergesAndThreads(config.getMaxMergeCount(), config.getMaxThreadCount());
        }
        boolean isEnabled = getIORateLimitMBPerSec() != Double.POSITIVE_INFINITY;
        if (config.isAutoThrottle() && isEnabled == false) {
            enableAutoIOThrottle();
        } else if (config.isAutoThrottle() == false && isEnabled) {
            disableAutoIOThrottle();
        }
    }

org.elasticsearch.index.MergeSchedulerConfig.java setting definitions:

  public static final Setting<Integer> MAX_THREAD_COUNT_SETTING =
        new Setting<>("index.merge.scheduler.max_thread_count",
            (s) -> Integer.toString(Math.max(1, Math.min(4, EsExecutors.boundedNumberOfProcessors(s) / 2))),
            (s) -> Setting.parseInt(s, 1, "index.merge.scheduler.max_thread_count"), Property.Dynamic,
            Property.IndexScope);
    public static final Setting<Integer> MAX_MERGE_COUNT_SETTING =
        new Setting<>("index.merge.scheduler.max_merge_count",
            (s) -> Integer.toString(MAX_THREAD_COUNT_SETTING.get(s) + 5),
            (s) -> Setting.parseInt(s, 1, "index.merge.scheduler.max_merge_count"), Property.Dynamic, Property.IndexScope);
    public static final Setting<Boolean> AUTO_THROTTLE_SETTING =
        Setting.boolSetting("index.merge.scheduler.auto_throttle", true, Property.Dynamic, Property.IndexScope);

    private volatile boolean autoThrottle;
    private volatile int maxThreadCount;
    private volatile int maxMergeCount;
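The settings above use derived defaults: max_merge_count falls back to max_thread_count + 5 when it is not set explicitly. The sketch below models that fallback chain with a plain map. It is a simplified illustration and does not use the Elasticsearch Setting API; `DerivedMergeDefaults` and its methods are hypothetical, the processor count is passed in as a parameter, and the minimum-value validation from the original definitions is omitted.

```java
import java.util.Map;

// Simplified model of derived defaults; it does not use the Elasticsearch Setting API.
final class DerivedMergeDefaults {
    static final String MAX_THREAD_COUNT = "index.merge.scheduler.max_thread_count";
    static final String MAX_MERGE_COUNT = "index.merge.scheduler.max_merge_count";

    /** Explicit value if present, otherwise half the processors clamped to [1, 4]. */
    static int maxThreadCount(Map<String, String> settings, int processors) {
        String explicit = settings.get(MAX_THREAD_COUNT);
        return explicit != null
            ? Integer.parseInt(explicit)
            : Math.max(1, Math.min(4, processors / 2));
    }

    /** Explicit value if present, otherwise the effective thread count plus 5. */
    static int maxMergeCount(Map<String, String> settings, int processors) {
        String explicit = settings.get(MAX_MERGE_COUNT);
        return explicit != null
            ? Integer.parseInt(explicit)
            : maxThreadCount(settings, processors) + 5;
    }

    public static void main(String[] args) {
        // Nothing set on an 8-core machine: 4 threads, 9 merges.
        System.out.println(maxMergeCount(Map.of(), 8));
        // Thread count pinned to 1: the merge count default follows it and becomes 6.
        System.out.println(maxMergeCount(Map.of(MAX_THREAD_COUNT, "1"), 8));
    }
}
```

Pinning the thread count to 1 yields a merge count of 6, the same pair of values that setDefaultMaxMergesAndThreads uses for spinning disks.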