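In Elasticsearch, each shard is a complete Lucene index, and a Lucene index is in turn broken down into many segments. A segment is the lowest-level storage unit of an index and is immutable. To keep the number of segments within bounds and to physically remove documents that have been marked as deleted, small segments are periodically merged into larger ones. The merge process automatically balances merging against other operations (such as search) according to the hardware configuration.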
Merge scheduling
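The merge scheduler (ConcurrentMergeScheduler) controls how merge operations are executed. Merges run in separate threads; once the maximum number of threads is reached, further merges must wait until a running merge thread finishes and becomes available.
The maximum thread count can be changed dynamically through the index.merge.scheduler.max_thread_count setting. Its default value is Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2)). Spinning disks read and write slowly, so a value that is too high generates heavy I/O that can interfere with other operations; on such disks it is best to set it to 1.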
The corresponding source code:
Class file: org.apache.lucene.index.ConcurrentMergeScheduler.java
/** Used for testing.
 * @lucene.internal
 */
public static final String DEFAULT_CPU_CORE_COUNT_PROPERTY = "lucene.cms.override_core_count";

/** Sets max merges and threads to proper defaults for rotational
 * or non-rotational storage.
 * (Annotation: picks the max merge thread count and the max number of
 * in-flight merges according to the disk type.)
 * @param spins true to set defaults best for traditional
 *        rotational storage (spinning disks),
 *        else false (e.g. for solid-state disks)
 */
public synchronized void setDefaultMaxMergesAndThreads(boolean spins) {
    if (spins) { // defaults for spinning disks
        maxThreadCount = 1; // max merge threads
        maxMergeCount = 6;
    } else { // solid-state disks
        int coreCount = Runtime.getRuntime().availableProcessors();
        // Let tests override this to help reproducing a failure on a machine that has a different
        // core count than the one where the test originally failed:
        try {
            String value = System.getProperty(DEFAULT_CPU_CORE_COUNT_PROPERTY);
            // If the system property DEFAULT_CPU_CORE_COUNT_PROPERTY is set, use it; otherwise keep coreCount
            if (value != null) {
                coreCount = Integer.parseInt(value);
            }
        } catch (Throwable ignored) {
        }
        maxThreadCount = Math.max(1, Math.min(4, coreCount/2));
        maxMergeCount = maxThreadCount+5;
    }
}
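The default selection above can be restated as two pure functions to see what it yields on different machines. This is only a minimal sketch and not Lucene code: the class name MergeDefaults and both method names are hypothetical, and the logic simply mirrors the two branches of setDefaultMaxMergesAndThreads.

```java
public class MergeDefaults {

    /** Default max merge threads: 1 on spinning disks, else max(1, min(4, cores / 2)). */
    static int defaultMaxThreadCount(boolean spins, int coreCount) {
        return spins ? 1 : Math.max(1, Math.min(4, coreCount / 2));
    }

    /** Default max merge count: 6 on spinning disks, else thread count + 5. */
    static int defaultMaxMergeCount(boolean spins, int coreCount) {
        return spins ? 6 : defaultMaxThreadCount(false, coreCount) + 5;
    }

    public static void main(String[] args) {
        // Print the defaults for a few illustrative core counts.
        for (int cores : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf(
                "cores=%d  ssd: threads=%d merges=%d  hdd: threads=%d merges=%d%n",
                cores,
                defaultMaxThreadCount(false, cores), defaultMaxMergeCount(false, cores),
                defaultMaxThreadCount(true, cores), defaultMaxMergeCount(true, cores));
        }
    }
}
```

With solid-state storage, a 4-core machine gets 2 threads and 7 merges, and the thread count never exceeds 4 however many cores are available.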
Class file: org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.java
void refreshConfig() {
    if (this.getMaxMergeCount() != config.getMaxMergeCount() || this.getMaxThreadCount() != config.getMaxThreadCount()) {
        this.setMaxMergesAndThreads(config.getMaxMergeCount(), config.getMaxThreadCount());
    }
    boolean isEnabled = getIORateLimitMBPerSec() != Double.POSITIVE_INFINITY;
    if (config.isAutoThrottle() && isEnabled == false) {
        enableAutoIOThrottle();
    } else if (config.isAutoThrottle() == false && isEnabled) {
        disableAutoIOThrottle();
    }
}
Class file: org.elasticsearch.index.MergeSchedulerConfig.java, setting definitions:
public static final Setting<Integer> MAX_THREAD_COUNT_SETTING =
    new Setting<>("index.merge.scheduler.max_thread_count",
        (s) -> Integer.toString(Math.max(1, Math.min(4, EsExecutors.boundedNumberOfProcessors(s) / 2))),
        (s) -> Setting.parseInt(s, 1, "index.merge.scheduler.max_thread_count"), Property.Dynamic,
        Property.IndexScope);
public static final Setting<Integer> MAX_MERGE_COUNT_SETTING =
    new Setting<>("index.merge.scheduler.max_merge_count",
        (s) -> Integer.toString(MAX_THREAD_COUNT_SETTING.get(s) + 5),
        (s) -> Setting.parseInt(s, 1, "index.merge.scheduler.max_merge_count"), Property.Dynamic, Property.IndexScope);
public static final Setting<Boolean> AUTO_THROTTLE_SETTING =
    Setting.boolSetting("index.merge.scheduler.auto_throttle", true, Property.Dynamic, Property.IndexScope);
private volatile boolean autoThrottle;
private volatile int maxThreadCount;
private volatile int maxMergeCount;
On my local 4-core PC, this formula works out to exactly 2 threads.
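Because index.merge.scheduler.max_thread_count is declared with Property.Dynamic, the computed default can be overridden on a live index, for example to pin it to 1 on spinning disks. The sketch below only builds the request for the index settings update endpoint (PUT /<index>/_settings) and does not send it. The class name MergeThreadSetting, the address http://localhost:9200, and the index name my_index are assumptions for illustration; it needs Java 11 or later for java.net.http.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class MergeThreadSetting {

    /** Builds the JSON body that sets the merge thread count for an index. */
    static String settingsBody(int maxThreadCount) {
        return "{\"index.merge.scheduler.max_thread_count\": " + maxThreadCount + "}";
    }

    /** Builds (but does not send) the PUT request against the index settings endpoint. */
    static HttpRequest buildRequest(String baseUrl, String index, int maxThreadCount) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(maxThreadCount)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical node address and index name; adjust to the real cluster.
        HttpRequest request = buildRequest("http://localhost:9200", "my_index", 1);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(settingsBody(1));
        // To apply it, send the request with java.net.http.HttpClient, e.g.
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Keeping the request construction separate from sending makes it easy to inspect the exact URL and body before touching a running cluster.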