_forcemerge API source code analysis
The source code analyzed below is from Elasticsearch 6.7.2.
Merge modes
RestForceMergeAction provides the entry point for handling the API. After some simple parameter handling it forwards the request to TransportForceMergeAction for further processing, which exposes a shard-level method shardOperation. That method looks up the corresponding shard through indicesService and then invokes InternalEngine.forceMerge.
public void forceMerge(final boolean flush, int maxNumSegments, boolean onlyExpungeDeletes,
final boolean upgrade, final boolean upgradeOnlyAncientSegments) throws EngineException, IOException {
assert indexWriter.getConfig().getMergePolicy() instanceof ElasticsearchMergePolicy : "MergePolicy is " +
indexWriter.getConfig().getMergePolicy().getClass().getName();
// get the merge policy
ElasticsearchMergePolicy mp = (ElasticsearchMergePolicy) indexWriter.getConfig().getMergePolicy();
optimizeLock.lock();
try {
ensureOpen();
if (upgrade) {
logger.info("starting segment upgrade upgradeOnlyAncientSegments={}", upgradeOnlyAncientSegments);
mp.setUpgradeInProgress(true, upgradeOnlyAncientSegments);
}
store.incRef(); // increment the ref just to ensure nobody closes the store while we optimize
try {
// merges fall into three cases: 1. only expunge deleted docs 2. no maximum segment count configured 3. maximum segment count configured
if (onlyExpungeDeletes) {
assert upgrade == false;
indexWriter.forceMergeDeletes(true /* blocks and waits for merges*/);
} else if (maxNumSegments <= 0) {
assert upgrade == false;
indexWriter.maybeMerge();
} else {
indexWriter.forceMerge(maxNumSegments, true /* blocks and waits for merges*/);
}
if (flush) {
if (tryRenewSyncCommit() == false) {
flush(false, true);
}
}
if (upgrade) {
logger.info("finished segment upgrade");
}
} finally {
store.decRef();
}
} catch (AlreadyClosedException ex) {
/* in this case we first check if the engine is still open. If so this exception is just fine
* and expected. We don't hold any locks while we block on forceMerge otherwise it would block
* closing the engine as well. If we are not closed we pass it on to failOnTragicEvent which ensures
* we are handling a tragic even exception here */
ensureOpen(ex);
failOnTragicEvent(ex);
throw ex;
} catch (Exception e) {
try {
maybeFailEngine("force merge", e);
} catch (Exception inner) {
e.addSuppressed(inner);
}
throw e;
} finally {
try {
// reset it just to make sure we reset it in a case of an error
mp.setUpgradeInProgress(false, false);
} finally {
optimizeLock.unlock();
}
}
}
The code above shows that segment merging in ES falls into three categories (a small sketch of how the request parameters select between them follows this list):
- merging only to expunge deleted documents
- merging without a limit on the maximum number of segments
- merging with a limit on the maximum number of segments
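For orientation, here is a minimal sketch (not ES source code; the class and method names are made up for illustration) of how the force-merge request parameters only_expunge_deletes and max_num_segments select between these branches, mirroring InternalEngine.forceMerge above.
class ForceMergeBranches {
    // maps the REST-level parameters onto the three merge branches
    static String pickBranch(boolean onlyExpungeDeletes, int maxNumSegments) {
        if (onlyExpungeDeletes) {
            return "IndexWriter.forceMergeDeletes";                    // 1. only expunge deleted documents
        } else if (maxNumSegments <= 0) {
            return "IndexWriter.maybeMerge";                           // 2. no limit on the segment count
        } else {
            return "IndexWriter.forceMerge(" + maxNumSegments + ")";   // 3. merge down to at most maxNumSegments
        }
    }
}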
Merging only to expunge deleted documents
This is a blocking merge that waits for completion. It first flushes all in-memory buffered updates (adds and deletes) to the directory, triggers the maybe-merge logic, and updates the list of pending merges. When forceMergeDeletes is called, a segment is only merged away if its percentage of deleted documents exceeds the threshold; the default is 10%.
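A quick worked example of that threshold with made-up numbers:
public class ExpungeDeletesThreshold {
    public static void main(String[] args) {
        double forceMergeDeletesPctAllowed = 10.0; // the default threshold
        int maxDoc = 1_000_000;                    // hypothetical segment: one million docs
        int delCount = 85_000;                     // of which 85,000 are marked deleted
        double pctDeletes = 100.0 * delCount / maxDoc;               // 8.5%
        boolean eligible = pctDeletes > forceMergeDeletesPctAllowed; // false -> the segment is left alone
        System.out.println(pctDeletes + "% deleted, eligible=" + eligible);
    }
}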
IndexWriter.forceMergeDeletes
public void forceMergeDeletes(boolean doWait)
throws IOException {
// make sure the writer is still open and usable
ensureOpen();
// first flush the current in-memory buffer to physical files
flush(true, true);
if (infoStream.isEnabled("IW")) {
infoStream.message("IW", "forceMergeDeletes: index now " + segString());
}
final MergePolicy mergePolicy = config.getMergePolicy();
// MergeSpecification is a static inner class that describes the merges to run
MergePolicy.MergeSpecification spec;
boolean newMergesFound = false;
synchronized(this) {
// the forcedMergePolicy inside EsTieredMergePolicy is used to find segments that can be merged; when EsTieredMergePolicy instantiates that TieredMergePolicy it raises the maximum segment size from Lucene's default 5GB to unlimited
spec = mergePolicy.findForcedDeletesMerges(segmentInfos, this);
newMergesFound = spec != null;
if (newMergesFound) {
final int numMerges = spec.merges.size();
for(int i=0;i<numMerges;i++)
// check whether this merge involves segments already taking part in another merge; if not, the merge is "registered" (its segments are recorded as merging) and true is returned, otherwise (merge conflict) false
registerMerge(spec.merges.get(i));
}
}
// the merge scheduler runs the segment merges on merge threads
mergeScheduler.merge(this, MergeTrigger.EXPLICIT, newMergesFound);
if (spec != null && doWait) {
final int numMerges = spec.merges.size();
synchronized(this) {
boolean running = true;
while(running) {
if (tragedy.get() != null) {
throw new IllegalStateException("this writer hit an unrecoverable error; cannot complete forceMergeDeletes", tragedy.get());
}
// check each merge the MergePolicy asked us to run, to see whether any is still running or has hit an exception
running = false;
for(int i=0;i<numMerges;i++) {
final MergePolicy.OneMerge merge = spec.merges.get(i);
if (pendingMerges.contains(merge) || runningMerges.contains(merge)) {
running = true;
}
Throwable t = merge.getException();
if (t != null) {
throw new IOException("background merge hit exception: " + merge.segString(), t);
}
}
// If any of our merges are still running, wait:
if (running)
doWait();
}
}
}
// NOTE: with ConcurrentMergeScheduler, when doWait is false we can return immediately while background threads finish the merges
}
TieredMergePolicy.findForcedDeletesMerges
@Override
public MergeSpecification findForcedDeletesMerges(SegmentInfos infos, MergeContext mergeContext) throws IOException {
if (verbose(mergeContext)) {
message("findForcedDeletesMerges infos=" + segString(mergeContext, infos) + " forceMergeDeletesPctAllowed=" + forceMergeDeletesPctAllowed, mergeContext);
}
// First do a quick check that there's any work to do.
// NOTE: this makes BaseMergePOlicyTestCase.testFindForcedDeletesMerges work
final Set<SegmentCommitInfo> merging = mergeContext.getMergingSegments();
boolean haveWork = false;
for(SegmentCommitInfo info : infos) {
// count the documents marked as deleted in this segment
int delCount = mergeContext.numDeletesToMerge(info);
assert assertDelCount(delCount, info);
// compute the percentage of deleted documents in the segment
double pctDeletes = 100.*((double) delCount)/info.info.maxDoc();
// forceMergeDeletesPctAllowed defaults to 10, i.e. a segment only becomes eligible when more than 10% of its documents are deleted
if (pctDeletes > forceMergeDeletesPctAllowed && !merging.contains(info)) {
haveWork = true;
break;
}
}
if (haveWork == false) {
return null;
}
// sizes can change concurrently while we run here, because deletes are now applied concurrently, which could make TimSort fail! So we call size() once per segment and sort on that:
List<SegmentSizeAndDocs> sortedInfos = getSortedBySegmentSize(infos, mergeContext);
Iterator<SegmentSizeAndDocs> iter = sortedInfos.iterator();
while (iter.hasNext()) {
SegmentSizeAndDocs segSizeDocs = iter.next();
double pctDeletes = 100. * ((double) segSizeDocs.delCount / (double) segSizeDocs.maxDoc);
// skip this segment if it is already merging or its deleted-doc percentage is below the threshold
if (merging.contains(segSizeDocs.segInfo) || pctDeletes <= forceMergeDeletesPctAllowed) {
iter.remove();
}
}
if (verbose(mergeContext)) {
message("eligible=" + sortedInfos, mergeContext);
}
// maxMergeAtOnceExplicit is the maximum number of segments merged at once for explicit merges, default 30
// maxMergedSegmentBytes is the maximum segment size
return doFindMerges(sortedInfos, maxMergedSegmentBytes,
maxMergeAtOnceExplicit, Integer.MAX_VALUE, 0, MERGE_TYPE.FORCE_MERGE_DELETES, mergeContext, false);
}
TieredMergePolicy.doFindMerges
private MergeSpecification doFindMerges(List<SegmentSizeAndDocs> sortedEligibleInfos,
final long maxMergedSegmentBytes,
final int mergeFactor, final int allowedSegCount,
final int allowedDelCount, final MERGE_TYPE mergeType,
MergeContext mergeContext,
boolean maxMergeIsRunning) throws IOException {
List<SegmentSizeAndDocs> sortedEligible = new ArrayList<>(sortedEligibleInfos);
Map<SegmentCommitInfo, SegmentSizeAndDocs> segInfosSizes = new HashMap<>();
for (SegmentSizeAndDocs segSizeDocs : sortedEligible) {
segInfosSizes.put(segSizeDocs.segInfo, segSizeDocs);
}
int originalSortedSize = sortedEligible.size();
if (verbose(mergeContext)) {
message("findMerges: " + originalSortedSize + " segments", mergeContext);
}
if (originalSortedSize == 0) {
return null;
}
final Set<SegmentCommitInfo> toBeMerged = new HashSet<>();
MergeSpecification spec = null;
// loop to possibly pick multiple merges:
// the trigger on total deleted documents in the index can cause a batch of large-segment merges at once, so only put one large merge into the merge list per cycle; we will pick up another one next time
boolean haveOneLargeMerge = false;
while (true) {
// Gather eligible segments for merging, ie segments
// not already being merged and not already picked (by
// prior iteration of this loop) for merging:
// remove ineligible segments: they are either already merging or were already picked by a prior iteration
Iterator<SegmentSizeAndDocs> iter = sortedEligible.iterator();
while (iter.hasNext()) {
SegmentSizeAndDocs segSizeDocs = iter.next();
if (toBeMerged.contains(segSizeDocs.segInfo)) {
iter.remove();
}
}
if (verbose(mergeContext)) {
message(" allowedSegmentCount=" + allowedSegCount + " vs count=" + originalSortedSize + " (eligible count=" + sortedEligible.size() + ")", mergeContext);
}
if (sortedEligible.size() == 0) {
return spec;
}
final int remainingDelCount = sortedEligible.stream().mapToInt(c -> c.delCount).sum();
// allowedSegCount is the index-wide segment budget; for NATURAL merges, if the number of eligible segments is within the budget and the remaining deleted documents are within allowedDelCount, merge selection for this round ends
if (mergeType == MERGE_TYPE.NATURAL &&
sortedEligible.size() <= allowedSegCount &&
remainingDelCount <= allowedDelCount) {
return spec;
}
// OK we are over budget -- find best merge!
MergeScore bestScore = null;
List<SegmentCommitInfo> best = null;
boolean bestTooLarge = false;
long bestMergeBytes = 0;
for (int startIdx = 0; startIdx < sortedEligible.size(); startIdx++) {
long totAfterMergeBytes = 0;
final List<SegmentCommitInfo> candidate = new ArrayList<>();
boolean hitTooLarge = false;
long bytesThisMerge = 0;
// mergeFactor is the number of segments merged at once
for (int idx = startIdx; idx < sortedEligible.size() && candidate.size() < mergeFactor && bytesThisMerge < maxMergedSegmentBytes; idx++) {
final SegmentSizeAndDocs segSizeDocs = sortedEligible.get(idx);
final long segBytes = segSizeDocs.sizeInBytes;
if (totAfterMergeBytes + segBytes > maxMergedSegmentBytes) {
// when the limit is exceeded, mark this candidate as a too-large merge
hitTooLarge = true;
if (candidate.size() == 0) {
// We should never have something coming in that _cannot_ be merged, so handle singleton merges
candidate.add(segSizeDocs.segInfo);
bytesThisMerge += segBytes;
}
// NOTE: we continue so we can try to "pack" smaller segments into this merge and get closer to the max size; this is rarely perfect since it is really bin-packing and we would have to try different permutations
continue;
}
candidate.add(segSizeDocs.segInfo);
bytesThisMerge += segBytes;
totAfterMergeBytes += segBytes;
}
// We should never see an empty candidate: we iterated over maxMergeAtOnce
// segments, and already pre-excluded the too-large segments:
assert candidate.size() > 0;
// a singleton merge that does not reclaim deleted documents is pointless; this case can be hit while forceMerge is running
if (candidate.size() == 1) {
SegmentSizeAndDocs segSizeDocs = segInfosSizes.get(candidate.get(0));
if (segSizeDocs.delCount == 0) {
continue;
}
}
// if we did not hit a too-large merge and the candidate list is shorter than the merge factor, we have reached the tail of the segment list and would only find smaller merges; stop searching for this round
if (bestScore != null &&
hitTooLarge == false &&
candidate.size() < mergeFactor) {
break;
}
// score this candidate
final MergeScore score = score(candidate, hitTooLarge, segInfosSizes);
if (verbose(mergeContext)) {
message(" maybe=" + segString(mergeContext, candidate) + " score=" + score.getScore() + " " + score.getExplanation() + " tooLarge=" + hitTooLarge + " size=" + String.format(Locale.ROOT, "%.3f MB", totAfterMergeBytes/1024./1024.), mergeContext);
}
if ((bestScore == null || score.getScore() < bestScore.getScore()) && (!hitTooLarge || !maxMergeIsRunning)) {
best = candidate;
bestScore = score;
bestTooLarge = hitTooLarge;
bestMergeBytes = totAfterMergeBytes;
}
}
if (best == null) {
return spec;
}
// FORCE_MERGE_DELETES keeps the current behavior and may create many concurrent large merges; if findForcedDeletesMerges is ever made to loop like findForcedMerges, this special case should be removed
if (haveOneLargeMerge == false || bestTooLarge == false || mergeType == MERGE_TYPE.FORCE_MERGE_DELETES) {
haveOneLargeMerge |= bestTooLarge;
if (spec == null) {
spec = new MergeSpecification();
}
final OneMerge merge = new OneMerge(best);
spec.add(merge);
if (verbose(mergeContext)) {
message(" add merge=" + segString(mergeContext, merge.segments) + " size=" + String.format(Locale.ROOT, "%.3f MB", bestMergeBytes / 1024. / 1024.) + " score=" + String.format(Locale.ROOT, "%.3f", bestScore.getScore()) + " " + bestScore.getExplanation() + (bestTooLarge ? " [max merge]" : ""), mergeContext);
}
}
// whether or not we return this list in the spec, it must be removed from consideration on the next loop
toBeMerged.addAll(best);
}
}
TieredMergePolicy.score
protected MergeScore score(List<SegmentCommitInfo> candidate, boolean hitTooLarge, Map<SegmentCommitInfo, SegmentSizeAndDocs> segmentsSizes) throws IOException {
// total size before the merge
long totBeforeMergeBytes = 0;
// total size after the merge
long totAfterMergeBytes = 0;
// total floored size after the merge
long totAfterMergeBytesFloored = 0;
for(SegmentCommitInfo info : candidate) {
final long segBytes = segmentsSizes.get(info).sizeInBytes;
totAfterMergeBytes += segBytes;
// floorSize is Math.max(floorSegmentBytes, bytes); floorSegmentBytes defaults to 2MB
totAfterMergeBytesFloored += floorSize(segBytes);
totBeforeMergeBytes += info.sizeInBytes();
}
// roughly measure the "skew" of the merge, i.e. how "balanced" it is (are the segments about the same size), ranging from 1.0/numSegsBeingMerged (good) to 1.0 (poor); heavily unbalanced merges (skew near 1.0) are bad because they mean O(N^2) merge cost over time:
final double skew;
// if the candidate hit the too-large limit, use the smaller of segsPerTier and maxMergeAtOnce as the merge factor to compute the skew
if (hitTooLarge) {
// pretend the merge has perfect skew; skew does not matter in this case because this merge will not "cascade" and so will not lead to N^2 merge cost over time:
final int mergeFactor = (int) Math.min(maxMergeAtOnce, segsPerTier);
skew = 1.0/mergeFactor;
} else {
skew = ((double) floorSize(segmentsSizes.get(candidate.get(0)).sizeInBytes)) / totAfterMergeBytesFloored;
}
// strongly favor merges with less skew (smaller mergeScore is better):
double mergeScore = skew;
// gently favor smaller merges over bigger ones; we do not want to make the exponent too large or we might end up doing poor merges of small segments just to avoid the large ones
mergeScore *= Math.pow(totAfterMergeBytes, 0.05);
// strongly favor merges that reclaim deletes:
final double nonDelRatio = ((double) totAfterMergeBytes)/totBeforeMergeBytes;
mergeScore *= Math.pow(nonDelRatio, 2);
final double finalMergeScore = mergeScore;
return new MergeScore() {
@Override
public double getScore() {
return finalMergeScore;
}
@Override
public String getExplanation() {
return "skew=" + String.format(Locale.ROOT, "%.3f", skew) + " nonDelRatio=" + String.format(Locale.ROOT, "%.3f", nonDelRatio);
}
};
}
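To make the scoring formula concrete, here is a worked example with hypothetical segment sizes (a standalone sketch, not part of the Lucene source):
public class MergeScoreExample {
    public static void main(String[] args) {
        long floorSegmentBytes = 2L << 20;                      // 2 MB floor, the default
        long[] segBytes = {512L << 20, 256L << 20, 64L << 20};  // candidate segment sizes after deletes
        long totAfterMergeBytes = 0;
        long totAfterMergeBytesFloored = 0;
        for (long bytes : segBytes) {
            totAfterMergeBytes += bytes;
            totAfterMergeBytesFloored += Math.max(floorSegmentBytes, bytes);
        }
        long totBeforeMergeBytes = 960L << 20;                  // sizes including deleted docs
        // largest segment over the floored total: ~0.615 (values near 1.0 mean a badly skewed merge)
        double skew = (double) Math.max(floorSegmentBytes, segBytes[0]) / totAfterMergeBytesFloored;
        double nonDelRatio = (double) totAfterMergeBytes / totBeforeMergeBytes; // ~0.867
        double mergeScore = skew * Math.pow(totAfterMergeBytes, 0.05) * Math.pow(nonDelRatio, 2);
        System.out.println("mergeScore=" + mergeScore);         // ~1.3; smaller is better
    }
}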
Merging without a limit on the maximum number of segments
This path runs with maxNumSegments effectively set to -1 (unbounded).
IndexWriter.updatePendingMerges
private synchronized boolean updatePendingMerges(MergePolicy mergePolicy, MergeTrigger trigger, int maxNumSegments)
throws IOException {
// In case infoStream was disabled on init, but then enabled at some
// point, try again to log the config here:
messageState();
// UNBOUNDED_MAX_MERGE_SEGMENTS is -1
assert maxNumSegments == UNBOUNDED_MAX_MERGE_SEGMENTS || maxNumSegments > 0;
assert trigger != null;
if (stopMerges) {
return false;
}
// Do not start new merges if disaster struck
if (tragedy.get() != null) {
return false;
}
boolean newMergesFound = false;
final MergePolicy.MergeSpecification spec;
if (maxNumSegments != UNBOUNDED_MAX_MERGE_SEGMENTS) {
assert trigger == MergeTrigger.EXPLICIT || trigger == MergeTrigger.MERGE_FINISHED :
"Expected EXPLICT or MERGE_FINISHED as trigger even with maxNumSegments set but was: " + trigger.name();
spec = mergePolicy.findForcedMerges(segmentInfos, maxNumSegments, Collections.unmodifiableMap(segmentsToMerge), this);
newMergesFound = spec != null;
if (newMergesFound) {
final int numMerges = spec.merges.size();
for(int i=0;i<numMerges;i++) {
final MergePolicy.OneMerge merge = spec.merges.get(i);
merge.maxNumSegments = maxNumSegments;
}
}
} else {
// merge without a configured maximum number of segments
spec = mergePolicy.findMerges(trigger, segmentInfos, this);
}
newMergesFound = spec != null;
if (newMergesFound) {
final int numMerges = spec.merges.size();
for(int i=0;i<numMerges;i++) {
registerMerge(spec.merges.get(i));
}
}
return newMergesFound;
}
TieredMergePolicy.findMerges
@Override
public MergeSpecification findMerges(MergeTrigger mergeTrigger, SegmentInfos infos, MergeContext mergeContext) throws IOException {
final Set<SegmentCommitInfo> merging = mergeContext.getMergingSegments();
// compute the total index bytes and print detailed information about the index
long totIndexBytes = 0;
long minSegmentBytes = Long.MAX_VALUE;
int totalDelDocs = 0;
int totalMaxDoc = 0;
long mergingBytes = 0;
// sort segments by size in descending order
List<SegmentSizeAndDocs> sortedInfos = getSortedBySegmentSize(infos, mergeContext);
Iterator<SegmentSizeAndDocs> iter = sortedInfos.iterator();
while (iter.hasNext()) {
SegmentSizeAndDocs segSizeDocs = iter.next();
final long segBytes = segSizeDocs.sizeInBytes;
if (verbose(mergeContext)) {
String extra = merging.contains(segSizeDocs.segInfo) ? " [merging]" : "";
// segments larger than maxMergedSegmentBytes (5GB by default) or smaller than floorSegmentBytes (2MB by default) get an extra note in the log output
if (segBytes >= maxMergedSegmentBytes) {
extra += " [skip: too large]";
} else if (segBytes < floorSegmentBytes) {
extra += " [floored]";
}
message(" seg=" + segString(mergeContext, Collections.singleton(segSizeDocs.segInfo)) + " size=" + String.format(Locale.ROOT, "%.3f", segBytes / 1024 / 1024.) + " MB" + extra, mergeContext);
}
if (merging.contains(segSizeDocs.segInfo)) {
mergingBytes += segSizeDocs.sizeInBytes;
iter.remove();
// if this segment is already merging, its deleted documents are being reclaimed
// so only count its live documents towards totalMaxDoc
totalMaxDoc += segSizeDocs.maxDoc - segSizeDocs.delCount;
} else {
totalDelDocs += segSizeDocs.delCount;
totalMaxDoc += segSizeDocs.maxDoc;
}
minSegmentBytes = Math.min(segBytes, minSegmentBytes);
totIndexBytes += segBytes;
}
assert totalMaxDoc >= 0;
assert totalDelDocs >= 0;
final double totalDelPct = 100 * (double) totalDelDocs / totalMaxDoc;
int allowedDelCount = (int) (deletesPctAllowed * totalMaxDoc / 100);
// If we have too-large segments, grace them out of the maximum segment count
// If we're above certain thresholds of deleted docs, we can merge very large segments.
int tooBigCount = 0;
iter = sortedInfos.iterator();
// remove large segments from consideration in two cases:
// 1> the overall percentage of deleted documents is relatively small and the segment is larger than 50% of maxSegSize
// 2> the segment's own percentage of deleted documents is small and the segment is larger than 50% of maxSegSize
while (iter.hasNext()) {
SegmentSizeAndDocs segSizeDocs = iter.next();
double segDelPct = 100 * (double) segSizeDocs.delCount / (double) segSizeDocs.maxDoc;
if (segSizeDocs.sizeInBytes > maxMergedSegmentBytes / 2 && (totalDelPct <= deletesPctAllowed || segDelPct <= deletesPctAllowed)) {
iter.remove();
tooBigCount++; // Just for reporting purposes.
totIndexBytes -= segSizeDocs.sizeInBytes;
allowedDelCount -= segSizeDocs.delCount;
}
}
allowedDelCount = Math.max(0, allowedDelCount);
final int mergeFactor = (int) Math.min(maxMergeAtOnce, segsPerTier);
// compute how many segments the index is allowed to have
long levelSize = Math.max(minSegmentBytes, floorSegmentBytes);
long bytesLeft = totIndexBytes;
double allowedSegCount = 0;
while (true) {
// remaining mergeable bytes / current level size = how many segments of this size level would fit
final double segCountLevel = bytesLeft / (double) levelSize;
if (segCountLevel < segsPerTier || levelSize == maxMergedSegmentBytes) {
allowedSegCount += Math.ceil(segCountLevel);
break;
}
allowedSegCount += segsPerTier;
bytesLeft -= segsPerTier * levelSize;
// the next tier's level size is the current level size times the merge factor, capped at maxMergedSegmentBytes
levelSize = Math.min(maxMergedSegmentBytes, levelSize * mergeFactor);
}
// if the computed budget is smaller than segsPerTier (e.g. all segments are below the floor size), allow at least segsPerTier segments
allowedSegCount = Math.max(allowedSegCount, segsPerTier);
if (verbose(mergeContext) && tooBigCount > 0) {
message(" allowedSegmentCount=" + allowedSegCount + " vs count=" + infos.size() +
" (eligible count=" + sortedInfos.size() + ") tooBigCount= " + tooBigCount, mergeContext);
}
return doFindMerges(sortedInfos, maxMergedSegmentBytes, mergeFactor, (int) allowedSegCount, allowedDelCount, MERGE_TYPE.NATURAL,
mergeContext, mergingBytes >= maxMergedSegmentBytes);
}
The doFindMerges method is analyzed above under ***Merging only to expunge deleted documents***.
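Before moving on, here is a worked example of the segment-budget loop in findMerges above, using the default floor_segment (2MB), segments_per_tier (10), max_merge_at_once (10) and max_merged_segment (5GB) with hypothetical index sizes:
public class SegmentBudgetExample {
    public static void main(String[] args) {
        long floorSegmentBytes = 2L << 20;           // 2 MB
        long maxMergedSegmentBytes = 5L << 30;       // 5 GB
        double segsPerTier = 10;
        int mergeFactor = 10;                        // min(maxMergeAtOnce, segsPerTier)
        long minSegmentBytes = 1L << 20;             // smallest live segment: 1 MB
        long totIndexBytes = 20L << 30;              // 20 GB of eligible segments
        long levelSize = Math.max(minSegmentBytes, floorSegmentBytes);
        long bytesLeft = totIndexBytes;
        double allowedSegCount = 0;
        while (true) {
            double segCountLevel = bytesLeft / (double) levelSize;
            if (segCountLevel < segsPerTier || levelSize == maxMergedSegmentBytes) {
                allowedSegCount += Math.ceil(segCountLevel);
                break;
            }
            allowedSegCount += segsPerTier;          // this tier may hold segsPerTier segments
            bytesLeft -= segsPerTier * levelSize;
            levelSize = Math.min(maxMergedSegmentBytes, levelSize * mergeFactor); // next tier is 10x larger
        }
        allowedSegCount = Math.max(allowedSegCount, segsPerTier);
        // tiers of 2 MB, 20 MB and 200 MB contribute 10 each; the ~2 GB remainder adds 10 more -> 40
        System.out.println("allowedSegCount=" + (int) allowedSegCount);
    }
}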
Merging with a limit on the maximum number of segments
IndexWriter.forceMerge
public void forceMerge(int maxNumSegments, boolean doWait) throws IOException {
ensureOpen();
// the maximum segment count must not be less than 1 here
if (maxNumSegments < 1) {
throw new IllegalArgumentException("maxNumSegments must be >= 1; got " + maxNumSegments);
}
if (infoStream.isEnabled("IW")) {
infoStream.message("IW", "forceMerge: index now " + segString());
infoStream.message("IW", "now flush at forceMerge");
}
// first flush the current in-memory buffer to physical files
flush(true, true);
// resetting merge exceptions, clearing the merge bookkeeping and marking forced merges must be done under the lock to avoid concurrency issues
synchronized(this) {
// reset merge exceptions
resetMergeExceptions();
// clear the merge bookkeeping
segmentsToMerge.clear();
for(SegmentCommitInfo info : segmentInfos) {
assert info != null;
segmentsToMerge.put(info, Boolean.TRUE);
}
// maximum number of segments to merge down to
mergeMaxNumSegments = maxNumSegments;
// now mark all pending and running merges as forced merges:
for(final MergePolicy.OneMerge merge : pendingMerges) {
merge.maxNumSegments = maxNumSegments;
if (merge.info != null) {
// TODO: explain why this is sometimes still null
segmentsToMerge.put(merge.info, Boolean.TRUE);
}
}
for (final MergePolicy.OneMerge merge: runningMerges) {
merge.maxNumSegments = maxNumSegments;
if (merge.info != null) {
// TODO: explain why this is sometimes still null
segmentsToMerge.put(merge.info, Boolean.TRUE);
}
}
}
// this goes through the same path as the unbounded case, except maxNumSegments is now a concrete value
maybeMerge(config.getMergePolicy(), MergeTrigger.EXPLICIT, maxNumSegments);
// when ES calls this method doWait is true, so the call blocks
// if any merge hits an exception, the wait is aborted and the exception is rethrown
if (doWait) {
synchronized(this) {
while(true) {
if (tragedy.get() != null) {
throw new IllegalStateException("this writer hit an unrecoverable error; cannot complete forceMerge", tragedy.get());
}
if (mergeExceptions.size() > 0) {
// Forward any exceptions in background merge
// threads to the current thread:
final int size = mergeExceptions.size();
for(int i=0;i<size;i++) {
final MergePolicy.OneMerge merge = mergeExceptions.get(i);
if (merge.maxNumSegments != UNBOUNDED_MAX_MERGE_SEGMENTS) {
throw new IOException("background merge hit exception: " + merge.segString(), merge.getException());
}
}
}
// returns true if any merge in pendingMerges or runningMerges is a maxNumSegments merge
if (maxNumSegmentsMergesPending())
// waits up to 1 second
doWait();
else
break;
}
}
// If close is called while we are still
// running, throw an exception so the calling
// thread will know merging did not
// complete
ensureOpen();
}
// NOTE: in the ConcurrentMergeScheduler case, when
// doWait is false, we can return immediately while
// background threads accomplish the merging
}
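For reference, doWait() used in the loop above is a small private helper in IndexWriter. Paraphrased from the Lucene source, it simply waits up to one second on the writer's monitor, so the loop re-checks the pending and running merges about once per second:
private synchronized void doWait() {
    // wait at most 1 second as a defence against missed notifyAll() calls,
    // then return so the caller can re-check its wait condition
    try {
        wait(1000);
    } catch (InterruptedException ie) {
        throw new ThreadInterruptedException(ie);
    }
}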
maybeMerge internally still calls updatePendingMerges, but this time it takes the branch that calls findForcedMerges.
ElasticsearchMergePolicy.findForcedMerges
@Override
public MergeSpecification findForcedMerges(SegmentInfos segmentInfos,
int maxSegmentCount, Map<SegmentCommitInfo,Boolean> segmentsToMerge, MergeContext mergeContext)
throws IOException {
// handles the segment-upgrade case; when the merge is forced through the API, upgradeInProgress is false
if (upgradeInProgress) {
MergeSpecification spec = new MergeSpecification();
for (SegmentCommitInfo info : segmentInfos) {
if (shouldUpgrade(info)) {
// TODO: Use IndexUpgradeMergePolicy instead. We should be comparing codecs,
// for now we just assume every minor upgrade has a new format.
logger.debug("Adding segment {} to be upgraded", info.info.name);
spec.add(new OneMerge(Collections.singletonList(info)));
}
// TODO: we could check IndexWriter.getMergingSegments and avoid adding merges that IW will just reject?
// at most 5 segments are upgraded at a time
if (spec.merges.size() == MAX_CONCURRENT_UPGRADE_MERGES) {
// hit our max upgrades, so return the spec. we will get a cascaded call to continue.
logger.debug("Returning {} merges for upgrade", spec.merges.size());
return spec;
}
}
// We must have less than our max upgrade merges, so the next return will be our last in upgrading mode.
if (spec.merges.isEmpty() == false) {
logger.debug("Returning {} merges for end of upgrade", spec.merges.size());
return spec;
}
// only set this once there are 0 segments left to upgrade, because when we return a spec, IndexWriter may (silently!) reject that merge if some of the segments we asked to be merged were already (naturally) merged
upgradeInProgress = false;
// fall through, so when we do not have any segments to upgrade, the delegate policy gets a chance to decide what to do (e.g. collapse segments to satisfy maxSegmentCount)
}
return super.findForcedMerges(segmentInfos, maxSegmentCount, segmentsToMerge, mergeContext);
}
TieredMergePolicy.findForcedMerges
public MergeSpecification findForcedMerges(SegmentInfos infos, int maxSegmentCount, Map<SegmentCommitInfo,Boolean> segmentsToMerge, MergeContext mergeContext) throws IOException {
if (verbose(mergeContext)) {
message("findForcedMerges maxSegmentCount=" + maxSegmentCount + " infos=" + segString(mergeContext, infos) +
" segmentsToMerge=" + segmentsToMerge, mergeContext);
}
// sort segments by size from largest to smallest
List<SegmentSizeAndDocs> sortedSizeAndDocs = getSortedBySegmentSize(infos, mergeContext);
long totalMergeBytes = 0;
final Set<SegmentCommitInfo> merging = mergeContext.getMergingSegments();
// trim the list: drop segments that are not in the original set and those already merging
Iterator<SegmentSizeAndDocs> iter = sortedSizeAndDocs.iterator();
while (iter.hasNext()) {
SegmentSizeAndDocs segSizeDocs = iter.next();
final Boolean isOriginal = segmentsToMerge.get(segSizeDocs.segInfo);
// in what situation would this be null???
if (isOriginal == null) {
iter.remove();
} else {
// drop segments that are already in the merging set
if (merging.contains(segSizeDocs.segInfo)) {
iter.remove();
} else {
totalMergeBytes += segSizeDocs.sizeInBytes;
}
}
}
long maxMergeBytes = maxMergedSegmentBytes;
// set the maximum merged-segment size based on the requested segment count
if (maxSegmentCount == 1) maxMergeBytes = Long.MAX_VALUE;
else if (maxSegmentCount != Integer.MAX_VALUE) {
// Fudge this up a bit so we have a better chance of not having to rewrite segments. If we use the exact size,
// it's almost guaranteed that the segments won't fit perfectly and we'll be left with more segments than
// we want and have to re-merge in the code at the bottom of this method.
// take the larger of (total merge bytes / requested segment count) and the configured maximum segment size, then pad it by 25% -- i.e. the resulting segments may exceed the configured maximum by up to 25%
maxMergeBytes = Math.max((long) (((double) totalMergeBytes / (double) maxSegmentCount)), maxMergedSegmentBytes);
maxMergeBytes = (long) ((double) maxMergeBytes * 1.25);
}
iter = sortedSizeAndDocs.iterator();
boolean foundDeletes = false;
while (iter.hasNext()) {
SegmentSizeAndDocs segSizeDocs = iter.next();
Boolean isOriginal = segmentsToMerge.get(segSizeDocs.segInfo);
// if the segment is one of the original segments and has deleted documents, it stays in the candidate list; this is forceMerge, so all segments with deleted docs should be merged
if (segSizeDocs.delCount != 0) { // This is forceMerge, all segments with deleted docs should be merged.
if (isOriginal != null && isOriginal) {
foundDeletes = true;
}
continue;
}
// Let the scoring handle whether to merge large segments.
// when the requested segment count is effectively unlimited, segments that are not original are removed
if (maxSegmentCount == Integer.MAX_VALUE && isOriginal != null && isOriginal == false) {
iter.remove();
}
// do not try to merge segments without deletes that already exceed the maximum merge size
if (maxSegmentCount != Integer.MAX_VALUE && segSizeDocs.sizeInBytes >= maxMergeBytes) {
iter.remove();
}
}
// no eligible segments remain
if (sortedSizeAndDocs.size() == 0) {
return null;
}
// when there are no deletes to reclaim, make sure we do not perform a pointless merge
if (foundDeletes == false) {
SegmentCommitInfo infoZero = sortedSizeAndDocs.get(0).segInfo;
if ((maxSegmentCount != Integer.MAX_VALUE && maxSegmentCount > 1 && sortedSizeAndDocs.size() <= maxSegmentCount) ||
(maxSegmentCount == 1 && sortedSizeAndDocs.size() == 1 && (segmentsToMerge.get(infoZero) != null || isMerged(infos, infoZero, mergeContext)))) {
if (verbose(mergeContext)) {
message("already merged", mergeContext);
}
return null;
}
}
if (verbose(mergeContext)) {
message("eligible=" + sortedSizeAndDocs, mergeContext);
}
// special case of merging down to one segment: when max_num_segments is 1, the eligible list has fewer segments than one explicit merge can take, and their total size is below the maximum merge size, just return all of them as a single merge and skip the search
if (sortedSizeAndDocs.size() < maxMergeAtOnceExplicit && maxSegmentCount == 1 && totalMergeBytes < maxMergeBytes) {
MergeSpecification spec = new MergeSpecification();
List<SegmentCommitInfo> allOfThem = new ArrayList<>();
for (SegmentSizeAndDocs segSizeDocs : sortedSizeAndDocs) {
allOfThem.add(segSizeDocs.segInfo);
}
spec.add(new OneMerge(allOfThem));
return spec;
}
MergeSpecification spec = doFindMerges(sortedSizeAndDocs, maxMergeBytes, maxMergeAtOnceExplicit,
maxSegmentCount, 0, MERGE_TYPE.FORCE_MERGE, mergeContext, false);
return spec;
}
The doFindMerges method is analyzed above under ***Merging only to expunge deleted documents***.
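A quick worked example of the maxMergeBytes adjustment above (hypothetical sizes):
public class MaxMergeBytesExample {
    public static void main(String[] args) {
        long maxMergedSegmentBytes = 5L << 30;   // configured maximum: 5 GB
        long totalMergeBytes = 40L << 30;        // 40 GB of eligible segments
        int maxSegmentCount = 4;                 // _forcemerge?max_num_segments=4
        long maxMergeBytes = Math.max(totalMergeBytes / maxSegmentCount, maxMergedSegmentBytes); // 10 GB
        maxMergeBytes = (long) (maxMergeBytes * 1.25);  // padded by 25% -> 12.5 GB
        System.out.println("maxMergeBytes=" + maxMergeBytes);
    }
}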
Merge policy
Dynamic merge policy settings
- index.merge.policy.expunge_deletes_allowed: when expungeDeletes is called, a segment is only merged away if its percentage of deleted documents is over this threshold. Default: 10.
- index.merge.policy.floor_segment: segments smaller than this are "rounded up" to this size, i.e. treated as equal (floor) size for merge selection. This prevents frequent flushing of tiny segments from producing a long tail of segments in the index. Default: 2MB.
- index.merge.policy.max_merge_at_once: maximum number of segments to merge at once during "normal" merging. Default: 10.
- index.merge.policy.max_merge_at_once_explicit: maximum number of segments to merge at once during a force merge or expungeDeletes. Default: 30.
- index.merge.policy.max_merged_segment: maximum segment produced during normal merging (not an explicit force merge). This setting is approximate: the merged segment size is estimated by summing the sizes of the segments being merged, compensating for the percentage of deleted documents. Default: 5GB.
- index.merge.policy.segments_per_tier: sets the number of segments allowed per tier. Smaller values mean more merging but fewer segments. Default: 10. Note that this value needs to be at least as large as max_merge_at_once, otherwise you will force too many merges to occur.
- index.merge.policy.deletes_pct_allowed: controls the maximum percentage of deleted documents allowed in the index. Lower values make the index more space-efficient at the cost of increased CPU and I/O activity. The value must be between 20 and 50. Default: 33.
To optimize for indexing throughput you can raise index.merge.policy.segments_per_tier somewhat, e.g. to 20, to reduce the number of merge operations, and at the same time lower index.merge.policy.max_merged_segment, e.g. to 2GB, to speed up writes.
How ES wraps the Lucene merge policy
All of the settings above live in MergePolicyConfig and are index-level settings; MergePolicyConfig is also where the EsTieredMergePolicy object is initialized.
public static final Setting<Double> INDEX_COMPOUND_FORMAT_SETTING =
new Setting<>("index.compound_format", Double.toString(TieredMergePolicy.DEFAULT_NO_CFS_RATIO),
MergePolicyConfig::parseNoCFSRatio, Property.Dynamic, Property.IndexScope);
public static final Setting<Double> INDEX_MERGE_POLICY_EXPUNGE_DELETES_ALLOWED_SETTING =
Setting.doubleSetting("index.merge.policy.expunge_deletes_allowed", DEFAULT_EXPUNGE_DELETES_ALLOWED, 0.0d,
Property.Dynamic, Property.IndexScope);
public static final Setting<ByteSizeValue> INDEX_MERGE_POLICY_FLOOR_SEGMENT_SETTING =
Setting.byteSizeSetting("index.merge.policy.floor_segment", DEFAULT_FLOOR_SEGMENT,
Property.Dynamic, Property.IndexScope);
public static final Setting<Integer> INDEX_MERGE_POLICY_MAX_MERGE_AT_ONCE_SETTING =
Setting.intSetting("index.merge.policy.max_merge_at_once", DEFAULT_MAX_MERGE_AT_ONCE, 2,
Property.Dynamic, Property.IndexScope);
public static final Setting<Integer> INDEX_MERGE_POLICY_MAX_MERGE_AT_ONCE_EXPLICIT_SETTING =
Setting.intSetting("index.merge.policy.max_merge_at_once_explicit", DEFAULT_MAX_MERGE_AT_ONCE_EXPLICIT, 2,
Property.Dynamic, Property.IndexScope);
public static final Setting<ByteSizeValue> INDEX_MERGE_POLICY_MAX_MERGED_SEGMENT_SETTING =
Setting.byteSizeSetting("index.merge.policy.max_merged_segment", DEFAULT_MAX_MERGED_SEGMENT,
Property.Dynamic, Property.IndexScope);
public static final Setting<Double> INDEX_MERGE_POLICY_SEGMENTS_PER_TIER_SETTING =
Setting.doubleSetting("index.merge.policy.segments_per_tier", DEFAULT_SEGMENTS_PER_TIER, 2.0d,
Property.Dynamic, Property.IndexScope);
public static final Setting<Double> INDEX_MERGE_POLICY_RECLAIM_DELETES_WEIGHT_SETTING =
Setting.doubleSetting("index.merge.policy.reclaim_deletes_weight", DEFAULT_RECLAIM_DELETES_WEIGHT, 0.0d,
Property.Dynamic, Property.IndexScope, Property.Deprecated);
public static final Setting<Double> INDEX_MERGE_POLICY_DELETES_PCT_ALLOWED_SETTING =
Setting.doubleSetting("index.merge.policy.deletes_pct_allowed", DEFAULT_DELETES_PCT_ALLOWED, 20.0d, 50.0d,
Property.Dynamic, Property.IndexScope);
Both EsTieredMergePolicy and ElasticsearchMergePolicy extend FilterMergePolicy. EsTieredMergePolicy constructs two TieredMergePolicy objects, regularMergePolicy and forcedMergePolicy, used for regular merges and forced merges respectively; for forced merges the maximum segment size limit is removed.
EsTieredMergePolicy() {
super(new TieredMergePolicy());
regularMergePolicy = (TieredMergePolicy) in;
forcedMergePolicy = new TieredMergePolicy();
forcedMergePolicy.setMaxMergedSegmentMB(Double.POSITIVE_INFINITY); // unlimited
}
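The delegation this enables can be pictured with the following schematic sketch (not the actual ES source, which wires this up through FilterMergePolicy): natural merges go through the regular policy, while forced merges go through the policy whose maximum segment size has been lifted.
import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.index.TieredMergePolicy;

class DelegatingPolicySketch {
    private final TieredMergePolicy regularMergePolicy = new TieredMergePolicy();
    private final TieredMergePolicy forcedMergePolicy = new TieredMergePolicy();

    DelegatingPolicySketch() {
        forcedMergePolicy.setMaxMergedSegmentMB(Double.POSITIVE_INFINITY); // unlimited, as in EsTieredMergePolicy
    }

    MergePolicy.MergeSpecification findMerges(MergeTrigger trigger, SegmentInfos infos,
                                              MergePolicy.MergeContext ctx) throws IOException {
        return regularMergePolicy.findMerges(trigger, infos, ctx);          // regular (natural) merges
    }

    MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos, int maxSegmentCount,
            Map<SegmentCommitInfo, Boolean> segmentsToMerge, MergePolicy.MergeContext ctx) throws IOException {
        return forcedMergePolicy.findForcedMerges(infos, maxSegmentCount, segmentsToMerge, ctx); // forced merges
    }
}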
For normal merging, the policy first computes a "budget" of how many segments the index is allowed to have. If the index is over budget, the policy sorts segments by decreasing size (pro-rating by the percentage of deletes) and then finds the least-cost merge. Merge cost is measured as a combination of the merge's "skew" (size of the largest segment divided by the smallest), the total merge size, and the percentage of deletes reclaimed, so merges with lower skew, smaller size, and more reclaimed deletes are favored.
If a merge would produce a segment larger than max_merged_segment, the policy will merge fewer segments (down to one at a time, if that segment has deletes) to keep the merged segment within the budget.
Note that this can mean that for large shards holding many gigabytes of data, the default max_merged_segment (5gb) may leave the index with many segments and slow searches down. Use the index segments API to see the segments an index has, and either increase max_merged_segment or issue an optimize (force merge) call for the index (ideally during low-traffic hours).
Merge scheduling
The merge scheduler (ConcurrentMergeScheduler) controls the execution of merge operations when they are needed (according to the merge policy). Merges run in separate threads, and when the maximum number of threads is reached, further merges wait until a merge thread becomes available.
Dynamic scheduler settings
- index.merge.scheduler.max_thread_count: the maximum number of threads that may run merges at once. Defaults to Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2)), which works well for a good solid-state drive (SSD). If your index is on spinning-platter (mechanical) drives, decrease this to 1.
- index.merge.scheduler.auto_throttle: if true (the default), the merge scheduler limits the merge IO (write) rate to an adaptive value depending on how many merges are being requested over time. An application with a low indexing rate that unluckily suddenly requires a large merge will see that merge heavily throttled, while an application doing heavy indexing will see the throttle adjust upward so merges can keep up with ongoing writes.
- index.merge.scheduler.max_merge_count: the maximum number of merges that may run at once. Defaults to max_thread_count + 5; the minimum value is 1.
A worked example of these defaults follows this list.
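public class MergeSchedulerDefaultsExample {
    public static void main(String[] args) {
        int availableProcessors = 16;                                            // hypothetical core count
        int maxThreadCount = Math.max(1, Math.min(4, availableProcessors / 2));  // -> 4
        int maxMergeCount = maxThreadCount + 5;                                   // -> 9
        System.out.println(maxThreadCount + " merge threads, up to " + maxMergeCount + " concurrent merges");
    }
}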
These settings are all initialized in MergeSchedulerConfig.
public static final Setting<Integer> MAX_THREAD_COUNT_SETTING =
new Setting<>("index.merge.scheduler.max_thread_count",
(s) -> Integer.toString(Math.max(1, Math.min(4, EsExecutors.numberOfProcessors(s) / 2))),
(s) -> Setting.parseInt(s, 1, "index.merge.scheduler.max_thread_count"), Property.Dynamic,
Property.IndexScope);
public static final Setting<Integer> MAX_MERGE_COUNT_SETTING =
new Setting<>("index.merge.scheduler.max_merge_count",
(s) -> Integer.toString(MAX_THREAD_COUNT_SETTING.get(s) + 5),
(s) -> Setting.parseInt(s, 1, "index.merge.scheduler.max_merge_count"), Property.Dynamic, Property.IndexScope);
public static final Setting<Boolean> AUTO_THROTTLE_SETTING =
Setting.boolSetting("index.merge.scheduler.auto_throttle", true, Property.Dynamic, Property.IndexScope);
Usage scenario
When an index in the cluster holds a large number of deleted documents, you can proactively call the force merge API during off-peak hours to speed up segment merging and keep shard sizes and storage utilization healthy.
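As a closing illustration, here is a sketch of issuing such a call from the 6.x Java high-level REST client (assuming its indices().forcemerge API); the cluster address and index name are hypothetical, and only_expunge_deletes / max_num_segments map to the branches analyzed above.
import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.forcemerge.ForceMergeRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class OffPeakForceMerge {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            ForceMergeRequest request = new ForceMergeRequest("my-index"); // hypothetical index name
            request.onlyExpungeDeletes(true);   // branch 1: only reclaim deleted documents
            // request.maxNumSegments(1);       // alternatively, branch 3: merge down to one segment
            client.indices().forcemerge(request, RequestOptions.DEFAULT);
        }
    }
}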