Elasticsearch-PEER RECOVERY (Part 3)

cigarL

Original article: https://blog.csdn.net/weixin_43211119/article/details/103886322

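First, a note on how chunk concurrency is limited, namely cancellableThreads.execute(() -> requestSeqIdTracker.waitForOpsToComplete(requestSeqId - maxConcurrentFileChunks)); the tracker's checkpoint is updated each time a chunk write succeeds. The checkpoint is the highest seqNo for which that request and every earlier one have completed, so it only advances over a gap-free run of finished writes.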

public synchronized void waitForOpsToComplete(final long seqNo) throws InterruptedException {
    while (checkpoint < seqNo) {
        // notified by updateCheckpoint
        this.wait();
    }
}
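To make the checkpoint behaviour concrete, here is a minimal, self-contained sketch of such a tracker. It is not Elasticsearch's own tracker class: SimpleSeqTracker, generateSeqNo, getCheckpoint and markSeqNoAsCompleted are hypothetical names, and the sketch assumes the checkpoint semantics described above (highest seqNo below which nothing is still pending). Only the waitForOpsToComplete loop is taken from the excerpt.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical, simplified sequence-number tracker (not Elasticsearch's class).
 * checkpoint = highest seqNo such that it and every lower seqNo have completed.
 */
final class SimpleSeqTracker {
    private long nextSeqNo = 0;                 // next request id to hand out
    private long checkpoint = -1;               // -1: nothing completed yet
    private final Set<Long> completedAboveCheckpoint = new HashSet<>();

    synchronized long generateSeqNo() {
        return nextSeqNo++;
    }

    synchronized long getCheckpoint() {
        return checkpoint;
    }

    /** Called from a chunk write's success callback in this sketch. */
    synchronized void markSeqNoAsCompleted(long seqNo) {
        if (seqNo <= checkpoint) {
            return; // already covered by the checkpoint
        }
        completedAboveCheckpoint.add(seqNo);
        long before = checkpoint;
        // Advance only over a gap-free run: a finished seqNo 1 does not help while seqNo 0 is pending.
        while (completedAboveCheckpoint.remove(checkpoint + 1)) {
            checkpoint++;
        }
        if (checkpoint != before) {
            notifyAll(); // wake senders blocked in waitForOpsToComplete
        }
    }

    /** Same loop as the excerpt: block until checkpoint >= seqNo. */
    synchronized void waitForOpsToComplete(long seqNo) throws InterruptedException {
        while (checkpoint < seqNo) {
            wait();
        }
    }
}
```

In this sketch a sender would call generateSeqNo(), then waitForOpsToComplete(seqNo - maxConcurrentFileChunks) before sending the chunk, and markSeqNoAsCompleted(seqNo) once the target acknowledges the write.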

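For example:
1. A new tracker starts with checkpoint = -1 and nextSeqNo = 0; assume the chunk concurrency uses its default value of 2.
2. The blocking condition is checkpoint < requestSeqId - maxConcurrentFileChunks, which can be rewritten as checkpoint + maxConcurrentFileChunks < requestSeqId.
3. First request: checkpoint = -1, requestSeqId = 0, maxConcurrentFileChunks = 2. Is 0 > -1 + 2? No, so it does not block and continues.
4. Second request: is 1 > -1 + 2? No, so it does not block and continues (the first write has not finished yet, so checkpoint is still -1).
5. Third request: is 2 > -1 + 2? Yes, so it blocks (the earlier writes have not finished yet, so checkpoint is still -1).
6. Once the first write finishes and the checkpoint is updated (-1 -> 0), the condition becomes 2 > 0 + 2, which is false, so the third request stops blocking and continues.
...
So checkpoint and seqNo together cap the number of chunks being copied at the same time. The cap is maxConcurrentFileChunks, i.e. the value of the setting indices.recovery.max_concurrent_file_chunks.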
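The steps above can be replayed mechanically. ChunkGate below is a hypothetical helper, not Elasticsearch code; it only evaluates the condition checkpoint < requestSeqId - maxConcurrentFileChunks and counts how many requests past the checkpoint get through before one blocks.

```java
/**
 * Hypothetical helper replaying the blocking condition from the worked example.
 * A request blocks while checkpoint < requestSeqId - maxConcurrentFileChunks.
 */
final class ChunkGate {
    static boolean mustWait(long checkpoint, long requestSeqId, int maxConcurrentFileChunks) {
        return checkpoint < requestSeqId - maxConcurrentFileChunks;
    }

    /** Number of requests after the checkpoint that can be sent before the next one blocks. */
    static int requestsAllowedAhead(long checkpoint, int maxConcurrentFileChunks) {
        int allowed = 0;
        for (long seq = checkpoint + 1; !mustWait(checkpoint, seq, maxConcurrentFileChunks); seq++) {
            allowed++;
        }
        return allowed;
    }
}
```

Whatever the checkpoint, requestsAllowedAhead returns maxConcurrentFileChunks, which is exactly the concurrency cap the steps arrive at.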

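recoveryTarget.writeFileChunk sends a transport request (internal:index/shard/recovery/file_chunk). Before sending, it calls Lucene's rate limiter pause to throttle how many bytes are copied per second. Next, let's look at how the copied files are handled.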
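A byte-rate throttle of this kind can be sketched as below. SketchRateLimiter is hypothetical: it only illustrates the idea behind Lucene's RateLimiter#pause(long bytes) and is not Lucene's implementation. The clock is passed in explicitly (an assumption of this sketch) so the arithmetic can be checked without sleeping. In Elasticsearch the recovery rate itself is configured with indices.recovery.max_bytes_per_sec.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical byte-rate throttle (not Lucene's code). Each call "reserves" time for
 * the bytes about to be sent; callers that run ahead of the rate must wait.
 */
final class SketchRateLimiter {
    private final double bytesPerSec;
    private long lastNS; // point in time up to which earlier bytes are already paid for

    SketchRateLimiter(double bytesPerSec, long startNS) {
        this.bytesPerSec = bytesPerSec;
        this.lastNS = startNS;
    }

    /** Nanoseconds to wait before sending {@code bytes}, given the current time {@code nowNS}. */
    synchronized long pauseNanos(long bytes, long nowNS) {
        long targetNS = lastNS + (long) (bytes / bytesPerSec * 1_000_000_000L);
        if (nowNS >= targetNS) {
            lastNS = nowNS;      // idle long enough: no wait, restart accounting from now
            return 0;
        }
        lastNS = targetNS;       // reserve budget up to targetNS
        return targetNS - nowNS;
    }

    /** Sleeps as needed before sending {@code bytes}; returns the nanoseconds waited. */
    long pause(long bytes) throws InterruptedException {
        long waitNS = pauseNanos(bytes, System.nanoTime());
        if (waitNS > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNS);
        }
        return waitNS;
    }
}
```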
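On the source node each file is split into chunks, which are sent to the target node hosting the replica shard. On the target, each file has its own FileChunkWriter, which buffers that file's incoming FileChunks in a PriorityQueue ordered by position. The target uses position to determine the chunk order and writes the chunks in sequence. lastPosition records the offset the target has written up to; if the smallest buffered position does not match it, the writer returns and waits until the expected chunk arrives. For example, suppose a file is split into 5 chunks that arrive in the order 1, 2, 3, 5, 4, and each chunk can be written instantly. Chunks 1, 2 and 3 are written as soon as they arrive. Chunk 5 arrives fourth and is not the expected chunk, so the writer returns and leaves it buffered. When chunk 4 arrives, it is written, and chunk 5 is written right after it. At most 2 chunks are ever buffered at once.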
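When writing, if position is 0 (the file's first chunk), the target first creates the file (an empty file plus a new IndexOutput instance) and then writes to it; otherwise it writes to the existing file directly. Both paths go through Lucene: get the file's IndexOutput and call writeBytes to write the data to local disk.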
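The create-or-reuse rule can be sketched in plain Java. ChunkFileStore and writeToOutput are hypothetical names, and an in-memory ByteArrayOutputStream stands in for Lucene's IndexOutput. The extra position check and the "finalize on last chunk" step are simplifications of this sketch, not behaviour described above.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the target's per-file outputs. */
final class ChunkFileStore {
    private final Map<String, ByteArrayOutputStream> openOutputs = new HashMap<>();
    final Map<String, byte[]> completedFiles = new HashMap<>();

    void writeToOutput(String name, long position, byte[] content, boolean lastChunk) {
        ByteArrayOutputStream out;
        if (position == 0) {
            // First chunk: create an empty file and open an output for it.
            out = new ByteArrayOutputStream();
            openOutputs.put(name, out);
        } else {
            // Later chunk: reuse the output opened for position 0.
            out = openOutputs.get(name);
            if (out == null) {
                throw new IllegalStateException("no open output for " + name);
            }
        }
        if (out.size() != position) {
            throw new IllegalStateException("unexpected position " + position + " for " + name);
        }
        out.write(content, 0, content.length); // plays the role of IndexOutput#writeBytes
        if (lastChunk) {
            // In this sketch the last chunk simply finalizes the file.
            openOutputs.remove(name);
            completedFiles.put(name, out.toByteArray());
        }
    }
}
```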
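Once the files have been written, the target's store directory is cleaned up: files that do not exist on the source node are deleted, because after the copy the primary and replica data files must be identical.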
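The cleanup rule boils down to a set difference: whatever the target has that the source does not. TargetCleanup.filesToDelete is a hypothetical helper; the real code works on store metadata and a Lucene Directory, which this sketch leaves out, and the file names in the test are made up.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: which target files must go so the replica matches the source. */
final class TargetCleanup {
    static Set<String> filesToDelete(Set<String> sourceFiles, Set<String> targetFiles) {
        Set<String> toDelete = new TreeSet<>(targetFiles); // sorted for stable output
        toDelete.removeAll(sourceFiles);                    // keep only files unknown to the source
        return toDelete;
    }
}
```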
This concludes phase1.

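// Inner class of the enclosing writer (needs java.util.Comparator and java.util.PriorityQueue);
// fileChunkWriters and innerWriteFileChunk belong to the enclosing class.
private final class FileChunkWriter {
    // Chunks of a file can arrive out of order, so they are buffered here; position is used to detect gaps
    final PriorityQueue<FileChunk> pendingChunks = new PriorityQueue<>(Comparator.comparing(fc -> fc.position));
    // Expected position of the next chunk, i.e. how far this file has been written
    long lastPosition = 0;

    void writeChunk(FileChunk newChunk) throws IOException {
        // Chunks are sent concurrently, so several threads may call writeChunk at once: guard the queue
        synchronized (this) {
            pendingChunks.add(newChunk);
        }
        while (true) {
            final FileChunk chunk;
            synchronized (this) {
                // Look at the buffered chunk with the smallest position
                chunk = pendingChunks.peek();
                // Stop if nothing is buffered or the position is not the expected one. E.g. if a file is split
                // into two chunks and the 2nd arrives first, its position is not 0, so return and wait; once the
                // 1st (position 0, the expected value) arrives, it is written first, then the 2nd.
                if (chunk == null || chunk.position != lastPosition) {
                    return;
                }
                // This is the expected chunk: take it out of the queue
                pendingChunks.remove();
            }
            // Write it (the I/O runs outside the lock)
            innerWriteFileChunk(chunk.md, chunk.position, chunk.content, chunk.lastChunk);
            synchronized (this) {
                // Advance the expected position for the next chunk
                lastPosition += chunk.content.length();
                // After the last chunk the file is complete, so drop its writer
                if (chunk.lastChunk) {
                    fileChunkWriters.remove(chunk.md.name());
                }
            }
        }
    }
}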
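To check the 1, 2, 3, 5, 4 example from the text, here is a self-contained replay. ReorderingWriter is hypothetical: it keeps the same peek/compare/remove loop but records positions in a list instead of writing to disk, and for simplicity the whole method is synchronized. The chunk size of 10 bytes in the test is arbitrary.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical single-file reorder buffer; "writing" just records the chunk's position. */
final class ReorderingWriter {
    static final class Chunk {
        final long position;
        final int length;
        final boolean lastChunk;

        Chunk(long position, int length, boolean lastChunk) {
            this.position = position;
            this.length = length;
            this.lastChunk = lastChunk;
        }
    }

    private final PriorityQueue<Chunk> pending =
            new PriorityQueue<>(Comparator.comparingLong((Chunk c) -> c.position));
    private long lastPosition = 0;
    final List<Long> writtenPositions = new ArrayList<>(); // order in which chunks were "written"
    int maxBuffered = 0;                                  // largest queue size observed
    boolean done = false;

    synchronized void writeChunk(Chunk newChunk) {
        pending.add(newChunk);
        maxBuffered = Math.max(maxBuffered, pending.size());
        while (true) {
            Chunk chunk = pending.peek();
            if (chunk == null || chunk.position != lastPosition) {
                return; // gap: wait for the missing chunk
            }
            pending.remove();
            writtenPositions.add(chunk.position); // stand-in for the real write
            lastPosition += chunk.length;
            if (chunk.lastChunk) {
                done = true;
            }
        }
    }
}
```

Feeding chunks at positions 0, 10, 20, 40, 30 writes them in position order and never buffers more than 2 at once, matching the example.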

3.2.3 VERIFY_INDEX

After phase1 finishes, prepareTargetForTranslog sends a request to the target node: RemoteRecoveryTargetHandler#prepareForTranslogOperations sends a transport request (internal:index/shard/recovery/prepare_translog).
On the target node, the request is handled in PeerRecoveryTargetService.PrepareForTranslogOperationsRequestHandler#messageReceived. The code is deeply nested, so I will skip the setup and follow the call chain straight to the method that does the actual work: recoveryRef.target().prepareForTranslogOperations() -> indexShard().openEngineAndSkipTranslogRecovery() -> innerOpenEngineAndTranslog.
It first sets the stage to VERIFY_INDEX, i.e. recoveryState.setStage(RecoveryState.Stage.VERIFY_INDEX). The index is only actually verified when checkIndexOnStartup is "true" or "checksum"; otherwise the stage moves straight on to "TRANSLOG".
By default ES skips this check, and since this stage is mainly about verification, I will skip it for now and cover it later.

private void innerOpenEngineAndTranslog() throws IOException {
    if (state != IndexShardState.RECOVERING) {
        throw new IndexShardNotRecoveringException(shardId, state);
    }
    // Set the stage to VERIFY_INDEX
    recoveryState.setStage(RecoveryState.Stage.VERIFY_INDEX);
    // Only check the index if index.shard.check_on_startup is "true" or "checksum"; otherwise skip it
    // (allowed values: false, true, checksum; default false)
    if (Booleans.isTrue(checkIndexOnStartup) || "checksum".equals(checkIndexOnStartup)) {
        checkIndex();
    }
    // Set the stage to "TRANSLOG"
    recoveryState.setStage(RecoveryState.Stage.TRANSLOG);
    // ...
}
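The stage logic above can be reduced to a few lines. VerifyIndexSketch is hypothetical: it assumes Booleans.isTrue accepts only the literal string "true", replaces checkIndex() with a recorded event, and returns the sequence of stages so the behaviour for each setting value is easy to see.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the VERIFY_INDEX -> TRANSLOG transition; not Elasticsearch code. */
final class VerifyIndexSketch {
    enum Stage { VERIFY_INDEX, TRANSLOG }

    /** Assumes only the literal strings "true" and "checksum" trigger the check. */
    static boolean shouldCheckIndex(String checkOnStartup) {
        return "true".equals(checkOnStartup) || "checksum".equals(checkOnStartup);
    }

    /** Returns the stages entered, plus "checkIndex" when the (stubbed) check would run. */
    static List<String> open(String checkOnStartup) {
        List<String> events = new ArrayList<>();
        events.add(Stage.VERIFY_INDEX.name()); // always entered first
        if (shouldCheckIndex(checkOnStartup)) {
            events.add("checkIndex");          // placeholder for the real verification
        }
        events.add(Stage.TRANSLOG.name());
        return events;
    }
}
```

With the default value false, the shard passes through VERIFY_INDEX without doing any work and goes directly to TRANSLOG.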

3.2.4 TRANSLOG

3.2.5 FINALIZE

3.2.6 DONE
