MongoDB Chunk Migration: Principles and Source Code (3)

Cloud Computing and Databases

Article link: https://blog.csdn.net/dreamdaye123/article/details/105278247

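This article examines how MongoDB chunk migration works, covering the moveChunk process, the role of the config server, and the migration work done on the from shard and the to shard. It walks through each stage, from starting the migration and copying the data to committing the change, and discusses the state machine transitions and the key steps on both shards that keep the data consistent.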

move chunk

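moveChunk is a fairly complex operation. Its overall flow follows the chunk migration procedure introduced earlier.

moveChunk takes several parameters. For example, when _moveChunks calls MigrationManager::executeMigrationsForAutoBalance(), it passes the following two values:

balancerConfig->getSecondaryThrottle() returns the _secondaryThrottle setting. true means that while the balancer inserts migrated data, each write waits for acknowledgement from at least one secondary; false means it does not wait for the write to reach a secondary. The setting can also be given directly as a write concern, in which case the migration uses that write concern. The default was true in 3.2 and has been false since 3.4.

balancerConfig->waitForDelete() returns the waitForDelete setting, which controls whether a migration waits synchronously for the migrated chunk's data to be deleted from the from shard. The default is false, in which case a separate thread deletes the orphaned documents asynchronously.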
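
The three forms of the secondary throttle setting described above can be illustrated with a small sketch. This is not MongoDB's code: the types `WriteConcern`, `ThrottleMode`, and `SecondaryThrottle` and the function `effectiveWriteConcern` are hypothetical names, and the sketch assumes that "wait for at least one secondary" corresponds to w = 2 (the primary plus one secondary) and "do not wait" to w = 1.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins; these are NOT MongoDB's real classes.
struct WriteConcern {
    int w;               // number of nodes that must acknowledge each write
    std::string source;  // which setting produced this value
};

// The three forms the setting can take according to the text above.
enum class ThrottleMode { kOff, kOn, kCustom };

struct SecondaryThrottle {
    ThrottleMode mode;
    int customW;  // only meaningful when mode == ThrottleMode::kCustom
};

// Assumption: "wait for one secondary" is modelled as w = 2, "do not wait" as w = 1.
WriteConcern effectiveWriteConcern(const SecondaryThrottle& throttle) {
    switch (throttle.mode) {
        case ThrottleMode::kOn:
            return WriteConcern{2, "secondaryThrottle: true"};
        case ThrottleMode::kCustom:
            assert(throttle.customW >= 1);  // an explicit write concern must name at least one node
            return WriteConcern{throttle.customW, "explicit write concern"};
        case ThrottleMode::kOff:
            break;
    }
    return WriteConcern{1, "secondaryThrottle: false"};
}
```

With this model, `effectiveWriteConcern(SecondaryThrottle{ThrottleMode::kOn, 0})` yields w = 2, while the 3.4+ default (`kOff`) yields w = 1, so the migration does not block on replication.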

config server

int Balancer::_moveChunks(OperationContext* opCtx,
                          const BalancerChunkSelectionPolicy::MigrateInfoVector& candidateChunks) {
    auto migrationStatuses =
            _migrationManager.executeMigrationsForAutoBalance(opCtx,
                                                              candidateChunks,
                                                              balancerConfig->getMaxChunkSizeBytes(),               
                                                              balancerConfig->getSecondaryThrottle(),
                                                              balancerConfig->waitForDelete());

}

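***executeMigrationsForAutoBalance()*** takes the information for every chunk that needs to move (from shard, to shard, chunk), builds one chunk migration request per chunk, and sends it to the from shard. The from shard then drives the rest of the move chunk flow.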

MigrationStatuses MigrationManager::executeMigrationsForAutoBalance(
                                            OperationContext* opCtx,
                                            const vector<MigrateInfo>& migrateInfos,
                                            uint64_t maxChunkSizeBytes,
                                            const MigrationSecondaryThrottleOptions& secondaryThrottle,
                                            bool waitForDelete) {
    // Create a migration request for each chunk to be moved and send it to the from shard
    for (const auto& migrateInfo : migrateInfos) {
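        // Write a document to config.migrations in case this migration must later be recovered by the balancer. If the chunk is already being moved, record the failure and continue with the next one.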
        auto statusWithScopedMigrationRequest =
            ScopedMigrationRequest::writeMigration(opCtx, migrateInfo, waitForDelete);
        if (!statusWithScopedMigrationRequest.isOK()) {
            migrationStatuses.emplace(migrateInfo.getName(),
                                      std::move(statusWithScopedMigrationRequest.getStatus()));
            continue;
        }
        scopedMigrationRequests.emplace(migrateInfo.getName(),
                                        std::move(statusWithScopedMigrationRequest.getValue()));
        // Schedule this chunk migration
        responses.emplace_back(
            _schedule(opCtx, migrateInfo, maxChunkSizeBytes, secondaryThrottle, waitForDelete),
            migrateInfo);
    }
    
    // Wait for all scheduled migrations to finish and record their statuses
    for (auto& response : responses) {
        //......
    }
}
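
The two-phase shape of the function above (record and schedule every migration, then collect every result) can be sketched in a self-contained form. Everything here is a hypothetical stand-in: `MigrateInfo` is reduced to a name, the `inProgress` set plays the role of the config.migrations collection, and `moveOne` stands in for the remote moveChunk call. Unlike the real code, which schedules asynchronous remote commands, this sketch runs the pending tasks one after another in the collection phase.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real MigrateInfo: only the migration's unique name.
struct MigrateInfo {
    std::string name;
};

using Statuses = std::map<std::string, std::string>;

Statuses runMigrations(const std::vector<MigrateInfo>& infos,
                       std::set<std::string>& inProgress,
                       const std::function<std::string(const MigrateInfo&)>& moveOne) {
    Statuses statuses;
    std::vector<std::pair<std::string, std::function<std::string()>>> responses;

    // Phase 1: record each migration, then schedule it.
    for (const auto& info : infos) {
        // insert() fails if the chunk is already being moved: record that and skip it.
        if (!inProgress.insert(info.name).second) {
            statuses.emplace(info.name, "already in progress");
            continue;
        }
        responses.emplace_back(info.name, [&moveOne, info] { return moveOne(info); });
    }

    // Phase 2: wait for every scheduled migration and record its result.
    for (auto& response : responses) {
        statuses[response.first] = response.second();
        inProgress.erase(response.first);  // the migration record is no longer needed
    }
    return statuses;
}
```

A chunk whose record already exists is reported as "already in progress" and never scheduled, which mirrors the `continue` in the excerpt above.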

Next, _schedule builds a remote command and sends it to the from shard to trigger the migration.

shared_ptr<Notification<RemoteCommandResponse>> MigrationManager::_schedule(
    OperationContext* opCtx,
    const MigrateInfo& migrateInfo,
    uint64_t maxChunkSizeBytes,
    const MigrationSecondaryThrottleOptions& secondaryThrottle,
    bool waitForDelete) {
    //......
    
    // Build the "moveChunk" command
    BSONObjBuilder builder;
    MoveChunkRequest::appendAsCommand(
        &builder,
        nss,
        migrateInfo.version,
        repl::ReplicationCoordinator::get(opCtx)->getConfig().getConnectionString(),
        migrateInfo.from,
        migrateInfo.to,
        ChunkRange(migrateInfo.minKey, migrateInfo.maxKey),
        maxChunkSizeBytes,
        secondaryThrottle,
        waitForDelete);

    Migration migration(nss, builder.obj());

    // Send the command to the from shard host given by fromHostStatus.getValue(), which executes the moveChunk operation.
    _schedule(lock, opCtx, fromHostStatus.getValue(), std::move(migration));
}
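
To make the shape of the command more concrete, here is a minimal sketch that assembles the same arguments into an ordered list of key/value pairs. It does not use MongoDB's BSON classes. The struct `MoveChunkArgs`, the function `buildMoveChunkCommand`, and the field names are illustrative assumptions rather than the exact wire format, and the chunk version passed in the excerpt above is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// An ordered document: the command name must come first, so a std::map is not used.
using Field = std::pair<std::string, std::string>;
using CommandDoc = std::vector<Field>;

// Hypothetical plain-data view of the arguments passed to appendAsCommand.
struct MoveChunkArgs {
    std::string ns;             // namespace, e.g. "db.collection"
    std::string configConnStr;  // config server connection string
    std::string fromShard;
    std::string toShard;
    std::string minKey;         // lower bound of the chunk range
    std::string maxKey;         // upper bound of the chunk range
    std::uint64_t maxChunkSizeBytes;
    bool secondaryThrottle;
    bool waitForDelete;
};

// Field names below are illustrative assumptions.
CommandDoc buildMoveChunkCommand(const MoveChunkArgs& args) {
    CommandDoc doc;
    doc.emplace_back("moveChunk", args.ns);  // command name and target namespace
    doc.emplace_back("configdb", args.configConnStr);
    doc.emplace_back("fromShard", args.fromShard);
    doc.emplace_back("toShard", args.toShard);
    doc.emplace_back("min", args.minKey);
    doc.emplace_back("max", args.maxKey);
    doc.emplace_back("maxChunkSizeBytes", std::to_string(args.maxChunkSizeBytes));
    doc.emplace_back("secondaryThrottle", args.secondaryThrottle ? "true" : "false");
    doc.emplace_back("waitForDelete", args.waitForDelete ? "true" : "false");
    return doc;
}
```

The point of the sketch is that the config server only packages the decision (which chunk, from where, to where, with which options); the from shard that receives the document does the actual work.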

From this point on, the rest of the migration is carried out by the from shard and the to shard.

from shard

The from shard carries out the migration by executing the moveChunk command.

class MoveChunkCommand : public BasicCommand {
public:
    MoveChunkCommand() : BasicCommand("moveChunk") {}
    
    bool run(OperationContext* opCtx,
             const std::string& dbname,
             const BSONObj& cmdObj,
             BSONObjBuilder& result) override {
        //...... (moveChunkRequest is parsed from cmdObj; details elided)
        _runImpl(opCtx, moveChunkRequest);
        //......
    }
};

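MigrationSourceManager is the migration state machine on the from shard (donor) side. The object must be created and owned by a single thread, which controls its lifetime, and it should not be passed across threads. Unless explicitly stated otherwise, its methods must not be called from more than one thread and must not be called while any locks are held.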

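The intended workflow is as follows:

1. Acquire the distributed lock on the collection whose chunk is about to be moved.
2. Instantiate a MigrationSourceManager on the stack. This snapshots the latest collection metadata, which should stay stable because of the distributed collection lock.
3. Call startClone to start cloning the chunk contents in the background. This registers the cloner with the replication subsystem and starts listening for document changes, while also responding to data fetch requests from the recipient.
4. Call awaitUntilCriticalSectionIsAppropriate to wait until the cloning process has caught up sufficiently, so that the server is not kept in the read-only state for too long.
5. Call enterCriticalSection to put the shard into "read-only" mode while the to shard drains the latest changes.
6. Call commitDonateChunk to commit the chunk move to the config server's metadata and leave the read-only (critical section) mode.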

The states for these stages are:
enum State { kCreated, kCloning, kCloneCaughtUp, kCriticalSection, kCloneCompleted, kDone };
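
The workflow and the State enum can be read together as a strictly linear state machine. The following is a minimal sketch of that idea, not the real MigrationSourceManager: the class `DonorStateMachine` and its method names are hypothetical, and the mapping of the last two steps to kCloneCompleted and kDone is an assumption based on the order of the enum values.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sketch: each step is only legal in exactly one state and moves to the next one.
class DonorStateMachine {
public:
    enum State { kCreated, kCloning, kCloneCaughtUp, kCriticalSection, kCloneCompleted, kDone };

    State state() const { return _state; }

    void startClone()           { _advance(kCreated, kCloning); }
    void awaitCatchUp()         { _advance(kCloning, kCloneCaughtUp); }
    void enterCriticalSection() { _advance(kCloneCaughtUp, kCriticalSection); }
    void commitOnRecipient()    { _advance(kCriticalSection, kCloneCompleted); }  // assumed mapping
    void commitOnConfig()       { _advance(kCloneCompleted, kDone); }             // assumed mapping

private:
    // Reject any step that is called out of order.
    void _advance(State expected, State next) {
        if (_state != expected) {
            throw std::logic_error("migration step called out of order");
        }
        _state = next;
    }

    State _state = kCreated;  // a freshly constructed manager starts in kCreated
};
```

Calling a later step too early, for example `commitOnConfig()` while still in kCloneCaughtUp, throws instead of silently corrupting the migration. That matches the one-way progression that the kCreated, kCloning, and kCloneCaughtUp markers in the following code follow.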

static void _runImpl(OperationContext* opCtx, const MoveChunkRequest& moveChunkRequest) {
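    // Derive the effective write concern from the _secondaryThrottle passed in by the config server, i.e. whether writes must wait for at least one secondary to acknowledge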
    const auto writeConcernForRangeDeleter =
        uassertStatusOK(ChunkMoveWriteConcernOptions::getEffectiveWriteConcern(
            opCtx, moveChunkRequest.getSecondaryThrottle()));

    // Resolve the donor and recipient shards and their connection string
    auto const shardRegistry = Grid::get(opCtx)->shardRegistry();

    // Get the connection string of the from shard
    const auto donorConnStr =
        uassertStatusOK(shardRegistry->getShard(opCtx, moveChunkRequest.getFromShardId()))
            ->getConnString();
    // Get the host of the to shard's primary
    const auto recipientHost = uassertStatusOK([&] {
        auto recipientShard =
            uassertStatusOK(shardRegistry->getShard(opCtx, moveChunkRequest.getToShardId()));

        return recipientShard->getTargeter()->findHostNoWait(
            ReadPreferenceSetting{ReadPreference::PrimaryOnly});
    }());

    
    moveTimingHelper.done(1);
    MONGO_FAIL_POINT_PAUSE_WHILE_SET(moveChunkHangAtStep1);

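    /* Instantiate a new migration source manager with the specified migration parameters. It must be called with the distributed lock already acquired (this is not asserted). It loads the latest collection metadata and uses it as the starting point. Because of the distributed lock, the collection's metadata is assumed not to change any further. */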
    //kCreated
    MigrationSourceManager migrationSourceManager(
        opCtx, moveChunkRequest, donorConnStr, recipientHost);

    moveTimingHelper.done(2);
    MONGO_FAIL_POINT_PAUSE_WHILE_SET(moveChunkHangAtStep2);

    //kCloning
    uassertStatusOKWithWarning(migrationSourceManager.startClone(opCtx));
    moveTimingHelper.done(3);
    MONGO_FAIL_POINT_PAUSE_WHILE_SET(moveChunkHangAtStep3);

    //kCloneCaughtUp
    uassertStatusOKWithWarning(migrationSourceManager.a