【Elasticsearch】A source-code walkthrough of the Elasticsearch allocation module

Introduction to the allocation module
  Shard allocation in ES is the process of assigning a shard to a particular node in the cluster. Allocation decisions are made by the master node and cover two aspects:
  1: which shard should be allocated to which node
  2: which shard copy becomes the primary and which become replicas
  Shard allocation is carried out by two kinds of components: allocators and deciders. The allocator's job is to find the most suitable nodes for a shard, while the deciders determine whether the allocation is actually allowed.
  For a newly created index, for example, the allocator produces the list of nodes holding the fewest shards, and the deciders then go through those nodes one by one and decide whether the shard may be placed there.

For an existing index, the main question is which copy becomes the primary and which become replicas; for the primary, the allocator looks for the node that already holds the most complete copy of the shard data.
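To make this division of labour concrete, the following is a minimal, hypothetical sketch (not the actual ES implementation; the "shard" variable and the sortedByShardCountAscending() helper are assumed for illustration): the allocator ranks candidate nodes, and the deciders (ES combines them in the AllocationDeciders class) approve or reject each candidate through canAllocate().

    // Hypothetical sketch of the allocator/decider split -- not real ES code.
    for (RoutingNode candidate : sortedByShardCountAscending(allocation.routingNodes())) {
        Decision decision = allocation.deciders().canAllocate(shard, candidate, allocation);
        if (decision.type() == Decision.Type.YES) {
            // the deciders agreed: assign the shard to this node and stop searching
            break;
        }
    }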

Allocation trigger conditions
 
 index creation and deletion
 node join and departure
 manual reroute via the _cluster/reroute API (see the sketch below)
 cluster restart, and so on
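 For the manual reroute trigger, the following is a hedged sketch using the old TransportClient-era Java API (builder methods and command classes vary across ES versions, and the index/node names are made up); the equivalent request can also be sent to the REST endpoint POST /_cluster/reroute:

    // Sketch: manually move shard 0 of "my_index" from "node1" to "node2"
    // (all names hypothetical). "client" is an already-constructed cluster client.
    client.admin().cluster().prepareReroute()
          .add(new MoveAllocationCommand("my_index", 0, "node1", "node2"))
          .setRetryFailed(true)
          .get();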
Overview of the allocation module structure
   This allocation process is implemented in the reroute function:
   the AllocationService.reroute method. After this method has allocated the shards, it produces a new cluster state, and the master node broadcasts that new cluster state to the rest of the cluster.
Component: allocators
  The allocators fall into two groups: the gateway allocators and the shards allocator.
  GatewayAllocator
    PrimaryShardAllocator: finds the nodes holding the most recent copy of the shard data
    ReplicaShardAllocator: finds the nodes that already hold a copy of the shard data
  ShardsAllocator (BalancedShardsAllocator)
     finds the data nodes holding the fewest shards and rebalances the cluster
Component: deciders
  The deciders determine whether a shard may be allocated to a given node; every decision goes through their canAllocate() method (a sketch of a custom decider follows this list). They can be roughly grouped as:
  Load-balancing deciders
    SameShardAllocationDecider
    ShardsLimitAllocationDecider
    AwarenessAllocationDecider
  Constraint deciders
    RebalanceOnlyWhenActiveAllocationDecider
    FilterAllocationDecider, driven by the settings
      index.routing.allocation.include.{attribute}
      index.routing.allocation.exclude.{attribute}
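  To illustrate the canAllocate() contract, here is a minimal custom-decider sketch. It is hypothetical (not an ES built-in), assumes a custom node attribute named "box_type", and omits the constructor because its required arguments differ between ES versions:

    import org.elasticsearch.cluster.routing.RoutingNode;
    import org.elasticsearch.cluster.routing.ShardRouting;
    import org.elasticsearch.cluster.routing.allocation.RoutingAllocation;
    import org.elasticsearch.cluster.routing.allocation.decider.AllocationDecider;
    import org.elasticsearch.cluster.routing.allocation.decider.Decision;

    // Hypothetical example decider: refuses to place shards on nodes whose
    // "box_type" attribute is "cold". Not part of Elasticsearch itself.
    public class NoColdNodesAllocationDecider extends AllocationDecider {
        private static final String NAME = "no_cold_nodes";

        @Override
        public Decision canAllocate(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
            String boxType = node.node().getAttributes().get("box_type"); // custom node attribute, assumed
            if ("cold".equals(boxType)) {
                return allocation.decision(Decision.NO, NAME, "node [%s] is a cold node", node.nodeId());
            }
            return allocation.decision(Decision.YES, NAME, "node is not a cold node");
        }
    }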
Core allocation flow

The core logic lives in the reroute method of the AllocationService class, implemented as follows:
  
   private void reroute(RoutingAllocation allocation) {
        assert hasDeadNodes(allocation) == false : "dead nodes should be explicitly cleaned up. See disassociateDeadNodes";
        assert AutoExpandReplicas.getAutoExpandReplicaChanges(allocation.metaData(), allocation.nodes()).isEmpty() :
            "auto-expand replicas out of sync with number of nodes in the cluster";
        // check whether there are any unassigned shards
        if (allocation.routingNodes().unassigned().size() > 0) {
            removeDelayMarkers(allocation);
            // this call allocates the currently unassigned shards (gateway allocators)
            gatewayAllocator.allocateUnassigned(allocation);
        }
        // balance/rebalance the shards across the cluster
        shardsAllocator.allocate(allocation);
        assert RoutingNodes.assertShardStats(allocation.routingNodes());
   }
   
   The key flow then reaches the innerAllocatedUnassigned() method of the GatewayAllocator class:
   protected static void innerAllocatedUnassigned(RoutingAllocation allocation,
                                                   PrimaryShardAllocator primaryShardAllocator,
                                                   ReplicaShardAllocator replicaShardAllocator) {
        // get the collection of unassigned shards
        RoutingNodes.UnassignedShards unassigned = allocation.routingNodes().unassigned();
        // sort the unassigned shards by recovery priority
        unassigned.sort(PriorityComparator.getAllocationComparator(allocation)); // sort for priority ordering
        // allocate primary shards first
        primaryShardAllocator.allocateUnassigned(allocation);
        // then allocate replica shards
        replicaShardAllocator.processExistingRecoveries(allocation);
        replicaShardAllocator.allocateUnassigned(allocation);
    }
    
    The allocateUnassigned() method, shared by the primary and replica shard allocators, then loops over the unassigned shards:
    public void allocateUnassigned(RoutingAllocation allocation) {
        final RoutingNodes routingNodes = allocation.routingNodes();
        final RoutingNodes.UnassignedShards.UnassignedIterator unassignedIterator = routingNodes.unassigned().iterator();
        // loop over the unassigned shards
        while (unassignedIterator.hasNext()) {
            final ShardRouting shard = unassignedIterator.next();
            // key step: run the deciders for this shard to decide whether it can be allocated and to which node; this method is analyzed separately at the end
            final AllocateUnassignedDecision allocateUnassignedDecision = makeAllocationDecision(shard, allocation, logger);

            if (allocateUnassignedDecision.isDecisionTaken() == false) {
                // no decision was taken by this allocator
                continue;
            }
            // the deciders allow the allocation
            if (allocateUnassignedDecision.getAllocationDecision() == AllocationDecision.YES) {
                // initialize the unassigned shard on the chosen node
                unassignedIterator.initialize(allocateUnassignedDecision.getTargetNode().getId(),
                    allocateUnassignedDecision.getAllocationId(),
                    shard.primary() ? ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE :
                                      allocation.clusterInfo().getShardSize(shard, ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE),
                    allocation.changes());
            } else {
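                // the deciders returned NO or THROTTLE: record the status and ignore this shard for the current round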
                unassignedIterator.removeAndIgnore(allocateUnassignedDecision.getAllocationStatus(), allocation.changes());
            }
        }
    } 
    
   The flow then reaches the initializeShard() method of the RoutingNodes class, implemented as follows:
   public ShardRouting initializeShard(ShardRouting unassignedShard, String nodeId, @Nullable String existingAllocationId,
                                        long expectedSize, RoutingChangesObserver routingChangesObserver) {
        ensureMutable();
        assert unassignedShard.unassigned() : "expected an unassigned shard " + unassignedShard;
        // initialize the unassigned shard (its state becomes INITIALIZING)
        ShardRouting initializedShard = unassignedShard.initialize(nodeId, existingAllocationId, expectedSize);
        // add the shard to the target node's shard list
        node(nodeId).add(initializedShard);
        inactiveShardCount++;
        if (initializedShard.primary()) {
            inactivePrimaryCount++;
        }
        addRecovery(initializedShard);
        // record the shard in the assigned-shards map
        assignedShardsAdd(initializedShard);
        // notify the routing-changes observer of the state change
        routingChangesObserver.shardInitialized(unassignedShard, initializedShard);
        return initializedShard;
    }
    
    This completes the allocation of a primary shard. Once makeAllocationDecision succeeds, unassignedShard.initialize() creates a new ShardRouting object; the resulting routing information is folded into a new cluster state, which is then broadcast to the rest of the cluster.
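
    On the master node this typically happens inside a cluster-state update task. A minimal sketch, assuming access to clusterService and allocationService instances (the surrounding wiring is not shown):

    // Sketch only: publishing the cluster state produced by AllocationService.reroute.
    clusterService.submitStateUpdateTask("example-reroute", new ClusterStateUpdateTask() {
        @Override
        public ClusterState execute(ClusterState currentState) {
            // runs the allocators and deciders; the returned state carries the updated
            // routing table and, once accepted, is broadcast to the other nodes
            return allocationService.reroute(currentState, "example reroute");
        }

        @Override
        public void onFailure(String source, Exception e) {
            // error handling omitted in this sketch
        }
    });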
    
    
Primary shard decider flow analysis
This flow is implemented in the makeAllocationDecision method of the PrimaryShardAllocator class, as follows:

   public AllocateUnassignedDecision makeAllocationDecision(final ShardRouting unassignedShard,
                                                             final RoutingAllocation allocation,
                                                             final Logger logger) {
        if (isResponsibleFor(unassignedShard) == false) {
            // this allocator is not responsible for allocating this shard
            return AllocateUnassignedDecision.NOT_TAKEN;
        }

        final boolean explain = allocation.debugDecision();
        // fetch the shard metadata from the data nodes (asynchronously); on the first pass there may be no data yet
        final FetchResult<NodeGatewayStartedShards> shardState = fetchData(unassignedShard, allocation);
        if (shardState.hasData() == false) {
            allocation.setHasPendingAsyncFetch();
            List<NodeAllocationResult> nodeDecisions = null;
            if (explain) {
                // run the deciders for every node, only to build the allocation-explain output
                nodeDecisions = buildDecisionsForAllNodes(unassignedShard, allocation);
            }
            return AllocateUnassignedDecision.no(AllocationStatus.FETCHING_SHARD_DATA, nodeDecisions);
        }

        // don't create a new IndexSetting object for every shard as this could cause a lot of garbage
        // on cluster restart if we allocate a boat load of shards
        final IndexMetaData indexMetaData = allocation.metaData().getIndexSafe(unassignedShard.index());
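        // the in-sync allocation IDs identify the shard copies known to contain the latest acknowledged writes;
        // only one of these copies may be selected as the primary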
        final Set<String> inSyncAllocationIds = indexMetaData.inSyncAllocationIds(unassignedShard.id());
        final boolean snapshotRestore = unassignedShard.recoverySource().getType() == RecoverySource.Type.SNAPSHOT;

        assert inSyncAllocationIds.isEmpty() == false;
        // use in-sync allocation ids to select nodes
        final NodeShardsResult nodeShardsResult = buildNodeShardsResult(unassignedShard, snapshotRestore,
            allocation.getIgnoreNodes(unassignedShard.shardId()), inSyncAllocationIds, shardState, logger);
        final boolean enoughAllocationsFound = nodeShardsResult.orderedAllocationCandidates.size() > 0;
        logger.debug("[{}][{}]: found {} allocation candidates of {} based on allocation ids: [{}]", unassignedShard.index(),
            unassignedShard.id(), nodeShardsResult.orderedAllocationCandidates.size(), unassignedShard, inSyncAllocationIds);

        if (enoughAllocationsFound == false) {
            if (snapshotRestore) {
                // let BalancedShardsAllocator take care of allocating this shard
                logger.debug("[{}][{}]: missing local data, will restore from [{}]",
                             unassignedShard.index(), unassignedShard.id(), unassignedShard.recoverySource());
                return AllocateUnassignedDecision.NOT_TAKEN;
            } else {
                // We have a shard that was previously allocated, but we could not find a valid shard copy to allocate the primary.
                // We could just be waiting for the node that holds the primary to start back up, in which case the allocation for
                // this shard will be picked up when the node joins and we do another allocation reroute
                logger.debug("[{}][{}]: not allocating, number_of_allocated_shards_found [{}]",
                             unassignedShard.index(), unassignedShard.id(), nodeShardsResult.allocationsFound);
                return AllocateUnassignedDecision.no(AllocationStatus.NO_VALID_SHARD_COPY,
                    explain ? buildNodeDecisions(null, shardState, inSyncAllocationIds) : null);
            }
        }

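        // run the allocation deciders (canAllocate) against every node that reported an in-sync copy;
        // buildNodesToAllocate buckets the candidates into yes / throttle / no lists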
        NodesToAllocate nodesToAllocate = buildNodesToAllocate(
            allocation, nodeShardsResult.orderedAllocationCandidates, unassignedShard, false
        );
        DiscoveryNode node = null;
        String allocationId = null;
        boolean throttled = false;
        if (nodesToAllocate.yesNodeShards.isEmpty() == false) {
            DecidedNode decidedNode = nodesToAllocate.yesNodeShards.get(0);
            logger.debug("[{}][{}]: allocating [{}] to [{}] on primary allocation",
                         unassignedShard.index(), unassignedShard.id(), unassignedShard, decidedNode.nodeShardState.getNode());
            node = decidedNode.nodeShardState.getNode();
            allocationId = decidedNode.nodeShardState.allocationId();
        } else if (nodesToAllocate.throttleNodeShards.isEmpty() && !nodesToAllocate.noNodeShards.isEmpty()) {
            // The deciders returned a NO decision for all nodes with shard copies, so we check if primary shard
            // can be force-allocated to one of the nodes.
            nodesToAllocate = buildNodesToAllocate(allocation, nodeShardsResult.orderedAllocationCandidates, unassignedShard, true);
            if (nodesToAllocate.yesNodeShards.isEmpty() == false) {
                final DecidedNode decidedNode = nodesToAllocate.yesNodeShards.get(0);
                final NodeGatewayStartedShards nodeShardState = decidedNode.nodeShardState;
                logger.debug("[{}][{}]: allocating [{}] to [{}] on forced primary allocation",
                             unassignedShard.index(), unassignedShard.id(), unassignedShard, nodeShardState.getNode());
                node = nodeShardState.getNode();
                allocationId = nodeShardState.allocationId();
            } else if (nodesToAllocate.throttleNodeShards.isEmpty() == false) {
                logger.debug("[{}][{}]: throttling allocation [{}] to [{}] on forced primary allocation",
                             unassignedShard.index(), unassignedShard.id(), unassignedShard, nodesToAllocate.throttleNodeShards);
                throttled = true;
            } else {
                logger.debug("[{}][{}]: forced primary allocation denied [{}]",
                             unassignedShard.index(), unassignedShard.id(), unassignedShard);
            }
        } else {
            // we are throttling this, since we are allowed to allocate to this node but there are enough allocations
            // taking place on the node currently, ignore it for now
            logger.debug("[{}][{}]: throttling allocation [{}] to [{}] on primary allocation",
                         unassignedShard.index(), unassignedShard.id(), unassignedShard, nodesToAllocate.throttleNodeShards);
            throttled = true;
        }

        List<NodeAllocationResult> nodeResults = null;
        if (explain) {
            nodeResults = buildNodeDecisions(nodesToAllocate, shardState, inSyncAllocationIds);
        }
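        // translate the outcome into the final decision: still fetching shard data, yes (with a target node), throttled, or no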
        if (allocation.hasPendingAsyncFetch()) {
            return AllocateUnassignedDecision.no(AllocationStatus.FETCHING_SHARD_DATA, nodeResults);
        } else if (node != null) {
            return AllocateUnassignedDecision.yes(node, allocationId, nodeResults, false);
        } else if (throttled) {
            return AllocateUnassignedDecision.throttle(nodeResults);
        } else {
            return AllocateUnassignedDecision.no(AllocationStatus.DECIDERS_NO, nodeResults, true);
        }
    }
   
   Reference: Elasticsearch source code analysis