[Elasticsearch] Source Code Walkthrough of the Elasticsearch Allocation Module

奔跑的前浪

Categories: Big Data, Search Index. Tags: elasticsearch, big data

Article link: https://blog.csdn.net/lisi1129/article/details/109850012

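Introduction to the allocation module
Shard allocation in ES is the process of assigning shards to nodes in the cluster. Allocation decisions are made by the master node and cover two questions:
1: which shards should be allocated to which nodes
2: which shard copy is the primary and which are replicas
Shard allocation is carried out by two kinds of components, allocators and deciders. An allocator's job is to find the best candidate nodes for a shard, while the deciders judge whether the allocation should actually take place.
For a newly created index, for example, the allocator finds the list of nodes holding the fewest shards, and the deciders then go through those nodes one by one and decide whether the shard may be allocated to each.

For an existing index, the main distinction is between primary and replica shards. For a primary shard, the allocator looks for a node that already holds the most complete copy of that shard's data.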
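The split between "rank the candidates" and "veto the candidates" can be illustrated with a minimal sketch. Every type and name below (AllocationSketch, Node, Decider, pickNode) is hypothetical and invented for illustration; none of it is Elasticsearch code, and the real allocators weigh far more than a shard count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model; these are not Elasticsearch classes.
final class AllocationSketch {

    /** A node and the number of shards it currently holds. */
    static final class Node {
        final String name;
        final int shardCount;

        Node(String name, int shardCount) {
            this.name = name;
            this.shardCount = shardCount;
        }
    }

    /** A decider may veto placing the shard on a given node. */
    interface Decider {
        boolean canAllocate(Node node);
    }

    /** Allocator step: order the candidates so the least-loaded node comes first. */
    static List<Node> rankByFewestShards(List<Node> nodes) {
        List<Node> ranked = new ArrayList<>(nodes);
        ranked.sort(Comparator.comparingInt((Node n) -> n.shardCount));
        return ranked;
    }

    /** Decider step: walk the ranked nodes and take the first one that no decider rejects. */
    static Optional<Node> pickNode(List<Node> nodes, List<Decider> deciders) {
        for (Node candidate : rankByFewestShards(nodes)) {
            boolean allowed = true;
            for (Decider decider : deciders) {
                if (!decider.canAllocate(candidate)) {
                    allowed = false;
                    break;
                }
            }
            if (allowed) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
            new Node("node-a", 5), new Node("node-b", 1), new Node("node-c", 3));
        // Hypothetical rule: node-b is excluded, so the next least-loaded node wins.
        Decider excludeNodeB = n -> !n.name.equals("node-b");
        Optional<Node> chosen = pickNode(nodes, Collections.singletonList(excludeNodeB));
        System.out.println(chosen.map(n -> n.name).orElse("none"));
    }
}
```

Keeping the ranking and the veto separate means a new restriction can be added as one more decider without touching the ranking logic.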

When allocation is triggered

Index creation or deletion
Nodes joining or leaving the cluster
Manual reroute
Cluster restart, and so on
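Overview of the allocation module structure
This complex allocation process is implemented in the reroute function:
The AllocationService.reroute method allocates the shards and produces a new cluster state, which the master node then publishes to the rest of the cluster.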
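Component: allocators
Allocators fall into two groups, GatewayAllocator and ShardsAllocator:
GatewayAllocator
    PrimaryShardAllocator: finds the nodes that hold the most recent copy of the shard data
    ReplicaShardAllocator: finds the nodes that already hold a copy of the shard data
ShardsAllocator
    BalancedShardsAllocator: finds the nodes that hold the fewest shards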
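Component: deciders
Deciders determine whether a shard may be allocated to a node. Each decision goes through the canAllocate() method. The deciders can be grouped as follows:
Load-balancing deciders
    SameShardAllocationDecider
    ShardsLimitAllocationDecider
    AwarenessAllocationDecider
Conditional-restriction deciders
    RebalanceOnlyWhenActiveAllocationDecider
    FilterAllocationDecider
        index.routing.allocation.include.
        index.routing.allocation.exclude.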
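To make the decider idea concrete, here is a minimal sketch of a decider chain. It assumes a three-valued result (YES, THROTTLE, NO), which matches the yes/throttle/no buckets that appear in the makeAllocationDecision code later in this article, and it assumes that a single NO vetoes the allocation. The class names, the two sample rules and the way results are combined are all hypothetical simplifications, not the Elasticsearch implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a decider chain; not the Elasticsearch classes.
final class DeciderChainSketch {

    enum Decision { YES, THROTTLE, NO }

    /** Minimal view of a node: its name and the ids of the shards it already holds. */
    static final class Node {
        final String name;
        final Set<String> shardIds;

        Node(String name, Set<String> shardIds) {
            this.name = name;
            this.shardIds = shardIds;
        }
    }

    interface Decider {
        Decision canAllocate(String shardId, Node node);
    }

    /** Balancing-style rule: never place two copies of the same shard on one node. */
    static final Decider SAME_SHARD =
        (shardId, node) -> node.shardIds.contains(shardId) ? Decision.NO : Decision.YES;

    /** Filter-style rule: reject nodes whose name is on an exclusion list. */
    static Decider excludeNames(Set<String> excluded) {
        return (shardId, node) -> excluded.contains(node.name) ? Decision.NO : Decision.YES;
    }

    /** Combine the chain: any NO wins immediately, otherwise THROTTLE beats YES. */
    static Decision decide(String shardId, Node node, List<Decider> deciders) {
        Decision result = Decision.YES;
        for (Decider decider : deciders) {
            Decision decision = decider.canAllocate(shardId, node);
            if (decision == Decision.NO) {
                return Decision.NO;
            }
            if (decision == Decision.THROTTLE) {
                result = Decision.THROTTLE;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node node = new Node("node-1", new HashSet<>(Arrays.asList("idx[0]")));
        List<Decider> chain =
            Arrays.asList(SAME_SHARD, excludeNames(Collections.singleton("node-2")));
        System.out.println(decide("idx[0]", node, chain)); // the node already holds idx[0]
        System.out.println(decide("idx[1]", node, chain));
    }
}
```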
Core allocation flow

The core of the process is the reroute method of the AllocationService class, implemented as follows:

private void reroute(RoutingAllocation allocation) {
    assert hasDeadNodes(allocation) == false : "dead nodes should be explicitly cleaned up. See disassociateDeadNodes";
    assert AutoExpandReplicas.getAutoExpandReplicaChanges(allocation.metaData(), allocation.nodes()).isEmpty() :
        "auto-expand replicas out of sync with number of nodes in the cluster";
    // check whether there are any unassigned shards
    if (allocation.routingNodes().unassigned().size() > 0) {
        removeDelayMarkers(allocation);
        // this call allocates the unassigned shards
        gatewayAllocator.allocateUnassigned(allocation);
    }
    // rebalance the cluster
    shardsAllocator.allocate(allocation);
    assert RoutingNodes.assertShardStats(allocation.routingNodes());
}

The key step then reaches the innerAllocatedUnassigned() method of the GatewayAllocator class:
protected static void innerAllocatedUnassigned(RoutingAllocation allocation,
                                               PrimaryShardAllocator primaryShardAllocator,
                                               ReplicaShardAllocator replicaShardAllocator) {
    // collect the unassigned shards
    RoutingNodes.UnassignedShards unassigned = allocation.routingNodes().unassigned();
    // sort them by recovery priority
    unassigned.sort(PriorityComparator.getAllocationComparator(allocation)); // sort for priority ordering
    // allocate primaries first; this enters the primary shard allocation method
    primaryShardAllocator.allocateUnassigned(allocation);
    // then allocate replicas
    replicaShardAllocator.processExistingRecoveries(allocation);
    replicaShardAllocator.allocateUnassigned(allocation);
}
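The article does not show what PriorityComparator compares, so the following is only a sketch of what a "recovery priority" ordering could look like. It assumes three sort keys (a per-index priority value, then the index creation time, then the index name), each in descending order. The keys, the UnassignedShard class and its fields are assumptions made for illustration, not a statement about the real comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of priority ordering; the sort keys are assumptions.
final class PriorityOrderSketch {

    static final class UnassignedShard {
        final String index;
        final int priority;       // assumed per-index priority, higher recovers first
        final long creationDate;  // assumed index creation time, newer recovers first

        UnassignedShard(String index, int priority, long creationDate) {
            this.index = index;
            this.priority = priority;
            this.creationDate = creationDate;
        }
    }

    /** Descending by priority, then by creation date, then by index name. */
    static int compare(UnassignedShard a, UnassignedShard b) {
        int cmp = Integer.compare(b.priority, a.priority);
        if (cmp == 0) {
            cmp = Long.compare(b.creationDate, a.creationDate);
        }
        if (cmp == 0) {
            cmp = b.index.compareTo(a.index);
        }
        return cmp;
    }

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<UnassignedShard> sortForRecovery(List<UnassignedShard> shards) {
        List<UnassignedShard> sorted = new ArrayList<>(shards);
        sorted.sort(PriorityOrderSketch::compare);
        return sorted;
    }

    public static void main(String[] args) {
        List<UnassignedShard> shards = Arrays.asList(
            new UnassignedShard("logs-old", 1, 100L),
            new UnassignedShard("orders", 10, 50L),
            new UnassignedShard("logs-new", 1, 200L));
        for (UnassignedShard shard : sortForRecovery(shards)) {
            System.out.println(shard.index);
        }
    }
}
```

Sorting before the primary and replica passes means the most important indices get the first chance at nodes and recovery slots.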

public void allocateUnassigned(RoutingAllocation allocation) {
    final RoutingNodes routingNodes = allocation.routingNodes();
    final RoutingNodes.UnassignedShards.UnassignedIterator unassignedIterator = routingNodes.unassigned().iterator();
    // loop over the unassigned shards
    while (unassignedIterator.hasNext()) {
        final ShardRouting shard = unassignedIterator.next();
        // key call: runs the deciders for this shard to decide whether it can be allocated and to which node;
        // this method is analyzed separately at the end of the article
        final AllocateUnassignedDecision allocateUnassignedDecision = makeAllocationDecision(shard, allocation, logger);

        if (allocateUnassignedDecision.isDecisionTaken() == false) {
            // no decision was taken by this allocator
            continue;
        }
        // the deciders allow the allocation
        if (allocateUnassignedDecision.getAllocationDecision() == AllocationDecision.YES) {
            // initialize the unassigned shard on the target node
            unassignedIterator.initialize(allocateUnassignedDecision.getTargetNode().getId(),
                allocateUnassignedDecision.getAllocationId(),
                shard.primary() ? ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE :
                                  allocation.clusterInfo().getShardSize(shard, ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE),
                allocation.changes());
        } else {
            // otherwise move the shard to the ignored list for this round
            unassignedIterator.removeAndIgnore(allocateUnassignedDecision.getAllocationStatus(), allocation.changes());
        }
    }
}
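The loop above handles three outcomes: no decision taken (skip the shard and leave it for another allocator), YES (initialize it on the target node), and anything else (remove it from the unassigned list and ignore it for this round). A minimal sketch of that control flow, using plain strings for shards and a hypothetical Outcome enum in place of the real decision classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the three-way handling in the loop; not Elasticsearch code.
final class UnassignedLoopSketch {

    enum Outcome { NOT_TAKEN, YES, NO, THROTTLED }

    static final class Result {
        final List<String> stillUnassigned = new ArrayList<>();
        final List<String> initializing = new ArrayList<>();
        final List<String> ignored = new ArrayList<>();
    }

    static Result process(List<String> unassigned, Function<String, Outcome> decide) {
        Result result = new Result();
        result.stillUnassigned.addAll(unassigned);
        Iterator<String> it = result.stillUnassigned.iterator();
        while (it.hasNext()) {
            String shard = it.next();
            Outcome outcome = decide.apply(shard);
            if (outcome == Outcome.NOT_TAKEN) {
                continue; // not this allocator's responsibility, leave it in place
            }
            it.remove(); // a decision was taken, so the shard leaves the unassigned list
            if (outcome == Outcome.YES) {
                result.initializing.add(shard);
            } else {
                result.ignored.add(shard); // NO or THROTTLED: retried on a later reroute
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Outcome> decisions = new HashMap<>();
        decisions.put("s0", Outcome.YES);
        decisions.put("s1", Outcome.NOT_TAKEN);
        decisions.put("s2", Outcome.THROTTLED);
        Result result = process(Arrays.asList("s0", "s1", "s2"), decisions::get);
        System.out.println(result.initializing + " " + result.stillUnassigned + " " + result.ignored);
    }
}
```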

The flow then reaches the initializeShard() method of the RoutingNodes class, implemented as follows:
public ShardRouting initializeShard(ShardRouting unassignedShard, String nodeId, @Nullable String existingAllocationId,
                                    long expectedSize, RoutingChangesObserver routingChangesObserver) {
    ensureMutable();
    assert unassignedShard.unassigned() : "expected an unassigned shard " + unassignedShard;
    // initialize the unassigned shard
    ShardRouting initializedShard = unassignedShard.initialize(nodeId, existingAllocationId, expectedSize);
    // add it to the target node's shard list
    node(nodeId).add(initializedShard);
    inactiveShardCount++;
    if (initializedShard.primary()) {
        inactivePrimaryCount++;
    }
    addRecovery(initializedShard);
    // record the shard among the assigned shards
    assignedShardsAdd(initializedShard);
    // notify the observer of the state change
    routingChangesObserver.shardInitialized(unassignedShard, initializedShard);
    return initializedShard;
}

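This completes the allocation of the primary shard. When makeAllocationDecision succeeds, unassignedShard.initialize() creates a new ShardRouting object and the relevant information is added to the cluster state.
Once allocation has succeeded, the new cluster state is built and later published to the rest of the cluster.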
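The point that initialize() creates a new ShardRouting object rather than modifying the existing one can be shown with a small immutable value class. The Shard class, its fields and the two-state enum below are hypothetical stand-ins chosen for illustration; the real ShardRouting carries much more information.

```java
// Hypothetical sketch of an immutable routing entry; not the real ShardRouting class.
final class ShardStateSketch {

    enum State { UNASSIGNED, INITIALIZING }

    static final class Shard {
        final String id;
        final State state;
        final String nodeId; // null while the shard is unassigned

        Shard(String id, State state, String nodeId) {
            this.id = id;
            this.state = state;
            this.nodeId = nodeId;
        }

        /** Returns a new object in the INITIALIZING state; this instance is not modified. */
        Shard initialize(String targetNodeId) {
            if (state != State.UNASSIGNED) {
                throw new IllegalStateException("expected an unassigned shard " + id);
            }
            return new Shard(id, State.INITIALIZING, targetNodeId);
        }
    }

    public static void main(String[] args) {
        Shard unassigned = new Shard("idx[0]", State.UNASSIGNED, null);
        Shard initializing = unassigned.initialize("node-1");
        System.out.println(unassigned.state + " -> " + initializing.state + " on " + initializing.nodeId);
    }
}
```

Because the old object is left untouched, both the previous and the new routing entry are available to compare, which is what the shardInitialized(unassignedShard, initializedShard) observer call in the code above receives.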


Primary shard decision flow
This section analyzes the makeAllocationDecision method of the PrimaryShardAllocator class, implemented as follows:

public AllocateUnassignedDecision makeAllocationDecision(final ShardRouting unassignedShard,
                                                         final RoutingAllocation allocation,
                                                         final Logger logger) {
    if (isResponsibleFor(unassignedShard) == false) {
        // this allocator is not responsible for allocating this shard
        return AllocateUnassignedDecision.NOT_TAKEN;
    }

    final boolean explain = allocation.debugDecision();
    // fetch the shard metadata from the nodes
    final FetchResult<NodeGatewayStartedShards> shardState = fetchData(unassignedShard, allocation);
    if (shardState.hasData() == false) {
        allocation.setHasPendingAsyncFetch();
        List<NodeAllocationResult> nodeDecisions = null;
        if (explain) {
            // this call runs the deciders
            nodeDecisions = buildDecisionsForAllNodes(unassignedShard, allocation);
        }
        return AllocateUnassignedDecision.no(AllocationStatus.FETCHING_SHARD_DATA, nodeDecisions);
    }

    // don't create a new IndexSetting object for every shard as this could cause a lot of garbage
    // on cluster restart if we allocate a boat load of shards
    final IndexMetaData indexMetaData = allocation.metaData().getIndexSafe(unassignedShard.index());
    final Set<String> inSyncAllocationIds = indexMetaData.inSyncAllocationIds(unassignedShard.id());
    final boolean snapshotRestore = unassignedShard.recoverySource().getType() == RecoverySource.Type.SNAPSHOT;

    assert inSyncAllocationIds.isEmpty() == false;
    // use in-sync allocation ids to select nodes
    final NodeShardsResult nodeShardsResult = buildNodeShardsResult(unassignedShard, snapshotRestore,
        allocation.getIgnoreNodes(unassignedShard.shardId()), inSyncAllocationIds, shardState, logger);
    final boolean enoughAllocationsFound = nodeShardsResult.orderedAllocationCandidates.size() > 0;
    logger.debug("[{}][{}]: found {} allocation candidates of {} based on allocation ids: [{}]", unassignedShard.index(),
        unassignedShard.id(), nodeShardsResult.orderedAllocationCandidates.size(), unassignedShard, inSyncAllocationIds);

    if (enoughAllocationsFound == false) {
        if (snapshotRestore) {
            // let BalancedShardsAllocator take care of allocating this shard
            logger.debug("[{}][{}]: missing local data, will restore from [{}]",
                unassignedShard.index(), unassignedShard.id(), unassignedShard.recoverySource());
            return AllocateUnassignedDecision.NOT_TAKEN;
        } else {
            // We have a shard that was previously allocated, but we could not find a valid shard copy to allocate the primary.
            // We could just be waiting for the node that holds the primary to start back up, in which case the allocation for
            // this shard will be picked up when the node joins and we do another allocation reroute
            logger.debug("[{}][{}]: not allocating, number_of_allocated_shards_found [{}]",
                unassignedShard.index(), unassignedShard.id(), nodeShardsResult.allocationsFound);
            return AllocateUnassignedDecision.no(AllocationStatus.NO_VALID_SHARD_COPY,
                explain ? buildNodeDecisions(null, shardState, inSyncAllocationIds) : null);
        }
    }

    NodesToAllocate nodesToAllocate = buildNodesToAllocate(
        allocation, nodeShardsResult.orderedAllocationCandidates, unassignedShard, false
    );
    DiscoveryNode node = null;
    String allocationId = null;
    boolean throttled = false;
    if (nodesToAllocate.yesNodeShards.isEmpty() == false) {
        DecidedNode decidedNode = nodesToAllocate.yesNodeShards.get(0);
        logger.debug("[{}][{}]: allocating [{}] to [{}] on primary allocation",
            unassignedShard.index(), unassignedShard.id(), unassignedShard, decidedNode.nodeShardState.getNode());
        node = decidedNode.nodeShardState.getNode();
        allocationId = decidedNode.nodeShardState.allocationId();
    } else if (nodesToAllocate.throttleNodeShards.isEmpty() && !nodesToAllocate.noNodeShards.isEmpty()) {
        // The deciders returned a NO decision for all nodes with shard copies, so we check if primary shard
        // can be force-allocated to one of the nodes.
        nodesToAllocate = buildNodesToAllocate(allocation, nodeShardsResult.orderedAllocationCandidates, unassignedShard, true);
        if (nodesToAllocate.yesNodeShards.isEmpty() == false) {
            final DecidedNode decidedNode = nodesToAllocate.yesNodeShards.get(0);
            final NodeGatewayStartedShards nodeShardState = decidedNode.nodeShardState;
            logger.debug("[{}][{}]: allocating [{}] to [{}] on forced primary allocation",
                unassignedShard.index(), unassignedShard.id(), unassignedShard, nodeShardState.getNode());
            node = nodeShardState.getNode();
            allocationId = nodeShardState.allocationId();
        } else if (nodesToAllocate.throttleNodeShards.isEmpty() == false) {
            logger.debug("[{}][{}]: throttling allocation [{}] to [{}] on forced primary allocation",
                unassignedShard.index(), unassignedShard.id(), unassignedShard, nodesToAllocate.throttleNodeShards);
            throttled = true;
        } else {
            logger.debug("[{}][{}]: forced primary allocation denied [{}]",
                unassignedShard.index(), unassignedShard.id(), unassignedShard);
        }
    } else {
        // we are throttling this, since we are allowed to allocate to this node but there are enough allocations
        // taking place on the node currently, ignore it for now
        logger.debug("[{}][{}]: throttling allocation [{}] to [{}] on primary allocation",
            unassignedShard.index(), unassignedShard.id(), unassignedShard, nodesToAllocate.throttleNodeShards);
        throttled = true;
    }

    List<NodeAllocationResult> nodeResults = null;
    if (explain) {
        nodeResults = buildNodeDecisions(nodesToAllocate, shardState, inSyncAllocationIds);
    }
    if (allocation.hasPendingAsyncFetch()) {
        return AllocateUnassignedDecision.no(AllocationStatus.FETCHING_SHARD_DATA, nodeResults);
    } else if (node != null) {
        return AllocateUnassignedDecision.yes(node, allocationId, nodeResults, false);
    } else if (throttled) {
        return AllocateUnassignedDecision.throttle(nodeResults);
    } else {
        return AllocateUnassignedDecision.no(AllocationStatus.DECIDERS_NO, nodeResults, true);
    }
}
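The branching in makeAllocationDecision is easier to follow when reduced to its skeleton: keep only the copies whose allocation id is in the in-sync set, try the normal decider pass, and fall back to a forced pass only when every candidate was rejected. The sketch below deliberately leaves out the async-fetch and snapshot-restore branches. The Copy class, the precomputed normal/forced verdicts and the Outcome enum are hypothetical simplifications, not the real types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical skeleton of the primary decision flow; not Elasticsearch code.
final class PrimaryDecisionSketch {

    enum Verdict { YES, THROTTLE, NO }

    enum Outcome { ALLOCATE, THROTTLED, NO_VALID_SHARD_COPY, DECIDERS_NO }

    /** One on-disk shard copy, with the verdicts a normal and a forced decider pass would give. */
    static final class Copy {
        final String allocationId;
        final Verdict normal;
        final Verdict forced;

        Copy(String allocationId, Verdict normal, Verdict forced) {
            this.allocationId = allocationId;
            this.normal = normal;
            this.forced = forced;
        }
    }

    static Outcome decide(Set<String> inSyncAllocationIds, List<Copy> copies) {
        // Step 1: only copies listed in the in-sync set are valid candidates.
        List<Copy> candidates = new ArrayList<>();
        for (Copy copy : copies) {
            if (inSyncAllocationIds.contains(copy.allocationId)) {
                candidates.add(copy);
            }
        }
        if (candidates.isEmpty()) {
            return Outcome.NO_VALID_SHARD_COPY;
        }
        // Step 2: normal decider pass.
        if (candidates.stream().anyMatch(c -> c.normal == Verdict.YES)) {
            return Outcome.ALLOCATE;
        }
        if (candidates.stream().anyMatch(c -> c.normal == Verdict.THROTTLE)) {
            return Outcome.THROTTLED;
        }
        // Step 3: every candidate was rejected, so retry with the forced pass.
        if (candidates.stream().anyMatch(c -> c.forced == Verdict.YES)) {
            return Outcome.ALLOCATE;
        }
        if (candidates.stream().anyMatch(c -> c.forced == Verdict.THROTTLE)) {
            return Outcome.THROTTLED;
        }
        return Outcome.DECIDERS_NO;
    }

    public static void main(String[] args) {
        Set<String> inSync = Collections.singleton("alloc-1");
        List<Copy> copies = Arrays.asList(
            new Copy("alloc-1", Verdict.NO, Verdict.YES),    // valid copy, normal pass rejects it
            new Copy("alloc-9", Verdict.YES, Verdict.YES));  // stale copy, not in the in-sync set
        System.out.println(decide(inSync, copies));
    }
}
```

The forced pass exists because a primary with a valid copy on disk is worth allocating even when an ordinary placement rule objects; without it, the data would stay unavailable.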

Reference: ES source code analysis
