Elasticsearch Bulk: Basic Flow (Part 1)

cigarL

Category: elasticsearch. Tags: elasticsearch

Article link: https://blog.csdn.net/weixin_43211119/article/details/103885935

1.Basic Bulk Flow

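An Elasticsearch write first executes on the primary shard. Once it succeeds there, the primary forwards the request to the replica shards; when they have all responded, the primary collects the results and the response is returned to the client. The wait_for_active_shards parameter sets how many shard copies must be active before the write is attempted. The default is 1, meaning only the primary has to be active. It is a pre-check on availability, not a count of copies that must acknowledge the write. (If it is set to 3 while only the primary is available, you can observe the client blocking until the request times out.)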
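
To make the pre-check concrete, here is a minimal sketch in plain Java. It uses no Elasticsearch classes: `ActiveShardCheck` and `enoughActiveCopies` are hypothetical names chosen for illustration, and the real check lives inside Elasticsearch's replication code.

```java
/** Hypothetical illustration of the wait_for_active_shards pre-check; not Elasticsearch code. */
final class ActiveShardCheck {
    private ActiveShardCheck() {}

    /**
     * @param activeCopies        shard copies that are currently active, primary included
     * @param waitForActiveShards copies that must be active before the write may start
     * @return true if the write may proceed
     */
    static boolean enoughActiveCopies(int activeCopies, int waitForActiveShards) {
        return activeCopies >= waitForActiveShards;
    }

    public static void main(String[] args) {
        System.out.println(enoughActiveCopies(1, 1)); // true: the default needs only the primary
        System.out.println(enoughActiveCopies(1, 3)); // false: the request would wait, then time out
    }
}
```

The check only gates the start of the operation. Once it passes, the write is still sent to every replica the primary knows about.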

[Figure from the official Elasticsearch documentation]
The bulk flow is described step by step below.

1.2.Coordinating Node Flow

Code entry point: TransportBulkAction#doExecute(Task task, BulkRequest bulkRequest, ActionListener listener){…}

1.2.1.Pipeline

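First comes a series of pipeline and processor steps. Since I have not worked much with Ingest so far, I will skip the details here and add them after studying the whole Ingest flow.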

1.2.2.Auto Create Index

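Decide whether indices may need to be created automatically, via needToCheck(). Stepping into it shows that it returns autoCreate.autoCreateIndex, the switch for automatic index creation: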

public boolean needToCheck() {
  return this.autoCreate.autoCreateIndex;
}

If it returns false, go straight to the next step:

executeBulk(task, bulkRequest, startTime, listener, responses, emptyMap());

If it returns true, the following steps run before executeBulk() is called:

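Step 1: Filter the BulkRequest and collect the index names involved. The filter looks at OpType and VersionType. OpType is the operation type, one of INDEX, CREATE, UPDATE or DELETE.
        INDEX and CREATE differ when the doc id already exists: INDEX overwrites the document, while CREATE fails with a version conflict.
        The filter below keeps every non-DELETE request, plus DELETE requests that use external versioning.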
    final Set<String> indices = bulkRequest.requests.stream()
          .filter(request -> request.opType() != DocWriteRequest.OpType.DELETE
                  || request.versionType() == VersionType.EXTERNAL
                  || request.versionType() == VersionType.EXTERNAL_GTE)
          .map(DocWriteRequest::index)
          .collect(Collectors.toSet());
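Step 2: Check each index, producing a Map of indices that cannot be created (indicesThatCannotBeCreated) and a Set of indices to create automatically.
        Whether an index should be auto-created depends on four checks: 1. whether the index or alias already exists (if so, there is nothing to create);
        2. whether auto-creation is enabled (checked again here as a safeguard); 3. whether dynamic mapping is disabled (if so, the index cannot be created);
        4. whether the auto-create patterns match (if the pattern list is non-empty and the index matches none of them, it cannot be created).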
        if (resolver.hasIndexOrAlias(index, state)) {
           return false;
        }
        final AutoCreate autoCreate = this.autoCreate;
        if (autoCreate.autoCreateIndex == false) {
           throw new IndexNotFoundException("[" + AUTO_CREATE_INDEX_SETTING.getKey() + "] is [false]", index);
        }
        if (dynamicMappingDisabled) {
           throw new IndexNotFoundException("[" + MapperService.INDEX_MAPPER_DYNAMIC_SETTING.getKey() + "] is [false]", index);
        }
        if (autoCreate.expressions.isEmpty()) {
           return true;
        }
        for (Tuple<String, Boolean> expression : autoCreate.expressions) {...}
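Step 3: If no index needs to be created, call executeBulk() directly. Otherwise create the indices one by one and listen for each result. On success the counter is decremented;
        on failure the requests in the BulkRequest that target that index are set to null and the counter is decremented. Once every create-index call has finished (the counter reaches 0),
        executeBulk() is called.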
        for (String index : autoCreateIndices) {
           createIndex(index, bulkRequest.timeout(), new ActionListener<CreateIndexResponse>() {
                 @Override
                 public void onResponse(CreateIndexResponse result) {
                     if (counter.decrementAndGet() == 0) {
                         executeBulk(task, bulkRequest, startTime, listener, responses, indicesThatCannotBeCreated);
                      }
                  }
                  @Override
                  public void onFailure(Exception e) {
                     if (!(ExceptionsHelper.unwrapCause(e) instanceof ResourceAlreadyExistsException)) {
                         for (int i = 0; i < bulkRequest.requests.size(); i++) {...}
                         if (counter.decrementAndGet() == 0) {
                             executeBulk(...);
                         }}});
        }
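
The counter pattern in Step 3 can be looked at in isolation. The sketch below is plain Java with hypothetical names (`CreateThenContinue`, `Listener`, `IndexCreator`); it assumes each create call reports exactly once, and it only shows how the last callback to arrive triggers the next stage.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the "last callback continues" pattern; not Elasticsearch code. */
final class CreateThenContinue {
    private CreateThenContinue() {}

    /** Minimal stand-in for an asynchronous result listener. */
    interface Listener {
        void onResponse();
        void onFailure(Exception e);
    }

    /** Stand-in for an asynchronous "create index" call. */
    interface IndexCreator {
        void create(String index, Listener listener);
    }

    /**
     * Starts one creation per index and runs {@code next} exactly once,
     * after every creation has reported success or failure.
     */
    static void createAll(List<String> indices, IndexCreator creator, Runnable next) {
        if (indices.isEmpty()) {
            next.run(); // nothing to create, continue immediately
            return;
        }
        AtomicInteger counter = new AtomicInteger(indices.size());
        for (String index : indices) {
            creator.create(index, new Listener() {
                @Override
                public void onResponse() {
                    finishOne();
                }

                @Override
                public void onFailure(Exception e) {
                    // The excerpt above additionally nulls out the affected bulk items here.
                    finishOne();
                }

                private void finishOne() {
                    // Only the callback that brings the counter to zero continues.
                    if (counter.decrementAndGet() == 0) {
                        next.run();
                    }
                }
            });
        }
    }
}
```

Because the decrement is atomic, the callbacks may arrive on different threads and in any order, and the next stage still runs once.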

1.2.3.Shard Request

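executeBulk leads to BulkOperation#doRun. It first checks the cluster state for a ClusterBlockException (a retryable block is retried until the timeout expires), then processes the requests one by one: 1. if the request slot is null (indices that could not be auto-created were set to null in the previous step), skip it; 2. if the index is unavailable (for example, closed), skip it.
Next it branches on the operation type. For UPDATE or DELETE it works out where the request should be routed. For CREATE or INDEX it checks the mapping, routing and version, and generates an id (if none was supplied, an auto-generated UUID becomes the doc id). Any exception at this stage, such as a parse error or invalid routing, also sets that request's slot to null.
Then the requests are grouped: requests that should execute on the same shard are merged and sent to the node holding that shard. 1. Skip requests that were set to null (by the exceptions above); 2. compute the shard id and group requests with the same shard id; 3. set per-shard parameters (wait_for_active_shards, timeout and so on) and send a TransportShardBulkAction request.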

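// Compute the shardId. partitionOffset is derived from the index.routing_partition_size setting (default 1). Writing with a custom routing value can skew the distribution;
// raising this setting lets one routing value spread over more shards, which evens it out.
// routingFactor defaults to 1 and changes mainly after a split or shrink.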
final int hash = Murmur3HashFunction.hash(effectiveRouting) + partitionOffset;
return Math.floorMod(hash, indexMetaData.getRoutingNumShards()) / indexMetaData.getRoutingFactor();

// Group requests by shard:
List<BulkItemRequest> shardRequests = requestsByShard.computeIfAbsent(shardId, shard -> new ArrayList<>());
shardRequests.add(new BulkItemRequest(i, request));

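// Send the per-shard requests. The listener waits for responses, which come back per shard. If some items in a shard fail, their exceptions are recorded in the response. When all shard requests have completed (the counter reaches 0), finishHim() is called and the bulk request as a whole completes successfully: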
shardBulkAction.execute(bulkShardRequest, new ActionListener<BulkShardResponse>() {
    @Override
    public void onResponse(BulkShardResponse bulkShardResponse) {}
    @Override
    public void onFailure(Exception e) {...}
    private void finishHim() {
        listener.onResponse(new BulkResponse(responses.toArray(new BulkItemResponse[responses.length()]),
            buildTookInMillis(startTimeNanos)));
    }
});
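
As a worked example of the shard arithmetic and the grouping step, here is a small sketch in plain Java. It is an illustration under assumptions: the hash is passed in by the caller rather than computed with Murmur3HashFunction, and `ShardGrouping`, `shardId` and `groupByShard` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of shard selection and per-shard grouping; not Elasticsearch code. */
final class ShardGrouping {
    private ShardGrouping() {}

    /**
     * Same arithmetic as the excerpt above, with the hash supplied by the caller.
     * Assumption: Elasticsearch derives the hash from the routing value with Murmur3.
     */
    static int shardId(int hash, int partitionOffset, int routingNumShards, int routingFactor) {
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash + partitionOffset, routingNumShards) / routingFactor;
    }

    /** Groups item positions by target shard, preserving the original order within each shard. */
    static Map<Integer, List<Integer>> groupByShard(int[] hashes, int routingNumShards, int routingFactor) {
        Map<Integer, List<Integer>> byShard = new TreeMap<>();
        for (int i = 0; i < hashes.length; i++) {
            int shard = shardId(hashes[i], 0, routingNumShards, routingFactor);
            byShard.computeIfAbsent(shard, s -> new ArrayList<>()).add(i);
        }
        return byShard;
    }

    public static void main(String[] args) {
        // 10 routing shards with factor 2 correspond to 5 actual shards.
        System.out.println(groupByShard(new int[] {7, -3, 12, 8}, 10, 2)); // {1=[2], 3=[0, 1], 4=[3]}
    }
}
```

Hashes 7 and -3 both reduce to 7 modulo 10, so items 0 and 1 land on shard 3 and would travel in the same per-shard request.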

1.2.4.Send Request To Primary Shard

Sending the TransportShardBulkAction request follows this call chain: TransportAction#execute(Request request, ActionListener listener) -> TransportAction#execute(Task task, Request request, ActionListener listener) -> TransportAction#proceed -> TransportAction#doExecute -> TransportReplicationAction#doExecute -> TransportReplicationAction.ReroutePhase#doRun
The task's current phase is set to "routing", and the cluster state is checked for a block exception. If there is none, the node holding the primary shard is looked up. If that is not the local node, the request is forwarded to it; if the shard is on the local node, execution continues locally.

setPhase(task, "routing");
// ...
final DiscoveryNode node = state.nodes().get(primary.currentNodeId());
if (primary.currentNodeId().equals(state.nodes().getLocalNodeId())) {
    performLocalAction(state, primary, node, indexMetaData);
} else {
    performRemoteAction(state, primary, node);
}

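If the shard is on the local node, the task phase becomes "waiting_on_primary"; otherwise it becomes "rerouted". Both paths go through the same entry point, performAction(…), which sends the request with transportService.sendRequest; the receiving side handles it in messageReceived.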

// Local node: update the task phase, then send the transport request
private void performLocalAction(ClusterState state, ShardRouting primary, DiscoveryNode node, IndexMetaData indexMetaData) {
   setPhase(task, "waiting_on_primary");
   performAction(...);
}
// Remote node: check the version, update the task phase, then send the transport request
private void performRemoteAction(ClusterState state, ShardRouting primary, DiscoveryNode node) {
   // ...
   setPhase(task, "rerouted");
   performAction(node, actionName, false, request);
}
// Send the transport request
private void performAction(final DiscoveryNode node, final String action, final boolean isPrimaryAction,
                           final TransportRequest requestToPerform) {
    transportService.sendRequest(node, action, requestToPerform, transportOptions, new TransportResponseHandler<Response>(){
      // ...
    } );
}
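
The local-versus-remote decision reduces to comparing two node ids. A minimal sketch, with the hypothetical names `RerouteDecision` and `phaseFor`, and with the phase strings taken from the excerpts above:

```java
/** Hypothetical sketch of the local-versus-remote decision; not Elasticsearch code. */
final class RerouteDecision {
    private RerouteDecision() {}

    /** Returns the task phase recorded once the node holding the primary is known. */
    static String phaseFor(String primaryNodeId, String localNodeId) {
        return primaryNodeId.equals(localNodeId) ? "waiting_on_primary" : "rerouted";
    }

    public static void main(String[] args) {
        System.out.println(phaseFor("node-1", "node-1")); // waiting_on_primary
        System.out.println(phaseFor("node-2", "node-1")); // rerouted
    }
}
```

Either way the request then goes through the transport layer, which is why the excerpt funnels both branches into performAction.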

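1.3.Primary Shard Node Flow

Code entry point: TransportReplicationAction.PrimaryOperationTransportHandler#messageReceived(ConcreteShardRequest request, TransportChannel channel, Task task) {…}

1.3.1.Validate the Request

Three things are checked: 1. that this shard copy is currently the primary; 2. that the allocationId is the expected one; 3. that the primary term is the expected one.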

if (shardRouting.primary() == false) {
    throw new ReplicationOperation.RetryOnPrimaryException(...);
}
final String actualAllocationId = shardRouting.allocationId().getId();
if (actualAllocationId.equals(targetAllocationID) == false) {
    throw new ShardNotFoundException(...);
}
final long actualTerm = indexShard.getPendingPrimaryTerm();
if (actualTerm != primaryTerm) {
    throw new ShardNotFoundException(...);
}
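
The three checks guard against a request that was addressed to a primary that has since changed. The sketch below restates them in plain Java; `PrimaryRequestCheck` and `rejectReason` are hypothetical names, and it returns a reason string where the real code throws the exceptions shown above.

```java
/** Hypothetical sketch of the three primary-side checks; not Elasticsearch code. */
final class PrimaryRequestCheck {
    private PrimaryRequestCheck() {}

    /** Returns null if the request is acceptable, otherwise a short reason for rejecting it. */
    static String rejectReason(boolean isPrimary, String actualAllocationId, String expectedAllocationId,
                               long actualTerm, long expectedTerm) {
        if (!isPrimary) {
            return "not a primary";          // the excerpt throws RetryOnPrimaryException here
        }
        if (!actualAllocationId.equals(expectedAllocationId)) {
            return "allocation id mismatch"; // the excerpt throws ShardNotFoundException here
        }
        if (actualTerm != expectedTerm) {
            return "primary term mismatch";  // the excerpt throws ShardNotFoundException here
        }
        return null;
    }
}
```

For example, a request built for term 3 that arrives after a failover to term 4 is rejected by the third check instead of being applied to the new primary.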

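1.3.2.Check Whether the Primary Shard Has Relocated

1.3.2.1.If It Has Relocated

1. Set the phase to "primary_delegation"; 2. close the current shard's primaryShardReference to release resources promptly; 3. look up the relocation target node, forward the request to it and wait for the result; 4. once the result arrives, set the task phase to "finished".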

transportService.sendRequest(relocatingNode, transportPrimaryAction,
    new ConcreteShardRequest<>(request, primary.allocationId().getRelocationId(), primaryTerm),
    transportOptions,
    new TransportChannelResponseHandler<Response>(logger, channel, "rerouting indexing to target primary " + primary,
        reader) {
        @Override
        public void handleResponse(Response response) {
            setPhase(replicationTask, "finished");
            super.handleResponse(response);
        }
        @Override
        public void handleException(TransportException exp) {
            setPhase(replicationTask, "finished");
            super.handleException(exp);
        }
    });

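1.3.2.2.If It Has Not Relocated

1. Set the task phase to "primary"; 2. run the operation on the primary shard (the main part); 3. forward the request to the replica shards.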

setPhase(replicationTask, "primary");
final ActionListener<Response> listener = createResponseListener(primaryShardReference);
createReplicatedOperation(request,
        ActionListener.wrap(result -> result.respond(listener), listener::onFailure),
        primaryShardReference).execute(); // entry point

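The primary shard operation: 1. check that enough shard copies are active, that is, consult the routing table to see whether the number of active copies satisfies wait_for_active_shards; if it does not, an exception is thrown after the timeout (the default is 1, so an active primary is enough); 2. perform the write (described in detail below); 3. update the local checkpoint; 4. forward the request to the replica shards.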

// 1. Perform the write
primaryResult = primary.perform(request);
// 2. Update the local checkpoint
primary.updateLocalCheckpointForShard(primaryRouting.allocationId().getId(), primary.localCheckpoint());
// 3. Build the replica request
final ReplicaRequest replicaRequest = primaryResult.replicaRequest();
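
For readers unfamiliar with the local checkpoint updated in step 2: it is the highest sequence number below which every operation has been processed on that shard copy. The following sketch shows that idea only. `SimpleCheckpointTracker` is a hypothetical class written for this article and makes no claim about how Elasticsearch implements its tracker.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a local checkpoint; not the Elasticsearch implementation. */
final class SimpleCheckpointTracker {
    private final Set<Long> pending = new HashSet<>(); // processed seq nos above the checkpoint
    private long checkpoint = -1;                      // -1 means nothing has been processed yet

    /** Records a processed sequence number and advances the checkpoint over any contiguous run. */
    void markProcessed(long seqNo) {
        if (seqNo <= checkpoint) {
            return; // already covered by the checkpoint
        }
        pending.add(seqNo);
        while (pending.remove(checkpoint + 1)) {
            checkpoint++;
        }
    }

    long checkpoint() {
        return checkpoint;
    }
}
```

If operations 0 and 2 have been processed but 1 has not, the checkpoint stays at 0; once 1 arrives it jumps straight to 2.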