ElasticSearch Source Code Analysis, Part 2: An Overview of the Indexing Path

Kevin-林

Tags: full-text search, elasticsearch, source code, indexing

Article link: https://blog.csdn.net/ljc2008110/article/details/48653395

This post is a brief analysis of Elasticsearch's indexing logic. It only traces the main flow; some of the details may be covered in later posts.

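Suppose we call the ES index API through the Java API. The client first builds a JSON string (represented in ES as XContent, an abstraction over the content to be processed), then specifies in an IndexRequest the target index, the document type, and the document id. If no id is given, ES generates one automatically with its UUID utility; the code lives in the process method of IndexRequest.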

    if (allowIdGeneration) {
        if (id == null) {
            // No id supplied: generate one and switch the operation to CREATE
            id(UUID.randomBase64UUID());
            opType(IndexRequest.OpType.CREATE);
        }
    }
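The id generation step can be sketched on its own. The following is a minimal illustration, not the Elasticsearch implementation: `SketchIndexRequest` and `randomBase64Uuid` are hypothetical names, and the encoding (16 UUID bytes as URL-safe base64 without padding) is an assumption about what a "base64 UUID" could look like.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

/** Hypothetical stand-in for the request object; not the real IndexRequest. */
final class SketchIndexRequest {
    enum OpType { INDEX, CREATE }

    String id;                      // null means the caller did not supply an id
    OpType opType = OpType.INDEX;   // default operation type in this sketch

    /** Mirrors the shape of the process() snippet above. */
    void process(boolean allowIdGeneration) {
        if (allowIdGeneration && id == null) {
            id = randomBase64Uuid();
            // A freshly generated id is assumed not to exist yet,
            // so the request is treated as a create.
            opType = OpType.CREATE;
        }
    }

    /** Assumed encoding: the UUID's 16 bytes as URL-safe base64, no padding. */
    static String randomBase64Uuid() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }
}
```

Switching to CREATE for generated ids is a natural choice: the server does not need to treat the request as a possible update of an existing document.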

The request is then sent to the ES server over TCP by TransportService, which wraps Netty (with the REST API it goes over HTTP instead).

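On the server, the matching TransportAction handles the index request (TransportShardReplicationOperationAction). Shard-level work starts in AsyncShardOperationAction.start(): it reads the cluster state, extracts the target index and its shard information, and then takes a hash of the document id (optionally together with the type), or of the routing value if one is given, modulo the number of shards to decide which shard the document goes to.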

    private int shardId(ClusterState clusterState, String index, String type, @Nullable String id, @Nullable String routing) {
        if (routing == null) {
            if (!useType) {
                return Math.abs(hash(id) % indexMetaData(clusterState, index).numberOfShards());
            } else {
                return Math.abs(hash(type, id) % indexMetaData(clusterState, index).numberOfShards());
            }
        }
        return Math.abs(hash(routing) % indexMetaData(clusterState, index).numberOfShards());
    }
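To make the hash-and-modulo step concrete, here is a small self-contained sketch. It is an illustration under assumptions: `ShardRoutingSketch` is a hypothetical class, the DJB2-style hash is only a stand-in for whatever hash function the cluster is configured with, and the type-based variant is left out.

```java
/** Hypothetical illustration of hash-modulo shard selection. */
final class ShardRoutingSketch {

    /** DJB2-style string hash, used here only as a stand-in hash function. */
    static int hash(String value) {
        int h = 5381;
        for (int i = 0; i < value.length(); i++) {
            h = ((h << 5) + h) + value.charAt(i);   // h * 33 + c, may overflow
        }
        return h;
    }

    /** Picks a shard from the routing value if present, otherwise from the id. */
    static int shardId(String id, String routing, int numberOfShards) {
        String key = (routing != null) ? routing : id;
        // The remainder lies strictly between -numberOfShards and numberOfShards,
        // so Math.abs always yields a valid shard number.
        return Math.abs(hash(key) % numberOfShards);
    }
}
```

For example, `ShardRoutingSketch.shardId("a", null, 5)` returns 0 with this hash, and any two documents sent with the same routing value land on the same shard regardless of their ids. This also shows why the number of shards has to stay fixed for an index: changing the modulus would move existing documents to different shards.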

It then locates the primary of the target shard and submits the index request to the primary first (TransportIndexAction.shardOperationOnPrimary).

The primary first checks whether a routing value is required:

    MappingMetaData mappingMd = clusterState.metaData().index(request.index()).mappingOrDefault(request.type());
    if (mappingMd != null && mappingMd.routing().required()) {
        if (request.routing() == null) {
            throw new RoutingMissingException(request.index(), request.type(), request.id());
        }
    }

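Next it checks the operation type. There are two: INDEX and CREATE. With INDEX, if a document with the same id already exists, the request does not fail; the existing document is updated, that is, replaced by the new one. With CREATE, if the id already exists, a document-already-exists error is thrown.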

    if (request.opType() == IndexRequest.OpType.INDEX)
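The difference between the two operation types can be shown with a toy in-memory store. This is a sketch under assumptions, not the server code: `OpTypeSketch`, its `apply` method, and the exception class are hypothetical, and the version counter is simplified to "1 on first write, plus 1 on every later write".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of INDEX vs CREATE semantics. */
final class OpTypeSketch {
    enum OpType { INDEX, CREATE }

    static final class DocumentAlreadyExistsException extends RuntimeException {
        DocumentAlreadyExistsException(String id) {
            super("document already exists: " + id);
        }
    }

    private final Map<String, String> docs = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    /** Applies one operation and returns the resulting document version. */
    long apply(OpType opType, String id, String source) {
        if (opType == OpType.CREATE && docs.containsKey(id)) {
            throw new DocumentAlreadyExistsException(id);   // CREATE never replaces
        }
        docs.put(id, source);                               // INDEX replaces silently
        return versions.merge(id, 1L, Long::sum);           // 1 on first write, +1 after
    }

    String get(String id) {
        return docs.get(id);
    }
}
```

Calling `apply(OpType.INDEX, ...)` twice with the same id leaves only the second source in the store, while a second `apply(OpType.CREATE, ...)` throws and leaves the stored document untouched.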

The operation is then executed through InternalIndexShard:

    Engine.Index index = indexShard.prepareIndex(sourceToParse)
            .version(request.version())
            .versionType(request.versionType())
            .origin(Engine.Operation.Origin.PRIMARY);
    indexShard.index(index);

InternalIndexShard looks up the mapping that matches the document's type, parses the JSON source, and converts it according to that mapping into a ParsedDocument.

    public Engine.Index prepareIndex(SourceToParse source) throws ElasticSearchException {
        long startTime = System.nanoTime();
        DocumentMapper docMapper = mapperService.documentMapperWithAutoCreate(source.type());
        ParsedDocument doc = docMapper.parse(source);
        return new Engine.Index(docMapper, docMapper.uidMapper().term(doc.uid()), doc).startTime(startTime);
    }

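Finally, the relevant RobinEngine method (add or update) operates on the underlying Lucene index; at this point the document is written into Lucene's in-memory index (RobinEngine.innerIndex).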

    if (currentVersion == -1) {
        // document does not exist, we can optimize for create
        if (index.docs().size() > 1) {
            writer.addDocuments(index.docs(), index.analyzer());
        } else {
            writer.addDocument(index.docs().get(0), index.analyzer());
        }
    } else {
        if (index.docs().size() > 1) {
            writer.updateDocuments(index.uid(), index.docs(), index.analyzer());
        } else {
            writer.updateDocument(index.uid(), index.docs().get(0), index.analyzer());
        }
    }
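The add-versus-update branch can be pictured with a toy writer. This sketch does not use Lucene: `WriterSketch` is a hypothetical list-backed stand-in, and it assumes that an update behaves like "delete everything with this uid, then add", which is why the add path can skip the delete when the document is known to be new.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical list-backed stand-in for an index writer. */
final class WriterSketch {

    /** One stored entry: the uid term and the document body. */
    static final class Entry {
        final String uid;
        final String body;

        Entry(String uid, String body) {
            this.uid = uid;
            this.body = body;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Appends blindly; cheap, but only safe if the uid is not present yet. */
    void addDocument(String uid, String body) {
        entries.add(new Entry(uid, body));
    }

    /** Removes any entry with the same uid, then appends the new one. */
    void updateDocument(String uid, String body) {
        entries.removeIf(e -> e.uid.equals(uid));
        entries.add(new Entry(uid, body));
    }

    /** Same branch as the snippet above: -1 means "no current version". */
    void index(long currentVersion, String uid, String body) {
        if (currentVersion == -1) {
            addDocument(uid, body);
        } else {
            updateDocument(uid, body);
        }
    }

    /** Number of stored entries carrying this uid. */
    int count(String uid) {
        int n = 0;
        for (Entry e : entries) {
            if (e.uid.equals(uid)) {
                n++;
            }
        }
        return n;
    }

    /** Body of the most recently stored entry for this uid, or null. */
    String body(String uid) {
        String found = null;
        for (Entry e : entries) {
            if (e.uid.equals(uid)) {
                found = e.body;
            }
        }
        return found;
    }
}
```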

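After the in-memory index is updated, the operation is also appended to the Translog (an operation log for the index that records operations not yet persisted), so that a power failure before a flush does not lose indexed data.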

    Translog.Location translogLocation = translog.add(new Translog.Create(create));
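The role of the translog can be sketched as a small write-ahead log. Everything here is a simplified assumption: `TranslogSketch` is a hypothetical class, the "durable" log and the committed state are plain in-memory collections, and a crash is simulated by discarding the in-memory index and replaying the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of "in-memory index plus operation log". */
final class TranslogSketch {

    static final class Operation {
        final String id;
        final String source;

        Operation(String id, String source) {
            this.id = id;
            this.source = source;
        }
    }

    private final List<Operation> translog = new ArrayList<>();    // assumed durable
    private final Map<String, String> committed = new HashMap<>(); // assumed durable
    private Map<String, String> memoryIndex = new HashMap<>();     // lost on crash

    /** Writes to the in-memory index, then logs the operation; returns its log position. */
    int index(String id, String source) {
        memoryIndex.put(id, source);
        translog.add(new Operation(id, source));
        return translog.size() - 1;
    }

    /** Persists the in-memory state; logged operations are no longer needed. */
    void flush() {
        committed.putAll(memoryIndex);
        translog.clear();
    }

    /** Simulates a crash: rebuild from the last commit, then replay the log. */
    void crashAndRecover() {
        memoryIndex = new HashMap<>(committed);
        for (Operation op : translog) {
            memoryIndex.put(op.id, op.source);
        }
    }

    String get(String id) {
        return memoryIndex.get(id);
    }

    int pendingOperations() {
        return translog.size();
    }
}
```

A document indexed after the last flush survives `crashAndRecover()` only because its operation is still in the log, which is the guarantee the paragraph above describes.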

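Once the primary has finished indexing, the request is forwarded to the replicas, which perform the same index operation. Finally, the success response is returned to the client.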
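The primary-then-replica ordering can be summarized in a few lines. This is only a sketch of the ordering: `ReplicationSketch` and `ShardCopy` are hypothetical names, the copies are in-memory maps, and the replicas are called one after another, whereas a real cluster sends these requests over the transport layer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of "index on the primary, then on each replica". */
final class ReplicationSketch {

    static final class ShardCopy {
        final String name;
        final Map<String, String> docs = new HashMap<>();

        ShardCopy(String name) {
            this.name = name;
        }

        void index(String id, String source) {
            docs.put(id, source);
        }
    }

    /** Returns the names of the copies that applied the operation, in order. */
    static List<String> indexOnGroup(ShardCopy primary, List<ShardCopy> replicas,
                                     String id, String source) {
        List<String> acknowledged = new ArrayList<>();
        primary.index(id, source);              // the primary always goes first
        acknowledged.add(primary.name);
        for (ShardCopy replica : replicas) {    // then the same operation on replicas
            replica.index(id, source);
            acknowledged.add(replica.name);
        }
        return acknowledged;                    // only now would the client be answered
    }
}
```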
