solr-6.1.0 Source Code Analysis, Part 4

solr-6.1.0 Source Code Analysis: Creating an Index

This chapter assumes that a nutch client sends an HTTP request through the solrindex command, as follows:

bin/nutch solrindex http://127.0.0.1/solr/test crawl/crawldb -linkdb crawl/linkdb -dir crawl/segments/ -filter -normalize

In the request URL, test is the name of the Solr core. The request parameters are:

wt=javabin&version=2

Once the request reaches the server, the Tomcat framework invokes the doFilter function of SolrDispatchFilter, which in turn calls the call function of HttpSolrCall.

SolrDispatchFilter::doFilter->HttpSolrCall::call

  public Action call() throws IOException {
    ...

    init();

    HttpServletResponse resp = response;
    switch (action) {
      case ADMIN:
        ...
      case REMOTEQUERY:
        ...
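      case PROCESS:
        // Resolve the HTTP method (GET, POST, ...) of the incoming request.
        final Method reqMethod = Method.getMethod(req.getMethod());
        SolrQueryResponse solrRsp = new SolrQueryResponse();
        // Run the handler selected in init(); the main work happens here.
        execute(solrRsp);
        QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
        writeResponse(solrRsp, responseWriter, reqMethod);
        return RETURN;
      ...
    }
    ...
  }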

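init prepares the request for processing, and getMethod returns the HTTP method, such as GET or POST. call then creates a SolrQueryResponse to hold the result, and execute invokes the matching handler, which contains the main request-processing logic. getQueryResponseWriter selects BinaryResponseWriter based on the wt=javabin request parameter, and writeResponse sends the result back to the client.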

SolrDispatchFilter::doFilter->HttpSolrCall::call->init

  private void init() throws Exception {

    corename = path.substring(1, idx);
    core = cores.getCore(corename);
    path = path.substring(idx);

    config = core.getSolrConfig();
    SolrRequestParsers parser = config.getRequestParsers();

    extractHandlerFromURLPath(parser);
    solrReq = parser.parse(core, path, req);
    action = PROCESS;
    return;
  }

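getCore returns the SolrCore that was created in the init function of SolrDispatchFilter. getSolrConfig returns that core's configuration, SolrConfig, and getRequestParsers returns its request parser, SolrRequestParsers. extractHandlerFromURLPath looks up the handler class from the request path; the path here is /update, so the result is UpdateRequestHandler. Finally, the parse function of SolrRequestParsers creates a SolrQueryRequest that wraps the HttpServletRequest.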
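
The excerpt above does not show how idx is computed. As a minimal sketch, assuming idx is the position of the second '/' in the servlet path, the split into core name and handler path could look like this. CorePathSketch, Parsed and parse are hypothetical names used only for illustration; they are not part of Solr.

```java
public class CorePathSketch {
    /** Hypothetical holder for the two values init() derives from the path. */
    static final class Parsed {
        final String coreName;
        final String handlerPath;

        Parsed(String coreName, String handlerPath) {
            this.coreName = coreName;
            this.handlerPath = handlerPath;
        }
    }

    /** Splits "/core/handler..." into the core name and the remaining handler path. */
    static Parsed parse(String path) {
        // Assumption: idx is the index of the second '/', searched from position 1.
        int idx = path.indexOf('/', 1);
        if (idx < 0) {
            // No handler part at all; treat the whole remainder as the core name.
            return new Parsed(path.substring(1), "/");
        }
        return new Parsed(path.substring(1, idx), path.substring(idx));
    }

    public static void main(String[] args) {
        Parsed p = parse("/test/update");
        System.out.println(p.coreName + " " + p.handlerPath); // prints "test /update"
    }
}
```

With the request from this chapter, the core name is test and the handler path is /update, which is the path that extractHandlerFromURLPath works with.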

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute

  protected void execute(SolrQueryResponse rsp) {
    solrReq.getContext().put("webapp", req.getContextPath());
    solrReq.getCore().execute(handler, solrReq, rsp);
  }

  public void execute(SolrRequestHandler handler, SolrQueryRequest req, SolrQueryResponse rsp) {

    ...

    handler.handleRequest(req,rsp);

    ...

  }

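execute in turn calls the execute function of SolrCore, whose main job is to call handleRequest on the SolrRequestHandler mapped to the request path. This chapter assumes that handler is UpdateRequestHandler, so its handleRequest function is examined next.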

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest

  public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {

    ...

    handleRequestBody( req, rsp );

    ...
  }

  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    SolrParams params = req.getParams();
    UpdateRequestProcessorChain processorChain =
        req.getCore().getUpdateProcessorChain(params);

    UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);
    ContentStreamLoader documentLoader = newLoader(req, processor);

    Iterable<ContentStream> streams = req.getContentStreams();
    for (ContentStream stream : streams) {
      documentLoader.load(req, rsp, stream, processor);
    }    
  }

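handleRequest mainly delegates to handleRequestBody. getUpdateProcessorChain returns the UpdateRequestProcessorChain created in the SolrCore constructor. createProcessor creates the processor for this request. newLoader returns a ContentStreamLoader. Finally, the load function of that ContentStreamLoader is called for each content stream in the request.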

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->UpdateRequestProcessorChain::createProcessor

  public UpdateRequestProcessor createProcessor(SolrQueryRequest req, 
                                                SolrQueryResponse rsp) 
  {
    UpdateRequestProcessor processor = null;
    UpdateRequestProcessor last = null;

    for (int i = chain.size() - 1; i >= 0; i--) {
      UpdateRequestProcessorFactory factory = chain.get(i);
      processor = factory.getInstance(req, rsp, last);
      last = processor == null ? last : processor;
    }

    return last;
  }

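The default chain contains LogUpdateProcessorFactory, DistributedUpdateProcessorFactory and RunUpdateProcessorFactory. Because the loop walks the list from the end, RunUpdateProcessorFactory is called first and its getInstance function returns a RunUpdateProcessor; DistributedUpdateProcessorFactory then returns a DistributedUpdateProcessor, and LogUpdateProcessorFactory returns a LogUpdateProcessor. Each instance receives the previously built one as its next processor, so the value returned is the head of the chain.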
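
The backwards loop is easier to follow in isolation. The following is a minimal sketch of the same construction with hypothetical stand-in types (Processor, Factory and named are not Solr classes). It shows that building from the last factory to the first yields a head processor that runs in list order, and that a factory returning null is simply skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProcessorChainSketch {
    /** Hypothetical stand-in for UpdateRequestProcessor. */
    static class Processor {
        final String name;
        final Processor next;

        Processor(String name, Processor next) {
            this.name = name;
            this.next = next;
        }

        void processAdd(String doc, List<String> trace) {
            trace.add(name);                          // record that this processor ran
            if (next != null) next.processAdd(doc, trace);
        }
    }

    /** Hypothetical stand-in for UpdateRequestProcessorFactory. */
    interface Factory {
        Processor getInstance(Processor next);
    }

    static Factory named(String name) {
        return next -> new Processor(name, next);
    }

    /** Same loop shape as createProcessor: build from the tail towards the head. */
    static Processor createProcessor(List<Factory> chain) {
        Processor last = null;
        for (int i = chain.size() - 1; i >= 0; i--) {
            Processor p = chain.get(i).getInstance(last);
            last = (p == null) ? last : p;            // a null result leaves the chain unchanged
        }
        return last;
    }

    /** Runs one add through the chain and returns the order in which processors ran. */
    static List<String> run(List<Factory> chain) {
        List<String> trace = new ArrayList<>();
        Processor head = createProcessor(chain);
        if (head != null) head.processAdd("doc-1", trace);
        return trace;
    }

    public static void main(String[] args) {
        List<Factory> chain = Arrays.asList(named("Log"), named("Distributed"), named("Run"));
        System.out.println(run(chain)); // prints [Log, Distributed, Run]
    }
}
```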

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load

    public void load(SolrQueryRequest req, SolrQueryResponse rsp,
        ContentStream stream, UpdateRequestProcessor processor) throws Exception {

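      String type = stream.getContentType();
      // Strip parameters such as "; charset=UTF-8" so only the media type remains.
      int idx = type.indexOf(';');
      if (idx > 0) {
        type = type.substring(0, idx);
      }
      ContentStreamLoader loader = loaders.get(type);
      loader.load(req, rsp, stream, processor);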
    }

The loader is chosen by content type. The loaders are registered as follows:

  protected Map<String,ContentStreamLoader> createDefaultLoaders(NamedList args) {
    SolrParams p = null;
    if(args!=null) {
      p = SolrParams.toSolrParams(args);
    }
    Map<String,ContentStreamLoader> registry = new HashMap<>();
    registry.put("application/xml", new XMLLoader().init(p) );
    registry.put("application/json", new JsonLoader().init(p) );
    registry.put("application/csv", new CSVLoader().init(p) );
    registry.put("application/javabin", new JavabinLoader().init(p) );
    registry.put("text/csv", registry.get("application/csv") );
    registry.put("text/xml", registry.get("application/xml") );
    registry.put("text/json", registry.get("application/json"));

    pathVsLoaders.put(JSON_PATH,registry.get("application/json"));
    pathVsLoaders.put(DOC_PATH,registry.get("application/json"));
    pathVsLoaders.put(CSV_PATH,registry.get("application/csv"));
    return registry;
  }

Assuming the type here is application/xml, the XMLLoader is selected and its load function is called to process the request.
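
The lookup can be sketched on its own, with loader names as plain strings instead of real loader objects. LoaderRegistrySketch, baseType and resolve are hypothetical names; the trim call and the explicit error for an unknown type are assumptions added for the sketch and are not shown in the excerpt above.

```java
import java.util.HashMap;
import java.util.Map;

public class LoaderRegistrySketch {
    // Content type -> loader name; strings stand in for the real loader objects.
    static final Map<String, String> LOADERS = new HashMap<>();
    static {
        LOADERS.put("application/xml", "XMLLoader");
        LOADERS.put("application/json", "JsonLoader");
        LOADERS.put("application/csv", "CSVLoader");
        LOADERS.put("application/javabin", "JavabinLoader");
        // Aliases share the loader of their canonical type.
        LOADERS.put("text/csv", LOADERS.get("application/csv"));
        LOADERS.put("text/xml", LOADERS.get("application/xml"));
        LOADERS.put("text/json", LOADERS.get("application/json"));
    }

    /** Drops parameters such as "; charset=UTF-8" from a Content-Type value. */
    static String baseType(String contentType) {
        int idx = contentType.indexOf(';');
        String type = idx > 0 ? contentType.substring(0, idx) : contentType;
        return type.trim(); // assumption: tolerate stray whitespace
    }

    /** Returns the loader name for a Content-Type, or fails for unknown types. */
    static String resolve(String contentType) {
        String loader = LOADERS.get(baseType(contentType));
        if (loader == null) {
            throw new IllegalArgumentException("Unsupported ContentType: " + contentType);
        }
        return loader;
    }

    public static void main(String[] args) {
        System.out.println(resolve("application/xml; charset=UTF-8")); // prints XMLLoader
        System.out.println(resolve("text/json"));                      // prints JsonLoader
    }
}
```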

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load->XMLLoader::load

  public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream, UpdateRequestProcessor processor) throws Exception {
    final String charset = ContentStreamBase.getCharsetFromContentType(stream.getContentType());
    InputStream is = stream.getStream();

    XMLStreamReader parser = inputFactory.createXMLStreamReader(is, charset);
    this.processUpdate(req, processor, parser);
  }

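load first obtains the character encoding and the input stream of the data, then creates an XMLStreamReader with createXMLStreamReader, and finally calls processUpdate to process the request.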

SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load->XMLLoader::load->processUpdate

  void processUpdate(SolrQueryRequest req, UpdateRequestProcessor processor, XMLStreamReader parser)
          throws XMLStreamException, IOException, FactoryConfigurationError {
    AddUpdateCommand addCmd = null;
    SolrParams params = req.getParams();
    while (true) {
      int event = parser.next();
      switch (event) {
        case XMLStreamConstants.END_DOCUMENT:
          parser.close();
          return;

        case XMLStreamConstants.START_ELEMENT:
          String currTag = parser.getLocalName();
          if (currTag.equals(UpdateRequestHandler.ADD)) {
            addCmd = new AddUpdateCommand(req);
            addCmd.commitWithin = params.getInt(UpdateParams.COMMIT_WITHIN, -1);
            addCmd.overwrite = params.getBool(UpdateParams.OVERWRITE, true);

            for (int i = 0; i < parser.getAttributeCount(); i++) {
              String attrName = parser.getAttributeLocalName(i);
              String attrVal = parser.getAttributeValue(i);
              if (UpdateRequestHandler.OVERWRITE.equals(attrName)) {
                addCmd.overwrite = StrUtils.parseBoolean(attrVal);
              } else if (UpdateRequestHandler.COMMIT_WITHIN.equals(attrName)) {
                addCmd.commitWithin = Integer.parseInt(attrVal);
              }
            }

          } else if ("doc".equals(currTag)) {
            if(addCmd != null) {
              addCmd.clear();
              addCmd.solrDoc = readDoc(parser);
              processor.processAdd(addCmd);
            }
          } else if (UpdateRequestHandler.COMMIT.equals(currTag) || UpdateRequestHandler.OPTIMIZE.equals(currTag)) {
            ...
          }
          else if (UpdateRequestHandler.ROLLBACK.equals(currTag)) {
            ...
          }
          else if (UpdateRequestHandler.DELETE.equals(currTag)) {
            ...
          }
          break;
      }
    }
  }

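processUpdate iterates over the elements of the request. Assuming the element is add, it creates an AddUpdateCommand and fills in the related parameters from the request. It then continues through the request until it finds a doc element, parses that element into a SolrInputDocument, stores it in the AddUpdateCommand, and calls processAdd on the processor to add the document to the lucene index. The following assumes the processor chain is LogUpdateProcessor, DistributedUpdateProcessor and RunUpdateProcessor, in that order. Of these, the processAdd function of RunUpdateProcessor does the actual work of adding to the lucene index.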
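
The event loop can be tried outside Solr with the JDK's own StAX API. The sketch below is a simplified, hypothetical reader (AddXmlSketch and parse are not Solr code): it only understands add, doc and field elements and collects each doc as a map of field name to value, where the real code builds a SolrInputDocument and calls processAdd.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AddXmlSketch {
    /** Parses an add message and returns one field map per doc element. */
    static List<Map<String, String>> parse(String xml) {
        List<Map<String, String>> docs = new ArrayList<>();
        try {
            XMLStreamReader parser =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            boolean inAdd = false;              // true once an add element has been seen
            Map<String, String> current = null; // the doc currently being read, if any
            while (parser.hasNext()) {
                int event = parser.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String tag = parser.getLocalName();
                    if ("add".equals(tag)) {
                        inAdd = true;
                    } else if ("doc".equals(tag) && inAdd) {
                        current = new LinkedHashMap<>();
                    } else if ("field".equals(tag) && current != null) {
                        String name = parser.getAttributeValue(null, "name");
                        // getElementText consumes the text and the closing field tag.
                        current.put(name, parser.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "doc".equals(parser.getLocalName()) && current != null) {
                    docs.add(current);          // the real code calls processAdd at this point
                    current = null;
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed update XML", e);
        }
        return docs;
    }

    public static void main(String[] args) {
        String xml = "<add><doc><field name=\"id\">http://example.com/</field>"
                   + "<field name=\"title\">Example</field></doc></add>";
        System.out.println(parse(xml)); // prints [{id=http://example.com/, title=Example}]
    }
}
```

A doc element that appears outside an add element is ignored here, mirroring the addCmd != null check in processUpdate.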

LogUpdateProcessor::processAdd

    public void processAdd(AddUpdateCommand cmd) throws IOException {

      ...

      if (next != null) next.processAdd(cmd);

      ...

    }

LogUpdateProcessor's processAdd does no significant work of its own. Its next is DistributedUpdateProcessor, so it goes on to call DistributedUpdateProcessor's processAdd.

DistributedUpdateProcessor::processAdd

  public void processAdd(AddUpdateCommand cmd) throws IOException {

    ...

    versionAdd(cmd);

    ...

  }

  protected boolean versionAdd(AddUpdateCommand cmd) throws IOException {

    ...

    doLocalAdd(cmd);

    ...

  }

The omitted part of versionAdd assigns a version to the document. It then calls doLocalAdd, which hands the command to the next processor, RunUpdateProcessor, by calling its processAdd function.

RunUpdateProcessor::processAdd

  public void processAdd(AddUpdateCommand cmd) throws IOException {
    updateHandler.addDoc(cmd);
    super.processAdd(cmd);
  }

updateHandler is a DirectUpdateHandler2, and its addDoc function adds the document to the lucene index.
The parent class's processAdd then calls processAdd on the next processor, or simply returns if there is none.

RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc

  public int addDoc(AddUpdateCommand cmd) throws IOException {
    return addDoc0(cmd);
  }

  private int addDoc0(AddUpdateCommand cmd) throws IOException {

    ...

    doNormalUpdate(cmd);

    ...
  }

addDoc in turn calls doNormalUpdate, via addDoc0, to add the document.

RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc->doNormalUpdate

  private void doNormalUpdate(AddUpdateCommand cmd) throws IOException {
    Term updateTerm;
    Term idTerm = new Term(idField.getName(), cmd.getIndexedId());
    boolean del = false;
    updateTerm = idTerm;

    RefCounted<IndexWriter> iw = solrCoreState.getIndexWriter(core);
    IndexWriter writer = iw.get();
    Document luceneDocument = cmd.getLuceneDocument();
    writer.updateDocument(updateTerm, luceneDocument);
  }

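idTerm is built from the schema's unique key field, which here usually holds the URL, so a later document with the same URL replaces the earlier one. getIndexWriter and get together return the SolrIndexWriter. getLuceneDocument copies the data from the SolrInputDocument into a lucene Document. The updateDocument function of SolrIndexWriter then writes that document to the lucene index; internally it calls updateDocument on lucene's DocumentsWriter.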
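
The overwrite behaviour follows from how updateDocument works: it deletes every document that contains the given term and then adds the new one. The following is only a toy model of that semantics using an in-memory list; UpdateDocumentSketch and its methods are hypothetical and say nothing about how lucene stores or deletes documents internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UpdateDocumentSketch {
    // Toy "index": each document is a map of field name to value.
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Delete every document whose field equals value, then add the new document. */
    void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(doc);
    }

    int size() {
        return docs.size();
    }

    /** Returns the title of every stored document, in insertion order. */
    List<String> titles() {
        List<String> out = new ArrayList<>();
        for (Map<String, String> d : docs) out.add(d.get("title"));
        return out;
    }

    /** Convenience builder for a two-field document. */
    static Map<String, String> doc(String id, String title) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("id", id);
        d.put("title", title);
        return d;
    }

    public static void main(String[] args) {
        UpdateDocumentSketch index = new UpdateDocumentSketch();
        index.updateDocument("id", "http://a/", doc("http://a/", "first crawl"));
        index.updateDocument("id", "http://a/", doc("http://a/", "second crawl"));
        index.updateDocument("id", "http://b/", doc("http://b/", "other page"));
        System.out.println(index.titles()); // prints [second crawl, other page]
    }
}
```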

RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc->doNormalUpdate->AddUpdateCommand::getLuceneDocument

   public Document getLuceneDocument() {
     return DocumentBuilder.toDocument(getSolrInputDocument(), req.getSchema());
   }

  public static Document toDocument( SolrInputDocument doc, IndexSchema schema ){ 
    Document out = new Document();

    for( SolrInputField field : doc ) {
      String name = field.getName();
      SchemaField sfield = schema.getFieldOrNull(name);    

      for( Object v : field ) {
        addField(out, sfield, v, 1f);
        usedFields.add(sfield.getName());
      }
    }

    ...

    return out;
  }

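getFieldOrNull looks up the field's configuration from the managed-schema file. addField adds a field to the lucene document. The loop walks through each field of the SolrInputDocument and, according to the configuration in managed-schema, adds it to the lucene Document.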
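
As a rough illustration of the loop, the sketch below converts a multi-valued input document into a flat list of name=value entries, one per value, and consults a set of known field names in place of the schema. ToDocumentSketch is hypothetical, and rejecting a field that the schema does not know is an assumption made for the sketch; the excerpt above does not show what happens when getFieldOrNull returns null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ToDocumentSketch {
    /** Emits one "name=value" entry per value of each input field known to the schema. */
    static List<String> toDocument(Map<String, List<String>> input, Set<String> schemaFields) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> field : input.entrySet()) {
            String name = field.getKey();
            if (!schemaFields.contains(name)) {
                // Assumption: an unknown field is treated as an error.
                throw new IllegalArgumentException("unknown field '" + name + "'");
            }
            for (String v : field.getValue()) {
                out.add(name + "=" + v); // one output field per value
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("id", Arrays.asList("http://example.com/"));
        input.put("anchor", Arrays.asList("home", "start"));
        Set<String> schema = new HashSet<>(Arrays.asList("id", "anchor"));
        System.out.println(toDocument(input, schema));
        // prints [id=http://example.com/, anchor=home, anchor=start]
    }
}
```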
