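Solr 6.1.0 Source Code Analysis: Creating an Index
This chapter assumes that a Nutch client sends an HTTP request through the solrindex command. The command is as follows: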
bin/nutch solrindex http://127.0.0.1/solr/test crawl/crawldb -linkdb crawl/linkdb -dir crawl/segments/ -filter -normalize
The test segment in the request URL is the name of the Solr core. The request parameters are as follows:
wt=javabin&version=2
After the request reaches the server, the Tomcat container invokes SolrDispatchFilter's doFilter function, which in turn calls HttpSolrCall's call function.
SolrDispatchFilter::doFilter->HttpSolrCall::call
public Action call() throws IOException {
...
init();
HttpServletResponse resp = response;
switch (action) {
case ADMIN:
...
case REMOTEQUERY:
...
case PROCESS:
final Method reqMethod = Method.getMethod(req.getMethod());
SolrQueryResponse solrRsp = new SolrQueryResponse();
execute(solrRsp);
QueryResponseWriter responseWriter = core.getQueryResponseWriter(solrReq);
writeResponse(solrRsp, responseWriter, reqMethod);
return RETURN;
}
}
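The init function prepares for request handling, and getMethod returns the request method, for example GET or POST. The call function creates a SolrQueryResponse to hold the result. The execute function invokes the matching handler and contains the main request-processing logic. getQueryResponseWriter selects BinaryResponseWriter based on the request parameter wt=javabin, and writeResponse then returns the result to the client.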
SolrDispatchFilter::doFilter->HttpSolrCall::call->init
private void init() throws Exception {
corename = path.substring(1, idx);
core = cores.getCore(corename);
path = path.substring(idx);
config = core.getSolrConfig();
SolrRequestParsers parser = config.getRequestParsers();
extractHandlerFromURLPath(parser);
solrReq = parser.parse(core, path, req);
action = PROCESS;
return;
}
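getCore returns the SolrCore that was created in SolrDispatchFilter's init function. getSolrConfig returns the core's configuration, SolrConfig, from which getRequestParsers obtains the request parser, SolrRequestParsers. extractHandlerFromURLPath resolves the handler class from the request path. Here the path is /update, so the result is UpdateRequestHandler. Finally, SolrRequestParsers's parse function creates a SolrQueryRequest that wraps the HttpServletRequest.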
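The path handling in init can be illustrated with a small standalone sketch. The class and method names below (CorePathSketch, coreName, handlerPath) are hypothetical and are not part of Solr; the sketch only assumes that the servlet path has the form /core/handler, as in /test/update.

```java
// Hypothetical helper, not Solr code: splits a servlet path such as
// "/test/update" into the core name and the handler path.
final class CorePathSketch {

    // Returns the text between the leading slash and the second slash.
    static String coreName(String path) {
        int idx = path.indexOf('/', 1);
        return idx < 0 ? path.substring(1) : path.substring(1, idx);
    }

    // Returns everything from the second slash onward, or "/" if there is none.
    static String handlerPath(String path) {
        int idx = path.indexOf('/', 1);
        return idx < 0 ? "/" : path.substring(idx);
    }

    public static void main(String[] args) {
        String path = "/test/update";
        System.out.println(coreName(path) + " " + handlerPath(path));
    }
}
```

With the path /test/update, the core name is test and the remaining path /update is what the handler lookup sees.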
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute
protected void execute(SolrQueryResponse rsp) {
solrReq.getContext().put("webapp", req.getContextPath());
solrReq.getCore().execute(handler, solrReq, rsp);
}
public void execute(SolrRequestHandler handler, SolrQueryRequest req, SolrQueryResponse rsp) {
...
handler.handleRequest(req,rsp);
...
}
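execute then calls SolrCore's execute function, whose main job is to call handleRequest on the SolrRequestHandler registered for the request path. This chapter assumes that the SolrRequestHandler is UpdateRequestHandler, so its handleRequest function is examined next.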
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest
public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
...
handleRequestBody( req, rsp );
...
}
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
SolrParams params = req.getParams();
UpdateRequestProcessorChain processorChain =
req.getCore().getUpdateProcessorChain(params);
UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);
ContentStreamLoader documentLoader = newLoader(req, processor);
Iterable<ContentStream> streams = req.getContentStreams();
for (ContentStream stream : streams) {
documentLoader.load(req, rsp, stream, processor);
}
}
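handleRequest mainly delegates to handleRequestBody. getUpdateProcessorChain returns the UpdateRequestProcessorChain created in the SolrCore constructor, and createProcessor creates the processor that handles this request. newLoader returns a ContentStreamLoader, whose load function is then called for each content stream in the request.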
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->UpdateRequestProcessorChain::createProcessor
public UpdateRequestProcessor createProcessor(SolrQueryRequest req,
SolrQueryResponse rsp)
{
UpdateRequestProcessor processor = null;
UpdateRequestProcessor last = null;
for (int i = chain.size() - 1; i >= 0; i--) {
UpdateRequestProcessorFactory factory = chain.get(i);
processor = factory.getInstance(req, rsp, last);
last = processor == null ? last : processor;
}
return last;
}
The default chain contains LogUpdateProcessorFactory, DistributedUpdateProcessorFactory, and RunUpdateProcessorFactory. Their getInstance functions return LogUpdateProcessor, DistributedUpdateProcessor, and RunUpdateProcessor respectively. Because createProcessor walks the chain from the last factory to the first and passes each processor it has just created as the next of the one created after it, the processor it returns is the head of the chain.
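The reverse-order construction in createProcessor can be reproduced with a minimal sketch. Processor and Factory below are simplified stand-ins for UpdateRequestProcessor and UpdateRequestProcessorFactory, not the Solr classes themselves, and the three names are just labels used for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ProcessorChainSketch {

    // Simplified stand-in for UpdateRequestProcessor: records its name, then delegates.
    static class Processor {
        final String name;
        final Processor next;

        Processor(String name, Processor next) {
            this.name = name;
            this.next = next;
        }

        void processAdd(List<String> trace) {
            trace.add(name);
            if (next != null) next.processAdd(trace);
        }
    }

    // Simplified stand-in for UpdateRequestProcessorFactory.
    interface Factory {
        Processor getInstance(Processor next);
    }

    // Same loop shape as the excerpt: build from the tail so each processor knows its successor.
    static Processor createProcessor(List<Factory> chain) {
        Processor last = null;
        for (int i = chain.size() - 1; i >= 0; i--) {
            Processor p = chain.get(i).getInstance(last);
            last = (p == null) ? last : p;
        }
        return last;
    }

    // Builds a three-element chain and returns the order in which processAdd is invoked.
    static List<String> run() {
        List<Factory> chain = Arrays.asList(
                next -> new Processor("Log", next),
                next -> new Processor("Distributed", next),
                next -> new Processor("Run", next));
        List<String> trace = new ArrayList<>();
        createProcessor(chain).processAdd(trace);
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [Log, Distributed, Run]
    }
}
```

Building from the tail means no processor needs a setter for its successor: the next reference can be passed to the constructor and stay final.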
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load
public void load(SolrQueryRequest req, SolrQueryResponse rsp,
ContentStream stream, UpdateRequestProcessor processor) throws Exception {
type = stream.getContentType();
int idx = type.indexOf(';');
if (idx > 0) {
type = type.substring(0, idx);
}
loader = loaders.get(type);
loader.load(req, rsp, stream, processor);
}
The loader is selected by content type. The loaders map is initialized as follows:
protected Map<String,ContentStreamLoader> createDefaultLoaders(NamedList args) {
SolrParams p = null;
if(args!=null) {
p = SolrParams.toSolrParams(args);
}
Map<String,ContentStreamLoader> registry = new HashMap<>();
registry.put("application/xml", new XMLLoader().init(p) );
registry.put("application/json", new JsonLoader().init(p) );
registry.put("application/csv", new CSVLoader().init(p) );
registry.put("application/javabin", new JavabinLoader().init(p) );
registry.put("text/csv", registry.get("application/csv") );
registry.put("text/xml", registry.get("application/xml") );
registry.put("text/json", registry.get("application/json"));
pathVsLoaders.put(JSON_PATH,registry.get("application/json"));
pathVsLoaders.put(DOC_PATH,registry.get("application/json"));
pathVsLoaders.put(CSV_PATH,registry.get("application/csv"));
return registry;
}
Assuming the content type here is application/xml, the lookup returns XMLLoader, whose load function is then called to handle the request.
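The lookup can be sketched in isolation as follows. LoaderLookupSketch is a hypothetical class, and the map stores loader names as plain strings rather than real ContentStreamLoader instances; only the parameter stripping and the alias entries mirror the excerpts above.

```java
import java.util.HashMap;
import java.util.Map;

final class LoaderLookupSketch {

    // Content type -> loader name; strings stand in for the real loader objects.
    static final Map<String, String> LOADERS = new HashMap<>();
    static {
        LOADERS.put("application/xml", "XMLLoader");
        LOADERS.put("application/json", "JsonLoader");
        LOADERS.put("application/csv", "CSVLoader");
        LOADERS.put("application/javabin", "JavabinLoader");
        // Aliases point at the same loader as the canonical type.
        LOADERS.put("text/csv", LOADERS.get("application/csv"));
        LOADERS.put("text/xml", LOADERS.get("application/xml"));
        LOADERS.put("text/json", LOADERS.get("application/json"));
    }

    // Drops parameters such as "; charset=UTF-8", as the load function does.
    static String baseType(String contentType) {
        int idx = contentType.indexOf(';');
        return idx > 0 ? contentType.substring(0, idx) : contentType;
    }

    // Returns the loader name for a Content-Type header value, or null if none is registered.
    static String loaderFor(String contentType) {
        return LOADERS.get(baseType(contentType));
    }

    public static void main(String[] args) {
        System.out.println(loaderFor("application/xml; charset=UTF-8")); // prints XMLLoader
    }
}
```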
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load->XMLLoader::load
public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream, UpdateRequestProcessor processor) throws Exception {
final String charset = ContentStreamBase.getCharsetFromContentType(stream.getContentType());
InputStream is = stream.getStream();
XMLStreamReader parser = inputFactory.createXMLStreamReader(is, charset);
this.processUpdate(req, processor, parser);
}
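The function first obtains the character encoding and the input stream of the data, then creates an XMLStreamReader with createXMLStreamReader, and finally calls processUpdate to handle the request.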
SolrDispatchFilter::doFilter->HttpSolrCall::call->execute->SolrCore::execute->UpdateRequestHandler::handleRequest->handleRequestBody->ContentStreamLoader::load->XMLLoader::load->processUpdate
void processUpdate(SolrQueryRequest req, UpdateRequestProcessor processor, XMLStreamReader parser)
throws XMLStreamException, IOException, FactoryConfigurationError {
AddUpdateCommand addCmd = null;
SolrParams params = req.getParams();
while (true) {
int event = parser.next();
switch (event) {
case XMLStreamConstants.END_DOCUMENT:
parser.close();
return;
case XMLStreamConstants.START_ELEMENT:
String currTag = parser.getLocalName();
if (currTag.equals(UpdateRequestHandler.ADD)) {
addCmd = new AddUpdateCommand(req);
addCmd.commitWithin = params.getInt(UpdateParams.COMMIT_WITHIN, -1);
addCmd.overwrite = params.getBool(UpdateParams.OVERWRITE, true);
for (int i = 0; i < parser.getAttributeCount(); i++) {
String attrName = parser.getAttributeLocalName(i);
String attrVal = parser.getAttributeValue(i);
if (UpdateRequestHandler.OVERWRITE.equals(attrName)) {
addCmd.overwrite = StrUtils.parseBoolean(attrVal);
} else if (UpdateRequestHandler.COMMIT_WITHIN.equals(attrName)) {
addCmd.commitWithin = Integer.parseInt(attrVal);
}
}
} else if ("doc".equals(currTag)) {
if(addCmd != null) {
addCmd.clear();
addCmd.solrDoc = readDoc(parser);
processor.processAdd(addCmd);
}
} else if (UpdateRequestHandler.COMMIT.equals(currTag) || UpdateRequestHandler.OPTIMIZE.equals(currTag)) {
...
}
else if (UpdateRequestHandler.ROLLBACK.equals(currTag)) {
...
}
else if (UpdateRequestHandler.DELETE.equals(currTag)) {
...
}
break;
}
}
}
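processUpdate iterates over the elements of the request. Assuming the element is add, it creates an AddUpdateCommand and fills in the related parameters from the request. It then continues through the request until it finds a doc element, parses that element into a SolrInputDocument, attaches it to the AddUpdateCommand, and calls the processor's processAdd function to add the document to the Lucene index. The following assumes that the processor chain is, in order, LogUpdateProcessor, DistributedUpdateProcessor, and RunUpdateProcessor. Of these, RunUpdateProcessor's processAdd function does the actual work of adding to the Lucene index.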
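The same event loop can be tried outside Solr with the JDK's StAX API. The sketch below is not XMLLoader: AddXmlSketch is a hypothetical class, it assumes the usual add/doc/field message shape, and it collects each doc into a plain map (so a repeated field name keeps only its last value) instead of building a SolrInputDocument.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class AddXmlSketch {

    // Parses <add><doc><field name="...">...</field></doc></add> into one map per doc.
    static List<Map<String, String>> parse(String xml) {
        List<Map<String, String>> docs = new ArrayList<>();
        try {
            XMLStreamReader parser =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            Map<String, String> current = null;
            while (parser.hasNext()) {
                int event = parser.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String tag = parser.getLocalName();
                    if ("doc".equals(tag)) {
                        current = new LinkedHashMap<>();
                    } else if ("field".equals(tag) && current != null) {
                        String name = parser.getAttributeValue(null, "name");
                        // getElementText consumes the text up to the closing </field>.
                        current.put(name, parser.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "doc".equals(parser.getLocalName())) {
                    docs.add(current);
                    current = null;
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed update message", e);
        }
        return docs;
    }

    public static void main(String[] args) {
        String xml = "<add><doc><field name=\"id\">http://example.com/</field>"
                + "<field name=\"title\">Example</field></doc></add>";
        System.out.println(parse(xml));
    }
}
```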
LogUpdateProcessor::processAdd
public void processAdd(AddUpdateCommand cmd) throws IOException {
...
if (next != null) next.processAdd(cmd);
...
}
LogUpdateProcessor's processAdd function does nothing essential for indexing. Its next is DistributedUpdateProcessor, so the call continues into DistributedUpdateProcessor's processAdd function.
DistributedUpdateProcessor::processAdd
public void processAdd(AddUpdateCommand cmd) throws IOException {
...
versionAdd(cmd);
...
}
protected boolean versionAdd(AddUpdateCommand cmd) throws IOException {
...
doLocalAdd(cmd);
...
}
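The elided part of versionAdd assigns a version to the document. versionAdd then calls doLocalAdd, which passes the command to the next processor, RunUpdateProcessor, by calling its processAdd function.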
RunUpdateProcessor::processAdd
public void processAdd(AddUpdateCommand cmd) throws IOException {
updateHandler.addDoc(cmd);
super.processAdd(cmd);
}
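updateHandler is a DirectUpdateHandler2, and its addDoc function adds the document to the Lucene index.
The parent class's processAdd then calls processAdd on the next processor, or returns immediately if there is none.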
RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc
public int addDoc(AddUpdateCommand cmd) throws IOException {
return addDoc0(cmd);
}
private int addDoc0(AddUpdateCommand cmd) throws IOException {
...
doNormalUpdate(cmd);
...
}
addDoc calls addDoc0, which in turn calls doNormalUpdate to add the document.
RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc->doNormalUpdate
private void doNormalUpdate(AddUpdateCommand cmd) throws IOException {
Term updateTerm;
Term idTerm = new Term(idField.getName(), cmd.getIndexedId());
boolean del = false;
updateTerm = idTerm;
RefCounted<IndexWriter> iw = solrCoreState.getIndexWriter(core);
IndexWriter writer = iw.get();
Document luceneDocument = cmd.getLuceneDocument();
writer.updateDocument(updateTerm, luceneDocument);
}
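idTerm is usually built from the document's URL, so a later document with the same URL replaces the earlier one. getIndexWriter followed by get yields the SolrIndexWriter. getLuceneDocument copies the data in the SolrInputDocument into a Lucene Document. SolrIndexWriter's updateDocument function then writes that document to the Lucene index. Internally it calls the updateDocument function of Lucene's DocumentsWriter.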
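The replace-by-term behavior can be modeled with a toy in-memory index. This is only a model of the semantics (delete every document matching the term, then add the new one) and does not use Lucene. UpdateByTermSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UpdateByTermSketch {

    // Each document is a flat field -> value map; the list stands in for the index.
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Removes every document whose termField equals termValue, then adds doc.
    void updateDocument(String termField, String termValue, Map<String, String> doc) {
        docs.removeIf(d -> termValue.equals(d.get(termField)));
        docs.add(doc);
    }

    int size() {
        return docs.size();
    }

    // Returns the title of the document with the given id, or null if absent.
    String titleOf(String id) {
        for (Map<String, String> d : docs) {
            if (id.equals(d.get("id"))) return d.get("title");
        }
        return null;
    }

    // Convenience constructor for a two-field document.
    static Map<String, String> doc(String id, String title) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("title", title);
        return d;
    }

    public static void main(String[] args) {
        UpdateByTermSketch index = new UpdateByTermSketch();
        String url = "http://example.com/";
        index.updateDocument("id", url, doc(url, "old"));
        index.updateDocument("id", url, doc(url, "new"));
        System.out.println(index.size() + " " + index.titleOf(url)); // prints 1 new
    }
}
```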
RunUpdateProcessor::processAdd->DirectUpdateHandler2::addDoc->doNormalUpdate->AddUpdateCommand::getLuceneDocument
public Document getLuceneDocument() {
return DocumentBuilder.toDocument(getSolrInputDocument(), req.getSchema());
}
public static Document toDocument( SolrInputDocument doc, IndexSchema schema ){
Document out = new Document();
for( SolrInputField field : doc ) {
String name = field.getName();
SchemaField sfield = schema.getFieldOrNull(name);
for( Object v : field ) {
addField(out, sfield, v, 1f);
usedFields.add(sfield.getName());
}
}
...
return out;
}
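getFieldOrNull looks up the field's configuration from the managed-schema file, and addField adds the field to the Lucene document. The loop visits each field of the SolrInputDocument in turn and, following the managed-schema configuration, adds every value of that field to the Lucene Document.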
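The nested loop in toDocument, one output field per value of each input field, can be sketched with plain collections. ToDocumentSketch is a hypothetical class, the schema is reduced to a set of known field names, and rejecting unknown fields with an exception is an assumption of this sketch rather than something shown in the excerpt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ToDocumentSketch {

    // Flattens a multi-valued input document into "name=value" entries,
    // one entry per value, in the iteration order of the input map.
    static List<String> toDocument(Map<String, List<String>> inputDoc, Set<String> schemaFields) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> field : inputDoc.entrySet()) {
            String name = field.getKey();
            // Assumption of this sketch: a field missing from the schema is an error.
            if (!schemaFields.contains(name)) {
                throw new IllegalArgumentException("unknown field '" + name + "'");
            }
            for (String value : field.getValue()) {
                out.add(name + "=" + value);
            }
        }
        return out;
    }
}
```

A field with two values, such as an anchor field holding two link texts, therefore produces two entries with the same name in the output.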