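Articles in this series
(1) Passage retrieval for question-answering systems
(2) Understanding the underlying principles of Lucene full-text search
(3) The underlying implementation of Lucene queries: IndexSearch (Part 1)
(4) The underlying implementation of Lucene queries: IndexSearch (Part 2)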
Contents of this article
Obtaining the LeafCollector
The search method of IndexSearcher:
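Main purpose: retrieve documents and score them.
The method takes three parameters: the list of LeafReaderContext objects, the Weight created earlier, and the collector created in the first step.
- Retrieving documents
Every segment is searched and the results are aggregated.
The method body is a single loop that iterates over all LeafReaderContext objects, retrieving and scoring the documents of each one.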
// NOTE: this method executes the search on all given leaves exclusively.
// To search across all of the searcher's leaves, use leafContexts.
// Throws BooleanQuery.TooManyClauses if a query would exceed BooleanQuery.getMaxClauseCount() clauses.
protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector)
    throws IOException {
  // TODO: should we make this
  // threaded...? the Collector could be sync'd?
  // always use single thread:
  for (LeafReaderContext ctx : leaves) {
    // search each subreader
    final LeafCollector leafCollector;
    try {
      leafCollector = collector.getLeafCollector(ctx);
    } catch (CollectionTerminatedException e) {
      // there is no doc of interest in this reader context
      // continue with the following leaf
      continue;
    }
    // BulkScorer scores a range of documents at once and is returned by
    // Weight.bulkScorer(org.apache.lucene.index.LeafReaderContext).
    // Only queries that have a more optimized way of scoring across a range of documents need to override it.
    // Otherwise, a default implementation is wrapped around the Scorer returned by
    // Weight.scorer(org.apache.lucene.index.LeafReaderContext).
    BulkScorer scorer = weight.bulkScorer(ctx);
    if (scorer != null) {
      try {
        scorer.score(leafCollector, ctx.reader().getLiveDocs());
      } catch (CollectionTerminatedException e) {
        // collection was terminated prematurely
        // continue with the following leaf
      }
    }
  }
}
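The control flow of the loop above can be reproduced without Lucene in a small toy model. The sketch below is only an illustration: `SegmentLoopSketch`, `Segment`, `TerminatedException` and the `maxHitsPerSegment` parameter are hypothetical names and are not part of the Lucene API. It shows the two points where a segment can be abandoned early while the remaining segments are still searched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these types are Lucene classes.
class SegmentLoopSketch {

    /** Hypothetical stand-in for CollectionTerminatedException. */
    static class TerminatedException extends RuntimeException {}

    /** Hypothetical segment: a doc-id offset plus one score per local document (0 = no match). */
    static class Segment {
        final int docBase;
        final float[] scores;

        Segment(int docBase, float... scores) {
            this.docBase = docBase;
            this.scores = scores;
        }
    }

    /** Visits the segments one by one and gathers global doc ids, at most maxHitsPerSegment each. */
    static List<Integer> search(List<Segment> segments, int maxHitsPerSegment) {
        List<Integer> hits = new ArrayList<>();
        for (Segment segment : segments) {
            try {
                // Plays the role of getLeafCollector: refuse segments with nothing to collect.
                if (segment.scores.length == 0) {
                    throw new TerminatedException();
                }
            } catch (TerminatedException e) {
                continue; // nothing of interest here, go to the next segment
            }
            int collected = 0;
            try {
                // Plays the role of BulkScorer.score: walk the documents of this segment.
                for (int doc = 0; doc < segment.scores.length; doc++) {
                    if (segment.scores[doc] <= 0f) {
                        continue; // not a match
                    }
                    if (collected == maxHitsPerSegment) {
                        throw new TerminatedException(); // the collector has seen enough
                    }
                    hits.add(segment.docBase + doc);
                    collected++;
                }
            } catch (TerminatedException e) {
                // Collection ended early for this segment only; later segments still run.
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment(0, 1.2f, 0f, 0.7f),
            new Segment(3, 0f, 0.4f, 0.9f, 0.3f),
            new Segment(7));
        System.out.println(search(segments, 2)); // prints [0, 2, 4, 5]
    }
}
```

In this toy run the second segment stops after two hits and the empty third segment is skipped, yet neither case interrupts the outer loop, which mirrors how the exception is caught per leaf in the method above.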
public TopDocs searchAfter(ScoreDoc after, Query query, int numHits) throws IOException {
  // reader is created when the index directory is opened; reader.maxDoc() returns one greater
  // than the largest possible document number, so deleted documents are counted as well
  final int limit = Math.max(1, reader.maxDoc());
  if (after != null && after.doc >= limit) {
    throw new IllegalArgumentException("after.doc exceeds the number of documents in the reader: after.doc="
        + after.doc + " limit=" + limit);
  }
  final int cappedNumHits = Math.min(numHits, limit);
  // CollectorManager is an interface used to parallelize the processing of search requests.
  // Create a CollectorManager that uses a shared hit counter to maintain the number of hits,
  // as well as a shared MaxScoreAccumulator to propagate the minimum score across segments.
  final CollectorManager<TopScoreDocCollector, TopDocs> manager = new CollectorManager<TopScoreDocCollector, TopDocs>() {
priv