Lucene Query Internals: IndexSearcher (Part 2)

qq_47537678

Last modified 2022-03-27 11:17:30

Category: Project Training
Tags: python, java

First published 2022-03-27 03:01:04

Article link: https://blog.csdn.net/qq_47537678/article/details/123767557

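Series Contents

(1) Passage Retrieval for a Question Answering System
(2) Understanding the Underlying Principles of Lucene Full-Text Search
(3) Lucene Query Internals: IndexSearcher (Part 1)
(4) Lucene Query Internals: IndexSearcher (Part 2)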

Contents

Series Contents
Obtaining the LeafCollector
- The collector is created by TopScoreDocCollector.create()
  - SimpleTopScoreDocCollector implementation

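Obtaining the LeafCollector

IndexSearcher's search method:
Main purpose: retrieve documents and score them.
It takes three parameters: the list of LeafReaderContext objects, the Weight created earlier, and the collector created in the first step.

Retrieving documents
Every segment is searched and the results are combined.
The method is a single loop that visits each LeafReaderContext in turn and retrieves and scores the documents in it.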
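
To make "search every segment and combine the results" concrete, here is a minimal sketch in plain Java rather than Lucene code. The Segment and Hit classes, the topK method, and the convention that a score of zero or less means "no match" are all assumptions made for illustration. The one idea taken from Lucene is that each segment numbers its documents from zero, and the segment's docBase is added to obtain an index-wide document ID.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of one top-k collector shared by several segments. Not Lucene code. */
class SegmentMergeSketch {

    /** Hypothetical segment: scores indexed by segment-local doc ID. */
    static class Segment {
        final int docBase;
        final float[] scores;
        Segment(int docBase, float[] scores) { this.docBase = docBase; this.scores = scores; }
    }

    /** Hypothetical hit: an index-wide doc ID plus its score. */
    static class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Returns the k best hits over all segments, best first. */
    static List<Hit> topK(List<Segment> segments, int k) {
        // Orders hits from weakest to strongest; on equal scores the higher doc ID is weaker.
        Comparator<Hit> weakestFirst = (a, b) -> a.score != b.score
                ? Float.compare(a.score, b.score)
                : Integer.compare(b.doc, a.doc);
        // Min-heap: the head is always the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(weakestFirst);

        for (Segment seg : segments) {
            for (int localDoc = 0; localDoc < seg.scores.length; localDoc++) {
                float score = seg.scores[localDoc];
                if (score <= 0f) {
                    continue; // toy convention: not a match
                }
                heap.add(new Hit(seg.docBase + localDoc, score)); // local ID -> global ID
                if (heap.size() > k) {
                    heap.poll(); // drop the weakest so at most k hits are kept
                }
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(weakestFirst.reversed());
        return result;
    }
}
```

With two segments holding scores {0.5, 0, 2.0} (docBase 0) and {1.5, 3.0} (docBase 3), topK(segments, 3) returns global documents 4, 2 and 3 in that order.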

    // This method executes the search on all of the given leaves exclusively.
    // To search across all of the searcher's leaves, use leafContexts.
    // Throws BooleanQuery.TooManyClauses if a query would exceed BooleanQuery.getMaxClauseCount() clauses.
    protected void search(List<LeafReaderContext> leaves, Weight weight, Collector collector)
            throws IOException {

        // TODO: should we make this
        // threaded...? the Collector could be sync'd?
        // always use single thread:
        for (LeafReaderContext ctx : leaves) { // search each subreader
            final LeafCollector leafCollector;
            try {
                leafCollector = collector.getLeafCollector(ctx);
            } catch (CollectionTerminatedException e) {
                // there is no doc of interest in this reader context
                // continue with the following leaf
                continue;
            }
            // BulkScorer scores a range of documents at once and is returned by
            // Weight.bulkScorer(org.apache.lucene.index.LeafReaderContext).
            // Only queries that have a more optimized way of scoring across a range of
            // documents need to override it. Otherwise, the default implementation wraps
            // the Scorer returned by Weight.scorer(org.apache.lucene.index.LeafReaderContext).
            BulkScorer scorer = weight.bulkScorer(ctx);
            if (scorer != null) {
                try {
                    scorer.score(leafCollector, ctx.reader().getLiveDocs());
                } catch (CollectionTerminatedException e) {
                    // collection was terminated prematurely
                    // continue with the following leaf
                }
            }
        }
    }
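
The listing above has two places where a CollectionTerminatedException is caught: once when asking for the leaf collector (skip the whole leaf) and once while scoring (stop partway through a leaf). The control flow can be modeled without Lucene as a rough sketch. Everything named here (LeafLoopSketch, Terminated, Leaf, ToyLeafCollector, FirstNPerLeaf) is hypothetical, scoring is left out, and a null live array stands in for "no deleted documents", which is how getLiveDocs() reports that case.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the per-leaf loop and its two early-exit points. Not Lucene code. */
class LeafLoopSketch {

    /** Hypothetical analogue of CollectionTerminatedException. */
    static class Terminated extends RuntimeException {}

    /** Hypothetical leaf: local doc IDs 0..maxDoc-1; live == null means no deletions. */
    static class Leaf {
        final int docBase;
        final int maxDoc;
        final boolean[] live;
        Leaf(int docBase, int maxDoc, boolean[] live) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
            this.live = live;
        }
    }

    interface ToyLeafCollector {
        void collect(int localDoc);
    }

    /** Keeps the first perLeaf live docs of every non-empty leaf. */
    static class FirstNPerLeaf {
        final int perLeaf;
        final List<Integer> globalDocs = new ArrayList<>();
        FirstNPerLeaf(int perLeaf) { this.perLeaf = perLeaf; }

        ToyLeafCollector forLeaf(Leaf leaf) {
            if (leaf.maxDoc == 0) {
                throw new Terminated(); // exit point 1: nothing of interest in this leaf
            }
            int[] seen = {0};
            return localDoc -> {
                if (seen[0] == perLeaf) {
                    throw new Terminated(); // exit point 2: enough docs from this leaf
                }
                seen[0]++;
                globalDocs.add(leaf.docBase + localDoc);
            };
        }
    }

    /** Same shape as the search(leaves, weight, collector) loop, minus scoring. */
    static void search(List<Leaf> leaves, FirstNPerLeaf collector) {
        for (Leaf leaf : leaves) {
            final ToyLeafCollector leafCollector;
            try {
                leafCollector = collector.forLeaf(leaf);
            } catch (Terminated e) {
                continue; // skip this leaf entirely
            }
            try {
                for (int doc = 0; doc < leaf.maxDoc; doc++) {
                    if (leaf.live == null || leaf.live[doc]) { // ignore deleted docs
                        leafCollector.collect(doc);
                    }
                }
            } catch (Terminated e) {
                // this leaf ended early; carry on with the next one
            }
        }
    }
}
```

The point of the second catch block is that termination is local to one leaf: the loop still moves on to the remaining leaves instead of aborting the whole search.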

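    public TopDocs searchAfter(ScoreDoc after, Query query, int numHits) throws IOException {
        // The reader is created when the index directory is opened. reader.maxDoc() returns one more
        // than the largest document number in the index, so it also counts deleted documents
        // that have not been merged away yet.
        final int limit = Math.max(1, reader.maxDoc());
        if (after != null && after.doc >= limit) {
            throw new IllegalArgumentException("after.doc exceeds the number of documents in the reader: after.doc="
                    + after.doc + " limit=" + limit);
        }

        final int cappedNumHits = Math.min(numHits, limit);

        // CollectorManager is an interface used to parallelize the processing of a search request.
        // Create a CollectorManager that uses a shared hit counter to maintain the number of hits
        // and a shared {@link MaxScoreAccumulator} to propagate the minimum score across segments.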
        final CollectorManager<TopScoreDocCollector, TopDocs> manager = new CollectorManager<TopScoreDocCollector, TopDocs>()
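
The listing breaks off at the CollectorManager declaration. Lucene's CollectorManager interface has two methods, newCollector() and reduce(), and the idea behind that split can be sketched in plain Java: each slice of the index gets its own collector, and the partial results are merged at the end. The classes below (ManagerSketch, TopScores, Manager) are hypothetical stand-ins, the slices are plain float arrays of scores, and the shared hit counter and MaxScoreAccumulator mentioned in the comments are deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Toy model of the newCollector()/reduce() split. Not Lucene code. */
class ManagerSketch {

    /** Hypothetical per-slice collector that keeps its best k scores, best first. */
    static class TopScores {
        final int k;
        final List<Float> best = new ArrayList<>();
        TopScores(int k) { this.k = k; }

        void collect(float score) {
            best.add(score);
            best.sort(Collections.reverseOrder()); // simple, not efficient: fine for a sketch
            if (best.size() > k) {
                best.remove(best.size() - 1); // drop the lowest score
            }
        }
    }

    /** Hypothetical manager: hands out one collector per slice and merges them. */
    static class Manager {
        final int k;
        Manager(int k) { this.k = k; }

        TopScores newCollector() {
            return new TopScores(k);
        }

        List<Float> reduce(Collection<TopScores> collectors) {
            TopScores merged = new TopScores(k);
            for (TopScores c : collectors) {
                for (float s : c.best) {
                    merged.collect(s);
                }
            }
            return merged.best;
        }
    }

    /** Caps numHits as searchAfter does, then collects per slice and reduces. */
    static List<Float> search(float[][] slices, int numHits) {
        int maxDoc = 0;
        for (float[] slice : slices) {
            maxDoc += slice.length;
        }
        int limit = Math.max(1, maxDoc);
        int cappedNumHits = Math.min(numHits, limit);

        Manager manager = new Manager(cappedNumHits);
        List<TopScores> collectors = new ArrayList<>();
        for (float[] slice : slices) { // each slice could run on its own thread
            TopScores c = manager.newCollector();
            for (float score : slice) {
                c.collect(score);
            }
            collectors.add(c);
        }
        return manager.reduce(collectors);
    }
}
```

Because every slice keeps its own top k, the merge step only has to look at k entries per collector, and asking for more hits than there are documents is harmless thanks to the cap.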
