Sorting with IndexSearcher

Latest recommended article published 2019-01-21 17:28:38

ningbohezhijun

Category: Lucene

Article link: https://blog.csdn.net/ningbohezhijunbl/article/details/20546261

http://www.cnblogs.com/huangfox/archive/2010/10/18/1854403.html

1. Sorting-related methods in IndexSearcher, and the Sort and SortField classes (API level);

To sort directly with IndexSearcher, the method generally used is:

search(Weight weight, Filter filter, int n, Sort sort)
Expert: Low-level search implementation with arbitrary sorting.

This method only needs a Sort instance to be passed in.

Constructor Summary
`Sort()` Sorts by computed relevance.
`Sort(SortField... fields)` Sorts in succession by the criteria in each SortField.
`Sort(SortField field)` Sorts by the criteria in the given SortField.

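Within a Sort instance, SortField decides which field to sort on, which data type to sort by, and whether the order is ascending or descending.

The two most basic constructors are as follows (in Lucene 4.x the `int type` parameter became the `SortField.Type` enum):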

SortField(String field, int type)
Creates a sort by terms in the given field with the type of term values explicitly given.

SortField(String field, int type, boolean reverse)
Creates a sort, possibly in reverse, by terms in the given field with the type of term values explicitly given.

With these classes, sorting search results is straightforward.

A simple example:

package ceshi0305;

import java.io.File;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SortDemo {
	public static void main(String[] args) {
		new SortDemo().search();
	}
	
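	public void search() {
		Directory dir = null;
		IndexReader reader = null;
		try {
			// Open the existing index directory and a reader over it.
			dir = FSDirectory.open(new File("d:\\20140303index"));
			reader = DirectoryReader.open(dir);
			IndexSearcher searcher = new IndexSearcher(reader);

			// Sort ascending by the string value of field "f1".
			// In Lucene 4.x the type enum is SortField.Type (not TYPE).
			SortField sortF = new SortField("f1", SortField.Type.STRING);
			Sort sort = new Sort(sortF);
			// Match every document, apply no filter, keep the top 10 hits.
			TopDocs res = searcher.search(new MatchAllDocsQuery(), null, 10, sort);
			for (ScoreDoc doc : res.scoreDocs) {
				System.out.println(searcher.doc(doc.doc));
			}
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			// Release the reader and the directory.
			try {
				if (reader != null) reader.close();
				if (dir != null) dir.close();
			} catch (Exception e) {
				e.printStackTrace();
			}
		}
	}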
}

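In testing, passing true as the third argument when constructing a SortField gives reverse order (omitting it gives forward order). Also, if the Field "f1" is multi-valued, sorting appears to use only the first value of that field (this is my current understanding).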

2. Sorting by document score;

By default, IndexSearcher sorts search results by document score.

To request this explicitly, set the type in SortField to SCORE.

SCORE
Sort by document score (relevancy).

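3. Sorting by internal document id;

Every document is assigned an id number when it enters the index, and sometimes you may need to sort by that id. To do so, set the type in SortField to DOC.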

DOC
Sort by document number (index order).

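Note:

When sorting data such as dates and prices, choose an appropriate sort type. This is not only about meeting business requirements: sorting with numeric types such as INT and FLOAT is also more efficient than sorting with STRING.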
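The "business requirements" half of this note can be illustrated with a minimal sketch in plain Java (it does not use Lucene). It only shows that the same values order differently when compared as text and as integers; it says nothing about the efficiency claim. The class and method names are hypothetical, and the sample values are the `f` values that appear in this article's result listings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortTypeSketch {
    // Orders the raw values as text, roughly how a STRING sort compares terms.
    static List<String> sortAsStrings(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return copy;
    }

    // Parses each value and orders numerically, roughly how an INT sort behaves.
    static List<String> sortAsInts(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingInt(Integer::parseInt));
        return copy;
    }

    public static void main(String[] args) {
        List<String> f = Arrays.asList("10", "5", "-2", "0");
        System.out.println(sortAsStrings(f)); // prints [-2, 0, 10, 5]
        System.out.println(sortAsInts(f));    // prints [-2, 0, 5, 10]
    }
}
```

As text, "10" sorts before "5" because the comparison goes character by character, which is rarely what a price or date column needs.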

5. Sorting on multiple Fields;

...Example code:

 
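 // Uses the Lucene 3.x int constants; in Lucene 4.x use SortField.Type.INT instead.
 SortField sortF = new SortField("f", SortField.INT);
 SortField sortF2 = new SortField("f1", SortField.INT);
 // Array order sets the priority: sort by "f" first, then by "f1".
 Sort sort = new Sort(new SortField[]{sortF, sortF2});
 TopFieldDocs docs = searcher.search(query, null, 10, sort);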

Result:

 
 Document<stored,indexed<f:-2> stored,indexed<f1:20000128> stored,indexed<a:fox>>
 Document<stored,indexed<f:0> stored,indexed<f1:20050719> stored,indexed<a:fox>>
 Document<stored,indexed<f:5> stored,indexed<f1:20101019> stored,indexed<a:fox>>
 Document<stored,indexed<f:10> stored,indexed<f1:20090512> stored,indexed<a:fox>>
 Document<stored,indexed<f:10> stored,indexed<f1:20100215> stored,indexed<a:fox>>

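Note:

Results are sorted by field f first; where the f values are equal, they are then sorted by field f1.

This priority follows the order of the SortField instances in the SortField array.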
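The tie-breaking rule described in this note can be sketched in plain Java with a chained `Comparator`. This is an analogy rather than Lucene code: the `Doc` class and `sortByFThenF1` method are hypothetical, and the five `(f, f1)` pairs are the sample documents from this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MultiFieldSortSketch {
    // Hypothetical stand-in for an indexed document with two integer fields.
    static final class Doc {
        final int f;
        final int f1;

        Doc(int f, int f1) {
            this.f = f;
            this.f1 = f1;
        }

        @Override
        public String toString() {
            return "f:" + f + " f1:" + f1;
        }
    }

    // Primary key f, secondary key f1, mirroring the SortField array order.
    static List<Doc> sortByFThenF1(List<Doc> docs) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(Comparator.comparingInt((Doc d) -> d.f).thenComparingInt(d -> d.f1));
        return copy;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc(10, 20100215), new Doc(10, 20090512), new Doc(5, 20101019),
                new Doc(-2, 20000128), new Doc(0, 20050719));
        for (Doc d : sortByFThenF1(docs)) {
            System.out.println(d);
        }
    }
}
```

The two documents with `f` equal to 10 come out ordered by `f1` (20090512 before 20100215), which matches the result listed above.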

6. Changing a document's score by changing its boost value.

Default sort (by relevance), the original ordering:

 
 Document<stored,indexed<f:10> stored,indexed<f1:20100215> stored,indexed<a:fox>>
 Document<stored,indexed<f:10> stored,indexed<f1:20090512> stored,indexed<a:fox>>
 Document<stored,indexed<f:5> stored,indexed<f1:20101019> stored,indexed<a:fox>>
 Document<stored,indexed<f:-2> stored,indexed<f1:20000128> stored,indexed<a:fox>>
 Document<stored,indexed<f:0> stored,indexed<f1:20050719> stored,indexed<a:fox>>

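In Lucene 4.6, Document no longer has a setBoost method. If you really need to assign a score to the document as a whole, you can add a field named Boost whose value is the desired score, and then sort on the Boost field.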

for (int i = 0; i < 500000; i++) {
	Document doc = new Document();
	f1.setStringValue("f1 hello doc" + i);
	doc.add(f1);
	f2.setStringValue("f2 world doc" + i);
	doc.add(f2);
	doc.add(new IntField("Boost", i, Store.YES));
	writer.addDocument(doc);
}
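The "then sort on the Boost field" step is not shown in the loop. In Lucene 4.x it would presumably be a `SortField` on "Boost" with type `SortField.Type.INT` and `reverse` set to true, so that the highest values come first. The plain-Java sketch below (no Lucene; the `Doc` class, method name, and document names are hypothetical) only illustrates the resulting order for documents whose Boost equals their loop index, as in the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class BoostSortSketch {
    // Hypothetical stand-in for a document that stores its own "Boost" value.
    static final class Doc {
        final String name;
        final int boost;

        Doc(String name, int boost) {
            this.name = name;
            this.boost = boost;
        }

        @Override
        public String toString() {
            return name + "(Boost=" + boost + ")";
        }
    }

    // Highest Boost first, which is the effect of a reverse sort on the field.
    static List<Doc> sortByBoostDescending(List<Doc> docs) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(Comparator.comparingInt((Doc d) -> d.boost).reversed());
        return copy;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            docs.add(new Doc("doc" + i, i)); // Boost equals the loop index
        }
        // prints [doc3(Boost=3), doc2(Boost=2), doc1(Boost=1), doc0(Boost=0)]
        System.out.println(sortByBoostDescending(docs));
    }
}
```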