使用lucene,我们通过搜索出来的信息,都是相关性最强的排在前面的,这里涉及到评分机制,在实际生产中必定是要根据具体的业务需求做出更为复杂的自定义评分机制,但这里先简单看看lucene的评分是如何设定的。
private Map<String,Float> scores = new HashMap<String,Float>();
//构造函数
public IndexUtil() {
try {
setDates();
//设置Score相对高的信息
scores.put("itat.org",2.0f);
scores.put("zttc.edu", 1.5f);
directory = FSDirectory.open(new File("d:/lucene/index02"));
} catch (IOException e) {
e.printStackTrace();
}
}
//创建索引
public void index() {
IndexWriter writer = null;
try {
writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35)));
writer.deleteAll();
Document doc = null;
for(int i=0;i<ids.length;i++) {
doc = new Document();
doc.add(new Field("id",ids[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS));
doc.add(new Field("email",emails[i],Field.Store.YES,Field.Index.NOT_ANALYZED));
doc.add(new Field("email","test"+i+"@test.com",Field.Store.YES,Field.Index.NOT_ANALYZED));
doc.add(new Field("content",contents[i],Field.Store.NO,Field.Index.ANALYZED));
doc.add(new Field("name",names[i],Field.Store.YES,Field.Index.NOT_ANALYZED_NO_NORMS));
//存储数字
doc.add(new NumericField("attach",Field.Store.YES,true).setIntValue(attachs[i]));
//存储日期
doc.add(new NumericField("date",Field.Store.YES,true).setLongValue(dates[i].getTime()));
//截取@后面的字段
String et = emails[i].substring(emails[i].lastIndexOf("@")+1);
//设置评分
if(scores.containsKey(et)) {
doc.setBoost(scores.get(et));
} else {
doc.setBoost(0.5f);
}
writer.addDocument(doc);
}
} catch (CorruptIndexException e) {
e.printStackTrace();
} catch (LockObtainFailedException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if(writer!=null)writer.close();
} catch (CorruptIndexException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
测试结果
进行评分前,它是长这样的:
进行评分后,它是长这样的:
注:这里面是的doc进行setBoost,但是之前看一篇博文,Lucene4.x之后好像不能对doc进行评分了,只能对Field进行评分。在学习《Lucene 实战》一书时,发现可以对Query进行加大评分,特别是使用像BooleanQuery这种包含多个Query的查询器。