Use Lucene's Filter mechanism; see org.apache.lucene.search.CachingWrapperFilter in the Lucene API. It caches the result of a previous search, which makes it possible to search within those results.
import java.io.IOException;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryFilter;

public class IndexTest {

    public static void main(String[] args) throws IOException, ParseException {
        index();
        search("day");                // simple search
        searchInResult("day", "you"); // search within a result set
    }

    // Builds a small index of ten documents with three different contents.
    public static void index() throws IOException {
        IndexWriter writer = new IndexWriter("d:/tesindex", new SimpleAnalyzer(), true);
        try {
            writer.setMaxMergeDocs(1000);
            writer.setMergeFactor(100);
            for (int i = 0; i < 10; i++) {
                Document doc = new Document();
                String content = "How do you do?";
                if (i >= 5) {
                    content = "What's a good day. ";
                }
                if (i >= 7) {
                    content = "Nice day. Thanks you!";
                }
                doc.add(new Field("content", content, Field.Store.YES, Field.Index.TOKENIZED));
                writer.addDocument(doc);
            }
        } finally {
            // Without close() the documents are never flushed and the write lock is not released.
            writer.close();
        }
    }

    // Simple search for qw.
    public static void search(String qw) throws IOException, ParseException {
        QueryParser queryParser = new QueryParser("content", new SimpleAnalyzer());
        Query query = queryParser.parse(qw.trim());
        QueryFilter filter = new QueryFilter(query);

        search(query, filter);
    }

    // Searches for qw within the results of searching for oldqw.
    public static void searchInResult(String qw, String oldqw) throws ParseException, IOException {
        QueryParser queryParser = new QueryParser("content", new SimpleAnalyzer());
        Query query = queryParser.parse(qw.trim());
        Query oldQuery = queryParser.parse(oldqw.trim());
        QueryFilter oldFilter = new QueryFilter(oldQuery);
        CachingWrapperFilter filter = new CachingWrapperFilter(oldFilter);

        search(query, filter);
    }

    // Runs the query restricted by the filter and prints the content of each hit.
    private static void search(Query query, Filter filter) throws IOException {
        IndexSearcher ins = new IndexSearcher("d:/tesindex");
        try {
            Hits hits = ins.search(query, filter);
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                System.out.println(doc.get("content"));
            }
            System.out.println();
        } finally {
            ins.close();
        }
    }
}
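To make the filtering step more concrete: in this generation of Lucene a Filter boils down to a bit set of document numbers, and a filtered search keeps only the hits whose bit is set. The following is a minimal sketch of that idea in plain Java, without Lucene. The class FilterSketch and its methods are hypothetical names, and the tokenizer only approximates SimpleAnalyzer (runs of letters, lower-cased).

```java
import java.util.BitSet;
import java.util.Locale;

final class FilterSketch {
    // The same ten documents that index() writes.
    static String[] sampleDocs() {
        String[] docs = new String[10];
        for (int i = 0; i < docs.length; i++) {
            String content = "How do you do?";
            if (i >= 5) content = "What's a good day. ";
            if (i >= 7) content = "Nice day. Thanks you!";
            docs[i] = content;
        }
        return docs;
    }

    // Bit i is set when document i contains the term as a whole token.
    static BitSet matches(String[] docs, String term) {
        BitSet bits = new BitSet(docs.length);
        for (int i = 0; i < docs.length; i++) {
            for (String token : docs[i].toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (token.equals(term)) {
                    bits.set(i);
                    break;
                }
            }
        }
        return bits;
    }

    // "Search within results": keep only hits that the old query also matched.
    static BitSet searchInResult(String[] docs, String qw, String oldqw) {
        BitSet result = matches(docs, qw);
        result.and(matches(docs, oldqw));
        return result;
    }

    public static void main(String[] args) {
        // Only the three "Nice day. Thanks you!" documents contain both words.
        System.out.println(searchInResult(sampleDocs(), "day", "you")); // prints {7, 8, 9}
    }
}
```

Caching the bit set of the old query, as CachingWrapperFilter does, saves recomputing it when the same filter is reused against the same index.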
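Q: How do I get the relevance score of a Lucene.net search result as a percentage?
A:
Hits result = searcher.Search(q);
float score = result.Score(n); // n is the position of the document in the result set; the return value is a float <= 1f
string percent = score.ToString("0%"); // format the score as a percentage string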
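For readers on the Java side, the same formatting step might look like the sketch below. ScoreFormat and toPercent are hypothetical names, and the code assumes the score is already a float in the range 0 to 1, as described above.

```java
import java.util.Locale;

final class ScoreFormat {
    // Formats a relevance score in [0, 1] as a whole-number percentage.
    static String toPercent(float score) {
        return String.format(Locale.ROOT, "%.0f%%", score * 100f);
    }

    public static void main(String[] args) {
        System.out.println(toPercent(0.5f)); // prints 50%
    }
}
```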
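Q: How do I change the location of the Lucene.net lock files programmatically?
A:
By default Lucene.net stores its lock files in the system temporary folder. The location can be changed with the following statement:
System.Configuration.ConfigurationSettings.AppSettings.Add("Lucene.Net.lockDir", "your new lockDir");
FSDirectory.LOCK_DIR returns the folder in which the lock files are stored.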
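Q: How do I tell whether an index is locked, and how do I force it to be unlocked?
A:
For the implementation details, see the Obtain() method (which fails when the index is already locked) and the Release() method (which removes the lock) in Lucene.Net.Store.FSDirectory.
Note: there is also an IsLocked method worth looking at.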
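Q: How do I search several indexes together?
A:
IndexSearcher[] searchers = new IndexSearcher[2];
searchers[0] = new IndexSearcher(dir1);
searchers[1] = new IndexSearcher(dir2);
MultiSearcher searcher = new MultiSearcher(searchers); // or: ParallelMultiSearcher searcher = new ParallelMultiSearcher(searchers);
searcher.Search(query);
The difference between ParallelMultiSearcher and MultiSearcher: the former starts a separate thread for each index and searches them concurrently, while the latter searches the indexes one after another and then merges the results.
So the total search time of ParallelMultiSearcher is roughly that of the slowest index, whereas the total search time of MultiSearcher is the sum of the search times of all indexes.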
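The timing difference comes from how the per-index searches are scheduled. The sketch below models that in plain Java, with each index represented by a Supplier that returns its hits. MultiSearchSketch and its methods are hypothetical names, and the merge is a simple concatenation in index order; the real searchers merge by score, which is left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class MultiSearchSketch {
    // MultiSearcher style: query each index in turn, then merge.
    static List<String> searchSequential(List<Supplier<List<String>>> indexes) {
        List<String> merged = new ArrayList<>();
        for (Supplier<List<String>> index : indexes) {
            merged.addAll(index.get());
        }
        return merged;
    }

    // ParallelMultiSearcher style: start every search first, then wait and merge.
    static List<String> searchParallel(List<Supplier<List<String>>> indexes) {
        List<CompletableFuture<List<String>>> pending = new ArrayList<>();
        for (Supplier<List<String>> index : indexes) {
            pending.add(CompletableFuture.supplyAsync(index));
        }
        List<String> merged = new ArrayList<>();
        for (CompletableFuture<List<String>> future : pending) {
            merged.addAll(future.join()); // total wait is bounded by the slowest search
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Supplier<List<String>>> indexes = new ArrayList<>();
        indexes.add(() -> Arrays.asList("hit from index 1"));
        indexes.add(() -> Arrays.asList("hit from index 2"));
        System.out.println(searchParallel(indexes));
    }
}
```

Both methods return the same hits; only the waiting time differs when the individual searches are slow.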
Q: How do I search within results?
A:
*Method 1: use CachingWrapperFilter. This does not support an unlimited number of nested "search within results" levels:
QueryParser parser = new QueryParser("content", analyzer);
Query currentQuery = parser.Parse(currentKeyword);
Query oldQuery = parser.Parse(oldKeyword);
QueryFilter oldFilter = new QueryFilter(oldQuery);
CachingWrapperFilter filter = new CachingWrapperFilter(oldFilter);
IndexSearcher searcher = new IndexSearcher(indexDir);
Hits result = searcher.Search(currentQuery, filter);
*Method 2: combine the search keywords with AND in a BooleanQuery, or build the equivalent query syntax directly and pass it to QueryParser. Either way supports an unlimited number of "search within results" levels.
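As an illustration of Method 2, the query string can be assembled from the whole history of keywords, marking each one as required with the "+" operator of the Lucene query syntax. This is a minimal sketch: RefineQuery and buildRefinedQuery are hypothetical names, and the keywords are assumed to be already valid query syntax (special characters are not escaped here).

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

final class RefineQuery {
    // Marks every keyword as required ("+") and wraps it in parentheses
    // so that a multi-word keyword stays one group.
    static String buildRefinedQuery(List<String> keywords) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String keyword : keywords) {
            String trimmed = keyword.trim();
            if (!trimmed.isEmpty()) {
                joiner.add("+(" + trimmed + ")");
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // First search "you", then refine with "day".
        System.out.println(buildRefinedQuery(Arrays.asList("you", "day"))); // prints +(you) +(day)
    }
}
```

The resulting string would then be handed to QueryParser; each further refinement simply appends one more keyword to the list.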
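Q: What does BooleanQuery.maxClauseCount mean?
A:
It is the maximum number of Query clauses that can be added to a BooleanQuery; the default is 1024. Exceeding it throws a TooManyClauses exception. A new value can be set with BooleanQuery.SetMaxClauseCount(int). In practice the limit is usually reached when a wildcard, prefix, or range query is expanded into one clause per matching term.
Note: this explanation of the meaning is not complete.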
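The following plain-Java model may help to picture the limit. It is not Lucene's class: BoundedBooleanQuery, expandPrefix, and the use of IllegalStateException in place of TooManyClauses are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class BoundedBooleanQuery {
    // Same default as the value quoted above.
    static int maxClauseCount = 1024;

    private final List<String> clauses = new ArrayList<>();

    // Adds one clause, refusing once the limit has been reached.
    void add(String clause) {
        if (clauses.size() >= maxClauseCount) {
            throw new IllegalStateException("too many clauses, limit is " + maxClauseCount);
        }
        clauses.add(clause);
    }

    int size() {
        return clauses.size();
    }

    // Expands a prefix into one clause per matching term.
    static BoundedBooleanQuery expandPrefix(String prefix, List<String> terms) {
        BoundedBooleanQuery query = new BoundedBooleanQuery();
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                query.add(term);
            }
        }
        return query;
    }
}
```

With this model, a prefix that matches 1024 terms still fits, while a prefix that matches 1025 terms fails on the last add unless maxClauseCount is raised first.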
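Q: How do I tell whether an index exists?
A:
string indexPath = "your indexPath"; // directory that holds the index
if (System.IO.Directory.Exists(indexPath) && System.IO.File.Exists(System.IO.Path.Combine(indexPath, "segments")))
{
    // the index exists
}
else
{
    // the index does not exist
}
There is also a more direct way:
if (Lucene.Net.Index.IndexReader.IndexExists(indexPath))
{
    // the index exists
}
else
{
    // the index does not exist
}
Internally, Lucene.Net.Index.IndexReader.IndexExists works in much the same way as the manual check above, but calling Lucene.Net.Index.IndexReader.IndexExists directly is the more reliable choice.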
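For comparison, the manual check translates to Java roughly as follows. This is only a sketch of the file-system test described above: IndexExistsSketch is a hypothetical name, and the file name "segments" is taken from the answer as given, so it may not match the layout written by other Lucene versions.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class IndexExistsSketch {
    // Mirrors the manual check: the directory exists and holds a "segments" file.
    static boolean indexExists(Path indexPath) {
        return Files.isDirectory(indexPath)
                && Files.isRegularFile(indexPath.resolve("segments"));
    }

    public static void main(String[] args) {
        System.out.println(indexExists(Paths.get("your indexPath")));
    }
}
```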