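Lab 7: Implementing Advanced Search Techniques
1. Objectives and Requirements
- Use the Lucene field cache
- Sort search results
- Run multi-field queries
- Implement span queries
- Filter search results with Lucene's built-in filters
- Search across multiple indexes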
2. Environment
A computer with Eclipse and the JDK installed
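3. Lab Content
- Read field values through the field cache
- Practice using the Sort class and its related API
- Use the MultiFieldQueryParser class to query several fields at once
- Use the span query API to run span queries across several tokens
- Filter search results with Lucene's built-in filters
- Use the MultiSearcher class to query multiple indexes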
4. Procedure
Preparation: first create an index on disk and add several documents to it.
String Index_Store_Path = "D:/index";
File file = new File(Index_Store_Path);
try {
    Directory Index = FSDirectory.open(file);
    // The boolean argument true creates a new index, replacing any existing one
    IndexWriter writer = new IndexWriter(Index, new StandardAnalyzer(Version.LUCENE_CURRENT), true,
            IndexWriter.MaxFieldLength.LIMITED);
    writer.setUseCompoundFile(false);

    // bookNumber and publishDate are indexed as single terms; bookName is analyzed
    Document doc1 = new Document();
    Field f11 = new Field("bookNumber", "0000001", Field.Store.YES, Field.Index.NOT_ANALYZED);
    Field f12 = new Field("bookName", "钢铁是怎样炼成的", Field.Store.YES, Field.Index.ANALYZED);
    Field f13 = new Field("publishDate", "1970-01-01", Field.Store.YES, Field.Index.NOT_ANALYZED);
    doc1.add(f11); doc1.add(f12); doc1.add(f13);

    Document doc2 = new Document();
    Field f21 = new Field("bookNumber", "0000002", Field.Store.YES, Field.Index.NOT_ANALYZED);
    Field f22 = new Field("bookName", "钢铁战士", Field.Store.YES, Field.Index.ANALYZED);
    Field f23 = new Field("publishDate", "1970-01-01", Field.Store.YES, Field.Index.NOT_ANALYZED);
    doc2.add(f21); doc2.add(f22); doc2.add(f23);

    Document doc3 = new Document();
    Field f31 = new Field("bookNumber", "0000003", Field.Store.YES, Field.Index.NOT_ANALYZED);
    Field f32 = new Field("bookName", "篱笆女人和狗", Field.Store.YES, Field.Index.ANALYZED);
    Field f33 = new Field("publishDate", "1970-01-01", Field.Store.YES, Field.Index.NOT_ANALYZED);
    doc3.add(f31); doc3.add(f32); doc3.add(f33);

    Document doc4 = new Document();
    Field f41 = new Field("bookNumber", "0000004", Field.Store.YES, Field.Index.NOT_ANALYZED);
    Field f42 = new Field("bookName", "女人是水做的", Field.Store.YES, Field.Index.ANALYZED);
    Field f43 = new Field("publishDate", "1970-01-01", Field.Store.YES, Field.Index.NOT_ANALYZED);
    doc4.add(f41); doc4.add(f42); doc4.add(f43);

    Document doc5 = new Document();
    Field f51 = new Field("bookNumber", "0000005", Field.Store.YES, Field.Index.NOT_ANALYZED);
    Field f52 = new Field("bookName", "英雄儿女", Field.Store.YES, Field.Index.ANALYZED);
    Field f53 = new Field("publishDate", "1970-01-01", Field.Store.YES, Field.Index.NOT_ANALYZED);
    doc5.add(f51); doc5.add(f52); doc5.add(f53);

    Document doc6 = new Document();
    Field f61 = new Field("bookNumber", "0000006", Field.Store.YES, Field.Index.NOT_ANALYZED);
    Field f62 = new Field("bookName", "白毛女", Field.Store.YES, Field.Index.ANALYZED);
    Field f63 = new Field("publishDate", "1970-01-01", Field.Store.YES, Field.Index.NOT_ANALYZED);
    doc6.add(f61); doc6.add(f62); doc6.add(f63);

    Document doc7 = new Document();
    Field f71 = new Field("bookNumber", "0000007", Field.Store.YES, Field.Index.NOT_ANALYZED);
    Field f72 = new Field("bookName", "我的兄弟和女儿", Field.Store.YES, Field.Index.ANALYZED);
    Field f73 = new Field("publishDate", "1970-01-01", Field.Store.YES, Field.Index.NOT_ANALYZED);
    doc7.add(f71); doc7.add(f72); doc7.add(f73);

    writer.addDocument(doc1);
    writer.addDocument(doc2);
    writer.addDocument(doc3);
    writer.addDocument(doc4);
    writer.addDocument(doc5);
    writer.addDocument(doc6);
    writer.addDocument(doc7);
    writer.optimize();
    writer.close();
} catch (IOException e) {
    // Opening the directory or writing the index failed
    e.printStackTrace();
}
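Experiment 1: Read field values through the field cache
Requirement: use the field cache to read the value of the "bookName" field in every document.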
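In Lucene 3.0 the field cache is reached through FieldCache.DEFAULT.getStrings(reader, "bookName"), which returns one value per document number. The sketch below is not Lucene code. It is a plain-Java model, under the assumption that a field is loaded once and then served as an array indexed by document number. The class FieldCacheModel, its methods, and the loads counter are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a field cache; this is NOT Lucene code. */
final class FieldCacheModel {
    // Stored documents in index order; each maps field name -> value.
    private final List<Map<String, String>> docs = new ArrayList<Map<String, String>>();
    // Cached arrays: field name -> values indexed by document number.
    private final Map<String, String[]> cache = new HashMap<String, String[]>();
    // Number of times a field had to be loaded from the documents.
    int loads = 0;

    void add(String bookNumber, String bookName) {
        Map<String, String> doc = new HashMap<String, String>();
        doc.put("bookNumber", bookNumber);
        doc.put("bookName", bookName);
        docs.add(doc);
        cache.clear(); // a changed index invalidates the cached arrays
    }

    /** Returns one value per document number, loading the field only on first use. */
    String[] getStrings(String field) {
        String[] values = cache.get(field);
        if (values == null) {
            values = new String[docs.size()];
            for (int docId = 0; docId < docs.size(); docId++) {
                values[docId] = docs.get(docId).get(field);
            }
            cache.put(field, values);
            loads++;
        }
        return values;
    }

    /** Three of the books from the preparation step. */
    static FieldCacheModel sample() {
        FieldCacheModel model = new FieldCacheModel();
        model.add("0000001", "钢铁是怎样炼成的");
        model.add("0000002", "钢铁战士");
        model.add("0000003", "篱笆女人和狗");
        return model;
    }

    public static void main(String[] args) {
        String[] names = sample().getStrings("bookName");
        for (int docId = 0; docId < names.length; docId++) {
            System.out.println("doc " + docId + ": " + names[docId]);
        }
    }
}
```

The point of the model is the access pattern: after the first call, reading a field value for a hit is an array lookup by document number rather than a stored-document fetch.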
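Experiment 2: Sort search results
Requirement: choose your own query, then print the results in each of the following four orders:
1. Sorted by relevance (score)
2. Sorted by a field value
3. Sorted by index order (document number)
4. Sorted by multiple fields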
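The four orderings above can be sketched with plain Java comparators. This is a model, not Lucene code: in Lucene 3.0 the corresponding objects are, to my understanding, Sort.RELEVANCE, a Sort built from a SortField on one field, Sort.INDEXORDER, and a Sort built from several SortField instances, passed to IndexSearcher.search together with the query. The Hit class and the sample scores and dates below are hypothetical data invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of the four sort orders; this is NOT Lucene code. */
final class SortModel {
    /** One search hit with the values the comparators need. */
    static final class Hit {
        final int docId;
        final float score;
        final String bookNumber;
        final String publishDate;

        Hit(int docId, float score, String bookNumber, String publishDate) {
            this.docId = docId;
            this.score = score;
            this.bookNumber = bookNumber;
            this.publishDate = publishDate;
        }
    }

    // 1. Relevance: higher score first, ties broken by lower document number.
    static final Comparator<Hit> BY_RELEVANCE = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.docId, b.docId);
    };

    // 2. Field value: ascending string order of bookNumber.
    static final Comparator<Hit> BY_BOOK_NUMBER = (a, b) -> a.bookNumber.compareTo(b.bookNumber);

    // 3. Index order: ascending document number.
    static final Comparator<Hit> BY_INDEX_ORDER = (a, b) -> Integer.compare(a.docId, b.docId);

    // 4. Multiple fields: publishDate ascending, then bookNumber descending.
    static final Comparator<Hit> BY_DATE_THEN_NUMBER_DESC = (a, b) -> {
        int c = a.publishDate.compareTo(b.publishDate);
        return c != 0 ? c : b.bookNumber.compareTo(a.bookNumber);
    };

    /** Returns the document numbers of the hits in the order given by the comparator. */
    static List<Integer> order(List<Hit> hits, Comparator<Hit> comparator) {
        List<Hit> sorted = new ArrayList<Hit>(hits);
        Collections.sort(sorted, comparator);
        List<Integer> ids = new ArrayList<Integer>();
        for (Hit hit : sorted) {
            ids.add(hit.docId);
        }
        return ids;
    }

    /** Hypothetical hits; scores and dates are made up for the example. */
    static List<Hit> sample() {
        return Arrays.asList(
                new Hit(0, 0.5f, "0000003", "1970-01-02"),
                new Hit(1, 0.9f, "0000001", "1970-01-01"),
                new Hit(2, 0.5f, "0000002", "1970-01-01"));
    }

    public static void main(String[] args) {
        List<Hit> hits = sample();
        System.out.println("relevance:        " + order(hits, BY_RELEVANCE));
        System.out.println("bookNumber:       " + order(hits, BY_BOOK_NUMBER));
        System.out.println("index order:      " + order(hits, BY_INDEX_ORDER));
        System.out.println("date, then number: " + order(hits, BY_DATE_THEN_NUMBER_DESC));
    }
}
```

Note that all seven documents in the preparation step share the same publishDate, so a sort on that field alone cannot distinguish them; a secondary field is what decides the order in that case.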
Experiment 3: Multi-field query
Requirement: choose your own query; it must search the "bookName" and "bookNumber" fields at the same time and print the results.
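As a rough picture of what a multi-field query does, the sketch below treats a search term as matching a document when any of the listed fields contains it. It is a plain-Java model and not the MultiFieldQueryParser API. It assumes the default OR behaviour across fields, that bookName is analyzed into one token per Chinese character, and that bookNumber is a single unanalyzed term. The class MultiFieldModel and its helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of a query over several fields; this is NOT Lucene code. */
final class MultiFieldModel {
    /** Builds a document: bookNumber is one term, bookName is a list of tokens. */
    static Map<String, List<String>> doc(String bookNumber, String... nameTokens) {
        Map<String, List<String>> fields = new HashMap<String, List<String>>();
        fields.put("bookNumber", Arrays.asList(bookNumber));
        fields.put("bookName", Arrays.asList(nameTokens));
        return fields;
    }

    /** True when at least one of the given fields contains the term. */
    static boolean matchesAnyField(Map<String, List<String>> doc, String[] fields, String term) {
        for (String field : fields) {
            List<String> tokens = doc.get(field);
            if (tokens != null && tokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the bookNumber of every matching document, in index order. */
    static List<String> search(List<Map<String, List<String>>> docs, String[] fields, String term) {
        List<String> result = new ArrayList<String>();
        for (Map<String, List<String>> doc : docs) {
            if (matchesAnyField(doc, fields, term)) {
                result.add(doc.get("bookNumber").get(0));
            }
        }
        return result;
    }

    /** Three of the books from the preparation step, tokenized by hand. */
    static List<Map<String, List<String>>> sample() {
        List<Map<String, List<String>>> docs = new ArrayList<Map<String, List<String>>>();
        docs.add(doc("0000002", "钢", "铁", "战", "士"));
        docs.add(doc("0000004", "女", "人", "是", "水", "做", "的"));
        docs.add(doc("0000006", "白", "毛", "女"));
        return docs;
    }

    public static void main(String[] args) {
        String[] both = { "bookName", "bookNumber" };
        System.out.println(search(sample(), both, "女"));
        System.out.println(search(sample(), both, "0000002"));
    }
}
```

The same term can therefore hit through the title in one document and through the book number in another, which is what distinguishes this from a query bound to a single default field.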
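Experiment 4: Span queries
Requirements: 1. Use SpanFirstQuery to search for the token [铁] with a span end of 1 and of 2, and print the results of both cases;
2. Use SpanNearQuery to search for the tokens [铁][战][大] with a slop of 1, 2 and 3, and print the results.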
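To reason about what these span queries should return before running them, the following plain-Java model may help. It is not Lucene code and rests on stated assumptions: the analyzer emits one token per Chinese character, a single term at position p occupies the span from p to p+1, SpanFirstQuery matches when that span ends at or before the given end, and an in-order SpanNearQuery matches when the number of positions skipped between consecutive terms does not exceed the slop. The class SpanModel and the query pair [钢][士] are hypothetical and used only for illustration.

```java
/** Plain-Java model of SpanFirstQuery and in-order SpanNearQuery; NOT Lucene code. */
final class SpanModel {
    // Hand-tokenized titles from the preparation step, one token per character.
    static final String[] STEEL_SOLDIER = { "钢", "铁", "战", "士" };
    static final String[] HOW_STEEL = { "钢", "铁", "是", "怎", "样", "炼", "成", "的" };

    /** The term at position p spans [p, p + 1); it matches when p + 1 <= end. */
    static boolean spanFirst(String[] tokens, String term, int end) {
        for (int pos = 0; pos < tokens.length && pos + 1 <= end; pos++) {
            if (tokens[pos].equals(term)) {
                return true;
            }
        }
        return false;
    }

    /** Index of the first occurrence of term at or after from, or -1. */
    static int indexOf(String[] tokens, String term, int from) {
        for (int pos = from; pos < tokens.length; pos++) {
            if (tokens[pos].equals(term)) {
                return pos;
            }
        }
        return -1;
    }

    /** All terms must occur in order; skipped positions between them must total <= slop. */
    static boolean spanNearOrdered(String[] tokens, String[] terms, int slop) {
        for (int start = 0; start < tokens.length; start++) {
            if (!tokens[start].equals(terms[0])) {
                continue;
            }
            int prev = start;
            int skipped = 0;
            boolean allFound = true;
            for (int t = 1; t < terms.length; t++) {
                // Taking the earliest next occurrence minimizes the skipped positions.
                int next = indexOf(tokens, terms[t], prev + 1);
                if (next < 0) {
                    allFound = false;
                    break;
                }
                skipped += next - prev - 1;
                prev = next;
            }
            if (allFound && skipped <= slop) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(spanFirst(STEEL_SOLDIER, "铁", 1));
        System.out.println(spanFirst(STEEL_SOLDIER, "铁", 2));
        System.out.println(spanNearOrdered(STEEL_SOLDIER, new String[] { "铁", "战", "大" }, 3));
        System.out.println(spanNearOrdered(STEEL_SOLDIER, new String[] { "钢", "士" }, 2));
    }
}
```

Under these assumptions [铁] sits at position 1 in both titles that contain it, so it is only found once the end reaches 2. The token [大] does not occur in any of the seven indexed titles, so the three-term query is expected to return nothing at every slop; the model makes that easy to check by hand.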
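Experiment 5: Filter search results with Lucene's built-in filters
Requirements: 1. First query the index for documents whose book title contains the character "女":
TermQuery q = new TermQuery(new Term("bookName", "女"));
2. Apply range filters on specific fields to the search results and display them:
(1) Show only results whose publication date lies between "19700102" and "19700105"
(2) Show only results whose numeric ID field lies between 3 and 5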
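The two range conditions can be checked by hand with the plain-Java model below. It is not Lucene code: Lucene 3.0 provides TermRangeFilter for term ranges and NumericRangeFilter for numeric ranges, and this sketch only models their inclusive comparison semantics, assuming term ranges compare strings lexicographically. The class RangeFilterModel and its method names are hypothetical.

```java
/** Plain-Java model of inclusive range filtering; this is NOT Lucene code. */
final class RangeFilterModel {
    /** Term range: inclusive lexicographic comparison of the raw field value. */
    static boolean inTermRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    /** Numeric range: the zero-padded ID is parsed and compared as an integer. */
    static boolean inNumericRange(String value, int min, int max) {
        int number = Integer.parseInt(value);
        return number >= min && number <= max;
    }

    public static void main(String[] args) {
        // Dates written as yyyyMMdd fall inside the range as expected.
        System.out.println(inTermRange("19700103", "19700102", "19700105"));
        // A hyphenated date sorts before every yyyyMMdd value of the same year.
        System.out.println(inTermRange("1970-01-03", "19700102", "19700105"));
        // bookNumber "0000004" is inside 3..5, "0000007" is not.
        System.out.println(inNumericRange("0000004", 3, 5));
        System.out.println(inNumericRange("0000007", 3, 5));
    }
}
```

One consequence worth checking against your own index: the preparation step stores publishDate as "1970-01-01", with hyphens and the same date for every book. Under lexicographic comparison such values lie outside "19700102" to "19700105", so to see any filtered results you would probably need to index dates in yyyyMMdd form and give the documents different dates.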
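Experiment 6: Query multiple indexes with the MultiSearcher class
Requirement: query the two indexes generated in the reference code at the same time and print the results.
Reference code: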
package lab07;

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import junit.framework.TestCase;

public class MultiSearcherTest extends TestCase {
    public static void main(String[] args) throws Exception {
        MultiSearcherTest test = new MultiSearcherTest();
        test.setUp();
        test.testMulti();
    }

    private IndexSearcher[] searchers;

    public void setUp() throws Exception {
        String[] animals = { "aardvark", "beaver", "coati",
                "dog", "elephant", "frog", "gila monster",
                "horse", "iguana", "javelina", "kangaroo",
                "lemur", "moose", "nematode", "orca",
                "python", "quokka", "rat", "scorpion",
                "tarantula", "uromastyx", "vicuna",
                "walrus", "xiphias", "yak", "zebra" };
        Analyzer analyzer = new WhitespaceAnalyzer();
        Directory aTOmDirectory = new RAMDirectory(); // index for names starting with a-m
        Directory nTOzDirectory = new RAMDirectory(); // index for names starting with n-z
        IndexWriter aTOmWriter = new IndexWriter(aTOmDirectory,
                analyzer,
                IndexWriter.MaxFieldLength.UNLIMITED);
        IndexWriter nTOzWriter = new IndexWriter(nTOzDirectory,
                analyzer,
                IndexWriter.MaxFieldLength.UNLIMITED);
        for (int i = animals.length - 1; i >= 0; i--) {
            Document doc = new Document();
            String animal = animals[i];
            doc.add(new Field("animal", animal, Field.Store.YES, Field.Index.NOT_ANALYZED));
            if (animal.charAt(0) < 'n') {
                aTOmWriter.addDocument(doc); // goes into index 1
            } else {
                nTOzWriter.addDocument(doc); // goes into index 2
            }
        }
        aTOmWriter.close();
        nTOzWriter.close();
        // One searcher per index
        searchers = new IndexSearcher[2];
        searchers[0] = new IndexSearcher(aTOmDirectory);
        searchers[1] = new IndexSearcher(nTOzDirectory);
    }

    public void testMulti() throws Exception {
        MultiSearcher searcher = new MultiSearcher(searchers);
        // Query both indexes at once for documents whose "animal" field
        // value lies between "h" and "t", both bounds inclusive
        TermRangeQuery query = new TermRangeQuery("animal",
                "h",
                "t",
                true, true);
        TopDocs results = searcher.search(query, 10);
        // "horse" through "scorpion" gives 12 hits; "tarantula" sorts after "t"
        assertEquals("tarantula not included", 12, results.totalHits);
        System.out.println(query);
        for (ScoreDoc sd : results.scoreDocs) {
            System.out.println("------------------------------------");
            int docID = sd.doc;
            Document document = searcher.doc(docID);
            System.out.println("Animal name: " + document.get("animal"));
        }
    }
}
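Build the index and add several documents
Use the field cache to read the value of the "bookName" field in every document
Sort search results: sorted by relevance
Multi-field query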