Lucene 3.0 Full-Text Search: An Introductory Example

星哥儿

Article link: https://blog.csdn.net/princezx/article/details/5806872

The Lucene 3.0 API differs from the Lucene 2.0 API in several places. The example below is implemented with Lucene 3.0.

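Part 1: Building an index with Lucene
Building an index with Lucene takes two main steps:
Step 1: create the index writer
Step 2: add the documents to be indexed
To prepare, create a folder named testlucene on drive E:, then create two folders inside it, test and index.
In the test folder, create the following four txt files:
a.txt content: 中华人民共和国
b.txt content: 人民共和国
c.txt content: 人民
d.txt content: 共和国

These four files are the ones we will index.
The index folder receives the index output.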
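
The setup above can also be scripted. The following is a minimal sketch using only the Java standard library; the class name SampleFiles and its methods are hypothetical helpers, not part of the original example. It assumes the files should be written in the platform default charset, because the indexing code in this article reads them with the default charset.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: creates the test/ and index/ folders and the four
// sample files under a base folder.
final class SampleFiles {

    // File name -> content, exactly as listed in the article.
    static Map<String, String> samples() {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "中华人民共和国");
        files.put("b.txt", "人民共和国");
        files.put("c.txt", "人民");
        files.put("d.txt", "共和国");
        return files;
    }

    // Creates base/test with the four files and an empty base/index folder.
    static void create(Path base, Charset charset) {
        try {
            Path test = Files.createDirectories(base.resolve("test"));
            Files.createDirectories(base.resolve("index"));
            for (Map.Entry<String, String> e : samples().entrySet()) {
                Files.write(test.resolve(e.getKey()), e.getValue().getBytes(charset));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Defaults to the folder used in the article; pass another path to override.
        Path base = Paths.get(args.length > 0 ? args[0] : "E:/testlucene");
        create(base, Charset.defaultCharset());
        System.out.println("Created sample files under " + base);
    }
}
```

Running it once before the indexing program leaves the folder layout described above in place.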

With the preparation done, we can start building the index.
Step 1: create the index writer, as follows:
writer = new IndexWriter(FSDirectory.open(new File(Constants.INDEX_STORE_PATH)), new StandardAnalyzer(
Version.LUCENE_30), true, IndexWriter.MaxFieldLength.LIMITED);

Step 2: add the document to the index:
writer.addDocument(doc);

The complete code is as follows:
package testlucene;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class LuceneIndex {
    // The index writer
    private IndexWriter writer = null;

    // Create the index writer in the constructor
    public LuceneIndex() {
        try {
            writer = new IndexWriter(FSDirectory.open(new File(Constants.INDEX_STORE_PATH)),
                    new StandardAnalyzer(Version.LUCENE_30), true,
                    IndexWriter.MaxFieldLength.LIMITED); // changed in Lucene 3.0
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public Document getDocument(File f) throws Exception {
        // Create the document object
        Document doc = new Document();
        // Open the file (decoded with the platform default charset)
        FileInputStream input = new FileInputStream(f);
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(input));
        // Add the indexed fields; the writer consumes the reader in addDocument
        doc.add(new Field("content", bufferedReader)); // changed in Lucene 3.0
        doc.add(new Field("path", f.getAbsolutePath(), Field.Store.YES,
                Field.Index.ANALYZED)); // changed in Lucene 3.0
        return doc;
    }

    public void writeToIndex() throws Exception {
        File folder = new File(Constants.INDEX_FILE_PATH);
        if (folder.isDirectory()) {
            String[] files = folder.list();
            for (int i = 0; i < files.length; i++) {
                File file = new File(folder, files[i]);
                Document doc = getDocument(file);
                // Prints "Indexing: <file>"
                System.out.println("正在建立索引：" + file + " ");
                // Add the document to the index
                writer.addDocument(doc);
            }
        } else {
            System.out.println("-----folder.isDirectory():false.");
        }
    }

    public void close() throws Exception {
        writer.close();
    }

    public static void main(String[] args) throws Exception {
        // Create the indexer
        LuceneIndex indexer = new LuceneIndex();
        // Build the index and measure the elapsed time
        Date start = new Date();
        indexer.writeToIndex();
        Date end = new Date();
        // Prints "Indexing took: N ms"
        System.out.println("建立索引用时：" + (end.getTime() - start.getTime()) + "毫秒");
        // Close the index writer
        indexer.close();
    }
}

package testlucene;

public class Constants {
    // Folder containing the files to be indexed
    public static final String INDEX_FILE_PATH = "E:/testlucene/test";
    // Folder where the index is stored
    public static final String INDEX_STORE_PATH = "E:/testlucene/index";
}
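
One detail worth noting: getDocument wraps the file in an InputStreamReader without naming a charset, so the text is decoded with the platform default. If the txt files are saved in a different encoding, the Chinese content may be decoded incorrectly before it reaches the analyzer. The sketch below, using only the standard library, shows one way to open a reader with an explicit charset. The class name CharsetAwareReader is hypothetical, and the choice of UTF-8 is an assumption about how the files were saved. The returned reader could be passed to new Field("content", reader) in the same way as in the code above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: opens text files with an explicit charset instead of
// relying on the platform default.
final class CharsetAwareReader {

    // Opens a buffered reader that decodes the file with the given charset.
    // Unlike InputStreamReader, this reader reports malformed input as an error.
    static BufferedReader open(Path file, Charset charset) {
        try {
            return Files.newBufferedReader(file, charset);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Reads the first line of the file, closing the reader afterwards.
    static String firstLine(Path file, Charset charset) {
        try (BufferedReader reader = open(file, charset)) {
            return reader.readLine();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Defaults to one of the sample files from the article.
        Path file = Paths.get(args.length > 0 ? args[0] : "E:/testlucene/test/a.txt");
        // Assumption: the file was saved as UTF-8.
        System.out.println(firstLine(file, StandardCharsets.UTF_8));
    }
}
```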

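Finally, run the program. It prints the following (the messages read "Indexing: <file>" and "Indexing took: N ms"):
正在建立索引：E:/testlucene/test/a.txt
正在建立索引：E:/testlucene/test/b.txt
正在建立索引：E:/testlucene/test/c.txt
正在建立索引：E:/testlucene/test/d.txt
建立索引用时：47毫秒
The resulting index files then appear under E:/testlucene/index:
_7.cfs segments.gen segments_9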

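Part 2: Searching the index
Searching the index takes three main steps and uses two objects, IndexSearcher and Query.
Search steps:
Step 1: create the searcher
searcher = new IndexSearcher(IndexReader.open(FSDirectory.open(new File(Constants.INDEX_STORE_PATH))));

Step 2: wrap the search keyword in a Query object
query = queryParser.parse(keyword);

Step 3: run the Query through the searcher. In Lucene 3.0 the results come back as a TopDocs object, which replaces the Hits class of earlier versions
TopDocs hits = searcher.search(query, 10);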
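Finally, print the results held in the TopDocs object:
for (int i = 0; i < hits.scoreDocs.length; i++) {
    try {
        ScoreDoc scoreDoc = hits.scoreDocs[i]; // changed in Lucene 3.0
        Document doc = searcher.doc(scoreDoc.doc); // changed in Lucene 3.0
        // Prints "Result <i>, file path: <path>"
        System.out.print("这是第" + (i+1) + "个检索结果，文件路径为:");
        System.out.println(doc.get("path"));
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
The complete program is as follows: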
package testlucene;

import java.io.File;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class LuceneSearch {
    // The searcher
    private IndexSearcher searcher = null;
    // The query
    private Query query = null;

    public LuceneSearch() {
        try {
            // Create the searcher over the index directory
            searcher = new IndexSearcher(IndexReader.open(FSDirectory.open(new File(Constants.INDEX_STORE_PATH))));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public final TopDocs search(String keyword) {
        // Prints "Searching for keyword: <keyword>"
        System.out.println("正在搜素关键字：" + keyword);
        try {
            QueryParser queryParser = new QueryParser(Version.LUCENE_30, "content",
                    new StandardAnalyzer(Version.LUCENE_30));
            // Wrap the search keyword in a Query object
            query = queryParser.parse(keyword);
            Date start = new Date();
            // Run the query and collect the top 10 results as a TopDocs object
            TopDocs hits = searcher.search(query, 10); // changed in Lucene 3.0
            Date end = new Date();
            // Prints "Search finished in: N ms"
            System.out.println("搜索完毕用时:" + (end.getTime() - start.getTime()) + "毫秒");
            return hits;
        } catch (Exception ex) {
            ex.printStackTrace();
            return null;
        }
    }

    public void printResult(TopDocs hits) {
        // A null result means the search itself failed
        if (hits == null || hits.totalHits == 0) {
            // Prints "No matching results found"
            System.out.println("没有找到您需要的结果");
        } else {
            for (int i = 0; i < hits.scoreDocs.length; i++) {
                try {
                    ScoreDoc scoreDoc = hits.scoreDocs[i]; // changed in Lucene 3.0
                    Document doc = searcher.doc(scoreDoc.doc); // changed in Lucene 3.0
                    // Prints "Result <i>, file path: <path>"
                    System.out.print("这是第" + (i+1) + "个检索结果，文件路径为:");
                    System.out.println(doc.get("path"));
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
            }
        }
        System.out.println("--------------------------------");
    }

    public static void main(String[] args) throws Exception {
        LuceneSearch test = new LuceneSearch();
        TopDocs hits = null;
        hits = test.search("中华");
        test.printResult(hits);
        hits = test.search("人民");
        test.printResult(hits);
        hits = test.search("共和国");
        test.printResult(hits);
    }
}

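After running the Part 1 program to build the index, run the search program LuceneSearch. The console output is shown below; the messages read "Searching for keyword: <keyword>", "Search finished in: N ms", and "Result <i>, file path: <path>".
(Comparing the output with the four files under E:/testlucene/test shows that the search results are correct.)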
正在搜素关键字：中华
搜索完毕用时:15毫秒
这是第1个检索结果，文件路径为:E:/testlucene/test/a.txt
--------------------------------
正在搜素关键字：人民
搜索完毕用时:0毫秒
这是第1个检索结果，文件路径为:E:/testlucene/test/c.txt
这是第2个检索结果，文件路径为:E:/testlucene/test/b.txt
这是第3个检索结果，文件路径为:E:/testlucene/test/a.txt
--------------------------------
正在搜素关键字：共和国
搜索完毕用时:0毫秒
这是第1个检索结果，文件路径为:E:/testlucene/test/d.txt
这是第2个检索结果，文件路径为:E:/testlucene/test/b.txt
这是第3个检索结果，文件路径为:E:/testlucene/test/a.txt
--------------------------------
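
To see why each keyword matches exactly these files, it can help to model the search by hand. The sketch below is a toy positional inverted index written with the standard library only. It is not Lucene and not how Lucene is implemented; the class name ToyCharIndex is hypothetical. It rests on the assumption that StandardAnalyzer in Lucene 3.0 emits each Chinese character as a separate token, so that a multi-character keyword behaves like a phrase of consecutive single-character tokens.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model, NOT Lucene: one token per character, phrase match on positions.
final class ToyCharIndex {

    // token -> (document name -> positions of that token in the document)
    private final Map<Character, Map<String, List<Integer>>> postings = new LinkedHashMap<>();

    // Indexes every character of the text together with its position.
    void add(String docName, String text) {
        for (int pos = 0; pos < text.length(); pos++) {
            postings.computeIfAbsent(text.charAt(pos), c -> new TreeMap<>())
                    .computeIfAbsent(docName, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Returns the documents that contain the keyword's characters at
    // consecutive positions, sorted by document name (no relevance ranking).
    List<String> search(String keyword) {
        List<String> result = new ArrayList<>();
        if (keyword.isEmpty()) {
            return result;
        }
        Map<String, List<Integer>> first = postings.get(keyword.charAt(0));
        if (first == null) {
            return result;
        }
        for (Map.Entry<String, List<Integer>> e : first.entrySet()) {
            for (int start : e.getValue()) {
                if (matchesAt(e.getKey(), keyword, start)) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }

    // Checks that characters 1..n-1 of the keyword follow the first one directly.
    private boolean matchesAt(String doc, String keyword, int start) {
        for (int i = 1; i < keyword.length(); i++) {
            Map<String, List<Integer>> byDoc = postings.get(keyword.charAt(i));
            if (byDoc == null) {
                return false;
            }
            List<Integer> positions = byDoc.get(doc);
            if (positions == null || !positions.contains(start + i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyCharIndex index = new ToyCharIndex();
        index.add("a.txt", "中华人民共和国");
        index.add("b.txt", "人民共和国");
        index.add("c.txt", "人民");
        index.add("d.txt", "共和国");
        for (String keyword : new String[] {"中华", "人民", "共和国"}) {
            System.out.println(keyword + " -> " + index.search(keyword));
        }
    }
}
```

The toy index finds the same sets of files as the output above: a.txt for 中华; a.txt, b.txt and c.txt for 人民; a.txt, b.txt and d.txt for 共和国. The order differs because the sketch sorts by file name, whereas Lucene orders results by relevance score.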