Getting Started with Lucene

每一步都要留下深脚印

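http://ryxxlong.javaeye.com/blog/760792 is a good introductory Lucene program. The main code is reproduced below and consists of two files, HelloWorld.java and File2DocumentUtils.java. Create two directories, luceneDataSource and luceneIndex, under the src directory: luceneDataSource holds the files to be searched, and luceneIndex is where the index is stored. Note that the code uses relative paths, which are resolved against the JVM working directory, so both directories must be reachable from wherever the test is run. The required jars are junit-4.8.2.jar and lucene-core-3.0.2.jar. Run the search() test method in HelloWorld.java with JUnit. If anything is unclear, refer to the original article: http://ryxxlong.javaeye.com/blog/760792.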
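The directory setup described above can be sketched with the JDK alone. This is only an illustration, assuming Java 8 or later: the class name SampleDataSetup, the file name hello.txt and its content are hypothetical and are not part of the original article. The sample file contains the word "hello" so that the query used later has something to match.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original article.
final class SampleDataSetup {

    // Creates luceneDataSource/ and luceneIndex/ under baseDir and writes one
    // sample file (assumed name and content) into luceneDataSource/.
    static Path prepare(Path baseDir) {
        try {
            Path dataDir = Files.createDirectories(baseDir.resolve("luceneDataSource"));
            Files.createDirectories(baseDir.resolve("luceneIndex"));
            Path sample = dataDir.resolve("hello.txt");
            Files.write(sample, "hello lucene\n".getBytes(StandardCharsets.UTF_8));
            return sample;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Use the current working directory as the base directory.
        Path sample = prepare(Paths.get("."));
        System.out.println("Wrote " + sample.toAbsolutePath());
    }
}
```

A file created this way is only indexed if its path is also listed in the filePath array of HelloWorld.java.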

HelloWorld.java:

package com.reiyen.lucene.helloworld;
import java.io.File;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

import com.reiyen.lucene.utils.File2DocumentUtils;

public class HelloWorld {

    // Files to index; the paths are relative to the working directory
    String[] filePath = {"luceneDataSource/test addDocument test.txt",
            "luceneDataSource/IndexWriter addDocument's a javadoc .txt",
            "luceneDataSource/Test.html"};

    String indexPath2 = "./luceneIndex";

    // Use Lucene's standard analyzer
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    // Analyzer analyzer = new KeywordAnalyzer();
    // PerFieldAnalyzerWrapper pfaw = new PerFieldAnalyzerWrapper(analyzer);

    File indexPath = new File(indexPath2);
    /**
     * Create the index.
     *
     * IndexWriter is used to modify the index (add, delete, update).
     */
    @Test
    public void createIndex() throws Exception {
        // file --> doc
        Document[] docs = File2DocumentUtils.file2Document(filePath);

        // Build the index.
        // The IndexWriter constructor used here takes four arguments.
        // 1. Where the index is stored. In Lucene 3.0 this is a Directory,
        //    such as an FSDirectory or a RAMDirectory.
        // 2. An Analyzer implementation, i.e. the analyzer used to tokenize
        //    the document content for this index.
        // 3. A boolean: true creates a new index, false operates on the
        //    existing index.
        // 4. An IndexWriter.MaxFieldLength, the maximum number of terms/tokens
        //    indexed per field. Two values are predefined: UNLIMITED
        //    (2147483647, i.e. no limit) and LIMITED (10000). A custom limit
        //    is also possible, e.g. new IndexWriter.MaxFieldLength(2000)
        //    caps the count at 2000.
        IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexPath), analyzer, true,
                MaxFieldLength.LIMITED);
        // Finally, add each document to the index with IndexWriter.addDocument.
        for (int i = 0; i < docs.length; i++) {
            indexWriter.addDocument(docs[i]);
        }
        indexWriter.close();
    }

    /**
     * Search.
     *
     * IndexSearcher is used to query the index.
     */
    @Test
    public void search() throws Exception {

        createIndex();

        String queryString = "hello"; // search keyword
        //String queryString = "adddocument";

        // 1. Parse the search text into a Query.
        // Search for queryString in the fields named "name" and "content".
        String[] fields = {"name", "content"};
        QueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_30, fields, analyzer);
        Query query = queryParser.parse(queryString);

        // 2. Run the query.
        IndexSearcher indexSearcher = new IndexSearcher(FSDirectory.open(indexPath));
        // No Filter is used for now.
        Filter filter = null;
        TopDocs topDocs = indexSearcher.search(query, filter, 10000);
        System.out.println("Total matches: [" + topDocs.totalHits + "]");

        // 3. Print the results.
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            int docSn = scoreDoc.doc; // internal document number
            Document doc = indexSearcher.doc(docSn); // fetch the document by its number
            File2DocumentUtils.printDocumentInfo(doc); // print the document's fields
        }

        // Release the index files held by the searcher.
        indexSearcher.close();
    }
}
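To make the roles of indexing, searching and the MaxFieldLength limit more concrete, here is a toy inverted index written with the JDK only. It is a conceptual sketch, not Lucene's implementation: the class ToyInvertedIndex, its tokenizer (lowercase, split on anything that is not an ASCII letter or digit) and its method names are all assumptions made for illustration, and Java 8 or later is assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of "add documents, then look up a term". Not Lucene code.
final class ToyInvertedIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final int maxFieldLength;
    private int nextDocId = 0;

    ToyInvertedIndex(int maxFieldLength) {
        this.maxFieldLength = maxFieldLength;
    }

    // Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexes at most maxFieldLength tokens of the text and returns the doc id.
    int addDocument(String text) {
        int docId = nextDocId++;
        List<String> tokens = tokenize(text);
        int limit = Math.min(tokens.size(), maxFieldLength);
        for (int i = 0; i < limit; i++) {
            postings.computeIfAbsent(tokens.get(i), k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Returns the ids of all documents whose indexed tokens include the term.
    Set<Integer> search(String term) {
        Set<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new TreeSet<>() : new TreeSet<>(hits);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex(3); // index 3 tokens per document
        index.addDocument("Hello Lucene world");                  // doc 0
        index.addDocument("IndexWriter addDocument's a javadoc"); // doc 1
        System.out.println(index.search("hello"));   // prints [0]
        System.out.println(index.search("javadoc")); // prints []: token is past the limit
    }
}
```

In this sketch the second lookup finds nothing because "javadoc" is the fifth token of its document and only the first three are indexed, which mirrors the effect a small MaxFieldLength has on long fields.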

File2DocumentUtils.java:

package com.reiyen.lucene.utils;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.util.NumericUtils;

public class File2DocumentUtils {

    // File --> Document with the fields name, content, size, path
    public static Document[] file2Document(String[] path) {

        Document[] docs = new Document[path.length];
        for (int i = 0; i < path.length; i++) {
            File file = new File(path[i]);

            Document doc = new Document();
            doc.add(new Field("name", file.getName(), Store.YES, Index.ANALYZED));
            System.out.println("file.getName(): " + file.getName());
            doc.add(new Field("content", readFileContent(file), Store.YES, Index.ANALYZED));
            doc.add(new Field("size", NumericUtils.longToPrefixCoded(file.length()), Store.YES, Index.NOT_ANALYZED));
            doc.add(new Field("path", file.getAbsolutePath(), Store.YES, Index.NOT_ANALYZED));
            docs[i] = doc;
        }
        return docs;
    }

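    public static String readFileContent(File file) {
        BufferedReader reader = null;
        try {
            // Reads with the platform default charset
            reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            StringBuilder content = new StringBuilder();

            for (String line = null; (line = reader.readLine()) != null;) {
                // "\n" is a newline; the original appended the literal text "/n"
                content.append(line).append("\n");
            }

            return content.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // Always close the reader, even if reading failed
            if (reader != null) {
                try {
                    reader.close();
                } catch (Exception ignored) {
                    // nothing useful to do if close fails
                }
            }
        }
    }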

    public static void printDocumentInfo(Document doc) {
        // Field f = doc.getField("name");
        // f.stringValue();
        System.out.println("------------------------------");
        System.out.println("name = " + doc.get("name"));
        System.out.println("content = " + doc.get("content"));
        System.out.println("size = " + NumericUtils.prefixCodedToLong((doc.get("size"))));
        System.out.println("path = " + doc.get("path"));
    }

}
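readFileContent above relies on the platform default charset, so the indexed text can differ from machine to machine. The following is a minimal sketch of the same line-by-line read with an explicit charset and automatic closing of the reader, assuming Java 8 or later. The class name ReadFileSketch and the choice of UTF-8 in main are assumptions, not something the original article specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical variant of readFileContent, not part of the original article.
final class ReadFileSketch {

    // Reads the whole file line by line, decoding it with the given charset,
    // and ends every line with a single '\n'.
    static String readFileContent(Path file, Charset charset) {
        StringBuilder content = new StringBuilder();
        // try-with-resources closes the reader even when reading fails
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            for (String line; (line = reader.readLine()) != null; ) {
                content.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        // Usage: java ReadFileSketch <path-to-text-file>
        System.out.print(readFileContent(Paths.get(args[0]), StandardCharsets.UTF_8));
    }
}
```

Whether UTF-8 is the right choice depends on how the files in luceneDataSource were saved; the charset is left as a parameter for that reason.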
