0. A Demo
Lucene is a library for text search. The official site describes it as: [ Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java ]. Here I am referring specifically to the Lucene-core library (jar), because a large number of extension modules built around Lucene are also available.
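Lucene has two basic functions: creating an index and searching an index. The main classes involved are as follows:
When creating an index, split the data to be indexed, build Field and Document objects from it, and then use IndexWriter to write them to the index files. (Several index files are generated, so you must specify a directory to hold the index; searches are later run against the same Directory.)
When searching an index, use IndexSearcher to search the Field specified by the Query across the indexed Documents, and return the Documents that satisfy the Query.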
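To make the two operations concrete, here is a minimal pure-Java sketch of the idea behind them. It is not how Lucene is implemented: `ToyIndex` and its methods are hypothetical names invented for illustration, and lowercasing plus splitting on non-word characters is a simplifying assumption standing in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy class, for illustration only; not part of Lucene.
public class ToyIndex {
    // Stored documents: position in the list is the document id.
    private final List<Map<String, String>> docs = new ArrayList<>();
    // Inverted index: "field:term" -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // "Create index": store the document and record every lowercased word of every field.
    public void addDocument(Map<String, String> fields) {
        int docId = docs.size();
        docs.add(fields);
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (String word : field.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!word.isEmpty()) {
                    postings.computeIfAbsent(field.getKey() + ":" + word, k -> new TreeSet<>())
                            .add(docId);
                }
            }
        }
    }

    // "Search index": look up one term in one field and return the matching documents.
    public List<Map<String, String>> search(String field, String term) {
        String key = field + ":" + term.toLowerCase(Locale.ROOT);
        List<Map<String, String>> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(key, Collections.<Integer>emptySet())) {
            hits.add(docs.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(Collections.singletonMap("title", "Lucene in Action"));
        index.addDocument(Collections.singletonMap("title", "Lucene for Dummies"));
        index.addDocument(Collections.singletonMap("title", "Managing Gigabytes"));
        for (Map<String, String> hit : index.search("title", "lucene")) {
            System.out.println(hit.get("title"));
        }
    }
}
```

Running `main` prints the two titles that contain "lucene". The point of the sketch is only the division of labor: the work of splitting text into terms happens once at indexing time, so a search is a lookup rather than a scan over every document.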
Below is a Lucene usage demo, adapted with minor changes from lucene-in-5-minutes. It uses Lucene 8.0.0, which requires JDK 8 or later, with Maven managing the dependencies:
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Demo2 {

    public static void main(String[] args) throws IOException, ParseException {
        // create analyzer and directory
        StandardAnalyzer analyzer = new StandardAnalyzer();
        // directory that holds the index files; adjust the path for your machine
        Path path = Paths.get("F:/lucene-demo-index");
        Directory index = FSDirectory.open(path);

        // indexing
        // 1 create index-writer (OpenMode.CREATE overwrites any existing index)
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        config.setOpenMode(OpenMode.CREATE);
        IndexWriter writer = new IndexWriter(index, config);
        // 2 write index
        addDoc(writer, "Lucene in Action", "193398817");
        addDoc(writer, "Lucene for Dummies", "55320055Z");
        addDoc(writer, "Managing Gigabytes", "55063554A");
        addDoc(writer, "The Art of Computer Science", "9900333X");
        // closing the writer commits the documents so that a reader can see them
        writer.close();

        // search
        // 1 create query ("title" is the default field for the parser)
        String queryStr = "lucene";
        Query q = new QueryParser("title", analyzer).parse(queryStr);
        System.out.println("query: " + q.toString());
        int hitsPerPage = 10;
        // 2 create index-searcher
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        // 3 do search
        TopDocs docs = searcher.search(q, hitsPerPage);
        ScoreDoc[] hits = docs.scoreDocs;

        // display results
        System.out.println("found " + hits.length + " results");
        for (ScoreDoc hit : hits) {
            int docId = hit.doc;
            Document doc = searcher.doc(docId);
            System.out.println(doc.get("title") + " - " + doc.get("isbn"));
        }
        // release the index files once the results have been read
        reader.close();
    }

    private static void addDoc(IndexWriter writer, String title, String isbn) throws IOException {
        Document doc = new Document();
        // TextField is tokenized by the analyzer; StringField is indexed as a single token
        doc.add(new TextField("title", title, Field.Store.YES));
        doc.add(new StringField("isbn", isbn, Field.Store.YES));
        writer.addDocument(doc);
    }
}
The Maven dependencies are:
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>8.0.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-queryparser -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>8.0.0</version>
</dependency>
The program prints:
query: title:lucene
found 2 results
Lucene in Action - 193398817
Lucene for Dummies - 55320055Z
The Query in the program is "title:lucene", i.e. find the Documents whose title Field contains the word "lucene".
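The same "title:lucene" query can also be built directly, without the query parser. The following is a minimal sketch, assuming Lucene 8.0.0 with only lucene-core on the classpath; the class name `TermQueryDemo` and the helper `countHits` are hypothetical, and an in-memory `ByteBuffersDirectory` is swapped in for the on-disk directory so the example leaves no files behind.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo class, for illustration only.
public class TermQueryDemo {

    // Indexes the demo titles in memory and counts the documents matching field:term.
    static int countHits(String field, String term) throws IOException {
        Directory index = new ByteBuffersDirectory();
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(index, config)) {
            String[] titles = {
                "Lucene in Action", "Lucene for Dummies",
                "Managing Gigabytes", "The Art of Computer Science"
            };
            for (String title : titles) {
                Document doc = new Document();
                doc.add(new TextField("title", title, Field.Store.YES));
                writer.addDocument(doc);
            }
        }
        // A TermQuery matches the exact indexed term; it is not passed through the analyzer.
        Query q = new TermQuery(new Term(field, term));
        try (IndexReader reader = DirectoryReader.open(index)) {
            return new IndexSearcher(reader).search(q, 10).scoreDocs.length;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new TermQuery(new Term("title", "lucene"))); // prints title:lucene
        System.out.println(countHits("title", "lucene")); // prints 2
        System.out.println(countHits("title", "Lucene")); // prints 0
    }
}
```

The last call returns no hits because StandardAnalyzer lowercases terms at indexing time, while a hand-built TermQuery is used as given. QueryParser runs the query text through the same analyzer, which is why the parsed query in the demo becomes "title:lucene" whatever the capitalization of the input.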
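Reference links
- Download - Lucene
- First, the official resource: the Lucene API documentation
- lucene tutorial: a brief introduction to Lucene together with an introductory Lucene demo. Its author also maintains two other Lucene-related sites, Solr tutorial and elasticsearch tutorial. All of them are worth a look.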