Introduction to Lucene (a demo example)

0. A Demo
  • Lucene is a text search library. The official site describes it as "Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java". This article refers specifically to the lucene-core library (jar), since many extension modules built around Lucene are also available.

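  • Lucene's two basic functions are creating an index and searching an index. The main classes are shown in the figure below:
    [Figure: Lucene main class diagram]
    To create an index, split the data to be indexed into pieces, build Field and Document objects from them, and write them to the index files with IndexWriter. Several index files are generated, so you must specify a directory (Directory) to hold them; searches must later run against that same directory.
    To search an index, use IndexSearcher to search the Field named by the Query across the indexed Documents; it returns the Documents that satisfy the Query.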
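
To make the two steps concrete, here is a minimal plain-JDK sketch of the idea behind them: an in-memory inverted index that maps each term of each field to the documents containing it. It is a toy illustration and not how Lucene actually stores or searches its index; the class name `ToyInvertedIndex` and its methods are hypothetical and were invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy illustration only: this is NOT Lucene's index format or API.
final class ToyInvertedIndex {
    // field name -> term -> sorted ids of the documents containing that term
    private final Map<String, Map<String, TreeSet<Integer>>> postings = new HashMap<>();
    // stored documents; a document's id is its position in this list
    private final List<Map<String, String>> stored = new ArrayList<>();

    // Indexing: split each field value into lowercase terms and record the doc id.
    int addDocument(Map<String, String> fields) {
        int docId = stored.size();
        stored.add(fields);
        for (Map.Entry<String, String> field : fields.entrySet()) {
            Map<String, TreeSet<Integer>> terms =
                    postings.computeIfAbsent(field.getKey(), k -> new HashMap<>());
            for (String term : field.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    terms.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
                }
            }
        }
        return docId;
    }

    // Searching: look the term up in one field and return the matching stored documents.
    List<Map<String, String>> search(String field, String term) {
        List<Map<String, String>> hits = new ArrayList<>();
        Map<String, TreeSet<Integer>> terms = postings.get(field);
        if (terms == null) {
            return hits;
        }
        TreeSet<Integer> ids = terms.get(term.toLowerCase(Locale.ROOT));
        if (ids == null) {
            return hits;
        }
        for (int id : ids) {
            hits.add(stored.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(Collections.singletonMap("title", "Lucene in Action"));
        index.addDocument(Collections.singletonMap("title", "Managing Gigabytes"));
        for (Map<String, String> doc : index.search("title", "lucene")) {
            System.out.println(doc.get("title")); // prints "Lucene in Action"
        }
    }
}
```

In this sketch `addDocument` stands in, very roughly, for the IndexWriter side and `search` for the IndexSearcher side; the real classes additionally handle persistence, scoring, and much richer queries.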

  • Below is a demo of Lucene usage, lightly adapted from lucene-in-5-minutes. It uses Lucene 8.0.0, which requires JDK 8 or later, with Maven managing the dependencies:

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Demo2 {
	public static void main(String[] args) throws IOException, ParseException {
		// create analyzer and directory
		StandardAnalyzer analyzer = new StandardAnalyzer();
		Path path = Paths.get("F:/lucene-demo-index");
		Directory index = FSDirectory.open(path);
		
		// indexing
		// 1 create index-writer
		IndexWriterConfig config = new IndexWriterConfig(analyzer);
		config.setOpenMode(OpenMode.CREATE);
		IndexWriter writer = new IndexWriter(index, config);
		// 2 write index
		addDoc(writer, "Lucene in Action", "193398817");
		addDoc(writer, "Lucene for Dummies", "55320055Z");
		addDoc(writer, "Managing Gigabytes", "55063554A");
		addDoc(writer, "The Art of Computer Science", "9900333X");
		// closing the writer commits the documents to the index files
		writer.close();

		// search
		// 1 create query ("title" is the default field for terms without a field prefix)
		String queryStr = "lucene";
		Query q = new QueryParser("title", analyzer).parse(queryStr);
		System.out.println("query: " + q.toString());

		int hitsPerPage = 10;
		// 2 create index-searcher
		IndexReader reader = DirectoryReader.open(index);
		IndexSearcher searcher = new IndexSearcher(reader);
		// 3 do search
		TopDocs docs = searcher.search(q, hitsPerPage);
		ScoreDoc[] hits = docs.scoreDocs;

		// display results
		System.out.println("found " + hits.length + " results");
		for (ScoreDoc hit : hits) {
			int docId = hit.doc;
			Document doc = searcher.doc(docId);
			System.out.println(doc.get("title") + " - " + doc.get("isbn"));
		}

		// release the index files
		reader.close();
		index.close();
	}
	
	private static void addDoc(IndexWriter writer, String title, String isbn) throws IOException {
		Document doc = new Document();
		doc.add(new TextField("title", title, Field.Store.YES));
		doc.add(new StringField("isbn", isbn, Field.Store.YES));
		writer.addDocument(doc);
	}
}

The Maven dependencies are:

	<dependency>
		<groupId>org.apache.lucene</groupId>
		<artifactId>lucene-core</artifactId>
		<version>8.0.0</version>
	</dependency>
	<!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-queryparser -->
	<dependency>
		<groupId>org.apache.lucene</groupId>
		<artifactId>lucene-queryparser</artifactId>
		<version>8.0.0</version>
	</dependency>

The program output is:

query: title:lucene
found 2 results
Lucene in Action - 193398817
Lucene for Dummies - 55320055Z

The Query in the program is "title:lucene", that is, find the documents (Document) whose title field (Field) contains the word "lucene".
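
The input string was just "lucene", yet the printed query is "title:lucene", because the first argument passed to QueryParser names the default field for terms that carry no field prefix. The plain-JDK sketch below imitates that behavior for a single term. It is a simplified illustration under stated assumptions: the class `ToyFieldQuery` is hypothetical, it handles only one optional `field:` prefix, and lowercasing plus splitting on non-word characters is only a rough stand-in for what StandardAnalyzer does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model of a single-term "field:term" query; the real QueryParser supports far more syntax.
final class ToyFieldQuery {
    final String field;
    final String term;

    ToyFieldQuery(String field, String term) {
        this.field = field;
        this.term = term;
    }

    // A bare term such as "lucene" falls back to the default field;
    // a prefixed term such as "isbn:123" names its own field.
    static ToyFieldQuery parse(String defaultField, String queryStr) {
        int colon = queryStr.indexOf(':');
        String field = colon < 0 ? defaultField : queryStr.substring(0, colon);
        String term = colon < 0 ? queryStr : queryStr.substring(colon + 1);
        return new ToyFieldQuery(field, term.toLowerCase(Locale.ROOT));
    }

    // True when the lowercased, word-split text of the named field contains the query term.
    boolean matches(String fieldName, String fieldText) {
        if (!field.equals(fieldName)) {
            return false;
        }
        List<String> tokens = Arrays.asList(fieldText.toLowerCase(Locale.ROOT).split("\\W+"));
        return tokens.contains(term);
    }

    @Override
    public String toString() {
        return field + ":" + term;
    }

    public static void main(String[] args) {
        ToyFieldQuery q = ToyFieldQuery.parse("title", "lucene");
        System.out.println("query: " + q);
        String[] titles = {"Lucene in Action", "Lucene for Dummies", "Managing Gigabytes"};
        for (String title : titles) {
            if (q.matches("title", title)) {
                System.out.println(title);
            }
        }
    }
}
```

Because both the indexed text and the query term are lowercased here, "lucene" matches "Lucene in Action" and "Lucene for Dummies" but not "Managing Gigabytes", which mirrors the two hits in the demo output.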


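References
  1. Download - Lucene
  2. First, the official resource: the Lucene API documentation
  3. lucene tutorial: a brief introduction to Lucene plus a getting-started demo. Its author also maintains two other Lucene-related sites, Solr tutorial and elasticsearch tutorial. All are worth a look.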
Below is a simple Lucene 9 example covering index creation and search:

```java
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneDemo {
    public static void main(String[] args) throws IOException, ParseException {
        // Create the index
        Directory directory = FSDirectory.open(Paths.get("index"));
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        IndexWriter writer = new IndexWriter(directory, config);
        Document doc1 = new Document();
        doc1.add(new TextField("content", "Lucene is a full-text search engine library in Java", Field.Store.YES));
        writer.addDocument(doc1);
        Document doc2 = new Document();
        doc2.add(new TextField("content", "Lucene is used to build search applications", Field.Store.YES));
        writer.addDocument(doc2);
        writer.close();

        // Search: IndexSearcher takes an IndexReader, not a Directory
        IndexReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);

        // TermQuery: matches the exact indexed term, with no analysis of the query text
        Query termQuery = new TermQuery(new Term("content", "search"));
        TopDocs topDocs = searcher.search(termQuery, 10);
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            Document doc = searcher.doc(scoreDoc.doc);
            System.out.println(doc.get("content") + " score: " + scoreDoc.score);
        }

        // QueryParser: analyzes the query string before building the query
        QueryParser parser = new QueryParser("content", new StandardAnalyzer());
        Query query = parser.parse("Lucene search");
        TopDocs topDocs2 = searcher.search(query, 10);
        for (ScoreDoc scoreDoc : topDocs2.scoreDocs) {
            Document doc = searcher.doc(scoreDoc.doc);
            System.out.println(doc.get("content") + " score: " + scoreDoc.score);
        }

        reader.close();
        directory.close();
    }
}
```

The code above creates an index containing two documents, searches it with TermQuery and with QueryParser, and prints the results of each search.