Introduction to Lucene
Traditionally we query data from a database: the user clicks something in the browser, the request reaches the back end, a SQL statement runs against the database, and a result set is returned and rendered on the page.
A search engine works differently. The source data is not shown on the page directly: an IndexWriter turns each record into a Document (one record corresponds to one Document), and the resulting index files are written to a designated index directory. At query time an IndexSearcher quickly retrieves the matching data from that index directory, and the results come back in a TopDocs object.
Note:
Lucene is not meant for very large data volumes, which can cause problems, but a hundred thousand records is well within its comfort zone.
On the other hand, Lucene is very convenient: just add a jar and you can use it.
1. Overview
1.1 What is Lucene?
Lucene is an open-source full-text search engine toolkit written in Java. It can be embedded into your own project to add indexing and search features; in essence it is a high-performance, extensible information-retrieval library.
Like other open-source software, it comes with the usual strengths: transparency of features and structure, powerful functionality with good extensibility, and strong community support that makes technical exchange easy.
Lucene is only a search core library and does not ship a complete application, but it is applied very widely:
Solr, Elasticsearch, Katta, and others are all built on Lucene underneath.
Its characteristics: a simple API that is easy to learn (although the API differs considerably between versions).
How Lucene works: the inverted index
**What is an inverted index, and what is a forward index?**
My understanding:
Inverted index: after Lucene tokenizes the text, it maintains a mapping along the lines of "term -> document IDs"; when we search for a term, we directly obtain the IDs of the matching documents.
Forward index: a search scans the full content of every document, with no "term -> document ID" mapping maintained, and collects the IDs of the documents that match. This is obviously slow, much like querying a database table that is missing an index.
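The difference between the two strategies can be sketched in plain Java. This is only an illustration of the idea (not Lucene's implementation); the class and method names are invented for the example:

```java
import java.util.*;

public class InvertedIndexDemo {
    // forward-style search: scan every document's text for the term
    public static List<Integer> forwardSearch(Map<Integer, String> docs, String term) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            if (Arrays.asList(e.getValue().toLowerCase().split("\\s+")).contains(term)) {
                hits.add(e.getKey());
            }
        }
        Collections.sort(hits);
        return hits;
    }

    // inverted index: build "term -> doc IDs" once; each lookup is then a single map access
    public static Map<String, Set<Integer>> buildInverted(Map<Integer, String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (Map.Entry<Integer, String> e : docs.entrySet()) {
            for (String token : e.getValue().toLowerCase().split("\\s+")) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new HashMap<>();
        docs.put(1, "Lucene is a search library");
        docs.put(2, "Solr is built on Lucene");
        // both documents contain the term "lucene"
        System.out.println(buildInverted(docs).get("lucene"));
        System.out.println(forwardSearch(docs, "solr"));
    }
}
```

The forward search re-reads every document on every query, while the inverted index pays the scanning cost once at indexing time, which is exactly the trade Lucene makes.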
2. Lucene core API
2.1 Core classes in the indexing process
Document, Field, IndexWriter, Directory, Analyzer
Document: a document represents a collection of fields (Field). It is the entity that carries the data; it is an abstract concept, not a Word or txt file.
A Document is the basic unit being indexed.
Constructor:
org.apache.lucene.document.Document.Document()
Commonly used API:
org.apache.lucene.document.Document.add(IndexableField) // add a field
Field: every Document in the index contains one or more fields. A field is a name/value pair, and each field carries one piece of the document's data.
Common constructor:
org.apache.lucene.document.Field.Field(String, String, IndexableFieldType)
IndexWriter is the core component of the indexing process. You build an IndexWriter from an Analyzer (tokenizer) and a Directory such as FSDirectory (where the index files live).
This class creates a new index and adds documents to an existing one. It provides write access to the index, but it cannot read or search.
Constructor:
org.apache.lucene.index.IndexWriter.IndexWriter(Directory,IndexWriterConfig)
Its core API includes:
org.apache.lucene.index.IndexWriter.addDocument(Iterable<? extends IndexableField>) // add a document
org.apache.lucene.index.IndexWriter.updateDocuments(Term, Iterable<? extends Iterable<? extends IndexableField>>) // update documents
org.apache.lucene.index.IndexWriter.tryDeleteDocument(IndexReader, int) // delete a single document
org.apache.lucene.index.IndexWriter.deleteDocuments(Term...) // delete all documents containing the given terms
Directory is where the index is stored; it is an abstract class whose concrete subclasses provide specific storage locations.
FSDirectory stores the index on disk at a given path, while RAMDirectory keeps it in memory. Creating an FSDirectory:
org.apache.lucene.store.FSDirectory.open(Path) // open/create the index directory
org.apache.lucene.store.FSDirectory.listAll(Path) // list the files under the index directory
Creating a RAMDirectory:
org.apache.lucene.store.RAMDirectory.RAMDirectory()
Analyzer: the tokenizer. Before text is indexed it must pass through an analyzer, which extracts the tokens to be indexed from the document and discards the useless remainder (stop words). The analyzer is critical, because different analyzers can produce very different results for the same document. The default StandardAnalyzer targets English.
Analyzer is an abstract class and the base class of all analyzers. It exposes the text as a stream of tokens through the TokenStream API. Commonly used analyzers:
org.apache.lucene.analysis.standard.StandardAnalyzer // standard analyzer
org.apache.lucene.analysis.core.SimpleAnalyzer // simple analyzer
org.wltea.analyzer.lucene.IKAnalyzer // IK analyzer (for Chinese)
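What an analyzer does can be illustrated with a minimal tokenizer in plain Java that mimics SimpleAnalyzer's documented behavior (split on non-letters, lowercase each token). This is only a sketch of the concept, not Lucene's code:

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleTokenizerDemo {
    // mimic SimpleAnalyzer: split on non-letter characters and lowercase each token
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());   // a non-letter ends the current token
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Qingdao is a beautiful city."));
    }
}
```

The tokens produced this way are what ends up in the "term -> document IDs" mapping; a real Lucene Analyzer additionally handles stop words, stemming, and per-language rules.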
2.2 Core classes in the search process
IndexSearcher, Term, Query, TermQuery, TopDocs
IndexSearcher: call its search method to search an index created by IndexWriter.
Constructor:
org.apache.lucene.search.IndexSearcher.IndexSearcher(IndexReader)
Commonly used API:
org.apache.lucene.search.IndexSearcher.search(Query, int) // search, returning the top-n highest-scoring documents
org.apache.lucene.search.IndexSearcher.searchAfter(ScoreDoc, Query, int) // search after a given result, for paging
org.apache.lucene.search.IndexSearcher.search(Query, int, Sort) // search with a custom sort
Term is the basic unit used for searching.
Query: Lucene provides many Query subclasses, for example TermQuery (single-term query), BooleanQuery (boolean combination), PhraseQuery (phrase search), and PrefixQuery (prefix search). They express the constraints of a query. TermQuery is the most basic and simplest query type Lucene offers: it matches documents whose given field (Field) contains a specific term (Term).
TopDocs is a simple container of pointers to the ranked search results, where a result is a document matching the query.
3. Luke, a tool for inspecting Lucene indexes
Luke releases for each version can be downloaded from GitHub: https://github.com/DmitryKey/luke/releases
Note: do not put the tool in a directory whose path contains Chinese characters.
Typical use:
Open the index directory / inspect the terms / run searches:
4. Lucene getting-started demo
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-core</artifactId>
<version>5.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-queryparser</artifactId>
<version>5.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-analyzers-common</artifactId>
<version>5.3.1</version>
</dependency>
Goal: index the files under a data directory and generate the index files in a specified directory.
1. Constructor: instantiate the IndexWriter
   - get the object for the index storage location
   - create the index writer
   - set up the writer configuration
   - give the configuration an analyzer
2. Close the index writer
3. Index all files under a given path
4. Index a single file
5. Build the Document (the important information in the index, as key-value pairs)
6. Test
Creating the index
IndexCreate
package com.dj.lucene;
import java.io.File;
import java.io.FileReader;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
/**
 * Creates the index.
 *
 * Used together with Demo1.java for the Lucene hello-world example.
 * @author Administrator
 *
 */
public class IndexCreate {
private IndexWriter indexWriter;
	/**
	 * 1. Constructor: instantiate the IndexWriter
	 * @param indexDir
	 * @throws Exception
	 */
	public IndexCreate(String indexDir) throws Exception{
		// location where the index files will be stored
		FSDirectory dir = FSDirectory.open(Paths.get(indexDir));
		// standard analyzer (targets English)
		Analyzer analyzer = new StandardAnalyzer();
		// configuration for the index writer, wrapping the analyzer
		IndexWriterConfig conf = new IndexWriterConfig(analyzer);
		indexWriter = new IndexWriter(dir, conf); // create the IndexWriter
	}
	/**
	 * 2. Close the index writer
	 * @throws Exception
	 */
	public void closeIndexWriter() throws Exception{
		indexWriter.close(); // closing the writer flushes everything to the index directory
	}
	/**
	 * 3. Index all files under the given path
	 * @param dataDir
	 * @return the number of indexed documents
	 * @throws Exception
	 */
	public int index(String dataDir) throws Exception{
		File[] files = new File(dataDir).listFiles(); // all files in the data directory
		if (files != null) { // listFiles returns null if dataDir is not a directory
			for (File file : files) {
				indexFile(file); // index the files one by one
			}
		}
		return indexWriter.numDocs();
	}
	/**
	 * 4. Index a single file
	 * @param file
	 * @throws Exception
	 */
	private void indexFile(File file) throws Exception{
		System.out.println("Full path of the indexed file: " + file.getCanonicalPath());
		Document doc = getDocument(file); // one file corresponds to one Document
		indexWriter.addDocument(doc); // hand the document to the writer
	}
	/**
	 * 5. Build the Document (the important information in the index, as key-value pairs)
	 * @param file
	 * @return
	 * @throws Exception
	 */
	private Document getDocument(File file) throws Exception{
		Document doc = new Document();
		// the "contents" field is what keyword searches will run against
		doc.add(new TextField("contents", new FileReader(file)));
		// Field.Store.YES: store the value in the index so it can be read back
		doc.add(new TextField("fullPath", file.getCanonicalPath(), Field.Store.YES));
		doc.add(new TextField("fileName", file.getName(), Field.Store.YES));
		return doc;
	}
}
Using the index
Reading data back from the index files:
1. Open the index reader (via DirectoryReader)
2. Get the index searcher (from the reader)
3. Build the query object (via a QueryParser, which is created from an analyzer)
4. Get the top-ranked documents containing the keyword
5. Read the contents of each matching document
IndexUse
package com.dj.lucene;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
/**
 * Using the index.
 *
 * Used together with Demo2.java for the Lucene hello-world example.
 * @author Administrator
 *
 */
public class IndexUse {
	/**
	 * Search the index directory for a keyword.
	 * @param indexDir directory containing the index files
	 * @param q the keyword
	 */
	public static void search(String indexDir, String q) throws Exception{
		FSDirectory indexDirectory = FSDirectory.open(Paths.get(indexDir)); // directory object
		// note: the index reader is not created with new; it is opened from the directory
		IndexReader indexReader = DirectoryReader.open(indexDirectory);
		// get the index searcher
		IndexSearcher indexSearcher = new IndexSearcher(indexReader);
		Analyzer analyzer = new StandardAnalyzer(); // standard analyzer
		// query parser over the "contents" field, i.e. the text the user searches in
		QueryParser queryParser = new QueryParser("contents", analyzer);
		// build the query object for the keyword
		Query query = queryParser.parse(q);
		long start = System.currentTimeMillis();
		// fetch the top ten matches
		TopDocs topDocs = indexSearcher.search(query, 10);
		long end = System.currentTimeMillis();
		System.out.println("Matching '" + q + "' took " + (end - start) + " ms and found " + topDocs.totalHits + " record(s)");
		for (ScoreDoc scoreDoc : topDocs.scoreDocs) { // iterate over the hits
			int docID = scoreDoc.doc;
			// fetch the document from the searcher by its internal ID
			Document doc = indexSearcher.doc(docID);
			System.out.println("Data from index file: " + doc.get("fullPath"));
		}
		indexReader.close();
	}
}
Building the index
package com.dj.lucene;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;
import org.junit.Before;
import org.junit.Test;
/**
 * Building the index.
 *
 * Adding, deleting, and updating index entries (important).
 * @author Administrator
 *
 */
public class Demo3 {
private String ids[]={"1","2","3"};
private String citys[]={"qingdao","nanjing","shanghai"};
private String descs[]={
"Qingdao is a beautiful city.",
"Nanjing is a city of culture.",
"Shanghai is a bustling city."
};
	private FSDirectory dir; // index file directory
	/**
	 * Regenerate the index files before each test.
	 * @throws Exception
	 */
	@Before
	public void setUp() throws Exception {
		dir = FSDirectory.open(Paths.get("D:\\Software\\Softwarepath\\lucene\\demo\\demo2\\indexDir"));
		// obtain the IndexWriter
		IndexWriter indexWriter = getIndexWriter();
		for (int i = 0; i < ids.length; i++) {
			Document doc = new Document();
			doc.add(new StringField("id", ids[i], Field.Store.YES));
			doc.add(new StringField("city", citys[i], Field.Store.YES));
			doc.add(new TextField("desc", descs[i], Field.Store.NO));
			indexWriter.addDocument(doc);
		}
		indexWriter.close(); // close the writer
	}
	/**
	 * Obtain the index writer.
	 * @return
	 * @throws Exception
	 */
	private IndexWriter getIndexWriter() throws Exception{
		Analyzer analyzer = new StandardAnalyzer(); // analyzer
		IndexWriterConfig conf = new IndexWriterConfig(analyzer);
		return new IndexWriter(dir, conf);
	}
	/**
	 * Check how many documents were written.
	 * @throws Exception
	 */
	@Test
	public void getWriteDocNum() throws Exception {
		IndexWriter indexWriter = getIndexWriter();
		System.out.println("The index directory contains " + indexWriter.numDocs() + " document(s)");
	}
	/**
	 * Delete a document before merging: the data and the Document still exist.
	 *
	 * The document is only marked as deleted; it is not physically removed yet.
	 * @throws Exception
	 */
	@Test
	public void deleteDocBeforeMerge() throws Exception {
		IndexWriter indexWriter = getIndexWriter();
		System.out.println("Max doc count: " + indexWriter.maxDoc());
		indexWriter.deleteDocuments(new Term("id", "1"));
		indexWriter.commit();
		System.out.println("Max doc count: " + indexWriter.maxDoc());
		System.out.println("Actual doc count: " + indexWriter.numDocs());
		indexWriter.close();
	}
	/**
	 * Delete a document and merge.
	 *
	 * The corresponding index entry is physically removed, though that version's tokens remain.
	 * @throws Exception
	 */
	@Test
	public void deleteDocAfterMerge() throws Exception {
		// https://blog.csdn.net/asdfsadfasdfsa/article/details/78820030
		// org.apache.lucene.store.LockObtainFailedException: Lock held by this virtual machine:
		// the IndexWriter is a singleton and thread-safe; opening more than one on the same directory is not allowed.
		IndexWriter indexWriter = getIndexWriter();
		System.out.println("Max doc count: " + indexWriter.maxDoc());
		indexWriter.deleteDocuments(new Term("id", "1"));
		indexWriter.forceMergeDeletes(); // physically purge the deleted documents
		indexWriter.commit();
		System.out.println("Max doc count: " + indexWriter.maxDoc());
		System.out.println("Actual doc count: " + indexWriter.numDocs());
		indexWriter.close();
	}
	/**
	 * Test updating the index.
	 * @throws Exception
	 */
@Test
public void testUpdate()throws Exception{
IndexWriter writer=getIndexWriter();
Document doc=new Document();
doc.add(new StringField("id", "1", Field.Store.YES));
doc.add(new StringField("city","qingdao",Field.Store.YES));
doc.add(new TextField("desc", "dsss is a city.", Field.Store.NO));
writer.updateDocument(new Term("id","1"), doc);
writer.close();
}
}
Document field boosting
package com.dj.lucene;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Before;
import org.junit.Test;
/**
 * Document field boosting.
 *
 * @author Administrator
 *
 */
public class Demo4 {
private String ids[]={"1","2","3","4"};
private String authors[]={"Jack","Marry","John","Json"};
private String positions[]={"accounting","technician","salesperson","boss"};
private String titles[]={"Java is a good language.","Java is a cross platform language","Java powerful","You should learn java"};
private String contents[]={
"If possible, use the same JRE major version at both index and search time.",
"When upgrading to a different JRE major version, consider re-indexing. ",
"Different JRE major versions may implement different versions of Unicode,",
"For example: with Java 1.4, `LetterTokenizer` will split around the character U+02C6,"
};
	private Directory dir; // index file directory
	/**
	 * Regenerate the index files before each test.
	 * @throws Exception
	 */
	@Before
	public void setUp()throws Exception {
		dir = FSDirectory.open(Paths.get("D:\\Software\\Softwarepath\\lucene\\demo\\demo3\\indexDir"));
		// obtain the IndexWriter
		IndexWriter writer = getIndexWriter();
		for (int i = 0; i < authors.length; i++) {
			Document doc = new Document();
			doc.add(new StringField("id", ids[i], Field.Store.YES));
			doc.add(new StringField("author", authors[i], Field.Store.YES));
			doc.add(new StringField("position", positions[i], Field.Store.YES));
			TextField textField = new TextField("title", titles[i], Field.Store.YES);
			// Json paid for advertising and pushed his ranking to first place
			//if("boss".equals(positions[i])) {
			//	textField.setBoost(2f); // set the boost; the default is 1
			//}
			doc.add(textField);
			// TextField is tokenized, StringField is not
			doc.add(new TextField("content", contents[i], Field.Store.NO));
			writer.addDocument(doc);
		}
		writer.close(); // close the writer
	}
	/**
	 * Obtain the index writer.
	 * @return
	 * @throws Exception
	 */
	private IndexWriter getIndexWriter() throws Exception{
		Analyzer analyzer = new StandardAnalyzer(); // analyzer
		IndexWriterConfig conf = new IndexWriterConfig(analyzer);
		return new IndexWriter(dir, conf);
	}
@Test
public void index() throws Exception{
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);
String fieldName = "title";
String keyWord = "java";
Term t = new Term(fieldName, keyWord);
Query query = new TermQuery(t);
TopDocs hits = searcher.search(query, 10);
		System.out.println("Keyword '" + keyWord + "' was hit " + hits.totalHits + " time(s)");
for (ScoreDoc scoreDoc : hits.scoreDocs) {
Document doc = searcher.doc(scoreDoc.doc);
System.out.println(doc.get("author"));
}
}
}
This is the normal ranking:
The ranking after Json paid for advertising:
Term-specific search
package com.dj.lucene;
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.junit.Before;
import org.junit.Test;
/**
 * Term-specific search.
 *
 * Query expressions (QueryParser).
 * @author Administrator
 *
 */
public class Demo5 {
	@Before
	public void setUp() {
		// where the index files will be stored
		String indexDir = "D:\\Software\\Softwarepath\\lucene\\demo\\demo4";
		// location of the source data
		String dataDir = "D:\\Software\\Softwarepath\\lucene\\demo\\demo4\\data";
		IndexCreate ic = null; // index creator
		try {
			ic = new IndexCreate(indexDir);
			long start = System.currentTimeMillis();
			int num = ic.index(dataDir);
			long end = System.currentTimeMillis();
			System.out.println("Indexed " + num + " file(s) under the given path in " + (end - start) + " ms");
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			try {
				ic.closeIndexWriter();
			} catch (Exception e) {
				e.printStackTrace();
			}
		}
	}
	/**
	 * Term-specific search.
	 */
	@Test
	public void testTermQuery() {
		// where the index files are stored
		String indexDir = "D:\\Software\\Softwarepath\\lucene\\demo\\demo4";
		String fld = "contents"; // field to search in
		String text = "indexformattoooldexception"; // term text to look for
		// a Term is the field name plus the term text
		Term t = new Term(fld, text);
		TermQuery tq = new TermQuery(t);
		try {
			FSDirectory indexDirectory = FSDirectory.open(Paths.get(indexDir));
			// note: the index reader is not created with new; it is opened from the directory
			IndexReader indexReader = DirectoryReader.open(indexDirectory);
			// get the index searcher
			IndexSearcher is = new IndexSearcher(indexReader);
			TopDocs hits = is.search(tq, 100);
			// System.out.println(hits.totalHits);
			for(ScoreDoc scoreDoc: hits.scoreDocs) {
				Document doc = is.doc(scoreDoc.doc);
				System.out.println("File " + doc.get("fullPath") + " contains the keyword");
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	/**
	 * Query expressions (QueryParser).
	 */
	@Test
	public void testQueryParser() {
		// where the index files are stored
		String indexDir = "D:\\Software\\Softwarepath\\lucene\\demo\\demo4";
		// get the query parser (which analyzer parses which field)
		QueryParser queryParser = new QueryParser("contents", new StandardAnalyzer());
		try {
			FSDirectory indexDirectory = FSDirectory.open(Paths.get(indexDir)); // index directory
			// note: the index reader is not created with new; it is opened from the directory
			IndexReader indexReader = DirectoryReader.open(indexDirectory);
			// get the index searcher
			IndexSearcher is = new IndexSearcher(indexReader);
			// let the parser parse the keyword: indexformattoooldexception
			TopDocs hits = is.search(queryParser.parse("indexformattoooldexception"), 100);
			for(ScoreDoc scoreDoc: hits.scoreDocs) {
				Document doc = is.doc(scoreDoc.doc);
				System.out.println("File " + doc.get("fullPath") + " contains the keyword");
			}
		} catch (IOException e) {
			e.printStackTrace();
		} catch (ParseException e) {
			e.printStackTrace();
		}
	}
}
Combined (boolean) queries
package com.dj.lucene;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.junit.Before;
import org.junit.Test;
/**
 * Numeric range queries.
 *
 * Prefix queries on string fields (PrefixQuery).
 *
 * @author Administrator
 *
 */
public class Demo6 {
private int ids[]={1,2,3};
private String citys[]={"qingdao","nanjing","shanghai"};
private String descs[]={
"Qingdao is a beautiful city.",
"Nanjing is a city of culture.",
"Shanghai is a bustling city."
};
private FSDirectory dir;
	/**
	 * Regenerate the index files before each test.
	 * @throws Exception
	 */
@Before
public void setUp() throws Exception {
dir = FSDirectory.open(Paths.get("D:\\Software\\Softwarepath\\lucene\\demo\\demo2\\indexDir"));
IndexWriter indexWriter = getIndexWriter();
for (int i = 0; i < ids.length; i++) {
Document doc = new Document();
doc.add(new IntField("id", ids[i], Field.Store.YES));
doc.add(new StringField("city", citys[i], Field.Store.YES));
doc.add(new TextField("desc", descs[i], Field.Store.YES));
indexWriter.addDocument(doc);
}
indexWriter.close();
}
	/**
	 * Obtain the index writer.
	 * @return
	 * @throws Exception
	 */
private IndexWriter getIndexWriter() throws Exception{
Analyzer analyzer = new StandardAnalyzer();
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
return new IndexWriter(dir, conf );
}
	/**
	 * Numeric range query.
	 * @throws Exception
	 */
@Test
public void testNumericRangeQuery()throws Exception{
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher is = new IndexSearcher(reader);
		NumericRangeQuery<Integer> query=NumericRangeQuery.newIntRange("id", 1, 2, true, true); // inclusive range [1,2]
TopDocs hits=is.search(query, 10);
for(ScoreDoc scoreDoc:hits.scoreDocs){
Document doc=is.doc(scoreDoc.doc);
System.out.println(doc.get("id"));
System.out.println(doc.get("city"));
System.out.println(doc.get("desc"));
}
}
	/**
	 * Prefix query on a string field (PrefixQuery).
	 * @throws Exception
	 */
@Test
public void testPrefixQuery()throws Exception{
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher is = new IndexSearcher(reader);
PrefixQuery query=new PrefixQuery(new Term("city","n"));
TopDocs hits=is.search(query, 10);
for(ScoreDoc scoreDoc:hits.scoreDocs){
Document doc=is.doc(scoreDoc.doc);
System.out.println(doc.get("id"));
System.out.println(doc.get("city"));
System.out.println(doc.get("desc"));
}
}
	/**
	 * Combined (boolean) query.
	 *
	 * This is the most commonly used kind of query.
	 * @throws Exception
	 */
@Test
public void testBooleanQuery()throws Exception{
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher is = new IndexSearcher(reader);
		NumericRangeQuery<Integer> query1=NumericRangeQuery.newIntRange("id", 1, 2, true, true);
		PrefixQuery query2=new PrefixQuery(new Term("city","n")); // case-sensitive
		BooleanQuery.Builder booleanQuery=new BooleanQuery.Builder(); // build the combined condition
		booleanQuery.add(query1,BooleanClause.Occur.MUST);
		booleanQuery.add(query2,BooleanClause.Occur.MUST);
		TopDocs hits=is.search(booleanQuery.build(), 10); // run the search
for(ScoreDoc scoreDoc:hits.scoreDocs){
Document doc=is.doc(scoreDoc.doc);
System.out.println(doc.get("id"));
System.out.println(doc.get("city"));
System.out.println(doc.get("desc"));
}
}
}
Lucene utility class
LuceneUtil
package com.dj.blog.util;
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Formatter;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryTermScorer;
import org.apache.lucene.search.highlight.Scorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
/**
 * Lucene utility class.
 *
 * @author Administrator
 *
 */
public class LuceneUtil {
	/**
	 * Get the directory object where the index files are stored.
	 *
	 * @param path
	 * @return
	 */
public static Directory getDirectory(String path) {
Directory directory = null;
try {
directory = FSDirectory.open(Paths.get(path));
} catch (IOException e) {
e.printStackTrace();
}
return directory;
}
	/**
	 * Store the index files in memory.
	 *
	 * Rarely used in practice.
	 * @return
	 */
public static Directory getRAMDirectory() {
Directory directory = new RAMDirectory();
return directory;
}
	/**
	 * Get the directory reader.
	 *
	 * @param directory
	 * @return
	 */
public static DirectoryReader getDirectoryReader(Directory directory) {
DirectoryReader reader = null;
try {
reader = DirectoryReader.open(directory);
} catch (IOException e) {
e.printStackTrace();
}
return reader;
}
	/**
	 * Get the index searcher.
	 *
	 * @param reader
	 * @return
	 */
public static IndexSearcher getIndexSearcher(DirectoryReader reader) {
IndexSearcher indexSearcher = new IndexSearcher(reader);
return indexSearcher;
}
	/**
	 * Get the index writer.
	 *
	 * @param directory
	 * @param analyzer
	 * @return
	 */
public static IndexWriter getIndexWriter(Directory directory, Analyzer analyzer)
{
IndexWriter iwriter = null;
try {
IndexWriterConfig config = new IndexWriterConfig(analyzer);
config.setOpenMode(OpenMode.CREATE_OR_APPEND);
// Sort sort=new Sort(new SortField("content", Type.STRING));
			// config.setIndexSort(sort); // sort the index
			config.setCommitOnClose(true); // commit automatically on close
// config.setMergeScheduler(new ConcurrentMergeScheduler());
// config.setIndexDeletionPolicy(new
// SnapshotDeletionPolicy(NoDeletionPolicy.INSTANCE));
iwriter = new IndexWriter(directory, config);
} catch (IOException e) {
e.printStackTrace();
}
return iwriter;
}
	/**
	 * Close the index writer and the directory.
	 *
	 * @param indexWriter
	 * @param directory
	 */
public static void close(IndexWriter indexWriter, Directory directory) {
if (indexWriter != null) {
try {
indexWriter.close();
} catch (IOException e) {
indexWriter = null;
}
}
if (directory != null) {
try {
directory.close();
} catch (IOException e) {
directory = null;
}
}
}
	/**
	 * Close the directory reader and the directory.
	 *
	 * @param reader
	 * @param directory
	 */
public static void close(DirectoryReader reader, Directory directory) {
if (reader != null) {
try {
reader.close();
} catch (IOException e) {
reader = null;
}
}
if (directory != null) {
try {
directory.close();
} catch (IOException e) {
directory = null;
}
}
}
	/**
	 * Highlighting markup.
	 *
	 * @param query
	 * @param fieldName
	 * @return
	 */
public static Highlighter getHighlighter(Query query, String fieldName)
{
Formatter formatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
Scorer fragmentScorer = new QueryTermScorer(query, fieldName);
Highlighter highlighter = new Highlighter(formatter, fragmentScorer);
highlighter.setTextFragmenter(new SimpleFragmenter(200));
return highlighter;
}
}
Building the Lucene index:
package com.dj.blog.web;
import java.io.IOException;
import java.nio.file.Paths;
import java.sql.SQLException;
import java.util.List;
import java.util.Map;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import com.dj.blog.dao.BlogDao;
import com.dj.blog.util.PropertiesUtil;
/**
 * Builds the Lucene index.
 * @author Administrator
 * 1. Build the index with IndexWriter
 * 2. Read the index files and fetch the matching fragments
 * 3. Highlight the matching fragments
 *
 */
public class IndexStarter {
private static BlogDao blogDao = new BlogDao();
public static void main(String[] args) {
IndexWriterConfig conf = new IndexWriterConfig(new SmartChineseAnalyzer());
Directory d;
IndexWriter indexWriter = null;
try {
d = FSDirectory.open(Paths.get(PropertiesUtil.getValue("indexPath")));
indexWriter = new IndexWriter(d , conf );
			// build an index entry for every row in the database
List<Map<String, Object>> list = blogDao.list(null, null);
for (Map<String, Object> map : list) {
Document doc = new Document();
doc.add(new StringField("id", (String) map.get("id"), Field.Store.YES));
				// TextField tokenizes the whole sentence
doc.add(new TextField("title", (String) map.get("title"), Field.Store.YES));
doc.add(new StringField("url", (String) map.get("url"), Field.Store.YES));
indexWriter.addDocument(doc);
}
} catch (IOException e) {
e.printStackTrace();
} catch (InstantiationException e) {
e.printStackTrace();
} catch (IllegalAccessException e) {
e.printStackTrace();
} catch (SQLException e) {
e.printStackTrace();
}finally {
try {
if(indexWriter!= null) {
indexWriter.close();
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
}
Chinese analyzer and highlighting
package com.dj.lucene;
import java.io.StringReader;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.SimpleSpanFragmenter;
import org.apache.lucene.store.FSDirectory;
import org.junit.Before;
import org.junit.Test;
/**
 * Chinese analyzer.
 *
 * Highlighting.
 *
 * @author 86182
 *
 */
public class Demo7 {
private Integer ids[] = { 1, 2, 3 };
private String citys[] = { "青岛", "南京", "上海" };
// private String descs[]={
// "青岛是个美丽的城市。",
// "南京是个有文化的城市。",
// "上海市个繁华的城市。"
// };
private String descs[] = { "青岛是个美丽的城市。",
"南京是一个文化的城市南京,简称宁,是江苏省会,地处中国东部地区,长江下游,濒江近海。全市下辖11个区,总面积6597平方公里,2013年建成区面积752.83平方公里,常住人口818.78万,其中城镇人口659.1万人。[1-4] “江南佳丽地,金陵帝王州”,南京拥有着6000多年文明史、近2600年建城史和近500年的建都史,是中国四大古都之一,有“六朝古都”、“十朝都会”之称,是中华文明的重要发祥地,历史上曾数次庇佑华夏之正朔,长期是中国南方的政治、经济、文化中心,拥有厚重的文化底蕴和丰富的历史遗存。[5-7] 南京是国家重要的科教中心,自古以来就是一座崇文重教的城市,有“天下文枢”、“东南第一学”的美誉。截至2013年,南京有高等院校75所,其中211高校8所,仅次于北京上海;国家重点实验室25所、国家重点学科169个、两院院士83人,均居中国第三。[8-10]",
"上海市个繁华的城市。" };
	private FSDirectory dir;
	/**
	 * Regenerate the index files before each test.
	 *
	 * @throws Exception
	 */
	@Before
public void setUp() throws Exception {
dir = FSDirectory.open(Paths.get("D:\\Software\\Softwarepath\\lucene\\demo\\demo2\\indexDir"));
IndexWriter indexWriter = getIndexWriter();
for (int i = 0; i < ids.length; i++) {
Document doc = new Document();
doc.add(new IntField("id", ids[i], Field.Store.YES));
doc.add(new StringField("city", citys[i], Field.Store.YES));
doc.add(new TextField("desc", descs[i], Field.Store.YES));
indexWriter.addDocument(doc);
}
indexWriter.close();
}
	/**
	 * Obtain the index writer.
	 *
	 * @return
	 * @throws Exception
	 */
	private IndexWriter getIndexWriter() throws Exception {
		// Analyzer analyzer = new StandardAnalyzer(); // default analyzer
		Analyzer analyzer = new SmartChineseAnalyzer(); // Chinese analyzer
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
return new IndexWriter(dir, conf);
}
	/**
	 * Inspect the generated index with Luke.
	 *
	 * @throws Exception
	 */
@Test
public void testIndexCreate() throws Exception {
}
	/**
	 * Test highlighting.
	 *
	 * @throws Exception
	 */
@Test
public void testHeight() throws Exception {
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);
SmartChineseAnalyzer analyzer = new SmartChineseAnalyzer();
QueryParser parser = new QueryParser("desc", analyzer);
		// Query query = parser.parse("南京文化");
		Query query = parser.parse("南京文明");
		TopDocs hits = searcher.search(query, 100);
		// scorer for the query terms
		QueryScorer queryScorer = new QueryScorer(query);
		// fragmenter that extracts the text fragments containing scoring terms
		SimpleSpanFragmenter fragmenter = new SimpleSpanFragmenter(queryScorer);
		// the markup used for highlighting
		SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter("<span color='red'><b>", "</b></span>");
		// the highlighter itself
		Highlighter highlighter = new Highlighter(htmlFormatter, queryScorer);
		// tell the highlighter which fragmenter to use
		highlighter.setTextFragmenter(fragmenter);
		for (ScoreDoc scoreDoc : hits.scoreDocs) {
			// fetch the document from the searcher by its internal ID
			Document doc = searcher.doc(scoreDoc.doc);
			String desc = doc.get("desc");
			if (desc != null) {
				// a TokenStream is the stream of tokens extracted from a document field
				TokenStream tokenStream = analyzer.tokenStream("desc", new StringReader(desc));
				System.out.println("Highlighted fragment: " + highlighter.getBestFragment(tokenStream, desc));
			}
			System.out.println("Full content: " + desc);
		}
}
}
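The effect of Highlighter with SimpleHTMLFormatter can be sketched in plain Java: at its core, the formatter wraps each matched token in the configured opening and closing tags. This toy version (an illustration only, with an invented class name; the real Highlighter works on analyzed tokens and picks the best fragment) just wraps every literal occurrence of the keyword:

```java
public class HighlightDemo {
    // wrap every occurrence of the keyword in the given pre/post markup
    public static String highlight(String text, String keyword, String pre, String post) {
        return text.replace(keyword, pre + keyword + post);
    }

    public static void main(String[] args) {
        System.out.println(highlight("南京是一个文化的城市", "南京",
                "<span color='red'><b>", "</b></span>"));
    }
}
```

The real Highlighter does more: it scores fragments via QueryScorer, cuts the text into fragments via the fragmenter, and only returns the best-scoring snippet.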
When the title is empty: fetch the data from the database.
When the title is not empty: search the index directory.
Note: the index directory must already contain index files, otherwise an error is thrown:
no segments* file found in MMapDirectory@D:\Software\Softwarepath\lucene\demo\text lockFactory=org.apache.lucene.store.NativeFSLockFactory@57bce7cc: files: []
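A defensive check before opening the index avoids this error; Lucene itself offers `DirectoryReader.indexExists(Directory)` for exactly this purpose. The same idea in plain Java (a hypothetical helper, simply checking for a `segments*` file in the directory):

```java
import java.io.File;

public class IndexGuard {
    // true if the directory contains at least one segments* file,
    // i.e. it looks like a committed Lucene index
    public static boolean looksLikeIndex(File dir) {
        File[] files = dir.listFiles();
        if (files == null) return false; // not a directory, or unreadable
        for (File f : files) {
            if (f.getName().startsWith("segments")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // only fall through to the index search when the index actually exists
        File indexDir = new File("D:\\Software\\Softwarepath\\lucene\\demo\\text");
        System.out.println(looksLikeIndex(indexDir) ? "search the index" : "query the database instead");
    }
}
```

With such a guard, the "title is empty / index is missing" branch can fall back to the database query instead of crashing.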