Getting Started with Lucene

扶木

Tags: lucene, full-text search

Article link: https://blog.csdn.net/qq_39607417/article/details/79164383

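First, the difference between Lucene and Solr

Solr is built on top of Lucene. Lucene is an information-retrieval toolkit, not a complete search engine system. It provides the index structures, tools for reading and writing indexes, relevance scoring, sorting, and similar features, so when you use Lucene directly you still have to take care of the rest of the search system yourself, such as data acquisition, parsing, and tokenization.
Solr, by contrast, aims to be an enterprise-grade search engine system, so it is much closer to what we usually think of as a search engine. It runs as a search service that your application calls through its APIs, so the search logic does not have to be coupled into the application. Solr can also define how data is parsed through configuration files, which makes it feel more like a search framework. It supports master-slave replication and hot-swapping of indexes, and adds common search features such as hit highlighting and faceting.
As a result, Lucene is more flexible, but you have to build the search system architecture and any additional features yourself. Solr does more for you, but it is a higher-level framework, and new Lucene features are not always exposed through it promptly. You may occasionally find that a feature you need is supported by Lucene but has no corresponding interface in Solr.

Which one should you use?

Quoting an expert directly: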

Many people new to Lucene and Solr will ask the obvious question: Should I use Lucene or Solr?

The answer is simple: if you're asking yourself this question, in 99% of situations, what you want to use is Solr.

A simple way to conceptualize the relationship between Solr and Lucene is that of a car and its engine. You can't drive an engine, but you can drive a car. Similarly, Lucene is a programmatic library which you can't use as-is, whereas Solr is a complete application which you can use out-of-box.

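That said, the ELK stack is now the more commonly recommended choice. I hope to learn it when I get the chance.

ELK consists of three components: Elasticsearch, Logstash, and Kibana. Elasticsearch is a near-real-time search platform that makes it practical to search and process large volumes of data quickly.

ELK > Solr > Lucene

-------------------------------------------------Divider----------------------------------------------------------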

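Basic theory

Lucene is an open-source full-text search toolkit from Apache. It does two things: it creates indexes and it queries them.

What is full-text search? Building an index first and then searching that index is called full-text search.

What is indexing? Extracting information from unstructured data and reorganizing it so that it can be searched is called indexing.

Structured data: data with a fixed format or limited length, such as database records and metadata.

Unstructured data: data of variable length or with no fixed format, such as emails, Word documents, and other files on disk.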
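To make the "build an index first, then search the index" idea concrete, here is a minimal sketch of an inverted index in plain Java. It is only an illustration of the concept and not how Lucene is implemented. The class name InvertedIndexSketch, its methods, and the sample texts are all made up for this example, and the tokenizer is a naive split on non-word characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical class, not part of Lucene).
class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // "Create index": split the text into lowercase terms and
    // record the document id under every term.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // "Search index": look the term up directly instead of scanning every document.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("Spring MVC tutorial");    // doc 0
        index.add("Lucene in Action");       // doc 1
        index.add("Spring Boot and Lucene"); // doc 2
        System.out.println(index.search("spring")); // prints [0, 2]
        System.out.println(index.search("lucene")); // prints [1, 2]
    }
}
```

The work is done once at indexing time, so a search is a single map lookup no matter how many documents were added.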

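How Lucene implements full-text search

1 Creating the index

Step 1: Acquire the files

Step 2: Create document objects

Step 3: Create an analyzer

Step 4: Save the index and the documents to the index store

2 Searching the index

Step 1: User interface (for example, a search box like Baidu's)

Step 2: Create a Query object (a key-value pair, field name: value)

Step 3: Execute the query

Step 4: Render the results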

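Code implementation

Creating the index

Step 1: Create a Java project and add the jar files.

Step 2: Create an IndexWriter object.

1) Specify where the index is stored with a Directory object.

2) Specify an analyzer to analyze the document content.

Step 3: Create a Document object.

Create Field objects and add them to the Document.

Step 4: Use the IndexWriter to write the Document to the index store. This step builds the index, and both the index and the Document are written to the index store.

Step 5: Close the IndexWriter.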

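// Imports needed by testIndex(). The calls below match the Lucene 4.10.x API.
import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

	// Create the index
	@Test
	public void testIndex() throws Exception {
		// Step 1: Create a Java project and add the jar files.
		// Step 2: Create an IndexWriter object.
		//   1) Specify where the index is stored with a Directory object.
		//   2) Specify an analyzer to analyze the document content.
		// FSDirectory (file system directory) stores the index at the given path.
		// Alternative: new RAMDirectory() keeps the index in memory instead.
		Directory directory = FSDirectory.open(new File("H:\\lucene&solr\\lucene\\temp\\index"));
		// StandardAnalyzer is the officially recommended analyzer but is not suited to Chinese,
		// so the third-party IKAnalyzer is used here.
		Analyzer analyzer = new IKAnalyzer();
		// IndexWriterConfig arguments:
		//   1st: the Lucene version; pick a specific one, or LATEST for the newest available
		//   2nd: the analyzer
		IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
		IndexWriter indexWriter = new IndexWriter(directory, config);
		// Step 3: Create Field objects and add them to a Document.
		File f = new File("H:\\gitHub");
		File[] listFiles = f.listFiles();
		for (File file : listFiles) {
			// Skip subdirectories: readFileToString only works on regular files
			if (!file.isFile()) {
				continue;
			}
			// Create the Document object
			Document document = new Document();
			// File name (analyzed, indexed, and stored)
			String file_name = file.getName();
			Field fileNameField = new TextField("fileName", file_name, Store.YES);
			// File size (indexed as a number so it can be used in range queries)
			long file_size = FileUtils.sizeOf(file);
			Field fileSizeField = new LongField("fileSize", file_size, Store.YES);
			// File content (analyzed, indexed, and stored)
			String file_content = FileUtils.readFileToString(file);
			Field fileContentField = new TextField("fileContent", file_content, Store.YES);
			// File path (stored only, not searchable)
			String file_path = file.getPath();
			Field filePathField = new StoredField("filePath", file_path);
			// Add the fields to the document
			document.add(fileNameField);
			document.add(fileSizeField);
			document.add(fileContentField);
			document.add(filePathField);
			// Step 4: Write the Document to the index store. This builds the index,
			// and both the index and the Document are written to the index store.
			indexWriter.addDocument(document);
		}
		// Step 5: Close the IndexWriter.
		indexWriter.close();
	}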

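Querying the index

Step 1: Create a Directory object that points to where the index is stored.

Step 2: Create an IndexReader object from the Directory.

Step 3: Create an IndexSearcher object from the IndexReader.

Step 4: Create a TermQuery object, specifying the field to search and the keyword.

Step 5: Execute the query.

Step 6: Get the results, then iterate over them and print them.

Step 7: Close the IndexReader.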

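// Imports needed by testSearch() (Lucene 4.10.x API, JUnit 4)
import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

	// Search the index
	@Test
	public void testSearch() throws Exception {
		// Step 1: Create a Directory object that points to where the index is stored.
		Directory directory = FSDirectory.open(new File("H:\\lucene&solr\\lucene\\temp\\index"));
		// Step 2: Create an IndexReader object from the Directory.
		IndexReader indexReader = DirectoryReader.open(directory);
		// Step 3: Create an IndexSearcher object from the IndexReader.
		IndexSearcher indexSearcher = new IndexSearcher(indexReader);
		// Step 4: Create a TermQuery with the field to search and the keyword.
		TermQuery termQuery = new TermQuery(new Term("fileName", "spring"));
		// Step 5: Execute the query and keep the 10 best hits.
		TopDocs topDocs = indexSearcher.search(termQuery, 10);
		// Step 6: Iterate over the results and print them.
		ScoreDoc[] scoreDocs = topDocs.scoreDocs;
		for (ScoreDoc scoreDoc : scoreDocs) {
			// Internal document id of the hit
			int doc = scoreDoc.doc;
			Document document = indexSearcher.doc(doc);
			// File name
			String fileName = document.get("fileName");
			System.out.println(fileName);
			// File content
			String fileContent = document.get("fileContent");
			System.out.println(fileContent);
			// File size
			String fileSize = document.get("fileSize");
			System.out.println(fileSize);
			// File path
			String filePath = document.get("filePath");
			System.out.println(filePath);
			System.out.println("------------");
		}
		// Step 7: Close the IndexReader.
		indexReader.close();
	}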

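IndexSearcher search methods

indexSearcher.search(query, n)	Searches with the Query and returns the n highest-scoring hits
indexSearcher.search(query, filter, n)	Searches with the Query, applies a filter, and returns the n highest-scoring hits
indexSearcher.search(query, n, sort)	Searches with the Query and returns the first n hits in the given sort order
indexSearcher.search(query, filter, n, sort)	Searches with the Query, applies a filter, and returns the first n hits in the given sort order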

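Third-party Chinese analyzer

Using the IK analyzer

Step 1: Add the IK jar to the project

Step 2: Copy IKAnalyzer.cfg.xml

Step 3: Copy stopword.dic and ext.dic

Put all of these files on the classpath.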

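<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<!-- File name: IKAnalyzer.cfg.xml -->
	<comment>IK Analyzer extension configuration</comment>
	<!-- Configure your own extension dictionaries here -->
	<entry key="ext_dict">ext.dic;</entry>

	<!-- Configure your own extension stop-word dictionaries here -->
	<entry key="ext_stopwords">stopword.dic;</entry>
</properties>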

Maintaining a Lucene index

Add

Delete

Update

Query

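// Imports needed by the maintenance methods (Lucene 4.10.x API, IK Analyzer, JUnit 4)
import java.io.File;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

	// Shared helper: open an IndexWriter on the index directory
	public IndexWriter getIndexWriter() throws Exception {
		// 1) Specify where the index is stored with a Directory object.
		Directory directory = FSDirectory.open(new File("H:\\lucene&solr\\lucene\\temp\\index"));
		// 2) Specify an analyzer to analyze the document content.
		Analyzer analyzer = new StandardAnalyzer();
		IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
		return new IndexWriter(directory, config);
	}

	// Delete everything
	@Test
	public void testAllDelete() throws Exception {
		IndexWriter indexWriter = getIndexWriter();
		indexWriter.deleteAll();
		indexWriter.close();
	}

	// Delete by condition: removes every document matching the query
	@Test
	public void testDelete() throws Exception {
		IndexWriter indexWriter = getIndexWriter();
		Query query = new TermQuery(new Term("fileName", "spring"));
		indexWriter.deleteDocuments(query);
		indexWriter.close();
	}

	// Update: deletes the documents containing the term, then adds the new document.
	// If no document contains the term, the new document is simply added.
	@Test
	public void testUpdate() throws Exception {
		IndexWriter indexWriter = getIndexWriter();
		Document doc = new Document();
		doc.add(new TextField("fileN", "test file name", Store.YES));
		doc.add(new TextField("fileC", "test file content", Store.YES));
		indexWriter.updateDocument(new Term("fileName", "spr"), doc, new IKAnalyzer());
		indexWriter.close();
	}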
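An update in Lucene is really "delete the documents that contain the term, then add the new document". The toy model below sketches that behavior in plain Java so the effect on the document count is easy to see. It is not Lucene code: UpdateSketch and its methods are hypothetical, and each field holds exactly one term, which is a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of "update = delete matching documents, then add" (hypothetical class).
class UpdateSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(doc);
    }

    // Remove every document whose field holds exactly this term, then add the replacement.
    void update(String field, String term, Map<String, String> replacement) {
        docs.removeIf(d -> term.equals(d.get(field)));
        docs.add(replacement);
    }

    int size() {
        return docs.size();
    }

    // Convenience helper that builds a single-field document.
    static Map<String, String> doc(String field, String value) {
        Map<String, String> d = new HashMap<>();
        d.put(field, value);
        return d;
    }

    public static void main(String[] args) {
        UpdateSketch index = new UpdateSketch();
        index.add(doc("fileName", "spring"));
        index.add(doc("fileName", "lucene"));

        // The term matches one document: it is replaced, so the count stays at 2.
        index.update("fileName", "spring", doc("fileN", "test file name"));
        System.out.println(index.size()); // prints 2

        // The term matches nothing: nothing is deleted and the new document is added.
        index.update("fileName", "spr", doc("fileN", "test file name"));
        System.out.println(index.size()); // prints 3
    }
}
```

The second call mirrors the update test in this article: a term such as "spr" only matches documents whose fileName field contains exactly that token, so if none do, the update behaves like a plain add.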

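Advanced queries

First: match-all query, MatchAllDocsQuery

Second: exact term query, TermQuery

Third: range query on numeric values, NumericRangeQuery

Fourth: combined query, BooleanQuery, which joins multiple clauses with AND, OR, and NOT, corresponding to Occur.MUST, Occur.SHOULD, and Occur.MUST_NOT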
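For the numeric range query, the last two arguments of NumericRangeQuery.newLongRange(field, min, max, minInclusive, maxInclusive) decide whether each bound itself counts as a match. This article's test code passes 47L, 200L, false, true, which means 47 < fileSize <= 200. The sketch below mirrors that rule in plain Java. The class RangeBoundsSketch and the method inRange are hypothetical names used only for illustration.

```java
// Toy predicate mirroring the inclusive/exclusive bound flags (hypothetical class).
class RangeBoundsSketch {
    static boolean inRange(long value, long min, long max,
                           boolean minInclusive, boolean maxInclusive) {
        boolean aboveMin = minInclusive ? value >= min : value > min;
        boolean belowMax = maxInclusive ? value <= max : value < max;
        return aboveMin && belowMax;
    }

    public static void main(String[] args) {
        // Same bounds as the article's query: 47 exclusive, 200 inclusive.
        System.out.println(inRange(47L, 47L, 200L, false, true));  // prints false
        System.out.println(inRange(48L, 47L, 200L, false, true));  // prints true
        System.out.println(inRange(200L, 47L, 200L, false, true)); // prints true
    }
}
```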

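// Imports needed by the query methods (Lucene 4.10.x API, JUnit 4)
import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

	// Shared helper: open an IndexReader and wrap it in an IndexSearcher
	public IndexSearcher getIndexSearcher() throws Exception {
		// Step 1: Create a Directory object that points to where the index is stored.
		Directory directory = FSDirectory.open(new File("H:\\lucene&solr\\lucene\\temp\\index"));
		// Step 2: Create an IndexReader object from the Directory.
		IndexReader indexReader = DirectoryReader.open(directory);
		// Step 3: Create an IndexSearcher object from the IndexReader.
		return new IndexSearcher(indexReader);
	}

	// Run the query and print the stored fields of the top 10 hits
	public void printResult(IndexSearcher indexSearcher, Query query) throws Exception {
		TopDocs topDocs = indexSearcher.search(query, 10);
		ScoreDoc[] scoreDocs = topDocs.scoreDocs;
		for (ScoreDoc scoreDoc : scoreDocs) {
			int doc = scoreDoc.doc;
			Document document = indexSearcher.doc(doc);
			// File name
			String fileName = document.get("fileName");
			System.out.println(fileName);
			// File content
			String fileContent = document.get("fileContent");
			System.out.println(fileContent);
			// File size
			String fileSize = document.get("fileSize");
			System.out.println(fileSize);
			// File path
			String filePath = document.get("filePath");
			System.out.println(filePath);
			System.out.println("------------");
		}
	}

	// Match all documents
	@Test
	public void testMatchAllDocsQuery() throws Exception {
		IndexSearcher indexSearcher = getIndexSearcher();
		Query query = new MatchAllDocsQuery();
		System.out.println(query);
		printResult(indexSearcher, query);
		indexSearcher.getIndexReader().close();
	}

	// Numeric range query: 47 < fileSize <= 200 (min exclusive, max inclusive)
	@Test
	public void testNumericRangeQuery() throws Exception {
		IndexSearcher indexSearcher = getIndexSearcher();
		Query query = NumericRangeQuery.newLongRange("fileSize", 47L, 200L, false, true);
		System.out.println(query);
		printResult(indexSearcher, query);
		indexSearcher.getIndexReader().close();
	}

	// Combine several query conditions
	@Test
	public void testBooleanQuery() throws Exception {
		IndexSearcher indexSearcher = getIndexSearcher();
		BooleanQuery booleanQuery = new BooleanQuery();

		Query query1 = new TermQuery(new Term("fileName", "mvc"));
		Query query2 = new TermQuery(new Term("fileName", "spring"));

		// Occur.MUST: the clause must match, like AND
		// Occur.SHOULD: the clause may match but does not have to, like OR
		// Occur.MUST_NOT: the clause must not match, like NOT
		// Here "mvc" is required; "spring" is optional and only affects the ranking.
		booleanQuery.add(query1, Occur.MUST);
		booleanQuery.add(query2, Occur.SHOULD);
		System.out.println(booleanQuery);
		printResult(indexSearcher, booleanQuery);
		// Release resources
		indexSearcher.getIndexReader().close();
	}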
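The way MUST, SHOULD, and MUST_NOT interact is easy to get wrong, so here is a small sketch of the combination used above, written in plain Java over sets of document ids. It is not Lucene's implementation, and it assumes the query has at least one MUST clause, as in this article. The class OccurSketch, the method search, and the document ids are made up for the example, and "ranks higher" is modeled simply by listing documents that also match the SHOULD clause first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of MUST / SHOULD / MUST_NOT over sets of document ids (hypothetical class).
class OccurSketch {
    static List<Integer> search(Set<Integer> must, Set<Integer> should, Set<Integer> mustNot) {
        List<Integer> boosted = new ArrayList<>();
        List<Integer> plain = new ArrayList<>();
        for (int docId : new TreeSet<>(must)) {  // MUST: only these documents can match
            if (mustNot.contains(docId)) {
                continue;                        // MUST_NOT: always excluded
            }
            if (should.contains(docId)) {
                boosted.add(docId);              // SHOULD: optional, but ranked first
            } else {
                plain.add(docId);
            }
        }
        boosted.addAll(plain);
        return boosted;
    }

    public static void main(String[] args) {
        Set<Integer> mvc = new TreeSet<>(Arrays.asList(1, 2, 3)); // docs containing "mvc"
        Set<Integer> spring = new TreeSet<>(Arrays.asList(3, 4)); // docs containing "spring"
        Set<Integer> none = new TreeSet<>();
        // Doc 4 contains only "spring", so it does not match: "mvc" is required.
        System.out.println(search(mvc, spring, none)); // prints [3, 1, 2]
    }
}
```

So a MUST clause combined with a SHOULD clause is not a plain OR: the SHOULD clause cannot bring in documents that fail the MUST clause.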
