Lucene Search Engine Summary

Latest recommended article published on 2024-04-29 18:36:24

简乐君

Category: lucene  Tags: lucene, search engine, IndexWriter

Article link: https://blog.csdn.net/u012557538/article/details/48524199

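<p>A Lucene analyzer (Analyzer) is built mainly from two kinds of components: tokenizers and filters.</p><p>A tokenizer splits the incoming text, according to its rules, into the smallest units that can enter the index.</p><p>A filter then preprocesses those units, for example converting uppercase to lowercase or plurals to singular. Filters can also do fairly complex work, such as correcting misspelled words based on their meaning.</p><p>An analyzer chains a tokenizer and filters into a pipeline; once text has "flowed" through that pipeline, it comes out as the smallest units that can enter the index.</p>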
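The tokenizer-then-filter pipeline described above can be sketched in plain Java. This is only a toy illustration of the idea and does not use Lucene's API: the class `ToyAnalyzerPipeline` and its methods `tokenize`, `lowerCase`, `stripPlural`, and `analyze` are hypothetical names, and the plural rule is deliberately naive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of an analyzer: one tokenizer stage followed by two filter stages.
// Hypothetical illustration only; Lucene's real Analyzer/TokenStream API differs.
public class ToyAnalyzerPipeline {

    // Tokenizer stage: split raw text on anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) { // a leading delimiter yields one empty piece
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Filter stage 1: uppercase to lowercase.
    public static String lowerCase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Filter stage 2: naive plural-to-singular rule (drops one trailing "s").
    public static String stripPlural(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // The "analyzer": text flows through the tokenizer, then through each filter.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            terms.add(stripPlural(lowerCase(token)));
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [search, engine, index, document]
        System.out.println(analyze("Search Engines Index Documents!"));
    }
}
```

Each stage only sees the output of the previous one, which is why the order of the filters matters: here lowercasing runs before the plural rule, so the rule only has to handle lowercase input.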

package com.cn.zsj.lucene;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
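/*
 * Two very good books are recommended here; both help a great deal with getting started
 * on search engines and going further:
 * 《Ajax+Lucene构建搜索引擎》 by 李刚, 宋伟, 邱哲 (Posts & Telecom Press)
 * 《lucene搜索引擎开发权威经典》 by 于天恩 (China Railway Publishing House)
 */
/*
 * This example needs the following jars on the classpath:
 * lucene-analyzers-common-5.3.0.jar
 * lucene-core-5.3.0.jar
 */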
public class TestLucene {
	private static final String indexPath="f://indexPath";//the index directory is hard-coded here to keep the example easy to follow
	private static final String filePath="f://filePath.txt";//path of the text file to index

	public static void main(String[] args) {
		try {
			createIndex();
		} catch (IOException e) {
			// report any failure to read the file or write the index
			e.printStackTrace();
		}
	}

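/*
 * Step 1
 * Build the index writer object, IndexWriter.
 * The thing to understand here is where the index is stored: FSDirectory or RAMDirectory.
 * With FSDirectory, Lucene automatically buffers changes in memory and writes the index to disk
 * once enough has accumulated.
 * With RAMDirectory, anything FSDirectory can do on disk can also be done in memory, and faster.
 * Its only drawback is that the in-memory index is gone once the JVM exits. If the in-memory index
 * is still needed afterwards, it has to be moved to disk by merging it into an on-disk index.
 * The method for this is IndexWriter.addIndexes(Directory...).
 * Before merging, be sure to close the IndexWriter of the index being merged in;
 * otherwise the program will fail.
 */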
public static IndexWriter getIndexWriter()throws IOException{
	Path path=Paths.get(indexPath);//build a Path from the index directory's path string
	Directory dir=FSDirectory.open(path);//open an FSDirectory on that Path
	Analyzer analyzer=new StandardAnalyzer();//the analyzer splits text into indexable terms before IndexWriter writes it to the index
	IndexWriterConfig indexCfg=new IndexWriterConfig(analyzer);//configure the index writer with the standard analyzer; see the API docs for details
	IndexWriter writer=new IndexWriter(dir,indexCfg);//the index writer adds documents to the index in the given directory and controls the indexing parameters
	return writer;	
}
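/*
 * Step 2
 * Build the document object, Document.
 * Document: Lucene only indexes Document objects. Any "file" to be indexed must first be converted
 * into a Document before it can be indexed or found by a search.
 * Any data source can be organized into a Document.
 * Field: a concept closely tied to Document; each Field is a named piece of data from the source.
 * Field has many subclasses for handling different kinds of data.
 */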
public static Document getDocument() throws FileNotFoundException{
	
	Document doc=new Document();
	//suppose we want to add a text file to the index
	File  f=new File(filePath);//build a File object from the text file's path
	String filename=f.getName();
	FileInputStream fis=new FileInputStream(f);//open an input stream on the file
	Reader reader=new BufferedReader(new InputStreamReader(fis,StandardCharsets.UTF_8));
	doc.add(new StringField("filename",filename,Field.Store.YES));//Field.Store.YES: store the field value (the original, untokenized value)
	doc.add(new TextField("content",reader));//tokenized and indexed from the reader, but not stored
	return doc;
}
/*
 * Always close the IndexWriter. By default close() commits everything still buffered in memory
 * to disk and releases the index write lock. try-with-resources guarantees the writer is closed
 * even if building or adding the document fails.
 */
public static void  createIndex() throws IOException{
	try (IndexWriter writer=TestLucene.getIndexWriter()) {
		Document doc=TestLucene.getDocument();
		writer.addDocument(doc);
	}
}

}

Below is the directory where the index files are stored