Lucene02----整体架构

最新推荐文章于 2024-06-27 14:41:12 发布

语行tt

最新推荐文章于 2024-06-27 14:41:12 发布

阅读量117

点赞数

分类专栏： Lucene Java

Java 同时被 2 个专栏收录

19 篇文章 0 订阅

订阅专栏

Lucene

2 篇文章 0 订阅

订阅专栏

Lucene的总体架构

Lucene 是有索引和搜索的两个过程，包含索引创建，索引，搜索三个要点

看Lucene 的各组件

被索引的文档用Document对象表示。

IndexWriter通过函数addDocument将文档添加到索引中，实现创建索引的过程。

Lucene的索引是应用反向索引。

当用户有请求时，Query代表用户的查询语句。

IndexSearcher通过函数search搜索Lucene Index。

IndexSearcher计算term weight和score并且将结果返回给用户。

Lucene API 的调用实现索引

File path = new File("G:/index");

if (!path.exists()) {
		path.mkdir();
}
//索引文件保存的目录
Directory d = FSDirectory.open(path);
//词法分析及语言处理组件
Analyzer analyzer = new StandardAnalyzer(DEFAULT_VERSION);
//保存有关IndexWriter的所有配置信息
IndexWriterConfig conf = new IndexWriterConfig(DEFAULT_VERSION,
					analyzer);
//创建以及维护一个索引的IndexWriter
IndexWriter indexWriter = new IndexWriter(d, conf);


//要简历索引的目录
File indexFileDir = new File("G:/SearchDir");
File[] files = indexFile.listFiles();
for (File file : files) {
			if (file.isDirectory()) {
					fileIndexerProcessor(writer, file);
				} else {
					Document document = new Document();
		document.add(new Field("text", new FileReader(file)));
					writer.addDocument(document);
				}
		}
//在一个索引上执行优化操作
indexWriter.optimize();
//提交在一个索引上所做的所有改变以及关闭关联的文件。要注意这是一个昂贵的操作，要尝试重用单个IndexWriter实例而不是关闭和重新创建
indexWriter.close();

搜索过程

File indexDir = new File("G:/index");

//索引文件存放的目录
	Directory dir = FSDirectory.open(indexDir);
 	//IndexReader是一个抽象类，提供了访问一篇索引的接口
	IndexReader reader = IndexReader.open(dir); 
	//在单个IndexReader搜索的实现,通常，应用程序只需要调用继承的 search(Query,int) or search(Query,Filter,int) 方法,为了更好的性能，当你的索引没有修改的情况下，一般最好在多个搜索上共享一个IndexReader实例，而不是每一次搜索都新创建一个。如果你的索引被修改了并且你希望看到这些修改对搜索结果的影响，应该使用IndexReader.reopen() 来获取一个新的reader然后用它创建一个新的IndexSearcher。
	IndexSearcher searcher = new IndexSearcher(reader);
    //Analyzer创建了解析文本的TokenStreams,它代表了一种从文本中提取索引项的方针
	Analyzer analyzer = new StandardAnalyzer(DEFAULT_VERSION);
	//QueryParse对查询语句行进分析，有JAVACC生成，最重要的方法就是parse(String);
	QueryParser parser = new QueryParser(DEFAULT_VERSION, "text",
					analyzer);
    //QueryParser 调用parser 进行语法分析，形成查询语法树，放到Query 中
	Query query = parser.parse("Romanian,Bulgarian");
	//TopDocs代表搜索所获得的命中的结果数
	TopDocs hits = searcher.search(query, null, 10);
		System.out.println(hits.totalHits);

此图是上一节介绍的全文检索的流程对应的Lucene 实现的包结构

Lucene 的analysis 模块主要负责词法分析及语言处理而形成Term。

Lucene的index模块主要负责索引的创建，里面有IndexWriter。

Lucene的store模块主要负责索引的读写。

Lucene 的QueryParser主要负责语法分析。

Lucene的search模块主要负责对索引的搜索。

语行tt

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Lucene02----整体架构

Lucene的总体架构 Lucene 是有索引和搜索的两个过程，包含索引创建，索引，搜索三个要点看Lucene 的各组件被索引的文档用Document对象表示。 IndexWriter通过函数addDocument将文档添加到索引中，实现创建索引的过程。 Lucene的索引是应用反向索引。当用户有请求时...
复制链接

扫一扫

专栏目录