Lab 4: Building an Index

静心兴*_*（bug收割員）

Category: Mobile Search Algorithms

Article link: https://blog.csdn.net/weixin_44123412/article/details/107372798

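Lab 4: Building an Index (2 class hours)
1. Objectives and Requirements

Understand how data is modeled with the Document-Field structure in Lucene;
Be able to write code that generates index files for specific data.
2. Lab Environment
A computer with Eclipse and the JDK installed
3. Lab Content
Background knowledge:
1. The Document-Field structure is similar to a relational database: a table corresponds to an index, a record to a Document, and a column to a Field.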
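2. Index files: a typical segment usually contains files with the following extensions, which together make up one segment of a Lucene index:
.f: scoring information (norms).
.frq: the frequency information for each term.
.prx: the position information for each term.
.fnm: the names of all fields that appear in the Documents.
.fdt: stores the data of Fields that have the Store.YES attribute. It is used together with .fdx.
.fdx: an index into .fdt that records where each Document is located in .fdt.
.tis: stores the terms (Term) produced by analysis.
.tii: the index for .tis; it records the positions of the terms within the .tis file.
.cfx: a compound file. Whether it appears depends on whether the index is saved in compound format; in that format the individual files of a segment are packed together (the per-segment compound file has the extension .cfs).
segments: the segment is the largest unit Lucene uses to manage an index. All index files within one segment share the same prefix, and an index may contain multiple segments. An index has only one "segments" file, which has no extension (it is named segments_N since Lucene 2.1). It records how many segments the index currently contains and how many Documents each segment holds.
deletable: when documents are deleted from a Lucene index, they are not removed from the index immediately. They are only marked as deleted and are physically removed the next time segments are merged or the index is optimized, much like the Windows Recycle Bin. In Lucene 3.0 the marks are stored in a per-segment .del file; the deletable file of early versions instead listed index files that were no longer in use but could not yet be removed.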
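The deferred deletion described above can be pictured with a small plain-Java model. This is only a sketch: `ToySegment` and its methods are hypothetical names made up for illustration, not Lucene classes, and a bit set stands in for the deletion marks.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model of deferred deletion; all names are hypothetical, not Lucene API.
class ToySegment {
    private List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();   // plays the role of the deletion marks

    void add(String doc) { docs.add(doc); }

    // Mark matching documents as deleted; nothing is physically removed yet.
    void delete(String doc) {
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).equals(doc)) deleted.set(i);
        }
    }

    // Number of slots physically stored, including marked ones.
    int maxDoc() { return docs.size(); }

    // Number of documents that are still live.
    int numDocs() { return docs.size() - deleted.cardinality(); }

    // Rewrite the segment without the marked documents, as a merge or optimize would.
    void optimize() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) live.add(docs.get(i));
        }
        docs = live;
        deleted.clear();
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment();
        seg.add("Book A"); seg.add("Book B"); seg.add("Book C");
        seg.delete("Book A");
        System.out.println(seg.maxDoc() + " stored, " + seg.numDocs() + " live");  // 3 stored, 2 live
        seg.optimize();
        System.out.println(seg.maxDoc() + " stored, " + seg.numDocs() + " live");  // 2 stored, 2 live
    }
}
```

Until `optimize()` runs, the deleted entry still occupies space, which is why the stored count and the live count differ.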
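3. Merging and optimizing indexes
public IndexWriter(Directory d, Analyzer a, boolean create)
A Lucene index can be stored in two kinds of Directory: FSDirectory and RAMDirectory. FSDirectory is backed by a directory in the file system, so the index is written straight to disk. RAMDirectory is held in memory. Documents are added to it in the same way, but its index must be written to disk, otherwise the contents are lost when the JVM exits. In other words, the index in a RAMDirectory has to be written into an FSDirectory.
IndexWriter is used to merge indexes in two situations:
(1) writing the index in a RAMDirectory into an FSDirectory;
(2) consolidating a large number of index files that live in different directories, or even on different physical machines (for example, merging the index files from every host on a large network and storing them together in one directory).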
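To make the merge step more concrete, here is a minimal plain-Java sketch of what merging one index into another involves: the postings of the second index are appended and its document numbers are shifted so they do not collide. `ToyIndex`, `addDocument` and `addIndex` are hypothetical names for this illustration, not Lucene API, and the model ignores segments and files entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of merging two indexes; all names are hypothetical, not Lucene API.
class ToyIndex {
    final Map<String, List<Integer>> postings = new TreeMap<>(); // term -> document numbers
    int docCount = 0;

    // Index one document made of whole-value terms and return its document number.
    int addDocument(String... terms) {
        int docId = docCount++;
        for (String t : terms) {
            postings.computeIfAbsent(t, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    // Append every posting of `other`, shifting its document numbers past our own.
    void addIndex(ToyIndex other) {
        int base = docCount;
        for (Map.Entry<String, List<Integer>> e : other.postings.entrySet()) {
            List<Integer> target = postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
            for (int docId : e.getValue()) target.add(base + docId);
        }
        docCount += other.docCount;
    }

    public static void main(String[] args) {
        ToyIndex onDisk = new ToyIndex();     // stands in for an FSDirectory-backed index
        onDisk.addDocument("lucene");
        ToyIndex inMemory = new ToyIndex();   // stands in for a RAMDirectory-backed index
        inMemory.addDocument("lucene");
        inMemory.addDocument("index");
        onDisk.addIndex(inMemory);
        System.out.println(onDisk.postings);  // {index=[2], lucene=[0, 1]}
    }
}
```

In Lucene 3.0 itself, the corresponding operation is, as far as I know, `IndexWriter.addIndexesNoOptimize(Directory...)` called on a writer opened over the target FSDirectory; check the Javadoc of your version before relying on it.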
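Task 1: In a specified directory, generate an index that represents 3 books, creating 3 Documents that each store one book title. Take screenshots of the generated index files (generate the index once in compound format and once in the regular, non-compound format).

Task 2: Modify the code from Task 1 to use a multi-valued field, so that a single Document stores all 3 book titles.
Task 3: For the three Documents from Task 1, delete one Document from the index by book title, and modify the field value of one Document.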
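Before turning to the Lucene code, the three tasks can be sketched with a plain-Java model of the Document-Field structure. `ToyDocument` and `ToyWriter` are hypothetical names used only for this illustration. The sketch assumes two things about Lucene's behavior: a field name may be added to a Document more than once (a multi-valued field), and an update is a delete-by-term followed by an add.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the three tasks; all names are hypothetical, not Lucene API.
class ToyDocument {
    // field name -> all values added under that name (a field may repeat)
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    ToyDocument add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return this;
    }

    boolean matches(String name, String value) {
        List<String> values = fields.get(name);
        return values != null && values.contains(value);
    }
}

class ToyWriter {
    final List<ToyDocument> docs = new ArrayList<>();

    void addDocument(ToyDocument doc) { docs.add(doc); }

    // Remove every document that has `value` in field `name`; returns how many were removed.
    int deleteDocuments(String name, String value) {
        int before = docs.size();
        docs.removeIf(d -> d.matches(name, value));
        return before - docs.size();
    }

    // Update is modeled as delete-by-term followed by add. If the term matches
    // nothing (for example a misspelled field name), the new document is still added.
    void updateDocument(String name, String value, ToyDocument replacement) {
        deleteDocuments(name, value);
        addDocument(replacement);
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        for (String title : new String[] {"Book A", "Book B", "Book C"}) {
            writer.addDocument(new ToyDocument().add("bookname", title));   // Task 1
        }
        writer.deleteDocuments("bookname", "Book A");                        // Task 3: delete
        writer.updateDocument("bookname", "Book B",
                new ToyDocument().add("bookname", "Book D"));                // Task 3: update
        System.out.println(writer.docs.size());                              // 2
        ToyDocument multi = new ToyDocument();                               // Task 2
        multi.add("bookname", "Book A").add("bookname", "Book B").add("bookname", "Book C");
        System.out.println(multi.fields.get("bookname"));                    // [Book A, Book B, Book C]
    }
}
```

One consequence of the delete-then-add model is worth keeping in mind: a term that matches no document turns the update into a plain add, so a typo in the field name silently grows the index instead of replacing anything.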

4. Implementation

package com.GreatIndex;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class GreatIndex {
    static String indexDir = "E:/learnspa/yidongsousuosuanfa/workspace/lab-index";
    static FSDirectory directory;
    static String[] booknames = {"伐清", "奥术神座", "冰与火之歌"};

    public static void main(String[] args) {
        setUp();            // Task 1: three documents, one book title each
        deleteDocument();   // Task 3: delete a document by book title
        updateDocument();   // Task 3: replace the field value of a document
        setUp2();           // Task 2: one document with a multi-valued field (rebuilds the index)
    }

    // Opens a writer on the index directory. create=true starts a new, empty index;
    // create=false appends to the existing one.
    private static IndexWriter openWriter(boolean create) throws IOException {
        if (directory == null) {
            directory = FSDirectory.open(new File(indexDir));
        }
        return new IndexWriter(directory, new StandardAnalyzer(Version.LUCENE_30), create,
                IndexWriter.MaxFieldLength.UNLIMITED);
    }

    // Builds the bookname field: stored, and indexed as a single term without analysis.
    private static Field bookField(String bookname) {
        return new Field("bookname", bookname, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);
    }

    private static void setUp() {
        try {
            IndexWriter writer = openWriter(true);
            // true produces the compound index; set to false for the regular index files
            writer.setUseCompoundFile(true);
            for (String bookname : booknames) {
                Document doc = new Document();
                doc.add(bookField(bookname));
                writer.addDocument(doc);
                System.out.println("setUpDocument" + doc);
            }
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void deleteDocument() {
        try {
            // create=false: opening with true here would wipe the index built by setUp()
            IndexWriter writer = openWriter(false);
            writer.deleteDocuments(new Term("bookname", "伐清"));
            // optimize() merges the segments, which physically removes the deleted document
            writer.optimize();
            writer.close();
            System.out.println("DeleteDocument bookname:伐清");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void updateDocument() {
        try {
            IndexWriter writer = openWriter(false);
            // Create a new document to replace the old one
            Document document = new Document();
            document.add(bookField("lucene实战第二版"));
            // The term must use the exact field name and a title that is in the index;
            // otherwise nothing is deleted and the new document is simply added.
            writer.updateDocument(new Term("bookname", "奥术神座"), document);
            writer.close();
            System.out.println("UpdateDocument" + document);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void setUp2() {
        try {
            IndexWriter writer = openWriter(true);
            Document doc = new Document();
            // Multi-valued field: add the same field name once per book title
            for (String bookname : booknames) {
                doc.add(bookField(bookname));
            }
            // Add the document once, after all values are set, then close to commit
            writer.addDocument(doc);
            writer.close();
            System.out.println("setUp2" + doc);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Results
[Screenshot of the results]
