commit, rollback, and close in IndexWriter

ningbohezhijun

Article link: https://blog.csdn.net/ningbohezhijunbl/article/details/20469479

Original article: http://www.cnblogs.com/huangfox/archive/2010/10/18/1854142.html

commit:

Commits all pending changes (added & deleted documents, optimizations, segment merges, added indexes, etc.) to the index, and syncs all referenced index files, such that a reader will see the changes and the index updates will survive an OS or machine crash or power loss. Note that this does not wait for any running background merges to finish. This may be a costly operation, so you should test the cost in your application and do it only when really necessary.

rollback:

Close the IndexWriter without committing any changes that have occurred since the last commit (or since it was opened, if commit hasn't been called). This removes any temporary files that had been created, after which the state of the index will be the same as it was when commit() was last called or when this writer was first opened.

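Rollback works like a system restore: it returns the index to its state at the most recent commit. If commit was never called after the IndexWriter was opened, the index returns to the state it had when the writer was opened.

One point to note:

rollback closes the current IndexWriter instance.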

close:

Commits all changes to an index and closes all associated files. Note that this may be a costly operation, so, try to re-use a single writer instead of closing and opening a new one.
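The three contracts quoted above can be condensed into a small in-memory model. This is only a sketch: `ToyWriter` and its fields are hypothetical names invented for illustration, and nothing here is Lucene code. It assumes "committed" state is a plain list and ignores files, merges, and locking entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for the commit/rollback/close contract quoted above.
// ToyWriter is hypothetical and is NOT a Lucene class.
class ToyWriter {
    private final List<String> committed;                    // state that survives a crash
    private final List<String> pending = new ArrayList<>();  // changes since the last commit
    private boolean open = true;

    ToyWriter(List<String> committedStore) {
        this.committed = committedStore;
    }

    void add(String doc) {
        ensureOpen();
        pending.add(doc);
    }

    // commit: make pending changes durable; the writer stays open.
    void commit() {
        ensureOpen();
        committed.addAll(pending);
        pending.clear();
    }

    // rollback: discard pending changes AND close the writer.
    void rollback() {
        pending.clear();
        open = false;
    }

    // close: commit pending changes, then close the writer.
    void close() {
        if (open) {
            commit();
            open = false;
        }
    }

    boolean isOpen() {
        return open;
    }

    private void ensureOpen() {
        if (!open) {
            throw new IllegalStateException("writer is closed");
        }
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        ToyWriter w = new ToyWriter(store);
        w.add("fox1");
        w.commit();     // fox1 becomes durable
        w.add("fox2");
        w.rollback();   // fox2 is discarded and the writer is closed
        System.out.println(store + " open=" + w.isOpen()); // prints [fox1] open=false
    }
}
```

The model makes the asymmetry easy to see: commit leaves the writer usable, while both rollback and close end its life, one discarding and the other keeping the pending changes.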

Example code follows.

First, the class that manages the index directory:

package ceshi0304;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.FSDirectory;

public class Dir {
	public static String path = "D:\\20140303index";
	public static FSDirectory dir = null;
	
	public static String getPath() {
		return path;
	}
	public static FSDirectory getDir() {
		if (dir == null) {
			try {
				dir = FSDirectory.open(new File(path));
			}catch (IOException e) {
				e.printStackTrace();
			}
		}
		return dir;
	}
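	public static void closeDir() {
		if (dir != null) {
			try {
				dir.close(); // close() declares IOException, so it must be handled
			}catch (IOException e) {
				e.printStackTrace();
			}
		}
	}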
}

Next, the class that adds documents:

package ceshi0304;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

public class Writer {
	IndexWriter writer = null;
	FSDirectory dir = Dir.getDir();
	
	public Writer() {
		try {
			Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
			IndexWriterConfig iwConfig = new IndexWriterConfig(
					Version.LUCENE_46, analyzer);
			iwConfig.setMaxBufferedDocs(4);
			writer = new IndexWriter(dir, iwConfig);

			System.out.println(iwConfig.getMaxBufferedDocs() + ":max buffered docs");
			System.out.println(iwConfig.getRAMBufferSizeMB() + ":ram buffer size mb");
			System.out.println(iwConfig.getMergePolicy().toString() + ":merge policy");
		}catch (Exception e) {
			e.printStackTrace();
		}
	}
	
	
	// Add a document without committing
	public int add(String content) {
		int res = 0;
		try {
			System.out.println("num doc ： " + writer.numDocs());
			// Validate before building the field, so blank input triggers the rollback path
			if (content == null || content.trim().equals("")) {
				throw new Exception("simulated failure, triggering rollback");
			}
			Document doc = new Document();
			doc.add(new TextField("f", content, Store.YES));
			writer.addDocument(doc);
			res = 1;
		}catch (CorruptIndexException e) {
			e.printStackTrace();
		}catch (LockObtainFailedException e) {
			e.printStackTrace();
		}catch (IOException e) {
			e.printStackTrace();
		}catch (Exception e) {
			try {
				writer.rollback(); // rollback also closes this IndexWriter
				System.out.println("回滚..."); // "rolling back..."
			}catch (IOException e1) {
				e1.printStackTrace(); // report the rollback failure itself
			}
		}finally {
			if (res == 0) {
				res = -1;
			}
		}
		return res;
	}
	
	// Add a document and commit immediately
	public int addc(String content) {
		int res = 0;
		try {
			System.out.println("num doc ： " + writer.numDocs());
			// Validate before building the field, so blank input triggers the rollback path
			if (content == null || content.trim().equals("")) {
				throw new Exception("simulated failure, triggering rollback");
			}
			Document doc = new Document();
			doc.add(new TextField("f", content, Store.YES));
			writer.addDocument(doc);
			writer.commit();
			res = 1;
		}catch (CorruptIndexException e) {
			e.printStackTrace();
		}catch (LockObtainFailedException e) {
			e.printStackTrace();
		}catch (IOException e) {
			e.printStackTrace();
		}catch (Exception e) {
			try {
				writer.rollback(); // rollback also closes this IndexWriter
				System.out.println("回滚..."); // "rolling back..."
			}catch (IOException e1) {
				e1.printStackTrace(); // report the rollback failure itself
			}
		}finally {
			if (res == 0) {
				res = -1;
			}
		}
		return res;
	}
	
	// Close the IndexWriter instance
	public void close() {
		if (writer != null) {
			try {
				writer.close();
			}catch (Exception e) {
				e.printStackTrace();
			}
		}
	}
}

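This class offers two methods for adding a document.

They differ as follows:

add: adds the document without committing;

addc: adds the document and commits immediately.

Both share a single IndexWriter instance.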

Next, a class for testing searches:

package ceshi0304;

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Search {
	IndexSearcher searcher = null;
	IndexReader reader = null;
	
	public Search() {
		FSDirectory dir = Dir.getDir();
		try {
			reader = DirectoryReader.open(dir);
			searcher = new IndexSearcher(reader);
		}catch (Exception e) {
			e.printStackTrace();
		}
	}
	
	// Reopen the IndexSearcher: close the old reader, then open a new one
	public void reopen() {
		closeSearcher();
		FSDirectory dir = Dir.getDir();
		try {
			reader = DirectoryReader.open(dir);
			searcher = new IndexSearcher(reader);
			System.out.println(reader.numDocs());
		}catch (Exception e) {
			e.printStackTrace();
		}
	}
	
	// Simulate a search
	public void search(String queryString) {
		try {
			Query query = new QueryParser(Version.LUCENE_46, "f", new StandardAnalyzer(Version.LUCENE_46)).parse(queryString);
			TopScoreDocCollector results = TopScoreDocCollector.create(10, true);
			searcher.search(query, results);
			
			System.out.println("total hits : " + results.getTotalHits());
			TopDocs top = results.topDocs(0, results.getTotalHits());
			ScoreDoc[] docs = top.scoreDocs;
			for (ScoreDoc doc : docs) {
				System.out.println(searcher.doc(doc.doc));
			}
		}catch (Exception e) {
			e.printStackTrace();
		}
	}
	
	// Close the IndexReader instance
	public void closeSearcher() {
		if (this.reader != null) {
			try {
				this.reader.close();
			}catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
}

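Note the reopen method, which builds a new IndexSearcher instance.

Also, in Lucene 4.6 IndexSearcher no longer has a close method. Because IndexSearcher is only a thin wrapper around IndexReader, resources are released by closing the IndexReader instead.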

Finally, a driver program for running the tests:

package ceshi0304;

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class MainApp {
	public static void main(String[] args) {
		BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
		Writer w = new Writer();
		Search s = new Search();
		String order = "";
		while (true) {
			try {
				System.out.println("输入指令：");
				order = reader.readLine();
				if (order == null) order = "e"; // treat end of input as the exit command
				if (order.equals("add")) {
					System.out.println("字段内容：");
					String content = reader.readLine();
					w.add(content);
				}else if (order.equals("addc")) {
					System.out.println("字段内容：");
					String content = reader.readLine();
					w.addc(content);
				}else if (order.equals("sea")) {
					System.out.println("检索式：");
					String queryString = reader.readLine();
					s.search(queryString);
				}else if (order.equals("reopen")) {
					s.reopen();
					System.out.println("searcher was reopened...");
				}else if (order.equals("e")) {
					w.close();
					s.closeSearcher();
					break;
				}else {
					continue;
				}
			}catch (Exception e) {
				e.printStackTrace();
			}
		}
	}
}

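Now for the tests.

(A supplementary note: I built and searched the index in the directory used in my previous article, because with a brand-new empty directory the IndexSearcher could not be created and an exception was thrown. I do not know why the author of the original article was able to create one. A likely explanation is the Lucene version: since Lucene 3.1, IndexWriter no longer writes an empty commit when it creates a new index, so DirectoryReader.open finds no index in the directory until the writer has committed or closed once. Fortunately, the documents already in that index do not match the queries used here, so they do not affect the results.)

The prompts in the transcripts below are printed by the program in Chinese: 输入指令 is the command prompt, 字段内容 asks for the field content, 检索式 asks for the query string, and 回滚... means "rolling back".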

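Test 1:

Purpose:

For a document added through an IndexWriter that is still open to become visible to an IndexSearcher, both of the following are necessary:

the IndexWriter commits after adding the document;

the IndexSearcher is reopened.

Procedure:

add a document,

search (expected: not found);

addc a document,

search (expected: not found);

reopen,

search (expected: found).

Results: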

4:max buffered docs
16.0:ram buffer size mb
[TieredMergePolicy: maxMergeAtOnce=10, maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0, forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0, maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.1:merge policy
输入指令：
add
字段内容：
fox1
num doc ： 500000
输入指令：
sea
检索式：
fox*
total hits : 0
输入指令：
addc
字段内容：
fox2
num doc ： 500001
输入指令：
sea
检索式：
fox*
total hits : 0
输入指令：
reopen
500002
searcher was reopened...
输入指令：
sea
检索式：
fox*
total hits : 2
Document<stored,indexed,tokenized<f:fox1>>
Document<stored,indexed,tokenized<f:fox2>>
输入指令：

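As the output shows, writer.numDocs() grows after every insert, whether or not the insert was committed (note that "num doc" is printed before writer.addDocument(doc) runs, so each value counts the documents added before the current one). The new documents still cannot be found, because searching depends on the reader, not on the writer.

Conclusion:

For an IndexSearcher to see updates to the index, the IndexWriter must commit after the update, and the IndexSearcher must be reopened. (This applies to a reader opened on the Directory, as in this example; a near-real-time reader obtained from the IndexWriter itself is a separate mechanism.)

"Commit" here includes IndexWriter.close(), because close also commits. (If the program simply exits, the memory held by the IndexWriter is released but its close method is never called, so uncommitted changes are lost; relying on process exit is also poor practice.)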
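The visibility rule from Test 1 can be reproduced with a toy model in which a reader is a frozen copy of the committed state. This is a sketch under stated assumptions: `ToyIndex`, `writerNumDocs`, and `openReader` are hypothetical names made up here, not Lucene classes or methods, and the model only mimics the behaviour observed in the transcript above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of point-in-time readers. ToyIndex is hypothetical, not a Lucene class.
class ToyIndex {
    private final List<String> committed = new ArrayList<>(); // durable, reader-visible state
    private final List<String> buffered = new ArrayList<>();  // writer-side, not yet committed

    void add(String doc) {
        buffered.add(doc);
    }

    void commit() {
        committed.addAll(buffered);
        buffered.clear();
    }

    // What the writer reports: committed plus buffered documents.
    int writerNumDocs() {
        return committed.size() + buffered.size();
    }

    // Opening a reader freezes a copy of the committed state at that moment.
    List<String> openReader() {
        return new ArrayList<>(committed);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        List<String> reader = index.openReader();
        index.add("fox1");                   // like "add": no commit
        System.out.println(reader.size());   // prints 0
        index.add("fox2");
        index.commit();                      // like "addc": commits everything pending
        System.out.println(reader.size());   // prints 0, the old snapshot is unchanged
        reader = index.openReader();         // like "reopen"
        System.out.println(reader.size());   // prints 2
    }
}
```

As in the transcript, the writer-side count rises immediately, while a reader only changes when a new one is opened after a commit.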

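Test 2:

Purpose:

rollback discards every update made since the most recent commit.

Procedure:

Test 1 already added two documents (fox1, fox2).

add a document (fox3)

add a document (fox4)

add a blank string, which throws an exception and triggers rollback (the IndexWriter instance is closed at this point)

restart the driver program (MainApp)

addc a document (fox5)

reopen

sea (expected hits: fox1, fox2, fox5)

Results:

The output of the exception-triggered rollback is listed without further analysis.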

4:max buffered docs
16.0:ram buffer size mb
[TieredMergePolicy: maxMergeAtOnce=10, maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0, forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0, maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.1:merge policy
输入指令：
add
字段内容：
fox3
num doc ： 500002
输入指令：
add
字段内容：
fox4
num doc ： 500003
输入指令：
add
字段内容：
 
num doc ： 500004
回滚...
输入指令：

After restarting the driver program (MainApp):

4:max buffered docs
16.0:ram buffer size mb
[TieredMergePolicy: maxMergeAtOnce=10, maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0, forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0, maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.1:merge policy
输入指令：
addc
字段内容：
fox5
num doc ： 500002
输入指令：
sea
检索式：
fox*
total hits : 2
Document<stored,indexed,tokenized<f:fox1>>
Document<stored,indexed,tokenized<f:fox2>>
输入指令：
reopen
500003
searcher was reopened...
输入指令：
sea
检索式：
fox*
total hits : 3
Document<stored,indexed,tokenized<f:fox1>>
Document<stored,indexed,tokenized<f:fox2>>
Document<stored,indexed,tokenized<f:fox5>>
输入指令：

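Conclusion:

rollback discards every update made since the most recent commit.

One more point: calling rollback closes the IndexWriter, which is why MainApp had to be restarted. The API documentation says as much: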

rollback()

Close the IndexWriter without committing any changes that have occurred since the last commit (or since it was opened, if commit hasn't been called).
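Because rollback closes the writer, a long-running program needs some way to obtain a fresh instance instead of restarting. One possible arrangement, sketched below, is a small holder that rebuilds the writer on demand. `WriterHolder`, `discard`, and `createdCount` are hypothetical names, not part of Lucene, and `StringBuilder` merely stands in for the writer so that the sketch runs without Lucene on the classpath; with Lucene, the factory would construct an IndexWriter instead.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Lucene: hands out one shared writer and
// builds a fresh one after the previous instance was rolled back (and so closed).
class WriterHolder<W> {
    private final Supplier<W> factory;
    private W current;
    private int created;

    WriterHolder(Supplier<W> factory) {
        this.factory = factory;
    }

    // Return the shared instance, creating it on first use or after discard().
    synchronized W get() {
        if (current == null) {
            current = factory.get();
            created++;
        }
        return current;
    }

    // Call right after rollback(): the old instance is closed and unusable.
    synchronized void discard() {
        current = null;
    }

    synchronized int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        // StringBuilder is only a placeholder for the real writer type.
        WriterHolder<StringBuilder> holder = new WriterHolder<>(StringBuilder::new);
        StringBuilder first = holder.get();
        holder.discard();                          // simulate "rolled back, now closed"
        StringBuilder second = holder.get();
        System.out.println(first != second);       // prints true
        System.out.println(holder.createdCount()); // prints 2
    }
}
```

In the Writer class above, the catch block that calls rollback would be the natural place to call discard, so that the next add or addc obtains a working writer.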
