Lucene: Introduction to Lucene (Part II)

1. Lucene Delete Function

	/**
	 * Delete documents from the index
	 */
	public void delete()
	{
		IndexWriter writer = null;
		try
		{
			Directory dir = FSDirectory.open(new File("E:/LuceneIndex"));
			writer = new IndexWriter(dir, new IndexWriterConfig(
					Version.LUCENE_35, new SimpleAnalyzer(Version.LUCENE_35)));
			// The parameter is a selector; it can be a Query or a Term.
			// A Query describes a set of conditions (e.g. id like %1%).
			// A Term is one exact condition (e.g. name = "FileItemIterator.java").
			writer.deleteDocuments(new Term("name", "FileItemIterator.java"));
		} catch (CorruptIndexException e)
		{
			e.printStackTrace();
		} catch (LockObtainFailedException e)
		{
			e.printStackTrace();
		} catch (IOException e)
		{
			e.printStackTrace();
		} finally
		{
			try
			{
				if (writer != null)
				{
					writer.close();
				}
			} catch (CorruptIndexException e)
			{
				e.printStackTrace();
			} catch (IOException e)
			{
				e.printStackTrace();
			}
		}
	}
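
    The selector can also be a full Query rather than a single Term. Below is a minimal sketch (an addition to the original notes) that deletes every document whose name ends in ".txt"; it assumes the same dir field and the Lucene 3.5 classes used in the listing above.

	/**
	 * Delete by Query (sketch)
	 */
	public void deleteByQuery() throws IOException
	{
		IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
				Version.LUCENE_35, new SimpleAnalyzer(Version.LUCENE_35)));
		try
		{
			// deleteDocuments(Query) marks every matching document as deleted,
			// just like deleteDocuments(Term), but with an arbitrary condition.
			writer.deleteDocuments(new WildcardQuery(new Term("name", "*.txt")));
		} finally
		{
			writer.close();
		}
	}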

 

2. Deleted documents go into a "recycle bin"

    1) Much like the Windows Recycle Bin, Lucene does not remove deleted documents immediately; it only marks them as deleted.

    2) When we execute a query, the deleted documents no longer appear in the results (the sketch below illustrates this).

    3) The deletion marks are kept in a file named like _*_*.del, so the documents can still be rolled back until that file is merged away.
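
    A minimal sketch of point 2 (an addition to the original notes): it assumes the same dir field as the other listings and that delete() above has already run.

	/**
	 * Search for a deleted document (sketch)
	 */
	public void searchDeleted() throws IOException
	{
		IndexReader reader = IndexReader.open(dir);
		IndexSearcher searcher = new IndexSearcher(reader);
		try
		{
			TopDocs hits = searcher.search(
					new TermQuery(new Term("name", "FileItemIterator.java")), 10);
			// The document still sits in the .del "recycle bin", but it is
			// invisible to queries, so totalHits should be 0.
			System.out.println("hits = " + hits.totalHits);
		} finally
		{
			searcher.close();
			reader.close();
		}
	}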

 

3. We can use IndexReader to get document and deletion counts

	/**
	 * Report document counts
	 * @throws IOException
	 * @throws CorruptIndexException
	 */
	public void search() throws CorruptIndexException, IOException
	{
		IndexReader reader = IndexReader.open(dir);

		// The reader reports the number of live documents, the total number
		// of document slots, and the number of documents marked as deleted.
		System.out.println("numDocs = " + reader.numDocs());
		System.out.println("maxDoc = " + reader.maxDoc());
		System.out.println("deletedDocs = " + reader.numDeletedDocs());

		reader.close();
	}
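
    Besides these counters, IndexReader.hasDeletions() answers directly whether the "recycle bin" currently holds anything. A short sketch (an addition, same assumptions as above):

	/**
	 * Check whether any documents are marked as deleted (sketch)
	 */
	public boolean hasDeletedDocs() throws IOException
	{
		IndexReader reader = IndexReader.open(dir);
		try
		{
			// true as long as at least one document is marked as deleted
			// and has not yet been merged away
			return reader.hasDeletions();
		} finally
		{
			reader.close();
		}
	}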

 

4. We can use IndexReader to undelete deleted documents

 

	/**
	 * Undelete
	 */
	public void undelete()
	{
		IndexReader reader = null;
		try
		{
			// param1: the index directory
			// param2: readOnly - must be false, otherwise undeleteAll() fails
			reader = IndexReader.open(dir, false);
			reader.undeleteAll();
		} catch (CorruptIndexException e)
		{
			e.printStackTrace();
		} catch (IOException e)
		{
			e.printStackTrace();
		} finally
		{
			try
			{
				if (reader != null)
				{
					reader.close();
				}
			} catch (IOException e)
			{
				e.printStackTrace();
			}
		}
	}

    Comments:

        1) To recover deleted documents, the reader must be opened with readOnly set to false, because readOnly defaults to true.

        2) After undeleteAll(), the .del file is gone and the deleted documents are visible in the index again.

 

5. How do we empty the recycle bin? (i.e. remove the deletions recorded in the .del files for good)

        1) Before Lucene 3.5 this was done with writer.optimize(). It is now deprecated, because every optimize rewrites all the index files, which is very expensive.

        2) Since Lucene 3.5, writer.forceMerge() replaces writer.optimize(). They do the same work and are equally expensive.

        3) If we only want to purge the deletions, writer.forceMergeDeletes() removes the deleted documents at a much lower cost (see the sketch below).
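
    A minimal sketch of point 3 (an addition, assuming the same dir field as in the other listings):

	/**
	 * Empty the recycle bin (sketch)
	 */
	public void emptyRecycleBin() throws IOException
	{
		IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
				Version.LUCENE_35, new SimpleAnalyzer(Version.LUCENE_35)));
		try
		{
			// Physically expunges documents that are only marked as deleted;
			// after this, undeleteAll() can no longer bring them back.
			writer.forceMergeDeletes();
		} finally
		{
			writer.close();
		}
	}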

 

6. About index file redundancy:

        1) Every time we execute buildIndex(), another group of segment files is written next to the existing ones.

        2) As the number of runs grows, the index directory becomes larger and larger, so the segments should eventually be merged.

        3) Forcing this merge by hand is discouraged, because Lucene maintains and merges the segment files automatically.

        4) But we can still merge segments manually when needed:

	/**
	 * Merge
	 */
	public void merge()
	{
		IndexWriter writer = null;
		try
		{
			writer = new IndexWriter(dir, new IndexWriterConfig(
					Version.LUCENE_35, new SimpleAnalyzer(Version.LUCENE_35)));
			// Merge the index down to at most two segments; deleted documents
			// are dropped in the process. forceMerge() replaces optimize(),
			// which is deprecated since Lucene 3.5, and is just as expensive,
			// so use it sparingly.
			writer.forceMerge(2);
		} catch (CorruptIndexException e)
		{
			e.printStackTrace();
		} catch (LockObtainFailedException e)
		{
			e.printStackTrace();
		} catch (IOException e)
		{
			e.printStackTrace();
		} finally
		{
			try
			{
				if (writer != null)
				{
					writer.close();
				}
			} catch (IOException e)
			{
				e.printStackTrace();
			}
		}
	}

 

 

7. How do we delete all existing index files before each rebuild?

	/**
	 * Create Index
	 * 
	 * @throws IOException
	 * @throws LockObtainFailedException
	 * @throws CorruptIndexException
	 */
	public void buildIndex() throws CorruptIndexException,
			LockObtainFailedException, IOException
	{
		// 2. Create IndexWriter
		// --> It is used to write data into index files
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
				new SimpleAnalyzer(Version.LUCENE_35));
		IndexWriter writer = new IndexWriter(dir, config);
		// deleteAll() removes every document currently in the index.
		writer.deleteAll();
		// Before 3.5 the (now deprecated) way to create the writer was:
		// new IndexWriter(Directory d, Analyzer a, boolean c, MaxFieldLength mfl);
		// d: Directory, a: Analyzer, c: whether to create a new index or append
		// mfl: the maximum number of terms indexed per field.

		// 3. Create Document
		// --> The target we want to search may be a doc file or a table in DB.
		// --> The path, name, size and modified date of the file.
		// --> All the information of the file should be stored in the Document.
		Document doc = null;

		// 4. Each Item of The Document is Called a Field.
		// --> The relationship of document and field is like table and cell.

		// E.g. we want to build an index for all the files in E:/LuceneData.
		// Each file in this directory becomes a Document,
		// and its name, size, modified date and content become Fields.
		File files = new File("E:/LuceneData");
		for (File file : files.listFiles())
		{
			doc = new Document();
			// With a FileReader the content would be indexed but not stored:
			// doc.add(new Field("content", new FileReader(file)));
			// To also store the content in the index, we first read it into a
			// String (FileUtils is from Apache Commons IO).
			String content = FileUtils.readFileToString(file);
			doc.add(new Field("content", content, Field.Store.YES,
					Field.Index.ANALYZED));

			doc.add(new Field("name", file.getName(), Field.Store.YES,
					Field.Index.NOT_ANALYZED));
			// Field.Store.YES --> the field value is stored in the index
			// Field.Index.ANALYZED --> the field is tokenized by the analyzer
			doc.add(new Field("path", file.getAbsolutePath(), Field.Store.YES,
					Field.Index.NOT_ANALYZED));

			// 5. Create Index File for Target Document by IndexWriter.
			writer.addDocument(doc);
		}

		// 6. Close Index Writer
		if (null != writer)
		{
			writer.close();
		}
	}

    Comments: writer.deleteAll() removes every document from the index, so the index is rebuilt from scratch on each run.
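
    An alternative not covered in the original notes: Lucene 3.5's IndexWriterConfig can be opened with OpenMode.CREATE, which discards any existing index when the writer is opened, making an explicit deleteAll() unnecessary. A sketch (assuming the same dir field as above):

	/**
	 * Open a writer that rebuilds the index from scratch (sketch)
	 */
	public IndexWriter openWriterForRebuild() throws IOException
	{
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
				new SimpleAnalyzer(Version.LUCENE_35));
		// CREATE wipes any existing index when the writer is opened.
		config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
		return new IndexWriter(dir, config);
	}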

 

8. How do we update the index?

 

	/**
	 * Update
	 */
	public void update()
	{
		IndexWriter writer = null;
		Document doc = null;
		try
		{
			writer = new IndexWriter(dir, new IndexWriterConfig(
					Version.LUCENE_35, new SimpleAnalyzer(Version.LUCENE_35)));

			doc = new Document();
			doc.add(new Field("id", "1", Field.Store.YES, Field.Index.ANALYZED));
			doc.add(new Field("name", "Yang", Field.Store.YES,
					Field.Index.NOT_ANALYZED));
			doc.add(new Field("password", "Kunlun", Field.Store.YES,
					Field.Index.NOT_ANALYZED));
			doc.add(new Field("gender", "Male", Field.Store.YES,
					Field.Index.NOT_ANALYZED));
			doc.add(new Field("score", 110 + "", Field.Store.YES,
					Field.Index.NOT_ANALYZED));
			/*
			 * Lucene has no in-place update. updateDocument() is delete + add:
			 * first it deletes the documents that match the term,
			 * then it adds the document passed in.
			 */
			writer.updateDocument(new Term("name", "Davy"), doc);
		} catch (CorruptIndexException e)
		{
			e.printStackTrace();
		} catch (LockObtainFailedException e)
		{
			e.printStackTrace();
		} catch (IOException e)
		{
			e.printStackTrace();
		} finally
		{
			try
			{
				if (writer != null)
				{
					writer.close();
				}
			} catch (IOException e)
			{
				e.printStackTrace();
			}
		}
	}
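
    To observe the delete-plus-add behaviour, the sketch below (an addition, assuming a document with name "Davy" was indexed before update() ran) searches for the old and the new name:

	/**
	 * Verify the update (sketch)
	 */
	public void searchAfterUpdate() throws IOException
	{
		IndexReader reader = IndexReader.open(dir);
		IndexSearcher searcher = new IndexSearcher(reader);
		try
		{
			TopDocs oldHits = searcher.search(
					new TermQuery(new Term("name", "Davy")), 10);
			TopDocs newHits = searcher.search(
					new TermQuery(new Term("name", "Yang")), 10);
			// The old document was deleted, the new one was added.
			System.out.println("Davy hits = " + oldHits.totalHits); // expected: 0
			System.out.println("Yang hits = " + newHits.totalHits); // expected: 1
		} finally
		{
			searcher.close();
			reader.close();
		}
	}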

  

 Summary:

    1. Delete: writer.deleteAll() or writer.deleteDocuments(new Term(key, value)); then writer.forceMergeDeletes() (or the deprecated writer.optimize() / the equally expensive writer.forceMerge()) to purge the deletions physically.

    2. Recover: reader.undeleteAll() restores all documents that are only marked as deleted (the reader must be opened with readOnly = false).

    3. Update: writer.updateDocument(new Term(key, value), doc) deletes the documents matching the term and adds the given doc.
