3. Learning Lucene: Updating and Optimizing the Index

鲲鹏_斯坦森

Article link: https://blog.csdn.net/PATTAP/article/details/7787014

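The previous post briefly covered deleting documents from a Lucene index and related operations; this post briefly covers updating and optimizing the index. Lucene has no in-place update operation like a database does: an index update actually deletes the matching documents first and then adds the new document. As for optimization, older versions used IndexWriter's optimize() method, but that method was deprecated in Lucene 3.5 and replaced by forceMerge().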
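To make the "delete, then add" idea concrete, here is a toy in-memory model. The class ToyUpdateIndex and all of its methods are hypothetical illustrations, not Lucene API. It mimics one consequence of this design: the old document is only marked as deleted, so the number of stored documents (analogous to IndexReader.maxDoc()) grows while the number of live documents (analogous to IndexReader.numDocs()) stays the same until a segment merge purges the deletion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, NOT Lucene's API) of an append-only index:
 * an "update" marks matching documents as deleted and appends the new one.
 */
class ToyUpdateIndex {
    // Each document is a simple field -> value map.
    private final List<Map<String, String>> docs = new ArrayList<>();
    // deleted.get(i) == true means document i is marked deleted but still stored.
    private final List<Boolean> deleted = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
        deleted.add(false);
    }

    /** Marks every live document whose field equals value as deleted; returns how many. */
    int deleteDocuments(String field, String value) {
        int count = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && value.equals(docs.get(i).get(field))) {
                deleted.set(i, true);
                count++;
            }
        }
        return count;
    }

    /** "Update" = delete every document matching the term, then add the new document. */
    void updateDocument(String field, String value, Map<String, String> doc) {
        deleteDocuments(field, value);
        addDocument(doc);
    }

    /** Stored documents, including those marked deleted (like maxDoc). */
    int maxDoc() {
        return docs.size();
    }

    /** Live documents only (like numDocs). */
    int numDocs() {
        int live = 0;
        for (boolean d : deleted) {
            if (!d) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        ToyUpdateIndex index = new ToyUpdateIndex();
        index.addDocument(Map.of("id", "1", "content", "This is 1."));
        index.addDocument(Map.of("id", "2", "content", "This is 2."));
        index.updateDocument("id", "1", Map.of("id", "1", "content", "This is 1, updated."));
        // Prints: maxDoc=3 numDocs=2
        System.out.println("maxDoc=" + index.maxDoc() + " numDocs=" + index.numDocs());
    }
}
```

The gap between the two counts is exactly the "garbage" that segment merging (and therefore optimization) later cleans up.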

I. Updating the Index

Without further ado, straight to the code. The steps are:

1. Create the Directory object for the index

2. Create the IndexWriter

3. Call the method that performs the update

4. Close the IndexWriter

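Here, the document whose id is 1 is replaced by a new document whose id is 18. Note that updateDocument() deletes every document containing the term id:1, not just the first one, before adding the new document.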
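One consequence of this example is easy to miss: the term used for deletion (id:1) does not match the id of the replacement document (18), so running updateIndex() a second time finds nothing to delete and simply adds another id-18 document. The hypothetical RerunUpdateDemo class below is a toy model, not Lucene code: it reduces each document to its id and replays the two runs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, NOT Lucene code) of running the UpdateIndex example twice.
 * Each document is reduced to its "id" value; only live documents are kept in the list.
 */
class RerunUpdateDemo {

    /** Deletes every live document whose id equals termValue, then appends newId. */
    static void updateDocument(List<String> liveIds, String termValue, String newId) {
        liveIds.removeIf(id -> id.equals(termValue));
        liveIds.add(newId);
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>(List.of("1", "2", "3"));
        updateDocument(ids, "1", "18"); // first run: id 1 is replaced by id 18
        System.out.println(ids);        // [2, 3, 18]
        updateDocument(ids, "1", "18"); // second run: nothing matches id 1, so 18 is added again
        System.out.println(ids);        // [2, 3, 18, 18]
    }
}
```

If the intent is "replace the document with this key", a common way to keep repeated updates idempotent is to put the same key value in the Term and in the new document.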

package com.hlp.lucene.updateIndex;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

/**
 * Purpose: update a document in the index.
 *
 */
public class UpdateIndex
{
    // Directory where the index files are stored
    String luceneIndex = "G://lucene//luceneIndex2";

    // Analyzer used to tokenize the text
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);


    public void updateIndex()
    {
	IndexWriter indexWriter = null;

	try
	{
	    // 1. Create the Directory object that points to the index location
	    Directory directory = FSDirectory.open(new File(luceneIndex));

	    // 2. Create the IndexWriter
	    IndexWriterConfig iWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
	    indexWriter = new IndexWriter(directory, iWriterConfig);

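	    // 3. Perform the update.
	    // Lucene has no in-place update like a database: updateDocument() first deletes
	    // every document containing the term id:1, then adds the new document.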
	    Document doc = new Document();
	    doc.add(new Field("id", "18", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
	    doc.add(new Field("emailAddr", "18@18.com", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
	    doc.add(new Field("content", "This is 18.", Field.Store.NO, Field.Index.ANALYZED));
	    indexWriter.updateDocument(new Term("id", "1"), doc);

	}
	catch (CorruptIndexException e)
	{
	    e.printStackTrace();
	}
	catch (LockObtainFailedException e)
	{
	    e.printStackTrace();
	}
	catch (IOException e)
	{
	    e.printStackTrace();
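	}
	finally
	{
	    // 4. Close the IndexWriter; close() also commits the pending changes.
	    // Skip it if the writer was never created (e.g. FSDirectory.open failed).
	    if (indexWriter != null)
	    {
		try
		{
		    indexWriter.close();
		}
		catch (CorruptIndexException e)
		{
		    e.printStackTrace();
		}
		catch (IOException e)
		{
		    e.printStackTrace();
		}
	    }
	}  // finally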

    }
}

II. Optimizing the Index

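Optimization is a tricky trade-off: it costs a lot of resources and time, yet it is also an important way to speed up searches, so the two need to be weighed carefully. Note also that Lucene (3.6 included) optimizes on its own to some extent: once the number of index segments reaches a certain threshold, IndexWriter automatically merges them according to its merge policy.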
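The automatic merging described above can be illustrated with a simplified toy model. Lucene ships several merge policies, such as LogMergePolicy and TieredMergePolicy, configurable via IndexWriterConfig.setMergePolicy(); the hypothetical ToyMergePolicy class below is only loosely inspired by the "merge factor" idea of LogMergePolicy and is NOT Lucene's actual algorithm. Assumption: each flush writes a 10-document segment, and whenever the 10 newest segments have the same size they are merged into one, which can cascade to larger merges.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, NOT Lucene's algorithm) of automatic segment merging.
 * Whenever the newest MERGE_FACTOR segments have equal size, they are merged into one.
 */
class ToyMergePolicy {
    // Assumption: mirrors LogMergePolicy's default merge factor of 10.
    static final int MERGE_FACTOR = 10;

    // Segment sizes in documents, oldest first.
    final List<Integer> segments = new ArrayList<>();
    int mergesDone = 0;

    /** Simulates a flush that writes a new segment of the given size, then merges if needed. */
    void flush(int docs) {
        segments.add(docs);
        maybeMerge();
    }

    private void maybeMerge() {
        while (segments.size() >= MERGE_FACTOR) {
            int n = segments.size();
            int size = segments.get(n - 1);
            for (int i = n - MERGE_FACTOR; i < n; i++) {
                if (segments.get(i) != size) {
                    return; // newest segments differ in size: nothing to merge yet
                }
            }
            // Replace the newest MERGE_FACTOR equal-sized segments with one merged segment.
            segments.subList(n - MERGE_FACTOR, n).clear();
            segments.add(size * MERGE_FACTOR);
            mergesDone++;
        }
    }

    public static void main(String[] args) {
        ToyMergePolicy policy = new ToyMergePolicy();
        for (int i = 0; i < 105; i++) {
            policy.flush(10); // each flush writes a new 10-document segment
        }
        // Prints: segments=[1000, 10, 10, 10, 10, 10] merges=11
        System.out.println("segments=" + policy.segments + " merges=" + policy.mergesDone);
    }
}
```

Without any merging, 105 flushes would leave 105 segments; in this toy model the index stays at 6 segments without any explicit optimize call.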

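1. Open the Directory object

2. Create the IndexWriter

3. Run the optimization method; its argument is the maximum number of segments to merge the index into

4. Close the IndexWriter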

Here is the code:

package com.hlp.lucene.optimizeIndex;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.util.Version;

/**
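 * Purpose: optimize (force-merge) the index.
 * 
 * Note: if you do not optimize manually, Lucene looks at the number of segments in the index and merges index files on its own.
 * 
 * Manual optimization is generally not recommended, because it is a very resource-intensive process.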
 * 
 *
 */
public class OptimizeIndex
{

    // Directory where the index files are stored
    String luceneIndex = "G://lucene//luceneIndex2";
    // Analyzer used to tokenize the text
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);

    public void optimizeIndex()
    {
	IndexWriter indexWriter = null;

	try
	{
	    // 1. Open the Directory object
	    Directory directory = FSDirectory.open(new File(luceneIndex));

	    // 2. Create the IndexWriter
	    IndexWriterConfig iWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
	    indexWriter = new IndexWriter(directory, iWriterConfig);

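	    // 3. Run the optimization: merge until at most 2 segments remain.
	    //    forceMerge() replaces optimize(), which is deprecated as of Lucene 3.5.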
	    indexWriter.forceMerge(2);
	}
	catch (CorruptIndexException e)
	{
	    e.printStackTrace();
	}
	catch (LockObtainFailedException e)
	{
	    e.printStackTrace();
	}
	catch (IOException e)
	{
	    e.printStackTrace();
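	}
	finally
	{
	    // 4. Close the IndexWriter; close() also commits the pending changes.
	    // Skip it if the writer was never created (e.g. FSDirectory.open failed).
	    if (indexWriter != null)
	    {
		try
		{
		    indexWriter.close();
		}
		catch (CorruptIndexException e)
		{
		    e.printStackTrace();
		}
		catch (IOException e)
		{
		    e.printStackTrace();
		}
	    }
	} // finally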
    }
}

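As the code shows, the name of the optimization method contains the word "force". It is a reminder that the method should be used sparingly and that its argument should be chosen sensibly: the smaller maxNumSegments is, the more segments have to be rewritten.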
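To see why the argument matters, here is a toy cost model of a forced merge. The ToyForceMerge class, its Segment type and the sample segment sizes are all hypothetical, and the merge selection (merge the smallest segments in one step until at most maxNumSegments remain) is a simplification, NOT Lucene's real merge selection. It does capture two real effects: merged segments drop their deleted documents, and a smaller maxNumSegments forces more live documents to be copied.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy cost model (hypothetical, NOT Lucene's merge selection) of forceMerge(maxNumSegments). */
class ToyForceMerge {

    /** A segment with live documents and documents marked deleted but not yet purged. */
    static final class Segment {
        final int liveDocs;
        final int deletedDocs;

        Segment(int liveDocs, int deletedDocs) {
            this.liveDocs = liveDocs;
            this.deletedDocs = deletedDocs;
        }
    }

    /**
     * Merges the smallest segments in one step so that at most maxNumSegments remain.
     * Modifies the list in place and returns how many live documents were rewritten.
     */
    static int forceMerge(List<Segment> segments, int maxNumSegments) {
        if (maxNumSegments < 1) {
            throw new IllegalArgumentException("maxNumSegments must be >= 1");
        }
        if (segments.size() <= maxNumSegments) {
            return 0; // already within the limit: this toy model does no work
        }
        segments.sort(Comparator.comparingInt(s -> s.liveDocs));
        int toMerge = segments.size() - maxNumSegments + 1;
        int mergedLive = 0;
        for (int i = 0; i < toMerge; i++) {
            mergedLive += segments.get(i).liveDocs; // deleted docs are simply not copied
        }
        segments.subList(0, toMerge).clear();
        segments.add(new Segment(mergedLive, 0));
        return mergedLive;
    }

    /** Hypothetical sample index: one big segment, two medium ones, two small ones. */
    static List<Segment> sampleIndex() {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(1000, 100));
        segments.add(new Segment(100, 10));
        segments.add(new Segment(100, 10));
        segments.add(new Segment(10, 1));
        segments.add(new Segment(10, 1));
        return segments;
    }

    public static void main(String[] args) {
        for (int max : new int[] {5, 2, 1}) {
            List<Segment> segments = sampleIndex();
            int rewritten = forceMerge(segments, max);
            System.out.println("forceMerge(" + max + "): segments=" + segments.size()
                    + " docsRewritten=" + rewritten);
        }
    }
}
```

In this model, forceMerge(5) does nothing, forceMerge(2) rewrites 220 live documents but leaves the 100 deleted documents in the large segment untouched, and forceMerge(1) rewrites all 1220 live documents. That growth is the reason to pick the argument with care rather than reflexively calling forceMerge(1).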