Search Engines (Lucene: Indexing in Detail)

weixin_34248118

Original article: https://my.oschina.net/u/3728166/blog/3004711

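IndexWriter in Detail

Question 1: What does the index creation process accomplish?

Architecture diagram recap

Lucene index creation API (diagram)

Lucene index creation code example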

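public static void main(String[] args) throws IOException {
    // Create the analyzer to use
    // (IKAnalyzer4Lucene7 is an IK Analyzer adapter class assumed to be on the classpath)
    Analyzer analyzer = new IKAnalyzer4Lucene7(true);
    // Index configuration object
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    // Set how the index is opened: CREATE, APPEND, or CREATE_OR_APPEND
    config.setOpenMode(OpenMode.CREATE_OR_APPEND);
    // Index location
    // Store the index in the file system
    Directory directory = FSDirectory.open((new File("f:/test/indextest")).toPath());
    // Or store it in memory
    // Directory directory = new RAMDirectory();
    // Create the index writer
    IndexWriter writer = new IndexWriter(directory, config);
    // Create a document
    Document doc = new Document();
    // Add the product id field to the document
    doc.add(new StoredField("prodId", "p0001"));
    // Add the product name field to the document
    String name = "ThinkPad X1 Carbon 20KH0009CD/25CD 超极本轻薄笔记本电脑联想";
    doc.add(new TextField("name", name, Store.YES));
    // Add the document to the index; close() commits pending changes by default
    writer.addDocument(doc);
    writer.close();
}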

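Classes involved with IndexWriter (diagram)

IndexWriterConfig holds the index-writing configuration:

Ø The analyzer to use.
Ø How to open the index (create, append, or create-or-append).
Ø The buffer size, or the number of documents to buffer, before flushing to storage.
Ø Policies for merging, deletion, and so on.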
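
To make the buffering setting concrete, here is a minimal sketch of the idea in plain Java. It is a toy model, not Lucene code: the class ToyBufferedWriter and its methods are hypothetical names, and a "segment" here is just a list of strings. It only shows that documents accumulate in memory and are written out as a batch once a configured threshold is reached.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy writer (hypothetical): buffers documents and flushes them in batches. */
class ToyBufferedWriter {
    private final int maxBufferedDocs;
    private final List<String> buffer = new ArrayList<>();
    // Each flushed batch stands in for one on-disk segment.
    private final List<List<String>> segments = new ArrayList<>();

    ToyBufferedWriter(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Buffers the document; flushes automatically when the threshold is reached. */
    void addDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxBufferedDocs) {
            flush();
        }
    }

    /** Writes the buffered documents out as a new batch, if there are any. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        segments.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    int segmentCount() { return segments.size(); }

    int bufferedCount() { return buffer.size(); }
}
```

With a threshold of 2, adding five documents leaves two flushed batches and one document still buffered; an explicit flush() writes out the remainder.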

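Directory specifies where the index data is stored:

Ø Memory
Ø File system
Ø Database

Storing to the file system:
Directory directory = FSDirectory.open(Path path); // path specifies the index directory

IndexWriter creates and maintains an index. Its API usage flow: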

// Create the index writer
IndexWriter writer = new IndexWriter(directory, config);
// Create a document
// Add the document to the index
writer.addDocument(doc);
// Delete documents
// writer.deleteDocuments(terms);
// Update a document
// writer.updateDocument(term, doc);
// Flush buffered changes
writer.flush();
// Commit
writer.commit();

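Note: IndexWriter is thread-safe. If your application code has its own synchronization, do not use the IndexWriter instance as the lock object, or you risk deadlock.

IndexWriter also provides add methods, delete methods, update methods, and other methods.

Question 2: The index stores inverted-index data. Does it also store the documents?

Question 3: In what structure are documents stored?

Which information would be stored for a web page?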

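Document in Detail

Document

A Document is Lucene's representation of a data record or document to be indexed, and it is the basic unit of indexing and search. A Document consists of multiple Fields, much like a database record and its columns. IndexWriter assigns each Document an incrementing id (starting from 0) in the order it was added, called the document id. The inverted index stores this id, and the forward index in document storage is keyed by it as well. The primary key of your business data is just another field of the document. Read the Document source code and find the APIs for manipulating fields.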
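
The relationship between document ids, the inverted index, and document storage can be sketched in plain Java. This is a toy model under stated assumptions, not Lucene's implementation: ToyDocIdIndex is a hypothetical class, whitespace splitting stands in for a real analyzer, and every field is both indexed and stored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy index (hypothetical): assigns incrementing doc ids and keys both structures by them. */
class ToyDocIdIndex {
    // Document storage: the list position is the doc id.
    private final List<Map<String, String>> storedDocs = new ArrayList<>();
    // Inverted index: "field:term" -> ids of documents containing that term.
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    /** Adds a document and returns the doc id assigned to it (0, 1, 2, ...). */
    int addDocument(Map<String, String> fields) {
        int docId = storedDocs.size();
        storedDocs.add(new LinkedHashMap<>(fields));
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Whitespace split stands in for a real analyzer.
            for (String term : field.getValue().toLowerCase().split("\\s+")) {
                List<Integer> docIds =
                        postings.computeIfAbsent(field.getKey() + ":" + term, k -> new ArrayList<>());
                // Record each doc id once per term.
                if (docIds.isEmpty() || docIds.get(docIds.size() - 1) != docId) {
                    docIds.add(docId);
                }
            }
        }
        return docId;
    }

    /** Returns the doc ids whose field contains the term. */
    List<Integer> search(String field, String term) {
        List<Integer> hits = postings.get(field + ":" + term.toLowerCase());
        return hits == null ? Collections.<Integer>emptyList() : hits;
    }

    /** Loads the stored fields for a doc id. */
    Map<String, String> document(int docId) {
        return storedDocs.get(docId);
    }
}
```

A search returns internal doc ids; the business key (here a prodId field) is only recovered by loading the stored document for each id.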

Document API

Field

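Field: consists of three parts: the field name (name), the field value (fieldsData), and the field type (type). The value can be text (a String, a Reader, or a pre-analyzed TokenStream), a binary value (byte[]), or a number. Read the Field source code, find these three attributes, and see which constructors it offers.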

IndexableField Field API

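Document and Field data examples

Ø News: news id, title, content, author, category, publication time
Ø Web pages for web search: title, content, URL
Ø Products: id, name, image URL, category, price, stock, merchant, brand, monthly sales, details…

Question 1: When we collect data and build Document objects to index it, do all attributes of the data need to go into the Document? For example, should every column of a database record be put into the Document? Which fields should be added?

Question 2: Do all added fields need to be indexed? Do all added fields need to be stored in the index? What kind of field should be indexed, and what kind should be stored?

Think this through for web pages and products:

Web page: title, content, URL
Product: id, name, image URL, category, price, stock, merchant, brand, monthly sales, details…

Question 3: How should each indexed field be indexed? Should all of them be tokenized, or do they differ?

Web page: title, content, URL
Product: id, name, image URL, category, price, stock, merchant, brand, monthly sales, details…

From questions 2 and 3: different fields need different index settings. These settings are defined by the field's type attribute, an IndexableFieldType object.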

IndexableFieldType

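Field type: describes how the field is indexed and stored.
A field can optionally be stored in the index so that its value is available in search results. A Document should contain one or more stored fields that uniquely identify it. Why? Note: fields that were not stored are absent from a Document retrieved from the index.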
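
The difference between "indexed" and "stored" can be illustrated with a small plain-Java sketch. It is a toy model, not Lucene code: ToyField and ToyStoredFieldIndex are hypothetical names, and each indexed field is treated as a single untokenized term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy field (hypothetical): a name, a value, and two independent flags. */
class ToyField {
    final String name;
    final String value;
    final boolean indexed;
    final boolean stored;

    ToyField(String name, String value, boolean indexed, boolean stored) {
        this.name = name;
        this.value = value;
        this.indexed = indexed;
        this.stored = stored;
    }
}

/** Toy index (hypothetical): indexed fields are searchable, stored fields are retrievable. */
class ToyStoredFieldIndex {
    private final List<Map<String, String>> storedDocs = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    int addDocument(ToyField... fields) {
        int docId = storedDocs.size();
        Map<String, String> kept = new LinkedHashMap<>();
        for (ToyField f : fields) {
            if (f.indexed) {
                // Searchable: the whole value becomes one term in the inverted index.
                postings.computeIfAbsent(f.name + ":" + f.value, k -> new TreeSet<>()).add(docId);
            }
            if (f.stored) {
                // Retrievable: the original value is kept in document storage.
                kept.put(f.name, f.value);
            }
        }
        storedDocs.add(kept);
        return docId;
    }

    Set<Integer> search(String name, String value) {
        Set<Integer> hits = postings.get(name + ":" + value);
        return hits == null ? Collections.<Integer>emptySet() : hits;
    }

    Map<String, String> document(int docId) {
        return storedDocs.get(docId);
    }
}
```

A field that is indexed but not stored can be matched by a search yet is missing from the retrieved document, which is why at least one stored field should identify the record.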

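Read the IndexableFieldType source code and find where the storage, tokenization, and indexing settings are defined.
Which classes implement IndexableFieldType?

Document class relationships

IndexableFieldType API notes

IndexOptions values:

Ø NONE
Not indexed.
Ø DOCS
The inverted index stores only the ids of documents containing the term, with no term frequency or positions.
Ø DOCS_AND_FREQS
The inverted index stores document ids and term frequencies.
Ø DOCS_AND_FREQS_AND_POSITIONS
The inverted index stores document ids, term frequencies, and positions.
Ø DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
The inverted index stores document ids, term frequencies, positions, and offsets.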
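
What each level adds to a posting can be sketched in plain Java. This is a toy illustration, not Lucene's postings format: ToyPostings and its Level enum are hypothetical, tokens are split on single spaces, and offsets are start-end character positions.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical) of what one posting holds at each index level. */
class ToyPostings {
    enum Level {
        DOCS,
        DOCS_AND_FREQS,
        DOCS_AND_FREQS_AND_POSITIONS,
        DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
    }

    /** Describes the posting for term in text (document docId), or null if the term is absent. */
    static String posting(int docId, String text, String term, Level level) {
        String[] tokens = text.split(" ");
        List<Integer> positions = new ArrayList<>();
        List<String> offsets = new ArrayList<>();
        int start = 0;
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].equals(term)) {
                positions.add(pos);
                offsets.add(start + "-" + (start + tokens[pos].length()));
            }
            start += tokens[pos].length() + 1; // +1 for the single space separator
        }
        if (positions.isEmpty()) {
            return null;
        }
        StringBuilder sb = new StringBuilder("doc=" + docId);
        if (level.ordinal() >= Level.DOCS_AND_FREQS.ordinal()) {
            sb.append(" freq=").append(positions.size());
        }
        if (level.ordinal() >= Level.DOCS_AND_FREQS_AND_POSITIONS.ordinal()) {
            sb.append(" pos=").append(positions);
        }
        if (level.ordinal() >= Level.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.ordinal()) {
            sb.append(" offsets=").append(offsets);
        }
        return sb.toString();
    }
}
```

For the text "to be or not to be" and the term "be" in document 7, the DOCS level yields only "doc=7", while the richest level yields "doc=7 freq=2 pos=[1, 5] offsets=[3-5, 16-18]". Each step up costs more space per posting.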

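import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// IKAnalyzer4Lucene7 is assumed to be available in the same package.
public class IndexOptionsDemo {

	public static void main(String[] args) {
		// Create the analyzer to use
		Analyzer analyzer = new IKAnalyzer4Lucene7(true);

		// Index configuration object
		IndexWriterConfig config = new IndexWriterConfig(analyzer);

		try ( // Store the index in the file system
				Directory directory = FSDirectory
						.open((new File("f:/test/indextest")).toPath());

				// Create the index writer
				IndexWriter writer = new IndexWriter(directory, config);) {

			// Prepare the document
			Document doc = new Document();
			// The "content" field
			String name = "content";
			String value = "张三说的确实在理";
			FieldType type = new FieldType();
			// Whether to store the field
			type.setStored(true); // try it without storing
			// Whether to tokenize the field
			type.setTokenized(true); // try it without tokenizing
			// The index options for the field
			type.setIndexOptions(
					IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS); // try the other options
			type.freeze(); // make the type immutable

			Field field = new Field(name, value, type);
			// Add the field
			doc.add(field);
			// Add the document to the index
			writer.addDocument(doc);

		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}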

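Installing Luke, the index inspection tool

Download: https://github.com/DmitryKey/luke/releases

At the time of writing, the latest release was 7.2.0, which works with Lucene 7.3.0.

Notes on the Luke Documents view:

Question 4: What information is needed to highlight keywords in search results? What is needed for phrase queries and proximity (span) queries? For example, searching for documents that contain "张三" and "李四" with no more than 5 characters between the two terms.

Question 5: Do position and offset data take up a large share of the inverted index?

Question 6: If a field does not need phrase or proximity queries, its positions and offsets need not be kept in the inverted index. Would that reduce the size of the inverted index and improve efficiency? But what if that field still has to support highlighting?

To keep the inverted index efficient, positions and offsets for such fields should not be written to it. This is why IndexOptions offers the levels you saw earlier. Older Lucene versions offered little control over this; with 4.0 it became a per-field choice among the IndexOptions levels. So what about fields that only need highlighting (or otherwise need this information after the search results are obtained)?

After the analyzer tokenizes a field, each term gets a set of attributes such as frequency, positions, and offsets. Together these form the field's term vector (termVectors).

Check whether IndexableFieldType and FieldType have methods for storing term vectors.

IndexableFieldType API notes

storeTermVectors

For fields whose positions, offsets, and payloads are not needed while searching the inverted index but are needed when processing search results, we can store a separate forward index (document id -> term vector) for that field.

Ø boolean storeTermVectors(): whether to store term vectors
Ø boolean storeTermVectorPositions(): whether to store positions in the term vectors
Ø boolean storeTermVectorOffsets(): whether to store offsets in the term vectors
Ø boolean storeTermVectorPayloads(): whether to store payloads in the term vectors

FieldType, the implementation class, has the corresponding setters.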
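
The idea of a per-document term vector used for highlighting can be sketched in plain Java. This is a toy model, not Lucene's term vector format or highlighter: ToyTermVectors is a hypothetical class, tokens are split on single spaces, and only start offsets are recorded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy forward index (hypothetical): doc id -> (term -> start offsets). */
class ToyTermVectors {
    private final Map<Integer, Map<String, List<Integer>>> vectors = new HashMap<>();
    private final Map<Integer, String> storedText = new HashMap<>();

    /** Stores the text and records, per term, where each occurrence starts. */
    void add(int docId, String text) {
        storedText.put(docId, text);
        Map<String, List<Integer>> vector = new TreeMap<>();
        int start = 0;
        for (String token : text.split(" ")) {
            vector.computeIfAbsent(token, k -> new ArrayList<>()).add(start);
            start += token.length() + 1; // +1 for the single space separator
        }
        vectors.put(docId, vector);
    }

    /** Wraps each occurrence of term in [ ] using only the recorded offsets. */
    String highlight(int docId, String term) {
        List<Integer> starts = vectors.get(docId).get(term);
        if (starts == null) {
            starts = Collections.emptyList();
        }
        StringBuilder sb = new StringBuilder(storedText.get(docId));
        // Work from the last occurrence backwards so earlier offsets stay valid.
        for (int i = starts.size() - 1; i >= 0; i--) {
            int s = starts.get(i);
            sb.insert(s + term.length(), "]").insert(s, "[");
        }
        return sb.toString();
    }
}
```

The lookup goes from a known document id to its terms and offsets, which is the opposite direction from the inverted index, so the inverted index itself can stay at a cheaper level such as DOCS or DOCS_AND_FREQS.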

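import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// IKAnalyzer4Lucene7 is assumed to be available in the same package.
public class IndexTermVectorsDemo {

	public static void main(String[] args) {
		// Create the analyzer to use
		Analyzer analyzer = new IKAnalyzer4Lucene7(true);

		// Index configuration object
		IndexWriterConfig config = new IndexWriterConfig(analyzer);

		try ( // Store the index in the file system
				Directory directory = FSDirectory
						.open((new File("f:/test/indextest")).toPath());

				// Create the index writer
				IndexWriter writer = new IndexWriter(directory, config);) {

			// Prepare the document
			Document doc = new Document();
			// The "content" field
			String name = "content";
			String value = "张三说的确实在理";
			FieldType type = new FieldType();
			// Whether to store the field
			type.setStored(true); // try it without storing
			// Whether to tokenize the field
			type.setTokenized(true); // try it without tokenizing
			// The index options for the field
			type.setIndexOptions(IndexOptions.DOCS); // the inverted index keeps only term -> document ids

			// Store term vectors for this field
			type.setStoreTermVectors(true);
			type.setStoreTermVectorPositions(true);
			type.setStoreTermVectorOffsets(true);
			type.setStoreTermVectorPayloads(true);

			type.freeze(); // make the type immutable

			Field field = new Field(name, value, type);
			// Add the field
			doc.add(field);
			// Add the document to the index
			writer.addDocument(doc);

		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}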

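Terminology: token, term. A term is a word produced by tokenization.

What are payloads? A payload is an arbitrary byte array attached to a single term occurrence (position), for example a per-occurrence weight.

Exercise 1

Build an index for a product record with the following fields:
Ø Product id: string, not indexed, but stored
String prodId = "p0001";
Ø Product name: string, tokenized and indexed (with term frequencies, positions, and offsets), stored
String name = "ThinkPad X1 Carbon 20KH0009CD/25CD 超极本轻薄笔记本电脑";
Ø Image URL: stored only
String imgUrl = "http://www.dongnao.com/aaa";
Ø Short description: string, tokenized and indexed (no phrase or proximity queries needed), stored, with highlighting support in results
String simpleIntro = "集成显卡英特尔酷睿 i5-8250U 14英寸";
Ø Brand: string, indexed without tokenization, stored
String brand = "ThinkPad";

Question 7: We often need to sort search results by different fields, for example sorting product results by price or by sales volume, and to group and count results by a field, for example by brand. Suppose a search for the keyword "娃娃" returns this list of matching document ids:
{10,21,18,48,29,…..}
Some users want them sorted by price, others by sales volume.
Sometimes we need counts per brand…
Does the inverted index help with sorting? We first need each id's price, sales volume, or brand before we can sort or count. Where does that price, sales, and brand data live? If the list of matched documents is very large, what problems will sorting run into?

Trading space for time

For fields that need sorting, grouping, or aggregation, build a separate forward index from document to field value, stored column-wise. Loading this field's values for the matched documents is then much faster and uses less memory.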
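
The column-wise forward index can be pictured as one array per field, indexed by document id. The following plain-Java sketch is a toy model, not Lucene's docValues format: ToyDocValues and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy column store (hypothetical): column[docId] holds that document's value. */
class ToyDocValues {

    /** Returns the hit doc ids ordered by their value in a numeric column. */
    static List<Integer> sortByColumn(List<Integer> hitDocIds, long[] column) {
        List<Integer> sorted = new ArrayList<>(hitDocIds);
        // Each comparison is a direct array lookup; no stored document is loaded.
        sorted.sort(Comparator.comparingLong((Integer docId) -> column[docId]));
        return sorted;
    }

    /** Counts the hits per distinct value of a string column. */
    static Map<String, Integer> countByColumn(List<Integer> hitDocIds, String[] column) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int docId : hitDocIds) {
            counts.merge(column[docId], 1, Integer::sum);
        }
        return counts;
    }
}
```

Sorting or counting touches only the one column involved, which is the space-for-time trade: the value is written a second time at index time so that it can be read cheaply at query time.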

docValuesType

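The docValuesType method of IndexableFieldType is how you specify, for a field that needs sorting, grouping, or aggregation, what kind of document -> field value forward index to build.

IndexableFieldType API notes

DocValuesType options

Ø NONE: docValues disabled
Ø NUMERIC: for single-valued numeric fields
Ø BINARY: for single-valued byte-array fields
Ø SORTED: for single-valued string fields; the value bytes are sorted and de-duplicated before storage
Ø SORTED_NUMERIC: for multi-valued numeric fields; each document's values are stored in sorted order
Ø SORTED_SET: for multi-valued string fields; the value bytes are sorted and de-duplicated before storage

Choosing in practice:

Ø String, single-valued: SORTED
Ø String, multi-valued: SORTED_SET
Ø Number, date, or enum, single-valued: NUMERIC
Ø Number or date, multi-valued: SORTED_NUMERIC (Solr's legacy Trie and enum field types use SORTED_SET here)

Emphasis: create docValues only for fields that need sorting, grouping, aggregation, or faceted queries.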
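
The "sorted and de-duplicated" storage used by SORTED can be sketched as a dictionary of distinct values plus one ordinal per document. This is a toy model in plain Java, not Lucene's encoding: ToySortedDocValues is a hypothetical class, and it orders values with String's natural ordering, whereas Lucene orders the value bytes.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Toy SORTED-style column (hypothetical): distinct sorted values plus per-document ordinals. */
class ToySortedDocValues {
    final String[] dictionary; // distinct values in sorted order
    final int[] ords;          // ords[docId] = index of that document's value in dictionary

    ToySortedDocValues(String[] valuePerDoc) {
        // TreeSet both sorts and de-duplicates the values.
        TreeSet<String> distinct = new TreeSet<>(Arrays.asList(valuePerDoc));
        dictionary = distinct.toArray(new String[0]);
        ords = new int[valuePerDoc.length];
        for (int docId = 0; docId < valuePerDoc.length; docId++) {
            ords[docId] = Arrays.binarySearch(dictionary, valuePerDoc[docId]);
        }
    }

    /** Resolves a document's value through its ordinal. */
    String value(int docId) {
        return dictionary[ords[docId]];
    }
}
```

Because the dictionary is sorted, comparing two documents' ordinals gives the same result as comparing their values, and grouping by value reduces to grouping by a small integer.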

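Exercise 2

Ø 1. Modify the brand field so that it supports counting (faceting) queries.
Ø 2. Add a product category field: string (category name), indexed without tokenization, not stored, supports category counts, multi-valued (a product may belong to several categories).
type = {"电脑", "笔记本电脑"}
Ø 3. Add a price field: integer, in fen (1/100 yuan), not indexed, stored, must support sorting.
How do you add multiple values to a document?
Add the same field name multiple times.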

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.NumericUtils;

// IKAnalyzer4Lucene7 is assumed to be available in the same package.
public class ProductIndexExercise1 {

	public static void main(String[] args) {
		// Create the analyzer to use
		Analyzer analyzer = new IKAnalyzer4Lucene7(true);

		// Index configuration object
		IndexWriterConfig config = new IndexWriterConfig(analyzer);

		try (
				// Index location
				// Store the index in the file system
				Directory directory = FSDirectory
						.open((new File("f:/test/indextest")).toPath());

				// Or store it in memory
				// Directory directory = new RAMDirectory();

				// Create the index writer
				IndexWriter writer = new IndexWriter(directory, config);) {

			// Prepare the document
			Document doc = new Document();
			// Product id: string, not indexed, but stored
			String prodId = "p0001";
			FieldType onlyStoredType = new FieldType();
			onlyStoredType.setTokenized(false);
			onlyStoredType.setIndexOptions(IndexOptions.NONE);
			onlyStoredType.setStored(true);
			onlyStoredType.freeze();
			doc.add(new Field("prodId", prodId, onlyStoredType));

			// Product name: string, tokenized and indexed (frequencies, positions, offsets), stored
			String name = "ThinkPad X1 Carbon 20KH0009CD/25CD 超极本轻薄笔记本电脑联想";
			FieldType indexedAllStoredType = new FieldType();
			indexedAllStoredType.setStored(true);
			indexedAllStoredType.setTokenized(true);
			indexedAllStoredType.setIndexOptions(
					IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
			indexedAllStoredType.freeze();
			doc.add(new Field("name", name, indexedAllStoredType));

			// Image URL: stored only
			String imgUrl = "http://www.dongnao.com/aaa";
			doc.add(new Field("imgUrl", imgUrl, onlyStoredType));

			// Short description: text, tokenized and indexed (no phrase or proximity queries),
			// stored, with highlighting support in results
			String simpleIntro = "集成显卡 英特尔 酷睿 i5-8250U 14英寸";
			FieldType indexedTermVectorsStoredType = new FieldType();
			indexedTermVectorsStoredType.setStored(true);
			indexedTermVectorsStoredType.setTokenized(true);
			indexedTermVectorsStoredType
					.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
			indexedTermVectorsStoredType.setStoreTermVectors(true);
			indexedTermVectorsStoredType.setStoreTermVectorPositions(true);
			indexedTermVectorsStoredType.setStoreTermVectorOffsets(true);
			indexedTermVectorsStoredType.freeze();

			doc.add(new Field("simpleIntro", simpleIntro,
					indexedTermVectorsStoredType));

			// Price: integer, in fen, not indexed, stored
			int price = 2999900;
			// Does Field have a constructor that takes an integer value?
			// Try storing it as a byte array, or convert it to a string?
			byte[] result = new byte[Integer.BYTES];
			NumericUtils.intToSortableBytes(price, result, 0);

			doc.add(new Field("price", result, onlyStoredType));

			// After running this, check the result in Luke.

			// A byte array or a string loses the value's numeric meaning.

			// Does the Field class provide a suitable constructor or other method?
			// Which value types does IndexableField let you read from a field?
			// Then look at the corresponding implementations in Field.
			// Given Field's constructors, numericValue(), and setIntValue(), this may look confusing.

			// If it is unclear why things are this way, look at how IndexWriter
			// uses these methods when it indexes and stores fields.

			writer.addDocument(doc);

		} catch (IOException e) {
			e.printStackTrace();
		}

	}

}

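How to add a numeric field

Does the Field class provide a suitable constructor or other method? Field's constructors and setters:

There is no matching constructor, but there are setters. Look at the source of setIntValue to see how it sets the field value to an integer. Does anything puzzle you? Then look at the other setters.

Next, look at the IndexableField API:

Recall the earlier definition of Field

Field: consists of three parts: the field name (name), the field value (value), and the field type (type). The value can be text (a String, a Reader, or a pre-analyzed TokenStream), a binary value (byte[]), or a number.

Ways to add a numeric field

Ø Extend Field and provide a constructor that accepts a numeric value and assigns it to the field's value (fieldsData);
Ø Override binaryValue() to return a byte reference (BytesRef) for the number.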

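import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.NumericUtils;

public class MyIntField extends Field {
	public MyIntField(String fieldName, int value, FieldType type) {
		super(fieldName, type);
		// Keep the number itself as the field value
		this.fieldsData = Integer.valueOf(value);
	}

	@Override
	public BytesRef binaryValue() {
		// Encode the int as 4 bytes whose byte order matches numeric order
		byte[] bs = new byte[Integer.BYTES];
		NumericUtils.intToSortableBytes((Integer) this.fieldsData, bs, 0);
		return new BytesRef(bs);
	}
}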
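
Why the bytes are called "sortable" can be shown with a small re-implementation of the idea in plain Java. This is a sketch, not the library source: ToySortableBytes and its methods are hypothetical names. It flips the sign bit and writes the int big-endian, so that comparing the byte arrays as unsigned bytes gives the same order as comparing the ints.

```java
/** Toy re-implementation (hypothetical) of a sortable int encoding. */
class ToySortableBytes {

    /** Encodes an int so that unsigned byte-wise order equals numeric order. */
    static byte[] intToSortableBytes(int value) {
        // Flipping the sign bit moves negative numbers below positive ones
        // when the bytes are compared as unsigned values.
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    /** Decodes the bytes back to the original int. */
    static int sortableBytesToInt(byte[] b) {
        int x = ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16) | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
        return x ^ 0x80000000;
    }

    /** Compares two equal-length arrays byte by byte, treating bytes as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }
}
```

With this encoding, -5 sorts before 3 and 999900 before 2999900 even though the comparison only looks at raw bytes, which is what lets a byte-oriented index answer numeric range queries.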

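Question 8: What are pointDimensionCount() and pointNumBytes(), defined last in IndexableFieldType, used for?

Lucene 6 introduced the concept of points to represent numeric fields and retired the old IntField and related classes. The Point field classes provide convenience methods for exact and range queries.
Note: this only introduces the concept of a point; it does not change the nature of a numeric field. A point implies a space with dimensions: one dimension means one value, two dimensions mean two values, and so on.
pointDimensionCount() returns the number of dimensions of the point.
pointNumBytes() returns the number of bytes used by each dimension's value.

Lucene's predefined Field subclasses, to pick from as needed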

Ø TextField: Reader or String indexed for full-text search
Ø StringField: String indexed verbatim as a single token
Ø IntPoint: int indexed for exact/range queries.
Ø LongPoint: long indexed for exact/range queries.
Ø FloatPoint: float indexed for exact/range queries.
Ø DoublePoint: double indexed for exact/range queries.
Ø SortedDocValuesField: byte[] indexed column-wise for sorting/faceting
Ø SortedSetDocValuesField: SortedSet<byte[]> indexed column-wise for sorting/faceting
Ø NumericDocValuesField: long indexed column-wise for sorting/faceting
Ø SortedNumericDocValuesField: SortedSet<long> indexed column-wise for sorting/faceting
Ø StoredField: Stored-only value for retrieving in summary results

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DocValuesType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

// IKAnalyzer4Lucene7 and MyIntField are assumed to be available in the same package.
public class IndexWriteDemo {

	public static void main(String[] args) {
		// Create the analyzer to use
		Analyzer analyzer = new IKAnalyzer4Lucene7(true);

		// Index configuration object
		IndexWriterConfig config = new IndexWriterConfig(analyzer);

		try (
				// Index location
				// Store the index in the file system
				Directory directory = FSDirectory
						.open((new File("f:/test/indextest")).toPath());

				// Or store it in memory
				// Directory directory = new RAMDirectory();

				// Create the index writer
				IndexWriter writer = new IndexWriter(directory, config);) {

			// Prepare the document
			Document doc = new Document();
			// Product id: string, not indexed, but stored
			String prodId = "p0001";
			FieldType onlyStoredType = new FieldType();
			onlyStoredType.setTokenized(false);
			onlyStoredType.setIndexOptions(IndexOptions.NONE);
			onlyStoredType.setStored(true);
			onlyStoredType.freeze();
			doc.add(new Field("prodId", prodId, onlyStoredType));

			// Equivalent to the next line
			// doc.add(new StoredField("prodId", prodId));

			// Product name: string, tokenized and indexed (frequencies, positions, offsets), stored
			String name = "ThinkPad X1 Carbon 20KH0009CD/25CD 超极本轻薄笔记本电脑联想";
			// String name = "张三说的确实在理";
			// String name = "厉害了我的国，我终于等到你了";
			FieldType indexedAllStoredType = new FieldType();
			indexedAllStoredType.setStored(true);
			indexedAllStoredType.setTokenized(true);
			indexedAllStoredType.setIndexOptions(
					IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
			indexedAllStoredType.freeze();

			doc.add(new Field("name", name, indexedAllStoredType));

			// Image URL: stored only
			String imgUrl = "http://www.dongnao.com/aaa";
			doc.add(new Field("imgUrl", imgUrl, onlyStoredType));

			// Short description: text, tokenized and indexed (no phrase or proximity queries),
			// stored, with highlighting support in results
			String simpleIntro = "集成显卡 英特尔 酷睿 i5-8250U 14英寸";
			FieldType indexedTermVectorsStoredType = new FieldType();
			indexedTermVectorsStoredType.setStored(true);
			indexedTermVectorsStoredType.setTokenized(true);
			indexedTermVectorsStoredType
					.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
			indexedTermVectorsStoredType.setStoreTermVectors(true);
			indexedTermVectorsStoredType.setStoreTermVectorPositions(true);
			indexedTermVectorsStoredType.setStoreTermVectorOffsets(true);
			indexedTermVectorsStoredType.freeze();

			doc.add(new Field("simpleIntro", simpleIntro,
					indexedTermVectorsStoredType));

			// Category: string, indexed without tokenization, not stored,
			// supports category counts, multi-valued
			FieldType indexedDocValuesType = new FieldType();
			indexedDocValuesType.setTokenized(false);
			indexedDocValuesType.setIndexOptions(IndexOptions.DOCS);
			indexedDocValuesType.setDocValuesType(DocValuesType.SORTED_SET);
			indexedDocValuesType.freeze();

			// Multi-valued: add the same field name once per value
			doc.add(new Field("type", "电脑", indexedDocValuesType) {
				@Override
				public BytesRef binaryValue() {
					return new BytesRef((String) this.fieldsData);
				}
			});
			doc.add(new Field("type", "笔记本电脑", indexedDocValuesType) {
				@Override
				public BytesRef binaryValue() {
					return new BytesRef((String) this.fieldsData);
				}
			});

			// Equivalent to the next four lines
			// doc.add(new StringField("type", "电脑", Store.NO));
			// doc.add(new SortedSetDocValuesField("type", new BytesRef("电脑")));
			// doc.add(new StringField("type", "笔记本电脑", Store.NO));
			// doc.add(new SortedSetDocValuesField("type", new
			// BytesRef("笔记本电脑")));

			// Price: integer, in fen, stored, sortable via docValues.
			// Note: this type also sets IndexOptions.DOCS and a 1-D point,
			// so unlike the exercise text the value is indexed as well.
			int price = 999900;
			FieldType numericDocValuesType = new FieldType();
			numericDocValuesType.setTokenized(false);
			numericDocValuesType.setIndexOptions(IndexOptions.DOCS);
			numericDocValuesType.setStored(true);
			numericDocValuesType.setDocValuesType(DocValuesType.NUMERIC);
			numericDocValuesType.setDimensions(1, Integer.BYTES);
			numericDocValuesType.freeze();

			doc.add(new MyIntField("price", price, numericDocValuesType));

			// For storing and sorting, equivalent to the next two lines
			// (which do not index the value)
			// doc.add(new StoredField("price", price));
			// doc.add(new NumericDocValuesField("price", price));

			// Merchant: indexed (not tokenized), stored, supports faceted (category) queries
			String fieldName = "shop";
			String value = "联想官方旗舰店";
			doc.add(new StringField(fieldName, value, Store.YES));
			doc.add(new SortedDocValuesField(fieldName, new BytesRef(value)));

			// Listing time: numeric, needed for sorting
			long upShelfTime = System.currentTimeMillis();
			doc.add(new NumericDocValuesField("upShelfTime", upShelfTime));

			writer.addDocument(doc);

		} catch (IOException e) {
			e.printStackTrace();
		}

	}
}

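Question 9: Field provides so many setXXValue() methods. What are they for?
Question 10: When adding to the index, do we need to create a new Document for every data record?

Index Updates

IndexWriter index update API

Notes:

Ø Term: a term within a specified field.
Ø Delete flow: use the Term or Query to find the matching document ids and remove their index entries, then remove the corresponding stored documents by document id.
Ø Update flow: delete first, then add the new doc.
Ø Note: updates can only be based on indexed fields.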
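
The delete-then-add flow can be sketched in plain Java. This is a toy model, not IndexWriter itself: ToyUpdatableIndex is a hypothetical class, every field is treated as an indexed single-term field, and deletion is modeled as a "live" flag per document id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy index (hypothetical) where an update is a delete by term followed by an add. */
class ToyUpdatableIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final List<Boolean> live = new ArrayList<>();

    /** Adds a document and returns its new doc id. */
    int addDocument(Map<String, String> fields) {
        docs.add(new HashMap<>(fields));
        live.add(true);
        return docs.size() - 1;
    }

    /** Marks every live document whose field equals value as deleted; returns the count. */
    int deleteDocuments(String field, String value) {
        int deleted = 0;
        for (int docId = 0; docId < docs.size(); docId++) {
            if (live.get(docId) && value.equals(docs.get(docId).get(field))) {
                live.set(docId, false);
                deleted++;
            }
        }
        return deleted;
    }

    /** Deletes all documents matching the term, then adds the new document. */
    int updateDocument(String field, String value, Map<String, String> newDoc) {
        deleteDocuments(field, value);
        return addDocument(newDoc);
    }

    /** Returns the ids of documents that have not been deleted. */
    List<Integer> liveDocIds() {
        List<Integer> ids = new ArrayList<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            if (live.get(docId)) {
                ids.add(docId);
            }
        }
        return ids;
    }
}
```

Two consequences are worth noticing: the replacement document gets a new document id, and a term that matches several documents removes all of them while adding only one, so updates are normally keyed on a term that is unique per record.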

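import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// IKAnalyzer4Lucene7 is assumed to be available in the same package.
public class IndexUpdateDemo {

	public static void main(String[] args) {
		// Create the analyzer to use
		Analyzer analyzer = new IKAnalyzer4Lucene7(true);

		// Index configuration object
		IndexWriterConfig config = new IndexWriterConfig(analyzer);

		try (
				// Index location
				// Store the index in the file system
				Directory directory = FSDirectory
						.open((new File("f:/test/indextest")).toPath());

				// Or store it in memory
				// Directory directory = new RAMDirectory();

				// Create the index writer
				IndexWriter writer = new IndexWriter(directory, config);) {

			// The term that selects the documents to replace (must be an indexed field)
			// Term term = new Term("prodId", "p0001");
			Term term = new Term("type", "笔记本电脑");

			// Prepare the replacement document
			Document doc = new Document();
			// Product id: string, not indexed, but stored
			String prodId = "p0003";
			FieldType onlyStoredType = new FieldType();
			onlyStoredType.setTokenized(false);
			onlyStoredType.setIndexOptions(IndexOptions.NONE);
			onlyStoredType.setStored(true);
			onlyStoredType.freeze();
			doc.add(new Field("prodId", prodId, onlyStoredType));

			// Delete the documents matching the term, then add the new document
			writer.updateDocument(term, doc);

			// Term term = new Term("name", "笔记本电脑");
			// writer.deleteDocuments(term);

			writer.flush();

			writer.commit();
			System.out.println("Update finished.");

		} catch (IOException e) {
			e.printStackTrace();
		}

	}
}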

Reposted from: https://my.oschina.net/u/3728166/blog/3004711