[Lucene] Using Document and Field More Sensibly

Latest recommended article published on 2021-08-02 11:33:16

iteye_5013

Categories: [Work] [**Search Engine] [**Architecture Design / Design Patterns] Tags: database, java

Article link: https://blog.csdn.net/iteye_5013/article/details/82301798

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

writer = ...; // the Lucene IndexWriter (construction elided in the original)
PreparedStatement pstmt = conn.prepareStatement(selectSql);
ResultSet rs = pstmt.executeQuery();
while (rs.next()) {
	// A new Document, and new Field objects, are created for every row.
	Document doc = new Document();
	// Each database column is mapped one-to-one onto a Field.
	doc.add(new Field(ConstantsUtil.ROW_ID, rs.getString("rowid"), Field.Store.YES, Field.Index.UN_TOKENIZED));
	doc.add(new Field(ConstantsUtil.FD_COMMAND_ID, String.valueOf(rs.getLong(ConstantsUtil.DB_COMMAND_ID)),
			Field.Store.YES, Field.Index.UN_TOKENIZED));
	// Optional columns are read once and indexed only when the row has a value.
	String destId = rs.getString(ConstantsUtil.DB_DEST_ID);
	if (destId != null) {
		doc.add(new Field(ConstantsUtil.FD_DEST_ID, destId, Field.Store.YES, Field.Index.TOKENIZED));
	}
	String srcId = rs.getString(ConstantsUtil.DB_SRC_ID);
	if (srcId != null) {
		doc.add(new Field(ConstantsUtil.FD_SRC_ID, srcId, Field.Store.YES, Field.Index.TOKENIZED));
	}
	doc.add(new Field(ConstantsUtil.FD_UP_MSG_ID, String.valueOf(rs.getLong(ConstantsUtil.DB_UP_MSG_ID)),
			Field.Store.YES, Field.Index.UN_TOKENIZED));
	// Dates are indexed as strings with minute resolution.
	doc.add(new Field(ConstantsUtil.FD_CREATED_DATE,
			DateTools.dateToString(rs.getTimestamp(ConstantsUtil.DB_CREATED_DATE), DateTools.Resolution.MINUTE),
			Field.Store.YES, Field.Index.UN_TOKENIZED));
	String stationId = rs.getString(ConstantsUtil.DB_STATION_ID);
	if (stationId != null) {
		doc.add(new Field(ConstantsUtil.FD_STATION_ID, stationId, Field.Store.YES, Field.Index.UN_TOKENIZED));
	}
	// Add the finished document to the index.
	writer.addDocument(doc);
}

The design and code above have several problems:

1. A new Document is instantiated for every row of the ResultSet. "How to make indexing faster" recommends reusing Document and Field instances. (Concern: performance?)

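2. Each database column maps to exactly one Field: the required columns are simply turned into Fields and added to the Document (written without really understanding Document, Field, or full-text search: is a Lucene Field the same thing as a database column?), and each Field's value is taken directly from the database value.

What if other database columns need to be added to the index later? Following the approach above, the only option is to construct a Field for each new column and add it to the Document, which means modifying this code to add another doc.add(field). (Concerns: flexibility when modifying, extensibility when new requirements arrive?)

Likewise, if a Field's value has to be derived from the raw database value (for example, by taking only part of a number), or a new requirement changes how an existing value is obtained, the code above must be modified again. (Concern: flexibility when modifying.)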

Is a Lucene Field the same thing as a database column?

No.

Re-use Document and Field instances

http://www.lucidimagination.com/search/link?url=http://wiki.apache.org/lucene-java/HowTo wrote:

As of Lucene 2.3 there are new setValue(...) methods that allow you to change the value of a Field. This allows you to re-use a single Field instance across many added documents, which can save substantial GC cost. It's best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and re-use them by changing their values for each added document. For example you might have an idField, bodyField, nameField, storedField1, etc. After the document is added, you then directly change the Field values (idField.setValue(...), etc), and then re-add your Document instance.

Note that you cannot re-use a single Field instance within a Document, and, you should not change a Field's value until the Document containing that Field has been added to the index. See Field for details.
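
The reuse pattern quoted above can be sketched roughly as follows. To keep the example runnable without the Lucene jar, Field, Document and Writer here are small stand-in classes written for this illustration (assumptions, not the real Lucene types); only the shape of the loop is the point. With the real library, the equivalent calls would be Field.setValue(...) and IndexWriter.addDocument(doc).

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for Lucene's Field, Document and IndexWriter (illustrative only).
final class ReuseSketch {

    /** Minimal stand-in for a Field whose value can be changed via setValue. */
    static final class Field {
        final String name;
        private String value;
        Field(String name, String value) { this.name = name; this.value = value; }
        void setValue(String value) { this.value = value; }
        String stringValue() { return value; }
    }

    /** Minimal stand-in for a Document: an ordered list of fields. */
    static final class Document {
        final List<Field> fields = new ArrayList<>();
        void add(Field f) { fields.add(f); }
    }

    /** Stand-in for the index writer: copies the current field values on add. */
    static final class Writer {
        final List<String> indexed = new ArrayList<>();
        void addDocument(Document doc) {
            StringBuilder sb = new StringBuilder();
            for (Field f : doc.fields) {
                if (sb.length() > 0) sb.append(',');
                sb.append(f.name).append('=').append(f.stringValue());
            }
            indexed.add(sb.toString());
        }
    }

    /** Indexes rows of {rowId, commandId} using one Document and two Field instances. */
    static List<String> indexAll(String[][] rows) {
        Writer writer = new Writer();
        // Created once, outside the loop, and held onto for reuse.
        Field rowIdField = new Field("rowId", "");
        Field commandIdField = new Field("commandId", "");
        Document doc = new Document();
        doc.add(rowIdField);
        doc.add(commandIdField);
        for (String[] row : rows) {
            // Only the values change per row; no Document or Field is allocated here.
            rowIdField.setValue(row[0]);
            commandIdField.setValue(row[1]);
            // The values are changed again only after the document has been added.
            writer.addDocument(doc);
        }
        return writer.indexed;
    }

    public static void main(String[] args) {
        for (String entry : indexAll(new String[][] {{"r1", "100"}, {"r2", "200"}})) {
            System.out.println(entry);
        }
    }
}
```

Compared with the original loop, the two new-object allocations per row (one Document, several Fields) move out of the loop, which is where the wiki expects the GC savings to come from.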

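Given the original design, how should the following situations be handled?

1. We do not want this many Fields; that is, Fields should not map one-to-one onto database columns. For a content field, for example, the values of several database columns should be combined into a single field.

2. A Field's value cannot be taken directly from the database and needs some processing first (formatting, or something business-specific).

3. A requirement changes: the way a Field's value is obtained must be modified, for example taking the first 6 digits of a number instead of the first 4.

4. A new column must be indexed: the value of a database column that was not previously in the index now needs to be indexed.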
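
One possible way to approach the four situations above, offered only as a sketch, is to describe each index field as a name plus a rule that derives its value from a row, and keep those descriptions in a list. Everything below is hypothetical: FieldDef, fieldDefs, toFields, the column names and the sample values are made up for illustration, a row is modelled as a Map of column name to string, and the result is a plain map of field name to value rather than a Lucene Document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class FieldMappingSketch {

    /** One index field: a name plus a rule deriving its value from a database row. */
    static final class FieldDef {
        final String name;
        final Function<Map<String, String>, String> extractor;
        FieldDef(String name, Function<Map<String, String>, String> extractor) {
            this.name = name;
            this.extractor = extractor;
        }
    }

    /** Returns the first n characters of s, or all of s if it is shorter. */
    static String prefix(String s, int n) {
        return s.length() <= n ? s : s.substring(0, n);
    }

    /** Hypothetical field list; column names are illustrative, not from the original schema. */
    static List<FieldDef> fieldDefs(int prefixLength) {
        List<FieldDef> defs = new ArrayList<>();
        // Situation 1: several columns merged into one "content" field
        // (this sketch assumes both columns are non-null).
        defs.add(new FieldDef("content", row -> row.get("src_id") + " " + row.get("dest_id")));
        // Situations 2 and 3: a derived value; going from 4 to 6 digits is a parameter change.
        defs.add(new FieldDef("msgPrefix", row -> prefix(row.get("up_msg_id"), prefixLength)));
        // Situation 4: indexing one more column means adding one more entry.
        defs.add(new FieldDef("stationId", row -> row.get("station_id")));
        return defs;
    }

    /** Applies every definition to the row; fields whose value is null are skipped. */
    static Map<String, String> toFields(Map<String, String> row, List<FieldDef> defs) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (FieldDef def : defs) {
            String value = def.extractor.apply(row);
            if (value != null) {
                fields.put(def.name, value);
            }
        }
        return fields;
    }

    /** A made-up row without a station_id value. */
    static Map<String, String> sampleRow() {
        Map<String, String> row = new HashMap<>();
        row.put("src_id", "13800000000");
        row.put("dest_id", "10086");
        row.put("up_msg_id", "1234567890");
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toFields(sampleRow(), fieldDefs(4)));
        System.out.println(toFields(sampleRow(), fieldDefs(6)));
    }
}
```

With this shape, the loop that builds documents no longer names individual columns, so the changes listed above touch only the list of definitions. Whether that is the right trade-off for this project is left open here.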

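TODO: Think about the questions above in light of Section 3.1, on the maintainability of software systems, in 《Java与模式》 (Java and Patterns).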

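Under the original design, the value of a Field is derived from one row of database data inside a method of this class (getValue(bean)), which returns the processed result. To change a field's value, that method has to be modified; to add a field, you have to add document.add(new Field(name, getValue(), xx)) and also add the corresponding getValue method for obtaining its value.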
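
To make the cost of that approach concrete, here is a minimal sketch of what it might look like. RowBean, the getXxxValue methods and the field names are hypothetical, and a LinkedHashMap stands in for the Lucene Document so the example runs on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class GetValueSketch {

    /** Hypothetical bean holding one database row. */
    static final class RowBean {
        final String rowId;
        final long upMsgId;
        RowBean(String rowId, long upMsgId) {
            this.rowId = rowId;
            this.upMsgId = upMsgId;
        }
    }

    /** Value for the "rowId" field: taken from the row unchanged. */
    static String getRowIdValue(RowBean bean) {
        return bean.rowId;
    }

    /** Value for the "upMsgPrefix" field: the first 4 digits. Moving to 6 means editing this method. */
    static String getUpMsgPrefixValue(RowBean bean) {
        String digits = String.valueOf(bean.upMsgId);
        return digits.length() <= 4 ? digits : digits.substring(0, 4);
    }

    /** Stand-in for building a Document: field name mapped to field value. */
    static Map<String, String> buildDocument(RowBean bean) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("rowId", getRowIdValue(bean));
        doc.put("upMsgPrefix", getUpMsgPrefixValue(bean));
        // A new field needs a new line here AND a new getXxxValue method above.
        return doc;
    }

    public static void main(String[] args) {
        // prints {rowId=AAA1, upMsgPrefix=1234}
        System.out.println(buildDocument(new RowBean("AAA1", 1234567890L)));
    }
}
```

Every change of requirement lands in this one class, in two places at once, which is the maintainability question the TODO above points at.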