Before Lucene 4.x, a Field was added to a Document like this:
Field field = new Field("filename", f.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED);
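Starting with 4.0 (and continuing in 5.0), the Field.Index parameter is deprecated. The recommended approach is to use the Field subclass that matches the kind of field you need, for example: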
Field pathField = new StringField("path", filetoIndex.getPath(),Field.Store.YES);
Here is what the Lucene source says about StringField:
/** A field that is indexed but not tokenized: the entire
* String value is indexed as a single token. For example
* this might be used for a 'country' field or an 'id'
* field, or any field that you intend to use for sorting
* or access through the field cache. */
public final class StringField extends Field
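In other words, a StringField is not run through the analyzer: its entire value is indexed as a single token. A search on such a field therefore matches only when the query equals the whole value.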
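To make the "whole value as one token" behavior concrete, here is a toy inverted index in plain Java. It is not Lucene code and uses no Lucene classes; the class name `UntokenizedFieldDemo` and its methods are hypothetical, chosen only to illustrate how a StringField-style field behaves at search time.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (NOT Lucene code) of an "indexed but not tokenized" field:
 * the entire value becomes a single term in a tiny inverted index.
 */
public class UntokenizedFieldDemo {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> index = new HashMap<>();

    /** Index the whole value as one term, as a StringField does. */
    public void add(int docId, String value) {
        index.computeIfAbsent(value, k -> new HashSet<>()).add(docId);
    }

    /** Exact term lookup: only a query equal to the whole value matches. */
    public Set<Integer> search(String term) {
        return index.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        UntokenizedFieldDemo country = new UntokenizedFieldDemo();
        country.add(1, "United States");
        country.add(2, "United Kingdom");
        System.out.println(country.search("United States")); // [1]
        System.out.println(country.search("United"));        // [] : a partial value never matches
        // No analyzer runs, so matching is case-sensitive as well:
        System.out.println(country.search("united states")); // []
    }
}
```

This is why StringField suits identifiers and values like a country code, where only exact matches make sense.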
/** A field that is indexed and tokenized, without term
* vectors. For example this would be used on a 'body'
* field, that contains the bulk of a document's text. */
public final class TextField extends Field
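That is the official description of TextField. To index a piece of text by splitting it into tokens, use this class. Its original value is not stored unless you ask for it, though; if you need the value stored, specify that through this constructor: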
/** Creates a new TextField with String value.
* @param name field name
* @param value string value
* @param store Store.YES if the content should also be stored
* @throws IllegalArgumentException if the field name or value is null.
*/
public TextField(String name, String value, Store store) {
super(name, value, store == Store.YES ? TYPE_STORED : TYPE_NOT_STORED);
}
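The following toy model (again plain Java, not Lucene's implementation) contrasts this with a TextField-style field: the value is split into tokens, and the original text is kept only when `Store.YES` is passed, echoing the `store == Store.YES ? TYPE_STORED : TYPE_NOT_STORED` choice in the constructor above. The lowercase-and-split `analyze` method is an assumption standing in for a real Lucene Analyzer, and `TokenizedFieldDemo` and its nested `Store` enum are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy model (NOT Lucene code) of a tokenized field whose original value
 * is kept only when Store.YES is passed.
 */
public class TokenizedFieldDemo {
    /** Hypothetical stand-in for Lucene's Field.Store. */
    public enum Store { YES, NO }

    /** Assumption: lowercase + split on non-letter/non-digit runs replaces a real Analyzer. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    private final Map<String, Set<Integer>> postings = new HashMap<>();   // token -> doc ids
    private final Map<Integer, String> storedValues = new HashMap<>();    // doc id -> original text

    public void add(int docId, String value, Store store) {
        // Indexing: every token points back to the document.
        for (String token : analyze(value)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
        // Storing is a separate decision: only stored values can be read back later.
        if (store == Store.YES) {
            storedValues.put(docId, value);
        }
    }

    /** Single-word query, analyzed the same way as the indexed text. */
    public Set<Integer> search(String word) {
        List<String> q = analyze(word);
        if (q.size() != 1) return Set.of();
        return postings.getOrDefault(q.get(0), Set.of());
    }

    /** Returns null when the field was indexed but not stored. */
    public String storedValue(int docId) {
        return storedValues.get(docId);
    }

    public static void main(String[] args) {
        TokenizedFieldDemo body = new TokenizedFieldDemo();
        body.add(1, "Lucene is a search library", Store.YES);
        body.add(2, "Apache Tika extracts text from PDF files", Store.NO);
        System.out.println(body.search("Search")); // [1] : a single word inside the text matches
        System.out.println(body.search("pdf"));    // [2]
        System.out.println(body.storedValue(1));   // Lucene is a search library
        System.out.println(body.storedValue(2));   // null : searchable, but not retrievable
    }
}
```

Document 2 can still be found by its words; it simply cannot be displayed from the index, which is the trade-off the `Store` argument controls.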
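TextField is meant for large bodies of plain text. Rich documents such as Word or PDF files yield only gibberish if indexed directly; parse them with Apache Tika first to extract their text.
Here is a Stack Overflow discussion of this API change: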
Question:
Until Lucene version 3.x, we could specify whether or not to index a field by using Field.Index.NO, Field.Index.ANALYZED, etc. But in Lucene 4.0 there is no constructor available in which we can define this. How do we control indexing in this version? I mean, if I want a field "name" to be stored in the index but don't want to index it, how can I do it in Lucene 4.0?
Answer:
Constructors taking Field.Index arguments are available, but are deprecated in 4.0, and should not be used. Instead, you should look to subclasses of Field to control how a field is indexed.
StringField is the standard un-analyzed indexed field. The field is indexed as a single token. It is appropriate for things like identifiers, for which you only need to search for exact matches.
TextField is the standard analyzed (and, of course, indexed) field, for textual content. It is an appropriate choice for full-text searching.
StoredField is a stored field that is not indexed at all (and so, is not searchable).
Except StoredField, each of these can be passed a Field.Store value as a constructor argument, similar to Lucene 3.6.
For more information on this change, take a look at the Lucene Migration Guide, particularly the section titled "Separate IndexableFieldType from Field instances".
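As a quick reference for porting 3.x code, the answer's mapping can be written down as a small lookup. `FieldMigration` is a hypothetical helper, not part of Lucene: it only returns the Lucene 4.x+ constructor call as a string. Its enums are simplified stand-ins for the 3.x `Field.Index` and `Field.Store` (the `*_NO_NORMS` variants are left out), and the mapping is approximate, since details such as norms and term vectors are ignored.

```java
/**
 * Hypothetical migration helper (NOT part of Lucene): maps an old 3.x
 * Field.Index / Field.Store choice to the 4.x+ field class recommended
 * in the answer above, returned as source text.
 */
public class FieldMigration {
    // Simplified copies of the 3.x enums; *_NO_NORMS variants omitted.
    public enum Index { NO, ANALYZED, NOT_ANALYZED }
    public enum Store { YES, NO }

    public static String replacement(Index index, Store store) {
        String storeArg = "Field.Store." + store;   // e.g. "Field.Store.YES"
        switch (index) {
            case ANALYZED:       // tokenized full text
                return "new TextField(name, value, " + storeArg + ")";
            case NOT_ANALYZED:   // whole value as one token
                return "new StringField(name, value, " + storeArg + ")";
            case NO:
            default:
                // Not indexed: only storing makes sense, so StoredField takes no Store argument.
                if (store == Store.YES) {
                    return "new StoredField(name, value)";
                }
                throw new IllegalArgumentException(
                        "a field that is neither indexed nor stored has no effect");
        }
    }

    public static void main(String[] args) {
        // The 3.x call from the top of this post used Store.YES + Index.NOT_ANALYZED:
        System.out.println(replacement(Index.NOT_ANALYZED, Store.YES));
        // prints: new StringField(name, value, Field.Store.YES)
    }
}
```

For the call at the top of this post (Store.YES with Index.NOT_ANALYZED), it yields the StringField form, matching the replacement shown there; the question's "stored but not indexed" case maps to StoredField.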
That should clear up most of the confusion around this API change.