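The previous post (http://blog.csdn.net/peppengliu/article/details/51463918) covered installing a Solr test environment. This post looks at documents and fields in Solr.
The sections below introduce document, field, and the related concepts:
document: A document is the basic unit of indexing. It is the set of data describing one item to be indexed, and it is made up of fields. A simple example document: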
{"id":"138761234112","goods_name":"product","value":12}
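For comparison, the same document could also be written in Solr's XML update format, where each field of the document becomes its own element. This is only a sketch of the message body; how it is posted to Solr (URL, core name) is not covered here.

```xml
<!-- The sample document above, expressed as an XML update message -->
<add>
  <doc>
    <field name="id">138761234112</field>
    <field name="goods_name">product</field>
    <field name="value">12</field>
  </doc>
</add>
```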
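field: A field is a component of a document and describes one piece of the data to be indexed in more detail. The data type of each field in a document is declared with a <field> element, which has two required attributes, name and type, plus a number of optional attributes. name is the name of the field in the indexed data, and type is the data type used for that field. The optional attributes are as follows (excerpted from the official wiki):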
<strong>default</strong>
The default value for this field if none is provided while adding documents
<strong>indexed=true|false</strong>
True if this field should be "indexed". If (and only if) a field is indexed, then it is searchable, sortable, and facetable.
<strong>stored=true|false</strong>
True if the value of the field should be retrievable during a search, or if you're using highlighting or MoreLikeThis.
<strong>compressed=true|false</strong>
True if this field should be stored using gzip compression. (This will only apply if the field type is compressible; among the standard field types, only TextField and StrField are.)
<strong>compressThreshold=<integer></strong>
<strong>multiValued=true|false</strong>
True if this field may contain multiple values per document, i.e. if it can appear multiple times in a document
<strong>omitNorms=true|false</strong>
This is arguably an advanced option.
Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
<strong>termVectors=false|true</strong> (Solr 1.1)
If set, include full term vector info.
If enabled, often also used with termPositions="true" and termOffsets="true".
To use interactively, requires TermVectorComponent
Corresponds to TV button in Luke, and V field attribute.
<strong>omitTermFreqAndPositions=true|false</strong> (Solr 1.4)
If set, omit term freq, positions and payloads from postings for this field. This can be a performance boost for fields that don't require that information and reduces storage space required for the index. Queries that rely on position that are issued on a field with this option fail with an exception. Prior to Solr 4.0 the queries would silently fail to find documents.
<strong>omitPositions=true|false</strong> (Solr 3.4)
If set, omits positions, but keeps term frequencies.
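As a rough illustration of the attributes above, the sample document shown earlier could be described by field declarations along these lines. This is a sketch rather than something taken from the wiki: the type names string and int are assumed to be defined as fieldType elements elsewhere in the same schema.xml (as in the example schema shipped with Solr), and the attribute values chosen here are hypothetical.

```xml
<!-- Hypothetical declarations for the sample document; "string" and "int"
     are assumed to be fieldTypes defined elsewhere in this schema.xml -->

<!-- identifier: indexed so it can be searched, stored so it is returned in results -->
<field name="id" type="string" indexed="true" stored="true"/>

<!-- product name: searchable and retrievable, one value per document -->
<field name="goods_name" type="string" indexed="true" stored="true" multiValued="false"/>

<!-- numeric value: the default applies when a document does not supply the field -->
<field name="value" type="int" indexed="true" stored="true" default="0"/>
```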
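Fields fall into the following categories: defined fields (declared with field), copyField (declared with copyField), and dynamicField (declared with dynamicField).
field analysis: The field analyzer defines how the values of a field are analyzed. It needs to be defined whenever a field requires extra processing such as tokenization or filtering. A typical configuration: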
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
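This configuration defines a field type named text_general. When a field's type is text_general, Solr automatically processes that field's values with the analyzers defined in this element: the index analyzer when documents are indexed, and the query analyzer when queries are parsed.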
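To connect text_general with the three field categories mentioned earlier, a minimal sketch of each kind of declaration might look like the following. The names goods_desc and text_all and the *_txt pattern are made up for illustration, and the attribute values are assumptions rather than recommendations.

```xml
<!-- explicit field whose values are analyzed with text_general -->
<field name="goods_desc" type="text_general" indexed="true" stored="true"/>

<!-- dynamic field: any field whose name ends in _txt gets this definition -->
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>

<!-- catch-all search field; multiValued because several sources are copied into it -->
<field name="text_all" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- copyField: copy the incoming value of each source field into text_all at index time -->
<copyField source="goods_name" dest="text_all"/>
<copyField source="goods_desc" dest="text_all"/>
```

With declarations like these, a query against text_all would match terms that came from either source field, while the source fields keep their own types.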
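All of the configuration above lives in schema.xml. For the other configuration options, see http://wiki.apache.org/solr/SchemaXml
This post draws mainly on http://wiki.apache.org/solr/SchemaXml and https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide. If anything in this summary is missing or wrong, corrections in the comments are welcome.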