Learning Solr from the Official Docs (2): Document and Field Explained

peppengliu

Article link: https://blog.csdn.net/peppengliu/article/details/51483792

The previous post (http://blog.csdn.net/peppengliu/article/details/51463918) covered installing a Solr test environment. This post looks at documents and fields in Solr.

The sections below introduce document, field, and the related concepts:

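document: a document is the basic unit of indexing. It is a collection of descriptions of the data to be indexed, and it is made up of fields. A simple document looks like this: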

{"id":"138761234112","goods_name":"product","value":12}
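Solr's update handler also accepts documents in its XML update format. As a rough sketch, the same document could be written as follows; the field names are taken from the JSON above, and whether they are accepted depends on the fields declared in your schema.

```xml
<!-- Sketch: the sample document expressed as a Solr XML update message.
     Each <field> element carries one field value of the document. -->
<add>
  <doc>
    <field name="id">138761234112</field>
    <field name="goods_name">product</field>
    <field name="value">12</field>
  </doc>
</add>
```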

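field: a component of a document that describes the data to be indexed in more detail. The schema defines the data type of every field in a document with a <field> element. It has two required attributes, name and type, plus a number of optional ones: name is the field name of the indexed data, and type is its data type. The optional attributes are described below (excerpted from the official wiki):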

<strong>default</strong>
The default value for this field if none is provided while adding documents
<strong>indexed=true|false</strong>
True if this field should be "indexed". If (and only if) a field is indexed, then it is searchable, sortable, and facetable.
<strong>stored=true|false</strong>
True if the value of the field should be retrievable during a search, or if you're using highlighting or MoreLikeThis.
<strong>compressed=true|false</strong>
True if this field should be stored using gzip compression. (This will only apply if the field type is compressible; among the standard field types, only TextField and StrField are.)
<strong>compressThreshold=<integer></strong>
<strong>multiValued=true|false</strong>
True if this field may contain multiple values per document, i.e. if it can appear multiple times in a document
<strong>omitNorms=true|false</strong>
This is arguably an advanced option.
Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
<strong>termVectors=false|true</strong> (Solr 1.1)
If set, include full term vector info.
If enabled, often also used with termPositions="true" and termOffsets="true".
To use interactively, requires TermVectorComponent
Corresponds to TV button in Luke, and V field attribute.
<strong>omitTermFreqAndPositions=true|false</strong> (Solr 1.4)
If set, omit term freq, positions and payloads from postings for this field. This can be a performance boost for fields that don't require that information and reduces storage space required for the index. Queries that rely on position that are issued on a field with this option fail with an exception. Prior to Solr 4.0 the queries would silently fail to find documents.
<strong>omitPositions=true|false</strong> (Solr 3.4)
If set, omits positions, but keeps term frequencies
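To make the attributes above concrete, here is a minimal sketch of <field> declarations for the sample document's three fields. The type names "string" and "int" are assumptions: they must match <fieldType> definitions that exist in your own schema.xml, and the attribute choices are illustrative rather than recommended settings.

```xml
<!-- Sketch only: fragment of schema.xml, not a complete file.
     "string" and "int" are assumed to be declared as <fieldType> elsewhere. -->

<!-- Indexed so it can be searched, stored so it comes back in results -->
<field name="id" type="string" indexed="true" stored="true"/>

<!-- omitNorms="true": length normalization is not needed for this field -->
<field name="goods_name" type="string" indexed="true" stored="true" omitNorms="true"/>

<!-- default="0": value used when a document does not supply this field -->
<field name="value" type="int" indexed="true" stored="true" default="0"/>
```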

Fields fall into the following kinds: defined fields (declared with field), copyField (declared with copyField), and dynamicField (declared with dynamicField).
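The three kinds can be sketched side by side as below. The field names "text" and the "*_i" pattern are illustrative assumptions, as are the type names, which must exist as <fieldType> definitions in your schema.

```xml
<!-- Sketch only: fragment of schema.xml. -->

<!-- Defined field: declared explicitly by name -->
<field name="goods_name" type="string" indexed="true" stored="true"/>

<!-- Hypothetical catch-all field used as a copy destination -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- copyField: the value of goods_name is also copied into text at index time -->
<copyField source="goods_name" dest="text"/>

<!-- dynamicField: any field whose name ends in _i (for example stock_i)
     is handled with this definition without being declared individually -->
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
```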

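field analysis: the analyzer for field values, which defines how the value of a field is analyzed. It needs to be defined whenever a field requires extra processing, such as tokenization or filtering. A typical configuration: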

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

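This configuration defines a field type named text_general. When a field's type is text_general, the analyzers defined in this element are automatically applied to that field's values: the index analyzer when documents are added, and the query analyzer when queries are parsed.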
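As a sketch of how a field picks up this type, the declaration below binds a hypothetical field named goods_desc to text_general. The comment traces one input through the index-time chain, under the assumption that stopwords.txt contains the word "the"; with a different stop word list the result would differ.

```xml
<!-- Sketch only: goods_desc is a hypothetical field name. -->
<field name="goods_desc" type="text_general" indexed="true" stored="true"/>

<!-- Index-time analysis of the value "The Quick Product",
     assuming stopwords.txt lists "the":
       1. StandardTokenizerFactory splits it into: The, Quick, Product
       2. StopFilterFactory (ignoreCase="true") drops: The
       3. LowerCaseFilterFactory yields the indexed terms: quick, product -->
```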

All of the configuration above lives in schema.xml. For the other configuration options, see: http://wiki.apache.org/solr/SchemaXml

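This post draws mainly on http://wiki.apache.org/solr/SchemaXml and https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide. If anything in this summary is incomplete or wrong, corrections in the comments are welcome.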