Learning Solr from the Official Docs (2): Document and Field Explained

        The previous post (http://blog.csdn.net/peppengliu/article/details/51463918) covered installing a Solr test environment. This post looks at documents and fields in Solr.

        The sections below introduce document, field, and the related concepts:

        document: a document is the basic unit of the index. It is a collection of descriptions of the data to be indexed, and it is made up of fields. A simple example document:

        

{"id":"138761234112","goods_name":"product","value":12}

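        field: a component of a document that describes the data to be indexed in more detail. Each field in a document is declared with a <field> element, which sets the field's data type. The element has two required attributes, name and type, plus a number of optional attributes. name is the field name of the indexed data and type is its data type. The optional attributes are described below (excerpted from the official wiki):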

<strong>default</strong>
The default value for this field if none is provided while adding documents
<strong>indexed=true|false</strong>
True if this field should be "indexed". If (and only if) a field is indexed, then it is searchable, sortable, and facetable.
<strong>stored=true|false</strong>
True if the value of the field should be retrievable during a search, or if you're using highlighting or MoreLikeThis.
<strong>compressed=true|false</strong>
True if this field should be stored using gzip compression. (This will only apply if the field type is compressible; among the standard field types, only TextField and StrField are.)
<strong>compressThreshold=<integer></strong>
<strong>multiValued=true|false</strong>
True if this field may contain multiple values per document, i.e. if it can appear multiple times in a document
<strong>omitNorms=true|false</strong>
This is arguably an advanced option.
Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
<strong>termVectors=false|true</strong> (Solr 1.1 and later)
If set, include full term vector info.
If enabled, often also used with termPositions="true" and termOffsets="true".
To use interactively, requires TermVectorComponent
Corresponds to TV button in Luke, and V field attribute.
<strong>omitTermFreqAndPositions=true|false</strong> (Solr 1.4 and later)
If set, omit term freq, positions and payloads from postings for this field. This can be a performance boost for fields that don't require that information and reduces storage space required for the index. Queries that rely on position and are issued on a field with this option fail with an exception. Prior to Solr 4.0, such queries would silently fail to find documents.
<strong>omitPositions=true|false</strong> (Solr 3.4 and later)
If set, omits positions, but keeps term frequencies
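
        As a rough illustration of the attributes above, the sample document shown earlier could be described by field declarations along these lines. This is only a sketch: the type names string and int are assumptions and must match <fieldType> entries defined elsewhere in your schema.xml, and the attribute values are just one plausible choice.

```xml
<!-- Hypothetical declarations for the sample document's three fields. -->
<!-- "string" and "int" are assumed to exist as fieldType names in this schema. -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="goods_name" type="string" indexed="true" stored="true"/>
<!-- default is used when a document is added without a "value" field -->
<field name="value" type="int" indexed="true" stored="true" default="0"/>
```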
        Fields fall into the following categories: defined fields (declared with field), copyField (declared with copyField), and dynamicField (declared with dynamicField).
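
        To make the three categories concrete, here is a minimal sketch of how they might sit side by side in schema.xml. The field names, the catch-all text field, and the *_s pattern are all hypothetical, and string and text_general are assumed to be fieldType names defined in the same schema.

```xml
<!-- Defined field: declared explicitly by name. -->
<field name="goods_name" type="string" indexed="true" stored="true"/>

<!-- Hypothetical catch-all field that receives copied values for searching. -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

<!-- copyField: at index time, the value of goods_name is also copied into text. -->
<copyField source="goods_name" dest="text"/>

<!-- dynamicField: applies to any field whose name matches the pattern, e.g. brand_s. -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```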

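        field analysis: the field value analyzer, which defines how the values in a field are analyzed. It needs to be defined when a field requires extra processing (such as tokenization or filtering). A typical configuration: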

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

        This configuration defines a data type named text_general. When a field's type is text_general, the field's values are automatically processed by the analyzers defined in this element.
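
        For instance, a field could opt into this type as sketched below. The field name goods_desc is hypothetical and is not part of the sample document above.

```xml
<!-- Hypothetical field: its values go through the index-time and query-time analyzers of text_general. -->
<field name="goods_desc" type="text_general" indexed="true" stored="true"/>
```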


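        All of the configuration above lives in schema.xml. For the other configuration options, see: http://wiki.apache.org/solr/SchemaXml

        This post is based mainly on http://wiki.apache.org/solr/SchemaXml and https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide. If anything in this summary is incomplete or wrong, corrections in the comments are welcome.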

        
