Solr 6.4.1 Configuration File Details

1. In solr-home/my_core/conf/, make a copy of the file named managed-schema and name the copy schema.xml.

2. Edit solrconfig.xml in solr-home/my_core/conf/:
After <codecFactory class="solr.SchemaCodecFactory"/>, add the following line (this turns off managed-schema mode):

 <schemaFactory class="ClassicIndexSchemaFactory"/>
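
For orientation, here is a minimal sketch of how the relevant part of solrconfig.xml might look after the edit. The surrounding <config> and <luceneMatchVersion> elements are shown only as assumed context; a real solrconfig.xml contains many more elements.

```xml
<config>
  <luceneMatchVersion>6.4.1</luceneMatchVersion>
  <!-- existing line -->
  <codecFactory class="solr.SchemaCodecFactory"/>
  <!-- added line: read the hand-edited schema.xml instead of managed-schema -->
  <schemaFactory class="ClassicIndexSchemaFactory"/>
</config>
```

With ClassicIndexSchemaFactory, schema changes are made by editing schema.xml and reloading the core rather than through the Schema API.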

3. Edit schema.xml. The details follow:

Field:

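<!-- field attributes:
     name: mandatory - the name of the field
     type: mandatory - the name of a field type defined in <types>
     indexed: true if this field should be indexed (searchable or sortable)
     stored: true if this field's content should be retrievable
     docValues: true if this field should have doc values. Doc values are
           useful for faceting, grouping, sorting and function queries.
           Although not required, and although they make the index larger and
           slower to build, doc values make the index faster to load, more
           NRT-friendly and more memory-efficient.
           They come with some limitations: they are currently only supported
           by StrField, UUIDField and all Trie*Fields, and, depending on the
           field type, they might require the field to be single-valued,
           be required or have a default value.
     multiValued: true if this field may contain multiple values per document
     termVectors: [false] set to true to store the term vector for the given field.
           When using MoreLikeThis, fields used for similarity should be
           stored for best performance.
     termPositions: store position information with the term vector; this increases storage costs
     termOffsets: store offset information with the term vector; this increases storage costs
     required: the field must have a value, otherwise an exception is thrown
     default: a default value to use if none is supplied when a document is added
   -->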

For example:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
<field name="weight" type="float" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<field name="popularity" type="int" indexed="true" stored="true" />
<field name="inStock" type="boolean" indexed="true" stored="true" />
<field name="store" type="location" indexed="true" stored="true"/>
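
As an illustration of the docValues attribute described in the comment above, a field intended only for sorting and faceting could be declared along the following lines. The field name manu_exact is a hypothetical addition, not part of the listing above.

```xml
<!-- hypothetical field: not returned in results, only used for sorting and faceting -->
<field name="manu_exact" type="string" indexed="true" stored="false" docValues="true"/>
```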

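Notes:
1. Fields such as <field name="_version_" type="long" indexed="true" stored="true"/>, whose names begin and end with an underscore, are reserved fields (for example _version_ and _root_).
2. Declaring a <uniqueKey> is strongly recommended; almost every Solr installation has one. In the example schema, <uniqueKey> is set to "id", which is backed by the following field definition: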

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
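
The <uniqueKey> declaration itself is a separate element in schema.xml that names this field. A minimal sketch, assuming the key field is the id field shown above:

```xml
<!-- placed at the top level of <schema>, outside any <field> element -->
<uniqueKey>id</uniqueKey>
```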

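3. Field definitions matter a great deal, and a few tips are worth noting: for any field that may hold multiple values, set multiValued to true so that indexing does not throw an error; if a field's value does not need to be returned, set stored to false.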

dynamicField (dynamic fields)

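One of the powerful features of Lucene is that you do not have to predefine every field when you first create your index. Although Solr provides strong data typing for fields, it preserves that flexibility through "dynamic fields". With <dynamicField> declarations you create rules that tell Solr which data type to use whenever it is given a field name that is not explicitly defined but matches a prefix or suffix used in a dynamicField.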

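For example, the following dynamic field declaration tells Solr that whenever it sees a field name ending in "_i" that is not an explicitly defined field, it should dynamically create an integer field with that name: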

<dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>

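The glob-like pattern in the name attribute must contain a single "*", either at the start or at the end. Longer patterns are matched first. If two patterns of equal length both match, the one that appears first in the schema is used.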
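
To illustrate the matching rules, here is a small sketch with two overlapping patterns. The second pattern and the field names in the comments are hypothetical examples; the tint type is the one defined in the FieldType section of this article.

```xml
<dynamicField name="*_i"       type="int"  indexed="true" stored="true"/>
<dynamicField name="*_count_i" type="tint" indexed="true" stored="true"/>
<!-- "age_i"        matches only *_i, so it becomes an int field -->
<!-- "view_count_i" matches both patterns; the longer one, *_count_i, wins -->
```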

uniqueKey (unique identifier)

The unique identifier of a document, comparable to a primary key. Unless the field is marked required="false", its value must not be empty.

copyField (copy fields)

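Any number of <copyField> declarations can be included in your schema to instruct Solr to duplicate whatever data it sees in the "source" field of a document being added to the index into the "dest" field of that document. You are responsible for ensuring that the data types of the fields are compatible. The original text is sent from the "source" field to the "dest" field before any analyzers configured for the source or destination field are invoked.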

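This is a convenient way to put the same data into several fields without including it multiple times in the update command. The copy is done at the stream source level, and no copy feeds into another copy. The maxChars attribute may be used in a copyField declaration to limit the number of characters copied. For example: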

<copyField source="body" dest="teaser" maxChars="300"/>

A common requirement is to copy or merge all input fields into a single Solr field. This can be done as follows:

<copyField source="*" dest="text"/>

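You can also generate new field names automatically by including an asterisk in both the source and the destination field. For example, if you have the following copyField directive: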

<copyField source="*_t" dest="*_t_facet" />

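and then submit a field called author_t, the value of that field is also copied to another field called author_t_facet. Here "author" is the text matched by the asterisk in the source attribute; it replaces the asterisk in the dest attribute *_t_facet, which acts as a template for the destination field name.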
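
As a concrete sketch of the author_t case, an XML update message like the one below would populate both fields. It assumes the schema defines fields or dynamic fields matching *_t and *_t_facet; the id and author values are made up for illustration.

```xml
<add>
  <doc>
    <field name="id">book-1</field>
    <!-- copied verbatim into author_t_facet by the *_t copyField rule -->
    <field name="author_t">Jane Doe</field>
  </doc>
</add>
```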

It is advisable to create one copy field that gathers all full-text fields into a single field, so that they can be searched together:

<field name="all" type="text" indexed="true" stored="false" multiValued="true"/>

Then add the corresponding copyField declarations:

<copyField source="name" dest="all"/>
<copyField source="summary" dest="all"/>

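Note: with a copy field, a query no longer has to name each field, as in userName:伟帅 OR userProfile:伟帅.
Entering just 伟帅 returns documents whose name contains 伟帅, whose profile contains 伟帅, or both.
The content to be searched is gathered into a single field, and that field only needs to be set as the default search field.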
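
One way to make the combined field the default search field is the df request parameter, set as a default on the search handler in solrconfig.xml. This is a sketch assuming the combined field is the all field defined above and that queries go through /select; an actual handler definition usually carries more defaults.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- unqualified query terms are searched in the "all" field -->
    <str name="df">all</str>
  </lst>
</requestHandler>
```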

See also Copying Fields (https://cwiki.apache.org/confluence/display/solr/Copying+Fields) in the Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide).

FieldType (field types)

Type definitions:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<!-- boolean type: "true" or "false" -->
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

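The following numeric field types index each value at several levels of precision, which speeds up range queries when the number of values between the range endpoints is large. See the javadoc for NumericRangeQuery for internal implementation details.
Smaller precisionStep values (specified in bits) lead to more tokens indexed per value, a slightly larger index, and faster range queries.
A precisionStep of 0 disables indexing at different precision levels.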

<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
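
A field that is frequently used in range queries could use one of these types. The sketch below reuses the popularity field name from the earlier listing purely as an example; the range in the comment is arbitrary.

```xml
<!-- precisionStep="8" indexes extra lower-precision terms to speed up range queries -->
<field name="popularity" type="tint" indexed="true" stored="true"/>
<!-- example range query: popularity:[10 TO 100] -->
```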

A text field that splits only on whitespace, for exact matching of words:


  <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>
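
A sketch of how this type behaves, using a hypothetical field named tags:

```xml
<field name="tags" type="text_ws" indexed="true" stored="true"/>
<!-- input "Solr 6.4.1 managed-schema" yields the tokens: Solr | 6.4.1 | managed-schema -->
<!-- case and punctuation are preserved, so a query for "solr" would not match "Solr" -->
```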

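<!-- A general text field with reasonable, generic
     cross-language defaults: it tokenizes with StandardTokenizer,
     removes stop words listed in the case-insensitive "stopwords.txt"
     (empty by default), and lowercases. At query time only, it
     also applies synonyms.
 -->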
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>