Solr: schema.xml translated and annotated

Translator: Zhang Chunling
Original post: http://blog.csdn.net/zcl_love_wx/article/details/51907488

Translation still in progress.
Disclaimer: I have never been entirely sure what "faceting" actually means in Solr; where the word appears below it is simply rendered as "faceting".

Background

Solr has already moved on to version 6 by now, but this translation is still good enough for getting started.
1. schema.xml is Solr's schema file; it lives in the conf directory under solr_home.
2. The English documentation for schema.xml: http://wiki.apache.org/solr/SchemaXml
3. Convention over configuration (https://zh.wikipedia.org/wiki/%E7%BA%A6%E5%AE%9A%E4%BC%98%E4%BA%8E%E9%85%8D%E7%BD%AE)

The schema.xml file

Field names are made up of letters, digits, and underscores, and must not start with a digit. Names that begin and end with an underscore (such as _version_) are reserved.
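For example (these field names and types are purely illustrative, not part of the file below):

   <field name="product_name" type="string" indexed="true" stored="true"/>  <!-- letters plus an underscore: valid -->
   <field name="_version_" type="long" indexed="true" stored="true"/>       <!-- leading and trailing underscores: reserved, defined by Solr itself -->
   <!-- a name such as "2nd_category" would be invalid because it starts with a digit -->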

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!--  
 This is the Solr schema file. This file should be named "schema.xml" and
 should be in the conf directory under the solr home
 (i.e. ./solr/conf/schema.xml by default) 
 or located where the classloader for the Solr webapp can find it.

 This example schema is the recommended starting point for users.
 It should be kept correct and concise, usable out-of-the-box.

 For more information, on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml

  Performance note: this schema includes many optional features that real applications do not need. To improve performance you can:
- set stored="false" for all fields that are only searched on and never need to be returned;
- set indexed="false" for all fields that are only returned and never searched on;
- remove all unneeded copyField statements;
- for best index size and search performance, set indexed="false" on all general text fields, use copyField to copy them into one catch-all field (name="text"), and search only that catch-all field;
- run the JVM in server mode and raise the logging level so that every request is not logged.

-->

<schema name="example-DIH-solr" version="1.5">
  <!-- The "name" attribute is the name of this schema, used for display purposes only.
       version="x.y" is Solr's version number for the schema syntax and 
       semantics.  It should not normally be changed by applications.

       1.0: multiValued attribute did not exist, all fields are multiValued 
            by nature
       1.1: multiValued attribute introduced, false by default 
       1.2: omitTermFreqAndPositions attribute introduced, true by default 
            except for text fields.
       1.3: removed optional field compress feature
       1.4: autoGeneratePhraseQueries attribute introduced to drive QueryParser
            behavior when a single string produces multiple tokens.  Defaults 
            to off for version >= 1.4
       1.5: omitNorms defaults to true for primitive field types 
            (int, float, boolean, string...)
     -->
 <!-- Valid attributes for fields:
     name: the name of the field (required)
     type: a field type from the <types> section (required)
     indexed: true if this field should be indexed (searchable or sortable)
     stored: true if this field's value should be retrievable;
         fields that have no value in a given document are simply not returned for that document.
     docValues: true if this field should have doc values.
         Doc values are useful for faceting, grouping, sorting and function queries. Although not required,
         and although they make the index somewhat larger and slower to build, doc values make index loading
         faster, are more NRT-friendly and use memory more efficiently.
         There are currently some limitations: only StrField, UUIDField and all Trie*Fields are supported,
         and depending on the field type they may require the field to be single-valued, be required or have a default value.
     multiValued: true if this field may contain multiple values per document
     termVectors: [false] set to true to store the term vector for a given field.
           When using MoreLikeThis, fields used for similarity should be stored for best performance.
     termPositions: store position information with the term vector; this will increase storage costs.
     termOffsets: store offset information with the term vector; this will increase storage costs.
     required: the field must have a value, otherwise an exception will be thrown
     default: a default value that is added automatically when a document is indexed without a value for this field
-->

<!-- If you remove this field, you must also disable the update log in solrconfig.xml, or Solr won't start. In SolrCloud, _version_ and the update log are mandatory -->
  <field name="_version_" type="long" indexed="true" stored="true"/>

  <!-- Points to the root document of a block of nested documents. Required for nested document support; may be removed otherwise -->
  <field name="_root_" type="string" indexed="true" stored="false"/>

  <!-- Only remove the "id" field if you have a very good reason to. While not strictly required, it is highly recommended. A <uniqueKey> is present in almost all Solr installations. See the <uniqueKey> declaration below, where <uniqueKey> is set to "id". -->
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 

   <field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
   <field name="name" type="text_general" indexed="true" stored="true"/>
   <field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
   <field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <field name="includes" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

   <field name="weight" type="float" indexed="true" stored="true"/>
   <field name="price"  type="float" indexed="true" stored="true"/>
   <field name="popularity" type="int" indexed="true" stored="true" />
   <field name="inStock" type="boolean" indexed="true" stored="true" />

   <field name="store" type="location" indexed="true" stored="true"/>

   <!-- Common metadata fields, named specifically to match up with SolrCell metadata when parsing rich documents such as Word and PDF. Some fields are multiValued because Tika may return multiple values for them. Some of this metadata is parsed from the documents, while some comes from the client context:
       "content_type": from the HTTP headers of the incoming stream
       "resourcename": from the SolrCell request parameter resource.name
   -->
   <field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
   <field name="subject" type="text_general" indexed="true" stored="true"/>
   <field name="description" type="text_general" indexed="true" stored="true"/>
   <field name="comments" type="text_general" indexed="true" stored="true"/>
   <field name="author" type="text_general" indexed="true" stored="true"/>
   <field name="keywords" type="text_general" indexed="true" stored="true"/>
   <field name="category" type="text_general" indexed="true" stored="true"/>
   <field name="resourcename" type="text_general" indexed="true" stored="true"/>
   <field name="url" type="text_general" indexed="true" stored="true"/>
   <field name="content_type" type="string" indexed="true" stored="true" multiValued="true"/>
   <field name="last_modified" type="date" indexed="true" stored="true"/>
   <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>

   <!-- Main body of the document, extracted by SolrCell. NOTE: this field is not indexed by default, because it is copied to the "text" field using copyField below; this saves space. Use this field to return and highlight document content; use the "text" field to search that content -->
    <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>

    <!-- catch-all field, containing all other searchable text fields (implemented via copyField further on in this schema) -->
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

   <!-- catch-all text field that indexes tokens both normally and in reverse for efficient leading wildcard queries -->
     <field name="text_rev" type="text_general_rev" indexed="true" stored="false" multiValued="true"/>

  <!-- Non-tokenized version of the manufacturer, to make it easier to sort or group results by manufacturer. Copied from "manu" via copyField -->
   <field name="manu_exact" type="string" indexed="true" stored="false"/>

   <field name="payloads" type="payloads" indexed="true" stored="true"/>


   <!-- 
       Some fields such as popularity and manu_exact could be modified to take advantage of doc values:
       <field name="popularity" type="int" indexed="true" stored="true" docValues="true" />
       <field name="manu_exact" type="string" indexed="false" stored="false" docValues="true" />
       <field name="cat" type="string" indexed="true" stored="true" docValues="true" multiValued="true"/>
       Although doc values make the index somewhat larger and slower to build, they also make index loading faster, use memory more efficiently and are more NRT-friendly.
     -->


//Translator's note: a dynamic field does not need a fixed name; you only define a naming pattern for it.
//Dynamic fields let Solr index fields that are not explicitly defined in the schema.
//If you forget to define a field, it will still be indexed as long as its name matches a dynamic field pattern.
//For example, if the schema defines a dynamic field named *_i and you index a field called myField_i that is not defined anywhere else, myField_i is still indexed (see the illustrative example after the dynamic field definitions below).
    <!-- Dynamic field definitions allow using convention over configuration for fields, via patterns matched against field names.
       EXAMPLE: name="*_i" will match any field ending in _i (like myid_i, z_i)
       RESTRICTION: the glob-like pattern in the name attribute must have a "*" only at the start or at the end.
     -->   
   <dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
   <dynamicField name="*_is" type="int"    indexed="true"  stored="true"  multiValued="true"/>
   <dynamicField name="*_s"  type="string"  indexed="true"  stored="true" />
   <dynamicField name="*_ss" type="string"  indexed="true"  stored="true" multiValued="true"/>
   <dynamicField name="*_l"  type="long"   indexed="true"  stored="true"/>
   <dynamicField name="*_ls" type="long"   indexed="true"  stored="true"  multiValued="true"/>
   <dynamicField name="*_t"  type="text_general"    indexed="true"  stored="true"/>
   <dynamicField name="*_txt" type="text_general"   indexed="true"  stored="true" multiValued="true"/>
   <dynamicField name="*_en"  type="text_en"    indexed="true"  stored="true" multiValued="true"/>
   <dynamicField name="*_b"  type="boolean" indexed="true" stored="true"/>
   <dynamicField name="*_bs" type="boolean" indexed="true" stored="true"  multiValued="true"/>
   <dynamicField name="*_f"  type="float"  indexed="true"  stored="true"/>
   <dynamicField name="*_fs" type="float"  indexed="true"  stored="true"  multiValued="true"/>
   <dynamicField name="*_d"  type="double" indexed="true"  stored="true"/>
   <dynamicField name="*_ds" type="double" indexed="true"  stored="true"  multiValued="true"/>

   <!-- Used to index the components of the FieldType named "location". -->
   <dynamicField name="*_dt"  type="date"    indexed="true"  stored="true"/>
   <dynamicField name="*_dts" type="date"    indexed="true"  stored="true" multiValued="true"/>
   <dynamicField name="*_p"  type="location" indexed="true" stored="true"/>

   <!-- some trie-coded dynamic fields for faster range queries -->
   <dynamicField name="*_ti" type="tint"    indexed="true"  stored="true"/>
   <dynamicField name="*_tl" type="tlong"   indexed="true"  stored="true"/>
   <dynamicField name="*_tf" type="tfloat"  indexed="true"  stored="true"/>
   <dynamicField name="*_td" type="tdouble" indexed="true"  stored="true"/>
   <dynamicField name="*_tdt" type="tdate"  indexed="true"  stored="true"/>

   <dynamicField name="*_c"   type="currency" indexed="true"  stored="true"/>

   <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
   <dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>

   <dynamicField name="random_*" type="random" />

   <!-- Uncomment the dynamicField definition below to ignore any fields that don't already match an existing field name or dynamic field, rather than reporting them as an error. Alternatively, change the type from "ignored" to some other type (e.g. "text") if you want unknown fields to be indexed and/or stored by default
-->
    <!--dynamicField name="*" type="ignored" multiValued="true" /-->
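    <!-- Illustrative example (not part of the original file): with the "*_i" dynamic field above, an update document containing a field that is not declared anywhere in this schema, e.g.

             <add>
               <doc>
                 <field name="id">1001</field>
                 <field name="rating_i">5</field>
               </doc>
             </add>

         is still accepted: "rating_i" matches the "*_i" pattern and is therefore indexed and stored as an int. -->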


    <!-- Field used to determine and enforce document uniqueness. Unless this field is marked with required="false", it will be a required field -->
    <uniqueKey>id</uniqueKey>

    <!-- DEPRECATED: the defaultSearchField is consulted by various query parsers when parsing a query string that isn't explicit about the field to search.
        Machine (non-user) generated queries are best made explicit, or they can use the "df" request parameter, which takes precedence over this.
        Note: if your request handler in solrconfig.xml defines "df", which takes precedence, un-commenting defaultSearchField here will not be sufficient; that "df" would need to be removed.
    <defaultSearchField>text</defaultSearchField>
    -->

    <!-- DEPRECATED, so left untranslated: <solrQueryParser defaultOperator="OR"/> -->


//Translator's note: copyField adds the contents of one or more fields to another field.
//The source attribute is the field to copy from.
//The dest attribute is the field to copy to.
//Both source and dest support wildcards.
//There is also a maxChars attribute that limits the number of characters copied.
    <!-- copyField commands copy one field to another at the time a document is added to the index.
        They are used either to index the same field differently, or to add multiple fields to the same field for easier/faster searching.
    -->

   <copyField source="cat" dest="text"/>
   <copyField source="name" dest="text"/>
   <copyField source="manu" dest="text"/>
   <copyField source="features" dest="text"/>
   <copyField source="includes" dest="text"/>
   <copyField source="manu" dest="manu_exact"/>

   <!-- Copy price into a currency-enabled field (default: USD) -->
   <copyField source="price" dest="price_c"/>

   <!-- Text fields from SolrCell that are searched by default in our catch-all field -->
   <copyField source="title" dest="text"/>
   <copyField source="author" dest="text"/>
   <copyField source="description" dest="text"/>
   <copyField source="keywords" dest="text"/>
   <copyField source="content" dest="text"/>
   <copyField source="content_type" dest="text"/>
   <copyField source="resourcename" dest="text"/>
   <copyField source="url" dest="text"/>

   <!-- Create a string version of the author field -->
   <copyField source="author" dest="author_s"/>

    <!-- Above, multiple source fields are copied to the "text" field.
        Another way to map multiple source fields to the same destination field is to use the dynamic field syntax.
        copyField also supports a maxChars attribute to limit how many characters are copied.
    -->
    <!-- <copyField source="*_t" dest="text" maxChars="3000"/> -->

    <!-- Copy name into alphaNameSort, a field designed for sorting by name -->
    <!-- <copyField source="name" dest="alphaNameSort"/> -->




//Translator's note: a fieldType tells Solr how to handle a field's data when indexing it and how to process queries against it.
//class is simply the Java class being referenced; it defines the field type's behavior.
//If the fieldType is a TextField, analyzers can also be configured for it.
    <!-- fieldType definitions:
        The "name" attribute is just a label used by field definitions.
        The "class" attribute, together with the other attributes, determines the real behavior of the fieldType.
        Class names starting with "solr" refer to Java classes in a standard package such as org.apache.solr.analysis
     -->

    <!-- The StrField type is indexed/stored verbatim; it is not analyzed.
        It supports doc values, but in that case the field needs to be single-valued and either be required or have a default value.
      -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />

    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>

    <!-- sortMissingLast and sortMissingFirst are optional attributes, currently supported on the types that are sorted internally as strings and on numeric types.
        This includes "string", "boolean", and, as of 3.5 (and 4.x), int, float, long, date and double, including their "Trie" variants.
        - If sortMissingLast="true", a sort on this field will cause documents without the field to come after documents with the field, regardless of the requested sort order (asc or desc).
        - If sortMissingFirst="true", a sort on this field will cause documents without the field to come before documents with the field, regardless of the requested sort order.
        - If sortMissingLast="false" and sortMissingFirst="false" (the default), documents without the field are sorted with the default Lucene behavior: first in an ascending sort and last in a descending sort.
        -->
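    <!-- Illustrative example (not from the original file): if the "weight" field were declared with sortMissingLast="true", then for a request such as

             q=*:*&sort=weight asc

         documents with no "weight" value would come after all documents that have one, and the same would hold for sort=weight desc. -->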


//Translator's note:
//Solr has at least five different field types for storing an integer; six if you count string!
//float, double, long and date also have at least five types each. The types whose names start with "Trie" cover more than 90% of use cases.
//Taking Integer as an example:
//  TrieIntField with precisionStep=0, usually named "int". It is suitable for most applications.
//  TrieIntField with precisionStep>0, usually named "tint". If you run numeric range queries (including facet ranges) over fields with many distinct values, this type gives excellent query performance, at the cost of some extra index space and indexing time.
//In Solr's example configuration, precisionStep is 8 for numeric types and 6 for dates; these defaults are recommended. Choosing an even smaller precisionStep (but still >0) makes Solr spend yet more index space and time to speed up range queries.
//  SortableIntField, usually named "sint". Similar to Trie with precisionStep=0, but this type also supports the sortMissingFirst and sortMissingLast attributes.
//  IntField, usually named "pint". Obsolete; no longer used.
//  BCDIntField (Binary Coded Decimal).

//All numeric types sort by their numeric value, not lexicographically.
//The ExternalFileField type is for fields whose values change very quickly and are only used for sorting or for influencing scores (for example, sorting by votes or click-through rate); with this type the documents do not need to be re-indexed when the values change.
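// A minimal sketch of an ExternalFileField declaration (illustrative; not part of this schema), assuming the
// values are keyed by the "id" field and kept in an external_<fieldname> file in the index data directory:
//   <fieldType name="file_rank" keyField="id" defVal="0" stored="false" indexed="false" class="solr.ExternalFileField" valType="pfloat"/>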

     <!-- Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types.
        These fields support doc values, but the field needs to be single-valued and either be required or have a default value.
    -->
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

    <!-- Numeric field types that index values at various levels of precision to accelerate range queries when the number of values between the range endpoints is large. See the javadoc for NumericRangeQuery for the internal implementation details.
        Smaller precisionStep values (in bits) lead to more tokens indexed per value, a slightly larger index, and faster range queries. A precisionStep of 0 disables indexing at different precision levels.
    -->

    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>


     <!-- The format for this date field is of the form 1995-12-31T23:59:59Z, and
         is a more restricted form of the canonical representation of dateTime
         http://www.w3.org/TR/xmlschema-2/#dateTime    
         The trailing "Z" designates UTC time and is mandatory.
         Optional fractional seconds are allowed: 1995-12-31T23:59:59.999Z
         All other components are mandatory.

         Expressions can also be used to denote calculations that should be
         performed relative to "NOW" to determine the value, ie...

               NOW/HOUR
                  ... Round to the start of the current hour
               NOW-1DAY
                  ... Exactly 1 day prior to now
               NOW/DAY+6MONTHS+3DAYS
                  ... 6 months and 3 days in the future from the start of
                      the current day

         Consult the TrieDateField javadocs for more information.

         Note: For faster range queries, consider the tdate type
      -->
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

    <!-- A Trie based date field for faster date range queries and date faceting. -->
    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>


    <!-- Binary data type. The data should be sent/retrieved as Base64 encoded strings -->
    <fieldType name="binary" class="solr.BinaryField"/>

    <!-- "RandomSortField"不用来存储和搜索任何数据。你可以在schema中声明这种类型的字段来生成排序或功能文档的伪随机排序。该排序根据索引的字段名和版本生成。只有该索引的版本不变,且不能有相同的字段名,该文档的排序就会是一致的。如果你想要字段的同一版本有不同的文档伪随机排序,你可使用dynamicField且在请求中更改字段名。
     -->
    <fieldType name="random" class="solr.RandomSortField" indexed="true" />


    <!-- solr.TextField allows the specification of custom text analyzers, specified as a tokenizer and a list of token filters. Different analyzers may be specified for indexing and querying.
        The optional positionIncrementGap puts space between multiple values of this type of field in the same document, to prevent false phrase matching across field values.
        For more info on customizing your analyzer chain, please see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters 
     -->

     <!-- One can also specify an existing Analyzer class that has a default constructor, via the class attribute on the analyzer element.
         Example:
        <fieldType name="text_greek" class="solr.TextField">
          <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
        </fieldType>
    -->

    <!-- A text field that only splits on whitespace, for exact matching of words -->
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

     <!-- A general text field with reasonable, generic, cross-language defaults: it tokenizes with StandardTokenizer, removes stop words from the case-insensitive "stopwords.txt" (empty by default), and down-cases. At query time only, it also applies synonyms. -->
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>


     <!-- A text field with defaults appropriate for English: it
         tokenizes with StandardTokenizer, removes English stop words
         (lang/stopwords_en.txt), down cases, protects words from protwords.txt, and
         finally applies Porter's stemming.  The query time analyzer
         also applies synonyms from synonyms.txt. -->
    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- in this example, we only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
  -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
  -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>


     <!-- A text field with defaults appropriate for English, plus aggressive word-splitting and autophrase features enabled. This field is just like text_en, except that it adds WordDelimiterFilter so that words can be split and matched on case changes, on letter-digit boundaries, and on non-alphanumeric characters. This means certain compound-word cases will work; for example, the query "wi fi" will match "WiFi" or "wi-fi".
        -->
    <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we only use synonyms at query time
         <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>


     <!-- Less flexible matching, but fewer false matches. Probably not ideal for product names, but may be good for SKUs. A dash inserted in the wrong place can still match. -->
    <fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
             possible with WordDelimiterFilter in conjunction with stemming. -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Just like text_general, except that it reverses the characters of each token to make leading wildcard queries more efficient. -->
    <fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
           maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

     <!-- charFilter + WhitespaceTokenizer  -->
    <!--
    <fieldType name="text_char_norm" class="solr.TextField" positionIncrementGap="100" >
      <analyzer>
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>
    -->

    <!-- This is an example of using the KeywordTokenizer together with various TokenFilterFactories to produce a sortable field that does not include some properties of the source text. -->
    <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <!-- KeywordTokenizer does no actual tokenizing, so the entire input string is preserved as a single token -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- The LowerCase TokenFilter is what you need when you want your sorting to be case insensitive -->
        <filter class="solr.LowerCaseFilterFactory" />
        <!-- The TrimFilter removes any leading or trailing whitespace -->
        <filter class="solr.TrimFilterFactory" />
        <!-- The PatternReplaceFilter gives you the flexibility to use Java regular expressions to replace any sequence of characters matching a pattern with an arbitrary replacement string, which may include back references to portions of the original string matched by the pattern.
             See the Java Regular Expression documentation for more information on pattern and replacement string syntax: http://docs.oracle.com/javase/7/docs/api/java/util/regex/package-summary.html
          -->
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z])" replacement="" replace="all"
        />
      </analyzer>
    </fieldType>

    <fieldType name="phonetic" stored="false" indexed="true" class="solr.TextField" >
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
      </analyzer>
    </fieldType>

    <fieldType name="payloads" stored="false" indexed="true" class="solr.TextField" >
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- The DelimitedPayloadTokenFilter can put payloads on tokens. For example:
            a token of "foo|1.4" would be indexed as "foo" with a payload of 1.4f.
            Attributes of the DelimitedPayloadTokenFilterFactory:
            "delimiter": a one-character delimiter; the default is | (the pipe character)
            "encoder": how to encode the following value into a payload
                float -> org.apache.lucene.analysis.payloads.FloatEncoder,
                integer -> o.a.l.a.p.IntegerEncoder
                identity -> o.a.l.a.p.IdentityEncoder
            A fully qualified class name implementing PayloadEncoder may also be given; the encoder must have a no-argument constructor.
         -->
        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
      </analyzer>
    </fieldType>

    <!-- lowercases the entire field value, keeping it as a single token -->
    <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>

    <!-- Example of using PathHierarchyTokenizerFactory at index time, so queries for paths match documents at that path or in descendant paths. -->
    <fieldType name="descendent_path" class="solr.TextField">
      <analyzer type="index">
  <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
      </analyzer>
      <analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory" />
      </analyzer>
    </fieldType>
     <!-- Example of using PathHierarchyTokenizerFactory at query time, so queries for paths match documents at that path or in ancestor paths. -->
    <fieldType name="ancestor_path" class="solr.TextField">
      <analyzer type="index">
  <tokenizer class="solr.KeywordTokenizerFactory" />
      </analyzer>
      <analyzer type="query">
  <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
      </analyzer>
    </fieldType>
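    <!-- Illustrative example (not from the original file): with descendent_path, a document whose path field holds "/products/cameras" is tokenized at index time into "/products" and "/products/cameras", so a keyword query for "/products" matches it and every other document stored under /products. With ancestor_path the hierarchy is expanded at query time instead, so a document stored as "/products" matches queries for "/products/cameras" or any other descendant path. -->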

    <!-- Since fields of this type are by default not stored or indexed, any data added to them will simply be ignored. -->
    <fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

    <!-- This point type indexes the coordinates as separate fields (subFields)
      If subFieldType is defined, it references a type, and a dynamic field
      definition is created matching *___<typename>.  Alternately, if 
      subFieldSuffix is defined, that is used to create the subFields.
      Example: if subFieldType="double", then the coordinates would be
        indexed in fields myloc_0___double,myloc_1___double.
      Example: if subFieldSuffix="_d" then the coordinates would be indexed
        in fields myloc_0_d,myloc_1_d
      The subFields are an implementation detail of the fieldType, and end
      users normally should not need to know about them.
     -->
    <fieldType name="point" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>

    <!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
    <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

    <!-- An alternative geospatial field type new to Solr 4.  It supports multiValued and polygon shapes.
      For more information about this and other Spatial fields new to Solr 4, see:
      http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
    -->
    <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
        geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers" />

   <!-- Money/currency field type. See http://wiki.apache.org/solr/MoneyFieldType
        Parameters:
          defaultCurrency: Specifies the default currency if none specified. Defaults to "USD"
          precisionStep:   Specifies the precisionStep for the TrieLong field used for the amount
          providerClass:   Lets you plug in other exchange provider backend:
                           solr.FileExchangeRateProvider is the default and takes one parameter:
                             currencyConfig: name of an xml file holding exchange rates
                           solr.OpenExchangeRatesOrgProvider uses rates from openexchangerates.org:
                             ratesFileLocation: URL or path to rates JSON file (default latest.json on the web)
                             refreshInterval: Number of minutes between each rates fetch (default: 1440, min: 60)
   -->
    <fieldType name="currency" class="solr.CurrencyField" precisionStep="8" defaultCurrency="USD" currencyConfig="currency.xml" />



   <!-- Some example field types for different languages (generally based on international standards) -->

    <!-- Arabic -->
    <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- for any non-arabic -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
        <!-- normalizes ﻯ to ﻱ, etc -->
        <filter class="solr.ArabicNormalizationFilterFactory"/>
        <filter class="solr.ArabicStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Bulgarian -->
    <fieldType name="text_bg" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/> 
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_bg.txt" /> 
        <filter class="solr.BulgarianStemFilterFactory"/>       
      </analyzer>
    </fieldType>

    <!-- Catalan -->
    <fieldType name="text_ca" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes l', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_ca.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ca.txt" />
        <filter class="solr.SnowballPorterFilterFactory" language="Catalan"/>       
      </analyzer>
    </fieldType>

    <!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->
    <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- normalize width before bigram, as e.g. half-width dakuten combine  -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <!-- for any non-CJK -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.CJKBigramFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Kurdish -->
    <fieldType name="text_ckb" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SoraniNormalizationFilterFactory"/>
        <!-- for any latin text -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ckb.txt"/>
        <filter class="solr.SoraniStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Czech -->
    <fieldType name="text_cz" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_cz.txt" />
        <filter class="solr.CzechStemFilterFactory"/>       
      </analyzer>
    </fieldType>

    <!-- Danish -->
    <fieldType name="text_da" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_da.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Danish"/>       
      </analyzer>
    </fieldType>

    <!-- German -->
    <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" />
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.GermanLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->
      </analyzer>
    </fieldType>

    <!-- Greek -->
    <fieldType name="text_el" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- greek specific lowercase for sigma -->
        <filter class="solr.GreekLowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_el.txt" />
        <filter class="solr.GreekStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Spanish -->
    <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball" />
        <filter class="solr.SpanishLightStemFilterFactory"/>
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->
      </analyzer>
    </fieldType>

    <!-- Basque -->
    <fieldType name="text_eu" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_eu.txt" />
        <filter class="solr.SnowballPorterFilterFactory" language="Basque"/>
      </analyzer>
    </fieldType>

    <!-- Persian -->
    <fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- for ZWNJ -->
        <charFilter class="solr.PersianCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ArabicNormalizationFilterFactory"/>
        <filter class="solr.PersianNormalizationFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />
      </analyzer>
    </fieldType>

    <!-- Finnish -->
    <fieldType name="text_fi" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fi.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
        <!-- less aggressive: <filter class="solr.FinnishLightStemFilterFactory"/> -->
      </analyzer>
    </fieldType>

    <!-- French -->
    <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes l', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball" />
        <filter class="solr.FrenchLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.FrenchMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="French"/> -->
      </analyzer>
    </fieldType>

    <!-- Irish -->
    <fieldType name="text_ga" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes d', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_ga.txt"/>
        <!-- removes n-, etc. position increments is intentionally false! -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/hyphenations_ga.txt"/>
        <filter class="solr.IrishLowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ga.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Irish"/>
      </analyzer>
    </fieldType>

    <!-- Galician -->
    <fieldType name="text_gl" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_gl.txt" />
        <filter class="solr.GalicianStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.GalicianMinimalStemFilterFactory"/> -->
      </analyzer>
    </fieldType>

    <!-- Hindi -->
    <fieldType name="text_hi" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- normalizes unicode representation -->
        <filter class="solr.IndicNormalizationFilterFactory"/>
        <!-- normalizes variation in spelling -->
        <filter class="solr.HindiNormalizationFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hi.txt" />
        <filter class="solr.HindiStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Hungarian -->
    <fieldType name="text_hu" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hu.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Hungarian"/>
        <!-- less aggressive: <filter class="solr.HungarianLightStemFilterFactory"/> -->   
      </analyzer>
    </fieldType>

    <!-- Armenian -->
    <fieldType name="text_hy" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_hy.txt" />
        <filter class="solr.SnowballPorterFilterFactory" language="Armenian"/>
      </analyzer>
    </fieldType>

    <!-- Indonesian -->
    <fieldType name="text_id" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_id.txt" />
        <!-- for a less aggressive approach (only inflectional suffixes), set stemDerivational to false -->
        <filter class="solr.IndonesianStemFilterFactory" stemDerivational="true"/>
      </analyzer>
    </fieldType>

    <!-- Italian -->
    <fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- removes l', etc -->
        <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_it.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_it.txt" format="snowball" />
        <filter class="solr.ItalianLightStemFilterFactory"/>
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Italian"/> -->
      </analyzer>
    </fieldType>

    <!-- Japanese using morphological analysis (see text_cjk for a configuration using bigramming)

         NOTE: If you want to optimize search for precision, use default operator AND in your query
         parser config with <solrQueryParser defaultOperator="AND"/> further down in this file.  Use 
         OR if you would like to optimize for recall (default).
    -->
    <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
      <analyzer>
      <!-- Kuromoji Japanese morphological analyzer/tokenizer (JapaneseTokenizer)

           Kuromoji has a search mode (default) that does segmentation useful for search.  A heuristic
           is used to segment compounds into its parts and the compound itself is kept as synonym.

           Valid values for attribute mode are:
              normal: regular segmentation
              search: segmentation useful for search with synonyms compounds (default)
            extended: same as search mode, but unigrams unknown words (experimental)

           For some applications it might be good to use search mode for indexing and normal mode for
           queries to reduce recall and prevent parts of compounds from being matched and highlighted.
           Use <analyzer type="index"> and <analyzer type="query"> for this and mode normal in query.

           Kuromoji also has a convenient user dictionary feature that allows overriding the statistical
           model with your own entries for segmentation, part-of-speech tags and readings without a need
           to specify weights.  Notice that user dictionaries have not been subject to extensive testing.

           User dictionary attributes are:
                     userDictionary: user dictionary filename
             userDictionaryEncoding: user dictionary encoding (default is UTF-8)

           See lang/userdict_ja.txt for a sample user dictionary file.

           Punctuation characters are discarded by default.  Use discardPunctuation="false" to keep them.

           See http://wiki.apache.org/solr/JapaneseLanguageSupport for more on Japanese language support.
        -->
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
        <!--<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt"/>-->
        <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <!-- Removes tokens with certain part-of-speech tags -->
        <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" />
        <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <!-- Removes common tokens typically not useful for search, but have a negative effect on ranking -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" />
        <!-- Normalizes common katakana spelling variations by removing any last long sound character (U+30FC) -->
        <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
        <!-- Lower-cases romaji characters -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Latvian -->
    <fieldType name="text_lv" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_lv.txt" />
        <filter class="solr.LatvianStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- Dutch -->
    <fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_nl.txt" format="snowball" />
        <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_nl.txt" ignoreCase="false"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Dutch"/>
      </analyzer>
    </fieldType>

    <!-- Norwegian -->
    <fieldType name="text_no" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_no.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Norwegian"/>
        <!-- less aggressive: <filter class="solr.NorwegianLightStemFilterFactory" variant="nb"/> -->
        <!-- singular/plural: <filter class="solr.NorwegianMinimalStemFilterFactory" variant="nb"/> -->
        <!-- The "light" and "minimal" stemmers support variants: nb=Bokmål, nn=Nynorsk, no=Both -->
      </analyzer>
    </fieldType>

    <!-- Portuguese -->
    <fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_pt.txt" format="snowball" />
        <filter class="solr.PortugueseLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.PortugueseMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/> -->
        <!-- most aggressive: <filter class="solr.PortugueseStemFilterFactory"/> -->
      </analyzer>
    </fieldType>

    <!-- Romanian -->
    <fieldType name="text_ro" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ro.txt" />
        <filter class="solr.SnowballPorterFilterFactory" language="Romanian"/>
      </analyzer>
    </fieldType>

    <!-- Russian -->
    <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
        <!-- less aggressive: <filter class="solr.RussianLightStemFilterFactory"/> -->
      </analyzer>
    </fieldType>

    <!-- Swedish -->
    <fieldType name="text_sv" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_sv.txt" format="snowball" />
        <filter class="solr.SnowballPorterFilterFactory" language="Swedish"/>
        <!-- less aggressive: <filter class="solr.SwedishLightStemFilterFactory"/> -->
      </analyzer>
    </fieldType>

    <!-- Thai -->
    <fieldType name="text_th" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ThaiWordFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_th.txt" />
      </analyzer>
    </fieldType>

    <!-- Turkish -->
    <fieldType name="text_tr" class="solr.TextField" positionIncrementGap="100">
      <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ApostropheFilterFactory"/>
        <filter class="solr.TurkishLowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_tr.txt" />
        <filter class="solr.SnowballPorterFilterFactory" language="Turkish"/>
      </analyzer>
    </fieldType>

  <!-- Similarity is the scoring routine for each document vs. a query.
       A custom Similarity or SimilarityFactory may be specified here, but 
       the default is fine for most applications.  
       For more info: http://wiki.apache.org/solr/SchemaXml#Similarity
    -->
  <!--
     <similarity class="com.example.solr.CustomSimilarityFactory">
       <str name="paramkey">param value</str>
     </similarity>
    -->

</schema>


