Overview
Suppose data is imported from select title, content from T_NOTICE into a Solr collection, and managed-schema defines neither <field name="title" type="text_ansj"/> nor a matching dynamicField. The import still succeeds: by default Solr writes a new managed-schema file under collection/conf and automatically appends <field name="title" type="text_general"/> and <copyField source="title" dest="title_str" maxChars="256"/> to it. This article looks at where that field comes from and how to make Solr use a custom FieldType instead.
Search solrconfig.xml for the keyword unknown.
add-unknown-fields-to-the-schema
<!-- The update.autoCreateFields property can be turned to false to disable schemaless mode -->
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}"
         processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
The last entry in the processor attribute is add-schema-fields, which is defined as follows:
add-schema-fields
<updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
  <lst name="typeMapping">
    <str name="valueClass">java.lang.String</str>
    <str name="fieldType">text_general</str>
    <lst name="copyField">
      <str name="dest">*_str</str>
      <int name="maxChars">256</int>
    </lst>
    <!-- Use as default mapping instead of defaultFieldType -->
    <bool name="default">true</bool>
  </lst>
  ...
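As the definition shows, values of class java.lang.String are mapped to the field type text_general by default, and a copy capped at 256 characters is written to a *_str field.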
In fact, solrconfig.xml already documents this behavior in a comment:
Add unknown fields to the schema
Field type guessing update processors that will
attempt to parse string-typed field values as Booleans, Longs,
Doubles, or Dates, and then add schema fields with the guessed
field types. Text content will be indexed as "text_general" as
well as a copy to a plain string version in *_str.
These require that the schema is both managed and mutable, by
declaring schemaFactory as ManagedIndexSchemaFactory, with
mutable specified as true.
For details, see schema-factory-definition-in-solrconfig.
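As a rough sketch of what that comment requires, a managed and mutable schema is typically declared in solrconfig.xml along the following lines. The managedSchemaResourceName value is the usual default file name and is an assumption here, not something taken from this project's configuration.

```xml
<!-- Sketch: managed + mutable schema, needed for add-schema-fields to modify the schema -->
<schemaFactory class="ManagedIndexSchemaFactory">
  <!-- Must be true so that update processors may add fields at index time -->
  <bool name="mutable">true</bool>
  <!-- Assumed default file name; adjust if the collection uses another one -->
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
```

If mutable is false, the schema cannot be changed at index time, so unknown fields are not added automatically.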
Changing to a custom field type
Our project uses the ansj tokenizer, and the corresponding field type is text_ansj, so the mapping is changed as follows:
<updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
  <lst name="typeMapping">
    <str name="valueClass">java.lang.String</str>
    <str name="fieldType">text_ansj</str>
    <!-- Use as default mapping instead of defaultFieldType -->
    <bool name="default">true</bool>
  </lst>
  ...
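Note that the modified mapping above no longer contains the copyField block, so no *_str copy is created for new string fields. If that plain string copy is still wanted, the block from the default configuration can presumably be kept alongside the custom type. A minimal sketch of such a typeMapping entry, assuming text_ansj is already defined as a fieldType in managed-schema:

```xml
<!-- Sketch: text_ansj as the default mapping, keeping the *_str string copy -->
<lst name="typeMapping">
  <str name="valueClass">java.lang.String</str>
  <str name="fieldType">text_ansj</str>
  <lst name="copyField">
    <str name="dest">*_str</str>
    <int name="maxChars">256</int>
  </lst>
  <!-- Use as default mapping instead of defaultFieldType -->
  <bool name="default">true</bool>
</lst>
```

This entry belongs inside the add-schema-fields updateProcessor element, in place of the existing java.lang.String mapping.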
For the default values of Solr field properties, see defining-fields; adjust them to suit your setup.
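When the field names are known in advance, an alternative to relying on type guessing is to declare the fields explicitly in managed-schema, with the properties spelled out rather than left to their defaults. A minimal sketch, assuming a text_ansj fieldType already exists in the schema; the indexed and stored values are illustrative choices, not settings taken from the project:

```xml
<!-- Sketch: explicit declarations, so no field guessing is needed for these columns -->
<field name="title" type="text_ansj" indexed="true" stored="true"/>
<field name="content" type="text_ansj" indexed="true" stored="true"/>
```

Fields declared this way are not treated as unknown, so the add-schema-fields processor leaves them alone.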