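When Solr builds an index, the value of id must be unique, much like a primary key in a database. In a database, a duplicate primary key raises an error; in Solr, if an id value is repeated, the document indexed later overwrites the one indexed earlier.
Sometimes we need to import data from more than one table into Solr, and then id values from different tables may collide.
We can distinguish the data from different tables by naming the primary key field of table A A_id and that of table B B_id.
Add entity tags under document in db-data-config.xml, as follows: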
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://127.0.0.1:3306/outofmemory" user="root" password="root"/>
  <document>
    <entity name="questions" query="select question_id,user_id,content,title,date from questions" pk="question_id">
      <field column="question_id" name="question_id" />
      <field column="user_id" name="user_id" />
      <field column="content" name="content" />
      <field column="title" name="title" />
      <field column="date" name="date" />
    </entity>
    <entity name="tag" query="select tag_id,tagname,nums from tag" pk="tag_id">
      <field column="tag_id" name="tag_id"/>
      <field column="tagname" name="tagname"/>
      <field column="nums" name="nums"/>
    </entity>
  </document>
</dataConfig>
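But this still does not solve the problem, because Solr expects one field to be declared as the uniqueKey that identifies each indexed document. Could we simply remove it and use some other approach?
The comments in the managed-schema file say that it is best not to remove the uniqueKey.
So when indexing data from multiple tables, every document must be given an id that does not collide with ids from the other tables. How? Use a UUID.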
1. Change the configuration files
- managed-schema
Add the following configuration:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
Change the type of the existing id field to:
<field name="id" type="uuid" indexed="true" stored="true" multiValued="false" required="true"/>
- solrconfig.xml
Modify the requestHandler that updates the index:
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">dispup</str>
  </lst>
</requestHandler>
If you use the DataImportHandler to import data from the database,
the dataimport configuration must be changed as well, as follows:
<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
    <str name="update.chain">dispup</str>
  </lst>
</requestHandler>
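Both handlers above refer to an update chain named dispup, so solrconfig.xml also needs a definition for that chain. The original configuration of the chain is not shown here; the following is a minimal sketch, assuming the chain uses Solr's solr.UUIDUpdateProcessorFactory to fill the id field with a generated UUID whenever an incoming document has no id:

```xml
<!-- Assumed definition of the "dispup" chain referenced by /update and /dataimport. -->
<updateRequestProcessorChain name="dispup">
  <!-- Generates a random UUID for "id" when the document does not supply one. -->
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <!-- Logs each update command. -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- Performs the actual index update; it should come last in the chain. -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Since neither entity in db-data-config.xml maps a column to id, every imported row would get its id from this chain, so rows from the questions and tag tables should no longer overwrite each other.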
2. If Solr then fails on startup with Invalid UUID String: '1'
Caused by: org.apache.solr.common.SolrException: Invalid UUID String: '1'
at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:88)
at org.apache.solr.schema.FieldType.readableToIndexed(FieldType.java:406)
at org.apache.solr.handler.component.QueryElevationComponent$ElevationObj.<init>(QueryElevationComponent.java:150)
at org.apache.solr.handler.component.QueryElevationComponent.loadElevationMap(QueryElevationComponent.java:324)
at org.apache.solr.handler.component.QueryElevationComponent.inform(QueryElevationComponent.java:238)
... 12 more
A Google search turned up https://issues.apache.org/jira/browse/SOLR-3398, where the reply reads:
It appears based on the bug report that you have modified the example solr schema.xml to change the uniqueKey field to use UUIDField - however you also seem to still have the QueryElevationComponent enabled in your solrconfig.xml file, configured to use the uniqueKey field, and the elevate.xml file has (example) entries referring to IDs which are not legal UUID values.
So you can either remove the QueryElevationComponent, or edit elevate.xml to remove the examples from elevate.xml, or change the QueryElevationComponent configuration in solrconfig.xml to use some lookup field other then the uniqueKey field.
Any of those should work.
Because the configuration files had been copied directly from the example, the searchComponent configured in solrconfig.xml uses elevate.xml:
<!-- Query Elevation Component
http://wiki.apache.org/solr/QueryElevationComponent
a search component that enables you to configure the top
results for a given query regardless of the normal lucene
scoring.
-->
<searchComponent name="elevator" class="solr.QueryElevationComponent" >
<!-- pick a fieldType to analyze queries -->
<str name="queryFieldType">string</str>
<str name="config-file">elevate.xml</str>
</searchComponent>
elevate.xml ships with sample id values, none of which is a valid UUID:
<elevate>
<query text="foo bar">
<doc id="1" />
<doc id="2" />
<doc id="3" />
</query>
<query text="ipod">
<doc id="MA147LL/A" /> <!-- put the actual ipod at the top -->
<doc id="IW-02" exclude="true" /> <!-- exclude this cable -->
</query>
</elevate>
Commenting out these id values resolves the error.
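One way to comment the entries out is sketched below; it is an illustration, not the only possible edit. XML comments cannot be nested, so the inline comments that sat next to the ipod entries have to be dropped before the whole block is wrapped in a single comment:

```xml
<elevate>
  <!-- Sample entries disabled: their ids are not valid UUIDs.
  <query text="foo bar">
    <doc id="1" />
    <doc id="2" />
    <doc id="3" />
  </query>
  <query text="ipod">
    <doc id="MA147LL/A" />
    <doc id="IW-02" exclude="true" />
  </query>
  -->
</elevate>
```

The elevator searchComponent stays configured in solrconfig.xml; with the entries disabled it no longer has sample ids to convert into UUIDs.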
References:
https://issues.apache.org/jira/browse/SOLR-3398