Importing Data from a Database into Solr (2)

Latest recommended article published on 2024-06-25 16:14:24

fxfanglin

Tags: solr

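To build your own full-text search, you usually need to import data from a database. This post adds an import capability on top of the existing configuration, using MySQL as the example.

Pick a core in Solr's working directory; I use core1 here. Go to its configuration folder, solr_tomcat\solr\core1\conf, and add the following to solrconfig.xml: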

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
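
Before restarting Tomcat, it can help to confirm that the handler is registered and which config file it points at. The following is a minimal sketch in Python using only the standard library. The helper name `find_dataimport_config` is my own, and the snippet is passed in as a string instead of being read from the real solrconfig.xml.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper (not part of Solr): find the DIH handler in solrconfig.xml text.
def find_dataimport_config(solrconfig_xml, handler_name="/dataimport"):
    """Return the 'config' default of the named request handler, or None."""
    root = ET.fromstring(solrconfig_xml)
    # iter() also visits the root element, so a bare fragment works too.
    for handler in root.iter("requestHandler"):
        if handler.get("name") != handler_name:
            continue
        for entry in handler.findall("./lst[@name='defaults']/str"):
            if entry.get("name") == "config":
                return (entry.text or "").strip()
    return None


# Sample input: the handler from this article wrapped in a <config> root element.
SNIPPET = """<config>
  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </requestHandler>
</config>"""

print(find_dataimport_config(SNIPPET))
```

For a real check, read the text of solr_tomcat\solr\core1\conf\solrconfig.xml and pass it to the function. A result of None suggests the handler was not added or was given a different name.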

In the same configuration folder, create data-config.xml with the following content:

<?xml version="1.0" encoding="utf-8"?>
<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/li"
user="root"
password="465864"/>
<document name="lhx">
<entity name="student" pk="sid" query="select sid,sname,sage,saddress,sdescript from student">
<field column="sid" name="id" />
<field column="sname" name="name" />
<field column="sage" name="age" />
<field column="saddress" name="address" />
<field column="sdescript" name="descript" />
</entity>
</document>
</dataConfig>

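Change url, user, and password to match your database. The name in <document name="lhx"> is arbitrary.

On the entity, name is the entity's name (here it matches the table name), pk is the primary-key column, and query is the SQL statement that fetches the rows. Each field element maps one column: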

<field column="sid" name="id" />

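column is the column name in the database table, and name is the corresponding field name on the Solr side. Those Solr-side names must be declared in schema.xml. Open that file, keep the definitions that already exist, and add any that are missing. The final content is: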

<?xml version="1.0" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<schema name="example core one" version="1.1">
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="text_ik" class="solr.TextField">
<analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
<field name="id" type="long" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="age" type="int" indexed="true" stored="true" multiValued="false" />
<field name="address" type="string" indexed="true" stored="true" multiValued="false" />
<field name="descript" type="text_ik" indexed="true" stored="true" multiValued="false" />
<field name="type" type="string" indexed="true" stored="true" multiValued="false" />
<field name="core1" type="string" indexed="true" stored="true" multiValued="false" />
<field name="_version_" type="long" indexed="true" stored="true"/>
<uniqueKey>id</uniqueKey>
<defaultSearchField>name</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
</schema>

text_ik is a field type backed by IK Analyzer (ik-analyzer), a word segmenter built specifically for Chinese text.

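Declare one field per column. A field's name should match the column name in the SQL result set. If it does not, use a field element inside the entity tag in data-config.xml to map the column to the field.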
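
A mismatch between the two files is easy to miss, so here is a small hedged sketch that lists the Solr field names used in data-config.xml but not declared in schema.xml. The function names are my own, the XML samples are shortened copies of the files above, and the "description" mapping is a deliberate mistake added for illustration.

```python
import xml.etree.ElementTree as ET


def mapped_field_names(data_config_xml):
    """Solr field name of every <field> mapping (falls back to the column name)."""
    root = ET.fromstring(data_config_xml)
    return [f.get("name") or f.get("column") for f in root.iter("field")]


def schema_field_names(schema_xml):
    """Names of all <field> elements declared in schema.xml."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field")}


def undeclared_fields(data_config_xml, schema_xml):
    """Mapped names that have no matching declaration in the schema."""
    declared = schema_field_names(schema_xml)
    return [n for n in mapped_field_names(data_config_xml) if n not in declared]


# Shortened sample; "description" is intentionally wrong (the schema says "descript").
DATA_CONFIG = """<dataConfig>
  <document name="lhx">
    <entity name="student" pk="sid" query="select sid,sname,sdescript from student">
      <field column="sid" name="id" />
      <field column="sname" name="name" />
      <field column="sdescript" name="description" />
    </entity>
  </document>
</dataConfig>"""

SCHEMA = """<schema name="example core one" version="1.1">
  <field name="id" type="long" indexed="true" stored="true"/>
  <field name="name" type="string" indexed="true" stored="true"/>
  <field name="descript" type="text_ik" indexed="true" stored="true"/>
</schema>"""

print(undeclared_fields(DATA_CONFIG, SCHEMA))
```

An empty list means every mapped name has a matching declaration. The check does not look at types, only at names.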

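The _version_ field is required. It records version information and is maintained internally by Solr.

It is already present in the schema, so it needs no changes.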

With the configuration done, the next step is to put some data into MySQL. Here is the complete SQL script:

SET FOREIGN_KEY_CHECKS=0;
-- ----------------------------
-- Table structure for student
-- ----------------------------
DROP TABLE IF EXISTS `student`;
CREATE TABLE `student` (
`sid` bigint(20) NOT NULL,
`sname` varchar(255) DEFAULT NULL,
`sage` int(11) DEFAULT NULL,
`saddress` varchar(255) DEFAULT NULL,
`sdescript` text,
PRIMARY KEY (`sid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- ----------------------------
-- Records of student
-- ----------------------------
INSERT INTO `student` VALUES ('1', '李四', '23', '中华路1号', '好学生');
INSERT INTO `student` VALUES ('2', '张三', '34', '华信路3号', '坏学生，经常做坏事！');
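
To see roughly what the entity query hands to Solr, the sketch below replays the same SELECT and renames the columns the way the field mappings in data-config.xml do. It uses an in-memory SQLite database as a stand-in for MySQL, only so the example runs without a server. The table definition is simplified accordingly, and this is an illustration of the mapping, not of how the DataImportHandler is implemented.

```python
import sqlite3

# Column -> Solr field name, copied from the <field> mappings in data-config.xml.
COLUMN_TO_FIELD = {
    "sid": "id",
    "sname": "name",
    "sage": "age",
    "saddress": "address",
    "sdescript": "descript",
}


def rows_as_documents(conn):
    """Run the entity query and return one dict per row, keyed by Solr field name."""
    cur = conn.execute("select sid,sname,sage,saddress,sdescript from student")
    columns = [d[0] for d in cur.description]
    return [{COLUMN_TO_FIELD[c]: v for c, v in zip(columns, row)} for row in cur]


# SQLite stand-in for the MySQL table (types simplified).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT, "
    "sage INTEGER, saddress TEXT, sdescript TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        (1, "李四", 23, "中华路1号", "好学生"),
        (2, "张三", 34, "华信路3号", "坏学生，经常做坏事！"),
    ],
)

docs = rows_as_documents(conn)
for doc in docs:
    print(doc)
```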

You also need some jars from the Solr distribution, located under solr-4.10.2\dist. There are two:

solr-dataimporthandler-extras-4.10.2

solr-dataimporthandler-4.10.2

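I also added jtds-1.2.4, which you have to download yourself. (jTDS is a JDBC driver for SQL Server and Sybase, so it should only matter if you also import from those databases.)

Finally, don't forget the MySQL connector: mysql-connector-java-5.1.30

Put all of these into apache-tomcat-6.0.43\webapps\solr\WEB-INF\lib.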
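
Missing jars only show up as errors after Tomcat starts, so a quick pre-flight check can save a restart. This is a rough sketch: the helper `missing_jars` and its regular expression are my own, and the default list holds only the two DataImportHandler jars and the MySQL connector. Add "jtds" to the list if you follow the setup above exactly.

```python
import re

# Artifact names from the list above, without version numbers.
REQUIRED_JARS = [
    "solr-dataimporthandler",
    "solr-dataimporthandler-extras",
    "mysql-connector-java",
]

# Splits "name-1.2.3.jar" (optionally "name-1.2.3-bin.jar") into name and version.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-\d[\d.]*(?:-bin)?\.jar$")


def missing_jars(filenames, required=REQUIRED_JARS):
    """Return the required artifact names that no file in `filenames` provides."""
    present = set()
    for filename in filenames:
        match = JAR_PATTERN.match(filename)
        if match:
            present.add(match.group("name"))
    return [name for name in required if name not in present]


# Example listing; in practice pass os.listdir() of the WEB-INF\lib folder.
example_lib = [
    "solr-dataimporthandler-4.10.2.jar",
    "solr-dataimporthandler-extras-4.10.2.jar",
    "jtds-1.2.4.jar",
]
print(missing_jars(example_lib))
```

Matching on the full artifact name, not on a prefix, keeps solr-dataimporthandler-extras from being mistaken for solr-dataimporthandler.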

Start Tomcat. If it reports no errors, the setup is fine.

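Open the Solr admin UI, select core1, and choose "Dataimport" from the entries below it. On the right, select full-import as the command and click Execute. The panel on the right shows "Indexing…".

Do not just wait: the display stays that way no matter how long you wait. Click Refresh Status, and the result of the import then appears.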
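
The same two steps (start the import, then ask for the status) can also be scripted over HTTP with the handler's full-import and status commands. The sketch below makes several assumptions: Tomcat listens on localhost:8080, the webapp is named solr, and the JSON status response has a top-level "status" value that reads "idle" once the import has finished. Adjust these to your installation.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed base URL: Tomcat on localhost:8080, webapp "solr", core "core1".
BASE_URL = "http://localhost:8080/solr/core1"


def dataimport_url(command, base_url=BASE_URL, **params):
    """Build a URL for the /dataimport handler registered in solrconfig.xml."""
    query = {"command": command, "wt": "json"}
    query.update(params)
    return base_url + "/dataimport?" + urllib.parse.urlencode(query)


def is_idle(status_payload):
    """Assumption: the status response carries "status": "idle" or "busy"."""
    return status_payload.get("status") == "idle"


def run_full_import(poll_seconds=2.0, base_url=BASE_URL):
    """Start a full import, then poll the status until the handler is idle."""
    urllib.request.urlopen(dataimport_url("full-import", base_url)).close()
    while True:
        with urllib.request.urlopen(dataimport_url("status", base_url)) as resp:
            payload = json.load(resp)
        if is_idle(payload):
            return payload.get("statusMessages", {})
        time.sleep(poll_seconds)


# Only builds the URL; call run_full_import() against a running Solr instance.
print(dataimport_url("full-import"))
```

Polling the status command is the scripted equivalent of clicking Refresh Status in the admin UI.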

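That completes the import. Next, run a query, which exercises the IK Analyzer tokenization.

Select "Query", then in the q box on the right enter the field to search and its value in the form field:value. The field is the name attribute from data-config.xml (the Solr-side name), not the column name in the database table. "wt" sets the response format; the default is json and does not need to be changed. Finally, click the "Execute Query" button, and the result appears immediately in the blank area on the right.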
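
The same query can be sent to the core's /select handler over HTTP. This is a minimal sketch under assumptions: the base URL (Tomcat on localhost:8080, webapp solr) is a guess at a typical local setup, the helper names are mine, and the value is not escaped for Solr's query syntax, so it suits simple terms only.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL: Tomcat on localhost:8080, webapp "solr", core "core1".
BASE_URL = "http://localhost:8080/solr/core1"


def select_url(field, value, base_url=BASE_URL, rows=10):
    """Build a /select URL for a field:value query with a JSON response."""
    params = {"q": f"{field}:{value}", "wt": "json", "rows": rows}
    return base_url + "/select?" + urllib.parse.urlencode(params)


def search(field, value, base_url=BASE_URL):
    """Run the query and return the matching documents."""
    with urllib.request.urlopen(select_url(field, value, base_url)) as resp:
        payload = json.load(resp)
    return payload["response"]["docs"]


# Use the Solr-side field name ("name"), not the database column ("sname").
print(select_url("name", "李四"))
```

urlencode percent-encodes the colon and the Chinese characters, so the URL can be pasted into a browser or passed to curl as is.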

Postscript:

At first I wrote the int type in schema.xml as shown below, and Solr reported an error on startup.

<fieldType name="int" class="solr.TrieIntegerField" precisionStep="0" positionIncrementGap="0"/>

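The error message in the Tomcat log was easier to understand.

So what is the correct way to write the int type? I looked at the schema.xml that ships with the official collection1 example, which contains this definition of int: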

<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>

After this change, the problem was solved.
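
A typo like this can be caught before restarting Solr by comparing each field type's class against a list of known names. The sketch below is only illustrative: the allow-list contains just the classes used in this article's schema, not every field type Solr provides, and the function name is my own.

```python
import xml.etree.ElementTree as ET

# Classes used by the schema in this article; NOT a complete list of Solr field types.
KNOWN_CLASSES = {
    "solr.StrField",
    "solr.TextField",
    "solr.TrieIntField",
    "solr.TrieLongField",
    "solr.TrieFloatField",
    "solr.TrieDateField",
}


def unknown_field_type_classes(schema_xml, known=KNOWN_CLASSES):
    """Return (name, class) for every field type whose class is not in `known`."""
    root = ET.fromstring(schema_xml)
    problems = []
    for element in root.iter():
        # Accept both spellings, <fieldType> and <fieldtype>.
        if element.tag.lower() != "fieldtype":
            continue
        cls = element.get("class")
        if cls not in known:
            problems.append((element.get("name"), cls))
    return problems


# Sample schema containing the mistyped class from the postscript.
SCHEMA = """<schema name="example core one" version="1.1">
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="int" class="solr.TrieIntegerField" precisionStep="0" positionIncrementGap="0"/>
</schema>"""

print(unknown_field_type_classes(SCHEMA))
```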

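After restarting Solr again, Tomcat reported another error.

It turned out that jars were missing. Adding solr-dataimporthandler-extras-4.10.2, solr-dataimporthandler-4.10.2, and jtds-1.2.4 solved the problem.

Next, the data import stayed stuck at "Indexing…", and clicking refresh had no effect. The Tomcat log showed an error.

The database driver jar was missing. Once it was added, the import worked.

When searching, a field name that is written incorrectly is marked with a red wavy underline.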
