Solr 6.0: importing MySQL data

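I needed to build search again and found that Solr has already reached 6.0. It still differs from 4.5 in places, so I am writing the steps down.
Solr 6.0 requires JDK 1.8 and a Servlet 3.0 container; I use Tomcat 8 (here: apache-tomcat-8.0.36 and jdk1.8.0_102).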
http://mirrors.hust.edu.cn/apache/tomcat/tomcat-8/v8.0.36/bin/apache-tomcat-8.0.36.zip
http://apache.fayea.com/lucene/solr/6.1.0/solr-6.1.0.zip


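Preparation
  • Download Solr from http://apache.fayea.com/lucene/solr/6.1.0/,
    put it under D:\search and unzip it
  • Jars to collect (they end up in the webapp's WEB-INF\lib, the directory that solrconfig.xml loads later)
    - MySQL driver: mysql-connector-java-6.0.2.jar
    - Chinese word segmentation: mmseg4j-core-with-dic-1.8.6.jar, mmseg4j-solr-2.3.0.jar
    - From D:\search\solr-6.1.0\dist:
    solr-dataimporthandler-6.1.0.jar
    solr-dataimporthandler-extras-6.1.0.jar
    - All jars under D:\search\solr-6.1.0\server\lib\ext
  • Create the folder solr/home under D:\search
  • Copy the webapp folder from D:\search\solr-6.1.0\server\solr-webapp into
    D:\search\apache-tomcat-8.0.36\webapps and rename it to solr
  • Edit D:\search\apache-tomcat-8.0.36\webapps\solr\WEB-INF\web.xml:
    find the env-entry element and uncomment it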
    <env-entry>
       <env-entry-name>solr/home</env-entry-name>
       <env-entry-value>D:\search\solr\home</env-entry-value>
       <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>

Copy D:\search\solr-6.1.0\server\resources\log4j.properties into
D:\search\apache-tomcat-8.0.36\webapps\solr\WEB-INF\classes (create the classes folder if it does not exist)

With the preparation done, the rest is reading the Solr documentation and copying from the bundled examples.

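Some parts are still much the same as in Solr 4.5.
Look at the example under D:\search\solr-6.1.0\example\example-DIH\solr
and copy from it. In my case the search is for a Q&A site,
so I create an ask folder under D:\search\solr\home
with two subdirectories in it, conf and data.
Copy solr.xml from the example into D:\search\solr\home. It is not like the 4.0 one: open it in an editor and there is next to nothing in it.
See the documentation at https://cwiki.apache.org/confluence/display/solr/Solr+Cores+and+solr.xml
and follow the "Format of solr.xml" link to see the file format. Unlike 4.0 it does not specify a lib directory. Never mind, copy it as is: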

<?xml version="1.0" encoding="UTF-8" ?>
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>

  <shardHandlerFactory name="shardHandlerFactory"
    class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>
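
The 4.0-style lib directory does have a counterpart in this format: the sharedLib setting described in the same "Format of solr.xml" documentation. A minimal sketch, assuming a folder named lib under the Solr home (the folder name is an assumption for illustration, not part of the setup described here):

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<solr>
  <!-- Assumption: jars shared by all cores are placed in D:\search\solr\home\lib;
       a relative sharedLib path is resolved against the Solr home -->
  <str name="sharedLib">lib</str>

  <!-- the solrcloud and shardHandlerFactory sections stay as in the listing above -->
</solr>
```

This post does not rely on sharedLib; the jars are loaded through a lib directive in solrconfig.xml instead.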

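Next go into D:\search\solr-6.1.0\example\example-DIH\solr\db\conf and keep copying.
My earlier post said a core needs solrconfig.xml, data-config.xml and schema.xml. The example has solrconfig.xml and db-data-config.xml, which could be copied over, but both differ a little from 4.0 and are full of demo-specific settings. So I start from a simpler file instead: D:\search\solr-6.1.0\server\solr\configsets\basic_configs\conf\solrconfig.xml. As the name suggests, it is the most basic configuration, so that is the one I use, still comparing it against the demo while copying.
The example-DIH solrconfig.xml has a directive for loading lib jars, so copy that over: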

<lib dir="D:\search\apache-tomcat-8.0.36\webapps\solr\WEB-INF\lib" regex=".*\.jar" />
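
The regex attribute can also narrow the match to specific jars. As a hedged alternative sketch, the two DIH jars could be loaded straight from the unpacked distribution instead of being copied; the path below assumes the D:\search layout used in this post, and it only makes sense if the same jars are not also present in WEB-INF\lib:

```xml
<!-- Sketch: load only the DataImportHandler jars from the dist folder.
     The regex matches solr-dataimporthandler-6.1.0.jar and
     solr-dataimporthandler-extras-6.1.0.jar -->
<lib dir="D:\search\solr-6.1.0\dist" regex="solr-dataimporthandler-.*\.jar" />
```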

The core here is ask, so change dataDir to ${solr.ask.data.dir:}
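
In solrconfig.xml that change is a single element. A minimal sketch of the line, with the substitution syntax spelled out in a comment:

```xml
<!-- ${name:default}: use the system property solr.ask.data.dir when it is set.
     The default after the colon is empty here, in which case Solr falls back
     to the data folder inside the core's own directory -->
<dataDir>${solr.ask.data.dir:}</dataDir>
```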

Next configure the MySQL import, copying the configuration from my earlier 4.0 post:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">data-config.xml</str>
    </lst>
</requestHandler>

Look at this comment in the demo's solrconfig.xml:

 <!-- To enable dynamic schema REST APIs, use the following for <schemaFactory>:

       <schemaFactory class="ManagedIndexSchemaFactory">
         <bool name="mutable">true</bool>
         <str name="managedSchemaResourceName">managed-schema</str>
       </schemaFactory>

       When ManagedIndexSchemaFactory is specified, Solr will load the schema from
       the resource named in 'managedSchemaResourceName', rather than from schema.xml.
       Note that the managed schema resource CANNOT be named schema.xml.  If the managed
       schema does not exist, Solr will create it after reading schema.xml, then rename
       'schema.xml' to 'schema.xml.bak'. 

       Do NOT hand edit the managed schema - external modifications will be ignored and
       overwritten as a result of schema modification REST API calls.

       When ManagedIndexSchemaFactory is specified with mutable = true, schema
       modification REST API calls will be allowed; otherwise, error responses will be
       sent back for these requests. 
  -->

That settles where schema.xml fits in; create an empty one for now.
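
The comment above describes the managed schema. If the plan is to keep editing schema.xml by hand, one option is to declare the classic factory explicitly in solrconfig.xml. This is a sketch rather than a required step: without any schemaFactory element, Solr 6 uses the managed schema, reads schema.xml once and renames it to schema.xml.bak, as the comment explains.

```xml
<!-- Sketch: always load the schema from conf/schema.xml and
     disable the schema-modification REST API -->
<schemaFactory class="ClassicIndexSchemaFactory"/>
```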
With that, solrconfig.xml is basically done.
Now edit data-config.xml:

<dataConfig>
    <dataSource name="CWDB" type="JdbcDataSource" driver="com.mysql.cj.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/cw?useUnicode=true&amp;serverTimezone=GMT&amp;characterEncoding=UTF-8&amp;useSSL=false" user="root"
                password="root"/>
  <document>
    <entity name="ask" pk="id" dataSource="CWDB"
            query="select id,classifyId,title,digest,userId,answerNum,solved,closed,createTime from question where deleted=0"
            deltaImportQuery="select id,classifyId,title,digest,userId,answerNum,solved,closed,createTime from question where deleted=0 and id='${dih.delta.id}'"
            deletedPkQuery="select id from question where deleted=1"
            deltaQuery="select id from question where deleted=0 and updateTime &gt; '${dih.last_index_time}'"> 
            <field column="id" name="id"/>          
            <field column="classifyId" name="classifyId"/>          
            <field column="title" name="title"/>
            <field column="digest" name="digest"/>
            <field column="userId" name="userId"/>
            <field column="answerNum" name="answerNum"/>
            <field column="solved" name="solved"/>
            <field column="closed" name="closed"/>
            <field column="createTime" name="createTime"/>
        </entity>
  </document>
</dataConfig>
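
For a large question table, the full import can run out of memory because the MySQL driver buffers the whole result set by default. A hedged sketch of the usual DIH workaround is to add batchSize="-1" to the data source, which makes JdbcDataSource ask the driver to stream rows. The connection values are the ones from this post, and the name must match the entity's dataSource attribute:

```xml
<!-- Sketch only: same connection settings, plus batchSize="-1"
     so that rows are streamed instead of loaded all at once -->
<dataSource name="CWDB" type="JdbcDataSource" driver="com.mysql.cj.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/cw?useUnicode=true&amp;serverTimezone=GMT&amp;characterEncoding=UTF-8&amp;useSSL=false"
            batchSize="-1"
            user="root" password="root"/>
```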

Edit schema.xml, again copied from my 4.0 setup. With mmseg4j-solr 2.3.0, the mmseg4j-analysis-1.9.1-SNAPSHOT.jar package is no longer needed.

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="askSchema" version="1.6">
 <types>  
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="8" positionIncrementGap="0"/>

    <fieldType name="text" class="solr.TextField" >  
      <analyzer>  
        <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word"/>  
      </analyzer>  
   </fieldType>
 </types>

 <fields>
    <field name="id" type="integer" indexed="true" stored="true" required="true"/>
    <field name="classifyId" type="integer" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="digest" type="text" indexed="true" stored="true"/>
    <field name="userId" type="integer" indexed="true" stored="true"/>
    <field name="answerNum" type="integer" indexed="true" stored="true"/>
    <field name="solved" type="boolean" indexed="true" stored="true"/>
    <field name="closed" type="boolean" indexed="true" stored="true"/>
    <field name="createTime" type="date" indexed="false" stored="true"/>
    <field name="am" type="text" indexed="true" stored="false" multiValued="true"/> 
    <field name="_version_" type="long" indexed="true" stored="false" />
    <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
 </fields>
 <copyField source="title" dest="am" />
 <copyField source="digest" dest="am" />
 <!-- field to use to determine and enforce document uniqueness. -->
 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>title</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="AND"/>

</schema>
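
The defaultSearchField and solrQueryParser elements still work in 6.x, but they are deprecated in favor of request parameters. The same defaults can be set as df and q.op on the search handler in solrconfig.xml. A sketch, assuming the stock /select handler and the same choices as in the schema (title as the default field, AND as the default operator):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- replaces <defaultSearchField>title</defaultSearchField> -->
    <str name="df">title</str>
    <!-- replaces <solrQueryParser defaultOperator="AND"/> -->
    <str name="q.op">AND</str>
  </lst>
</requestHandler>
```

Setting df to am instead would search the combined title and digest copy field by default.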

That is basically it. Start Tomcat and open http://localhost:8080/solr/index.html
Add a core named ask.

Start the data import to build the index,
then check whether the imported data was segmented into words correctly.

That covers the basics; for anything more advanced, read the documentation and copy from it.
