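I needed to build search again and found that Solr is already at 6.0, which differs enough from 4.5 that I am writing the steps down.
Solr 6.0 uses Servlet 3.0 and needs JDK 1.8 and Tomcat 8 (here: apache-tomcat-8.0.36 and jdk1.8.0_102).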
http://mirrors.hust.edu.cn/apache/tomcat/tomcat-8/v8.0.36/bin/apache-tomcat-8.0.36.zip
http://apache.fayea.com/lucene/solr/6.1.0/solr-6.1.0.zip
Preparation
- Download Solr from http://apache.fayea.com/lucene/solr/6.1.0/, put it under D:\search and unzip it.
- Collect the jars:
  - MySQL: mysql-connector-java-6.0.2.jar
  - Word segmentation: mmseg4j-core-with-dic-1.8.6.jar, mmseg4j-solr-2.3.0.jar
  - From D:\search\solr-6.1.0\dist: solr-dataimporthandler-6.1.0.jar and solr-dataimporthandler-extras-6.1.0.jar
  - All jars under D:\search\solr-6.1.0\server\lib\ext
- Create the folder solr/home under D:\search.
- Rename the webapp folder under D:\search\solr-6.1.0\server\solr-webapp to solr and copy it to D:\search\apache-tomcat-8.0.36\webapps.
- Edit D:\search\apache-tomcat-8.0.36\webapps\solr\WEB-INF\web.xml: find the env-entry block and uncomment it:
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>D:\search\solr\home</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
Copy D:\search\solr-6.1.0\server\resources\log4j.properties to
D:\search\apache-tomcat-8.0.36\webapps\solr\WEB-INF\classes.
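The copy steps above can also be scripted. Here is a minimal sketch in Python (3.8 or later), not part of the original setup. `prepare_tomcat` is a hypothetical helper name. It assumes that every collected jar goes into the webapp's WEB-INF\lib folder, which the list above does not say explicitly. It does not touch web.xml, so the env-entry still has to be uncommented by hand.

```python
import shutil
from pathlib import Path


def prepare_tomcat(solr_dist: Path, tomcat: Path, solr_home: Path, extra_jars=()):
    """Copy the Solr webapp, jars and log4j config into a Tomcat install."""
    webapp = tomcat / "webapps" / "solr"
    # server/solr-webapp/webapp is copied under the new name webapps/solr
    shutil.copytree(solr_dist / "server" / "solr-webapp" / "webapp",
                    webapp, dirs_exist_ok=True)

    lib = webapp / "WEB-INF" / "lib"  # assumption: all jars end up here
    lib.mkdir(parents=True, exist_ok=True)
    jars = list((solr_dist / "server" / "lib" / "ext").glob("*.jar"))
    jars += (solr_dist / "dist").glob("solr-dataimporthandler-*.jar")
    jars += [Path(j) for j in extra_jars]  # e.g. MySQL driver, mmseg4j jars
    for jar in jars:
        shutil.copy2(jar, lib / jar.name)

    classes = webapp / "WEB-INF" / "classes"
    classes.mkdir(parents=True, exist_ok=True)
    shutil.copy2(solr_dist / "server" / "resources" / "log4j.properties", classes)

    # Empty Solr home; cores such as "ask" are created inside it later
    solr_home.mkdir(parents=True, exist_ok=True)
    return webapp
```

For the layout in this post the arguments would be `Path(r"D:\search\solr-6.1.0")`, `Path(r"D:\search\apache-tomcat-8.0.36")` and `Path(r"D:\search\solr\home")`, with the MySQL driver and the two mmseg4j jars passed as `extra_jars`.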
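With the preparation done, you can read the Solr docs and copy from the bundled examples as you go.
Some parts are still much the same as in Solr 4.5.
Look at the example under D:\search\solr-6.1.0\example\example-DIH\solr
and copy from it. In my case the search is for a Q&A site,
so I created an ask folder under D:\search\solr\home
with two new directories inside it, conf and data.
Copy solr.xml from the example to D:\search\solr\home. It differs from 4.0: open it in an editor and there is hardly anything in it.
See the docs at https://cwiki.apache.org/confluence/display/solr/Solr+Cores+and+solr.xml
and follow the "Format of solr.xml" link for the file format. Unlike 4.0, no lib directory is specified there. Never mind, just copy it: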
<?xml version="1.0" encoding="UTF-8" ?>
<solr>
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">${jetty.port:8983}</int>
<str name="hostContext">${hostContext:solr}</str>
<int name="zkClientTimeout">${zkClientTimeout:15000}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
</solr>
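Next, go into D:\search\solr-6.1.0\example\example-DIH\solr\db\conf and keep copying.
As I wrote in an earlier blog post, you need solrconfig.xml, data-config.xml and schema.xml. The example has solrconfig.xml and db-data-config.xml, so those could be copied over, though both differ a little from 4.0. Looking inside, they are full of settings specific to the demo, so I picked a simpler starting point: D:\search\solr-6.1.0\server\solr\configsets\basic_configs\conf\solrconfig.xml, which by its name is the most basic one. It still has to be compared against the demo while copying.
The example-DIH solrconfig.xml has a directive for loading lib jars, so copy that over: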
<lib dir="D:\search\apache-tomcat-8.0.36\webapps\solr\WEB-INF\lib" regex=".*\.jar" />
The core here is ask, so change dataDir to ${solr.ask.data.dir:}
Next, configure the MySQL data import, copying the 4.0 configuration from my earlier blog post:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
Look at this comment in the demo:
<!-- To enable dynamic schema REST APIs, use the following for <schemaFactory>:
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
When ManagedIndexSchemaFactory is specified, Solr will load the schema from
the resource named in 'managedSchemaResourceName', rather than from schema.xml.
Note that the managed schema resource CANNOT be named schema.xml. If the managed
schema does not exist, Solr will create it after reading schema.xml, then rename
'schema.xml' to 'schema.xml.bak'.
Do NOT hand edit the managed schema - external modifications will be ignored and
overwritten as a result of schema modification REST API calls.
When ManagedIndexSchemaFactory is specified with mutable = true, schema
modification REST API calls will be allowed; otherwise, error responses will be
sent back for these requests.
-->
So schema.xml is accounted for; create an empty one for now as a placeholder.
That is more or less all for solrconfig.xml.
Edit data-config.xml:
<dataConfig>
<dataSource name="CWDB" type="JdbcDataSource" driver="com.mysql.cj.jdbc.Driver"
url="jdbc:mysql://localhost:3306/cw?useUnicode=true&amp;serverTimezone=GMT&amp;characterEncoding=UTF-8&amp;useSSL=false" user="root"
password="root"/>
<document>
<entity name="ask" pk="id" dataSource="CWDB"
query="select id,classifyId,title,digest,userId,answerNum,solved,closed,createTime from question where deleted=0"
deltaImportQuery="select id,classifyId,title,digest,userId,answerNum,solved,closed,createTime from question where deleted=0 and id='${dih.delta.id}'"
deletedPkQuery="select id from question where deleted=1"
deltaQuery="select id from question where deleted=0 and updateTime > '${dih.last_index_time}'">
<field column="id" name="id"/>
<field column="classifyId" name="classifyId"/>
<field column="title" name="title"/>
<field column="digest" name="digest"/>
<field column="userId" name="userId"/>
<field column="answerNum" name="answerNum"/>
<field column="solved" name="solved"/>
<field column="closed" name="closed"/>
<field column="createTime" name="createTime"/>
</entity>
</document>
</dataConfig>
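Two things are easy to get wrong in this file: a raw `&` inside the JDBC URL (in an XML attribute it has to be written as `&amp;`), and an entity whose `dataSource` attribute does not match the `name` of any declared `<dataSource>`. The sketch below is a small pre-flight check using only the Python standard library. `check_data_config` is a hypothetical helper, not something that ships with Solr.

```python
import xml.etree.ElementTree as ET


def check_data_config(xml_text: str) -> list:
    """Return a list of human-readable problems found in a DIH config."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A raw '&' in the JDBC URL (instead of '&amp;') ends up here.
        return [f"not well-formed XML: {exc}"]

    # Names of all declared data sources
    declared = {ds.get("name") for ds in root.iter("dataSource")}
    problems = []
    for entity in root.iter("entity"):
        ref = entity.get("dataSource")
        if ref is not None and ref not in declared:
            problems.append(
                f"entity '{entity.get('name')}' uses undeclared dataSource '{ref}'"
            )
    return problems
```

An empty list means neither problem was found. It says nothing about whether the SQL or the JDBC settings are right.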
Edit schema.xml, again copied from the old 4.0 setup. With mmseg4j 2.3.0, the mmseg4j-analysis-1.9.1-SNAPSHOT.jar package is no longer needed.
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="askSchema" version="1.6">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="integer" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="text" class="solr.TextField" >
<analyzer>
<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="id" type="integer" indexed="true" stored="true" required="true"/>
<field name="classifyId" type="integer" indexed="true" stored="true"/>
<field name="title" type="text" indexed="true" stored="true"/>
<field name="digest" type="text" indexed="true" stored="true"/>
<field name="userId" type="integer" indexed="true" stored="true"/>
<field name="answerNum" type="integer" indexed="true" stored="true"/>
<field name="solved" type="boolean" indexed="true" stored="true"/>
<field name="closed" type="boolean" indexed="true" stored="true"/>
<field name="createTime" type="date" indexed="false" stored="true"/>
<field name="am" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="_version_" type="long" indexed="true" stored="false" />
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
</fields>
<copyField source="title" dest="am" />
<copyField source="digest" dest="am" />
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>title</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND"/>
</schema>
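Since the schema is copied by hand from an older setup, a quick consistency check can catch a field that refers to a missing type or a copyField that points at a missing field. This is a minimal sketch under simple assumptions: `check_schema` is a hypothetical helper, it only looks at explicit `<field>` declarations, and it does not understand dynamic fields or wildcard copyField sources.

```python
import xml.etree.ElementTree as ET


def check_schema(xml_text: str) -> list:
    """Return a list of dangling type and copyField references."""
    root = ET.fromstring(xml_text)
    # Collect type names under either spelling of the element name
    types = {t.get("name")
             for tag in ("fieldType", "fieldtype")
             for t in root.iter(tag)}
    # Map each declared field name to the type it asks for
    fields = {f.get("name"): f.get("type") for f in root.iter("field")}

    problems = [
        f"field '{name}' has undeclared type '{ftype}'"
        for name, ftype in fields.items() if ftype not in types
    ]
    for copy in root.iter("copyField"):
        for attr in ("source", "dest"):
            if copy.get(attr) not in fields:
                problems.append(
                    f"copyField {attr} '{copy.get(attr)}' is not a declared field"
                )
    return problems
```

Pass it the contents of schema.xml; an empty list means every reference it knows how to check resolves.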
That is basically it. Start Tomcat and open http://localhost:8080/solr/index.html
Add a core named ask.
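The core can be added from the admin UI. As an alternative, Solr's CoreAdmin API accepts a CREATE request over HTTP. The sketch below builds that request with the Python standard library. The helper names are hypothetical, and it assumes the ask folder with its conf directory already exists under the Solr home.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR = "http://localhost:8080/solr"  # base URL used in this post


def core_create_url(name: str, instance_dir: str, base: str = SOLR) -> str:
    """Build a CoreAdmin CREATE URL for an existing instance directory."""
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir}
    return f"{base}/admin/cores?{urlencode(params)}"


def create_core(name: str, instance_dir: str) -> str:
    """Send the CREATE request and return Solr's raw response body."""
    with urlopen(core_create_url(name, instance_dir)) as resp:
        return resp.read().decode("utf-8")
```

Calling `create_core("ask", "ask")` needs Tomcat to be running; `core_create_url` alone can be used to inspect the request first.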
Start importing the data to build the index.
Then check whether the imported data was tokenized correctly.
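Both steps can be done from the admin UI (the Dataimport and Analysis screens of the core). They can also be driven over HTTP, roughly as sketched below. The sketch assumes the `/dataimport` handler from solrconfig.xml and a field analysis handler reachable at `/analysis/field`; if the latter is not registered in your configuration, the request will fail. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR = "http://localhost:8080/solr"  # base URL used in this post


def dataimport_url(core: str, command: str, base: str = SOLR, **options) -> str:
    """Build a DataImportHandler URL, e.g. command='full-import' or 'status'."""
    params = {"command": command, **options}
    return f"{base}/{core}/dataimport?{urlencode(params)}"


def field_analysis_url(core: str, field_type: str, text: str, base: str = SOLR) -> str:
    """Build a URL that asks Solr to run `text` through a field type's analyzer."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": text,
        "wt": "json",
    }
    return f"{base}/{core}/analysis/field?{urlencode(params)}"


def fetch(url: str) -> str:
    """Send a GET request and return the raw response body."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

A typical sequence would be `fetch(dataimport_url("ask", "full-import", clean="true", commit="true"))`, then polling `dataimport_url("ask", "status")` until the import is idle, then fetching `field_analysis_url("ask", "text", ...)` with a sample sentence and reading the tokens in the response.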
That covers the basics; for anything more advanced, read the docs and copy from there.