Apache SOLR and Carrot2 integration strategies 1

最新推荐文章于 2018-03-20 15:11:00 发布

ylzhjlinux

最新推荐文章于 2018-03-20 15:11:00 发布

阅读量107

点赞数

分类专栏： Solr Lucene Carrot 文章标签：人工智能 git java

本文链接：https://blog.csdn.net/ylzhjlinux/article/details/84682788

版权

Solr 同时被 3 个专栏收录

41 篇文章 0 订阅

订阅专栏

Lucene

9 篇文章 0 订阅

订阅专栏

Carrot

3 篇文章 0 订阅

订阅专栏

deploy carrot2-webapp

1. download soucre code

#git clone git://github.com/carrot2/carrot2.git

2.compile

#cd carrot2

#ant webapp

3.deploy

#cp tmp/webapp/carrot2-webapp.war /path/to/tomcat/webapps

4.configure carrot2

#cd /path/to/tomcat/webapps/carrot2-webapp/WEB-INF/suites

#mv suite-webapp.xml suite-webapp.xml.old

#cp source-solr.xml suite-webapp.xml

alter it like this:

<component-suite>
  <sources>
    <source component-class="org.carrot2.source.solr.SolrDocumentSource" id="solr"
            attribute-sets-resource="source-solr-attributes.xml">
      <label>Solr</label>
      <title>Solr Search Engine</title>
      <icon-path>icons/solr.png</icon-path>
      <mnemonic>s</mnemonic>
      <description>Solr document source queries an instance of Apache Solr search engine.</description>
      <example-queries>
        <example-query>test</example-query>
        <example-query>solr</example-query>
      </example-queries>
    </source>
  </sources>

  <include suite="algorithm-lingo.xml"></include>

</component-suite>

4. edit source-solr-attributes.xml

<attribute-sets default="overridden-attributes">
  <attribute-set id="overridden-attributes">
    <value-set>
      <label>overridden-attributes</label>
      <attribute key="SolrDocumentSource.serviceUrlBase">
        <value type="java.lang.String" value="http://192.168.10.204:8983/inokarticle/clustering"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrSummaryFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrTitleFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>

5. edit algorithm-lingo-attributes.xml algorithm-lingo.xml

－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－

integrate with solr

1. configure solrconfig.xml

a. import related jars

  <lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
  <lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />

b. add component adn clustering requesthandler

<searchComponent name="clustering"
                   enable="true"
                   class="solr.clustering.ClusteringComponent" >
    <lst name="engine">
      <str name="name">lingo</str>
      <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
      <str name="carrot.resourcesDir">clustering/carrot2</str>
      <str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>
      <str name="PreprocessingPipeline.tokenizerFactory">org.carrot2.text.linguistic.DefaultTokenizerFactory</str>

      </lst>

  </searchComponent>
<requestHandler name="/clustering"
                  startup="lazy"
                  enable="true"
                  class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="clustering">true</bool>
      <str name="clustering.engine">lingo</str>
      <bool name="clustering.results">true</bool>
      <!-- Field name with the logical "title" of a each document (optional) -->
      <str name="carrot.title">content</str>
      <!-- Field name with the logical "URL" of a each document (optional) -->
      <str name="carrot.url">id</str>
      <!-- Field name with the logical "content" of a each document (optional) -->
      <str name="carrot.snippet">content</str>
      <!-- Apply highlighter to the title/ content and use this for clustering. -->
      <bool name="carrot.produceSummary">true</bool>
      <!-- the maximum number of labels per cluster -->
      <int name="carrot.numDescriptions">5</int>
      <!-- produce sub clusters -->
      <bool name="carrot.outputSubClusters">true</bool>
      <str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>

      <!-- Configure the remaining request handler parameters. -->
      <str name="defType">edismax</str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>

2.custom chinese tokenizer for clustering

a. modify related carrot souce code and recompile

b. copy related jars and lexicon to solr web lib dir

Details see Apache SOLR and Carrot2 integration strategies 2

References

http://wiki.apache.org/solr/ClusteringComponent

http://www.cnblogs.com/cy163/archive/2010/05/07/1730172.html

http://carrot2.github.io/solr-integration-strategies/carrot2-3.8.0/index.html

http://download.carrot2.org/head/manual/index.html#section.advanced-topics.building-from-source-code

http://www.cnblogs.com/shm10/p/3700604.html

ylzhjlinux

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Apache SOLR and Carrot2 integration strategies 1

deploy carrot2-webapp1. download soucre code#git clone git://github.com/carrot2/carrot2.git 2.compile#cd carrot2#ant webapp 3.deploy#cp tmp/webapp/carrot2-webapp.war /path/to/tom...
复制链接

扫一扫