Automatic clustering in Solr

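        Solr implements clustering through Carrot2, which automatically groups the retrieved documents into categories. [Figure: Carrot2 clustering example]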

        To enable clustering in Solr, first copy dist/solr-clustering-4.2.0.jar from the Solr distribution into \solr\contrib\analysis-extras\lib, then open solrconfig.xml and add the following configuration:

<searchComponent name = "clustering" enable = "${solr.clustering.enabled:true}" class = "solr.clustering.ClusteringComponent">

<lst name = "engine">

<str name = "name">default</str>

<str name = "carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

<str name = "LingoClusteringAlgorithm.desiredClusterCountBase">30</str> <!-- 2~100 -->

<str name = "LingoClusteringAlgorithm.clusterMergingThreshold">0.70</str> <!-- 0~1 -->

<str name = "LingoClusteringAlgorithm.scoreWeight">0</str> <!-- 0~1 -->

<str name = "LingoClusteringAlgorithm.labelAssigner">org.carrot2.clustering.lingo.SimpleLabelAssigner</str> <!-- or org.carrot2.clustering.lingo.UniqueLabelAssigner -->

<str name = "LingoClusteringAlgorithm.phraseLabelBoost">1.5</str> <!-- 0~10 -->

<str name = "LingoClusteringAlgorithm.phraseLengthPenaltyStart">8</str> <!-- 2~8 -->

<str name = "LingoClusteringAlgorithm.phraseLengthPenaltyStop">8</str> <!-- 2~8 -->

<str name = "TermDocumentMatrixReducer.factorizationQuality">HIGH</str> <!-- LOW,MEDIUM,HIGH -->
.......

</lst>

</searchComponent>
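
        As an alternative to copying the jar by hand, solrconfig.xml can load jars through lib directives. The sketch below follows the pattern used in the stock Solr 4.x example configuration. The relative paths are an assumption: they only resolve if the core sits three levels below the Solr installation root, so adjust them to your layout. The clustering contrib also depends on the Carrot2 jars shipped under contrib/clustering/lib, which the first directive picks up.

```xml
<!-- Assumed layout: <solr-root>/example/solr/<core>/conf/solrconfig.xml -->
<!-- Carrot2 and the other dependencies of the clustering contrib -->
<lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
<!-- The clustering component itself, e.g. solr-clustering-4.2.0.jar -->
<lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />
```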

         With the clustering component configured, next set up the requestHandler:

<requestHandler name = "/clustering" startup = "lazy" enable = "${solr.clustering.enabled:true}" class = "solr.SearchHandler">

<lst name = "defaults">

<str name = "echoParams">explicit</str>

<bool name = "clustering">true</bool>

<str name = "clustering.engine">default</str>

<bool name = "clustering.results">true</bool>

<str name = "carrot.title">category_s</str>

<str name = "carrot.snippet">content</str>

<str name = "carrot.url">path</str>

<str name = "carrot.produceSummary">true</str>

</lst>

<arr name = "last-components">

<str>clustering</str>

</arr>

</requestHandler>
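
        The handler above maps carrot.title, carrot.snippet and carrot.url to the fields category_s, content and path. The original post does not show its schema, so the following schema.xml fragment is only a sketch of what stored definitions for those fields might look like. The field types (string, text_general) and the dynamic field rule are assumptions borrowed from the stock Solr example schema.

```xml
<!-- schema.xml (sketch): field types are assumptions, not taken from the original -->
<field name="content" type="text_general" indexed="true" stored="true"/>
<field name="path" type="string" indexed="true" stored="true"/>
<!-- category_s would be covered by a dynamic field rule such as this one -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
```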

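        Two parameters deserve attention: carrot.title and carrot.snippet name the fields whose text is compared when computing the clusters. Both fields must be declared with stored="true", and carrot.title carries more weight than carrot.snippet. If only one field is used for the computation, carrot.snippet can be dropped (remove the entry entirely rather than leaving its value empty). Once this is set up, the following URL runs a clustered query.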

http://localhost:8080/skyCore/clustering?q=*%3A*&wt=xml&indent=true
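
        For orientation, the clustering component appends a clusters section to the normal search response. The fragment below only sketches its general shape as I understand it for Solr 4.x. The label and document ids are placeholders rather than real output, and an actual response may carry further entries per cluster.

```xml
<!-- Shape only: LABEL_1, DOC_ID_A and DOC_ID_B are placeholders, not real output -->
<arr name="clusters">
  <lst>
    <arr name="labels">
      <str>LABEL_1</str>
    </arr>
    <arr name="docs">
      <str>DOC_ID_A</str>
      <str>DOC_ID_B</str>
    </arr>
  </lst>
</arr>
```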
