Problems the search system has run into
1. Resolved
http://192.168.239.128:8985/solr/pconline_cms/select?q.op=AND&useSynonym=true&fq=%2Bfilter%3A1+%2Bis_wap%3A0+AND+NOT+a_type%3Ay+AND+NOT+pathIds%3A000095064&facet=true&facet.field=cluster_category&facet.mincount=1&facet.limit=15&jsonq={%22_st_%22%3A%22BOOL%22%2C%22es%22%3A[{%22q%22%3A{%22_st_%22%3A%22BOOL%22%2C%22es%22%3A[{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22title%22%2C%22_st_%22%3A%22TEXT%22%2C%22bst%22%3A10%2C%22value%22%3A[%22%E4%B8%89%E6%98%9F%22]}}%2C{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22keyword%22%2C%22_st_%22%3A%22TEXT%22%2C%22bst%22%3A1.0E-12%2C%22value%22%3A[%22%E4%B8%89%E6%98%9F%22]}}%2C{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22text%22%2C%22_st_%22%3A%22TEXT%22%2C%22bst%22%3A1.0E-12%2C%22value%22%3A[%22%E4%B8%89%E6%98%9F%22]}}]}}%2C{%22q%22%3A{%22_st_%22%3A%22BOOL%22%2C%22es%22%3A[{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22title%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A10%2C%22s%22%3A0%2C%22value%22%3A[%22A5%22]}}%2C{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22keyword%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A1.0E-12%2C%22s%22%3A0%2C%22value%22%3A[%22A5%22]}}%2C{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22text%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A1.0E-12%2C%22s%22%3A0%2C%22value%22%3A[%22A5%22]}}]}}]}&sort=score+desc%2Csort_field+desc&hl=true&hl.simple.pre=%3Cfont+color%3D%22%23e10900%22%3E&hl.simple.post=%3C%2Ffont%3E&hl.fl=title&hl.fl=summary&f.title.hl.alternateField=title&f.summary.hl.alternateField=summary&f.summary.hl.maxAlternateFieldLength=120&hl.fragsize=120&start=0&rows=10&fl=a_id%2Cpub_url%2Cpub_date%2Cguide_pic_url&enableElevation=true&elevateIncludes=7337441%2C7321201%2C7306601%2C7298420&forceElevation=true
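elevateIncludes=7337441%2C7321201%2C7306601%2C7298420 lists the IDs of the articles to pin to the top of the results. With this parameter present, the request returned an error.
The bug was introduced while coding the Solr plugin.
Stock Solr takes its elevation (paid-ranking) settings from a file. Our extended plugin accepts them as parameters on the Solr request instead, which is more flexible. Unfortunately the extension had a bug.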
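For reference, a minimal sketch of how a client might assemble the elevation parameters seen in the request above. The parameter names (enableElevation, forceElevation, elevateIncludes) are copied from that request; the helper name build_elevation_params and the base parameters are hypothetical, and the sketch only builds the query string, it does not call Solr.

```python
from urllib.parse import urlencode, parse_qs


def build_elevation_params(article_ids, base_params=None):
    """Return a URL-encoded query string that pins the given article IDs.

    Hypothetical helper: it mirrors the parameters visible in the request
    above and says nothing about how the plugin handles them server-side.
    """
    params = dict(base_params or {})
    params.update({
        "enableElevation": "true",
        "forceElevation": "true",
        # The plugin-specific parameter: a comma-separated list of IDs.
        "elevateIncludes": ",".join(str(i) for i in article_ids),
    })
    return urlencode(params)


query_string = build_elevation_params(
    [7337441, 7321201, 7306601, 7298420],
    {"q.op": "AND", "rows": 10},
)
print(query_string)
```

urlencode escapes the commas as %2C, which matches the form of the parameter in the failing request.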
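2. Resolved, though the fix is imperfect.
A search for "华硕Z10PA-D8" (ASUS Z10PA-D8) returned no results. The cause was the analyzer: it joined "pa" and "d" into a new token "pad", and "pad" happens to have the synonym "百黛", which corrupted the query expression.
The Solr analyzer configuration is shown below (setting catenateAll and catenateWords to 0 turns off this catenation):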
<fieldType name="text_cn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.henry.IKTokenizerFactory" useSmart="false" config="IKAnalyzer.cfg.xml" site="pconline" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            catenateWords="1"
            preserveOriginal="1"
            generateWordParts="1"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            catenateAll="1"/>
    <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.henry.IKTokenizerFactory" useSmart="false" config="IKAnalyzer.cfg.xml" site="pconline" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            catenateWords="1"
            preserveOriginal="1"
            generateWordParts="1"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            catenateAll="1"/>
    <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
  </analyzer>
</fieldType>
The resulting tokens (IKT is the output of the IK tokenizer, WDF the output of WordDelimiterFilter):
IKT | 华硕 | z | 10 | pa-d | 8 | |||
WDF | 华硕 | z | 10 | pa-d | pa | d | pad | 8 |
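The catenation step can be illustrated with a deliberately simplified model. This is not Lucene's WordDelimiterFilter; it is a sketch that only splits on non-alphanumeric characters (consistent with splitOnCaseChange="0" and splitOnNumerics="0") and folds catenateWords and catenateAll into a single hypothetical flag, catenate. The function names are made up for illustration.

```python
import re


def word_delimiter(token, generate_word_parts=True, catenate=True,
                   preserve_original=True):
    """Simplified stand-in for a word-delimiter filter applied to one token.

    Splits only on non-alphanumeric ASCII characters. Tokens without such a
    delimiter (including pure CJK tokens) pass through unchanged.
    """
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if len(parts) <= 1:
        return [token]
    out = []
    if preserve_original:
        out.append(token)               # e.g. "pa-d"
    if generate_word_parts:
        out.extend(parts)               # e.g. "pa", "d"
    if catenate:
        out.append("".join(parts))      # e.g. "pad", the problematic token
    return out


def analyze(tokens, **options):
    """Apply word_delimiter to every token and flatten the result."""
    return [t for token in tokens for t in word_delimiter(token, **options)]


ikt_tokens = ["华硕", "z", "10", "pa-d", "8"]
print(analyze(ikt_tokens))                  # catenation on, as configured
print(analyze(ikt_tokens, catenate=False))  # catenation off
```

With catenation on, the model reproduces the WDF row above, including "pad". With it off, "pad" is never produced, so the synonym "百黛" has nothing to attach to.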
The Solr query request:
http://192.168.239.192:8986/solr/pconline_product/select?q.op=AND&useSynonym=true&fl=title%2Csummary%2Cid%2Cpic%2Cseries_name%2Cseries_id%2Cpub_url%2Cdate%2Cprice%2Ceyp_count%2Chot_new%2CsubTitle&start=0&rows=24&hl=true&hl.fragsize=5000&hl.fl=title&hl.fl=summary&hl.simple.pre=%3Cem+class%3D%22red%22%3E&hl.simple.post=%3C%2Fem%3E&facet=true&facet.field=cluster_smalltype&facet.missing=false&facet.mincount=1&facet.limit=15&jsonq={%22_st_%22%3A%22BOOL%22%2C%22es%22%3A[{%22q%22%3A{%22_st_%22%3A%22BOOL%22%2C%22es%22%3A[{%22q%22%3A{%22_et%22%3A%22DISJUNCTION%22%2C%22_fs%22%3A{%22title%22%3A10%2C%22keyword%22%3A1.0E-5%2C%22params%22%3A1.0E-5}%2C%22_st_%22%3A%22EXPAND%22%2C%22_v%22%3A%22%E5%8D%8E%E7%A1%95%22}}%2C{%22q%22%3A{%22_st_%22%3A%22DJ_MAX%22%2C%22qs%22%3A[{%22_f%22%3A%22title%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A100%2C%22s%22%3A0%2C%22value%22%3A[%22Z10PA-D8%22]}%2C{%22_f%22%3A%22keyword%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A1.0E-11%2C%22s%22%3A0%2C%22value%22%3A[%22Z10PA-D8%22]}%2C{%22_f%22%3A%22params%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A1.0E-11%2C%22s%22%3A0%2C%22value%22%3A[%22Z10PA-D8%22]}]%2C%22tb%22%3A0}}]}}]}&sort=score+desc%2Caccess+desc%2Cdate+desc&debug=true
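The jsonq parameter is hard to read in its URL-encoded form. A small sketch for decoding it, using one PHRASE clause copied from the request above as input. The helper name decode_jsonq is hypothetical. Judging from the parsed query shown further down, bst looks like the boost and s like the slop, but that reading is an assumption.

```python
import json
from urllib.parse import unquote

# One PHRASE clause copied verbatim from the jsonq parameter above.
encoded_clause = (
    "{%22_f%22%3A%22title%22%2C%22_st_%22%3A%22PHRASE%22%2C"
    "%22bst%22%3A100%2C%22s%22%3A0%2C%22value%22%3A[%22Z10PA-D8%22]}"
)


def decode_jsonq(encoded):
    """Percent-decode a jsonq value (or fragment) and parse it as JSON."""
    return json.loads(unquote(encoded))


clause = decode_jsonq(encoded_clause)
print(json.dumps(clause, indent=2, ensure_ascii=False))
```

The same function can be applied to the full jsonq value to inspect the nested BOOL, EXPAND and DJ_MAX clauses when debugging a request.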
The parsed query expression:
+DisjunctionMaxQuery(
(
(params:华硕^1.0E-5 | title:华硕^10.0 | keyword:华硕^1.0E-5) |
(params:asus^1.0E-5 | title:asus^10.0 | keyword:asus^1.0E-5) |
(params:asus^1.0E-5 | title:asus^10.0 | keyword:asus^1.0E-5) |
(params:asu^1.0E-5 | title:asu^10.0 | keyword:asu^1.0E-5) |
(params:aus^1.0E-5 | title:aus^10.0 | keyword:aus^1.0E-5)
)
) +DisjunctionMaxQuery(
(
spanNear([
spanNear([title:z, title:10], 0, true),
spanOr([title:pad, spanNear([title:百, title:黛], 0, true)]), title:8], 0, true)^100.0 |
spanNear([
spanNear([keyword:z, keyword:10], 0, true),
spanOr([keyword:pad, spanNear([keyword:百, keyword:黛], 0, true)]), keyword:8], 0, true)^1.0E-11 |
spanNear([
spanNear([params:z, params:10], 0, true),
spanOr([params:pad, spanNear([params:百, params:黛], 0, true)]), params:8], 0, true)^1.0E-11
)
)