Notes on problems encountered at work

Some problems that have come up in the search system

1. Resolved

http://192.168.239.128:8985/solr/pconline_cms/select?q.op=AND&useSynonym=true&fq=%2Bfilter%3A1+%2Bis_wap%3A0+AND+NOT+a_type%3Ay+AND+NOT+pathIds%3A000095064&facet=true&facet.field=cluster_category&facet.mincount=1&facet.limit=15&jsonq={%22_st_%22%3A%22BOOL%22%2C%22es%22%3A[{%22q%22%3A{%22_st_%22%3A%22BOOL%22%2C%22es%22%3A[{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22title%22%2C%22_st_%22%3A%22TEXT%22%2C%22bst%22%3A10%2C%22value%22%3A[%22%E4%B8%89%E6%98%9F%22]}}%2C{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22keyword%22%2C%22_st_%22%3A%22TEXT%22%2C%22bst%22%3A1.0E-12%2C%22value%22%3A[%22%E4%B8%89%E6%98%9F%22]}}%2C{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22text%22%2C%22_st_%22%3A%22TEXT%22%2C%22bst%22%3A1.0E-12%2C%22value%22%3A[%22%E4%B8%89%E6%98%9F%22]}}]}}%2C{%22q%22%3A{%22_st_%22%3A%22BOOL%22%2C%22es%22%3A[{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22title%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A10%2C%22s%22%3A0%2C%22value%22%3A[%22A5%22]}}%2C{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22keyword%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A1.0E-12%2C%22s%22%3A0%2C%22value%22%3A[%22A5%22]}}%2C{%22ope%22%3A%22SHOULD%22%2C%22q%22%3A{%22_f%22%3A%22text%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A1.0E-12%2C%22s%22%3A0%2C%22value%22%3A[%22A5%22]}}]}}]}&sort=score+desc%2Csort_field+desc&hl=true&hl.simple.pre=%3Cfont+color%3D%22%23e10900%22%3E&hl.simple.post=%3C%2Ffont%3E&hl.fl=title&hl.fl=summary&f.title.hl.alternateField=title&f.summary.hl.alternateField=summary&f.summary.hl.maxAlternateFieldLength=120&hl.fragsize=120&start=0&rows=10&fl=a_id%2Cpub_url%2Cpub_date%2Cguide_pic_url&enableElevation=true&elevateIncludes=7337441%2C7321201%2C7306601%2C7298420&forceElevation=true

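elevateIncludes=7337441%2C7321201%2C7306601%2C7298420 lists the IDs of the articles to pin to the top of the results. With it, the request returned an error.

The cause was a bug introduced while coding the Solr plugin.

Stock Solr configures paid ranking (elevation) through a file. With the extended plugin, the parameters can be passed through the Solr query API instead, which is more flexible. Unfortunately, the plugin had a bug.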
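As a rough illustration of passing the pinned IDs as request parameters, here is a minimal sketch that builds the elevation part of the query string. The parameter names are copied from the request above; the helper name `elevation_params` is hypothetical, and how the plugin parses the value on the server side is not shown here.

```python
from urllib.parse import urlencode


def elevation_params(article_ids):
    """Build the elevation part of the query string from a list of article IDs."""
    return urlencode({
        "enableElevation": "true",
        # elevateIncludes is the parameter name seen in the request above;
        # the IDs are joined with commas, which urlencode escapes as %2C.
        "elevateIncludes": ",".join(str(i) for i in article_ids),
        "forceElevation": "true",
    })


print(elevation_params([7337441, 7321201, 7306601, 7298420]))
```

The printed string can be appended to the rest of the select URL, so the pinned articles can change per request without editing a file on the Solr server.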

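2. Resolved, but the fix is imperfect.

Searching for “华硕Z10PA-D8” returned no matches. The root cause was the analyzer: it combined “pa” and “d” into a new token “pad”, and “pad” happens to have the synonym “百黛”, which produced an incorrect query expression.

Below is the Solr analyzer configuration (setting both catenateAll and catenateWords to 0 is enough to stop the combined tokens from being generated):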

    <fieldType name="text_cn" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="org.wltea.analyzer.henry.IKTokenizerFactory" useSmart="false" config="IKAnalyzer.cfg.xml" site="pconline" ignoreCase="true"/>
            <!-- catenateWords / catenateAll join adjacent parts, e.g. "pa" + "d" becomes "pad" -->
            <filter class="solr.WordDelimiterFilterFactory"
                    catenateWords="1"
                    preserveOriginal="1"
                    generateWordParts="1"
                    splitOnCaseChange="0"
                    splitOnNumerics="0"
                    catenateAll="1"/>
            <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
        </analyzer>
        <analyzer type="query">
            <tokenizer class="org.wltea.analyzer.henry.IKTokenizerFactory" useSmart="false" config="IKAnalyzer.cfg.xml" site="pconline" ignoreCase="true"/>
            <filter class="solr.WordDelimiterFilterFactory"
                    catenateWords="1"
                    preserveOriginal="1"
                    generateWordParts="1"
                    splitOnCaseChange="0"
                    splitOnNumerics="0"
                    catenateAll="1"/>
            <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
        </analyzer>
    </fieldType>

Below is the tokenization result:

IKT: 华硕z10pa-d8
WDF: 华硕z10pa-dpadpad8
As shown, IKT (the IK tokenizer) made no segmentation error, while WDF (WordDelimiterFilterFactory) combined “pa” and “d” into the new token “pad”.
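The catenation behaviour can be sketched with a simplified model. This is not the Lucene implementation: it only splits a single token on non-alphanumeric characters and ignores case changes, numerics, and token positions. The function name `word_delimiter` and the input token "pa-d" are assumptions made for illustration.

```python
import re


def word_delimiter(token, generate_word_parts=True, catenate_words=True,
                   preserve_original=True):
    """Simplified model of delimiter splitting and catenation for one token."""
    # Split on anything that is not an ASCII letter or digit (e.g. the hyphen).
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if len(parts) < 2:
        # Nothing to split, so the token passes through unchanged.
        return [token]
    out = []
    if preserve_original:
        out.append(token)
    if generate_word_parts:
        out.extend(parts)
    if catenate_words:
        # This is the step that turns "pa" + "d" into the unwanted "pad".
        out.append("".join(parts))
    return out


print(word_delimiter("pa-d"))                        # ['pa-d', 'pa', 'd', 'pad']
print(word_delimiter("pa-d", catenate_words=False))  # ['pa-d', 'pa', 'd']
```

With catenation switched off, “pad” is never emitted, so the synonym “百黛” is not pulled into the query.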

Below is the Solr query request:
http://192.168.239.192:8986/solr/pconline_product/select?q.op=AND&useSynonym=true&fl=title%2Csummary%2Cid%2Cpic%2Cseries_name%2Cseries_id%2Cpub_url%2Cdate%2Cprice%2Ceyp_count%2Chot_new%2CsubTitle&start=0&rows=24&hl=true&hl.fragsize=5000&hl.fl=title&hl.fl=summary&hl.simple.pre=%3Cem+class%3D%22red%22%3E&hl.simple.post=%3C%2Fem%3E&facet=true&facet.field=cluster_smalltype&facet.missing=false&facet.mincount=1&facet.limit=15&jsonq={%22_st_%22%3A%22BOOL%22%2C%22es%22%3A[{%22q%22%3A{%22_st_%22%3A%22BOOL%22%2C%22es%22%3A[{%22q%22%3A{%22_et%22%3A%22DISJUNCTION%22%2C%22_fs%22%3A{%22title%22%3A10%2C%22keyword%22%3A1.0E-5%2C%22params%22%3A1.0E-5}%2C%22_st_%22%3A%22EXPAND%22%2C%22_v%22%3A%22%E5%8D%8E%E7%A1%95%22}}%2C{%22q%22%3A{%22_st_%22%3A%22DJ_MAX%22%2C%22qs%22%3A[{%22_f%22%3A%22title%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A100%2C%22s%22%3A0%2C%22value%22%3A[%22Z10PA-D8%22]}%2C{%22_f%22%3A%22keyword%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A1.0E-11%2C%22s%22%3A0%2C%22value%22%3A[%22Z10PA-D8%22]}%2C{%22_f%22%3A%22params%22%2C%22_st_%22%3A%22PHRASE%22%2C%22bst%22%3A1.0E-11%2C%22s%22%3A0%2C%22value%22%3A[%22Z10PA-D8%22]}]%2C%22tb%22%3A0}}]}}]}&sort=score+desc%2Caccess+desc%2Cdate+desc&debug=true
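The jsonq parameter in the request above is percent-encoded JSON, which is hard to read inline. A small sketch for decoding it, using one PHRASE clause copied from that request (the helper name `decode_jsonq` is hypothetical):

```python
import json
from urllib.parse import unquote


def decode_jsonq(encoded):
    """Percent-decode a jsonq value (or a fragment of one) and parse it as JSON."""
    return json.loads(unquote(encoded))


# One PHRASE clause copied from the jsonq parameter of the request above.
clause = ("{%22_f%22%3A%22title%22%2C%22_st_%22%3A%22PHRASE%22%2C"
          "%22bst%22%3A100%2C%22s%22%3A0%2C%22value%22%3A[%22Z10PA-D8%22]}")

print(json.dumps(decode_jsonq(clause), indent=2))
```

Decoded, the clause is a phrase query on the title field for "Z10PA-D8" with "bst" 100 and "s" 0, which lines up with the ^100.0 boost and the slop of 0 in the spanNear on title in the parsed query.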

Below is the parsed query expression:

+DisjunctionMaxQuery(
    (
        (params:华硕^1.0E-5 | title:华硕^10.0 | keyword:华硕^1.0E-5) |
        (params:asus^1.0E-5 | title:asus^10.0 | keyword:asus^1.0E-5) |
        (params:asus^1.0E-5 | title:asus^10.0 | keyword:asus^1.0E-5) |
        (params:asu^1.0E-5 | title:asu^10.0 | keyword:asu^1.0E-5) |
        (params:aus^1.0E-5 | title:aus^10.0 | keyword:aus^1.0E-5)
    )
) +DisjunctionMaxQuery(
    (
        spanNear([
            spanNear([title:z, title:10], 0, true),
            spanOr([title:pad, spanNear([title:百, title:黛], 0, true)]), title:8], 0, true)^100.0 |
        spanNear([
            spanNear([keyword:z, keyword:10], 0, true),
            spanOr([keyword:pad, spanNear([keyword:百, keyword:黛], 0, true)]), keyword:8], 0, true)^1.0E-11 |
        spanNear([
            spanNear([params:z, params:10], 0, true),
            spanOr([params:pad, spanNear([params:百, params:黛], 0, true)]), params:8], 0, true)^1.0E-11
    )
)
