Advanced, Part 11: Deep Dive into Search Technology. Hands-On Case: Using the tie_breaker Parameter to Improve dis_max Search Results

 

1. Search for posts whose title or content contains java beginner

GET /forum/article/_search
{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "java beginner" }},
                { "match": { "content": "java beginner" }}
            ]
        }
    }
}

Result:

{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.26742277,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.26742277,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.19856805,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 0.155468,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog",
          "content": "elasticsearch and hadoop are all very good solution, i am a beginner"
        }
      }
    ]
  }
}

 

 

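Some scenarios are hard to reproduce: you have to construct different texts and then craft searches against them until you get the effect you want to show.

A situation that can come up in practice looks like this: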

 

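(1) A post, doc1, has java in its title, and its content contains neither java nor beginner

(2) A post, doc2, has beginner in its content, and its title contains neither keyword

(3) A post, doc3, has java in its title and beginner in its content

(4) In the final search, doc1 and doc2 may end up ranked ahead of doc3, rather than doc3 ranking first as we would expect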

 

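dis_max simply takes the score of the best-scoring query. Seen that way, the best single-field score of (1), (2) and (3) can come out about the same, so doc3 gets no extra credit for matching in both title and content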
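
To make the tie concrete, here is a minimal Python sketch of how plain dis_max treats the three posts described above. The per-field scores are made-up numbers chosen for illustration (real scores come from Elasticsearch's relevance scoring), and dis_max_score is a hypothetical helper, not an Elasticsearch API.

```python
# Hypothetical per-field match scores for doc1, doc2 and doc3.
# The values are invented for illustration only.
field_scores = {
    "doc1": {"title": 0.5, "content": 0.0},  # java in title only
    "doc2": {"title": 0.0, "content": 0.5},  # beginner in content only
    "doc3": {"title": 0.5, "content": 0.5},  # java in title, beginner in content
}


def dis_max_score(scores):
    """Plain dis_max: keep only the best sub-query score, ignore the rest."""
    return max(scores.values())


plain = {doc: dis_max_score(scores) for doc, scores in field_scores.items()}
print(plain)  # {'doc1': 0.5, 'doc2': 0.5, 'doc3': 0.5}
```

Under these assumed numbers all three posts tie, so doc3 has no guaranteed lead even though it matches in both fields.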

 

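2. dis_max takes only the highest score among its queries and completely ignores the scores of the other queries

3. Use tie_breaker to take the scores of the other queries into account as well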

 

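What tie_breaker does: the scores of the other queries are multiplied by tie_breaker and then added to the score of the best-scoring query, so the final score is the best score plus tie_breaker times the sum of the other matching queries' scores

So besides the highest score, the scores of the other queries are taken into account too

tie_breaker is a decimal between 0 and 1; it defaults to 0, which is plain dis_max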
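
The calculation can be sketched in a few lines of Python. This is only an illustration of the formula (best score plus tie_breaker times the sum of the other scores); the sub-query scores are made-up numbers, and dis_max_with_tie_breaker is a hypothetical helper rather than anything Elasticsearch exposes.

```python
# Hypothetical sub-query scores as [title score, content score].
# The values are invented for illustration only.
sub_scores = {
    "doc1": [0.5, 0.0],  # java in title only
    "doc2": [0.0, 0.5],  # beginner in content only
    "doc3": [0.5, 0.5],  # java in title, beginner in content
}


def dis_max_with_tie_breaker(scores, tie_breaker=0.0):
    """Best sub-query score plus tie_breaker times the sum of the other scores."""
    best = max(scores)
    others = sum(scores) - best
    return best + tie_breaker * others


final = {
    doc: dis_max_with_tie_breaker(scores, tie_breaker=0.3)
    for doc, scores in sub_scores.items()
}
ranking = sorted(final, key=final.get, reverse=True)
print(ranking)  # doc3 now ranks first
```

With tie_breaker at 0 the helper reduces to plain dis_max, and at 1 it simply sums all sub-query scores, so values in between trade off "best field wins" against "every matching field counts".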

GET /forum/article/_search
{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "java beginner" }},
                { "match": { "content": "java beginner" }}
            ],
            "tie_breaker": 0.3
        }
    }
}

 

Result:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.26742277,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.26742277,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.19856805,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 0.155468,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog",
          "content": "elasticsearch and hadoop are all very good solution, i am a beginner"
        }
      }
    ]
  }
}
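
If you would rather assemble this kind of request body in code than type it by hand, a small Python sketch could look like the following. build_dis_max_query is a hypothetical helper name invented here, the field names title and content are taken from the sample documents, and the sketch only builds and prints the JSON body without sending it to a cluster.

```python
import json


def build_dis_max_query(text, fields, tie_breaker=None):
    """Build a dis_max request body with one match query per field.

    Hypothetical helper for illustration; it only produces the JSON body.
    """
    queries = [{"match": {field: text}} for field in fields]
    dis_max = {"queries": queries}
    # Leave tie_breaker out when it is not given, which means plain dis_max.
    if tie_breaker is not None:
        dis_max["tie_breaker"] = tie_breaker
    return {"query": {"dis_max": dis_max}}


body = build_dis_max_query("java beginner", ["title", "content"], tie_breaker=0.3)
print(json.dumps(body, indent=4))
```

Keeping tie_breaker as an optional argument makes it easy to run the same search with and without it and compare the rankings.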

 

 

 

 
