Elasticsearch Study Notes - 11: Testing _score with Examples

# Setup
PUT {{host}}:{{port}}/demo
{
    "mappings":{
        "article":{
            "properties":{
                "content":{
                    "type":"text"
                }
            }
        }
    }
}
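# Import data (sample text is kept in Chinese: "测试语句N" means "test sentence N"; document 3 appends "字段长度不同", "different field length")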
[
  {
    "content": "测试语句1"
  },
  {
    "content": "测试语句2"
  },
  {
    "content": "测试语句3,字段长度不同"
  }
]
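The notes do not say how these documents were loaded. One possible route, sketched here under that assumption, is the `_bulk` API, which expects newline-delimited JSON: an action line followed by the document source. The helper name `to_bulk_body` is hypothetical.

```python
import json

# The three sample documents from the array above.
docs = [
    {"content": "测试语句1"},
    {"content": "测试语句2"},
    {"content": "测试语句3,字段长度不同"},
]


def to_bulk_body(documents):
    """Build an NDJSON body: one action line, then one source line, per document."""
    lines = []
    for doc in documents:
        # An empty "index" action lets Elasticsearch assign the _id.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = to_bulk_body(docs)
print(bulk_body)
```

The resulting text could then be sent as the body of POST {{host}}:{{port}}/demo/article/_bulk with the Content-Type header set to application/x-ndjson.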
# Query
POST {{host}}:{{port}}/demo/article/_search
{
    "query":{
        "match":{
            "content":"测"
        }
    }
}
# Result:
{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 0.2824934,
        "hits": [
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ90700f4t28Wzjdj",
                "_score": 0.2824934,
                "_source": {
                    "content": "测试语句2"
                }
            },
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ71f00f4t28WzjZT",
                "_score": 0.21247853,
                "_source": {
                    "content": "测试语句1"
                }
            },
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIRAEw00f4t28Wzjkd",
                "_score": 0.1293895,
                "_source": {
                    "content": "测试语句3,字段长度不同"
                }
            }
        ]
    }
}

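Strangely, sentence 1 and sentence 2 get different scores! The two documents have the same term frequency and the same field length, and one would expect the same inverse document frequency (IDF) as well, so why do the scores differ?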

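The main reason is that each shard computes a local IDF from only the documents stored on that shard. When documents land on different shards, their IDF values differ, and so do their scores.

See the official explanation.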
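
To make the shard effect concrete, here is a small sketch of the IDF term of BM25, assuming the BM25 similarity that has been the default since Elasticsearch 5.0. The shard layout used below (document 2 alone on one shard, documents 1 and 3 together on another) is an assumption; it is one layout consistent with the scores reported above, not something the response states.

```python
import math


def idf(doc_count, doc_freq):
    """BM25 inverse document frequency for a term.

    doc_count: number of documents with the field (per shard, or global)
    doc_freq:  number of those documents that contain the term
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# The term "测" occurs in every document, so doc_freq == doc_count each time.
idf_shard_with_doc2 = idf(1, 1)         # ASSUMED: shard holding only document 2
idf_shard_with_docs_1_and_3 = idf(2, 2) # ASSUMED: shard holding documents 1 and 3
idf_global = idf(3, 3)                  # statistics over the whole index

print(round(idf_shard_with_doc2, 4))          # 0.2877
print(round(idf_shard_with_docs_1_and_3, 4))  # 0.1823
print(round(idf_global, 4))                   # 0.1335
```

The same term gets a different IDF depending on which documents share its shard, which is enough to give two otherwise identical documents different scores.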

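Once an index holds a large number of documents spread evenly across its shards, the local IDF values converge and the difference becomes negligible. For this small demo, we can add

?search_type=dfs_query_then_fetch

to the request, so that term statistics are first collected from all shards and every document is scored with a global IDF.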

POST {{host}}:{{port}}/demo/article/_search?search_type=dfs_query_then_fetch
{
    "query":{
        "match":{
            "content":"测"
        }
    }
}
# Result:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 0.14899126,
        "hits": [
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ71f00f4t28WzjZT",
                "_score": 0.14899126,
                "_source": {
                    "content": "测试语句1"
                }
            },
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ90700f4t28Wzjdj",
                "_score": 0.14899126,
                "_source": {
                    "content": "测试语句2"
                }
            },
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIRAEw00f4t28Wzjkd",
                "_score": 0.087505676,
                "_source": {
                    "content": "测试语句3,字段长度不同"
                }
            }
        ]
    }
}

As expected, documents 1 and 2 now have the same score, while document 3 scores lower because its field is longer.
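
The global scores can be approximated by hand with the BM25 formula. The sketch below makes several assumptions that are not in the original notes: the default parameters k1 = 1.2 and b = 0.75, one indexed term per character (5 terms for documents 1 and 2, 11 for document 3, so the average length is 7), and "effective" field lengths of about 5.22 and 16. Those effective lengths were chosen so that the sketch lines up with the reported scores; they presumably reflect lossy storage of field-length norms, but the source does not confirm this.

```python
import math

K1, B = 1.2, 0.75  # ASSUMED: BM25 default parameters


def idf(doc_count, doc_freq):
    """BM25 inverse document frequency."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_norm(freq, field_len, avg_field_len):
    """BM25 term-frequency factor with field-length normalisation."""
    return freq * (K1 + 1) / (freq + K1 * (1 - B + B * field_len / avg_field_len))


# Global statistics: 3 documents, all containing the term "测".
DOC_COUNT, DOC_FREQ = 3, 3
AVG_LEN = (5 + 5 + 11) / 3  # ASSUMED token counts, one term per character

# ASSUMED effective lengths (fitted to the reported scores, see lead-in).
EFFECTIVE_LEN = {
    "doc1": 1 / 0.4375 ** 2,  # about 5.22 for a 5-term field
    "doc2": 1 / 0.4375 ** 2,
    "doc3": 16.0,             # for the 11-term field
}

scores = {
    name: idf(DOC_COUNT, DOC_FREQ) * tf_norm(1, length, AVG_LEN)
    for name, length in EFFECTIVE_LEN.items()
}

for name, score in scores.items():
    print(name, round(score, 4))  # doc1 0.149, doc2 0.149, doc3 0.0875
```

With a shared IDF, the only remaining difference between the documents is the length normalisation, which is why documents 1 and 2 tie and document 3 falls behind.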

Next, test the effect of a query-time boost

POST {{host}}:{{port}}/demo/article/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "content": {
              "query": "1",
              "boost": 2 
            }
          }
        },
        {
          "match": { 
            "content": "2"
          }
        }
      ]
    }
  }
}
# Result:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 2.1887734,
        "hits": [
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ71f00f4t28WzjZT",
                "_score": 2.1887734,
                "_source": {
                    "content": "测试语句1"
                }
            },
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ90700f4t28Wzjdj",
                "_score": 1.0943867,
                "_source": {
                    "content": "测试语句2"
                }
            }
        ]
    }
}

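Because the keyword 1 is given a higher boost, document 1 scores higher than document 2 (exactly twice as high here, matching the boost of 2). The details of the calculation can be inspected by adding ?explain to the request.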
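
The arithmetic behind this result can be sketched as follows. The helper `bool_should_score` is hypothetical and only models the behaviour visible in the response: each matching should clause contributes its base score multiplied by its boost. The base score is taken from document 2's reported score, on the assumption that both clauses have the same unboosted score because "1" and "2" each occur in exactly one document of the same length.

```python
def bool_should_score(clauses):
    """Sum boost * base_score over the should clauses that matched.

    clauses: iterable of (base_score, boost, matched) tuples.
    """
    return sum(boost * base for base, boost, matched in clauses if matched)


# ASSUMED: unboosted clause score, read off document 2's reported _score.
BASE = 1.0943867

# Document 1 matches only the clause for "1" (boost 2).
doc1 = bool_should_score([(BASE, 2.0, True), (BASE, 1.0, False)])
# Document 2 matches only the clause for "2" (default boost 1).
doc2 = bool_should_score([(BASE, 2.0, False), (BASE, 1.0, True)])

print(doc1, doc2)
```

Document 3 contains neither "1" nor "2", so no clause matches and it is absent from the hits.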

Other ways to modify the score

  • Boosting by popularity
  • Boosting filtered subsets
  • Random scoring
  • The closer, the better
  • Scoring with scripts