Elasticsearch Study Notes - 11: Hands-on Tests of _score

我请你们喝三鹿

Category: ElasticSearch

Article link: https://blog.csdn.net/u011682283/article/details/87517846

#Setup: create the index
PUT {{host}}:{{port}}/demo
{
    "mappings":{
        "article":{
            "properties":{
                "content":{
                    "type":"text"
                }
            }
        }
    }
}

#Import data
[
  {
    "content": "测试语句1"
  },
  {
    "content": "测试语句2"
  },
  {
    "content": "测试语句3，字段长度不同"
  }
]

#Query
POST {{host}}:{{port}}/demo/article/_search
{
    "query":{
        "match":{
            "content":"测"
        }
    }
}

#Test result:
{
    "took": 0,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 0.2824934,
        "hits": [
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ90700f4t28Wzjdj",
                "_score": 0.2824934,
                "_source": {
                    "content": "测试语句2"
                }
            },
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ71f00f4t28WzjZT",
                "_score": 0.21247853,
                "_source": {
                    "content": "测试语句1"
                }
            },
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIRAEw00f4t28Wzjkd",
                "_score": 0.1293895,
                "_source": {
                    "content": "测试语句3，字段长度不同"
                }
            }
        ]
    }
}

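Oddly, document 1 and document 2 receive different scores, even though their key scoring inputs (term frequency, field length, and inverse document frequency) should all be identical. Why do the computed scores differ?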

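The main reason is that each shard computes a local IDF from only the documents it holds. Documents that land on different shards therefore see different inverse document frequencies, which produces different scores.

This is also the official explanation.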
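
To make the shard effect concrete, here is a minimal Python sketch of the IDF term alone. It assumes the index uses the BM25 similarity (the default since Elasticsearch 5.0) and that the three documents ended up with one shard holding a single document and another shard holding two. That placement is a guess for illustration; the response above does not state it. `bm25_idf` is a hypothetical helper name.

```python
import math

def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25 inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Every document contains the query term, so doc_freq equals doc_count.
idf_shard_one_doc = bm25_idf(doc_count=1, doc_freq=1)   # assumed: shard with 1 document
idf_shard_two_docs = bm25_idf(doc_count=2, doc_freq=2)  # assumed: shard with 2 documents
idf_global = bm25_idf(doc_count=3, doc_freq=3)          # statistics over the whole index

print(idf_shard_one_doc, idf_shard_two_docs, idf_global)
```

Under these assumptions the same term gets a different IDF on each shard, so two otherwise identical documents can score differently depending on where they were routed.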

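Once the index holds many documents and they are evenly distributed across shards, these differences become negligible. For this small demo, append

?search_type=dfs_query_then_fetch

to the search URL, so that term statistics are first collected from all shards and a global IDF is used.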

POST {{host}}:{{port}}/demo/article/_search?search_type=dfs_query_then_fetch
{
    "query":{
        "match":{
            "content":"测"
        }
    }
}

#Test result:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 0.14899126,
        "hits": [
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ71f00f4t28WzjZT",
                "_score": 0.14899126,
                "_source": {
                    "content": "测试语句1"
                }
            },
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ90700f4t28Wzjdj",
                "_score": 0.14899126,
                "_source": {
                    "content": "测试语句2"
                }
            },
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIRAEw00f4t28Wzjkd",
                "_score": 0.087505676,
                "_source": {
                    "content": "测试语句3，字段长度不同"
                }
            }
        ]
    }
}

As expected, documents 1 and 2 now have the same score, while document 3 scores lower because its field is longer.
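
The length effect can be sketched with a textbook form of BM25. The snippet below is an illustration under assumptions, not Elasticsearch's exact computation: it assumes the default parameters k1 = 1.2 and b = 0.75, and it assumes the standard analyzer emits one token per CJK character (5 tokens for documents 1 and 2, 11 for document 3). The printed values will not exactly equal the scores in the response, because Elasticsearch stores field lengths in a lossy compressed form. `bm25_score` is a hypothetical helper name.

```python
import math

K1 = 1.2   # assumed BM25 default
B = 0.75   # assumed BM25 default

def bm25_score(freq, field_len, avg_field_len, doc_count, doc_freq):
    """Single-term BM25 score: IDF times the length-normalized term frequency."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf_norm = (freq * (K1 + 1)) / (freq + K1 * (1 - B + B * field_len / avg_field_len))
    return idf * tf_norm

# Assumed token counts per document (one token per character, punctuation dropped).
lengths = {"doc1": 5, "doc2": 5, "doc3": 11}
avg_len = sum(lengths.values()) / len(lengths)

# Global statistics, as with dfs_query_then_fetch: 3 documents, all contain the term.
scores = {name: bm25_score(1, n, avg_len, 3, 3) for name, n in lengths.items()}
print(scores)
```

With shared statistics, the two 5-token documents get the same score, and the 11-token document gets a lower one because its length exceeds the average.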

Next, test the effect of query-time boost

POST {{host}}:{{port}}/demo/article/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "content": {
              "query": "1",
              "boost": 2 
            }
          }
        },
        {
          "match": { 
            "content": "2"
          }
        }
      ]
    }
  }
}

#Test result:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 2.1887734,
        "hits": [
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ71f00f4t28WzjZT",
                "_score": 2.1887734,
                "_source": {
                    "content": "测试语句1"
                }
            },
            {
                "_index": "demo",
                "_type": "article",
                "_id": "AWEIQ90700f4t28Wzjdj",
                "_score": 1.0943867,
                "_source": {
                    "content": "测试语句2"
                }
            }
        ]
    }
}

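Because the search term 1 was given a higher boost, document 1 scores higher than document 2. In fact its score is exactly twice as high (2.1887734 versus 1.0943867). The full breakdown can be inspected with ?explain.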
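
The factor of two can be reproduced with a small sketch. It assumes that boost acts as a plain multiplier on a clause's score and that the scores of matching should clauses are added together, which agrees with the numbers in the response. It also assumes both terms have the same unboosted score, since each appears in exactly one document of the same length. `bool_should_score` is a hypothetical helper name.

```python
def bool_should_score(matched_clauses):
    """Sum base_score * boost over the should clauses a document matched."""
    return sum(base * boost for base, boost in matched_clauses)

# Unboosted single-term score, taken from document 2 in the response above.
base = 1.0943867

doc1 = bool_should_score([(base, 2)])  # matches only the clause for "1" (boost 2)
doc2 = bool_should_score([(base, 1)])  # matches only the clause for "2" (default boost)

print(doc1, doc2, doc1 / doc2)
```

A document matching both clauses would receive the sum of the two contributions under the same assumptions.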

Other ways to change scoring

Boosting by popularity
Boosting filtered subsets
Random scoring
The closer, the better
Scoring with scripts
