ElasticSearch--Explaining Queries

BtWangZhi

Article link: https://blog.csdn.net/BtWangZhi/article/details/91359445

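Explain how a query will be executed
The validate API with the explain parameter shows, per index, the Lucene query that will actually run, including the terms the analyzer produced from the search text: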

GET /_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "周杰伦",
      "type": "most_fields", 
      "fields": ["singer","wordAuthor"]
    }
  }
}

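Response. No index is given in the URL, so every index is validated, including .kibana, where neither field is mapped (hence MatchNoDocsQuery). In music, the analyzer split 周杰伦 into 周杰伦, 周杰 and 伦, and the (... | ...)~1.0 wrapper is a dis_max query with tie_breaker 1.0, so the per-field scores are added, which is what most_fields means: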

{
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": ".kibana",
      "valid": true,
      "explanation": """(MatchNoDocsQuery("unmapped field [singer]") | MatchNoDocsQuery("unmapped field [wordAuthor]"))~1.0"""
    },
    {
      "index": "music",
      "valid": true,
      "explanation": "((singer:周杰伦 singer:周杰 singer:伦) | (wordAuthor:周杰伦 wordAuthor:周杰 wordAuthor:伦))~1.0"
    }
  ]
}
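To read a response like this one programmatically, a minimal Python sketch could list each index's rewritten query, flag indices where it collapsed to MatchNoDocsQuery, and pull out the field:term pairs the analyzer produced. The function names summarize_validate and terms_by_field are hypothetical, and the regular expression is a rough heuristic that only handles simple term-level explanations like the ones above (not phrase syntax such as field:"a b").

```python
import re

def summarize_validate(response: dict) -> list:
    """Return (index, matches_nothing, explanation) for every entry in the
    'explanations' array of a _validate/query?explain response."""
    rows = []
    for entry in response.get("explanations", []):
        text = entry.get("explanation", "")
        # MatchNoDocsQuery appears when the queried field is not mapped in that index
        rows.append((entry["index"], "MatchNoDocsQuery" in text, text))
    return rows

def terms_by_field(explanation: str) -> dict:
    """Heuristic (hypothetical helper): collect field:term tokens from a rewritten query string."""
    terms = {}
    for field, term in re.findall(r"(\w+):([^\s()]+)", explanation):
        terms.setdefault(field, []).append(term)
    return terms

# The response shown above, already parsed into a dict (_shards omitted)
response = {
    "valid": True,
    "explanations": [
        {"index": ".kibana", "valid": True,
         "explanation": '(MatchNoDocsQuery("unmapped field [singer]") | '
                        'MatchNoDocsQuery("unmapped field [wordAuthor]"))~1.0'},
        {"index": "music", "valid": True,
         "explanation": "((singer:周杰伦 singer:周杰 singer:伦) | "
                        "(wordAuthor:周杰伦 wordAuthor:周杰 wordAuthor:伦))~1.0"},
    ],
}

for index, matches_nothing, text in summarize_validate(response):
    status = "matches nothing" if matches_nothing else terms_by_field(text)
    print(index, status)
```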

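Find out how the score of a search hit was computed
Setting "explain": true in a search request adds an _explanation to every hit; below is the one for a single hit: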

GET /music/doc/_search
{
  "query": {
    "match_phrase": {
      "singer": "周杰伦"
    }
  },
  "explain": true
}

"_explanation": {
          "value": 17.777544,
          "description": """weight(singer:"周杰伦 周杰 伦" in 7695) [PerFieldSimilarity], result of:""",
          "details": [
            {
              "value": 17.777544,
              "description": "score(doc=7695,freq=1.0 = phraseFreq=1.0\n), product of:",
              "details": [
                {
                  "value": 19.915054,
                  "description": "idf(), sum of:",
                  "details": [
                    {
                      "value": 7.076377,
                      "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                      "details": [
                        {
                          "value": 30,
                          "description": "docFreq",
                          "details": []
                        },
                        {
                          "value": 36101,
                          "description": "docCount",
                          "details": []
                        }
                      ]
                    },
                    {
                      "value": 7.076377,
                      "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                      "details": [
                        {
                          "value": 30,
                          "description": "docFreq",
                          "details": []
                        },
                        {
                          "value": 36101,
                          "description": "docCount",
                          "details": []
                        }
                      ]
                    },
                    {
                      "value": 5.7623005,
                      "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                      "details": [
                        {
                          "value": 113,
                          "description": "docFreq",
                          "details": []
                        },
                        {
                          "value": 36101,
                          "description": "docCount",
                          "details": []
                        }
                      ]
                    }
                  ]
                },
                {
                  "value": 0.8926686,
                  "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                  "details": [
                    {
                      "value": 1,
                      "description": "phraseFreq=1.0",
                      "details": []
                    },
                    {
                      "value": 1.2,
                      "description": "parameter k1",
                      "details": []
                    },
                    {
                      "value": 0.75,
                      "description": "parameter b",
                      "details": []
                    },
                    {
                      "value": 2.3185508,
                      "description": "avgFieldLength",
                      "details": []
                    },
                    {
                      "value": 3,
                      "description": "fieldLength",
                      "details": []
                    }
                  ]
                }
              ]
            }
          ]
        }
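The explanation is a tree: each node has a value, a description and child details. A minimal Python sketch (the helper names expl, idf_node, walk and recompute are hypothetical) that rebuilds the tree above with abbreviated descriptions, prints it indented, and recomputes the top score from the children of every "sum of" / "product of" node, which is a quick way to see which factor drives a score.

```python
import math

def expl(value, description, *details):
    """Build one explanation node in the same shape Elasticsearch returns."""
    return {"value": value, "description": description, "details": list(details)}

def idf_node(value, doc_freq, doc_count):
    return expl(value, "idf, computed as log(...) from:",
                expl(doc_freq, "docFreq"), expl(doc_count, "docCount"))

# The _explanation above, with long descriptions abbreviated
explanation = expl(
    17.777544, 'weight(singer:"周杰伦 周杰 伦" in 7695) [PerFieldSimilarity], result of:',
    expl(17.777544, "score(doc=7695,freq=1.0 = phraseFreq=1.0\n), product of:",
         expl(19.915054, "idf(), sum of:",
              idf_node(7.076377, 30, 36101),
              idf_node(7.076377, 30, 36101),
              idf_node(5.7623005, 113, 36101)),
         expl(0.8926686, "tfNorm, computed as ... from:",
              expl(1, "phraseFreq=1.0"), expl(1.2, "parameter k1"),
              expl(0.75, "parameter b"), expl(2.3185508, "avgFieldLength"),
              expl(3, "fieldLength"))))

def walk(node, depth=0):
    """Print every node, indented by its depth in the tree."""
    print("  " * depth + f"{node['value']:.7g}  " + node["description"].replace("\n", " "))
    for child in node["details"]:
        walk(child, depth + 1)

def recompute(node):
    """Recompute 'sum of' / 'product of' nodes from their children;
    formula nodes ('... from:') and leaves keep their reported value."""
    desc, kids = node["description"].rstrip(), node["details"]
    if kids and desc.endswith("product of:"):
        return math.prod(recompute(k) for k in kids)
    if kids and desc.endswith("sum of:"):
        return sum(recompute(k) for k in kids)
    if len(kids) == 1 and desc.endswith("result of:"):
        return recompute(kids[0])
    return node["value"]

walk(explanation)
print(recompute(explanation))  # close to the reported 17.777544
```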

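Scoring uses BM25, the default similarity since Elasticsearch 5.0. BM25 comes from the probabilistic relevance model: its score ranks documents by how likely they are to be relevant to the query, but the score itself is not a probability. The explanation has two parts, and the final score is their product (19.915054 × 0.8926686 ≈ 17.777544).
Part one is IDF, computed as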

log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))

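where docCount is the number of documents in the shard holding this document that have the field, and docFreq is the number of documents in that shard containing the term (a document count, not an occurrence count). log is the natural logarithm.
For a phrase, the IDF of every term is computed and the results are added: 7.076377 + 7.076377 + 5.7623005 ≈ 19.915054.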
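Plugging the docFreq and docCount values from the explain output into this formula reproduces the three IDF values and their sum. A minimal Python sketch; bm25_idf is a hypothetical helper name, and the natural logarithm is used because that is what matches the reported numbers.

```python
import math

def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF as printed in the explain output: log(1 + (N - n + 0.5) / (n + 0.5)),
    with N = docCount and n = docFreq (natural log)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# (docFreq, docCount) pairs copied from the output above, in the order they appear
stats = [(30, 36101), (30, 36101), (113, 36101)]
idfs = [bm25_idf(df, n) for df, n in stats]
print([round(x, 6) for x in idfs])  # per-term IDF
print(round(sum(idfs), 6))          # summed IDF of the phrase
```

A rarer term (smaller docFreq) gets a larger IDF: the two terms found in 30 documents each contribute more than the one found in 113.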
Part two is tfNorm, computed as

(freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))

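where freq is how often the term occurs in this document's field (for a phrase query it is phraseFreq, the number of times the whole phrase occurs, 1 here);
k1 (default 1.2) controls term-frequency saturation: the larger it is, the more extra occurrences keep raising the score;
b (default 0.75) controls how much field length matters: at b = 0 length is ignored, at b = 1 the score is fully normalized by fieldLength / avgFieldLength;
fieldLength is the number of terms in this document's field (3 here) and avgFieldLength is the average field length in the shard.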
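The same check for tfNorm, and for the final score as the product idf × tfNorm. A minimal Python sketch: bm25_tf_norm is a hypothetical helper, the defaults k1 = 1.2 and b = 0.75 are the ones shown in the output, and the summed IDF is taken directly from the "idf(), sum of:" node.

```python
def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    """tfNorm: term frequency saturated by k1 and normalized by field length via b."""
    length_norm = 1 - b + b * field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * length_norm)

tf = bm25_tf_norm(freq=1, field_length=3, avg_field_length=2.3185508)
idf_sum = 19.915054  # value of the "idf(), sum of:" node above
print(round(tf, 7), round(idf_sum * tf, 6))

# With fieldLength == avgFieldLength, each extra occurrence adds less,
# and tfNorm approaches k1 + 1 = 2.2
print([round(bm25_tf_norm(f, 3, 3), 3) for f in (1, 2, 5, 10)])
```

Here the field is slightly longer than average (3 terms vs about 2.32), so tfNorm falls just below 1.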
Reference: https://blog.csdn.net/hellozhxy/article/details/89387550

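Why was a document not matched?
Run the _explain API against that document's ID. The matched field in the response says whether the document matches the query, and explanation says why: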

GET /music/doc/52892/_explain
{
  "query": {
    "match_phrase": {
      "singer": "周杰伦"
    }
  }
}
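A minimal Python sketch, using only the standard library, that builds the same _explain request and summarizes the answer. The base URL http://localhost:9200 and the helper names build_explain_request and summarize_explain are assumptions; the request is sent with POST (which the _explain endpoint also accepts) because not every HTTP client supports a body on GET, and the sending lines are commented out since they need a running cluster.

```python
import json
import urllib.request

def build_explain_request(base_url, index, doc_type, doc_id, field, phrase):
    """Build (without sending) the _explain request shown above."""
    body = {"query": {"match_phrase": {field: phrase}}}
    return urllib.request.Request(
        f"{base_url}/{index}/{doc_type}/{doc_id}/_explain",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def summarize_explain(response: dict) -> str:
    """'matched' says whether the document matches; 'explanation' holds the reason tree."""
    explanation = response.get("explanation", {})
    if response.get("matched"):
        return f"matched, score={explanation.get('value')}"
    return f"not matched: {explanation.get('description', '')}"

# Hypothetical local cluster address
req = build_explain_request("http://localhost:9200", "music", "doc", "52892", "singer", "周杰伦")
# Sending requires a running cluster:
# with urllib.request.urlopen(req) as resp:
#     print(summarize_explain(json.load(resp)))
```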

To be continued
