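Explaining how a query will be executed
The validate API with the explain parameter shows how Elasticsearch rewrites a query into lower-level Lucene queries, without actually running it.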
GET /_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "周杰伦",
      "type": "most_fields",
      "fields": ["singer", "wordAuthor"]
    }
  }
}
Response:
{
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": ".kibana",
      "valid": true,
      "explanation": """(MatchNoDocsQuery("unmapped field [singer]") | MatchNoDocsQuery("unmapped field [wordAuthor]"))~1.0"""
    },
    {
      "index": "music",
      "valid": true,
      "explanation": "((singer:周杰伦 singer:周杰 singer:伦) | (wordAuthor:周杰伦 wordAuthor:周杰 wordAuthor:伦))~1.0"
    }
  ]
}
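The request above is not scoped to an index, so it is validated against every index, including .kibana, where neither field is mapped and the query is rewritten to MatchNoDocsQuery. As a rough sketch, the response can be scanned for such indices in Python. The helper name unmapped_indices is made up for this note, and the response dict is just the JSON above pasted in by hand.

```python
def unmapped_indices(validate_response):
    """Return the indices whose explanation was rewritten to MatchNoDocsQuery."""
    return [
        item["index"]
        for item in validate_response.get("explanations", [])
        if "MatchNoDocsQuery" in item.get("explanation", "")
    ]


# The validate response shown above, copied by hand (hypothetical variable name).
response = {
    "valid": True,
    "explanations": [
        {
            "index": ".kibana",
            "valid": True,
            "explanation": '(MatchNoDocsQuery("unmapped field [singer]") | '
                           'MatchNoDocsQuery("unmapped field [wordAuthor]"))~1.0',
        },
        {
            "index": "music",
            "valid": True,
            "explanation": "((singer:周杰伦 singer:周杰 singer:伦) | "
                           "(wordAuthor:周杰伦 wordAuthor:周杰 wordAuthor:伦))~1.0",
        },
    ],
}

print(unmapped_indices(response))
```

Scoping the request to the index, as in GET /music/_validate/query?explain, should avoid the noise from unrelated indices.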
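Seeing how the score of a search result is computed
Adding "explain": true to the search body makes every hit carry an _explanation of its score.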
GET /music/doc/_search
{
  "query": {
    "match_phrase": {
      "singer": "周杰伦"
    }
  },
  "explain": true
}
The _explanation of one hit, with the rest of the response omitted:
"_explanation": {
  "value": 17.777544,
  "description": """weight(singer:"周杰伦 周杰 伦" in 7695) [PerFieldSimilarity], result of:""",
  "details": [
    {
      "value": 17.777544,
      "description": "score(doc=7695,freq=1.0 = phraseFreq=1.0\n), product of:",
      "details": [
        {
          "value": 19.915054,
          "description": "idf(), sum of:",
          "details": [
            {
              "value": 7.076377,
              "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
              "details": [
                {
                  "value": 30,
                  "description": "docFreq",
                  "details": []
                },
                {
                  "value": 36101,
                  "description": "docCount",
                  "details": []
                }
              ]
            },
            {
              "value": 7.076377,
              "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
              "details": [
                {
                  "value": 30,
                  "description": "docFreq",
                  "details": []
                },
                {
                  "value": 36101,
                  "description": "docCount",
                  "details": []
                }
              ]
            },
            {
              "value": 5.7623005,
              "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
              "details": [
                {
                  "value": 113,
                  "description": "docFreq",
                  "details": []
                },
                {
                  "value": 36101,
                  "description": "docCount",
                  "details": []
                }
              ]
            }
          ]
        },
        {
          "value": 0.8926686,
          "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
          "details": [
            {
              "value": 1,
              "description": "phraseFreq=1.0",
              "details": []
            },
            {
              "value": 1.2,
              "description": "parameter k1",
              "details": []
            },
            {
              "value": 0.75,
              "description": "parameter b",
              "details": []
            },
            {
              "value": 2.3185508,
              "description": "avgFieldLength",
              "details": []
            },
            {
              "value": 3,
              "description": "fieldLength",
              "details": []
            }
          ]
        }
      ]
    }
  ]
}
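The nested _explanation is hard to read as raw JSON. A small recursive walk can flatten it into one indented line per node. This is only a sketch: the function name walk is made up here, and the sample is just one idf node copied from the output above, not the whole tree.

```python
def walk(node, depth=0):
    """Yield one indented line per node of an _explanation tree."""
    # Collapse embedded newlines so every node stays on a single line.
    description = " ".join(node["description"].split())
    yield "  " * depth + f"{node['value']}  {description}"
    for child in node.get("details", []):
        yield from walk(child, depth + 1)


# One idf node taken from the _explanation above.
sample = {
    "value": 7.076377,
    "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
    "details": [
        {"value": 30, "description": "docFreq", "details": []},
        {"value": 36101, "description": "docCount", "details": []},
    ],
}

lines = list(walk(sample))
print("\n".join(lines))
```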
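The score is computed with BM25, a probabilistic relevance model: the score can be read as an estimate of how likely the document is to be relevant to the query, although it is not a normalized probability. The explanation has two parts, which are multiplied together.
The first part is the IDF, computed as
log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
docCount is the number of documents in the hit's shard that have a value for the field.
docFreq is the number of documents in that shard that contain the term (not the total number of occurrences). The IDF is computed for each term of the query and the results are summed.
The second part is tfNorm, computed as
(freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
freq is the number of times the term occurs in this document's field; for a match_phrase query it is the phrase frequency (phraseFreq).
k1 controls how strongly term frequency affects the score; it defaults to 1.2.
b controls how strongly field length affects the score; it defaults to 0.75.
fieldLength is the number of terms in this document's field, and avgFieldLength is the average field length across the shard.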
Reference: https://blog.csdn.net/hellozhxy/article/details/89387550
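To check the two formulas against the _explanation above, the numbers can be plugged in by hand. The sketch below is plain Python with made-up helper names (idf, tf_norm); it is not how Elasticsearch implements scoring, it only replays the arithmetic. Elasticsearch works in single-precision floats, so expect the last digits to differ slightly.

```python
import math


def idf(doc_freq, doc_count):
    """IDF of one term, using the formula printed in the explanation."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    """Length-normalized term frequency, using the formula printed in the explanation."""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )


# docFreq / docCount pairs of the three terms in the phrase, from the explanation.
idf_sum = idf(30, 36101) + idf(30, 36101) + idf(113, 36101)

# phraseFreq, fieldLength and avgFieldLength, from the explanation.
tfn = tf_norm(freq=1.0, field_length=3, avg_field_length=2.3185508)

score = idf_sum * tfn
print(round(idf_sum, 4), round(tfn, 4), round(score, 4))
```

The three values should come out close to the 19.915054, 0.8926686 and 17.777544 reported by Elasticsearch.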
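Why a document was not returned by a query
The explain API evaluates a query against a single document, addressed by ID, and reports whether it matched and why.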
GET /music/doc/52892/_explain
{
  "query": {
    "match_phrase": {
      "singer": "周杰伦"
    }
  }
}
To be continued