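I. Usage
Elasticsearch provides the explain API for analyzing how a search is scored. It is typically used in one of the following two ways:
1. Enable the explain parameter on a search request. Every hit in the response is then extended with its scoring details.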
GET /my-index/_search
{
  "explain": true,
  "query": {
    "match": {
      "title": "文明"
    }
  },
  "size": 10,
  "from": 0,
  "sort": []
}
2. Explain how the query scores one specific document (here 0 is the document ID)
GET /my-index-000001/_explain/0
{
  "query": {
    "match": {
      "title": "文明"
    }
  }
}
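The two request shapes above can also be built from a script. The following is a minimal sketch using only Python's standard library. The host `http://localhost:9200` and the helper names (`build_search_with_explain`, `build_explain_request`, `send`) are assumptions made for illustration, and no authentication is handled. POST is used instead of GET because some HTTP clients drop the body of a GET request; Elasticsearch accepts both methods for these two endpoints.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local cluster without authentication.
ES_HOST = "http://localhost:9200"


def _json_request(path, body):
    """Build (but do not send) a POST request carrying a JSON body."""
    return urllib.request.Request(
        ES_HOST + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_search_with_explain(index, field, text, size=10):
    """Usage 1: a normal search with "explain": true."""
    body = {
        "explain": True,
        "query": {"match": {field: text}},
        "size": size,
        "from": 0,
    }
    index_part = urllib.parse.quote(index, safe="")
    return _json_request(f"/{index_part}/_search", body)


def build_explain_request(index, doc_id, field, text):
    """Usage 2: explain the score of a single document."""
    body = {"query": {"match": {field: text}}}
    index_part = urllib.parse.quote(index, safe="")
    id_part = urllib.parse.quote(str(doc_id), safe="")
    return _json_request(f"/{index_part}/_explain/{id_part}", body)


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With a cluster running, `send(build_explain_request("my-index-000001", "0", "title", "文明"))` would return the per-document explanation as a Python dict.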
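II. Parsing the explain response body
Elasticsearch may use different scoring methods depending on its version and configuration, so the formulas shown in the explanation of the explain API can differ.
Whatever the formula, just remember that the explain API exists to break down exactly how a search score was calculated.
For example, in the search above we query for "文明" with the standard analyzer. The top-level details array splits the query into two terms, '文' and '明', gives the score breakdown for each, and the description "sum of:" indicates that the two scores are added together.
In this example, the details under each weight entry hold the breakdown of that term's score.
The metrics needed at each level are nested in that level's details property.
Every step of the calculation is therefore contained in the explanation, which can be used to trace and recompute the score.
The JSON also states the score formula: boost * idf * tf. That is, the score is the product of boost, idf (a measure of how rare the term is), and tf (a measure of how often the term occurs in the document).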
{
  "value": 4.3822603,
  "description": "sum of:",
  "details": [
    {
      "value": 2.1578245,
      "description": "weight(to:文 in 178) [PerFieldSimilarity], result of:",
      "details": [
        {
          "value": 2.1578245,
          "description": "score(freq=1.0), computed as boost * idf * tf from:",
          "details": [
            {
              "value": 2.2,
              "description": "boost",
              "details": []
            },
            {
              "value": 1.3599529,
              "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
              "details": [
                {
                  "value": 413,
                  "description": "n, number of documents containing term",
                  "details": []
                },
                {
                  "value": 1610,
                  "description": "N, total number of documents with field",
                  "details": []
                }
              ]
            },
            {
              "value": 0.721223,
              "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
              "details": [
                {
                  "value": 1,
                  "description": "freq, occurrences of term within document",
                  "details": []
                },
                {
                  "value": 1.2,
                  "description": "k1, term saturation parameter",
                  "details": []
                },
                {
                  "value": 0.75,
                  "description": "b, length normalization parameter",
                  "details": []
                },
                {
                  "value": 104,
                  "description": "dl, length of field (approximate)",
                  "details": []
                },
                {
                  "value": 1081.6583,
                  "description": "avgdl, average length of field",
                  "details": []
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "value": 2.224436,
      "description": "weight(to:明 in 178) [PerFieldSimilarity], result of:",
      "details": [
        {
          "value": 2.224436,
          "description": "score(freq=1.0), computed as boost * idf * tf from:",
          "details": [
            {
              "value": 2.2,
              "description": "boost",
              "details": []
            },
            {
              "value": 1.4019344,
              "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
              "details": [
                {
                  "value": 396,
                  "description": "n, number of documents containing term",
                  "details": []
                },
                {
                  "value": 1610,
                  "description": "N, total number of documents with field",
                  "details": []
                }
              ]
            },
            {
              "value": 0.721223,
              "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
              "details": [
                {
                  "value": 1,
                  "description": "freq, occurrences of term within document",
                  "details": []
                },
                {
                  "value": 1.2,
                  "description": "k1, term saturation parameter",
                  "details": []
                },
                {
                  "value": 0.75,
                  "description": "b, length normalization parameter",
                  "details": []
                },
                {
                  "value": 104,
                  "description": "dl, length of field (approximate)",
                  "details": []
                },
                {
                  "value": 1081.6583,
                  "description": "avgdl, average length of field",
                  "details": []
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
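To check that the numbers in this explanation really follow from the formulas in its description fields, the score can be recomputed by hand. The sketch below plugs the reported values into the idf and tf formulas exactly as they are printed above. The function names are made up for illustration, and small differences in the last digits are expected because Elasticsearch computes with single-precision floats.

```python
import math


def bm25_idf(n, N):
    """idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq, k1, b, dl, avgdl):
    """tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def term_score(boost, n, N, freq, k1, b, dl, avgdl):
    """score, computed as boost * idf * tf."""
    return boost * bm25_idf(n, N) * bm25_tf(freq, k1, b, dl, avgdl)


# Values shared by both terms, copied from the explanation above.
shared = dict(boost=2.2, N=1610, freq=1, k1=1.2, b=0.75, dl=104, avgdl=1081.6583)

score_wen = term_score(n=413, **shared)   # term '文', reported as 2.1578245
score_ming = term_score(n=396, **shared)  # term '明', reported as 2.224436
total = score_wen + score_ming            # "sum of:", reported as 4.3822603

print(f"文: {score_wen:.6f}  明: {score_ming:.6f}  total: {total:.6f}")
```

Both terms have the same tf because they occur once in the same field; only n differs, so the rarer term '明' (396 documents versus 413) gets the slightly higher score.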
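III. Changes to the response body when explain is enabled on a search
With explain enabled, each hit gains the following fields:
- _shard: the shard that holds the document
- _node: the node that holds the shard
- _explanation: the scoring analysis for the query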
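Because the explanation is a tree of value / description / details nodes, it is easier to read once flattened into indented lines. The sketch below walks a search response and prints one such outline per hit. The helper names and the sample response are hypothetical: the `_shard` and `_node` values are placeholders, and the explanation is a trimmed copy of the one in section II.

```python
def format_explanation(node, depth=0, max_depth=None):
    """Return the explanation tree as a list of indented text lines."""
    lines = ["  " * depth + f"{node['value']}  {node['description']}"]
    if max_depth is not None and depth >= max_depth:
        return lines  # stop descending to keep the output short
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1, max_depth))
    return lines


def summarize_hits(response, max_depth=None):
    """Yield (header, lines) for every hit that carries an _explanation."""
    for hit in response["hits"]["hits"]:
        if "_explanation" not in hit:
            continue  # explain was not enabled for this search
        header = f"doc {hit['_id']} | shard {hit.get('_shard')} | node {hit.get('_node')}"
        yield header, format_explanation(hit["_explanation"], max_depth=max_depth)


# Hypothetical response, trimmed for illustration.
sample_response = {
    "hits": {
        "hits": [
            {
                "_shard": "[my-index][0]",       # placeholder
                "_node": "node-id-placeholder",  # placeholder
                "_id": "0",
                "_explanation": {
                    "value": 4.3822603,
                    "description": "sum of:",
                    "details": [
                        {"value": 2.1578245,
                         "description": "weight(to:文 in 178) [PerFieldSimilarity], result of:",
                         "details": []},
                        {"value": 2.224436,
                         "description": "weight(to:明 in 178) [PerFieldSimilarity], result of:",
                         "details": []},
                    ],
                },
            }
        ]
    }
}

for header, lines in summarize_hits(sample_response):
    print(header)
    print("\n".join(lines))
```

Passing `max_depth=1` limits the outline to the top-level sum and the per-term weights, which is usually enough to see which term dominates a score.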
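IV. Notes on using explain
My personal take:
- explain is best reserved for troubleshooting scores rather than routine use; otherwise the verbose JSON inflates the response and hurts I/O performance.
- When using explain, it is best to inspect one document at a time, i.e. usage 2. With usage 1 and a large number of hits, the result is hard to read.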