[ElasticSearch] How to Use explain and Read Its Results

小鱼收藏夹

Last modified 2023-04-14 11:49:14


Tags: elasticsearch, big data, search engine

First published 2023-04-14 11:48:35

Original link: https://blog.csdn.net/weixin_42484190/article/details/130150147


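Elasticsearch's explain API breaks down how a search query is scored, returning the details of how each document's score is calculated, including metrics such as boost, idf and tf. You can either enable the explain parameter in a search request or run an explain query against a specific document ID. Because the returned JSON body is large and can affect performance, it is recommended only for troubleshooting.

(Summary generated automatically by CSDN.)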

I. Usage

Elasticsearch provides the explain API for analyzing how a search is scored. It is usually used in one of two ways:

1. Enable the explain parameter on the search itself. Every hit in the search results is then extended with its scoring explanation.

GET /my-index/_search
{
    "explain": true,
    "query": {
        "match": {
            "title": "文明"
        }
    },
    "size": 10,
    "from": 0,
    "sort": []
}

2. Explain how the query scores one specific document (here 0 is the document ID).

GET /my-index-000001/_explain/0
{
    "query": {
        "match": {
            "title": "文明"
        }
    }
}
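Both forms can also be driven from code. The following is a minimal Python sketch using only the standard library. explain_request and collect_explanations are hypothetical helper names, http://localhost:9200 is an assumed cluster address, and the responses are hand-written, trimmed samples rather than real output. It assumes the usual response shapes: a search with explain enabled attaches an _explanation object to each hit, while the _explain endpoint returns matched and explanation at the top level.

```python
import json
import urllib.request

def explain_request(base_url: str, path: str, body: dict) -> urllib.request.Request:
    """Build (without sending) a JSON GET request such as
    /my-index/_search or /my-index-000001/_explain/0."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )

def collect_explanations(response: dict) -> list[tuple[str, dict]]:
    """Return (document ID, explanation) pairs from either response shape.

    A _search response with "explain": true carries "_explanation" on every hit;
    an _explain response has top-level "_id", "matched" and "explanation".
    Documents that did not match are skipped.
    """
    if "hits" in response:
        return [(hit["_id"], hit["_explanation"]) for hit in response["hits"]["hits"]]
    if response.get("matched"):
        return [(response["_id"], response["explanation"])]
    return []

req = explain_request(
    "http://localhost:9200",  # assumed cluster address
    "/my-index-000001/_explain/0",
    {"query": {"match": {"title": "文明"}}},
)
print(req.get_method(), req.full_url)

# Hypothetical, trimmed responses for illustration only.
search_resp = {"hits": {"hits": [
    {"_id": "0", "_score": 4.3822603,
     "_explanation": {"value": 4.3822603, "description": "sum of:", "details": []}}]}}
explain_resp = {"_index": "my-index-000001", "_id": "0", "matched": True,
                "explanation": {"value": 4.3822603, "description": "sum of:", "details": []}}

for doc_id, expl in collect_explanations(search_resp) + collect_explanations(explain_resp):
    print(doc_id, expl["value"], expl["description"])
```

To actually send a request, pass it to urllib.request.urlopen and parse the body with json.load; a real cluster may also require authentication, which this sketch leaves out.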

II. Interpreting the explain response

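Elasticsearch can use different scoring models (similarities) depending on its version and configuration, so the formulas shown in the explanation returned by the explain API may differ. The response below uses BM25, which has been the default similarity since Elasticsearch 5.0.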

Whatever the model, the key point is that the explain API exists to show exactly how a search score was calculated.

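For example, in the search for "文明" above, the analyzer is standard, which splits Chinese text into single characters. The details therefore break the query into the two terms '文' and '明', give a scoring breakdown for each, and combine them under "sum of:", meaning the top-level score is the sum of the per-term scores. (The sample response below was produced on a field named to, as its weight(to:文 in 178) descriptions show, rather than the title field used in the requests above.)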
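The per-term split can be checked programmatically. A small Python sketch, assuming the description strings follow the weight(field:term in doc) pattern seen in this article's output (that text is meant for humans and is not a guaranteed stable format); term_contributions is a hypothetical helper name, and the sample holds only the top two levels of the response shown in this section:

```python
import math
import re

# Pattern for descriptions like "weight(to:文 in 178) [PerFieldSimilarity], result of:"
WEIGHT_DESC = re.compile(r"weight\((?P<field>[^:]+):(?P<term>.+?) in (?P<doc>\d+)\)")

def term_contributions(explanation: dict) -> dict[str, float]:
    """Map each term directly under a "sum of:" node to the score it contributes."""
    contributions = {}
    for child in explanation.get("details", []):
        match = WEIGHT_DESC.match(child["description"])
        if match:  # skip children that are not per-term weights
            contributions[match.group("term")] = child["value"]
    return contributions

# Top two levels of the response in this section; deeper details are omitted.
explanation = {
    "value": 4.3822603,
    "description": "sum of:",
    "details": [
        {"value": 2.1578245,
         "description": "weight(to:文 in 178) [PerFieldSimilarity], result of:",
         "details": []},
        {"value": 2.224436,
         "description": "weight(to:明 in 178) [PerFieldSimilarity], result of:",
         "details": []},
    ],
}

parts = term_contributions(explanation)
print(parts)
print("sum matches:", math.isclose(sum(parts.values()), explanation["value"], rel_tol=1e-6))
```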

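In this example, the details array under each weight entry holds the scoring breakdown for that term.

Each level nests the inputs it needs inside its own details property.

Because every step of the calculation is included in the explain output, you can use it to trace and recompute a score precisely.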
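Since every level has the same value / description / details shape, a short recursive walk can print the whole tree with indentation, which is often easier to read than the raw JSON. A minimal Python sketch (explanation_lines is a hypothetical helper name; the sample is the idf subtree for 文 copied from the response in this section):

```python
def explanation_lines(node: dict, depth: int = 0) -> list[str]:
    """Flatten an explanation tree into indented "value  description" lines."""
    lines = [f"{'  ' * depth}{node['value']}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(explanation_lines(child, depth + 1))
    return lines

# The idf subtree for the term 文, copied from the response in this section.
idf_node = {
    "value": 1.3599529,
    "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
    "details": [
        {"value": 413, "description": "n, number of documents containing term", "details": []},
        {"value": 1610, "description": "N, total number of documents with field", "details": []},
    ],
}

print("\n".join(explanation_lines(idf_node)))
```

For a real response, pass hit["_explanation"] from a search hit or response["explanation"] from an _explain call.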

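The score formula is spelled out in the JSON: boost * idf * tf. In other words, the score for each term is boost multiplied by idf (a measure of how rare the term is across documents) and by tf (a measure of how often the term occurs in the document, adjusted for field length):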

{
    "value":4.3822603,
    "description":"sum of:",
    "details":[
        {
            "value":2.1578245,
            "description":"weight(to:文 in 178) [PerFieldSimilarity], result of:",
            "details":[
                {
                    "value":2.1578245,
                    "description":"score(freq=1.0), computed as boost * idf * tf from:",
                    "details":[
                        {
                            "value":2.2,
                            "description":"boost",
                            "details":[

                            ]
                        },
                        {
                            "value":1.3599529,
                            "description":"idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                            "details":[
                                {
                                    "value":413,
                                    "description":"n, number of documents containing term",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":1610,
                                    "description":"N, total number of documents with field",
                                    "details":[

                                    ]
                                }
                            ]
                        },
                        {
                            "value":0.721223,
                            "description":"tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                            "details":[
                                {
                                    "value":1,
                                    "description":"freq, occurrences of term within document",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":1.2,
                                    "description":"k1, term saturation parameter",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":0.75,
                                    "description":"b, length normalization parameter",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":104,
                                    "description":"dl, length of field (approximate)",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":1081.6583,
                                    "description":"avgdl, average length of field",
                                    "details":[

                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        },
        {
            "value":2.224436,
            "description":"weight(to:明 in 178) [PerFieldSimilarity], result of:",
            "details":[
                {
                    "value":2.224436,
                    "description":"score(freq=1.0), computed as boost * idf * tf from:",
                    "details":[
                        {
                            "value":2.2,
                            "description":"boost",
                            "details":[

                            ]
                        },
                        {
                            "value":1.4019344,
                            "description":"idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                            "details":[
                                {
                                    "value":396,
                                    "description":"n, number of documents containing term",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":1610,
                                    "description":"N, total number of documents with field",
                                    "details":[

                                    ]
                                }
                            ]
                        },
                        {
                            "value":0.721223,
                            "description":"tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                            "details":[
                                {
                                    "value":1,
                                    "description":"freq, occurrences of term within document",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":1.2,
                                    "description":"k1, term saturation parameter",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":0.75,
                                    "description":"b, length normalization parameter",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":104,
                                    "description":"dl, length of field (approximate)",
                                    "details":[

                                    ]
                                },
                                {
                                    "value":1081.6583,
                                    "description":"avgdl, average length of field",
                                    "details":[

                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
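To confirm the formula, the scores above can be recomputed from the leaf values alone. A minimal Python sketch, assuming the BM25 formulas exactly as printed in the descriptions (natural logarithm for idf) and taking the reported boost of 2.2 as a given input. Note that 2.2 equals k1 + 1 = 1.2 + 1, which suggests this Lucene version folds that factor into the reported boost when the query boost is 1; that reading is an inference from the numbers, not something the output states. bm25_idf and bm25_tf are hypothetical helper names.

```python
import math

def bm25_idf(n: int, N: int) -> float:
    """idf = log(1 + (N - n + 0.5) / (n + 0.5)), using the natural logarithm."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def bm25_tf(freq: float, k1: float, b: float, dl: float, avgdl: float) -> float:
    """tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

# Leaf values copied from the explain output above (document 178, field "to").
boost, k1, b = 2.2, 1.2, 0.75
freq, dl, avgdl, N = 1.0, 104, 1081.6583, 1610
doc_freqs = {"文": 413, "明": 396}  # n, number of documents containing each term

tf = bm25_tf(freq, k1, b, dl, avgdl)  # identical for both terms in this document
scores = {term: boost * bm25_idf(n, N) * tf for term, n in doc_freqs.items()}
total = sum(scores.values())  # the top-level "sum of:" node

for term, score in scores.items():
    print(f"{term}: {score:.7f}")
print(f"sum of: {total:.7f}")
```

The recomputed values agree with the explain output to within rounding; Lucene computes scores in 32-bit floats, so the last printed digit can differ slightly.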