项目场景:
项目中有个查询订单汇总数据场景,根据电站分组汇总昨日的充电量
问题描述
订单存储在ES,数据量1.2T,单日40W条订单,最近查询昨日电站汇总数据报错search_phase_execution_exception all shards failed
接口返回异常:UT005023: Exception handling request to xx path
{
"error": {
"root_cause": [
{
"type": "task_cancelled_exception",
"reason": "cancelled"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "v_order_history",
"node": "y8heGo8fTE6FYtGO73qkZw",
"reason": {
"type": "task_cancelled_exception",
"reason": "cancelled"
}
}
]
},
"status": 500
}
原因分析:
查了网上类似异常分析的总结:
-
检查ES集群状态:通过HTTP API或Kibana Console检查ES集群状态,确保它处于正常运行状态。
-
查看ES日志: 在ES服务器上查看日志,分析错误信息,快速定位问题并进行处理。
-
恢复数据损坏: 如果数据被损坏,可以通过替换受影响的节点,更新索引或还原备份来恢复数据。
-
增加资源:如果ES集群容量不足,考虑增加节点或硬件资源,以缓解瓶颈并提高吞吐量。
以上原因都不是,
罪魁祸首:脚本问题,聚合查询返回的数据量太大,导致查询异常
# 根据指定条件分组汇总数据,并根据汇总的数据进行排序取前10条
GET xxx/_search
{
"query": { #根据查询条件过滤汇总的数据范围
"bool": {
"must": [
{
"terms": {
"xxx": [
"10"
],
"boost": 1.0
}
},
{
"range": {
"xxx.keyword": {
"from": "2023-05-16",
"to": "2023-05-16",
"include_lower": true,
"include_upper": true,
"format": "yyyy-MM-dd",
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"all_group": {
"terms": {
"field": "xxx", #group by 的字段
"size": 5000, #查询前5000
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [#排序
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"xxx": { #要汇总的字段
"sum": {
"field": "xxx"
}
}
}
}
}
}
解决方案:
***优化脚本***
GET xxx/_search
{
"size": 0, # 优化点:不需要返回明细数据,增加size=0
"query": { #根据查询条件过滤汇总的数据范围
"bool": {
"must": [
{
"terms": {
"xxx": [
"10"
],
"boost": 1.0
}
},
{
"range": {
"xxx.keyword": {
"from": "2023-05-16",
"to": "2023-05-16",
"include_lower": true,
"include_upper": true,
"format": "yyyy-MM-dd",
"boost": 1.0
}
}
}
],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"aggregations": {
"all_group": {
"terms": {
"field": "xxx", #group by 的字段
"size": 10, #优化点:查询前10
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [#排序
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"xxx": { #要汇总的字段
"sum": {
"field": "xxx"
}
}
}
}
}
}
查询结果秒级出来