Elasticsearch aggregations: the sum_other_doc_count problem (doc_count_error_upper_bound)

Article link: https://blog.csdn.net/qq_35526951/article/details/116658386

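This article analyzes why Elasticsearch aggregation counts can be inexact and proposes two fixes: keep the index on a single shard, which avoids the error introduced by per-shard processing, or increase the number of terms the aggregation collects, which makes the counts more accurate.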


I. Problem analysis

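1) doc_count_error_upper_bound: an upper bound on the counting error. It is the largest number of documents that a term could have without appearing in the returned result, because some shard left that term out of its own top list.
2) sum_other_doc_count: the number of documents that fall into buckets not returned by this aggregation. This one is easy to understand: by default Elasticsearch returns only the top ten buckets ranked by doc_count, so when the field has many distinct values (here, destination countries), some documents naturally fall outside the returned buckets.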
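
To make the two fields concrete, here is a minimal Python sketch of a simplified model of a sharded terms aggregation. It is not Elasticsearch's actual implementation: the function name `terms_agg`, the two-shard sample counts, and the assumption that a single-shard result always reports an error bound of 0 are illustrative choices made for this sketch.

```python
from collections import Counter


def terms_agg(shards, size, shard_size):
    """Simplified model of a terms aggregation ordered by doc_count (descending).

    shards: list of dicts mapping term -> doc count on that shard (made-up data).
    size: number of buckets in the final response.
    shard_size: number of top terms each shard sends to the coordinating node.
    """
    merged = Counter()
    error_bound = 0
    for counts in shards:
        # Each shard only reports its own top `shard_size` terms.
        top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:shard_size]
        merged.update(dict(top))
        # A term this shard cut off cannot have more docs than the last term
        # it did report, so that count bounds the per-shard error.
        # Assumption of this sketch: a single shard reports no error at all.
        if len(shards) > 1 and len(top) == shard_size:
            error_bound += top[-1][1]
    # The coordinating node keeps the top `size` merged buckets.
    buckets = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    total_docs = sum(sum(counts.values()) for counts in shards)
    return {
        "doc_count_error_upper_bound": error_bound,
        # Every document that is not counted in a returned bucket.
        "sum_other_doc_count": total_docs - sum(n for _, n in buckets),
        "buckets": buckets,
    }


# Hypothetical per-shard counts for four destination countries.
two_shards = [
    {"IT": 10, "US": 8, "CN": 7, "CA": 6},
    {"US": 9, "CA": 8, "IT": 3, "CN": 2},
]
sharded = terms_agg(two_shards, size=2, shard_size=2)

# The same documents stored on one shard.
one_shard = [{"IT": 13, "US": 17, "CN": 9, "CA": 14}]
single = terms_agg(one_shard, size=2, shard_size=2)

print(sharded)
print(single)
```

With the made-up two-shard data, the model returns IT with 10 documents although IT really has 13, and it misses CA, which has 14 documents in total. On one shard the same data yields US and CA with exact counts.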

II. Solutions

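1) Use a single shard
Set the number of primary shards to 1, that is, do not shard the index at all. As analyzed above, the core reason the aggregation is inexact is sharding, so an unsharded index solves the problem. The drawback is just as clear: this only suits small data sets, because keeping a large volume of data on one shard hurts Elasticsearch performance.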
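
As a rough illustration of what "set the number of primary shards to 1" involves, the following Python sketch only builds and prints the request bodies; it does not talk to a cluster. The target index name `flights_single_shard` and both helper functions are hypothetical. The `number_of_shards` setting is fixed when an index is created, so an existing multi-shard index has to be copied into a new one, for example with the `_reindex` API. The sketch omits mappings; a real target index would also need the source's mappings so that DestCountry remains a keyword field.

```python
import json

# Hypothetical target index name, used only for this illustration.
SINGLE_SHARD_INDEX = "flights_single_shard"


def create_index_body(primary_shards=1):
    """Body for PUT /<index>: sets the primary shard count at creation time."""
    return {"settings": {"number_of_shards": primary_shards}}


def reindex_body(source, dest):
    """Body for POST /_reindex: copies documents from `source` into `dest`."""
    return {"source": {"index": source}, "dest": {"index": dest}}


create_body = create_index_body()
copy_body = reindex_body("kibana_sample_data_flights", SINGLE_SHARD_INDEX)

# Print the requests in the console style used in this article.
print(f"PUT /{SINGLE_SHARD_INDEX}")
print(json.dumps(create_body, indent=2))
print("POST /_reindex")
print(json.dumps(copy_body, indent=2))
```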

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest": {
      "terms": {
        "field": "DestCountry",
        "size": 3
      }
    }
  }
}

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "dest" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 7605,
      "buckets" : [
        {
          "key" : "IT",
          "doc_count" : 2371
        }