Elasticsearch Aggregation Queries (Part 1)

Author: 没事儿写两篇

Column: 人在江湖之Elasticsearch; Tags: elasticsearch, grouping functions, aggregation queries, having, group by

Article link: https://blog.csdn.net/forlinkext/article/details/131290171

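Contents

avg: average
cardinality: distinct count
max, min, and sum
stats: several statistics in one request
string_stats: string statistics
terms: grouping (bucket) aggregation
- Sorting aggregation results
- Other parameters
multi_terms: grouping by multiple fields
range: range aggregation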
range 范围聚合

avg: average

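GET /person/_search
{
  "aggs": { // "aggs" is the fixed keyword that introduces the aggregations
    "avg_grade": { // avg_grade: user-defined name under which the result is returned
      "avg": { // avg: the aggregation type, here an average
        "field": "age" // the field to aggregate on
      }
    }
  },
  "size": 0 // return only the aggregation results, no documents
}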

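This computes the average of the age field across the person index. The result is returned under the user-defined name avg_grade, and "size": 0 means no matching documents are returned, only the aggregation. The later examples show only the request body, which is sent to the same GET /person/_search endpoint.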

{
    "took": 62,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1003,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "avg_grade": {
            "value": 30.19840478564307
        }
    }
}

The average age is about 30.20. This query is the equivalent of SQL's AVG function, e.g. SELECT AVG(age) FROM person.
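The query and the aggregation run together, so you can restrict which documents are averaged, much like adding a WHERE clause. Here is a sketch. It assumes sex holds the numeric values 1 and 0 that appear in the terms example later in this article. The aggregation name avg_age is arbitrary.

```json
{
    "size": 0,
    "query": {
        "term": { "sex": 1 }
    },
    "aggs": {
        "avg_age": {
            "avg": { "field": "age" }
        }
    }
}
```

This corresponds roughly to SELECT AVG(age) FROM person WHERE sex = 1.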

cardinality: distinct count

{
    "size": 0,
    "aggs" :{
        "count":{
            "cardinality":{
                "field":"id"
            }
        }
    }
}

Result

{
    "took": 6,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1003,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "count": {
            "value": 1003
        }
    }
}

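If we counted distinct values of sex instead, count would be 2, because sex has only two values. Our example uses the id field, whose values are unique, so the count equals the total number of documents.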
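For comparison, here is a sketch of the distinct count on sex described above. The aggregation name sex_count is arbitrary. Per the text, the value should be 2, like SQL's COUNT(DISTINCT sex).

```json
{
    "size": 0,
    "aggs": {
        "sex_count": {
            "cardinality": { "field": "sex" }
        }
    }
}
```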

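Note: this count is approximate. Its accuracy is controlled by the precision_threshold option, which trades memory for accuracy: distinct counts below the threshold are expected to be close to exact. The maximum supported value is 40000 (larger values behave like 40000), and the default is 3000. For example, our actual count is 1003. If precision_threshold is set to 100 or lower, the result may differ from 1003.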
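Here is a sketch that sets the threshold explicitly. It uses the low value of 100 mentioned above, so the approximation can be compared with the exact 1003. No result is shown, because the returned value is an approximation.

```json
{
    "size": 0,
    "aggs": {
        "count": {
            "cardinality": {
                "field": "id",
                "precision_threshold": 100
            }
        }
    }
}
```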

max, min, and sum

{
    "size": 0,
    "aggs" :{
        "max_age":{
            "max":{
                "field":"age"
            }
        }
    }
}

Maximum age

{
    "size": 0,
    "aggs" :{
        "min_age":{
            "min":{
                "field":"age"
            }
        }
    }
}

Minimum age

{
    "size": 0,
    "aggs" :{
        "sum_age":{
            "sum":{
                "field":"age"
            }
        }
    }
}

Sum of ages
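You can also combine several metric aggregations in one request by giving each its own name inside aggs. Each result then appears under that name in aggregations. Here is a sketch that reuses the names from the three requests above.

```json
{
    "size": 0,
    "aggs": {
        "max_age": { "max": { "field": "age" } },
        "min_age": { "min": { "field": "age" } },
        "sum_age": { "sum": { "field": "age" } }
    }
}
```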

stats: min, max, and more in one request

{
    "size": 0,
    "aggs" :{
        "stats_age":{
            "stats":{
                "field":"age"
            }
        }
    }
}

Result

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1003,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "stats_age": {
            "count": 1003,
            "min": 10.0,
            "max": 50.0,
            "avg": 30.19840478564307,
            "sum": 30289.0
        }
    }
}
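As a quick consistency check on the stats output, avg is simply sum divided by count:

$$
\text{avg} = \frac{\text{sum}}{\text{count}} = \frac{30289}{1003} \approx 30.1984
$$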

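Note: avg, max, min, sum, and stats above are meant for numeric fields. cardinality is the exception: it also works on keyword and other field types.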

string_stats: string statistics

{
    "size": 0,
    "aggs" :{
        "stats_age":{
            "string_stats":{
                "field":"name"
            }
        }
    }
}

Result

{
    "took": 48,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1003,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "stats_age": {
            "count": 1003,
            "min_length": 2,
            "max_length": 3,
            "avg_length": 2.1046859421734796,
            "entropy": 6.51155113253397
        }
    }
}

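entropy: a more involved computation, the Shannon entropy computed over all terms collected by the aggregation. Entropy quantifies the amount of information contained in the field; anyone who has studied information theory will recognize it.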
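For reference, the Shannon entropy of a distribution, in bits, is given below. This is our reading of the behaviour, not a quote from the documentation. We understand string_stats to apply it to the distribution of characters across all collected terms, which is the same distribution the show_distribution: true option returns. Here p(c) is the fraction of all collected characters that equal c.

$$
H = -\sum_{c} p(c) \log_2 p(c)
$$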

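Note: by default, text fields cannot be aggregated; use keyword fields instead. Aggregating a text field requires enabling fielddata in its mapping, which can consume a lot of memory.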
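The usual alternative to fielddata is a multi-field mapping. Keep name as text for full-text search and add a keyword sub-field for aggregations. Here is a sketch for a new index created with PUT /person_v2. The index name person_v2 is hypothetical.

```json
{
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": { "type": "keyword" }
                }
            }
        }
    }
}
```

Aggregations on such an index then use "field": "name.keyword" instead of "field": "name".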

terms: grouping (bucket) aggregation

Group the documents in the person index by the sex field:

{
    "size": 0,
    "aggs" :{
        "counts":{
            "terms":{
                "field":"sex"
            }
        }
    }
}

Result

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1003,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "counts": {
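            "doc_count_error_upper_bound": 0, // upper bound on the doc count error of the returned terms
            "sum_other_doc_count": 0, // number of documents not in any returned bucket
            "buckets": [ // list of buckets
                {
                    "key": 1, // the grouped value
                    "doc_count": 511 // number of documents with this value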
                },
                {
                    "key": 0,
                    "doc_count": 492
                }
            ]
        }
    }
}

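By default you also cannot run a terms aggregation on a text field (use a keyword field instead). Aggregation on a text field can be enabled with the fielddata mapping setting, but this is not recommended.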
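A terms aggregation works like a real GROUP BY once it has sub-aggregations, which are computed separately for each bucket. Here is a sketch that computes the average age per sex, roughly SELECT sex, AVG(age) FROM person GROUP BY sex. The names by_sex and avg_age are arbitrary.

```json
{
    "size": 0,
    "aggs": {
        "by_sex": {
            "terms": { "field": "sex" },
            "aggs": {
                "avg_age": {
                    "avg": { "field": "age" }
                }
            }
        }
    }
}
```

Each bucket in the response then carries its own avg_age value next to key and doc_count.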

Sorting aggregation results

{
   "size": 0,
   "aggs" :{
       "sex_aggs":{
           "terms":{
               "field":"age",
               "order":{
                   "_key":"asc"
               }
           }
       }
   }
}

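_key sorts the buckets by their key, here in ascending order. So this request groups the person index by age and returns the buckets sorted by age, youngest first. The aggregation is named sex_aggs, but the name is only a label; it actually groups by age.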

Result

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1003,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "sex_aggs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 784,
            "buckets": [
                {
                    "key": 10,
                    "doc_count": 14
                },
                {
                    "key": 11,
                    "doc_count": 26
                },
                {
                    "key": 12,
                    "doc_count": 30
                },
                {
                    "key": 13,
                    "doc_count": 21
                },
                {
                    "key": 14,
                    "doc_count": 24
                },
                {
                    "key": 15,
                    "doc_count": 21
                },
                {
                    "key": 16,
                    "doc_count": 27
                },
                {
                    "key": 17,
                    "doc_count": 24
                },
                {
                    "key": 18,
                    "doc_count": 18
                },
                {
                    "key": 19,
                    "doc_count": 14
                }
            ]
        }
    }
}
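Only the default 10 buckets are returned, so the remaining documents are reported in sum_other_doc_count. The ten returned buckets hold 14 + 26 + 30 + 21 + 24 + 21 + 27 + 24 + 18 + 14 = 219 documents, and 1003 - 219 = 784. To get every age, raise size. According to the stats output, ages run from 10 to 50, so there are at most 41 distinct values, and a size of 50 should cover them all.

```json
{
    "size": 0,
    "aggs": {
        "sex_aggs": {
            "terms": {
                "field": "age",
                "size": 50,
                "order": { "_key": "asc" }
            }
        }
    }
}
```

With all buckets returned, sum_other_doc_count should drop to 0.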

Other parameters

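size: number of buckets to return. Defaults to 10, which is why the example above returned exactly 10 buckets.
shard_size: number of candidate buckets each shard returns. Defaults to size * 1.5 + 10. A larger value gives more accurate counts but costs more to compute.
show_term_doc_count_error: whether to compute the doc count error for each returned term. Defaults to false.
order: sort order of the buckets. Defaults to doc count (_count), descending.
min_doc_count: minimum number of documents a bucket must contain to be returned. Defaults to 1.
shard_min_doc_count: minimum number of documents a bucket must contain on each shard to be returned. Defaults to min_doc_count.
collect_mode: data collection strategy, either depth_first or breadth_first. Defaults to breadth_first.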
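Here is a sketch that combines several of these parameters. The values are illustrative, not recommendations. The request returns the 5 ages with the most documents and asks each shard for 50 candidates. It drops buckets with fewer than 20 documents and reports the per-term error bound.

```json
{
    "size": 0,
    "aggs": {
        "top_ages": {
            "terms": {
                "field": "age",
                "size": 5,
                "shard_size": 50,
                "min_doc_count": 20,
                "show_term_doc_count_error": true
            }
        }
    }
}
```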

multi_terms: grouping by multiple fields

{
    "size": 0,
    "aggs" :{
        "sex_aggs":{
            "multi_terms":{
                "terms":[{
                    "field":"age"
                },{
                    "field":"sex"
                }]
            }
        }
    }
}

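Group by age and sex, similar to SQL GROUP BY age, sex. Each bucket's key is an array holding one value per field, and key_as_string joins them with |. As with terms, only the 10 combinations with the most documents are returned by default. sum_other_doc_count (820) counts the documents in all other combinations.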

Result

{
    "took": 29,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1003,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "sex_aggs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 820,
            "buckets": [
                {
                    "key": [
                        24,
                        1
                    ],
                    "key_as_string": "24|1",
                    "doc_count": 23
                },
                {
                    "key": [
                        46,
                        0
                    ],
                    "key_as_string": "46|0",
                    "doc_count": 20
                },
                {
                    "key": [
                        44,
                        1
                    ],
                    "key_as_string": "44|1",
                    "doc_count": 19
                },
                {
                    "key": [
                        33,
                        0
                    ],
                    "key_as_string": "33|0",
                    "doc_count": 18
                },
                {
                    "key": [
                        49,
                        0
                    ],
                    "key_as_string": "49|0",
                    "doc_count": 18
                },
                {
                    "key": [
                        21,
                        1
                    ],
                    "key_as_string": "21|1",
                    "doc_count": 17
                },
                {
                    "key": [
                        22,
                        0
                    ],
                    "key_as_string": "22|0",
                    "doc_count": 17
                },
                {
                    "key": [
                        24,
                        0
                    ],
                    "key_as_string": "24|0",
                    "doc_count": 17
                },
                {
                    "key": [
                        26,
                        0
                    ],
                    "key_as_string": "26|0",
                    "doc_count": 17
                },
                {
                    "key": [
                        28,
                        0
                    ],
                    "key_as_string": "28|0",
                    "doc_count": 17
                }
            ]
        }
    }
}
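multi_terms was added in Elasticsearch 7.12. On older versions, or when you want a subtotal per age, you can get the same grouping by nesting one terms aggregation inside another. Each age bucket then contains its own sex buckets. Here is a sketch. The names by_age and by_sex are arbitrary, and size 50 assumes the 10 to 50 age range from the stats output.

```json
{
    "size": 0,
    "aggs": {
        "by_age": {
            "terms": { "field": "age", "size": 50 },
            "aggs": {
                "by_sex": {
                    "terms": { "field": "sex" }
                }
            }
        }
    }
}
```

Note that size then applies per level (ages, then sex values within each age), not to the combinations as a whole.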

range: range aggregation

{
    "size": 0,
    "aggs" :{
        "age_state":{ 
            "range":{
                "field": "age",
                "ranges":[{
                    "to": 10
                },{
                    "from": 10,
                    "to": 20
                },{
                    "from": 20
                }]
            }
        }
    }
}

Result:

{
    "took": 143,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1003,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "age_state": {
            "buckets": [
                {
                    "key": "*-10.0",
                    "to": 10.0,
                    "doc_count": 0
                },
                {
                    "key": "10.0-20.0",
                    "from": 10.0,
                    "to": 20.0,
                    "doc_count": 219
                },
                {
                    "key": "20.0-*",
                    "from": 20.0,
                    "doc_count": 784
                }
            ]
        }
    }
}
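In a range aggregation, from is inclusive and to is exclusive, and that explains the result above. The minimum age is 10 (see the stats output), so the *-10.0 bucket is empty and the ten-year-olds land in 10.0-20.0. That bucket's 219 documents match the sum of the age 10 to 19 buckets in the sorting example. You can also give ranges custom keys, and keyed: true returns the buckets as an object indexed by key instead of an array. Here is a sketch. The key names are arbitrary.

```json
{
    "size": 0,
    "aggs": {
        "age_state": {
            "range": {
                "field": "age",
                "keyed": true,
                "ranges": [
                    { "key": "under_20", "to": 20 },
                    { "key": "20_to_40", "from": 20, "to": 40 },
                    { "key": "40_and_over", "from": 40 }
                ]
            }
        }
    }
}
```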
