avg: average
GET /person/_search
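{
"aggs": { // "aggs" is a fixed keyword that introduces the aggregations
"avg_grade": { // avg_grade: custom name under which the result is returned
"avg": { // avg: the aggregation type, here the average
"field": "age" // the field to aggregate on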
}
}
},
"size":0 // return only the aggregation result, no documents
}
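This computes the average of the "age" field across the person index. The result is returned under the name "avg_grade", and "size": 0 means no individual documents are returned, only the aggregation.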
{
"took": 62,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1003,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"avg_grade": {
"value": 30.19840478564307
}
}
}
The average age is about 30.20. This query is the equivalent of SQL's AVG function, e.g. SELECT AVG(age) FROM person.
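To make the SQL analogy concrete, here is a local Python sketch of what the avg aggregation computes: the arithmetic mean of the field's values across the matching documents. It only mimics the result shape; the sample `docs` are made up, and skipping documents without the field and returning None when there are no values are assumptions about the edge cases, not something shown above.

```python
from typing import Optional


def avg_agg(docs: list, field: str) -> dict:
    """Mimic the result shape of an avg aggregation: {"value": mean}."""
    # Assumption: documents without the field are skipped.
    values = [d[field] for d in docs if d.get(field) is not None]
    # Assumption: no values at all -> value is None (null in JSON).
    value: Optional[float] = sum(values) / len(values) if values else None
    return {"value": value}


docs = [{"age": 20}, {"age": 30}, {"age": 40}, {"name": "no age"}]
print(avg_agg(docs, "age"))  # {'value': 30.0}
```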
cardinality: distinct-value count
{
"size": 0,
"aggs" :{
"count":{
"cardinality":{
"field":"id"
}
}
}
}
Result
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1003,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"count": {
"value": 1003
}
}
}
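If we counted distinct values of sex instead, count would be 2, because sex has only two values. Our example uses the id field, whose values are all unique, so the count equals the total number of documents.
Note: this count is approximate. Its accuracy is controlled by the precision_threshold option, which trades memory for accuracy: below that many distinct values, counts are expected to be close to exact. The maximum supported value is 40000 (larger values behave like 40000), and the default is 3000. For example, our actual count is 1003; with precision_threshold set to 100 or lower, the result may differ from 1003.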
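Elasticsearch documents the cardinality aggregation as being based on the HyperLogLog++ algorithm. The sketch below is a simplified plain HyperLogLog (with linear counting for small sets), not Elasticsearch's implementation; it only illustrates why the result is an estimate and why more memory buys accuracy. The register-count parameter `p` and the helper names are my own and do not map one-to-one onto precision_threshold.

```python
import hashlib
import math


def _hash64(value) -> int:
    # Stable 64-bit hash of the value's string form (first 8 bytes of SHA-1).
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_estimate(values, p: int = 12) -> float:
    """Plain HyperLogLog distinct-count estimate with 2**p registers."""
    m = 1 << p
    registers = [0] * m
    for v in values:
        h = _hash64(v)
        idx = h >> (64 - p)                    # first p bits choose the register
        rest = h & ((1 << (64 - p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        # Small cardinalities: linear counting on the empty registers is more accurate.
        return m * math.log(m / zeros)
    return raw


ids = range(1, 1004)              # 1003 unique ids, like the id field in the example
print(len(set(ids)))              # exact distinct count: 1003
print(round(hll_estimate(ids)))   # approximate: close to, but not necessarily, 1003
```

Each increment of `p` doubles the registers (memory) and tightens the estimate, the same kind of memory-for-accuracy trade-off that precision_threshold controls.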
max (maximum), min (minimum) and sum
{
"size": 0,
"aggs" :{
"max_age":{
"max":{
"field":"age"
}
}
}
}
Maximum age
{
"size": 0,
"aggs" :{
"min_age":{
"min":{
"field":"age"
}
}
}
}
Minimum age
{
"size": 0,
"aggs" :{
"sum_age":{
"sum":{
"field":"age"
}
}
}
}
stats: max, min and more in a single request
{
"size": 0,
"aggs" :{
"stats_age":{
"stats":{
"field":"age"
}
}
}
}
Result
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1003,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"stats_age": {
"count": 1003,
"min": 10.0,
"max": 50.0,
"avg": 30.19840478564307,
"sum": 30289.0
}
}
}
Note: the avg, max, min, sum and stats aggregations above apply to numeric fields.
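As a quick consistency check on the stats response, avg is simply sum / count: 30289.0 / 1003 ≈ 30.198, matching the returned value. The sketch below computes the same five numbers locally for a list of values; the function name is hypothetical and the empty-input behaviour is an assumption.

```python
def stats_agg(values: list) -> dict:
    """Return the five numbers a stats aggregation reports: count, min, max, avg, sum."""
    count = len(values)
    if count == 0:
        # Assumption: no values -> min/max/avg are None, sum is 0.0.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": count,
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / count,  # avg is always sum / count
        "sum": total,
    }


print(stats_agg([10, 30, 50]))
# {'count': 3, 'min': 10.0, 'max': 50.0, 'avg': 30.0, 'sum': 90.0}
print(30289.0 / 1003)  # recomputes the avg above from the response's sum and count
```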
string_stats: string statistics
{
"size": 0,
"aggs" :{
"stats_age":{
"string_stats":{
"field":"name"
}
}
}
}
Result
{
"took": 48,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1003,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"stats_age": {
"count": 1003,
"min_length": 2,
"max_length": 3,
"avg_length": 2.1046859421734796,
"entropy": 6.51155113253397
}
}
}
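entropy: a more involved metric, the Shannon entropy computed over all terms collected by the aggregation. Entropy quantifies how much information the field contains; readers familiar with information theory will recognize it.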
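A local sketch of the string_stats numbers, assuming the entropy is taken over the frequencies of individual characters across all collected terms (the Elasticsearch reference describes a show_distribution option that reports a per-character probability distribution, which is what this assumption rests on). Treat it as an illustration of the formula H = -Σ p·log2(p), not as a guaranteed reproduction of Elasticsearch's values; the function name is hypothetical.

```python
import math
from collections import Counter


def string_stats(terms: list) -> dict:
    """Sketch of string_stats for a non-empty list of strings."""
    lengths = [len(t) for t in terms]
    total_chars = sum(lengths)
    # Assumption: entropy is computed from character frequencies over all terms.
    char_counts = Counter(ch for t in terms for ch in t)
    entropy = -sum((c / total_chars) * math.log2(c / total_chars)
                   for c in char_counts.values())
    return {
        "count": len(terms),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": total_chars / len(terms),
        "entropy": entropy,
    }


# 7 distinct characters, each appearing once, so entropy = log2(7).
print(string_stats(["张三", "李四", "王小明"]))
```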
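Note: text fields cannot be aggregated this way by default; use a keyword field instead. Aggregating a text field requires enabling fielddata in its mapping, which can use a lot of memory.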
terms: grouping (bucket) aggregation
Group the documents in the person index by the sex field:
{
"size": 0,
"aggs" :{
"counts":{
"terms":{
"field":"sex"
}
}
}
}
Result
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1003,
"relation": "eq"
},
"max_score": null,
"hits": []
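},
"aggregations": {
"counts": {
"doc_count_error_upper_bound": 0, // upper bound on the error in each term's doc count
"sum_other_doc_count": 0, // number of documents not in any returned bucket
"buckets": [ // list of buckets
{
"key": 1, // the value this bucket groups on
"doc_count": 511 // number of documents with this value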
},
{
"key": 0,
"doc_count": 492
}
]
}
}
}
By default, a terms aggregation cannot run on a text field either (use a keyword field). fielddata can be enabled on a text field to allow it, but this is not recommended.
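On a single shard, as in this example, a terms aggregation amounts to counting documents per distinct value and keeping the most frequent ones. The sketch below reproduces that for a plain list of field values; the function name is hypothetical, breaking ties by key ascending is an assumption, and across multiple shards the real aggregation can be approximate, which is what doc_count_error_upper_bound reports.

```python
from collections import Counter


def terms_agg(values: list, size: int = 10) -> dict:
    """Single-shard sketch of a terms aggregation over one field's values."""
    counts = Counter(values)
    # Default order: doc_count descending; ties by key ascending (assumption).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top, rest = ordered[:size], ordered[size:]
    return {
        "doc_count_error_upper_bound": 0,  # counts are exact on a single shard
        "sum_other_doc_count": sum(c for _, c in rest),
        "buckets": [{"key": k, "doc_count": c} for k, c in top],
    }


sexes = [1] * 511 + [0] * 492  # same totals as the sex buckets above
print(terms_agg(sexes))
```

Feeding it 511 ones and 492 zeros reproduces the two sex buckets from the response above.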
Sorting aggregation results
{
"size": 0,
"aggs" :{
"sex_aggs":{
"terms":{
"field":"age",
"order":{
"_key":"asc"
}
}
}
}
}
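_key orders buckets by their key (the term value); here the buckets are sorted by key in ascending order. In other words, the query groups the person index by age and lists the age buckets from youngest to oldest.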
Result
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1003,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"sex_aggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 784,
"buckets": [
{
"key": 10,
"doc_count": 14
},
{
"key": 11,
"doc_count": 26
},
{
"key": 12,
"doc_count": 30
},
{
"key": 13,
"doc_count": 21
},
{
"key": 14,
"doc_count": 24
},
{
"key": 15,
"doc_count": 21
},
{
"key": 16,
"doc_count": 27
},
{
"key": 17,
"doc_count": 24
},
{
"key": 18,
"doc_count": 18
},
{
"key": 19,
"doc_count": 14
}
]
}
}
}
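Other parameters
- size: number of buckets to return; default 10 (which is why our example returns 10 buckets, with the remaining 784 documents reported in sum_other_doc_count).
- shard_size: number of candidate terms each shard returns; defaults to size * 1.5 + 10. Larger values give more accurate counts but cost more to compute.
- show_term_doc_count_error: whether to compute the doc count error for each term; default false.
- order: sort order of the buckets; by default, descending by each bucket's document count.
- min_doc_count: minimum number of documents a bucket must contain to be returned; default 1.
- shard_min_doc_count: minimum number of documents a bucket must contain on each shard to be returned; defaults to min_doc_count.
- collect_mode: data collection strategy, either depth_first or breadth_first; default breadth_first.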
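The interaction between size, order and sum_other_doc_count can be sketched locally: buckets are sorted (by `_count` descending by default, or by `_key`), the first `size` are returned, and the documents in the rest are summed into sum_other_doc_count. The function and its `order_by`/`ascending` arguments are hypothetical stand-ins for the `order` parameter, and the sketch covers a single shard only.

```python
from collections import Counter


def terms_agg(values: list, size: int = 10, order_by: str = "_count",
              ascending: bool = False) -> dict:
    """Single-shard sketch of terms with size and order (hypothetical signature)."""
    buckets = list(Counter(values).items())
    if order_by == "_key":
        buckets.sort(key=lambda kv: kv[0], reverse=not ascending)
    else:
        # Sort by key first, then by count; Python's sort is stable,
        # so buckets with equal counts stay in key order.
        buckets.sort(key=lambda kv: kv[0])
        buckets.sort(key=lambda kv: kv[1], reverse=not ascending)
    top, rest = buckets[:size], buckets[size:]
    return {
        "sum_other_doc_count": sum(c for _, c in rest),
        "buckets": [{"key": k, "doc_count": c} for k, c in top],
    }


# Made-up ages: the ten counts for ages 10-19 from the response above,
# plus 784 documents at age 30 standing in for "everything else".
counts_10_to_19 = [14, 26, 30, 21, 24, 21, 27, 24, 18, 14]
ages = [age for age, n in zip(range(10, 20), counts_10_to_19) for _ in range(n)] + [30] * 784
result = terms_agg(ages, order_by="_key", ascending=True)
print([b["key"] for b in result["buckets"]])  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
print(result["sum_other_doc_count"])          # 784
```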
multi_terms: grouping by multiple fields
{
"size": 0,
"aggs" :{
"sex_aggs":{
"multi_terms":{
"terms":[{
"field":"age"
},{
"field":"sex"
}]
}
}
}
}
Groups by age and sex.
Result
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1003,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"sex_aggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 820,
"buckets": [
{
"key": [
24,
1
],
"key_as_string": "24|1",
"doc_count": 23
},
{
"key": [
46,
0
],
"key_as_string": "46|0",
"doc_count": 20
},
{
"key": [
44,
1
],
"key_as_string": "44|1",
"doc_count": 19
},
{
"key": [
33,
0
],
"key_as_string": "33|0",
"doc_count": 18
},
{
"key": [
49,
0
],
"key_as_string": "49|0",
"doc_count": 18
},
{
"key": [
21,
1
],
"key_as_string": "21|1",
"doc_count": 17
},
{
"key": [
22,
0
],
"key_as_string": "22|0",
"doc_count": 17
},
{
"key": [
24,
0
],
"key_as_string": "24|0",
"doc_count": 17
},
{
"key": [
26,
0
],
"key_as_string": "26|0",
"doc_count": 17
},
{
"key": [
28,
0
],
"key_as_string": "28|0",
"doc_count": 17
}
]
}
}
}
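multi_terms buckets on the combination of values, so each key is a list of values and key_as_string joins the parts with `|`, as in "24|1" above. The sketch below assumes documents missing any of the fields are skipped and that ties are broken by key ascending; both are assumptions for illustration, and the function name is hypothetical.

```python
from collections import Counter


def multi_terms_agg(docs: list, fields: list, size: int = 10) -> dict:
    """Single-shard sketch of multi_terms: one bucket per combination of field values."""
    # Assumption: documents missing any of the fields are skipped.
    combos = Counter(tuple(d[f] for f in fields)
                     for d in docs if all(f in d for f in fields))
    # doc_count descending, ties by key ascending (assumption).
    ordered = sorted(combos.items(), key=lambda kv: (-kv[1], kv[0]))
    top, rest = ordered[:size], ordered[size:]
    return {
        "sum_other_doc_count": sum(c for _, c in rest),
        "buckets": [
            {"key": list(k), "key_as_string": "|".join(str(v) for v in k), "doc_count": c}
            for k, c in top
        ],
    }


docs = [{"age": 24, "sex": 1}] * 3 + [{"age": 46, "sex": 0}] * 2 + [{"age": 30}]
print(multi_terms_agg(docs, ["age", "sex"])["buckets"][0])
# {'key': [24, 1], 'key_as_string': '24|1', 'doc_count': 3}
```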
range: range aggregation
{
"size": 0,
"aggs" :{
"age_state":{
"range":{
"field": "age",
"ranges":[{
"to": 10
},{
"from": 10,
"to": 20
},{
"from": 20
}]
}
}
}
}
Result:
{
"took": 143,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1003,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"age_state": {
"buckets": [
{
"key": "*-10.0",
"to": 10.0,
"doc_count": 0
},
{
"key": "10.0-20.0",
"from": 10.0,
"to": 20.0,
"doc_count": 219
},
{
"key": "20.0-*",
"from": 20.0,
"doc_count": 784
}
]
}
}
}
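Each range includes its from value and excludes its to value. That explains the first bucket above: the minimum age is 10 (see the stats result), and a document with age 10 belongs to 10.0-20.0, not to *-10.0, so *-10.0 is empty. The sketch below applies the same rule to a list of values and builds keys in the same "from-to" format; the function name is hypothetical.

```python
def range_agg(values: list, ranges: list) -> dict:
    """Sketch of a range aggregation: 'from' is inclusive, 'to' is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(1 for v in values
                    if (lo is None or v >= lo) and (hi is None or v < hi))
        # Key format mirrors the response above, e.g. "*-10.0" or "10.0-20.0".
        key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        bucket = {"key": key}
        if lo is not None:
            bucket["from"] = float(lo)
        if hi is not None:
            bucket["to"] = float(hi)
        bucket["doc_count"] = count
        buckets.append(bucket)
    return {"buckets": buckets}


ranges = [{"to": 10}, {"from": 10, "to": 20}, {"from": 20}]
for b in range_agg([10, 15, 19, 20, 35, 50], ranges)["buckets"]:
    print(b["key"], b["doc_count"])
# *-10.0 0
# 10.0-20.0 3
# 20.0-* 3
```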