Elasticsearch Study Notes, Part 23: Bucket Aggregations
Bucket Aggregations
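Unlike metrics aggregations, bucket aggregations do not calculate metrics over document fields; instead, they group documents into buckets. Each bucket is associated with a criterion (which depends on the aggregation type) that determines whether a document falls into it. In other words, a bucket is effectively a set of documents. In addition to the buckets themselves, a bucket aggregation also computes and returns the number of documents that fell into each bucket.
Unlike metrics aggregations, bucket aggregations can hold sub-aggregations. These sub-aggregations are computed over the buckets created by their parent aggregation.
There are many bucket aggregations, each with a different "bucketing" strategy. Some define a single bucket (single-bucket aggregations), some define a fixed number of buckets (multi-bucket aggregations), and others create buckets dynamically while the aggregation runs.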
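To make the idea of a bucket concrete, here is a minimal in-memory sketch in Python. It is not how Elasticsearch implements aggregations; the helper names (bucket_by, avg_price) and the sample documents are hypothetical and only illustrate "criterion, bucket, doc_count, sub-aggregation".

```python
from collections import defaultdict


def bucket_by(docs, criterion):
    """Group docs by criterion(doc) and record the document count per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        groups[criterion(doc)].append(doc)
    return {
        key: {"doc_count": len(members), "docs": members}
        for key, members in groups.items()
    }


def avg_price(docs):
    """A stand-in for a metrics sub-aggregation computed inside one bucket."""
    return sum(d["price"] for d in docs) / len(docs)


# Hypothetical sample documents, not taken from the notes.
products = [
    {"name": "pen", "category": "office", "price": 2.0},
    {"name": "desk", "category": "furniture", "price": 150.0},
    {"name": "chair", "category": "furniture", "price": 50.0},
]

by_category = bucket_by(products, lambda d: d["category"])
for key, bucket in by_category.items():
    # The sub-aggregation only sees the documents of its parent bucket.
    bucket["avg_price"] = avg_price(bucket["docs"])
    print(key, bucket["doc_count"], bucket["avg_price"])
```

The point of the sketch is the nesting: the metric is computed once per bucket, over that bucket's documents only, which is what a sub-aggregation under a bucket aggregation does.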
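Note:
The maximum number of buckets allowed in a single response is limited by the dynamic cluster setting search.max_buckets. In the 6.x releases these notes appear to target, the limit is disabled by default (set to -1), but a request that tries to return more than 10,000 buckets (the default planned for later versions) logs a deprecation warning.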
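As a hedged illustration, the limit could be set explicitly through the cluster settings API. The Python sketch below only builds and prints the request body; the helper name max_buckets_settings and the value 10000 are assumptions for the example, and sending the request to a cluster is left out.

```python
import json


def max_buckets_settings(limit, persistent=False):
    """Build a cluster-settings body that sets search.max_buckets.

    'transient' settings are lost on a full cluster restart;
    'persistent' settings survive it.
    """
    scope = "persistent" if persistent else "transient"
    return {scope: {"search.max_buckets": limit}}


body = max_buckets_settings(10000)
# The body would be sent as: PUT _cluster/settings
print("PUT _cluster/settings")
print(json.dumps(body, indent=2))
```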
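Children Aggregation
The children aggregation is a special single-bucket aggregation that selects child documents of a specified type, as defined in a join field.
This aggregation has a single option:
- type - the child type that should be selected
For example, suppose we have an index of questions and answers. The mapping declares a join field in which answer is the child of question: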
PUT child_example
{
"mappings": {
"_doc": {
"properties": {
"join": {
"type": "join",
"relations": {
"question": "answer"
}
}
}
}
}
}
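Question documents contain a tags field and answer documents contain an owner field. The children aggregation maps the buckets built on the tags field of the question documents to buckets built on the owner field of the answer documents.
An example of a question document: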
PUT child_example/_doc/1
{
"join": {
"name": "question"
},
"body": "<p>I have Windows 2003 server and i bought a new Windows 2008 server...",
"title": "Whats the best way to file transfer my site from server to a newer one?",
"tags": [
"windows-server-2003",
"windows-server-2008",
"file-transfer"
]
}
Examples of answer documents:
PUT child_example/_doc/2?routing=1
{
"join": {
"name": "answer",
"parent": "1"
},
"owner": {
"location": "Norfolk, United Kingdom",
"display_name": "Sam",
"id": 48
},
"body": "<p>Unfortunately you're pretty much limited to FTP...",
"creation_date": "2009-05-04T13:45:37.030"
}
PUT child_example/_doc/3?routing=1&refresh
{
"join": {
"name": "answer",
"parent": "1"
},
"owner": {
"location": "Norfolk, United Kingdom",
"display_name": "Troll",
"id": 49
},
"body": "<p>Use Linux...",
"creation_date": "2009-05-05T13:45:37.030"
}
The following request connects the two:
POST child_example/_search?size=0
{
"aggs": {
"top-tags": {
"terms": {
"field": "tags.keyword",
"size": 10
},
"aggs": {
"to-answers": {
"children": {
"type" : "answer"
},
"aggs": {
"top-names": {
"terms": {
"field": "owner.display_name.keyword",
"size": 10
}
}
}
}
}
}
}
}
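The type option refers to the child type named answer (the answer relation declared in the join field).
The example above returns the top question tags and, for each tag, the top answer owners.
A possible response: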
{
"took": 25,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped" : 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"top-tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "file-transfer",
"doc_count": 1,
"to-answers": {
"doc_count": 2,
"top-names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Sam",
"doc_count": 1
},
{
"key": "Troll",
"doc_count": 1
}
]
}
}
},
{
"key": "windows-server-2003",
"doc_count": 1,
"to-answers": {
"doc_count": 2,
"top-names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Sam",
"doc_count": 1
},
{
"key": "Troll",
"doc_count": 1
}
]
}
}
},
{
"key": "windows-server-2008",
"doc_count": 1,
"to-answers": {
"doc_count": 2,
"top-names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Sam",
"doc_count": 1
},
{
"key": "Troll",
"doc_count": 1
}
]
}
}
}
]
}
}
}
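Reading this response means walking three levels: the tag bucket (question count), the to-answers bucket (answer count), and the top-names buckets (owners). The Python sketch below shows one way to flatten that; the function name owners_per_tag is hypothetical, and the response dict is a trimmed copy of the one above with a single tag.

```python
def owners_per_tag(response):
    """Flatten top-tags -> to-answers -> top-names into one dict per tag."""
    result = {}
    for tag_bucket in response["aggregations"]["top-tags"]["buckets"]:
        children = tag_bucket["to-answers"]
        owners = {b["key"]: b["doc_count"] for b in children["top-names"]["buckets"]}
        result[tag_bucket["key"]] = {
            "questions": tag_bucket["doc_count"],  # question docs with this tag
            "answers": children["doc_count"],      # child docs of those questions
            "owners": owners,                      # answer count per owner name
        }
    return result


# Trimmed copy of the response shown above (one tag only).
response = {
    "aggregations": {
        "top-tags": {
            "buckets": [
                {
                    "key": "file-transfer",
                    "doc_count": 1,
                    "to-answers": {
                        "doc_count": 2,
                        "top-names": {
                            "buckets": [
                                {"key": "Sam", "doc_count": 1},
                                {"key": "Troll", "doc_count": 1},
                            ]
                        },
                    },
                }
            ]
        }
    }
}

summary = owners_per_tag(response)
print(summary)
```

Note that the doc_count under the tag counts question documents, while the doc_count under to-answers counts the answer documents that are children of those questions.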
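Range Aggregation
The range aggregation is a multi-bucket, value-source-based aggregation that lets the user define a set of ranges, each representing a bucket. During aggregation, the values extracted from each document are checked against every range, and the document is placed in each bucket whose range matches. Note that each range includes its from value and excludes its to value.
Example: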
GET /_search
{
"aggs" : {
"price_ranges" : {
"range" : {
"field" : "price",
"ranges" : [
{ "to" : 100.0 },
{ "from" : 100.0, "to" : 200.0 },
{ "from" : 200.0 }
]
}
}
}
}
Response:
{
...
"aggregations": {
"price_ranges" : {
"buckets": [
{
"key": "*-100.0",
"to": 100.0,
"doc_count": 2
},
{
"key": "100.0-200.0",
"from": 100.0,
"to": 200.0,
"doc_count": 2
},
{
"key": "200.0-*",
"from": 200.0,
"doc_count": 3
}
]
}
}
}
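The half-open rule (from is included, to is excluded) decides where boundary values such as 100 and 200 land. The Python sketch below imitates that rule in memory. The function names and the price list are hypothetical; the prices were chosen so the counts match the response above, and the real sample data is not given in these notes.

```python
def range_key(from_, to):
    """Build a default key such as '*-100.0' or '100.0-200.0'."""
    lo = "*" if from_ is None else str(float(from_))
    hi = "*" if to is None else str(float(to))
    return f"{lo}-{hi}"


def range_buckets(values, ranges):
    """Count values per range, treating each range as [from, to)."""
    buckets = []
    for r in ranges:
        from_, to = r.get("from"), r.get("to")
        count = sum(
            1
            for v in values
            if (from_ is None or v >= from_) and (to is None or v < to)
        )
        bucket = {"key": r.get("key", range_key(from_, to))}
        if from_ is not None:
            bucket["from"] = float(from_)
        if to is not None:
            bucket["to"] = float(to)
        bucket["doc_count"] = count
        buckets.append(bucket)
    return buckets


# Hypothetical prices, one value per document.
prices = [10, 50, 100, 150, 200, 250, 300]
ranges = [{"to": 100}, {"from": 100, "to": 200}, {"from": 200}]

buckets = range_buckets(prices, ranges)
for b in buckets:
    print(b)
```

With these values, 100 falls into the 100.0-200.0 bucket and 200 into the 200.0-* bucket, because the upper bound of a range is exclusive.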
Keyed Response
Setting the keyed flag to true associates a unique string key with each bucket and returns the ranges as a hash rather than an array:
GET /_search
{
"aggs" : {
"price_ranges" : {
"range" : {
"field" : "price",
"keyed" : true,
"ranges" : [
{ "to" : 100 },
{ "from" : 100, "to" : 200 },
{ "from" : 200 }
]
}
}
}
}
Response:
{
...
"aggregations": {
"price_ranges" : {
"buckets": {
"*-100.0": {
"to": 100.0,
"doc_count": 2
},
"100.0-200.0": {
"from": 100.0,
"to": 200.0,
"doc_count": 2
},
"200.0-*": {
"from": 200.0,
"doc_count": 3
}
}
}
}
}
It is also possible to customize the key for each range:
GET /_search
{
"aggs" : {
"price_ranges" : {
"range" : {
"field" : "price",
"keyed" : true,
"ranges" : [
{ "key" : "cheap", "to" : 100 },
{ "key" : "average", "from" : 100, "to" : 200 },
{ "key" : "expensive", "from" : 200 }
]
}
}
}
}
Response:
{
...
"aggregations": {
"price_ranges" : {
"buckets": {
"cheap": {
"to": 100.0,
"doc_count": 2
},
"average": {
"from": 100.0,
"to": 200.0,
"doc_count": 2
},
"expensive": {
"from": 200.0,
"doc_count": 3
}
}
}
}
}
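The difference between the array form and the keyed form is only in the shape of buckets: in the keyed form the key moves out of each bucket and becomes the hash key. A small Python sketch of that reshaping follows; to_keyed is a hypothetical helper, and the input mirrors the custom-key example above as it would look in array form.

```python
def to_keyed(buckets):
    """Turn a list of buckets into a dict indexed by each bucket's key."""
    keyed = {}
    for bucket in buckets:
        # Everything except "key" stays inside the bucket body.
        keyed[bucket["key"]] = {k: v for k, v in bucket.items() if k != "key"}
    return keyed


# Array-form buckets, as returned when keyed is not set.
array_buckets = [
    {"key": "cheap", "to": 100.0, "doc_count": 2},
    {"key": "average", "from": 100.0, "to": 200.0, "doc_count": 2},
    {"key": "expensive", "from": 200.0, "doc_count": 3},
]

keyed = to_keyed(array_buckets)
# Direct lookup by name instead of scanning a list.
print(keyed["cheap"]["doc_count"])
```

The keyed form is convenient when client code looks up a specific range by name, which is why the keys need to be unique.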