First, create some sample data, using a car dealership as the example (each record holds the sale price, color, make, and sale date):
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
1. A simple aggregation: find out which color sells best:
GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
                "field" : "color"
            }
        }
    }
}
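Explanation:
1) The aggregation is written under the top-level aggs parameter (aggs can also be spelled out as aggregations).
2) colors is the name of this aggregation. You can name it anything you like.
3) terms is one type of bucket.
My own take: all the documents start out in one big bucket, and the job of the query is to scoop them out of the big bucket and sort them into smaller buckets for further processing.
terms bucket: in this query the field given to terms is color, so a sub-bucket (small bucket) is created for each distinct color and the documents are regrouped into those small buckets.
Running the query above returns: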
{
...
   "hits": {
      "hits": []
   },
   "aggregations": {
      "colors": {
         "buckets": [
            {
               "key": "red",
               "doc_count": 4
            },
            {
               "key": "blue",
               "doc_count": 2
            },
            {
               "key": "green",
               "doc_count": 2
            }
         ]
      }
   }
}
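As you can see, every document has been sorted into a small bucket according to its color, and buckets lists all of those small buckets. key acts as the bucket's label, and doc_count is the number of documents that landed in that bucket. For a terms bucket that is the number of documents whose color field holds that term.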
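As a rough illustration of what the terms bucket does, the same grouping can be mimicked in plain Python over the eight sample documents. This is only an in-memory sketch, not how Elasticsearch computes it. The helper name terms_buckets is made up for this example, and ties are broken alphabetically by key simply so the order matches the response shown above.

```python
from collections import Counter

# The eight sample sales from the bulk request (only the fields used here).
transactions = [
    {"price": 10000, "color": "red", "make": "honda"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 30000, "color": "green", "make": "ford"},
    {"price": 15000, "color": "blue", "make": "toyota"},
    {"price": 12000, "color": "green", "make": "toyota"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 80000, "color": "red", "make": "bmw"},
    {"price": 25000, "color": "blue", "make": "ford"},
]


def terms_buckets(docs, field):
    """Hypothetical helper: one bucket per distinct value of `field`."""
    counts = Counter(doc[field] for doc in docs)
    # Largest bucket first; ties ordered by key (an assumption made for this sketch).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


buckets = terms_buckets(transactions, "color")
for bucket in buckets:
    print(bucket)
```

Each dictionary plays the role of one entry in the buckets array: key is the label and doc_count is how many documents were placed in it.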
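2. More complex statistics in an aggregation (metric)
Building simple buckets and counting the documents in each one is a basic but very useful technique. In real applications, though, we often need richer statistics. For example: what is the average sale price of the cars of each color?
The query is as follows: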
GET /cars/transactions/_search?search_type=count
{
   "aggs": {
      "colors": {
         "terms": {
            "field": "color"
         },
         "aggs": {
            "avg_price": {
               "avg": {
                  "field": "price"
               }
            }
         }
      }
   }
}
Look closely at the structure: a second aggs is defined, nested inside colors, so it is applied to each color bucket separately. avg_price is the name of the metric.
The query returns:
{
...
   "aggregations": {
      "colors": {
         "buckets": [
            {
               "key": "red",
               "doc_count": 4,
               "avg_price": {
                  "value": 32500
               }
            },
            {
               "key": "blue",
               "doc_count": 2,
               "avg_price": {
                  "value": 20000
               }
            },
            {
               "key": "green",
               "doc_count": 2,
               "avg_price": {
                  "value": 21000
               }
            }
         ]
      }
   }
...
}
Each bucket now carries an avg_price field, which shows that the metric was computed for every bucket.
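The averages in the response can be checked by hand with a small Python sketch over the same eight documents. It is only meant to show the idea of "a metric computed per bucket" and is not how Elasticsearch implements it. The function name avg_per_bucket is hypothetical.

```python
from collections import defaultdict

# The eight sample sales from the bulk request (only the fields used here).
transactions = [
    {"price": 10000, "color": "red"},
    {"price": 20000, "color": "red"},
    {"price": 30000, "color": "green"},
    {"price": 15000, "color": "blue"},
    {"price": 12000, "color": "green"},
    {"price": 20000, "color": "red"},
    {"price": 80000, "color": "red"},
    {"price": 25000, "color": "blue"},
]


def avg_per_bucket(docs, bucket_field, metric_field):
    """Hypothetical helper: bucket by `bucket_field`, then average `metric_field`."""
    values = defaultdict(list)
    for doc in docs:
        values[doc[bucket_field]].append(doc[metric_field])
    # The metric is computed once per bucket, from that bucket's documents only.
    return {
        key: {"doc_count": len(prices), "avg_price": sum(prices) / len(prices)}
        for key, prices in values.items()
    }


result = avg_per_bucket(transactions, "color", "price")
for key, stats in result.items():
    print(key, stats)
```

For red, the four prices 10000, 20000, 20000 and 80000 sum to 130000, which gives the 32500 shown in the response.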
3. Nesting buckets inside buckets
The idea is simple, so here is the nested query directly:
GET /cars/transactions/_search?search_type=count
{
   "aggs": {
      "colors": {
         "terms": {
            "field": "color"
         },
         "aggs": {
            "avg_price": {
               "avg": {
                  "field": "price"
               }
            },
            "make": {
               "terms": {
                  "field": "make"
               }
            }
         }
      }
   }
}
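Note that the nested bucket (make) and the metric (avg_price) sit at the same level, both inside the aggs of colors. It is worth pausing to think about this nesting relationship, because it makes the roles of buckets and metrics in an aggregation much clearer.
The query returns: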
{
...
   "aggregations": {
      "colors": {
         "buckets": [
            {
               "key": "red",
               "doc_count": 4,
               "make": {
                  "buckets": [
                     {
                        "key": "honda",
                        "doc_count": 3
                     },
                     {
                        "key": "bmw",
                        "doc_count": 1
                     }
                  ]
               },
               "avg_price": {
                  "value": 32500
               }
            },
...
}
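The nesting can also be sketched in plain Python: first bucket by color, then bucket again by make inside each color bucket. This is an illustrative in-memory analogy under the same sample data, and the function name nested_terms is invented for the example.

```python
from collections import Counter, defaultdict

# The eight sample sales from the bulk request (only the fields used here).
transactions = [
    {"color": "red", "make": "honda"},
    {"color": "red", "make": "honda"},
    {"color": "green", "make": "ford"},
    {"color": "blue", "make": "toyota"},
    {"color": "green", "make": "toyota"},
    {"color": "red", "make": "honda"},
    {"color": "red", "make": "bmw"},
    {"color": "blue", "make": "ford"},
]


def nested_terms(docs, outer_field, inner_field):
    """Hypothetical helper: count `inner_field` values inside each `outer_field` bucket."""
    outer = defaultdict(Counter)
    for doc in docs:
        # The inner counter only ever sees documents of its own outer bucket.
        outer[doc[outer_field]][doc[inner_field]] += 1
    return {key: dict(inner) for key, inner in outer.items()}


nested = nested_terms(transactions, "color", "make")
for color, makes in nested.items():
    print(color, makes)
```

The inner counts for red (three honda, one bmw) add up to the doc_count of 4 on the outer red bucket, which is the relationship the response above expresses.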
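That sums up my understanding of aggregations in Elasticsearch. It pays to keep studying the syntax structure and to work out why it is laid out that way, since that makes the whole thing much easier to grasp.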