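Elasticsearch Aggregations

Contents:

I. Basic concepts

II. Generating data

       maven
       Java code

III. Query methods

Metric aggregations
       Average, max, min, sum, count, stats
       Percentiles aggregation
       Percentile ranks aggregation
Bucket aggregations
       Histogram aggregation
       Minimum document count
       Ordering
       Date histogram aggregation
       Range aggregation
       Filter aggregation
Pipeline aggregations
       Avg bucket aggregation
       Sum bucket aggregation
       Cumulative sum aggregation
       Max and min bucket aggregations
       Stats bucket aggregation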

—————————————————————————————

I. Basic concepts

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html

aggregations
       The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks called aggregations, that can be composed in order to build complex summaries of the data.
       An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents. The context of the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed query/filters of the search request).
       There are many different types of aggregations, each with its own purpose and output. To better understand these types, it is often easier to break them into four main families:
Metric aggregations
       Aggregations that keep track and compute metrics over a set of documents.
Bucket aggregations
       Aggregations that build buckets, where each bucket is associated with a key and a document criterion. Every document that meets a bucket's criterion falls into that bucket. The histogram, range and filter examples in section III belong to this family.
Matrix aggregations
       A family of aggregations that operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields. Unlike metric and bucket aggregations, this aggregation family does not yet support scripting. This family is not covered in section III.
Pipeline aggregations
       Aggregations that aggregate the output of other aggregations and their associated metrics.

II. Generating data

maven
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>5.2.0</version>
</dependency>
Java code
package cn.orcale.com.es;

import java.net.InetAddress;
import java.util.Random;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequestBuilder;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

/**
 * Bulk-indexes 1000 randomly generated car sales into car_shop/sales.
 *
 * @author yuhui
 */
public class InsertData {

    @SuppressWarnings({ "resource" })
    public static void main(String[] args) throws Exception {

            // Sample values; every document picks one entry at random from each array.
            // The brand and info strings are kept in Chinese, as in the original data set.
            String[] brand = {"奔驰","宝马","宝马Z4","奔驰C300","保时捷","奔奔"};
            int[] product_price = {10000,20000,30000,40000};
            int[] sale_price = {10000,20000,30000,40000};
            String[] sale_date = {"2017-08-21","2017-08-22","2017-08-23","2017-08-24"};
            String[] colour = {"white","black","gray","red"};
            String[] info = {"我很喜欢","Very Nice","不错, 不错 ","我以后还会来的"};
            Random random = new Random();
            Settings settings = Settings.builder().put("cluster.name", "elasticsearch")
                    .build();

            @SuppressWarnings("unchecked")
            TransportClient client = new PreBuiltTransportClient(settings)
                    .addTransportAddress(new InetSocketTransportAddress(InetAddress
                            .getByName("localhost"), 9300));

            BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();

            for (int num = 1; num <= 1000; num++) {
                String brandTemp = brand[random.nextInt(brand.length)];
                // Queue one index request; the document id is the running number
                IndexRequestBuilder indexRequestBuilder = client.prepareIndex("car_shop", "sales", String.valueOf(num)).setSource(

                        XContentFactory.jsonBuilder().startObject()
                                .field("num", num)
                                .field("brand", brandTemp)
                                .field("colour", colour[random.nextInt(colour.length)])
                                .field("product_price", product_price[random.nextInt(product_price.length)])
                                .field("sale_price",  sale_price[random.nextInt(sale_price.length)])
                                .field("sale_date",  sale_date[random.nextInt(sale_date.length)])
                                .field("info",  brandTemp+info[random.nextInt(info.length)])
                                .endObject());

                bulkRequestBuilder.add(indexRequestBuilder);
            }

            // Send all queued requests in a single bulk call and report any failures
            BulkResponse bulkResponse = bulkRequestBuilder.get();
            if (bulkResponse.hasFailures()) {
                System.err.println(bulkResponse.buildFailureMessage());
            }

            System.out.println("Insert complete");

            client.close();
    }
}

III. Query methods

Metric aggregations
Average, max, min, sum, count, stats
# Compute the average, max, min, sum, value count and stats of sale_price over the documents matched by the query
GET /car_shop/sales/_search
{
  "aggs" : {
        "avg_grade" : { "avg" : { "field" : "sale_price" } },
        "max_price" : { "max" : { "field" : "sale_price" } },
        "min_price" : { "min" : { "field" : "sale_price" } },
        "intraday_return" : { "sum" : { "field" : "sale_price" } },
        "grades_count" : { "value_count" : { "field" : "sale_price" } },
        "grades_stats" : { "stats" : { "field" : "sale_price" } }
    },
    "size": 0
}

The response is as follows

  "aggregations": {
    "max_price": {
      "value": 40000
    },
    "min_price": {
      "value": 10000
    },
    "grades_stats": {
      "count": 1000,
      "min": 10000,
      "max": 40000,
      "avg": 24030,
      "sum": 24030000
    },
    "intraday_return": {
      "value": 24030000
    },
    "grades_count": {
      "value": 1000
    },
    "avg_grade": {
      "value": 24030
    }
  }
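Percentiles aggregation
# Percentiles aggregation: "percents":[0,25,100] asks for the values at the 0th, 25th and 100th percentiles; 0 is the minimum and 100 is the maximum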
GET /car_shop/sales/_search
{
  "aggs" : {
      "load_time_outlier" : {
            "percentiles" : {
                "field" : "num" ,"percents":[0,25,100]
            }
        }
    },
    "size": 0
}

The response is as follows

  "aggregations": {
    "load_time_outlier": {
      "values": {
        "0.0": 1,
        "25.0": 250.75,
        "100.0": 1000
      }
    }
  }
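To make the 250.75 in the response concrete, here is a minimal sketch of an exact percentile computed by linear interpolation between the two closest ranks. This is an assumption for illustration only: Elasticsearch's percentiles aggregation uses the approximate TDigest algorithm by default, and the class and method names below are hypothetical. For the num field (the integers 1 to 1000) the exact method happens to agree with the response above.

```java
import java.util.Arrays;

// Hypothetical helper, not part of any Elasticsearch API.
class PercentileSketch {

    // Exact percentile by linear interpolation between the closest ranks.
    static double percentile(double[] values, double percent) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // Fractional 0-based position of the requested percentile
        double rank = percent / 100.0 * (sorted.length - 1);
        int lower = (int) Math.floor(rank);
        int upper = (int) Math.ceil(rank);
        double fraction = rank - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    public static void main(String[] args) {
        // Same values as the num field written by the data generator: 1..1000
        double[] num = new double[1000];
        for (int i = 0; i < num.length; i++) {
            num[i] = i + 1;
        }
        System.out.println(percentile(num, 0));   // 1.0
        System.out.println(percentile(num, 25));  // 250.75
        System.out.println(percentile(num, 100)); // 1000.0
    }
}
```

On larger or more skewed data the TDigest estimate can differ slightly from this exact value.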
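Percentile ranks aggregation
# Percentile ranks aggregation: for each entry in "values":[5000,10000,30000,40000], returns the percentage of documents whose product_price is at or below that value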
GET /car_shop/sales/_search
{
  "aggs" : {
      "load_time_outlier" : {
            "percentile_ranks" : {
                "field" : "product_price" ,"values":[5000,10000,30000,40000]
            }
        }
    },
    "size": 0
}

The response is as follows

  "aggregations": {
    "load_time_outlier": {
      "values": {
        "5000.0": 0,
        "10000.0": 27.1,
        "30000.0": 74.6,
        "40000.0": 100
      }
    }
  }
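Bucket aggregations
Histogram aggregation
# Histogram aggregation: "interval":10000 groups product_price into fixed-width buckets of 10000 and counts the documents in each bucket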
GET /car_shop/sales/_search
{
  "aggs" : {
      "product_price" : {
            "histogram" : {
                "field" : "product_price" ,"interval":10000
            }
        }
    },
    "size": 0
}

The response is as follows

"aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": 10000,
          "doc_count": 278
        },
        {
          "key": 20000,
          "doc_count": 251
        },
        {
          "key": 30000,
          "doc_count": 225
        },
        {
          "key": 40000,
          "doc_count": 246
        }
      ]
    }
  }
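The bucket keys above follow from rounding each value down to the nearest multiple of the interval. The sketch below shows that arithmetic under the assumption that no offset is configured (the default); the class and method names are hypothetical, and the sample prices other than 10000 and 40000 are made up to show values that fall between bucket boundaries.

```java
// Hypothetical helper illustrating how a histogram bucket key is derived.
class HistogramKeySketch {

    // Round the value down to the nearest multiple of the interval (offset assumed to be 0).
    static long bucketKey(double value, long interval) {
        return (long) Math.floor(value / interval) * interval;
    }

    public static void main(String[] args) {
        long interval = 10000;
        double[] prices = {10000, 25000, 39999, 40000};
        for (double price : prices) {
            System.out.println(price + " -> bucket " + bucketKey(price, interval));
        }
    }
}
```

Because the generated product_price values are exact multiples of 10000, each value is its own bucket key in the response above.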
Minimum document count
# Minimum document count: "min_doc_count": 1 omits buckets that contain no documents
GET /car_shop/sales/_search
{
  "aggs" : {
      "product_price" : {
            "histogram" : {
                "field" : "product_price" ,"interval":10000,"min_doc_count": 1
            }
        }
    },
    "size": 0
}

The response is as follows

  "aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": 10000,
          "doc_count": 278
        },
        {
          "key": 20000,
          "doc_count": 251
        },
        {
          "key": 30000,
          "doc_count": 225
        },
        {
          "key": 40000,
          "doc_count": 246
        }
      ]
    }
  }
Ordering
# Ordering: sort the buckets by _key or by _count
GET /car_shop/sales/_search
{
  "aggs" : {
      "product_price" : {
            "histogram" : {
                "field" : "product_price" ,"interval":10000,"order": {"_key": "desc"}
            }
        }
    },
    "size": 0
}

The response is as follows

  "aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": 40000,
          "doc_count": 246
        },
        {
          "key": 30000,
          "doc_count": 225
        },
        {
          "key": 20000,
          "doc_count": 251
        },
        {
          "key": 10000,
          "doc_count": 278
        }
      ]
    }
  }
Date histogram aggregation
# Date histogram aggregation: bucket by day, by month, or by year
GET /car_shop/sales/_search
{
  "aggs" : {
      "articles_over_time" : {
            "date_histogram" : {
                "field" : "sale_date" ,"interval":"1d","format": "yyyy-MM-dd"
            }
        }
    },
    "size": 0
}

The response is as follows

  "aggregations": {
    "articles_over_time": {
      "buckets": [
        {
          "key_as_string": "2017-08-21",
          "key": 1503273600000,
          "doc_count": 235
        },
        {
          "key_as_string": "2017-08-22",
          "key": 1503360000000,
          "doc_count": 259
        },
        {
          "key_as_string": "2017-08-23",
          "key": 1503446400000,
          "doc_count": 256
        },
        {
          "key_as_string": "2017-08-24",
          "key": 1503532800000,
          "doc_count": 250
        }
      ]
    }
  }
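Each "key" in the response is the start of the day in epoch milliseconds, and "key_as_string" is that instant rendered with the requested format. The sketch below reproduces the first bucket key with java.time, assuming UTC day boundaries (the default when no time_zone is set). The class name and the 15:30 timestamp are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper showing how a daily bucket key relates to a timestamp.
class DayBucketSketch {

    // Truncate to midnight UTC and express the result as epoch milliseconds.
    static long dayBucketKey(Instant timestamp) {
        return timestamp.truncatedTo(ChronoUnit.DAYS).toEpochMilli();
    }

    public static void main(String[] args) {
        // Any time on 2017-08-21 (UTC) lands in the same daily bucket
        Instant sale = Instant.parse("2017-08-21T15:30:00Z");
        long key = dayBucketKey(sale);
        LocalDate day = Instant.ofEpochMilli(key).atZone(ZoneOffset.UTC).toLocalDate();
        System.out.println(key + " " + day); // 1503273600000 2017-08-21
    }
}
```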
Range aggregation
# Range aggregation: each range includes its "from" value and excludes its "to" value
GET /car_shop/sales/_search
{
  "aggs" : {
      "product_price" : {
            "range" : {
                "field" : "product_price" ,"ranges":[
                   {"to":10000},
                   {"from":10000,"to" :20000},
                   {"from":40000}
                ]
            }
        }
    },
    "size": 0
}

The response is as follows

  "aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": "*-10000.0",
          "to": 10000,
          "doc_count": 0
        },
        {
          "key": "10000.0-20000.0",
          "from": 10000,
          "to": 20000,
          "doc_count": 266
        },
        {
          "key": "40000.0-*",
          "from": 40000,
          "doc_count": 251
        }
      ]
    }
  }
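The empty first bucket is explained by the range boundaries: "from" is inclusive and "to" is exclusive, so a price of exactly 10000 belongs to 10000-20000 rather than to *-10000. A minimal sketch of that membership test follows; the class and method names are hypothetical, and null stands for an open-ended bound.

```java
// Hypothetical helper mirroring the from-inclusive, to-exclusive rule of range buckets.
class RangeBucketSketch {

    // A null bound means the range is open on that side.
    static boolean inRange(double value, Double from, Double to) {
        return (from == null || value >= from) && (to == null || value < to);
    }

    public static void main(String[] args) {
        System.out.println(inRange(10000, null, 10000.0));    // false: "to" is exclusive
        System.out.println(inRange(10000, 10000.0, 20000.0)); // true: "from" is inclusive
        System.out.println(inRange(20000, 10000.0, 20000.0)); // false
        System.out.println(inRange(40000, 40000.0, null));    // true
    }
}
```

With the three ranges used in the query, documents priced at 20000 or 30000 match none of them, which is why the bucket counts do not add up to 1000.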
Filter aggregation
# Filter aggregation (average sale price of all red cars)
GET /car_shop/sales/_search
{
  "aggs" : {
      "car_colour" : {
          "filter": {"term": {"colour": "red"}},
          "aggs": {"avg_price": {"avg": {"field":"sale_price"}}}
        }
    },
    "size": 0
}

The response is as follows

  "aggregations": {
    "car_colour": {
      "doc_count": 258,
      "avg_price": {
        "value": 24069.767441860466
      }
    }
  }
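Pipeline aggregations
Avg bucket aggregation
# Avg bucket aggregation (total sales for each day, plus the average of those daily totals)
# avg_bucket averages one metric across the buckets of a sibling aggregation. The buckets_path "sales_per_day>sales" is a path, not a comparison: it selects the sales metric nested inside the sales_per_day aggregation, and ">" is the aggregation separator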
GET /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "sale_date",
        "interval": "day"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "sale_price"
          }
        }
      }
    },
    "avg_day_sales": {
      "avg_bucket": {
        "buckets_path": "sales_per_day>sales" 
      }
    }
  }
}

The response is as follows

    "avg_day_sales": {
      "value": 6007500
    }
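The value 6007500 is simply the mean of the four per-day sums. As a rough check, the sketch below averages the daily totals that appear in the cumulative sum response later in this section; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: averages one metric across a set of buckets, as avg_bucket does.
class AvgBucketSketch {

    static double avgBucket(Map<String, Long> metricPerBucket) {
        return metricPerBucket.values().stream()
                .mapToLong(Long::longValue)
                .average()
                .orElse(Double.NaN); // no buckets, no average
    }

    public static void main(String[] args) {
        // Daily sale_price sums taken from the responses in this article
        Map<String, Long> salesPerDay = new LinkedHashMap<>();
        salesPerDay.put("2017-08-21", 5620000L);
        salesPerDay.put("2017-08-22", 6150000L);
        salesPerDay.put("2017-08-23", 6050000L);
        salesPerDay.put("2017-08-24", 6210000L);
        System.out.println(avgBucket(salesPerDay)); // 6007500.0
    }
}
```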
Sum bucket aggregation
# Sum bucket aggregation (adds up the per-day sales totals across all days)
POST /car_shop/sales/_search
{
    "size": 0,
    "aggs" : {
        "sales_per_day" : {
            "date_histogram" : {
                "field" : "sale_date",
                "interval" : "day"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "sale_price"
                    }
                }
            }
        },
        "sum_days_sales": {
            "sum_bucket": {
                "buckets_path": "sales_per_day>sales" 
            }
        }
    }
}

The response is as follows

    "sum_days_sales": {
      "value": 24030000
    }
Cumulative sum aggregation
# Cumulative sum aggregation (running total of daily sales: day 1, days 1-2, days 1-3, days 1-4)
POST /car_shop/sales/_search
{
    "size": 0,
    "aggs" : {
        "sales_per_day" : {
            "date_histogram" : {
                "field" : "sale_date",
                "interval" : "day"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "sale_price"
                    }
                },
                "cumulative_sales": {
                    "cumulative_sum": {
                        "buckets_path": "sales"
                    }
                }
            }
        } 
    }
}

The response is as follows

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sales_per_day": {
      "buckets": [
        {
          "key_as_string": "2017-08-21T00:00:00.000Z",
          "key": 1503273600000,
          "doc_count": 235,
          "sales": {
            "value": 5620000 },
          "cumulative_sales": {
            "value": 5620000 }
        },
        {
          "key_as_string": "2017-08-22T00:00:00.000Z",
          "key": 1503360000000,
          "doc_count": 259,
          "sales": {
            "value": 6150000 },
          "cumulative_sales": {
            "value": 11770000 }
        },
        {
          "key_as_string": "2017-08-23T00:00:00.000Z",
          "key": 1503446400000,
          "doc_count": 256,
          "sales": {
            "value": 6050000 },
          "cumulative_sales": {
            "value": 17820000 }
        },
        {
          "key_as_string": "2017-08-24T00:00:00.000Z",
          "key": 1503532800000,
          "doc_count": 250,
          "sales": {
            "value": 6210000 },
          "cumulative_sales": {
            "value": 24030000 }
        }
      ]
    }
  }
}
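The cumulative_sales values are a running total over the buckets in order. A minimal sketch of that computation, using the four daily sales values from the response above (the class and method names are hypothetical):

```java
import java.util.Arrays;

// Hypothetical helper: running total over ordered buckets, as cumulative_sum does.
class CumulativeSumSketch {

    static long[] cumulativeSum(long[] metricPerBucket) {
        long[] running = new long[metricPerBucket.length];
        long total = 0;
        for (int i = 0; i < metricPerBucket.length; i++) {
            total += metricPerBucket[i];
            running[i] = total; // value reported for bucket i
        }
        return running;
    }

    public static void main(String[] args) {
        long[] dailySales = {5620000L, 6150000L, 6050000L, 6210000L};
        // [5620000, 11770000, 17820000, 24030000]
        System.out.println(Arrays.toString(cumulativeSum(dailySales)));
    }
}
```

Unlike avg_bucket or sum_bucket, cumulative_sum is declared inside the date_histogram, so its buckets_path refers to the sibling metric "sales" directly.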
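Max and min bucket aggregations
# Max and min bucket aggregations (the largest and smallest daily sales totals and the dates they fall on)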
POST /car_shop/sales/_search
{
    "size": 0,
    "aggs" : {
        "sales_per_day" : {
            "date_histogram" : {
                "field" : "sale_date",
                "interval" : "day"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "sale_price"
                    }
                }
            }
        },
        "max_days_sales": {
            "max_bucket": {
                "buckets_path": "sales_per_day>sales"
            }
        },
        "min_days_sales": {
            "min_bucket": {
                "buckets_path": "sales_per_day>sales"
            }
        }
    }
}

The response is as follows

    "max_days_sales": {
      "value": 6210000,
      "keys": [
        "2017-08-24T00:00:00.000Z"
      ]
    },
    "min_days_sales": {
      "value": 5620000,
      "keys": [
        "2017-08-21T00:00:00.000Z"
      ]
    }
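"keys" is an array because several buckets can share the extreme value. The sketch below finds the maximum daily total and collects every bucket key that reaches it, using the daily sums from this article; the class and method names are hypothetical, and the same approach with Collections.min would cover min_bucket.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: returns the keys of all buckets holding the maximum metric value.
class MaxBucketSketch {

    static List<String> keysOfMax(Map<String, Long> metricPerBucket) {
        long max = Collections.max(metricPerBucket.values());
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Long> entry : metricPerBucket.entrySet()) {
            if (entry.getValue() == max) { // ties produce more than one key
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Long> salesPerDay = new LinkedHashMap<>();
        salesPerDay.put("2017-08-21T00:00:00.000Z", 5620000L);
        salesPerDay.put("2017-08-22T00:00:00.000Z", 6150000L);
        salesPerDay.put("2017-08-23T00:00:00.000Z", 6050000L);
        salesPerDay.put("2017-08-24T00:00:00.000Z", 6210000L);
        System.out.println(keysOfMax(salesPerDay)); // [2017-08-24T00:00:00.000Z]
    }
}
```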
Stats bucket aggregation
# Stats bucket aggregation (count, min, max, avg and sum of the daily sales totals)
POST /car_shop/sales/_search
{
    "size": 0,
    "aggs" : {
        "sales_per_day" : {
            "date_histogram" : {
                "field" : "sale_date",
                "interval" : "day"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "sale_price"
                    }
                }
            }
        },
        "stats_days_sales": {
            "stats_bucket": {
                "buckets_path": "sales_per_day>sales" 
            }
        }
    }
}

The response is as follows

    "stats_days_sales": {
      "count": 4,
      "min": 5620000,
      "max": 6210000,
      "avg": 6007500,
      "sum": 24030000
    }