Elasticsearch Aggregations

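Contents:

1. Basic concepts

2. Generating the data

       Maven

       Java code

3. Queries

Metric aggregations

       Average, max, min, sum, count, stats

       Percentiles aggregation

       Percentile ranks aggregation

Bucket aggregations

       Histogram aggregation

       Minimum document count

       Sorting

       Date histogram aggregation

       Range aggregation

       Filter aggregation

Pipeline aggregations

       Avg bucket aggregation

       Sum bucket aggregation

       Cumulative sum aggregation

       Max and min bucket aggregations

       Stats bucket aggregation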

—————————————————————————————

1. Basic concepts

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html

aggregations
       The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks called aggregations, that can be composed in order to build complex summaries of the data.
       An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents. The context of the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed query/filters of the search request).
       There are many different types of aggregations, each with its own purpose and output. To better understand these types, it is often easier to break them into three main families:
Metric aggregations
       Aggregations that keep track and compute metrics over a set of documents.
Bucket aggregations
       A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation is executed, every document in the context is tested against each bucket's criterion, and a document that matches is counted in that bucket.
Pipeline aggregations
       Aggregations that aggregate the output of other aggregations and their associated metrics.
       The reference also describes a matrix family, which operates on multiple fields and produces a matrix result; the queries in this article do not use it.

2. Generating the data

Maven

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>5.2.0</version>
</dependency>

Java code

package cn.orcale.com.es;

import java.net.InetAddress;
import java.util.Random;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequestBuilder;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

/**
 * Bulk-indexes 1000 randomly generated car sales into the car_shop index (type sales).
 *
 * @author yuhui
 */
public class InsertData {

    public static void main(String[] args) throws Exception {

        // Candidate values; every document picks one entry from each array at random.
        String[] brand = {"奔驰","宝马","宝马Z4","奔驰C300","保时捷","奔奔"};
        int[] product_price = {10000,20000,30000,40000};
        int[] sale_price = {10000,20000,30000,40000};
        String[] sale_date = {"2017-08-21","2017-08-22","2017-08-23","2017-08-24"};
        String[] colour = {"white","black","gray","red"};
        String[] info = {"我很喜欢","Very Nice","不错, 不错 ","我以后还会来的"};
        Random random = new Random();
        Settings settings = Settings.builder().put("cluster.name", "elasticsearch")
                .build();

        // 9300 is the transport port, not the 9200 HTTP port.
        @SuppressWarnings("unchecked")
        TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(InetAddress
                        .getByName("localhost"), 9300));

        try {
            BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();

            // num doubles as the document id and as the "num" field (1..1000).
            for (int num = 1; num <= 1000; num++) {
                String brandTemp = brand[random.nextInt(brand.length)];
                // Build one index request: index car_shop, type sales, id num
                IndexRequestBuilder indexRequestBuilder = client.prepareIndex("car_shop", "sales", String.valueOf(num)).setSource(

                        XContentFactory.jsonBuilder().startObject()
                                .field("num", num)
                                .field("brand", brandTemp)
                                .field("colour", colour[random.nextInt(colour.length)])
                                .field("product_price", product_price[random.nextInt(product_price.length)])
                                .field("sale_price", sale_price[random.nextInt(sale_price.length)])
                                .field("sale_date", sale_date[random.nextInt(sale_date.length)])
                                .field("info", brandTemp + info[random.nextInt(info.length)])
                                .endObject());

                bulkRequestBuilder.add(indexRequestBuilder);
            }

            // Send all requests in a single bulk call and report any per-document failures.
            BulkResponse response = bulkRequestBuilder.get();
            if (response.hasFailures()) {
                System.err.println(response.buildFailureMessage());
            } else {
                System.out.println("Insert complete");
            }
        } finally {
            client.close();
        }
    }
}

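3. Queries

Metric aggregations

Average, max, min, sum, count, stats

# Average, max, min, sum, count, stats
# Computes the average, max, min, sum, value count and stats of sale_price over the documents matched by the query (here, all documents)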
GET /car_shop/sales/_search
{
  "aggs" : {
        "avg_grade" : { "avg" : { "field" : "sale_price" } },
        "max_price" : { "max" : { "field" : "sale_price" } },
        "min_price" : { "min" : { "field" : "sale_price" } },
        "intraday_return" : { "sum" : { "field" : "sale_price" } },
        "grades_count" : { "value_count" : { "field" : "sale_price" } },
        "grades_stats" : { "stats" : { "field" : "sale_price" } }
    },
    "size": 0
}

The response is as follows:

  "aggregations": {
    "max_price": {
      "value": 40000
    },
    "min_price": {
      "value": 10000
    },
    "grades_stats": {
      "count": 1000,
      "min": 10000,
      "max": 40000,
      "avg": 24030,
      "sum": 24030000
    },
    "intraday_return": {
      "value": 24030000
    },
    "grades_count": {
      "value": 1000
    },
    "avg_grade": {
      "value": 24030
    }
  }
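
To make the relationship between these six results concrete, here is a minimal sketch in plain Java (no Elasticsearch client involved) that computes the same kind of figures over an in-memory array. The class name `StatsSketch` and the sample prices are illustrative assumptions, not values taken from the index.

```java
import java.util.LongSummaryStatistics;
import java.util.stream.LongStream;

// Hypothetical helper: computes the figures a stats aggregation reports for a numeric field.
class StatsSketch {

    static LongSummaryStatistics stats(long[] salePrices) {
        // summaryStatistics() gathers count, min, max, sum and average in one pass.
        return LongStream.of(salePrices).summaryStatistics();
    }

    public static void main(String[] args) {
        long[] salePrices = {10000, 20000, 20000, 40000}; // made-up sample values
        LongSummaryStatistics s = stats(salePrices);
        System.out.println("count=" + s.getCount() + " min=" + s.getMin()
                + " max=" + s.getMax() + " avg=" + s.getAverage() + " sum=" + s.getSum());
    }
}
```

As in the response, the single stats result repeats what avg, max, min, sum and value_count return individually, so one stats aggregation is usually enough when all of them are needed.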

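Percentiles aggregation

# Percentiles aggregation: "percents":[0,25,100] lists the percentiles to compute on a 0-100 scale, where 0 is the lowest value and 100 is the highest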
GET /car_shop/sales/_search
{
  "aggs" : {
      "load_time_outlier" : {
            "percentiles" : {
                "field" : "num" ,"percents":[0,25,100]
            }
        }
    },
    "size": 0
}

The response is as follows:

  "aggregations": {
    "load_time_outlier": {
      "values": {
        "0.0": 1,
        "25.0": 250.75,
        "100.0": 1000
      }
    }
  }
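
The 250.75 reported for the 25th percentile can be reproduced by linear interpolation between neighbouring sorted values. Elasticsearch itself estimates percentiles with an approximation algorithm, so the sketch below only illustrates the idea and is not the engine's actual method. The class name `PercentileSketch` is hypothetical, and the input is assumed to be non-empty.

```java
import java.util.Arrays;

// Hypothetical helper: exact percentile by linear interpolation (input must be non-empty).
class PercentileSketch {

    // Returns the p-th percentile (0 <= p <= 100) of the given values.
    static double percentile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double rank = p / 100.0 * (sorted.length - 1); // fractional index into sorted[]
        int lower = (int) Math.floor(rank);
        int upper = (int) Math.ceil(rank);
        double fraction = rank - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    public static void main(String[] args) {
        // The "num" field of the generated data holds 1..1000.
        double[] nums = new double[1000];
        for (int i = 0; i < nums.length; i++) {
            nums[i] = i + 1;
        }
        System.out.println(percentile(nums, 0));
        System.out.println(percentile(nums, 25));
        System.out.println(percentile(nums, 100));
    }
}
```

For the values 1..1000 this yields 1, 250.75 and 1000: the 25th percentile sits at fractional index 249.75, three quarters of the way between 250 and 251.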

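Percentile ranks aggregation

# Percentile ranks aggregation: for each entry in "values":[5000,10000,30000,40000], returns the percentage of documents whose product_price is at or below that value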
GET /car_shop/sales/_search
{
  "aggs" : {
      "load_time_outlier" : {
            "percentile_ranks" : {
                "field" : "product_price" ,"values":[5000,10000,30000,40000]
            }
        }
    },
    "size": 0
}

The response is as follows:

  "aggregations": {
    "load_time_outlier": {
      "values": {
        "5000.0": 0,
        "10000.0": 27.1,
        "30000.0": 74.6,
        "40000.0": 100
      }
    }
  }

Bucket aggregations

Histogram aggregation

# Histogram aggregation: "interval":10000 splits product_price into buckets 10000 wide and counts the documents in each
GET /car_shop/sales/_search
{
  "aggs" : {
      "product_price" : {
            "histogram" : {
                "field" : "product_price" ,"interval":10000
            }
        }
    },
    "size": 0
}

The response is as follows:

"aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": 10000,
          "doc_count": 278
        },
        {
          "key": 20000,
          "doc_count": 251
        },
        {
          "key": 30000,
          "doc_count": 225
        },
        {
          "key": 40000,
          "doc_count": 246
        }
      ]
    }
  }
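
Each bucket key is the document's value rounded down to a multiple of the interval. A minimal sketch of that rule in plain Java, assuming no offset is configured; the class name `HistogramSketch` and the sample prices are illustrative assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups values into fixed-width buckets like a histogram aggregation.
class HistogramSketch {

    // Bucket key for a value: round down to the nearest multiple of interval.
    static long bucketKey(long value, long interval) {
        return Math.floorDiv(value, interval) * interval;
    }

    // Counts values per bucket key; TreeMap keeps the keys in ascending order.
    static Map<Long, Integer> histogram(long[] values, long interval) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long v : values) {
            counts.merge(bucketKey(v, interval), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] prices = {10000, 15000, 20000, 39999, 40000}; // made-up sample values
        System.out.println(histogram(prices, 10000));
    }
}
```

Because the generated prices are always exact multiples of 10000, every document lands in the bucket whose key equals its price, which is why the response has exactly four buckets.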

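Minimum document count

# Minimum document count: with "min_doc_count": 1, only buckets containing at least one document are returned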
GET /car_shop/sales/_search
{
  "aggs" : {
      "product_price" : {
            "histogram" : {
                "field" : "product_price" ,"interval":10000,"min_doc_count": 1
            }
        }
    },
    "size": 0
}

The response is as follows:

  "aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": 10000,
          "doc_count": 278
        },
        {
          "key": 20000,
          "doc_count": 251
        },
        {
          "key": 30000,
          "doc_count": 225
        },
        {
          "key": 40000,
          "doc_count": 246
        }
      ]
    }
  }

Sorting

# Sorting: order the buckets by _key or by _count
GET /car_shop/sales/_search
{
  "aggs" : {
      "product_price" : {
            "histogram" : {
                "field" : "product_price" ,"interval":10000,"order": {"_key": "desc"}
            }
        }
    },
    "size": 0
}

The response is as follows:

  "aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": 40000,
          "doc_count": 246
        },
        {
          "key": 30000,
          "doc_count": 225
        },
        {
          "key": 20000,
          "doc_count": 251
        },
        {
          "key": 10000,
          "doc_count": 278
        }
      ]
    }
  }

Date histogram aggregation

# Date histogram aggregation: bucket by day, month or year
GET /car_shop/sales/_search
{
  "aggs" : {
      "articles_over_time" : {
            "date_histogram" : {
                "field" : "sale_date" ,"interval":"1d","format": "yyyy-MM-dd"
            }
        }
    },
    "size": 0
}

The response is as follows:

  "aggregations": {
    "articles_over_time": {
      "buckets": [
        {
          "key_as_string": "2017-08-21",
          "key": 1503273600000,
          "doc_count": 235
        },
        {
          "key_as_string": "2017-08-22",
          "key": 1503360000000,
          "doc_count": 259
        },
        {
          "key_as_string": "2017-08-23",
          "key": 1503446400000,
          "doc_count": 256
        },
        {
          "key_as_string": "2017-08-24",
          "key": 1503532800000,
          "doc_count": 250
        }
      ]
    }
  }

Range aggregation

# Range aggregation: each range includes its "from" value and excludes its "to" value
GET /car_shop/sales/_search
{
  "aggs" : {
      "product_price" : {
            "range" : {
                "field" : "product_price" ,"ranges":[
                   {"to":10000},
                   {"from":10000,"to" :20000},
                   {"from":40000}
                ]
            }
        }
    },
    "size": 0
}

The response is as follows:

  "aggregations": {
    "product_price": {
      "buckets": [
        {
          "key": "*-10000.0",
          "to": 10000,
          "doc_count": 0
        },
        {
          "key": "10000.0-20000.0",
          "from": 10000,
          "to": 20000,
          "doc_count": 266
        },
        {
          "key": "40000.0-*",
          "from": 40000,
          "doc_count": 251
        }
      ]
    }
  }
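
The empty first bucket follows from the boundary rule: a range includes its "from" value and excludes its "to" value, so a price of exactly 10000 does not fall into the range ending at 10000. A minimal sketch of that rule in plain Java; the class name `RangeSketch` is hypothetical, and the sample array simply lists the four prices the generator can produce.

```java
// Hypothetical helper: applies the from-inclusive, to-exclusive rule of range buckets.
class RangeSketch {

    // True when value falls in [from, to); a null bound means the range is open on that side.
    static boolean inRange(double value, Double from, Double to) {
        boolean atOrAboveFrom = (from == null) || value >= from;
        boolean belowTo = (to == null) || value < to;
        return atOrAboveFrom && belowTo;
    }

    // Counts how many values fall into the given range.
    static int count(double[] values, Double from, Double to) {
        int matches = 0;
        for (double v : values) {
            if (inRange(v, from, to)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        double[] prices = {10000, 20000, 30000, 40000};
        System.out.println(count(prices, null, 10000.0));    // range with only "to": 10000
        System.out.println(count(prices, 10000.0, 20000.0)); // range from 10000 to 20000
        System.out.println(count(prices, 40000.0, null));    // range with only "from": 40000
    }
}
```

With one document per price, the three ranges match 0, 1 and 1 documents. Prices of 20000 and 30000 are not counted anywhere because the query defines no range that covers them.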

Filter aggregation

# Filter aggregation (average sale price of all red cars)
GET /car_shop/sales/_search
{
  "aggs" : {
      "car_colour" : {
          "filter": {"term": {"colour": "red"}},
          "aggs": {"avg_price": {"avg": {"field":"sale_price"}}}
        }
    },
    "size": 0
}

The response is as follows:

  "aggregations": {
    "car_colour": {
      "doc_count": 258,
      "avg_price": {
        "value": 24069.767441860466
      }
    }
  }

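Pipeline aggregations

Avg bucket aggregation

# Avg bucket aggregation (computes the total sales for each day, then the average of those daily totals)
# avg_bucket averages one value across buckets. The buckets_path "sales_per_day>sales" points to the sales metric (the name of the inner aggs) inside the sales_per_day aggregation (the name of the outer aggs); ">" is the aggregation separator, not a comparison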
GET /car_shop/sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "sale_date",
        "interval": "day"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "sale_price"
          }
        }
      }
    },
    "avg_day_sales": {
      "avg_bucket": {
        "buckets_path": "sales_per_day>sales" 
      }
    }
  }
}

The response is as follows:

    "avg_day_sales": {
      "value": 6007500
    }
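
The arithmetic behind this value can be checked by hand. The sketch below, in plain Java, averages the four daily totals that appear in the cumulative sum response later in this article; the class name `AvgBucketSketch` is a hypothetical name chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: averages one metric value per bucket, as avg_bucket does.
class AvgBucketSketch {

    static double avgBucket(Map<String, Long> salesPerDay) {
        return salesPerDay.values().stream()
                .mapToLong(Long::longValue)
                .average()
                .orElse(Double.NaN); // no buckets, so no average
    }

    public static void main(String[] args) {
        // Daily sums of sale_price, keyed by day (figures from the responses in this article).
        Map<String, Long> salesPerDay = new LinkedHashMap<>();
        salesPerDay.put("2017-08-21", 5620000L);
        salesPerDay.put("2017-08-22", 6150000L);
        salesPerDay.put("2017-08-23", 6050000L);
        salesPerDay.put("2017-08-24", 6210000L);
        System.out.println(avgBucket(salesPerDay));
    }
}
```

The four totals add up to 24030000, and dividing by the four buckets gives 6007500, the value of avg_day_sales.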

Sum bucket aggregation

# Sum bucket aggregation (adds up the daily totals across all days)
POST /car_shop/sales/_search
{
    "size": 0,
    "aggs" : {
        "sales_per_day" : {
            "date_histogram" : {
                "field" : "sale_date",
                "interval" : "day"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "sale_price"
                    }
                }
            }
        },
        "sum_days_sales": {
            "sum_bucket": {
                "buckets_path": "sales_per_day>sales" 
            }
        }
    }
}

The response is as follows:

    "sum_days_sales": {
      "value": 24030000
    }

Cumulative sum aggregation

# Cumulative sum aggregation (running total per day: day 1, days 1-2, days 1-3, days 1-4)
POST /car_shop/sales/_search
{
    "size": 0,
    "aggs" : {
        "sales_per_day" : {
            "date_histogram" : {
                "field" : "sale_date",
                "interval" : "day"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "sale_price"
                    }
                },
                "cumulative_sales": {
                    "cumulative_sum": {
                        "buckets_path": "sales"
                    }
                }
            }
        }
    }
}

The response is as follows:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sales_per_day": {
      "buckets": [
        {
          "key_as_string": "2017-08-21T00:00:00.000Z",
          "key": 1503273600000,
          "doc_count": 235,
          "sales": {
            "value": 5620000
          },
          "cumulative_sales": {
            "value": 5620000
          }
        },
        {
          "key_as_string": "2017-08-22T00:00:00.000Z",
          "key": 1503360000000,
          "doc_count": 259,
          "sales": {
            "value": 6150000
          },
          "cumulative_sales": {
            "value": 11770000
          }
        },
        {
          "key_as_string": "2017-08-23T00:00:00.000Z",
          "key": 1503446400000,
          "doc_count": 256,
          "sales": {
            "value": 6050000
          },
          "cumulative_sales": {
            "value": 17820000
          }
        },
        {
          "key_as_string": "2017-08-24T00:00:00.000Z",
          "key": 1503532800000,
          "doc_count": 250,
          "sales": {
            "value": 6210000
          },
          "cumulative_sales": {
            "value": 24030000
          }
        }
      ]
    }
  }
}
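
The cumulative_sales values are a running total over the buckets in order. A minimal sketch of that computation in plain Java, using the four daily totals from the response above; the class name `CumulativeSumSketch` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: running total over ordered bucket values, as cumulative_sum does.
class CumulativeSumSketch {

    static long[] cumulativeSum(long[] dailySales) {
        long[] running = new long[dailySales.length];
        long total = 0;
        for (int i = 0; i < dailySales.length; i++) {
            total += dailySales[i]; // add this day's sales to everything before it
            running[i] = total;
        }
        return running;
    }

    public static void main(String[] args) {
        long[] dailySales = {5620000, 6150000, 6050000, 6210000};
        System.out.println(Arrays.toString(cumulativeSum(dailySales)));
    }
}
```

The result is 5620000, 11770000, 17820000 and 24030000, matching the cumulative_sales value of each bucket. Unlike avg_bucket, cumulative_sum sits inside the date_histogram, so its buckets_path is just "sales" and it adds one value to every bucket.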

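Max and min bucket aggregations

# Max and min bucket aggregations (find the highest and lowest daily sales totals and the dates they fall on)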
POST /car_shop/sales/_search
{
    "size": 0,
    "aggs" : {
        "sales_per_day" : {
            "date_histogram" : {
                "field" : "sale_date",
                "interval" : "day"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "sale_price"
                    }
                }
            }
        },
        "max_days_sales": {
            "max_bucket": {
                "buckets_path": "sales_per_day>sales"
            }
        },
        "min_days_sales": {
            "min_bucket": {
                "buckets_path": "sales_per_day>sales"
            }
        }
    }
}

The response is as follows:

    "max_days_sales": {
      "value": 6210000,
      "keys": [
        "2017-08-24T00:00:00.000Z"
      ]
    },
    "min_days_sales": {
      "value": 5620000,
      "keys": [
        "2017-08-21T00:00:00.000Z"
      ]
    }
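
The "keys" field is an array because several buckets can share the extreme value. A minimal sketch in plain Java of how the keys for the maximum could be collected, using the daily totals from the cumulative sum response; the class name `MaxBucketSketch` is hypothetical, and the same loop with the comparison reversed would give the minimum.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: finds the keys of all buckets holding the maximum value.
class MaxBucketSketch {

    static List<String> maxKeys(Map<String, Long> salesPerDay) {
        long max = Long.MIN_VALUE;
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Long> entry : salesPerDay.entrySet()) {
            long value = entry.getValue();
            if (value > max) {
                max = value;   // new maximum: forget the keys collected so far
                keys.clear();
            }
            if (value == max) {
                keys.add(entry.getKey()); // first or tied bucket for the current maximum
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Long> salesPerDay = new LinkedHashMap<>();
        salesPerDay.put("2017-08-21T00:00:00.000Z", 5620000L);
        salesPerDay.put("2017-08-22T00:00:00.000Z", 6150000L);
        salesPerDay.put("2017-08-23T00:00:00.000Z", 6050000L);
        salesPerDay.put("2017-08-24T00:00:00.000Z", 6210000L);
        System.out.println(maxKeys(salesPerDay));
    }
}
```

For these totals the only key returned is 2017-08-24T00:00:00.000Z, the day with sales of 6210000.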

Stats bucket aggregation

# Stats bucket aggregation (count, min, max, average and sum of the daily sales totals)
POST /car_shop/sales/_search
{
    "size": 0,
    "aggs" : {
        "sales_per_day" : {
            "date_histogram" : {
                "field" : "sale_date",
                "interval" : "day"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "sale_price"
                    }
                }
            }
        },
        "stats_days_sales": {
            "stats_bucket": {
                "buckets_path": "sales_per_day>sales"
            }
        }
    }
}

The response is as follows:

    "stats_days_sales": {
      "count": 4,
      "min": 5620000,
      "max": 6210000,
      "avg": 6007500,
      "sum": 24030000
    }