Elasticsearch Aggregations Explained, with Use Cases

Latest recommended article published on 2024-02-18 15:53:21

古月充电器

Category: elasticsearch. Tags: java, elasticsearch

Article link: https://blog.csdn.net/qq_27017129/article/details/104853438

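What is aggregation analysis?

Aggregations are the feature Elasticsearch provides, alongside search, for running statistical analysis over the data stored in it.

Features:

Rich functionality: several families of analysis are available, including Bucketing, Metric, Matrix and Pipeline.
Near real time: results are computed and returned at query time, whereas Hadoop-style batch processing typically works at T+1, i.e. results are only available the next day.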

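Use cases for aggregations:

Count a merchant's orders for each day of a week
Total the transaction amount for each day of a month
Put simply: data dashboards on the B2B side and radar-style statistics on the B2C side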

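Main aggregation types in Elasticsearch:

Bucketing: groups documents into buckets, similar to GROUP BY in SQL.
Metric: computes metrics such as the maximum, minimum, average and sum.
Matrix: matrix analysis across several fields, e.g. how far each field's samples are spread from the mean, and the mean of each field.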

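Hands-on:

Bucket aggregations:

A bucket aggregation is a strategy for sorting documents into buckets; as noted above, it is similar to GROUP BY. For example:
age < 20 goes into bucket A, 20 <= age < 50 into bucket B, and age >= 50 into bucket C

Common bucket aggregations:

Terms, Range, Date Range, Histogram, Date Histogram

Terms

Terms: the simplest bucketing strategy. Documents are bucketed directly by term; for a text field the buckets are the tokens produced by analysis (which requires fielddata to be enabled on that field), so exact-value fields such as keyword are the usual choice.
Example: count how often each value of a field occurs in the index.

Request: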
GET /my_index1/_search
{
  "size":0,
  "aggs": {
    "group_by_terms": {
      "terms": {
        "field": "terms.keyword"
      }
    }
  }
}

Response:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "ABcdeFGHIjkhhh",
          "doc_count": 2
        },
        {
          "key": "充电器 ",
          "doc_count": 2
        },
        {
          "key": "ABcdeFGHIjk",
          "doc_count": 1
        }
      ]
    }
  }
}
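
To make the counting behind the terms buckets concrete, here is a minimal Python sketch that runs in memory rather than against Elasticsearch. The function name `terms_buckets`, the sample documents and the tie-breaking by key are assumptions of this sketch; the documents were simply chosen so that the counts match the response above.

```python
from collections import Counter


def terms_buckets(docs, field, size=10):
    """Count exact values of `field` and return the `size` most frequent."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Highest doc_count first; ties are ordered by key (sketch assumption).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


# Hypothetical documents chosen to match the counts in the response above.
docs = [
    {"terms": "ABcdeFGHIjkhhh"},
    {"terms": "ABcdeFGHIjkhhh"},
    {"terms": "充电器 "},
    {"terms": "充电器 "},
    {"terms": "ABcdeFGHIjk"},
]

for bucket in terms_buckets(docs, "terms"):
    print(bucket)
```

Each distinct value becomes one bucket, which is why the exact (keyword) form of a field usually gives more useful buckets than its analyzed tokens.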

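Range

Range: buckets are defined by numeric ranges.
Example: each range includes its from value and excludes its to value; note how a missing from or to shows up as * in the bucket keys.

Request: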
GET  /range/_search
{
  "size": 0,
  "aggs": {
    "range_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "to": 25
          },
          {
            "from": 25,
            "to": 35
          },
          {
            "from": 35
          }
        ]
      }
    }
  }
}
Response:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "range_age": {
      "buckets": [
        {
          "key": "*-25.0",
          "to": 25,
          "doc_count": 9
        },
        {
          "key": "25.0-35.0",
          "from": 25,
          "to": 35,
          "doc_count": 0
        },
        {
          "key": "35.0-*",
          "from": 35,
          "doc_count": 0
        }
      ]
    }
  }
}
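
The boundary rule (from is inclusive, to is exclusive) is easy to get wrong, so here is a small in-memory Python sketch of it. `range_buckets` and the list of ages are hypothetical; the ages are only chosen so that all nine fall below 25, as in the response above.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive, `to` is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        # An open end is rendered as "*", mirroring the keys in the response.
        key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        buckets.append({"key": key, "doc_count": count})
    return buckets


ranges = [{"to": 25}, {"from": 25, "to": 35}, {"from": 35}]
ages = [1, 2, 3, 18, 20, 22, 24, 24, 24]  # hypothetical: nine values below 25

for bucket in range_buckets(ages, ranges):
    print(bucket)
```

A value of exactly 25 lands in the 25.0-35.0 bucket, not in *-25.0, so adjacent ranges that share a boundary never double-count a document.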

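Date Range

Date Range: buckets are defined by date ranges.
It works just as the name suggests, so the example is omitted here; try writing one yourself.

Histogram

Histogram: buckets the data at a fixed interval, starting from the bucket that contains the smallest value in the documents.
I was puzzled by extended_bounds at first: judging by the syntax it looks as if only values between min and max should be counted, but it does not filter documents. It only forces buckets to be emitted out to those bounds, even when they are empty, which is why the response below starts at -20 (the lowest bucket that contains data) and runs up to 60. To restrict the documents themselves, add a range query to the request.
Example:

Request: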
GET historgram/_search
{
  "size": 0,
  "aggs": {
    "hist_age": {
      "histogram": {
        "field": "age",
        "interval": 10,
        "extended_bounds":{
          "min":30,
          "max":60
        }
      }
    }
  }
}
Response:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "hist_age": {
      "buckets": [
        {
          "key": -20,
          "doc_count": 1
        },
        {
          "key": -10,
          "doc_count": 1
        },
        {
          "key": 0,
          "doc_count": 3
        },
        {
          "key": 10,
          "doc_count": 0
        },
        {
          "key": 20,
          "doc_count": 0
        },
        {
          "key": 30,
          "doc_count": 0
        },
        {
          "key": 40,
          "doc_count": 0
        },
        {
          "key": 50,
          "doc_count": 0
        },
        {
          "key": 60,
          "doc_count": 0
        }
      ]
    }
  }
}
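
The bucket keys in this response follow from rounding each value down to a multiple of the interval (key = floor(value / interval) * interval, with no offset). The Python sketch below reproduces that in memory. `histogram_buckets` and the five ages are hypothetical: the source does not show the indexed values, so they were picked to give the same counts, and integer intervals are assumed.

```python
import math


def histogram_buckets(values, interval, extended_bounds=None):
    """Fixed-interval buckets; extended_bounds only widens the key range."""
    counts = {}
    for v in values:
        key = math.floor(v / interval) * interval
        counts[key] = counts.get(key, 0) + 1

    edge_keys = list(counts)
    if extended_bounds:
        # The bounds add (possibly empty) buckets; they never drop documents.
        edge_keys.append(math.floor(extended_bounds["min"] / interval) * interval)
        edge_keys.append(math.floor(extended_bounds["max"] / interval) * interval)
    if not edge_keys:
        return []

    keys = range(min(edge_keys), max(edge_keys) + 1, interval)
    return [{"key": k, "doc_count": counts.get(k, 0)} for k in keys]


ages = [-15, -5, 1, 2, 3]  # hypothetical values matching the counts above
for bucket in histogram_buckets(ages, 10, {"min": 30, "max": 60}):
    print(bucket)
```

With these values the sketch yields nine buckets from -20 to 60, the first three holding 1, 1 and 3 documents and the rest empty, which matches the shape of the response.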

Date Histogram

Date Histogram: a histogram (bar chart) over dates; it is one of the most commonly used aggregations in time-series analysis.
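
As an illustration of the "orders per day" use case mentioned earlier, the following Python sketch buckets dates by calendar day in memory. The function `daily_histogram`, the order dates and the simplified bucket shape (an ISO date string as the key) are all assumptions of this sketch, not Elasticsearch output.

```python
from collections import Counter
from datetime import date, timedelta


def daily_histogram(dates):
    """One bucket per calendar day between the first and last date seen."""
    counts = Counter(dates)
    if not counts:
        return []
    day, last = min(counts), max(counts)
    buckets = []
    while day <= last:
        # Days without any document still get a bucket with doc_count 0.
        buckets.append({"key": day.isoformat(), "doc_count": counts.get(day, 0)})
        day += timedelta(days=1)
    return buckets


# Hypothetical order dates for one merchant.
order_dates = [date(2020, 3, 9), date(2020, 3, 9), date(2020, 3, 11)]
for bucket in daily_histogram(order_dates):
    print(bucket)
```

Keeping the empty days in the output is what makes the result directly usable as a chart series, since every point on the time axis has a value.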

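Metric aggregations

Metric aggregations fall into two groups, single-value and multi-value:

Single-value: produce one result
1. min, max, avg, sum
2. cardinality
Multi-value: produce several results
1. stats, extended stats
2. percentiles, percentile ranks
3. top hits

Two of these (min and percentiles) are shown as examples below.
The rest are only described briefly; see the official Elasticsearch documentation for the details.

min, max, avg, sum: return the minimum, maximum, average and sum of a numeric field.
cardinality: the cardinality of a set, i.e. the number of distinct values, similar to COUNT(DISTINCT ...) in SQL; think of it as a deduplicated count.
stats, extended stats
stats: returns a set of statistics for a numeric field: min, max, avg, sum and count.
extended stats: extends stats with further statistics such as variance and standard deviation.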
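
What these metrics compute can be sketched in a few lines of Python. The helper names and sample values are hypothetical, and two simplifications apply: Elasticsearch estimates cardinality approximately, whereas this sketch counts exactly, and `extended_stats` here covers only a subset of the extra fields (population variance and standard deviation).

```python
import math


def stats(values):
    """min, max, avg, sum and count of a non-empty list of numbers."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


def extended_stats(values):
    """stats plus population variance and standard deviation."""
    result = stats(values)
    variance = sum((v - result["avg"]) ** 2 for v in values) / len(values)
    result["variance"] = variance
    result["std_deviation"] = math.sqrt(variance)
    return result


def cardinality(values):
    """Exact number of distinct values."""
    return len(set(values))


ages = [1, 2, 3, 4, 4]  # hypothetical sample
print(stats(ages))
print(extended_stats(ages))
print(cardinality(ages))
```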

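min

Returns the minimum value.

Request:
GET my_index/_search
{
  "size": 0,  // do not return the matching documents
  "aggs": { // aggregation definitions
    "minCount": { // name of this aggregation; it is the key in the response
      "min": {    // aggregation type (min, max, avg or sum)
        "field": "min" // field to aggregate on
      }
    }
  }
}
Response: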
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": { // search hits
    "total": 3, // number of matching documents
    "max_score": 0,
    "hits": []
  },
  "aggregations": { 
    "minCount": { // the aggregation name from the request
      "value": 1 // the computed minimum
    }
  }
}

Percentile

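Percentile: percentile statistics, i.e. the value below which a given percentage of the observed values fall.

Request: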
GET test1001/_search
{
  "size": 0,
  "aggs": {
    "per_age": {
      "percentiles": {
        "field": "age",
        "percents": [
          1,
          5,
          25
        ]
      }
    }
  }
}
Response:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "per_age": {
      "values": {
        "1.0": 1.04,
        "5.0": 1.2,
        "25.0": 2
      }
    }
  }
}
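
To show where fractional results such as 1.04 can come from, here is a Python sketch of percentiles by linear interpolation between sorted values. This is an exact textbook method, not Elasticsearch's implementation, which computes approximate percentiles. The two ages (1 and 5) are an inference: the source does not list the indexed values, but these two reproduce the figures in the response above.

```python
def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between neighbours."""
    ordered = sorted(values)
    rank = (p / 100) * (len(ordered) - 1)  # fractional position in the list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


ages = [1, 5]  # inferred values, not stated in the source
for p in (1, 5, 25):
    print(p, round(percentile(ages, p), 2))
```

With only two documents every percentile is a point on the straight line between them, which is why even the 1st percentile is not simply the minimum.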
