Elasticsearch Aggregation Analysis (Part 3)

酱g

Article link: https://blog.csdn.net/qq_33283716/article/details/81668821

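Using mobile-phone sales data from an electronics store as the running example, we analyze the sales volume and revenue of phones of different brands and colors from many angles. First create an index for phone sales, then add a few sales records. The two requests below are written in Kibana Console syntax; the later examples use curl. Because they use the mapping type sales (and URLs of the form /mobile/sales/_search), they target Elasticsearch 5.x/6.x; later versions removed mapping types.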

PUT /mobile
{
    "mappings": {
        "sales": {
            "properties": {
                "price": {
                    "type": "long"
                },
                "color": {
                    "type": "keyword"
                },
                "brand": {
                    "type": "keyword"
                },
                "sold_date": {
                    "type": "date"
                }
            }
        }
    }
}

POST /mobile/sales/_bulk
{ "index": {}}
{ "price" : 1000, "color" : "red", "brand" : "iphone", "sold_date" : "2017-11-28" }
{ "index": {}}
{ "price" : 2000, "color" : "red", "brand" : "iphone", "sold_date" : "2017-10-05" }
{ "index": {}}
{ "price" : 3000, "color" : "green", "brand" : "samsung", "sold_date" : "2017-05-18" }
{ "index": {}}
{ "price" : 1500, "color" : "black", "brand" : "huawei", "sold_date" : "2017-07-02" }
{ "index": {}}
{ "price" : 1200, "color" : "green", "brand" : "huawei", "sold_date" : "2017-08-19" }
{ "index": {}}
{ "price" : 2000, "color" : "red", "brand" : "iphone", "sold_date" : "2017-11-05" }
{ "index": {}}
{ "price" : 8000, "color" : "red", "brand" : "meizu", "sold_date" : "2018-01-01" }
{ "index": {}}
{ "price" : 2500, "color" : "black", "brand" : "samsung", "sold_date" : "2018-02-12" }

1. Find which phone color sells the most

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
    "size" : 0,
    "aggs" : { 
        "height_sales" : { 
            "terms" : { 
              "field" : "color"
            }
        }
    }
}'

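size: set to 0 so that only the aggregation results are returned, not the raw documents the aggregation ran over
aggs: fixed syntax that introduces a grouping/aggregation operation on the data
height_sales: every aggregation needs a name; the name is arbitrary, so you can call it anything you like
terms: group documents by the values of a field
field: the field whose values are used for grouping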
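To make the terms semantics concrete, here is a minimal pure-Python sketch (not Elasticsearch code) that mimics what the terms aggregation computes over the eight sample documents. The helper name `terms_buckets` is hypothetical, and breaking doc_count ties by key is an assumption of this sketch.

```python
from collections import Counter

# The eight documents indexed by the _bulk request above.
SALES = [
    {"price": 1000, "color": "red", "brand": "iphone", "sold_date": "2017-11-28"},
    {"price": 2000, "color": "red", "brand": "iphone", "sold_date": "2017-10-05"},
    {"price": 3000, "color": "green", "brand": "samsung", "sold_date": "2017-05-18"},
    {"price": 1500, "color": "black", "brand": "huawei", "sold_date": "2017-07-02"},
    {"price": 1200, "color": "green", "brand": "huawei", "sold_date": "2017-08-19"},
    {"price": 2000, "color": "red", "brand": "iphone", "sold_date": "2017-11-05"},
    {"price": 8000, "color": "red", "brand": "meizu", "sold_date": "2018-01-01"},
    {"price": 2500, "color": "black", "brand": "samsung", "sold_date": "2018-02-12"},
]


def terms_buckets(docs, field):
    """Return (key, doc_count) pairs for a terms-style grouping on `field`,
    ordered by doc_count descending; ties are broken by key (an assumption)."""
    counts = Counter(doc[field] for doc in docs)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(terms_buckets(SALES, "color"))  # [('red', 4), ('black', 2), ('green', 2)]
```

The first bucket answers the question of this section: red is the best-selling color in the sample data.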

2. Compute the average price of phones of each color

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
    "size":0,
    "aggs":{
        "height_sales":{
            "terms":{
                "field":"color"
            },
            "aggs":{
                "avg_price":{
                    "avg":{
                        "field":"price"
                    }
                }
            }
        }
    }
}'

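Bucketing by color gives the number of documents in each color bucket. On its own this is only a bucket operation: doc_count is simply a built-in metric that Elasticsearch computes by default for every bucket. To also run a metric aggregation on each bucket, add a second aggs at the same JSON level as the bucket operation (terms). Inside this second aggs, again give the aggregation a name and specify a metric operation, here avg, which averages the given field (price) over the documents in each bucket produced by the first step.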
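A rough pure-Python model of this bucket-plus-metric structure: group by color, then compute doc_count and the average price per bucket. It only illustrates the semantics over the sample data; `avg_by_color` is a hypothetical helper, not an Elasticsearch API.

```python
from collections import defaultdict

# (color, price) pairs taken from the eight sample documents.
SALES = [("red", 1000), ("red", 2000), ("green", 3000), ("black", 1500),
         ("green", 1200), ("red", 2000), ("red", 8000), ("black", 2500)]


def avg_by_color(sales):
    """terms bucket on color, then an avg metric on price inside each bucket."""
    buckets = defaultdict(list)
    for color, price in sales:
        buckets[color].append(price)
    return {
        color: {"doc_count": len(prices), "avg": sum(prices) / len(prices)}
        for color, prices in buckets.items()
    }


result = avg_by_color(SALES)
print(result["red"])  # {'doc_count': 4, 'avg': 3250.0}
```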

3. Multi-level drill-down analysis by color and brand

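Drill down from color to brand: compute the average price of each color, and also the average price of each brand within each color. Drilling down means that after grouping once (for example by color), you group the data inside each group again (for example, split one color into groups for the different brands), and finally run the aggregation on each of the finest-grained groups.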

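In Elasticsearch, drill-down analysis means nesting buckets: group several times along several dimensions (color + brand). At each drill-down level (color, then color + brand) you can also run a separate metric aggregation, as color_avg_price and brand_avg_price do in the request below.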
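The nested request can be mimicked in plain Python to show what each level computes. This sketch assumes the same eight sample documents; `drill_down` and `average` are hypothetical names, and the output dictionary only loosely follows the shape of the real response.

```python
# (color, brand, price) triples taken from the eight sample documents.
SALES = [("red", "iphone", 1000), ("red", "iphone", 2000), ("green", "samsung", 3000),
         ("black", "huawei", 1500), ("green", "huawei", 1200), ("red", "iphone", 2000),
         ("red", "meizu", 8000), ("black", "samsung", 2500)]


def average(values):
    return sum(values) / len(values)


def drill_down(sales):
    """color buckets (with color_avg_price) -> brand buckets (with brand_avg_price)."""
    result = {}
    for color in sorted({c for c, _, _ in sales}):
        in_color = [(b, p) for c, b, p in sales if c == color]
        brands = {}
        for brand in sorted({b for b, _ in in_color}):
            prices = [p for b, p in in_color if b == brand]
            brands[brand] = {"doc_count": len(prices), "brand_avg_price": average(prices)}
        result[color] = {
            "doc_count": len(in_color),
            "color_avg_price": average([p for _, p in in_color]),
            "group_by_brand": brands,
        }
    return result


r = drill_down(SALES)
print(r["red"]["group_by_brand"]["meizu"])  # {'doc_count': 1, 'brand_avg_price': 8000.0}
```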

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "color_avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "group_by_brand": {
          "terms": {
            "field": "brand"
          },
          "aggs": {
            "brand_avg_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}'

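Common metrics besides doc_count and avg:

count: a terms bucket automatically carries a doc_count, which is equivalent to a count
avg: the avg aggregation computes the average
max: the maximum value of the specified field within a bucket
min: the minimum value of the specified field within a bucket
sum: the sum of the specified field's values within a bucket

Generally speaking, about 90% of common data-analysis tasks need no metrics beyond count, avg, max, min, and sum.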
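A minimal sketch of what the four metrics in the request below return for each color bucket, simulated in plain Python over the sample data rather than queried from Elasticsearch; `stats_by_color` is a hypothetical helper whose keys mirror the aggregation names in the request.

```python
from collections import defaultdict

# (color, price) pairs taken from the eight sample documents.
SALES = [("red", 1000), ("red", 2000), ("green", 3000), ("black", 1500),
         ("green", 1200), ("red", 2000), ("red", 8000), ("black", 2500)]


def stats_by_color(sales):
    """Per color bucket: doc_count plus the avg, min, max and sum metrics on price."""
    buckets = defaultdict(list)
    for color, price in sales:
        buckets[color].append(price)
    return {
        color: {
            "doc_count": len(p),
            "avg_price": sum(p) / len(p),
            "min_price": min(p),
            "max_price": max(p),
            "sum_price": sum(p),
        }
        for color, p in buckets.items()
    }


stats = stats_by_color(SALES)
print(stats["red"])
# {'doc_count': 4, 'avg_price': 3250.0, 'min_price': 1000, 'max_price': 8000, 'sum_price': 13000}
```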

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
    "size":0,
    "aggs":{
        "height_sales":{
            "terms":{
                "field":"color"
            },
            "aggs":{
                "avg_price":{
                    "avg":{
                        "field":"price"
                    }
                },
                "min_price":{
                    "min":{
                        "field":"price"
                    }
                },
                "max_price":{
                    "max":{
                        "field":"price"
                    }
                },
                "sum_price":{
                    "sum":{
                        "field":"price"
                    }
                }
            }
        }
    }
}'

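4. histogram: like terms, this is a bucket (grouping) operation. It takes a field and puts each document into a bucket according to the value range of that field the document falls into.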

"histogram":{
"field": "price",
"interval": 2000
},

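interval: 2000 splits the values into the ranges 0~2000, 2000~4000, 4000~6000, 6000~8000, 8000~10000, and so on; each range is one bucket, with the lower bound inclusive and the upper bound exclusive.

For each document, Elasticsearch checks which range its price falls into: a price of 2500, for example, falls into 2000~4000, so that document goes into the 2000~4000 bucket.

Compare this with how terms builds buckets: it puts documents with the same field value into one bucket.

Once the buckets exist, you can run avg, count, sum, max, min, and other metric aggregations on each of them, just as with terms.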
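The bucketing rule above amounts to key = floor(value / interval) * interval. The sketch below applies it to the sample prices and also computes each bucket's doc_count and a revenue sum, as the example request that follows does. It is a simplified model, not Elasticsearch code: `histogram` is a hypothetical helper, and filling in empty buckets between the smallest and largest key is simply how this sketch treats min_doc_count 0.

```python
import math

# Prices of the eight sample documents.
PRICES = [1000, 2000, 3000, 1500, 1200, 2000, 8000, 2500]


def histogram(values, interval, min_doc_count=0):
    """Bucket each value under key = floor(value / interval) * interval, so a
    bucket covers [key, key + interval). With min_doc_count=0, empty buckets
    between the smallest and the largest key are filled in."""
    buckets = {}
    for v in values:
        key = math.floor(v / interval) * interval
        bucket = buckets.setdefault(key, {"doc_count": 0, "revenue": 0})
        bucket["doc_count"] += 1
        bucket["revenue"] += v  # the "revenue" sum metric on price
    if min_doc_count == 0 and buckets:
        for key in range(min(buckets), max(buckets) + interval, interval):
            buckets.setdefault(key, {"doc_count": 0, "revenue": 0})
    return {k: b for k, b in sorted(buckets.items()) if b["doc_count"] >= min_doc_count}


for key, bucket in histogram(PRICES, 2000).items():
    print(f"{key}~{key + 2000}: {bucket}")
```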

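Example: revenue and number of phones sold per price range (the number sold is each bucket's doc_count; the revenue is a sum of price)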

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
   "size" : 0,
   "aggs":{
      "price":{
         "histogram":{ 
            "field": "price",
            "interval": 2000
         },
         "aggs":{
            "revenue": {
               "sum": { 
                 "field" : "price"
               }
             }
         }
      }
   }
}'

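histogram is a bucket operation that splits a field's values into consecutive buckets of a fixed interval.

date_histogram does the same for a date field: given a date-typed field and a date interval, it splits the documents into buckets of that interval.

For example, with a date interval of one month (written 1M or month; note that 1m would mean one minute):

2017-01-01 ~ 2017-01-31 is one bucket
2017-02-01 ~ 2017-02-28 is the next bucket

Elasticsearch then scans the date field of every document, determines which bucket the date falls into, and puts the document into that bucket.

A document dated 2017-01-05, for example, goes into the 2017-01-01 ~ 2017-01-31 bucket.

min_doc_count: setting it to 0 makes an interval such as 2017-01-01 ~ 2017-01-31 appear in the response even if it contains no documents; otherwise such empty intervals can be filtered out.
extended_bounds (min, max): forces buckets to be created over the whole range from min to max, even where there is no data. It widens the bucket range but does not filter documents: data outside the bounds still gets its own buckets (the 2018 sales, for instance). Because it only adds empty buckets, it takes effect only together with min_doc_count: 0.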

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
   "size" : 0,
   "aggs": {
      "sales": {
         "date_histogram": {
            "field": "sold_date",
            "interval": "month", 
            "format": "yyyy-MM-dd",
            "min_doc_count" : 0, 
            "extended_bounds" : { 
                "min" : "2016-01-01",
                "max" : "2017-12-31"
            }
         }
      }
   }
}'
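A small pure-Python model of the monthly date_histogram request above (min_doc_count 0 plus extended_bounds), run over the sample sold_date values. It is a sketch under simplifying assumptions: dates are treated as plain calendar days in one time zone, and `monthly_histogram` is a hypothetical helper. Note how the 2018 sales still get buckets even though extended_bounds.max is 2017-12-31.

```python
from datetime import date

# sold_date values of the eight sample documents.
SOLD_DATES = ["2017-11-28", "2017-10-05", "2017-05-18", "2017-07-02",
              "2017-08-19", "2017-11-05", "2018-01-01", "2018-02-12"]


def next_month(ym):
    year, month = ym
    return (year + 1, 1) if month == 12 else (year, month + 1)


def monthly_histogram(dates, ext_min, ext_max):
    """Monthly buckets with min_doc_count=0 and extended_bounds: the bucket range
    spans both the data and the bounds, and empty months are kept."""
    counts = {}
    for s in dates:
        d = date.fromisoformat(s)
        counts[(d.year, d.month)] = counts.get((d.year, d.month), 0) + 1
    bounds = [date.fromisoformat(ext_min), date.fromisoformat(ext_max)]
    keys = list(counts) + [(d.year, d.month) for d in bounds]
    key, end = min(keys), max(keys)
    buckets = []
    while key <= end:
        buckets.append((f"{key[0]:04d}-{key[1]:02d}-01", counts.get(key, 0)))
        key = next_month(key)
    return buckets


buckets = monthly_histogram(SOLD_DATES, "2016-01-01", "2017-12-31")
print(len(buckets))  # 26
```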

Example: revenue of each brand in each quarter, plus the total revenue of each quarter

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_sold_date": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "quarter",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2016-01-01",
          "max": "2017-12-31"
        }
      },
      "aggs": {
        "group_by_brand": {
          "terms": {
            "field": "brand"
          },
          "aggs": {
            "sum_price": {
              "sum": {
                "field": "price"
              }
            }
          }
        },
        "total_sum_price": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
} '
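The quarterly request nests a terms bucket with a sum inside each date bucket, next to a total sum. The following pure-Python sketch mimics that nesting over the sample data; `quarter_key` and `quarterly_brand_revenue` are hypothetical helpers, and unlike the request it returns only non-empty quarters (no min_doc_count or extended_bounds handling).

```python
from collections import defaultdict
from datetime import date

# (price, brand, sold_date) taken from the eight sample documents.
SALES = [(1000, "iphone", "2017-11-28"), (2000, "iphone", "2017-10-05"),
         (3000, "samsung", "2017-05-18"), (1500, "huawei", "2017-07-02"),
         (1200, "huawei", "2017-08-19"), (2000, "iphone", "2017-11-05"),
         (8000, "meizu", "2018-01-01"), (2500, "samsung", "2018-02-12")]


def quarter_key(d):
    """First day of the calendar quarter containing d, e.g. 2017-11-28 -> 2017-10-01."""
    first_month = (d.month - 1) // 3 * 3 + 1
    return f"{d.year:04d}-{first_month:02d}-01"


def quarterly_brand_revenue(sales):
    quarters = defaultdict(lambda: {"total_sum_price": 0, "group_by_brand": defaultdict(int)})
    for price, brand, sold in sales:
        q = quarters[quarter_key(date.fromisoformat(sold))]
        q["total_sum_price"] += price          # total_sum_price: sum over the quarter
        q["group_by_brand"][brand] += price    # sum_price inside each brand bucket
    return quarters


r = quarterly_brand_revenue(SALES)
print(dict(r["2018-01-01"]["group_by_brand"]))  # {'meizu': 8000, 'samsung': 2500}
```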

5. Count sales of each color for a given brand

Every aggregation runs over the documents matched by the search: the search results are the scope of the aggregation.

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": {
    "term": {
      "brand": {
        "value": "iphone"
      }
    }
  },
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      }
    }
  }
}'
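The query-then-aggregate order can be sketched in a few lines of plain Python: the term query first narrows the documents, and the terms aggregation only sees what is left. This is an illustration over the sample data with a hypothetical helper `terms_in_scope`.

```python
from collections import Counter

# (brand, color) pairs taken from the eight sample documents.
SALES = [("iphone", "red"), ("iphone", "red"), ("samsung", "green"), ("huawei", "black"),
         ("huawei", "green"), ("iphone", "red"), ("meizu", "red"), ("samsung", "black")]


def terms_in_scope(sales, brand):
    scope = [color for b, color in sales if b == brand]  # the term query on brand
    return Counter(scope)                                 # the terms aggregation on color


print(terms_in_scope(SALES, "iphone"))  # Counter({'red': 3})
```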

6. Compare the average price of a single brand with that of all brands

This time a single request, with a query, must return two results: one aggregated over the documents matched by the query, and one aggregated over all the data.

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
  "size": 0, 
  "query": {
    "term": {
      "brand": {
        "value": "iphone"
      }
    }
  },
  "aggs": {
    "single_brand_avg_price": {
      "avg": {
        "field": "price"
      }
    },
    "all": {
      "global": {},
      "aggs": {
        "all_brand_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}'

global: the global bucket puts all documents into the aggregation's scope, regardless of the query.
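In plain Python terms, the two results differ only in which documents they average. A minimal sketch over the sample data (`single_vs_global` is a hypothetical helper; the keys mirror the aggregation names in the request):

```python
# (brand, price) pairs taken from the eight sample documents.
SALES = [("iphone", 1000), ("iphone", 2000), ("samsung", 3000), ("huawei", 1500),
         ("huawei", 1200), ("iphone", 2000), ("meizu", 8000), ("samsung", 2500)]


def single_vs_global(sales, brand):
    in_query = [p for b, p in sales if b == brand]  # scope limited by the term query
    everything = [p for _, p in sales]              # global bucket ignores the query
    return {
        "single_brand_avg_price": sum(in_query) / len(in_query),
        "all": {"all_brand_avg_price": sum(everything) / len(everything)},
    }


print(single_vs_global(SALES, "iphone"))
```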

7. Average price of phones priced at 1200 or more

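Search + aggregation, here filter + aggregation: the range filter (wrapped in constant_score, so no relevance scoring is done) limits the scope, and avg runs over the remaining documents.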

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 1200
          }
        }
      }
    }
  },
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}'
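A short sketch of this filter-then-aggregate request in plain Python over the sample prices. It also shows why gte matters: the 1200 phone is kept. `avg_price_gte` is a hypothetical helper.

```python
# Prices of the eight sample documents.
PRICES = [1000, 2000, 3000, 1500, 1200, 2000, 8000, 2500]


def avg_price_gte(prices, threshold):
    """range filter (price >= threshold) followed by an avg on price."""
    kept = [p for p in prices if p >= threshold]
    return sum(kept) / len(kept) if kept else None


print(round(avg_price_gte(PRICES, 1200), 2))  # 2885.71
```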

8. Average iPhone price over several recent time windows

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": {
    "term": {
      "brand": {
        "value": "iphone"
      }
    }
  },
  "aggs": {
    "recent_150d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-150d"
          }
        }
      },
      "aggs": {
        "recent_150d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "recent_140d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-140d"
          }
        }
      },
      "aggs": {
        "recent_140d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "recent_130d": {
      "filter": {
        "range": {
          "sold_date": {
            "gte": "now-130d"
          }
        }
      },
      "aggs": {
        "recent_130d_avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}'

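A filter inside aggs (a filter aggregation) applies only to the aggregation it wraps, whereas a filter in the query is global and affects all the data being aggregated.

So if you want, say, the iPhone's average price over the last month, the last 3 months, and the last 6 months in one request, a query filter cannot do it: you need a separate bucket filter for each window.

bucket filter: applies a filter to the aggs under each bucket, so each bucket can use a different filter. Note that now is evaluated when the query runs, so with the 2017-2018 sample data these recent windows only match documents if the request is run close to those dates.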
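The filter buckets can be modeled in Python as one cutoff per window, each followed by its own average. The query time `NOW` is a hypothetical fixed date chosen so that the 2017 iPhone sales fall inside the windows, and dates are compared at day granularity, a simplification of Elasticsearch's date math. `recent_avg` is a hypothetical helper.

```python
from datetime import date, timedelta

# (price, sold_date) of the iPhone documents, i.e. the result of the term query.
IPHONE_SALES = [(1000, "2017-11-28"), (2000, "2017-10-05"), (2000, "2017-11-05")]

NOW = date(2018, 3, 1)  # hypothetical query time standing in for "now"


def recent_avg(sales, now, days):
    """Mimics a filter bucket (range sold_date gte now-<days>d) followed by an avg."""
    cutoff = now - timedelta(days=days)
    prices = [p for p, sold in sales if date.fromisoformat(sold) >= cutoff]
    return sum(prices) / len(prices) if prices else None  # an empty avg has no value


results = {f"recent_{d}d_avg_price": recent_avg(IPHONE_SALES, NOW, d) for d in (150, 140, 130)}
print(results)
```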

9. Average price of phones per color, with the color buckets sorted by average price in ascending order

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color",
        "order": {
          "avg_price": "asc"
        }
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}'

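The order clause refers to a sub-aggregation by its name, much like referencing a variable; here it refers to avg_price, the average price computed in aggs for each color bucket.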
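Ordering by a sub-aggregation can be sketched in Python as computing the metric for every bucket and then sorting the buckets by it. This models the request above over the sample data; `colors_by_avg_price` is a hypothetical helper.

```python
from collections import defaultdict

# (color, price) pairs taken from the eight sample documents.
SALES = [("red", 1000), ("red", 2000), ("green", 3000), ("black", 1500),
         ("green", 1200), ("red", 2000), ("red", 8000), ("black", 2500)]


def colors_by_avg_price(sales, ascending=True):
    """terms on color, avg_price per bucket, buckets ordered by avg_price."""
    groups = defaultdict(list)
    for color, price in sales:
        groups[color].append(price)
    buckets = [(color, sum(p) / len(p)) for color, p in groups.items()]
    return sorted(buckets, key=lambda b: b[1], reverse=not ascending)


print([c for c, _ in colors_by_avg_price(SALES)])  # ['black', 'green', 'red']
```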

10. Sort by the deepest-level metric in a color + brand drill-down

curl -XGET http://localhost:9200/mobile/sales/_search -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "group_by_brand": {
          "terms": {
            "field": "brand",
            "order": {
              "avg_price": "desc"
            }
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}'
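The same idea one level deeper: within each color bucket, the brand buckets are sorted by their own avg_price in descending order. A minimal Python sketch over the sample data; `brands_by_avg_desc` is a hypothetical helper, and the outer color order here is simply alphabetical rather than Elasticsearch's default doc_count order.

```python
from collections import defaultdict

# (color, brand, price) triples taken from the eight sample documents.
SALES = [("red", "iphone", 1000), ("red", "iphone", 2000), ("green", "samsung", 3000),
         ("black", "huawei", 1500), ("green", "huawei", 1200), ("red", "iphone", 2000),
         ("red", "meizu", 8000), ("black", "samsung", 2500)]


def brands_by_avg_desc(sales):
    """color buckets -> brand buckets ordered by brand avg_price, descending."""
    result = {}
    for color in sorted({c for c, _, _ in sales}):
        groups = defaultdict(list)
        for c, brand, price in sales:
            if c == color:
                groups[brand].append(price)
        brand_avgs = [(brand, sum(p) / len(p)) for brand, p in groups.items()]
        result[color] = sorted(brand_avgs, key=lambda b: b[1], reverse=True)
    return result


r = brands_by_avg_desc(SALES)
print([b for b, _ in r["red"]])  # ['meizu', 'iphone']
```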