数据准备
- 准备一张表
t_order
,字段有:
province:省
city:市
area:区
channel:渠道,1为小程序,2为app
price:价格
该表表示某个区域的某个渠道的订单交易额,这里忽略时间维度。
- 创建表
create table demo.t_order (
province string,
city string,
area string,
channel int,
price int
)
row format delimited fields terminated by ',';
t_order.txt
01,0101,010101,1,60
01,0101,010101,2,55
01,0101,010102,1,70
01,0101,010102,2,85
01,0101,010103,1,66
01,0101,010103,2,73
01,0102,010201,1,64
01,0102,010201,2,41
01,0102,010202,1,33
01,0102,010202,2,28
01,0102,010203,1,75
01,0102,010203,2,61
01,0103,010301,1,34
01,0103,010301,2,96
01,0103,010302,1,88
01,0103,010302,2,71
01,0103,010303,1,70
01,0103,010303,2,64
- hive加载数据
load data local inpath '/export/data/t_order.txt' overwrite into table demo.t_order;
Grouping set
- 根据(省、市、区),(省、市),(省)进行分组
select
province,
city,
area
from demo.t_order
group by
grouping sets (
province,
(province, city),
(province, city, area)
)
返回的数据为
即有多个组合,根据不同城市进行组合
Grouping
往上面的sql语句中添加Grouping操作,增加一个grouping_type
字段,表示是根据某个维度进行分组
select
province,
city,
area,
case when grouping (province, city, area) = 0
then 'area'
when grouping (province, city) = 0
then 'city'
when grouping (province) = 0
then 'province'
end as group_type
from demo.t_order
group by
grouping sets (
province,
(province, city),
(province, city, area)
)
得到的结果为:
以 grouping (province, city, area) = 0
为例,当分组的数据中包含了同个区域的省、市、区时,grouping 返回的值为二进制的000
,即数值为0,以此类推
- 根据对应的分组,返回不同渠道的订单总额
select
province,
city,
area,
case when grouping (province, city, area) = 0
then 'area'
when grouping (province, city) = 0
then 'city'
when grouping (province) = 0
then 'province'
end as group_type,
case when grouping (province, city, area) = 0
then sum(case when channel = 1 then price else 0 end)
when grouping (province, city) = 0
then sum(case when channel = 1 then price else 0 end)
when grouping (province) = 0
then sum(case when channel = 1 then price else 0 end)
end as mp_price,
case when grouping (province, city, area) = 0
then sum(case when channel = 2 then price else 0 end)
when grouping (province, city) = 0
then sum(case when channel = 2 then price else 0 end)
when grouping (province) = 0
then sum(case when channel = 2 then price else 0 end)
end as app_price
from demo.t_order
group by
grouping sets (
province,
(province, city),
(province, city, area)
)
返回结果为: