hive多维分析

Hive常见的多维度聚合函数有:

  • With Cube:任意维度聚合
  • Grouping Sets:指定维度聚合
  • With Rollup:GROUP BY最左侧的维度为主层级聚合

测试sql:

use db01;
create table if not exists db01.t12(id int,name string,age string,dt string)
row format delimited fields terminated by ',' stored as textfile;
insert into db01.t12 values(1,'aa','12','2021-01-01'),(1,'aa','12','2021-01-02'),(2,'aa','14','2021-02-01'),(2,'bb','14','2021-02-02'),(3,'cc','16','2020-01-01');

1.分組(n个维度1种)

select id,name,count(dt)
from db01.t11
group by id,name

+-----+-------+------+
| id  | name  | _c2  |
+-----+-------+------+
| 1   | aa    | 2    |
| 2   | aa    | 1    |
| 3   | cc    | 1    |
| 2   | bb    | 1    |
+-----+-------+------+

2. with cube(n个维度2^n种)

select id,name,count(dt)
from db01.t11
group by id,name
with cube
+-------+-------+------+
|  id   | name  | _c2  |
+-------+-------+------+
| NULL  | aa    | 3    |
| NULL  | bb    | 1    |
| 2     | NULL  | 2    |
| NULL  | NULL  | 5    |
| NULL  | cc    | 1    |
| 1     | NULL  | 2    |
| 3     | NULL  | 1    |
+-------+-------+------+
| 1     | aa    | 2    |
| 2     | aa    | 1    |
| 3     | cc    | 1    |
| 2     | bb    | 1    |

3.with rollup(n个维度n+1种)

select id,name,count(dt)
from db01.t11
group by id,name
with rollup
+-------+-------+------+
|  id   | name  | _c2  |
+-------+-------+------+
| 2     | NULL  | 2    |
| NULL  | NULL  | 5    |
| 1     | NULL  | 2    |
| 3     | NULL  | 1    |
+-------+-------+------+
| 1     | aa    | 2    |
| 2     | aa    | 1    |
| 3     | cc    | 1    |
| 2     | bb    | 1    |

4.grouping sets

select id,name,count(dt),GROUPING__ID --两个下划线
from db01.t11
group by id,name
grouping sets((id,name),(),(id),(name))

+-------+-------+------+---------------+
|  id   | name  | _c2  | grouping__id  |
+-------+-------+------+---------------+
| NULL  | aa    | 3    | 2             |
| NULL  | bb    | 1    | 2             |
| 2     | NULL  | 2    | 1             |
| 2     | bb    | 1    | 0             |
| NULL  | NULL  | 5    | 3             |
| NULL  | cc    | 1    | 2             |
| 1     | NULL  | 2    | 1             |
| 1     | aa    | 2    | 0             |
| 2     | aa    | 1    | 0             |
| 3     | NULL  | 1    | 1             |
| 3     | cc    | 1    | 0             |
+-------+-------+------+---------------+

grouping_id 算法

grouping_id的值是根据groupBy的列是否使用和列的顺序来决定。靠近groupBy的列为高位,远离groupBy的列为低位;列被使用则为’0’,列没有被使用则为’1’。按照此规则,对每种粒度的组合生成一组二进制数,然后将二进制数转成十进制数。(dim1, dim2), (dim1), (dim2), ( ) 对应的二进制数和十进制数见表格

代码实现 grouping_id

concat_ws  
    (
        ':'
      , case when dim1 is not null then 'dim1' else null end
      , case when dim2 is not null then 'dim2' else null end
    ) as group_id
    
聚合粒度 -> group_id
(dim1, dim2) -> dim1:dim2
(dim1) -> dim1
(dim2) -> dim2
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值