目录
0 需求背景
实际生产中,各种指标的报表统计,往往都会涉及到多维分析,比如,统计日活数,日会话次数,日回头访客数,日新,日用户平均访问时长,访问深度……都需要从不同维度,各种角度去分析,如下图1所示的统计需求,统计其访问深度需要做层次查询,或做多个维度的统计分析,如图2所示,面对此类需求我们应该如何做呢?一种我们可以采用UNION ALL的方式实现,但这种方式比较麻烦工作量大,代码繁琐也不便于维护,另一种就是采用Hive中为我们提供的高阶函数进行求解。
1 高阶函数使用
我们以如下案例来引出我们Hive中高阶函数的使用方法
表结构如下:
create table t
(
province_name string comment '省份名称'
,city_name string comment '城市名称'
,area_name string comment '地区名称'
,people_cnt int comment '人口数量'
,amt decimal(16,2) comment 'GDP'
)
row format delimited fields terminated by ','
create table t as
select '广东' as province_name,'广州' as city_name ,'白云区' as area_name,100 as people_cnt,30 as amt
union all
select '广东' as province_name,'广州' as city_name ,'番禺区' as area_name,120 as people_cnt,45 as amt
union all
select '广东' as province_name,'深圳' as city_name ,'福田区' as area_name,200 as people_cnt,67 as amt
union all
select '广东' as province_name,'深圳' as city_name ,'南山区' as area_name,290 as people_cnt,167 as amt
union all
select '浙江' as province_name,'杭州' as city_name ,'萧山区' as area_name,80 as people_cnt,20 as amt
union all
select '浙江' as province_name,'杭州' as city_name ,'滨江区' as area_name,120 as people_cnt,50 as amt
union all
select '浙江' as province_name,'宁波' as city_name ,'江东区' as area_name,80 as people_cnt,20 as amt
union all
select '浙江' as province_name,'宁波' as city_name ,'江北区' as area_name,45 as people_cnt,10 as amt
表数据如下: