HQL练习_03:用户观看节目

用户观看节目hql练习
感觉很好的题目,记录下,
数据如下
date user age programid Playtime
20190421 u1 30 a 4min
20190421 u1 30 b 10min
20190421 u2 27 a 2min
20190422 u3 35 c 3min
20190422 u2 27 d 1min

问题:
1.统计:用户总量,用户平均年龄,用户平均观看时长
2.统计:每10岁一个分段,统计每个区间的用户总量,用户平均观看时长
3.统计:每个用户最喜欢的节目
4.统计:观看时长大于5min的用户总量,只要有一个节目用户观看时间小于5min就不能算

-- 建表
create table program(
	date bigint,
	user string,
	age int,
	programid string,
	playtime int
)
row format delimited
fields terminated by '\t'
stored as textfile
;

题目:
1.统计:用户总量,用户平均年龄,用户平均观看时长
2.统计:每10岁一个分段,统计每个区间的用户总量,用户平均观看时长
3.统计:每个用户最喜欢的节目
4.统计:观看时长大于5min的用户总量,只要有一个节目用户观看时间小于5min就不能算

解答:

第一问:统计:用户总量,用户平均年龄,用户平均观看时长

select count(*),avg(age),avg(spt)
from 
(select uname,age,sum(playtime) as spt
from program group by uname,age) t;

第二问:每10岁一个分段,统计每个区间的用户总量,用户平均观看时长

一开始做错了:

select
sum(if(age<=10,1,0)) as  `0-10岁用户总数`,
coalesce(Round(sum(if(age<=10,playtime,0))/sum(if(age<=10,1,0)),2),0) as  `0-10岁平均时长`,
sum(if(age>10 and age<=20,1,0)) as  `10-20岁用户总数`,
coalesce(Round(sum(if(age>10 and age<=20,playtime,0))/sum(if(age>10 and age<=20,1,0)),2),0) as  `10-20岁平均时长`,
sum(if(age>20 and age<=30,1,0)) as  `20-30岁用户总数`,
coalesce(Round(sum(if(age>20 and age<=30,playtime,0))/sum(if(age>20 and age<=30,1,0)),2),0) as  `20-30岁平均时长`,
sum(if(age>30 and age<=40,1,0)) as  `30-40岁用户总数`,
coalesce(Round(sum(if(age>30 and age<=40,playtime,0))/sum(if(age>30 and age<=40,1,0)),2),0) as  `30-40岁平均时长`,
sum(if(age>40 and age<=50,1,0)) as  `40-50岁用户总数`,
coalesce(Round(sum(if(age>40 and age<=50,playtime,0))/sum(if(age>40 and age<=50,1,0)),2),0) as  `40-50岁平均时长`
from program;

错了!!因为,因为计算人数的时候根本没对用户进行去重
----------------------
一个job
OK
0-10岁用户总数  0-10岁平均时长  10-20岁用户总数 10-20岁平均时长 20-30岁用户总数 20-30岁平均时长 30-40岁用户总数 30-40岁平均时长40-50岁用户总数 40-50岁平均时长
0       0.0     0       0.0     4       4.25    1       3.0     0       0.0
Time taken: 51.096 seconds, Fetched: 1 row(s)

----------------------------

正确答案:

解法一:

select flag,
count(uname),
Round(avg(spt),2)
from 
(
select 
uname,
(case when age<10 then 0
when age>=10 and age<20 then 1
when age>=20 and age<30 then 2
when age>=30 and age<40 then 3
when age>=40 and age<50 then 4
end
) as flag,
sum(playtime) spt
from program
group by uname,age
)t 
group by flag
;
----------------
2个job
OK
flag    _c1     _c2
2       1       3.0
3       2       8.5
Time taken: 83.925 seconds, Fetched: 2 row(s)
------------------------------------------------------
解法二:
大佬使用floor函数,更加简洁,学习了

select 
  flag,count(uname),round(avg(spt),2)
from 
(select
uname,
floor(age/10) flag,
sum(playtime) as spt
from program
group by uname,age
)t
group by flag
;
---------------------------
OK
flag    _c1     _c2
2       1       3.0
3       2       8.5
Time taken: 88.804 seconds, Fetched: 2 row(s)

第三问:统计:每个用户最喜欢的节目

我的麻烦的写法:

select 
uname,programid
from(
select
uname,programid,
row_number() over(partition by uname order by spt desc) as rn
from
(select  uname,programid,sum(playtime) spt from program group by uname,programid) t1
)t2
where t2.rn=1;

2JOB
OK
uname   programid
u1      b
u2      a
u3      c
Time taken: 85.238 seconds, Fetched: 3 row(s)
----------------------------

人家写的:
少了一个嵌套,order by后面只写sum函数,之前不知道可以这样写

select 
uname,programid
from(
select
uname,programid,
sum(playtime),
row_number() over(partition by uname order by sum(playtime) desc) as rn
from program
 group by uname,programid
)t2
where t2.rn=1;

ps:有个小插曲,
如果在order by 后面直接写sum(playtime),那么要在select后面,写sum(play time)
否则报错:

//在orderby后面加sum函数报错,因为select后面没有加上sum(playtime)
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 4:0 Expression not in GROUP BY key 'playtime'



第四问:观看时长大于5min的用户总量,只要有一个节目用户观看时间小于5min就不能算

select 
count(uname)
from(
select 
uname
from program
group by uname
having max(playtime)>=5) t;


原来的错误思路:
select 
count(uname)
from program
group by uname
having max(playtime)>=5;
这样的得到的是每个分组的各自的数量,不是有多少个分组的数量
  • 1
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值