Hive实践(二)

任务 1:全国各省(省、自治区、直辖市、特别行政区等省级行政
区)微博注册用户的比例。
说明:“比例”均使用百分数,精确到小数点后 2 位,如:23.45%

select province, count(*) from `user` group by province; 

select count(*) from `user`; 

说明:下边语句中的 round(d, n)是将 double 类型的 d 数字,保留
n 位小数位

select p.province, round(p.puser / u.totaluser, 4) from 
(select count(*) as puser from `user` group by province) p, 
(select count(*) as totaluser from `user`) u; 

最终语句:

select t2.provincename, concat(round(t1.rate * 100, 2),'%')
from (select p.province, p.puser / u.totaluser as rate 
from (select province, count(*) as puser from `user` group by province) p, 
(select count(*) as totaluser from `user`) u order by rate desc) t1 
join provinces t2 on  t1.province=t2.province; 

任务 2:喜欢在夜晚(20:00-24:00)发微博的用户比例。
说明:如果一个用户在夜晚发微博的数目大于在其他时间的发微博
数目,则定义他/她喜欢在夜晚发微博。

总微薄用户人数:

select count(distinct uid) from weekly; 
 
select count(*) from weekly group by uid, 
(hour(created_at) >=20 or hour(created_at) <= 24) as h; 
 
select count(*) from weekly where hour(created_at) >=20 or 
hour(created_at) <= 24; 
 
select a.cnt / b.cnt from (select count(*) as cnt from 
weekly where hour(created_at) >=20 or hour(created_at) <= 
24) a, (select count(*) as cnt from weekly) b; 

任务 3:大 V 用户的男女性别比例。
说明:V 是“Verified”的简称,即“userdata.csv”文件中的
verified 字段

select gender, count(*) from `user` where verified=true 
group by verified, gender; 
 
select a.cnt from (select gender, count(*) as cnt from 
`user` where verified=true group by verified, gender) a 
where a.gender='f' ;
 
select a.cnt from (select gender, count(*) as cnt from 
`user` where verified=true group by verified, gender) a 
where a.gender='m' ;
 
select count(*) from `user` where verified=true; 
 
select round(f.cnt / c.cnt, 4), round(m.cnt / c.cnt, 4) 
from  
(select a.cnt as cnt from (select gender, count(*) as cnt 
 from `user` where verified=true  group by verified, 
gender) a 
 where a.gender='f') f, 
(select a.cnt as cnt from (select gender, count(*) as cnt  
from `user` where verified=true group by verified, gender) 
a  
where a.gender='m') m, 
(select count(*) as cnt from `user` where verified=true) c; 

任务 4:最受关注的前 5 名用户 ID。
说明:总转发数大的用户,定义为更受关注的用户。

select retweeted_uid, count(*) as cnt from weekly where 
retweeted_uid !=''  group by retweeted_uid order by cnt 
desc limit 5;
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值