有50W个京东店铺,每个顾客访客访问任何一个店铺的任何一个商品时都会产生一条访问日志,访问日志存储的表名为Visit,访客的用户id为user_id,被访问的店铺名称为shop,请统计:
1)每个店铺的UV(访客数)
2)每个店铺访问次数top3的访客信息。输出店铺名称、访客id、访问次数
u01 a
u02 b
u03 b
u04 a
u01 b
u01 c
u02 b
u01 c
u01 a
u02 b
u03 c
u04 a
u01 c
u01 c
u02 a
u01 bcreate table sql02(userid string,shop string)
row format delimited fields terminated by '\t'
stored as textfile;load data local inpath '/root/in/sql02' into table sql02;
1)每个店铺的UV(访客数)
去重每个店的用户
select shop,userid from sql02 group by shop,userid;t1计算pv
select
shop,
count(*)
from (select shop,userid from sql02 group by shop,userid)t1
group by shop;
去重每个店的每个用户,获得数量
select
userid,
shop,
count(*) num
from
sql02
group by userid,shop;t12)每个店铺访问次数top3的访客信息。输出店铺名称、访客id、访问次数
排名
select
shop,
userid,
num,
row_number() over(partition by shop order by num desc) rank
from
(select
userid,
shop,
count(*) num
from
sql02
group by userid,shop)t1;t2取前三
select shop,userid,num
from
(select
shop,
userid,
num,
row_number() over(partition by shop order by num desc) rank
from
(select
userid,
shop,
count(*) num
from
sql02
group by userid,shop)t1)t2
where rank<3;