Note: these scenario questions build on the 50 challenge questions. Attempt five per day.
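Question 1:
Which window functions do you know, and what does each one do? Give an example of one used in a real business scenario. Write out the window functions and their meaning by hand, then write any SQL statement that uses a window function and explain what it does.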
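Window functions: the usual form is a function followed by over(), that is, func(...) over(...).

    -- all detail rows
    select * from t_order;
    -- total row count
    select count(*) from t_order;
    -- both at once: every detail row, with the total row count repeated on each row
    select *, count(*) over() from t_order;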
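The effect of count(*) over() can be checked with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption; it needs SQLite 3.25 or later for window functions), and the columns order_id and amount are hypothetical, since the notes only name the table t_order.

```python
import sqlite3

# SQLite in memory stands in for Hive here (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical columns and rows; the notes only give the table name t_order.
conn.execute("CREATE TABLE t_order (order_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO t_order VALUES (?, ?)",
    [(1, 10), (2, 25), (3, 40)],
)

# count(*) OVER () keeps every detail row and attaches the overall row count.
rows = conn.execute(
    "SELECT order_id, amount, count(*) OVER () AS total_rows "
    "FROM t_order ORDER BY order_id"
).fetchall()

for row in rows:
    print(row)
```

Unlike a plain select count(*), which collapses the table to one row, the windowed version returns all three rows, each carrying the same total.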
2. Find the number of views and the cumulative viewing time for each channel.
Data: video table
uid channel min
1 1 23
2 1 12
3 1 12
4 1 32
5 1 342
6 2 13
7 2 34
8 2 13
9 2 134
    create table video(
      uid int,
      channel string,
      min int
    )
    row format delimited
    fields terminated by ' '
    ;
    load data local inpath './hivedata/video.txt' into table video;
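Answer:
Question: find the number of views and the cumulative viewing time for each channel.
Analysis:
channel: the channel, so group by channel gives one group per channel
count(*): number of views
sum(min): cumulative viewing time

    select channel, count(*) countnum, sum(min) total
    from video
    group by channel;

result:
1 5 421
2 4 194
channel 1 NULL

The last row comes from the header line of video.txt, which is loaded as an ordinary data row. Its min value does not parse as an int, so the sum is NULL. Removing the header from the file, or setting the table property skip.header.line.count to 1, should make that row disappear.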
3. Write the SQL for the following requirement
Data:
userid,month,visits
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,1
    drop table visits;
    create table visits(
      userid string,
      month string,
      visits int
    )
    row format delimited
    fields terminated by ','
    ;
    load data local inpath './hivedata/visits.txt' overwrite into table visits;
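Requirement: for each user and each month, report the largest single-month visit count seen so far and the cumulative visit count up to that month. The expected result format is: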
+---------+----------+---------+-------------+---------------+
| userid  | month    | visits  | max_visits  | total_visits  |
+---------+----------+---------+-------------+---------------+
| A       | 2015-01  | 33      | 33          | 33            |
| A       | 2015-02  | 10      | 33          | 43            |
| A       | 2015-03  | 38      | 38          | 81            |
| B       | 2015-01  | 30      | 30          | 30            |
| B       | 2015-02  | 15      | 30          | 45            |
| B       | 2015-03  | 34      | 34          | 79            |
+---------+----------+---------+-------------+---------------+
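Analysis:
First aggregate the raw rows into one row per user and month. Then apply max and sum as window functions per user, in month order. With an ordering and no explicit frame, the window runs from the first row of the partition to the current row, which yields the running maximum and the running total. Inside over(), distribute by ... sort by is Hive's alternative spelling of partition by ... order by.

    select userid, month, visits,
           max(visits) over(distribute by userid sort by month) max_visits,
           sum(visits) over(distribute by userid sort by month) total_visits
    from (
      select userid, month, sum(visits) visits
      from visits
      group by userid, month
    ) t;

result:
A 2015-01 33 33 33
A 2015-02 10 33 43
A 2015-03 38 38 81
B 2015-01 30 30 30
B 2015-02 15 30 45
B 2015-03 34 34 79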
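The running max and running sum can be reproduced on the sample data with a minimal sketch. It uses Python's built-in sqlite3 module in place of Hive (an assumption; SQLite 3.25 or later is needed), and because SQLite does not accept Hive's distribute by / sort by spelling, the window is written with the standard partition by / order by.

```python
import sqlite3

# Sample rows copied from the visits data in the notes.
raw = [
    ("A", "2015-01", 5), ("A", "2015-01", 15), ("B", "2015-01", 5),
    ("A", "2015-01", 8), ("B", "2015-01", 25), ("A", "2015-01", 5),
    ("A", "2015-02", 4), ("A", "2015-02", 6), ("B", "2015-02", 10),
    ("B", "2015-02", 5), ("A", "2015-03", 16), ("A", "2015-03", 22),
    ("B", "2015-03", 23), ("B", "2015-03", 10), ("B", "2015-03", 1),
]

# SQLite in memory stands in for Hive here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (userid TEXT, month TEXT, visits INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", raw)

# Inner query: one row per user and month.
# Outer query: running max and running sum per user, in month order.
query = """
SELECT userid, month, visits,
       max(visits) OVER (PARTITION BY userid ORDER BY month) AS max_visits,
       sum(visits) OVER (PARTITION BY userid ORDER BY month) AS total_visits
FROM (
    SELECT userid, month, sum(visits) AS visits
    FROM visits
    GROUP BY userid, month
) t
ORDER BY userid, month
"""
result = conn.execute(query).fetchall()

for row in result:
    print(*row)
```

The printed rows match the expected result table: A reaches 33, 43, 81 and B reaches 30, 45, 79 in total_visits.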
4. Count the total number of users who logged in on 7 consecutive days:
Data: table t1 (created below as login)
aa
    drop table login;
    create table login(
      Uid int,
      dt string,
      login_status int
    )
    row format delimited
    fields terminated by ' '
    ;
    load data local inpath './hivedata/login.txt' into table login;
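What is being asked? Count the users who logged in on seven consecutive days.
Analysis:
The total number of users is a count.
A user logged in on seven consecutive days if there are seven rows on seven consecutive dates, each with login_status 1. For every row, the query looks six rows back within the same user: if that earlier date (pre_dt) is exactly dt minus 6 days, the seven rows cover seven consecutive dates, and if the sum of login_status over those seven rows is 7, the user logged in on every one of them. This assumes at most one row per user per day and dt stored as yyyy-MM-dd.

    select count(*)
    from (
      select distinct uid
      from (
        select uid, dt,
               lag(dt, 6) over(partition by uid order by dt) pre_dt,
               sum(login_status) over(partition by uid order by dt
                                      rows between 6 preceding and current row) total
        from login
      ) t
      where date_sub(dt, 6) = pre_dt
        and t.total = 7
    ) t1;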
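Since the notes do not include the login data, the logic can be tried on made-up rows. The sketch below is a rough check under stated assumptions: Python's built-in sqlite3 module stands in for Hive (SQLite 3.25 or later), SQLite's date(dt, '-6 days') replaces Hive's date_sub(dt, 6), and all four users and their dates are hypothetical.

```python
import sqlite3
from datetime import date, timedelta

start = date(2024, 1, 1)

def days(offsets):
    """Return ISO date strings for the given day offsets from start."""
    return [(start + timedelta(days=o)).isoformat() for o in offsets]

# Hypothetical data: (uid, dt, login_status).
rows = []
rows += [(1, d, 1) for d in days(range(7))]                 # 7 days in a row
rows += [(2, d, 0 if d == "2024-01-04" else 1)              # one failed day
         for d in days(range(8))]
rows += [(3, d, 1) for d in days([0, 1, 2, 4, 5, 6, 7])]    # gap on day 4
rows += [(4, d, 1) for d in days(range(8))]                 # 8 days in a row

# SQLite in memory stands in for Hive here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login (uid INTEGER, dt TEXT, login_status INTEGER)")
conn.executemany("INSERT INTO login VALUES (?, ?, ?)", rows)

# Same shape as the Hive query; only the date arithmetic is SQLite-specific.
inner = """
SELECT DISTINCT uid
FROM (
    SELECT uid, dt,
           lag(dt, 6) OVER (PARTITION BY uid ORDER BY dt) AS pre_dt,
           sum(login_status) OVER (PARTITION BY uid ORDER BY dt
                                   ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS total
    FROM login
) t
WHERE date(dt, '-6 days') = pre_dt AND t.total = 7
"""
qualifying = sorted(r[0] for r in conn.execute(inner).fetchall())
consecutive_users = conn.execute(f"SELECT count(*) FROM ({inner}) t1").fetchone()[0]

print(qualifying, consecutive_users)
```

User 2 fails the sum check, user 3 fails the date check, and user 4 matches on two rows but is counted once because of distinct.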
5. Which ranking functions do you know, and how do they differ? A written explanation is enough.
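row_number(): numbers the rows of each partition 1, 2, 3, ... in sort order, with no repeated numbers. Rows with equal sort values still get different numbers, and the order among them is not guaranteed. (ordered, ties broken arbitrarily, no repeats)
dense_rank(): gives the rank of each row within its partition. Rows with equal sort values share a rank, and no gap is left after them. (ordered, repeats, no gaps)
rank(): gives the rank of each row within its partition. Rows with equal sort values share a rank, and a gap is left after them. (ordered, repeats, with gaps)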
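The difference shows up as soon as two rows tie. The sketch below is an illustration on hypothetical data (a table exam with columns name and score, neither from the notes), run through Python's built-in sqlite3 module as a stand-in for Hive (an assumption; SQLite 3.25 or later).

```python
import sqlite3

# SQLite in memory stands in for Hive here (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows; a and b tie on score.
conn.execute("CREATE TABLE exam (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO exam VALUES (?, ?)",
    [("a", 90), ("b", 90), ("c", 80), ("d", 70)],
)

rows = conn.execute(
    """
    SELECT name, score,
           row_number() OVER (ORDER BY score DESC) AS rn,
           rank()       OVER (ORDER BY score DESC) AS rk,
           dense_rank() OVER (ORDER BY score DESC) AS drk
    FROM exam
    """
).fetchall()

row_numbers = sorted(r[2] for r in rows)   # always 1..4, no repeats
ranks = {r[0]: r[3] for r in rows}         # ties share a rank, gap follows
dense_ranks = {r[0]: r[4] for r in rows}   # ties share a rank, no gap

print(row_numbers, ranks, dense_ranks)
```

With the tie on 90, rank() assigns 1, 1, 3, 4 and dense_rank() assigns 1, 1, 2, 3, while row_number() still hands out 1 through 4; which of a and b gets 1 is not something to rely on.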