数据分析师-SQL笔试题-做透这道题就够了

笔者在求职数据分析师岗位时,在知乎上看到了一道非常经典的SQL题,问题来源: link笔者认为彻底的搞懂这道题以及涉及的知识点,能够通过绝大部分数据分析岗位的SQL笔试。下面是这道题的问题及解答,答案为笔者原创,代码均在oracle数据库调试通过,但可能有考虑不周全的地方或其他解法,欢迎大家在评论区交流

表结构介绍

1、用户活跃表:tmp_liujg_dau_based
1)imp_date,日期,string格式,样例20190601
2)qimei,用户唯一标识,string格式,无空值
3)is_new,新用户标识,string,1表示新用户,0表示老用户
说明:以imp_date、qimei为主键,一个用户1天只出现1次,出现即表示当日登陆

2、红包活动参与表:tmp_liujg_packet_based
1)imp_date,日期,string格式,样例20190601
2)report_time,领取时间戳,string格式
3)qimei,用户唯一标识,string格式,无空值
4)add_money,领取金额,string格式,表示领取金额,单位为分,无空值或0值
说明:日志流水表,每一行为领取1次红包。无特殊情况说明,一般不考虑领取红包但当日未登录的情况

问题描述

第一问

计算20190601至今,每日DAU(活跃用户量,即有登陆的用户)
输出维度:imp_date(日期)
输出指标:dau

第二问

计算20190601至今,每日领取红包的新用户数,老用户数,及人均领取金额,人均领取次数
输出维度:imp_date(日期),is_new(新用户1,老用户0,未知2)
输出指标:mean_money(人均领取金额),mean_get_count(人均领取次数)

第三问

计算2019年3月,每个月按领红包取天数为1、2、3……30、31天区分,计算取每个月领取红包的用户数,人均领取金额,人均领取次数
输出维度:month(月份),get_money_das(领取天数)
输出指标:user_count(用户数),mean_money(人均领取金额),mean_get_count(人均领取次数)

第四问

计算2019年3月,每个月领过红包用户和未领红包用户的数量,平均月活跃天数(即本月平均活跃多少天)
输出维度:month(月份),is_packet_user(红包用户1,非红包用户0)
输出指标:user_count(用户数量),mean_days_in_month(月活跃天数)

第五问

计算2019年3月至今,每个月活跃用户的注册日期,2019年3月1日前注册的用户日期填空即可
输出维度:month(月份),qimei(用户唯一标识)
输出指标:register_date(注册日期)

第六问

计算2019年3月至今,每日的用户次日留存率,领取红包用户的次日留存,未领取红包用户的次日留存率
输出维度:imp_date(日期)
输出指标:dau(当日用户数),retain_rate(次日留存率)
,packet_user_retain_rate(当日领红包用户次日留存率),unpacket_user_retain_rate(当日未领红包用户次日留存率)

第七问

计算2019年3月1日至今,每日新用户领取的第一个红包的金额
输出维度:imp_date(日期),qimei(用户唯一标识)
输出指标:money(第一个红包的金额)

第八问

计算2019年3月1日至今,每个新用户领取的第一个红包和第二个红包的时间差(只计算注册当日有领取红包的用户,注册当日及以后的DAU表中新用户为1的用户)
输出维度:imp_date(注册日期),qimei(用户唯一标识)
输出指标:first_action_time(首次领红包时间),second_action_time(第二次领红包时间),delta_times(时间差)

第九问

计算2019年6月1日至今,每日领取红包用户领取金额的中位数
输出维度:imp_date(日期)
输出指标:mid_money(用户当日领取红包金额的中位数)

问题解答

建表

create table tmp_liujg_dau_based(-- 活跃用户表
imp_date varchar2(32),-- 日期
qimei  varchar2(32),-- 用户唯一标识
is_new  varchar2(16));-- 新用户标识:1新用户 0老用户

create table tmp_liujg_packed_based  (-- 红包领取表
imp_date  varchar2(32),-- 日期
report_time   varchar2(32),-- 领取时间戳
qimei  varchar2(32),-- 用户唯一标识
add_money varchar2(32));

插入数据

-- 插入用户表数据
insert all
into tmp_liujg_dau_based values('20190301','001','0')
into tmp_liujg_dau_based values('20190301','002','0')
into tmp_liujg_dau_based values('20190301','003','1')
into tmp_liujg_dau_based values('20190301','004','1')
into tmp_liujg_dau_based values('20190301','005','1')
into tmp_liujg_dau_based values('20190301','006','1')
into tmp_liujg_dau_based values('20190302','001','0')
into tmp_liujg_dau_based values('20190302','002','0')
into tmp_liujg_dau_based values('20190302','003','0')
into tmp_liujg_dau_based values('20190302','005','0')
into tmp_liujg_dau_based values('20190302','006','0')
into tmp_liujg_dau_based values('20190302','007','1')
into tmp_liujg_dau_based values('20190303','005','0')
into tmp_liujg_dau_based values('20190303','006','0')
into tmp_liujg_dau_based values('20190303','007','0')
into tmp_liujg_dau_based values('20190303','008','1')
into tmp_liujg_dau_based values('20190303','009','1')
into tmp_liujg_dau_based values('20190303','010','1')
into tmp_liujg_dau_based values('20190401','008','0')
into tmp_liujg_dau_based values('20190401','009','0')
into tmp_liujg_dau_based values('20190401','010','0')
into tmp_liujg_dau_based values('20190401','011','1')
into tmp_liujg_dau_based values('20190401','012','1')
into tmp_liujg_dau_based values('20190402','009','0')
into tmp_liujg_dau_based values('20190402','010','0')
into tmp_liujg_dau_based values('20190402','011','0')
into tmp_liujg_dau_based values('20190402','012','0')
into tmp_liujg_dau_based values('20190402','013','1')
into tmp_liujg_dau_based values('20190402','014','1')
into tmp_liujg_dau_based values('20190501','001','0')
into tmp_liujg_dau_based values('20190501','002','0')
into tmp_liujg_dau_based values('20190501','008','0')
into tmp_liujg_dau_based values('20190501','007','0')
into tmp_liujg_dau_based values('20190501','015','1')
into tmp_liujg_dau_based values('20190501','016','1')
into tmp_liujg_dau_based values('20190501','017','1')
into tmp_liujg_dau_based values('20190501','018','1')
into tmp_liujg_dau_based values('20190601','008','0')
into tmp_liujg_dau_based values('20190601','017','0')
into tmp_liujg_dau_based values('20190601','018','0')
into tmp_liujg_dau_based values('20190601','019','1')
into tmp_liujg_dau_based values('20190601','020','1')
into tmp_liujg_dau_based values('20190601','021','1')
into tmp_liujg_dau_based values('20190601','022','1')
into tmp_liujg_dau_based values('20190603','021','0')
into tmp_liujg_dau_based values('20190603','022','0')
into tmp_liujg_dau_based values('20190603','011','0')
into tmp_liujg_dau_based values('20190603','012','0')
into tmp_liujg_dau_based values('20190603','023','1')
into tmp_liujg_dau_based values('20190701','023','0')
into tmp_liujg_dau_based values('20190701','008','0')
into tmp_liujg_dau_based values('20190701','011','0')
into tmp_liujg_dau_based values('20190701','022','0')
into tmp_liujg_dau_based values('20190701','012','0')
into tmp_liujg_dau_based values('20190701','024','1')
into tmp_liujg_dau_based values('20190701','025','1')
into tmp_liujg_dau_based values('20190701','026','1')
into tmp_liujg_dau_based values('20190701','027','1')
into tmp_liujg_dau_based values('20190705','026','0')
into tmp_liujg_dau_based values('20190705','027','0')
into tmp_liujg_dau_based values('20190705','009','0')
into tmp_liujg_dau_based values('20190705','010','0')
into tmp_liujg_dau_based values('20190705','028','1')
into tmp_liujg_dau_based values('20190705','029','1')
select 1 from dual
-- 插入红包表数据
insert all
 into tmp_liujg_packed_based values('20190301','2019/03/01 10:15:01','001','1.05')
 into tmp_liujg_packed_based values('20190301','2019/03/01 13:15:01','001','2.30')
 into tmp_liujg_packed_based values('20190301','2019/03/01 11:15:01','002','0.80')
 into tmp_liujg_packed_based values('20190301','2019/03/01 17:15:01','002','0.89')
 into tmp_liujg_packed_based values('20190301','2019/03/01 14:15:01','003','2.12')
 into tmp_liujg_packed_based values('20190301','2019/03/01 18:15:01','003','1.12')
 into tmp_liujg_packed_based values('20190301','2019/03/01 17:15:01','005','1.12')
 into tmp_liujg_packed_based values('20190301','2019/03/01 19:15:01','005','0.12')
 into tmp_liujg_packed_based values('20190301','2019/03/01 09:15:01','006','0.98')
 into tmp_liujg_packed_based values('20190301','2019/03/01 11:15:01','006','1.45')
 into tmp_liujg_packed_based values('20190302','2019/03/02 09:30:01','001','0.78')
 into tmp_liujg_packed_based values('20190302','2019/03/02 19:30:01','001','0.88')
 into tmp_liujg_packed_based values('20190302','2019/03/02 10:30:01','003','0.68')
 into tmp_liujg_packed_based values('20190302','2019/03/02 09:30:01','005','1.01')
 into tmp_liujg_packed_based values('20190302','2019/03/02 14:30:01','005','1.88')
 into tmp_liujg_packed_based values('20190302','2019/03/02 14:30:01','006','1.88')
 into tmp_liujg_packed_based values('20190302','2019/03/02 10:30:01','006','0.68')
 into tmp_liujg_packed_based values('20190302','2019/03/02 15:30:01','007','0.68')
 into tmp_liujg_packed_based values('20190302','2019/03/02 16:30:01','007','1.78')
 into tmp_liujg_packed_based values('20190303','2019/03/03 16:30:01','005','0.68')
 into tmp_liujg_packed_based values('20190303','2019/03/03 08:50:01','006','0.32')
 into tmp_liujg_packed_based values('20190303','2019/03/03 10:50:01','006','1.78')
 into tmp_liujg_packed_based values('20190303','2019/03/03 10:50:01','007','0.32')
 into tmp_liujg_packed_based values('20190303','2019/03/03 15:50:01','008','1.01')
 into tmp_liujg_packed_based values('20190303','2019/03/03 17:50:01','008','1.68')
 into tmp_liujg_packed_based values('20190303','2019/03/03 10:30:01','010','1.88')
 into tmp_liujg_packed_based values('20190303','2019/03/03 15:50:01','010','0.32')
 into tmp_liujg_packed_based values('20190401','2019/04/01 09:50:00','008','0.18')
 into tmp_liujg_packed_based values('20190401','2019/04/01 11:50:00','009','0.88')
 into tmp_liujg_packed_based values('20190401','2019/04/01 12:35:00','009','0.32')
 into tmp_liujg_packed_based values('20190401','2019/04/01 17:50:00','010','1.01')
 into tmp_liujg_packed_based values('20190401','2019/04/01 12:35:00','011','1.68')
 into tmp_liujg_packed_based values('20190401','2019/04/01 12:50:00','011','0.88')
 into tmp_liujg_packed_based values('20190401','2019/04/01 15:50:00','012','0.32')
 into tmp_liujg_packed_based values('20190401','2019/04/01 17:50:00','012','1.68')
 into tmp_liujg_packed_based values('20190401','2019/04/01 12:35:00','012','1.88')
 into tmp_liujg_packed_based values('20190402','2019/04/02 09:50:00','009','0.18')
 into tmp_liujg_packed_based values('20190402','2019/04/02 19:50:00','009','1.18')
 into tmp_liujg_packed_based values('20190402','2019/04/02 17:50:00','010','0.88')
 into tmp_liujg_packed_based values('20190402','2019/04/02 16:50:00','010','0.32')
 into tmp_liujg_packed_based values('20190402','2019/04/02 19:50:00','010','0.32')
 into tmp_liujg_packed_based values('20190402','2019/04/02 16:50:00','013','0.88')
 into tmp_liujg_packed_based values('20190402','2019/04/02 19:55:00','013','0.88')
 into tmp_liujg_packed_based values('20190402','2019/04/02 20:55:00','013','1.01')
 into tmp_liujg_packed_based values('20190402','2019/04/02 19:55:00','014','0.32')
 into tmp_liujg_packed_based values('20190402','2019/04/02 20:55:00','014','1.01')
 into tmp_liujg_packed_based values('20190501','2019/05/01 09:50:00','001','1.18')
 into tmp_liujg_packed_based values('20190501','2019/05/01 09:55:00','002','0.32')
 into tmp_liujg_packed_based values('20190501','2019/05/01 10:55:00','002','0.32')
 into tmp_liujg_packed_based values('20190501','2019/05/01 10:00:00','007','0.88')
 into tmp_liujg_packed_based values('20190501','2019/05/01 10:55:00','015','0.88')
 into tmp_liujg_packed_based values('20190501','2019/05/01 10:00:00','015','0.32')
 into tmp_liujg_packed_based values('20190501','2019/05/01 14:00:00','017','1.01')
 into tmp_liujg_packed_based values('20190501','2019/05/01 15:00:00','017','1.01')
 into tmp_liujg_packed_based values('20190501','2019/05/01 15:30:00','018','0.88')
 into tmp_liujg_packed_based values('20190501','2019/05/01 16:30:00','018','0.68')
 into tmp_liujg_packed_based values('20190601','2019/06/01 09:50:00','008','1.38')
 into tmp_liujg_packed_based values('20190601','2019/06/01 14:50:00','017','0.88')
 into tmp_liujg_packed_based values('20190601','2019/06/01 11:50:00','019','1.01')
 into tmp_liujg_packed_based values('20190601','2019/06/01 13:50:00','019','0.88')
 into tmp_liujg_packed_based values('20190601','2019/06/01 14:50:00','019','0.68')
 into tmp_liujg_packed_based values('20190602','2019/06/02 09:50:00','021','0.38')
 into tmp_liujg_packed_based values('20190602','2019/06/02 13:50:00','012','0.88')
 into tmp_liujg_packed_based values('20190602','2019/06/02 15:50:00','012','1.12')
 into tmp_liujg_packed_based values('20190602','2019/06/02 13:59:00','023','0.88')
 into tmp_liujg_packed_based values('20190602','2019/06/02 15:50:00','023','1.01')
 into tmp_liujg_packed_based values('20190701','2019/07/01 09:50:00','023','0.38')
 into tmp_liujg_packed_based values('20190701','2019/07/01 13:50:00','023','0.78')
 into tmp_liujg_packed_based values('20190701','2019/07/01 14:50:00','008','0.68')
 into tmp_liujg_packed_based values('20190701','2019/07/01 14:50:00','024','0.68')
 into tmp_liujg_packed_based values('20190701','2019/07/01 15:50:00','024','1.68')
 into tmp_liujg_packed_based values('20190701','2019/07/01 14:50:00','026','0.68')
 into tmp_liujg_packed_based values('20190701','2019/07/01 18:50:00','026','1.68')
 into tmp_liujg_packed_based values('20190701','2019/07/01 14:50:00','027','0.68')
 into tmp_liujg_packed_based values('20190701','2019/07/01 19:35:00','027','1.11')
 into tmp_liujg_packed_based values('20190702','2019/07/02 09:50:00','026','0.78')
 into tmp_liujg_packed_based values('20190702','2019/07/02 11:50:00','026','0.78')
 into tmp_liujg_packed_based values('20190702','2019/07/02 13:50:00','028','1.01')
 into tmp_liujg_packed_based values('20190702','2019/07/02 14:35:00','028','0.88')
 into tmp_liujg_packed_based values('20190702','2019/07/02 15:35:00','028','0.33')
 select 1 from dual

解答

-- 第一问
  select
   imp_date,-- 日期
   count(qimei) as dau
 from
   tmp_liujg_dau_based
 where to_date(imp_date,'yyyy-mm-dd') > to_date('20190601','yyyy-mm-dd')
 group by
   imp_date
-- 第二问
 select
   a.imp_date-- 日期
   ,case when b.is_new='1' then '1'
        when b.is_new='0' then '0'
        else '2' end as is_new -- 用户类型
   ,count(distinct a.qimei) user_num
   ,sum(a.add_money) / count(distinct a.qimei) as mean_money -- 人均领取金额
   ,count(1) / count(distinct a.qimei) as mean_get_count -- 人均领取次数
from(select
	        imp_date,
	        qimei,
	        add_money
           from tmp_liujg_packed_based
           where to_date(imp_date,'yyyy-mm-dd') > to_date('20190601','yyyy-mm-dd')
) a
left join(select
	   imp_date,
	   qimei,
	   is_new
	from tmp_liujg_dau_based
	where to_date(imp_date,'yyyy-mm-dd') > to_date('20190601','yyyy-mm-dd')) b on b.qimei=a.qimei and b.imp_date=a.imp_date
group by 
  a.imp_date
  ,case when b.is_new='1' then '1'
        when b.is_new='0' then '0'
        else '2' end 
-- 第三问
select
  month-- 月份
  ,get_money_days-- 领取天数
  ,count(qimei) as user_count-- 用户数
  ,sum(tol_money) / count(qimei) as mean_money-- 人均领取金额
  ,sum(tol_cnt) / count(qimei) as mean_get_count-- 人均领取次数
from(select
  to_char(to_date(imp_date,'yyyy-mm-dd'),'yyyymm') as month
  ,qimei
  ,count(distinct imp_date) as get_money_days
  ,sum(add_money) as tol_money
  ,count(1) as tol_cnt
from (select
       imp_date,
       qimei,
       report_time,
       add_money
      from tmp_liujg_packed_based
      where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd'))
group by  
  to_char(to_date(imp_date,'yyyy-mm-dd'),'yyyymm')
  ,qimei)
group by 
    month
  ,get_money_days
-- 第四问
select 
  a.imp_mon as month-- 月份
  ,case when b.qimei is not null then '1' 
            when b.qimei is null then '0'
            else null end as is_packet_user -- 用户类型 1红包用户 0非红包用户
  ,count(a.qimei) as user_count
  ,sum(a.active_days) / count(a.qimei) as mean_days_in_month
from 
  (select
  	substr(imp_date,1,6) as imp_mon
  	,qimei
  	,count(imp_date) as active_days
  from tmp_liujg_dau_based
  where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd')
  group by 
   substr(imp_date,1,6)
   ,qimei ) a
left join
 (select 
  	substr(imp_date,1,6) as imp_mon
  	,qimei
  from tmp_liujg_packed_based
  where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd')
  group by 
  substr(imp_date,1,6)
  ,qimei) b on b.imp_mon=a.imp_mon and b.qimei=a.qimei
group by 
   a.imp_mon
  ,case when b.qimei is not null then '1' 
       when b.qimei is null then '0'
  else null end 
-- 第五问
select
  a.month,
  a.qimei,
  case when b.imp_date is not null then b.imp_date else null end as register_date
from (select
	   to_char(to_date(a.imp_date,'yyyy-mm-dd'),'yyyymm') as month
	   ,qimei
	  from tmp_liujg_dau_based
      group by 
        to_char(to_date(a.imp_date,'yyyy-mm-dd'),'yyyymm')
        ,qimei
      where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd')) a
left join(select
	       qimei
	       ,imp_date
	      from tmp_liujg_dau_based
	      group by 
	        qimei
	        ,imp_date
	      where is_new='1' 
	      and to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd')) b on b.qimei=a.qimei
-- 第六问
select 
  a1.day
  ,count(a1.qimei) as dau
  ,count(case when a2.day-a1.day=1 then a1.qimei else null end) / count(a1.qimei) as retain_rate
 ,count(case when b.qimei is not null then 
    case when a2.day-a1.day=1 then a1.qimei else null end else null end)
  / decode(count(case when b.qimei is not null then a1.qimei else null end),0,1,count(case when b.qimei is not null then a1.qimei else null end)) as pick_user_rate
  ,count(case when b.qimei is null then
    case when a2.day-a1.day=1 then a1.qimei else null end else null end) 
  / decode(count(case when b.qimei is null then a1.qimei else null end),0,1,count(case when b.qimei is null then a1.qimei else null end)) as unpick_user_rate
from(select 
      to_date(imp_date,'yyyy-mm-dd')  as day,
      qimei
     from tmp_liujg_dau_based
     where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd')) a1
left join(select 
            to_date(imp_date,'yyyy-mm-dd') as day,
            qimei
           from tmp_liujg_dau_based
           where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd')) a2 on a2.qimei=a1.qimei
left join(select
           to_date(imp_date,'yyyy-mm-dd') as day,
           qimei
           from tmp_liujg_packed_based
           where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd')
           group by
             imp_date
             , qimei ) b on b.day=a1.day and b.qimei=a1.qimei
group by 
 a1.day
-- 第七问
select
  imp_date
  ,qimei
  ,add_money as money
from(select
  a.imp_date
  ,a.qimei
  ,a.report_time
  ,a.add_money
  ,row_number()over(partition by a.imp_date,a.qimei order by a.report_time asc) as report_rank
from (select
       imp_date,
       qimei,
       add_money,
       report_time
     from tmp_liujg_packed_based
     where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd')
)a
inner join(select
	       imp_date,
	       qimei
	      from tmp_liujg_dau_based
	      where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd')
	      and is_new=1
)b on b.imp_date=a.imp_date and b.qimei=a.qimei)
where report_rank = 1
-- 第八问
select
  imp_date-- 注册日期
  ,qimei-- 用户唯一标识
  ,report_time as first_action_time -- 首次领红包时间
  ,n1_report_time as second_action_time -- 第二次领红包时间
  ,to_date(n1_report_time,'yyyy-mm-dd hh24:mi:ss') - to_date(report_time,'yyyy-mm-dd hh24:mi:ss') as delta_times -- 时间差
from(select
   b.imp_date
  ,a.qimei
  ,a.report_time
  ,row_number()over(partition by a.imp_date,a.qimei order by a.report_time asc) as report_rank -- 领红包时间的排序
  ,lead(a.report_time,1)over(partition by a.imp_date,a.qimei order by a.report_time asc) as n1_report_time -- 第二次领红包的时间
from(select
	  imp_date,
	  qimei,
	  report_time
	 from tmp_liujg_packed_based
	 where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd'))a
inner join(select
	         imp_date,
	         qimei
	       from tmp_liujg_dau_based
	       where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190301','yyyy-mm-dd')
	       and is_new=1
)b on b.imp_date=a.imp_date and b.qimei=a.qimei)
where report_rank = 1
-- 第九问
9.计算201961日至今,每日领取红包用户领取金额的中位数
输出维度:imp_date(日期)
输出指标:mid_money(用户当日领取红包金额的中位数)
select
  imp_date-- 日期
  ,median(to_number(add_money)) as mid_money -- 用户当日领取红包金额的中位数
from tmp_liujg_packed_based
where to_date(imp_date,'yyyy-mm-dd') >= to_date('20190601','yyyy-mm-dd')
group by imp_date
  • 9
    点赞
  • 29
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值