前言
今天开始为期一个多月的 HQL 练习,共 55 道 HQL 题,大概每天两道,从初级函数到中级函数。这次的练习不再是基础的 join 那种通用 SQL 语法了,而是引入了更多 Hive 的函数(单行函数、窗口函数等)。
我会把 HQL 中函数和语法的一些注意事项写在每一题下面的 "知识点" 中,方便上课复习。同样这博客估计没人看,如果谁实在需要建表语句给我留言就行。
3-10
1、查询累积销量排名第二的商品(中级)
SELECT sku_id from(
SELECT sku_id,
rank() OVER(ORDER BY order_sum desc) rk
from(
SELECT sku_id,sum(sku_num) order_sum
FROM order_detail
GROUP BY sku_id
ORDER BY order_sum desc
LIMIT 2
)as t1
)as t2
WHERE rk=2;
知识点:
- SQL 中 distinct 必须跟在 select 之后
- distinct 不能单独用于选择性地仅对结果集中的某个字段去重,而不影响其他字段
select distinct sku_id, sku_num,rk from(
...
);
-- 尽管查询结果中 sku_id 字段的值可能重复,但是不能通 select distinct 来对单个属性去重
sku_id sku_num rk
1 2 1
1 3 2
- Hive 的子查询必须要有别名 !
3-12
1、筛选2021年总销量小于100的商品(初级)
- 需求:从订单明细表(order_detail)中筛选出2021年总销量小于100的商品及其销量,假设今天的日期是2022-01-10,不考虑上架时间小于一个月的商品。
- 思路:拿 2021 年总销量小于100的商品id和上架时间大于30的商品id进行join
order_detail_id | order_id | sku_id | create_date | price | sku_num |
---|---|---|---|---|---|
1 | 1 | 1 | 2021-09-27 | 2000.00 | 2 |
2 | 1 | 3 | 2021-09-27 | 5000.00 | 5 |
3 | 2 | 4 | 2021-09-28 | 6000.00 | 9 |
4 | 2 | 5 | 2021-09-28 | 500.00 | 33 |
2.1、查询出2021年总销量小于 100 的商品
-- 1.1 2021年销售总量小于100的商品
select sku_id, sum(sku_num) order_sum
from order_detail
where year(create_date)=2021
group by sku_id
having order_sum<100;
2.2、查询出上架时间大于30天的商品
-- 1.2 上架时间小于 30 天的商品
select sku_id,name from sku_info
where datediff('2022-01-10',from_date)>30;
2.3、join
-- join 两个子表
select t1.sku_id,name from (
select sku_id, sum(sku_num) order_sum
from order_detail
where year(create_date)=2021
group by sku_id
having order_sum<100
)t1 join (
select sku_id,name from sku_info
where datediff('2022-01-10',from_date)>30
)t2 on t1.sku_id = t2.sku_id;
知识点:
- datediff('2022-01-10','2021-01-10') = 365,注意:日期1必须大于日期2否则结果是负数
2、查询每日新增用户(初级)
user_login_datail:
uer_id | ip_address | login_ts | logout_ts |
---|---|---|---|
101 | 180.149.130.161 | 2021-09-21 08:00:00 | 2021-09-27 08:30:00 |
101 | 180.149.130.161 | 2021-09-27 08:00:00 | 2021-09-27 08:30:00 |
101 | 180.149.130.161 | 2021-09-28 09:00:00 | 2021-09-28 09:10:00 |
101 | 180.149.130.161 | 2021-09-29 13:30:00 | 2021-09-29 13:50:00 |
思路1:每天有多少人是首日登录就有多少新增用户。查询出每个用户的首日登录时间,然后按照日期分组聚合就得到了每日新增用户。而不是去考虑开窗(我是这么想的)
思路2:开窗也可以实现,用 row_numer 对每个用户的登录时间进行排名(group by user_id),然后根据登录时间进行分区将该天 row_number=1 的值(说明是首次登录)进行聚合。
思路1
2.1、查询用户首日登录日期
-- 查询用户首次登录的日期
select user_id,min(date_format(login_ts,'yyyy-MM-dd')) first_login_date
from user_login_detail
group by user_id;
2.2、查询每天有多少用户是首日登录
-- 按照日期分组得到每天的新增用户
select first_login_date,count(*) from(
select user_id,min(date_format(login_ts,'yyyy-MM-dd')) first_login_date
from user_login_detail
group by user_id
)t1
group by first_login_date;
注意:怎么把 login_ts (格式:2021-09-21 08:00:00)这种时间字符串指定的字段取出来?
我是这么实现的:
select concat_ws('-',string(year(date_format(login_ts,'yyyy-MM-dd HH:mm:ss'))),string(month(date_format(login_ts,'yyyy-MM-dd HH:mm:ss'))),string(day(date_format(login_ts,'yyyy-MM-dd HH:mm:ss')))),
标准:
select date_format(login_ts,'yyyy-MM-dd') from user_login_detail;
思路2
select dt,sum(`if`(rk=1,1,0)) new_user_nums from(
select user_id,
date_format(login_ts,'yyyy-MM-dd') dt,
row_number() over (partition by user_id order by login_ts) rk
from user_login_detail
)t1
group by dt
having new_user_nums>0;
3、用户注册、登录、下单综合统计(初级)
需求:从用户登录明细表(user_login_detail)和订单信息表(order_info)中查询每个用户的注册日期(首次登录日期)、总登录次数,以及2021年的登录次数、订单数和订单总额。
思路:无脑 join 没有什么难度
order_info:
序号 | 编号 | 日期 | 金额 |
---|---|---|---|
1 | 101 | 2021-09-27 | 29000.00 |
2 | 101 | 2021-09-28 | 70500.00 |
3 | 101 | 2021-09-29 | 43300.00 |
4 | 101 | 2021-09-30 | 860.00 |
user_login_detail:
3.1、用户首日登录日期
-- 用户首日登录日期
select user_id,min(date_format(login_ts,'yyyy-MM-dd')) register_date
from user_login_detail
group by user_id;
注意:能 group by 就 group by 不然 join 之后报错。
3.2、用户累积登录次数
-- 用户累积登录次数
select user_id,size(collect_set(date_format(login_ts,'yyyy-MM-dd'))) total_login_count
from user_login_detail
group by user_id;
知识点: 利用 collect_set() 把登录日期收集到一个集合里,正好做了去重,就不用担心用户一天登录多次的情况了。
3.3、用户2021年登录次数
-- 用户2021登录次数
select user_id,size(collect_set(date_format(login_ts,'yyyy-MM-dd'))) login_count_2021
from user_login_detail
where year(date_format(login_ts,'yyyy-MM-dd'))=2021
group by user_id;
3.4、用户2021年下单次数和下单金额
-- 用户2021年下单次数和下单金额
select user_id,count(order_id) order_count_2021,sum(total_amount) order_amount_2021
from order_info
where year(create_date)=2021
group by user_id,year(create_date);
3.5、join起来
select t1.user_id,register_date,total_login_count,login_count_2021,order_count_2021,order_amount_2021 from(
select user_id,min(date_format(login_ts,'yyyy-MM-dd')) register_date from user_login_detail group by user_id
)t1 join (
select user_id,size(collect_set(date_format(login_ts,'yyyy-MM-dd'))) total_login_count
from user_login_detail
group by user_id
)t2 on t1.user_id=t2.user_id
join (
select user_id,size(collect_set(date_format(login_ts,'yyyy-MM-dd'))) login_count_2021
from user_login_detail
where year(date_format(login_ts,'yyyy-MM-dd'))=2021
group by user_id
)t3 on t1.user_id=t3.user_id
join (
select user_id,count(order_id) order_count_2021,sum(total_amount) order_amount_2021
from order_info
where year(create_date)=2021
group by user_id,year(create_date)
)t4 on t1.user_id=t4.user_id;
3.13
1、向用户推荐朋友收藏的商品
需求:请向所有用户推荐其朋友收藏但是自己未收藏的商品,从好友关系表(friendship_info)和收藏表(favor_info)中查询出应向哪位用户推荐哪些商品。
firendship_info:
user1_id | user2_id |
---|---|
101 | 1010 |
101 | 108 |
101 | 106 |
101 | 104 |
favor_info:
user_id | sku_id | create_date |
---|---|---|
101 | 3 | 2021-09-23 |
101 | 12 | 2021-09-23 |
101 | 6 | 2021-09-25 |
101 | 10 | 2021-09-21 |
思路:
- 核心就是 left join ,因为 left join 可以把保留左表的内容(这里我们保留的是好友的商品收藏表),我们只要根据用户喜欢的商品id和好友喜欢的商品id进行 left join ,得到的字段"sku_id"如果不为 null 就说明这件商品他俩都收藏了,如果为 null 就说明这件商品好友收藏了,但是用户没有收藏。
1.1、获取用户所有好友
-- 查询所有用户的好友
select user1_id user_id,user2_id friend_id from friendship_info
union
select user2_id,user1_id from friendship_info;
知识点:
- join 是横向合并,会形成宽表;而 union 是纵向合并,形成长表(union 会对结果进行排序去重,union all 不会)
1.2、得到用户好友的收藏列表
-- join得到用户好友收藏的商品
select user1_id user_id,user2_id friend_id from friendship_info
union
select user2_id,user1_id from friendship_info
join favor_info firend_favor
on user2_id=firend_favor.user_id;
1.3、left join 过滤
select distinct t1.user_id,firend_favor.sku_id
from (
select user1_id user_id,user2_id friend_id from friendship_info
union
select user2_id,user1_id from friendship_info
)t1
join favor_info firend_favor
on t1.friend_id=firend_favor.user_id
left join favor_info user_favor
on t1.user_id=user_favor.user_id and firend_favor.sku_id=user_favor.sku_id
where user_favor.sku_id is null;
2、男性和女性每日的购物总金额统计(初级)
需求:从订单信息表(order_info)和用户信息表(user_info)中,分别统计每天男性和女性用户的订单总金额,如果当天男性或者女性没有购物,则统计结果为0。
order_info:
user_info:
编号 | 性别 | 出生日期 |
---|---|---|
101 | 男 | 1990-01-01 |
102 | 女 | 1991-02-01 |
103 | 女 | 1992-03-01 |
104 | 男 | 1993-04-01 |
思路1
1、获取不同性别的消费信息
select t2.gender,t1.create_date,t1.total_amount
from order_info t1
join user_info t2 on t1.user_id=t2.user_id
我们没有必要查询用户的 id 信息,只需要性别(后面我们需要根据性别过滤)、创建订单的日期(后面我们需要根据日期分组)和订单总额(我们需要根据不同性别统计每天的订单总额)即可。
2、按照日期 join 不同性别的每天销售总额
select coalesce(t3.create_date,t4.create_date),`if`(t3.total_amount_male is null,0,t3.total_amount_male),`if`(t4.total_amount_female is null ,0,t4.total_amount_female) from(
select create_date,sum(total_amount) total_amount_male from(
select t2.gender,t1.create_date,t1.total_amount
from order_info t1
join user_info t2 on t1.user_id=t2.user_id
)t1
where gender='男'
group by create_date
)t3 full join (
select create_date,sum(total_amount) total_amount_female from(
select t2.gender,t1.create_date,t1.total_amount
from order_info t1
join user_info t2 on t1.user_id=t2.user_id
)t2
where gender='女'
group by create_date
)t4 on t3.create_date=t4.create_date
知识点:
- 显然 t3 和 t4 这两个子表分别是男性和女性的每天购物总额,这里我们进行的是 full join 这样会保留两张表的所有数据,因为数据中存在某 一天男生购物了但是女生没有,或者女士购物了男性没有。
- 对于最后查询结果的日期字段就需要保证这个日期不能为 null,但是我们又不能显示 t3 t4 两个日期,所以我们使用了 coalesce 字段来获取非 null 的日期字段(前后顺序并不影响)
- COALESCE 函数用于返回多个表达式中的第一个非NULL值。
思路2
思路1是我自己实现的一种方式,思路2是答案,不得不说还是这种写法高级:
select create_date,
cast(sum(`if`(gender='男',total_amount,0)) as decimal(16,2)) total_amount_male,
cast(sum(`if`(gender='女',total_amount,0)) as decimal(16,2)) total_amount_female
from order_info oi
join user_info ui on oi.user_id=ui.user_id
group by create_date;
知识点:
-
cast(expr as <type>):将expr的执行结果转换为<type>类型的数据并返回,expr可以是函数(可以嵌套)、字段或字面值。转换失败返回null,对于cast(expr as boolean),对任意的非空字符串expr返回true
-
decimal(精度,标度):比如 decimal(16,2)表示一个十进制数,其中16是总的数字数量(精度),而2是小数点后的数字数量(标度)
3.17
1、购买过商品1和商品2但是没有购买商品3的顾客
order_detail:
order_info:
select user_id
from(
select user_id,
collect_set(sku_id) skus
from order_detail od
join
order_info oi
on od.order_id = oi.order_id
group by user_id
)t1
where array_contains(skus,'1')
and array_contains(skus,'2')
and !array_contains(skus,'3');
知识点:
- array_contains() 可以判断集合里是否包含某个元素,set list arr 都可以
2、统计每日商品1和商品2销量的差值
order_detail:
思路1
我的做法是分别查询商品1和商品2的销量,因为两件商品有时候不是每天都有卖出去的,所以选择用 full join,对 create_date 为 null 的字段进行处理,并对当天销量为 null 的字段补 0。
select '1' sku_1,'2' sku_2,coalesce(t1.create_date,t2.create_date),(sum(`if`(t1.sku_num is null,0,t1.sku_num))-sum(`if`(t2.sku_num is null,0,t2.sku_num))) diff from
(
select sku_id,create_date,sku_num
from order_detail
where sku_id='1'
)t1
full outer join(
select sku_id,create_date,sku_num
from order_detail
where sku_id='2'
)t2 on t1.create_date=t2.create_date
group by t1.create_date,t2.create_date;
思路2
select create_date,
sum(if(sku_id = '1', sku_num, 0)) - sum(if(sku_id = '2', sku_num, 0)) diff
from order_detail
where sku_id in ('1', '2')
group by create_date;
这种思路比较简单,思路1是我想复杂了。
3、根据商品销售情况进行商品分类
需求:通过订单详情表(order_detail)的数据,根据销售件数对商品进行分类,销售件数0-5000为冷门商品,5001-19999为一般商品,20000以上为热门商品,统计不同类别商品的数量。
order_detail:
select category,count(category) from (
select
case
when 0<sku_sum and sku_sum<=5000 then '冷门商品'
when 5000<sku_sum and sku_sum<=19999 then '一般商品'
when 20000<sku_sum then '热门商品'
end category
from (
select sum(sku_num) sku_sum
from order_detail
group by sku_id
)t1
)t2
group by category;
或者
SELECT category,
count(sku_id)
FROM
(
SELECT sku_id,
CASE
WHEN sum(sku_num)>=0 AND sum(sku_num)<=5000 THEN '冷门商品'
WHEN sum(sku_num)>=5001 AND sum(sku_num)<=19999 THEN '一般商品'
WHEN sum(sku_num)>=20000 THEN '热门商品'
END category
FROM order_detail
GROUP BY sku_id
)t1
GROUP BY category;
4、查询有新增用户的日期的新增用户数和新增用户一日留存率
需求:
- 从用户登录明细表(user_login_detail)中统计有新增用户的日期的新增用户数(若某日未新增用户,则不出现在统计结果中),并统计这些新增用户的一日留存率。
- 用户首次登录为当天新增,次日也登录则为一日留存。一日留存用户占新增用户数的比率为一日留存率。
4.1、计算新用户注册日期
-- 查询出每天新用户的个数
select date_format(login_ts,'yyyy-MM-dd') register_date,
count(user_id) cnt
from(
select user_id,
login_ts,
row_number() over (partition by user_id order by login_ts) rk
from user_login_detail
-- group by user_id,login_ts 不需要,over中已经分区过了
)t1
where t1.rk=1
group by date_format(login_ts,'yyyy-MM-dd');
注意:
- row_numer() 内我们已经分区过了,所以不需要再 group by
- 最外层的 group by 不能直接使用引用(register_date),要把它转为表达式
其实查询用户注册日期可以使用 min 来巧妙计算:
select user_id,date_format(min(login_ts),'yyyy-MM-dd')
from user_login_detail
group by user_id;
4.2、留存率实现
select t3.register_date,
t3.register_count,
cast(cast(t3.retention_1_count as decimal(16,2)) / t3.register_count as decimal(16,2)) retention_1_rate
from(
select t1.register_date,
count(distinct t1.user_id) register_count,
count(distinct t2.user_id) retention_1_count -- 必须去重,因为有人一天多次登录
from(
select user_id,
date_format(min(login_ts),'yyyy-MM-dd') register_date
from user_login_detail
group by user_id
)t1
left join user_login_detail t2
on t1.user_id=t2.user_id and datediff(date_format(t2.login_ts,'yyyy-MM-dd'),t1.register_date)=1
group by t1.register_date
)t3;
主要思路就是对用户注册时间表进行一个 left join ,因为 left join 可能会匹配到很多行,所以用 datediff 过滤出注册后第二天也登录的记录。因为有的用户可能一天多次登录,所以需要对结果去重。最后 昨天注册人数/第二天登录人数 得出留存率。
5、登录次数及交易次数统计
user_login_detail:
delivery_info:
delivery_id | order_id | user_id | order_date | costom_date |
---|---|---|---|---|
1 | 1 | 101 | 2021-09-27 | 2021-09-29 |
2 | 2 | 101 | 2021-09-28 | 2021-09-28 |
3 | 3 | 101 | 2021-09-29 | 2021-09-30 |
4 | 4 | 101 | 2021-09-30 | 2021-10-01 |
需求:分别从登录明细表(user_login_detail)和配送信息表(delivery_info)中的用户登录时间和下单时间字段,统计登陆次数和交易次数
select t1.user_id,login_date,login_count,`if`(order_count is null,0,order_count) from
(
select user_id,
date_format(login_ts,'yyyy-MM-dd') login_date,
count(*) login_count
from user_login_detail
group by user_id, date_format(login_ts,'yyyy-MM-dd')
)t1 left join
(
select user_id,
order_date,
count(*) order_count
from delivery_info
group by user_id, order_date
)t2 on t1.user_id=t2.user_id and login_date=order_date
order by user_id,login_date;
6、统计每个商品各年度销售总额
需求:从订单明细表(order_detail)中统计每个商品各年度的销售总额。
-- 统计每个商品各年度销售总额
select sku_id,
year(create_date) year_date,
cast(sum(price * sku_num) as decimal(16,2)) sku_sum
from order_detail
group by sku_id,year(create_date);
太简单了,不需要解释。
7、某周内每件商品每天销售情况
需求:从订单详情表(order_detail)中查询2021年9月27号-2021年10月3号这一周所有商品每天销售件数
select sku_id,
sum(`if`(create_date='2021-09-27',sku_num,0)) Monday,
sum(`if`(create_date='2021-09-28',sku_num,0)) Tuesday,
sum(`if`(create_date='2021-09-29',sku_num,0)) Wednesday,
sum(`if`(create_date='2021-09-30',sku_num,0)) Thursday,
sum(`if`(create_date='2021-10-01',sku_num,0)) Friday,
sum(`if`(create_date='2021-10-02',sku_num,0)) Saturday,
sum(`if`(create_date='2021-10-03',sku_num,0)) Sunday
from order_detail
where create_date>='2021-09-27' and create_date<='2021-10-03'
group by sku_id;
或者:
select sku_id,
sum(case when `dayofweek`(create_date)='2' then sku_num else 0 end) Monday,
sum(case when `dayofweek`(create_date)='3' then sku_num else 0 end) Tuesday,
sum(case when `dayofweek`(create_date)='4' then sku_num else 0 end) Wednesday,
sum(case when `dayofweek`(create_date)='5' then sku_num else 0 end) Thursday,
sum(case when `dayofweek`(create_date)='6' then sku_num else 0 end) Friday,
sum(case when `dayofweek`(create_date)='7' then sku_num else 0 end) Saturday,
sum(case when `dayofweek`(create_date)='1' then sku_num else 0 end) Sunday
from order_detail
where create_date>='2021-09-27' and create_date<='2021-10-03'
group by sku_id;
8、同期商品售卖分析表
需求:从订单明细表(order_detail)中,统计同一个商品在2021年和2022年中同一个月的销量对比。
思路1
分别求出2021年和 2020年每个商品每个月的总销量(注意:按照月份进行 group by),然后将两张表进行 full join:
select coalesce(t1.sku_id,t2.sku_id),
coalesce(t1.month,t2.month),
if(t1.2020_sku_sum is null ,0,t1.2020_sku_sum),
if(t2.2021_sku_sum is null ,0,t2.2021_sku_sum)
from(
select sku_id,
month(create_date) `month`,
sum(sku_num) 2020_sku_sum
from order_detail
where year(create_date)='2020'
group by sku_id,month(create_date)
)t1 full outer join
(
select sku_id,
month(create_date) `month`,
sum(sku_num) 2021_sku_sum
from order_detail
where year(create_date)='2021'
group by sku_id,month(create_date)
)t2 on t1.sku_id=t2.sku_id and t1.month=t2.month
思路2
select sku_id,
month(create_date) `month`,
sum(`if`(year(create_date)='2020',sku_num,0)) 2020_sku_sum,
sum(`if`(year(create_date)='2021',sku_num,0)) 2021_sku_sum
from order_detail
where year(create_date) = '2021' or year(create_date) = '2020'
group by sku_id, month(create_date);
这种也好理解,第一次写写错了。
注意点:这里还是要重复提醒一下,group by 后面不能跟函数表达式的别名字段,必须跟函数表达式,或者表格自身的字段。
9、国庆期间每个sku的收藏量和购买量
需求:从订单明细表(order_detail)和收藏信息表(favor_info)中统计2021年国庆节期间(10月1日-10月7日),每个商品的购买总数量和总收藏次数。
-- 国庆期间每个sku的收藏量和购买量
select t1.sku_id,
t1.sku_sum,
`if`(t2.favor_sum is null,0,t2.favor_sum)
from(
select sku_id,
sum(sku_num) sku_sum
from order_detail
where create_date>='2021-10-01' and create_date<='2021-10-07'
group by sku_id
)t1
full join
(
select sku_id,
count(*) favor_sum
from favor_info
where create_date>='2021-10-01' and create_date<='2021-10-07'
group by sku_id
)t2
on t1.sku_id=t2.sku_id;
知识点:这里我们和之前一样对于 null 的单元格填充 0 ,除了用 if 函数用 nvl 更加专业。
10、国庆节期间各品类商品的7日动销率和滞销率
需求:
- 动销率的定义为某品类的商品中一段时间内有销量的商品种类数占当前已上架总商品种类数的比例(有销量的商品种类数/已上架总商品种类数)。
- 滞销率的定义为某分类商品中一段时间内没有销量的商品种类数占当前已上架总商品种类数的比例(没有销量的商品种类数/已上架总商品种类数)。
- 只要当天任一店铺有任何商品的销量就输出该天的统计结果。
- 从订单明细表(order_detail)和商品信息表(sku_info)表中统计国庆节期间(10月1日-10月7日)每天每个分类的商品的动销率和滞销率
select category_id,
cast(first_sale_rate as decimal(16, 2)) first_sale_rate,
cast((1 - first_sale_rate) as decimal(16, 2)) first_unsale_rate,
cast(second_sale_rate as decimal(16, 2)) second_sale_rate,
cast((1 - second_sale_rate) as decimal(16, 2)) second_unsale_rate,
cast(third_sale_rate as decimal(16, 2)) third_sale_rate,
cast((1 - third_sale_rate) as decimal(16, 2)) third_unsale_rate,
cast(fourth_sale_rate as decimal(16, 2)) fourth_sale_rate,
cast((1 - fourth_sale_rate) as decimal(16, 2)) fourth_unsale_rate,
cast(fifth_sale_rate as decimal(16, 2)) fifth_sale_rate,
cast((1 - fifth_sale_rate) as decimal(16, 2)) fifth_unsale_rate,
cast(sixth_sale_rate as decimal(16, 2)) sixth_sale_rate,
cast((1 - sixth_sale_rate) as decimal(16, 2)) sixth_unsale_rate,
cast(seventh_sale_rate as decimal(16, 2)) seventh_sale_rate,
cast((1 - seventh_sale_rate) as decimal(16, 2)) seventh_unsale_rate
from (select category_id,
cast(count(distinct if(t1.create_date = '2021-10-01', t1.sku_id, null)) as decimal(16, 2)) /
count(distinct if(sku_info.from_date <= '2021-10-01', sku_info.sku_id, null)) first_sale_rate,
cast(count(distinct if(t1.create_date = '2021-10-02', t1.sku_id, null)) as decimal(16, 2)) /
count(distinct if(sku_info.from_date <= '2021-10-02', sku_info.sku_id, null)) second_sale_rate,
cast(count(distinct if(t1.create_date = '2021-10-03', t1.sku_id, null)) as decimal(16, 2)) /
count(distinct if(sku_info.from_date <= '2021-10-03', sku_info.sku_id, null)) third_sale_rate,
cast(count(distinct if(t1.create_date = '2021-10-04', t1.sku_id, null)) as decimal(16, 2)) /
count(distinct if(sku_info.from_date <= '2021-10-04', sku_info.sku_id, null)) fourth_sale_rate,
cast(count(distinct if(t1.create_date = '2021-10-05', t1.sku_id, null)) as decimal(16, 2)) /
count(distinct if(sku_info.from_date <= '2021-10-05', sku_info.sku_id, null)) fifth_sale_rate,
cast(count(distinct if(t1.create_date = '2021-10-06', t1.sku_id, null)) as decimal(16, 2)) /
count(distinct if(sku_info.from_date <= '2021-10-06', sku_info.sku_id, null)) sixth_sale_rate,
cast(count(distinct if(t1.create_date = '2021-10-07', t1.sku_id, null)) as decimal(16, 2)) /
count(distinct if(sku_info.from_date <= '2021-10-07', sku_info.sku_id, null)) seventh_sale_rate
from (select sku_id,
create_date
from order_detail
where create_date >= '2021-10-01'
and create_date <= '2021-10-07') t1
right join sku_info
on t1.sku_id = sku_info.sku_id
group by category_id) t2;
3-22
1、查询至少连续三天下单的用户
SELECT DISTINCT user_id
FROM
(
SELECT user_id,datediff(ld,create_date) diff
FROM
(
SELECT user_id,
create_date,
lead(create_date,2,'9999-12-31') OVER (PARTITION BY user_id ORDER BY create_date) ld
FROM
(
-- 查询用户的所有下单日期
SELECT DISTINCT user_id,create_date FROM order_info
)t1
)t2
)t3
WHERE diff=2;
知识点:借助窗口函数 lead 向后两行进行取值,并通过计算每个下单日期和后两行的差值来过滤出连续下单超3天的用户。
2024-4-10:这个思路是存在问题的:如果有人连续5天10天下单那他必然是连续3天下单的,但是diff>2,但是这里如果条件修改为 diff>=2 ,那么即使不连续,它的 diff 也会大于2;
标准答案
SELECT user_id
FROM
(
SELECT user_id,
create_date,
row_number() OVER(PARTITION BY user_id ORDER BY create_date) rk
FROM
(
SELECT DISTINCT user_id,create_date FROM order_info -- 防止用户一天多次下单
)t0
)t1
GROUP BY user_id,date_sub(create_date,rk)
HAVING count(*)>2;
2、查询各品类销售商品的种类数及销量最高的商品
SELECT
t3.category_id,
t3.category_name,
t3.sku_cnt,
t3.name,
t3.sku_id,
t3.order_num
from
(
SELECT
t1.*,
t2.order_num,
row_number() over (
PARTITION BY
t1.category_id
ORDER BY
t2.order_num desc
) rk
FROM
(
-- 商品的分类名
SELECT
ci.category_id,
category_name,
sku_id,
name,
sku_cnt
FROM
category_info ci
JOIN sku_info si ON ci.category_id = si.category_id
LEFT JOIN (
SELECT
category_id,
count(sku_id) sku_cnt
FROM
sku_info
GROUP BY
category_id
) t ON t.category_id = si.category_id
) t1
LEFT JOIN (
SELECT
sku_id,
sum(sku_num) order_num
FROM
order_detail
GROUP BY
sku_id
) t2 ON t1.sku_id = t2.sku_id
) t3
WHERE
rk = 1;
3、查询用户的累计消费金额及VIP等级
SELECT
t1.*,
CASE
WHEN sum_so_far>=0 AND sum_so_far<10000 THEN '普通会员'
WHEN sum_so_far>=10000 AND sum_so_far<30000 THEN '青铜会员'
WHEN sum_so_far>=30000 AND sum_so_far<50000 THEN '白银会员'
WHEN sum_so_far>=50000 AND sum_so_far<80000 THEN '黄金会员'
WHEN sum_so_far>=80000 AND sum_so_far<100000 THEN '白金会员'
WHEN sum_so_far>=100000 THEN '钻石会员'
END vip_level
FROM
(
SELECT user_id,
create_date,
sum(sum_amount) OVER(PARTITION BY user_id ORDER BY create_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum_so_far
FROM
(
SELECT user_id,
create_date,
sum(total_amount) sum_amount
FROM order_info
GROUP BY user_id,create_date
)t
)t1;
4、查询首次下单后第二天连续下单的用户比率
SELECT concat(round(count(DISTINCT user_id)/10*100,1),'%') percentage
FROM
(
SELECT t1.user_id,
datediff(ld,create_date) diff,
t2.cnt
from
(
SELECT user_id,
create_date,
lead(create_date,1,'9999-12-31') OVER(PARTITION BY user_id ORDER BY create_date) ld
FROM order_info
GROUP BY user_id,create_date
)t1
JOIN (SELECT count(DISTINCT user_id) cnt FROM order_info)t2
)t3
WHERE diff=1;
3-24
1、筛选去年总销量小于100的商品
需求:从订单明细表(order_detail)中筛选出去年总销量小于100的商品及其销量,假设今天的日期是2022-01-10,不考虑上架时间小于一个月的商品
SELECT t1.sku_id,t2.name,t1.order_num FROM
(
SELECT sku_id,
sum(sku_num) order_num
FROM order_detail od
WHERE date_format(create_date,'yyyy')='2021'
GROUP BY sku_id
)t1
RIGHT JOIN
(
SELECT sku_id,
name
FROM sku_info
WHERE datediff('2022-01-10',from_date)>30
)t2 ON t1.sku_id=t2.sku_id
WHERE t1.order_num<100;
3-25、
1、向用户推荐朋友收藏的商品
SELECT DISTINCT t1.user1_id user_id,t2.sku_id
from friendship_info t1
JOIN favor_info t2
on t1.user2_id=t2.user_id
where concat(t1.user1_id,t2.sku_id) not in (
SELECT concat(user_id,sku_id)
from favor_info
);
3-26
1、统计每个商品的销量最高的日期
SELECT sku_id,create_date,sum_num
FROM
(
SELECT t1.*,
row_number() over(PARTITION BY sku_id ORDER BY sum_num DESC) rk
FROM
(
SELECT sku_id,
create_date,
sum(sku_num) sum_num
FROM order_detail
GROUP BY sku_id,create_date
)t1
)t2 WHERE rk=1;
2、查询每日新用户数
SELECT login_date_first,count(*) user_count
FROM
(
SELECT
user_id,
min(date_format(login_ts,'yyyy-MM-dd')) login_date_first
FROM user_login_detail
GROUP BY user_id
)t1
GROUP BY login_date_first;
3、查询销售件数高于品类平均数的商品
SELECT t2.sku_id,t2.name,t2.sum_num,t4.cate_avg_num
FROM
(
SELECT od.sku_id,name,category_id,sum(sku_num) sum_num
FROM order_detail od
JOIN sku_info si ON od.sku_id=si.sku_id
GROUP BY category_id,od.sku_id,name
)t2
LEFT JOIN
(
-- 各品类的总销量
SELECT category_id,CAST (sum(sku_num)/count(DISTINCT sku_id) AS int) cate_avg_num
FROM
(
SELECT od.*,si.category_id FROM order_detail od
JOIN sku_info si ON od.sku_id=si.sku_id
)t3
GROUP BY category_id
)t4 ON t2.category_id=t4.category_id
WHERE t2.sum_num>t4.cate_avg_num;
注意:这里犯了个错,join 操作的时候直接在外层指定投影字段,没必要里边再查询子表的部分字段,比如上面的子表 t2 原本是这样的:
SELECT od.sku_id,name,category_id,sum(sku_num) sum_num
FROM order_detail od
JOIN
(
SELECT si.sku_id,si.category_id FROM
sku_info si ON od.sku_id=si.sku_id
)t1
GROUP BY category_id,od.sku_id,name
完全可以这样:
SELECT od.sku_id,name,category_id,sum(sku_num) sum_num
FROM order_detail od
JOIN sku_info si ON od.sku_id=si.sku_id
GROUP BY category_id,od.sku_id,name
其实效果是差不多的,但是上面的让人读的时候思路更加清晰,本来嵌套查询就够乱的了。
4、用户注册、登录、下单综合统计
SELECT DISTINCT t1.user_id,t1.register_date,t2.total_login_count,t3.login_count_2021,t4.order_count_2021,t5.order_amount_2021
FROM
(
-- 查询用户首日登录日期
SELECT user_id,
min(date_format(login_ts,'yyyy-MM-dd')) register_date
FROM user_login_detail
GROUP BY user_id
)t1
LEFT JOIN
(
-- 查询用户总登录次数
SELECT user_id,count(*) total_login_count
FROM user_login_detail
GROUP BY user_id
)t2 ON t1.user_id=t2.user_id
LEFT JOIN
(
-- 查询用户2021年的登录次数
SELECT user_id,count(*) login_count_2021
FROM user_login_detail
WHERE year(login_ts)='2021'
GROUP BY user_id
)t3 ON t2.user_id=t3.user_id
LEFT JOIN
(
-- 2021年下单次数
SELECT user_id,count(*) order_count_2021
FROM order_info
WHERE year(create_date)='2021'
GROUP BY user_id
)t4 ON t3.user_id=t4.user_id
LEFT JOIN
(
-- 2021年下单金额
SELECT user_id,sum(total_amount) order_amount_2021
FROM order_info
WHERE year(create_date)='2021'
GROUP BY user_id
)t5 ON t4.user_id=t5.user_id
WHERE t4.order_count_2021 IS NOT NULL;
5、查询指定日期的全部商品价格
需求:查询所有商品(sku_info表)截至到2021年10月01号的最新商品价格(需要结合价格修改表进行分析)
SELECT si.sku_id,CAST(coalesce(t2.new_price,price) AS DECIMAL(16,2)) price
FROM sku_info si
LEFT JOIN
(
-- 查询截止2021-10-01修改的商品id
SELECT sku_id,new_price
FROM
(
SELECT sku_id,
new_price,
row_number() over(PARTITION BY sku_id ORDER BY change_date DESC) rk
FROM sku_price_modify_detail
WHERE datediff('2021-10-01',change_date)>=0
)t1 WHERE rk=1
)t2 ON si.sku_id=t2.sku_id;
6、及时订单比例
SELECT CAST(sum(IF(t2.first_order_date=t2.custom_date,1,0))/count(*) AS DECIMAL(16,2)) percentage
FROM
(
-- 查询所有用户的首单日期和期望配送日期
SELECT DISTINCT t1.user_id,t1.first_order_date,di.custom_date
FROM
(
SELECT user_id,
min(order_date) first_order_date
FROM delivery_info
GROUP BY user_id
)t1
LEFT JOIN delivery_info di
ON di.user_id=t1.user_id AND t1.first_order_date=di.order_date
)t2;
7、男性和女性每日的购物总金额统计
需求:从订单信息表(order_info)和用户信息表(user_info)中,分别统计每天男性和女性用户的订单总金额,如果当天男性或者女性没有购物,则统计结果为0。
SELECT create_date,
sum(IF(gender='男',total_amount,0)) total_amount_male,
sum(IF(gender='女',total_amount,0)) total_amount_female
FROM
(
SELECT oi.user_id,gender,total_amount,create_date
FROM order_info oi
LEFT JOIN user_info ui ON oi.user_id=ui.user_id
)t1
GROUP BY create_date;
或者
SELECT create_date,
SUM(CASE WHEN gender = '男' THEN total_amount ELSE 0 END) AS total_amount_male,
SUM(CASE WHEN gender = '女' THEN total_amount ELSE 0 END) AS total_amount_female
FROM (
SELECT oi.user_id, ui.gender, oi.total_amount, oi.create_date
FROM order_info oi
LEFT JOIN user_info ui ON oi.user_id = ui.user_id
) t1
GROUP BY create_date;
3-27
1、购买过商品1和商品2但是没有购买商品3的顾客
SELECT user_id
FROM
(
SELECT user_id,
sum(IF(sku_id IN (1,2),1,0)) sum_num,
MAX(CASE WHEN sku_id = 3 THEN 1 ELSE 0 END) AS has_sku_3
FROM
(
SELECT DISTINCT od.sku_id,oi.user_id
FROM order_detail od
JOIN order_info oi ON od.order_id=oi.order_id
)t1
GROUP BY user_id
)t2
WHERE has_sku_3=0 AND sum_num=2;
判断是否包含 sku_id = 3 的记录不能使用功能 if,这里用的是 MAX(CASE WHEN) 新的思路,需要好好记住!
2、订单金额趋势分析
SELECT
create_date,
cast(sum(total_amount) over(partition by ds rows between 2 preceding and current ROW) as decimal(16,2)) total_3d,
cast(avg(total_amount) over(partition by ds rows between 2 preceding and current row) as decimal(16,2)) avg_3d
FROM
(
SELECT
t1.create_date,
total_amount,
date_sub(create_date,row_number() over(order by create_date)) ds
FROM
(
SELECT
create_date,
sum(total_amount) total_amount
FROM order_info
group by create_date
)t1
)t2;
3-29
1、统计每日商品1和商品2销量的差值
SELECT coalesce(t1.create_date,t2.create_date) create_date,
IF(t1.sku_num is null,0,t1.sku_num)-IF(t2.sku_num is null,0,t2.sku_num) diff
FROM
(
SELECT create_date,
sum(sku_num) sku_num
FROM order_detail
WHERE sku_id='1'
GROUP BY create_date
)t1
FULL OUTER JOIN
(
SELECT create_date,
sum(sku_num) sku_num
FROM order_detail
WHERE sku_id='2'
GROUP BY create_date
)t2
ON t1.create_date=t2.create_date;
2、查询出每个用户的最近三笔订单
SELECT user_id,
order_id,
create_date
FROM
(
SELECT user_id,
order_id,
create_date,
dense_rank() OVER(PARTITION BY user_id ORDER BY create_date DESC) rk
FROM order_info
)t1
WHERE rk<=3;
3、查询每个用户登录日期的最大空档期
SELECT user_id,
max(diff) max_diff
FROM
(
SELECT user_id,
datediff(date_format(lead(login_ts,1,'2021-10-10') OVER(PARTITION BY user_id ORDER BY login_ts),'yyyy-MM-dd'),date_format(login_ts,'yyyy-MM-dd')) diff
FROM user_login_detail
)t1
GROUP BY user_id;
4、查询统一时刻多地登录的用户
SELECT DISTINCT user_id
FROM
(
SELECT user_id,
if(ip != next_ip AND login_ts <= next_login_ts,1,0) status
FROM
(
SELECT user_id,
ip_address ip,
login_ts,
lead(ip_address,1) OVER(PARTITION BY user_id ORDER BY login_ts) next_ip,
lead(login_ts,1) OVER(PARTITION BY user_id ORDER BY login_ts) next_login_ts
FROM user_login_detail
)t1
)t2
WHERE t2.status=1;
5、各品类销量前三的所有商品
SELECT sku_id,category_id
FROM
(
SELECT t1.sku_id,
sum_num,
si.category_id,
row_number() OVER(PARTITION BY si.category_id ORDER BY sum_num DESC) rk
FROM
(
SELECT sku_id,sum(sku_num) sum_num FROM order_detail GROUP BY sku_id
)t1
JOIN sku_info si ON si.sku_id=t1.sku_id
)t2 WHERE rk<=3;
4-9
网站崩了几天,所以没更。
1、各品类中的商品价格中位数
需求:从商品信息表(sku_info)求出各分类商品价格的中位数,如果一个分类下的商品个数为偶数则输出中间两个值的平均值,如果是奇数,则输出中间数即可。
思路1
SELECT category_id,
CAST(sum(price)/count(*) AS DECIMAL(16,2)) medprice
FROM
(
SELECT category_id,
price,
rk,
max(rk) OVER(PARTITION BY category_id) max_rk
FROM
(
SELECT category_id,
price,
row_number() OVER(PARTITION BY category_id ORDER BY price) rk
FROM sku_info
)t1
)t2
WHERE rk IN (ceil((max_rk+1)/2),floor((max_rk+1)/2))
GROUP BY category_id;
思路2
select
category_id,
cast(avg(price) as DECIMAL(10,2)) medprice
from (
select
*,
ceil(count / 2) mid,
`if`(count % 2 = 0, 1, 0) flag
from (
select
sku_id,
category_id,
price,
row_number() over (partition by category_id order by price) rn,
count(1) over (partition by category_id) count
from sku_info
)t
)t1 where rn = mid or rn = mid + flag
group by category_id;
4-10
1、找出销售额连续3天超过100的商品
需求:从订单详情表(order_detail)中找出销售额连续3天超过100的商品。
SELECT DISTINCT sku_id
FROM
(
SELECT sku_id,
create_date,
row_number() OVER(PARTITION BY sku_id ORDER BY create_date) rk
FROM
GROUP BY sku_id,create_date
HAVING sum(price*sku_num)>100
)t1
GROUP BY sku_id,date_sub(create_date,rk)
HAVING count(*)>2;
子表 t1 中的 GROUP BY sku_id,create_date 的作用是对一天多个订单进行去重。
2、查询所有用户的连续登录两天及以上的日期区间
需求:从登录明细表(user_login_detail)中查询出,所有用户的连续登录两天及以上的日期区间,以登录时间(login_ts)为准。
SELECT user_id,
min(login_date) start_date,
max(login_date) end_date
FROM
(
SELECT user_id,
login_date,
row_number() OVER(PARTITION BY user_id ORDER BY login_date) rk
FROM
(
SELECT DISTINCT user_id,date_format(login_ts,'yyyy-MM-dd') login_date
FROM user_login_detail
)t1
)t2
GROUP BY user_id,date_sub(login_date,rk)
HAVING count(*)>1;
3、求出商品连续售卖的时间区间
需求:从订单详情表(order_detail)中,求出商品连续售卖的时间区间(1天也算连续)
SELECT sku_id,
min(create_date) start_date,
max(create_date) end_date
FROM
(
SELECT sku_id,
create_date,
row_number() OVER(PARTITION BY sku_id ORDER BY create_date) rk
FROM
(
SELECT DISTINCT sku_id,create_date FROM order_detail
)t1
)t2
GROUP BY sku_id,date_sub(create_date,rk)
HAVING count(*)>0;
4、登录次数及交易次数统计
需求:分别从登陆明细表(user_login_detail)和配送信息表中用户登录时间和下单时间统计登陆次数和交易次数。
SELECT coalesce(t1.user_id,t0.user_id) user_id,
coalesce(t1.login_date,t0.order_date) login_date,
nvl(t1.login_count,0) login_count,
nvl(t0.order_count,0) order_count
FROM
(
SELECT user_id,
order_date,
count(order_date) order_count
FROM delivery_info
GROUP BY user_id,order_date
)t0
RIGHT JOIN
(
SELECT user_id,
date_format(login_ts,'yyyy-MM-dd') login_date,
count(user_id) login_count
FROM user_login_detail
GROUP BY user_id,date_format(login_ts,'yyyy-MM-dd')
)t1
ON t0.user_id=t1.user_id AND t1.login_date=t0.order_date
;
备注:下单的前提的登录,所以是 right join (个人感觉 full join )更合理
4-11
1、查看每件商品的售价涨幅情况
需求:从商品价格变更明细表(sku_price_modify_detail),得到最近一次价格的涨幅情况,并按照涨幅升序排序。
SELECT sku_id,
price_change
FROM
(
SELECT sku_id,
change_date,
rk,
new_price-lead(new_price,1,0) OVER(PARTITION BY sku_id ORDER BY change_date DESC) price_change
FROM
(
SELECT sku_id,
change_date,
new_price,
row_number() OVER(PARTITION BY sku_id ORDER BY change_date DESC) rk
FROM sku_price_modify_detail
)t0
WHERE rk<=2
)t1
WHERE rk=1;
2、同期商品售卖分析表
需求:从订单明细表(order_detail)中。求出同一个商品在2021年和2022年中同一个月的售卖情况对比。
SELECT t0.sku_id,
nvl(t0.`month`,t1.`month`) `month`,
nvl(t1.2020_skusum,0) 2020_skusum,
nvl(t0.2021_skusum,0) 2021_skusum
FROM
(
SELECT sku_id,
CAST(date_format(create_date,'MM') AS bigint) `month`,
sum(sku_num) 2021_skusum
FROM order_detail
WHERE date_format(create_date,'yyyy')='2021'
GROUP BY sku_id,date_format(create_date,'MM')
)t0
FULL JOIN
(
SELECT sku_id,
CAST(date_format(create_date,'MM') AS bigint) `month`,
sum(sku_num) 2020_skusum
FROM order_detail
WHERE date_format(create_date,'yyyy')='2020'
GROUP BY sku_id,date_format(create_date,'MM')
)t1 ON t0.sku_id=t1.sku_id AND t0.month=t1.month;
3、国庆期间每个品类的商品的收藏量和购买量
需求:从订单明细表(order_detail)和收藏信息表(favor_info)统计2021国庆期间,每个商品总收藏量和购买量。
SELECT t0.sku_id,
nvl(sku_sum,0) sku_sum,
nvl(favor_cn,0) favor_cn
FROM
(
SELECT sku_id,
sum(sku_num) sku_sum
FROM order_detail
WHERE create_date>='2021-10-01' AND create_date<='2021-10-07'
GROUP BY sku_id
)t0
FULL JOIN
(
SELECT sku_id,
count(*) favor_cn
FROM favor_info
WHERE create_date>='2021-10-01' AND create_date<='2021-10-07'
GROUP BY sku_id
)t1
ON t0.sku_id=t1.sku_id;
4、连续签到领金币数
需求:
- 用户每天签到可以领1金币,并可以累计签到天数,连续签到的第3、7天分别可以额外领2和6金币。
- 每连续签到7天重新累积签到天数。
- 从用户登录明细表中求出每个用户金币总数,并按照金币总数倒序排序。
SELECT coalesce(t1.user_id,t2.user_id) user_id,
IF(t2.coins IS NOT NULL,t1.coin+t2.coins,t1.coin) sum_coin_cn
FROM
-- 查询用户登录次数
(
SELECT user_id,
sum(1) coin
FROM
(
SELECT DISTINCT
user_id,
date_format(login_ts,'yyyy-MM-dd') login_date
FROM user_login_detail
)t0
GROUP BY user_id
)t1
LEFT JOIN
(
-- 查询用户连续登录的次数所能获得的金币
SELECT user_id,
IF(count(*)<7,2,8) coins
FROM
(
SELECT user_id,
login_date,
row_number() OVER(PARTITION BY user_id ORDER BY login_date) rk
FROM
(
SELECT DISTINCT
user_id,
date_format(login_ts,'yyyy-MM-dd') login_date
FROM user_login_detail
)t
)t1
GROUP BY user_id,date_sub(login_date,rk)
HAVING count(*)>2
)t2
ON t1.user_id=t2.user_id
;
4-12
1、每个商品销售首年的年份、销售数量和销售金额
需求:从订单明细表(order_detail)统计每个商品销售首年的年份,销售数量和销售总额。
SELECT sku_id,
`year`,
sum(price*sku_num) order_amount,
sum(sku_num) order_num
FROM
(
SELECT * FROM
(
SELECT sku_id,
price,
create_date,
sku_num
FROM order_detail
)t1
RIGHT JOIN
(
-- 销售首年
SELECT sku_id,
min(date_format(create_date,'yyyy')) `year`
FROM order_detail
GROUP BY sku_id
)t2 ON t1.sku_id=t2.sku_id AND date_format(create_date,'yyyy')=`year`
)t3 GROUP BY sku_id,`year`
;
2、销售订单首购和次购分析
需求:通过商品信息表(sku_info)订单信息表(order_info)订单明细表(order_detail)分析如果有一个用户成功下单两个及两个以上的购买成功的手机订单(购买商品为xiaomi 10,apple 12,小米13)那么输出这个用户的id及第一次成功购买手机的日期和第二次成功购买手机的日期,以及购买手机成功的次数。
SELECT user_id,
min(create_date) first_date,
max(create_date) last_date,
max(rk) cn
FROM
(
SELECT user_id,
sku_num,
oi.create_date,
row_number() OVER(PARTITION BY user_id ORDER BY oi.create_date) rk
FROM order_info oi
RIGHT JOIN
(
SELECT od.order_id,
od.sku_id,
od.sku_num,
od.create_date,name
FROM order_detail od
RIGHT JOIN
(
SELECT sku_id,name
FROM sku_info
WHERE name IN ('xiaomi 10','apple 12','xiaomi 13')
)t ON t.sku_id=od.sku_id
)t1 ON t1.order_id=oi.order_id
)t2
GROUP BY user_id
;
3、查询有新注册用户的当天的新用户数量、新用户的第一天留存率
需求:从用户登录明细表(user_login_detail)中首次登录算作当天新增,第二天也登录了算作一日留存
SELECT t2.register_date first_login,
t2.register,
CAST(t5.retents/t2.register AS DECIMAL(16,2)) retention
FROM
(
SELECT register_date,
count(register_date) register
FROM
(
SELECT user_id,
min(date_format(login_ts,'yyyy-MM-dd')) register_date
FROM user_login_detail
GROUP BY user_id
)t1
GROUP BY register_date
)t2
JOIN
(
SELECT first_login,
sum(retent) retents
FROM
(
SELECT user_id,
login_date first_login,
IF(datediff(next_login_date,login_date)=1,1,0) retent
FROM
(
SELECT user_id,
login_date,
lead(login_date,1,'9999-12-31') OVER(PARTITION BY user_id ORDER BY login_date) next_login_date
FROM
(
SELECT user_id,
login_date,
rk
FROM
(
SELECT DISTINCT
user_id,
date_format(login_ts,'yyyy-MM-dd') login_date,
row_number() OVER(PARTITION BY user_id ORDER BY date_format(login_ts,'yyyy-MM-dd')) rk
FROM user_login_detail
)t1
WHERE rk<=2
)t2
)t3
WHERE next_login_date != '9999-12-31'
)t4
GROUP BY first_login
)t5
ON t5.first_login=register_date;