也可在 微信公众号 上观看
1、说说问题
遇到了一个问题:给定一个订单表,里面包含用户的商品购买记录,求解用户每个月的复购率。
复购率定义:12月下单的用户中历史上在广告和自然渠道支付成功超过2单的用户数 / 12月支付成功的用户总数
直接上输入数据(文末有数据创建sql),看前后结果更直观点儿。
字段解释:
- order_id:订单id
- pass_id:用户id
- flow_info:来源,(3,4)代表广告渠道,其他代表自然渠道。注意:in 的反义不是 not in!
- status:订单状态,200代表支付成功
- app_id:业务线id,这个关系不大
- time:订单创建时间
结果:
字段解释:
- channel:频道
- repurchase_user:重复购买人数
- all_users:总下单人数
- proportion:比例
2、解决方案
思路:
首先拆分问题,问题需要每个月,咱就先查一个月的,例如12月的。
(1)查询到12月下单的用户集合 N,注意要去重!
(2)查询12月底及之前的历史时间内、且是N中用户的下单记录集合F,注意这里最直白就是用 where pass_id in ( N ),为了加速使用left join。
(3)统计F中用户的下单数集合 及数量大于等于2的结果,并获得12月内的复购用户数A。
(4)查询12月内下单的总用户数B
(5)A/B得到结果。
select
f1.channel,
f1.repurchase_user,
f2.all_users,
ROUND(f1.repurchase_user / f2.all_users, 2) as proportion
from
(
# 查询在12月重复购买的用户数
select
channel,
sum(if(order_nums > 1, 1, 0)) as repurchase_user
from(
SELECT
pass_id,
channel,
count(order_id) as order_nums
from(
SELECT
t1.*,
t2.order_id
from
(
-- 12月内频道、用户去重数据
select
distinct pass_id,
case
when flow_info in (3, 4) then 'ad'
when flow_info not in (3, 4)
or flow_info is null then 'nad'
end channel
from
order_info
where
app_id = 4
and `status` = 200
and time BETWEEN '2021-12-01'
and '2021-12-31'
) t1
left JOIN (
# 获取12月及之前用户的全部购买记录
select
pass_id,
case
when flow_info in (3, 4) then 'ad'
when flow_info not in (3, 4)
or flow_info is null then 'nad'
end channel,
order_id
from
order_info
where
app_id = 4
and `status` = 200
and time <= '2021-12-31'
) t2 ON t1.pass_id = t2.pass_id
and t1.channel = t2.channel
) t
GROUP BY
channel,
pass_id
) m
group by
channel
) f1
left join (
# 查询在12月购买的总用户数
SELECT
channel,
count(pass_id) as all_users
from
(
select
distinct pass_id,
case
when flow_info in (3, 4) then 'ad'
when flow_info not in (3, 4)
or flow_info is null then 'nad'
end channel
from
order_info
where
app_id = 4
and `status` = 200
and time BETWEEN '2021-12-01'
and '2021-12-31'
) p
group by
channel
) f2 on f1.channel = f2.channel
3、附加需求:求每个月总体的复购率
思路比较简单:就是将每个各个频道的数据进行相加。
首先将前面的数据进行转储成一个新的表tb_repurchase_rate,并新加一列时间date,例如’2021-12’,之后看sql…
select
*
from
(
select
date,
'ALL' as channel,
sum(repurchase_user) as repurchase_user,
sum(all_users) as all_users,
sum(repurchase_user) / sum(all_users) as proportion
from
tb_repurchase_rate
where
date between '202111'
and '202201'
group by
date
UNION ALL
select
date,
channel,
repurchase_user,
all_users,
proportion
from
tb_repurchase_rate
where
date between '202111'
and '202201'
) t
order by
t.date,
t.channel desc
4、附录:输入数据结构及数据SQL
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for order_info
-- ----------------------------
DROP TABLE IF EXISTS `order_info`;
CREATE TABLE `order_info` (
`order_id` varchar(255) NOT NULL,
`pass_id` varchar(255) DEFAULT NULL,
`flow_info` int DEFAULT NULL,
`status` int DEFAULT NULL,
`app_id` int DEFAULT NULL,
`time` date DEFAULT NULL,
PRIMARY KEY (`order_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
-- ----------------------------
-- Records of order_info
-- ----------------------------
BEGIN;
INSERT INTO `order_info` VALUES ('100', 'abc', 3, 200, 4, '2021-11-18');
INSERT INTO `order_info` VALUES ('101', 'ccc', 4, 200, 4, '2022-01-03');
INSERT INTO `order_info` VALUES ('102', 'aaa', NULL, 200, 4, '2021-10-05');
INSERT INTO `order_info` VALUES ('103', 'ccc', 3, 200, 4, '2021-09-09');
INSERT INTO `order_info` VALUES ('104', 'ddd', 1, 200, 3, '2021-12-16');
INSERT INTO `order_info` VALUES ('105', 'abc', 3, 200, 4, '2021-12-09');
INSERT INTO `order_info` VALUES ('106', 'def', 3, 200, 4, '2021-12-16');
INSERT INTO `order_info` VALUES ('107', 'aaa', NULL, 200, 4, '2021-12-22');
INSERT INTO `order_info` VALUES ('108', 'abc', 4, 200, 4, '2021-11-06');
INSERT INTO `order_info` VALUES ('109', 'ppp', 3, 200, 4, '2021-12-24');
INSERT INTO `order_info` VALUES ('110', 'ppp', 3, 200, 4, '2021-10-20');
INSERT INTO `order_info` VALUES ('111', 'aaa', NULL, 200, 4, '2021-12-17');
COMMIT;
SET FOREIGN_KEY_CHECKS = 1;