SQL统计次月留存率

1.需求如下:

  • 将下面的表命名为sheet1,表结构有以下字段,saleid(销售id) 为主键,rzdate(入职日期) 为员工的入职日期,lzdate(离职日期) 为员工的离职日期,dq(地区)为员工的地区,需要统计2020年1-12月每月新增的入职人数,以及在2020年1-12月的留存人数。
  • 源表:                                                                  

  • 需要输出的结果表

2.SQL Server 数据库进行统计

1.首先用sql server 创建一个表 用来存储2020年1-12月份的数据,且单独插入一条'9999-01' 的数据为了后面匹配离职日期为空的数据

‘-- 创建表  
CREATE TABLE MonthlyData (  
    MonthYear VARCHAR(10) PRIMARY KEY,  -- 使用VARCHAR来存储日期字符串  
    -- 如果需要日期类型,并且接受非标准格式,可以使用DATE或DATETIME,但需要进行转换或处理  
    -- MonthYear DATE  -- 例如,使用非标准日期格式并接受它可能带来的问题  
);  
  
-- 插入2020年1月到12月的数据  
DECLARE @Counter INT = 1;  
WHILE @Counter <= 12  
BEGIN  
    INSERT INTO MonthlyData (MonthYear)  
    VALUES (CONCAT('2020-', FORMAT(@Counter, '00')));  
    SET @Counter = @Counter + 1;  
END; 

-- 插入'9999-01'的数据(作为字符串)  
INSERT INTO MonthlyData (MonthYear)  
VALUES ('9999-01'); 

查询生成的monthlyData 数据

2.利用monthlyData 产生一个笛卡尔积的数据(ta),并把一列作为入职月份,另外一列作为离职月份,离职月份必须大于等于入职月份,并且剃掉入职月份为'9999-01' 的数据

SELECT a.monthyear as rzmonth,b.MonthYear as lzmonth FROM MonthlyData a,MonthlyData b  where a.MonthYear<>'9999-01' and a.MonthYear<=b.MonthYear ;

3.统计出2020年每个入职离职月份的数据(t1),rzmonth(入职月份),lzmonth(离职月份),cnt(人数),且将离职月份为空的数据和离职月份大于2020-12的月份转换为‘9999-01’

select rzmonth,count(distinct saleid) as cnt,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end as lzmonth
from (
select saleid,
FORMAT(rzdate, 'yyyy-MM') as rzmonth,FORMAT(lzdate, 'yyyy-MM') as lzmonth from sheet1 t1
where year(rzdate)='2020' 
) a 
group by rzmonth,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end

4.用第2步笛卡尔积产生的表(ta)做为主表 关联 第3步统计的数据作为从表(t1)作为(tb),确保每一个离职月份都有数据,且将没有离职月份数据的人数由空值转为0

with t1 as
(select rzmonth,count(distinct saleid) as cnt,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end as lzmonth
from (
select saleid,
FORMAT(rzdate, 'yyyy-MM') as rzmonth,FORMAT(lzdate, 'yyyy-MM') as lzmonth from sheet1 t1
where year(rzdate)='2020' 
) a 
group by rzmonth,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end
),
ta AS (SELECT a.monthyear as rzmonth,b.MonthYear as lzmonth FROM MonthlyData a,MonthlyData b  where a.MonthYear<>'9999-01' and a.MonthYear<=b.MonthYear ), -- 作为主表
tb as (select ta.rzmonth,ta.lzmonth,ISNULL(t1.cnt, 0) as cnt from ta left join t1 on ta.rzmonth=t1.rzmonth and ta.lzmonth=t1.lzmonth)
select * from tb order by rzmonth,lzmonth

这步的操作就是为了 当源数据 2020年1月份入职 但是在2020年4月份没有人离职时,也产生一条数据

5.利用开窗函数,分别求 每月入职的总人数  及 每月入职,按离职月份升序的离职人数(t2

with t1 as
(select rzmonth,count(distinct saleid) as cnt,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end as lzmonth
from (
select saleid,
FORMAT(rzdate, 'yyyy-MM') as rzmonth,FORMAT(lzdate, 'yyyy-MM') as lzmonth from sheet1 t1
where year(rzdate)='2020' 
) a 
group by rzmonth,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end
),
ta AS (SELECT a.monthyear as rzmonth,b.MonthYear as lzmonth FROM MonthlyData a,MonthlyData b  where a.MonthYear<>'9999-01' and a.MonthYear<=b.MonthYear ), -- 作为主表
tb as (select ta.rzmonth,ta.lzmonth,ISNULL(t1.cnt, 0) as cnt from ta left join t1 on ta.rzmonth=t1.rzmonth and ta.lzmonth=t1.lzmonth),
t2 as(
select rzmonth,cnt,lzmonth,
sum(cnt) over(partition by CAST(rzmonth AS char(7)) order by CAST(lzmonth AS char(7)) asc) as ljlzrs, -- 按照入职月份分组,离职月份升序排序 累加人数
sum(cnt) over(partition by rzmonth) zezrs  -- -- 按照入职月份分组,求每个入职月份的入职人数
from tb )
select * from t2

 

解读以上结果:2020年1月入职,2020年1月离职的 人数是4个,2020年1月入职,2020年2月离职的人数3(cnt),2020年1月入职到2020年2月离职的人数是3+4=7(ljzrs),

2020年1月入职的总人数是24(zezrs)

6.在5步骤的基础上,用总人数-累加的离职人数(zers-ljlzrs)即为每个月份留下来的人数(t3

with t1 as
(select rzmonth,count(distinct saleid) as cnt,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end as lzmonth
from (
select saleid,
FORMAT(rzdate, 'yyyy-MM') as rzmonth,FORMAT(lzdate, 'yyyy-MM') as lzmonth from sheet1 t1
where year(rzdate)='2020' 
) a 
group by rzmonth,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end
),
ta AS (SELECT a.monthyear as rzmonth,b.MonthYear as lzmonth FROM MonthlyData a,MonthlyData b  where a.MonthYear<>'9999-01' and a.MonthYear<=b.MonthYear ), -- 作为主表
tb as (select ta.rzmonth,ta.lzmonth,ISNULL(t1.cnt, 0) as cnt from ta left join t1 on ta.rzmonth=t1.rzmonth and ta.lzmonth=t1.lzmonth),
t2 as(
select rzmonth,cnt,lzmonth,
sum(cnt) over(partition by CAST(rzmonth AS char(7)) order by CAST(lzmonth AS char(7)) asc) as ljlzrs, -- 按照入职月份分组,离职月份升序排序 累加人数
sum(cnt) over(partition by rzmonth) zezrs  -- -- 按照入职月份分组,求每个入职月份的入职人数
from tb ),
t3 as (
select rzmonth,lzmonth,zezrs,zezrs-ljlzrs as lcrs
from t2 )
select * from t3

7.在t3的基础再去做行列转换,生成结果,最终代码如下

with t1 as
(select rzmonth,count(distinct saleid) as cnt,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end as lzmonth
from (
select saleid,
FORMAT(rzdate, 'yyyy-MM') as rzmonth,FORMAT(lzdate, 'yyyy-MM') as lzmonth from sheet1 t1
where year(rzdate)='2020' 
) a 
group by rzmonth,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end
),
ta AS (SELECT a.monthyear as rzmonth,b.MonthYear as lzmonth FROM MonthlyData a,MonthlyData b  where a.MonthYear<>'9999-01' and a.MonthYear<=b.MonthYear ), -- 作为主表
tb as (select ta.rzmonth,ta.lzmonth,ISNULL(t1.cnt, 0) as cnt from ta left join t1 on ta.rzmonth=t1.rzmonth and ta.lzmonth=t1.lzmonth),
t2 as(
select rzmonth,cnt,lzmonth,
sum(cnt) over(partition by CAST(rzmonth AS char(7)) order by CAST(lzmonth AS char(7)) asc) as ljlzrs, -- 按照入职月份分组,离职月份升序排序 累加人数
sum(cnt) over(partition by rzmonth) zezrs  -- -- 按照入职月份分组,求每个入职月份的入职人数
from tb ),
t3 as (
select rzmonth,lzmonth,zezrs,zezrs-ljlzrs as lcrs
from t2 )
select rzmonth as 入职月份,zezrs as 入职人数,sum(case when lzmonth='2020-01' then lcrs else 0 end) as month_01,
sum(case when lzmonth='2020-02' then lcrs else 0 end) as month_02,
sum(case when lzmonth='2020-03' then lcrs else 0 end) as month_03,
sum(case when lzmonth='2020-04' then lcrs else 0 end) as month_04,
sum(case when lzmonth='2020-05' then lcrs else 0 end) as month_05,
sum(case when lzmonth='2020-06' then lcrs else 0 end) as month_06,
sum(case when lzmonth='2020-07' then lcrs else 0 end) as month_07,
sum(case when lzmonth='2020-08' then lcrs else 0 end) as month_08,
sum(case when lzmonth='2020-09' then lcrs else 0 end) as month_09,
sum(case when lzmonth='2020-10' then lcrs else 0 end) as month_10,
sum(case when lzmonth='2020-11' then lcrs else 0 end) as month_11,
sum(case when lzmonth='2020-12' then lcrs else 0 end) as month_12
from t3 group by rzmonth,zezrs  order by rzmonth asc

结果解读:在2020年1月入职总人数有24人,在1月留存人数有20人

(注:以上方法的口径为:如果当月就离职就按当月不留存)

8.如果当月入职,当月离职算成本月的留存,应当在离职日期月份再加上一个月

with t1 as
(select rzmonth,count(distinct saleid) as cnt,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end as lzmonth
from (
select saleid,
FORMAT(rzdate, 'yyyy-MM') as rzmonth,FORMAT(DATEADD(MONTH, 1, CAST(lzdate AS DATE)), 'yyyy-MM') as lzmonth from sheet1 t1
where year(rzdate)='2020' 
) a 
group by rzmonth,case when (lzmonth is null or lzmonth>'2020-12') then '9999-01' else lzmonth end),
ta AS (SELECT a.monthyear as rzmonth,b.MonthYear as lzmonth FROM MonthlyData a,MonthlyData b  where a.MonthYear<>'9999-01' and a.MonthYear<=b.MonthYear ), -- 作为主表
tb as (select ta.rzmonth,ta.lzmonth,ISNULL(t1.cnt, 0) as cnt from ta left join t1 on ta.rzmonth=t1.rzmonth and ta.lzmonth=t1.lzmonth),
t2 as(
select rzmonth,cnt,lzmonth,
sum(cnt) over(partition by CAST(rzmonth AS char(7)) order by CAST(lzmonth AS char(7)) asc) as ljlzrs, -- 按照入职月份分组,离职月份升序排序 累加人数
sum(cnt) over(partition by rzmonth) zezrs  -- -- 按照入职月份分组,求每个入职月份的入职人数
from tb ),
t3 as (
select rzmonth,lzmonth,zezrs,zezrs-ljlzrs as lcrs
from t2 )
select rzmonth as 入职月份,zezrs as 入职人数,sum(case when lzmonth='2020-01' then lcrs else 0 end) as month_01,
sum(case when lzmonth='2020-02' then lcrs else 0 end) as month_02,
sum(case when lzmonth='2020-03' then lcrs else 0 end) as month_03,
sum(case when lzmonth='2020-04' then lcrs else 0 end) as month_04,
sum(case when lzmonth='2020-05' then lcrs else 0 end) as month_05,
sum(case when lzmonth='2020-06' then lcrs else 0 end) as month_06,
sum(case when lzmonth='2020-07' then lcrs else 0 end) as month_07,
sum(case when lzmonth='2020-08' then lcrs else 0 end) as month_08,
sum(case when lzmonth='2020-09' then lcrs else 0 end) as month_09,
sum(case when lzmonth='2020-10' then lcrs else 0 end) as month_10,
sum(case when lzmonth='2020-11' then lcrs else 0 end) as month_11,
sum(case when lzmonth='2020-12' then lcrs else 0 end) as month_12
from t3 group by rzmonth,zezrs  order by rzmonth asc

运行结果如下: 

3.MySQL 数据库进行统计

1.首先利用MySQL的 递归 将每个saleid,根据入职日期拆成12条,并生成多一列月份,在入职日期开始起的12个月,并且把拆好的表存成sales_months

-- 创建新表  
CREATE TABLE sales_months (  
    saleid INT,  
    rzdate DATE, 
    month_date DATE 
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; -- 根据你的需要选择合适的存储引擎和字符集  
  
-- 插入数据  
INSERT INTO sales_months (saleid, rzdate, month_date)  
WITH RECURSIVE MonthSeries AS (  
    SELECT       
        saleid,      
        rzdate,      
        DATE_FORMAT(rzdate, '%Y-%m-01') AS start_of_month,  -- 入职月份的第一天    
        1 AS month_num      
    FROM       
        sheet1   
  
    UNION ALL      
    SELECT       
        saleid,      
        rzdate,  -- 可以省略这一列,因为它在每一行中都是相同的    
        DATE_FORMAT(DATE_ADD(start_of_month, INTERVAL 1 MONTH), '%Y-%m-01') AS start_of_month,  -- 直接添加一个月,然后格式化为第一天    
        month_num + 1      
    FROM       
        MonthSeries      
    WHERE       
        month_num < 12  -- 生成接下来11个月的记录(总共12个月)    
)  
SELECT       
    saleid,    
    rzdate, 		  
    start_of_month AS month_date  -- 显示销售ID和每个月的开始日期    
FROM       
    MonthSeries    
WHERE       
    saleid IS NOT NULL    
ORDER BY       
    saleid,     
    month_date;

再利用sales_months 作为主表去关联回去sheet1 表 取生成的月份列小于等于离职月份,以下代码留存口径为(如果当月入职,当月离职算成本月的留存)

with t1 as (
SELECT   
    a.saleid,  
    DATE_FORMAT(a.rzdate, '%Y-%m') AS rzdate_month,  
    DATE_FORMAT(a.month_date, '%Y-%m') as month_date, -- 这里已经是月份的第一天,所以可能不需要再转换  
    DATE_FORMAT(IFNULL(b.lzdate, '9999-01-01'), '%Y-%m') AS lzdate_month
FROM   
    sales_months a  
LEFT JOIN   
    sheet1 b ON a.saleid = b.saleid),
 t2 as (
select rzdate_month,month_date,count(distinct saleid) cnt
 from t1 where month_date<=lzdate_month 
group by rzdate_month,month_date)
select 
rzdate_month,max(cnt) as 入职总人数,
sum(case when month_date='2020-01' then cnt else 0 end) as month_01,
sum(case when month_date='2020-02' then cnt else 0 end) as month_02,
sum(case when month_date='2020-03' then cnt else 0 end) as month_03,
sum(case when month_date='2020-04' then cnt else 0 end) as month_04,
sum(case when month_date='2020-05' then cnt else 0 end) as month_05,
sum(case when month_date='2020-06' then cnt else 0 end) as month_06,
sum(case when month_date='2020-07' then cnt else 0 end) as month_07,
sum(case when month_date='2020-08' then cnt else 0 end) as month_08,
sum(case when month_date='2020-09' then cnt else 0 end) as month_09,
sum(case when month_date='2020-10' then cnt else 0 end) as month_10,
sum(case when month_date='2020-11' then cnt else 0 end) as month_11,
sum(case when month_date='2020-12' then cnt else 0 end) as month_12
from t2 group by rzdate_month


 

  • 19
    点赞
  • 18
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值