SQL切片表转拉链表,用row_number按A分组B排序,向下合并去重,再用row_number转拉链表

原始数据如下

dateid_nostate
2022013130001101
2022022830001101
2022033130001102
2022043030001103
2022053130001101
2022063030001101
2022073130001101
2022083130001102
2022093030001101
2022033130002101
2022043030002102
2022053130002103
2022063030002103
2022073130002103

要求向下合并,并转成拉链表,形式如下

id_nostatebegin_dateend_date
300011012022013120220331
300011022022033120220430
300011032022043020220531
300011012022053120220831
300011022022083120220930
300021012022033120220430
300021022022043020220531
300021032022053120220731

由于是向下合并,每个id_no的最后一个date的状态会被舍去,如果需要加上这个状态,文末会有方法。

为了大家理解方便,我没有把代码写在一起,采取分步制表形式,另外我的数据库版本很老,不支持with X as的语句。。。

步骤一加序号

先按照id_no分组,按照date排序加上序号(order_no)

CREATE TABLE TMP_cut_to_zip1 AS
select date,id_no,state,ROW_NUMBER() OVER(PARTITION BY id_no ORDER BY date) as order_no from TMP_cut_to_zip

dateid_nostateorder_no
20220131300011011
20220228300011012
20220331300011023
20220430300011034
20220531300011015
20220630300011016
20220731300011017
20220831300011028
20220930300011019
20220331300021011
20220430300021022
20220531300021033
20220630300021034
20220731300021035

步骤二去重

我们向下向上自关联一下表格,观察一下情况

select *
from TMP_cut_to_zip1 a
left join TMP_cut_to_zip1 b on a.order_no=b.order_no-1 and a.id_no=b.id_no
left join TMP_cut_to_zip1 c on a.order_no=c.order_no+1 and a.id_no=c.id_no
order by 2,1

dateid_nostateorder_nodate_1id_no_1state_1order_no_1date_2id_no_2state_2order_no_2
2022013130001101120220228300011012NoneNoneNoneNone
202202283000110122022033130001102320220131300011011
202203313000110232022043030001103420220228300011012
202204303000110342022053130001101520220331300011023
202205313000110152022063030001101620220430300011034
202206303000110162022073130001101720220531300011015
202207313000110172022083130001102820220630300011016
202208313000110282022093030001101920220731300011017
20220930300011019NoneNoneNoneNone20220831300011028
2022033130002101120220430300021022NoneNoneNoneNone
202204303000210222022053130002103320220331300021011
202205313000210332022063030002103420220430300021022
202206303000210342022073130002103520220531300021033
20220731300021035NoneNoneNoneNone20220630300021034

以上b是向下关联,c是向上关联,由于我们需要向下合并记录,仔细观察上表,按照ab,ac的关系排列组合

1.a=b,a=c:上下都相同,去除

2.a<>b,a<>c:上下都不同,保留

3.a=b,a<>c:与下相同,与上不同,保留

4.a<>b,a=c:与下不同,与上相同,去除

另外遇到倒数2条记录都相同的情况,尾条必须保留

select *
from TMP_cut_to_zip1 a
left join TMP_cut_to_zip1 b on a.order_no=b.order_no-1 and a.id_no=b.id_no
left join TMP_cut_to_zip1 c on a.order_no=c.order_no+1 and a.id_no=c.id_no
where (a.state<>ifnull(b.state,0) and a.state<>ifnull(c.state,0)) #上下都不同
or (a.state=ifnull(b.state,0) and a.state<>ifnull(c.state,0))     #与下相同与上不同
or b.state is null                                                #保留尾条
order by 2,1

dateid_nostateorder_nodate_1id_no_1state_1order_no_1date_2id_no_2state_2order_no_2
2022013130001101120220228300011012NoneNoneNoneNone
202203313000110232022043030001103420220228300011012
202204303000110342022053130001101520220331300011023
202205313000110152022063030001101620220430300011034
202208313000110282022093030001101920220731300011017
20220930300011019NoneNoneNoneNone20220831300011028
2022033130002101120220430300021022NoneNoneNoneNone
202204303000210222022053130002103320220331300021011
202205313000210332022063030002103420220430300021022
20220731300021035NoneNoneNoneNone20220630300021034

然后就可以提取我们需要的了,顺便再按照id分组date排序加个编号,给下一步用

CREATE TABLE TMP_cut_to_zip2 AS
select a.date,a.id_no,a.state,
ROW_NUMBER() OVER(PARTITION BY a.id_no ORDER BY a.date) as order_no
from TMP_cut_to_zip1 a
left join TMP_cut_to_zip1 b on a.order_no=b.order_no-1 and a.id_no=b.id_no
left join TMP_cut_to_zip1 c on a.order_no=c.order_no+1 and a.id_no=c.id_no
where (a.state<>ifnull(b.state,0) and a.state<>ifnull(c.state,0)) 
or (a.state=ifnull(b.state,0) and a.state<>ifnull(c.state,0))
or b.state is null

结果如下

dateid_nostateorder_no
20220131300011011
20220331300011022
20220430300011033
20220531300011014
20220831300011025
20220930300011016
20220331300021011
20220430300021022
20220531300021033
20220731300021034

第三步出结果

先向下自连接,观察一下

select *
from TMP_cut_to_zip2 a
left join TMP_cut_to_zip2 b on a.order_no=b.order_no-1 and a.id_no=b.id_no
order by 2,1

dateid_nostateorder_nodate_1id_no_1state_1order_no_1
2022013130001101120220331300011022
2022033130001102220220430300011033
2022043030001103320220531300011014
2022053130001101420220831300011025
2022083130001102520220930300011016
20220930300011016NoneNoneNoneNone
2022033130002101120220430300021022
2022043030002102220220531300021033
2022053130002103320220731300021034
20220731300021034NoneNoneNoneNone

观察后,取a.state,a.date做开始日期,b.date做结束日期,即可

%%sql
select a.id_no,a.state,a.date as begin_date,b.date as end_date
from TMP_cut_to_zip2 a
inner join TMP_cut_to_zip2 b on a.order_no=b.order_no-1 and a.id_no=b.id_no
order by 1,3

最后结果如下

id_nostatebegin_dateend_date
300011012022013120220331
300011022022033120220430
300011032022043020220531
300011012022053120220831
300011022022083120220930
300021012022033120220430
300021022022043020220531
300021032022053120220731

补充:如果需要保留每个id的最后一个状态,那需要在原表中加入29991231的日期,此日期沿用最后一个时间的状态。

select (select '29991231')as date,id_no,state from TMP_cut_to_zip a
where not exists(select 1 from TMP_cut_to_zip where id_no=a.id_no and date>a.date)
union
select * from TMP_cut_to_zip

效果如下:

dateid_nostate
2022073130001101
2022093030001101
2022073130002103
2022033130001102
2022033130002101
2022053130001101
2022053130002103
2022083130001102
2022013130001101
2999123130002103
2022043030001103
2022043030002102
2999123130001101
2022063030001101
2022063030002103
2022022830001101

重复步骤一、二、三的代码即可

CREATE TABLE TMP_cut_to_zip1_1 AS
select date,id_no,state,ROW_NUMBER() OVER(PARTITION BY id_no ORDER BY date) as order_no from 
(select (select '29991231')as date,id_no,state from TMP_cut_to_zip a
where not exists(select 1 from TMP_cut_to_zip where id_no=a.id_no and date>a.date)
union
select * from TMP_cut_to_zip)a;

CREATE TABLE TMP_cut_to_zip1_2 AS
select a.date,a.id_no,a.state,
ROW_NUMBER() OVER(PARTITION BY a.id_no ORDER BY a.date) as order_no
from TMP_cut_to_zip1_1 a
left join TMP_cut_to_zip1_1 b on a.order_no=b.order_no-1 and a.id_no=b.id_no
left join TMP_cut_to_zip1_1 c on a.order_no=c.order_no+1 and a.id_no=c.id_no
where (a.state<>ifnull(b.state,0) and a.state<>ifnull(c.state,0)) or (a.state=ifnull(b.state,0) and a.state<>ifnull(c.state,0))
or b.state is null;

select a.id_no,a.state,a.date as begin_date,b.date as end_date
from TMP_cut_to_zip1_2 a
inner join TMP_cut_to_zip1_2 b on a.order_no=b.order_no-1 and a.id_no=b.id_no
order by 1,3

结果如下

id_nostatebegin_dateend_date
300011012022013120220331
300011022022033120220430
300011032022043020220531
300011012022053120220831
300011022022083120220930
300011012022093029991231
300021012022033120220430
300021022022043020220531
300021032022053129991231
  • 1
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值