步骤一
准备数据:temp_ combine_202208,三个字段,机构号,id号,日期
organ_no | id_no | date |
1001 | 30001 | 20220131 |
1001 | 30001 | 20220228 |
1001 | 30001 | 20220331 |
1002 | 30001 | 20220430 |
1002 | 30001 | 20220531 |
1001 | 30001 | 20220630 |
1001 | 30001 | 20220731 |
1002 | 30002 | 20220331 |
1002 | 30002 | 20220430 |
1002 | 30002 | 20220531 |
1002 | 30002 | 20220630 |
要求:按照id号,机构号进行分组合并,并记录起始日期和结束日期。目前能查到用lag函数,但是我的sql中没有这个函数,因此采取下文的方法。
步骤二
分析数据:问题的关键是按照日期的连续性排序
1.采用ROW_NUMBER对数据进行编号
SELECT organ_no,id_no,date,ROW_NUMBER() OVER(PARTITION BY id_no,organ_no ORDER BY substr(date,1,6)) as group_number
from temp_combine_202208
order by organ_no,id_no,date
结果如下:
organ_no | id_no | date | group_number |
---|---|---|---|
1001 | 30001 | 20220131 | 1 |
1001 | 30001 | 20220228 | 2 |
1001 | 30001 | 20220331 | 3 |
1001 | 30001 | 20220630 | 4 |
1001 | 30001 | 20220731 | 5 |
1002 | 30001 | 20220430 | 1 |
1002 | 30001 | 20220531 | 2 |
1002 | 30002 | 20220331 | 1 |
1002 | 30002 | 20220430 | 2 |
1002 | 30002 | 20220531 | 3 |
1002 | 30002 | 20220630 | 4 |
2.关键处理:采取date减去group_number的方法即可得到分组数,然后就可以按连续性进行分组
SELECT organ_no,id_no,date,
substr(date,4,3)-ROW_NUMBER() OVER(PARTITION BY id_no,organ_no ORDER BY substr(date,1,6)) as group_number
from temp_combine_202208
order by id_no,date
结果如下:
organ_no | id_no | date | group_number |
---|---|---|---|
1001 | 30001 | 20220131 | 200.0 |
1001 | 30001 | 20220228 | 200.0 |
1001 | 30001 | 20220331 | 200.0 |
1002 | 30001 | 20220430 | 203.0 |
1002 | 30001 | 20220531 | 203.0 |
1001 | 30001 | 20220630 | 202.0 |
1001 | 30001 | 20220731 | 202.0 |
1002 | 30002 | 20220331 | 202.0 |
1002 | 30002 | 20220430 | 202.0 |
1002 | 30002 | 20220531 | 202.0 |
1002 | 30002 | 20220630 | 202.0 |
步骤三
最后按照organ_no,id_no,group_number进行分组即可
select organ_no,id_no,group_number,min(date) as month_date_begin,max(date) as month_date_end
FROM
(SELECT organ_no,id_no,date,
substr(date,4,3)-ROW_NUMBER() OVER(PARTITION BY id_no,organ_no ORDER BY substr(date,1,6)) as group_number
from temp_combine_202208)a
group by 1,2,3
order by 2,4
最后结果:
organ_no | id_no | group_number | month_date_begin | month_date_end |
---|---|---|---|---|
1001 | 30001 | 200.0 | 20220131 | 20220331 |
1002 | 30001 | 203.0 | 20220430 | 20220531 |
1001 | 30001 | 202.0 | 20220630 | 20220731 |
1002 | 30002 | 202.0 | 20220331 | 20220630 |