startdate | version |
2021-02-28 | 12 |
2021-03-27 | 11 |
2021-03-29 | 10 |
2021-10-28 | 12 |
2021-11-26 | 11 |
2021-11-27 | 1 |
2021-11-28 | 10 |
2021-12-26 | 9 |
需求:
如果两个startdate相差小于30天,那么认为这两个startdate对应的数据在一个集合中;如果第三条数据的startdate与这个集合中任意一条数据的startdate相差小于30天,认为这条数据也是属于这个集合;以此类推,将所有数据分为不同的集合。
输出每个集合中最大的version和它对应的那条数据。
思路:
重点在于分组。
先对所有数据按日期进行升序排序,如果一条数据的日期与前面集合最大日期的差值在30以内,则认为这条数据属于前一个集合。
在排好序的基础上,一条数据的日期与前一条数据的差值如果小于30就是0;如果大于30就是1(认为是一个新的集合)
按顺序是1,0,0,0,1,0,0等的情况。
然后sum over() ,按顺序是1,1,1,1,2,2,2等的情况
这样不同的集合就有了不同的值。
与前一条数据日期差是否在30以内为0 | ||||
startdate | version | per_startdate | flag | sum over |
2021-02-28 | 12 | 1900-01-01 | 1 | 1 |
2021-03-27 | 11 | 2021-02-28 | 0 | 1 |
2021-03-29 | 10 | 2021-03-27 | 0 | 1 |
2021-10-28 | 12 | 2021-03-29 | 1 | 2 |
2021-11-26 | 11 | 2021-10-28 | 0 | 2 |
2021-11-27 | 1 | 2021-11-26 | 0 | 2 |
2021-11-28 | 10 | 2021-11-27 | 0 | 2 |
2021-12-26 | 9 | 2021-11-28 | 0 | 2 |
with testdataset as (
select '2021-02-28' as startdate, 12 as version
union
select '2021-03-27' as startdate, 11 as version
union
select '2021-03-29' as startdate, 10 as version
union
select '2021-10-28' as startdate, 12 as version
union
select '2021-11-26' as startdate, 11 as version
union
select '2021-11-27' as startdate, 1 as version
union
select '2021-11-28' as startdate, 10 as version
union
select '2021-12-26' as startdate, 9 as version
)
select startdate,version,row_number()over(partition by sumr order by version desc) as rank
from (
select startdate,version,sum(case when datediff(startdate,per_startdate) < 30 then 0 else 1 end) over (order by startdate asc) as sumr
from (
select startdate,version,lag(startdate,1,'1900-01-01') over (ORDER BY startdate asc) as per_startdate
from testdataset
)
)
having rank = 1
-- startdate version rank
-- 2021-02-28 12 1
-- 2021-10-28 12 1