- 场景
有一张表记录了每5分钟 各表的数据量,大概长这样,有每五分钟的开始和结束时间,表名,和数据条数,要求出每张表每次数据延迟的开始和结束时间,也就是如果0-5,5-10,10-15分都延迟,则记录为 0-15分延迟。
首先查出符合延迟条件的每5分钟的记录,并且按照是否是重新开统计的打标签 1
这里用的lag窗口函数:从同一结果集中的当前行访问上一行的数据
select * ,
lag(END_TIME) OVER (partition by TABLE_NAME ORDER BY BEGIN_TIME) lag_end_time,
CASE WHEN lag(END_TIME) OVER (partition by TABLE_NAME ORDER BY BEGIN_TIME) = BEGIN_TIME THEN 0 ELSE 1 END AS is_group
from REC_DATA_COUNT_5M
where DATA_COUNT <100
得到结果如图:
再把数据分组
select a1.*,
sum(is_group) over (partition by TABLE_NAME order by BEGIN_TIME) is_g
from
(
select * , lag(END_TIME) OVER (partition by TABLE_NAME ORDER BY BEGIN_TIME) lag_end_time,
CASE WHEN lag(END_TIME) OVER (partition by TABLE_NAME ORDER BY BEGIN_TIME) = BEGIN_TIME THEN 0 ELSE 1 END AS is_group
from REC_DATA_COUNT_5M
where DATA_COUNT <100
)a1
得到结果如图:
最后对分组的数据取组内开始时间最小值和结束时间最大值 得到延迟的开始和结束时间
select TABLE_NAME,
min(BEGIN_TIME) DELAYED_BEGIN_TIME,
max(END_TIME) DELAYED_END_TIME,
sysdate insert_time
from
(
select a1.*,
sum(is_group) over (partition by TABLE_NAME order by BEGIN_TIME) is_g
from
(
select * , lag(END_TIME) OVER (partition by TABLE_NAME ORDER BY BEGIN_TIME) lag_end_time,
CASE WHEN lag(END_TIME) OVER (partition by TABLE_NAME ORDER BY BEGIN_TIME) = BEGIN_TIME THEN 0 ELSE 1 END AS is_group
from REC_DATA_COUNT_5M
where DATA_COUNT <100
)a1
)a2
group by TABLE_NAME,is_g
结果:
记录一种对同一类连续数据进行分组的写法
如果有别的写法希望大神指点指点