在写统计分析相关sql时,我们经常会遇到这样的场景:在给定时间范围内统计每半个小时(或一个小时)的数据,如果某一“半个小时”内无数据需要用0填充,针对这样的场景思路如下:
一、生成给定时间范围临时结果集如:
time |
2021-07-05 15:30 |
2021-07-05 16:00 |
2021-07-05 16:30 |
2021-07-05 17:00 |
2021-07-05 17:30 |
对应sql:
SELECT
to_char(b, 'YYYY-MM-DD hh24:mi') AS TIME
FROM
generate_series (
to_timestamp(
'2021-07-05 15:30:00',
'YYYY-MM-DD hh24:mi:ss'
),
to_timestamp(
'2021-07-05 17:30:00',
'YYYY-MM-DD hh24:mi:ss'
),
'30 minutes'
) AS b
ORDER BY
TIME
注:这里主要用generate_series(start, stop, step_interval) 函数实现,间隔时间为半个小时
二、编写业务数据sql
以半个小时为例如果数据在00至30分钟数据统计到30分钟的时刻上,如果数据在30分钟至下一个小时的00分钟之间数据统计至下一个小时的00分分钟的时刻上
sql如下:
SELECT
CASE
WHEN substr(
to_char(
create_time,
'yyyy-mm-dd hh24:mi'
),
15,
16
) :: INTEGER <= 30 THEN
to_char(
create_time,
'yyyy-mm-dd hh24'
) || ':30'
ELSE
to_char(
create_time + '1 hour',
'yyyy-mm-dd hh24'
) || ':00'
END AS TIME,
count(*) as num
FROM
t_order_detail
WHERE
create_time >= '2021-07-05 15:13:00'
AND create_time <= '2021-07-05 17:13:00'
GROUP BY
TIME
三、以上两个sql做full join
sql:
SELECT
to_char(CAST(tmp1.TIME AS timestamp), 'HH24:MI') AS time,
COALESCE (tmp2.num, 0) AS num
FROM
(
SELECT
to_char(b, 'YYYY-MM-DD hh24:mi') AS TIME
FROM
generate_series (
to_timestamp(
'2021-07-05 15:30:00',
'YYYY-MM-DD hh24:mi:ss'
),
to_timestamp(
'2021-07-05 17:30:00',
'YYYY-MM-DD hh24:mi:ss'
),
'30 minutes'
) AS b
ORDER BY
TIME
) tmp1
FULL OUTER JOIN (
SELECT
CASE
WHEN substr(
to_char(
create_time,
'yyyy-mm-dd hh24:mi'
),
15,
16
) :: INTEGER <= 30 THEN
to_char(
create_time,
'yyyy-mm-dd hh24'
) || ':30'
ELSE
to_char(
create_time + '1 hour',
'yyyy-mm-dd hh24'
) || ':00'
END AS TIME ,
count(*) as num
FROM
t_order_detail
WHERE
create_time >= '2021-07-05 15:13:00'
AND create_time <= '2021-07-05 17:13:00'
GROUP BY
TIME
) tmp2 ON tmp1. TIME = tmp2. TIME
ORDER BY
TIME ASC
四、最终结果
time | num |
15:30 | 105 |
16:00 | 120 |
16:30 | 0 |
17:00 | 0 |
17:30 | 300 |
注:这里16:00-17:00之间是没有数据的结果用0填充