Table: Failed
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| fail_date | date |
+--------------+---------+
Primary key for this table is fail_date.
Failed table contains the days of failed tasks.
Table: Succeeded
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| success_date | date |
+--------------+---------+
Primary key for this table is success_date.
Succeeded table contains the days of succeeded tasks.
A system is running one task every day. Every task is independent of the previous tasks. The tasks can fail or succeed.
Write an SQL query to generate a report of period_state
for each continuous interval of days in the period from 2019-01-01 to 2019-12-31.
period_state
is 'failed' if tasks in this interval failed or 'succeeded' if tasks in this interval succeeded. Interval of days are retrieved as start_date
and end_date.
Order result by start_date
.
The query result format is in the following example:
Failed table:
+-------------------+
| fail_date |
+-------------------+
| 2018-12-28 |
| 2018-12-29 |
| 2019-01-04 |
| 2019-01-05 |
+-------------------+
Succeeded table:
+-------------------+
| success_date |
+-------------------+
| 2018-12-30 |
| 2018-12-31 |
| 2019-01-01 |
| 2019-01-02 |
| 2019-01-03 |
| 2019-01-06 |
+-------------------+
Result table:
+--------------+--------------+--------------+
| period_state | start_date | end_date |
+--------------+--------------+--------------+
| succeeded | 2019-01-01 | 2019-01-03 |
| failed | 2019-01-04 | 2019-01-05 |
| succeeded | 2019-01-06 | 2019-01-06 |
+--------------+--------------+--------------+
The report ignored the system state in 2018 as we care about the system in the period 2019-01-01 to 2019-12-31.
From 2019-01-01 to 2019-01-03 all tasks succeeded and the system state was "succeeded".
From 2019-01-04 to 2019-01-05 all tasks failed and system state was "failed".
From 2019-01-06 to 2019-01-06 all tasks succeeded and system state was "succeeded".
思路一:比较复杂,原创
1、根据日期的状态将failed 和 successed的日期union存储
SELECT "failed" AS statues, fail_date AS date_list
FROM failed
WHERE fail_date BETWEEN "2019-01-01" AND "2019-12-31"
UNION ALL
SELECT "succeeded" AS statues, success_date
FROM Succeeded
WHERE success_date BETWEEN "2019-01-01" AND "2019-12-31"
ORDER BY date_list
2、判断start_date,并根据时间排序
SELECT s1,
CASE WHEN s1 != s2 THEN d1
WHEN s2 IS NULL THEN d1 END AS end_date,
@rank := @rank +1 AS rnk
FROM (SELECT t1.statues s1, t1.date_list d1, t2.statues s2, t2.date_list d2
FROM tmp t1 LEFT JOIN tmp t2
ON DATEDIFF(t1.date_list, t2.date_list) = -1
ORDER BY d1) t, (SELECT @rank := 0) r
HAVING end_date IS NOT NULL
3、判断end_date,并根据时间排序
SELECT s1,
CASE WHEN s1 != s2 THEN d1
WHEN s2 IS NULL THEN d1 END AS start_date,
@rank1 := @rank1 +1 AS rnk
FROM (SELECT t1.statues s1, t1.date_list d1, t2.statues s2, t2.date_list d2
FROM tmp t1 LEFT JOIN tmp t2
ON DATEDIFF(t1.date_list, t2.date_list) = 1
ORDER BY d1) t2, (SELECT @rank1 := 0) r
HAVING start_date IS NOT NULL
4、合并start_date和end_date
SELECT tmp1.s1 AS period_state, start_date, end_date
FROM tmp1, tmp2
WHERE tmp1.rnk = tmp2.rnk
思路二:第一步和思路一第一步相同,然后通过给记录排序得到rank,然后根据rank跟连续时间组之间的差值固定,分辨出不同组的连续数据,最后选择同组数据中最大值和最小值,作为每组数据的start_date和end_date
with cte as (select fail_date as cal_date, 'failed' as state
from Failed
union all
select success_date as cal_date, 'succeeded' as state
from Succeeded)
select state as period_state,
min(cal_date) as start_date,
max(cal_date) as end_date
from
(select state, cal_date, rank() over (partition by state order by cal_date) as ranking, rank() over (order by cal_date) as id
from cte
where cal_date between '2019-01-01' and '2019-12-31') t
group by state, (id - ranking)
order by 2
同类型参考:1285. Find the Start and End Number of Continuous Ranges