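From a large table (TEST_TAB, more than 100 million rows), I filter out a subset of rows by specific conditions, then select all the data in the months/weeks that those rows fall in, and finally aggregate by month/week. Both the filter columns and the date column (day granularity) are indexed.
The system has a date definition table, CALENDAR, as follows: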
DATE_TIMEKEY | WEEK_TIMEKEY | MONTH_TIMEKEY | YEAR_TIMEKEY |
---|---|---|---|
20231011 | 2023W41 | 2023M10 | 2023 |
20231012 | 2023W41 | 2023M10 | 2023 |
20231013 | 2023W41 | 2023M10 | 2023 |
20231014 | 2023W41 | 2023M10 | 2023 |
20231015 | 2023W42 | 2023M10 | 2023 |
20231016 | 2023W42 | 2023M10 | 2023 |
20231017 | 2023W42 | 2023M10 | 2023 |
20231018 | 2023W42 | 2023M10 | 2023 |
20231019 | 2023W42 | 2023M10 | 2023 |
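A calendar table like this is typically used to expand a single day into every day of the same week or month by joining the table to itself, which is what the subquery in this post does. A minimal sketch, assuming DATE_TIMEKEY is unique in CALENDAR and that the literal '20231012' compares cleanly with the column's actual data type (both are assumptions, not stated in the post):

```sql
-- Expand one day into all days of the same week via a self-join.
-- S1 locates the week of the given day; S2 returns every day in that week.
SELECT S2.DATE_TIMEKEY,
       S2.WEEK_TIMEKEY
FROM CALENDAR S1
JOIN CALENDAR S2
  ON S1.WEEK_TIMEKEY = S2.WEEK_TIMEKEY
WHERE S1.DATE_TIMEKEY = '20231012'   -- sample day taken from the table above
ORDER BY S2.DATE_TIMEKEY;
```

Replacing WEEK_TIMEKEY with MONTH_TIMEKEY in the join condition gives the monthly expansion instead.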
The SQL is as follows:
SELECT
C.MONTH_TIMEKEY,
A.COL1,
A.COL2,
A.COL3,
SUM(A.QTY) QTY
FROM TEST_TAB A
JOIN (
SELECT DISTINCT B.COL1,B.COL2,S2.DATE_TIMEKEY,S2.MONTH_TIMEKEY
FROM TEST_TAB B
JOIN CALENDAR S1
ON B.DATE_TIMEKEY = S1.DATE_TIMEKEY
JOIN CALENDAR S2
ON S1.MONTH_TIMEKEY = S2.MONTH_TIMEKEY
WHERE B.COL1 = 'AAA' AND B.COL2 = 'BBB'
) C ON A.DATE_TIMEKEY = C.DATE_TIMEKEY
AND A.COL1 = C.COL1
AND A.COL2 = C.COL2
GROUP BY
C.MONTH_TIMEKEY,
A.COL1,
A.COL2,
A.COL3
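However, this SQL is very slow in practice and takes about 10 minutes. I checked the execution plan, and both A and B use the corresponding indexes.
Test 1: without the monthly aggregation, the SQL returns in only about 1 s.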
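Seeing that the indexes are used does not show where the 10 minutes go. One way to get per-step row counts and timings is to run the statement with the `GATHER_PLAN_STATISTICS` hint and then read the actual plan with `DBMS_XPLAN.DISPLAY_CURSOR`. The following is only a sketch: it assumes the session has the privileges needed to query the cursor views, and that the second statement runs in the same session immediately after the first (with `SERVEROUTPUT` off in SQL*Plus).

```sql
-- Step 1: run the slow statement with runtime statistics collection enabled.
SELECT /*+ GATHER_PLAN_STATISTICS */
       C.MONTH_TIMEKEY,
       A.COL1,
       A.COL2,
       A.COL3,
       SUM(A.QTY) QTY
FROM TEST_TAB A
JOIN (
    SELECT DISTINCT B.COL1, B.COL2, S2.DATE_TIMEKEY, S2.MONTH_TIMEKEY
    FROM TEST_TAB B
    JOIN CALENDAR S1
      ON B.DATE_TIMEKEY = S1.DATE_TIMEKEY
    JOIN CALENDAR S2
      ON S1.MONTH_TIMEKEY = S2.MONTH_TIMEKEY
    WHERE B.COL1 = 'AAA' AND B.COL2 = 'BBB'
) C ON A.DATE_TIMEKEY = C.DATE_TIMEKEY
   AND A.COL1 = C.COL1
   AND A.COL2 = C.COL2
GROUP BY
    C.MONTH_TIMEKEY,
    A.COL1,
    A.COL2,
    A.COL3;

-- Step 2: show the plan of the last statement executed in this session,
-- with estimated rows (E-Rows) next to actual rows (A-Rows) for each step.
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Comparing the output for the slow statement against the same output for the fast variants should show whether the join order, the join method, or the placement of the DISTINCT step differs between them.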
SELECT
C.MONTH_TIMEKEY,
A.COL1,
A.COL2,
A.COL3,
A.QTY
FROM TEST_TAB A
JOIN (
SELECT DISTINCT B.COL1,B.COL2,S2.DATE_TIMEKEY,S2.MONTH_TIMEKEY
FROM TEST_TAB B
JOIN CALENDAR S1
ON B.DATE_TIMEKEY = S1.DATE_TIMEKEY
JOIN CALENDAR S2
ON S1.MONTH_TIMEKEY = S2.MONTH_TIMEKEY
WHERE B.COL1 = 'AAA' AND B.COL2 = 'BBB'
) C ON A.DATE_TIMEKEY = C.DATE_TIMEKEY
AND A.COL1 = C.COL1
AND A.COL2 = C.COL2
Test 2: if MONTH_TIMEKEY is derived from A.DATE_TIMEKEY instead of being taken from C, the SQL also returns in about 1 s.
SELECT
SUBSTR(A.DATE_TIMEKEY,1,4)||'M'||SUBSTR(A.DATE_TIMEKEY,5,2) MONTH_TIMEKEY,
A.COL1,
A.COL2,
A.COL3,
SUM(A.QTY) QTY
FROM TEST_TAB A
JOIN (
SELECT DISTINCT B.COL1,B.COL2,S2.DATE_TIMEKEY,S2.MONTH_TIMEKEY
FROM TEST_TAB B
JOIN CALENDAR S1
ON B.DATE_TIMEKEY = S1.DATE_TIMEKEY
JOIN CALENDAR S2
ON S1.MONTH_TIMEKEY = S2.MONTH_TIMEKEY
WHERE B.COL1 = 'AAA' AND B.COL2 = 'BBB'
) C ON A.DATE_TIMEKEY = C.DATE_TIMEKEY
AND A.COL1 = C.COL1
AND A.COL2 = C.COL2
GROUP BY
SUBSTR(A.DATE_TIMEKEY,1,4)||'M'||SUBSTR(A.DATE_TIMEKEY,5,2),
A.COL1,
A.COL2,
A.COL3
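This approach solves the monthly aggregation, but because the week cannot be obtained by simply slicing the date column, it does not solve the weekly aggregation.
Test 3: building on the result of Test 1, materialize the detail rows as a temporary table (using the materialize hint) and then aggregate by week.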
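In the sample CALENDAR rows the week changes between 20231014 and 20231015, a Saturday and a Sunday, whereas ISO weeks change on Monday, so a built-in week format cannot be assumed to reproduce WEEK_TIMEKEY. The following sketch lists the days on which Oracle's `IW` or `WW` format disagrees with the calendar table. It assumes DATE_TIMEKEY holds a YYYYMMDD value (number or string) and that WEEK_TIMEKEY follows the `yyyyWnn` pattern shown in the sample; the aliases are illustrative only.

```sql
-- List calendar days where a built-in week format does not match WEEK_TIMEKEY.
SELECT DATE_TIMEKEY,
       WEEK_TIMEKEY,
       TO_CHAR(DT, 'IYYY') || 'W' || TO_CHAR(DT, 'IW') AS ISO_WEEK_KEY,  -- ISO week, starts Monday
       TO_CHAR(DT, 'YYYY') || 'W' || TO_CHAR(DT, 'WW') AS WW_WEEK_KEY    -- week 1 starts on January 1
FROM (
    SELECT S.DATE_TIMEKEY,
           S.WEEK_TIMEKEY,
           TO_DATE(TO_CHAR(S.DATE_TIMEKEY), 'YYYYMMDD') AS DT
    FROM CALENDAR S
)
WHERE WEEK_TIMEKEY <> TO_CHAR(DT, 'IYYY') || 'W' || TO_CHAR(DT, 'IW')
   OR WEEK_TIMEKEY <> TO_CHAR(DT, 'YYYY') || 'W' || TO_CHAR(DT, 'WW')
ORDER BY DATE_TIMEKEY;
```

If one format never appears among the mismatches across all years in CALENDAR, it could in principle replace the lookup; otherwise the week key has to keep coming from the CALENDAR table.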
WITH D AS (
-- The MATERIALIZE hint belongs inside the CTE it applies to
SELECT /*+ MATERIALIZE */
C.WEEK_TIMEKEY,
A.COL1,
A.COL2,
A.COL3,
A.QTY
FROM TEST_TAB A
JOIN (
SELECT DISTINCT B.COL1,B.COL2,S2.DATE_TIMEKEY,S2.WEEK_TIMEKEY
FROM TEST_TAB B
JOIN CALENDAR S1
ON B.DATE_TIMEKEY = S1.DATE_TIMEKEY
JOIN CALENDAR S2
ON S1.WEEK_TIMEKEY = S2.WEEK_TIMEKEY
WHERE B.COL1 = 'AAA' AND B.COL2 = 'BBB'
) C ON A.DATE_TIMEKEY = C.DATE_TIMEKEY
AND A.COL1 = C.COL1
AND A.COL2 = C.COL2
)
SELECT
D.WEEK_TIMEKEY,
D.COL1,
D.COL2,
D.COL3,
SUM(D.QTY) QTY
FROM D
GROUP BY
D.WEEK_TIMEKEY,
D.COL1,
D.COL2,
D.COL3
This test also returns in about 1 s.