背景:某系统跑批的一个存储过程一直报错ORA-1652: unable to extend temp segment by 128 in tablespace xxx
查看alertlog 获取报错的时间点:
收集 报错期间的AWR 直接在SQL Statistics 里面可以定位到SQL
MERGE INTO T_AA_BBB_CCCC_DDDD_E_G4H3 G4H3
USING (SELECT T.DATA_DT,
M.ITEM_ID,
COUNT(T.DEBT_ID) AS DEBT_NUM_Y,
COUNT(CASE
WHEN T.IS_DEFAULT = '1' THEN
T.DEBT_ID
END) AS N_D_DEBT_NUM,
COUNT(CASE
WHEN T.IS_DEFAULT = '1' THEN
T.DEBT_ID
END) /
DECODE(COUNT(T.DEBT_ID), 0, 11111, COUNT(T.DEBT_ID)) AS FACT_D_RATE,
:B3 AS ORG_ID
FROM T_AA_BBB_CCCC_HISTORY_TMP T
INNER JOIN T_AA_BBB_CCCC_DDD_MAP M ON T.EXPO_TYPE_IRB_YEAR LIKE
(M.EXPO_TYPE_IRB || '%')
AND T.PD >= M.PD_T
AND T.PD < M.PD_B
AND M.REPORT_FLAG = :B1
AND M.SHOW_GRADE = '2'
WHERE T.DATA_DT = :B2
AND T.IS_DEFAULT_YEAR = '0'
AND T.EXPO_TYPE_IRB_YEAR IS NOT NULL
AND T.PD_YEAR IS NOT NULL
AND T.EAD_YEAR IS NOT NULL
AND T.LGD_YEAR IS NOT NULL
GROUP BY M.ITEM_ID, T.DATA_DT) B
ON (G4H3.DATA_DT = B.DATA_DT AND G4H3.ITEM_ID = B.ITEM_ID AND G4H3.ORG_ID = :B3 AND G4H3.CALC_SCENE_ID = :B5 AND G4H3.SCOPE_FLAG = :B4)
WHEN MATCHED THEN
UPDATE
SET G4H3.DEBT_NUM_Y = B.DEBT_NUM_Y,
G4H3.N_D_DEBT_NUM = B.N_D_DEBT_NUM,
G4H3.FACT_D_RATE = B.FACT_D_RATE
查看SQL中涉及到的对象的详细信息:
GB OWNER SEGMENT_NAME
---------- ------------------------------ --------------------------------------------------------------------------------
76.2685546 RWA T_AA_BBB_CCCC_HISTORY_TMP
0.00012207 RWA T_AA_BBB_CCCC_DDD_MAP
0.00585937 RWA T_AA_BBB_CCCC_DDDD_E_G4H3
查看SQL的执行计划
Plan hash value: 4019117077
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 57 | 2729K (1)| 09:05:52 |
| 1 | HASH GROUP BY | | 1 | 57 | 2729K (1)| 09:05:52 |
| 2 | MERGE JOIN | | 1 | 57 | 2729K (1)| 09:05:52 |
| 3 | SORT JOIN | | 2 | 76 | 2729K (1)| 09:05:52 |
|* 4 | TABLE ACCESS FULL | T_AA_BBB_CCCC_HISTORY_TMP | 2 | 76 | 2729K (1)| 09:05:52 |
|* 5 | FILTER | | | | | |
|* 6 | SORT JOIN | | 72 | 1368 | 6 (17)| 00:00:01 |
|* 7 | TABLE ACCESS FULL| T_AA_BBB_CCCC_DDD_MAP | 72 | 1368 | 5 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("T"."PD_YEAR" IS NOT NULL AND "T"."LGD_YEAR" IS NOT NULL AND "T"."EAD_YEAR"
IS NOT NULL AND "T"."EXPO_TYPE_IRB_YEAR" IS NOT NULL AND "T"."DATA_DT"=:B2 AND
"T"."IS_DEFAULT_YEAR"='0')
5 - filter("T"."EXPO_TYPE_IRB_YEAR" LIKE "M"."EXPO_TYPE_IRB"||'%' AND
"T"."PD"<"M"."PD_B")
6 - access(INTERNAL_FUNCTION("T"."PD")>=INTERNAL_FUNCTION("M"."PD_T"))
filter(INTERNAL_FUNCTION("T"."PD")>=INTERNAL_FUNCTION("M"."PD_T"))
7 - filter("M"."REPORT_FLAG"=:B1 AND "M"."SHOW_GRADE"='2')
从执行计划中我们可以看出,SQL使用了SORT MERGE JOIN的连接方式(Id = 2,3),这也可以解释为什么会报临时表空间不足的错误。SORT MERGE JOIN的原理是先对两个表或者结果集进行排序,然后在按照NEST LOOP的方式进行循环连接去显示结果集。而涉及到排序的两个表 T_AA_BBB_CCCC_HISTORY_TMP (76GB),T_AA_BBB_CCCC_DDD_MAP (0) ,在PGA只有8G的情况下。还剩68G数据需要借助temp去排序
为什么SQL会选择SORT MERGE JOIN 而不是 HASH
我们之前提过HASH只能适用于等值连接,因为HASH的原理就是通过等值构造HASH桶
这时候回头看SQL的关联条件:
ON T.EXPO_TYPE_IRB_YEAR LIKE (M.EXPO_TYPE_IRB || '%') AND T.PD >= M.PD_T AND T.PD < M.PD_B
-----------------------> 不等值连接。排序合并连接的唯一适用场景就是不等值连接
所以 以上SQL可以通过如下方法优化:
1.连接列建索引,消除排序。(这也是排序合并连接的充分条件)
2.修改业务 让SQL变成等值连接,去选择HASH到连接方式
针对这个SQL,优先修改业务,如果不行最后考虑方法1。通过和开发商量确认 ON SUBSTR(T.EXPO_TYPE_IRB_YEAR,XX,XX) = M.EXPO_TYPE_IRB 可行
修改后重跑存储过程。搞定!