昨天晚上 我做了一个关于Hash Join的调整
老外说 某个JOB Hung住了,用户名是IDWSQ6
于是我查询等待事件:direct patch write temp
SQL> SELECT INST_ID,SID,EVENT,P1,P2,P3,MACHINE FROM GV$SESSION WHERE USERNAME='IDWSQ6';
INST_ID SID EVENT P1 P2 P3 MACHINE
---------- ---------- ------------------------------ ---------- ---------- ---------- -------------------
4 4752 SQL*Net message from client 1413697536 1 0 bdhp4423
4 4813 direct path write temp 20008 1132507 7 bdhp4423
等待事件为direct path write temp ,说明这个session在写数据到磁盘中
SQL> select a.username,a.inst_id, a.sid, a.serial#, a.machine,a.sql_id,b.tablespace, b.blocks*
2 (select value from v$parameter where name='db_block_size')/1024/1024 "Size(M)",b.segtype
3 from gv$session a, gv$tempseg_usage b where a.inst_id=b.inst_id and a.saddr = b.session_addr
4 and a.inst_id=4 and a.sid=4813;
USERNAME INST_ID SID SERIAL# MACHINE SQL_ID TABLESPACE Size(M) SEGTYPE
-------------------- ---------- ---------- ---------- --------------- --------------- --------------- ---------- --------
IDWSQ6 4 4813 35649 bdhp4423 bqamju47q25wu TEMP 1 LOB_DATA
IDWSQ6 4 4813 35649 bdhp4423 bqamju47q25wu TEMP 1893 HASH
根据查询我发现 是由于HASH JOIN导致的,并且占用了1893m
下面是 该SQL的执行计划:
select * from table(dbms_xplan.display_cursor('bqamju47q25wu','0'));
Plan hash value: 1285921926
----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | | | 63603 (100)| | | |
| 1 | LOAD AS SELECT | | | | | | | |
|* 2 | HASH JOIN OUTER | | 91 | 72709 | 63601 (49)| 00:10:37 | | |
|* 3 | HASH JOIN RIGHT OUTER | | 91 | 68523 | 48900 (49)| 00:08:09 | | |
|* 4 | MAT_VIEW ACCESS FULL | SH30_V_PROD_BOM_DENORM_FAC | 1 | 51 | 1847 (11)| 00:00:19 | | |
| 5 | VIEW | SH30_8530337_8453_0_TMP | 91 | 63882 | 47052 (51)| 00:07:51 | | |
|* 6 | HASH JOIN | | 91 | 28574 | 47051 (51)| 00:07:51 | | |
|* 7 | HASH JOIN | | 1 | 77 | 301 (12)| 00:00:04 | | |
|* 8 | HASH JOIN | | 14 | 784 | 224 (14)| 00:00:03 | | |
|* 9 | TABLE ACCESS BY INDEX ROWID | TIME_PERD | 301 | 6321 | 76 (4)| 00:00:01 | | |
|* 10 | INDEX RANGE SCAN | TIME_PERD_IDX3 | 602 | | 6 (17)| 00:00:01 | | |
|* 11 | HASH JOIN | | 564 | 19740 | 147 (19)| 00:00:02 | | |
|* 12 | TABLE ACCESS BY INDEX ROWID| TIME_PERD | 301 | 6321 | 76 (4)| 00:00:01 | | |
|* 13 | INDEX RANGE SCAN | TIME_PERD_IDX3 | 602 | | 6 (17)| 00:00:01 | | |
| 14 | INDEX FULL SCAN | TIME_PERD_IDX2 | 22887 | 312K| 63 (26)| 00:00:01 | | |
|* 15 | TABLE ACCESS BY INDEX ROWID | TIME_PERD | 301 | 6321 | 76 (4)| 00:00:01 | | |
|* 16 | INDEX RANGE SCAN | TIME_PERD_IDX3 | 602 | | 6 (17)| 00:00:01 | | |
| 17 | TABLE ACCESS FULL | SH30_8530337_8452_0_TMP | 6093K| 1377M| 44845 (49)| 00:07:29 | | |
| 18 | PARTITION RANGE SINGLE | | 5024K| 220M| 13130 (41)| 00:02:12 | 3 | 3 |
|* 19 | TABLE ACCESS FULL | PROD_UNIT_OF_MEASR | 5024K| 220M| 13130 (41)| 00:02:12 | 3 | 3 |
----------------------------------------------------------------------------------------------------------------------------------
我查看了SQL,执行计划是正确的。
但是这个 这个JOB 就一直Huang住,已经跑了10多个小时了。
其实在这个Job 被Huang住之前,之前的Job(不同的SQL) 也是由于这个原因被Huang住过。
那么出现这样的问题就比较郁闷了。我采取了一个很 “变态” 的方法:
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM x$ksppi x, x$ksppcv y
3 WHERE x.inst_id = USERENV ('Instance')
4 AND y.inst_id = USERENV ('Instance')
5 AND x.indx = y.indx
6 AND x.ksppinm LIKE '%pga_max_size%'
7 /
NAME VALUE DESCRIB
------------------------------ ---------- ------------------------------
_pga_max_size 2147483648 Maximum size of the PGA memory
for one process
SQL> select 2147483648/1024/1024/1024 from dual;
2147483648/1024/1024/1024
-------------------------
2
Oracle有个限制 ,一个进程不能够分配超过2G的内存
SQL> alter session set workarea_size_policy = manual;
Session altered.
SQL> alter session set hash_area_size =2147483648;
alter session set hash_area_size =2147483648
*
ERROR at line 1:
ORA-02017: integer value required 这里超过了2G 就报错了
SQL> alter session set hash_area_size =2147483647;
Session altered.
我手工配置接近2G 的hash_area_size就解决了Job 被Huang住的问题。
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM x$ksppi x, x$ksppcv y
3 WHERE x.inst_id = USERENV ('Instance')
4 AND y.inst_id = USERENV ('Instance')
5 AND x.indx = y.indx
6 AND x.ksppinm LIKE '%_smm_max_size%'
7 /
NAME VALUE DESCRIB
------------------------------ ---------- ------------------------------
_smm_max_size 1048576 maximum work area size in auto
mode (serial)
隐含参数_smm_max_size表示在auto 模式下 最大的work area
SQL> select 1048576/1024/1024 from dual;
1048576/1024/1024
-----------------
1
这里查询出来为1G
由于手动设置加大了HASH_AREA的大小,使得HASH JOIN能够尽可能的避免写数据到temp 表空间,所以解决了Job Huang 问题。大家如果遇到类似的case,不妨可以一试。当然了前提是 SQL不能优化的前提下。