两表关联返回少量数据应该走嵌套循环,两表关联返回数据量大的应该走HASH JOIN连接。
HASH JOIN :两表等值关联,将数据量小的表作为驱动表,将数据量大的表作为被驱动表,对驱动表的连接列进行HASH 生成HASH TABLE ,并且也对被驱动表的连接列进行HASH生成HASH TABLE。能够匹配的数据关联上。HASH JOIN 只支持等值连接。
scott@orclpdb1:orclcdb>
select /*+gather_plan_statistics use_hash(e,d)*/
2 e.ename, e.job, d.dname
from emp e, dept d
4 where e.deptno = d.deptno;
ENAME JOB DNAME
---------- --------- --------------
SMITH CLERK RESEARCH
ALLEN SALESMAN SALES
WARD SALESMAN SALES
JONES MANAGER RESEARCH
MARTIN SALESMAN SALES
BLAKE MANAGER SALES
CLARK MANAGER ACCOUNTING
SCOTT ANALYST RESEARCH
KING PRESIDENT ACCOUNTING
TURNER SALESMAN SALES
ADAMS CLERK RESEARCH
JAMES CLERK SALES
FORD ANALYST RESEARCH
MILLER CLERK ACCOUNTING
14 rows selected.
scott@orclpdb1:orclcdb> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID d3uaffp94gd3x, child number 0
-------------------------------------
select /*+gather_plan_statistics use_hash(e,d)*/ e.ename, e.job,
d.dname from emp e, dept d where e.deptno = d.deptno
Plan hash value: 615168685
-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 14 |00:00:00.01 | 54 | 5 | | | |
|* 1 | HASH JOIN | | 1 | 14 | 14 |00:00:00.01 | 54 | 5 | 1695K| 1695K| 1079K (0)|
| 2 | TABLE ACCESS FULL| DEPT | 1 | 4 | 4 |00:00:00.01 | 6 | 5 | | | |
| 3 | TABLE ACCESS FULL| EMP | 1 | 14 | 14 |00:00:00.01 | 7 | 0 | | | |
-------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("E"."DEPTNO"="D"."DEPTNO")
21 rows selected.
scott@orclpdb1:orclcdb>
执行计划中离HASH JOIN 最近的表为驱动表,则DEPT为驱动表,EMP为被驱动表。驱动表 扫描一次,被驱动表也扫描一次。
Note: NL中被驱动表需要扫描多次, HASH JOIN 中被驱动表只需扫描一次。
Used-Mem表示HASH JOIN 消耗的PGA , 当驱动表太大、PGA不能完全容纳驱动表时,驱动表就会溢出到临时表空间,进而产生磁盘HASH JOIN。此刻HASH JOIN 会产生严重的性能问题。而NL无须消耗PGA.
NL 会进行值传递。(NL循环一次,驱动表连接列传值给被驱动表连接列)
HASH JOIN 没有传值的过程。(HASH JOIN 的驱动表与被驱动表的连接列都无须创建索引)
OLTP 环境(一般高并发小事务居多):SQL执行计划多以NL为主。SGA 设置较大,PGA设置较小。(NL不消耗PGA)
OLAP环境 (多数SQL都是大规模ETL),SQL返回结果集很多,SQL执行计划通常以HASH连接为主。往往会消耗大量的PGA,故PGA设置较大。
当两表使用外连接进行关联,如果执行计划走的是HASH JOIN ,想要更改驱动表,我们需要使用swap_join_inputs, 而不是leading.
scott@orclpdb1:orclcdb>
explain plan for select /*+use_hash(d,e) leading(e)*/
2 *
from dept d
4 left join emp e
5 on d.deptno = e.deptno;
Explained.
scott@orclpdb1:orclcdb> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3713469723
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 870 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN OUTER | | 15 | 870 | 6 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPT | 4 | 80 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 14 | 532 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("D"."DEPTNO"="E"."DEPTNO"(+))
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 2 (U - Unused (2))
---------------------------------------------------------------------------
1 - SEL$2BFA4EE4
U - leading(e)
2 - SEL$2BFA4EE4 / D@SEL$1
U - use_hash(d,e)
25 rows selected.
scott@orclpdb1:orclcdb>
从上执行计划中所得,两表关联走的是HASH JOIN , 表明use_hash(d,e)生效,而驱动表为DEPT故设置leading(e) 未生效。因此需要使用swap_join_inputs来更改外连接中HASH JOIN的驱动表。
scott@orclpdb1:orclcdb> explain plan for select /*+use_hash(d,e) swap_join_inputs(e)*/
2 *
from dept d
4 left join emp e
5 on d.deptno = e.deptno;
Explained.
scott@orclpdb1:orclcdb> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3590956717
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 870 | 6 (0)| 00:00:01 |
|* 1 | HASH JOIN RIGHT OUTER| | 15 | 870 | 6 (0)| 00:00:01 |
| 2 | TABLE ACCESS FULL | EMP | 14 | 532 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | DEPT | 4 | 80 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("D"."DEPTNO"="E"."DEPTNO"(+))
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
3 - SEL$2BFA4EE4 / D@SEL$1
U - use_hash(d,e)
22 rows selected.
scott@orclpdb1:orclcdb>
swap_join_inputs(e) 更改了外连接中的HASH JOIN 的驱动表
Q: 如何优化HASH JOIN ?
- 避免写SELECT * FROM 语句(由于HASH JOIN 将驱动表的select 检索列 和 join连接列 放到PGA中,尽可能的减少对PGA的占用,避免驱动表被溢出到临时表空间,从而提升查询性能。)
- 对表进行拆分。