哈希连接(HASH JOIN)是一种两个表在做表连接时主要依靠哈希运算来得到连接结果集的表连接方法。
对于排序合并连接,如果两个表在施加了目标SQL中指定的谓词条件后得到的结果集很大而且需要排序,则排序合并连接的执行效率一定不高;而对于嵌套循环连接,如果驱动表所对应的驱动结果集的记录数很大,即便在被驱动表的连接列上存在索引,此时使用嵌套循环连接的执行效率也会同样不高。为了解决这个问题,于是ORACLE引进了哈希连接。在ORACLE 10g及其以后的版本中,优化器 (实际上是CBO,因为哈希连接仅适用于CBO)在解析目标SQL的时候是否考虑哈希连接受限于隐含参数_HASH_JOIN_ENABLED,默认值是TRUE.
对于哈希连接的优缺点及适用场景如下:
a,哈希连接不一定会排序,或者说大多数情况下都不需要排序
b,哈希连接的驱动表所对应的连接列的选择性尽可能好。
c,哈希只能用于CBO,而且只能用于等值连接的条件。(即使是哈希反连接,ORACLE实际上也是将其换成等值连接)。
c,哈希连接很适用小表和大表之间做连接且连接结果集的记录数较多的情形,特别是小表的选择性非常好的情况下,这个时候哈希连接的执行时间就可以近似看做和全表扫描个个大表的费用时间相当。
e,当两个哈希连接的时候,如果在施加了目标SQL中指定的谓词条件后得到的数据量较小的那个结果集所对应的HASH TABLE能够完全被容纳在内存中(PGA的工作区),此时的哈希连接的执行效率非常高。
oracle表之间的连接之哈希连接(Hash Join),其特点如下:
1,驱动表和被驱动表都是最多只被访问一次。
2,哈希连接的表有驱动顺序。
3,哈希表连接的表无需要排序,但是他在做连接之前做哈希运算的时候,会用到HASH_AREA_SIZE来创建哈希表。
4,哈希连接不适用于的连接条件是:不等于<>,大于>,小于<,小于等于<=,大于等于>=,like。
5,哈希连接索引列在表连接中无特殊要求,与单表情况无异。
下面我来做个实验来证实如上的结论:
具体的测试基础表请查看本人Blog 如下链接:
oracle表连接之----〉嵌套循环(Nested Loops Join)
1,驱动表和被驱动表的访问次数
测试T2表仅被访问1次
SQL> select /*+ leading(t1) use_hash(t2)*/ * from t1,t2 where t1.id=t2.t1_id;
此处省略执行的结果记录集
SQL> select sql_id, child_number, sql_text from v$sql where sql_text like '%use_hash(t2)%';
SQL_ID CHILD_NUMBER SQL_TEXT
------------- ------------ --------------------------------------------------------------------------------
7d64k5stnc3sk 0 select sql_id, child_number, sql_text from v$sql where sql_text like '%use_hash
036fyatp73h9n 0 select /*+ leading(t1) use_hash(t2)*/ * from t1,t2 where t1.id=t2.t1_id
SQL> select * from table(dbms_xplan.display_cursor('036fyatp73h9n',0,'allstats last'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID 036fyatp73h9n, child number 0
-------------------------------------
select /*+ leading(t1) use_hash(t2)*/ * from t1,t2 where t1.id=t2.t1_id
Plan hash value: 1838229974
--------------------------------------------------------------------------------
| Id | Operation | Name | Starts| E-Rows | A-Rows | A-Time | Buff
--------------------------------------------------------------------------------
|* 1 | HASH JOIN | | 1 | 100 | 100 |00:00:00.04 | 1
| 2 | TABLE ACCESS FULL| T1 | 1 | 100 | 100 |00:00:00.01 |
| 3 | TABLE ACCESS FULL| T2 | 1 | 100K| 100K|00:00:00.01 | 1
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."T1_ID")
Note
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-----
- dynamic sampling used for this statement
23 rows selected
从上面的执行计划可以看出,HASH连接中,驱动表和被驱动表都只会被访问1次。
下面的实验来证明驱动表和被驱动表都是被访问0次。
SQL> select /*+ leading(t1) use_hash(t2)*/ * from t1,t2 where t1.id=t2.t1_id and 1=2;
ID NUM INFORMATION ID T1_ID NUM INFORMATION
---------- ---------- -------------------------------------------------------------------------------- ---------- ---------- ---------- --------------------------------------------------------------------------------
SQL> select sql_id, child_number, sql_text from v$sql where sql_text like '%use_hash(t2)%';
SQL_ID CHILD_NUMBER SQL_TEXT
------------- ------------ --------------------------------------------------------------------------------
7d64k5stnc3sk 0 select sql_id, child_number, sql_text from v$sql where sql_text like '%use_hash
cknub2x1sx8tn 0 select /*+ leading(t1) use_hash(t2)*/ * from t1,t2 where t1.id=t2.t1_id and 1=2
2jhn0mg57v1tz 0 select sql_id, child_number, sql_text from v$sql where sql_text like '%use_hash
036fyatp73h9n 0 select /*+ leading(t1) use_hash(t2)*/ * from t1,t2 where t1.id=t2.t1_id
SQL> select * from table(dbms_xplan.display_cursor('cknub2x1sx8tn',0,'allstats last'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID cknub2x1sx8tn, child number 0
-------------------------------------
select /*+ leading(t1) use_hash(t2)*/ * from t1,t2 where t1.id=t2.t1_id and 1=2
Plan hash value: 487071653
--------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | OM
--------------------------------------------------------------------------------
|* 1 | FILTER | | 1 | | 0 |00:00:00.01 |
|* 2 | HASH JOIN | | 0 | 100 | 0 |00:00:00.01 | 7
| 3 | TABLE ACCESS FULL| T1 | 0 | 100 | 0 |00:00:00.01 |
| 4 | TABLE ACCESS FULL| T2 | 0 | 100K| 0 |00:00:00.01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NULL IS NOT NULL)
2 - access("T1"."ID"="T2"."T1_ID")
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement
25 rows selected
通过上面的两个执行计划可以得出结论:HASH连接中,驱动表和被驱动表都只会被访问1次或者0次。
2,哈希连接的表的驱动顺序
下面是T1作为驱动表
SQL〉select /*+ leading(t1) use_hash(t2)*/ * from t1,t2 where t1.id=t2.t1_id and t1.num=20;
Elapsed: 00:00:00.23
SQL> select sql_id,child_number,sql_text from v$sql where sql_text like '%leading(t1) use_hash(t2)%';
SQL_ID CHILD_NUMBER
------------- ------------
SQL_TEXT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0r4hzmyfz74gh 0
select /*+ leading(t1) use_hash(t2)*/ * from t1,t2 where t1.id=t2.t1_id and t1.num=20
8wyc5rgupmsrc 0
select sql_id,child_number,sql_text from v$sql where sql_text like '%leading(t1) use_hash(t2)%'
Elapsed: 00:00:00.01
SQL> select * from table(dbms_xplan.display_cursor('0r4hzmyfz74gh',0,'allstats last'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 0r4hzmyfz74gh, child number 0
-------------------------------------
select /*+ leading(t1) use_hash(t2)*/ * from t1,t2 where t1.id=t2.t1_id and t1.num=20
Plan hash value: 1838229974
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|* 1 | HASH JOIN | | 1 | 1 | 1 |00:00:00.22 | 3480 | 764K| 764K| 299K (0)|
|* 2 | TABLE ACCESS FULL| T1 | 1 | 1 | 1 |00:00:00.01 | 6 | | | |
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 3 | TABLE ACCESS FULL| T2 | 1 | 87396 | 100K|00:00:00.01 | 3457 | | | |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."T1_ID")
2 - filter("T1"."NUM"=20)
Note
-----
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- dynamic sampling used for this statement
24 rows selected.
Elapsed: 00:00:00.01
下面是T2作为驱动表
SQL〉select /*+ leading(t2) use_hash(t1)*/ * from t1,t2 where t1.id=t2.t1_id and t1.num=20;
Elapsed: 00:00:01.27
SQL> select * from table(dbms_xplan.display_cursor('090z3a8qh9mrk',0,'allstats last'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 090z3a8qh9mrk, child number 0
-------------------------------------
select /*+ leading(t2) use_hash(t1)*/ * from t1,t2 where t1.id=t2.t1_id and t1.num=20
Plan hash value: 2959412835
--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
--------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | HASH JOIN | | 1 | 1 | 1 |00:00:01.23 | 3545 | 1072 | 1071 | 14M| 2028K| 18M (0)| 9216 |
| 2 | TABLE ACCESS FULL| T2 | 1 | 87396 | 100K|00:00:00.01 | 3456 | 0 | 0 | | | | |
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 3 | TABLE ACCESS FULL| T1 | 1 | 1 | 1 |00:00:00.01 | 7 | 0 | 0 | | | | |
--------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."T1_ID")
3 - filter("T1"."NUM"=20)
Note
-----
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- dynamic sampling used for this statement
24 rows selected.
Elapsed: 00:00:00.01
从上面的执行计划可以看出,T1表作为驱动表的时候,执行时间是Elapsed: 00:00:00.23,used_mem是299K,而T2表作为驱动表的时候,执行时间是
Elapsed: 00:00:01.27,used_mem是18M。
结论:哈希连接中驱动表的顺序是非常重要的,性能差别也大。
3,哈希连接不适用于的连接条件是:不等于<>,大于>,小于<,小于等于<=,大于等于>=,like。
具体实验如下:
SQL> explain plan for select /*+ leading(t2) use_hash(t1)*/ * from t1,t2 where t1.id<>t2.t1_id and t1.num=20;SQL>
Explained.
Elapsed: 00:00:01.16
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4016936828
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99999 | 21M| 82709 (1)| 00:15:10 |
| 1 | NESTED LOOPS | | 99999 | 21M| 82709 (1)| 00:15:10 |
| 2 | TABLE ACCESS FULL| T2 | 100K| 10M| 710 (1)| 00:00:08 |
|* 3 | TABLE ACCESS FULL| T1 | 1 | 107 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("T1"."NUM"=20 AND "T1"."ID"<>"T2"."T1_ID")
15 rows selected.
Elapsed: 00:00:01.89
从上面可以看出执行计划走的是NESTED LOOPS JOIN。而不是HINT指定的HASH ,按照上面的方法可以证明哈希连接不支持不等值连接不等于<>,大于>,小于<,小于等于<=,大于等于>=,like。