半连接是指两个数据集之间的一种特殊连接类型,驱动数据集中返回的数据行会根据被驱动集出现或不出现至少一个相匹配的数据行来确定返回至多一条或不返回数据行。也就是说半连接和普通的内连接的区别就是,半连接会去重。
半连接是一个可以提升某些查询性能的优化方法,这里总结几点使用半连接的必要条件。
1、语句在IN(=ANY)或EXISTS中有子查询(能够做子查询展开)。
2、使用EXISTS语法时为了使得到的结果有意义,要使用相关的子查询。
3、IN和EXISTS子句不能包含于OR分支中。
注:这里的代码清单里所涉及的表是HR Sample Schema的表。
代码清单1-1 IN的半连接与普通连接
- select count(DEPARTMENT_ID)
- from DEPARTMENTS
- where DEPARTMENT_ID in (select DEPARTMENT_ID from EMPLOYEES);
-
- COUNT(DEPARTMENT_ID)
- --------------------
- 11
----------------------------------------------------------
Plan hash value: 1973601309
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | NESTED LOOPS SEMI| | 10 | 70 | 1 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | DEPT_ID_PK | 27 | 108 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN| EMP_DEPARTMENT_IX | 41 | 123 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
- select count(dept.DEPARTMENT_ID) from DEPARTMENTS dept,EMPLOYEES emp
- where dept.DEPARTMENT_ID = emp.DEPARTMENT_ID;
-
- COUNT(DEPT.DEPARTMENT_ID)
- -------------------------
- 106
----------------------------------------------------------
Plan hash value: 2265163291
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | INDEX FULL SCAN| EMP_DEPARTMENT_IX | 106 | 318 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("EMP"."DEPARTMENT_ID" IS NOT NULL)
==>上述代码中,有两点需要注意。
1、使用IN的代码中的执行计划步骤2中的关键字‘NESTED LOOPS SEMI’,就是半连接的提现。此外,这里做了子查询嵌套,这里不详细说明。
2、半连接和普通连接的返回值是11:106,说明这里的半连接的确有去重(即在被驱动集中找到一条满足的记录后,立马停止搜索,不管被驱动集中后面重复的记录直接返回驱动集中的记录), 这里的普通连接可以试着使用DISTINCT函数来去重。
代码清单1-2 EXITS半连接的例子及不相关子查询
- SELECT count(DEPARTMENT_ID) FROM departments
- WHERE EXISTS
- (SELECT * FROM employees
- WHERE departments.department_id = employees.department_id);
-
- COUNT(DEPARTMENT_ID)
- --------------------
- 11
----------------------------------------------------------
Plan hash value: 1973601309
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | NESTED LOOPS SEMI| | 10 | 70 | 1 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | DEPT_ID_PK | 27 | 108 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN| EMP_DEPARTMENT_IX | 41 | 123 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("DEPARTMENTS"."DEPARTMENT_ID"="EMPLOYEES"."DEPARTMENT_ID")
- SELECT count(DEPARTMENT_ID) FROM departments
- WHERE EXISTS
- (SELECT * FROM DUAL);
-
- COUNT(DEPARTMENT_ID)
- --------------------
- 27
----------------------------------------------------------
Plan hash value: 1192314091
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
|* 2 | FILTER | | | | |
| 3 | INDEX FULL SCAN| DEPT_ID_PK | 27 | 1 (0)| 00:00:01 |
| 4 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( EXISTS (SELECT 0 FROM "SYS"."DUAL" "DUAL"))
==>在这部分的代码清单中,可以看到第一条sql的执行计划步骤2中出现了关键字‘NESTED LOOPS SEMI’,这也是半连接的提现。
另外,使用IN和EXISTS执行计划是一致的,这也消除了使用IN和EXISTS对查询处理使用不同方法的观点。
第二条sql展示了非相关子查询的示例,这里相当于对departments表进行了全表扫描,但这并不是我们想要的结果。
代码清单1-3 用=ANY 替代IN和EXISTS
- select count(DEPARTMENT_ID)
- from DEPARTMENTS
- where DEPARTMENT_ID =ANY (select DEPARTMENT_ID from EMPLOYEES);
-
- COUNT(DEPARTMENT_ID)
- --------------------
- 11
----------------------------------------------------------
Plan hash value: 1973601309
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | NESTED LOOPS SEMI| | 10 | 70 | 1 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | DEPT_ID_PK | 27 | 108 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN| EMP_DEPARTMENT_IX | 41 | 123 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
==> =any其实就是IN和EXISTS的可替换写法。
代码清单1-4 使用or分支禁用半连接
- select count(DEPARTMENT_ID)
- from DEPARTMENTS
- where 1=2 or DEPARTMENT_ID in (select DEPARTMENT_ID from EMPLOYEES);
-
- COUNT(DEPARTMENT_ID)
- --------------------
- 11
----------------------------------------------------------
Plan hash value: 2957648803
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX FULL SCAN | DEPT_ID_PK | 1 | 4 | 1 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN| EMP_DEPARTMENT_IX | 2 | 6 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( EXISTS (SELECT 0 FROM "EMPLOYEES" "EMPLOYEES" WHERE
"DEPARTMENT_ID"=:B1))
3 - access("DEPARTMENT_ID"=:B1)
- SELECT count(DEPARTMENT_ID) FROM departments
- WHERE EXISTS
- (SELECT * FROM employees
- WHERE 1=2 or departments.department_id = employees.department_id);
-
- COUNT(DEPARTMENT_ID)
- --------------------
- 11
----------------------------------------------------------
Plan hash value: 2957648803
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 15 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX FULL SCAN | DEPT_ID_PK | 1 | 4 | 1 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN| EMP_DEPARTMENT_IX | 2 | 6 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter( EXISTS (SELECT 0 FROM "EMPLOYEES" "EMPLOYEES" WHERE
"EMPLOYEES"."DEPARTMENT_ID"=:B1))
3 - access("EMPLOYEES"."DEPARTMENT_ID"=:B1)
==》or 分支中使用IN和EXISTS子句,这里执行计划中没有出现‘xxx SEMI’关键字,这是未走半连接的提现。
最后介绍下如何控制半连接执行计划
1、使用提示(hint)
SEMIJOIN-进行半连接
NO_SEMIJOIN-不进行半连接
2、通过隐含参数
_always_semi_join 默认值为CHOOSE,允许优化器选择最高效的执行方法。将值设置为OFF则禁用了半连接,可在实例级别设定该参数。
end!
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/25923810/viewspace-2130719/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/25923810/viewspace-2130719/