SORT MERGE JOIN 主要用于两表非等值关联。 例如 >,>=,<,<=,<>
但不能用于 instr,substr,like,regexp_like关联(只能走NL)
SORT MERGE JOIN : 两表关联,先对两表根据连接列进行排序,将较小的表作为驱动表,然后从驱动表中取出连接列的值。去已完成排序的被驱动表中进行数据匹配,成功匹配,表明关联成功。驱动表返回多少行,被驱动表就会被匹配多少次。与NL相似,但NL是从被驱动表的索引中进行数据匹配。而SORT MERGE JOIN 是在内存(PGA)中进行数据匹配。
scott@orclpdb1:orclcdb> select /*+gather_plan_statistics*/
2 e.ename, e.job, d.dname
from emp e, dept d
4 where e.deptno >= d.deptno;
ENAME JOB DNAME
---------- --------- --------------
CLARK MANAGER ACCOUNTING
KING PRESIDENT ACCOUNTING
MILLER CLERK ACCOUNTING
JONES MANAGER ACCOUNTING
FORD ANALYST ACCOUNTING
ADAMS CLERK ACCOUNTING
SMITH CLERK ACCOUNTING
SCOTT ANALYST ACCOUNTING
WARD SALESMAN ACCOUNTING
TURNER SALESMAN ACCOUNTING
ALLEN SALESMAN ACCOUNTING
JAMES CLERK ACCOUNTING
BLAKE MANAGER ACCOUNTING
MARTIN SALESMAN ACCOUNTING
JONES MANAGER RESEARCH
FORD ANALYST RESEARCH
ADAMS CLERK RESEARCH
SMITH CLERK RESEARCH
SCOTT ANALYST RESEARCH
WARD SALESMAN RESEARCH
TURNER SALESMAN RESEARCH
ALLEN SALESMAN RESEARCH
JAMES CLERK RESEARCH
BLAKE MANAGER RESEARCH
MARTIN SALESMAN RESEARCH
WARD SALESMAN SALES
TURNER SALESMAN SALES
ALLEN SALESMAN SALES
JAMES CLERK SALES
BLAKE MANAGER SALES
MARTIN SALESMAN SALES
31 rows selected.
scott@orclpdb1:orclcdb> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID g38n2ng1g2jma, child number 0
-------------------------------------
select /*+gather_plan_statistics*/ e.ename, e.job, d.dname from emp
e, dept d where e.deptno >= d.deptno
Plan hash value: 844388907
-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 31 |00:00:00.01 | 14 | | | |
| 1 | MERGE JOIN | | 1 | 28 | 31 |00:00:00.01 | 14 | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| DEPT | 1 | 4 | 4 |00:00:00.01 | 8 | | | |
| 3 | INDEX FULL SCAN | PK_DEPT | 1 | 4 | 4 |00:00:00.01 | 4 | | | |
|* 4 | SORT JOIN | | 4 | 14 | 31 |00:00:00.01 | 6 | 2048 | 2048 | 2048 (0)|
| 5 | TABLE ACCESS FULL | EMP | 1 | 14 | 14 |00:00:00.01 | 6 | | | |
-----------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("E"."DEPTNO">="D"."DEPTNO")
filter("E"."DEPTNO">="D"."DEPTNO")
24 rows selected.
scott@orclpdb1:orclcdb>
从上述执行计划中可知道,DEPT为驱动表,EMP为被驱动表。驱动表扫描一次,被驱动表也扫描一次。如下图标注所示

因DEPT 走INDEX FULL SCAN,INDEX FULL SCAN返回的数据是有序的,故DEPT无须排序。EMP走TABLE ACCESS FULL,返回数据是无序的,故此EMP在PGA中进行排序。
实际工作中一定要注意INDEX FULL SCAN返回的数据量,若INDEX FULL SCAN返回数据行数过多(INDEX FULL SCAN产生了回表,INDEX FULL SCAN单块读,回表也是单块读,应该走全表扫描性能更佳),则应强制更改执行计划,走全表扫描。
故此将DEPT表强制走全表扫描,执行计划如下:
scott@orclpdb1:orclcdb> select /*+ full(d)*/
2 e.ename, e.job, d.dname
3 from emp e, dept d
4 where e.deptno >= d.deptno;
ENAME JOB DNAME
---------- --------- --------------
CLARK MANAGER ACCOUNTING
KING PRESIDENT ACCOUNTING
MILLER CLERK ACCOUNTING
JONES MANAGER ACCOUNTING
FORD ANALYST ACCOUNTING
ADAMS CLERK ACCOUNTING
SMITH CLERK ACCOUNTING
SCOTT ANALYST ACCOUNTING
WARD SALESMAN ACCOUNTING
TURNER SALESMAN ACCOUNTING
ALLEN SALESMAN ACCOUNTING
JAMES CLERK ACCOUNTING
BLAKE MANAGER ACCOUNTING
MARTIN SALESMAN ACCOUNTING
JONES MANAGER RESEARCH
FORD ANALYST RESEARCH
ADAMS CLERK RESEARCH
SMITH CLERK RESEARCH
SCOTT ANALYST RESEARCH
WARD SALESMAN RESEARCH
TURNER SALESMAN RESEARCH
ALLEN SALESMAN RESEARCH
JAMES CLERK RESEARCH
BLAKE MANAGER RESEARCH
MARTIN SALESMAN RESEARCH
WARD SALESMAN SALES
TURNER SALESMAN SALES
ALLEN SALESMAN SALES
JAMES CLERK SALES
BLAKE MANAGER SALES
MARTIN SALESMAN SALES
31 rows selected.
scott@orclpdb1:orclcdb> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID f6p4ampywwx9n, child number 0
-------------------------------------
select /*+ full(d)*/ e.ename, e.job, d.dname from emp e, dept d
where e.deptno >= d.deptno
Plan hash value: 1407029907
------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | |
| 1 | MERGE JOIN | | 28 | | | |
| 2 | SORT JOIN | | 4 | 2048 | 2048 | 2048 (0)|
| 3 | TABLE ACCESS FULL| DEPT | 4 | | | |
|* 4 | SORT JOIN | | 14 | 2048 | 2048 | 2048 (0)|
| 5 | TABLE ACCESS FULL| EMP | 14 | | | |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("E"."DEPTNO">="D"."DEPTNO")
filter("E"."DEPTNO">="D"."DEPTNO")
Note
-----
- Warning: basic plan statistics not available. These are only collected when:
* hint 'gather_plan_statistics' is used for the statement or
* parameter 'statistics_level' is set to 'ALL', at session or system level
30 rows selected.
scott@orclpdb1:orclcdb>
若两表是等值关联,不建议走SORT MERGE JOIN,由于SORT MERGE JOIN将两表都放入PGA中,消耗大量的PGA。 而HASH JOIN 仅需将驱动表放入PGA中,SORT MERGE JOIN 比HASH JOIN 消耗更多的PGA资源。即便SORT MERGE JOIN 中有一个表走INDEX FULL SCAN,另外一个表也需要放入PGA中,放入PGA中的表往往是大表。若走HASH JOIN ,大表会被作为被驱动表,是不会被放入PGA中。故此两表等值连接时,要么走NL(SQL返回行的数据量少),要么走HASH JOIN (SQL返回数据行数据量多),一般情况不走SORT MERGE JOIN。
Q: 如何优化SORT MERGE JOIN?
- 若两表关联为等值连接,走的是SORT MERGE JOIN,则可以让其走HASH JOIN .
- 若两表关联为非等值连接,像>,>=,<,<=,<>, 可以先从业务下手,将非等值转换为等值。若无法进行非等值对等值的转换,则可以考虑增加过滤条件,将两表的数据量减少,外加开启并行查询加速SQL执行速度。
| Table connection way | Driver Table | PGA | Output Collection | non-equivalence connection | drivred table return rows be scaned |
| NL | 有 | 不消耗 | 少 | support | 等于驱动表返回的行数 |
| HASH JOIN | 有 | 消耗 | 多 | don't support | 1 |
| SORT MERGE JOIN | 无 | 消耗 | 多 | support | 1 |
305

被折叠的 条评论
为什么被折叠?



