一、适合hash join的情况
1) SQL演示:
SQL> create table tb as select * from dba_objects;
表已创建。//创建大表
SQL> create table ts as select * from dba_objects where rownum<100;
表已创建。//创建小表
SQL> select tb.* from tb,ts where tb.object_id=ts.object_id;
已选择99行。
执行计划
----------------------------------------------------------
Plan hash value: 2843080966
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99 | 18810 | 175 (4)| 00:00:03 |
|* 1 | HASH JOIN | | 99 | 18810 | 175 (4)| 00:00:03 |
| 2 | TABLE ACCESS FULL| TS | 99 | 1287 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TB | 51783 | 8950K| 171 (3)| 00:00:03 |
---------------------------------------------------------------------------
// 分别对大表和小表进行全表扫描,在内存中构造两个hash结构,代价分别为171(大表)和3(小表 )。构造了hash结构后,后面关联的时候 代价就非常小了,每次匹配都可以直接根据键值寻址访问判定,代价增加约为1,最后总的代价是:171+3+1=175.
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TB"."OBJECT_ID"="TS"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
746 consistent gets / /一致性读比较低
735 physical reads
0 redo size
5173 bytes sent via SQL*Net to client
466 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
99 rows processed
SQL> select /*+ use_nl(tb ts) */ tb.* from tb,ts where tb.object_id=ts.object_id;
已选择99行。
执行计划
----------------------------------------------------------
Plan hash value: 3711445679
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99 | 18810 | 16717 (3)| 00:03:21 |
| 1 | NESTED LOOPS | | 99 | 18810 | 16717 (3)| 00:03:21 |
| 2 | TABLE ACCESS FULL| TS | 99 | 1287 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| TB | 1 | 177 | 169 (3)| 00:00:03 |
---------------------------------------------------------------------------
// 以小表作驱动表,逐行到大表中寻找是否有匹配的记录,由于两个表上都没有索引,无论是驱动表还是inner table都是走全表扫描,驱动表(小表)共有99行,故循环99次,每次循环都要对inner table(大表)进行全表扫描寻找是否有匹配的记录,每次循环内表扫描的代价为169,CBO预计每次循环内表返回一条记录;169×99=16731,在加上其他一些因素(例如CPU处理时间等)的考虑,故最后总的代价为16717(与上面计算16731在同一数量级,而且非常接近)。
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("TB"."OBJECT_ID"="TS"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
72880 consistent gets // 一致性读相当高,7万多。
0 physical reads
0 redo size
5173 bytes sent via SQL*Net to client
466 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
99 rows processed
2)分析说明:
Ø 使用dba_objects构造分别构造大表TB(7万多行)和小表TS(99行),两个表上均无索引。
Ø 执行等值查询;
Ø CBO默认走hash join,通过上面与nested loop join的执行计划对比可以看出,hash join 的执行效率确实比较高;具体原因可能是,hash join在第一次扫描两个表并构造了hash 结构之后,扫描驱动表并匹配的代价非常低,虽然在把大表构造成hash 结构时消耗了一定的代价,但这个还是值得。
Ø 每次进行实验前最好先用“ALTER SYSTEM FLUSH BUFFER_CACHE;”把缓存区清空,若连续执行多次同一个SQL,会发现第二次执行计算的代价与第一执行的代价略有不同,可能跟第二次执行时不需要再进行物理读有关。
二、适合nested loop join的
1) SQL演示:
SQL> create table tb as select * from dba_objects;
表已创建。//创建大表
SQL> create table ts as select * from dba_objects where rownum<100;
表已创建。//创建小表
SQL> create index idx_tb on tb(object_id);
索引已创建。//在大表上创建索引
SQL> select * from tb,ts where tb.object_id=ts.object_id;
执行计划
----------------------------------------------------------
Plan hash value: 2538607451
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99 | 40986 | 201 (0)| 00:00:03 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 99 | 40986 | 201 (0)| 00:00:03 |
| 3 | TABLE ACCESS FULL | TS | 99 | 20493 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IDX_TB | 1 | | 1 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TB | 1 | 207 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
// 以小表作驱动表,逐行到大表中寻找是否有匹配的记录,大表在关联键上有索引,所以在匹配时走INDEX RANGE SCAN,每匹配一行的代价为2,(主要看TABLE ACCESS BY INDEX ROWID 的代价),故总的代价为99×2+3=198+3=201
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("TB"."OBJECT_ID"="TS"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement (level=2)
SQL> select /*+ use_hash(tb ts) */ * from tb,ts where tb.object_id=ts.object_id;
执行计划
----------------------------------------------------------
Plan hash value: 2843080966
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99 | 40986 | 296 (2)| 00:00:04 |
|* 1 | HASH JOIN | | 99 | 40986 | 296 (2)| 00:00:04 |
| 2 | TABLE ACCESS FULL| TS | 99 | 20493 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TB | 69437 | 13M| 292 (2)| 00:00:04 |
---------------------------------------------------------------------------
//对大表进行hash构造,消耗代价292,292+3=295;再综合其他因素例如cpu计算时间等,最终的代价为296,这个SQL进行hash join的代价比nested loop的代价(201)高
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TB"."OBJECT_ID"="TS"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement (level=2)
2)分析说明:
使用dba_objects构造分别构造大表TB(7万多行)和小表TS(99行),并在大表上建立索引。
在大表的索引键上执行等值查询;
CBO默认走nested loop join,通过上面与hash join的执行计划对比可以看出,nested loop join 的执行效率确实比较高一些;具体原因可能是,内表在索引键上建立了索引,nested loop join的每次循环中,扫描内表时可以走索引,大大的降低了每次循环匹配的消耗,即使嵌套循环了,总体代价还是比在内存中构造大表的hash结构的代价更低一些
这个实验在10g上并不能测试成功,在10g上同样的sql默认走的是hash join。在10g中NL的适用场景又是什么呢?
(下面是11g中的应用场景
关联中一个表比较小
被关联表的关联字段上没有索引
索引的键值的重复率不应该很高)
三、适合merge join的??
1) SQL演示:
SQL> create table t1 as select * from dba_objects;
表已创建。
SQL> create table t2 as select * from dba_objects;
表已创建。
SQL> select a.* from (select * from t1 where object_id<100 order by object_id)a,(select * from t2 where object_id<100 order by object_id)b where a.object_id>50;
已选择4802行。
执行计划
----------------------------------------------------------
Plan hash value: 3344246350
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 140 | 30800 | 3765 (1)| 00:00:46 |
| 1 | SORT ORDER BY | | 140 | 30800 | 3765 (1)| 00:00:46 |
| 2 | MERGE JOIN CARTESIAN| | 140 | 30800 | 3764 (1)| 00:00:46 |
| 3 | VIEW | | 12 | 2484 | 292 (2)| 00:00:04 |
| 4 | SORT ORDER BY | | 12 | 2484 | 292 (2)| 00:00:04 |
|* 5 | TABLE ACCESS FULL| T1 | 12 | 2484 | 291 (1)| 00:00:04 |
| 6 | BUFFER SORT | | 12 | 156 | 3765 (1)| 00:00:46 |
|* 7 | TABLE ACCESS FULL | T2 | 12 | 156 | 289 (1)| 00:00:04 |
//CBO默认走Merge join ,两个表均预先排序,排序时走全表扫描,代价均接近300,排序的代价比较大(3765),但排序后连接的代价增加得很小(最终返回的Cost也是3765)
其实不是能看懂这个执行计划的cost是如何累加的,希望老师能找个时间讲解一下。^_^
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter("OBJECT_ID"<100 AND "T1"."OBJECT_ID">50)
7 - filter("OBJECT_ID"<100)
Note
-----
- dynamic sampling used for this statement (level=2)
统计信息
----------------------------------------------------------
372 recursive calls
0 db block gets
2252 consistent gets
2070 physical reads //一致性读和物理读的次数均为2000多。
0 redo size
194177 bytes sent via SQL*Net to client
4039 bytes received via SQL*Net from client
322 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
4802 rows processed
SQL> select /*+ use_nl(a b) */ a.* from (select * from t1 where object_id<100 order by object_id)a,(select * from t2 where object_id<100 order by object_id)b where a.object_id>50;
已选择4802行。
执行计划
----------------------------------------------------------
Plan hash value: 1185711386
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 140 | 30800 | 3765 (1)| 00:00:46 |
| 1 | SORT ORDER BY | | 140 | 30800 | 3765 (1)| 00:00:46 |
| 2 | NESTED LOOPS | | 140 | 30800 | 3764 (1)| 00:00:46 |
| 3 | VIEW | | 12 | 2484 | 292 (2)| 00:00:04 |
| 4 | SORT ORDER BY | | 12 | 2484 | 292 (2)| 00:00:04 |
|* 5 | TABLE ACCESS FULL| T1 | 12 | 2484 | 291 (1)| 00:00:04 |
|* 6 | TABLE ACCESS FULL | T2 | 12 | 156 | 289 (1)| 00:00:04 |
//使用hints强制走nested loop join,同样是先全表扫描,全表扫的代价跟上面的差不多;这里应该是T1做驱动表,T1预计返回12行,循环12次,计算所得的代价约为 12×289+291=3468+291=3759,再加上其他因素影响,最终计算的cost为3765,跟上面的推算相当接近了。
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter("OBJECT_ID"<100 AND "T1"."OBJECT_ID">50)
6 - filter("OBJECT_ID"<100)
Note
-----
- dynamic sampling used for this statement (level=2)
统计信息
----------------------------------------------------------
7 recursive calls
0 db block gets
52136 consistent gets //一致性读为5万多,比merge join的数量级高很多
0 physical reads
0 redo size
194177 bytes sent via SQL*Net to client
4039 bytes received via SQL*Net from client
322 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
4802 rows processed
2) 疑问:
尝试构造过一些SQL,也跟群里的同学讨论过,似乎很多人的实验结果都是,merge join 跟 nested loop join 的Cost和Time都差不多,但统计信息里的一致性读,merge join更低一些,不知道这样是否已经显示merge join的效率更高了,希望老师有机会的时候能讲解一下。
(我尝试过对同样的sql 使用hash join的hints,但没有生效,未知是否因为两个表的大小几乎一样?)