分析日志的时候发现,一个sql执行非常慢。看执行计划是因为not in子查询走了filter执行计划。类似于下面测试的情形。
测试数据:
drop table test1;
create table test1 as select object_id,object_name from dba_objects where rownum<=1000;
create table test2 as select object_id,object_name from dba_objects where rownum<=1000;
analyze table test1 compute statistics for table for all indexes for all indexed columns;
analyze table test2 compute statistics for table for all indexes for all indexed columns;
表结构:
SQL> desc test1
Name Type Nullable Default Comments
----------- ------------- -------- ------- --------
OBJECT_ID NUMBER Y
OBJECT_NAME VARCHAR2(128) Y
SQL> desc test2
Name Type Nullable Default Comments
----------- ------------- -------- ------- --------
OBJECT_ID NUMBER Y
OBJECT_NAME VARCHAR2(128) Y
执行计划:
1、原始sql类似下面的查询
从下面执行计划来看,子查询TEST2走了filter执行计划。而我们知道filter执行计划,test1返回多少条记录,test2要扫描多少次。当test1和test2数据量都比较大的时候,效率可想而知。
SQL> select *
2 from test1
3 where object_id not in (select object_id from test2);
未选定行
执行计划
----------------------------------------------------------
Plan hash value: 3110810548
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 968 | 21296 | 66 (0)| 00:00:01 |
|* 1 | FILTER | | | | | |
| 2 | TABLE ACCESS FULL| TEST1 | 1000 | 22000 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| TEST2 | 2 | 26 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT 0 FROM "TEST2" "TEST2" WHERE
LNNVL("OBJECT_ID"<>:B1)))
3 - filter(LNNVL("OBJECT_ID"<>:B1))
统计信息
----------------------------------------------------------
0 recursive calls
0 db block gets
4091 consistent gets
0 physical reads
0 redo size
204 bytes sent via SQL*Net to client
236 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
通过
alter table test1 modify (object_id not null);
或
alter table test2 modify (object_id not null);
都走上面的filter执行计划。
2、将test1和test2的object_id都modify为not null。
下面这个执行计划为hash-anti连接。test1和test2都通过一次扫描即可搞定。
SQL> select *
2 from test1
3 where test1.object_id is not null
4 and test1.object_id not in
5 (select object_id from test2 where object_id is not null);
未选定行
执行计划
----------------------------------------------------------
Plan hash value: 2299773985
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 7 (15)| 00:00:01 |
|* 1 | HASH JOIN RIGHT ANTI| | 1 | 35 | 7 (15)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | TEST2 | 1000 | 13000 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | TEST1 | 1000 | 22000 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TEST1"."OBJECT_ID"="OBJECT_ID")
2 - filter("OBJECT_ID" IS NOT NULL)
3 - filter("TEST1"."OBJECT_ID" IS NOT NULL)
Note
-----
- dynamic sampling used for this statement
统计信息
----------------------------------------------------------
8 recursive calls
0 db block gets
26 consistent gets
0 physical reads
0 redo size
204 bytes sent via SQL*Net to client
236 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
由上面这两个测试,主要想说明一个问题:
当执行not in操作时,如果主查询和子查询关联字段,表定义都可以为null,即使表里相应字段不为null。
Oracle在解析并生成执行计划阶段,它只能通过现有统计信息去生成执行计划,而这个阶段还没有查具体的数据,所以它并不清楚字段是不是有空值,而定义该字段可以为空,那么就按可以为空算喽。
11g中,会直接走hash-anti半连接。这个算是在11g之前的一个书写规则问题吧。