今天,发现production上有一个SQL 耗了特多CPU time,于是抓了下来,看看是否可以tuning .
原始 SQL 如下(table名字改了一下):
代码:
DELETE FROM test_table_1 a
WHERE NOT EXISTS (SELECT *
FROM test_table_2 b
WHERE a.parent_id = b.id)
执行时间 60S
Buffer get: 160W
执行计划
Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop
DELETE STATEMENT Optimizer Mode=CHOOSE 1 2069
DELETE TEST_TABLE_1
HASH JOIN ANTI 1 26 2069
TABLE ACCESS FULL TEST_TABLE_1 5 M 106 M 1210
TABLE ACCESS FULL TEST_TABLE_2 46 K 181 K 48
其中
test_table_1 row count 为 4百万
test_table_2 为 为 4万
TEST_TABLE_1上的 parent_id上有index.
...
第一种变换(not in)
现在来试一下变换
用not in 代替 not exists
代码:
DELETE FROM test_table_1 a
WHERE parent_id NOT IN (SELECT id
FROM test_table_2 b)
..
在9i 里面,Oracle把这个语句和前面的语句看为同一个语句,当然执行计划和执行时间也没有差别。
第二种变换(minus)
第二种变换(minus)
用minus,
把 not exists 部分先用minus去掉,再用in 来选择data。
代码:
DELETE FROM test_table_1 a
WHERE parent_id IN (SELECT parent_id
FROM test_table_1
MINUS
SELECT ID
FROM test_table_2 b)
执行时间 : 8S
Buffer get :20000
执行计划
Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop
DELETE STATEMENT Optimizer Mode=CHOOSE 2 G 23052
DELETE TEST_TABLE_1
HASH JOIN 2 G 89G 23052
VIEW SYS.VW_NSO_1 5 M 63 M 5488
MINUS
SORT UNIQUE 5 M 19 M
INDEX FAST FULL SCAN TEST_INDX_1 5 M 19 M 457
SORT UNIQUE 46 K 181 K
TABLE ACCESS FULL TEST_TABLE_2 46 K 181 K 48
TABLE ACCESS FULL TEST_TABLE_1 5 M 106 M 1210
速度提高了不少。
...
(minus + distinct )
在 minus之前加个 distinct看看
代码:
DELETE FROM test_table_1 a
WHERE parent_id IN (SELECT distinct parent_id
FROM test_table_1
MINUS
SELECT ID
FROM test_table_2 b)
执行时间 : 8S
Buffer get :20000
执行计划
Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop
DELETE STATEMENT Optimizer Mode=CHOOSE 2 G 23052
DELETE TEST_TABLE_1
HASH JOIN 2 G 89G 23052
VIEW SYS.VW_NSO_1 5 M 63 M 5488
MINUS
SORT UNIQUE 5 M 19 M
INDEX FAST FULL SCAN TEST_INDX_1 5 M 19 M 457
SORT UNIQUE 46 K 181 K
TABLE ACCESS FULL TEST_TABLE_2 46 K 181 K 48
TABLE ACCESS FULL TEST_TABLE_1 5 M 106 M 1210
和上一个一模一样。
由于用了in 的操作符,Oracle内部就给加了个 (SORT UNIQUE )
可以用 in ,当然就可以用 exists 或者join 了(delete不好用join,只好改为SELECT)
代码:
Join
SELECT * FROM (SELECT DISTINCT parent_id
FROM TEST_TABLE_1
MINUS
SELECT id
FROM TEST_TABLE_2 b ) tmp,TEST_TABLE_1 a
WHERE tmp.parent_id = a.id
执行时间 8S
Buffer get: 20000
Operation OBJECT Name ROWS Bytes Cost OBJECT Node IN/OUT PStart PStop
SELECT STATEMENT Optimizer MODE=CHOOSE 5 M 4493
HASH JOIN 5 M 237 M 4493
VIEW 9 K 125 K 3161
MINUS
SORT UNIQUE 9 K 38 K
INDEX FAST FULL SCAN TEST_INDX_1 5 M 19 M 457
SORT UNIQUE 46 K 181 K
TABLE ACCESS FULL TEST_TABLE_2 46 K 181 K 48
TABLE ACCESS FULL TEST_TABLE_1 5 M 164 M 1210
用exists的
DELETE FROM TEST_TABLE_1 a WHERE EXISTS (SELECT 1 FROM (SELECT DISTINCT parent_id
FROM TEST_TABLE_1
MINUS
SELECT id
FROM TEST_TABLE_2 b )
WHERE parent_id = a.parent_id
执行效率惨不忍睹
...
除了上面几种,还有一种不是很常用的方法
就是(outer join + is null )(也是为了方便,先用SELECT来演示,)
代码:
select * from test_table_1 a,test_table_2 b
where a.parent_id = b.id(+)
and b.id is null
在本例中执行效率也挺差的。(超过1min)
Operation OBJECT Name ROWS Bytes Cost OBJECT Node IN/OUT PStart PStop
SELECT STATEMENT Optimizer MODE=CHOOSE 24 M 3311
FILTER
HASH JOIN OUTER
TABLE ACCESS FULL TEST_TABLE_1 5 M 164 M 1210
TABLE ACCESS FULL TEST_TABLE_2 46 K 5 M 48 <
代码:
加个 distinct 看看
SELECT a.* FROM TEST_TABLE_2 b ,(SELECT DISTINCT parentid FROM TEST_TABLE_1) a
WHERE a.parentid = b.id(+)
AND b.id IS NULL
执行时间:1S
Buffer gets : 1000
Operation OBJECT Name ROWS Bytes Cost OBJECT Node IN/OUT PStart PStop
SELECT STATEMENT Optimizer MODE=CHOOSE 47 K 3117
FILTER
HASH JOIN OUTER
VIEW 9 K 125 K 3067
SORT UNIQUE 9 K 38 K 3067
INDEX FAST FULL SCAN TEST_INDX_1 5 M 19 M 457
TABLE ACCESS FULL TEST_TABLE_2 46 K 181 K 48
...
看来最后一种是最有效哦
改成DELETE形式看看
代码:
DELETE FROM TEST_TABLE_1 WHERE parentid IN (
SELEC parentid FROM TEST_TABLE_2 b ,(SELECT DISTINCT parentid FROM TEST_TABLE_1) a
WHERE a.parentid = b.id(+)
AND b.id IS NULL
)
成功了,执行时间 1S (实际上是 800-900ms)
Buffer_gets :1000
执行计划
Operation OBJECT Name ROWS Bytes Cost OBJECT Node IN/OUT PStart PStop
DELETE STATEMENT Optimizer MODE=CHOOSE 6 K 119
DELETE TEST_TABLE_1
HASH JOIN SEMI 6 K 171 K 119
TABLE ACCESS FULL TEST_TABLE_1 6 K 91 K 26
VIEW SYS.VW_NSO_1 2 M 30 M 71
FILTER
HASH JOIN OUTER
INDEX FAST FULL SCAN TEST_INDX_1 6 K 18 K 23
TABLE ACCESS FULL TEST_TABLE_2 46 K 136 K 47
...
quote:
最初由 rollingpig 发布
在 9i 里面,in 和 exists已经基本一样,Oracle会在内部自动作转换
倒是 我提到的其中几个转换值的深究
1。in / exists ==> join
2。Add distinct in IN Clause or subquery
3. Add condition where rownum < 2 in exists clause (这个忘了演示了)
4, not exists / not in ==> outrt join + is null condition
甚至,可以反过来转换,有时候也有意想不到的效果。
某些时候
1。join ==> in / exists
2. 去掉某些distinct
3. outrt join + is null condition ==> not exists / not in
正确的应该是
outer join + is null condition
如
代码:
select * from a
where a.col1 not in (select col2 from b)
==>
select a.* from a,b
where a.col1 = b.col2(+)
and col2 is null
..