IN vs. EXISTS: IN is achieved by rewriting the SQL, EXISTS is rewritten as a PL/SQL block

Tom:

Can you give me some examples of situations in which
IN is better than EXISTS, and vice versa?

and we said...

Well, the two are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select * 
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to 
the original table -- typically.
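
To see why the DISTINCT matters in that rewrite, here is a small sketch. The tables T1(x) and T2(y) follow the names used above, but the rows are made up purely for illustration and are not from the original answer:

```sql
-- Illustrative data only: T2 deliberately holds a duplicate value.
create table t1 ( x number );
create table t2 ( y number );

insert into t1 values ( 1 );
insert into t1 values ( 2 );
insert into t2 values ( 1 );
insert into t2 values ( 1 );   -- duplicate on purpose
insert into t2 values ( 3 );
commit;

-- IN: returns the row x = 1 once, regardless of the duplicate in T2.
select * from t1 where x in ( select y from t2 );

-- Join to the distinct'ed subquery: the same single row.
select t1.*
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;

-- Join WITHOUT the distinct: x = 1 comes back twice,
-- so a plain join is not equivalent to IN.
select t1.*
  from t1, t2
 where t1.x = t2.y;
```

The first two queries return the same rows. The third shows what the DISTINCT step protects against.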


As opposed to 

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:


   for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x ) )
      then 
         OUTPUT THE RECORD
      end if
   end loop

It always results in a full scan of T1 whereas the first query can make use of an index 
on T1(x).
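
The loop above is pseudocode. A runnable PL/SQL approximation might look like the following. This is only a sketch of the idea, not what the database does internally. It assumes tables T1(x) and T2(y) already exist and that SERVEROUTPUT is enabled in SQL*Plus. The variable l_cnt and the COUNT(*) with ROWNUM = 1 probe are illustrative choices, not part of the original answer:

```sql
-- Assumes T1(x) and T2(y) exist; in SQL*Plus, run SET SERVEROUTPUT ON first.
declare
   l_cnt number;   -- 1 if a matching T2 row was found, 0 otherwise
begin
   -- Full scan of T1: every row is visited once.
   for r in ( select * from t1 )
   loop
      -- Probe T2 for the current T1 row; ROWNUM = 1 stops at the first match.
      select count(*)
        into l_cnt
        from t2
       where t2.y = r.x
         and rownum = 1;

      if ( l_cnt > 0 )
      then
         -- "OUTPUT THE RECORD"
         dbms_output.put_line( r.x );
      end if;
   end loop;
end;
/
```

With an index on T2(y), each probe is a cheap index lookup, so the cost is dominated by the number of rows in T1.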


So, when is WHERE EXISTS appropriate and when is IN appropriate?

Let's say the result of the subquery
    ( select y from T2 )

is "huge" and takes a long time to produce, but the table T1 is relatively small and executing
( select null from t2 where y = x.x ) is very fast (there is a nice index on t2(y)).  Then the
EXISTS will be faster, since the time to full scan T1 and do the index probes into T2 could be
less than the time to simply full scan T2 to build the subquery result we need to distinct on.


Let's say the result of the subquery is small -- then IN is typically more appropriate.


If both the subquery and the outer table are huge -- either might work as well as the 
other -- depends on the indexes and other factors.

Tom

Can you illustrate your answer using the EMP and DEPT tables, by
1. increasing and decreasing the number of rows in both tables, and
2. dropping and adding indexes on both tables?

That way your explanation will be vindicated.

Can you do that please.

Followup:

Well -- I'm not going to use EMP and DEPT, as I would have to generate tons of EMP and DEPT data to 
illustrate my point (feel free to do that if you want to ;)

I'll use BIG and SMALL to make the point. I ran:

rem create table big as select * from all_objects;
rem insert /*+ append */ into big select * from big;
rem commit;
rem insert /*+ append */ into big select * from big;
rem commit;
rem insert /*+ append */ into big select * from big;
rem create index big_idx on big(object_id);
rem
rem
rem create table small as select * from all_objects where rownum < 100;
rem create index small_idx on small(object_id);
rem
rem analyze table big compute statistics
rem for table
rem for all indexes
rem for all indexed columns
rem /
rem analyze table small compute statistics
rem for table
rem for all indexes
rem for all indexed columns
rem /

So SMALL has 99 rows and BIG has 133,000+.
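
The timing and plan reports in this follow-up look like TKPROF output. One way to gather comparable numbers yourself is sketched below, assuming a SQL*Plus session connected as the owner of BIG and SMALL. The trace file name, report name, and login on the tkprof line are placeholders, not values from the original post:

```sql
-- Assumption: run from SQL*Plus as the owner of BIG and SMALL.
alter session set timed_statistics = true;
alter session set sql_trace = true;

select count(subobject_name)
  from big
 where object_id in ( select object_id from small );

select count(subobject_name)
  from big
 where exists ( select null from small where small.object_id = big.object_id );

alter session set sql_trace = false;

-- Then, from the operating system prompt, format the raw trace file.
-- File names and login below are placeholders:
--   tkprof <your_trace_file>.trc report.txt explain=<user>/<password>
```

Your absolute numbers will differ with data volume, release, and optimizer settings. What matters is the relative difference between the two forms.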

select count(subobject_name)
  from big
 where object_id in ( select object_id from small )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.02       0.02          0        993          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.03       0.03          0        993          0           1

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
    792    MERGE JOIN
    100     SORT (JOIN)
    100      VIEW OF 'VW_NSO_1'
     99       SORT (UNIQUE)
    792        INDEX   GOAL: ANALYZED (FULL SCAN) OF 'SMALL_IDX'
                   (NON-UNIQUE)
    891     SORT (JOIN)
      0      TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'BIG'


versus:

select count(subobject_name)
  from big
 where exists ( select null from small where small.object_id = big.object_id )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      4.12       4.12          0     135356         15           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      4.12       4.12          0     135356         15           1

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
    792    FILTER
 135297     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'BIG'
 133504     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'SMALL_IDX'
                (NON-UNIQUE)

That shows that if the outer query is "big" and the inner query is "small", IN is generally more 
efficient than EXISTS.

Now:

select count(subobject_name)
  from small
 where object_id in ( select object_id from big )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.51       0.82         50        298         22           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.52       0.83         50        298         22           1



Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     99    MERGE JOIN
  16913     SORT (JOIN)
  16912      VIEW OF 'VW_NSO_1'
  16912       SORT (UNIQUE)
 135296        INDEX   GOAL: ANALYZED (FAST FULL SCAN) OF 'BIG_IDX'
                   (NON-UNIQUE)
     99     SORT (JOIN)
     99      TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'SMALL'


versus:
select count(subobject_name)
  from small
 where exists ( select null from big where small.object_id = big.object_id )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.01       0.01          0        204         12           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.01          0        204         12           1

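Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)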
     99    FILTER
    100     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'SMALL'
     99     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'BIG_IDX' (NON-UNIQUE)

That shows that if the outer query is "small" and the inner query is "big", a WHERE EXISTS can be 
quite efficient.


From the ITPUB blog, link: http://blog.itpub.net/412/viewspace-899138/. If you repost this, please credit the source; otherwise legal liability will be pursued.