wallacel -- Thanks for the question regarding "IN & EXISTS", version 8.1.6

Submitted on 24-Oct-2000 11:41 Eastern US time

Last updated 15-Mar-2008 9:37

You Asked

Tom:



Can you give me some examples of situations in which IN is better than EXISTS, and vice versa?

and we said...

Well, the two are processed very very differently.



Select * from T1 where x in ( select y from T2 )



is typically processed as:



select * 
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;



The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to the original table -- typically.
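To see why the rewrite carries a DISTINCT, here is a minimal sketch. It assumes Python's built-in sqlite3 module standing in for Oracle, so it shows only that the result sets agree, not how Oracle's optimizer actually executes either statement. The sample rows are made up, and the join selects t1.* (rather than *) so the two results can be compared directly.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Oracle; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 ( x integer );
    create table t2 ( y integer );
    insert into t1 values ( 1 ), ( 2 ), ( 3 );
    insert into t2 values ( 2 ), ( 2 ), ( 3 ), ( 4 );  -- note the duplicate 2
""")

def rows(sql):
    """Run a query and return its rows as a sorted list for easy comparison."""
    return sorted(conn.execute(sql).fetchall())

in_rows = rows("select * from t1 where x in ( select y from t2 )")

join_distinct_rows = rows("""
    select t1.*
      from t1, ( select distinct y from t2 ) t2
     where t1.x = t2.y
""")

# Without the DISTINCT, the duplicate y = 2 matches t1's row twice.
join_plain_rows = rows("select t1.* from t1, t2 where t1.x = t2.y")

print(in_rows)             # [(2,), (3,)]
print(join_distinct_rows)  # [(2,), (3,)]
print(join_plain_rows)     # [(2,), (2,), (3,)]
```

The plain join returns the x = 2 row twice, which is why the join form has to de-duplicate the subquery before joining.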





As opposed to 



select * from t1 where exists ( select null from t2 where y = x )



That is processed more like:





   for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x ) )
      then
         OUTPUT THE RECORD
      end if
   end loop



It always results in a full scan of T1 whereas the first query can make use of an index on T1(x).
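The loop can be written out by hand to check that it returns the same rows as the EXISTS query. This is only a sketch, again assuming Python's sqlite3 module in place of Oracle; the index name t2_idx, the helper exists_by_hand and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Oracle; names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1 ( x integer );
    create table t2 ( y integer );
    create index t2_idx on t2 ( y );
    insert into t1 values ( 1 ), ( 2 ), ( 3 );
    insert into t2 values ( 2 ), ( 2 ), ( 3 ), ( 4 );
""")

def exists_by_hand():
    """Mimic the loop: read every row of t1, probe t2 once per row."""
    kept = []
    for (x,) in conn.execute("select x from t1").fetchall():  # full scan of t1
        probe = conn.execute(
            "select 1 from t2 where y = ? limit 1", (x,)       # one probe per outer row
        ).fetchone()
        if probe is not None:
            kept.append((x,))                                   # OUTPUT THE RECORD
    return sorted(kept)

by_sql = sorted(conn.execute(
    "select * from t1 where exists ( select null from t2 where y = x )"
).fetchall())

print(exists_by_hand())  # [(2,), (3,)]
print(by_sql)            # [(2,), (3,)]
```

Every row of t1 is visited no matter how few of them match, which is the "full scan of T1" cost described here.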





So, when is WHERE EXISTS appropriate and when is IN appropriate?



Let's say the result of the subquery

    ( select y from T2 )

is "huge" and takes a long time.  But the table T1 is relatively small and executing ( select null from t2 where y = x.x ) is very, very fast (nice index on t2(y)).  Then the EXISTS will be faster, as the time to full scan T1 and do the index probes into T2 could be less than the time to simply full scan T2 to build the subquery we need to distinct on.





Let's say the result of the subquery is small -- then IN is typically more appropriate.





If both the subquery and the outer table are huge -- either might work as well as the other -- it depends on the indexes and other factors.
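A toy work-counting model makes the first two cases concrete. Everything in it is an assumption made for illustration: the tables are plain Python lists, the indexes are dicts and sets that are treated as already built, "work" is simply rows scanned plus index probes, and the row counts (100 and 100,000) are invented. It is not a model of Oracle's optimizer or of its real costs.

```python
from collections import defaultdict

def in_strategy(t1, t2):
    """IN-style: scan t2, de-duplicate, then probe a (hypothetical) index on t1."""
    t1_index = defaultdict(list)          # stands in for an existing index on T1(x)
    for x in t1:
        t1_index[x].append(x)
    distinct_y = set(t2)                  # full scan of t2, then distinct
    work = len(t2) + len(distinct_y)      # rows scanned + index probes
    result = [row for y in distinct_y for row in t1_index.get(y, [])]
    return sorted(result), work

def exists_strategy(t1, t2):
    """EXISTS-style: scan t1, probe a (hypothetical) index on t2 once per row."""
    t2_index = set(t2)                    # stands in for an existing index on T2(y)
    work = len(t1) + len(t1)              # rows scanned + index probes
    result = [x for x in t1 if x in t2_index]
    return sorted(result), work

small = list(range(100))                  # invented sizes, not the thread's data
big = list(range(100_000))

# Outer table big, subquery small: IN does far less work in this model.
print(in_strategy(big, small)[1], exists_strategy(big, small)[1])   # 200 200000

# Outer table small, subquery big: EXISTS does far less work in this model.
print(in_strategy(small, big)[1], exists_strategy(small, big)[1])   # 200000 200
```

Both strategies return the same rows in each case; only the amount of work differs, and it is driven by which side has to be read in full.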

----------------------------------------------------------------------------------------------------------------------

Tom



Can you illustrate your answer using the EMP and DEPT tables, by

1. increasing and decreasing the number of rows in both tables, and

2. dropping and adding indexes on both tables?



That way your explanation will be vindicated.



Can you do that please.

Followup September 29, 2001 - 2pm US/Eastern:

Well -- I'm not going to use EMP and DEPT, as I would have to generate tons of EMP and DEPT data to illustrate my point (feel free to do that if you want to ;) I'll use BIG and SMALL to make the point.  I ran:

rem create table big as select * from all_objects;
rem insert /*+ append */ into big select * from big;
rem commit;
rem insert /*+ append */ into big select * from big;
rem commit;
rem insert /*+ append */ into big select * from big;
rem create index big_idx on big(object_id);
rem
rem
rem create table small as select * from all_objects where rownum < 100;
rem create index small_idx on small(object_id);
rem
rem analyze table big compute statistics
rem for table
rem for all indexes
rem for all indexed columns
rem /
rem analyze table small compute statistics
rem for table
rem for all indexes
rem for all indexed columns
rem /

so, small has 99 rows, big has 133,000+

select count(subobject_name)
  from big
 where object_id in ( select object_id from small )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.02       0.02          0        993          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.03       0.03          0        993          0           1

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
    792    MERGE JOIN
    100     SORT (JOIN)
    100      VIEW OF 'VW_NSO_1'
     99       SORT (UNIQUE)
    792        INDEX   GOAL: ANALYZED (FULL SCAN) OF 'SMALL_IDX' (NON-UNIQUE)
    891     SORT (JOIN)
      0      TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'BIG'

versus:

select count(subobject_name)
  from big
 where exists ( select null from small where small.object_id = big.object_id )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      4.12       4.12          0     135356         15           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      4.12       4.12          0     135356         15           1

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
    792    FILTER
 135297     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'BIG'
 133504     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'SMALL_IDX' (NON-UNIQUE)

That shows that if the outer query is "big" and the inner query is "small", IN is generally more efficient than WHERE EXISTS.

Now:

select count(subobject_name)
  from small
 where object_id in ( select object_id from big )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch        2      0.51       0.82         50        298         22           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        5      0.52       0.83         50        298         22           1

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     99    MERGE JOIN
  16913     SORT (JOIN)
  16912      VIEW OF 'VW_NSO_1'
  16912       SORT (UNIQUE)
 135296        INDEX   GOAL: ANALYZED (FAST FULL SCAN) OF 'BIG_IDX' (NON-UNIQUE)
     99     SORT (JOIN)
     99      TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'SMALL'

versus:

select count(subobject_name)
  from small
 where exists ( select null from big where small.object_id = big.object_id )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.01       0.01          0        204         12           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.01          0        204         12           1

Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   GOAL: CHOOSE
      1   SORT (AGGREGATE)
     99    FILTER
    100     TABLE ACCESS   GOAL: ANALYZED (FULL) OF 'SMALL'
     99     INDEX   GOAL: ANALYZED (RANGE SCAN) OF 'BIG_IDX' (NON-UNIQUE)

shows that if the outer query is "small" and the inner query is "big" -- a WHERE EXISTS can be quite efficient.
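To put the four runs side by side, the sketch below takes the "total" rows of the tkprof reports in this followup and computes EXISTS-to-IN ratios. The figures are copied from the reports; the dictionary layout and the ratio helper are only an illustrative assumption about how one might tabulate them.

```python
# Totals copied from the "total" rows of the four tkprof reports in this followup.
# Keys are (outer table, subquery table, construct).
reported = {
    ("big", "small", "IN"):     {"cpu": 0.03, "elapsed": 0.03, "query": 993},
    ("big", "small", "EXISTS"): {"cpu": 4.12, "elapsed": 4.12, "query": 135356},
    ("small", "big", "IN"):     {"cpu": 0.52, "elapsed": 0.83, "query": 298},
    ("small", "big", "EXISTS"): {"cpu": 0.01, "elapsed": 0.01, "query": 204},
}

def ratio(outer, inner, metric):
    """How many times more of `metric` EXISTS used than IN for this pairing."""
    exists_val = reported[(outer, inner, "EXISTS")][metric]
    in_val = reported[(outer, inner, "IN")][metric]
    return exists_val / in_val

for outer, inner in (("big", "small"), ("small", "big")):
    for metric in ("query", "elapsed"):
        print(f"outer={outer} inner={inner} {metric}: "
              f"EXISTS/IN = {ratio(outer, inner, metric):.2f}")
```

On these figures EXISTS did roughly 136 times the logical reads of IN when BIG was the outer table, and about two thirds of IN's logical reads when SMALL was the outer table.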