SQL> set autotrace traceonly
SQL> select t1.investment_number,
2 nvl(t1.sum1, 0),
3 t2.debt_profit,
4 nvl(t1.sum1, 0) - t2.debt_profit
5 from (select t.investment_number, sum(t.invest_yield) sum1
6 from CLSPUSER.CRF_P2P_INTEREST_SPLIT_MAIN t
7 where t.interest_ym = '2016-11'
8 and t.interest_type = 1
9 group by t.investment_number) t1,
10 (select t.investment_number, t.debt_profit
11 from apsuser.p2p_investment_monthlybill_pre t
12 where t.start_date >= to_date('2016-10-26', 'yyyy-MM-dd')
13 and t.debt_profit > 0) t2
14 where t1.investment_number = t2.investment_number
15 and t1.sum1 != t2.debt_profit;
no rows selected
Execution Plan
----------------------------------------------------------
Plan hash value: 3120132859
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 50 | 2153 (3)| 00:00:26 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 1 | 50 | 2153 (3)| 00:00:26 |
|* 3 | TABLE ACCESS BY INDEX ROWID| CRF_P2P_INTEREST_SPLIT_MAIN | 1 | 20 | 3 (0)| 00:00:01 |
| 4 | NESTED LOOPS | | 1 | 50 | 2152 (3)| 00:00:26 |
|* 5 | TABLE ACCESS FULL | P2P_INVESTMENT_MONTHLYBILL_PRE | 1 | 30 | 2149 (3)| 00:00:26 |
|* 6 | INDEX RANGE SCAN | SPLIT_MAIN_INTEREST_YM | 1 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("T"."DEBT_PROFIT"<>SUM("T"."INVEST_YIELD"))
3 - filter(TO_NUMBER("T"."INTEREST_TYPE")=1 AND "T"."INVESTMENT_NUMBER"="T"."INVESTMENT_NUMBER")
5 - filter("T"."START_DATE">=TO_DATE('2016-10-26 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
"T"."DEBT_PROFIT">0)
6 - access("T"."INTEREST_YM"='2016-11')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
16891583 consistent gets
10103 physical reads
0 redo size
562 bytes sent via SQL*Net to client
458 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
The plan's estimates suggest a fast query (Cost 2153, estimated time 00:00:26), yet it actually ran for about 10 minutes, with 16,891,583 consistent gets.
Check the row counts of the tables involved:
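When estimates and reality diverge like this, one way to see exactly where (a sketch, not part of the original session) is to re-run the statement with the gather_plan_statistics hint and then display the cursor with ALLSTATS LAST, which puts estimated rows (E-Rows) next to actual rows (A-Rows) for each plan step:

```sql
-- Sketch: collect per-step runtime statistics for the slow query,
-- then compare the optimizer's estimates with the actual row counts.
select /*+ gather_plan_statistics */
       t1.investment_number,
       nvl(t1.sum1, 0),
       t2.debt_profit,
       nvl(t1.sum1, 0) - t2.debt_profit
  from (select t.investment_number, sum(t.invest_yield) sum1
          from CLSPUSER.CRF_P2P_INTEREST_SPLIT_MAIN t
         where t.interest_ym = '2016-11'
           and t.interest_type = 1
         group by t.investment_number) t1,
       (select t.investment_number, t.debt_profit
          from apsuser.p2p_investment_monthlybill_pre t
         where t.start_date >= to_date('2016-10-26', 'yyyy-MM-dd')
           and t.debt_profit > 0) t2
 where t1.investment_number = t2.investment_number
   and t1.sum1 != t2.debt_profit;

-- Show the last execution of the cursor with E-Rows vs. A-Rows per step.
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

A step where A-Rows is orders of magnitude larger than E-Rows points directly at the cardinality misestimate.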
SQL> select count(*) from CLSPUSER.CRF_P2P_INTEREST_SPLIT_MAIN;
COUNT(*)
----------
1065333
SQL> select count(*) from apsuser.p2p_investment_monthlybill_pre;
COUNT(*)
----------
520163
SQL>
The statistics are clearly wrong: the tables hold roughly a million and half a million rows respectively, yet the plan above estimates just 1 row / 50 bytes at each step, which misled the cost-based optimizer into choosing a nested-loops plan. Regather the statistics.
To be safe, gather the index statistics together with the tables (cascade => TRUE):
exec dbms_stats.gather_table_stats(ownname =>'CLSPUSER' ,tabname=>'CRF_P2P_INTEREST_SPLIT_MAIN' ,estimate_percent => 20,degree => 10,granularity => 'ALL',cascade => TRUE);
exec dbms_stats.gather_table_stats(ownname =>'APSUSER' ,tabname=>'P2P_INVESTMENT_MONTHLYBILL_PRE' ,estimate_percent => 20,degree => 10,granularity => 'ALL',cascade => TRUE);
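To confirm that stale dictionary statistics were indeed the culprit, the optimizer's view of the tables can be compared with reality before and after the gather (a sketch; the STALE_STATS column assumes Oracle 10g or later):

```sql
-- Sketch: NUM_ROWS is what the optimizer believes; STALE_STATS is flagged
-- once roughly 10% of the rows have changed since the last gather.
select owner, table_name, num_rows, last_analyzed, stale_stats
  from dba_tab_statistics
 where (owner, table_name) in (('CLSPUSER', 'CRF_P2P_INTEREST_SPLIT_MAIN'),
                               ('APSUSER',  'P2P_INVESTMENT_MONTHLYBILL_PRE'));
```

If NUM_ROWS is far below the counts shown above, the 1-row estimates in the first plan are explained.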
The query now completes almost instantly. The new execution plan:
Execution Plan
----------------------------------------------------------
Plan hash value: 3109755057
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9179 | 394K| 2968 (2)| 00:00:36 |
|* 1 | HASH JOIN | | 9179 | 394K| 2968 (2)| 00:00:36 |
| 2 | VIEW | | 8034 | 203K| 704 (1)| 00:00:09 |
| 3 | HASH GROUP BY | | 8034 | 156K| 704 (1)| 00:00:09 |
|* 4 | TABLE ACCESS BY INDEX ROWID| CRF_P2P_INTEREST_SPLIT_MAIN | 8596 | 167K| 702 (1)| 00:00:09 |
|* 5 | INDEX RANGE SCAN | SPLIT_MAIN_INTEREST_YM | 34384 | | 97 (2)| 00:00:02 |
|* 6 | TABLE ACCESS FULL | P2P_INVESTMENT_MONTHLYBILL_PRE | 21546 | 378K| 2263 (3)| 00:00:28 |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."INVESTMENT_NUMBER"="T"."INVESTMENT_NUMBER")
filter("T1"."SUM1"<>"T"."DEBT_PROFIT")
4 - filter(TO_NUMBER("T"."INTEREST_TYPE")=1)
5 - access("T"."INTEREST_YM"='2016-11')
6 - filter("T"."START_DATE">=TO_DATE('2016-10-26 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
"T"."DEBT_PROFIT">0)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
10916 consistent gets
10044 physical reads
0 redo size
562 bytes sent via SQL*Net to client
458 bytes received via SQL*Net from client
1 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
0 rows processed
SQL>
An excerpt on the differences between nested loops (NL) and hash join (HJ):
1) Nested loops (nested loop join):
Nested loops are a good choice when the joined data sets are small. The outer (driving) table drives the inner table: every row returned by the outer table is matched against the inner table via a lookup. The overall result set therefore needs to stay small (more than roughly 10,000 rows is a poor fit), the table returning the smaller row set should be the outer (driving) table, and the join column of the inner table must be indexed.
2) Hash join:
Hash joins are the usual choice when joining large data sets. The optimizer takes the smaller of the two tables, builds an in-memory hash table on the join key, then scans the larger table and probes the hash table for matching rows.
This works best when the smaller table fits entirely in memory; the cost is then simply the cost of reading both tables once. When the build table is too large to fit in memory, the optimizer splits it into partitions and writes the partitions that do not fit to temporary segments on disk.
Hash joins apply only to equijoins (e.g. WHERE A.COL3 = B.COL4), including equijoin-based outer joins (WHERE A.COL3 = B.COL4(+)); they cannot be used for non-equijoins such as WHERE A.COL3 > B.COL4.
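When statistics cannot be regathered right away, the join method can also be steered directly with standard optimizer hints (a sketch using the t1/t2 aliases from the query above; the inline views are abbreviated in the comment):

```sql
-- Sketch: force a hash join with t2 as the build side.
-- (use_nl(t1) instead of use_hash(t1) would force nested loops.)
select /*+ leading(t2) use_hash(t1) */
       t1.investment_number, nvl(t1.sum1, 0), t2.debt_profit
  from ( /* same grouped inline view over CRF_P2P_INTEREST_SPLIT_MAIN */ ) t1,
       ( /* same filtered inline view over p2p_investment_monthlybill_pre */ ) t2
 where t1.investment_number = t2.investment_number
   and t1.sum1 != t2.debt_profit;
```

Hints are a stopgap, though: fixing the statistics, as done above, lets the optimizer arrive at the hash-join plan on its own.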
Reference: http://trophy.iteye.com/blog/1416410