统计信息失准导致CBO选错执行计划一例

SQL>  set autotrace traceonly   
SQL> select t1.investment_number,
  2         nvl(t1.sum1, 0),
  3         t2.debt_profit,
  4         nvl(t1.sum1, 0) - t2.debt_profit
  5    from (select t.investment_number, sum(t.invest_yield) sum1
  6            from CLSPUSER.CRF_P2P_INTEREST_SPLIT_MAIN t
  7           where t.interest_ym = '2016-11'
  8             and t.interest_type = 1
  9           group by t.investment_number) t1,
 10         (select t.investment_number, t.debt_profit
 11            from apsuser.p2p_investment_monthlybill_pre t
 12           where t.start_date >= to_date('2016-10-26', 'yyyy-MM-dd')
 13             and t.debt_profit > 0) t2
 14   where t1.investment_number = t2.investment_number
 15     and t1.sum1 != t2.debt_profit;



no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 3120132859

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                                |     1 |    50 |  2153   (3)| 00:00:26 |
|*  1 |  FILTER                       |                                |       |       |            |          |
|   2 |   HASH GROUP BY               |                                |     1 |    50 |  2153   (3)| 00:00:26 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| CRF_P2P_INTEREST_SPLIT_MAIN    |     1 |    20 |     3   (0)| 00:00:01 |
|   4 |     NESTED LOOPS              |                                |     1 |    50 |  2152   (3)| 00:00:26 |
|*  5 |      TABLE ACCESS FULL        | P2P_INVESTMENT_MONTHLYBILL_PRE |     1 |    30 |  2149   (3)| 00:00:26 |
|*  6 |      INDEX RANGE SCAN         | SPLIT_MAIN_INTEREST_YM         |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T"."DEBT_PROFIT"<>SUM("T"."INVEST_YIELD"))
   3 - filter(TO_NUMBER("T"."INTEREST_TYPE")=1 AND "T"."INVESTMENT_NUMBER"="T"."INVESTMENT_NUMBER")
   5 - filter("T"."START_DATE">=TO_DATE('2016-10-26 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
              "T"."DEBT_PROFIT">0)
   6 - access("T"."INTEREST_YM"='2016-11')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
   16891583  consistent gets
      10103  physical reads
          0  redo size
        562  bytes sent via SQL*Net to client
        458  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL> SQL> SQL> 

执行计划显示很快实际执行了10mins

查看相关数据量:

SQL> select count(*) from CLSPUSER.CRF_P2P_INTEREST_SPLIT_MAIN;

  COUNT(*)
----------
   1065333

SQL> select count(*) from apsuser.p2p_investment_monthlybill_pre;

  COUNT(*)
----------
    520163

SQL> 
很明显统计信息是有问题的,因为上百万级的数据量而根据以上执行计划rows仅为1行50bytes

导致Cost-Based Optimization错选了执行计划NL,收集下统计信息:

为了保险起见,这里连同索引的统计信息一起收集

exec dbms_stats.gather_table_stats(ownname =>'CLSPUSER' ,tabname=>'CRF_P2P_INTEREST_SPLIT_MAIN' ,estimate_percent => 20,degree => 10,granularity => 'ALL',cascade => TRUE);
 
 exec dbms_stats.gather_table_stats(ownname =>'APSUSER' ,tabname=>'P2P_INVESTMENT_MONTHLYBILL_PRE' ,estimate_percent => 20,degree => 10,granularity => 'ALL',cascade => TRUE);

瞬间执行完毕

执行计划如下:

Execution Plan
----------------------------------------------------------
Plan hash value: 3109755057

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                                |  9179 |   394K|  2968   (2)| 00:00:36 |
|*  1 |  HASH JOIN                     |                                |  9179 |   394K|  2968   (2)| 00:00:36 |
|   2 |   VIEW                         |                                |  8034 |   203K|   704   (1)| 00:00:09 |
|   3 |    HASH GROUP BY               |                                |  8034 |   156K|   704   (1)| 00:00:09 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| CRF_P2P_INTEREST_SPLIT_MAIN    |  8596 |   167K|   702   (1)| 00:00:09 |
|*  5 |      INDEX RANGE SCAN          | SPLIT_MAIN_INTEREST_YM         | 34384 |       |    97   (2)| 00:00:02 |
|*  6 |   TABLE ACCESS FULL            | P2P_INVESTMENT_MONTHLYBILL_PRE | 21546 |   378K|  2263   (3)| 00:00:28 |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."INVESTMENT_NUMBER"="T"."INVESTMENT_NUMBER")
       filter("T1"."SUM1"<>"T"."DEBT_PROFIT")
   4 - filter(TO_NUMBER("T"."INTEREST_TYPE")=1)
   5 - access("T"."INTEREST_YM"='2016-11')
   6 - filter("T"."START_DATE">=TO_DATE('2016-10-26 00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
              "T"."DEBT_PROFIT">0)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      10916  consistent gets
      10044  physical reads
          0  redo size
        562  bytes sent via SQL*Net to client
        458  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL> 
SQL> 
一下摘录NL和HS的区别:

1))嵌套循环(nest loop):

          对于被连接的数据子集较小的情况,嵌套循环连接是较好的选择。在嵌套循环中,外表驱动内表,外表返回的每一行都要在内表中检索找到它匹配的行,因此整个查询返回的结果集不能太大(大于10000不合适),要把返回子集较小的表作为外表(驱动表),而且在内表的连接字段上一定要有索引。

2)哈希连接(hash join):

         哈希连接是大数据集连接时常用的方式,优化器使用两个表中较小的表,利用连接键在内存中建立散列表,然后扫描较大的表并探测散列表,找出与散列表匹配的行。

        这种方式适用于较小的表完全可以放入内存的情况,这样成本就是访问两个表的成本之和。但是在表很大的情况下并不能完全放入内存,这时优化器将它分割成若干不同的分区,不能放入内存的部分就把该分区写入磁盘的临时段。

        哈希连接只能应用于等值连接(如WHERE A.COL3 = B.COL4)、非等值连接(WHERE A.COL3 > B.COL4)、外连接(WHERE A.COL3 = B.COL4(+))。

参考:http://trophy.iteye.com/blog/1416410



  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值