问题描述:
数据库从10g升级到11g后发现以下查询语句的性能下降,其平均执行时间从0.04秒增长到了0.32秒
- with q as
- (Select trading, calculated_date,
- rank() over (partition by calculated_date order by mkt_for_nda_id nulls last) rank1
- From Operating_Calendar
- Where trunc(CALCULATED_DATE) BETWEEN v3_stdt and v3_eddt
- AND (mkt_for_nda_id = v_mkt_nda_id or geo_for_Nda_id in
- (Select Nda_id From Geog
- Where Nda_id in
- (Select Geo_in_nda_id From Geog_Structure
- Start With Geo_with_nda_id = v_geog_nda_id
- Connect By Prior Geo_in_nda_id = Geo_with_nda_id)
- UNION (SELECT NDA_ID FROM GEOG WHERE NDA_ID = v_geog_nda_id)
- )
- )
- )
- select T_calcdt_reason_Type(calculated_date, NULL)
- bulk collect into v_calcdt_reason -- bulk collect into a table object
- from q
- where rank1 = 1 and trading = \'H\';
分析:
该语句中包括一个connect by查询, 对于connect by查询来说, 数据库在解析阶段很难根据统计信息对其cardinality值作出准确估计,所以oracle在为这种查询选择执行计划时,由于该局限性,常常难以得到令人满意的结果,如下例:
- explain plan for SELECT GEO_IN_NDA_ID FROM GEOG_STRUCTURE START WITH GEO_WITH_NDA_ID = 858 CONNECT BY PRIOR GEO_IN_NDA_ID = GEO_WITH_NDA_ID;
- PLAN_TABLE_OUTPUT
- ----------------------------------------------------------------------------------------------------
- Plan hash value: 498378953
-
- ---------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
- ---------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 2 | 52 | 3 (34)| 00:00:01
- |* 1 | CONNECT BY NO FILTERING WITH START-WITH| | | | |
- | 2 | INDEX FAST FULL SCAN | GST_UK1 | 2356 | 18848 | 2 (0)| 00:00:01
- -------------------------------------------------------------------------------------------------
- explain plan for SELECT GEO_IN_NDA_ID FROM GEOG_STRUCTURE START WITH GEO_WITH_NDA_ID = 100 CONNECT BY PRIOR GEO_IN_NDA_ID = GEO_WITH_NDA_ID;
- PLAN_TABLE_OUTPUT
- -------------------------------------------------------------------------------------------------
- Plan hash value: 498378953
- -------------------------------------------------------------------------------------------------
- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
- -------------------------------------------------------------------------------------------------
- | 0 | SELECT STATEMENT | | 2 | 52 | 3 (34)| 00:00:01 |* 1 | CONNECT BY NO FILTERING WITH START-WITH| | | | |
- | 2 | INDEX FAST FULL SCAN | GST_UK1 | 2356 | 18848 | 2 (0)| 00:00:01
- -------------------------------------------------------------------------------------------------
- SQL> select count(*) from GEOG_STRUCTURE
- START WITH GEO_WITH_NDA_ID = 858
- CONNECT BY PRIOR GEO_IN_NDA_ID = GEO_WITH_NDA_ID;
- COUNT(*)
- ----------
- 7
- SQL> select count(*) from GEOG_STRUCTURE
- START WITH GEO_WITH_NDA_ID = 100
- CONNECT BY PRIOR GEO_IN_NDA_ID = GEO_WITH_NDA_ID;
- COUNT(*)
- ----------
- 4
从以上测试中可看到,在给查询谓词中GEO_WITH_NDA_ID列分别指定100或858时,执行计划中cardinality的估计值均是2,但实际上他们分别为7和4. 说到这里仍然没有解释为什么前面的查询语句在数据库升级后性能发生了大幅下降, 但实际上正是这种对connect by查询cardinality估计的缺陷导致了这类查询执行计划的不稳定性.
我们来看看上述语句在10g和11g中的执行计划有什么不同:
10g:
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 32 (100)| |
|* 1 | VIEW | | 5 | 125 | 32 (13)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK | | 5 | 80 | 32 (13)| 00:00:01 |
|* 3 | FILTER | | | | | |
|* 4 | TABLE ACCESS FULL | OPERATING_CALENDAR | 99 | 1584 | 31 (10)| 00:00:01 |
| 5 | SORT UNIQUE | | 2 | 21 | 7 (58)| 00:00:01 |
| 6 | UNION-ALL | | | | | |
| 7 | NESTED LOOPS | | 1 | 17 | 4 (25)| 00:00:01 |
|* 8 | VIEW | VW_NSO_1 | 1 | 13 | 3 (34)| 00:00:01 |
| 9 | SORT UNIQUE | | 1 | 8 | 3 (34)| 00:00:01 |
|* 10 | CONNECT BY WITH FILTERING | | | | | |
| 11 | TABLE ACCESS BY INDEX ROWID | GEOG_STRUCTURE | 1 | 8 | 2 (0)| 00:00:01 |
|* 12 | INDEX RANGE SCAN | GST_IND03 | 1 | | 1 (0)| 00:00:01 |
| 13 | NESTED LOOPS | | | | | |
| 14 | CONNECT BY PUMP | | | | | |
| 15 | TABLE ACCESS BY INDEX ROWID| GEOG_STRUCTURE | 1 | 8 | 2 (0)| 00:00:01 |
|* 16 | INDEX RANGE SCAN | GST_IND03 | 1 | | 1 (0)| 00:00:01 |
|* 17 | INDEX RANGE SCAN | GEO_PK | 1 | 4 | 1 (0)| 00:00:01 |
|* 18 | FILTER | | | | | |
|* 19 | INDEX RANGE SCAN | GEO_PK | 1 | 4 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
11g:
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 32 (100)| |
|* 1 | VIEW | | 5 | 125 | 32 (13)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK | | 5 | 80 | 32 (13)| 00:00:01 |
|* 3 | FILTER | | | | | |
|* 4 | TABLE ACCESS FULL | OPERATING_CALENDAR | 99 | 1584 | 31 (10)| 00:00:01 |
| 5 | SORT UNIQUE | | 2 | 21 | 8 (63)| 00:00:01 |
| 6 | UNION-ALL | | | | | |
|* 7 | HASH JOIN | | 1 | 17 | 5 (40)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | GEO_PK | 1 | 4 | 1 (0)| 00:00:01 |
|* 9 | VIEW | VW_NSO_1 | 2 | 26 | 3 (34)| 00:00:01 |
| 10 | SORT UNIQUE | | 2 | 34 | 3 (34)| 00:00:01 |
|* 11 | CONNECT BY NO FILTERING WITH SW (UNIQUE)| | | | | |
| 12 | INDEX FAST FULL SCAN | GST_UK1 | 2356 | 18848 | 2 (0)| 00:00:01 |
|* 13 | FILTER | | | | | |
|* 14 | INDEX RANGE SCAN | GEO_PK | 1 | 4 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------
比较执行计划可以看到, connect by查询的执行计划从10g的CONNECT BY WITH FILTERING变为了11g中的CONNECT BY NO FILTERING WITH SW (UNIQUE).
查询相关资料可发现:这是oracle对connect by查询所引入的一种新的算法,在原来的算法中(CONNECT BY WITH FILTERING),会在每次循环中作行过滤操作,然后只读取需要的行.而在新的算法中(CONNECT BY NO FILTERING WITH SW),则会把所有需要的数据读入内存,然后统一作内存排序来过滤掉不需要的列.
个人认为前一种算法适用于从数据集中取很小一部分数据的场景,而后一种算法则适用于从大数据量中取大量数据的操作.而我们的场景正是前一种:全表有2000多行,而每次只是取其中的几行,但由于oracle对connect by语句cardinality估算的不准确性,导致oracle并不能准确地在两种算法间作出合理的抉择.
备注:这里所说的”新的算法”实际上是在10.2.0.2版本就引入了,但以前的版本中似乎并没有将其作为一种优先选择.
好在oracle提供了hint /*+ connect_by_filtering */来帮助我们干预执行计划的选择.我们给原语句加上hint:
- WITH Q AS
- (SELECT TRADING,
- CALCULATED_DATE,
- RANK() OVER (PARTITION BY CALCULATED_DATE ORDER BY MKT_FOR_NDA_ID NULLS LAST) RANK1
- FROM OPERATING_CALENDAR
- WHERE TRUNC(CALCULATED_DATE) BETWEEN to_date(\'20130704\',\'yyyymmdd\') AND to_date(\'40000101\',\'yyyymmdd\')
- AND (MKT_FOR_NDA_ID = 455
- OR GEO_FOR_NDA_ID IN
- (SELECT NDA_ID
- FROM GEOG
- WHERE NDA_ID IN
- (SELECT /*+ connect_by_filtering */GEO_IN_NDA_ID
- FROM GEOG_STRUCTURE
- START WITH GEO_WITH_NDA_ID = 858
- CONNECT BY PRIOR GEO_IN_NDA_ID = GEO_WITH_NDA_ID
- )
- UNION
- (SELECT NDA_ID FROM GEOG WHERE NDA_ID = 858
- )
- ) )
- )
- SELECT T_CALCDT_REASON_TYPE(CALCULATED_DATE, NULL)
- FROM Q
- WHERE RANK1 = 1
- AND TRADING = \'H\';
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 32 (100)| |
|* 1 | VIEW | | 44 | 1100 | 32 (13)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK | | 44 | 704 | 32 (13)| 00:00:01 |
|* 3 | FILTER | | | | | |
|* 4 | TABLE ACCESS FULL | OPERATING_CALENDAR | 872 | 13952 | 31 (10)| 00:00:01 |
| 5 | SORT UNIQUE | | 2 | 21 | 12 (42)| 00:00:01 |
| 6 | UNION-ALL | | | | | |
|* 7 | HASH JOIN | | 1 | 17 | 9 (23)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | GEO_PK | 1 | 4 | 1 (0)| 00:00:01 |
|* 9 | VIEW | VW_NSO_1 | 2 | 26 | 7 (15)| 00:00:01 |
| 10 | SORT UNIQUE | | 2 | 34 | 7 (15)| 00:00:01 |
|* 11 | CONNECT BY WITH FILTERING (UNIQUE)| | | | | |
| 12 | TABLE ACCESS BY INDEX ROWID | GEOG_STRUCTURE | 1 | 8 | 2 (0)| 00:00:01 |
|* 13 | INDEX RANGE SCAN | GST_IND03 | 1 | | 1 (0)| 00:00:01 |
| 14 | NESTED LOOPS | | 1 | 21 | 4 (0)| 00:00:01 |
| 15 | CONNECT BY PUMP | | | | | |
| 16 | TABLE ACCESS BY INDEX ROWID | GEOG_STRUCTURE | 1 | 8 | 2 (0)| 00:00:01 |
|* 17 | INDEX RANGE SCAN | GST_IND03 | 1 | | 1 (0)| 00:00:01 |
|* 18 | FILTER | | | | | |
|* 19 | INDEX RANGE SCAN | GEO_PK | 1 | 4 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------
从新的执行计划我们可以看到,oracle已经选择了CONNECT BY WITH FILTERING执行计划.
到这里,与10g中的执行计划比较会发现,新的执行计划还是与原有的计划有所不同,主要体现在第7步,原来采用的是nested loop连接,而这里使用的是hash join. view VW_NSO_1的真实结果集实际上只有几行, GEO_PK索引所在的GEOG表大约2000行,这种情况理论上nested loop会比较合适,实际测试也发现的确是采用nested loop的执行计划更高效,那为什么这里会选择hash join呢?
通过10053事件跟踪解析过程,发现11g中新增的隐含参数'_optimizer_connect_by_elim_dups'和'_connect_by_use_union_all'较为可疑,查询metalink和google也发现11g中这两个参数的引入伴随着许多的bug,所以有些专家建议在新的系统中disable掉这两个参数.我对这两个参数尚没有更为深入的理解,但可以肯定的是这里他们对connect by查询的cost结果是有影响的,进而会干扰后面的连接方式的选择.
所以这里我尝试着disable了他们:
alter system set "_optimizer_connect_by_elim_dups" = false;
alter system set "_connect_by_use_union_all" = "old_plan_mode";
然后,可以看到执行计划回归了与10g中所采用的执行计划.
注: 如果不想在整个数据库更改以上隐含参数,可以使用opt_param hint在语句级修改。
总结:
前面的分析过程已经包括解决方案,就是使用connect_by_filtering以及修改隐含参数来影响语句执行计划的选择。由于该案例中我们知道什么样的执行计划是适用的,所以也可以选择hint, sql profile等常规手段去控制。该文的重点不是采用什么方式去解决该问题,而是了解connect by查询本身所存在的问题,以及connect by查询获取数据的可能方式,从而在遇到这类查询时心中有数。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/27243841/viewspace-1147046/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/27243841/viewspace-1147046/