昨天开发同仁找到我,说有一个sql,在其他的库可以秒出结果,但是这个库需要100多秒,什么情况,跟着博主来一步步分析。
异常的sql
SELECT *
FROM (SELEct rownum t_rownum, T.*
FROM (SELECT L.*
FROM LOT L, LOT_EXT LE
WHERE L.LOT_RRN = LE.LOT_RRN
AND L.LOT_STATUS NOT IN ('TERMINATED', 'SCRAPED')
AND NOT EXISTS (SELECT 1
FROM LOT_LABEL_PRINT
WHERE LOT_ID = L.LOT_ID)
AND NOT EXISTS
(SELECT 1
FROM PRINT_LOG
WHERE LOT_ID = L.LOT_ID
AND PRINTTYPE = 'barcode')
AND LE.ATTRIBUTE_DATA2 = 'xxxxxxxxx'
ORDER BY L.LOT_RRN) T
where rownum <= 1)
WHERE T_ROWNUM > 0;
异常执行计划如下,原来的NL变成了 merge sort
Plan hash value: 2080666894
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 746 (100)| |
| 1 | VIEW | | 1 | 4794 | 746 (2)| 00:00:09 |
| 2 | COUNT STOPKEY | | | | | |
| 3 | VIEW | | 2 | 9562 | 746 (2)| 00:00:09 |
| 4 | NESTED LOOPS ANTI | | 2 | 694 | 746 (2)| 00:00:09 |
| 5 | NESTED LOOPS ANTI | | 2 | 664 | 744 (2)| 00:00:09 |
| 6 | MERGE JOIN | | 3 | 924 | 735 (2)| 00:00:09 |
| 7 | SORT JOIN | | 52321 | 14M| 680 (2)| 00:00:09 |
| 8 | TABLE ACCESS FULL | LOT | 35190 | 9965K| 680 (2)| 00:00:09 |
| 9 | SORT JOIN | | 1488 | 26784 | 55 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| LOT_EXT | 1488 | 26784 | 55 (0)| 00:00:01 |
| 11 | INDEX RANGE SCAN | INDX_LE_ATTRIBUTE_DATA2 | 1529 | | 11 (0)| 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID | PRINT_LOG | 3398K| 77M| 3 (0)| 00:00:01 |
| 13 | INDEX RANGE SCAN | IDX_PL_LOT_ID | 1 | | 2 (0)| 00:00:01 |
| 14 | INDEX UNIQUE SCAN | SYS_C0017922 | 1 | 15 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------
其他库正常执行计划如下都是NL,问题的关键为 LOT表有没有走PK的索引
用coe_xfr_sql_profile.sql查过这个系统重绑定变量的sql 只有一个sql plan
一 . 尝试改写SQL
首先看到如果拿掉外层的两个rownum 后 执行计划正常,
SELEct rownum t_rownum, T.*
FROM (SELECT L.*
FROM LOT L, LOT_EXT LE
WHERE L.LOT_RRN = LE.LOT_RRN
AND L.LOT_STATUS NOT IN ('TERMINATED', 'SCRAPED')
AND NOT EXISTS (SELECT 1
FROM LOT_LABEL_PRINT
WHERE LOT_ID = L.LOT_ID)
AND NOT EXISTS
(SELECT 1
FROM PRINT_LOG
WHERE LOT_ID = L.LOT_ID
AND PRINTTYPE = 'barcode')
AND LE.ATTRIBUTE_DATA2 = 'xxxxxxxxx'
ORDER BY L.LOT_RRN) T
内层虽然有个不太优的not in,但是还是正常的走了索引,执行结果秒出
问题关键在加了where rownum <= 1 执行计划就变成full table了。
尝试改写sql如下,将内层where rownum <= 1 拿掉改为放在外层加上WHERE T_ROWNUM <= 1 结果集一样,执行计划恢复正常。
把这个结果反馈给开发同事,但是开发同事不同意改代码原因有二 1.其他的数据库一样的sql,可以秒出结果(查过统计信息也都正常)
2.如果改代码需要改很多(为常用分页条件)
SELECT *
FROM (SELEct rownum t_rownum, T.*
FROM (SELECT L.*
FROM LOT L, LOT_EXT LE
WHERE L.LOT_RRN = LE.LOT_RRN
AND L.LOT_STATUS NOT IN ('TERMINATED', 'SCRAPED')
AND NOT EXISTS (SELECT 1
FROM LOT_LABEL_PRINT
WHERE LOT_ID = L.LOT_ID)
AND NOT EXISTS
(SELECT 1
FROM PRINT_LOG
WHERE LOT_ID = L.LOT_ID
AND PRINTTYPE = 'barcode')
AND LE.ATTRIBUTE_DATA2 = 'xxxxxxxxx'
ORDER BY L.LOT_RRN) T
--where rownum <= 1 ##拿掉这个条件 sql秒出结果
)
WHERE T_ROWNUM <= 1;
二.不改代码如何优化
更细致的分析这里建议使用 sqlhc 可以了
生成的文件可以关键看最下的tuning advisor
SQL>SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK( '^^2._tuning_task') from DUAL;
GENERAL INFORMATION SECTION
-------------------------------------------------------------------------------
Tuning Task Name : 8g6k1uqth8d6c_tuning_task
Tuning Task Owner : SYS
Workload Type : Single SQL Statement
Scope : COMPREHENSIVE
Time Limit(seconds): 1200
Completion Status : COMPLETED
Started at : 05/16/2024 16:08:19
Completed at : 05/16/2024 16:09:46
-------------------------------------------------------------------------------
Schema Name: MYCIM
SQL ID : 8g6k1uqth8d6c
SQL Text : SELECT * FROM ( SELECT T.*, ROWNUM T_ROWNUM FROM ( SELECT L.*
FROM LOT L, LOT_EXT LE WHERE L.LOT_RRN = LE.LOT_RRN AND
L.LOT_STATUS NOT IN ('TERMINATED', 'SCRAPED') AND NOT EXISTS
(SELECT 1 FROM LOT_LABEL_PRINT WHERE LOT_ID = L.LOT_ID) AND NOT
EXISTS (SELECT 1 FROM PRINT_LOG WHERE LOT_ID = L.LOT_ID AND
PRINTTYPE = 'barcode') AND LE.ATTRIBUTE_DATA2 = :1 ORDER BY
L.LOT_RRN )T WHERE ROWNUM <= 1) WHERE T_ROWNUM > 0
Bind Variables :
1 - (VARCHAR2(128)):1240566049
-------------------------------------------------------------------------------
FINDINGS SECTION (5 findings)
-------------------------------------------------------------------------------
1- SQL Profile Finding (see explain plans section below)
--------------------------------------------------------
2 potentially better execution plans were found for this statement. Choose
one of the following SQL profiles to implement.
Recommendation (estimated benefit: 94.91%)
------------------------------------------
- Consider accepting the recommended SQL profile.
execute dbms_sqltune.accept_sql_profile(task_name =>
'8g6k1uqth8d6c_tuning_task', task_owner => 'SYS', replace =>
TRUE);
Validation results
------------------
The SQL profile was tested by executing both its plan and the original plan
and measuring their respective execution statistics. A plan may have been
only partially executed if the other could be run to completion in less time.
Original Plan With SQL Profile % Improved
------------- ---------------- ----------
Completion Status: PARTIAL COMPLETE
Elapsed Time (s): 15.643824 .022347 99.85 %
CPU Time (s): 10.529276 .013999 99.86 %
User I/O Time (s): 5.290386 0 100 %
Buffer Gets: 227218 11607 94.89 %
Physical Read Requests: 2222 0 100 %
Physical Write Requests: 8175 0 100 %
Physical Read Bytes: 1840365568 0 100 %
Physical Write Bytes: 1742979072 0 100 %
Rows Processed: 0 0
Fetches: 0 0
Executions: 0 1
Notes
-----
1. Statistics for the original plan were averaged over 0 executions.
2. Statistics for the SQL profile plan were averaged over 10 executions.
Recommendation (estimated benefit: 99.92%)
------------------------------------------
- Consider accepting the recommended SQL profile to use parallel execution
for this statement.
execute dbms_sqltune.accept_sql_profile(task_name =>
'8g6k1uqth8d6c_tuning_task', task_owner => 'SYS', replace =>
TRUE, profile_type => DBMS_SQLTUNE.PX_PROFILE);
Executing this query parallel with DOP 192 will improve its response time
99.10% over the SQL profile plan. However, there is some cost in enabling
parallel execution. It will increase the statement's resource consumption by
an estimated 72.69% which may result in a reduction of system throughput.
Also, because these resources are consumed over a much smaller duration, the
response time of concurrent statements might be negatively impacted if
sufficient hardware capacity is not available.
The following data shows some sampled statistics for this SQL from the past
week and projected weekly values when parallel execution is enabled.
Past week sampled statistics for this SQL
-----------------------------------------
Number of executions 0
Percent of total activity 0
Percent of samples with #Active Sessions > 2*CPU 0
Weekly DB time (in sec) 0
Projected statistics with Parallel Execution
--------------------------------------------
Weekly DB time (in sec) 0
2- Restructure SQL finding (see plan 1 in explain plans section)
----------------------------------------------------------------
Predicate "L"."LOT_STATUS"<>'TERMINATED' used at line ID 8 of the execution
plan is an inequality condition on indexed column "LOT_STATUS". This
inequality condition prevents the optimizer from efficiently using indices
on table "MYCIM"."LOT".
Recommendation
--------------
- Rewrite the predicate into an equivalent form to take advantage of
indices.
3- Restructure SQL finding (see plan 1 in explain plans section)
----------------------------------------------------------------
Predicate "L"."LOT_STATUS"<>'SCRAPED' used at line ID 8 of the execution
plan is an inequality condition on indexed column "LOT_STATUS". This
inequality condition prevents the optimizer from efficiently using indices
on table "MYCIM"."LOT".
Recommendation
--------------
- Rewrite the predicate into an equivalent form to take advantage of
indices.
4- Restructure SQL finding (see plan 1 in explain plans section)
----------------------------------------------------------------
Predicate "L"."LOT_STATUS"<>'TERMINATED' used at line ID 8 of the execution
plan is an inequality condition on indexed column "LOT_STATUS". This
inequality condition prevents the optimizer from efficiently using indices
on table "MYCIM"."LOT".
Recommendation
--------------
- Rewrite the predicate into an equivalent form to take advantage of
indices.
5- Restructure SQL finding (see plan 1 in explain plans section)
----------------------------------------------------------------
Predicate "L"."LOT_STATUS"<>'SCRAPED' used at line ID 8 of the execution
plan is an inequality condition on indexed column "LOT_STATUS". This
inequality condition prevents the optimizer from efficiently using indices
on table "MYCIM"."LOT".
Recommendation
--------------
- Rewrite the predicate into an equivalent form to take advantage of
indices.
-------------------------------------------------------------------------------
ADDITIONAL INFORMATION SECTION
-------------------------------------------------------------------------------
- The optimizer could not merge the view at line ID 3 of the execution plan.
The optimizer cannot merge a view that contains an "ORDER BY" clause unless
the statement is a "DELETE" or an "UPDATE" and the parent query is the top
most query in the statement.
- The optimizer could not merge the view at line ID 1 of the execution plan.
The optimizer cannot merge a view that contains a "ROWNUM" pseudo column.
按着这里的建议理论可以加速99.92% 但是添加sql profile,但是并没有起作用。
Recommendation (estimated benefit: 99.92%)
------------------------------------------
- Consider accepting the recommended SQL profile to use parallel execution
for this statement.
execute dbms_sqltune.accept_sql_profile(task_name =>
'8g6k1uqth8d6c_tuning_task', task_owner => 'SYS', replace =>
TRUE, profile_type => DBMS_SQLTUNE.PX_PROFILE);
sql profile也不起作用,又不修改代码来优化,那么能够想到的就是改参数,但是这个涉及到什么参数呢?这个sql 变差是和rownum有关 那么先查一下隐含参数中有哪些涉及到了rownum
查询哪些隐含参数设计到了rownum(sys用户)
SELECT a.ksppinm "Parameter",
b.ksppstvl "Session Value",
c.ksppstvl "Instance Value"
FROM x$ksppi a,
x$ksppcv b,
x$ksppsv c
WHERE a.indx = b.indx
AND a.indx = c.indx
AND a.ksppinm like '%rownum%';
-------------------------------------
_px_rownum_pd TRUE TRUE
_optimizer_rownum_pred_based_fkr TRUE TRUE
_optimizer_rownum_bind_default 10 10
可以看到_optimizer_rownum_pred_based_fkr和(FKR,first k row)_optimizer_rownum_bind_default 都是和优化器,rowum相关,MOS中按这两个关键字检索能看到如下几篇doc和BUG,都是因为查询了NUM造成了sql执行缓慢,
Query Containing a ROWNUM Predicate is Slow. Without it the Query is fast (Doc ID 833286.1)
Bug 19710102 - wrong plan with _optimizer_enable_extended_stats(default value of true) (Doc ID 19710102.8)
Pagination Query Became Slow When The Pagination Range Is Relatively Narrow (Doc ID 2941203.1)
参考了几篇MOS文章确实有一定的概率在使用rownum查询时触发bug造成sql执行缓慢,在12.2后该BUG被修复,临时的解决办法有如下两种
1. session or system 级别将隐含参数g改为false
alter session set "_optimizer_rownum_pred_based_fkr" = FALSE;
alter system set "_optimizer_rownum_pred_based_fkr" = FALSE;
2. sql添加hint
SELECT /*+ opt_param('_optimizer_rownum_pred_based_fkr','false') */
尝试使用这两种办法都可以秒出结果。
三. 后记
如果在12.2之前的版本,使用伪列rownum查询,出现莫名其妙的性能问题,可以试试session级别修改_optimizer_rownum_pred_based_fkr参数为false 可能会起到妙手回春的效果。
时间紧写的不够细致,更多的细节可以参考mos文档
相关优化文章