Statspack检查中,发现数据库出现性能异常,15分钟的采样报告:
系统的CPU消耗上升,以下是系统的负载概要信息:
Load Profile Per Second Per Transaction
Redo size: 876.497.06 25.821.52
Logical reads: 39,103.18 1,151.98
Block changes: 3,662.94 107.91
Physical reads: 287.40 8.47
Physical writes: 180.96 5.33
User calls: 8,700.70 256.32
Parses: 715.31 21.07
Hard parses: 0.16 0.00
Sorts: 143.47 4.23
Logons: 0.03 0.00
Executes: 4,277.99 126.03
Transactions: 33.94
在接下来的性能指标中,Rows per Sort引起了我们的重点关注,平均每次排序的行数高达9058行:
% Blocks changed per Read:9.37 Recursive Call %: 37.52
Rollback per transaction %:0.13 Rows per Sort: 9058.18
Instance Efficiency Percentages(Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`
Buffer Nowait %:99.99 Redo NoWait %: 100.00
Buffer Hit %:99.63 In-memory Sort %:100.00
Library Hit %:99.99 Soft Parse %:99.98
Execute to Parse %:83.28 Latch Hit %:99.88
Parse CPU to Parse Elapsd %:67.45 %Non-Parse CPU:98.75
CPU 使用统治的升,会不会和排序有关呢?
作为一个产品数据库,出现这么高的rows/sort不可思议。排序需要CPU计算,消耗大量CPU。从这个出发点定位SQL,继续看statspack 找到最消耗CPU的SQL:
Top 5 Timed Events
EVENT Waits Time(s) % Total Ela Time
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CPU time 3.878 88.43
db file sequential read 20,986 170 3.88
log file sync 62,767 103 2.36
db file scattered read 10,886 55 1.25
direct path write 61,261 40 .92
SQL ordered by Gets for Db:MESWP Instance:meswp Snaps:2554-2555
-> End Buffer Gets Threshold: 10000
-> Note that resources reported for PL/SQL includes the resources used by all SQL statements called within the PL/SQL code. As individual SQL
statements are alse reported, it is possible and valid for the summed total % to exceed 100
Buffer Gets Executions Gets per Exec %Total CPU Time(s) Elapsd Time(s) Hash Value
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
14,991,160 8,799 1,703.7 42.6 2888.67 3172.58 2784....
Module : InSiteXMLServer.exe
select PRODUCTMATERIALLISTITEMID From orderedmateriallist where bomid=:"SYS_B_0" order by DidplayPriority
Hash_value=2784的SQL消耗的CPU Time已经达到总消耗的2888.67/3878 = 74.5% 消耗的Logical Reads 已经达到消耗的14991160/(39103.18*60*15)=42.6%
这个SQL的CPU Time/Elapsd Time = 2888.67/3172.58=91.1%,也就是说这个SQL执行的总消耗中CPU的时间消耗占到了91.1%,消耗这这么多的CPU时间到底做了什么,会不会是排序?
SQL排序问题定位
首先做一个查询验证,记录之前的排序统计数据:
SQL>select * from v$statname where name='sorts(rows)';
STATISTIC# NAME CLASS
---------------------------------------------------------
247 sorts(rows) 64
SQL>select * from v$mystat where statistic#=247
SID STATISTIC# VALUE
-----------------------------------------------------------
1004 247 63
执行一次这个可疑的SQL:
select productmateriallistitemid from orderedmateriallist where bomid = '0018 ... order by ....
再次查出统计数据:
select * from v$mystat where statistic#=247;
SID STATISTIC# VALUE
-------------------------------------------------
1004 247 250175
返回了16行数据,竟然做了250175-63=250112行的排序
SQL优化与问题解决
SQL问题找到了,那么接下来要分析一下,这个SQL应该如何调整
SQL>select sql_text from v$sqltext_with_newlines where hash_value='2784...' order by piece;
SQL_TEXT
---------------------------------------
select PRODUCTMATERIALLISTITMED From orderedmateriallist where bomid = :"SYS_B_0" order by DisplayPriority
此处的ORDEREDMATERIALLIST是一个视图,其创建语句如下:
SQL> select dbms_metadata.get_ddl('VIEW','ORDEREDMATERIALLIST') VIEW_SQL from dual;
VIEW_SQL
-----------------
.....
再来确认一下执行计划的子指针信息:
SQL> select child_number,plan_hash_value,object_status from v$sql order by plan_hash_value
我们注意到只有一个有效的执行计划,不存在多执行计划问题,这样的调整起来就较为简单了,以下是该SQL被记录的执行计划:
SQL> select operation,options,object_name from v$sql_plan where hash_value='27...' and child_number='9'
用10046进一步跟踪,看一下具体执行的过程
SQL> alter session set events '10046 trace name context forever';
SQL> alter session set events '10046 trace name context off';
在跟踪文件中,找到更为详尽的执行信息:
call count cpu elapsed disk query current rows
------------------------------------------------------------------------------------------------------------------------------------
Parse 1 0.00 0.02 1 5 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 3 0.64 0.63 0 3148 0 16
-------------------------------------------------------------------------------------------------------------------------------------
total 5 0.64 0.65 1 3153 0 16
Misses in library cache during parse:1
Optimizer goal:CHOOSE
Parsing user id:24
Rows Row Source Operation
-----------------------------------------------------------
......
alter session set events '10046 trace name context off'
SQL共执行了0.65秒,其中CPU工作了0.64秒。这么高的CPU使用率主要用于最后一步sort unique.