今天读了oracle的Case Study中的一篇文章:Resolving High CPU Usage on Oracle Servers.
案例讲述了一个email application中在安装了一些application secucurity patches之后出现的性能问题。服务器是一个2-cpu的Linux Server。对案例的分析主要采用top-down的方法,大体流程是这样的:
出问题时的top数据如下:
# of processes in the run queue CPU User mode
CPU Sys mode
CPU idle
zzz ***Thu Mar 9 18:30:00 PST 2006
cpu procs memory swap io system
r b w swpd free buff cache si so bi bo in cs us sy id
3 0 0 920 8596 342908 4088164 0 0 10 514 501 1790 43 7 50
5 0 0 920 8844 343024 4089564 0 0 8 79 319 1751 29 4 67
21 0 1 920 5332 343160 4093964 0 0 18 320 390 1690 97 3 0
22 0 0 920 22864 343304 4057500 0 0 11 149 370 2563 97 3 0
25 0 3 920 9728 343404 4059732 0 0 77 412 400 2412 95 5 0
23 0 0 920 8416 343688 4044128 0 0 2 2 254 1308 97 3 0
21 0 1 920 6760 343520 4045256 0 0 56 2428 704 2382 89 10 1
24 0 0 920 14856 343812 4045968 0 0 54 66 278 2785 95 5 0
21 0 2 920 17600 343956 4048296 0 0 17 70 249 1677 96 4 0
20 0 0 920 6028 344080 4050368 0 0 8 52 265 1461 94 6 0
22 0 1 920 27172 344184 4053608 0 0 25 49 263 2513 96 4 0
14 0 2 920 58264 344364 4057036 0 0 73 125 614 3821 79 20 1
1 1 0 920 65756 344476 4070080 0 0 358 1964 410 2118 12 4 84
0 0 0 920 48628 344580 4080864 0 0 17 106 229 1190 9 2 88
对案例的分析主要采用top-down的方法,大体流程是这样的:
1. Verify CPU consumption on the server,collect top and vmstat data for a period of time.2. collect extended sql trace data(event 10046)
gather sid,serial#,and other information from v$session about these sessions;
trace the sessions using an extendes sql trace via event 10046,level 12;
produce a TKPROF report and look for the queries having the longest elasped time
3. build a best case script for the problem query(使用绑定变量值)
4. obtain statspack snapshots,collect statspack snapshots during the same time period as the OS were collected and look for correlations
with the session-level data collected in step 2.
5. tune the query
设置10046事件
SQL> SELECT p.spid, s.sid, s.serial#
FROM v$session s, v$process p
WHERE s.paddr = p.addr AND p.spid = 24078
SPID SID SERIAL#
------------ ---------- ----------
24078 18 5
SQL> begin dbms_system.set_ev(18,5, 10046,12,''); end; -- trace on
-- collect trace information for approximately 15 minutes during the problem
SQL> begin dbms_system.set_ev(18, 5, 10046,0,''); end; -- trace off
问题SQL的执行计划
Rows Row Source Operation
------- ---------------------------------------------------
19 TABLE ACCESS BY INDEX ROWID DS_ATTRSTORE
26 NESTED LOOPS
1 NESTED LOOPS
1 VIEW VW_NSO_1
1 SORT UNIQUE
1 UNION-ALL
0 TABLE ACCESS BY INDEX ROWID CT_OBJECTCLASS
0 INDEX RANGE SCAN (object id 55349)
1 TABLE ACCESS BY INDEX ROWID CT_OBJECTCLASS
64439 NESTED LOOPS
2 TABLE ACCESS BY INDEX ROWID CT_MAIL
2 INDEX RANGE SCAN (object id 55355)
64436 INDEX RANGE SCAN (object id 55349)
CBO used VA_OBJECTCLASS
index due to INDEX hint
0 TABLE ACCESS BY INDEX ROWID CT_OBJECTCLASS
27 NESTED LOOPS
2 TABLE ACCESS BY INDEX ROWID CT_MAIL
2 INDEX RANGE SCAN (object id 55355)
24 INDEX RANGE SCAN (object id 55349)
1 TABLE ACCESS BY INDEX ROWID CT_DN
1 INDEX RANGE SCAN (object id 55345)
24 INDEX RANGE SCAN (object id 55340)
定位问题的原因后,重新创建索引后,执行计划
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.02 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 4 0.00 0.00 0 63 0 43
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 6 0.02 0.01 0 63 0 43
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 141
Rows Row Source Operation
------- ---------------------------------------------------
43 TABLE ACCESS BY INDEX ROWID DS_ATTRSTORE
45 NESTED LOOPS
1 NESTED LOOPS
1 VIEW VW_NSO_1
1 SORT UNIQUE
1 UNION-ALL
CBO chose a different index
(ST_OBJECTCLASS)
0 TABLE ACCESS BY INDEX ROWID CT_OBJECTCLASS
0 INDEX RANGE SCAN (object id 55349)
1 NESTED LOOPS
2 TABLE ACCESS BY INDEX ROWID CT_MAIL
2 INDEX RANGE SCAN (object id 55355)
1 INDEX RANGE SCAN (object id 55348)
0 NESTED LOOPS
2 TABLE ACCESS BY INDEX ROWID CT_MAIL
2 INDEX RANGE SCAN (object id 55355)
0 INDEX RANGE SCAN (object id 55348)
1 TABLE ACCESS BY INDEX ROWID CT_DN
1 INDEX RANGE SCAN (object id 55345)
43 INDEX RANGE SCAN (object id 55340)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 4 0.00 0.00
SQL*Net message from client 4 1.25 1.29
问题得到解决。
文章名字为CS_HighCPU.pdf。