下班的时候突然接到电话,通知一台主机资源占用超过,同时手机短信也不停的报session数,赶快连上服务器查看具体原因:
1、 通过top可以发现系统CPU资源占用100%。
查当前进程数,发现比平时多了100多
Select * from v$license;
SESSIONS_MAX SESSIONS_WARNING SESSIONS_CURRENT SESSIONS_HIGHWATER USERS_MAX
------------ ---------------- ---------------- ------------------ ----------
0 0 234 246 0
查session等待事件,大部分session都在做latch free等待
SQL> select sid,event,P1TEXT,state from v$session_wait;
SID EVENT P1TEXT STATE
------ ------------------------------ ------------------------------ -------------------
32 latch free address WAITED KNOWN TIME
38 latch free address WAITED KNOWN TIME
41 latch free address WAITING
57 latch free address WAITING
89 latch free address WAITED KNOWN TIME
93 latch free address WAITED KNOWN TIME
111 latch free address WAITED KNOWN TIME
118 latch free address WAITING
137 latch free address WAITED KNOWN TIME
201 latch free address WAITING
200 latch free address WAITED KNOWN TIME
194 latch free address WAITED KNOWN TIME
186 latch free address WAITED KNOWN TIME
182 latch free address WAITED KNOWN TIME
177 latch free address WAITING
163 latch free address WAITING
147 latch free address WAITED KNOWN TIME
146 latch free address WAITED KNOWN TIME
238 latch free address WAITED KNOWN TIME
236 latch free address WAITED KNOWN TIME
233 latch free address WAITED KNOWN TIME
225 latch free address WAITED KNOWN TIME
221 latch free address WAITED KNOWN TIME
211 latch free address WAITING
209 latch free address WAITED KNOWN TIME
207 latch free address WAITED KNOWN TIME
204 latch free address WAITING
261 latch free address WAITED KNOWN TIME
257 latch free address WAITED KNOWN TIME
255 latch free address WAITED KNOWN TIME
253 latch free address WAITED KNOWN TIME
251 latch free address WAITED KNOWN TIME
241 latch free address WAITING
239 latch free address WAITED KNOWN TIME
119 latch free address WAITED KNOWN TIME
113 latch free address WAITED KNOWN TIME
97 latch free address WAITING
92 latch free address WAITED KNOWN TIME
88 latch free address WAITED KNOWN TIME
87 latch free address WAITED KNOWN TIME
68 latch free address WAITED KNOWN TIME
2、 查询占用cpu的进程情况,大量进程占用都很高,如3720.
SQL> SELECT /*+ ordered */ p.spid, s.sid, s.serial#, s.username, s.program,s.status,TO_CHAR(s.logon_time, 'mm-dd-yyyy hh24:mi') logon_time, s.last_call_et, st.value, s.sql_hash_value, s.sql_address, sq.child_number ,sq.sql_text
2 FROM v$statname sn, v$sesstat st, v$process p, v$session s, v$sql sq
3 WHERE s.paddr=p.addr
4 AND s.sql_hash_value = sq.hash_value and s.sql_Address = sq.address
5 AND s.sid = st.sid
6 AND st.STATISTIC# = sn.statistic#
7 AND sn.NAME = 'CPU used by this session'
8 AND p.spid = &osPID -- parameter to restrict for a specific PID
9 -- AND s.status = 'ACTIVE'
10 ORDER BY st.value desc;
Enter value for ospid: 3720
SPID SID SERIAL# USERNAME PROGRAM STATUS LOGON_TIME LAST_CALL_ET VALUE SQL_HASH_VALUE SQL_ADDRESS CHILD_NUMBER
------------ ------ ------- ---------- -------------------------------- -------- ---------------- ------------ ---------- -------------- ---------------- ------------
SQL_TEXT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
3720 97 17856 CCATSUPT JDBC Thin Client ACTIVE 02-21-2011 15:28 3413 16088 1113672989 C000000354738950 0
SELECT /*+ INDEX (C, I_SVR_PUB_DA_MAINQUEUE_HIS_FRT) */ COUNT(1) FROM Svr_pub_da_MainQueue_his c,pub_Specialty k WHERE c.Business in ( 'D46C2BC08404D1211DFA6F7BA8DCB9DB', '293C7B04CD7B0FFD56B255CC2585E16F') AND c.specialty = k.specialtyid AND c.FirstReceptTime BETWEEN TO_DATE('2011-01-21 00:00:00', 'yyyy-MM-dd HH24:MI:SS') AND TO_DATE('2011-02-22 00:00:00', 'yyyy-MM-dd HH24:MI:SS') AND ( k.TreeCode LIKE '0109%') AND EXISTS (SELECT 1 FROM org_unit ou WHERE c.sourcedept = ou.unitid AND ou.TreeCode LIKE '0001000301970003%')
3、查看该sql执行计划,存在开销极大的nested loop,检查各个表的数据量后,怀疑是统计信息出问题,走了不该走的索引。
-------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart | Pstop |
-------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 34498 | | |
| 1 | SORT AGGREGATE | | 1 | 203 | | | |
| * 2 | TABLE ACCESS BY GLOBAL INDEX ROWID | SVR_PUB_DA_MAINQUEUE_HIS | 1 | 101 | 34493 | ROW L | ROW L |
| 3 | NESTED LOOPS | | 1 | 203 | 34498 | | |
| 4 | MERGE JOIN CARTESIAN | | 1 | 102 | 5 | | |
| 5 | TABLE ACCESS BY INDEX ROWID | PUB_SPECIALTY | 1 | 46 | 3 | | |
| * 6 | INDEX RANGE SCAN | I_PUB_SPECIALTY_TR | 1 | | 2 | | |
| 7 | BUFFER SORT | | 1 | 56 | 2 | | |
| 8 | SORT UNIQUE | | | | | | |
| 9 | TABLE ACCESS BY INDEX ROWID | ORG_UNIT | 1 | 56 | 2 | | |
| * 10 | INDEX RANGE SCAN | ORG_UNIT_I01 | 1 | | 1 | | |
| * 11 | INDEX RANGE SCAN | I_SVR_PUB_DA_MAINQUEUE_HIS_FRT | 40241 | | 119 | | |
-------------------------------------------------------------------------------------------------------------------------
4、更新统计信息
analyze table PUB_SPECIALTY compute statistics for table for all indexed columns for all indexes;
analyze table org_unit compute statistics for table for all indexed columns for all indexes;
5、此时再次运行sql,查看执行计划,发现已经已无nested loop
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT ptimizer=CHOOSE (Cost=39980 Card=1 Bytes=2
00)
1 0 SORT (AGGREGATE)
2 1 HASH JOIN (Cost=39980 Card=1811 Bytes=362200)
3 2 TABLE ACCESS (FULL) OF 'PUB_SPECIALTY' (Cost=4 Card=17
0 Bytes=7480)
4 2 HASH JOIN (Cost=39975 Card=2059 Bytes=321204)
5 4 SORT (UNIQUE)
6 5 TABLE ACCESS (FULL) OF 'ORG_UNIT' (Cost=10 Card=21
Bytes=1155)
7 4 TABLE ACCESS (BY GLOBAL INDEX ROWID) OF 'SVR_PUB_DA_
MAINQUEUE_HIS' (Cost=39951 Card=16097 Bytes=1625797)
8 7 INDEX (RANGE SCAN) OF 'I_SVR_PUB_DA_MAINQUEUE_HIS_
FRT' (NON-UNIQUE) (Cost=158 Card=40241)
6、查询等待事件,latch free的session仍然存在,由于走错了执行计划,所以决定杀掉这些session,杀掉session后系统恢复正常
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/11088128/viewspace-687675/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/11088128/viewspace-687675/