一次系统卡住不动的故障排查
今天,我在oracle 的scott用户下,执行了 create table emp1 as select * from emp 这条语句后,数据库就卡住不动了。然后我再开个新的会话,在scott用户下执行select * from tab后,也是卡住了。
于是我用v$sql和v$session 联合查询,定位出哪个会话在执行哪些操作,消耗多长时间:
SQL> selectb.sid,a.sql_text,b.status,b.last_call_et,b.event from v$sql a,v$session b wherea.sql_id=b.sql_id;
SID SQL_TEXT STATUS LAST_CALL_ET EVENT
---------- ------------------------------------------------------------------------------ ----------------------------------------------------------------------------
129 select b.sid,a.sql_text,b.status,b.last_call_et,b.event from v$sqla,v ACTIVE 0 SQL*Net messageto client
$session b where a.sql_id=b.sql_id
131 select * from tab ACTIVE 3 buffer busywaits
152 create table emp1 as select * from emp ACTIVE 9101 flashback buf free by RVWR
通过上面可以看到,status 列为active说明当前的sql的语句正在执行,会话处于活跃状态,last_cakk_et=9101说明这条SQL语句已经运行了9101秒,当前会话正在处于等待状态,等待的事件为flashback buf free by RVWR。
(注:可以通过下面这条命令定位更加详细的信息,如客户端ip等。
select b.sid,b.machine,b.program,b.module,a.sql_text,b.status,b.last_call_et,b.eventfrom v$sql a,v$session b where a.sql_id=b.sql_id;)
通过百度,得知flashback buf free by RVWR等待事件可能是由于flashbackbuffer空间紧张。
下面查看数据库top 5事件:
SQL> selectrownum,sql_id,sql_text,optimizer_cost from (select sql_id,sql_text,optimizer_costfrom v$sqlarea order by 3 desc) whererownum<5 ;
ROWNUM SQL_ID SQL_TEXT OPTIMIZER_COST
---------- -------------------------------------------------------------------------------------------------
1 gfjvxb25b773h select o.owner#,o.obj#,decode(o.linkname,null,decode(u.name,null,'SYS 2427
',u.name),o.remoteowner),o.name,o.linkname,o.namespace,o.subname from
user$ u, obj$ o whereu.user#(+)=o.owner# and o.type#=:1 and not exis
ts (select p_obj# fromdependency$ where p_obj# = o.obj#) order by o.o
bj# for update
2 63fyqfhnd7u5k select OBJOID, CLSOID, RUNTIME, PRI, JOBTYPE, SCHLIM, WT, INST, RU 288
NNOW, ENQ_SCHLIM from( select a.obj# OBJOID, a.class_oid CLSOID, d
ecode(bitand(a.flags, 16384), 0,a.next_run_date, a.last_enabled_time)
RUNTIME, (2*a.priority + decode(bitand(a.job_status, 4), 0, 0,
decode(a.running_instance, :1, -1, 1))) PRI, 1 JOBTYPE,
ROWNUM SQL_ID SQL_TEXT OPTIMIZER_COST
---------- -------------------------------------------------------------------------------------------------
a.schedule_limitSCHLIM, a.job_weight WT, decode(a.running_instanc
e, NULL, 0,a.running_instance) INST, decode(bitand(a.flags, 16384)
, 0, 0, 1)RUNNOW, decode(bitand(a.job_status, 8388608),0, 0, 1) E
NQ_SCHLIM from sys.scheduler$_job a where bitand(a.job_status, 515)
= 1 and ((bitand(a.flags, 134217728 +268435456) = 0) or (b
itand(a.job_status,1024) <> 0)) and bitand(a.flags,4096) = 0 a
nd (a.next_run_date<= :2 or bitand(a.flags, 16384)<> 0) and
(a.class_oid isnull or (a.class_oid is notnull and a.clas
s_oid in (selectb.obj# from sys.scheduler$_class b
where b.affin
ROWNUM SQL_ID SQL_TEXT OPTIMIZER_COST
---------- -------------------------------------------------------------------------------------------------
3 cfz686a6qp0kg select o.obj#, u.name, o.name, t.spare1, DECODE(bitand(t.flags 207
, 268435456),268435456, t.initrans, t.pctfree$) from sys.obj$ o, s
ys.user$ u, sys.tab$t where (bitand(t.trigflag, 1048576) = 104857
6) and o.obj#=t.obj# and o.owner# = u.user#
4 b7jn4mf49n569 select o.name, u.name from obj$ o, type$ t, user$ u where o.oid$ = t. 185
tvoid andu.user#=o.owner# and bitand(t.properties,8388608) = 8388608
and (sysdate-o.ctime)> 0.0007
5 767pug2dbpqpc select * from tab 184
6 8ypwcums111db select distinct (case when cd.dbid = wr.dbid and 152
通过top 5事件没看出什么问题。
再看一下v$lock,检查数据库是否有锁表的问题:
SQL> select sid,type,id1,id2,lmode,request,blockfrom v$lock;
SID TY ID1 ID2 LMODE REQUEST BLOCK
---------- -- ---------- -------------------- ---------- ----------
165 XR 4 0 1 0 0
165 CF 0 0 2 0 0
165 RS 25 1 2 0 0
161 FD 1 0 6 0 0
166 RT 1 0 6 0 0
167 PW 1 0 3 0 0
161 FD 0 0 6 0 0
164 TS 3 1 3 0 0
164 US 1 0 6 0 0
167 MR 1 0 4 0 0
167 MR 2 0 4 0 0
SID TY ID1 ID2 LMODE REQUEST BLOCK
---------- -- ---------- -------------------- ---------- ----------
167 MR 3 0 4 0 0
167 MR 4 0 4 0 0
167 MR 5 0 4 0 0
167 MR 201 0 4 0 0
164 TT 1 16 4 0 0
164 HW 1 8388617 6 0 0
164 CI 1 5 6 0 0
165 CI 1 1 4 0 0
160 TT 1 0 4 0 0
142 JQ 0 1 6 0 0
162 JS 0 0 4 0 0
SID TY ID1 ID2 LMODE REQUEST BLOCK
---------- -- ---------- -------------------- ---------- ----------
158 JS 0 1 6 0 1
162 JS 0 1 0 6 0
159 UL 7567243 0 6 0 0
159 JS 0 0 4 0 0
159 JS 0 1 0 6 0
142 TM 50288 0 2 0 0
142 TM 50291 0 3 0 0
158 TM 5123 0 3 0 0
通过上述可以看到,sid为158的会话block=1,说明它在阻塞其他的会话;sid为162的会话的request 为6,说明他正在请求lmode=6的会话,即sid=158的会话。
我的解决办法: 把那个会话kill掉。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/28916011/viewspace-1374854/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/28916011/viewspace-1374854/