在一个报表数据库的alert文件中发现了ORA-1555错误
ORA-1555 cased by SQL statement below(Query Duration=38751 sec,SCN:0x000.fe5b..):
INSERT INTO MAN_ORDER_ITEM(ID...
根据出错的SQL可以判断出这是一个后台运行的JOB,由于这个错误刚刚出现一次就被发现了,且失败的JOB会自动重新执行,
因此可以从DBA_JOBS_RUNNING中查看相关的JOB和SESSION信息
SQL>select SID,JOB from DBA_JOBS_RUNNING;
SID JOB
----------------------------
70 208
检查这个SESSION当前在执行什么SQL语句:
SQL>select sql_text from v$sql sql ,v$session s
where sql.hash_value = s.sql_hash_value
and sql.address =s.sql_address
and s.sid=70;
SQL_TEXT
--------------------------------------------------------------
INSERT INTO MAN_ORDER_ITEM(...
显然,这个SQL就是后台alert文件中出现ORA-1555出错的SQL,现在已经找到正在运行这个SQL的会话,那么看看会话在等待什么:
SQL>select sid,event,pltext,plraw,p2text,p2,seconds_in_wait from v$session_wait where sid =70;
sid event p1text p1raw p2text p2 seconds_in_wait
---------------------------------------------------------------------------------------------------------------------------------
70 latch free address 000004... number 98 330
通过观察发现,会话的等待事件一直都是LATCH FREE.看到这个等待事件,第一感觉就是当前会话可能和其他会话产生了争用.
查询一下oracle等待的具体latch的类型.
SQL>select latch#,name from v$latch where latch#=98;
LATCH# NAME
------------------------------------------
98 cache buffers chains
而查询V$LOCK和V$LATCHHOLDER视图,发现没有其他的进程对JOB运行构成影响:
SQL>select sid,type,ID1,ID2,LMODE,REQUEST,CTIME,BLOCK from v$lock where SID> 8;
SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
-----------------------------------------------------------------------------------------------------------------------------------
70 TM 35258 0 3 0 12072 0
70 JQ 0 208 6 0 12155 0
SQL> select * from v$latchholder;
no rows selected
SQL>select * from v$latchholder;
PID SID LADDR NAME
------------------------------------------------------------------------------------------------------
15 70 00000... cache buffers chains
可以看到并没有其他对象和JOB运行的过程发生争用,多次查询V$LATCHHOLDER,只发现一次cache buffers chains
由于等待事件是latch free,怀疑和系统本身的问题有关。通过下面的脚本可以看到,目前正在等待的这个子latch的信息:
SQL>select addr,latch#,child#,name from v$latch_children where addr in (select plraw from v$session_wait where sid=70);
ADDR LATCH# CHILD# NAME
-------------------------------------------------------------------------------------------------
0000... 98 327 cache buffers chains
观察LATCH_MISSES的信息
SQL>col parent_name format a20
SQL> col where format a35
SQL>select *
from (
select parent_name,"WHERE",SLEEP_COUNT, WTR_SLP_COUNT,LONGHOLD_COUNT_LHCOUNT
from V$LATCH_MISSES
where PARENT_NAME= 'cache buffers chains'
order by sleep_count + wtr_slp_count + longhold_count desc
)
where rownum < 20;
...............................