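Oracle Research Center case analysis: an operations DBA reported that a database in an Oracle RAC environment raised ORA-00469. Analysis indicates that Bug 10008092 caused the RAC node to restart.
Unless marked as a repost, articles on this site are original. Reposted from love wife & love life, Roger's Oracle technical blog
Permalink: BUG 10008092 caused instance crash
In a two-node RAC, the instance on one node was restarted. The alert log shows: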
###### Node 1, 02:23:55 2011 ######
Sat Dec 3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_pmon_12765.trc:
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:55 2011
ORA-469 encountered when generating server alert SMG-3503
Sat Dec 3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_j000_8539.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:56 2011
###### Node 1 crash ######
Sat Dec 3 02:23:57 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_smon_12876.trc:
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:58 2011
Shutting down instance (abort)
License high water mark = 55
Sat Dec 3 02:24:00 2011
From the above, a problem with the checkpoint process CKPT caused the instance to crash.
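When an alert log is long, pulling out the ORA- codes together with their timestamps makes it easier to confirm that ORA-00469 came first. The following is a minimal sketch, not an Oracle tool: the function name `ora_events` and the embedded excerpt are illustrative, and it assumes the 10g-style alert log layout shown above, where a timestamp line precedes each message.

```python
import re
from datetime import datetime

# Excerpt copied from the alert log above (shortened).
ALERT_LOG = """\
Sat Dec 3 02:23:55 2011
Errors in file /oracle/admin/crmdb/bdump/crmdb1_pmon_12765.trc:
ORA-00469: CKPT process terminated with error
Sat Dec 3 02:23:55 2011
ORA-469 encountered when generating server alert SMG-3503
Sat Dec 3 02:23:58 2011
Shutting down instance (abort)
"""

# A timestamp line looks like "Sat Dec 3 02:23:55 2011".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")
# Matches both the padded (ORA-00469) and unpadded (ORA-469) forms.
ORA_CODE = re.compile(r"\bORA-(\d+)")


def ora_events(text):
    """Return (timestamp, code) pairs for every ORA- code found in the text."""
    events = []
    current = None  # most recent timestamp line seen so far
    for line in text.splitlines():
        line = line.strip()
        if TIMESTAMP.match(line):
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        for code in ORA_CODE.findall(line):
            events.append((current, int(code)))
    return events


if __name__ == "__main__":
    for when, code in ora_events(ALERT_LOG):
        print(when, f"ORA-{code:05d}")
```

The weekday and month names are parsed with `%a` and `%b`, so the sketch assumes an English locale.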
###### Node 1 PMON process trace: ######
*** 2011-12-03 02:23:55.731
Background process CKPT found dead
Oracle pid = 24
OS pid (from detached process) = 12869
OS pid (from process state) = 12869
dtp = c000000040016e40, proc = c0000004950057c8
Dump of memory from 0xC000000040016E40 to 0xC000000040016E88
C000000040016E40 00000076 00000000 C0000004 950057C8 [...v..........W.]
C000000040016E50 00000000 00000000 00000000 434B5054 [............CKPT]
C000000040016E60 00020000 00000000 00003245 00000000 [..........2E....]
....................
....................
....................
....................
Repeat 13 times
C000000495005CF0 6F726163 6C650000 00000000 00000000 [oracle..........]
C000000495005D00 00000000 00000000 00000000 00000000 [................]
C000000495005D10 00000000 00000006 6A6C6372 6D310000 [........jlcrm1..]
C000000495005D20 00000000 00000000 00000000 00000000 [................]
Repeat 2 times
C000000495005D50 00000000 00000000 00000000 00000006 [................]
C000000495005D60 554E4B4E 4F574E00 00000000 00000000 [UNKNOWN.........]
C000000495005D70 00000000 00000000 00000000 00000000 [................]
C000000495005D80 00000000 00000008 31323836 39000000 [........12869...]
C000000495005D90 00000000 00000000 00000000 00000000 [................]
C000000495005DA0 00000000 00000005 6F726163 6C65406A [........oracle@j]
C000000495005DB0 6C63726D 31202843 4B505429 00000000 [lcrm1 (CKPT)....]
C000000495005DC0 00000000 00000000 00000000 00000000 [................]
....................
....................
....................
....................
C000000495005FA0 00000000 00000000 00000000 00001308 [................]
C000000495005FB0 00000006 00000000 [........]
error 469 detected in background process
ORA-00469: CKPT process terminated with error
*** 2011-12-03 02:24:07.798
ksuitm: waiting up to [5] seconds before killing DIAG
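The bracketed column in the memory dump above is the ASCII rendering of the hex words on the same line, which is how the process name (CKPT) and OS pid (12869) can be read out of it. A small sketch of that decoding follows; the helper name `decode_dump_words` is illustrative and not part of any Oracle utility.

```python
def decode_dump_words(words):
    """Decode space-separated hex words into printable ASCII.

    Non-printable bytes are rendered as '.', matching the dump's own
    right-hand column.
    """
    raw = bytes.fromhex("".join(words.split()))
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    # Hex words copied from three lines of the PMON trace.
    print(decode_dump_words("00000000 00000000 00000000 434B5054"))
    print(decode_dump_words("00000000 00000008 31323836 39000000"))
    print(decode_dump_words("6C63726D 31202843 4B505429 00000000"))
```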
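A colleague confirmed that no DIAG trace, and not even a CKPT trace, was generated. This closely matches the description of Bug 10008092, including the version and the diagnostic analysis. The sequence was roughly:
CKPT process dies (possibly hung) --> PMON cleanup --> a mandatory background process is gone, so PMON crashes the instance
With that, the following information in the alert is easy to explain: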
*** SESSION ID:(1089.34028) 2011-12-03 02:23:56.172
kgefec: fatal error 0
*** 2011-12-03 02:23:56.172
ksedmp: internal or fatal error
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LCK0' unexpectedly terminated with error 469
ORA-00469: CKPT process terminated with error
ORA-00469: CKPT process terminated with error
Current SQL statement for this session:
TRUNCATE TABLE DINF.TEMP1_IN_PDT_CM_USER
----- PL/SQL Call Stack -----
object line object
handle number name
c00000006e89ab60 200 procedure DINF.P_IN_PDT_CM_USER
c00000009e142850 1 anonymous block
----- Call Stack Trace -----
Why? Because TRUNCATE TABLE triggers an object checkpoint, which depends on CKPT.
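To check whether other failed sessions were also running a TRUNCATE when CKPT died, the "Current SQL statement" section can be pulled out of each trace file. The following is a rough sketch under stated assumptions: the helper names `current_sql` and `is_truncate` are hypothetical, and it assumes the trace layout shown above, where the statement follows the marker line and ends at the next `-----` section header.

```python
# Excerpt copied from the session trace above (shortened).
TRACE = """\
ORA-00469: CKPT process terminated with error
Current SQL statement for this session:
TRUNCATE TABLE DINF.TEMP1_IN_PDT_CM_USER
----- PL/SQL Call Stack -----
"""

MARKER = "Current SQL statement for this session:"


def current_sql(trace_text):
    """Return the SQL text following the marker line, or None if absent."""
    lines = trace_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(MARKER):
            parts = []
            for nxt in lines[i + 1:]:
                if nxt.startswith("-----"):  # next trace section begins
                    break
                if nxt.strip():
                    parts.append(nxt.strip())
            return " ".join(parts)
    return None


def is_truncate(sql):
    """True if the statement is a TRUNCATE TABLE."""
    return sql is not None and sql.upper().startswith("TRUNCATE TABLE")


if __name__ == "__main__":
    sql = current_sql(TRACE)
    print(sql, is_truncate(sql))
```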
The bug is:
Bug 10008092: INSTANCE CRASH WITH ORA-00469: CKPT PROCESS TERMINATED WITH ERROR