Bug 8552596: NODE EVICTED WITH "TERMINATING INSTANCE DUE TO ERROR 481"
Bug Attributes
Type | B - Defect | Fixed in Product Version | - |
Severity | 2 - Severe Loss of Service | Product Version | 10.2.0.4 |
Status | 33 - Suspended, Req'd Info not Avail | Platform | 212 - IBM AIX on POWER Systems (64-bit) |
Created | 27-May-2009 | Platform Version | 5.3 |
Updated | 03-Jul-2009 | Base Bug | - |
Database Version | 10.2.0.4 | ||
Affected Platforms | Generic | ||
Product Source | Oracle |
Related Products
Product Line | Oracle Database Products | Family | Oracle Database |
Area | Oracle Database | Product | 5 - Oracle Server - Enterprise Edition |
Hdr: 8552596 10.2.0.4 RDBMS 10.2.0.4 RAC PRODID-5 PORTID-212
Abstract: NODE EVICTED WITH "TERMINATING INSTANCE DUE TO ERROR 481"
*** 05/27/09 05:28 am ***
TAR:
----
PROBLEM:
--------
The instance terminated with the following error in the alert log file.
===============================================================
Error: KGXGN aborts the instance (6)
Tue May 26 15:29:29 2009
USER: terminating instance due to error 481
Tue May 26 15:29:29 2009
System state dump is made for local instance
System State dumped to trace file
/oracle/app/admin/ODSPROD/bdump/odsprod1_diag_430314.trc
Tue May 26 15:29:39 2009
Termination issued to instance processes. Waiting for the processes to exit
Instance terminated by USER, pid = 1667548
DIAGNOSTIC ANALYSIS:
--------------------
The RDBMS and ASM instances reported issues with CSS at 2009-05-26
15:29:28
>>>
*** 15:29:28.953
2009-05-26 15:29:28.953: [ CSSCLNT]clsssRecvMsg: comm error received, comrc
11, con (110ef8c90), msg (fffffffffffd2b0), msgl 144
2009-05-26 15:29:28.990: [ CSSCLNT]clssgsGGetStatus: communications failed
(0/3/-10944)
2009-05-26 15:29:28.990: [ CSSCLNT]clssgsGGetStatus: returning 8
kgxgnpstat: received ABORT event from CLSS
CM problem, please abort
*** 15:29:28.991
Node monitor becomes unavailable for service
2009-05-26 15:29:29.191: [ CSSCLNT]clsssRecvMsg: comm error received, comrc
11, con (110ef8c90), msg (fffffffffffd2b0), msgl 144
2009-05-26 15:29:29.191: [ CSSCLNT]clssgsGGetStatus: communications failed
(0/3/-10944)
2009-05-26 15:29:29.191: [ CSSCLNT]clssgsGGetStatus: returning 8
kgxgnpstat: received ABORT event from CLSS
>>>>
In the CSS log
====
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmHandleExitUpdate: (src
2) grock ocr_ODS_CRS, member 1
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmRPCDone: rpc
110741ed8 (RPC#1829) state 6, flags 0x100
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmDelMemCompl: rpc
110741ed8, ret 0, client 111f0cd70
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clscsendx: (111f0d970)
Connection not active
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmSendClient: Send
failed rc 6, con (111f0d970), client (111f0cd70), proc (0)
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmFreeRPCIndex:
freeing rpc 1829
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmRemoveMember:
grock(ocr_ODS_CRS) member(1/111f0cbb0) nodeNum(1) flags(0x0) type(2)
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmDispatchCMXMSG():
msg type(6) src(2) dest(65535) size(352) tag(00000000) incarnation(4)
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmHandleExitUpdate:
(src 2) grock crs_version, member 0
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmRPCDone: rpc
110742040 (RPC#1830) state 6, flags 0x100
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmDelMemCompl: rpc
110742040, ret 0, client 111f0dd50
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clscsendx: (111f0e430)
Connection not active
++ The "Connection not active" messages start at about the same time (15:29:28).
The last OSWatcher snapshot is at 15:29:01.
CSS appears to respawn very quickly here:
>>>>
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmFreeRPCIndex:
freeing rpc 1832
[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmRemoveMember:
grock(_ORA_CRS_MEMBER_p690ap0) member(0/111f12250) nodeNum(1) flags(0x12)
type(3)
[ CSSD]2009-05-26 15:29:43.860 >USER: Copyright 2009, Oracle version
10.2.0.4.0
[ CSSD]2009-05-26 15:29:43.860 >USER: CSS daemon log for node p690ap0,
number 1, in cluster ODS_CRS
[ CSSD]2009-05-26 15:29:43.873 [1] >TRACE: clssscmain: local-only set to
false
>>>>>
++ This is strange: CSS respawned at 15:29:43, where a node reboot would
have been expected ++.
The reboot actually happened at 15:41, as reported by errpt.log. cssd is
also seen starting again at 15:41.
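The respawn noted above can be located mechanically in a CSS log. The following is a minimal Python sketch, not an Oracle tool: it assumes the line layout seen in the excerpts in this report (a `[ CSSD]` tag followed by a millisecond timestamp) and assumes that the `>USER: Copyright` banner is the first line written by a newly started daemon. The function name `find_cssd_restarts` and the `SAMPLE` list are hypothetical, and the sample lines are abbreviated copies of the excerpt.

```python
import re
from datetime import datetime

# Timestamp that follows the "[ CSSD]" tag in the CSS log lines excerpted above.
CSSD_TS = re.compile(r"\[\s*CSSD\](\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")
# Assumption: the copyright banner marks the first line of a new daemon start.
BANNER = ">USER: Copyright"


def find_cssd_restarts(lines):
    """Return (last_entry_before, banner_time, gap_seconds) per start banner."""
    restarts = []
    previous = None
    for line in lines:
        match = CSSD_TS.search(line)
        if match is None:
            continue  # wrapped continuation line without its own timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if BANNER in line and previous is not None:
            gap = (stamp - previous).total_seconds()
            restarts.append((previous, stamp, gap))
        previous = stamp
    return restarts


# Abbreviated lines from the excerpt in this report.
SAMPLE = [
    "[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmFreeRPCIndex:",
    "freeing rpc 1832",
    "[ CSSD]2009-05-26 15:29:28.329 [3857] >TRACE: clssgmRemoveMember:",
    "[ CSSD]2009-05-26 15:29:43.860 >USER: Copyright 2009, Oracle version",
    "10.2.0.4.0",
    "[ CSSD]2009-05-26 15:29:43.860 >USER: CSS daemon log for node p690ap0,",
]

if __name__ == "__main__":
    for before, after, gap in find_cssd_restarts(SAMPLE):
        print(f"start banner at {after.time()}, {gap:.3f}s after the last entry")
```

Run against the full log, a scan like this would list every daemon start, which makes it easier to compare the 15:29:43 start with the one at 15:41.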
The open issues are:
=============
1) Why did CSS report a communication issue? The node statistics in
OSWatcher look good.
2) What caused cssd to respawn at 15:29? Ideally, the node should reboot
on abnormal termination of cssd.
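To reason about both questions, it can help to put the timestamps quoted in this report on a single timeline. The sketch below is illustrative only: the `EVENTS` list is typed in by hand from the excerpts above, the labels are editorial, the alert log and errpt entries only have second or minute resolution, and `build_timeline` is a hypothetical helper.

```python
from datetime import datetime

# Timestamps copied from this report; source names and labels are editorial.
EVENTS = [
    ("2009-05-26 15:29:28.953", "client trace", "clsssRecvMsg comm error, comrc 11"),
    ("2009-05-26 15:29:01.000", "OSWatcher", "last snapshot"),
    ("2009-05-26 15:29:28.329", "CSS log", "first 'Connection not active'"),
    ("2009-05-26 15:29:29.000", "alert log", "terminating instance due to error 481"),
    ("2009-05-26 15:29:43.860", "CSS log", "CSS daemon start banner"),
    ("2009-05-26 15:41:00.000", "errpt.log", "node reboot (minute resolution)"),
]


def build_timeline(events):
    """Sort events by time; return (seconds_since_first, source, label) tuples."""
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"), source, label)
        for ts, source, label in events
    )
    origin = parsed[0][0]
    return [
        ((stamp - origin).total_seconds(), source, label)
        for stamp, source, label in parsed
    ]


if __name__ == "__main__":
    for offset, source, label in build_timeline(EVENTS):
        print(f"+{offset:8.3f}s  {source:<12}  {label}")
```

On these inputs the CSS-side "Connection not active" entry precedes the first client-side comm error by roughly 0.6 seconds, and both fall after the last OSWatcher snapshot, so that snapshot does not cover the moment of failure.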
WORKAROUND:
-----------
RELATED BUGS:
-------------
REPRODUCIBILITY:
----------------
TEST CASE:
----------
STACK TRACE:
------------
SUPPORTING INFORMATION:
-----------------------
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
DIAL-IN INFORMATION:
--------------------
IMPACT DATE:
------------
*** 05/27/09 05:41 am ***
*** 05/27/09 05:52 am *** (CHG: Sta->16)
*** 05/27/09 09:57 am ***
*** 05/27/09 05:36 pm ***
*** 05/27/09 05:37 pm *** (CHG: Sta->10)
*** 05/28/09 08:01 am *** (CHG: Sta->16)
*** 05/28/09 08:01 am ***
*** 05/28/09 09:16 am ***
*** 05/28/09 09:16 am *** (CHG: Sta->10)
*** 05/28/09 01:00 pm *** (CHG: Sta->16)
*** 05/28/09 01:00 pm ***
*** 05/29/09 10:20 am *** (CHG: Sta->10)
*** 05/29/09 10:20 am ***
*** 05/31/09 02:15 am ***
*** 05/31/09 02:17 am ***
*** 05/31/09 02:17 am *** (CHG: Sta->16)
*** 05/31/09 02:20 am ***
*** 06/01/09 05:35 pm ***
*** 06/01/09 05:35 pm *** (CHG: Sta->10)
*** 06/01/09 05:36 pm ***
*** 06/01/09 09:22 pm ***
*** 06/01/09 09:23 pm ***
*** 06/01/09 11:37 pm *** (CHG: Sta->16)
*** 06/01/09 11:37 pm ***
*** 06/01/09 11:39 pm ***
*** 06/04/09 12:37 am ***
*** 06/04/09 12:37 am ***
*** 06/04/09 01:16 am ***
*** 06/04/09 01:34 am *** (CHG: Sta->10)
*** 06/04/09 02:55 am ***
*** 06/04/09 02:58 am *** (CHG: Sta->16)
*** 06/04/09 02:58 am ***
*** 06/04/09 03:01 am ***
*** 06/04/09 03:23 am *** (CHG: Sta->10)
*** 06/04/09 03:23 am ***
*** 06/04/09 03:19 pm ***
*** 06/04/09 03:21 pm ***
*** 06/05/09 06:47 pm ***
*** 07/03/09 05:52 pm *** (CHG: Sta->33)
*** 07/03/09 05:52 pm ***
Source: "ITPUB Blog", link: http://blog.itpub.net/22123669/viewspace-691193/. If you repost this, please credit the source; otherwise legal liability will be pursued.