The MetaLink content on this bug is as follows:
By the way, does posting this count as a violation?
If it does, I'll try to post less of it, heh!
Subject: Linux: OCSSD Reboots Nodes Randomly After Application of 10.2.0.4 Patchset and in 11g Environments
Doc ID: 731599.1 Type: ALERT
Modified Date: 23-MAR-2009 Status: PUBLISHED
In this Document
Description
Issue Description
Likelihood of Occurrence
Possible Symptoms
Cause
OS Versions Affected
Workaround or Resolution
Modification History
References
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.4 to 11.1.0.7
Linux Itanium
Linux x86-64
Description
Issue Description
It is possible to see random node reboots induced by Oracle Cluster Synchronization Services Daemon (OCSSD) in RAC on Red Hat, Suse, and Oracle Enterprise Linux OS implementations following application of the 10.2.0.4 patchset and in 11g environments.
Likelihood of Occurrence
All environments running an affected OS version (see 'OS Versions Affected' below) with Oracle Clusterware 11.1.0.6/11.1.0.7, or with Oracle Clusterware patched to 10.2.0.4, will encounter this issue. This holds whether Clusterware supports Real Application Clusters or protects a single instance.
Possible Symptoms
Symptoms related to this issue as they were reported to Oracle Support have been identified as (but are not necessarily limited to):
- Cluster member reboots
- CLSOMON failing with status 13
- High CPU usage by ocssd.bin
One might see the following messages reported in the system log file:
Apr 10 15:48:36 bn1rac004 logger: Oracle clsomon failed with fatal status 13.
Apr 10 15:48:37 bn1rac004 logger: Oracle CRS failure. Rebooting for cluster integrity.
One might see the following message reported in the ocssd.log file:
[ CSSD]2008-04-10 15:48:09.611 [1210108224] >ERROR: clssscExit: CSSD signal 11 in thread GMClientListener
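As a rough triage aid, the Python sketch below scans log lines for the messages quoted above. The function name `find_symptoms` and the regular expressions are illustrative assumptions based only on the sample messages in this note. Actual wording and log locations may differ between versions, and a match is only a hint, not a diagnosis of the bug described here.

```python
import re
from typing import Iterable, List

# Hypothetical signatures derived from the sample messages in this note.
# Matching is done one line at a time, so a message split across lines is not found.
SIGNATURES = [
    re.compile(r"clsomon failed with fatal status 13"),
    re.compile(r"CRS failure\. Rebooting for cluster\s+integrity"),
    re.compile(r"clssscExit: CSSD\s+signal 11 in thread GMClientListener"),
]


def find_symptoms(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match any of the symptom signatures."""
    hits = []
    for line in lines:
        if any(sig.search(line) for sig in SIGNATURES):
            hits.append(line.rstrip("\n"))
    return hits
```

For example, `find_symptoms(open("/var/log/messages"))` would return the matching syslog lines. The path is an assumption, so use whatever system log and ocssd.log locations apply to your installation.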
Cause
The symptoms listed above were all diagnosed as being caused by the same base bug:
Bug 6790001 - CLSSSCEXIT: CSSD SIGNAL 11 IN THREAD GMCLIENTLISTENER
This bug is, in turn, ultimately caused by a Linux OS bug in glibc, which affects Red Hat, Suse, and Oracle Enterprise Linux OS implementations. The Red Hat Bugzilla bug is 405781; the Suse bug is Novell bug 416838.
Reference:
========
Red Hat bug report: https://bugzilla.redhat.com/show_bug.cgi?id=405781
Red Hat advisory: http://rhn.redhat.com/errata/RHBA-2008-0083.html
Detailed description: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=473812
OS Versions Affected
RedHat Enterprise Linux (RHEL) 3
* Releases are not affected because the glibc they ship does not use madvise() with the MADV_DONTNEED flag.
Oracle Enterprise Linux (OEL) / RHEL 4
* Problem exists with glibc-2.3.4-2.39
* Fixed in glibc-2.3.4-2.40 and above (only glibc-2.3.4-2.41 was actually released)
Oracle Enterprise Linux (OEL) / RHEL 5
* Problem exists with glibc-2.5 releases earlier than glibc-2.5-20
* Fixed in glibc-2.5-20 and above (the released version is glibc-2.5-24)
SLES 10 SP2
* Problem exists with glibc-2.4-31.54
* Contact Novell for the fix
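The version thresholds above can be checked mechanically. The sketch below assumes the installed glibc version and release are already known, for example from `rpm -q glibc`. The platform keys `EL4`/`EL5` and all function names are hypothetical. `rpm_segment_cmp` is a simplified approximation of RPM's version comparison (it ignores epochs and `~`/`^` handling), not RPM's own implementation.

```python
import re
from typing import List, Optional, Tuple


def _segments(s: str) -> List[str]:
    # Split into runs of digits or letters; separators such as '.' and '_' are dropped.
    return re.findall(r"\d+|[A-Za-z]+", s)


def rpm_segment_cmp(a: str, b: str) -> int:
    """Simplified rpmvercmp-style comparison; returns -1, 0 or 1."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the string with more segments is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def evr_cmp(a: Tuple[str, str], b: Tuple[str, str]) -> int:
    """Compare (version, release) pairs: version first, then release."""
    c = rpm_segment_cmp(a[0], b[0])
    return c if c != 0 else rpm_segment_cmp(a[1], b[1])


# Minimum fixed glibc (version, release) per this note, e.g. glibc-2.3.4-2.40 for EL4.
FIXED_GLIBC = {
    "EL4": ("2.3.4", "2.40"),
    "EL5": ("2.5", "20"),
}


def needs_glibc_update(platform: str, version: str, release: str) -> Optional[bool]:
    """True if installed glibc is older than the fixed release listed in the note.

    Returns None when the note gives no fixed version to compare against
    (e.g. SLES 10 SP2, where the fix must be obtained from Novell).
    """
    fixed = FIXED_GLIBC.get(platform)
    if fixed is None:
        return None
    return evr_cmp((version, release), fixed) < 0
```

Returning `None` rather than `False` for platforms without a listed fix keeps "not known to be fixed" distinct from "fixed".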
Workaround or Resolution
- EL4 customers: install glibc-2.3.4-2.40 (or above) or upgrade to EL4u7
- EL5 customers: install glibc-2.5-24 (or above) or upgrade to EL5u2
- SLES10 customers: Novell bug 416838 has been filed for this problem; contact Novell for the fix.
Note for 64-bit environments: it is necessary to install both the 32-bit and 64-bit versions of the newer glibc.
The fix for this issue is to install the newer glibc versions indicated above. Applying CRS bundle patches will not correct this issue.
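To follow the 64-bit note, a minimal sketch can confirm that both glibc architectures are installed at one version-release. It assumes an x86-64 host, where the 32-bit glibc packages carry an i386/i686 architecture; Itanium is not covered. The input is assumed to be the output of `rpm -q --qf '%{NAME} %{VERSION} %{RELEASE} %{ARCH}\n' glibc`. `check_multilib_glibc` is a hypothetical helper that does not compare against the fixed versions listed in this note, so it should be paired with that check.

```python
from typing import List, Tuple

ARCH_64 = {"x86_64"}
ARCH_32 = {"i386", "i486", "i586", "i686"}


def parse_rpm_lines(output: str) -> List[Tuple[str, str, str, str]]:
    """Parse 'NAME VERSION RELEASE ARCH' lines into tuples, skipping malformed lines."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 4:
            rows.append((parts[0], parts[1], parts[2], parts[3]))
    return rows


def check_multilib_glibc(output: str) -> List[str]:
    """Return a list of problems; an empty list means both arches share one version-release."""
    rows = [r for r in parse_rpm_lines(output) if r[0] == "glibc"]
    arches = {r[3] for r in rows}
    problems = []
    if not arches & ARCH_64:
        problems.append("no 64-bit (x86_64) glibc package found")
    if not arches & ARCH_32:
        problems.append("no 32-bit (i386/i686) glibc package found")
    evrs = {(r[1], r[2]) for r in rows}
    if len(evrs) > 1:
        listed = ", ".join(sorted(f"{v}-{r}" for v, r in evrs))
        problems.append("installed glibc packages do not share one version-release: " + listed)
    return problems
```

An empty result only means both architectures are present and consistent. Whether that version is new enough still depends on the OS-specific versions listed above.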
Modification History
14-Jan-2009: Added SLES 10 to the affected OS versions.
References
Note 730148.1 - Ocssd.Bin Process Consumes 100% Cpu
Note 730437.1 - GLIBC: calloc() Breaks when Application Runs with Locked Process Address Space
Note 732847.1 - [Glibc] Call to calloc() does not always zero locked memory, resulting in segfaults
Keywords
REDHAT ; RAC ; CLUSTERWARE ; REAL~APPLICATION~CLUSTERS ; CSS ; GLIBC ; LINUX ; OCLSOMON ; OCSSD ;