APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.2.0 and later
Information in this document applies to any platform.
SYMPTOMS
o Multi node RAC cluster
o Grid Infrastructure 11.2.0.2 or higher
o redundant cluster interconnect interfaces configured (oifcfg) in the cluster registry - e.g.:
[oragrid@lc1n1 cssd]$ oifcfg getif
eth0 192.168.56.0 global public
eth1 192.168.57.0 global cluster_interconnect
eth2 192.168.57.0 global cluster_interconnect
o all of the private network adapters have IP addresses in the same subnet
o to perform a failover test, the cable to one of the redundant cluster interconnect NICs was unplugged
o the ocssd.log of the node where the cable was unplugged records the following message:
2012-08-10 09:03:01.832: [GIPCHGEN][1098287424] gipchaInterfaceFail: marking interface failing 0x7fb5042bca10 { host '', haName 'CSS_lc1-scan', local (nil), ip '192.168.57.22', subnet '192.168.57.0', mask '255.255.255.0', mac '08-00-27-b5-53-ef', ifname 'eth1', numRef 2, numFail 0, idxBoot 0, flags 0x184d }
o shortly after which the CSS daemons on all nodes start reporting the following messages
2012-08-10 09:03:17.002: [ CSSD][1110714688]clssnmPollingThread: node lc1n2 (2) at 50% heartbeat fatal, removal in 14.670 seconds
o and eventually the CSS daemon on the node with the failing NIC terminates even though there is a 2nd (working) cluster interconnect NIC
2012-08-10 09:03:24.091: [ CSSD][1094764864]clssnmPollingThread: node lc1n1 (1) at 75% heartbeat fatal, removal in 7.410 seconds
2012-08-10 09:03:24.091: [ CSSD][1094764864]clssnmPollingThread: node lc1n3 (3) at 75% heartbeat fatal, removal in 6.930 seconds
<<..>>
2012-08-10 09:03:31.500: [ CSSD][1101441344](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, lc1n2, is smaller than cohort of 2 nodes led by node 1, lc1n1, based on map type 2
2012-08-10 09:03:31.501: [ CSSD][1101441344]###################################
2012-08-10 09:03:31.501: [ CSSD][1101441344]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
2012-08-10 09:03:31.501: [ CSSD][1101441344]###################################
2012-08-10 09:03:31.501: [ CSSD][1101441344](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
CHANGES
A cable was unplugged from one of the interfaces used as redundant cluster interconnect NIC.
CAUSE
The failure is caused by 2 factors:
1. all NICs are in the same subnet
2. the cable was unplugged from the 1st interface (for that subnet) in the routing table
The clusterware does detect that the cable has been unplugged: it marks the interface as failed and moves the HAIP VIPs from this interface to the 2nd NIC. However, the routing table entry for the failed NIC remains, so all traffic to that subnet continues to use the non-functional interface. HAIP itself is unaffected, but the network communication between the CSS daemons fails because it does not use HAIP.
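A quick way to spot this risky configuration in advance is to check whether more than one cluster_interconnect interface is registered in the same subnet. The following is an illustrative sketch only: instead of calling a live cluster it parses a sample of "oifcfg getif" output mirroring the configuration shown above; on a real node you would pipe in the output of $GRID_HOME/bin/oifcfg getif.

```shell
#!/bin/sh
# Illustrative check: flag cluster_interconnect interfaces sharing a subnet.
# Sample data mirroring the 'oifcfg getif' output above; on a live node use:
#   $GRID_HOME/bin/oifcfg getif
getif_output='eth0 192.168.56.0 global public
eth1 192.168.57.0 global cluster_interconnect
eth2 192.168.57.0 global cluster_interconnect'

# Print any subnet that appears more than once among the private interfaces.
dups=$(printf '%s\n' "$getif_output" |
  awk '$4 == "cluster_interconnect" { print $2 }' |
  sort | uniq -d)

if [ -n "$dups" ]; then
  echo "WARNING: multiple private NICs share subnet(s): $dups"
fi
```

With the sample data above the script warns about 192.168.57.0, the shared subnet that makes the cluster vulnerable to the failure described in this note.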
Note: In a 2 node cluster, the node with the higher node number will always be evicted, regardless of which node had its cable unplugged.
Example:
Routing table before the cable is unplugged from the 1st NIC (here eth1) per "netstat -rn" command:
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.0.5.0 * 255.255.255.0 U 0 0 0 eth3
192.168.56.0 * 255.255.255.0 U 0 0 0 eth0
192.168.57.0 * 255.255.255.0 U 0 0 0 eth1
192.168.57.0 * 255.255.255.0 U 0 0 0 eth2
169.254.128.0 * 255.255.128.0 U 0 0 0 eth2
169.254.0.0 * 255.255.128.0 U 0 0 0 eth1
default 10.0.5.2 0.0.0.0 UG 0 0 0 eth3
Routing table after the cable is unplugged (note that HAIP 169.254.0.0 has moved to eth2):
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.0.5.0 * 255.255.255.0 U 0 0 0 eth3
192.168.56.0 * 255.255.255.0 U 0 0 0 eth0
192.168.57.0 * 255.255.255.0 U 0 0 0 eth1
192.168.57.0 * 255.255.255.0 U 0 0 0 eth2
169.254.128.0 * 255.255.128.0 U 0 0 0 eth2
169.254.0.0 * 255.255.128.0 U 0 0 0 eth2
default 10.0.5.2 0.0.0.0 UG 0 0 0 eth3
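The tables above illustrate the core of the problem: with two identical routes to 192.168.57.0, the kernel keeps using the first one (eth1) even after that NIC has failed. The following sketch demonstrates this first-match behavior against an embedded sample of the routing table (it assumes /24 masks, as in the tables above); on a live node you would inspect "netstat -rn" or "ip route" instead.

```shell
#!/bin/sh
# Illustrative demonstration: with two equal routes to the same /24 subnet,
# the first entry in the table wins, so traffic to a peer's private IP keeps
# leaving via eth1 even though its cable is unplugged.
routes='10.0.5.0 * 255.255.255.0 U 0 0 0 eth3
192.168.56.0 * 255.255.255.0 U 0 0 0 eth0
192.168.57.0 * 255.255.255.0 U 0 0 0 eth1
192.168.57.0 * 255.255.255.0 U 0 0 0 eth2'

dest=192.168.57.22   # example peer private IP from the logs above

# Pick the first route whose /24 network matches the destination.
iface=$(printf '%s\n' "$routes" | awk -v d="$dest" '
  { split($1, n, "."); split(d, t, ".")
    if (n[1]==t[1] && n[2]==t[2] && n[3]==t[3]) { print $8; exit } }')

echo "traffic to $dest leaves via $iface"
```

The script reports eth1 (the failed interface), matching the CSS heartbeat failures shown in the SYMPTOMS section.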
SOLUTION
There are basically 2 solutions to this problem:
A. Put each set of private interfaces into a separate subnet
Configuring multiple NICs for the cluster interconnect (up to 4 are possible) not only provides redundancy but also allows database instances to make use of the bandwidth provided by the additional network interfaces (via HAIP, see note 1210883.1). To make use of this feature, each (set of) NIC(s) used for the cluster interconnect needs to be placed into a separate subnet. The number of subnets depends on the number of network interfaces.
For example, consider the following current configuration with all NICs in subnet 192.168.57.0/255.255.255.0:
Node 1 Node 2
subnet 1 (192.168.57.0): eth1 (192.168.57.21) -- [network switch 1] -- eth1 (192.168.57.22)
|
subnet 2 (192.168.57.0): eth2 (192.168.57.31) -- [network switch 2] -- eth2 (192.168.57.32)
which would need to be changed (e.g.) to have eth1 in subnet 192.168.57.0/255.255.255.0 and eth2 in subnet 192.168.58.0/255.255.255.0:
Node 1 Node 2
subnet 1 (192.168.57.0): eth1 (192.168.57.21) -- [network switch 1] -- eth1 (192.168.57.22)
subnet 2 (192.168.58.0): eth2 (192.168.58.21) -- [network switch 2] -- eth2 (192.168.58.22)
Parts of the transition to different subnets can be done without an outage (steps 1-3); however, to complete the transition (step 4), a short but complete cluster outage cannot be avoided.
1. Remove one of the cluster interconnect NICs from the clusterware using:
$GRID_HOME/bin/oifcfg delif -global eth2
Please note that the clusterware will move the HAIP VIP for this interface to the other NIC, so running ASM and database instances will be unaffected.
2. Move the 'eth2' NIC to a different subnet.
This will involve taking down the interface (here eth2) on all nodes, changing the IP addresses to the new subnet, and possibly connecting the NICs to different network switches and/or reconfiguring VLANs.
3. Once the subnet change is complete, add the NIC back to the clusterware using the new subnet:
$GRID_HOME/bin/oifcfg setif -global eth2/192.168.58.0:cluster_interconnect
4. For the clusterware to start using the eth2 NIC again, the clusterware must first be shut down on *all* nodes (this cannot be done in a rolling fashion) and then restarted.
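Steps 1-4 can be sketched as a single command sequence. The sketch below is a dry run: it echoes the commands rather than executing them, the GRID_HOME path and interface/subnet values are the example values used in this note, and steps 2 and 4 must be repeated on every node (step 4 as root).

```shell
#!/bin/sh
# Illustrative dry run of steps 1-4; prints the commands instead of
# executing them. Replace the example values before real use.
GRID_HOME=/u01/app/11.2.0/grid   # example path, adjust to your installation
NIC=eth2
NEW_SUBNET=192.168.58.0

run() { echo "WOULD RUN: $*"; }  # change the body to "$@" to really execute

# Step 1: drop the NIC from the cluster registry (HAIP fails over to eth1).
run $GRID_HOME/bin/oifcfg delif -global $NIC
# Step 2: re-address the NIC into the new subnet (on every node).
run ifdown $NIC
run "edit the network configuration for $NIC  # new IP in $NEW_SUBNET"
run ifup $NIC
# Step 3: register the NIC with its new subnet.
run $GRID_HOME/bin/oifcfg setif -global $NIC/$NEW_SUBNET:cluster_interconnect
# Step 4: full (non-rolling) clusterware restart, as root on every node.
run $GRID_HOME/bin/crsctl stop crs
run $GRID_HOME/bin/crsctl start crs
```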
B. Use NIC redundancy provided by the operating system (NIC bonding/trunking, Solaris IPMP etc.)
While this solution provides NIC redundancy, it does not allow the RAC database instances to make use of the additional network bandwidth.
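As one example of OS-level redundancy, active-backup bonding on Linux could look like the following. This is an illustrative fragment only: file paths, IP addresses and bonding options are assumptions in the style of Red Hat-family network scripts, and the single bonded interface would then be registered with the clusterware in place of the two physical NICs.

```
# /etc/sysconfig/network-scripts/ifcfg-bond0  (illustrative values)
DEVICE=bond0
IPADDR=192.168.57.21
NETMASK=255.255.255.0
ONBOOT=yes
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth1  (and analogously for eth2)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
```

The bonded interface would then be the only private interface known to the clusterware, e.g. via "oifcfg setif -global bond0/192.168.57.0:cluster_interconnect", leaving a single route for the private subnet.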