In this Document
Purpose
Details
Bug 10332426 - HAIP fails to start due to network mismatch
Bug 19270660 - AIX: category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: ARP device /dev/bpf4, interface en8
Bug 16445624 - AIX: HAIP fails to start
Bug 13989181 - AIX: HAIP fails to start with: category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0
Note 1447517.1 - AIX: HAIP fails to start if bpf and other devices using same major/minor number
Bug 10253028 - "oifcfg iflist -p -n" not showing HAIP on AIX as expected
Bug 13332363 - Wrong MTU for HAIP on Solaris
Bug 10114953 - only one HAIP is created on HP-UX
Bug 10363902 - HAIP Infiniband support for Linux and Solaris
Bug 10357258 - Many HAIP started on Solaris IPMP - not affecting 11.2.0.3
Bug 10397652 / 12767231 - HAIP not failing over when private network fails - not affecting 11.2.0.3
Bug 11077756 - allow root script to continue upon HAIP failure
Bug 12546712 - not affecting 11.2.0.3
HAIP fails to start if default gateway is configured for VLAN for private network on network switch
Bug 12425730 - HAIP does not start, 11.2.0.3 not affected
ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481
11gR2 GI HAIP Resource Not Created in Solaris 11 if IPMP is Used for Private Network
References
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.
PURPOSE
This document lists known HAIP issues in 11gR2/12c Grid Infrastructure. Refer to note 1210883.1 for an explanation of the HAIP feature.
DETAILS
Bug 10332426 - HAIP fails to start due to network mismatch
Issue: HAIP fails to start while running rootupgrade.sh
Symptom:
- Output of root script:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
- $GRID_HOME/log/<hostname>/gipcd/gipcd.log
2010-12-12 09:41:35.201: [ CLSINET][1088543040] Returning NETDATA: 0 interfaces
2010-12-12 09:41:40.201: [ CLSINET][1088543040] Returning NETDATA: 0 interfaces
Solution:
The cause is a mismatch between the private network information stored in the OCR and the network configuration on the OS. The output of the following commands should be consistent with each other regarding network adapter name, subnet and netmask; see note 1296579.1 for what to check.
oifcfg iflist -p -n
oifcfg getif
ifconfig
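To make the adapter and subnet comparison repeatable across nodes, the two oifcfg outputs can be cross-checked with a small script. The following is a minimal Python sketch, assuming the whitespace-separated column layout shown elsewhere in this note (`adapter subnet scope role` for `oifcfg getif`, `adapter subnet type netmask` for `oifcfg iflist -p -n`). The function names are hypothetical, and the netmask and `ifconfig` checks are still left to the manual review described above.

```python
"""Hypothetical helper: compare private networks registered in the OCR
(`oifcfg getif`) with interfaces reported by the OS (`oifcfg iflist -p -n`)."""


def parse_getif(text):
    """Return {adapter: subnet} for cluster_interconnect rows of `oifcfg getif`."""
    private = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed columns: adapter subnet scope role
        if len(fields) >= 4 and fields[3] == "cluster_interconnect":
            private[fields[0]] = fields[1]
    return private


def parse_iflist(text):
    """Return {adapter: {subnet, ...}} from `oifcfg iflist -p -n`."""
    subnets = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed columns: adapter subnet type netmask
        if len(fields) >= 4:
            subnets.setdefault(fields[0], set()).add(fields[1])
    return subnets


def find_mismatches(getif_text, iflist_text):
    """Return a list of problems; an empty list means adapter and subnet agree."""
    problems = []
    os_view = parse_iflist(iflist_text)
    for adapter, subnet in sorted(parse_getif(getif_text).items()):
        if adapter not in os_view:
            problems.append(f"{adapter}: in OCR but not reported by the OS")
        elif subnet not in os_view[adapter]:
            problems.append(f"{adapter}: OCR subnet {subnet} not found on the OS")
    return problems


if __name__ == "__main__":
    getif = "en12 10.0.1.0 global public\nen13 10.1.1.0 global cluster_interconnect\n"
    iflist = "en12 10.0.1.0 PUBLIC 255.255.255.0\nen13 10.1.1.0 PUBLIC 255.255.255.0\n"
    print(find_mismatches(getif, iflist))  # prints []
```

Feeding the captured command output from each node into `find_mismatches` lists every private adapter that is registered in the OCR but missing, or on a different subnet, at the OS level.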
Bug 19270660 - AIX: category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: ARP device /dev/bpf4, interface en8
Issue: HAIP fails to start on AIX
Symptom:
- $GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log
2014-07-21 16:38:59.240: [ USRTHRD][4372]{0:0:2} failed to create arp
2014-07-21 16:38:59.240: [ USRTHRD][4115]{0:0:2} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: ARP device /dev/bpf4, interface en8
Solution/Workaround:
Bug 19270660 is fixed in 12.1.0.2; apply interim patch 19270660 if the issue is encountered.
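When comparing these symptoms across nodes, it can help to pull the error fields out of `orarootagent_root.log` automatically. A minimal Python sketch, assuming the `category: ..., operation: ..., loc: ..., OS error: ..., other: ...` layout seen in the samples in this note (inferred from those samples, not a documented format); `parse_haip_error` is a hypothetical name. The symbolic errno name comes from the errno table of the machine running the script, so treat it as a hint only: low numbers such as 2 (ENOENT), 6 (ENXIO), 14 (EFAULT) and 22 (EINVAL) are the traditional UNIX values, but confirm against the AIX headers.

```python
import errno
import re

# Field layout inferred from the log samples in this note (assumption).
ERROR_RE = re.compile(
    r"category:\s*(?P<category>-?\d+),\s*"
    r"operation:\s*(?P<operation>[^,]+),\s*"
    r"loc:\s*(?P<loc>.*?),\s*"          # loc itself may contain a comma, e.g. bpfopen:1,os
    r"OS error:\s*(?P<os_error>\d+),\s*"
    r"other:\s*(?P<other>.*)$"
)


def parse_haip_error(line):
    """Return a dict of the error fields, or None if the line does not match."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["category"] = int(info["category"])
    info["os_error"] = int(info["os_error"])
    info["other"] = info["other"].strip()
    # Symbolic name from the *local* errno table (e.g. 2 -> ENOENT).
    info["errno_name"] = errno.errorcode.get(info["os_error"], "unknown")
    return info
```

For the line quoted above, `parse_haip_error` returns operation `open`, loc `bpfopen:1,os`, OS error 2 (`ENOENT`) and other `ARP device /dev/bpf4, interface en8`.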
Bug 16445624 - AIX: HAIP fails to start
Issue: HAIP fails to start if the root script (root.sh or rootupgrade.sh) is executed via sudo (rather than directly as the root user) or if the bpf device is not functioning properly
Symptom:
- Output of root script:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
- $GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log
2010-12-04 17:19:54.893: [ USRTHRD][2084] {0:3:37} failed to create arp
2010-12-04 17:19:54.893: [ USRTHRD][2084] {0:3:37} (null) category: -2, operation: ioctl, loc: bpfopen:2,os, OS error: 14, other:
OR
2011-09-29 16:44:46.771: [ USRTHRD][3600] {0:3:14} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other:
OR
2011-09-29 16:44:46.771: [ USRTHRD][3600] {0:3:14} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 22, other:
OR
2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} (null) category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0, ifr en2
OR
Various other OS error codes can be seen as well.
Solution/Workaround:
It is known on AIX and Solaris that commands executed via sudo or similar tools may not have the full root environment, which can cause HAIP startup failure.
The solution is to obtain and apply patch 16445624 on AIX.
The workaround is to execute root script (root.sh or rootupgrade.sh) as real root user directly.
If the root script has already failed, try one or all of the following:
- reboot the node
- execute "/usr/sbin/tcpdump -D" as the root user; if the timestamp of the bpf device does not get updated, delete the device and re-run the same "tcpdump -D" command
Before re-running the root script, verify that the following devices exist and that their timestamps are updated:
cr-------- 1 root system 42, 0 Oct 03 10:32 /dev/bpf0
..
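The device check above can be scripted, for example right after running `tcpdump -D`. A minimal Python sketch, assuming the bpf nodes live under `/dev/bpf*` and that "timestamp updated" means the node's modification time is recent; `check_bpf_devices` and its `max_age_seconds` threshold are hypothetical choices, not Oracle or AIX defaults.

```python
import glob
import os
import stat
import time


def check_bpf_devices(pattern="/dev/bpf*", max_age_seconds=600, now=None):
    """Return (path, problem) pairs; an empty list means every node looks fine.

    max_age_seconds is an arbitrary threshold for "recently refreshed".
    """
    now = time.time() if now is None else now
    paths = sorted(glob.glob(pattern))
    if not paths:
        return [(pattern, "no bpf device found")]
    problems = []
    for path in paths:
        st = os.stat(path)
        if not stat.S_ISCHR(st.st_mode):
            # bpf nodes are expected to be character devices ("cr--------").
            problems.append((path, "not a character device"))
        elif now - st.st_mtime > max_age_seconds:
            problems.append((path, "timestamp not refreshed"))
    return problems


if __name__ == "__main__":
    for path, problem in check_bpf_devices():
        print(path, problem)
```

An empty result suggests the bpf devices were recreated recently and the root script can be re-run; any reported path is a candidate for deletion and another `tcpdump -D`.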
Bug 13989181 - AIX: HAIP fails to start with: category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0
Duplicate Bug 14358011
Issue: HAIP fails to start on AIX
Symptom:
- $GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log
2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} failed to create arp
2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} (null) category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0, ifr en2
...
2012-04-21 17:12:41.086: [ora.cluster_interconnect.haip][3343] {0:0:2} [start] Start of HAIP aborted
2012-04-21 17:12:41.086: [ AGENT][3343] {0:0:2} UserErrorException: Locale is
2012-04-21 17:12:41.087: [ora.cluster_interconnect.haip][3343] {0:0:2} [start] clsnUtils::error Exception type=2 string=CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error: Start action for HAIP aborted
Solution/Workaround:
Bug 13989181 is fixed in 11.2.0.4; apply interim patch 13989181 if the issue is encountered.
Note 1447517.1 - AIX: HAIP fails to start if bpf and other devices using same major/minor number
Issue: HAIP fails to start on AIX because other system devices use the same major/minor number as the bpf devices
orarootagent_root.log shows: category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 22, other: dev /dev/bpf0, ifr en15
The solution is to ensure that no other device uses the same major/minor number as a bpf device; refer to note 1447517.1 for more details.
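To look for such collisions, the device nodes under `/dev` can be grouped by major/minor number. A minimal Python sketch, assuming bpf nodes are named `bpf*` and treating block and character nodes with the same numbers as conflicting (an assumption; note 1447517.1 remains the authority on what must be changed). `scan_dev` and `bpf_conflicts` are hypothetical names.

```python
import os
import stat
from collections import defaultdict


def scan_dev(root="/dev"):
    """Yield (path, kind, major, minor) for device nodes directly under root."""
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        try:
            st = os.lstat(path)
        except OSError:
            continue
        if stat.S_ISCHR(st.st_mode):
            kind = "c"
        elif stat.S_ISBLK(st.st_mode):
            kind = "b"
        else:
            continue  # skip regular files, directories, symlinks
        yield path, kind, os.major(st.st_rdev), os.minor(st.st_rdev)


def bpf_conflicts(devices):
    """Return {(major, minor): [non-bpf paths]} for numbers shared with a bpf node."""
    by_number = defaultdict(list)
    for path, _kind, major, minor in devices:
        by_number[(major, minor)].append(path)
    conflicts = {}
    for number, paths in by_number.items():
        bpf = [p for p in paths if os.path.basename(p).startswith("bpf")]
        others = [p for p in paths if p not in bpf]
        if bpf and others:
            conflicts[number] = others
    return conflicts


if __name__ == "__main__":
    print(bpf_conflicts(scan_dev()))
```

Keeping the scan and the grouping separate makes it easy to run `bpf_conflicts` on a saved listing from another node instead of the local `/dev`.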
Bug 10253028 - "oifcfg iflist -p -n" not showing HAIP on AIX as expected
Issue: "oifcfg iflist -p -n" not showing HAIP on AIX
Fixed in: N/A, this is expected behaviour on AIX
Symptom:
- "oifcfg getif" output
en12 10.0.1.0 global public
en13 10.1.1.0 global cluster_interconnect
- "ifconfig -a" output
en13: flags=5e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
inet 10.1.1.143 netmask 0xffffff00 broadcast 10.1.1.255
inet 169.254.228.154 netmask 0xffff0000 broadcast 169.254.255.255
tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
..
Note that HAIP (169.254.228.154) exists on en13.
- gv$cluster_interconnects output
SQL> select * from gv$cluster_interconnects;
   INST_ID NAME            IP_ADDRESS       IS_ SOURCE
---------- --------------- ---------------- --- -------------------------------
         1 en13            169.254.228.154  NO
         2 en13            169.254.55.162   NO
- "oifcfg iflist -p -n" output
en12 10.0.1.0 PUBLIC 255.255.255.0
en13 10.1.1.0 PUBLIC 255.255.255.0
Note: HAIP would usually be listed here as well; however, it is not listed on AIX.
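Because HAIP does not appear in `oifcfg iflist -p -n` on AIX, `ifconfig -a` is the place to confirm it. A minimal Python sketch, assuming the AIX-style `inet <address> netmask 0x<hex>` lines shown above; it converts the hex netmask to the dotted form used by `oifcfg iflist` and flags addresses in the 169.254.0.0/16 link-local range from which HAIP addresses are taken. `classify_addresses` is a hypothetical name.

```python
import ipaddress
import re

# AIX prints the netmask in hex, e.g. "inet 10.1.1.143 netmask 0xffffff00".
INET_RE = re.compile(r"inet\s+(\d+\.\d+\.\d+\.\d+)\s+netmask\s+0x([0-9a-fA-F]{8})")


def classify_addresses(ifconfig_text):
    """Return [(address, network, is_haip)] for each IPv4 address in the text."""
    result = []
    for addr, hexmask in INET_RE.findall(ifconfig_text):
        mask = str(ipaddress.IPv4Address(int(hexmask, 16)))  # 0xffffff00 -> 255.255.255.0
        iface = ipaddress.IPv4Interface(f"{addr}/{mask}")
        # HAIP addresses come from the 169.254.0.0/16 link-local range.
        result.append((addr, str(iface.network), iface.ip.is_link_local))
    return result
```

For the en13 output above, this yields `10.1.1.143` on `10.1.1.0/24` (not HAIP) and `169.254.228.154` on `169.254.0.0/16` (HAIP), matching the `gv$cluster_interconnects` rows.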
Bug 13332363 - Wrong MTU for HAIP on Solaris
Issue: Wrong MTU size for HAIP on Solaris, refer to note 1290585.1 for more details.
Fixed in: 11.2.0.2 GI PSU5, 11.2.0.3 GI PSU1, 11.2.0.4
Bug 10114953 - only one HAIP is created on HP-UX
Issue: Only one HAIP is created on HP-UX
2013-05-29 17:21:31.280: [ USRTHRD][29499] {0:0:56578} Arp::sCreateSocket {
2013-05-29 17:21:31.280: [ USRTHRD][29499] {0:0:56578} failed to create arp
2013-05-29 17:21:31.281: [ USRTHRD][29499] {0:0:56578} (null) category: -2, operation: ssclsi_dlpi_request, loc: dlpireq:8,na, OS error: 4, other:
The bug is fixed in 11.2.0.4; on earlier versions, patch 10114953 is required.
The OS kernel parameter dlpi_max_ub_promisc must be set to a value greater than 1 for the patch to be effective.
To find out the value of dlpi_max_ub_promisc: kctune -v dlpi_max_ub_promisc
Refer to bug 15940367.
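Where several HP-UX nodes must be checked, the kctune output can be evaluated programmatically. A minimal Python sketch, assuming `kctune -v dlpi_max_ub_promisc` prints a `Current Value` line followed by the number (adjust the pattern if your HP-UX release formats it differently); `promisc_setting_ok` is a hypothetical name.

```python
import re


def promisc_setting_ok(kctune_output):
    """Return (value, ok) where ok means dlpi_max_ub_promisc is greater than 1.

    Assumes a line such as "Current Value   1   [Default]" in the kctune -v output.
    """
    m = re.search(r"^Current Value\s+(\d+)", kctune_output, re.MULTILINE)
    if m is None:
        raise ValueError("could not find 'Current Value' in kctune output")
    value = int(m.group(1))
    return value, value > 1
```

A result such as `(1, False)` means the patch will not be effective until the tunable is raised.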
Bug 10363902 - HAIP Infiniband support for Linux and Solaris
Issue: GIPC HA is disabled or HAIP fails to start if the cluster interconnect is InfiniBand or any other network hardware whose hardware (MAC) address is longer than 6 bytes
Fixed in: 11.2.0.3 for Linux and Solaris
Symptom:
- Output of root script:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
- $GRID_HOME/log/<hostname>/gipcd/gipcd.log
2010-12-07 13:23:08.560: [ USRTHRD][3858] {0:0:62} Arp::sCreateSocket {
2010-12-07 13:23:08.560: [ USRTHRD][3858] {0:0:62} failed to create arp
2010-12-07 13:23:08.561: [ USRTHRD][3858] {0:0:62} (null) category: -2, operation: ssclsi_aix_get_phys_addr, loc: aixgetpa:4,n, OS error: 2, other:
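On Linux, whether an interconnect adapter falls into this category can be checked from the hardware address length reported in sysfs. A minimal Python sketch, assuming the standard `/sys/class/net/<interface>/address` files with colon-separated octets (IP over InfiniBand addresses are 20 bytes, Ethernet MACs 6); both function names are hypothetical.

```python
import os


def hw_address_length(address):
    """Number of bytes in a colon-separated hardware address."""
    return len([octet for octet in address.strip().split(":") if octet])


def interfaces_with_long_hw_address(sysfs="/sys/class/net"):
    """Return {interface: byte_length} for interfaces whose address exceeds 6 bytes."""
    long_addrs = {}
    for iface in sorted(os.listdir(sysfs)):
        path = os.path.join(sysfs, iface, "address")
        try:
            with open(path) as fh:
                length = hw_address_length(fh.read())
        except OSError:
            continue  # interface without a readable address file
        if length > 6:
            long_addrs[iface] = length
    return long_addrs
```

Any interface returned here and used as a private interconnect is affected by this bug on versions before 11.2.0.3.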
Bug 10357258 - Many HAIP started on Solaris IPMP - not affecting 11.2.0.3
Issue: Many HAIPs are created after the active NIC in an IPMP group fails
Fixed in: 11.2.0.3 and 11.2.0.2 GI PSU3; interim patch 10357258 exists for 11.2.0.2 and patch 11865154 for 11.2.0.2.1. Affects Solaris only.
Symptom:
- ifconfig output:
nxge3:2: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500 index 5
inet 169.254.20.88 netmask ffff0000 broadcast 169.254.255.255
nxge3:3: flags=21000842<BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500 index 5
inet 169.254.20.88 netmask ffff0000 broadcast 169.254.255.255
..
Note that the same HAIP address shows up multiple times.
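Spotting the duplicate by eye is easy for a few lines but tedious on busy hosts. A minimal Python sketch, assuming Solaris-style `ifconfig -a` output where each logical interface starts with `<name>: flags=` and is followed by an `inet <address>` line; it reports any 169.254.x.x address owned by more than one logical interface. `duplicate_haip` is a hypothetical name.

```python
import ipaddress
import re
from collections import defaultdict

IFACE_RE = re.compile(r"^(\S+):\s+flags=")          # e.g. "nxge3:2: flags=..."
INET_RE = re.compile(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+)")


def duplicate_haip(ifconfig_text):
    """Return {address: [logical interfaces]} for link-local addresses seen more than once."""
    owners = defaultdict(list)
    current = None
    for line in ifconfig_text.splitlines():
        m = IFACE_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = INET_RE.match(line)
        if m and current is not None:
            addr = m.group(1)
            if ipaddress.IPv4Address(addr).is_link_local:
                owners[addr].append(current)
    return {addr: ifs for addr, ifs in owners.items() if len(ifs) > 1}
```

For the output above, this returns `{"169.254.20.88": ["nxge3:2", "nxge3:3"]}`.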
Bug 10397652 / 12767231 - HAIP not failing over when private network fails - not affecting 11.2.0.3
Issue: HAIP does not fail over even when the private network experiences a problem (e.g. a switch port is disabled) because the OS does not provide reliable link information
Fixed in: Bug 12767231 is fixed in 11.2.0.2 GI PSU4, 11.2.0.3
The workaround on AIX is to set the "MONITOR" flag on all private network adapters:
# ifconfig en1 monitor
# ifconfig en1
en1: flags=5e080863,2c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN,MONITOR>
inet 192.168.10.83 netmask 0xfffffc00 broadcast 192.168.11.255
inet 169.254.74.136 netmask 0xffff8000 broadcast 169.254.127.255
tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
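To confirm that every private adapter carries the flag after running `ifconfig <adapter> monitor`, the `ifconfig -a` output can be checked. A minimal Python sketch, assuming AIX-style stanzas beginning `<adapter>: flags=<hex>,<hex><FLAG,...>` (the flag list may wrap across lines); the list of private adapters is supplied by the caller, and `adapters_missing_monitor` is a hypothetical name.

```python
import re

# "[^>]*" also matches newlines, so a wrapped flag list is captured whole.
STANZA_RE = re.compile(r"^(\w+):\s+flags=[0-9a-fA-F,]+<([^>]*)>", re.MULTILINE)


def adapters_missing_monitor(ifconfig_text, private_adapters):
    """Return private adapters whose flag list lacks MONITOR (or that are absent)."""
    flags_by_adapter = {}
    for name, flag_text in STANZA_RE.findall(ifconfig_text):
        # Normalise wrapped lines before splitting the comma-separated flags.
        flags = {f.strip() for f in flag_text.replace("\n", "").split(",")}
        flags_by_adapter[name] = flags
    return [a for a in private_adapters if "MONITOR" not in flags_by_adapter.get(a, set())]
```

An empty list means the workaround is in place on every listed adapter; an adapter missing from the output is reported as well so that typos in the adapter list do not go unnoticed.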
Bug 11077756 - allow root script to continue upon HAIP failure
Issue: A HAIP startup failure causes the root script to fail; the fix allows the root script to continue so that the HAIP issue can be worked on later.
Fixed in: 11.2.0.2 GI PSU6, 11.2.0.3 and above
Note: as a consequence, HAIP will be disabled. Once the cause is identified and a solution is implemented, HAIP needs to be re-enabled during an outage window. To enable it, run the following as root on ALL nodes:
# $GRID_HOME/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=1" -init
# $GRID_HOME/bin/crsctl stop crs
# $GRID_HOME/bin/crsctl start crs
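After the stack is restarted, the HAIP resource state can be verified on each node with `crsctl stat res ora.cluster_interconnect.haip -init` (and its attributes with `-p`). A minimal Python sketch that evaluates that output, assuming the usual `KEY=VALUE` layout such as `NAME=`, `TARGET=` and `STATE=ONLINE on <node>`, and that the `-p` output includes `ENABLED=`; the function names are hypothetical.

```python
def parse_crsctl_stat(text):
    """Parse KEY=VALUE lines from `crsctl stat res ... -init [-p]` into a dict."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs


def haip_online(stat_text):
    """True if the STATE attribute starts with ONLINE."""
    return parse_crsctl_stat(stat_text).get("STATE", "").startswith("ONLINE")


def haip_enabled(profile_text):
    """True if the resource profile (`-p` output) shows ENABLED=1."""
    return parse_crsctl_stat(profile_text).get("ENABLED") == "1"
```

Both checks should return True on every node before the outage window is closed.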
Bug 12546712 - not affecting 11.2.0.3
Issue: ASM crashes because HAIP does not fail over when two or more private networks fail; refer to note 1323995.1 for more details.
HAIP fails to start if default gateway is configured for VLAN for private network on network switch
Issue: HAIP fails to start if default gateway is configured for VLAN for private network on network switch
orarootagent_root.log shows: PROBE: conflict detected src { 169.254.12.247, <gateway MAC on switch> }, target { 0.0.0.0, <private NIC MAC> }
The solution is to remove the default gateway setting on the network switch for the private network (VLAN); refer to Note 1366211.1 for more details.
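The PROBE message identifies the device that answered HAIP's address probe, which helps confirm that it is the gateway configured on the switch. A minimal Python sketch, assuming the `src { <ip>, <mac> }, target { <ip>, <mac> }` layout shown above; the exact MAC formatting in real logs may differ, and the MAC values in any test input are placeholders. `parse_probe_conflict` is a hypothetical name.

```python
import re

# Layout taken from the PROBE line quoted in this note (assumption).
PROBE_RE = re.compile(
    r"PROBE: conflict detected src \{\s*([\d.]+),\s*([^}]+?)\s*\},\s*"
    r"target \{\s*([\d.]+),\s*([^}]+?)\s*\}"
)


def parse_probe_conflict(line):
    """Return the source/target IP and MAC of a PROBE conflict, or None."""
    m = PROBE_RE.search(line)
    if m is None:
        return None
    src_ip, src_mac, target_ip, target_mac = m.groups()
    return {"src_ip": src_ip, "src_mac": src_mac,
            "target_ip": target_ip, "target_mac": target_mac}
```

If `src_mac` matches the MAC of the VLAN gateway interface on the switch, the default gateway setting is the cause described above.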
Bug 12425730 - HAIP does not start, 11.2.0.3 not affected
Issue: HAIP fails to start; gipcd.log shows rank 0 or "-1" for the private network
Fixed in: 11.2.0.2 GI PSU6, 11.2.0.3 and later; refer to note 1374360.1 for details.
ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481
If HAIP is not running, instance startup can be affected. Refer to Note 1383737.1 for details.
11gR2 GI HAIP Resource Not Created in Solaris 11 if IPMP is Used for Private Network
HAIP will not be enabled on Solaris 11 if IPMP is configured for the private network. This is by design. Refer to Note 1512141.1.