AIX 11204: root.sh fails to start HAIP

OS: AIX 5100-12-09

RAC DB: 11204


At the last step of the GI installation, while running root.sh on node 1, I found that HAIP on node 1 could not be started.

Check the orarootagent_root.log log:

/oracle/gi/11.2.4/log/server7/agent/ohasd/orarootagent_root/orarootagent_root.log


2016-06-14 14:51:50.692: [ USRTHRD][3086]{0:0:77} Waiting for HAIP work thread to cleanup ARP

2016-06-14 14:51:52.692: [ USRTHRD][3086]{0:0:77} timeout to wait thread to cleanup ARP
2016-06-14 14:51:52.692: [ USRTHRD][3086]{0:0:77} Thread:[NetHAWork]start {
2016-06-14 14:51:52.692: [ USRTHRD][3086]{0:0:77} Thread:[NetHAWork]start }
2016-06-14 14:51:52.692: [ USRTHRD][3857]{0:0:77} [NetHAWork] thread started
2016-06-14 14:51:52.693: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:52.693: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:52.693: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:52.693: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:52.694: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:52.694: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:52.697: [ USRTHRD][3086]{0:0:77} use all detected INF
2016-06-14 14:51:52.793: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:52.794: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:52.794: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:52.894: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:52.895: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:52.895: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:52.994: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:52.994: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:52.994: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:53.094: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:53.094: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:53.094: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:53.193: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:53.194: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:53.194: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:53.294: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:53.295: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:53.295: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:53.394: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:53.394: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:53.394: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:53.493: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:53.494: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:53.494: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:53.593: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:53.594: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:53.594: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:53.694: [ USRTHRD][3857]{0:0:77}  Arp::sCreateSocket {
2016-06-14 14:51:53.694: [ USRTHRD][3857]{0:0:77} failed to create arp
2016-06-14 14:51:53.695: [ USRTHRD][3857]{0:0:77} (null) category: -2, operation: pcap_open_live, loc: bpfopen:0,os, OS error: 9, other: err bpf_load: can't stat /dev/bpf0: Socket is not connected, ifname en2
2016-06-14 14:51:53.695: [ USRTHRD][3857]{0:0:77} (:CLSN00130:) category: 1, operation: <garbled>, loc: <garbled>, OS error: 0, other: <garbled>
2016-06-14 14:51:53.695: [ USRTHRD][3857]{0:0:77} [NetHAWork] thread hit exception Agent failed to initialize which is required for HAIP processing
2016-06-14 14:51:53.695: [ USRTHRD][3857]{0:0:77} [NetHAWork] thread stopping
2016-06-14 14:51:53.695: [ USRTHRD][3857]{0:0:77} Thread:[NetHAWork]isRunning is reset to false here
2016-06-14 14:51:54.697: [ USRTHRD][3086]{0:0:77} use all detected INF
2016-06-14 14:51:54.698: [ USRTHRD][3086]{0:0:77} HAIP:  Moving ip '' from inf 'en2' to inf 'en2'
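
Because the agent retries every ~100 ms, this log grows quickly. A minimal shell sketch for summarizing it, assuming one log entry per line as in the real file (the wrapped lines in a blog copy are a paste artifact). The function name summarize_haip_arp_errors is hypothetical; it only counts the "failed to create arp" markers and collects the ifname and OS error fields seen on the pcap_open_live failures:

```shell
#!/bin/sh
# Hypothetical helper: summarize HAIP ARP/BPF failures in orarootagent_root.log.
# Assumes one log entry per line, as in the real file.
summarize_haip_arp_errors() {
    log=$1
    # How many times the agent failed to create its ARP socket.
    fails=$(grep -c 'failed to create arp' "$log")
    # Distinct interface names reported on the failures (e.g. "ifname en2").
    ifaces=$(sed -n 's/.*ifname \([A-Za-z0-9]*\).*/\1/p' "$log" | sort -u | tr '\n' ' ')
    # Distinct OS error numbers reported by pcap_open_live.
    errs=$(grep 'operation: pcap_open_live' "$log" |
        sed -n 's/.*OS error: \([0-9]*\).*/\1/p' | sort -u | tr '\n' ' ')
    echo "arp_failures=$fails interfaces=${ifaces% } os_errors=${errs% }"
}
```

Usage: summarize_haip_arp_errors /oracle/gi/11.2.4/log/server7/agent/ohasd/orarootagent_root/orarootagent_root.log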
 

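About BPF:

Filtering packets
We usually capture a great many packets; how do we filter out the ones we are not interested in?
Almost every operating system (BSD, AIX, Mac OS, Linux, etc.) provides an in-kernel packet filtering mechanism, mostly based on the BSD Packet Filter (BPF) architecture. libpcap uses BPF to filter packets, which is why the HAIP agent's pcap_open_live call in the log above fails on /dev/bpf0.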


MOS (My Oracle Support) search:

Bug 16445624 - AIX: HAIP fails to start

Issue: HAIP fails to start if the root script (root.sh or rootupgrade.sh) is executed via sudo (not directly as the root user) or if the bpf device is not functioning properly.


Symptom:

  • Output of root script:

CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed

  • $GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log

2010-12-04 17:19:54.893: [ USRTHRD][2084] {0:3:37} failed to create arp
2010-12-04 17:19:54.893: [ USRTHRD][2084] {0:3:37} (null) category: -2, operation: ioctl, loc: bpfopen:2,os, OS error: 14, other:

OR

2011-09-29 16:44:46.771: [ USRTHRD][3600] {0:3:14} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other:

OR

2011-09-29 16:44:46.771: [ USRTHRD][3600] {0:3:14} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 22, other:

OR

2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} (null) category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev/dev/bpf0, ifr en2

OR 


Various other OS error codes can be seen as well.
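
To compare a node's log against these variants, a small hypothetical helper can pull the operation, loc and OS error fields out of one line. The errno names are an assumption: they treat "OS error" as a plain errno value and use the classic low-numbered codes AIX shares with other Unix systems (check /usr/include/sys/errno.h on the host). Note the symbolic name is only a hint; in this post's log, OS error 9 is paired with the text "Socket is not connected":

```shell
#!/bin/sh
# Hypothetical helper: extract operation, loc and OS error from a bpfopen
# log line and attach a conventional errno name (assumption, see above).
decode_bpf_error() {
    line=$1
    op=$(printf '%s\n' "$line" | sed -n 's/.*operation: *\([^,]*\),.*/\1/p')
    loc=$(printf '%s\n' "$line" | sed -n 's/.*loc: *\([^,]*\),.*/\1/p')
    err=$(printf '%s\n' "$line" | sed -n 's/.*OS error: *\([0-9]*\).*/\1/p')
    case $err in
        2) name=ENOENT ;;
        6) name=ENXIO ;;
        9) name=EBADF ;;
        14) name=EFAULT ;;
        22) name=EINVAL ;;
        *) name=unknown ;;
    esac
    echo "operation=$op loc=$loc errno=$err($name)"
}
```

For the MOS line with "operation: open, loc: bpfopen:1,os, OS error: 22" this prints operation=open loc=bpfopen:1 errno=22(EINVAL).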

Solution/Workaround:

It is known on AIX and Solaris that commands executed via sudo etc. may not have the full root environment, which can cause HAIP startup failure.

The solution is to obtain and apply patch 16445624 on AIX.

The workaround is to execute the root script (root.sh or rootupgrade.sh) directly as the real root user.
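
A hedged pre-check before running root.sh, in the spirit of this workaround. It relies on the assumption that sudo exports SUDO_USER into the environment of the commands it runs, so it is a heuristic, not proof of a full root environment. The function name is hypothetical, and the optional argument exists only so the check can be exercised without being root:

```shell
#!/bin/sh
# Hypothetical pre-check: refuse to continue unless this looks like a
# direct root login rather than a sudo invocation.
check_real_root() {
    # Optional $1 overrides the detected UID (for testing only).
    uid=${1:-$(id -u)}
    if [ "$uid" -ne 0 ]; then
        echo "not root"
        return 1
    fi
    # sudo normally sets SUDO_USER; its presence suggests this shell was
    # not started by logging in as root directly.
    if [ -n "${SUDO_USER:-}" ]; then
        echo "running via sudo (SUDO_USER=$SUDO_USER)"
        return 2
    fi
    echo "ok"
    return 0
}
```

Usage: check_real_root && /oracle/gi/11.2.4/root.sh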

If the root script has already failed, try one or all of the following:

 - reboot the node

 - execute "/usr/sbin/tcpdump -D" as the root user; if the timestamp of the bpf device did not get updated, delete the device and re-run the same "tcpdump -D" command

Before re-running the root script, verify that the following exists and that its timestamp has been updated:

ls -ltr /dev/bpf*
cr--------   1 root    system       42,  0 Oct 03 10:32 /dev/bpf0
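
The check above (run tcpdump -D, then confirm /dev/bpf0 exists with a fresh timestamp) can be scripted. A minimal sketch, assuming a marker file is touched just before tcpdump -D so that find -newer can compare modification times; the function name and the marker path are hypothetical:

```shell
#!/bin/sh
# Hypothetical check: does the bpf device exist, and was it modified after
# the marker file? The device path is a parameter (default /dev/bpf0).
bpf_refreshed() {
    marker=$1
    dev=${2:-/dev/bpf0}
    if [ ! -e "$dev" ]; then
        echo "missing: $dev"
        return 1
    fi
    # find prints $dev only if its mtime is strictly later than the marker's.
    if [ -n "$(find "$dev" -newer "$marker" 2>/dev/null)" ]; then
        echo "refreshed: $dev"
        return 0
    fi
    echo "stale: $dev"
    return 2
}
```

On a node, as root: touch /tmp/bpf.marker; sleep 1; /usr/sbin/tcpdump -D; bpf_refreshed /tmp/bpf.marker /dev/bpf0. The sleep avoids a false "stale" when both timestamps fall in the same second. If it still reports stale, the MOS advice is to delete the device and run tcpdump -D again.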


However, root.sh itself did report success, yet no IP address in the 169.254.x.x range had been created on node 1's private network interface.
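
To confirm this symptom, one can look for a 169.254.x.x address on the private interface (en2 in the log above). A hypothetical helper, assuming AIX-style ifconfig output with lines of the form "inet <address> netmask ...", read on stdin:

```shell
#!/bin/sh
# Hypothetical helper: print any 169.254.x.x (HAIP link-local) addresses
# found in ifconfig output read from stdin.
haip_addrs() {
    awk '$1 == "inet" && $2 ~ /^169\.254\./ { print $2 }'
}
```

Usage: ifconfig en2 | haip_addrs. Empty output means no HAIP address has been configured on that interface.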

Running the tcpdump command creates the bpf* devices under /dev on the system.

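Although root.sh on node 1 had one failing step, the log of the most important script, rootcrs.pl, showed that everything succeeded; the only command that did not succeed was the one that starts the HAIP resource.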

But this is not a blocker; the HAIP resource can be enabled and started manually:

/oracle/gi/11.2.4/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=1" -init
/oracle/gi/11.2.4/bin/crsctl start resource ora.cluster_interconnect.haip -init                                                                 


Resolution:

On nodes 1 and 2, run /usr/sbin/tcpdump -D as root on each node;

On node 1, start the HAIP resource manually;

Then run the root.sh script on node 2.
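
The resolution steps above can be kept as a reviewable script. A sketch only: GRID_HOME defaults to this post's install path (/oracle/gi/11.2.4), and the run wrapper with DRY_RUN=1 (the default here) is a hypothetical convenience that just prints each command so the sequence can be checked before running it for real with DRY_RUN=0:

```shell
#!/bin/sh
# Sketch of the resolution steps; commands are only printed unless DRY_RUN=0.
GRID_HOME=${GRID_HOME:-/oracle/gi/11.2.4}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1 (as root, on node 1 and on node 2): recreate/refresh /dev/bpf*.
step_refresh_bpf() { run /usr/sbin/tcpdump -D; }

# Step 2 (node 1 only): enable and start the HAIP init resource.
step_start_haip() {
    run "$GRID_HOME/bin/crsctl" modify res ora.cluster_interconnect.haip -attr "ENABLED=1" -init
    run "$GRID_HOME/bin/crsctl" start resource ora.cluster_interconnect.haip -init
}

# Step 3 (node 2 only): run the root script as the real root user.
step_root_sh() { run "$GRID_HOME/root.sh"; }
```

Each step function is meant to be called on the node named in its comment, in order: step_refresh_bpf on both nodes, step_start_haip on node 1, then step_root_sh on node 2.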


There is actually another option. Since this problem is caused by an Oracle HAIP bug, why not simply disable HAIP and avoid it altogether?

Even if root.sh fails on node 2, manually change the HAIP ENABLED attribute on nodes 1 and 2:

/oracle/gi/11.2.4/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init

Then re-run root.sh. Communication between the GI nodes can still be carried out without HAIP, and GI still runs normally.
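
To confirm the change took effect on each node, the attribute listing printed by "/oracle/gi/11.2.4/bin/crsctl stat res ora.cluster_interconnect.haip -init -p" can be filtered for ENABLED. A minimal sketch, assuming that listing prints one NAME=VALUE attribute per line; the helper name is hypothetical and the sample lines in the test are illustrative only:

```shell
#!/bin/sh
# Hypothetical helper: print the ENABLED value from a NAME=VALUE attribute
# listing read on stdin (empty output if the attribute is absent).
haip_enabled_value() {
    sed -n 's/^ENABLED=//p' | head -n 1
}
```

Usage: /oracle/gi/11.2.4/bin/crsctl stat res ora.cluster_interconnect.haip -init -p | haip_enabled_value should print 0 on both nodes after the modify command.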



