Oracle CRS-4535: Disabling HAIP on Oracle 11.2.0.4

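Symptom:

After a node went down, it could not be restarted. The heartbeat NIC had to be unplugged and re-plugged several times before the node would start on its own. The preliminary diagnosis is that an unexplained HAIP failure prevented CRS from starting on one node.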

1 Check the network

[grid@gmdb1 trace]$ oifcfg iflist -p -n

bond0 22.1.32.0 UNKNOWN 255.255.254.0

bond1 1.255.255.0 UNKNOWN 255.255.255.0

bond1 169.254.0.0 UNKNOWN 255.255.0.0
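
The output above shows a link-local subnet (169.254.0.0/16, the range HAIP addresses are taken from) on bond1. As a minimal sketch, assuming the output has been saved as text in the four-column form shown here, the following Python snippet picks out such interfaces. The names `parse_iflist` and `haip_interfaces` are hypothetical helpers, not part of any Oracle tool.

```python
import ipaddress

# Sample copied from the `oifcfg iflist -p -n` output above.
SAMPLE = """\
bond0 22.1.32.0 UNKNOWN 255.255.254.0
bond1 1.255.255.0 UNKNOWN 255.255.255.0
bond1 169.254.0.0 UNKNOWN 255.255.0.0
"""

# HAIP addresses are taken from the link-local range.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def parse_iflist(text):
    """Return (interface, network) pairs from the four-column text above."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        name, subnet, _if_type, mask = parts
        rows.append((name, ipaddress.ip_network(f"{subnet}/{mask}")))
    return rows


def haip_interfaces(rows):
    """Names of interfaces that carry a link-local subnet."""
    return [name for name, net in rows if net.subnet_of(LINK_LOCAL)]


if __name__ == "__main__":
    print(haip_interfaces(parse_iflist(SAMPLE)))
```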

2 Check CRS

[root@gmdb2 tmp]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager
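
When this check has to be run on several nodes, it can help to reduce the output to the failing message codes. The sketch below assumes the output has been captured as text and treats any `CRS-nnnn` line that does not end in "is online" as a failure. That rule is a heuristic based only on the four lines shown above, and `failed_checks` is a hypothetical helper name.

```python
import re

# Sample copied from the `crsctl check crs` output above.
SAMPLE = """\
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
"""


def failed_checks(text):
    """Return {code: message} for lines that do not report 'is online'."""
    failures = {}
    for line in text.splitlines():
        match = re.match(r"(CRS-\d+):\s*(.*)", line.strip())
        if match and not match.group(2).endswith("is online"):
            failures[match.group(1)] = match.group(2)
    return failures


if __name__ == "__main__":
    for code, message in failed_checks(SAMPLE).items():
        print(code, message)
```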

3 Check the init resources: ASM and HAIP fail to start:

[root@gmdb2 tmp]# crsctl stat res -t -init

NAME TARGET STATE SERVER STATE_DETAILS Cluster Resources

ora.asm 1 ONLINE OFFLINE

ora.cluster_interconnect.haip 1 ONLINE OFFLINE
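
The resources of interest are those whose TARGET is ONLINE while their STATE is not. A minimal sketch of that filter follows, assuming the flattened one-line-per-resource layout shown in this post. The real `crsctl stat res -t` layout puts the resource name on its own line, so this parser would need adapting. `not_started` is a hypothetical helper name.

```python
# Sample copied from the flattened output shown in this post.
SAMPLE = """\
NAME TARGET STATE SERVER STATE_DETAILS Cluster Resources
ora.asm 1 ONLINE OFFLINE
ora.cluster_interconnect.haip 1 ONLINE OFFLINE
"""


def not_started(text):
    """Resources whose TARGET is ONLINE but whose STATE is something else."""
    names = []
    for line in text.splitlines():
        parts = line.split()
        # Expected columns here: name, instance number, TARGET, STATE
        if len(parts) < 4 or not parts[0].startswith("ora."):
            continue
        name, _instance, target, state = parts[:4]
        if target == "ONLINE" and state != "ONLINE":
            names.append(name)
    return names


if __name__ == "__main__":
    print(not_started(SAMPLE))
```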

4 Test multicast with mcasttest.pl; no problems are found:

[grid@gmdb2 mcasttest]$ perl mcasttest.pl -n gmdb2,gmdb1 -i bond0,bond1

########### Setup for node gmdb2 ##########

Checking node access 'gmdb2'

Checking node login 'gmdb2'

Checking/Creating Directory /tmp/mcasttest for binary on node 'gmdb2'

Distributing mcast2 binary to node 'gmdb2'

########### Setup for node gmdb1 ##########

Checking node access 'gmdb1'

Checking node login 'gmdb1'

Checking/Creating Directory /tmp/mcasttest for binary on node 'gmdb1'

Distributing mcast2 binary to node 'gmdb1'

########### testing Multicast on all nodes ##########

Test for Multicast address 230.0.1.0

11月 28 16:42:02 | Multicast Succeeded for bond0 using address 230.0.1.0:42000

11月 28 16:42:03 | Multicast Succeeded for bond1 using address 230.0.1.0:42001

Test for Multicast address 224.0.0.251

11月 28 16:42:04 | Multicast Succeeded for bond0 using address 224.0.0.251:42002

11月 28 16:42:05 | Multicast Succeeded for bond1 using address 224.0.0.251:42003
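
To confirm at a glance that every interface and address combination passed, the result lines can be summarised. The sketch below assumes the "Multicast <status> for <interface> using address <ip>:<port>" wording seen above. It counts only the literal status "Succeeded" as a pass and makes no assumption about how failures are worded. `multicast_results` is a hypothetical helper name.

```python
import re

# Result lines copied from the mcasttest.pl output above.
SAMPLE = """\
11月 28 16:42:02 | Multicast Succeeded for bond0 using address 230.0.1.0:42000
11月 28 16:42:03 | Multicast Succeeded for bond1 using address 230.0.1.0:42001
11月 28 16:42:04 | Multicast Succeeded for bond0 using address 224.0.0.251:42002
11月 28 16:42:05 | Multicast Succeeded for bond1 using address 224.0.0.251:42003
"""

RESULT = re.compile(r"Multicast (\w+) for (\S+) using address ([\d.]+):(\d+)")


def multicast_results(text):
    """Return (interface, address, port, status) tuples for each result line."""
    results = []
    for line in text.splitlines():
        match = RESULT.search(line)
        if match:
            status, iface, address, port = match.groups()
            results.append((iface, address, int(port), status))
    return results


if __name__ == "__main__":
    results = multicast_results(SAMPLE)
    passed = all(status == "Succeeded" for *_rest, status in results)
    print(len(results), "tests, all succeeded:", passed)
```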

5 Check CSSD.LOG

2017-11-28 11:48:02.797: [ CSSD][2139567872]clssnmLocalJoinEvent: begin on node(2), waittime 193000

2017-11-28 11:48:02.797: [ CSSD][2139567872]clssnmLocalJoinEvent: set curtime (1040905644) for my node

2017-11-28 11:48:02.797: [ CSSD][2139567872]clssnmLocalJoinEvent: scanning 32 nodes

2017-11-28 11:48:02.797: [ CSSD][2139567872]clssnmLocalJoinEvent: Node gmdb1, number 1, is in an existing cluster with disk state 3

2017-11-28 11:48:02.797: [ CSSD][2139567872]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk

2017-11-28 11:48:02.808: [ CSSD][2358462208]clssnmvDHBValidateNcopy: node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564, wrtcnt, 39931581, LATS 1040905654, lastSeqNo 39931578, uniqueness 1510056501, timestamp 1511840882/1783220964

2017-11-28 11:48:03.287: [ CSSD][2144298752]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0

2017-11-28 11:48:03.782: [ CSSD][2363209472]clssnmvDHBValidateNcopy: node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564, wrtcnt, 39931583, LATS 1040906624,

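The log contains a large number of records reporting that there is no network heartbeat.

Check the cluster interconnects: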
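
To put a number on those records, the log can be scanned for the `clssnmvDHBValidateNcopy` message and counted per node. This is a minimal sketch that assumes the message wording shown above. The sample lines are shortened copies of the log entries, and `missing_network_hb` is a hypothetical helper name.

```python
import re
from collections import Counter

# Shortened copies of the CSSD log lines shown above.
SAMPLE = """\
2017-11-28 11:48:02.797: [ CSSD][2139567872]clssnmLocalJoinEvent: scanning 32 nodes
2017-11-28 11:48:02.808: [ CSSD][2358462208]clssnmvDHBValidateNcopy: node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564
2017-11-28 11:48:03.782: [ CSSD][2363209472]clssnmvDHBValidateNcopy: node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564
"""

PATTERN = re.compile(
    r"clssnmvDHBValidateNcopy: node (\d+), ([^,\s]+), "
    r"has a disk HB, but no network HB"
)


def missing_network_hb(lines):
    """Count 'disk HB but no network HB' records per (node number, node name)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    for (number, name), count in missing_network_hb(SAMPLE.splitlines()).items():
        print(f"node {number} ({name}): {count} records without network heartbeat")
```

For a real log file, pass an open file object instead of `SAMPLE.splitlines()`; the location of the log depends on the installation.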

SQL> select * from v$cluster_interconnects;

NAME IPADDRESS IS SOURCE

eth1:1 169.254.134.65 NO

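The interconnect is running over HAIP, but the local HAIP cannot start, which keeps CSSD from coming up. Check the dependency relationship with CSSD: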

[root@12crac2 ~]# crsctl stat res ora.cluster_interconnect.haip -init -f

NAME=ora.cluster_interconnect.haip

TYPE=ora.haip.type

STATE=OFFLINE

TARGET=ONLINE

ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x

ACTION_FAILURE_TEMPLATE=

ACTION_SCRIPT=

ACTIVE_PLACEMENT=0

AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%

AUTO_START=always

CARDINALITY=1

CARDINALITY_ID=0

CHECK_INTERVAL=30

CREATION_SEED=15

DEFAULT_TEMPLATE=

DEGREE=1

DESCRIPTION="Resource type for a Highly Available network IP"

ENABLED=0

FAILOVER_DELAY=0

FAILURE_INTERVAL=0

FAILURE_THRESHOLD=0

HOSTING_MEMBERS=

ID=ora.cluster_interconnect.haip

LOAD=1

LOGGING_LEVEL=1

NOT_RESTARTING_TEMPLATE=

OFFLINE_CHECK_INTERVAL=0

PLACEMENT=balanced

PROFILE_CHANGE_TEMPLATE=

RESTART_ATTEMPTS=5

SCRIPT_TIMEOUT=60

SERVER_POOLS=

START_DEPENDENCIES=hard(ora.gpnpd,ora.cssd)pullup(ora.cssd)
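
The `START_DEPENDENCIES` value is easier to reason about once it is split by dependency kind. The sketch below handles only the simple `kind(resource,resource)` form seen in this post and does not attempt the full dependency syntax. `parse_dependencies` is a hypothetical helper name.

```python
import re


def parse_dependencies(value):
    """Split 'hard(a,b)pullup(c)' into {'hard': ['a', 'b'], 'pullup': ['c']}."""
    deps = {}
    for kind, body in re.findall(r"(\w+)\(([^()]*)\)", value):
        items = [item.strip() for item in body.split(",") if item.strip()]
        deps.setdefault(kind, []).extend(items)
    return deps


if __name__ == "__main__":
    # Value copied from the HAIP resource profile above.
    haip = parse_dependencies("hard(ora.gpnpd,ora.cssd)pullup(ora.cssd)")
    print(haip)

    # ASM value used in the workaround in this post: HAIP is no longer listed.
    asm = parse_dependencies(
        "hard(ora.cssd,ora.ctssd)pullup(ora.cssd,ora.ctssd)weak(ora.drivers.acfs)"
    )
    listed = any("ora.cluster_interconnect.haip" in v for v in asm.values())
    print("ASM depends on HAIP:", listed)
```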

Temporary workaround:

Once the heartbeat network itself has been confirmed to be healthy, disable HAIP:

crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init

crsctl modify res ora.asm -attr "START_DEPENDENCIES='hard(ora.cssd,ora.ctssd)pullup(ora.cssd,ora.ctssd)weak(ora.drivers.acfs)', STOP_DEPENDENCIES='hard(intermediate:ora.cssd)' " -init

After the change, check again:

[Screenshot from the original post: resource status after the change]

Related article on MOS:

Known Issues: Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip (Doc ID 1640865.1)

HAIP-related bugs on MOS:

[Screenshot from the original post: list of HAIP-related bugs]
