Configuring NIC Bonding for the RAC Private Network

Deploying RAC is not just a matter of getting the installation done; throughout the process you have to consider potential single points of failure, and one of the most important is the private network.

The private network is the communication channel between RAC nodes: the network heartbeat and the data blocks transferred by Cache Fusion all go through it. Yet many private networks consist of nothing more than a single NIC plugged into a switch, and some installations even use a direct NIC-to-NIC cable between the servers. Such a setup is simple, but once the RAC is in production the risk is high: the NIC, the cable, the switch port, and the switch itself are all single points of failure, and a fault in almost any of them can cause a cluster split. It is therefore recommended to configure dual-NIC bonding for the private network.

Here are my configuration steps.

Environment:

OS: CentOS release 6.4 (Final)

Oracle: 11.2.0.4 RAC

NICs: 4 (em1, em2, em3, em4). Currently em1 is in use as the public interface and em3 as the private interface; em2 and em4 are idle.

Configure and load the bonding module (run on both nodes):

Edit /etc/modprobe.d/bonding.conf and add the following line:

 [root@node2 ~]# vi /etc/modprobe.d/bonding.conf

alias bond0 bonding

[root@node2 ~]# modprobe -a bond0

Verify:

[root@node2 ~]#  lsmod  |grep bond

bonding               127331  0

8021q                  25317  1 bonding

ipv6                  321422  274 bonding,ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6
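Optionally, you can also confirm which parameters the bonding driver accepts (a quick sanity check, not part of the original steps):

# list the bonding module parameters (mode, miimon, primary, ...)
[root@node2 ~]# modinfo bonding | grep ^parm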

 

Edit the NIC configuration files as follows.

Node 1:

ifcfg-em2:

DEVICE=em2

BOOTPROTO=none

ONBOOT=yes

MASTER=bond0

SLAVE=yes

 

ifcfg-em4:

DEVICE=em4

BOOTPROTO=none

ONBOOT=yes

MASTER=bond0

SLAVE=yes

 

ifcfg-bond0:

DEVICE=bond0

MASTER=yes

BOOTPROTO=none

ONBOOT=yes

BONDING_OPTS="mode=1 miimon=100"

IPADDR=10.10.10.105

PREFIX=24

GATEWAY=10.10.10.1

 

Node 2:

ifcfg-em2:

DEVICE=em2

BOOTPROTO=none

ONBOOT=yes

MASTER=bond0

SLAVE=yes

 

ifcfg-em4:

DEVICE=em4

BOOTPROTO=none

ONBOOT=yes

MASTER=bond0

SLAVE=yes

 

ifcfg-bond0:

DEVICE=bond0

MASTER=yes

BOOTPROTO=none

ONBOOT=yes

BONDING_OPTS="mode=1 miimon=100"

IPADDR=10.10.10.106

PREFIX=24

GATEWAY=10.10.10.1

I am using mode=1 here, the active-backup mode: only one NIC is active at a time, and if the active NIC fails the link is switched over to the backup. Modes 4 and 6 are also worth considering.
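As an illustration only (not part of the setup above), the BONDING_OPTS line could be changed as follows; mode=4 (802.3ad/LACP) requires matching configuration on the switch ports, while mode=6 (balance-alb) does not:

# mode=4: 802.3ad link aggregation (the switch ports must be configured for LACP)
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"
# mode=6: adaptive load balancing, no special switch support required
BONDING_OPTS="mode=6 miimon=100"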

After editing the configuration files, bring up bond0 on both nodes: ifup bond0

You should now see something like this:

[root@node1 ~]# ifconfig

bond0     Link encap:Ethernet  HWaddr C8:1F:66:FB:6F:CB 

          inet addr:10.10.10.105  Bcast:10.10.10.255  Mask:255.255.255.0

          inet6 addr: fe80::ca1f:66ff:fefb:6fcb/64 Scope:Link

          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

          RX packets:9844809 errors:0 dropped:0 overruns:0 frame:0

          TX packets:7731078 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:9097132073 (8.4 GiB)  TX bytes:6133004979 (5.7 GiB)

em2       Link encap:Ethernet  HWaddr C8:1F:66:FB:6F:CB 

          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1

          RX packets:9792915 errors:0 dropped:0 overruns:0 frame:0

          TX packets:7731078 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:9088278883 (8.4 GiB)  TX bytes:6133004979 (5.7 GiB)

          Interrupt:38

 

em4       Link encap:Ethernet  HWaddr C8:1F:66:FB:6F:CB 

          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1

          RX packets:51894 errors:0 dropped:0 overruns:0 frame:0

          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:8853190 (8.4 MiB)  TX bytes:0 (0.0 b)

          Interrupt:36

NIC bonding is now configured.

Test and verify

Now you can test it by taking down em2 and em4 in turn: run a continuous ping of the other node's private IP from one node and watch /proc/net/bonding/bond0 to see the active slave change. The ping is not interrupted when a single NIC goes down.
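A minimal test sketch (run each command in its own terminal on node1; 10.10.10.106 is node2's private IP in this environment):

# terminal 1: continuous ping of node2's private IP
ping 10.10.10.106
# terminal 2: watch which slave is currently active
watch -n 1 'grep "Currently Active Slave" /proc/net/bonding/bond0'
# terminal 3: fail the active NIC, then bring it back
ifdown em2
ifup em2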

With bond0 configured, the next step is to make it the RAC private interconnect interface.

To guard against a failed change, back up the original configuration files first.

As the grid user, back up $GRID_HOME/gpnp/<node_name>/profiles/peer/profile.xml on both nodes (noden in the commands below stands for the node name):

 cd /u01/app/11.2.0/grid/gpnp/noden/profiles/peer

 cp  profile.xml  profile.xml.bk

[root@node2 peer]# ls

pending.xml  profile_orig.xml  profile.xml  profile.xml.bk
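You can also dump the current GPnP profile with gpnptool to see the interconnect definition it records (an optional check; run as the grid user with the GI environment set):

node2-> gpnptool get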

Check the current private network configuration:

node2-> oifcfg getif

em1  192.168.10.0  global  public

em3  10.10.10.0  global  cluster_interconnect

First add the new private network; running this on any one node is enough:

node1-> oifcfg setif -global bond0/10.10.10.0:cluster_interconnect

This step may fail with the following error:

node1-> oifcfg setif -global bond0/10.10.10.0:cluster_interconnect

PRIF-33: Failed to set or delete interface because hosts could not be discovered

  CRS-02307: No GPnP services on requested remote hosts.

PRIF-32: Error in checking for profile availability for host node2

  CRS-02306: GPnP service on host "node2" not found.

 

This is caused by a problem with the gpnpd service.

Workaround: kill the gpnpd process; GI will restart the gpnpd service automatically.

Run on both nodes:

[root@node2 ~]# ps -ef| grep gpnp

grid      4927     1  0 Sep22 ?        00:26:38 /u01/app/11.2.0/grid/bin/gpnpd.bin

grid     48568 46762  0 17:26 pts/3    00:00:00 tail -f /u01/app/11.2.0/grid/log/node2/gpnpd/gpnpd.log

root     48648 48623  0 17:26 pts/4    00:00:00 grep gpnp

[root@node2 ~]# kill -9 4927

[root@node2 ~]#

You can follow gpnpd.log to confirm the restart.

After adding the new private network, remove the old one using the following steps.

First stop and disable CRS.

As root, run the following commands on both nodes:

Stop CRS:

crsctl stop crs

Disable CRS:

crsctl disable crs

 

 

Edit the hosts file and change the private IP addresses to the new ones.
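A sketch of the relevant /etc/hosts entries, assuming node1-priv and node2-priv should now resolve to the bond0 addresses configured above:

# private interconnect (bond0)
10.10.10.105   node1-priv
10.10.10.106   node2-priv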

Then check from both nodes that the private names resolve and respond:

 ping node1-priv

 ping node2-priv

 

Then re-enable and start CRS:

 [root@node2 ~]# crsctl enable crs

CRS-4622: Oracle High Availability Services autostart is enabled.

[root@node2 ~]# crsctl start crs

Delete the old private network:

node2-> oifcfg delif -global em3/10.10.10.0:cluster_interconnect

Check and verify; the configuration succeeded:

node2-> oifcfg getif

em1  192.168.10.0  global  public

bond0  10.10.10.0  global  cluster_interconnect

node2->

 

Now let's run a test to verify that bonding works as expected.

Run ifdown em2; /var/log/messages then shows:

Oct 25 22:00:32 node1 kernel: bonding: bond0: Removing slave em2

Oct 25 22:00:32 node1 kernel: bonding: bond0: Warning: the permanent HWaddr of em2 - c8:1f:66:fb:6f:cb - is still in use by bond0. Set the HWaddr of em2 to a different address to avoid conflicts.

Oct 25 22:00:32 node1 kernel: bonding: bond0: releasing active interface em2

Oct 25 22:00:32 node1 kernel: bonding: bond0: making interface em4 the new active one.

bond0 automatically switches over to em4, so pinging the private IP at this point still works.

Checking /proc/net/bonding/bond0 shows that the active slave has become em4:

[root@node1 ~]# cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

 

Bonding Mode: fault-tolerance (active-backup)

Primary Slave: None

Currently Active Slave: em4

MII Status: up

MII Polling Interval (ms): 100

Up Delay (ms): 0

Down Delay (ms): 0

 

Slave Interface: em4

MII Status: up

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 3

Permanent HW addr: c8:1f:66:fb:6f:cd

Slave queue ID: 0

[root@node1 ~]#

 

There are no errors in crsd.log or ocssd.log, and CSS is still sending a network heartbeat every 5 seconds:

2014-10-25 22:00:32.975: [    CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes

2014-10-25 22:00:37.977: [    CSSD][656893696]clssnmSendingThread: sending status msg to all nodes

2014-10-25 22:00:37.977: [    CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes

2014-10-25 22:00:42.978: [    CSSD][656893696]clssnmSendingThread: sending status msg to all nodes

This shows that bonding really does protect the private network against a single point of failure.

Now take down em4 as well:

[root@node1 ~]# ifdown em4

With em4 down, the private IP can no longer be pinged from node2, and bond0 itself is down:

[root@node1 ~]# cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

 

Bonding Mode: fault-tolerance (active-backup)

Primary Slave: None

Currently Active Slave: None

MII Status: down

MII Polling Interval (ms): 100

Up Delay (ms): 0

Down Delay (ms): 0

 

/var/log/messages at this point:

Oct 25 22:02:23 node1 kernel: bonding: bond0: Removing slave em4

Oct 25 22:02:23 node1 kernel: bonding: bond0: releasing active interface em4

ocssd.log shows that CSS detected the private network failure about 2 seconds later:

2014-10-25 22:02:25.573: [GIPCHGEN][1744828160] gipchaInterfaceFail: marking interface failing 0x7f025c00c0a0 { host '', haName 'c617-7010-b72d-6c39', local (nil), ip '10.10.10.105:46469', subnet '10.10.10.0', mask '255.255.255.0', mac 'c8-1f-66-fb-6f-cb', ifname 'bond0', numRef 1, numFail 0, idxBoot 0, flags 0xd }

2014-10-25 22:02:25.661: [GIPCHGEN][1951459072] gipchaInterfaceFail: marking interface failing 0x7f025c023d90 { host 'node2', haName 'ba2c-9227-ca29-8a21', local 0x7f025c00c0a0, ip '10.10.10.106:32369', subnet '10.10.10.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 2, flags 0x6 }

2014-10-25 22:02:27.663: [GIPCHTHR][1951459072] gipchaWorkerCreateInterface: created remote interface for node 'node2', haName 'ba2c-9227-ca29-8a21', inf 'udp://10.10.10.106:32369'

It also shows that this server's private network can no longer be reached:

2014-10-25 22:02:27.572: [GIPCHDEM][536868608] gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0xc99250 [0000000000000010] { gipchaContext : host 'node1', name 'CSS_node-cluster', luid 'e2a491a6-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd

2014-10-25 22:02:31.012: [    CSSD][656893696]clssnmSendingThread: sending status msg to all nodes

2014-10-25 22:02:31.012: [GIPCHALO][669509376] gipchaLowerProcessNode: no valid interfaces found to node for 5530 ms, node 0x7f0c18065e40 { host 'node2', haName 'CSS_node-cluster', srcLuid e2a491a6-d8e24a48, dstLuid ebce4a7f-2b4e4348 numInf 0, contigSeq 317864, lastAck 317717, lastValidAck 317864, sendSeq [317718 : 317729], createTime 4294073210, sentRegister 1, localMonitor 1, flags 0x2408 }

2014-10-25 22:02:31.012: [    CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes

2014-10-25 22:02:33.573: [GIPCHDEM][536868608] gipchaDaemonInfRequest: sent local interfaceRequest,  hctx 0xc99250 [0000000000000010] { gipchaContext : host 'node1', name 'CSS_node-cluster', luid 'e2a491a6-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x63 } to gipcd

2014-10-25 22:02:36.013: [    CSSD][656893696]clssnmSendingThread: sending status msg to all nodes

2014-10-25 22:02:36.013: [    CSSD][656893696]clssnmSendingThread: sent 5 status msgs to all nodes

2014-10-25 22:02:37.014: [GIPCHALO][669509376] gipchaLowerProcessNode: no valid interfaces found to node for 11530 ms, node 0x7f0c18065e40 { host 'node2', haName 'CSS_node-cluster', srcLuid e2a491a6-d8e24a48, dstLuid ebce4a7f-2b4e4348 numInf 0, contigSeq 317864, lastAck 317717, lastValidAck 317864, sendSeq [317718 : 317741], createTime 4294073210, sentRegister 1, localMonitor 1, flags 0x2408 }

The log shows node2's network heartbeat failing; node2 will be removed from the cluster after 14.340 seconds:

2014-10-25 22:02:39.010: [    CSSD][658470656]clssnmPollingThread: node node2 (2) at 50% heartbeat fatal, removal in 14.340 seconds

2014-10-25 22:02:39.010: [    CSSD][658470656]clssnmPollingThread: node node2 (2) is impending reconfig, flag 2228230, misstime 15660

How long a network heartbeat failure is tolerated is determined by the CSS misscount, which defaults to 30 seconds for RAC:

node2-> crsctl get  css  misscount

CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
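Related CSS timeouts can be checked the same way (an optional check, assuming a standard 11.2 GI installation):

node2-> crsctl get css disktimeout
node2-> crsctl get css reboottime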

 

At this point CSS starts reporting "has a disk HB, but no network HB":

2014-10-25 22:02:53.012: [    CSSD][664778496]clssnmvDHBValidateNcopy: node 2, node2, has a disk HB, but no network HB, DHB has rcfg 309527290, wrtcnt, 3048511, LATS 212362014, lastSeqNo 3048510, uniqueness 1414032467, timestamp 1414245773/212369384

2014-10-25 22:02:53.192: [    CSSD][667932416]clssnmvDiskPing: Writing with status 0x3, timestamp 1414245773/212362194

2014-10-25 22:02:53.352: [    CSSD][658470656]clssnmPollingThread: Removal started for node node2 (2), flags 0x220006, state 3, wt4c 0

2014-10-25 22:02:53.352: [    CSSD][658470656]clssnmMarkNodeForRemoval: node 2, node2 marked for removal

2014-10-25 22:02:53.352: [    CSSD][658470656]clssnmDiscHelper: node2, node(2) connection failed, endp (0x6b7), probe(0x7f0c00000000), ninf->endp 0x6b7

2014-10-25 22:02:53.352: [    CSSD][658470656]clssnmDiscHelper: node 2 clean up, endp (0x6b7), init state 5, cur state 5

2014-10-25 22:02:53.352: [GIPCXCPT][658470656] gipcInternalDissociate: obj 0x7f0bf00084c0 [00000000000006b7] { gipcEndpoint : localAddr 'gipcha://node1:c5bc-f486-c390-b48', remoteAddr 'gipcha://node2:nm2_node-cluster/b370-8934-efb4-3f2', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 1, wobj 0x7f0bf000a3f0, sendp (nil)flags 0x38606, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)


 

After that, node2 is evicted:

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmNeedConfReq: No configuration to change

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmDoSyncUpdate: Terminating node 2, node2, misstime(30000) state(5)

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmDoSyncUpdate: Wait for 0 vote ack(s)

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmCheckDskInfo: Checking disk info...

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmCheckSplit: Node 2, node2, is alive, DHB (1414245773, 212369384) more than disk timeout of 27000 after the last NHB (1414245743, 212339734)

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmCheckDskInfo: My cohort: 1

2014-10-25 22:02:53.353: [    CSSD][891287296]clssgmQueueGrockEvent: groupName(crs_version) count(3) master(0) event(2), incarn 3, mbrc 3, to member 2, events 0x0, state 0x0

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmRemove: Start

2014-10-25 22:02:53.353: [    CSSD][655316736](:CSSNM00007:)clssnmrRemoveNode: Evicting node 2, node2, from the cluster in incarnation 309527290, node birth incarnation 309527289, death incarnation 309527290, stateflags 0x224000 uniqueness value 1414032467

2014-10-25 22:02:53.353: [    CSSD][891287296]clssgmQueueGrockEvent: groupName(IGSSZGDBsszgdb) count(2) master(1) event(2), incarn 2, mbrc 2, to member 1, events 0x0, state 0x0

2014-10-25 22:02:53.353: [ default][655316736]kgzf_gen_node_reid2: generated reid cid=6d207e372096ef48ff1031c3298552d5,icin=309527289,nmn=2,lnid=309527289,gid=0,gin=0,gmn=0,umemid=0,opid=0,opsn=0,lvl=node hdr=0xfece0100

node2 is fenced:

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmrFenceSage: Fenced node node2, number 2, with EXADATA, handle 0

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmSendShutdown: req to node 2, kill time 212362354

2014-10-25 22:02:53.353: [    CSSD][891287296]clssgmQueueGrockEvent: groupName(CRF-) count(4) master(0) event(2), incarn 4, mbrc 4, to member 2, events 0x38, state 0x0

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmsendmsg: not connected to node 2

 

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmSendShutdown: Send to node 2 failed

2014-10-25 22:02:53.353: [    CSSD][655316736]clssnmWaitOnEvictions: Start

To prevent node2 from writing to the data files:

2014-10-25 22:02:53.354: [    CSSD][664778496]clssnmvDiskEvict: Kill block write, file /dev/asm_datafile flags 0x00010004, kill block unique 1414032467, stamp 212362354/212362354

 

crsd.log shows the resources that were running on node2 being failed over to node1:

2014-10-25 22:03:00.946: [   CRSPE][870311680]{1:61167:7502} CRS-2676: Start of 'ora.node2.vip' on 'node1' succeeded

2014-10-25 22:03:02.358: [   CRSPE][870311680]{1:61167:7502} CRS-2676: Start of 'ora.LISTENER_SCAN2.lsnr' on 'node1' succeeded

 

And at this point CRS on node2 can no longer be contacted:

node2-> crs_stat -t

CRS-0184: Cannot communicate with the CRS daemon.

 

Bring em4 back up:

ifup em4

[root@node1 ~]# cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

 

Bonding Mode: fault-tolerance (active-backup)

Primary Slave: None

Currently Active Slave: None

MII Status: down

MII Polling Interval (ms): 100

Up Delay (ms): 0

Down Delay (ms): 0

 

Slave Interface: em4

MII Status: down

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 0

Permanent HW addr: c8:1f:66:fb:6f:cd

Slave queue ID: 0

Check /var/log/messages:

Oct 25 22:03:43 node1 kernel: bonding: bond0: setting mode to active-backup (1).

Oct 25 22:03:43 node1 kernel: bonding: bond0: Setting MII monitoring interval to 100.

Oct 25 22:03:43 node1 kernel: bonding: bond0: Adding slave em4.

Oct 25 22:03:44 node1 kernel: bonding: bond0: enslaving em4 as a backup interface with a down link.

Oct 25 22:03:47 node1 kernel: tg3 0000:02:00.1: em4: Link is up at 1000 Mbps, full duplex

Oct 25 22:03:47 node1 kernel: tg3 0000:02:00.1: em4: Flow control is on for TX and on for RX

Oct 25 22:03:47 node1 kernel: tg3 0000:02:00.1: em4: EEE is disabled

After em4 is brought up manually, it is automatically added back as a slave interface. Because bond0 itself was brought down earlier, bond0 also has to be started manually:

ifup bond0

 

[root@node1 ~]# ifup bond0

[root@node1 ~]# cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

 

Bonding Mode: fault-tolerance (active-backup)

Primary Slave: None

Currently Active Slave: em4

MII Status: up

MII Polling Interval (ms): 100

Up Delay (ms): 0

Down Delay (ms): 0

 

Slave Interface: em4

MII Status: up

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 0

Permanent HW addr: c8:1f:66:fb:6f:cd

Slave queue ID: 0

 

Slave Interface: em2

MII Status: down

Speed: Unknown

Duplex: Unknown

Link Failure Count: 0

Permanent HW addr: c8:1f:66:fb:6f:cb

Slave queue ID: 0

/var/log/messages now shows that bond0 is ready:

Oct 25 22:05:25 node1 kernel: bonding: bond0: Adding slave em2.

Oct 25 22:05:26 node1 kernel: bonding: bond0: enslaving em2 as a backup interface with a down link.

Oct 25 22:05:26 node1 kernel: ADDRCONF(NETDEV_UP): bond0: link is not ready

Oct 25 22:05:26 node1 kernel: 8021q: adding VLAN 0 to HW filter on device bond0

Oct 25 22:05:26 node1 kernel: bond0: link status definitely up for interface em4, 1000 Mbps full duplex.

Oct 25 22:05:26 node1 kernel: bonding: bond0: making interface em4 the new active one.

Oct 25 22:05:26 node1 kernel: bonding: bond0: first active interface up!

Oct 25 22:05:26 node1 kernel: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

ocssd.log also shows that node2 can be reached again:

2014-10-25 22:05:28.026: [GIPCHGEN][536868608] gipchaNodeAddInterface: adding interface information for inf 0x7f0c14222be0 { host '', haName 'CSS_node-cluster', local (nil), ip '10.10.10.105', subnet '10.10.10.0', mask '255.255.255.0', mac 'c8-1f-66-fb-6f-cd', ifname 'bond0', numRef 0, numFail 0, idxBoot 0, flags 0x1841 }

2014-10-25 22:05:28.254: [GIPCHTHR][669509376] gipchaWorkerCreateInterface: created local interface for node 'node1', haName 'CSS_node-cluster', inf 'udp://10.10.10.105:60625'

2014-10-25 22:05:28.255: [GIPCHTHR][669509376] gipchaWorkerCreateInterface: created local bootstrap multicast interface for node 'node1', haName 'CSS_node-cluster', inf 'mcast://224.0.0.251:42424/10.10.10.105'

2014-10-25 22:05:30.560: [    CSSD][653739776]clssnmSendConnAck: connected to node 2, node2, con (0xa111d5), state 0

2014-10-25 22:05:30.560: [    CSSD][653739776]clssnmCompleteConnProtocol: node node2, 2, uniqueness 1414245779, msg uniqueness 1414245779, endp 0xa111d5 probendp 0x7f0c00000000 endp 0xa111d5

2014-10-25 22:05:31.465: [    CSSD][653739776]clssnmHandleJoin: node 2 JOINING, state 0->1 ninfendp 0x7f0c00a111d5

2014-10-25 22:05:31.940: [    CSSD][664778496]clssnmvReadDskHeartbeat: Reading DHBs to get the latest info for node(2/node2), LATSvalid(0), nodeInfoDHB uniqueness(1414032467)

2014-10-25 22:05:31.940: [    CSSD][664778496]clssnmvDHBValidateNcopy: Setting LATS valid due to uniqueness change for node(node2) number(2), nodeInfoDHB(1414032467), readInfo(1414245779)

2014-10-25 22:05:31.940: [    CSSD][664778496]clssnmvDHBValidateNcopy: Saving DHB uniqueness for node node2, number 2 latestInfo(1414245779), readInfo(1414245779), nodeInfoDHB(1414032467)

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmDoSyncUpdate: Initiating sync 309527291

2014-10-25 22:05:32.366: [    CSSD][655316736]clssscCompareSwapEventValue: changed NMReconfigInProgress  val 1, from -1, changes 7

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmDoSyncUpdate: local disk timeout set to 200000 ms, remote disk timeout set to 200000

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmDoSyncUpdate: new values for local disk timeout and remote disk timeout will take effect when the sync is completed.

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmDoSyncUpdate: Starting cluster reconfig with incarnation 309527291

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmSetupAckWait: Ack message type (11)

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmSetupAckWait: node(1) is ALIVE

2014-10-25 22:05:32.366: [    CSSD][655316736]clssnmSetupAckWait: node(2) is ALIVE

2014-10-25 22:05:32.369: [    CSSD][655316736]clssnmDoSyncUpdate: node(2) is transitioning from joining state to active state

2014-10-25 22:05:32.370: [    CSSD][653739776]clssnmHandleUpdate: NODE 1 (node1) IS ACTIVE MEMBER OF CLUSTER

2014-10-25 22:05:32.370: [    CSSD][653739776]clssnmHandleUpdate: NODE 2 (node2) IS ACTIVE MEMBER OF CLUSTER

2014-10-25 22:05:32.370: [    CSSD][891287296]clssgmSuspendAllGrocks: done

 

crsd.log shows node2's resources being stopped on node1 and restarted on node2:

2014-10-25 22:06:05.097: [   CRSPE][870311680]{2:25913:2} CRS-2677: Stop of 'ora.node2.vip' on 'node1' succeeded

2014-10-25 22:06:05.101: [   CRSPE][870311680]{2:25913:2} CRS-2672: Attempting to start 'ora.node2.vip' on 'node2'

The resources have now failed back to node2:

node2-> crs_stat -t

Name           Type           Target    State     Host       

------------------------------------------------------------

ora....ER.lsnr ora....er.type ONLINE    ONLINE    node1      

ora....N1.lsnr ora....er.type ONLINE    ONLINE    node2      

ora....N2.lsnr ora....er.type ONLINE    ONLINE    node1      

ora....N3.lsnr ora....er.type ONLINE    ONLINE    node1      

ora.OCR.dg     ora....up.type ONLINE    ONLINE    node1      

ora.TEMP.dg    ora....up.type ONLINE    ONLINE    node1      

ora.UNDO.dg    ora....up.type ONLINE    ONLINE    node1      

ora.asm        ora.asm.type   ONLINE    ONLINE    node1      

ora.cvu        ora.cvu.type   ONLINE    ONLINE    node1      

ora.gsd        ora.gsd.type   OFFLINE   OFFLINE              

ora....network ora....rk.type ONLINE    ONLINE    node1      

ora....SM1.asm application    ONLINE    ONLINE    node1      

ora....E1.lsnr application    ONLINE    ONLINE    node1      

ora.node1.gsd  application    OFFLINE   OFFLINE              

ora.node1.ons  application    ONLINE    ONLINE    node1      

ora.node1.vip  ora....t1.type ONLINE    ONLINE    node1      

ora....SM2.asm application    ONLINE    ONLINE    node2      

ora....E2.lsnr application    ONLINE    ONLINE    node2      

ora.node2.gsd  application    OFFLINE   OFFLINE              

ora.node2.ons  application    ONLINE    ONLINE    node2      

ora.node2.vip  ora....t1.type ONLINE    ONLINE    node2      

ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    node1      

ora.ons        ora.ons.type   ONLINE    ONLINE    node1      

ora.scan1.vip  ora....ip.type ONLINE    ONLINE    node2      

ora.scan2.vip  ora....ip.type ONLINE    ONLINE    node1      

ora.scan3.vip  ora....ip.type ONLINE    ONLINE    node1      

ora.sszgdb.db  ora....se.type ONLINE    ONLINE    node1

 

Finally, we bring the other slave interface of bond0 back up:

[root@node1 ~]# ifup em2

[root@node1 ~]# cat /proc/net/bonding/bond0

Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

 

Bonding Mode: fault-tolerance (active-backup)

Primary Slave: None

Currently Active Slave: em4

MII Status: up

MII Polling Interval (ms): 100

Up Delay (ms): 0

Down Delay (ms): 0

 

Slave Interface: em4

MII Status: up

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 0

Permanent HW addr: c8:1f:66:fb:6f:cd

Slave queue ID: 0

 

Slave Interface: em2

MII Status: up

Speed: 1000 Mbps

Duplex: full

Link Failure Count: 0

Permanent HW addr: c8:1f:66:fb:6f:cb

Slave queue ID: 0

 

/var/log/messages shows:

Oct 25 22:05:29 node1 kernel: tg3 0000:01:00.1: em2: Link is up at 1000 Mbps, full duplex

Oct 25 22:05:29 node1 kernel: tg3 0000:01:00.1: em2: Flow control is on for TX and on for RX

Oct 25 22:05:29 node1 kernel: tg3 0000:01:00.1: em2: EEE is disabled

Oct 25 22:05:29 node1 kernel: bond0: link status definitely up for interface em2, 1000 Mbps full duplex.

em2 is ready again, and the private network is back to being protected by two NICs.
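As a final check (optional; the cluvfy invocation here is a sketch, not taken from the original steps), confirm the interconnect definition and node connectivity:

node1-> oifcfg getif
node1-> cluvfy comp nodecon -n node1,node2 -verbose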

 
