A single RAC node fails to start

A few days ago a colleague hit a problem. In a RAC environment, the SA needed to apply a patch and wanted RAC to run in single-node mode for the duration. He first shut down one server, then asked the DBA to bring the instances up on the other.

This environment has two RAC databases across two servers, with two instances on each server. That is: for the SIAP database, SIAP1 runs on server1 and SIAP2 on server2; for the SIMP database, SIMP1 runs on server1 and SIMP2 on server2.

The databases are 11g RAC; shared storage and the heartbeat are controlled by Veritas, and the RAC cluster is controlled by CRS.

At that point server2 was already down, and the goal was to start SIAP1 and SIMP1 on server1. The problem: SIMP1 came up, but SIAP1 would not start, failing with:

SQL> startup
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:check if cable failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini1
ORA-27303: additional information: requested interface ce2 interface not running. Set _disable_interface_checking=TRUE to disable this check for single instance cluster. Check output from ifconfig

According to the error, the ce2 NIC is not running, and the instance will only start if _disable_interface_checking is set to TRUE.
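As a hedged sketch of that workaround (we did not run it at the time, and underscore parameters should normally only be set under Oracle Support's guidance): since the instance fails before it is even created, the parameter would have to go into a pfile. The path below is hypothetical:

SQL> -- dump the spfile to an editable pfile (this works against an idle instance)
SQL> create pfile='/tmp/initSIAP1.ora' from spfile;
SQL> -- after adding the line _disable_interface_checking=TRUE to /tmp/initSIAP1.ora:
SQL> startup pfile='/tmp/initSIAP1.ora'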

The situation was urgent, so there was no time to dig deeper; once the SA brought the other server back up, SIAP1 started normally.

Which raises the question: on the same server, why could one instance start while the other could not?

Let's examine the RAC network configuration and find out which NIC ce2 is. (The environment is back to normal at this point: both servers and all four instances are running.)

Information in CRS:
oracle@vus029pa:SIAP1:/opt/app/oracle/admin $ oifcfg getif
ce0 144.135.159.0 global public
ce6 144.135.159.0 global public
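Only public interfaces are registered; no cluster_interconnect is defined in the OCR. For comparison, registering the private interface there would look roughly like this (a sketch only, with the subnet taken from the hosts file below; we are not suggesting this as the fix):

oifcfg setif -global ce2/192.168.0.0:cluster_interconnect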
 
Configuration in the hosts file:
oracle@vus029pa:SIAP1:/opt/app/oracle/admin $ more /etc/hosts
#Public
144.135.159.111 vus029pa vus029pa.in.telstra.com.au loghost
#Oracle Virtual IP Addresses
144.135.159.110 osiiprd1dbr01.in.telstra.com.au
144.135.159.112 osiiprd1dbr02.in.telstra.com.au
# Private Interconnects
192.168.0.1 osiiprd1db2-priv osiiprd1db2-priv.in.telstra.com.au
192.168.0.2 osiiprd1db1-priv osiiprd1db1-priv.in.telstra.com.au
 
NIC configuration:
oracle@vus029pa:SIAP1:/opt/app/oracle/admin $ ifconfig -a
ce0: flags=1009040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FIXEDMTU> mtu 1500 index 2
        inet 144.135.159.13 netmask ffffff00 broadcast 144.135.159.255
        groupname clustermgmt-mnb
ce0:1: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 1500 index 2
        inet 144.135.159.111 netmask ffffff00 broadcast 144.135.159.255
ce0:2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 144.135.159.109 netmask ffffff00 broadcast 144.135.159.255
ce2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 18
        inet 192.168.0.1 netmask ffffff00 broadcast 192.168.0.255
ce6: flags=1009040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FIXEDMTU> mtu 1500 index 6
        inet 144.135.159.63 netmask ffffff00 broadcast 144.135.159.255
        groupname clustermgmt-mnb
ce6:1: flags=1040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4> mtu 1500 index 6
        inet 144.135.159.110 netmask ffffff00 broadcast 144.135.159.255

We can see that ce2 is configured with the IP 192.168.0.1, which the hosts file identifies as a private address. In other words, when SIAP1 starts, it checks whether the NIC carrying the private address is up, and the instance can only start if it is.
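So before attempting a startup in this kind of degraded single-node mode, it is worth confirming that the interconnect NIC really is up. A minimal check, assuming the interface name from the error (ce2) and the other node's private address from the hosts file:

ifconfig ce2                 # the flags line should show UP and RUNNING
ping 192.168.0.2             # the peer's private IP; this will fail while server2 is down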

The same check on the private-network NIC also exists in an ASM + 10g RAC environment:

[root@rac1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:AE:9A:38
          inet addr:192.168.190.131  Bcast:192.168.190.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:feae:9a38/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3648 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3809 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:362192 (353.7 KiB)  TX bytes:357537 (349.1 KiB)
          Interrupt:10 Base address:0x1480

eth1      Link encap:Ethernet  HWaddr 00:0C:29:AE:9A:42
          inet addr:10.10.10.31  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:feae:9a42/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:595 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:107822 (105.2 KiB)  TX bytes:1092 (1.0 KiB)
          Interrupt:5 Base address:0x1800

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:41187 errors:0 dropped:0 overruns:0 frame:0
          TX packets:41187 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:12143968 (11.5 MiB)  TX bytes:12143968 (11.5 MiB)

[root@rac1 ~]# cat /etc/hosts | grep priv
10.10.10.31     rac1-priv.mycorpdomain.com      rac1-priv
10.10.10.32     rac2-priv.mycorpdomain.com      rac2-priv
10.10.10.33     rac3-priv.mycorpdomain.com      rac3-priv
[root@rac1 ~]# ifconfig eth1 down
[root@rac1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:AE:9A:38
          inet addr:192.168.190.131  Bcast:192.168.190.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:feae:9a38/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3769 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3914 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:373764 (365.0 KiB)  TX bytes:368107 (359.4 KiB)
          Interrupt:10 Base address:0x1480

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:41240 errors:0 dropped:0 overruns:0 frame:0
          TX packets:41240 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:12145505 (11.5 MiB)  TX bytes:12145505 (11.5 MiB)

[root@rac1 ~]# su - oracle
rac1-> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.devdb.db   application    OFFLINE   OFFLINE
ora....b1.inst application    OFFLINE   OFFLINE
ora....b2.inst application    OFFLINE   OFFLINE
ora....b3.inst application    OFFLINE   OFFLINE
ora....SM1.asm application    OFFLINE   OFFLINE
ora....C1.lsnr application    OFFLINE   OFFLINE
ora.rac1.gsd   application    OFFLINE   OFFLINE
ora.rac1.ons   application    OFFLINE   OFFLINE
ora.rac1.vip   application    OFFLINE   OFFLINE
ora....SM2.asm application    OFFLINE   OFFLINE
ora....C2.lsnr application    OFFLINE   OFFLINE
ora.rac2.gsd   application    OFFLINE   OFFLINE
ora.rac2.ons   application    OFFLINE   OFFLINE
ora.rac2.vip   application    OFFLINE   OFFLINE
ora....SM3.asm application    OFFLINE   OFFLINE
ora....C3.lsnr application    OFFLINE   OFFLINE
ora.rac3.gsd   application    OFFLINE   OFFLINE
ora.rac3.ons   application    OFFLINE   OFFLINE
ora.rac3.vip   application    OFFLINE   OFFLINE
rac1-> export ORACLE_SID=+ASM1
rac1-> sqlplus "/as sysdba"

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Jul 13 22:07:31 2012

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_up failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr5
ORA-27303: additional information: requested interface eth1 is not UP. Check output from ifconfig command
SQL>
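Bringing the interface back up clears the check, as expected; the following is a sketch of the recovery step rather than captured output:

[root@rac1 ~]# ifconfig eth1 up
[root@rac1 ~]# su - oracle
rac1-> export ORACLE_SID=+ASM1
rac1-> sqlplus "/as sysdba"
SQL> startup

With eth1 UP again, startup proceeds past the skgxp interface check.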

During startup, the check of the private network can also be seen in the ASM alert log:

Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 eth1 10.10.10.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth0 192.168.190.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 2
......

In other words, on both 10g and 11g, and for both ASM and database instances, a RAC instance always checks at startup whether the private NIC is up, and it will only start if it is.

So why was our SIMP1 able to start?

Let's compare the alert logs from the SIAP1 and SIMP1 startups and see how they differ:
Alert log from the SIAP1 startup:

Sat Jul 07 19:27:24 GMT 2012
Starting ORACLE instance (normal)
cluster_interconnects = 192.168.0.1
Cluster communication is configured to use the following interface(s) for this instance
  192.168.0.1
Sat Jul 07 19:28:07 GMT 2012
cluster interconnect IPC version: Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
......

(Note that SIAP1 sets cluster_interconnects and does not list the NIC information from the OCR, whereas SIMP1 does not set cluster_interconnects and does list it.)

Alert log from the SIMP1 startup:

Sat Jul 07 15:09:36 GMT 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 ce0 144.135.159.0 configured from OCR for use as a public interface
Interface type 1 ce6 144.135.159.0 configured from OCR for use as a public interface
WARNING: No cluster interconnect has been specified. Depending on
         the communication driver configured Oracle cluster traffic
         may be directed to the public interface of this machine.
         Oracle recommends that RAC clustered databases be configured
         with a private interconnect for enhanced security and
         performance.
....
Cluster communication is configured to use the following interface(s) for this instance
  144.135.159.111
Sat Jul 07 15:09:43 GMT 2012
cluster interconnect IPC version: Oracle UDP/IP (generic)
IPC Vendor 1 proto 2

We can see that SIMP1 uses 144.135.159.111 for inter-node communication, while SIAP1 uses 192.168.0.1.

Why, with no cluster_interconnect configured in CRS for either database, do the two instances end up on entirely different IPs?

We know that when no cluster_interconnect is configured in CRS, private traffic falls back to the public IP, and SIMP1 indeed follows that behavior. So why doesn't SIAP1?

Then we remembered another setting: the initialization parameter cluster_interconnects. When it is set, the configuration in CRS is overridden, because the initialization parameter takes precedence. Let's check this parameter on both instances:
SIAP1:

NAME                    TYPE        VALUE
----------------------- ----------- ------------------------------
cluster_interconnects   string      192.168.0.1

On SIMP1:

NAME                    TYPE        VALUE
----------------------- ----------- ------------------------------
cluster_interconnects   string

And there is the problem. Because SIAP1 has the initialization parameter cluster_interconnects set to a fixed private address, that setting overrides the CRS configuration and the instance never falls back to the public address. So when the SA took down the private NIC, ce2, SIAP1 could not start.
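Once both nodes are healthy again, the straightforward fix would be to remove the fixed setting so that SIAP1 behaves like SIMP1. A sketch, assuming the parameter was set per-instance in a shared spfile:

SQL> alter system reset cluster_interconnects scope=spfile sid='SIAP1';

SIAP1 would then need a restart to pick up the change. Alternatively, the parameter can be left in place, accepting that SIAP1 will refuse to start whenever ce2 is down.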

SIMP1, by contrast, has no cluster_interconnects set, so it takes its configuration from CRS; and since CRS has no cluster_interconnect configured either, it uses the public network, which is the 144.135.159.111 address we saw in the alert log.

Three configurations are in play: the initialization parameter cluster_interconnects, the global public setting in CRS, and the global cluster_interconnect setting in CRS, with the initialization parameter taking precedence over the CRS settings. Once their relationship and precedence are clear, so is the cause of the failure.
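A convenient way to see which source won for a running instance is the V$CLUSTER_INTERCONNECTS view (available in 10g and 11g), whose SOURCE column reports where the interconnect address came from. The output below is illustrative of what SIAP1 would show, not captured from the system:

SQL> select name, ip_address, is_public, source from v$cluster_interconnects;

NAME   IP_ADDRESS       IS_PUBLIC SOURCE
------ ---------------- --------- -------------------------------
ce2    192.168.0.1      NO        cluster_interconnects parameter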

Original post: http://www.oracleblog.org/working-case/can-not-startup-single-node-of-rac/