天萃荷净
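An operations DBA reported that, while checking an Oracle RAC system, crs_stat showed the listener and VIP of node 1 going up and down intermittently. The analysis traced the cause to a network problem.
1. crs_stat shows the listener and VIP of node 1 flapping
(ONLINE one moment, OFFLINE the next)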
rac1-> crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.devdb.db application ONLINE ONLINE rac1
ora....b1.inst application ONLINE ONLINE rac1
ora....b2.inst application ONLINE ONLINE rac2
ora....SM1.asm application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE OFFLINE
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac2
ora....SM2.asm application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE OFFLINE
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac1
rac1-> crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.devdb.db application ONLINE ONLINE rac1
ora....b1.inst application ONLINE ONLINE rac1
ora....b2.inst application ONLINE ONLINE rac2
ora....SM1.asm application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE OFFLINE
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac2
ora....SM2.asm application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2
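Comparing two crs_stat -t listings by eye is error-prone, so a small parser can help pick out the suspicious rows. The following is a minimal sketch, not part of the original case: the function names parse_crs_stat and find_problems are hypothetical, and the rule that a VIP named ora.<node>.vip belongs on <node> is an assumption based on the resource names in the listing above.

```python
# Sketch only: helper names are hypothetical, not an Oracle API.
SAMPLE = """\
Name Type Target State Host
------------------------------------------------------------
ora.devdb.db application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE OFFLINE
ora.rac1.vip application ONLINE ONLINE rac2
"""


def parse_crs_stat(text):
    """Turn `crs_stat -t` text into a list of dicts, one per resource row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Resource rows start with "ora." and have name, type, target, state.
        if len(parts) < 4 or not parts[0].startswith("ora."):
            continue
        name, rtype, target, state = parts[:4]
        # The Host column is empty when the resource is OFFLINE.
        host = parts[4] if len(parts) > 4 else None
        rows.append({"name": name, "type": rtype, "target": target,
                     "state": state, "host": host})
    return rows


def find_problems(rows):
    """Report rows whose state differs from the target, or VIPs off their home node."""
    problems = []
    for row in rows:
        if row["state"] != row["target"]:
            problems.append((row["name"],
                             "state %s, target %s" % (row["state"], row["target"])))
        elif row["name"].endswith(".vip"):
            # Assumption: in "ora.<node>.vip" the middle component is the home node.
            home = row["name"].split(".")[1]
            if row["host"] != home:
                problems.append((row["name"],
                                 "running on %s, home node %s" % (row["host"], home)))
    return problems
```

Running find_problems(parse_crs_stat(SAMPLE)) on the three sample rows reports the listener row and the relocated VIP row, and leaves the database row alone.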
Check the crsd.log log
0Attempting to start `ora.rac1.vip` on member `rac2`
0Start of `ora.rac1.vip` on member `rac2` failed.
0startRunnable: setting CLI values
0Attempting to start `ora.rac1.vip` on member `rac1`
0Start of `ora.rac1.vip` on member `rac1` succeeded.
0startRunnable: setting CLI values
0Attempting to start `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
0Start of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
u_freem: mem passed is null
0CheckResource error for ora.rac1.vip error code = 1
0In stateChanged, ora.rac1.vip target is ONLINE
0ora.rac1.vip on rac1 went OFFLINE unexpectedly
0StopResource: setting CLI values
0Attempting to stop `ora.rac1.vip` on member `rac1`
0Stop of `ora.rac1.vip` on member `rac1` succeeded.
0ora.rac1.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
0ora.rac1.vip failed on rac1 relocating.
0StopResource: setting CLI values
0Attempting to stop `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
0Stop of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
0Attempting to start `ora.rac1.vip` on member `rac2`
0Start of `ora.rac1.vip` on member `rac2` failed.
0Attempting to start `ora.rac1.vip` on member `rac2`
0Start of `ora.rac1.vip` on member `rac2` succeeded.
0CRS-1002: Resource 'ora.rac1.vip' is already running on member 'rac2'
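This shows that when the VIP resource fails, the lsnr resource fails with it; CRS then starts the VIP again, followed by the lsnr. That is why these two resources appear to fluctuate constantly when we watch them with crs_stat -t.
Analyze the ora.rac1.vip.log log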
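To see the flapping pattern at a glance, the start and stop results in crsd.log can be tallied per resource. This is a rough sketch under the assumption that the messages have the same wording as the excerpt above; the regular expression and the function name summarize_events are illustrative, not an Oracle-provided tool.

```python
import re
from collections import Counter

# Matches lines such as: Start of `ora.rac1.vip` on member `rac2` failed.
EVENT = re.compile(
    r"(Start|Stop) of `([^`]+)` on member `([^`]+)` (succeeded|failed)")

# A few lines copied from the crsd.log excerpt above.
SAMPLE_LOG = [
    "0Start of `ora.rac1.vip` on member `rac2` failed.",
    "0Start of `ora.rac1.vip` on member `rac1` succeeded.",
    "0Start of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.",
    "0ora.rac1.vip on rac1 went OFFLINE unexpectedly",
    "0Stop of `ora.rac1.vip` on member `rac1` succeeded.",
    "0Stop of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.",
    "0Start of `ora.rac1.vip` on member `rac2` failed.",
    "0Start of `ora.rac1.vip` on member `rac2` succeeded.",
]


def summarize_events(lines):
    """Count (resource, action, result) triples found in crsd.log lines."""
    counts = Counter()
    for line in lines:
        match = EVENT.search(line)
        if match:
            action, resource, _member, result = match.groups()
            counts[(resource, action.lower(), result)] += 1
    return counts
```

A resource with many start and stop entries in a short log window, like ora.rac1.vip here, is a likely candidate for the kind of fluctuation described above.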
[ora.rac1.vip]: clsrcexecut:env ORACLE_CONFIG_HOME=/u01/app/oracle/product/10.2.0/crs_1
[ora.rac1.vip]: clsrcexecut:cmd=/u01/app/oracle/product/10.2.0/crs_1/bin/racgeut -e
_USR_ORA_DEBUG=0 54 /u01/app/oracle/product/10.2.0/crs_1/bin/racgvip check rac1
[ora.rac1.vip]: clsrcexecut: rc = 1, time = 6.430s
[ora.rac1.vip]: end for resource = ora.rac1.vip, action=check,status=1,time=6.450s
[ora.rac1.vip]: ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
[ora.rac1.vip]: clsrcstartorp: Error with malloc
[ora.rac1.vip]: ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
Interface eth0 checked failed (host=rac1)
Invalid parameters, or failed to bring up VIP (host=rac1)
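This shows that pinging 192.168.1.1 (the gateway) through the eth0 interface fails, which prevents the VIP resource from working properly.
2. Confirming the cause and resolving it
We manually pinged the gateway (192.168.1.1) from node 1, and it was indeed unreachable. Further checking showed that a firewall had been enabled unintentionally on the gateway server and was filtering some incoming packets. Node 1 happened to be among the blocked hosts, so its pings to the gateway failed and the error above appeared. After the firewall was turned off or its rules were reconfigured, the RAC worked normally and the VIP and lsnr resources no longer fluctuated.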
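The ping failures can also be pulled out of the VIP log automatically. The sketch below assumes the message format seen in the excerpt above ("ping to ... via ... failed, rc = ... (host=...)"); the pattern and the function name failed_pings are hypothetical helpers for illustration.

```python
import re

# Matches: ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)
PING_FAIL = re.compile(
    r"ping to (\S+) via (\S+) failed, rc = (\d+) \(host=([^)]+)\)")

# Lines copied from the ora.rac1.vip.log excerpt above.
SAMPLE_VIP_LOG = [
    "[ora.rac1.vip]: ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)",
    "[ora.rac1.vip]: clsrcstartorp: Error with malloc",
    "ping to 192.168.1.1 via eth0 failed, rc = 1 (host=rac1)",
    "Interface eth0 checked failed (host=rac1)",
]


def failed_pings(lines):
    """Return the distinct (target, interface, host) triples of failed pings."""
    seen = []
    for line in lines:
        match = PING_FAIL.search(line)
        if match:
            key = (match.group(1), match.group(2), match.group(4))
            if key not in seen:
                seen.append(key)
    return seen
```

For the sample lines this yields a single triple naming the gateway address, the interface, and the node, which is the same conclusion reached by reading the log manually.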
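The manual check described here could be repeated from each node with a small script. This is only a sketch: it assumes a Linux ping that accepts -c (count), -W (reply timeout in seconds) and -I (interface), and the function names are hypothetical. The run parameter exists so the logic can be exercised without sending real packets.

```python
import subprocess


def build_ping_command(gateway, interface, count=3, timeout=2):
    """Build a Linux ping command bound to one interface (assumed iputils flags)."""
    return ["ping", "-c", str(count), "-W", str(timeout), "-I", interface, gateway]


def gateway_reachable(gateway, interface, run=subprocess.call):
    """Treat exit status 0 from ping as reachable; any other status as unreachable."""
    return run(build_ping_command(gateway, interface)) == 0
```

On node 1 in this case, gateway_reachable("192.168.1.1", "eth0") would be expected to return False while the firewall on the gateway was dropping the node's packets, and True once the rules were corrected.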