Resolving Frequent Automatic Reboots of Every Node in a Two-Node RAC
[Date: 2012-04-20]
Source: Linux社区
Author: ccz320
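5) In /etc/sysconfig/o2cb the heartbeat dead threshold has already been raised to 301 (a count of 2-second heartbeat iterations, i.e. a 600-second timeout), but the network idle timeout is still at its default of 30000 ms: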
[root@Linux1 ~]# more /etc/sysconfig/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running /etc/init.d/o2cb configure.
# On Debian based systems the preferred method is running
# 'dpkg-reconfigure ocfs2-tools'.
#
# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true
# O2CB_STACK: The name of the cluster stack backing O2CB.
O2CB_STACK=o2cb
# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=301 / unit: 2-second heartbeat iterations; timeout = (301 - 1) * 2 = 600 seconds
# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000 / unit: milliseconds; this is exactly the 30 seconds reported in the messages log
# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=2000 / unit: milliseconds
# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=2000 / unit: milliseconds
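The comment in the file describes O2CB_HEARTBEAT_THRESHOLD as a number of iterations rather than a time. A minimal sketch of the conversion to seconds, assuming the 2-second heartbeat interval and the (threshold - 1) * 2 relation given in the OCFS2 documentation; the function name is hypothetical:

```shell
#!/bin/sh
# Convert an O2CB_HEARTBEAT_THRESHOLD value (iterations) into seconds.
# Assumption: one heartbeat iteration every 2 seconds, so the disk
# heartbeat timeout is (threshold - 1) * 2 seconds.
o2cb_heartbeat_seconds() {
    threshold=$1
    echo $(( (threshold - 1) * 2 ))
}

o2cb_heartbeat_seconds 301   # prints 600
```

With the value used here, the disk heartbeat timeout is far longer than the 30-second network idle timeout, which is consistent with the network setting being the one that triggers first.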
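6) The experiment runs on several VMware virtual machines hosted on a single PC, so the system load is heavy. This is probably what makes the network connections between the nodes sit idle longer than the timeout allows. The plan is therefore to rerun /etc/init.d/o2cb configure and double the network-related settings: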
[root@Linux1 ~]# /etc/init.d/o2cb configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting
<ENTER> without typing an answer will keep that current value. Ctrl-C
will abort.
Load O2CB driver on boot (y/n) [y]:
Cluster stack backing O2CB [o2cb]:
Cluster to start on boot (Enter "none" to clear) [ocfs2]:
Specify heartbeat dead threshold (>=7) [301]: /* unit: 2-second heartbeat iterations; left unchanged
Specify network idle timeout in ms (>=5000) [30000]: 60000 /* this line and the next two are in milliseconds
Specify network keepalive delay in ms (>=1000) [2000]: 4000
Specify network reconnect delay in ms (>=2000) [2000]: 4000
Writing O2CB configuration: OK
Cluster ocfs2 already online
[root@Linux1 ~]# exit
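The same values have to be entered on every node, since OCFS2 expects the cluster timeouts to match across nodes. A small sketch for comparing the four timeout settings between two copies of /etc/sysconfig/o2cb (for example, one copied over from the other node); the function names and the file paths in the usage comment are hypothetical:

```shell
#!/bin/sh
# Print the value of KEY from an o2cb-style KEY=VALUE file.
# $1 = file, $2 = key. Comment lines (starting with '#') never match.
o2cb_value() {
    sed -n "s/^$2=\([^ #]*\).*/\1/p" "$1" | tail -n 1
}

# Compare the four timeout settings in two config copies.
# Prints each mismatch and returns non-zero if any setting differs.
o2cb_compare() {
    status=0
    for key in O2CB_HEARTBEAT_THRESHOLD O2CB_IDLE_TIMEOUT_MS \
               O2CB_KEEPALIVE_DELAY_MS O2CB_RECONNECT_DELAY_MS; do
        a=$(o2cb_value "$1" "$key")
        b=$(o2cb_value "$2" "$key")
        if [ "$a" != "$b" ]; then
            echo "MISMATCH $key: $a vs $b"
            status=1
        fi
    done
    return $status
}

# Usage (hypothetical paths):
#   o2cb_compare /etc/sysconfig/o2cb /tmp/o2cb.linux2
```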
Restart the o2cb service on each node in turn:
[root@Linux2 ~]# service o2cb stop
Stopping O2CB cluster ocfs2: Failed
Unable to stop cluster as heartbeat region still active /* the ocfs2 file system is still mounted, so the o2cb service cannot be stopped; umount the ocfs2 volume first
[root@Linux2 ~]# umount /u02
[root@Linux2 ~]# service o2cb stop
Stopping O2CB cluster ocfs2: OK
Unloading module "ocfs2": OK
Unmounting ocfs2_dlmfs filesystem: OK
Unloading module "ocfs2_dlmfs": OK
Unloading module "ocfs2_stack_o2cb": OK
Unmounting configfs filesystem: OK
Unloading module "configfs": OK
[root@Linux2 ~]# service o2cb start
Loading filesystem "configfs": OK
Mounting configfs filesystem at /sys/kernel/config: OK
Loading stack plugin "o2cb": OK
Loading filesystem "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Setting cluster stack "o2cb": OK
Starting O2CB cluster ocfs2: OK
[root@Linux2 ~]# mount /u02
[root@Linux2 ~]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
192.168.2.110:/mnt/nfs4backup/nfs4backup/nfs4backup on /mnt/share type nfs (rw,hard,nointr,tcp,noac,nfsvers=3,timeo=600,rsize=32768,wsize=32768,addr=192.168.2.110)
oracleasmfs on /dev/oracleasm type oracleasmfs (rw)
configfs on /sys/kernel/config type configfs (rw)
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sdf1 on /u02 type ocfs2 (rw,_netdev,datavolume,nointr,heartbeat=local)
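The restart sequence above (umount, stop, start, mount) can be sketched as a small script to run on each node. This is only an outline under assumptions: it presumes CRS is already stopped on the node and that /u02 has an /etc/fstab entry, and the helper names are hypothetical. It prints the commands by default instead of executing them:

```shell
#!/bin/sh
# Set DRY_RUN=0 to actually execute the commands (requires root).
DRY_RUN=1

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Unmount the ocfs2 volume, restart o2cb, then mount the volume again.
# Each step runs only if the previous one succeeded.
restart_o2cb() {
    mountpoint=$1
    run umount "$mountpoint" &&
    run service o2cb stop &&
    run service o2cb start &&
    run mount "$mountpoint"
}

restart_o2cb /u02
```

Chaining the steps with && stops the sequence at the first failure, so o2cb is never stopped while the volume is still mounted.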
Start CRS:
[root@Linux2 ~]# /u01/app/crs/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[oracle@Linux2]crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE linux1
ora....X1.lsnr application ONLINE ONLINE linux1
ora.linux1.gsd application ONLINE ONLINE linux1
ora.linux1.ons application ONLINE ONLINE linux1
ora.linux1.vip application ONLINE ONLINE linux1
ora....SM2.asm application ONLINE ONLINE linux2
ora....X2.lsnr application ONLINE ONLINE linux2
ora.linux2.gsd application ONLINE ONLINE linux2
ora.linux2.ons application ONLINE ONLINE linux2
ora.linux2.vip application ONLINE ONLINE linux2
ora.racdb.db application ONLINE ONLINE linux2
ora....b1.inst application ONLINE ONLINE linux1
ora....b2.inst application ONLINE ONLINE linux2
ora...._taf.cs application OFFLINE OFFLINE
ora....db1.srv application OFFLINE OFFLINE
ora....db2.srv application OFFLINE OFFLINE
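To avoid scanning the table by eye, the output can be filtered for resources whose State column is not ONLINE. A minimal sketch, assuming the five-column layout shown above (Name, Type, Target, State, Host); the function name is hypothetical:

```shell
#!/bin/sh
# Read `crs_stat -t` output on stdin and print the name and state of
# every resource whose State (4th column) is not ONLINE.
# The header row and the dashed separator line are skipped.
crs_not_online() {
    awk '$1 != "Name" && NF >= 4 && $4 != "ONLINE" { print $1, $4 }'
}

# Usage: crs_stat -t | crs_not_online
```

For the output above this would list only the three service resources (`_taf.cs`, `db1.srv`, `db2.srv`), whose Target is also OFFLINE.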
After a period of observation the system has been running normally, and the problem is fully resolved.