
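 Over the weekend a customer called to report that their Sun Cluster 3.1 setup (two Sun Fire 4800 servers plus SE3510 and ST2540 arrays) was down and would not start. After some questions it turned out that the primary node, erp-b, could no longer be logged in to and the business application was unreachable, so the customer forcibly power-cycled it (yikes), after which it would not boot. The other node, erp-a, reported an over-temperature condition and went down.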

Rebooting with command: boot
SunOS Release 5.9 Version Generic_118558-02 64-bit
Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
WARNING: forceload of misc/md_trans failed
WARNING: forceload of misc/md_raid failed
WARNING: forceload of misc/md_hotspares failed
WARNING: forceload of misc/md_sp failed
Hardware watchdog enabled
configuring IPv4 interfaces: ce0 ce2 ce4.
Hostname: erp-b
Core files in /var/tmp/SUNWscu/core: core.erp-b.clconfig.80.-1770399348
Could not stat: /dev/rdsk/../../devices/ssm@0,0/pci@1d,600000/SUNW,qlc@1/fp@0,0/ssd@w200400a0b836973a,1f:h,raw path not loaded.
        No such file or directory
Booting as part of a cluster
NOTICE: CMM: Node erp-b (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node erp-a (nodeid = 2) with votecount = 1 added.
NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d10s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.
WARNING: Error reading CCR table rgm_vp_version
NOTICE: The rgm protocol for this node is not compatible with the rgm protocol for the rest of the cluster.
_cladm: CL_CLUSTER_ENABLE: Invalid argument
UNRECOVERABLE ERROR: Sun Cluster boot: Could not initialize cluster framework
Please reboot in non cluster mode(boot -x) and Repair
syncing file systems... done
NOTICE: f_client_exit: Program terminated!
debugger entered.

After booting into non-cluster mode with boot -x, the hardware checked out fine, and so did the disk arrays.

cd /etc/cluster/ccr/   The rgm_vp_version file was missing from this directory, but a rgm_vp_version.bak file was present.

Ran mv rgm_vp_version.bak rgm_vp_version, then rebooted.
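
The restore step above can be made a little more defensive. The following is only a sketch: restore_ccr_table is a hypothetical helper (not a Sun Cluster command), and the save directory is an assumption. It keeps a copy of the .bak file outside the CCR directory before doing the same mv as above, so the step can be undone if the restored table turns out to be bad.

```shell
#!/bin/sh
# Sketch only: restore a missing CCR table from its .bak copy.
# restore_ccr_table is a hypothetical helper, not part of Sun Cluster.
# Usage: restore_ccr_table CCR_DIR TABLE SAVE_DIR
restore_ccr_table() {
    ccr_dir=$1     # e.g. /etc/cluster/ccr
    table=$2       # e.g. rgm_vp_version
    save_dir=$3    # assumed scratch location outside the CCR directory

    if [ -f "$ccr_dir/$table" ]; then
        echo "$table already exists, nothing to do"
        return 0
    fi
    if [ ! -f "$ccr_dir/$table.bak" ]; then
        echo "no $table.bak to restore from" >&2
        return 1
    fi

    # Keep a copy of the .bak elsewhere so the step can be undone.
    cp -p "$ccr_dir/$table.bak" "$save_dir/$table.bak.saved" || return 1

    # Same operation as in the post: rename the .bak into place.
    mv "$ccr_dir/$table.bak" "$ccr_dir/$table"
}
```

It would be called from non-cluster mode (boot -x), for example as restore_ccr_table /etc/cluster/ccr rgm_vp_version /var/tmp, followed by a reboot.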

Rebooting with command: boot
WARNING: unknown command 'unset' on line 96 of etc/system
SunOS Release 5.9 Version Generic_118558-02 64-bit
Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
WARNING: forceload of misc/md_trans failed
WARNING: forceload of misc/md_raid failed
WARNING: forceload of misc/md_hotspares failed
WARNING: forceload of misc/md_sp failed
Hardware watchdog enabled
configuring IPv4 interfaces: ce0 ce2 ce4.
Hostname: erp-b
Core files in /var/tmp/SUNWscu/core: core.erp-b.clconfig.80.-1770399348
Could not stat: /dev/rdsk/../../devices/ssm@0,0/pci@1d,600000/SUNW,qlc@1/fp@0,0/ssd@w200400a0b836973a,1f:h,raw path not loaded.
        No such file or directory
Booting as part of a cluster
NOTICE: CMM: Node erp-b (nodeid = 1) with votecount = 1 added.
NOTICE: CMM: Node erp-a (nodeid = 2) with votecount = 1 added.
NOTICE: CMM: Quorum device 1 (/dev/did/rdsk/d10s2) added; votecount = 1, bitmask of nodes with configured paths = 0x3.
NOTICE: clcomm: Adapter ce3 constructed
NOTICE: clcomm: Path erp-b:ce3 - erp-a:ce3 being constructed
NOTICE: clcomm: Adapter ce1 constructed
NOTICE: clcomm: Path erp-b:ce1 - erp-a:ce1 being constructed
NOTICE: CMM: Node erp-b: attempting to join cluster.
NOTICE: clcomm: Path erp-b:ce3 - erp-a:ce3 errors during initiation
WARNING: Path erp-b:ce3 - erp-a:ce3 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
NOTICE: clcomm: Path erp-b:ce1 - erp-a:ce1 errors during initiation
WARNING: Path erp-b:ce1 - erp-a:ce1 initiation encountered errors, errno = 62. Remote node may be down or unreachable through this path.
NOTICE: CMM: Quorum device 1 (gdevname /dev/did/rdsk/d10s2) can not be acquired by the current cluster members. This quorum device is held by node 2.
NOTICE: CMM: Cluster doesn't have operational quorum yet; waiting for quorum.

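At last, a meaningful error message. Looking back over the whole incident: when erp-b lost power, the resource groups and the quorum disk switched over to erp-a. At that point erp-a happened to suffer a PS0 power supply failure, which caused its CPUs to overheat and the node to go down. So when erp-b is booted now, the quorum disk is still held by erp-a, and to prevent split-brain the cluster refuses to let erp-b join. erp-a has to be booted first; once it is up, booting erp-b works normally.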
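
The vote counts in the boot log make the wait for quorum easy to check by hand. Each node has votecount = 1 and the quorum device has votecount = 1, so there are 3 configured votes in total. Assuming the usual rule that a cluster needs a strict majority of configured votes, 2 votes are required. erp-b alone holds 1 vote and cannot acquire the quorum device (held by node 2), so it waits. A minimal sketch of that arithmetic follows; the function names are made up for illustration, and it assumes a POSIX shell, since the legacy Solaris /bin/sh does not support $(( )).

```shell
#!/bin/sh
# Sketch only: majority-quorum arithmetic for the vote counts seen in the log.
# needed_votes and has_quorum are illustrative names, not Sun Cluster tools.

# needed_votes TOTAL: smallest strict majority of TOTAL configured votes
needed_votes() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum PRESENT TOTAL: succeeds if PRESENT votes form a majority of TOTAL
has_quorum() {
    [ "$1" -ge "$(needed_votes "$2")" ]
}

total=3    # erp-b (1) + erp-a (1) + quorum device d10 (1), from the log

# erp-b alone: only its own vote, the quorum device is held by erp-a
if has_quorum 1 "$total"; then
    echo "erp-b alone: quorum reached"
else
    echo "erp-b alone: no quorum"
fi

# erp-a alone but holding the quorum device: 1 + 1 votes
if has_quorum 2 "$total"; then
    echo "erp-a plus quorum device: quorum reached"
else
    echo "erp-a plus quorum device: no quorum"
fi
```

This is why the node that currently owns the quorum device can form the cluster by itself, while the other node cannot.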

Remember: the node that went down first must be started last!!!




