OS:AIX 6100-07
DB:Oracle 11.2.0.3
--------------------------------------------------
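Today, while installing and configuring Oracle 11g Grid Infrastructure, the following error appeared in the final stage of running the root.sh script: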
Start of resource "ora.cssd" failed
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'oaracdb1'
CRS-2672: Attempting to start 'ora.gipcd' on 'oaracdb1'
CRS-2676: Start of 'ora.cssdmonitor' on 'oaracdb1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'oaracdb1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'oaracdb1'
CRS-2672: Attempting to start 'ora.diskmon' on 'oaracdb1'
CRS-2676: Start of 'ora.diskmon' on 'oaracdb1' succeeded
CRS-2674: Start of 'ora.cssd' on 'oaracdb1' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'oaracdb1'
CRS-2681: Clean of 'ora.cssd' on 'oaracdb1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'oaracdb1'
CRS-2677: Stop of 'ora.gipcd' on 'oaracdb1' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'oaracdb1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'oaracdb1' succeeded
CRS-5804: Communication error with agent process
CRS-4000: Command Start failed, or completed with errors.
Failed to start Cluster Synchorinisation Service in clustered mode at /soft/product/11.2.0.3/gih/crs/install/crsconfig_lib.pm line 1211.
/soft/product/11.2.0.3/gih/perl/bin/perl -I/soft/product/11.2.0.3/gih/perl/lib -I/soft/product/11.2.0.3/gih/crs/install /soft/product/11.2.0.3/gih/crs/install/rootcrs.pl execution failed
[oaracdb1@root]#
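Background to the failure:
1. The installation was done by a newcomer, and the hosts file on every node was missing the entry for the private (interconnect) IP: xxx.xxx.xxx.xxx oraracdb1-priv
2. He started root.sh on the second node in a hurry, before root.sh had finished on the first node. Normally the script must be run on one node at a time, in order.
3. Node 1 had been restored from tape by the system administrator, and the restored image came with HACMP and the Oracle client already installed, both of which had to be uninstalled.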
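The missing private-IP entry from item 1 is easy to check for before running root.sh. The following is a minimal sketch, not part of the original procedure: the function name missing_priv_hosts is hypothetical, the node names are supplied by the caller, and the "-priv" suffix is taken from the entry shown above.

```shell
#!/bin/sh
# Sketch: report which private-interconnect names are missing from a hosts file.
# Assumption: private names follow the "<node>-priv" pattern shown above.

# missing_priv_hosts HOSTS_FILE NODE...
# Prints "<node>-priv" for every node whose private name has no
# uncommented entry in HOSTS_FILE.
missing_priv_hosts() {
    hosts_file=$1
    shift
    for node in "$@"; do
        # Strip comments, then look for the name as a whole field after the IP.
        sed 's/#.*//' "$hosts_file" |
            awk -v name="${node}-priv" '
                { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
                END { exit (found ? 0 : 1) }' ||
            echo "${node}-priv"
    done
}
```

Run on each node, for example as missing_priv_hosts /etc/hosts oraracdb1 oraracdb2 (the second node name is illustrative). Empty output means every private name was found.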
The first attempt at a fix was to add the hosts entries, deconfigure CRS, and then run the rootcrs.pl script by hand:
/soft/product/11.2.0.3/gih/crs/install/rootcrs.pl -deconfig -force -verbose
/soft/product/11.2.0.3/gih/perl/bin/perl -I/soft/product/11.2.0.3/gih/perl/lib -I/soft/product/11.2.0.3/gih/crs/install /soft/product/11.2.0.3/gih/crs/install/rootcrs.pl
The problem persisted, so I deconfigured both CRS and HAS again and then ran the root.sh script:
/soft/product/11.2.0.3/gih/crs/install/rootcrs.pl -deconfig -force -verbose
/soft/product/11.2.0.3/gih/crs/install/roothas.pl -deconfig -force -verbose
/soft/product/11.2.0.3/gih/root.sh
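That still did not solve the problem, so I searched Google and found that HACMP might not have been uninstalled cleanly, and that leftover HACMP components can be incompatible with Oracle Database 11gR2 Grid Infrastructure.
Because this is a production system and the installation had to go through cleanly in one pass, the only option was to uninstall Grid Infrastructure and reinstall it, cleaning up the leftover HACMP files at the same time:
1. Delete the /usr/sbin/cluster directory.
2. Delete the "hagsuser" user group.
3. Delete the other files and directories under /var/ha/soc, keeping only the hats directory.
4. Modify the rootpre.sh file by removing the HACMP-related part, then run rootpre.sh again. In practice, simply rebooting the server is the easier option.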
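Cleanup steps 1 to 3 could be sketched as the dry-run helper below. This is an illustration under stated assumptions rather than the exact commands that were run: the function name hacmp_cleanup_plan is hypothetical, rmgroup is the standard AIX command for removing a group, and the script only prints commands so they can be reviewed before anything is deleted. Step 4 (editing rootpre.sh) is left as a manual task.

```shell
#!/bin/sh
# Dry-run sketch of cleanup steps 1-3: it only PRINTS commands, nothing is deleted.
# Review the output, then run the lines by hand as root if they look right.

# hacmp_cleanup_plan [SOC_DIR]
# SOC_DIR defaults to /var/ha/soc; the parameter exists only so the
# function can be tried against a scratch directory first (an assumption
# of this sketch, not part of the original procedure).
hacmp_cleanup_plan() {
    soc_dir=${1:-/var/ha/soc}

    echo "rm -rf /usr/sbin/cluster"   # step 1
    echo "rmgroup hagsuser"           # step 2

    # Step 3: everything directly under SOC_DIR except the hats directory.
    for entry in "$soc_dir"/* "$soc_dir"/.[!.]*; do
        [ -e "$entry" ] || continue               # skip unmatched glob patterns
        [ "${entry##*/}" = "hats" ] && continue   # keep hats
        echo "rm -rf $entry"
    done
}
```

Calling hacmp_cleanup_plan with no argument prints the plan for /var/ha/soc. Keeping it as a dry run is a deliberate choice for a production host, where each rm -rf deserves a second look.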
After the reinstall, everything worked.