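After an upgrade of Oracle 10g RAC to 11g failed on a SUSE platform, the 10g RAC environment had to be restored. The first step was to remove the 11g grid and rdbms products with the de-install tool; once that finished normally, the 11g grid and base directories were simply emptied.
However, while installing the 10g CRS, running "$ORA_CRS_HOME/root.sh" kept printing "Waiting for the Oracle CRSD and EVMD to start". The full output was: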
[root@dwdb1 ~]# /u01/app/oracle/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oracle/oraInventory to 770.
Changing groupname of /u01/app/oracle/oraInventory to oinstall.
The execution of the script is complete
[root@dwdb1 ~]# /u01/app/oracle/10gR2/crs/root.sh
WARNING: directory '/u01/app/oracle/10gR2' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured
/etc/oracle does not exist. Creating it now.
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/app/oracle/10gR2' is not owned by root
WARNING: directory '/u01/app/oracle' is not owned by root
WARNING: directory '/u01/app' is not owned by root
WARNING: directory '/u01' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: dwdb1 dwdb1-priv dwdb1
node 2: dwdb2 dwdb2-priv dwdb2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw2
Format of 1 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
dwdb1
dwdb2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Checking the CRS processes showed that the evmd process was not running:
# ps -ef|grep d.bin
root 10990 10364 0 15:03 ? 00:00:00 /u01/app/oracle/10gR2/crs/bin/crsd.bin restart
root 11485 11024 0 15:03 ? 00:00:00 /u01/app/oracle/10gR2/crs/bin/oprocd.bin run -t 1000 -m 500
oracle 11609 11090 0 15:03 ? 00:00:00 /u01/app/oracle/10gR2/crs/bin/ocssd.bin
root 15875 8055 0 15:13 pts/0 00:00:00 grep d.bin
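Neither the CRS alert.log nor crsd.log pointed to the cause. Cleaning up as described in the Metalink document and reinstalling made no difference. I have installed Oracle 10g RAC many times; this was the same environment and the same procedure, yet after the failed 11g upgrade it simply would not install.
The system log /var/log/messages finally gave a clue: every failed run of root.sh was accompanied by the message "init: /etc/inittab[56]: duplicate ID field "h1"". The details: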
May 7 15:00:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8714.
May 7 15:00:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8776.
May 7 15:00:47 dwdb1 logger: autorun file for ohasd is missing
May 7 15:01:27 dwdb1 last message repeated 4 times
May 7 15:01:37 dwdb1 logger: autorun file for ohasd is missing
May 7 15:01:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8776.
May 7 15:01:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8714.
May 7 15:01:47 dwdb1 logger: autorun file for ohasd is missing
May 7 15:02:27 dwdb1 last message repeated 4 times
May 7 15:02:37 dwdb1 logger: autorun file for ohasd is missing
May 7 15:02:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8776.
May 7 15:02:38 dwdb1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.8714.
May 7 15:02:47 dwdb1 logger: autorun file for ohasd is missing
May 7 15:03:27 dwdb1 last message repeated 4 times
May 7 15:03:37 dwdb1 logger: autorun file for ohasd is missing
May 7 15:03:39 dwdb1 logger: Cluster Ready Services completed waiting on dependencies.
May 7 15:03:39 dwdb1 logger: Cluster Ready Services completed waiting on dependencies.
May 7 15:03:39 dwdb1 logger: Running CRSD with TZ =
May 7 15:03:39 dwdb1 logger: Oracle CSS Family monitor starting.
May 7 15:03:40 dwdb1 logger: Filesystem containing /etc/oracle/scls_scr/dwdb1/root/cssrun vanished.
May 7 15:03:40 dwdb1 logger: Unpredictable behavior from Oracle CRS may ensue.
May 7 15:03:45 dwdb1 root: Oracle Cluster Ready Services starting by user request.
May 7 15:03:45 dwdb1 root: Cluster Ready Services completed waiting on dependencies.
May 7 15:03:45 dwdb1 init: Re-reading inittab
May 7 15:03:47 dwdb1 logger: autorun file for ohasd is missing
May 7 15:03:55 dwdb1 init: Re-reading inittab
May 7 15:03:55 dwdb1 init: /etc/inittab[56]: duplicate ID field "h1"
May 7 15:03:56 dwdb1 logger: Cluster Ready Services completed waiting on dependencies.
May 7 15:03:56 dwdb1 logger: Cluster Ready Services completed waiting on dependencies.
May 7 15:03:56 dwdb1 logger: Running CRSD with TZ =
May 7 15:03:56 dwdb1 logger: Oracle CSS Family monitor restarting.
May 7 15:03:57 dwdb1 logger: autorun file for ohasd is missing
May 7 15:03:57 dwdb1 logger: Oracle CSS restart. 0, 1
May 7 15:04:07 dwdb1 logger: autorun file for ohasd is missing
May 7 15:04:47 dwdb1 last message repeated 4 times
May 7 15:05:57 dwdb1 last message repeated 7 times
May 7 15:07:07 dwdb1 last message repeated 7 times
May 7 15:08:17 dwdb1 last message repeated 7 times
May 7 15:09:27 dwdb1 last message repeated 7 times
May 7 15:10:37 dwdb1 last message repeated 7 times
May 7 15:11:07 dwdb1 last message repeated 3 times
May 7 15:11:13 dwdb1 sz[14772]: [root] crslog.tgz/ZMODEM: 185142 Bytes, 216293 BPS
May 7 15:11:17 dwdb1 logger: autorun file for ohasd is missing
May 7 15:11:57 dwdb1 last message repeated 4 times
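In a log this noisy the one useful line is easy to miss. As a rough sketch (not part of the original troubleshooting), the following Python filters syslog lines for the init warning quoted above. The function name `find_duplicate_id_warnings` and the regular expression are my own assumptions, modelled only on the single message format seen in this log.

```python
import re

# Matches the SysV init warning seen in this log, e.g.
#   init: /etc/inittab[56]: duplicate ID field "h1"
# The pattern is an assumption based on that one message format.
DUP_ID = re.compile(r'init: (\S+)\[(\d+)\]: duplicate ID field "([^"]+)"')


def find_duplicate_id_warnings(lines):
    """Return (file, line_number, id) for each duplicate-ID warning found."""
    hits = []
    for line in lines:
        match = DUP_ID.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2)), match.group(3)))
    return hits


# Two lines copied from the log above serve as sample input.
sample_log = [
    "May 7 15:03:55 dwdb1 init: Re-reading inittab",
    'May 7 15:03:55 dwdb1 init: /etc/inittab[56]: duplicate ID field "h1"',
]
for path, lineno, ident in find_duplicate_id_warnings(sample_log):
    print(f"{path} line {lineno}: duplicate id {ident}")
```

To scan the real log, the same function could be given an open file handle for /var/log/messages instead of the sample list.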
A look at /etc/inittab showed the following CRS-related entries (the leading number is the line number in the file):
55 h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
56 h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
57 h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
58 h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
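The conflict in this listing is that lines 55 and 56 both use the id "h1". A minimal sketch of checking an inittab for repeated ids is shown here; `duplicate_inittab_ids` is a hypothetical helper, and it assumes the usual `id:runlevels:action:process` layout without the line-number prefix shown in the listing.

```python
def duplicate_inittab_ids(text):
    """Return (line_number, id, first_line_number) for every repeated id."""
    first_seen = {}
    duplicates = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        ident, sep, _rest = line.partition(":")
        if not sep:
            continue  # not an id:runlevels:action:process entry
        if ident in first_seen:
            duplicates.append((lineno, ident, first_seen[ident]))
        else:
            first_seen[ident] = lineno
    return duplicates


# The four entries from the listing, with the line-number prefix removed,
# so they are numbered 1-4 here rather than 55-58.
sample_inittab = """\
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
"""
print(duplicate_inittab_ids(sample_inittab))  # [(2, 'h1', 1)]
```

In the sample, the second entry (init.evmd) is reported as reusing the id first seen on the init.ohasd entry, which corresponds to the message init logged for line 56.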
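/etc/init.d/init.ohasd does not exist in Oracle 10g or earlier versions. Checking /etc/init.d/ shows that, besides the 10g scripts init.evmd, init.cssd, init.crsd and init.css, the 11g-specific init.ohasd is also present there.
For more on init.ohasd, see the official Oracle documentation. ohasd.bin is the daemon, on Linux and AIX, of Oracle High Availability Services, a clusterware component newly introduced in Oracle 11g.
The 11g deinstall tool did not remove the ohasd entry it had added to /etc/inittab. As a result, when root.sh ran during the 10g CRS installation, ohasd.bin could not be initialized, and this affected the startup of the evmd process. The init message suggests a likely mechanism: the leftover ohasd entry already uses the id "h1", so the init.evmd entry that root.sh adds on line 56 is reported as a duplicate and evmd is apparently never spawned. (The exact cause would still have to be confirmed with Oracle engineers.)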
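Solution:
1. Comment out the line "h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null" in /etc/inittab.
2. Deleting "/etc/init.d/init.ohasd" is also recommended, although in my test leaving the file in place did not affect the installation. After that, running root.sh again completed without problems.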
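Step 1 can be done by hand in an editor. For anyone who prefers to script it, here is a small sketch that comments out every active inittab line mentioning init.ohasd. The helper name `comment_out_ohasd` is hypothetical, and the function only transforms text; reading and writing the real file is left to the caller.

```python
def comment_out_ohasd(text):
    """Prefix every active line that mentions init.ohasd with '#'."""
    result = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip()
        if "init.ohasd" in stripped and not stripped.startswith("#"):
            result.append("#" + line)
        else:
            result.append(line)  # leave all other lines untouched
    return "".join(result)


# Sample input: the two conflicting entries from the inittab listing.
before = (
    "h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null\n"
    "h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null\n"
)
print(comment_out_ohasd(before), end="")
```

If this is applied to the real /etc/inittab, it would be prudent to keep a backup copy first and, as root, ask init to re-read the file afterwards (on SysV init systems, `telinit q`) before running root.sh again.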