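A customer reported that after restarting their RAC environment, the cluster came up normally but the database instances did not start.
According to the customer's description over the phone, Oracle was trying to start instance 2 on node 1 and instance 1 on node 2, which resulted in error CRS-1019.
It was hard to get anything really meaningful out of that description, so I asked the customer to send me the detailed error output: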
oracle@orcl1:/home/oracle>crs_start -all
Attempting to start `ora.orcl.orcl1.inst` on member `orcl1`
Attempting to start `ora.orcl.orcl2.inst` on member `orcl2`
Start of `ora.orcl.orcl1.inst` on member `orcl1` failed.
orcl2 : CRS-1019: Resource ora.orcl.orcl1.inst (application) cannot run on orcl2
Start of `ora.orcl.orcl2.inst` on member `orcl2` failed.
orcl1 : CRS-1019: Resource ora.orcl.orcl2.inst (application) cannot run on orcl1
Attempting to start `ora.orcl.db` on member `orcl1`
Start of `ora.orcl.db` on member `orcl1` failed.
Attempting to start `ora.orcl.db` on member `orcl2`
Start of `ora.orcl.db` on member `orcl2` failed.
CRS-1006: No more members to consider
CRS-0215: Could not start resource 'ora.orcl.db'.
CRS-0215: Could not start resource 'ora.orcl.orcl1.inst'.
CRS-0215: Could not start resource 'ora.orcl.orcl2.inst'.
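Clearly, the CRS-1019 error the customer mentioned is not the cause of the problem. The most meaningful part of the output above is: Start of `ora.orcl.orcl1.inst` on member `orcl1` failed. The message that follows, saying instance 1 cannot run on node 2, is only informational; it does not mean Oracle tried to start instance 1 on node 2.
So the task is simple: find out why the instances fail to start. I asked the customer what the database alert log recorded, and was told it contained only a single instance-startup entry, with no errors and nothing else written.
Sometimes, when an instance is started through a tool, the error really is not written to the alert log, so I had the customer try a STARTUP directly from sqlplus. This time there was a clear error message: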
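To make this reading of the output repeatable, the real start failures can be separated from the informational CRS-1019 lines. The following is only a minimal shell sketch, assuming the crs_start output uses exactly the wording shown above; the function name failed_starts is made up for illustration.

```shell
# Hypothetical helper: read saved crs_start output on stdin and print
# "<resource> <member>" for every "Start of ... failed." line.
# CRS-1019 and all other lines are ignored.
failed_starts() {
    sed -n 's/^Start of `\([^`]*\)` on member `\([^`]*\)` failed\.[[:space:]]*$/\1 \2/p'
}
```

It is meant to be run against output that has already been captured to a file, for example failed_starts < crs_start.out (the file name is also an assumption), so that filtering never triggers another start attempt.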
oracle@orcl2:/u01/app/oracle/admin/orcl/bdump>sqlplus / as sysdba
SQL*Plus: Release 10.2.0.5.0 - Production on Fri Nov 2 17:44:24 2012
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> startup mount;
ORA-02194: event specification syntax error 230 (minor error 215) near 'OFF'
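Clearly the cause is a syntax error in the EVENT setting in the SPFILE. This also explains why no error was recorded in the alert log: Oracle hit the error while parsing the initialization parameters, so the actual startup process never began.
The rest is simple: have the customer manually create a PFILE, correct the EVENT syntax or temporarily comment it out, then regenerate the SPFILE and restart the database.
I thought the problem was solved, but before long the customer called again. This time instance 2 had started normally, but instance 1 still had a problem: starting it directly in SQLPLUS produced no error, yet it could not be started through crs_start. The CRS log recorded: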
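The "temporarily comment it out" option can be sketched in shell. This is only an illustration under assumptions: the PFILE is a plain text file with one parameter per line, the /tmp paths and the function name comment_out_events are hypothetical, and the SPFILE location /dev/rspfile is taken from the srvctl output shown later in this post.

```shell
# Workflow sketch (the /tmp paths are hypothetical):
#   SQL> create pfile='/tmp/initorcl.ora' from spfile='/dev/rspfile';
#   $ comment_out_events < /tmp/initorcl.ora > /tmp/initorcl.fixed.ora
#   SQL> create spfile='/dev/rspfile' from pfile='/tmp/initorcl.fixed.ora';

# Hypothetical helper: prefix every "event=", "*.event=" or "<sid>.event="
# line of a PFILE (read from stdin) with "# ", leaving other lines untouched.
comment_out_events() {
    sed '/^[[:space:]]*\([^.=#]*\.\)\{0,1\}[Ee][Vv][Ee][Nn][Tt][[:space:]]*=/s/^/# /'
}
```

Commenting the line out rather than deleting it keeps the original EVENT text in the PFILE, so it can be corrected and restored later.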
2012-11-02 18:38:55.460: [ CRSRES][11628]32ora.orcl.orcl1.inst target set to OFFLINE before stop action
2012-11-02 18:38:55.460: [ CRSRES][11628]32StopResource: setting CLI values
2012-11-02 18:38:55.471: [ CRSRES][11628]32Target set to OFFLINE for `ora.orcl.orcl1.inst`
2012-11-02 18:40:07.862: [ CRSRES][11633]32startRunnable: setting CLI values
2012-11-02 18:40:07.867: [ CRSRES][11633]32Attempting to start `ora.orcl.orcl1.inst` on member `orcl1`
2012-11-02 18:40:09.194: [ CRSAPP][11633]32StartResource error for ora.orcl.orcl1.inst error code = 1
2012-11-02 18:40:09.853: [ CRSRES][11633]32Start of `ora.orcl.orcl1.inst` on member `orcl1` failed.
2012-11-02 18:40:09.865: [ CRSRES][11633]32orcl2 : CRS-1019: Resource ora.orcl.orcl1.inst (application) cannot run on orcl2
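At first I suspected that the initorcl1.ora file under ORACLE_HOME/dbs was wrong and did not point to the correct SPFILE, but the customer checked it and found no problem.
Since startup from SQLPLUS worked while startup through CRS_START failed, I suspected that some configuration in the OCR was abnormal, so I had the customer check the output of the SRVCTL CONFIG command: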
oracle@orcl1:/u01/app/oracle/product/10.2.0/crs/log/orcl1/crsd>srvctl config database -d orcl -a
orcl1 orcl1 /u01/app/oracle/product/10.2.0/db
orcl2 orcl2 /u01/app/oracle/product/10.2.0/db
DB_UNIQUE_NAME: orcl
DB_NAME: orcl
ORACLE_HOME: /u01/app/oracle/product/10.2.0/db
SPFILE: /dev/rspfile
DOMAIN: null
DB_ROLE: null
START_OPTIONS: null
POLICY: AUTOMATIC
ENABLE FLAG: DB ENABLED, INST DISABLED ON orcl1
Obviously, instance 1 is DISABLED in the OCR configuration, which is why it could not be started through CRS_START.
Run the following command:
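The same check can be scripted so that a disabled instance is spotted without reading the whole listing. A minimal shell sketch, assuming the ENABLE FLAG line has the format shown in the output above and names a single instance; the function name disabled_instance is hypothetical.

```shell
# Hypothetical helper: read `srvctl config database -d <db> -a` output on
# stdin and print the instance named after "INST DISABLED ON" in the
# ENABLE FLAG line. Prints nothing if no instance is reported as disabled.
disabled_instance() {
    sed -n '/ENABLE FLAG/s/.*INST DISABLED ON \([^ ,]*\).*/\1/p'
}
```

Typical use would be srvctl config database -d orcl -a | disabled_instance, which only reads the configuration and changes nothing.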
srvctl enable instance -d orcl -i orcl1
Problem solved.