3节点RAC多路控制文件丢失一个后数据库宕机问题
根本原因:1、存储磁盘丢失导致数据库服务器和数据库重启
2、数据库重启后,原设置的数据库控制文件多路复用中新增加的控制文件没有复制至目标位置,导致RAC中147、148数据库启动失败
对数据库状态和日志进行检查
1、检查集群状态(crs_stat -t,online),发现集群状态正常
[oracle@siebeldb1 bin]$ ./crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.CRS.dg ora....up.type ONLINE ONLINE siebeldb1
ora.DATA.dg ora....up.type ONLINE ONLINE siebeldb1
ora.FRA.dg ora....up.type ONLINE ONLINE siebeldb1
ora....ER.lsnr ora....er.type ONLINE ONLINE siebeldb1
ora....N1.lsnr ora....er.type ONLINE ONLINE siebeldb1
ora....N2.lsnr ora....er.type ONLINE ONLINE siebeldb3
ora....N3.lsnr ora....er.type ONLINE ONLINE siebeldb2
ora.asm ora.asm.type ONLINE ONLINE siebeldb1
ora.crmprod.db ora....se.type ONLINE ONLINE siebeldb1
ora.cvu ora.cvu.type ONLINE ONLINE siebeldb2
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE siebeldb1
ora.oc4j ora.oc4j.type ONLINE ONLINE siebeldb3
ora.ons ora.ons.type ONLINE ONLINE siebeldb1
ora.scan1.vip ora....ip.type ONLINE ONLINE siebeldb1
ora.scan2.vip ora....ip.type ONLINE ONLINE siebeldb3
ora.scan3.vip ora....ip.type ONLINE ONLINE siebeldb2
ora....SM1.asm application ONLINE ONLINE siebeldb1
ora....B1.lsnr application ONLINE ONLINE siebeldb1
ora....db1.gsd application OFFLINE OFFLINE
ora....db1.ons application ONLINE ONLINE siebeldb1
ora....db1.vip ora....t1.type ONLINE ONLINE siebeldb1
ora....SM2.asm application ONLINE ONLINE siebeldb2
ora....B2.lsnr application ONLINE ONLINE siebeldb2
ora....db2.gsd application OFFLINE OFFLINE
ora....db2.ons application ONLINE ONLINE siebeldb2
ora....db2.vip ora....t1.type ONLINE ONLINE siebeldb2
ora....SM3.asm application ONLINE ONLINE siebeldb3
ora....B3.lsnr application ONLINE ONLINE siebeldb3
ora....db3.gsd application OFFLINE OFFLINE
ora....db3.ons application ONLINE ONLINE siebeldb3
ora....db3.vip ora....t1.type ONLINE ONLINE siebeldb3
2、数据库状态(gv$instance,open), 发现数据库只有1个节点启动,其余两个节点未启动
SQL> select instance_name,status from gv$instance;
INSTANCE_NAME STATUS
---------------- ------------
crmprod1 OPEN
3、检查各节点alter日志,
siebeldb2、siebeldb3日志:
Reconfiguration complete
ORA-00210: cannot open the specified control file
ORA-00202: control file: '+FRA'
ORA-17503: ksfdopn:2 Failed to open file +FRA
ORA-15045: ASM file name '+FRA' is not in reference form
ORA-00210: cannot open the specified control file
ORA-00202: control file: '+DATA'
ORA-17503: ksfdopn:2 Failed to open file +DATA
ORA-15045: ASM file name '+DATA' is not in reference form
ORA-205 signalled during: ALTER DATABASE MOUNT /* db agent *//* {3:36636:2} */...
Tue Sep 01 00:18:15 2015
Shutting down instance (abort)
License high water mark = 1
USER (ospid: 4023): terminating the instance
Instance terminated by USER, pid = 4023
Tue Sep 01 00:18:16 2015
Instance shutdown complete
出现“不能打开执行的控制文件”错误
siebeldb1 alert日志正常,以为ASM磁盘挂接出现问题
4、ASM实例状态(gv$instance started),ASM实例状态正常
select instance_name, host_name, status from gv$instance;
INSTANCE_N HOST_NAME STATUS
---------- ---------- ------------
+ASM1 siebeldb1 STARTED
+ASM3 siebeldb3 STARTED
+ASM2 siebeldb2 STARTED
5、ASM磁盘状态(v$asm_diskgroup,mounted),ASM磁盘挂接正常
SQL> select name,state from v$asm_diskgroup;
NAME STATE
---------- -----------
CRS MOUNTED
DATA MOUNTED
FRA MOUNTED
6、尝试直接启动数据库,发现数据库无法启动,报错错误与晚上自动重启一致
SQL> startup
和
srvctl start instance –d crmprod –i crmprod2
srvctl start instance –d crmprod –i crmprod3
数据库启动均失败
7、仔细检查对比3个节点的alter日志,发现启动失败的两各节点日志文件中显示两行控制文件信息,而正常启动的节点只有一行控制文件信息,如下:
启动失败节点的alert日志文件:
spfile = "+DATA/crmprod/spfilecrmprod.ora"
nls_length_semantics = "CHAR"
sga_target = 12096M
control_files = "+DATA"
control_files = "+FRA"
db_block_size = 8192
compatible = "11.2.0.0.0"
log_archive_dest_1 = "location=+FRA"
启动正常节点的alert日志文件:
processes = 2000
sessions = 4000
nls_length_semantics = "CHAR"
sga_target = 12096M
control_files = "+DATA/crmprod/controlfile/current.256.865138499"
db_block_size = 8192
compatible = "11.2.0.0.0"
log_archive_dest_1 = "location=+FRA"
db_file_multiblock_read_count= 32
cluster_database = TRUE
db_create_file_dest = "+DATA"
thread = 1
此时想起前几天对控制文件做了多路复用,但是由于无法停机没有将控制文件物理文件拷贝至增加的配置目录,导致数据库启动时校验spfile发现找不到第二个控制文件,因而启动失败,而在整个数据库启动失败后(后来发现节点一能启动的原因为一直以来节点一都是以本地的$ORACLE_HOME/dbs下的spfile启动的)。
节点一:
SQL> show parameter pfile
NAME TYPE VALUE
------------------------ ------------------------------
spfile string /home/u01/db/11.2.0/dbs/spfilecrmprod2.ora
[oracle@siebeldb1 ~]$ ll /home/u01/db/11.2.0/dbs/spfilecrmprod1.ora
-rw-r----- 1 oracle asmadmin 5632 Sep 2 10:10 /home/u01/db/11.2.0/dbs/spfilecrmprod1.ora
找到问题原因后,通过以下方式处理解决:
1、 对节点2、节点3采用pfile文件启动
找到节点2、节点3的pfile文件$ORACLE_HOME/dbs/initcrmprod2.ora和$ORACLE_HOME/dbs/initcrmprod3.ora,查看pfile文件,如下:
[oracle@siebeldb2 dbs]$ more initcrmprod2.ora
SPFILE='+DATA/crmprod/spfilecrmprod.ora'
在文件末尾增加一行指定控制文件参数:
[oracle@siebeldb2 dbs]$ more initcrmprod2.ora
SPFILE='+DATA/crmprod/spfilecrmprod.ora'
*.control_files='+DATA/crmprod/controlfile/current.256.865138499'
使用pfile文件启动,
SQL> startup
节点2、节点3数据库启动正常
2、 为防止下次启动再次出现问题,暂时取消控制文件多路复用
修改spfile,取消控制文件多路复用:
alter system set confile_files='+DATA/crmprod/controlfile/current.256.865138499' scope=spfile;
修改将原pfile文件还原
mv initcrmprod2.ora_bak initcrmprod2.ora
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/532823/viewspace-2080370/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/532823/viewspace-2080370/