共享存储坏掉,存在共享存储的OCR,votedisk,ASM,database数据全部丢失.本地主机保存有数据库的备份.两个主机的软件是没有问题,所以此时只需要重新配置gi,然后恢复数据库.
以下模拟:
host:ct66rac01,ct66rac02
os:centos 6.6
db:11.2.0.4
GRID_HOME=/u01/app/11.2.0/grid
ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1
当前数据库正常状态:
[grid@ct66rac01 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.DATA.dg ora....up.type ONLINE ONLINE ct66rac01
ora....ER.lsnr ora....er.type ONLINE ONLINE ct66rac01
ora....N1.lsnr ora....er.type ONLINE ONLINE ct66rac01
ora.asm ora.asm.type ONLINE ONLINE ct66rac01
ora....SM1.asm application ONLINE ONLINE ct66rac01
ora....01.lsnr application ONLINE ONLINE ct66rac01
ora....c01.gsd application OFFLINE OFFLINE
ora....c01.ons application ONLINE ONLINE ct66rac01
ora....c01.vip ora....t1.type ONLINE ONLINE ct66rac01
ora....SM2.asm application ONLINE ONLINE ct66rac02
ora....02.lsnr application ONLINE ONLINE ct66rac02
ora....c02.gsd application OFFLINE OFFLINE
ora....c02.ons application ONLINE ONLINE ct66rac02
ora....c02.vip ora....t1.type ONLINE ONLINE ct66rac02
ora.cvu ora.cvu.type ONLINE ONLINE ct66rac01
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE ct66rac01
ora.oc4j ora.oc4j.type ONLINE ONLINE ct66rac01
ora.ons ora.ons.type ONLINE ONLINE ct66rac01
ora.rac11g.db ora....se.type ONLINE ONLINE ct66rac01
ora.scan1.vip ora....ip.type ONLINE ONLINE ct66rac01
模拟存储坏掉:
[root@ct66rac01 bin]# dd if=/dev/zero bs=1M count=100 of=/dev/sdb1
[root@ct66rac01 bin]# dd if=/dev/zero bs=1M count=100 of=/dev/sdb2
开始修复:
1.清空cluster的配置信息
--先停止各节点crs
[root@ct66rac01 bin]# cd /u01/app/11.2.0/grid/bin/
[root@ct66rac01 bin]# ./crsctl stop crs -f
[root@ct66rac02 ~]# cd /u01/app/11.2.0/grid/bin/
[root@ct66rac02 bin]# ./crsctl stop crs -f
--清空各节点cluster配置信息,注意最后的节点加-lastnode
[root@ct66rac01 bin]# cd /u01/app/11.2.0/grid/crs/install/
[root@ct66rac01 install]# ./rootcrs.pl -deconfig -force
[root@ct66rac02 bin]# cd /u01/app/11.2.0/grid/crs/install/
[root@ct66rac02 install]# ./rootcrs.pl -deconfig -force -lastnode
2.重新配置gi
/u01/app/11.2.0/grid/crs/install/crsconfig_params这里面存放了之前gi的配置信息.
共享存储重建之后,如果路径和之前的不一样,可以修改/u01/app/11.2.0/grid/crs/install/crsconfig_params这个文件改为当前的配置.
如果配置信息变化太大,也可以直接通过图形化执行/u01/app/11.2.0/grid/crs/config/config.sh去重新输入环境信息,以完成gi重建.
如果环境没有变化,只需要执行以下:
--重建cluster配置信息,
[root@ct66rac01 ~]# cd /u01/app/11.2.0/grid
[root@ct66rac01 grid]# ./root.sh
[root@ct66rac02 ~]# cd /u01/app/11.2.0/grid
[root@ct66rac02 grid]# ./root.sh
--此时完成之后,可以看到,连asm也重建完成
[grid@ct66rac01 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.DATA.dg ora....up.type ONLINE ONLINE ct66rac01
ora....N1.lsnr ora....er.type ONLINE ONLINE ct66rac01
ora.asm ora.asm.type ONLINE ONLINE ct66rac01
ora....SM1.asm application ONLINE ONLINE ct66rac01
ora....c01.gsd application OFFLINE OFFLINE
ora....c01.ons application ONLINE ONLINE ct66rac01
ora....c01.vip ora....t1.type ONLINE ONLINE ct66rac01
ora....SM2.asm application ONLINE ONLINE ct66rac02
ora....c02.gsd application OFFLINE OFFLINE
ora....c02.ons application ONLINE ONLINE ct66rac02
ora....c02.vip ora....t1.type ONLINE ONLINE ct66rac02
ora.cvu ora.cvu.type ONLINE ONLINE ct66rac01
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE ct66rac01
ora.oc4j ora.oc4j.type ONLINE ONLINE ct66rac01
ora.ons ora.ons.type ONLINE ONLINE ct66rac01
ora.scan1.vip ora....ip.type ONLINE ONLINE ct66rac01
--重新注册本地listenter到ocr
[grid@ct66rac01 ~]$ srvctl add listener -l listener
[grid@ct66rac01 ~]$ srvctl start listener
[grid@ct66rac01 ~]$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): ct66rac01,ct66rac02
--添加数据库的ASM磁盘组
可以通过sqlplus或者asmca,然后通srvctl add diskgroup重新注册diskgroup到ocr.
这里模拟环境的数据库用的和ocr相同的diskgroup,所以不需要新建.
3.gi重新配置完成,开始还原数据库
如果你是dp备份,那需要先通过DBCA去重建数据库,再impdp导入到数据库.
如果是rman备份,执行以下:
[root@ct66rac01 grid]# su - oracle
--显示当前的备份在/home/oracle下
[oracle@ct66rac01 ~]$ ll full*
-rw-r----- 1 oracle asmadmin 1134043136 Oct 22 17:59 full_db_RAC11G_3_1_1.dbfbk
-rw-r----- 1 oracle asmadmin 18579456 Oct 22 17:59 full_db_RAC11G_4_1_1.dbfbk
[oracle@ct66rac01 ~]$ ORACLE_SID=rac11g1
[oracle@ct66rac01 ~]$ rman target /
--启动到nomount,此处不需要参数文件
RMAN> startup nomount;
startup failed: ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA/rac11g/spfilerac11g.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATA/rac11g/spfilerac11g.ora
ORA-15056: additional error message
ORA-17503: ksfdopn:2 Failed to open file +DATA/rac11g/spfilerac11g.ora
ORA-15173: entry 'rac11g' does not exist in directory '/'
ORA-06512: at line 4
starting Oracle instance without parameter file for retrieval of spfile
Oracle instance started
Total System Global Area 1068937216 bytes
Fixed Size 2260088 bytes
Variable Size 281019272 bytes
Database Buffers 780140544 bytes
Redo Buffers 5517312 bytes
--还原spfile
--此处报错是因为目录不存在,要先在asm的+DATA下建相应目录
RMAN> restore spfile from '/home/oracle/full_db_RAC11G_4_1_1.dbfbk';
Starting restore at 23-OCT-15
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=174 device type=DISK
channel ORA_DISK_1: restoring spfile from AUTOBACKUP /home/oracle/full_db_RAC11G_4_1_1.dbfbk
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 10/23/2015 14:36:58
ORA-19870: error while restoring backup piece /home/oracle/full_db_RAC11G_4_1_1.dbfbk
ORA-32015: unable to restore SPFILE
ORA-17502: ksfdcre:4 Failed to create file +DATA/rac11g/spfilerac11g.ora
ORA-15056: additional error message
ORA-17502: ksfdcre:4 Failed to create file +DATA/rac11g/spfilerac11g.ora
ORA-15173: entry 'rac11g' does not exist in directory '/'
RMAN> exit
--在+DATA下建rac11g目录
[root@ct66rac01 grid]# su - grid
[grid@ct66rac01 ~]$ asmcmd
ASMCMD> ls
DATA/
ASMCMD> cd DATA
ASMCMD> ls
ct66rac-scan/
ASMCMD> mkdir rac11g
ASMCMD> exit
[root@ct66rac01 grid]# su - oracle
[oracle@ct66rac01 ~]$ ORACLE_SID=rac11g1
[oracle@ct66rac01 ~]$ rman target /
Recovery Manager: Release 11.2.0.4.0 - Production on Fri Oct 23 14:38:44 2015
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: DUMMY (not mounted)
--重新还原spfile
RMAN> restore spfile from '/home/oracle/full_db_RAC11G_4_1_1.dbfbk';
--用还原后的spfile开启到nomount
RMAN> startup force nomount;
--还原控制文件
RMAN> restore controlfile from '/home/oracle/full_db_RAC11G_4_1_1.dbfbk';
--启动到mount
RMAN> alter database mount;
--把/home/oracle下备份文件信息记录到控制文件
RMAN> catalog start with '/home/oracle/full';
searching for all files that match the pattern /home/oracle/full
List of Files Unknown to the Database
=====================================
File Name: /home/oracle/full_db_RAC11G_4_1_1.dbfbk
Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done
List of Cataloged Files
=======================
File Name: /home/oracle/full_db_RAC11G_4_1_1.dbfbk
RMAN> crosscheck backup;
--还原数据库
RMAN> restore database;
RMAN> recover database;
--打开数据库
RMAN> alter database open resetlogs;
database opened
RMAN> exit
--添加数据库信息到ocr
[oracle@ct66rac01 ~]$ srvctl status database -d rac11g
PRCD-1120 : The resource for database rac11g could not be found.
PRCR-1001 : Resource ora.rac11g.db does not exist
[oracle@ct66rac01 ~]$ echo $ORACLE_HOME
/u01/app/oracle/product/11.2.0/db_1
[oracle@ct66rac01 ~]$ srvctl add database -d rac11g -o /u01/app/oracle/product/11.2.0/db_1 -c RAC
[oracle@ct66rac01 ~]$ srvctl add instance -d rac11g -i rac11g1 -n ct66rac01
[oracle@ct66rac01 ~]$ srvctl add instance -d rac11g -i rac11g2 -n ct66rac02
[oracle@ct66rac01 ~]$ srvctl start database -d rac11g
--修复完成,查看数据库状态
[root@ct66rac01 bin]# ./crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.DATA.dg ora....up.type ONLINE ONLINE ct66rac01
ora....ER.lsnr ora....er.type ONLINE ONLINE ct66rac01
ora....N1.lsnr ora....er.type ONLINE ONLINE ct66rac01
ora.asm ora.asm.type ONLINE ONLINE ct66rac01
ora....SM1.asm application ONLINE ONLINE ct66rac01
ora....01.lsnr application ONLINE ONLINE ct66rac01
ora....c01.gsd application OFFLINE OFFLINE
ora....c01.ons application ONLINE ONLINE ct66rac01
ora....c01.vip ora....t1.type ONLINE ONLINE ct66rac01
ora....SM2.asm application ONLINE ONLINE ct66rac02
ora....02.lsnr application ONLINE ONLINE ct66rac02
ora....c02.gsd application OFFLINE OFFLINE
ora....c02.ons application ONLINE ONLINE ct66rac02
ora....c02.vip ora....t1.type ONLINE ONLINE ct66rac02
ora.cvu ora.cvu.type ONLINE ONLINE ct66rac01
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE ct66rac01
ora.oc4j ora.oc4j.type ONLINE ONLINE ct66rac01
ora.ons ora.ons.type ONLINE ONLINE ct66rac01
ora.rac11g.db ora....se.type ONLINE ONLINE ct66rac01
ora.scan1.vip ora....ip.type ONLINE ONLINE ct66rac01