Suppose the OCR disk and the Votedisk are both completely destroyed and neither has a backup. How do you recover? The simplest approach in this case is to reinitialize the OCR and Votedisk. The procedure is as follows:
Reference: 《大话Oracle RAC》.
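Before destroying anything, it is worth recording where the OCR and voting disks currently live so the same devices can be reused later. A minimal check, run as root (commands only, output omitted):
# show the current OCR device and its state
ocrcheck
# list the configured voting disk(s)
crsctl query css votedisk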
Simulating the disk corruption:
[root@node1 ~]# crsctl stop crs
Stopping resources.
Error while stopping resources. Possible cause: CRSD is down.
Stopping CSSD.
Unable to communicate with the CSS daemon.
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw1 bs=102400 count=1200
dd: writing `/dev/raw/raw1': No space left on device
1045+0 records in
1044+0 records out
106938368 bytes (107 MB) copied, 6.68439 seconds, 16.0 MB/s
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw2 bs=102400 count=1200
dd: writing `/dev/raw/raw2': No space left on device
1045+0 records in
1044+0 records out
106938368 bytes (107 MB) copied, 7.62786 seconds, 14.0 MB/s
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw5 bs=102400 count=1200
dd: writing `/dev/raw/raw5': No space left on device
1045+0 records in
1044+0 records out
106938368 bytes (107 MB) copied, 8.75194 seconds, 12.2 MB/s
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw6 bs=102400 count=1200
dd: writing `/dev/raw/raw6': No space left on device
1045+0 records in
1044+0 records out
106938368 bytes (107 MB) copied, 6.50958 seconds, 16.4 MB/s
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw6 bs=102400 count=3000
dd: writing `/dev/raw/raw6': No space left on device
1045+0 records in
1044+0 records out
106938368 bytes (107 MB) copied, 6.61992 seconds, 16.2 MB/s
[root@node1 ~]# dd if=/dev/zero of=/dev/raw/raw7 bs=102400 count=3000
dd: writing `/dev/raw/raw7': No space left on device
2509+0 records in
2508+0 records out
256884736 bytes (257 MB) copied, 16.0283 seconds, 16.0 MB/s
1. Stop the Clusterware stack on all nodes:
crsctl stop crs
Then format (zero out) all of the OCR and Votedisk devices.
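In this walkthrough the devices were already wiped by the dd commands in the simulation above; a condensed sketch of what this step amounts to, run as root on every node (the raw device path is just one of the devices used here; repeat the dd for each OCR/Votedisk device):
# stop the Clusterware stack on this node
crsctl stop crs
# overwrite one OCR / voting disk device with zeros
dd if=/dev/zero of=/dev/raw/raw1 bs=1M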
2. On each node, run the $CRS_HOME/install/rootdelete.sh script as root.
[root@node1 ~]# $CRS_HOME/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed with invalid format: PROC-22: The OCR backend has an invalid format
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
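The transcript above only shows node1; the same script also has to be run as root on the second node, e.g.:
# on node2, as root
$CRS_HOME/install/rootdelete.sh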
3. On any one node, run the $CRS_HOME/install/rootdeinstall.sh script as root.
[root@node1 ~]# $CRS_HOME/install/rootdeinstall.sh
Removing contents from OCR device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 2.36706 seconds, 4.4 MB/s
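A quick, optional way to confirm the OCR device really was wiped is to dump its first bytes; a sketch (substitute the raw device that actually holds the OCR in your setup):
# the start of a wiped OCR device should be all zero bytes
od -An -c /dev/raw/raw1 | head -3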
4. On the same node as in the previous step, run the $CRS_HOME/root.sh script as root.
[root@node1 ~]# $CRS_HOME/root.sh
WARNING: directory '/opt/ora10g/product/10.2.0' is not owned by root
WARNING: directory '/opt/ora10g/product' is not owned by root
WARNING: directory '/opt/ora10g' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/opt/ora10g/product/10.2.0' is not owned by root
WARNING: directory '/opt/ora10g/product' is not owned by root
WARNING: directory '/opt/ora10g' is not owned by root
assigning default hostname node1 for node 1.
assigning default hostname node2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: node1 node1-priv node1
node 2: node2 node2-priv node2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw1
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
node1
CSS is inactive on these nodes.
node2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
5. On the remaining nodes, run the $CRS_HOME/root.sh script as root.
[root@node2 ~]# $CRS_HOME/root.sh
WARNING: directory '/opt/ora10g/product/10.2.0' is not owned by root
WARNING: directory '/opt/ora10g/product' is not owned by root
WARNING: directory '/opt/ora10g' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/opt/ora10g/product/10.2.0' is not owned by root
WARNING: directory '/opt/ora10g/product' is not owned by root
WARNING: directory '/opt/ora10g' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname node1 for node 1.
assigning default hostname node2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: node1 node1-priv node1
node 2: node2 node2-priv node2
clscfg: Arguments check out successfully.
NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
node1
node2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]
[root@node2 ~]# vipca
Error 0(Native: listNetInterfaces:[3])
[Error 0(Native: listNetInterfaces:[3])]
To resolve the error above:
[root@node1 ~]# oifcfg iflist
eth1 10.10.17.0
virbr0 192.168.122.0
eth0 192.168.1.0
[root@node1 ~]# oifcfg setif -global eth0/192.168.1.0:public
[root@node1 ~]# oifcfg setif -global eth1/10.10.17.0:cluster_interconnect
[root@node1 ~]# oifcfg iflist
eth1 10.10.17.0
virbr0 192.168.122.0
eth0 192.168.1.0
[root@node1 ~]# oifcfg getif
eth0 192.168.1.0 global public
eth1 10.10.17.0 global cluster_interconnect
Because of the error above, the ONS, GSD, and VIP resources were not created successfully, so vipca has to be run manually.
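With the public and cluster_interconnect interfaces now defined through oifcfg, vipca can be re-run by hand; a sketch (vipca is a GUI tool, so a usable X display is assumed):
# as root on one node, with DISPLAY set
vipca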
[root@node1 ~]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2
6. Reconfigure the listeners with netca and confirm that they are registered with Clusterware.
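A sketch of this step (netca is normally run interactively as the oracle user and registers the listeners with Clusterware when the wizard completes):
# as the oracle user; choose the cluster configuration and
# listener configuration options in the wizard
netca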
[root@node1 ~]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2
So far only the listeners, ONS, GSD, and VIP are registered in the OCR; ASM and the database still need to be registered as well.
7. Add ASM to the OCR (this must be done as the oracle user).
[root@node1 dbs]# srvctl add asm -n node1 -i +ASM1 -o /opt/ora10g/product/10.2.0/db_1
null
[PRKS-1030 : Failed to add configuration for ASM instance "+ASM1" on node "node1" in cluster registry, [PRKH-1001 : HASContext Internal Error]
[PRKH-1001 : HASContext Internal Error]]
[root@node1 dbs]# su - oracle
[oracle@node1 ~]$ srvctl add asm -n node1 -i +ASM1 -o /opt/ora10g/product/10.2.0/db_1
[oracle@node1 ~]$ srvctl add asm -n node2 -i +ASM2 -o /opt/ora10g/product/10.2.0/db_1
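Before starting anything, the registration can be verified with srvctl; a sketch:
# show the ASM configuration just recorded in the OCR for each node
srvctl config asm -n node1
srvctl config asm -n node2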
8. Start ASM.
[oracle@node1 ~]$ srvctl start asm -n node1
[oracle@node1 ~]$ srvctl start asm -n node2
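Whether both ASM instances actually came up can be checked with, for example:
# report the state of the ASM instance on each node
srvctl status asm -n node1
srvctl status asm -n node2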
If ORA-27550 is reported during startup, it is because RAC cannot determine which NIC to use as the private interconnect. The fix is to add the following parameters to the pfile of each ASM instance:
+ASM1.cluster_interconnects='10.10.17.221'
+ASM2.cluster_interconnects='10.10.17.222'
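The pfile location depends on the installation (typically something like $ORACLE_HOME/dbs/init+ASM1.ora on each node, but treat that path as an assumption). After adding the parameter, restart ASM so it takes effect, for example:
# bounce ASM on each node after editing its pfile
srvctl stop asm -n node1
srvctl start asm -n node1
srvctl stop asm -n node2
srvctl start asm -n node2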
9. Manually add the database object to the OCR.
[oracle@node1 ~]$ srvctl add database -d RACDB -o /opt/ora10g/product/10.2.0/db_1
10. Add the two instance objects.
[oracle@node1 ~]$ srvctl add instance -d RACDB -i RACDB1 -n node1
[oracle@node1 ~]$ srvctl add instance -d RACDB -i RACDB2 -n node2
11. Set the dependency of each database instance on its ASM instance.
[oracle@node1 ~]$ srvctl modify instance -d RACDB -i RACDB1 -s +ASM1
[oracle@node1 ~]$ srvctl modify instance -d RACDB -i RACDB2 -s +ASM2
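The database, instance and node mappings recorded in the OCR can now be double-checked; a sketch:
# list the instances and nodes registered for RACDB
srvctl config database -d RACDB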
12. Start the database.
[oracle@node1 ~]$ srvctl start database -d RACDB
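Whether both instances started can be checked with, for example:
# show the state of both RACDB instances
srvctl status database -d RACDB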
If ORA-27550 appears here as well, it is again because RAC cannot determine which NIC to use as the private interconnect; setting the parameter in the spfile and restarting the database resolves it:
SQL>alter system set cluster_interconnects='10.10.17.221' scope=spfile sid='RACDB1';
SQL>alter system set cluster_interconnects='10.10.17.222' scope=spfile sid='RACDB2';
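Because scope=spfile only takes effect at the next startup, restart the database afterwards; a sketch:
# restart so the new cluster_interconnects setting is picked up
srvctl stop database -d RACDB
srvctl start database -d RACDB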
[root@node1 ~]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....B1.inst application ONLINE ONLINE node1
ora....B2.inst application ONLINE ONLINE node2
ora.RACDB.db application ONLINE ONLINE node1
ora....SM1.asm application ONLINE ONLINE node1
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....SM2.asm application ONLINE ONLINE node2
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2