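Oracle Clusterware keeps the configuration of the entire cluster on shared storage: the list of cluster nodes, the mapping of cluster database instances to nodes, and the CRS application resource information. All of this is stored on the OCR disk (or in an OCR file on OCFS), so the importance of this configuration is self-evident. For any operation that changes the OCR configuration, it is advisable to back up the OCR immediately before and after the operation. This article describes OCR backup and recovery, based mainly on an Oracle 10g RAC environment.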
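I. OCR backup and recovery concepts
As with Oracle database backup and recovery, OCR backups come in physical and logical forms, so there are two ways to back up and two ways to restore.
Physical backup and recovery:
By default, Oracle backs up the OCR every 4 hours and keeps the last 3 copies, plus the last backup from the previous day and from the previous week.
Users cannot customize the backup frequency or the number of copies kept.
The OCR backup is performed by the CRSD process on the master node, and the default backup location is the $CRS_HOME/cdata/<cluster_name> directory.
Backup files are renamed automatically to reflect their order in time; the most recent backup is named backup00.ocr.
Because the backup runs on the master node, the backup files exist only on the master node.
If the master node crashes, one of the remaining nodes takes over the backup.
The backup directory can be changed with the ocrconfig -backuploc <directory_name> command.
There can be at most two OCR disks, a primary OCR and a mirror OCR. They mirror each other to avoid a single point of failure.
A physical backup cannot be restored simply with an operating-system copy command (when the OCR is a file); doing so leaves the OCR unusable.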
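Logical backup and recovery:
A backup produced with ocrconfig -export is called a logical backup.
Before and after any major change to the OCR configuration, such as adding or removing a node, modifying cluster resources, or creating a database, a logical backup is recommended.
If the OCR is damaged by a configuration mistake, it can be restored with ocrconfig -import.
The logical approach can also restore a lost or damaged OCR disk (or file).
Backup recommendations:
Copy the files produced by Oracle's automatic backup to shared storage or another available storage device.
Export the OCR configuration at least once a day.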
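The first recommendation can be scripted. The following is a minimal sketch, not a definitive implementation: the function name copy_ocr_backups and the destination directory in the usage note are assumptions made for illustration. It copies every *.ocr file from the automatic backup directory to a second location and keeps the original timestamps.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oracle Clusterware): copy all *.ocr files
# from the automatic backup directory to another location, preserving
# timestamps and permissions with cp -p.
copy_ocr_backups() {
    src_dir=$1
    dest_dir=$2
    [ -d "$src_dir" ] || { echo "no such directory: $src_dir" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    found=0
    for f in "$src_dir"/*.ocr; do
        [ -e "$f" ] || continue          # the glob matched nothing
        cp -p "$f" "$dest_dir"/ || return 1
        found=1
    done
    # Treat an empty backup directory as an error so a scheduled job notices it.
    [ "$found" -eq 1 ] || { echo "no .ocr files in $src_dir" >&2; return 1; }
}
```

A possible call on the master node, with an assumed destination path on shared storage: copy_ocr_backups /u01/oracle/crs/cdata/crs /u02/crs_bak/ocr_bak/phy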
II. Backing up the OCR
1. Automatic OCR backups
# Use ocrconfig -showbackup to see which node holds the OCR backups and in which directory
oracle@bo2dbp:~> ocrconfig -showbackup

bo2dbp 2013/02/25 06:23:15 /u01/oracle/crs/cdata/crs

bo2dbp 2013/02/25 02:23:13 /u01/oracle/crs/cdata/crs

bo2dbp 2013/02/24 22:23:13 /u01/oracle/crs/cdata/crs

bo2dbp 2013/02/24 02:23:09 /u01/oracle/crs/cdata/crs

bo2dbp 2013/02/22 18:23:04 /u01/oracle/crs/cdata/crs
oracle@bo2dbp:~> ls -hltr /u01/oracle/crs/cdata/crs
total 40M
-rw-r--r-- 1 root root 6.7M 2013-02-22 18:23 week.ocr
-rw-r--r-- 1 root root 6.7M 2013-02-24 02:23 day.ocr
-rw-r--r-- 1 root root 6.7M 2013-02-24 22:23 backup02.ocr
-rw-r--r-- 1 root root 6.7M 2013-02-25 02:23 backup01.ocr
-rw-r--r-- 1 root root 6.7M 2013-02-25 02:23 day_.ocr
-rw-r--r-- 1 root root 6.7M 2013-02-25 06:23 backup00.ocr

# Change the physical backup location
ocrconfig -backuploc <new_dirname>

# Restore the OCR from a physical backup
ocrconfig -restore <backup_file_name>
A physical backup can only be restored with -restore; -import does not work on it.

2. Manual OCR backups
A manual OCR backup is a logical backup and is made with -export
ocrconfig -export <backup_file_name>

# Backup example
# Exporting the OCR on different nodes is recommended; store the exports on a shared disk where possible so that any node can restore from them
oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/ocrconfig -export /u02/crs_bak/ocr_bak/exp/bo2dbp/ocr_bak.dmp
root's password:
oracle@bo2dbp:/u02/crs_bak/ocr_bak/exp/bo2dbp> ls -hltr /u02/crs_bak/ocr_bak/exp/bo2dbp/ocr_bak.dmp
-rw-r--r-- 1 root root 144K 2013-02-25 10:10 /u02/crs_bak/ocr_bak/exp/bo2dbp/ocr_bak.dmp
oracle@bo2dbs:~> sudo -s /u01/oracle/crs/bin/ocrconfig -export /u02/crs_bak/ocr_bak/exp/bo2dbs/ocr_bak.dmp
root's password:
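The example above always writes to the same ocr_bak.dmp, so a daily export would overwrite the previous one. One way around this is to put a timestamp in the file name. The sketch below only builds the path; the function name ocr_export_path and the ocr_<timestamp>.dmp naming scheme are assumptions for illustration, not an Oracle convention.

```shell
#!/bin/sh
# Hypothetical helper: build a per-node, timestamped export file name such as
#   <base_dir>/<node>/ocr_20130225_101000.dmp
# The third argument is optional and defaults to the current time.
ocr_export_path() {
    base_dir=$1
    node=$2
    stamp=${3:-$(date +%Y%m%d_%H%M%S)}
    printf '%s/%s/ocr_%s.dmp\n' "$base_dir" "$node" "$stamp"
}
```

Run as root, it could be combined with the export command used above, for example: /u01/oracle/crs/bin/ocrconfig -export "$(ocr_export_path /u02/crs_bak/ocr_bak/exp bo2dbp)"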
III. Restoring the OCR
1. Restoring a damaged OCR from a healthy OCR mirror
a. First, simulate OCR corruption
oracle@bo2dbp:~> dd if=/dev/zero of=/dev/raw/raw1 bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.24662 seconds, 42.5 MB/s

oracle@bo2dbp:~> ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 204560
Used space (kbytes) : 6184
Available space (kbytes) : 198376
ID : 1512159503
Device/File Name : /dev/raw/raw1
Device/File integrity check failed
Device/File Name : /dev/raw/raw11
Device/File integrity check succeeded

Cluster registry integrity check succeeded

oracle@bo2dbp:~> ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 204560
Used space (kbytes) : 6184
Available space (kbytes) : 198376
ID : 1512159503
Device/File Name : /dev/raw/raw1
Device/File needs to be synchronized with the other device
Device/File Name : /dev/raw/raw11
Device/File integrity check succeeded

Cluster registry integrity check succeeded

# Although the OCR is now damaged, the whole cluster remains online; this is not shown here and readers can verify it themselves
# Next, repair the OCR

b. Verify that the raw device is available
oracle@bo2dbp:~> sudo -s rcraw status | grep raw1
root's password:
/dev/raw/raw1: bound to major 8, minor 33
/dev/raw/raw11: bound to major 8, minor 113

c. Verify the raw device permissions
oracle@bo2dbp:~> ls -hltr /dev/raw/raw1
crw-rw---- 1 oracle dba 162, 1 2013-02-05 16:00 /dev/raw/raw1
oracle@bo2dbp:~> ssh bo2dbs ls -hltr /dev/raw/raw1
crw-rw---- 1 oracle dba 162, 1 2013-02-05 10:28 /dev/raw/raw1

d. Reinitialize the raw device
oracle@bo2dbp:~> dd if=/dev/zero of=/dev/raw/raw1 bs=1024k count=200
dd: writing `/dev/raw/raw1': No space left on device
200+0 records in
199+0 records out
209698816 bytes (210 MB) copied, 4.84775 seconds, 43.3 MB/s

e. Restore the primary OCR from the mirror OCR
# This is effectively the same as adding a new OCR: the primary OCR copies its contents from the mirror OCR.
# A damaged mirror OCR can be repaired the same way, using ocrconfig -replace ocrmirror.
oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/ocrconfig -replace ocr /dev/raw/raw1
root's password:
oracle@bo2dbp:~> ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 204560
Used space (kbytes) : 6184
Available space (kbytes) : 198376
ID : 1512159503
Device/File Name : /dev/raw/raw1
Device/File integrity check succeeded
Device/File Name : /dev/raw/raw11
Device/File integrity check succeeded

Cluster registry integrity check succeeded

f. Verify the repair
oracle@bo2dbp:~> cluvfy comp ocr -n all

Verifying OCR integrity

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful.

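The ocrcheck output in this procedure reports one status line per device: "integrity check succeeded", "integrity check failed", or "needs to be synchronized with the other device". For monitoring, that output can be filtered automatically. The following is a rough sketch that assumes the output layout shown in this article; the function name ocr_unhealthy_devices is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ocrcheck` output on stdin and print every device
# whose status line is anything other than "integrity check succeeded".
# Returns 1 if at least one such device was found, 0 otherwise.
ocr_unhealthy_devices() {
    awk '
        /Device\/File Name/ {
            sub(/^.*:[ \t]*/, "")      # keep only the device path
            sub(/[ \t]+$/, "")
            dev = $0
            next
        }
        dev != "" && /Device\/File/ {  # the status line that follows the name
            if ($0 !~ /integrity check succeeded/) { print dev; bad = 1 }
            dev = ""
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

A possible use is: ocrcheck | ocr_unhealthy_devices || echo "OCR needs attention". Note that this only works while ocrcheck itself can still run.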
2. Restoring the OCR from a logical backup (an exported file)
a. First, check the OCR location
oracle@bo2dbp:~> more /etc/oracle/ocr.loc
#Device/file /dev/raw/raw1 getting replaced by device /dev/raw/raw1
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw11
local_only=false

b. Stop CRS on both nodes
oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/crsctl stop crs
root's password:
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
oracle@bo2dbp:~> ps -ef | grep d.bin | grep -v grep

oracle@bo2dbs:~> sudo -s /u01/oracle/crs/bin/crsctl stop crs
root's password:
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
# Author : Robinson
# Blog : http://blog.csdn.net/robinson_0612
oracle@bo2dbs:~> ps -ef | grep d.bin | grep -v grep

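The empty result of ps -ef | grep d.bin above is what confirms that the CRS daemons have stopped. In a script, the same check can be made explicit. The sketch below is an illustration under assumptions: the function name list_crs_daemons is hypothetical, and it looks only for the three 10g daemon binaries that appear in this article (crsd.bin, evmd.bin, ocssd.bin).

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the lines that
# belong to the CRS daemons crsd.bin, evmd.bin, and ocssd.bin.
# Returns 0 if any daemon is still running, 1 if none was found.
# Requiring a "/" before the binary name avoids matching a "grep d.bin" line.
list_crs_daemons() {
    awk '/\/(crsd|evmd|ocssd)\.bin/ { print; n++ }
         END { exit (n > 0 ? 0 : 1) }'
}
```

For example, before damaging or importing the OCR: if ps -ef | list_crs_daemons; then echo "CRS is still running on this node"; fi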
c. Corrupt the OCR
oracle@bo2dbp:~> dd if=/dev/zero of=/dev/raw/raw1 bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.1811 seconds, 57.9 MB/s
oracle@bo2dbp:~> dd if=/dev/zero of=/dev/raw/raw11 bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.167224 seconds, 62.7 MB/s

oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
oracle@bo2dbp:~> ps -ef | grep d.bin | grep -v grep
oracle@bo2dbp:~> ./crs_stat.sh # this check can no longer communicate with CRS
Resource name Target State
-------------- ------ -----
error connecting to CRSD at [(ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))] clsccon 184
oracle@bo2dbp:~> crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

d. Restore the OCR from the exported backup file
oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/ocrconfig -import /u02/crs_bak/ocr_bak/exp/bo2dbp/ocr_bak.dmp
oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
oracle@bo2dbp:~> ps -ef | grep d.bin | grep -v grep
oracle 27209 23220 0 10:32 ? 00:00:00 /u01/oracle/crs/bin/evmd.bin
root 27307 23392 0 10:32 ? 00:00:01 /u01/oracle/crs/bin/crsd.bin reboot
oracle 27613 27153 0 10:32 ? 00:00:00 /u01/oracle/crs/bin/ocssd.bin

# Start CRS on the second node
oracle@bo2dbs:~> sudo -s /u01/oracle/crs/bin/crsctl start crs
root's password:
Attempting to start CRS stack
The CRS stack will be started shortly

e. Run ocrcheck on the second node; the check now succeeds
oracle@bo2dbs:~> ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 204560
Used space (kbytes) : 6184
Available space (kbytes) : 198376
ID : 1325424958
Device/File Name : /dev/raw/raw1
Device/File integrity check succeeded
Device/File Name : /dev/raw/raw11
Device/File integrity check succeeded

Cluster registry integrity check succeeded

oracle@bo2dbs:~> cluvfy comp ocr -n all # verify with the cluvfy tool

Verifying OCR integrity

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful.

3. Restoring the OCR from a physical backup
a. View the OCR backup information
oracle@bo2dbp:~> ocrconfig -showbackup

bo2dbp 2013/02/25 06:23:15 /u01/oracle/crs/cdata/crs

bo2dbp 2013/02/25 02:23:13 /u01/oracle/crs/cdata/crs

bo2dbp 2013/02/24 22:23:13 /u01/oracle/crs/cdata/crs

bo2dbp 2013/02/24 02:23:09 /u01/oracle/crs/cdata/crs

bo2dbp 2013/02/22 18:23:04 /u01/oracle/crs/cdata/crs

oracle@bo2dbp:~> ls -hltr /u01/oracle/crs/cdata/crs # the OCR backups are currently on node 1
total 40M
-rw-r--r-- 1 root root 6.7M 2013-02-22 18:23 week.ocr
-rw-r--r-- 1 root root 6.7M 2013-02-24 02:23 day.ocr
-rw-r--r-- 1 root root 6.7M 2013-02-24 22:23 backup02.ocr
-rw-r--r-- 1 root root 6.7M 2013-02-25 02:23 backup01.ocr
-rw-r--r-- 1 root root 6.7M 2013-02-25 02:23 day_.ocr
-rw-r--r-- 1 root root 6.7M 2013-02-25 06:23 backup00.ocr

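Because the physical backups exist only on the node that took them, it helps to record which node and directory hold the most recent one before the OCR is damaged (once it is damaged, ocrconfig -showbackup itself fails). The sketch below assumes the four-column layout shown above, with the most recent backup listed first; the function name latest_ocr_backup_location is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ocrconfig -showbackup` output on stdin and print
# "<node> <directory>" for the first backup entry, which is the most recent
# one in the output format shown in this article. Blank lines are skipped.
latest_ocr_backup_location() {
    awk 'NF >= 4 && !done { print $1, $NF; done = 1 }'
}
```

For example: ocrconfig -showbackup | latest_ocr_backup_location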
b. Corrupt the OCR
oracle@bo2dbp:~> dd if=/dev/zero of=/dev/raw/raw1 bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.279904 seconds, 37.5 MB/s
oracle@bo2dbp:~> dd if=/dev/zero of=/dev/raw/raw11 bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.145885 seconds, 71.9 MB/s

# At this point every OCR-related operation fails
oracle@bo2dbp:~> ocrcheck
Segmentation fault (core dumped)
oracle@bo2dbp:~> ocrconfig -showbackup
Segmentation fault (core dumped)
oracle@bo2dbp:~> crs_stat -t
Segmentation fault (core dumped)

# The ASM instance and the RAC instance are still online
oracle@bo2dbp:~> ps -ef | grep pmon
oracle 7915 1 0 10:09 ? 00:00:00 asm_pmon_+ASM1
oracle 9234 1 0 10:10 ? 00:00:00 ora_pmon_ora10g1
oracle 31704 11229 0 10:26 pts/0 00:00:00 grep pmon

c. Shut down CRS, the cluster database, and ASM
oracle@bo2dbp:~> export ORACLE_SID=ora10g1
oracle@bo2dbp:~> sqlplus / as sysdba

SQL> show parameter db_name

NAME TYPE VALUE
------------------------------------ ----------- ------------
db_name string ora10g

# Check the OCR location now, so that the matching raw devices can be identified during the restore
oracle@bo2dbp:~> more /etc/oracle/ocr.loc
#Device/file /dev/raw/raw1 getting replaced by device /dev/raw/raw1
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw11
local_only=false

# Stopping CRS returns an error
oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/crsctl stop crs
root's password:
Segmentation fault

oracle@bo2dbs:~> sudo -s /u01/oracle/crs/bin/crsctl stop crs
root's password:
Segmentation fault

# The query below shows that the crsd process has already crashed
oracle@bo2dbp:~> ps -ef | grep d.bin | grep -v grep
oracle 5844 5189 0 10:09 ? 00:00:00 /u01/oracle/crs/bin/evmd.bin
oracle 6357 5883 0 10:09 ? 00:00:04 /u01/oracle/crs/bin/ocssd.bin

# Shut down the cluster database
oracle@bo2dbp:~> export ORACLE_SID=ora10g1
oracle@bo2dbp:~> sqlplus / as sysdba

SQL> shutdown immediate;

oracle@bo2dbs:~> export ORACLE_SID=ora10g2
oracle@bo2dbs:~> sqlplus / as sysdba

SQL> shutdown immediate;

d. Reboot the nodes
bo2dbp:~ # reboot
bo2dbs:~ # reboot

e. Verify the OCR raw devices and their permissions
# Verify that the raw devices are available
oracle@bo2dbp:~> sudo -s rcraw status | grep raw1
root's password:
/dev/raw/raw1: bound to major 8, minor 33
/dev/raw/raw11: bound to major 8, minor 113

# Verify the raw device permissions
oracle@bo2dbp:~> ls -hltr /dev/raw/raw1
crw-rw---- 1 oracle dba 162, 1 2013-02-05 16:00 /dev/raw/raw1
oracle@bo2dbp:~> ssh bo2dbs ls -hltr /dev/raw/raw1
crw-rw---- 1 oracle dba 162, 1 2013-02-05 10:28 /dev/raw/raw1

# Wipe the raw devices
oracle@bo2dbp:~> dd if=/dev/zero of=/dev/raw/raw1 bs=1024k count=200
dd: writing `/dev/raw/raw1': No space left on device
200+0 records in
199+0 records out
209698816 bytes (210 MB) copied, 4.84775 seconds, 43.3 MB/s
oracle@bo2dbp:~> dd if=/dev/zero of=/dev/raw/raw11 bs=1024k count=200
dd: writing `/dev/raw/raw11': No space left on device
200+0 records in
199+0 records out
209698816 bytes (210 MB) copied, 2.30847 seconds, 90.8 MB/s

f. Restore the OCR from the physical backup
oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/ocrconfig -restore /u01/oracle/crs/cdata/crs/backup00.ocr
root's password:

oracle@bo2dbp:~> sudo -s /u01/oracle/crs/bin/crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

oracle@bo2dbs:~> sudo -s /u01/oracle/crs/bin/crsctl start crs
root's password:
Attempting to start CRS stack
The CRS stack will be started shortly

g. Verify the result of the restore
oracle@bo2dbp:~> ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 204560
Used space (kbytes) : 6184
Available space (kbytes) : 198376
ID : 1512159503
Device/File Name : /dev/raw/raw1
Device/File integrity check succeeded
Device/File Name : /dev/raw/raw11
Device/File integrity check succeeded

Cluster registry integrity check succeeded

oracle@bo2dbp:~> cluvfy comp ocr -n all

Verifying OCR integrity

Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Verification of OCR integrity was successful.

# Verify the application resources
oracle@bo2dbp:~> ./crs_stat.sh | grep bo2dbp
ora.bo2dbp.ASM1.asm ONLINE ONLINE on bo2dbp
ora.bo2dbp.LISTENER_BO2DBP.lsnr ONLINE ONLINE on bo2dbp
ora.bo2dbp.LISTENER_ORA10G_BO2DBP.lsnr ONLINE ONLINE on bo2dbp
ora.bo2dbp.gsd ONLINE ONLINE on bo2dbp
ora.bo2dbp.ons ONLINE ONLINE on bo2dbp
ora.bo2dbp.vip ONLINE ONLINE on bo2dbp
ora.ora10g.ora10g1.inst