Simulating an Oracle 11g RAC Voting Disk Failure (Recovering OCR and OLR Without a Backup)


title: Simulating an Oracle 11g RAC Voting Disk Failure (Recovering OCR and OLR Without a Backup)
categories: Database
tags:
- RAC
- Oracle
timezone: Asia/Shanghai
date: 2019-01-06

Environment

[root@redhat6 ~]# cat  /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.10 (Santiago)

p13390677_112040_Linux-x86-64_1of7.zip
p13390677_112040_Linux-x86-64_2of7.zip
p13390677_112040_Linux-x86-64_3of7.zip

Simulating disk data corruption

Check the voting disks
[grid@rac1 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   a77aa6d0e5274fc3bf891bfceba9e54b (/dev/raw/raw1) [DATA]
 2. ONLINE   0968eed4e9f24fc8bf2986e1edd74a8b (/dev/raw/raw2) [DATA]
 3. ONLINE   3ad2768be3f54f14bf52fd4805921bb5 (/dev/raw/raw5) [DATA]

Corrupt the voting disks
dd if=/dev/zero of=/dev/raw/raw1 bs=1024K count=1
dd if=/dev/zero of=/dev/raw/raw2 bs=1024K count=1
dd if=/dev/zero of=/dev/raw/raw5 bs=1024K count=1

Checking the server cluster status now shows that every resource reports errors.
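For example, with the same crsctl / crs_stat commands used at the end of this post (a sketch; output omitted, since at this point everything simply errors out):

[grid@rac1 ~]$ crsctl check crs       # CRS/CSS/EVM checks fail once the voting disks are gone
[grid@rac1 ~]$ crs_stat -t -v         # resource listing errors out as well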

Checking the listener status also returns errors; the cluster is completely down.

[oracle@rac1 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 29-DEC-2018 18:15:30

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
TNS-12541: TNS:no listener
 TNS-12560: TNS:protocol adapter error
  TNS-00511: No listener
   Linux Error: 111: Connection refused

ASMCMD no longer shows the voting-disk diskgroup (DATA)

[grid@rac1 ~]$ asmcmd         
ASMCMD> ls
BAK/
DB/

Log in to the ASM instance

[grid@rac1 ~]$ sqlplus / as sysasm

Check the diskgroup status (the DATA group's capacity has dropped to 0)

SQL> select group_number,name,state,total_mb,free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE         TOTAL_MB    FREE_MB
------------ ------------------------------ ----------- ---------- ----------
           1 BAK                            MOUNTED          10239      10144
           2 DATA                           MOUNTED              0          0
           3 DB                             MOUNTED          20479      16110
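To see which individual disks were affected, a query against v$asm_disk from the same session can help; this is only a sketch, using standard 11.2 view columns:

SQL> select group_number, path, header_status, mount_status from v$asm_disk;
-- disks whose headers were overwritten by dd would be expected to lose their MEMBER header status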

Recovery after the corruption

1. Force-stop CRS (on all nodes)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'

CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped

[root@rac2 ~]# /u01/app/11.2.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
CRS-2673: Attempting to stop 'ora.crf' on 'rac2'
CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
2. Start CRS in exclusive mode. Starting the stack with -excl -nocrs brings the cluster up in exclusive mode and starts the ASM instance, but does not start CRS (ora.crsd). (Node 1 only)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2676: Start of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
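Before wiping the configuration, it can be worth an optional sanity check that the stack really is in exclusive mode with ASM up; a sketch of such a check could be:

[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl stat res -t -init     # ora.asm should be ONLINE; ora.crsd stays down because of -nocrs
[root@rac1 ~]# ps -ef | grep -v grep | grep asm_pmon                 # the ASM background processes should be visible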
3. As root, wipe the cluster configuration on all nodes

The recovery is essentially rebuilding the OCR with root.sh; after the rebuild, related resources such as the listener, database, and instances may need to be re-registered. Before that, the cluster configuration on every node must first be wiped with the rootcrs.pl script.

(In 11g the rootcrs.pl script is shipped by default under $GRID_HOME/crs/install.)

Run on node 1:

/u01/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force

Run on node 2. Since this is a two-node RAC, add the -lastnode flag when deconfiguring the last node:

/u01/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force -lastnode
4. Attach a new disk to replace the original voting disks (skip this step if you are not replacing the voting disks)

The crsconfig_params configuration file can be used both to review the previously configured voting-disk settings and to change them.
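For example, the ASM-related entries can be listed with a simple grep before editing (a convenience check; the key names mentioned in the comment are the usual ones in an 11.2 crsconfig_params, not quoted from this environment):

[root@rac1 ~]# grep '^ASM_' /u01/app/11.2.0/grid/crs/install/crsconfig_params    # e.g. ASM_DISK_GROUP, ASM_DISCOVERY_STRING, ASM_DISKS, ASM_REDUNDANCY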

1. Attach the new disk via iSCSI and reboot the servers for it to take effect (on all nodes)

ll /dev/sd*
brw-rw---- 1 root disk 8, 96 Dec 29 18:38 /dev/sdg     # the extra disk, sdg

2. Set up the raw device (on all nodes; the fdisk partitioning only needs to be run on one node)

fdisk /dev/sdg

vim /etc/udev/rules.d/60-raw.rules      # add the following line
ACTION=="add", KERNEL=="sdg1", RUN+="/bin/raw /dev/raw/raw6 %N"
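The listing in the next step shows the raw devices owned by grid:asmadmin with mode 660, so 60-raw.rules presumably also needs an ownership/permission rule for the new device; a typical line (an assumption, not quoted from this environment) would be:

ACTION=="add", KERNEL=="raw6", OWNER="grid", GROUP="asmadmin", MODE="0660"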

3. Restart udev and verify (on all nodes)

/sbin/start_udev

[root@rac1 ~]# ll /dev/raw/
total 0
crw-rw---- 1 grid asmadmin 162, 1 Dec 30 11:00 raw1
crw-rw---- 1 grid asmadmin 162, 2 Dec 30 11:00 raw2
crw-rw---- 1 grid asmadmin 162, 3 Dec 30 11:00 raw3
crw-rw---- 1 grid asmadmin 162, 4 Dec 30 11:00 raw4
crw-rw---- 1 grid asmadmin 162, 5 Dec 30 11:00 raw5
crw-rw---- 1 grid asmadmin 162, 6 Dec 30 11:00 raw6        # the newly attached raw device
crw-rw---- 1 root disk     162, 0 Dec 30 11:00 rawctl

4. Edit the crsconfig_params configuration file. It records the configuration that the upcoming RAC rebuild will use; since we are going from three voting disks down to one, the entries must be updated accordingly. (on all nodes)

vim /u01/app/11.2.0/grid/crs/install/crsconfig_params

Change
ASM_DISKS=/dev/raw/raw1,/dev/raw/raw2,/dev/raw/raw5
ASM_REDUNDANCY=NORMAL
to
ASM_REDUNDANCY=External
ASM_DISKS=/dev/raw/raw6
5. As root, rebuild the OCR and OLR (on all nodes)

The rebuild is done with the root.sh script, the same script executed during the RAC installation; it lives under $GRID_HOME/ by default.

/u01/app/11.2.0/grid/root.sh
6. As the grid user, add the listener configuration to Oracle Clusterware. (on all nodes)
srvctl add listener -l listener

[grid@rac1 ~]$ srvctl add listener -h
Adds a listener configuration to Oracle Clusterware.
Usage: srvctl add listener [-l <lsnr_name>] [-s] [-p "[TCP:]<port>[, ...][/IPC:<key>][/NMP:<pipe_name>][/TCPS:<s_port>] [/SDP:<port>]"] [-o <oracle_home>] [-k <net_num>]
    -l <lsnr_name>           Listener name (default name is LISTENER)
    -o <oracle_home>         ORACLE_HOME path (default value is CRS_HOME)
    -k <net_num>             Network number (default number is 1)
    -s                       Skip the checking of ports
    -p "[TCP:]<port>[, ...][/IPC:<key>][/NMP:<pipe_name>][/TCPS:<s_port>] [/SDP:<port>]"       Comma separated TCP ports or listener endpoints
    -h                       Print usage
7. As the oracle user, add the database configuration to Oracle Clusterware. (on all nodes)
srvctl add database -d oracledbrac -o /u01/app/oracle/product/11.2.0/db -c RAC

[oracle@rac2 ~]$ srvctl add database -h
Adds a database configuration to Oracle Clusterware.
Usage: srvctl add database -d <db_unique_name> -o <oracle_home> [-c {RACONENODE | RAC | SINGLE} [-e <server_list>] [-i <inst_name>] [-w <timeout>]] [-m <domain_name>] [-p <spfile>] [-r {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY | SNAPSHOT_STANDBY}] [-s <start_options>] [-t <stop_options>] [-n <db_name>] [-y {AUTOMATIC | MANUAL | NORESTART}] [-g \"<serverpool_list>\"] [-x <node_name>] [-a \"<diskgroup_list>\"] [-j \"<acfs_path_list>\"]
    -d <db_unique_name>      Unique name for the database
    -o <oracle_home>         ORACLE_HOME path
    -c <type>                Type of database: RAC One Node, RAC, or Single Instance
    -e <server_list>         Candidate server list for a RAC One Node database
    -i <inst_name>           Instance name prefix for an administrator-managed RAC One Node database (default is the first 12 characters of <db_unique_name>)
    -w <timeout>             Online relocation timeout in minutes
    -x <node_name>           Node name. Specify the -x option for single-instance databases
    -m <domain>              Domain for the database. Must be set if the database has DB_DOMAIN set.
    -p <spfile>              Server parameter file path
    -r <role>                Role of the database (primary, physical_standby, logical_standby, snapshot_standby)
    -s <start_options>       Startup options for the database. Examples of startup options are OPEN, MOUNT, or 'READ ONLY'.
    -t <stop_options>        Stop options for the database. Examples of shutdown options are NORMAL, TRANSACTIONAL, IMMEDIATE, or ABORT.
    -n <db_name>             Database name (DB_NAME), if different from the unique name given by the -d option
    -y <dbpolicy>            Management policy for the database (AUTOMATIC, MANUAL, or NORESTART)
    -g "<serverpool_list>"   Comma separated list of database server pool names
    -a "<diskgroup_list>"    Comma separated list of disk groups
    -j "<acfs_path_list>"    Comma separated list of ACFS paths on which the database depends
    -h                       Print usage
8. As the oracle user, add the database instance configuration to Oracle Clusterware. (This can be run from a single server.)
srvctl add instance -d oracledbrac -i orcl1 -n rac1
srvctl add instance -d oracledbrac -i orcl2 -n rac2

[oracle@rac1 ~]$ srvctl add instance -h
Adds a database instance configuration to Oracle Clusterware.
Usage: srvctl add instance -d <db_unique_name> -i <inst_name> -n <node_name> [-f]
    -d <db_unique_name>      Unique name for the database
    -i <inst>                Instance name
    -n <node_name>           Node name
    -f                       Force the add operation even though some resource(s) will be stopped
    -h                       Print usage
9. As root, restart CRS (running it on one node is sufficient)
[root@rac1 ~]# /u01/app/11.2.0/grid/bin/crsctl start cluster -all
CRS-4690: Oracle Clusterware is already running on 'rac1'
CRS-4690: Oracle Clusterware is already running on 'rac2'
CRS-4000: Command Start failed, or completed with errors.

A CRS-4000 error is reported here; on inspection, the ASM data diskgroups were not mounted on either node.

Option 1: mount the diskgroups with asmca in the GUI
Option 2: mount the diskgroups with asmcmd on the command line (see the sketch below)
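A minimal sketch of option 2, run as the grid user on each node (assuming the unmounted groups are DB and BAK; the database files live in DB according to the configuration shown later):

[grid@rac1 ~]$ asmcmd mount DB
[grid@rac1 ~]$ asmcmd mount BAK
[grid@rac1 ~]$ asmcmd lsdg           # verify that all diskgroups now show MOUNTED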

Once the diskgroups are mounted, manually start the database instances on both nodes.

sqlplus / as sysdba
SQL> startup;
ORACLE instance started.

Total System Global Area  830930944 bytes
Fixed Size                  2257800 bytes
Variable Size             599788664 bytes
Database Buffers          226492416 bytes
Redo Buffers                2392064 bytes
Database mounted.
Database opened.

With both instances up, try connecting to the database through the SCAN IP with a client tool to confirm it works. The connection succeeds, so the cluster has been restored.
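For example, an EZConnect string through the SCAN can be used for the test (the SCAN host and password below are placeholders; the service name oracledbrac is the one registered above):

sqlplus system/<password>@//<scan-host>:1521/oracledbrac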

10. Check the cluster status; everything is back to normal.

After a server reboot, everything again checks out, which completes the recovery of the RAC cluster with no OCR backup available.

[grid@rac1 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 30-DEC-2018 12:08:27
Copyright (c) 1991, 2013, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                30-DEC-2018 12:02:03
Uptime                    0 days 0 hr. 6 min. 24 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/11.2.0/grid/log/diag/tnslsnr/rac1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.0.1.131)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.0.1.133)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "oracledbrac" has 1 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
Service "orclXDB" has 1 instance(s).
  Instance "orcl1", status READY, has 1 handler(s) for this service...
The command completed successfully

[grid@rac1 ~]$ srvctl config database -d oracledbrac
Database unique name: oracledbrac
Database name: 
Oracle home: /u01/app/oracle/product/11.2.0/db
Oracle user: oracle
Spfile: 
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: oracledbrac
Database instances: orcl1,orcl2
Disk Groups: DB
Mount point paths: 
Services: 
Type: RAC
Database is administrator managed

[grid@rac1 ~]$ asmcmd lsdsk --statistics -G DATA
Reads  Write  Read_Errs  Write_Errs  Read_time  Write_Time  Bytes_Read  Bytes_Written  Voting_File  Path
 1418   5029          0           0   1.425113  406.580839    21956608       65479680            Y  /dev/raw/raw6

[grid@rac1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

[grid@rac1 ~]$ crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host        
----------------------------------------------------------------------
ora.BAK.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
ora.DATA.dg    ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
ora.DB.dg      ora....up.type 0/5    0/     ONLINE    ONLINE    rac1        
ora....ER.lsnr ora....er.type 0/5    0/     ONLINE    ONLINE    rac1        
ora....N1.lsnr ora....er.type 0/5    0/0    ONLINE    ONLINE    rac1        
ora.asm        ora.asm.type   0/5    0/     ONLINE    ONLINE    rac1        
ora.cvu        ora.cvu.type   0/5    0/0    ONLINE    ONLINE    rac1        
ora.gsd        ora.gsd.type   0/5    0/     OFFLINE   OFFLINE               
ora....network ora....rk.type 0/5    0/     ONLINE    ONLINE    rac1        
ora.oc4j       ora.oc4j.type  0/1    0/2    ONLINE    ONLINE    rac1        
ora.ons        ora.ons.type   0/3    0/     ONLINE    ONLINE    rac1        
ora....brac.db ora....se.type 0/2    0/1    ONLINE    ONLINE    rac1        
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1        
ora.rac1.gsd   application    0/5    0/0    OFFLINE   OFFLINE               
ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1        
ora.rac1.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac1        
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2        
ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2        
ora.rac2.gsd   application    0/5    0/0    OFFLINE   OFFLINE               
ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2        
ora.rac2.vip   ora....t1.type 0/0    0/0    ONLINE    ONLINE    rac2        
ora....ry.acfs ora....fs.type 0/5    0/     ONLINE    ONLINE    rac1        
ora.scan1.vip  ora....ip.type 0/0    0/0    ONLINE    ONLINE    rac1 

[grid@rac1 ~]$ olsnodes -n
rac1    1
rac2    2

[grid@rac1 ~]$ ps -ef|grep lsnr|grep -v 'grep'|grep -v 'ocfs'|awk '{print$9}'
LISTENER_SCAN1
LISTENER

[grid@rac1 ~]$ srvctl status asm -a
ASM is running on rac2,rac1
ASM is enabled.

Appendix: rootcrs.pl usage help
[root@rac1 ~]# /u01/app/11.2.0/grid/crs/install/rootcrs.pl --help
Usage:
      rootcrs.pl [-verbose] [-upgrade [-force] | -patch]
                 [-paramfile <parameter-file>] 
                 [-deconfig [-deinstall] [-keepdg] [-force] [-lastnode]]
                 [-downgrade -oldcrshome <old crshome path> -version <old crs version> [-force] [-lastnode]]  
                 [-unlock [-crshome <path to crs home>] [-nocrsstop]]
                 [-init]

      Options:
       -verbose    Run this script in verbose mode
       -upgrade    Oracle HA is being upgraded from previous version
       -patch      Oracle HA is being upgraded to a patch version
       -paramfile  Complete path of file specifying HA parameter values
       -lastnode   Force the node this script is executing on to be considered
                   as the last node of deconfiguration or downgrade, and perform
                   actions associated with deconfiguring or downgrading the
                    last node
       -downgrade  Downgrade the clusterware
       -version    For use with downgrade; special handling is required if
                   downgrading to 9i. This is the old crs version in the format
                   A.B.C.D.E (e.g 11.1.0.6.0).
       -deconfig   Remove Oracle Clusterware to allow it to be uninstalled or reinstalled
       -force      Force the execution of steps in delete or dwongrade that cannot
                   be verified to be safe
       -deinstall  Reset the permissions on CRS home during de-configuration
       -keepdg     Keep existing diskgroups during de-configuration
       -unlock     Unlock CRS home 
       -crshome    Complete path of crs home. Use with unlock option
       -oldcrshome For use with downgrade. Complete path of the old crs home
       -nocrsstop  used with unlock option to reset permissions on an inactive grid home
       -init       Reset the permissions of all files and directories under CRS home


      If neither -upgrade nor -patch is supplied, a new install is performed

      To see the full manpage for this program, execute:
        perldoc rootcrs.pl