Recovering when the CSSD service fails to start after replacing the multipath software in a RAC environment

System environment: CentOS 6.8
Database version: 11.2.0.4.0

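After the Huawei multipath software was replaced with the device-mapper-multipath package that ships with CentOS, RAC cluster startup hung at the CSSD service, whose state stayed at "starting".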

 

In 11gR2 the CRS service depends on ASM, because the OCR is stored in ASM. If ASM cannot start properly, CRS cannot work either:

Cluster alert log:

2022-03-26 12:08:02.469: 
[ohasd(28938)]CRS-2112:The OLR service started on node RAC01.
2022-03-26 12:08:02.486: 
[ohasd(28938)]CRS-1301:Oracle High Availability Service started on node RAC01.
2022-03-26 12:08:02.493: 
[ohasd(28938)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2022-03-26 12:08:06.212: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(29065)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running). 
2022-03-26 12:08:10.405: 
[ohasd(28938)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running). 
2022-03-26 12:08:10.412: 
[gpnpd(29482)]CRS-2328:GPNPD started on node RAC01. 
2022-03-26 12:08:12.745: 
[cssd(29564)]CRS-1713:CSSD daemon is started in clustered mode
2022-03-26 12:08:14.621: 
[ohasd(28938)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2022-03-26 12:08:14.621: 
[ohasd(28938)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2022-03-26 12:08:17.884: 
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:08:32.900: 
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:08:47.916: 
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:09:02.932: 
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:09:17.949: 
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:09:32.965: 
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:09:47.981: 
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:10:02.998: 
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
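The retry loop in the alert log above can be confirmed with a small parser. The following is only a sketch: it assumes the two-line layout seen in this excerpt (a timestamp line followed by a `[process(pid)]CRS-nnnn:message` line), and the names `parse_alert_log` and `SAMPLE_ALERT_LOG` are made up for illustration.

```python
import re
from collections import Counter

# Three events copied from the alert log above.
SAMPLE_ALERT_LOG = """\
2022-03-26 12:08:12.745:
[cssd(29564)]CRS-1713:CSSD daemon is started in clustered mode
2022-03-26 12:08:17.884:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:08:32.900:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
"""

TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):\s*$")
MESSAGE = re.compile(r"^\[(.+?)\((\d+)\)\](CRS-\d+):(.*)$")


def parse_alert_log(text):
    """Return (timestamp, process, code, message) tuples from alert log text."""
    events, pending_ts = [], None
    for line in text.splitlines():
        ts = TIMESTAMP.match(line)
        if ts:
            pending_ts = ts.group(1)  # remember it for the message line that follows
            continue
        msg = MESSAGE.match(line)
        if msg and pending_ts is not None:
            events.append((pending_ts, msg.group(1), msg.group(3), msg.group(4).strip()))
            pending_ts = None
    return events


events = parse_alert_log(SAMPLE_ALERT_LOG)
counts = Counter(code for _, _, code, _ in events)
retries = [ts for ts, _, code, _ in events if code == "CRS-1714"]
print(counts["CRS-1714"], "voting file discovery retries, first at", retries[0])
```

On a real node the text would come from the Grid Infrastructure alert log file instead of the embedded sample.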

ocssd.log:


2022-03-26 12:09:47.981: [    CLSF][3717953280]checksum failed for disk:/dev/asm-datadisk01-new:
2022-03-26 12:09:47.981: [    CLSF][3717953280]Error: obj 2147483658 blk 0 name 'check_kfbh' num1 1289612970 num2 2751807285
2022-03-26 12:09:47.981: [    CLSF][3717953280]bh: ptr 0x7f5dc8138e00 size 512
2022-03-26 12:09:47.981: [   SKGFD][3717953280]bh:  dump of 0x0x7f5dc8138e00, len 512
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138e00 01 82 01 01 00 00 00 00 - 0a 00 00 80 aa ee dd 4c ...............L
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138e10 a0 6f 80 15 00 00 00 00 - 00 00 00 00 00 00 00 00 .o..............
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138e20 4f 52 43 4c 44 49 53 4b - 00 00 00 00 00 00 00 00 ORCLDISK........
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138e30 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138e40 00 00 20 0b 0a 00 01 03 - 44 41 54 41 5f 30 30 31 .. .....DATA_001
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138e50 30 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 0...............
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138e60 00 00 00 00 00 00 00 00 - 44 41 54 41 00 00 00 00 ........DATA....
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138e70 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138e80 00 00 00 00 00 00 00 00 - 44 41 54 41 5f 30 30 31 ........DATA_001
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138e90 30 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 0...............
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138ea0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138eb0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138ec0 00 00 00 00 00 00 00 00 - d1 8e f9 01 00 1c 9b 10 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138ed0 d1 8e f9 01 00 28 9b 10 - 00 02 00 10 00 00 10 00 .....(..........
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138ee0 80 bc 01 00 00 00 20 00 - 14 00 00 00 01 00 00 00 ...... .........
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138ef0 02 00 00 00 12 1c 0d 00 - 0a 00 ff ff ff ff ff ff ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138f00 00 00 10 0a ab 51 f8 01 - 00 b0 35 65 00 00 00 00 .....Q....5e....
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138f10 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138f20 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138f30 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138f40 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138f50 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138f60 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138f70 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138f80 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138f90 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138fa0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138fb0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138fc0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138fd0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138fe0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]0x0x7f5dc8138ff0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc81388e0 for disk :/dev/asm-datadisk01-new:

2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc8139110 for disk :/dev/asm-datadisk02-new:

2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc813abb0 for disk :/dev/asm-datadisk09-new:

2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc813b660 for disk :/dev/asm-datadisk03-new:

2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc813c280 for disk :/dev/asm-datadisk10-new:

2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc81413c0 for disk :/dev/asm-datadisk05-new:

2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc8141fe0 for disk :/dev/asm-datadisk04-new:

2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc8142c00 for disk :/dev/asm-datadisk06-new:

2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc8143820 for disk :/dev/asm-datadisk08-new:

2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc8144440 for disk :/dev/asm-datadisk07-new:

2022-03-26 12:09:47.981: [    CSSD][3717953280]clssnmvDiskVerify: Successful discovery of 0 disks
2022-03-26 12:09:47.981: [    CSSD][3717953280]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2022-03-26 12:09:47.981: [    CSSD][3717953280]clssnmvFindInitialConfigs: No voting files found
2022-03-26 12:09:47.982: [    CSSD][3717953280](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2022-03-26 12:09:47.986: [    CSSD][3720496896]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x7f5dd406ed80) client((nil))
2022-03-26 12:09:47.986: [    CSSD][3720496896]clssgmDeadProc: proc 0x7f5dd406ed80
2022-03-26 12:09:47.986: [    CSSD][3720496896]clssgmDestroyProc: cleaning up proc(0x7f5dd406ed80) con(0x896) skgpid  ospid 29509 with 0 clients, refcount 0
2022-03-26 12:09:47.986: [    CSSD][3720496896]clssgmDiscEndpcl: gipcDestroy 0x896
2022-03-26 12:09:52.573: [    CSSD][3720496896]clssscSelect: cookie accept request 0x26156b0
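The first 16 bytes of the hex dump above can be decoded to cross-check the `check_kfbh` error line. This is a sketch under an assumption: the field order (endian, hard, type, datfmt, block number, object, checksum) is taken from the order in which `kfed` prints the `kfbh` fields, not from the source, and `decode_block_header` is a hypothetical helper name.

```python
import struct

# First 16 bytes of the block dumped for /dev/asm-datadisk01-new in ocssd.log.
HEADER_HEX = "01 82 01 01 00 00 00 00 0a 00 00 80 aa ee dd 4c"


def decode_block_header(hex_bytes):
    """Decode the leading block-header fields (assumed layout, little-endian)."""
    raw = bytes.fromhex(hex_bytes)
    # Four 1-byte fields followed by three unsigned 32-bit integers.
    endian, hard, blk_type, datfmt, blk, obj, check = struct.unpack("<4B3I", raw[:16])
    return {
        "endian": endian,
        "hard": hard,
        "type": blk_type,
        "datfmt": datfmt,
        "blk": blk,
        "obj": obj,
        "check": check,
    }


header = decode_block_header(HEADER_HEX)
print(header)
```

Under that layout the decoded values are `blk` 0, `obj` 2147483658 and `check` 1289612970, which are the `blk`, `obj` and `num1` values in the error line. So `num1` appears to be the checksum stored on disk, and `num2` is presumably the value CSSD computed. The ASCII column of the dump also shows `ORCLDISK`, `DATA_0010` and `DATA`, so the device is readable and carries an ASM disk header.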

checksum failed for disk:/dev/asm-datadisk01-new:
Error: obj 2147483658 blk 0 name 'check_kfbh' num1 1289612970 num2 2751807285
bh: ptr 0x7f5dc8138e00 size 512
bh:  dump of 0x0x7f5dc8138e00, len 512

clssnmvDiskVerify: Successful discovery of 0 disks

The voting disk cannot be found.
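The key lines highlighted above can also be pulled out of ocssd.log programmatically. This is a minimal sketch that relies only on the message wording seen in this excerpt; `summarize_ocssd` and `SAMPLE_OCSSD_LOG` are illustrative names.

```python
import re

# A few lines copied from the ocssd.log excerpt above.
SAMPLE_OCSSD_LOG = """\
2022-03-26 12:09:47.981: [    CLSF][3717953280]checksum failed for disk:/dev/asm-datadisk01-new:
2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc81388e0 for disk :/dev/asm-datadisk01-new:
2022-03-26 12:09:47.981: [   SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc8139110 for disk :/dev/asm-datadisk02-new:
2022-03-26 12:09:47.981: [    CSSD][3717953280]clssnmvDiskVerify: Successful discovery of 0 disks
"""

CHECKSUM_FAILED = re.compile(r"checksum failed for disk:(\S+?):\s*$")
CLOSED_HANDLE = re.compile(r"closing handle \S+ for disk :(\S+?):\s*$")
DISCOVERED = re.compile(r"clssnmvDiskVerify: Successful discovery of (\d+) disks")


def summarize_ocssd(text):
    """Collect scanned devices, checksum failures and the last discovery count."""
    summary = {"scanned": [], "checksum_failed": [], "discovered": None}
    for line in text.splitlines():
        failed = CHECKSUM_FAILED.search(line)
        if failed:
            summary["checksum_failed"].append(failed.group(1))
        closed = CLOSED_HANDLE.search(line)
        if closed:
            summary["scanned"].append(closed.group(1))
        found = DISCOVERED.search(line)
        if found:
            summary["discovered"] = int(found.group(1))  # keep the most recent pass
    return summary


summary = summarize_ocssd(SAMPLE_OCSSD_LOG)
print(summary)
```

A result with a non-empty `scanned` list but `discovered` equal to 0 matches the situation described here: CSSD opens the devices but accepts none of them as a voting file.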


Solution:

1. First, shut down the OHASD service completely:

crsctl stop has -f 

2. Start the stack in exclusive mode with -excl -nocrs, which brings up the ASM instance but not the CRS daemon (crsd):

crsctl start crs -excl -nocrs 

3. Set the ASM instance's asm_diskstring to the current ASM disk paths, and recreate the spfile:

[root@RAC01 ~]# su - grid

[grid@RAC01 ~]$ sqlplus  / as sysasm

SQL*Plus: Release 11.2.0.4.0 Production on Sun Jul 15 04:40:40 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> ALTER SYSTEM SET asm_diskgroups = CRS, DATA;

System altered.

SQL> alter system set asm_diskstring='/dev/asm*';

System altered.

SQL> alter diskgroup CRS mount;

Diskgroup altered.

SQL> alter diskgroup DATA mount;

Diskgroup altered.

SQL> create spfile from memory;

File created.

SQL> startup force mount;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /g01/grid/app/11.2.0/grid/dbs/
                                                 spfile+ASM1.ora

SQL> show parameter disk

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      CRS, DATA
asm_diskstring                       string      /dev/asm*

SQL> create pfile from spfile;

File created.

SQL> create spfile='+CRS' from pfile;

File created.

SQL> startup force;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +CRS/RAC-cluster/asmparameterfile/registry.253.788682933

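The steps above changed asm_diskstring and updated the SPFILE stored in the ASM disk group. Because ASM instances share a single SPFILE, the other nodes normally need no further action.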
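Before setting asm_diskstring it can help to check which device paths a candidate value would cover. The sketch below approximates ASM discovery-string matching with shell-style wildcards from Python's `fnmatch`, which is an assumption rather than Oracle's actual matching code. `/dev/sdb` and `/dev/mapper/mpatha` are hypothetical paths added for contrast; the other two names come from the logs and command output in this article.

```python
from fnmatch import fnmatchcase

# Two device names from this article plus two hypothetical non-ASM paths.
DEVICES = [
    "/dev/asm-datadisk01-new",
    "/dev/asm-crsdisk-new",
    "/dev/sdb",             # hypothetical
    "/dev/mapper/mpatha",   # hypothetical
]


def split_diskstring(value):
    """Split a comma-separated discovery string into individual patterns."""
    return [part.strip() for part in value.split(",") if part.strip()]


def discoverable(devices, diskstring):
    """Return the devices matched by at least one pattern in the discovery string."""
    patterns = split_diskstring(diskstring)
    return [dev for dev in devices if any(fnmatchcase(dev, pat) for pat in patterns)]


matched = discoverable(DEVICES, "/dev/asm*")
print(matched)
```

On a real host, `glob.glob("/dev/asm*")` would list the device nodes that actually exist after the multipath change.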

4. Use the crsctl replace votedisk command to relocate the voting disk:

[root@RAC01 ~]# crsctl replace votedisk +CRS
Successful addition of voting disk b0d8ba07a9684fcfbfe7660e829128d5.
Successfully replaced voting disk group with +CRS.
CRS-4266: Voting file(s) successfully replaced
[root@RAC01 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   b0d8ba07a9684fcfbfe7660e829128d5 (/dev/asm-crsdisk-new) [CRS]
Located 1 voting disk(s).
[root@RAC01 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2940
         Available space (kbytes) :     259180
         ID                       : 2028826513
         Device/File Name         :       +CRS
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

The voting disk has now been replaced onto the new ASM disk, and both the voting disk and the OCR are confirmed to be usable.
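The voting disk check above can be scripted by parsing the `crsctl query css votedisk` output. This sketch assumes the column layout shown in this article; `parse_votedisks` is an illustrative name, and the sample text is copied from the output above.

```python
import re

SAMPLE_VOTEDISK_OUTPUT = """\
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   b0d8ba07a9684fcfbfe7660e829128d5 (/dev/asm-crsdisk-new) [CRS]
Located 1 voting disk(s).
"""

# index. STATE  32-hex-digit id  (path)  [disk group]
ROW = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+([0-9a-f]{32})\s+\((.*?)\)\s+\[(.*?)\]\s*$")


def parse_votedisks(text):
    """Return one dict per voting disk row found in the command output."""
    disks = []
    for line in text.splitlines():
        row = ROW.match(line)
        if row:
            disks.append({
                "index": int(row.group(1)),
                "state": row.group(2),
                "fuid": row.group(3),
                "path": row.group(4),
                "diskgroup": row.group(5),
            })
    return disks


disks = parse_votedisks(SAMPLE_VOTEDISK_OUTPUT)
all_online = bool(disks) and all(d["state"] == "ONLINE" for d in disks)
print(disks, all_online)
```

Treating an empty list as a failure matters here, since the original problem was that no voting file was found at all.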

5. Restart the CRS service:

[root@RAC01 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'RAC01'
CRS-2673: Attempting to stop 'ora.ctssd' on 'RAC01'
CRS-2673: Attempting to stop 'ora.asm' on 'RAC01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'RAC01'
CRS-2677: Stop of 'ora.mdnsd' on 'RAC01' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'RAC01' succeeded
CRS-2677: Stop of 'ora.asm' on 'RAC01' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'RAC01'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'RAC01' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'RAC01'
CRS-2677: Stop of 'ora.cssd' on 'RAC01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'RAC01'
CRS-2677: Stop of 'ora.gipcd' on 'RAC01' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'RAC01'
CRS-2677: Stop of 'ora.gpnpd' on 'RAC01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'RAC01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@RAC01 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

[root@RAC01 ~]# crsctl status res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       RAC01              Started             
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       RAC01                                  
ora.crf
      1        OFFLINE OFFLINE                                                   
ora.crsd
      1        ONLINE  ONLINE       RAC01                                  
ora.cssd
      1        ONLINE  ONLINE       RAC01                                  
ora.cssdmonitor
      1        ONLINE  ONLINE       RAC01                                  
ora.ctssd
      1        ONLINE  ONLINE       RAC01              OBSERVER            
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  ONLINE       RAC01                                  
ora.gipcd
      1        ONLINE  ONLINE       RAC01                                  
ora.gpnpd
      1        ONLINE  ONLINE       RAC01                                  
ora.mdnsd
      1        ONLINE  ONLINE       RAC01 

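Because the SPFILE shared by the ASM instances was updated above, the other nodes normally have no problem; after a plain restart, CRS works normally on them.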
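When repeating the check on the other nodes, the `crsctl status res -t -init` output can be compared against its own TARGET column. The sketch below assumes the layout shown above (a resource name line followed by an instance line); `parse_init_resources` is an illustrative name, and the sample is a subset of the output in this article with trailing spaces removed.

```python
import re

SAMPLE_STATUS = """\
ora.asm
      1        ONLINE  ONLINE       RAC01              Started
ora.crf
      1        OFFLINE OFFLINE
ora.cssd
      1        ONLINE  ONLINE       RAC01
ora.diskmon
      1        OFFLINE OFFLINE
"""

# Instance line: cardinality id, TARGET, STATE, then optional server and details.
INSTANCE = re.compile(r"^\s+\d+\s+(\S+)\s+(\S+)")


def parse_init_resources(text):
    """Map each resource name to its (TARGET, STATE) pair."""
    states, current = {}, None
    for line in text.splitlines():
        if line.startswith("ora."):
            current = line.strip()
            continue
        row = INSTANCE.match(line)
        if row and current:
            states[current] = (row.group(1), row.group(2))
    return states


states = parse_init_resources(SAMPLE_STATUS)
mismatched = sorted(name for name, (target, state) in states.items() if target != state)
offline = sorted(name for name, (_, state) in states.items() if state != "ONLINE")
print("target/state mismatches:", mismatched)
print("offline resources:", offline)
```

Comparing STATE with TARGET rather than with a fixed "ONLINE" avoids flagging ora.crf and ora.diskmon, which are OFFLINE with an OFFLINE target in the output above.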

This recovery procedure follows the article "在11gR2 RAC中修改ASM DISK Path磁盘路径" (Changing the ASM disk path in 11gR2 RAC).
