Troubleshooting: database server local disk at 100%, followed by abnormal cluster status

Incident Report

 

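Symptoms

The local disk on the test database server filled to 100%. After its logs were deleted, Oracle SQL Developer, which was connected to the instance on node 2, could still run queries normally, but the database cluster status was abnormal.

Run as the grid user: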

[root@node1 bin]# su - grid

[grid@node1 ~]$ cd /u01/app/11.2.0/grid_1/bin

[grid@node1 bin]$ pwd

/u01/app/11.2.0/grid_1/bin

[grid@node1 bin]$ ./crsctl status res -t

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4000: Command Status failed, or completed with errors.


Run as the root user:

[root@node1 bin]# ./crsctl status res -t

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4000: Command Status failed, or completed with errors.

[root@node1 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

OHAS, CSS, and EVM are online; CRS is not.
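The check above can be automated with a small filter. This is a minimal sketch, assuming that every healthy component line in the crsctl check crs output ends with "is online", as it does in the transcript above; the function name crs_offline_components is made up for illustration.

```shell
#!/bin/sh
# Print every CRS-xxxx status line that does NOT report a component as online.
# Input: the output of `crsctl check crs` on stdin.
crs_offline_components() {
    grep '^CRS-' | grep -v 'is online$' || true
}

# Sample input copied from the transcript above.
# On a live node: ./crsctl check crs | crs_offline_components
sample='CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online'

printf '%s\n' "$sample" | crs_offline_components
```

With the sample input, only the CRS-4535 line is printed, which matches the manual reading of the output.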

 

Root Cause

Check the CRS log file

$GRID_HOME/log/<nodename>/crsd/crsd.log


2016-08-19 22:21:56.292: [UiServer][3380524800] CS(0x7fca90016ef0)set Properties ( root,0x7fcaac15b640)

2016-08-19 22:21:56.303: [UiServer][3382626048]{1:35321:2977} Sending message to PE. ctx= 0x7fca94008b60, Client PID: 7985

2016-08-19 22:21:56.303: [   CRSPE][3384727296]{1:35321:2977} Processing PE command id=133976. Description: [Stat Resource : 0x7fca880c04a0]

2016-08-19 22:21:56.304: [UiServer][3382626048]{1:35321:2977} Done for ctx=0x7fca94008b60

2016-08-19 22:24:47.808: [   CRSPE][3384727296]{0:1:5} State change received from node2 for ora.asm node2 1

2016-08-19 22:24:47.840: [   CRSPE][3384727296]{0:1:5} Processing PE command id=8072. Description: [Resource State Change (ora.asm node2 1) : 0x7fca880c4a60]

2016-08-19 22:24:47.997: [   CRSPE][3384727296]{0:1:5} State information for [ora.asm node2 1] has been lost, all we know is the initial check timed out. Issuing check operations until we can operate on better data.

2016-08-19 22:24:48.722: [   CRSPE][3384727296]{0:1:5} State information for [ora.asm node2 1] is still bad. Issuing another check.


The state information for the ora.asm resource is abnormal.

 

Check the log file

[grid@node1 crsd]$ vi crsd.log

Linux-x86_64 Error: 28: No space left on device

Additional information: 9925


2016-08-20 04:55:07.052: [  OCRASM][2829412128]proprasmo: kgfoCheckMount returned [7]

2016-08-20 04:55:07.052: [  OCRASM][2829412128]proprasmo: The ASM instance is down

2016-08-20 04:55:07.053: [  OCRRAW][2829412128]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.

2016-08-20 04:55:07.053: [  OCRRAW][2829412128]proprioo: No OCR/OLR devices are usable

2016-08-20 04:55:07.053: [  OCRASM][2829412128]proprasmcl: asmhandle is NULL

2016-08-20 04:55:07.054: [    GIPC][2829412128] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5343]

2016-08-20 04:55:07.057: [ default][2829412128]clsvactversion:4: Retrieving Active Version from local storage.

2016-08-20 04:55:07.062: [  OCRRAW][2829412128]proprrepauto: The local OCR configuration matches with the configuration published by OCR Cache Writer. No repair required.

2016-08-20 04:55:07.066: [  OCRRAW][2829412128]proprinit: Could not open raw device

2016-08-20 04:55:07.066: [  OCRASM][2829412128]proprasmcl: asmhandle is NULL

2016-08-20 04:55:07.068: [  OCRAPI][2829412128]a_init:16!: Backend init unsuccessful : [26]

2016-08-20 04:55:07.068: [  CRSOCR][2829412128] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage

ORA-09925: Unable to create audit trail file

Linux-x86_64 Error: 28: No space left on device

Additional information: 9925


2016-08-20 04:55:07.069: [    CRSD][2829412128] Created alert : (:CRSD00111:) :  Could not init OCR, error: PROC-26: Error while accessing the physical storage

ORA-09925: Unable to create audit trail file

Linux-x86_64 Error: 28: No space left on device

Additional information: 9925


2016-08-20 04:55:07.069: [    CRSD][2829412128][PANIC] CRSD exiting: Could not init OCR, code: 26

2016-08-20 04:55:07.069: [    CRSD][2829412128] Done.
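Instead of paging through crsd.log in vi, the fatal lines can be pulled out with grep. The following is a rough sketch: the function name scan_crsd_log and the choice of patterns are assumptions based only on the messages seen in this excerpt, not a complete list of CRSD failure signatures.

```shell
#!/bin/sh
# Print (with line numbers) the lines of a crsd.log that match the
# error signatures seen in this incident.
scan_crsd_log() {
    grep -nE 'No space left on device|PROC-26|ORA-09925|\[PANIC\]' "$1"
}

# Demo on a few lines copied from the excerpt above.
# On a live node: scan_crsd_log "$GRID_HOME/log/<nodename>/crsd/crsd.log"
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
2016-08-20 04:55:07.066: [  OCRRAW][2829412128]proprinit: Could not open raw device
2016-08-20 04:55:07.068: [  CRSOCR][2829412128] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage
ORA-09925: Unable to create audit trail file
Linux-x86_64 Error: 28: No space left on device
2016-08-20 04:55:07.069: [    CRSD][2829412128][PANIC] CRSD exiting: Could not init OCR, code: 26
EOF
scan_crsd_log "$demo_log"
rm -f "$demo_log"
```

The first demo line ("Could not open raw device") is a consequence of the failure and is deliberately not in the pattern list, so it is not printed.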

 

Because the device had no free space, the audit trail file could not be created, and as a result CRS could not initialize the OCR.
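Since the root cause is a full filesystem, it helps to confirm which mount point is affected before touching the cluster. This is a minimal sketch that filters df -P output; the function name full_filesystems and the 90% threshold are illustrative assumptions, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Read `df -P` output on stdin and print "mountpoint usage" for every
# filesystem whose usage is at or above the given percentage.
full_filesystems() {
    threshold=$1
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "100%" -> "100"
            if (use + 0 >= t) print $6, $5
        }'
}

# Sample data (made up for the demo).
# On a live node: df -P | full_filesystems 90
printf '%s\n' \
    'Filesystem 1024-blocks Used Available Capacity Mounted on' \
    '/dev/sda1 51475068 51475068 0 100% /u01' \
    '/dev/sda2 10190100 1019010 9171090 10% /tmp' |
    full_filesystems 90
```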

 

Resolution

On both nodes, check the disk header of the OCR disk group.

As root, change to the CRS_HOME/bin directory.

[root@node2 bin]# ./kfed read /dev/sdc1

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:              2147483648 ; 0x008: disk=0

kfbh.check:                  1828899572 ; 0x00c: 0x6d02caf4

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

kfdhdb.driver.provstr:      ORCLDISKOCR ; 0x000: length=11

kfdhdb.driver.reserved[0]:      5391183 ; 0x008: 0x0052434f

kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000

kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000

kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000

kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000

kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000

kfdhdb.compat:                186646528 ; 0x020: 0x0b200000

kfdhdb.dsknum:                        0 ; 0x024: 0x0000

kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL

kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER

kfdhdb.dskname:                     OCR ; 0x028: length=3

kfdhdb.grpname:                    DATA ; 0x048: length=4

kfdhdb.fgname:                      OCR ; 0x068: length=3

kfdhdb.capname:                         ; 0x088: length=0

kfdhdb.crestmp.hi:             33036370 ; 0x0a8: HOUR=0x12 DAYS=0x2 MNTH=0x6 YEAR=0x7e0

kfdhdb.crestmp.lo:           2945546240 ; 0x0ac: USEC=0x0 MSEC=0x5e SECS=0x39 MINS=0x2b

kfdhdb.mntstmp.hi:             33036370 ; 0x0b0: HOUR=0x12 DAYS=0x2 MNTH=0x6 YEAR=0x7e0

kfdhdb.mntstmp.lo:           3117557760 ; 0x0b4: USEC=0x0 MSEC=0x8a SECS=0x1d MINS=0x2e

kfdhdb.secsize:                     512 ; 0x0b8: 0x0200

kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000

kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000

kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80

kfdhdb.dsksize:                    2047 ; 0x0c4: 0x000007ff

kfdhdb.pmcnt:                         2 ; 0x0c8: 0x00000002

kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001

kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002

kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002

kfdhdb.redomirrors[0]:                0 ; 0x0d8: 0x0000

kfdhdb.redomirrors[1]:                0 ; 0x0da: 0x0000

kfdhdb.redomirrors[2]:                0 ; 0x0dc: 0x0000

kfdhdb.redomirrors[3]:                0 ; 0x0de: 0x0000

kfdhdb.dbcompat:              168820736 ; 0x0e0: 0x0a100000

kfdhdb.grpstmp.hi:             33036370 ; 0x0e4: HOUR=0x12 DAYS=0x2 MNTH=0x6 YEAR=0x7e0

kfdhdb.grpstmp.lo:           2945359872 ; 0x0e8: USEC=0x0 MSEC=0x3a8 SECS=0x38 MINS=0x2b

kfdhdb.vfstart:                     352 ; 0x0ec: 0x00000160

kfdhdb.vfend:                       384 ; 0x0f0: 0x00000180

kfdhdb.spfile:                       58 ; 0x0f4: 0x0000003a

kfdhdb.spfflg:                        1 ; 0x0f8: 0x00000001

…….

kfdhdb.ub4spare[0]:                   0 ; 0x0fc: 0x00000000

kfdhdb.acdb.ub2spare:                 0 ; 0x1de: 0x0000


The disk header is normal.
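The two fields that matter most in the dump above are kfbh.type and kfdhdb.hdrsts. The sketch below checks only those two, assuming (as in the output shown here) that a usable header reports KFBTYP_DISKHEAD and KFDHDR_MEMBER; the function name kfed_header_ok is hypothetical, and this is a quick sanity check rather than a full header validation.

```shell
#!/bin/sh
# Read `kfed read <device>` output on stdin; succeed only if the block is a
# disk header and the disk is a member of a disk group.
kfed_header_ok() {
    dump=$(cat)
    printf '%s\n' "$dump" | grep -q 'kfbh\.type:.*KFBTYP_DISKHEAD' &&
        printf '%s\n' "$dump" | grep -q 'kfdhdb\.hdrsts:.*KFDHDR_MEMBER'
}

# Demo with two lines copied from the dump above.
# On a live node (as root): ./kfed read /dev/sdc1 | kfed_header_ok
if printf '%s\n' \
    'kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD' \
    'kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER' |
    kfed_header_ok; then
    echo "disk header looks normal"
else
    echo "disk header needs a closer look"
fi
```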

 

Restart CRS directly

Run the following command as root:

# /u01/app/11.2.0/grid_1/bin/crsctl stop crs


If the command fails, add the -f option. After the stack has stopped, start it again with crsctl start crs.

[root@node1 bin]# ./crsctl stop crs -f

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'

CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'

CRS-2673: Attempting to stop 'ora.evmd' on 'node1'

CRS-2673: Attempting to stop 'ora.asm' on 'node1'

CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'

CRS-2677: Stop of 'ora.evmd' on 'node1' succeeded

CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded

CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded

CRS-2677: Stop of 'ora.asm' on 'node1' succeeded

CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node1'

CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node1' succeeded

CRS-2673: Attempting to stop 'ora.cssd' on 'node1'

CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded

CRS-2673: Attempting to stop 'ora.crf' on 'node1'

CRS-2677: Stop of 'ora.crf' on 'node1' succeeded

CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'

CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded

CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'

CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded

CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed

CRS-4133: Oracle High Availability Services has been stopped.
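The stop, forced stop, and start sequence can be wrapped in one function. This is a sketch under assumptions: the CRSCTL default path is taken from the grid home shown earlier in this report, the function name restart_crs is made up, and the start step (crsctl start crs) does not appear in the transcript above but is needed before the status checks that follow can succeed. It must be run as root.

```shell
#!/bin/sh
# Path to crsctl; the default is the grid home used on node1 in this report.
CRSCTL=${CRSCTL:-/u01/app/11.2.0/grid_1/bin/crsctl}

# Stop the Clusterware stack (forcing only if the normal stop fails),
# then start it again. Returns non-zero if any required step fails.
restart_crs() {
    "$CRSCTL" stop crs || "$CRSCTL" stop crs -f || return 1
    "$CRSCTL" start crs
}

# Usage, as root on the affected node:
#   restart_crs && "$CRSCTL" check crs
```

Trying the plain stop first keeps the forced stop as a fallback, which mirrors the order described above.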

 

[root@node1 bin]# ./crsctl status res -t

---------------------------------------------------------------------------
NAME                        TARGET   STATE    SERVER  STATE_DETAILS
---------------------------------------------------------------------------
Local Resources
---------------------------------------------------------------------------
ora.DATA.dg                 ONLINE   ONLINE   node1
ora.DATA1.dg                ONLINE   ONLINE   node1
ora.FRA.dg                  ONLINE   ONLINE   node1
ora.LISTENER.lsnr           ONLINE   ONLINE   node1
ora.asm                     ONLINE   ONLINE   node1   Started
ora.gsd                     OFFLINE  OFFLINE  node1
ora.net1.network            ONLINE   ONLINE   node1
ora.ons                     ONLINE   ONLINE   node1
---------------------------------------------------------------------------
Cluster Resources
---------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr  1  ONLINE   ONLINE   node1
ora.cvu                  1  ONLINE   ONLINE   node1
ora.node1.vip            1  ONLINE   ONLINE   node1
ora.node2.vip            1  ONLINE   OFFLINE
ora.oc4j                 1  ONLINE   ONLINE   node1
ora.scan1.vip            1  ONLINE   ONLINE   node1
ora.vmtest.db            1  ONLINE   OFFLINE
                         2  ONLINE   OFFLINE

 [root@node1 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online

 

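Wait for the database instances to start, then rerun the command and confirm that the instance state is Open.

Reference (official documentation): Doc ID 1095214.1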
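The wait-and-recheck step can be scripted as a simple polling loop. The helper below is a generic sketch: wait_until is a made-up name, and the check command passed to it is left to the reader (for example, a function that greps the crsctl status res -t output for the database resource), since the exact check depends on the environment.

```shell
#!/bin/sh
# Run a check command repeatedly until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0                        # check passed, stop polling
        i=$((i + 1))
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1                                    # never succeeded
}

# Trivial demo: `true` succeeds on the first attempt.
# A real call might look like: wait_until 30 10 db_is_open
# where db_is_open is a hypothetical check function you define.
wait_until 3 0 true && echo "check passed"
```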


 

Source: ITPUB blog, http://blog.itpub.net/31142205/viewspace-2124849/. If you repost this article, please credit the source.
