[Oracle] RAC ASM log error ORA-15078: CRSD shuts down automatically

Latest recommended article published on 2024-01-27 08:17:19

珍儿要有梦


Article tags: oracle rac asm logs

As the grid user, check the CRSD process log:

$ cd $ORACLE_HOME/log/node1/crsd

$ vim crsd.log

-------------------

2013-12-10 15:47:19.902: [ OCRASM][33715952]ASM Error Stack :

2013-12-10 15:47:19.934: [ OCRASM][33715952]proprasmo: kgfoCheckMount returned [6]

2013-12-10 15:47:19.934: [ OCRASM][33715952]proprasmo: The ASM disk group crs is not found or not mounted

2013-12-10 15:47:19.934: [ OCRRAW][33715952]proprioo: Failed to open [+crs]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.

2013-12-10 15:47:19.934: [ OCRRAW][33715952]proprioo: No OCR/OLR devices are usable

2013-12-10 15:47:19.934: [ OCRASM][33715952]proprasmcl: asmhandle is NULL

2013-12-10 15:47:19.935: [ GIPC][33715952] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5343]

2013-12-10 15:47:19.935: [ default][33715952]clsvactversion:4: Retrieving Active Version from local storage.

2013-12-10 15:47:19.937: [ OCRRAW][33715952]proprrepauto: The local OCR configuration matches with the configuration published by OCR Cache Writer. No repair required.

2013-12-10 15:47:19.938: [ OCRRAW][33715952]proprinit: Could not open raw device

2013-12-10 15:47:19.938: [ OCRASM][33715952]proprasmcl: asmhandle is NULL

2013-12-10 15:47:19.939: [ OCRAPI][33715952]a_init:16!: Backend init unsuccessful : [26]

2013-12-10 15:47:19.939: [ CRSOCR][33715952] OCR context init failure. Error: PROC-26: Error while accessing the physical storage

2013-12-10 15:47:19.939: [ CRSD][33715952] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage

2013-12-10 15:47:19.939: [ CRSD][33715952][PANIC] CRSD exiting: Could not init OCR, code: 26

2013-12-10 15:47:19.939: [ CRSD][33715952] Done.

---------------------
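The excerpt above is only the tail of a long file. A minimal sketch of a filter that pulls out just the lines pointing at an OCR/ASM storage failure, assuming the crsd.log message format shown above (the function name `crsd_ocr_errors` is hypothetical and not part of any Oracle tool):

```shell
# Print only the crsd.log lines that indicate the OCR could not be opened.
# Reads the log on standard input, so it works on any copy of the file.
# The three patterns are taken from the messages in the excerpt above.
crsd_ocr_errors() {
    grep -E 'PROC-26|PANIC|is not found or not mounted'
}
```

It could be run as `crsd_ocr_errors < $ORACLE_HOME/log/node1/crsd/crsd.log`, using the same path as in the first step.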

The log shows that the problem lies with the CRS disk group.

Indeed, the CRS disk group is not mounted:

su - grid

SQL> set linesize 200

SQL> select GROUP_NUMBER,NAME,TYPE,ALLOCATION_UNIT_SIZE,STATE from v$asm_diskgroup;

GROUP_NUMBER NAME                           TYPE   ALLOCATION_UNIT_SIZE STATE
------------ ------------------------------ ------ -------------------- -----------
           0 CRS                                                      0 DISMOUNTED
           2 DATA1                          EXTERN              4194304 MOUNTED

Checking the ASM instance alert log shows that the CRS disk group was forcibly dismounted.

SQL> show parameter dump

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_core_dump                 string      partial
background_dump_dest                 string      /app/gridbase/diag/asm/+asm/+ASM1/trace

cd /app/gridbase/diag/asm/+asm/+ASM1/trace

$ vim alert_+ASM1.log

-------------------------------------------

Tue Dec 10 11:13:57 2013

WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.

WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.

WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.

WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.

WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.

WARNING: Waited 15 secs for write IO to PST disk 2 in group 1.

Tue Dec 10 11:13:57 2013

NOTE: process _b000_+asm1 (15822) initiating offline of disk 0.3916226472 (CRS_0000) with mask 0x7e in group 1

NOTE: process _b000_+asm1 (15822) initiating offline of disk 1.3916226471 (CRS_0001) with mask 0x7e in group 1

NOTE: process _b000_+asm1 (15822) initiating offline of disk 2.3916226470 (CRS_0002) with mask 0x7e in group 1

NOTE: checking PST: grp = 1

GMON checking disk modes for group 1 at 12 for pid 37, osid 15822

ERROR: no read quorum in group: required 2, found 0 disks

NOTE: checking PST for grp 1 done.

NOTE: initiating PST update: grp = 1, dsk = 0/0xe96cdfa8, mask = 0x6a, op = clear

NOTE: initiating PST update: grp = 1, dsk = 1/0xe96cdfa7, mask = 0x6a, op = clear

NOTE: initiating PST update: grp = 1, dsk = 2/0xe96cdfa6, mask = 0x6a, op = clear

GMON updating disk modes for group 1 at 13 for pid 37, osid 15822

ERROR: no read quorum in group: required 2, found 0 disks

Tue Dec 10 11:13:57 2013

NOTE: cache dismounting (not clean) group 1/0x165C2F6D (CRS)

WARNING: Offline for disk CRS_0000 in mode 0x7f failed.

WARNING: Offline for disk CRS_0001 in mode 0x7f failed.

NOTE: messaging CKPT to quiesce pins Unix process pid: 15824, image: oracle@node1 (B001)

WARNING: Offline for disk CRS_0002 in mode 0x7f failed.

Tue Dec 10 11:13:57 2013

NOTE: halting all I/Os to diskgroup 1 (CRS)

Tue Dec 10 11:13:57 2013

NOTE: LGWR doing non-clean dismount of group 1 (CRS)

NOTE: LGWR sync ABA=3.42 last written ABA 3.42

Tue Dec 10 11:13:57 2013

kjbdomdet send to inst 2

detach from dom 1, sending detach message to inst 2

Tue Dec 10 11:13:57 2013

List of instances:

1 2

Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)

Global Resource Directory partially frozen for dirty detach

* dirty detach - domain 1 invalid = TRUE

Tue Dec 10 11:13:57 2013

NOTE: No asm libraries found in the system

520 GCS resources traversed, 0 cancelled

Dirty Detach Reconfiguration complete

Tue Dec 10 11:13:57 2013

WARNING: dirty detached from domain 1

NOTE: cache dismounted group 1/0x165C2F6D (CRS)

SQL> alter diskgroup CRS dismount force /* ASM SERVER:375140205 */

Tue Dec 10 11:13:57 2013

NOTE: cache deleting context for group CRS 1/0x165c2f6d

GMON dismounting group 1 at 14 for pid 41, osid 15824

NOTE: Disk CRS_0000 in mode 0x7f marked for de-assignment

NOTE: Disk CRS_0001 in mode 0x7f marked for de-assignment

NOTE: Disk CRS_0002 in mode 0x7f marked for de-assignment

NOTE:Waiting for all pending writes to complete before de-registering: grpnum 1

Tue Dec 10 11:14:27 2013

NOTE:Waiting for all pending writes to complete before de-registering: grpnum 1

Tue Dec 10 11:14:29 2013

ASM Health Checker found 1 new failures

Tue Dec 10 11:14:57 2013

SUCCESS: diskgroup CRS was dismounted

SUCCESS: alter diskgroup CRS dismount force /* ASM SERVER:375140205 */

SUCCESS: ASM-initiated MANDATORY DISMOUNT of group CRS

--------------------------------------
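To see at a glance which disk groups were hit by the PST write timeouts, the warnings can be tallied per group. This is a rough sketch that assumes the warning text has exactly the form shown in the alert log above; `pst_wait_summary` is a hypothetical helper name:

```shell
# Count "Waited N secs for write IO to PST disk" warnings per disk group number.
# Reads an ASM alert log on standard input and prints "group <n>: <count>" lines.
pst_wait_summary() {
    awk '/Waited [0-9]+ secs for write IO to PST disk/ {
             grp = $NF               # last field of the warning, e.g. "1."
             sub(/\.$/, "", grp)     # drop the trailing period
             count[grp]++
         }
         END { for (g in count) printf "group %s: %d\n", g, count[g] }' | sort
}
```

For the log in this article it would be called as `pst_wait_summary < alert_+ASM1.log`.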

Mount the CRS disk group

su - grid

sqlplus / as sysasm

(Important: you must connect as sysasm.)

SQL> alter diskgroup crs mount;

SQL> select GROUP_NUMBER,NAME,TYPE,ALLOCATION_UNIT_SIZE,STATE from v$asm_diskgroup;

GROUP_NUMBER NAME                           TYPE   ALLOCATION_UNIT_SIZE STATE
------------ ------------------------------ ------ -------------------- -----------
           1 CRS                            NORMAL              4194304 MOUNTED
           2 DATA1                          EXTERN              4194304 MOUNTED

Start CRS

However, the usual start crs command fails:

# /app/grid/bin/crsctl start crs

CRS-4640: Oracle High Availability Services is already active

CRS-4000: Command Start failed, or completed with errors.

The following command starts it successfully:

[root@node1 ~]# /app/grid/bin/crsctl start res ora.crsd -init

CRS-2672: Attempting to start 'ora.crsd' on 'node1'

CRS-2676: Start of 'ora.crsd' on 'node1' succeeded

# /app/grid/bin/crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4537: Cluster Ready Services is online

CRS-4529: Cluster Synchronization Services is online

CRS-4533: Event Manager is online
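The check above can also be turned into a pass/fail test for use in a script. The following is a sketch only, assuming that every healthy line of `crsctl check crs` output ends with "is online" as in the output shown; `crs_all_online` is a hypothetical helper name:

```shell
# Succeed (exit status 0) only if every non-empty line of `crsctl check crs`
# output ends with "is online". Reads the command output on standard input.
crs_all_online() {
    awk 'NF { total++; if ($0 ~ /is online[ \t\r]*$/) ok++ }
         END { exit !(total > 0 && ok == total) }'
}
```

For example: `/app/grid/bin/crsctl check crs | crs_all_online && echo "CRS stack is up"`.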

Resolution roadmap:

crsd log --> ASM instance alert log --> mount CRS disk group --> start CRS

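The solution above is adapted from a post by 十字螺丝钉 on chinaunix; please let me know if it infringes any rights.

The ASM alert log shows that the CRS disk group was dismounted because ASM could no longer reach its disks, but the system and storage engineers found no related problem. I will add more here once the root cause is identified.

Addendum:

ASM diskgroup dismount with "Waited 15 secs for write IO to PST" (Doc ID 1581684.1)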

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.3 to 12.1.0.2 [Release 11.2 to 12.1]

Oracle Database - Enterprise Edition - Version 10.2.0.4 to 10.2.0.4 [Release 10.2]

Information in this document applies to any platform.

SYMPTOMS

Normal or high redundancy diskgroup is dismounted with these WARNING messages.

//ASM alert.log

Mon Jul 01 09:10:47 2013

WARNING: Waited 15 secs for write IO to PST disk 1 in group 6.

WARNING: Waited 15 secs for write IO to PST disk 4 in group 6.

WARNING: Waited 15 secs for write IO to PST disk 1 in group 6.

WARNING: Waited 15 secs for write IO to PST disk 4 in group 6.

....

GMON dismounting group 6 at 72 for pid 44, osid 8782162

CAUSE

The ASM disk could go into unresponsiveness, normally in the following scenarios:

+ Some of the physical paths of the multipath device are offline or lost

+ During path 'failover' in a multipath set up

+ Server load, or any sort of storage/multipath/OS maintenance

Doc ID 10109915.8 describes Bug 10109915, whose fix introduced the underscore parameter _asm_hbeatiowait. The underlying issue is that there is no OS- or storage-level tunable timeout mechanism when an NFS server/filer hangs; _asm_hbeatiowait provides a way to set that timeout.

SOLUTION

1] Check with the OS and storage administrators whether the disks were unresponsive.

2] If possible, keep disk unresponsiveness below 15 seconds.

This will depend on various factors like

+ Operating System

+ Presence of Multipath ( and Multipath Type )

+ Any kernel parameter

So you need to find out, what is the 'maximum' possible disk unresponsiveness for your set up.

For example, on AIX rw_timeout setting affects this and defaults to 30 seconds.

Another example is Linux with native multipathing. In such set up, number of physical paths and polling_interval value in multipath.conf file, will dictate this maximum disk unresponsiveness.

So for your set up ( combination of OS / multipath / storage ), you need to find out this.

3] If you can not keep the disk unresponsiveness to below 15 seconds, then the below parameter can be set in the ASM instance ( on all the Nodes of RAC ):

_asm_hbeatiowait

Per internal bug 17274537, internal testing indicates that the value should be increased to 120 seconds; the fix is included in 12.1.0.2.

Run the following in the ASM instance to set the desired value for _asm_hbeatiowait:

alter system set "_asm_hbeatiowait"= scope=spfile sid='*';

Then restart the ASM instance / CRS for the new parameter value to take effect.
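As an illustration of step 3, the sketch below builds the statement for a chosen timeout and refuses anything that is not a positive integer. The helper name `hbeatiowait_sql` is hypothetical, and the value 120 in the usage line is simply the figure quoted in the note above; the right value for a given setup is the one worked out in step 2.

```shell
# Print the ALTER SYSTEM statement for a given _asm_hbeatiowait value (seconds).
# Rejects empty input, non-digits, and zero or zero-prefixed values so that a
# typo never reaches sqlplus.
hbeatiowait_sql() {
    case "$1" in
        ''|*[!0-9]*|0*)
            echo "usage: hbeatiowait_sql <seconds>" >&2
            return 1
            ;;
    esac
    printf "alter system set \"_asm_hbeatiowait\"=%s scope=spfile sid='*';\n" "$1"
}
```

One possible use, as the grid user, is `hbeatiowait_sql 120 | sqlplus -S / as sysasm`. The statement only changes the spfile, so the restart described above is still required.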