oracle hbeatiowait, _asm_hbeatiowait

Oracle Database - Enterprise Edition - Version 11.2.0.3 to 12.1.0.1

[Release 11.2 to 12.1]

Information in this document applies to any platform.

SYMPTOMS

A normal or high redundancy diskgroup is dismounted with these WARNING messages:

//ASM alert.log

Mon Jul 01 09:10:47 2013

WARNING: Waited 15 secs for write IO to PST disk 1 in group

WARNING: Waited 15 secs for write IO to PST disk 4 in group

WARNING: Waited 15 secs for write IO to PST disk 1 in group

WARNING: Waited 15 secs for write IO to PST disk 4 in group

....

GMON dismounting group 6 at 72 for pid 44, osid 8782162
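To see how often these warnings occur before a dismount, the alert log can be scanned for them. The following is a minimal Python sketch, not an Oracle-supplied tool. The function name summarize_pst_waits and both regular expressions are assumptions derived only from the message format in the excerpt above, so they may need adjusting for other versions.

```python
import re
from collections import Counter

# Patterns are assumptions based on the sample alert.log lines above.
PST_WAIT = re.compile(
    r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group"
)
DISMOUNT = re.compile(r"GMON dismounting group (\d+)")


def summarize_pst_waits(lines):
    """Count PST write-IO wait warnings per disk and list dismounted groups."""
    waits = Counter()
    dismounted = []
    for line in lines:
        match = PST_WAIT.search(line)
        if match:
            waits[int(match.group(2))] += 1  # key is the PST disk number
            continue
        match = DISMOUNT.search(line)
        if match:
            dismounted.append(int(match.group(1)))  # diskgroup number
    return waits, dismounted


if __name__ == "__main__":
    sample = [
        "WARNING: Waited 15 secs for write IO to PST disk 1 in group",
        "WARNING: Waited 15 secs for write IO to PST disk 4 in group",
        "GMON dismounting group 6 at 72 for pid 44, osid 8782162",
    ]
    waits, dismounted = summarize_pst_waits(sample)
    print(dict(waits), dismounted)
```

In practice the lines would come from the ASM alert log file itself, for example by passing an open file object to summarize_pst_waits.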

CAUSE

These messages generally appear in the ASM alert log in the following situation:

ASM PST heartbeats to the ASM disks of a normal or high redundancy diskgroup are delayed beyond the heartbeat timeout, so the ASM instance dismounts the diskgroup. By default, the timeout is 15 seconds.

For an external redundancy diskgroup, by contrast, heartbeat delays are largely ignored. The ASM instance stops issuing further PST heartbeats until PST revalidation succeeds, but the heartbeat delays do not directly dismount an external redundancy diskgroup.

An ASM disk can become unresponsive, typically in the following scenarios:

+ Some of the physical paths of the multipath device are offline or lost

+ During path 'failover' in a multipath setup

+ Server load, or any sort of storage/multipath/OS maintenance

Doc ID 10109915.8 gives a brief description of Bug 10109915 (the fix for this bug introduced this underscore parameter). The issue there is that no OS/storage tunable timeout mechanism exists in the case of a hung NFS server/filer; _asm_hbeatiowait helps by setting that timeout.

SOLUTION

1] Check with the OS and storage administrators whether there is disk unresponsiveness.

2] If possible, keep disk unresponsiveness below 15 seconds. This depends on various factors, such as:

+ Operating system

+ Presence of multipath (and multipath type)

+ Any kernel parameter

So you need to find out the maximum possible disk unresponsiveness for your setup.

For example, on AIX the rw_timeout setting affects this and defaults to 30 seconds.

Another example is Linux with native multipathing. In such a setup, the number of physical paths and the polling_interval value in the multipath.conf file dictate the maximum disk unresponsiveness.

So you need to determine this value for your particular combination of OS, multipath, and storage.

3] If you cannot keep disk unresponsiveness below 15 seconds, set the following parameter in the ASM instance (on all nodes of the RAC):

_asm_hbeatiowait

As per internal bug 17274537, based on internal testing the value should be increased to 120 secs; this will be fixed in 12.2.

Run the following in the ASM instance to set the desired value for _asm_hbeatiowait:

alter system set "_asm_hbeatiowait"=<value> scope=spfile sid='*';

Then restart the ASM instance / CRS for the new parameter value to take effect.
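When the change has to be applied on several clusters, it can help to generate the statement rather than type it by hand. The following is a minimal Python sketch under stated assumptions: build_hbeatiowait_statement is a hypothetical helper name, and the only check it performs is that the value is a positive whole number of seconds, because the valid range of the parameter is not given in this note.

```python
# Statement text follows the ALTER SYSTEM command given in this note.
TEMPLATE = 'alter system set "_asm_hbeatiowait"={value} scope=spfile sid=\'*\';'


def build_hbeatiowait_statement(seconds):
    """Return the ALTER SYSTEM statement for a heartbeat wait given in seconds."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    return TEMPLATE.format(value=seconds)


if __name__ == "__main__":
    # 120 is the value this note reports from internal bug 17274537.
    print(build_hbeatiowait_statement(120))
```

The helper only builds the text. It still has to be run in the ASM instance, followed by the restart described above.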

REFERENCES

BUG:17043894 - DISKGROUP DISMOUNTS IF 2 OUT OF 8 PATHS LOST

BUG:10109915 - ASM HANGS IN HIGH REDUNDANCY CONFIG IF 1 OF 5 DISKS GOES OFFLINE

NOTE:1910315.1 - How to Create a Normal Redundancy Diskgroup Best Practices

[grid@racj1 ~]$ more asm.txt

*._asm_hbeatiowait=120

+ASM2.asm_diskgroups='ARCHDG','DATADG'#Manual Mount

+ASM1.asm_diskgroups='ARCHDG','DATADG'#Manual Mount

*.asm_diskstring='/dev/asmdisk/*'

*.asm_power_limit=1

*.diagnostic_dest='/oracle/app/grid'

*.instance_type='asm'

*.large_pool_size=12M

+ASM1.local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=10.62.xxx.xx2)(PORT=1521))'

+ASM2.local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=10.62.xxx.xx4)(PORT=1521))'

*.memory_max_target=2147483648

*.memory_target=2147483648

*.remote_login_passwordfile='EXCLUSIVE'
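A parameter dump like the one above can be checked programmatically to confirm that _asm_hbeatiowait is set for every instance. The following is a minimal Python sketch: parse_pfile and hbeatiowait_for are hypothetical helper names, and the parser assumes the simple sid.name=value layout shown in asm.txt, nothing more.

```python
def parse_pfile(text):
    """Map (sid, parameter) to its raw value for each 'sid.name=value' line."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue  # skip blank lines and wrapped comment fragments
        key, value = line.split("=", 1)
        sid, sep, name = key.strip().partition(".")
        if not sep:
            continue  # no SID prefix, so not in the layout assumed here
        settings[(sid, name)] = value.strip()
    return settings


def hbeatiowait_for(settings, sid):
    """Return the value applying to one SID; a SID-specific entry wins over '*'."""
    name = "_asm_hbeatiowait"
    return settings.get((sid, name), settings.get(("*", name)))


if __name__ == "__main__":
    sample = "*._asm_hbeatiowait=120\n*.asm_power_limit=1\n*.instance_type='asm'\n"
    settings = parse_pfile(sample)
    for sid in ("+ASM1", "+ASM2"):
        print(sid, hbeatiowait_for(settings, sid))
```

Values are kept as raw strings, including any quotes or trailing comments, so that nothing in the file is silently reinterpreted.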