oracle 10g ASM 重启遭遇ERROR:no PST quorum in group 2: required 2, found 0

db 10205

os  rhel 5.x x86_64

在最近一次服务器升级时,重启机器后,oracle asm instance 无法正常启动;


SQL> startup

ASM instance started

Total System Global Area  130023424 bytes

Fixed Size                  2094544 bytes

Variable Size             102763056 bytes

ASM Cache                  25165824 bytes

ORA-15032: not all alterations performed

ORA-15063: ASM discovered an insufficientnumber of disks for diskgroup"REDODG"

ORA-15063: ASM discovered aninsufficient number of disks for diskgroup"ORADATA"

Oradataredodg 无法mount ,原因是diskgroup memberdisk 丢失!!!!

 

ASM alert 日志信息:

Fri Oct 30 22:58:39 CST 2015

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Picked latch-free SCN scheme 3

Using LOG_ARCHIVE_DEST_1 parameterdefault value as /oracle/product/10.2.0/db_1/dbs/arch

Autotune of undo retention is turnedoff.

LICENSE_MAX_USERS = 0

SYS auditing is disabled

ksdpec: called for event 13740 prior toevent group initialization

Starting up ORACLE RDBMS Version:10.2.0.5.0.

System parameters with non-defaultvalues:

 large_pool_size          =12582912

 instance_type            = asm

 remote_login_passwordfile= SHARED

 background_dump_dest     =/oracle/admin/+ASM/bdump

 user_dump_dest           =/oracle/admin/+ASM/udump

 core_dump_dest           =/oracle/admin/+ASM/cdump

 asm_diskstring           =/dev/emc*

 asm_diskgroups           = ORADATA, REDODG

PSP0 started with pid=3, OS id=7999

PMON started with pid=2, OS id=7997

MMAN started with pid=4, OS id=8001

DBW0 started with pid=5, OS id=8003

LGWR started with pid=6, OS id=8005

CKPT started with pid=7, OS id=8007

SMON started with pid=8, OS id=8009

RBAL started with pid=9, OS id=8011

GMON started with pid=10, OS id=8013

Fri Oct 30 22:58:45 CST 2015

SQL> ALTER DISKGROUP ALL MOUNT

Fri Oct 30 22:58:45 CST 2015

NOTE: cache registered group ORADATAnumber=1 incarn=0xcd286ba1

NOTE: cache registered group REDODGnumber=2 incarn=0xcd786ba2

Fri Oct 30 22:58:45 CST 2015

Loaded ASM Library - Generic Linux,version 2.0.4 (KABI_V2) library for asmlib interface

Fri Oct 30 22:58:45 CST 2015

ERROR:no PST quorum in group 1: required 2, found 0----------------

FriOct 30 22:58:45 CST 2015

NOTE:cache dismounting group 1/0xCD286BA1 (ORADATA)

NOTE:dbwr not being msg'd to dismount

ERROR:diskgroup ORADATA was not mounted

FriOct 30 22:58:45 CST 2015

ERROR:no PST quorum in group 2: required 2, found 0

FriOct 30 22:58:45 CST 2015

NOTE:cache dismounting group 2/0xCD786BA2 (REDODG)

NOTE:dbwr not being msg'd to dismount

ERROR:diskgroup REDODG was not mounted

Fri Oct 30 23:08:51 CST 2015

Shutting down instance (immediate)

Fri Oct 30 23:08:51 CST 2015

Shutting down instance: further logonsdisabled

 

这行报错信息比较可以

ERROR:no PST quorum in group 2: required 2, found 0

 

PST quorum丢失的问题常由以下几个原因导致:

1.   ASM DISK丢失

2.   ASM DISK corrupted损坏

3.    部分ASM DISKAUN=1 PST部分损坏,或者被数据不完整

4.   不当的ASM_DISKSTRING参数设置

5.   不当的ASM DISK权限设置

 

经过分析asm alert 日志的历史信息,分析相应磁盘的路径信息:

1Datagroup 对应的磁盘信息

Wed Aug 01 16:10:49 CST 2012

SQL> CREATE DISKGROUP REDODG ExternalREDUNDANCY  DISK

'/dev/emcpowera1'SIZE 6138M ,

'/dev/emcpowerd1'SIZE 6138M ,

'/dev/emcpowere1'SIZE 6138M  

 

 

Wed Aug 01 16:10:00 CST 2012

SQL> CREATE DISKGROUP ORADATA ExternalREDUNDANCY  DISK '

/dev/emcpowerb1'SIZE 204797M ,

'/dev/emcpowerc1'SIZE 204797M

 

------- asm_diskstring 的历史配置信息

SYS auditing is disabled

ksdpec: called for event 13740 prior toevent group initialization

Starting up ORACLE RDBMS Version:10.2.0.5.0.

System parameters with non-defaultvalues:

 large_pool_size          =12582912

 instance_type            = asm

 remote_login_passwordfile= SHARED

 background_dump_dest     =/oracle/admin/+ASM/bdump

 user_dump_dest           = /oracle/admin/+ASM/udump

 core_dump_dest           =/oracle/admin/+ASM/cdump

  asm_diskstring           = /dev/emc*

PMON started with pid=2, OS id=31153

 

 

---------查看os 层面的磁盘信息是否存在?

[root@localhost dev]# ls -l em*

crw------- 1 root root  10, 59 Oct 30 22:56 emcpower

brw-r----- 1 root disk 120,  0 Oct 30 22:56 emcpowera

brw-r----- 1 root disk 120,  1 Oct 30 22:56 emcpowera1

brw-r----- 1 root disk 120, 16 Oct 3022:56 emcpowerb

brw-r----- 1 root disk 120, 17 Oct 3022:56 emcpowerb1

brw-r----- 1 root disk 120, 32 Oct 3022:56 emcpowerc

brw-r----- 1 root disk 120, 33 Oct 3022:56 emcpowerc1

brw-r----- 1 root disk 120, 48 Oct 3022:56 emcpowerd

brw-r----- 1 root disk 120, 49 Oct 3022:56 emcpowerd1

brw-r----- 1 root disk 120, 64 Oct 3022:56 emcpowere

brw-r----- 1 root disk 120, 65 Oct 3022:56 emcpowere1

这里的权限貌似有问题啊!!!!!!


---但是根据mos 里的文档显示:

#/etc/init.d/oracleasm createdisk ASMDISK1  /dev/mapper/mpath0p1

在使用oracleasm 标示asmdisk 用的是root 用户,也就表明原始盘的权限是什么对于映射后的asmdisk没有任何影响。

 

----当前asm 实例中配置的参数信息:

SQL> show parameter disk

NAME                                 TYPE        VALUE

------------------------------------                   -----------    ------------------------------

asm_diskgroups                       string

asm_diskstring                        string     /dev/emc*

貌似这里的asm_diskstring 设置也有点问题!!!

 

按照官方正规应该设置成这样

alter system setASM_DISKSTRING = 'ORCL:*'  scope=BOTH;

 

而且由于磁盘是asmlib 管理官方的建盘建议是这样的:

create diskgroupDATA external redundancy disk 'ORCL:DISK1';

 

----使用kfod 检测asm_diskstring 参数设置是否正确(即asm 实例可以认到asm 磁盘)

使用系统原来的参数检查:

[localhost.localdomain]lixora$kfod asm_diskstring='/dev/emc*' disk=all

--------------------------------------------------------------------------------

ORACLE_SID ORACLE_HOME                                                         

================================================================================

     +ASM /oracle/product/10.2.0/db_1       

 

----任意使用一个错误的asm_disksting参数验证:                             

[localhost.localdomain]lixora$kfod asm_diskstring='/dev/oracleasm/*'disk=all    ---------kfod 是oracle 用户执行,oracle asm 实例也是oracle 用户

--------------------------------------------------------------------------------

ORACLE_SID ORACLE_HOME                                                         

================================================================================

     +ASM /oracle/product/10.2.0/db_1  

 

----使用asm_diskstring='/dev/oracleasm/disks/*'进行验证,这是是kfod 读了磁盘头才判读出来这个目录下的磁盘是asm 的磁盘

[localhost.localdomain]lixora$kfod asm_diskstring='/dev/oracleasm/disks/*'disk=all

--------------------------------------------------------------------------------

 Disk         Size Path     

================================================================================

  1:       6138 Mb/dev/oracleasm/disks/LOG1

  2:       6138 Mb/dev/oracleasm/disks/LOG2

  3:       6138 Mb/dev/oracleasm/disks/LOG3

  4:     204797 Mb/dev/oracleasm/disks/VOL1

  5:     204797 Mb/dev/oracleasm/disks/VOL2

--------------------------------------------------------------------------------

ORACLE_SID ORACLE_HOME                                                         

================================================================================

     +ASM /oracle/product/10.2.0/db_1

 

----使用缺省命令检测:这里默认读的是asmlib 原始配置库中的信息;

[localhost.localdomain]+ASM$kfoddisk=all

--------------------------------------------------------------------------------

 Disk         Size Path     

================================================================================

  1:       6138 Mb ORCL:LOG1

  2:       6138 Mb ORCL:LOG2

  3:       6138 Mb ORCL:LOG3

  4:     204797 Mb ORCL:VOL1

  5:     204797 Mb ORCL:VOL2

--------------------------------------------------------------------------------

ORACLE_SID ORACLE_HOME                                                         

================================================================================

     +ASM /oracle/product/10.2.0/db_1

 

到这里基本可以得出问题的结论和解决方案了:

主要是这2个原因:

  1. 不当的ASM_DISKSTRING参数设置

2.   不当的ASM DISK权限设置

但是由于在asm 实例中设置ASM_DISKSTRING='/dev/emc*' 后使用 kfod 去做检测时无法认盘,那么也就表明asm 实例无法读取这些disk;且在asm 历史alert 中显示以前的建dg命令是:

CREATE DISKGROUP REDODG External REDUNDANCY  DISK

'/dev/emcpowera1'SIZE 6138M ,'/dev/emcpowerd1' SIZE 6138M ,'/dev/emcpowere1' SIZE 6138M

这里就表明当初asm 实例对这些'/dev/emcpower*‘ 裸盘有访问权限,否者不可能在create diskgroup时可以成功,因为他没有使用disk 'ORCL:DISK1' 这种方式;且在系统层面没有发现udev 等的配置文件中记录关于这些盘的权限配置信息;Rc.local 中业务相关的权限配置信息;当系统重启后'/dev/emcpower*‘磁盘的权限信息丢失,而盘符等信息没变是由于emc 的多路径做了管理。

 

初步给了2个解决方法:

1》考虑到以前的create disk 的命令,实际采用了方法1

SQL> alter system setasm_diskstring='/dev/oracleasm/disks/*';

 

2'/dev/emcpower* 盘的权限改为 oracleoinstall

(这不是很合理,下次重启后如果没有提前配置好权限修改信息,今天的问题将重演)

 

在设置好alter system setasm_diskstring='/dev/oracleasm/disks/*'; asm 依然无法挂载diskgroup

SQL> startup mount

ASM instance started

Total System Global Area  130023424 bytes

Fixed Size                  2094544 bytes

Variable Size             102763056 bytes

ASM Cache                  25165824 bytes

ORA-15110: no diskgroups mounted

 -------没有diskgroup mount 奇怪了

 

SQL> ALTER DISKGROUP ALL MOUNT ;

ALTER DISKGROUP ALL MOUNT

*

ERROR at line 1:

ORA-15110: no diskgroups mounted

 

SQL> show parameter name

NAME                                 TYPE        VALUE

----------------------------------------------- ------------------------------

instance_name                        string      +ASM

SQL> show parameter asm

NAME                                 TYPE        VALUE

----------------------------------------------- ------------------------------

asm_diskgroups                       string

asm_diskstring                       string      /dev/oracleasm/disks/*

asm_power_limit                      integer    1

---- asm_diskgroups 竟然被置空了

----尝试手动mount diskgroup

SQL> alter diskgroup REDODGmount;            

Diskgroup altered. 

SQL> alter diskgroup ORADATA mount;

Diskgroup altered.

到这里问题全部结束,数据库可以成功打开了。

 

 待处理的一个疑惑?????

这里有个疑问为啥asm_diskgroup 在会丢失????

 

手动修改asm_diskstring 参数前后alert日志分析:

修改前日志:

System parameters with non-defaultvalues:

 large_pool_size          =12582912

 instance_type            = asm

 remote_login_passwordfile= SHARED

 background_dump_dest     =/oracle/admin/+ASM/bdump

 user_dump_dest           =/oracle/admin/+ASM/udump

 core_dump_dest           =/oracle/admin/+ASM/cdump

  asm_diskstring           =/dev/emc*

 asm_diskgroups           =ORADATA, REDODG

PMON started with pid=2, OS id=11104

PSP0 started with pid=3, OS id=11106

MMAN started with pid=4, OS id=11108

DBW0 started with pid=5, OS id=11110

LGWR started with pid=6, OS id=11112

CKPT started with pid=7, OS id=11114

SMON started with pid=8, OS id=11116

RBAL started with pid=9, OS id=11118

GMON started with pid=10, OS id=11120

Fri Oct 30 23:15:12 CST 2015

SQL> ALTER DISKGROUP ALL MOUNT

Fri Oct 30 23:15:12 CST 2015

NOTE: cache registered group ORADATAnumber=1 incarn=0xe7a9ee58

NOTE: cache registered group REDODGnumber=2 incarn=0xe7b9ee59

Fri Oct 30 23:15:12 CST 2015

Loaded ASM Library - Generic Linux,version 2.0.4 (KABI_V2) library for asmlib interface

Fri Oct 30 23:15:12 CST 2015

ERROR: no PST quorum in group 1:required 2, found 0

 

修改后asm_diskstring参数,打算重启下asm 实例让其可以自动mount 所有的diskgroup

但是实际情况不是想象的那样,重启实例后发现 asm_diskgroups  的信息丢失了,

难道是因为我重启asm 实例的原因么?因为这个参数改的是spfile 的信息,我怀疑

asm_diskstring  asm_diskgroups  2个参数有级联关系,当asm 实例检测到asm_diskstring变化时,asm 实例会读取asm_diskstring 下磁盘的信息来设置asm_diskgroup 这个参数

 

 oracle reference 中关于这个参数的解释:

ASM_DISKGROUPS

Property

Description

Parameter type

String

Syntax

ASM_DISKGROUPS = diskgroup [, diskgroup ] ...

Default value

There is no default value.

Modifiable

ALTER SYSTEM

Range of values

Comma-separated list of strings, up to 30 characters

Oracle RAC

Multiple instances can have different values.

 

Note:

Thisparameter may only be specified in an Automatic Storage Management instance.

ASM_DISKGROUPS specifies a list of names of disk groups to be mounted by anAutomatic Storage Management instance at instance startup. Oracle ignores thevalue that you set for ASM_DISKGROUPS when you specify the NOMOUNT option at startup or when a ALTERDISKGROUP ALL MOUNT statement isissued.

Automatic Storage Management (ASM)automatically adds a disk group to this parameter when the disk group issuccessfully created or mounted,and automatically removes a disk group from this parameter when the disk group isdropped or dismounted.

Issuing the ALTERDISKGROUP...ALL MOUNT or ALTERDISKGROUP...ALL DISMOUNT command doesnot affect the value of this parameter.

上述标黄段基本证实了我的想法,当磁盘组的信息无法在可识别的asm disk 中获取时(medata),asm 实例自动移除原来asm_diskgroup 的信息;

难道 oracle asmlib 会额外记录asm diskgroup 的信息。

 

LICENSE_MAX_USERS = 0

SYS auditing is disabled

ksdpec: called for event 13740 prior toevent group initialization

Starting up ORACLE RDBMS Version:10.2.0.5.0.

System parameters with non-defaultvalues:

 large_pool_size          =12582912

  instance_type            = asm

 remote_login_passwordfile= SHARED

 background_dump_dest     =/oracle/admin/+ASM/bdump

 user_dump_dest           =/oracle/admin/+ASM/udump

 core_dump_dest           =/oracle/admin/+ASM/cdump

  asm_diskstring           = /dev/oracleasm/disks/*

 asm_diskgroups           =

PMON started with pid=2, OS id=11520

PSP0 started with pid=3, OS id=11522

MMAN started with pid=4, OS id=11524

DBW0 started with pid=5, OS id=11526

LGWR started with pid=6, OS id=11528

CKPT started with pid=7, OS id=11530

SMON started with pid=8, OS id=11532

RBAL started with pid=9, OS id=11534

GMON started with pid=10, OS id=11536

Fri Oct 30 23:48:04 CST 2015

SQL>ALTER DISKGROUP ALL MOUNT  -----这里为啥不能mount 起来所有 disk_group 不是很好理解,

只能理解为asm 实例管理还是有bug 么?,不够强大?不够健壮?

Fri Oct 30 23:48:16 CST 2015

Shutting down instance (immediate)

Fri Oct 30 23:48:16 CST 2015

Shutting down instance: further logonsdisabled

License high water mark = 1

Fri Oct 30 23:48:16 CST 2015

SQL> ALTER DISKGROUP ALL DISMOUNT

Fri Oct 30 23:48:23 CST 2015

Starting ORACLE instance (normal)





  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值