db 10205
os rhel 5.x x86_64
在最近一次服务器升级时,重启机器后,oracle asm instance 无法正常启动;
SQL> startup
ASM instance started
Total System Global Area 130023424 bytes
Fixed Size 2094544 bytes
Variable Size 102763056 bytes
ASM Cache 25165824 bytes
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficientnumber of disks for diskgroup"REDODG"
ORA-15063: ASM discovered aninsufficient number of disks for diskgroup"ORADATA"
Oradata和redodg 无法mount ,原因是diskgroup的 memberdisk 丢失!!!!
ASM alert 日志信息:
Fri Oct 30 22:58:39 CST 2015
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameterdefault value as /oracle/product/10.2.0/db_1/dbs/arch
Autotune of undo retention is turnedoff.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior toevent group initialization
Starting up ORACLE RDBMS Version:10.2.0.5.0.
System parameters with non-defaultvalues:
large_pool_size =12582912
instance_type = asm
remote_login_passwordfile= SHARED
background_dump_dest =/oracle/admin/+ASM/bdump
user_dump_dest =/oracle/admin/+ASM/udump
core_dump_dest =/oracle/admin/+ASM/cdump
asm_diskstring =/dev/emc*
asm_diskgroups = ORADATA, REDODG
PSP0 started with pid=3, OS id=7999
PMON started with pid=2, OS id=7997
MMAN started with pid=4, OS id=8001
DBW0 started with pid=5, OS id=8003
LGWR started with pid=6, OS id=8005
CKPT started with pid=7, OS id=8007
SMON started with pid=8, OS id=8009
RBAL started with pid=9, OS id=8011
GMON started with pid=10, OS id=8013
Fri Oct 30 22:58:45 CST 2015
SQL> ALTER DISKGROUP ALL MOUNT
Fri Oct 30 22:58:45 CST 2015
NOTE: cache registered group ORADATAnumber=1 incarn=0xcd286ba1
NOTE: cache registered group REDODGnumber=2 incarn=0xcd786ba2
Fri Oct 30 22:58:45 CST 2015
Loaded ASM Library - Generic Linux,version 2.0.4 (KABI_V2) library for asmlib interface
Fri Oct 30 22:58:45 CST 2015
ERROR:no PST quorum in group 1: required 2, found 0----------------
FriOct 30 22:58:45 CST 2015
NOTE:cache dismounting group 1/0xCD286BA1 (ORADATA)
NOTE:dbwr not being msg'd to dismount
ERROR:diskgroup ORADATA was not mounted
FriOct 30 22:58:45 CST 2015
ERROR:no PST quorum in group 2: required 2, found 0
FriOct 30 22:58:45 CST 2015
NOTE:cache dismounting group 2/0xCD786BA2 (REDODG)
NOTE:dbwr not being msg'd to dismount
ERROR:diskgroup REDODG was not mounted
Fri Oct 30 23:08:51 CST 2015
Shutting down instance (immediate)
Fri Oct 30 23:08:51 CST 2015
Shutting down instance: further logonsdisabled
这行报错信息比较可以
ERROR:no PST quorum in group 2: required 2, found 0
该PST quorum丢失的问题常由以下几个原因导致:
1. ASM DISK丢失
2. ASM DISK corrupted损坏
3. 部分ASM DISK的AUN=1 PST部分损坏,或者被数据不完整
4. 不当的ASM_DISKSTRING参数设置
5. 不当的ASM DISK权限设置
经过分析asm alert 日志的历史信息,分析相应磁盘的路径信息:
1)Datagroup 对应的磁盘信息
Wed Aug 01 16:10:49 CST 2012
SQL> CREATE DISKGROUP REDODG ExternalREDUNDANCY DISK
'/dev/emcpowera1'SIZE 6138M ,
'/dev/emcpowerd1'SIZE 6138M ,
'/dev/emcpowere1'SIZE 6138M
Wed Aug 01 16:10:00 CST 2012
SQL> CREATE DISKGROUP ORADATA ExternalREDUNDANCY DISK '
/dev/emcpowerb1'SIZE 204797M ,
'/dev/emcpowerc1'SIZE 204797M
------- asm_diskstring 的历史配置信息
SYS auditing is disabled
ksdpec: called for event 13740 prior toevent group initialization
Starting up ORACLE RDBMS Version:10.2.0.5.0.
System parameters with non-defaultvalues:
large_pool_size =12582912
instance_type = asm
remote_login_passwordfile= SHARED
background_dump_dest =/oracle/admin/+ASM/bdump
user_dump_dest = /oracle/admin/+ASM/udump
core_dump_dest =/oracle/admin/+ASM/cdump
asm_diskstring = /dev/emc*
PMON started with pid=2, OS id=31153
---------查看os 层面的磁盘信息是否存在?
[root@localhost dev]# ls -l em*
crw------- 1 root root 10, 59 Oct 30 22:56 emcpower
brw-r----- 1 root disk 120, 0 Oct 30 22:56 emcpowera
brw-r----- 1 root disk 120, 1 Oct 30 22:56 emcpowera1
brw-r----- 1 root disk 120, 16 Oct 3022:56 emcpowerb
brw-r----- 1 root disk 120, 17 Oct 3022:56 emcpowerb1
brw-r----- 1 root disk 120, 32 Oct 3022:56 emcpowerc
brw-r----- 1 root disk 120, 33 Oct 3022:56 emcpowerc1
brw-r----- 1 root disk 120, 48 Oct 3022:56 emcpowerd
brw-r----- 1 root disk 120, 49 Oct 3022:56 emcpowerd1
brw-r----- 1 root disk 120, 64 Oct 3022:56 emcpowere
brw-r----- 1 root disk 120, 65 Oct 3022:56 emcpowere1
这里的权限貌似有问题啊!!!!!!
---但是根据mos 里的文档显示:
#/etc/init.d/oracleasm createdisk ASMDISK1 /dev/mapper/mpath0p1
在使用oracleasm 标示asmdisk 时用的是root 用户,也就表明原始盘的权限是什么对于映射后的asmdisk没有任何影响。
----当前asm 实例中配置的参数信息:
SQL> show parameter disk
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups string
asm_diskstring string /dev/emc*
貌似这里的asm_diskstring 设置也有点问题!!!
按照官方正规应该设置成这样
alter system setASM_DISKSTRING = 'ORCL:*' scope=BOTH;
而且由于磁盘是asmlib 管理官方的建盘建议是这样的:
create diskgroupDATA external redundancy disk 'ORCL:DISK1';
----使用kfod 检测asm_diskstring 参数设置是否正确(即asm 实例可以认到asm 磁盘)
使用系统原来的参数检查:
[localhost.localdomain]lixora$kfod asm_diskstring='/dev/emc*' disk=all
--------------------------------------------------------------------------------
ORACLE_SID ORACLE_HOME
================================================================================
+ASM /oracle/product/10.2.0/db_1
----任意使用一个错误的asm_disksting参数验证:
[localhost.localdomain]lixora$kfod asm_diskstring='/dev/oracleasm/*'disk=all ---------kfod 是oracle 用户执行,oracle asm 实例也是oracle 用户
--------------------------------------------------------------------------------
ORACLE_SID ORACLE_HOME
================================================================================
+ASM /oracle/product/10.2.0/db_1
----使用asm_diskstring='/dev/oracleasm/disks/*'进行验证,这是是kfod 读了磁盘头才判读出来这个目录下的磁盘是asm 的磁盘
[localhost.localdomain]lixora$kfod asm_diskstring='/dev/oracleasm/disks/*'disk=all
--------------------------------------------------------------------------------
Disk Size Path
================================================================================
1: 6138 Mb/dev/oracleasm/disks/LOG1
2: 6138 Mb/dev/oracleasm/disks/LOG2
3: 6138 Mb/dev/oracleasm/disks/LOG3
4: 204797 Mb/dev/oracleasm/disks/VOL1
5: 204797 Mb/dev/oracleasm/disks/VOL2
--------------------------------------------------------------------------------
ORACLE_SID ORACLE_HOME
================================================================================
+ASM /oracle/product/10.2.0/db_1
----使用缺省命令检测:这里默认读的是asmlib 原始配置库中的信息;
[localhost.localdomain]+ASM$kfoddisk=all
--------------------------------------------------------------------------------
Disk Size Path
================================================================================
1: 6138 Mb ORCL:LOG1
2: 6138 Mb ORCL:LOG2
3: 6138 Mb ORCL:LOG3
4: 204797 Mb ORCL:VOL1
5: 204797 Mb ORCL:VOL2
--------------------------------------------------------------------------------
ORACLE_SID ORACLE_HOME
================================================================================
+ASM /oracle/product/10.2.0/db_1
到这里基本可以得出问题的结论和解决方案了:
主要是这2个原因:
- 不当的ASM_DISKSTRING参数设置
2. 不当的ASM DISK权限设置
但是由于在asm 实例中设置ASM_DISKSTRING='/dev/emc*' 后使用 kfod 去做检测时无法认盘,那么也就表明asm 实例无法读取这些disk;且在asm 历史alert 中显示以前的建dg命令是:
CREATE DISKGROUP REDODG External REDUNDANCY DISK
'/dev/emcpowera1'SIZE 6138M ,'/dev/emcpowerd1' SIZE 6138M ,'/dev/emcpowere1' SIZE 6138M
这里就表明当初asm 实例对这些'/dev/emcpower*‘ 裸盘有访问权限,否者不可能在create diskgroup时可以成功,因为他没有使用disk 'ORCL:DISK1' 这种方式;且在系统层面没有发现udev 等的配置文件中记录关于这些盘的权限配置信息;Rc.local 中业务相关的权限配置信息;当系统重启后'/dev/emcpower*‘磁盘的权限信息丢失,而盘符等信息没变是由于emc 的多路径做了管理。
初步给了2个解决方法:
1》考虑到以前的create disk 的命令,实际采用了方法1;
SQL> alter system setasm_diskstring='/dev/oracleasm/disks/*';
2》把'/dev/emcpower* 盘的权限改为 oracle:oinstall
(这不是很合理,下次重启后如果没有提前配置好权限修改信息,今天的问题将重演)
在设置好alter system setasm_diskstring='/dev/oracleasm/disks/*'; 后asm 依然无法挂载diskgroup
SQL> startup mount
ASM instance started
Total System Global Area 130023424 bytes
Fixed Size 2094544 bytes
Variable Size 102763056 bytes
ASM Cache 25165824 bytes
ORA-15110: no diskgroups mounted
-------没有diskgroup mount 奇怪了
SQL> ALTER DISKGROUP ALL MOUNT ;
ALTER DISKGROUP ALL MOUNT
*
ERROR at line 1:
ORA-15110: no diskgroups mounted
SQL> show parameter name
NAME TYPE VALUE
----------------------------------------------- ------------------------------
instance_name string +ASM
SQL> show parameter asm
NAME TYPE VALUE
----------------------------------------------- ------------------------------
asm_diskgroups string
asm_diskstring string /dev/oracleasm/disks/*
asm_power_limit integer 1
---- asm_diskgroups 竟然被置空了
----尝试手动mount diskgroup
SQL> alter diskgroup REDODGmount;
Diskgroup altered.
SQL> alter diskgroup ORADATA mount;
Diskgroup altered.
到这里问题全部结束,数据库可以成功打开了。
待处理的一个疑惑?????
这里有个疑问为啥asm_diskgroup 在会丢失????
手动修改asm_diskstring 参数前后alert日志分析:
修改前日志:
System parameters with non-defaultvalues:
large_pool_size =12582912
instance_type = asm
remote_login_passwordfile= SHARED
background_dump_dest =/oracle/admin/+ASM/bdump
user_dump_dest =/oracle/admin/+ASM/udump
core_dump_dest =/oracle/admin/+ASM/cdump
asm_diskstring =/dev/emc*
asm_diskgroups =ORADATA, REDODG
PMON started with pid=2, OS id=11104
PSP0 started with pid=3, OS id=11106
MMAN started with pid=4, OS id=11108
DBW0 started with pid=5, OS id=11110
LGWR started with pid=6, OS id=11112
CKPT started with pid=7, OS id=11114
SMON started with pid=8, OS id=11116
RBAL started with pid=9, OS id=11118
GMON started with pid=10, OS id=11120
Fri Oct 30 23:15:12 CST 2015
SQL> ALTER DISKGROUP ALL MOUNT
Fri Oct 30 23:15:12 CST 2015
NOTE: cache registered group ORADATAnumber=1 incarn=0xe7a9ee58
NOTE: cache registered group REDODGnumber=2 incarn=0xe7b9ee59
Fri Oct 30 23:15:12 CST 2015
Loaded ASM Library - Generic Linux,version 2.0.4 (KABI_V2) library for asmlib interface
Fri Oct 30 23:15:12 CST 2015
ERROR: no PST quorum in group 1:required 2, found 0
修改后asm_diskstring参数,打算重启下asm 实例让其可以自动mount 所有的diskgroup
但是实际情况不是想象的那样,重启实例后发现 asm_diskgroups 的信息丢失了,
难道是因为我重启asm 实例的原因么?因为这个参数改的是spfile 的信息,我怀疑
asm_diskstring asm_diskgroups 这2个参数有级联关系,当asm 实例检测到asm_diskstring变化时,asm 实例会读取asm_diskstring 下磁盘的信息来设置asm_diskgroup 这个参数
oracle reference 中关于这个参数的解释:
Property | Description |
Parameter type | String |
Syntax |
|
Default value | There is no default value. |
Modifiable |
|
Range of values | Comma-separated list of strings, up to 30 characters |
Oracle RAC | Multiple instances can have different values. |
Note:
Thisparameter may only be specified in an Automatic Storage Management instance.
ASM_DISKGROUPS
specifies a list of names of disk groups to be mounted by anAutomatic Storage Management instance at instance startup. Oracle ignores thevalue that you set for ASM_DISKGROUPS
when you specify the NOMOUNT
option at startup or when a ALTERDISKGROUP ALL MOUNT
statement isissued.
Automatic Storage Management (ASM)automatically adds a disk group to this parameter when the disk group issuccessfully created or mounted,and automatically removes a disk group from this parameter when the disk group isdropped or dismounted.
Issuing the ALTERDISKGROUP...ALL MOUNT
or ALTERDISKGROUP...ALL DISMOUNT
command doesnot affect the value of this parameter.
上述标黄段基本证实了我的想法,当磁盘组的信息无法在可识别的asm disk 中获取时(medata),asm 实例自动移除原来asm_diskgroup 的信息;
难道 oracle asmlib 会额外记录asm diskgroup 的信息。
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior toevent group initialization
Starting up ORACLE RDBMS Version:10.2.0.5.0.
System parameters with non-defaultvalues:
large_pool_size =12582912
instance_type = asm
remote_login_passwordfile= SHARED
background_dump_dest =/oracle/admin/+ASM/bdump
user_dump_dest =/oracle/admin/+ASM/udump
core_dump_dest =/oracle/admin/+ASM/cdump
asm_diskstring = /dev/oracleasm/disks/*
asm_diskgroups =
PMON started with pid=2, OS id=11520
PSP0 started with pid=3, OS id=11522
MMAN started with pid=4, OS id=11524
DBW0 started with pid=5, OS id=11526
LGWR started with pid=6, OS id=11528
CKPT started with pid=7, OS id=11530
SMON started with pid=8, OS id=11532
RBAL started with pid=9, OS id=11534
GMON started with pid=10, OS id=11536
Fri Oct 30 23:48:04 CST 2015
SQL>ALTER DISKGROUP ALL MOUNT -----这里为啥不能mount 起来所有 disk_group 不是很好理解,
只能理解为asm 实例管理还是有bug 么?,不够强大?不够健壮?
Fri Oct 30 23:48:16 CST 2015
Shutting down instance (immediate)
Fri Oct 30 23:48:16 CST 2015
Shutting down instance: further logonsdisabled
License high water mark = 1
Fri Oct 30 23:48:16 CST 2015
SQL> ALTER DISKGROUP ALL DISMOUNT
Fri Oct 30 23:48:23 CST 2015
Starting ORACLE instance (normal)