客户安装了一个Oracle 11.2.0.2 for Linux X64环境,使用了ASM方式,重启后发现数据库没有启动。
检查操作系统进程,发现ASM已经启动,CLUSTER环境也启动完成,但是数据库并未启动,尝试手工启动数据库,报错找不到对应的SPFILE。
登录ASM实例,查询发现找不到任何的ASM磁盘组信息:
[grid@rptdb trace]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.2.0 Production on Tue Dec 20 18:41:41 2011
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Automatic Storage Management option
SQL> select * from v$asm_diskgroup;
no rows selected
查询V$ASM_DISK可以看到磁盘信息,但是磁盘头的头状态是UNKNOWN:
SQL> SELECT GROUP_NUMBER, DISK_NUMBER, MOUNT_STATUS, HEADER_STATUS, PATH FROM V$ASM_DISK;
GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU PATH
------------ ----------- ------- ------------ ----------------------------------
0 1 CLOSED UNKNOWN ORCL:FRAVOL
0 0 CLOSED UNKNOWN ORCL:DATAVOL
显然Oracle并没有去加载磁盘组,利用当前ASM的SPFILE生成PFILE后发现,只有最基本的参数:
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/grid'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
连ASM_DISKGROUPS参数都没有,难怪查询V$ASM_DISKGROUP找不到任何数据,添加下面的参数:
asm_diskgroups='DATA','FRA'
asm_diskstring='ORCL:*VOL'
重启ASM实例,在加载磁盘组时报错:
[grid@rptdb ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.2.0 Production on Tue Dec 20 18:25:03 2011
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Automatic Storage Management option
SQL> shutdown immediate
ORA-15100: invalid or missing diskgroup name
ASM instance shutdown
SQL> startup pfile=/home/grid/init+ASM.ora
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2225792 bytes
Variable Size 256539008 bytes
ASM Cache 25165824 bytes
ORA-15032: not all alterations performed
ORA-15017: diskgroup "FRA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "FRA"
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
导致这个错误的原因有很多,比如权限,多路径设置,存储设置等等,检查告警日志寻找进一步的信息:
Tue Dec 20 18:25:47 2011
* instance_number obtained from CSS = 1, checking for the existence of node 0...
* node 0 does not exist. instance_number = 1
Starting ORACLE instance (normal)
****************** Huge Pages Information *****************
Huge Pages memory pool detected (total: 42000 free: 42000)
DFLT Huge Pages allocation successful (allocated: 0)
***********************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/grid/product/11.2.0/grid/dbs/arch
Autotune of undo retention is turned on.
IMODE=BR
ILAT =0
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Automatic Storage Management option.
Using parameter settings in client-side pfile /home/grid/init+ASM.ora on machine rptdb
System parameters with non-default values:
large_pool_size = 12M
instance_type = "asm"
remote_login_passwordfile= "EXCLUSIVE"
asm_diskstring = "ORCL:*VOL"
asm_diskgroups = "DATA"
asm_diskgroups = "FRA"
asm_power_limit = 1
diagnostic_dest = "/u01/app/grid"
Tue Dec 20 18:25:48 2011
PMON started with pid=2, OS id=14718
Tue Dec 20 18:25:48 2011
PSP0 started with pid=3, OS id=14722
Tue Dec 20 18:25:49 2011
VKTM started with pid=4, OS id=14726 at elevated priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Tue Dec 20 18:25:49 2011
GEN0 started with pid=5, OS id=14732
Tue Dec 20 18:25:49 2011
DIAG started with pid=6, OS id=14736
Tue Dec 20 18:25:49 2011
DIA0 started with pid=7, OS id=14740
Tue Dec 20 18:25:49 2011
MMAN started with pid=8, OS id=14744
Tue Dec 20 18:25:49 2011
DBW0 started with pid=9, OS id=14748
Tue Dec 20 18:25:49 2011
LGWR started with pid=10, OS id=14752
Tue Dec 20 18:25:49 2011
CKPT started with pid=11, OS id=14756
Tue Dec 20 18:25:49 2011
SMON started with pid=12, OS id=14760
Tue Dec 20 18:25:49 2011
RBAL started with pid=13, OS id=14764
Tue Dec 20 18:25:49 2011
GMON started with pid=14, OS id=14768
Tue Dec 20 18:25:49 2011
MMON started with pid=15, OS id=14772
Tue Dec 20 18:25:49 2011
MMNL started with pid=16, OS id=14776
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Tue Dec 20 18:25:49 2011
SQL> ALTER DISKGROUP ALL MOUNT
NOTE: Diskgroups listed in ASM_DISKGROUPS are
DATA
FRA
NOTE: cache registered group DATA number=1 incarn=0x157c40a1
NOTE: cache began mount (first) of group DATA number=1 incarn=0x157c40a1
NOTE: cache registered group FRA number=2 incarn=0x158c40a2
NOTE: cache began mount (first) of group FRA number=2 incarn=0x158c40a2
NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
ORA-15186: ASMLIB error function = [asm_open(global)], error = [1], mesg = [Operation not permitted]
ORA-15025: could not open disk "ORCL:DATAVOL"
ORA-15186: ASMLIB error function = [asm_open(global)], error = [1], mesg = [Operation not permitted]
ORA-15025: could not open disk "ORCL:FRAVOL"
ERROR: no PST quorum in group: required 2, found 0
NOTE: cache dismounting (clean) group 1/0x157C40A1 (DATA)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x157C40A1 (DATA)
NOTE: cache ending mount (fail) of group DATA number=1 incarn=0x157c40a1
NOTE: cache deleting context for group DATA 1/0x157c40a1
GMON dismounting group 1 at 2 for pid 17, osid 14779
ERROR: diskgroup DATA was not mounted
ERROR: no PST quorum in group: required 2, found 0
NOTE: cache dismounting (clean) group 2/0x158C40A2 (FRA)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 2/0x158C40A2 (FRA)
NOTE: cache ending mount (fail) of group FRA number=2 incarn=0x158c40a2
NOTE: cache deleting context for group FRA 2/0x158c40a2
GMON dismounting group 2 at 4 for pid 17, osid 14779
ERROR: diskgroup FRA was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "FRA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "FRA"
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
ERROR: ALTER DISKGROUP ALL MOUNT
这里发现了ORA-15186和ORA-15025错误。而ORA-15186错误发生在ASMLIB调用上,根据这个错误更容易定位到问题的原因。
在MOS上,有专门的文章描述这个问题:Mount ASM Disk Group Fails : ORA-15186, ORA-15025, ORA-15063 [ID 1384504.1],根据问题描述,导致这个错误的原因是多路径的配置存在问题:
[grid@rptdb ~]$ /etc/init.d/oracleasm listdisks
DATAVOL
FRAVOL
[grid@rptdb ~]$ cat /proc/partitions
major minor #blocks name
8 0 285155328 sda
8 1 104391 sda1
8 2 285049327 sda2
8 16 1073741824 sdb
8 17 322119283 sdb1
8 18 429497775 sdb2
8 19 322119315 sdb3
8 32 1073741824 sdc
8 33 322119283 sdc1
8 34 429497775 sdc2
8 35 322119315 sdc3
8 48 1073741824 sdd
8 49 322119283 sdd1
8 50 429497775 sdd2
8 51 322119315 sdd3
8 64 1073741824 sde
8 65 322119283 sde1
8 66 429497775 sde2
8 67 322119315 sde3
253 0 150896640 dm-0
253 1 134119424 dm-1
253 2 1073741824 dm-2
253 3 322119283 dm-3
253 4 429497775 dm-4
253 5 322119315 dm-5
[grid@rptdb ~]$ ls -l /dev/oracleasm/disks
total 0
brw-rw---- 1 grid oinstall 8, 18 Dec 20 17:42 DATAVOL
brw-rw---- 1 grid oinstall 8, 19 Dec 20 17:42 FRAVOL
可以看到,DATAVOL和FRAVOL没有对应到多路径设备dm-n上,而是对应到了sdb2和sdb3上。
修改/etc/sysconfig/oracleasm文件,将ORACLEASM_SCANORDER参数和ORACLEASM_SCANEXCLUDE修改如下:
ORACLEASM_SCANORDER="mpath dm"
ORACLEASM_SCANEXCLUDE="sd"
修改后重启服务器,由于ASM中的SPFILE还是存在问题的,所以先关闭,然后重新启动:
[grid@rptdb ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.2.0 Production on Tue Dec 20 19:05:33 2011
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected.
SQL> shutdown abort
ASM instance shutdown
SQL> startup pfile=/home/grid/init+ASM.ora
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2225792 bytes
Variable Size 256539008 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> create spfile='+DATA' from pfile='/home/grid/init+ASM.ora';
File created.
磁盘组已经顺利启动,创建一个SPFILE文件,确保下次ASM自动启动可以加载磁盘组。切换到Oracle用户打开数据库:
[root@rptdb ~]# su - oracle
[oracle@rptdb ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.2.0 Production on Tue Dec 20 19:06:27 2011
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 6.4939E+10 bytes
Fixed Size 2242560 bytes
Variable Size 3.0467E+10 bytes
Database Buffers 3.4360E+10 bytes
Redo Buffers 109195264 bytes
Database mounted.
Database opened.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
数据库顺利打开,最后检查一下ASM磁盘的设置:
[oracle@rptdb ~]$ ls -l /dev/oracleasm/disks
total 0
brw-rw---- 1 oracle oinstall 253, 4 Dec 20 19:00 DATAVOL
brw-rw---- 1 oracle oinstall 253, 5 Dec 20 19:00 FRAVOL
[oracle@rptdb ~]$ cat /proc/partitions
major minor #blocks name
8 0 285155328 sda
8 1 104391 sda1
8 2 285049327 sda2
8 16 1073741824 sdb
8 17 322119283 sdb1
8 18 429497775 sdb2
8 19 322119315 sdb3
8 32 1073741824 sdc
8 33 322119283 sdc1
8 34 429497775 sdc2
8 35 322119315 sdc3
8 48 1073741824 sdd
8 49 322119283 sdd1
8 50 429497775 sdd2
8 51 322119315 sdd3
8 64 1073741824 sde
8 65 322119283 sde1
8 66 429497775 sde2
8 67 322119315 sde3
253 0 150896640 dm-0
253 1 134119424 dm-1
253 2 1073741824 dm-2
253 3 322119283 dm-3
253 4 429497775 dm-4
253 5 322119315 dm-5
调整oracleasm的配置后,多路径的配置恢复正常。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/4227/viewspace-715173/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/4227/viewspace-715173/