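1. Environment
A batch of databases was about to go live. They had been installed as 11.2.0.3 and patched with a PSU to 11.2.0.3.7, but the latest PSU for this release had since reached 11.2.0.3.11. To avoid downtime for patching after go-live (for example, when a security scan flags the old patch level), we applied the latest PSU, 11.2.0.3.11 (Patch ID: 18522512), before going live.
Blog: http://blog.csdn.net/hw_libo/article/details/39672901
2. Alert log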
Mon Sep 29 14:50:23 2014
ALTER DATABASE MOUNT
Mon Sep 29 14:50:26 2014
Sweep [inc][280114]: completed
Sweep [inc][280113]: completed
Sweep [inc2][280114]: completed
Sweep [inc2][280113]: completed
NOTE: Loaded library: System
ORA-15025: could not open disk "/dev/diskgroup/dg_ora"
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
SUCCESS: diskgroup DG_ORA was mounted
ERROR: failed to establish dependency between database NDADB and diskgroup resource ora.DG_ORA.dg
Errors in file /opt/oracle/diag/rdbms/ndadb/NDADB/trace/NDADB_ckpt_15674.trc (incident=288113):
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/ndadb/NDADB/incident/incdir_288113/NDADB_ckpt_15674_i288113.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /opt/oracle/diag/rdbms/ndadb/NDADB/trace/NDADB_ckpt_15674.trc (incident=288114):
ORA-00600: internal error code, arguments: [17090], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/diag/rdbms/ndadb/NDADB/incident/incdir_288114/NDADB_ckpt_15674_i288114.trc
Dumping diagnostic data in directory=[cdmp_20140929145027], requested by (instance=1, osid=15674 (CKPT)), summary=[incident=288113].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
ERROR: unrecoverable error ORA-600 raised in ASM I/O path; terminating process 15674
Dumping diagnostic data in directory=[cdmp_20140929145028], requested by (instance=1, osid=15674 (CKPT)), summary=[incident=288114].
PMON (ospid: 15585): terminating the instance due to error 469
System state dump requested by (instance=1, osid=15585 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/ndadb/NDADB/trace/NDADB_diag_15634.trc
Dumping diagnostic data in directory=[cdmp_20140929145030], requested by (instance=1, osid=15585 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 15585
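With this many repeated lines in the log, it can help to first list which ORA- codes appear and how often. The following is a minimal sketch, not part of the original troubleshooting; the function name summarize_ora_errors is hypothetical, and it assumes GNU grep and that the alert log is a plain text file you can read.

```shell
#!/bin/sh
# Hypothetical helper: count each distinct ORA- error code in a log file.
# $1 = path to the alert log (or any text file containing ORA- codes).
summarize_ora_errors() {
    # -o prints only the matched code, -E enables the + quantifier;
    # the final sort puts the most frequent code first.
    grep -oE 'ORA-[0-9]+' "$1" | sort | uniq -c | sort -rn
}

# Example call (the path is an assumption; adjust it to your ADR trace directory):
# summarize_ora_errors /opt/oracle/diag/rdbms/ndadb/NDADB/trace/alert_NDADB.log
```

For the excerpt above, ORA-00600 would be listed alongside ORA-15025 and ORA-27037, which points at a permission problem on the ASM disk path rather than at corruption.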
Check the resource status:
NDADB01:~ # crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DG_DATA.dg ora....up.type ONLINE    ONLINE    ndadb01
ora.DG_ORA.dg  ora....up.type ONLINE    ONLINE    ndadb01
ora....ER.lsnr ora....er.type ONLINE    ONLINE    ndadb01
ora.asm        ora.asm.type   ONLINE    ONLINE    ndadb01
ora.cssd       ora.cssd.type  ONLINE    ONLINE    ndadb01
ora.diskmon    ora....on.type OFFLINE   OFFLINE
ora.evmd       ora.evm.type   ONLINE    ONLINE    ndadb01
ora.ons        ora.ons.type   OFFLINE   OFFLINE
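Note: the database is started by the VCS two-node cluster, so no rdbms resource group is visible here.
The CRS log and the ASM log were also checked, and both looked normal.
3. Resolving the problem with the MOS note
A search on MOS turned up:
ORA-00600 [kfioTranslateIO03] [17090] (Doc ID 1336846.1)
Key check point: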
Case #1 ] Group permission of "oracle" executable from RDBMS home should have the same group information for ASM devices according to note 1084186.1.
$ ls -l $GRID_HOME/bin/oracle
-rwsr-s--x 1 grid oinstall 228954465 Jul 1 13:37 /oh1/grid/product/11.2.0/bin/oracle
$ ls -l $RDBMS_HOME/bin/oracle
-rwsr-s--x 1 oracle asmadmin 228954465 Jul 1 13:37 /oh1/oracle/product/11.2.0/bin/oracle
The root cause is that the operating system group of the oracle executable must have read/write permission on the ASM disk devices.
Solution:
Please execute the following action plan from note 1084186.1.
$ su - grid
$ cd <Grid Home>/bin
$ ./setasmgidwrap o=<11.2 RDBMS Home>/bin/oracle
On inspection, the permissions of $ORACLE_HOME/bin/oracle under the oracle user were indeed wrong:
## $ORACLE_HOME/bin/oracle under the grid user has the correct permissions
NDADB01:/dev/diskgroup # su - grid
grid@NDADB01:~> ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 grid oinstall 204902468 2014-09-29 10:37 /opt/oracrs/product/11gR2/grid/bin/oracle
## $ORACLE_HOME/bin/oracle under the oracle user has the wrong permissions
NDADB01:/dev/diskgroup # su - oracle
oracle@NDADB01:~> ls -l $ORACLE_HOME/bin/oracle
-rwxr-x--x 1 oracle oinstall 233461759 2014-09-29 11:53 /opt/oracle/product/11gR2/db/bin/oracle
## It should look like this:
oracle@NDADB01:~> ls -l $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 oracle asmadmin 233461759 2014-09-29 15:39 /opt/oracle/product/11gR2/db/bin/oracle
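The manual comparison above can be scripted. This is a rough sketch only: the function name check_oracle_binary is hypothetical, it relies on GNU stat (-c), and it assumes the expected state is mode 6751 (the octal form of -rwsr-s--x) with the group passed as the second argument.

```shell
#!/bin/sh
# Hypothetical helper: verify the setuid/setgid bits and the group of an oracle binary.
# $1 = path to the oracle executable
# $2 = expected group (asmadmin for the RDBMS home in this post)
check_oracle_binary() {
    bin=$1
    want_group=$2
    # %a = octal mode (6751 corresponds to -rwsr-s--x), %G = group name
    mode=$(stat -c '%a' "$bin") || return 2
    group=$(stat -c '%G' "$bin") || return 2
    if [ "$mode" = "6751" ] && [ "$group" = "$want_group" ]; then
        echo "OK: $bin mode=$mode group=$group"
    else
        echo "MISMATCH: $bin mode=$mode group=$group (expected 6751, $want_group)"
        return 1
    fi
}

# Example call, using the RDBMS home from this post:
# check_oracle_binary /opt/oracle/product/11gR2/db/bin/oracle asmadmin
```

Run against the broken binary shown earlier (-rwxr-x--x, group oinstall), the function should report a mismatch on both the mode and the group.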
Following the MOS note, the fix is:
NDADB01:/dev/diskgroup # su - grid
grid@NDADB01:~> cd $ORACLE_HOME/bin
grid@NDADB01:/opt/oracrs/product/11gR2/grid/bin> ./setasmgidwrap o=/opt/oracle/product/11gR2/db/bin/oracle ## the path given here is $ORACLE_HOME/bin/oracle of the oracle user
grid@NDADB01:/opt/oracrs/product/11gR2/grid/bin> ls -l /opt/oracle/product/11gR2/db/bin/oracle
-rwsr-s--x 1 oracle asmadmin 233461759 2014-09-29 11:45 /opt/oracle/product/11gR2/db/bin/oracle
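Note: I also tried correcting the file permissions by hand with chmod u+s and chmod g+s, but the database still failed to start, so that alone does not solve the problem.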
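A likely explanation, judging from the listings above, is that chmod only restores the setuid/setgid bits while the group stays oinstall, whereas the working binary is owned by group asmadmin. The sketch below is not from the original post; it uses a throwaway temp file (never the real binary) and GNU stat to show that chmod changes the mode but leaves the group untouched.

```shell
#!/bin/sh
# Demonstration on a scratch file: chmod u+s,g+s changes the mode, not the group.
demo=$(mktemp)
chmod 751 "$demo"                      # same rwxr-x--x bits as the broken binary
before=$(stat -c '%a %G' "$demo")      # octal mode and group name before
chmod u+s,g+s "$demo"                  # the manual fix attempted in the note above
after=$(stat -c '%a %G' "$demo")       # mode gains the 6xxx bits, group is unchanged
echo "before: $before"
echo "after:  $after"
rm -f "$demo"
```

Changing the group as well would need chgrp, or the setasmgidwrap call shown earlier.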
Then restart HAS (this is an HA two-node setup, not RAC):
NDADB01:~ # crsctl stop has -f
NDADB01:~ # crsctl start has
After the restart, the database status was normal and no data was lost. Problem solved.
-- Bosco QQ:375612082
---- END ----