环境:
平台:RedHat EnterPrise 5.8 X86_X64
数据库:Oracle EnterPrise 11.2.0.3
集群软件:Oracle grid 11.2.0.3
故障现象:
数据库出现了归档失败,其中有一个节点的实例出现HANG死的状况。
日志信息如下:
分析:
由于归档失败发生在ASM磁盘上,首先检查ASM磁盘空间以及DB_RECOVERY_FILE_DEST_SIZE,ASM磁盘空间是足够的,而且由于只有一个节点出现出现了无法归档的问题,也可以排除是空间不足造成的。确认两个节点的DB_RECOVERY_FILE_DEST_SIZE参数设置都是0,基本上可以判断问题和当前节点的ASM实例状态不正常有关。
检查ASM实例的错误信息:
当前节点ASM实例出现了的这个信息,说明报错发生在实例2上:
果然实例2上的ASM出现了大量ORA-4031错误。检查ASM启动的参数配置:
调整及建议:
当前ASM实例使用默认的MEMORY_TARGET配置,分配大小大约是400M,根据Oracle的MOS文章:ASM & Shared Pool (ORA-4031) [ID 437924.1],在11.2.0.3中,Oracle增加了ASM实例所允许的默认进程数PROCESSES,但是默认的MEMORY_TARGET参数没有增加。
根据Oracle的建议,11.2.0.3的MEMORY_TARGET至少应该设置到1536M,而MEMORY_MAX_TARGET设置为4096M。
对于当前的情况,如果短时间内无法重启DB和ASM实例,可以在问题节点配置一个本地归档路径,设置目标路径为本地磁盘,从而避免归档无法完成而导致的实例HANG死。
平台:RedHat EnterPrise 5.8 X86_X64
数据库:Oracle EnterPrise 11.2.0.3
集群软件:Oracle grid 11.2.0.3
故障现象:
数据库出现了归档失败,其中有一个节点的实例出现HANG死的状况。
日志信息如下:
- Fri Feb 28 19:49:04 2014
- ARC1: Error 19504 Creating archive log file to \'+DATA02\'
- ARCH: Archival stopped, error occurred. Will continue retrying
- ORACLE Instance orcl1 - Archival Error
- ORA-16038: log 14 sequence# 68244 cannot be archived
- ORA-19504: failed to create file \"\"
- ORA-00312: online log 14 thread 1: \'+DATA02/orcl/onlinelog/group_14.264.792274883\'
- ORA-00312: online log 14 thread 1: \'+DATA02/orcl/onlinelog/group_14.265.792274889\'
- Archiver process freed from errors. No longer stopped
- Fri Feb 28 19:50:22 2014
- ARC0: LGWR is actively archiving destination LOG_ARCHIVE_DEST_3
- ARCH: Archival stopped, error occurred. Will continue retrying
- ORACLE Instance orcl1 - Archival Error
- ORA-16014: log 14 sequence# 68244 not archived, no available destinations
- ORA-00312: online log 14 thread 1: \'+DATA02/orcl/onlinelog/group_14.264.792274883\'
- ORA-00312: online log 14 thread 1: \'+DATA02/orcl/onlinelog/group_14.265.792274889\'
- ARC0: Archive log rejected (thread 1 sequence 68240) at host \'orclsh\'
- FAL[server, ARC0]: FAL archive failed, see trace file.
- ARCH: FAL archive failed. Archiver continuing
- ORACLE Instance orcl1 - Archival Error. Archiver continuing.
由于归档失败发生在ASM磁盘上,首先检查ASM磁盘空间以及DB_RECOVERY_FILE_DEST_SIZE,ASM磁盘空间是足够的,而且由于只有一个节点出现出现了无法归档的问题,也可以排除是空间不足造成的。确认两个节点的DB_RECOVERY_FILE_DEST_SIZE参数设置都是0,基本上可以判断问题和当前节点的ASM实例状态不正常有关。
检查ASM实例的错误信息:
- Fri Feb 28 19:41:23 2014
- Dumping diagnostic data in directory=[cdmp_20130702164115], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165521].
- Fri Feb 28 19:49:19 2014
- Dumping diagnostic data in directory=[cdmp_20130702164845], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165522].
- Fri Feb 28 19:55:56 2014
- Dumping diagnostic data in directory=[cdmp_20130702165517], requested by (instance=2, osid=2032294 (LMD0)), summary=[incident=165523].
- Fri Feb 28 18:34:25 2014
- Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186256):
- ORA-04031: unable to allocate 3768 bytes of shared memory (\"shared pool\",\"unknown object\",\"sga heap(1,0)\",\"ges enqueues\")
- Use ADRCI or Support Workbench to package the incident.
- See Note 411.1 at My Oracle Support for error and packaging details.
- Insufficient shared pool to allocate a GES object (ospid 2032294)
- Fri Feb 28 18:29:53 2014
- Sweep [inc][186256]: completed
- Fri Feb 28 18:36:49 2014
- Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_2032294.trc (incident=186257):
- ORA-04031: unable to allocate 3768 bytes of shared memory (\"shared pool\",\"unknown object\",\"sga heap(1,0)\",\"ges enqueues\")
- Use ADRCI or Support Workbench to package the incident.
- See Note 411.1 at My Oracle Support for error and packaging details.
- Insufficient shared pool to allocate a GES object (ospid 2032294)
- Fri Feb 28 20:06:55 2012
- NOTE: No asm libraries found in the system
- ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK1)
- ERROR: -5(Duplicate disk DATA_DG01:ASM_DISK2)
- MEMORY_TARGET defaulting to 411041792.
- * instance_number obtained from CSS = 2, checking for the existence of node 0...
- * node 0 does not exist. instance_number = 2
- Starting ORACLE instance (normal)
- LICENSE_MAX_SESSION = 0
- LICENSE_SESSIONS_WARNING = 0
- Private Interface \'en1\' configured from GPnP for use as a private interconnect.
- [name=\'en1\', type=1, ip=169.254.78.6, mac=00-1a-64-bb-50-7d, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
- Public Interface \'en0\' configured from GPnP for use as a public interface.
- [name=\'en0\', type=1, ip=10.1.16.35, mac=00-1a-64-bb-50-7c, net=10.1.16.32/27, mask=255.255.255.224, use=public/1]
- Picked latch-free SCN scheme 3
- Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/11.2.0.3/grid/dbs/arch
- Autotune of undo retention is turned on.
- LICENSE_MAX_USERS = 0
- SYS auditing is disabled
- NOTE: Volume support enabled
- Starting up:
- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
- With the Real Application Clusters and Automatic Storage Management options.
- ORACLE_HOME = /u01/app/11.2.0.3/grid
- System name: AIX
- Node name: orcldb2
- Release: 1
- Version: 6
- Machine: 00C94E064C00
- Using parameter settings in server-side pfile /u01/app/11.2.0.3/grid/dbs/init+ASM2.ora
- System parameters with non-default values:
- large_pool_size = 12M
- instance_type = \"asm\"
- remote_login_passwordfile= \"EXCLUSIVE\"
- asm_diskstring = \"/dev/ocr_*\"
- asm_diskstring = \"/dev/voting_*\"
- asm_diskstring = \"/dev/asm_*\"
- asm_diskgroups = \"DATA\"
- asm_diskgroups = \"DATA_DG01\"
- asm_diskgroups = \"SPFILE_DG\"
- asm_power_limit = 1
- diagnostic_dest = \"/u01/app/grid\"
- Cluster communication is configured to use the following interface(s) for this instance
- 169.254.78.6
- cluster interconnect IPC version:Oracle UDP/IP (generic)
- IPC Vendor 1 proto 2
当前ASM实例使用默认的MEMORY_TARGET配置,分配大小大约是400M,根据Oracle的MOS文章:ASM & Shared Pool (ORA-4031) [ID 437924.1],在11.2.0.3中,Oracle增加了ASM实例所允许的默认进程数PROCESSES,但是默认的MEMORY_TARGET参数没有增加。
根据Oracle的建议,11.2.0.3的MEMORY_TARGET至少应该设置到1536M,而MEMORY_MAX_TARGET设置为4096M。
- SQL> alter system set memory_max_target=4096m scope=spfile;
-
- SQL> alter system set memory_target=1536m scope=spfile;
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/20674423/viewspace-1098176/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/20674423/viewspace-1098176/