- 数据库告警日志分析分析
通过对数据库告警日志查看,节点1出现ORA-29970错误,LMD进程无法响应,被LMHB进程将实例终止,实例终止后自动启动。节点2被LMS0进程将实例终止,实例终止后自动启动。
1节点:
Mon Jun 07 01:22:14 2021
LMD0 (ospid: 10613) has not called a wait for sub 0 secs.
LMD1 (ospid: 10617) has not called a wait for sub 0 secs.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmhb_10637.trc (incident=688375) (PDBNAME=CDB$ROOT):
ORA-29770: global enqueue process LMD0 (OSID 10613) is hung for more than 70 seconds
Incident details in: /u01/app/oracle/diag/rdbms/cdb/cdb1/incident/incdir_688375/cdb1_lmhb_10637_i688375.trc
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmhb_10637.trc (incident=688376) (PDBNAME=CDB$ROOT):
ORA-29770: global enqueue process LMD1 (OSID 10617) is hung for more than 70 seconds
Incident details in: /u01/app/oracle/diag/rdbms/cdb/cdb1/incident/incdir_688376/cdb1_lmhb_10637_i688376.trc
LOCK_DBGRP: GCR_SYSTEST debug event locked group GR+DB_CDB by memno 0
ERROR: Some process(s) is not making progress.
LMHB (ospid: 10637) is terminating the instance.
Please check LMHB trace file for more details.
Please also check the CPU load, I/O load and other system properties for anomalous behavior
ERROR: Some process(s) is not making progress.
LMHB (ospid: 10637): terminating the instance due to error 29770
Mon Jun 07 01:22:24 2021
System state dump requested by (instance=1, osid=10637 (LMHB)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_diag_10575_20210607012224.trc
Mon Jun 07 01:22:26 2021
License high water mark = 591
Mon Jun 07 01:22:29 2021
Instance terminated by LMHB, pid = 10637
Mon Jun 07 01:22:29 2021
USER (ospid: 21601): terminating the instance
Mon Jun 07 01:22:29 2021
Instance terminated by USER, pid = 21601
Mon Jun 07 01:22:32 2021
Starting ORACLE instance (normal) (OS id: 21660)
Mon Jun 07 01:22:32 2021
CLI notifier numLatches:37 maxDescs:3986
Mon Jun 07 01:22:32 2021
**********************************************************************
Mon Jun 07 01:22:32 2021
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
Mon Jun 07 01:22:32 2021
Per process system memlock (soft) limit = UNLIMITED
Mon Jun 07 01:22:32 2021
Expected per process system memlock (soft) limit to lock
SHARED GLOBAL AREA (SGA) into memory: 128G
Mon Jun 07 01:22:32 2021
Available system pagesizes:
4K, 2048K
Mon Jun 07 01:22:32 2021
Supported system pagesize(s):
Mon Jun 07 01:22:32 2021
PAGESIZE AVAILABLE_PAGES EXPECTED_PAGES ALLOCATED_PAGES ERROR(s)
Mon Jun 07 01:22:32 2021
2048K 66823 65538 65538 NONE
Mon Jun 07 01:22:32 2021
Reason for not supporting certain system pagesizes:
Mon Jun 07 01:22:32 2021
4K - Large pagesizes only
Mon Jun 07 01:22:32 2021
**********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 24
2节点:
Mon Jun 07 01:25:05 2021
Set master node info
Mon Jun 07 01:26:05 2021
Auto-tuning: Shutting down background process GTXb
Mon Jun 07 01:27:53 2021
IPC Send timeout detected. Sender: ospid 62152 [oracle@node1 (PING)]
Receiver: inst 1 binc 912546309 ospid 10589
Mon Jun 07 01:29:25 2021
LMD0 (ospid: 62168) received an instance eviction notification from instance 1 [2]
Mon Jun 07 01:29:26 2021
Received an instance abort message from instance 1
Mon Jun 07 01:29:26 2021
Received an instance abort message from instance 1
Mon Jun 07 01:29:26 2021
Please check instance 1 alert and LMON trace files for detail.
Mon Jun 07 01:29:26 2021
Please check instance 1 alert and LMON trace files for detail.
Mon Jun 07 01:29:26 2021
LMS0 (ospid: 62192): terminating the instance due to error 481
Mon Jun 07 01:29:26 2021
System state dump requested by (instance=2, osid=62192 (LMS0)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/cdb/cdb2/trace/cdb2_diag_62123_20210607012926.trc
Mon Jun 07 01:29:26 2021
ORA-1092 : opitsk aborting process
Mon Jun 07 01:29:27 2021
License high water mark = 1251
Mon Jun 07 01:29:31 2021
Instance terminated by LMS0, pid = 62192
Mon Jun 07 01:29:31 2021
USER (ospid: 85262): terminating the instance
Mon Jun 07 01:29:31 2021
Instance terminated by USER, pid = 85262
Mon Jun 07 01:29:33 2021
Starting ORACLE instance (normal) (OS id: 85397)
Mon Jun 07 01:29:33 2021
CLI notifier numLatches:37 maxDescs:3986
Mon Jun 07 01:29:33 2021
**********************************************************************
Mon Jun 07 01:29:33 2021
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
Mon Jun 07 01:29:33 2021
Per process system memlock (soft) limit = UNLIMITED
Mon Jun 07 01:29:33 2021
Expected per process system memlock (soft) limit to lock
SHARED GLOBAL AREA (SGA) into memory: 128G
Mon Jun 07 01:29:33 2021
Available system pagesizes:
4K, 2048K
Mon Jun 07 01:29:33 2021
Supported system pagesize(s):
Mon Jun 07 01:29:33 2021
PAGESIZE AVAILABLE_PAGES EXPECTED_PAGES ALLOCATED_PAGES ERROR(s)
Mon Jun 07 01:29:33 2021
2048K 67200 65538 65538 NONE
Mon Jun 07 01:29:33 2021
Reason for not supporting certain system pagesizes:
Mon Jun 07 01:29:33 2021
4K - Large pagesizes only
Mon Jun 07 01:29:33 2021
**********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 24
查看更多日志,发现在关闭之前,出现了较多的ORA-04031错误。
Mon Jun 07 01:18:14 2021
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14795.trc (incident=691727) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^1069","kglseshtTable")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmd0_10613.trc (incident=707486) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory ("shared pool","unknown object","sga heap(5,0)","ges big msg pool")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Jun 07 01:18:16 2021
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14832.trc (incident=691688) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^320","kglseshtTable")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Jun 07 01:18:16 2021
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14837.trc (incident=691735) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^1234","kglseshtTable")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Jun 07 01:18:18 2021
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_ora_14912.trc (incident=691871) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^159","kglseshtTable")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_lmd0_10613.trc (incident=707487) (PDBNAME=CDB$ROOT):
ORA-04031: unable to allocate 8504 bytes of shared memory ("shared pool","unknown object","sga heap(5,0)","ges big msg pool")
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
DDE: Problem Key 'ORA 4031' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Mon Jun 07 01:18:33 2021
Errors in file /u01/app/oracle/diag/rdbms/cdb/cdb1/trace/cdb1_m000_15479.trc:
ORA-04031: unable to allocate 12312 bytes of shared memory ("shared pool","unknown object","KKSSP^2147","kglseshtTable")
-
AWR报告信息分析
拉取了宕机前半小时的AWR报告,从AWR报告中得知,shared pool中ges enqueues和ges resource dynamic分别达到了18G和16G。 -
Oracle官方参考文献
根据Oracle官方文档记载:ORA-04031 Errors Occurring with High “ges resource dynamic” & “ges enqueues” Memory Usage In The Shared Pool (Doc ID 2063751.1),该问题是Oracle的BUG,该BUG出现于Oracle RAC 12.1.0.1至12.1.0.2,此问题在Oracle RAC 12.2中被修复,当前数据库版本为Oracle RAC 12.1.0.2。 -
处理操作
根据Oracle官方参考文献,解决该问题有两种方式:临时解决和永久解决。
a) 临时解决方案
修改数据库参数,_GES_DIRECT_FREE_RES_TYPE=“CTARAHDXBB”,该参数是静态参数,需要重启数据库生效,可采取2个节点轮流修改的方法,减少业务停机时间,推荐采用此方案。
b) 永久解决方案
安装数据库补丁,补丁号Patch:21260431,需要安排业务停机时间安装补丁。