不啰嗦,具体分析过程如下:
1. alert log
2018-02-22T16:00:23.021370+08:00
Errors in file /u01/app/oracle/diag/rdbms/masaike/masaike1/trace/masaike1_m001_50771.trc (incident=5427952):
ORA-04031: unable to allocate 4224 bytes of shared memory ("shared pool","SELECT dbin.instance_number,...","TCHK^c22c7d28","kkotbp : kkoipt")
Incident details in: /u01/app/oracle/diag/rdbms/masaike/masaike1/incident/incdir_5427952/masaike1_m001_50771_i5427952.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
22日最早的4031触发在下午4点,这个时间段开始没有大量4031出现。
2018-02-22T20:05:00.915809+08:00
Errors in file /u01/app/oracle/diag/rdbms/masaike/masaike1/trace/masaike1_ora_45285.trc (incident=5433666):
ORA-04031: unable to allocate 4048 bytes of shared memory ("shared pool","declare
v_cursor integer;
...","PL/SQL MPCODE","pdy3M63_Finish_Page_Table: Copy data seg")
Incident details in: /u01/app/oracle/diag/rdbms/masaike/masaike1/incident/incdir_5433666/masaike1_ora_45285_i5433666.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
......
***********************************************************************
Fatal NI connect error 12170.
VERSION INFORMATION:
TNS for Linux: Version 12.2.0.1.0 - Production
Oracle Bequeath NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
Time: 22-FEB-2018 20:44:05
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: TNS:operation timed out
ns secondary err code: 12606
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=190.123.178.68)(PORT=58152))
2018-02-22T20:44:05.727171+08:00
***********************************************************************
......
20:00左右开始出现业务进程触发4031,之后大量的连接报错,说明4031已经影响到数据库的整体性能,碎片严重。
2018-02-22T20:59:37.884506+08:00
Restarting dead background process MMON
Starting background process MMON
2018-02-22T20:59:37.904907+08:00
MMON started with pid=95, OS id=82438
2018-02-22T20:59:43.561240+08:00
opidrv aborting process CL24 ospid (83880) as a result of ORA-4031
2018-02-22T20:59:43.576138+08:00
opidrv aborting process CL25 ospid (83892) as a result of ORA-4031
2018-02-22T20:59:43.766862+08:00
opidrv aborting process CL36 ospid (83939) as a result of ORA-4031
2018-02-22T20:59:43.818709+08:00
opidrv aborting process CL39 ospid (83976) as a result of ORA-4031
2018-02-22T20:59:43.971824+08:00
opidrv aborting process CL45 ospid (84015) as a result of ORA-4031
2018-02-22T20:59:43.971862+08:00
opidrv aborting process CL46 ospid (84025) as a result of ORA-4031
2018-02-22T20:59:44.022049+08:00
opidrv aborting process CL51 ospid (84035) as a result of ORA-4031
2018-02-22T20:59:44.126307+08:00
opidrv aborting process CL57 ospid (84060) as a result of ORA-4031
2018-02-22T20:59:44.142147+08:00
opidrv aborting process CL58 ospid (84062) as a result of ORA-4031
2018-02-22T20:59:44.226535+08:00
opidrv aborting process CL63 ospid (84072) as a result of ORA-4031
2018-02-22T20:59:44.525697+08:00
opidrv aborting process CL81 ospid (84130) as a result of ORA-4031
2018-02-22T20:59:44.547475+08:00
opidrv aborting process CL82 ospid (84132) as a result of ORA-4031
2018-02-22T20:59:44.561992+08:00
opidrv aborting process CL83 ospid (84134) as a result of ORA-4031
2018-02-22T20:59:44.778414+08:00
opidrv aborting process CL96 ospid (84160) as a result of ORA-4031
2018-02-22T20:59:44.794834+08:00
opidrv aborting process CL97 ospid (84162) as a result of ORA-4031
2018-02-22T20:59:44.860335+08:00
opidrv aborting process CL01 ospid (84173) as a result of ORA-4031
2018-02-22T20:59:44.961698+08:00
opidrv aborting process CL07 ospid (84197) as a result of ORA-4031
2018-02-22T20:59:45.043893+08:00
opidrv aborting process CL12 ospid (84207) as a result of ORA-4031
2018-02-22T20:59:45.666646+08:00
opidrv aborting process CL48 ospid (84293) as a result of ORA-4031
2018-02-22T20:59:48.388508+08:00
Instance Critical Process (pid: 183, ospid: 84072, CL63) died unexpectedly
PMON (ospid: 56733): terminating the instance due to error 12752
2018-02-22T20:59:48.509139+08:00
System state dump requested by (instance=1, osid=56733 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/masaike/masaike1/trace/masaike1_diag_56811_20180222205948.trc
2018-02-22T20:59:49.128835+08:00
ORA-1092 : opitsk aborting process
2018-02-22T20:59:50.319126+08:00
License high water mark = 4496
2018-02-22T20:59:54.771874+08:00
Instance terminated by PMON, pid = 56733
2018-02-22T20:59:54.773210+08:00
Warning: 2 processes are still attach to shmid 1409038:
(size: 143360 bytes, creator pid: 53461, last attach/detach pid: 56813)
2018-02-22T20:59:55.333231+08:00
USER (ospid: 85323): terminating the instance
2018-02-22T20:59:55.338825+08:00
Instance terminated by USER, pid = 85323
直接原因是由于CL63进程died导致实例重启,CLnn代表Cleanup Helper Processes,被包含在PMON group中。这个特性是在12.2中引入。
具体可以参考https://docs.oracle.com/en/database/oracle/oracle-database/12.2/cncpt/process-architecture.html#GUID-B5CA9579-53DB-442C-A85F-F21FD334833A下面这个部分15.4.1.1 Process Monitor Process (PMON) Group
15.4.1.1.3 Cleanup Helper Processes (CLnn)
CLMN delegates cleanup work to the CLnn helper processes.
The CLnn processes assist in the cleanup of terminated processes and sessions. The number of helper processes is proportional to the amount of cleanup work to be done and the current efficiency of cleanup.
A cleanup process can become blocked, which prevents it from proceeding to clean up other processes. Also, if multiple processes require cleanup, then cleanup time can be significant. For these reasons, Oracle Database can use multiple helper processes in parallel to perform cleanup, thus alleviating slow performance.
2. 重启前的4031 trace
- alert
2018-02-22T20:44:00.552799+08:00
Errors in file /u01/app/oracle/diag/rdbms/masaike/masaike1/trace/masaike1_ora_28848.trc (incident=5415624):
ORA-04031: unable to allocate 728 bytes of shared memory ("shared pool","unknown object","KGLH0^9d1660bd","kkscs")
2018-02-22T20:44:00.553363+08:00
Errors in file /u01/app/oracle/diag/rdbms/masaike/masaike1/trace/masaike1_ora_21328.trc (incident=5412384):
ORA-04031: unable to allocate 728 bytes of shared memory ("shared pool","DATABASE","KGLH0^ee7e02e2","kkscs")
ORA-04031: unable to allocate 40 bytes of shared memory ("shared pool","unknown object","KGLH0^9d1660bd","kglHeapInitialize:temp")
ORA-04031: unable to allocate 728 bytes of shared memory ("shared pool","unknown object","KGLH0^9d1660bd","kkscs")
Incident details in: /u01/app/oracle/diag/rdbms/masaike/masaike1/incident/incdir_5415624/masaike1_ora_28848_i5415624.trc
Incident details in: /u01