备份出现问题,通过查看数据库日志及主机运行信息,发现遇到大量ORA-00060死锁报错,产生大量TRACE文件,导致ORACLE软件目录空间资源耗尽;为了尽快恢复系统,通过扩大ORACLE软件目录后恢复正常。工程师后续对此故障原因进行分析及相关建议,详细的故障情况及相关日志、TRACE文件的分析及总结、建议,请参阅本文档。
ORACLE数据库产生大量TRACE文件原因分析
2.1 数据库alert日志的分析
数据库实例的Alert日志
Thu Jan 14 12:00:02 2016
Thread 1 cannot allocate new log, sequence 1783
Private strand flush not complete
Current log# 9 seq# 1782 mem# 0: /DATA/fcr51/redo09.log
Thread 1 advanced to log sequence 1783 (LGWR switch)
Current log# 10 seq# 1783 mem# 0: /DATA/fcr51/redo10.log
Archived Log entry 2641 added for thread 1 sequence 1782 ID 0x10075ae2 dest 1:
ALTER SYSTEM ARCHIVE LOG
Thread 1 cannot allocate new log, sequence 1784
Private strand flush not complete
Current log# 10 seq# 1783 mem# 0: /DATA/fcr51/redo10.log
Thread 1 advanced to log sequence 1784 (LGWR switch)
Current log# 11 seq# 1784 mem# 0: /DATA/fcr51/redo11.log
Archived Log entry 2642 added for thread 1 sequence 1783 ID 0x10075ae2 dest 1:
Thu Jan 14 16:34:19 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_5898780.trc.
Thu Jan 14 16:34:22 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23527512.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_5898780.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23527512.trc.
Thu Jan 14 16:34:33 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_5898780.trc.
Thu Jan 14 16:34:37 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23527512.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_5898780.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23527512.trc.
Thu Jan 14 16:34:47 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_5898780.trc.
Thu Jan 14 16:34:51 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23527512.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_5898780.trc.
Thu Jan 14 16:34:59 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23266146.trc.
Thu Jan 14 16:35:02 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23527512.trc.
Thu Jan 14 16:35:04 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_15073478.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23266146.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_15073478.trc.
Thu Jan 14 16:35:10 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23266146.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_15073478.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23266146.trc.
Thu Jan 14 16:35:16 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_15073478.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23266146.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_15073478.trc.
Thu Jan 14 16:35:22 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_23266146.trc.
Thu Jan 14 16:35:26 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_20185548.trc.
Thu Jan 14 16:35:29 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_15073478.trc.
Thu Jan 14 16:35:32 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_12648556.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_20185548.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_12648556.trc.
Thu Jan 14 16:35:43 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_20185548.trc.
Thu Jan 14 16:35:47 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_12648556.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_20185548.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_12648556.trc.
Thu Jan 14 16:35:54 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_20185548.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_12648556.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_20185548.trc.
Thu Jan 14 16:35:59 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_21823518.trc.
Thu Jan 14 16:36:02 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_12648556.trc.
Thu Jan 14 16:36:06 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_14746308.trc.
Thu Jan 14 16:36:09 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_21823518.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_14746308.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_21823518.trc.
Thu Jan 14 16:36:19 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_14746308.trc.
Thu Jan 14 16:36:21 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_21823518.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_14746308.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_21823518.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_14746308.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_21823518.trc.
Thu Jan 14 16:36:30 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_11928290.trc.
Thu Jan 14 16:36:33 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_14746308.trc.
Thu Jan 14 16:36:37 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_13829192.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_11928290.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_13829192.trc.
Thu Jan 14 16:36:47 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_11928290.trc.
Thu Jan 14 16:36:51 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_13829192.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_11928290.trc.
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_13829192.trc.
Thu Jan 14 16:37:02 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_11928290.trc.
Thu Jan 14 16:37:06 2016
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_13829192.trc.
…………………………
节约篇幅,大量类似输出省略;
对故障时间段的alert日志进行分析,可以发现当时ALERT日志中出现大量死锁告警信息,并产生对应的TRACE文件:
ORA-00060: Deadlock detected. More info in file /oracle/diag/rdbms/fcr51/fcr51/trace/fcr51_ora_17105662.trc.
2.2 对trace文件信息的分析
ALERT日志中提到了大量trace文件,对此类TRACE文件进行了分析。
可以发现产生死锁进程涉及的SQL语句为UPDATE AAA SET PROPAGATE_FLAG = NULL。
如下为截取fcr51_ora_24838762.trc中的部分信息:
*** 2016-01-14 17:51:12.312
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-00270007-000072db 433 1767 X 417 1191 X
TX-000c001d-00006e79 417 1191 X 433 1767 X
session 1767: DID 0001-01B1-00000029 session 1191: DID 0001-01A1-0000002A
session 1191: DID 0001-01A1-0000002A session 1767: DID 0001-01B1-00000029
Rows waited on:
Session 1767: obj - rowid = 0003E3A5 - AAA+OlAAbAAAXoEAAG
(dictionary objn - 254885, file - 27, block - 96772, slot - 6)
Session 1191: obj - rowid = 0003E3A5 - AAA+OlAAbAAAXoEAAA
(dictionary objn - 254885, file - 27, block - 96772, slot - 0)
----- Information for the OTHER waiting sessions -----
Session 1191:
sid: 1191 ser: 91 audsid: 230125 user: 67/FCR51HOST
flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
pid: 417 O/S info: user: oracle, term: UNKNOWN, ospid: 16712824
image: oracle@FCRDB1
client details:
O/S info: user: weblogic, term: unknown, ospid: 1234
machine: FCRAPP_2 program: JDBC Thin Client
application name: API, hash value=2297302627
action name: 155, hash value=497584993
current SQL:
UPDATE IF_PROPAGATED_RATES SET PROPAGATE_FLAG = NULL
----- End of information for the OTHER waiting sessions -----
Information for THIS session:
----- Current SQL Statement for this session (sql_id=6x4603d0sfjmf) -----
UPDATE AAAAA SET PROPAGATE_FLAG = NULL
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
700012a1d3ba3f0 699 package body FCR51HOST.IF_CYRATE
700012a9add0910 3 anonymous block
===================================================
PROCESS STATE
-------------
Process global information:
总结与建议
针对此次ORACLE数据库产生大量TRACE文件的问题,分析总结及建议如下:
3.1 问题描述与分析
主机操作系统为AIX7.1,数据库版本为11.2.0.4.0;
死锁概念:ORA-00060产生的根源是多个进程执行的SQL语句互相持有对方的资源,导致互相持有资源的进程均无法完成操作;ORACLE后台进程检测到死锁后会对相应进行状态进行转储,生成trace文件;
本次问题中,分析TRACE文件可以发现问题原因为多个进程执行SQL语句:UPDATE AAAA SET PROPAGATE_FLAG = NULL从而产生大量死锁,报错信息为ORA-00060,同时对应产生大量TRACE文件;
因短时间内产生海量的TRACE文件,导致TRACE文件所在的目录空间不足,进而导致数据库软件运行出现异常,无法进行备份操作。
3.2 后续处理方法与建议
1. 协调应用开发商优化问题SQL涉及的业务逻辑(推荐)
死锁问题会影响业务处理效率,同时产生的海量TRACE文件会占用大量空间。
2. 加强对主机操作系统资源及数据库运行情况的监控。
3. 限制TRACE文件最大大小值(不推荐)
通过数据库参数MAX_DUMP_FILE_SIZE限制单个TRACE文件的上限值,一定程序上会减少TRACE文件占用的空间,但是无法限制TRACE文件的个数。如果短时间内出现大量死锁问题,所产生大量TRACE文件仍可能消耗大量空间。
同时限制单个TRACE文件的上限值后,例如限制为10M,当问题进程对应的TRACE文件达到10M时,后续的TRACE信息无法写入TRACE文件;相应的故障分析将因缺少相关数据而无法进行。
ORACEL通过参数MAX_DUMP_FILE_SIZE(默认值是UNLIMITED),即不限制TRACE文件大小。