gc current request 引起长期锁表的故障
故障描述:一个数仓系统长期跑批时出现锁表,锁表偶尔严重时影响整体跑批流程,进而影响第2天业务运行。根据业务请求需要进行优化。优化目标尽量减少锁表时间(业务人员最低要求不影响整体跑批流程)。
业务人员诉求分析:找出严重锁表原因,解决锁表严重问题不影响整体跑批即可。
DBA解决思路分析:现象=》AWR/ASH/ADDM报告和调用(监控foglight/bmc)等=》结合经验分析结果=》结合现象+技术原理分析=》最终确定问题原因+解决方案=》长期观察是否解决问题
1.系统环境
SQL> select * from gv$version;
INST_ID BANNER
---------- --------------------------------------------------------------------------------
2 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
2 PL/SQL Release 11.2.0.4.0 - Production
2 CORE 11.2.0.4.0 Production
2 TNS for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
2 NLSRTL Version 11.2.0.4.0 - Production
1 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
1 PL/SQL Release 11.2.0.4.0 - Production
1 CORE 11.2.0.4.0 Production
1 TNS for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
1 NLSRTL Version 11.2.0.4.0 - Production
10 rows selected.
2.故障时间点确认
Troubleshooting:数据说话。
我们这里用foglight(DB监控软件)来抓取数据(CPU/IO/MEMORY波峰),确定故障时间点。
2.1 以等待事件为基础数据
7天统计如下:
截图省略...
9月13号,业务人员报故障的时间段
截图省略...
==经DBA分析,决定选择00:00:00--01:00:00和03:00:00--04:00:00进行分析==
3.根据性能分析工具AWR的数据进行分析
00:00:00--01:00:00 AWR报告
3.1 因为是锁表也是一个种等待事件,所以直接快速关注events
==node A==
Top 10 Foreground Events by Total Wait Time
EventWaitsTotal Wait Time (sec)Wait Avg(ms)% DB timeWait Class
enq: CF - contention3,65113.7K374657.9Other
enq: TX - row lock contention513610.87080015.3Application
direct path read10,6521705.31607.2User I/O
db file scattered read35,5611056.8304.5User I/O
direct path write22,929989.5434.2User I/O
DB CPU 530.4 2.2
direct path read temp18,124270.6151.1User I/O
db file sequential read46,447264.361.1User I/O
enq: HW - contention3,241199.161.8Configuration
gc current block busy3,794164.543.7Cluster
==node B==
Top 10 Foreground Events by Total Wait Time
EventWaitsTotal Wait Time (sec)Wait Avg(ms)% DB timeWait Class
enq: CF - contention7,1146801.195630.9Other
direct path write102,6064393.34319.9User I/O
direct path read28,6704047.914118.4User I/O
db file scattered read91,5642344.52610.6User I/O
DB CPU 1296 5.9
gc current request1383.13830951.7Cluster
db file parallel read3,286284.7871.3User I/O
db file sequential read86,52424731.1User I/O
CSS initialization298222.57471.0Other
control file parallel write31,876148.75.7System I/O
Performance Degradation as a Result of 'enq: CF - contention' (文档 ID 1072417.1)
Waits for 'Enq: TX - ...' Type Even