Overview: This article takes an in-depth look at how the Redo Nowait metric is calculated and how to diagnose problems with it.
AWR knowledge base: https://www.modb.pro/topic/6165
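I once ran into a performance incident in which the database's checkpoints ran so slowly that every redo log group ended up in the active state, leaving the database in a stop-and-go, "hiccuping" working state.
Querying the V$LOG view shows the log status: apart from the CURRENT log group, every other group is ACTIVE, and the last several groups had only just been added by the DBA: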
SQL> select * from v$log;
    GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ---------
         1          1     520403   31457280          1 NO  ACTIVE              1.3861E+10 23-JUN-05
         2          1     520404   31457280          1 NO  ACTIVE              1.3861E+10 23-JUN-05
         3          1     520405   31457280          1 NO  ACTIVE              1.3861E+10 23-JUN-05
         4          1     520406   31457280          1 NO  CURRENT             1.3861E+10 23-JUN-05
         5          1     520398   31457280          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         6          1     520399   31457280          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         7          1     520400  104857600          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         8          1     520401  104857600          1 NO  ACTIVE              1.3860E+10 23-JUN-05
         9          1     520402  104857600          1 NO  ACTIVE              1.3861E+10 23-JUN-05
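A log group stays ACTIVE until the checkpoint that covers it has completed. If every group other than the current one is ACTIVE, then DBWR's writes clearly cannot keep up with the checkpoints triggered by log switches.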
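The reasoning above can be sketched as a small check. This is only an illustration, not part of the original diagnosis: the (GROUP#, STATUS) pairs are copied by hand from the V$LOG output, the function names `reusable_groups` and `checkpoint_is_lagging` are hypothetical, and the sketch assumes that only INACTIVE or UNUSED groups can be reused. It deliberately ignores archiving requirements.

```python
# (GROUP#, STATUS) pairs copied by hand from the V$LOG output above.
V_LOG_ROWS = [
    (1, "ACTIVE"), (2, "ACTIVE"), (3, "ACTIVE"), (4, "CURRENT"),
    (5, "ACTIVE"), (6, "ACTIVE"), (7, "ACTIVE"), (8, "ACTIVE"), (9, "ACTIVE"),
]

# Assumption for this sketch: a group can be reused only once it is
# INACTIVE (its checkpoint has completed) or UNUSED (never written to).
REUSABLE_STATUSES = ("INACTIVE", "UNUSED")


def reusable_groups(rows):
    """Return the group numbers a log switch could move into right now."""
    return [group for group, status in rows if status in REUSABLE_STATUSES]


def checkpoint_is_lagging(rows):
    """Return True when no group is reusable, so the next switch must wait."""
    return not reusable_groups(rows)


if __name__ == "__main__":
    print("Reusable groups:", reusable_groups(V_LOG_ROWS))
    print("Checkpoint lagging:", checkpoint_is_lagging(V_LOG_ROWS))
```

With the rows shown here, no group is reusable, which matches the stalled behaviour described above: the next log switch has nowhere to go until a checkpoint completes.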
Next, let's check how busy DBWR is:
oracle:/oracle >ps -ef|grep ora_dbw
oracle 2266 1 0 Mar 31 ? 811:42 ora_dbw0_hysms02
oracle 21023 21012 0 18:52:59 pts/65 0:00 grep ora_dbw
Here we can see that the DBWR process ID is 2266. Next, use the top command to look at its CPU usage:
oracle:/oracle >top
last pid: 21145; load averages: 3.38, 3.45, 3.67 18:53:38
725 processes: 711 sleeping, 1 running, 10 zombie, 3 on cpu
CPU states: 35.2% idle, 40.1% user, 9.4% kernel, 15.4% iowait, 0.0% swap
Memory: 3072M real, 286M free, 3120M swap in use, 1146M swap free
PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
11855 smspf 1 59