下午收到告警,某同城adg同步异常,数据库比较重要,需要紧急处理。
一、分析
1、查看同步状态
该环境是主备均为两节点的rac adg结构,登录备库1节点,查询同步进程状态。
SQL> set line 999
SQL> select inst_id,thread#,process,pid,status,client_process,client_pid,sequence#,block#,active_agents,known_agents from gv$managed_standby;
INST_ID THREAD# PROCESS PID STATUS CLIENT_PROCESS CLIENT_PID SEQUENCE# BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
---------- ---------- -------- ------- ----------- ---------------- ----------- --------- ---------- ------------- ------------
1 2 ARCH 75208 CLOSING ARCH 75208 18277 622592 0 0
1 2 ARCH 75210 CLOSING ARCH 75210 18273 18432 0 0
1 0 ARCH 75212 CONNECTED ARCH 75212 0 0 0 0
1 2 ARCH 75214 CLOSING ARCH 75214 18279 276480 0 0
1 2 ARCH 75216 CLOSING ARCH 75216 18274 26624 0 0
1 2 ARCH 75218 CLOSING ARCH 75218 18272 202752 0 0
1 2 ARCH 75220 CLOSING ARCH 75220 18270 24576 0 0
1 2 ARCH 75222 CLOSING ARCH 75222 18252 626688 0 0
1 2 RFS 81844 IDLE LGWR 43790 18280 367617 0 0
1 0 RFS 101341 IDLE ARCH 27516 0 0 0 0
1 0 RFS 75268 IDLE UNKNOWN 27512 0 0 0 0
1 0 RFS 102717 IDLE ARCH 2645 0 0 0 0
2 1 ARCH 65246 CLOSING ARCH 65246 18592 722944 0 0
2 1 ARCH 65248 CLOSING ARCH 65248 18593 163840 0 0
2 0 ARCH 65250 CONNECTED ARCH 65250 0 0 0 0
2 1 ARCH 65252 CLOSING ARCH 65252 18594 610304 0 0
2 1 ARCH 65254 CLOSING ARCH 65254 18595 253952 0 0
2 1 ARCH 65256 CLOSING ARCH 65256 18589 32768 0 0
2 1 ARCH 65258 CLOSING ARCH 65258 18590 198656 0 0
2 1 ARCH 65260 CLOSING ARCH 65260 18591 67584 0 0
2 1 RFS 81245 IDLE LGWR 30715 18596 452360 0 0
2 0 RFS 65360 IDLE UNKNOWN 49378 0 0 0 0
22 rows selected.
发现同步进程MRP0已经停止。
2、查看告警日志
Wed Jan 27 12:59:12 2021
Media Recovery Waiting for thread 2 sequence 18279 (in transit)
Recovery of Online Redo Log: Thread 2 Group 14 Seq 18279 Reading mem 0
Mem# 0: +DATA/dr_xxxxdb/onlinelog/group_14.287.1046720255
Wed Jan 27 12:59:12 2021
Archived Log entry 5392 added for thread 2 sequence 18278 ID 0x61a69b1c dest 1:
Media Recovery Waiting for thread 1 sequence 18595 (in transit)
Recovery of Online Redo Log: Thread 1 Group 13 Seq 18595 Reading mem 0
Mem# 0: +DATA/dr_xxxxdb/onlinelog/group_13.288.1046720253
Wed Jan 27 14:54:22 2021
Errors in file /home/db/oracle/diag/rdbms/dr_xxxxdb/ibpldb1/trace/ibpldb1_pr00_75552.trc:
ORA-01119: error in creating database file '+DATA'
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15041: diskgroup "DATA" space exhausted
File #18 added to control file as 'UNNAMED00018'.
Originally created as:
'+DATA/ibpldb/datafile/xxxxx_data.295.1062946453'
Recovery was unable to create the file as a new OMF file.
MRP0: Background Media Recovery terminated with error 1274
Errors in file /home/db/oracle/diag/rdbms/dr_xxxxdb/ibpldb1/trace/ibpldb1_pr00_75552.trc:
ORA-01274: cannot add datafile '+DATA/ibpldb/datafile/xxxxx_data.295.1062946453' - file could not be created
Wed Jan 27 14:54:22 2021
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 5816190494
Wed Jan 27 14:54:25 2021
Reconfiguration started (old inc 6, new inc 8)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Jan 27 14:54:25 2021
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Wed Jan 27 14:54:25 2021
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Wed Jan 27 14:54:27 2021
MRP0: Background Media Recovery process shutdown (ibpldb1)
Wed Jan 27 14:59:11 2021
RFS[4]: Selected log 12 for thread 2 sequence 18280 dbid 1638254198 branch 947517108
从alert里面可以看出,在Jan 27 14:54:22 2021添加数据文件的时候,由于DATA磁盘组不足,创建数据文件报错,导致MRP0进程shutdown。
3、查看磁盘组及数据文件状态
SQL> set lines 300
SQL> col name for a20
SQL> col free for a20
SQL> select name,group_number,state ,TOTAL_MB/1024,FREE_MB/1024,trunc(free_mb/total_mb*100,2)||'%' free from v$asm_diskgroup ;
NAME GROUP_NUMBER STATE TOTAL_MB/1024 FREE_MB/1024 FREE
------- ------------ ------------ ------------- ------------ ------
ARCH 1 CONNECTED 50 29.390625 58.78%
CRS 2 MOUNTED 9.29882813 8.39453125 90.27%
DATA 3 CONNECTED 402 2.546875 .63%
FRA 4 MOUNTED 51 50.8789063 99.76%
可以看到DATA磁盘组剩余空间不足3G了。
SQL> alter session set nls_DATe_format='yyyy-mm-dd h