oracle data guard failover 切换(手动处理redo)
1 初始状态
master库:
select database_role,switchover_status,protection_mode from v$database;
DATABASE_ROLE SWITCHOVER_STATUS PROTECTION_MODE
---------------- -------------------- --------------------
PRIMARY TO STANDBY MAXIMUM PERFORMANCE
初始数据
SYS@master> select * from wen;
NAME TIME
---------- ---------
king 09-MAR-20
slave库:
SYS@slave> select database_role,switchover_status,protection_mode from v$database;
DATABASE_ROLE SWITCHOVER_STATUS PROTECTION_MODE
---------------- -------------------- --------------------
PHYSICAL STANDBY NOT ALLOWED MAXIMUM PERFORMANCE
初始数据
SYS@slave> select * from wen;
NAME TIME
---------- ---------
king 09-MAR-20
2 制造数据差异
2.1 关闭slave网卡
[root@slave ~]# systemctl stop network
[root@slave ~]# ping master
connect: Network is unreachable
master的日志报错:
Wed Mar 11 04:54:11 2020
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 16198 for archive log file 3 to 'slave'
ORA-16198: LGWR received timedout error from KSR
LGWR: Error 16198 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host 'slave' ----网络已断开
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Failed to archive log 3 thread 1 sequence 10 (16198)
Thread 1 advanced to log sequence 11 (LGWR switch)
Current log# 1 seq# 11 mem# 0: /u01/app/oracle/oradata/master/redo01.log
Wed Mar 11 04:54:11 2020
Archived Log entry 9 added for thread 1 sequence 10 ID 0xb880a537 dest 1:
2.2 master库插入数据,并切redo日志,最后shutdown abort
SYS@master> insert into wen values('leo',sysdate);
1 row created.
SYS@master> commit;
Commit complete.
SYS@master> alter system switch logfile;
System altered.
SYS@master> insert into wen values('white',sysdate);
1 row created.
SYS@master> commit;
Commit complete.
SYS@master> alter system switch logfile;
System altered.
SYS@master> insert into wen values('neo',sysdate);
1 row created.
SYS@master> commit;
Commit complete.
SYS@master> select * from wen;
NAME TIME
---------- ---------
king 09-MAR-20
leo 09-MAR-20
white 09-MAR-20
neo 09-MAR-20
SYS@master> shu abort
ORACLE instance shut down.
2.3 开启slave的网卡,并确认表 wen 的数据。
--开网卡
[root@slave ~]# systemctl start network
[root@slave ~]# ping master
PING master (192.168.189.39) 56(84) bytes of data.
64 bytes from master (192.168.189.39): icmp_seq=1 ttl=64 time=1.07 ms
64 bytes from master (192.168.189.39): icmp_seq=2 ttl=64 time=0.648 ms
^C
--- master ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.648/0.863/1.078/0.215 ms
--确认数据
SYS@slave> select * from wen;
NAME TIME
---------- ---------
king 09-MAR-20
slave库当前缺少数据。
3 手动注册归档,并使用redo做不完全恢复
3.1 查看slave的归档信息
SYS@slave> select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
14 --slave已经归档到14
SYS@slave> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /u01/app/arch/slave
Oldest online log sequence 0
Next log sequence to archive 0
Current log sequence 15 --slave当前redo为15
3.2 查看master的alert日志,确认master和slave相差的归档个数,及master库发生异常时的current redo
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 16198 for archive log file 3 to 'slave'
ORA-16198: LGWR received timedout error from KSR
LGWR: Error 16198 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host 'slave'
Mon Mar 09 23:17:09 2020
LGWR: Failed to archive log 3 thread 1 sequence 15 (16198)
Thread 1 advanced to log sequence 16 (LGWR switch)
Current log# 1 seq# 16 mem# 0: /u01/app/oracle/oradata/master/redo01.log
Mon Mar 09 23:17:09 2020
Archived Log entry 15 added for thread 1 sequence 15 ID 0xb87f4fff dest 1:
Mon Mar 09 23:17:40 2020
Thread 1 advanced to log sequence 17 (LGWR switch)
Current log# 2 seq# 17 mem# 0: /u01/app/oracle/oradata/master/redo02.log --master的current redo为redo02.log
Mon Mar 09 23:17:40 2020
Archived Log entry 16 added for thread 1 sequence 16 ID 0xb87f4fff dest 1: --master已经归档到16
Mon Mar 09 23:21:27 2020
WARN: ARC0: Terminating pid 118500 hung on an I/O operation
Mon Mar 09 23:21:28 2020
WARN: ARC3: Terminating pid 118500 hung on an I/O operation
master和slave相差15、16号归档日志,master库异常关闭时的current redo为redo02.log
4 将相差的归档日志和current redo复制到slave库
--复制归档日志
[oracle@master master]$ scp 1_15_1034631553.dbf oracle@192.168.189.40:/home/oracle/wen
oracle@192.168.189.40's password:
1_15_1034631553.dbf 100% 675KB 54.0MB/s 00:00
[oracle@master master]$ scp 1_16_1034631553.dbf oracle@192.168.189.40:/home/oracle/wen
oracle@192.168.189.40's password:
1_16_1034631553.dbf 100% 16KB 2.9MB/s 00:00
--复制redo日志
[oracle@master master]$ cd /u01/app/oracle/oradata/master
[oracle@master master]$ scp redo02.log oracle@192.168.189.40:/home/oracle/wen
oracle@192.168.189.40's password:
redo02.log 100% 50MB 50.0MB/s 00:01
5 slave库注册归档日志(slave的状态为mount,否则归档日志不会应用)
5.1 注册归档日志
SYS@slave> alter database register logfile '/home/oracle/wen/1_15_1034631553.dbf';
Database altered.
SYS@slave> alter database register logfile '/home/oracle/wen/1_16_1034631553.dbf';
Database altered.
SYS@slave> alter database recover managed standby database cancel;
Database altered.
SYS@slave> alter database recover managed standby database disconnect from session;
Database altered.
注:
1、slave库的状态为mount;
2、使用alter database recover managed standby database disconnect from session应用日志。
slave库的alert日志:
Tue Mar 10 01:44:43 2020
alter database recover managed standby database cancel
Tue Mar 10 01:44:43 2020
MRP0: Background Media Recovery cancelled with status 16037
Errors in file /u01/app/oracle/diag/rdbms/slave/slave/trace/slave_pr00_129684.trc:
ORA-16037: user requested cancel of managed recovery operation
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Recovered data files to a consistent state at change 2659224
Tue Mar 10 01:44:43 2020
MRP0: Background Media Recovery process shutdown (slave)
Managed Standby Recovery Canceled (slave)
Completed: alter database recover managed standby database cancel
Tue Mar 10 01:44:57 2020
alter database recover managed standby database disconnect from session
Attempt to start background Managed Standby Recovery process (slave)
Tue Mar 10 01:44:57 2020
MRP0 started with pid=25, OS id=10934
MRP0: Background Managed Standby Recovery process started (slave)
started logmerger process
Tue Mar 10 01:45:02 2020
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 4 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Waiting for thread 1 sequence 15
Completed: alter database recover managed standby database disconnect from session
Error 1034 received logging on to the standby
FAL[client, USER]: Error 1034 connecting to master for fetching gap sequence
Tue Mar 10 01:45:23 2020
alter database register logfile '/home/oracle/wen/1_15_1034631553.dbf'
There are 1 logfiles specified.
ALTER DATABASE REGISTER [PHYSICAL] LOGFILE
Completed: alter database register logfile '/home/oracle/wen/1_15_1034631553.dbf'
Tue Mar 10 01:45:33 2020
Media Recovery Log /home/oracle/wen/1_15_1034631553.dbf
Media Recovery Waiting for thread 1 sequence 16
Tue Mar 10 01:45:40 2020
alter database register logfile '/home/oracle/wen/1_16_1034631553.dbf'
There are 1 logfiles specified.
ALTER DATABASE REGISTER [PHYSICAL] LOGFILE
Resynchronizing thread 1 from sequence 15 to 16
Completed: alter database register logfile '/home/oracle/wen/1_16_1034631553.dbf'
Tue Mar 10 01:45:44 2020
Media Recovery Log /home/oracle/wen/1_16_1034631553.dbf
Media Recovery Waiting for thread 1 sequence 17
Tue Mar 10 01:45:57 2020
alter database recover managed standby database cancel
Tue Mar 10 01:45:57 2020
MRP0: Background Media Recovery cancelled with status 16037
Errors in file /u01/app/oracle/diag/rdbms/slave/slave/trace/slave_pr00_10940.trc:
ORA-16037: user requested cancel of managed recovery operation
Recovery interrupted!
Tue Mar 10 01:45:58 2020
MRP0: Background Media Recovery process shutdown (slave)
Managed Standby Recovery Canceled (slave)
Completed: alter database recover managed standby database cancel
Tue Mar 10 01:46:11 2020
alter database recover managed standby database disconnect from session
Attempt to start background Managed Standby Recovery process (slave)
Tue Mar 10 01:46:11 2020
MRP0 started with pid=25, OS id=11035
MRP0: Background Managed Standby Recovery process started (slave)
started logmerger process
Tue Mar 10 01:46:16 2020
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 4 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Waiting for thread 1 sequence 17
Completed: alter database recover managed standby database disconnect from session
5.2 查看slave的归档信息
SYS@slave> select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
--------------
16 --slave库已经注册到16号归档
SYS@slave> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /u01/app/arch/slave
Oldest online log sequence 0
Next log sequence to archive 0
Current log sequence 16
6 使用master库的current redo进行不完全恢复(slave的状态为mount)
--slave库取消日志应用
SYS@slave> alter database recover managed standby database cancel;
Database altered.
--使用master库的current redo 进行不完全恢复
SYS@slave> recover standby database until cancel;
ORA-00279: change 2659662 generated at 03/09/2020 23:17:40 needed for thread 1
ORA-00289: suggestion : /u01/app/arch/slave/1_17_1034631553.dbf
ORA-00280: change 2659662 for thread 1 is in sequence #17
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/home/oracle/wen/redo02.log --为从master库复制过来的current redo
Log applied.
Media recovery complete.
--进行failover故障转移
SYS@slave> alter database activate standby database;
Database altered.
--开库
SYS@slave> alter database open;
Database altered.
7 状态和数据确认
SYS@slave> select database_role,switchover_status,protection_mode from v$database;
DATABASE_ROLE SWITCHOVER_STATUS PROTECTION_MODE
---------------- -------------------- --------------------
PRIMARY FAILED DESTINATION MAXIMUM PERFORMANCE
SYS@slave> select * from wen;
NAME TIME
---------- ---------
king 09-MAR-20
leo 09-MAR-20
white 09-MAR-20
neo 09-MAR-20
—failover切换完成—
总结:
1、slave库应用归档时要注意数据库状态是否为mount,且不能使用using current logfile方式应用日志;
2、failover切换后,通过删除master库来重建dg架构。(使用flash方式时,新备库不能应用日志,后续确认是否是个人原因)
3、参考了https://www.oschina.net/question/565065_126200。
谨记:心存敬畏,行有所止。