前言:
本文主要测试Oracle DataGuard三种保护模式最大保护,最大可用以及最大性能的配置以及事务同步场景。
模式一:最大可用
这种保护模式在不影响主库可用性的情况下提供了最高级别的数据保护。主库只有将事务所需的redo写入至少一个同步备库上的standby log(取决于配置)后,事务才会提交。
如果主数据库无法将其redo写入至少一个同步备用数据库,则主库模式会转变为最高性能模式,以保持主数据库的可用性,直到它再次能够将其redo写入备库。
这种保护模式尽可能的确保了零数据丢失,除非出现某些双重故障,例如备库发生故障后主库发生故障。
模式二:最大性能
这是默认保护模式。它提供主备模式下的最高性能。主库事务可以立即提交,不需要等待备库的确认。redo也会写入一个或多个备库,但这是相对于主库提交异步完成的,因此主数据库性能不受redo日志写入备库延迟的影响。
这种保护模式提供的数据保护略低于最大可用性模式,对主数据库性能的影响最小。
模式三:最大保护
最大模式可确保在主库发生故障时不会发生数据丢失。为了提供这种级别的保护,事务提交的redo日志必须在事务提交之前写入至少一个同步备库。为了确保数据不会丢失,如果主库事务提交的redo日志无法写入至少一个同步备库,数据库则会shutdown,不继续处理事务。
log_archive_dest_n参数模式配置
SYNC:主库使用同步的方式同步redo log到备库。
ASYN:主库使用异步的方式同步redo log到备库。
AFFIRM:主库事务提交之后需要确认主库redo log传输到备库并写入standby log 的磁盘上才返回提交。
NOAFFIRM:主库事务提交之后只要确认主库redo log传输到备库,不需要确认写入到磁盘上,即可提交。
配置模式一最大可用
查看当前的保护模式,为默认的最大性能
1* select database_role,protection_mode,protection_level from v$database
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
-------------------------------- ---------------------------------------- ----------------------------------------
PRIMARY MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE
通过命令方式切换为最大可用模式,注意log_archive_dest_n要设置SYNC+AFFIRM或者SYNC+NOAFFIRM
--执行切换最大可用模式
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;
--查看当前的保护模式,可以发现当前的protection_level为RESYNCHRONIZATION,不是MAXIMUM AVAILABILITY
--log_archive_dest_n 没有设置sync,需要log_archive_dest_n设置为sync
SQL> select database_role,protection_mode,protection_level from v$database;
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
-------------------------------- ---------------------------------------- ----------------------------------------
PRIMARY MAXIMUM AVAILABILITY RESYNCHRONIZATION
--调整log_archive_dest_n
alter system set log_archive_dest_2='service=orcldg LGWR SYNC delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 net_timeout=10 valid_for=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=orcldg' scope=both sid='*';
--再次查看当前的保护模式,为MAXIMUM AVAILABILITY
1 select database_role,protection_mode,protection_level from v$database
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
-------------------------------- ---------------------------------------- ----------------------------------------
PRIMARY MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY
--同步链路的状态为SYNCHRONIZED
LGWR (PID:9029): LAD:2 is SYNCHRONIZED
LGWR (PID:9029): SRL selected to archive T-1.S-28
LGWR (PID:9029): SRL selected for T-1.S-28 for LAD:2
对事务同步进行测试(模拟主备网络故障,传输异常)
--删除数据,提交事务正常
SQL> delete from test where rownum<2;
1 row deleted.
Elapsed: 00:00:00.00
SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL>
--现在模拟网络故障,把备库的网卡down
ifdown ens38
--删除数据,提交事务,事务hang住,直到超时,才完成提交
SQL> delete from test where rownum<2;
1 row deleted.
Elapsed: 00:00:00.00
SQL> commit;
Commit complete.
Elapsed: 00:00:08.14
--后台lgwr进程,网络超时10秒(net_timeout参数控制)
*** 2022-07-10T22:10:43.446647+08:00
*** 2022-07-10 22:10:43.446615 [krsw.c:3617]
krsw_ksr_receive: Current time <07/10/2022 22:10:33> exceeds begin time <07/10/2022 22:10:33> by <10> seconds, return timed out error
*** 2022-07-10 22:10:43.447048 [krsh.c:6348]
ORA-16198: Received timed out error from KSR
*** 2022-07-10 22:10:43.447162 [krsh.c:6348]
Attempting LAD:2 network reconnect (16198)
*** 2022-07-10 22:10:43.447385 [krsh.c:6348]
LAD:2 network reconnect abandoned
krsw_ksr_status: Received error 16198 on receiving channel messages
<error barrier> at 0x7ffc2e3db788 placed krsl.c@7092
ORA-16198: Timeout incurred on internal channel during remote archival
--查看当前的保护模式,可以发现当前的protection_level降级为RESYNCHRONIZATION
SQL> select database_role,protection_mode,protection_level from v$database;
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
-------------------------------- ---------------------------------------- ----------------------------------------
PRIMARY MAXIMUM AVAILABILITY RESYNCHRONIZATION
对事务同步进行测试(模拟备库应用异常)
--直接停掉备库mrp进程
SQL> alter database recover managed standby database cancel;
Database altered.
SQL>
select open_mode from v$database;
OPEN_MODE
--------------------
READ ONLY
SQL>
--主库提交事务,没有影响,因为事务redo log还是可以正常传输到备库的standby log
SQL> select database_role,protection_mode,protection_level from v$database;
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
-------------------------------- ---------------------------------------- ----------------------------------------
PRIMARY MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY
SQL>
SQL>
SQL> delete from test where rownum<2;
1 row deleted.
SQL>
commit;
Commit complete.
SQL>
保护模式重新恢复到MAXIMUM AVAILABILITY
--在主库重新enable链路,等待主备重新实时同步之后,模式就自动切换为MAXIMUM AVAILABILITY
SQL> alter system set log_archive_dest_state_2='enable';
System altered.
Elapsed: 00:00:00.01
1 select database_role,protection_mode,protection_level from v$database
2*
DATABASE_ROLE PROTECTION_MODE
-------------------------------- ----------------------------------------
PROTECTION_LEVEL
----------------------------------------
PRIMARY MAXIMUM AVAILABILITY
MAXIMUM AVAILABILITY
--重新升级为sync
*** 2022-07-10T22:14:14.479490+08:00
*** 2022-07-10 22:14:14.479485 [krsw.c:4550]
krsw_nss_init: NSS2 (PID:28725) for mode SYNC has been re-initialized
krsw_nss_init: Performing a channel reset to ignore previous responses
krsw_nss_init: Successfully reused NSS2 (PID:28725) for service orcldg mode SYNC ocis = 0x7ffb4f2bd140
*** 2022-07-10 22:14:14.509428 [krsu.c:7534]
krsu_nss_upi_ahm: upiahm connect done status is 0
配置模式二最大性能
通过命令方式切换为最大性能
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;
Database altered.
SQL> select database_role,protection_mode,protection_level from v$database
2 ;
DATABASE_ROLE PROTECTION_MODE
-------------------------------- ----------------------------------------
PROTECTION_LEVEL
----------------------------------------
PRIMARY MAXIMUM PERFORMANCE
MAXIMUM PERFORMANCE
配置最大模式,log_archive_dest_N注意设置为ASYNC+NOAFFIRM,如果还是设置为SYNC模式,即使是最大性能模式,还是会进行sync复制,如果出现备库网络问题,依然会导致主库提交出现等待,直到网络超时
--测试在最大性能下,由于设置了SYNC,备库网络问题,依然会导致主库提交出现等待,直到网络超时
select database_role,protection_mode,protection_level from v$database
2*
DATABASE_ROLE PROTECTION_MODE
-------------------------------- ----------------------------------------
PROTECTION_LEVEL
----------------------------------------
PRIMARY MAXIMUM PERFORMANCE
MAXIMUM PERFORMANCE
SQL> show parameter log_archive_dest_2
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
log_archive_dest_2 string service=orcldg LGWR SYNC NOAFF
IRM delay=0 optional compress
ion=disable max_failure=0 max_
connections=1 reopen=300 net_
timeout=10 valid_for=(ONLINE_L
OGFILES,PRIMARY_ROLE) db_uniq
ue_name=orcldg
--现在模拟网络故障,把备库的网卡down
ifdown ens38
--删除数据,提交事务,事务hang住,直到超时,才完成提交
SQL> delete from test where rownum<2;
1 row deleted.
Elapsed: 00:00:00.00
SQL> commit;
Commit complete.
Elapsed: 00:00:07.43
设置为ASYNC+NOAFFIRM,即使备库网络异常,也不会影响主库
--log_archive_dest_n设置为ASYNC+NOAFFIRM
alter system set log_archive_dest_2='service=orcldg LGWR ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 net_timeout=10 valid_for=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=orcldg' scope=both sid='*';
--现在模拟网络故障,把备库的网卡down
ifdown ens38
--删除数据,提交事务正常,不需要备库返回
SQL> delete from test where rownum<2;
1 row deleted.
Elapsed: 00:00:00.00
SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL>
配置模式三最大保护
通过命令方式切换为最大可用模式,注意log_archive_dest_n要设置SYNC+AFFIRM
--设置最大保护,数据库需要启动到mount,不能open
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PROTECTION;
--切换为最大保护模式
1* select database_role,protection_mode,protection_level from v$database
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
-------------------------------- ---------------------------------------- ----------------------------------------
PRIMARY MAXIMUM PROTECTION MAXIMUM PROTECTION
--在最大模式下,不能在线更改不满足最大保护的参数,例如:修改为ASYNC
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-16159: Cannot change protected standby destination attributes
--在最大模式下,至少要存在一个备库,或者不能正常open
ORA-16072: a minimum of one standby database destination is required
对事务同步进行测试
--现在模拟网络故障,把备库的网卡down
ifdown ens38
--删除数据,提交事务,事务hang住
SQL> delete from test where rownum<2;
1 row deleted.
SQL> delete from test where rownum<2;
1 row deleted.
SQL> commit;
---------->>>hang 住
--并且会一直尝试和备库重连,期间整个数据库处于hang住状态,直到于备库恢复同步
GWR (PID:33181): ORA-16198: Received timed out error from KSR
LGWR (PID:33181): Error 16198 attaching to RFS for reconnect
2022-07-10T22:55:35.002269+08:00
***********************************************************************
Fatal NI connect error 12543, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.2.242)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle)(HOST=ol7db1)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 19.0.0.0.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production
Version 19.10.0.0.0
Time: 10-JUL-2022 22:55:35
Tracing not turned on.
Tns error struct:
ns main err code: 12543
TNS-12543: TNS:destination host unreachable
ns secondary err code: 12560
nt main err code: 513
TNS-00513: Destination host unreachable
nt secondary err code: 113
nt OS err code: 0
2022-07-10T22:55:38.010715+08:00
***********************************************************************
Fatal NI connect error 12543, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.2.242)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle)(HOST=ol7db1)(USER=oracle))))
VERSION INFORMATION:
TNS for Linux: Version 19.0.0.0.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production
Version 19.10.0.0.0
Time: 10-JUL-2022 22:55:38
Tracing not turned on.
Tns error struct:
ns main err code: 12543
TNS-12543: TNS:destination host unreachable
ns secondary err code: 12560
nt main err code: 513
TNS-00513: Destination host unreachable
nt secondary err code: 113
nt OS err code: 0
2022-07-10T22:55:38.011765+08:00
NSS2 (PID:33919): Error 12543 received logging on to the standby
2022-07-10T22:55:39.159327+08:00
Starting background process NSS2
2022-07-10T22:55:39.169416+08:00
NSS2 started with pid=44, OS id=33941
--恢复备库网络之后
ifup ens33
--数据库重新恢复正常
*** 2022-07-10T23:00:10.711866+08:00
*** 2022-07-10 23:00:10.711851 [krsw.c:4550]
krsw_nss_init: NSS2 (PID:34615) for mode SYNC has been initialized
krsw_nss_init: Performing a channel reset to ignore previous responses
krsw_nss_init: Successfully started NSS2 (PID:34615) for service orcldg mode SYNC ocis = 0x7f2a5631b140
*** 2022-07-10 23:00:10.760257 [krsu.c:7534]
krsu_nss_upi_ahm: upiahm connect done status is 0
对事务同步进行测试(模拟备库应用异常)
--直接停掉备库mrp进程
SQL> alter database recover managed standby database cancel;
Database altered.
SQL> select open_mode from v$database;
OPEN_MODE
--------------------
READ ONLY
SQL>
--主库提交事务,没有影响,因为事务redo log还是可以正常传输到备库的standby log
1* select database_role,protection_mode,protection_level from v$database
DATABASE_ROLE PROTECTION_MODE PROTECTION_LEVEL
-------------------------------- ---------------------------------------- ----------------------------------------
PRIMARY MAXIMUM PROTECTION MAXIMUM PROTECTION
SQL> delete from test where rownum<2;
1 row deleted.
SQL> commit;
Commit complete.
SQL>
总结:
从测试来看,最大性能模式下数据库的性能最好,不受备库的同步影响,最大保护模式下,如果主备网络同步出现问题,主库将出现hang住的情况,对业务影响较大,而最大可用模式下,数据库的性能以及数据保护都得到兼备,在备库同步出现问题的情况下,可以进行降级,避免对生产业务造成影响。