oracle 10g 物理Dataguard 日常操作管理(一)

物理Dataguard日常操作管理(一)


3.1 Dataguard 常用参数

日志的传输以及应用可以算作是Dataguard的核心所在.在我们搭建DG的过程中,如何配置优化日志传输服务,关系到整个DG体系的性能以及可用性.而且,不同的保护模式也需要不用的参数组合.10g下,影响配置日志传输的参数主要有以下几个:
1. ARCH/LGWR
设置日志的传送模式,默认使用arch传送.传送发生在日志切换边沿,最大可用和最大保护模式下,需要使用lgwr来传送日志.使用lgwr传送日志,需要备库建立standby logfile,并且支持日志的实时应用.
2. SYNC /ASYNC
该参数表示网络I/O的操作方式, SYNC表示网络I/O将与重做日志的写入同步进行,等待网络i/o完成收到响应后继续下一个写操作.而ASYNC表示日志的传送是异步的,oracle利于LNS进程,接收lgwr发送过来的重做日志信息放入缓冲区,并异步传送到备机,也可以手动指定缓冲区的大小
最大保护和最大可用模式下,需要设置为SYNC AFFIM模式.
3. AFFIMAFFIRM
该参数是LGWR传送模式下的一个属性,表示重做日志的磁盘I/O模式, AFFIM表示同步并且发送成功写操作状态到主数据库, NOAFFIRM表示主库无需等待备库的日志写成功.
4. MANDATORY /OPTIONAL
该参数表示归档的模式,默认值为OPTIONAL. MANDATORY表示强制归档,如果归档不成功会引起主库的归档等待.
5. REOPENREOPEN
该参数表示归档文件收到错误信息后,是否重试以及重试的最小间隔时间.
6. MAX_FAILURE/ NOMAX_FAILUR
该参数表示由于故障而被关闭的目标文件的最大重试次数.超过设定次数,将不再重试.
NOMAX_FAILUR表示不断重试
7. NET_TIMEOUT
该参数表示在网络出现异常或者某些情况下,主数据库的LNS进程放弃网络连接之前的最大等待时间.
8. DELAYDELAY
该参数表示日志的应用模式,delay表示延时指定时间应用传送过来的日志,nodelay表示不延时.
该参数作用也可以通过发布alter database recover managed standby database delay 来实现.
9. VALID_FOR
VALID_FOR的引入,为了更好的实现主备平滑切换,用于说明目标文件何时可用以及归档的重做日志类型.
VALID_FOR属性由2部分组成:archive_source(online_logfile,standby_logfile,all_logfiles)和database_role(primary_role,standby_role,all_role).
online_logfile: 表示归档联机重做日志
standby_logfile:表示归档备用数据库的重做日志/接受的重做日志
all_logfiles: online_logfile && standby_logfile
primary_role: 仅当数据库角色为主库时候归档生效
standby_role: 仅当数据库角色为备库时候归档生效
all_role: 任意角色归档均生效

3.2 选择数据保护模式 
表3.2  不同保护模式下LOG_ARCHIVE_DEST_n参数属性设置
                         最大保护       最高可用     最高性能
REDO写进程                   LGWR          LGWR        LGWR或ARCH
网络传输模式                 SYNC          SYNC        LGWR进程时SYNC或ASYNC,ARCH进程时SYNC
磁盘写操作                   AFFIRM        AFFIRM      AFFIRM或NOAFFIRM
是否需要 Standby Redolog     YES           YES         可没有,但推荐有
                                                                                                                                                           下面我们进入实践部分将一个Data Guard配置从最高性能模式改为最高可用性模式,以下操作如非特别注明,均是在Primary数据库端执行。

3.2.1 更改DG最大可用模式   
                                                                                                                    
主库上执行以下命令查看主库保护模式                                                                                                             
                                                                                                                                               
SQL> select protection_mode,protection_level from v$database;                                                                                  
                                                                                                                                               
PROTECTION_MODE                          PROTECTION_LEVEL                                                                                                     
-----------------------------            -------------------------------                                                                       
MAXIMUM PERFORMANCE                      MAXIMUM PERFORM
A                                                       
                                                                                                                                               
当前主库的模式为最大性能模式,也是建立DG默认模式,停止备库的日志应用,之前建立DG的时候主备库都增加了standby_redolog,在此步骤直接做保护模式转换
                                                                                                                                               
主库上执行以下语句
SQL>alter system set log_archive_dest_2='SERVICE=syw01                                                                                         
LGWR SYNC                                                                                                                                      
AFFIRM                                                                                                                                         
VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=SYW01'                                                                                 
                                                                                                                                               
重启主库至mount状态并在主库上执行下列语句
                                                                                                                                               
SQL> alter database set standby database to maximize availability;                                                                             
性能模式,
下面我们就测试一下。首先断开物理Standby数据库网络,这时primary与standby无法正常通信和传送日志
                                                                               
                                                                                                                                               
SQL> alter database open                                                                                                                       

考虑主备切换同时修改standby,在备库上执行
                                                                                                     
SQL>alter system set  log_archive_dest_2='SERVICE=syw                                                
LGWR  SYNC  AFFIRM                                                                                   
VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)                                                             
DB_UNIQUE_NAME=SYW'                                                                                  
                                                                                                     
注:MAXIMIZE后可跟{PROTECTION | AVAILABILITY | PERFORMANCE},分别对应最大保护、最高可用性和最高性能。

再次查看主库,备库保护模式 
SQL> select protection_mode,protection_level from v$database;
                                                             
PROTECTION_MODE                    PROTECTION_LEVEL                      
--------------------               --------------------      
MAXIMUM AVAILABILITY               MAXIMUM AVAILABILITY               
                                                          
3.2.2 测试最高可用状态
当数据库处于最高可用性模式时,如果Standby无法访问,Primary应该会自动切换成最
高                                                                                                      
重新查询主库的保护级别:
SQL> select protection_mode,protection_level from v$database;                                                                                                                         
PROTECTION_MODE                      PROTECTION_LEVEL                                                                                                                                               
--------------------               --------------------                                                                                                                              
MAXIMUM AVAILABILITY                 RESYNCHRONIZATION                                                                                                                               
保护级别已经自动变成待同步状态。

警告日志有如下信息
                                                                                                
Thread 1 advanced to log sequence 143                                                           
Current log# 2 seq# 143 mem# 0: /u01/app/oracle/oradata/syw/redo02.log                        
Sun Aug 28 11:17:06 2011                                                                        
ARC3: LGWR is actively archiving destination LOG_ARCHIVE_DEST_2                                 
ARC3: Standby redo logfile selected for thread 1 sequence 142 for destination LOG_ARCHIVE_DEST_2
Sun Aug 28 11:20:57 2011                                                                        
                       
                                                             
系统已更改。    
ORA-16198: LGWR received timedout error from KSR                                                
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)                       
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned                                
Sun Aug 28 11:20:57 2011                                                                        
Errors in file /u01/app/oracle/admin/syw/bdump/syw_lgwr_30471.trc:                              
ORA-16198: 远程归档期间内部通道上超时
LGWR: Network asynch I/O wait error 16198 log 2 service 'syw01'                                 
  Current log# 2 seq# 143 mem# 0: /u01/app/oracle/oradata/syw/redo02.log                        
LGWR: Failed to archive log 2 thread 1 sequence 143 (16198)                                     
Sun Aug 28 11:21:00 2011                                                                        
LGWR: Closing remote archive destination LOG_ARCHIVE_DEST_2: 'syw01' (error 16198      


在主库执行更新插入操作                                                
                                                             
SQL> insert into t values(3);                                
已创建 1 行。                                                
                                                             
SQL> commit;                                                 
提交完成。                                                   
                                                             
执行日志切换                                                 
SQL> alter system switch logfile;    
                                              
                                                             
再次查看主库保护级别                                         
SQL> select protection_mode,protection_level from v$database;
                                                              
PROTECTION_MODE                     PROTECTION_LEVEL                        
--------------------               --------------------      
MAXIMUM AVAILABILITY                RESYNCHRONIZATION 

此时primary依旧是日志待同步状态,现在将网线连接上,使主库与备库正常通信                                 
备库警告日志出现如下信息                                                             
                                                                                     
Sun Aug 28 11:29:10 2011                                                             
Media Recovery Log /sywdg/arch1/1_143_758642906.dbf                                  
Media Recovery Log /sywdg/arch1/1_144_758642906.dbf                                  
Media Recovery Log /sywdg/arch1/1_145_758642906.dbf                                  
Media Recovery Log /sywdg/arch1/1_146_758642906.dbf                                  
Media Recovery Waiting for thread 1 sequence 147 (in transit)                        
Sun Aug 28 11:29:35 2011                                                             
Primary database is in MAXIMUM AVAILABILITY mode                                     
Changing standby controlfile to MAXIMUM AVAILABILITY level                           
RFS[24]: Successfully opened standby log 4: /u01/app/oracle/oradata/stdby_redo04.log'

此时主库已更改成最大可用模式,与此同时在备库上执行              
ABILITY                MAXIMUM AVAILABILIT                      
                                                                
SQL> alter database recover managed standby database cancel;    
                                                                
数据库已更改。                                                  
                                                                
SQL> alter database open;                                       
                                                                
数据库已更改。                                                  
                                                                
SQL> select * from t;                                           
                                                                
         A                                                      
----------                                                      
         1                                                      
         3                                                      
                                                                                                                                         
                                                                
在主库上insert的数据已经同步到备库上,再次查看主库与备库保护级别
                                                                
主库                                                            
SQL> select protection_mode,protection_level from v$database;   
                                                                
PROTECTION_MODE                    PROTECTION_LEVEL                           
--------------------              --------------------          
MAXIMUM AVAI
L
                                                                
备库                                                            
                                                                
SQL> select protection_mode,protection_level from v$database;   
                                                                
PROTECTION_MODE                    PROTECTION_LEVEL                           
--------------------              --------------------          
MAXIMUM AVAILABILITY                MAXIMUM AVAILABILIT                      
                                                         
3.2.4将数据库由只读模式切换到执行重做日志应用的模式:                                                                                                              
                                                                                                   
                                                                                                   
1. 终止在备用数据库上的所有活动的用户会话。                                                        
SQL> STARTUP FORCE;                                                                                
                                                                                                   
2. 重启redo应用,开始重做应用                                                                       
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;                      
                                                                                                   
3.要启用实时应用,包括使用当前LOGFILE子句:                                                        
                                                                  
Thread 1 opened at log sequence 
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
                                                                                                   
启动到这些Apply Mode,不需要关闭数据库实例。                                                       
                                                                                                   

3.2.5 data guard中主备库的启动顺序

DG中主备库的启动顺序问题,针对data guard采用不同模式,主备库的启动顺序如下:

max performance(最大性能):主库,备库的启动和关闭顺序没有先后

max availability(最大可用):要先启动备库,再启动主库,如果启动顺序相反,主库仍然能启动,但会在主库的alert.log文件中出现如下出错提示:

Tue Jan 23 09:36:26 2011                                                                                                      alter database open
Tue Jan 23 09:36:26 2011
LGWR: Primary database is in CLUSTER CONSISTENT mode
LGWR: Primary database is in MAXIMUM AVAILABILITY mode
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR                                                                  LNS0 started with pid=12
Tue Jan 23 09:36:29 2011
LGWR: Error 1034 verifying archivelog destination LOG_ARCHIVE_DEST_2 
LGWR: Continuing...                                                                                                           Tue Jan 23 09:36:29 2011
Errors in file /opt/oracle/admin/devdb/bdump/test_lgwr_30979.trc: 
ORA-01034: ORACLE not available
LGWR: Error 1034 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host 'test_stb_186'                                                            
Thread 1 advanced to log sequence 73                                                   
 73                                                                                                                        
Completed: alter database open                                                                                                                                                                      
Max protection(最大保护):先启动备库,再启动主库,如果顺序相反,主库实例会自动中断,数据库无法启动,
并会在alert.log文件中留下如下的信息:

Tue Jan 23 09:34:00 2011                                                                                                                                                                           
alter database open   Tue Jan 23 09:34:00 2011                                                                                                                                                                           
LGWR: Primary database is in CLUSTER CONSISTENT mode
LGWR: Primary database is in MAXIMUM PROTECTION mode
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR LNS0 started with pid=12
Tue Jan 23 09:34:03 2011                                                                                                                                 
LGWR: Error 1034 verifying archivelog destination LOG_ARCHIVE_DEST_2                                                                                     
LGWR: Continuing...                    
Tue Jan 23 09:34:03 2011                                                                                                                                
Errors in file /opt/oracle/admin/devdb/bdump/test_lgwr_30812.trc:                                                                                         
骤:首先是Primary端操作,修改Primary数据库的log_archive_dest_state_n参数,暂时取消向Standby数据库发送日志,  
例如:SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=DEFER; 这样Standby端不可访问时,Pr
ORA-01034: ORACLE not available                                                                                                                           
LGWR: Error 1034 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host 'test_stb_186'                                                            
LGWR: Minimum of 1 applicable standby database required                                                                                                   
Tue Jan 23 09:34:07 2011                                                                                                                                  
Errors in file /opt/oracle/admin/devdb/bdump/test_lgwr_30812.trc:                                                                                         
ORA-16072: a minimum of one standby database destination is required                                                                                      
LGWR: terminating instance due to error 16072                                                                                                             
Instance terminated by LGWR, pid = 30812                                                                                                                                                                                                                                                                             3.2.6正确停止Standby数据库

某些情况下如果操作不当,关闭Standby数据库甚至会连带导致Primary数据库也关闭

正常情况下,停止Standby数据库(含物理Standby和逻辑Standby)之前,应该首先停止Primary数据库,如果直接停止Standby数据库,轻则Primary数据库的Alert文件中记录一堆归档发送失败的错误信息,重则Primary直接shutdown。不过,对于一些测试环境,偶尔也希望能在Primary数据库正常运行的情况下,停止Standby以进行一些其他操作,在这种情况下通常建议使用下列
步imary数据库的Alert日志文件中也不会再报错了。  
然后Standby端就可以停止REDO应用:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CALCEL;                                                                                                                                         
 最后才是关闭Standby数据库
SQL> SHUTDOWN IMMEDIATE;