oracle dataguard 自动故障转移及VIP漂移

一、环境


          主机名                      ip                        实例名  唯一名
主库  oracle_dg1.test.com  218.168.129.64  orcl      orclpri
备库  oracle_dg2.test.com  218.168.129.65  orcl      orclstd
VIP      scan_01.test.com    218.168.129.66  orcl

[root@oracle_dg1 ~]# uname -r
2.6.32-754.el6.x86_64

二、搭建dataguard

见:CSDNicon-default.png?t=N7T8https://mp.csdn.net/mp_blog/creation/editor/132938794

三、配置dg_broker

3.1 配置broker的前提条件


3.1.1 主从数据库都需要打开flashback database


select flashback_onfrom v$database;  --在主从都看查看是否开启
shutdown immediate
startup mount
alter database flashback on;  --开启闪回
alter database open;

3.1.2 主从数据库必须置为最大可用状态

--如果不是也不要着急,一会用dgmgrl来切成最大可用。
SQL> select name,db_unique_name,database_role,open_mode,protection_mode,protection_level,force_logging,flashback_on,switchover_status from v$database;
NAME      DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE            PROTECTION_MODE      PROTECTION_LEVEL     FOR FLASHBACK_ON       SWITCHOVER_STATUS
--------- ------------------------------ ---------------- -------------------- -------------------- -------------------- --- ------------------ --------------------
ORCL      orclpri                        PRIMARY          READ WRITE           MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY YES YES                TO STANDBY

3.1.3 主从数据库都要使用spfile:


show parameter spfile;

3.1.4 配置 dg_broker_config_file


show parameter dg_broker_config_file
dg_broker_config_filen参数用于指定DataGuard配置文件的路径
特别注意一点:RAC中需要把Broker配置文件放入ASM中。

3.1.5 主备数据库都要启用broker


dg_broker_start 参数设置实例启动的时候是否自动启动Broker。
show parameter dg_broker_start;
alter system set dg_broker_start=true;


3.2 开始配置broker


3.2.1 创建配置(主库上执行)


--删除配置用DGMGRL> remove configuration;
在主库上使用dgmgrl连接到数据库
[oracle@oracle_dg1 admin]$ dgmgrl
DGMGRL> connect sys/oracle@orclpri
DGMGRL> create configuration 'orclpricfg' as primary database is 'orclpri' connect identifier is 'orclpri';
Configuration "orclpribroker" created with primary database "orclpri"

参数说明:
orclpricfg 是配置的名称,这里可以随便填.
PRIMARY DATABASE IS 'orclpri' : orclpri 是指 database 的 db_unique_name
connectidentifier is 'orclpri' : orclpri是指 tnsname.ora 连接到主库的 net service name

我们可以使用show confiruration查看配置信息.
DGMGRL> show configuration
Configuration - orclpricfg
  Protection Mode: MaxPerformance
  Databases:
    orclpri - Primary database
Fast-Start Failover: DISABLED
Configuration Status:
DISABLED

3.2.2 添加standby database的配置(主库上执行)


DGMGRL> add database 'orclstd' as connect identifier is orclstd maintained as physical;
Database "orclstd" added
参数说明:
add database 'orclstd' : 这里的orclstd是指 database 的 db_unique_name
AS CONNECT IDENTIFIER IS orclstd: 这里的 orclstd 是指 tnsname.ora 连接到standby database的net service name.

DGMGRL> show configuration
Configuration - orclpricfg
  Protection Mode: MaxPerformance
  Databases:
    orclpri - Primary database
    orclstd - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
DISABLED

3.2.3 设置日志同步传输模式(主库上执行)


DGMGRL> edit database 'orclpri' set property 'logxptmode'='sync';
Property”logxptmode” updated

DGMGRL> edit database 'orclstd' set property 'logxptmode'='sync';
Property”logxptmode” updated

3.2.4 设置 DATAGUARD 保护模式(如果前最面已经设置则此步省略)


DGMGRL> edit configuration set protection mode as MAXAvailability;

3.2.5 设置故障转移目标 FastStartFailoverTarget


DGMGRL> edit database orclpri set property FastStartFailoverTarget='orclstd';
Property "faststartfailovertarget" updated
DGMGRL> show database orclpri FastStartFailoverTarget;
  FastStartFailoverTarget = 'orclstd'

DGMGRL> edit database orclstd set property FastStartFailoverTarget='orclpri';
Property "faststartfailovertarget" updated
DGMGRL> show database orclstd FastStartFailoverTarget;
  FastStartFailoverTarget = 'orclpri'

3.2.6 修改快速故障转移的延迟时间(默认为30秒,有些少):


DGMGRL> edit configuration set property FastStartFailoverThreshold=60;
DGMGRL> show configuration FastStartFailoverThreshold;
  FastStartFailoverThreshold = '60'
FastStartFailoverThreshold (表示观察者与备库同主库断网后几秒,自动启动failover,即主库终止从库延迟几秒后从库再启动
注意:在生产中要把 FastStartFailoverThreshold 这个参数调大点,要不在生产中发生网络不畅通的情况下,等30s就自动切换了,所以还是改大点吧,我改的是1分钟

3.2.7 配置监听


DGMGRL> show database verbose orclpri StaticConnectIdentifier;
  StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oracle_dg1.test.com)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=orclpri_DGMGRL)(INSTANCE_NAME=orcl)(SERVER=DEDICATED)))'
DGMGRL> show database verbose orclstd StaticConnectIdentifier;
  StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oracle_dg2.test.com)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=orclstd_DGMGRL)(INSTANCE_NAME=orcl)(SERVER=DEDICATED)))'

3.2.7.1 主库
[oracle@oracle_dg1 ~]$ vi /u01/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# listener.ora Network Configuration File: /u01/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = oracle_dg1.test.com)(PORT = 1521))
    #  (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
    )
  )

SID_LIST_LISTENER =
    (SID_LIST =
        (SID_DESC =
            (GLOBAL_DBNAME = ORCL)
            (ORACLE_HOME = /u01/oracle/product/11.2.0/dbhome_1)
            (SID_NAME = ORCL)
        )
        (SID_DESC =
            (GLOBAL_DBNAME = orclpri_DGMGRL)
            (ORACLE_HOME = /u01/oracle/product/11.2.0/dbhome_1)
            (SID_NAME = orcl)
        )
    )

ADR_BASE_LISTENER = /u01/oracle

3.2.7.2 备库
[oracle@oracle_dg2 trace]$ vi /u01/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# listener.ora Network Configuration File: /u01/oracle/product/11.2.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = oracle_dg2.test.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
    )
  )

SID_LIST_LISTENER =
    (SID_LIST =
        (SID_DESC =
            (GLOBAL_DBNAME = orcl)
            (ORACLE_HOME = /u01/oracle/product/11.2.0/dbhome_1)
            (SID_NAME = orcl)
        )
       (SID_DESC =
            (GLOBAL_DBNAME = orclstd_DGMGRL)
            (ORACLE_HOME = /u01/oracle/product/11.2.0/dbhome_1)
            (SID_NAME = orcl)
        )
    )

ADR_BASE_LISTENER = /u01/oracle


3.2.8 启动 observer


官方建议observer应该与primary和standby数据库隔离,不能在一台机器上,我这里资源有限,将observer部署在备机上面

如果 observer 没有启动,会有以下报错:
DGMGRL> show configuration
Configuration - orclpricfg
  Protection Mode: MaxAvailability
  Databases:
    orclpri - Primary database
      Error: ORA-16820: fast-start failover observer is no longer observing this database

    orclstd - (*) Physical standby database
      Error: ORA-16820: fast-start failover observer is no longer observing this database

Fast-Start Failover: ENABLED
Configuration Status:
ERROR

3.2.8.1 命令行启用 observer,开启后,界面无法返回
[oracle@oracle_dg2 ~]$ dgmgrl
DGMGRL> connect sys/oracle
DGMGRL> start observer;
Observer started

3.2.8.2 后台启动的方式
nohup dgmgrl -logfile /home/oracle/observer.log sys/oracle@orclstd "start observer" &
nohup dgmgrl -logfile /home/oracle/observer.log <<EOF
connect sys/passwd@orclstd
start observer
EOF

3.2.8.3 脚本启动
vi observer.sh
#!/bin/ksh
## Script to start observer via DGMGRL
dgmgrl -echo -logfile $HOME/observer.log << EOF
connect sys/passwd@orclstd
start observer
EOF

chmod +x observer.sh
./observer.sh &


3.2.9 启动快速故障转移


DGMGRL> enable fast_start failover;
Enabled.
DGMGRL> show configuration VERBOSE;  
Configuration - orclpricfg
  Protection Mode: MaxAvailability
  Databases:
    orclpri - Primary database
    orclstd - (*) Physical standby database
  (*) Fast-Start Failover target
  Properties:
    FastStartFailoverThreshold      = '60'
    OperationTimeout                = '30'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: ENABLED
  Threshold:          60 seconds
  Target:             orclstd
  Observer:           oracle_dg2.test.com
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configuration Status:
SUCCESS

3.3 switchover切换演练

3.3.1 手工将从库(orclpri)切换为主的角色


DGMGRL> switchover to 'orclpri';
Performing switchover NOW, please wait...
Operation requires a connection to instance "orcl" on database "orclpri"
Connecting to instance "orcl"...
Connected.
New primary database "orclpri" is opening...
Operation requires startup of instance "orcl" on database "orclstd"
Starting instance "orcl"...
ORACLE instance started.
Database mounted.
Database opened.
Switchover succeeded, new primary is "orclpri"

3.3.2 查看主备库状态


---- orclpri
SQL> select name,db_unique_name,database_role,open_mode,protection_mode,protection_level,force_logging,flashback_on,switchover_status from v$database;
NAME      DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE            PROTECTION_MODE      PROTECTION_LEVEL     FOR FLASHBACK_ON       SWITCHOVER_STATUS
--------- ------------------------------ ---------------- -------------------- -------------------- -------------------- --- ------------------ --------------------
ORCL      orclpri                        PRIMARY          READ WRITE           MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY YES YES                SESSIONS ACTIVE

----orclstd
SQL> select name,db_unique_name,database_role,open_mode,protection_mode,protection_level,force_logging,flashback_on,switchover_status from v$database
NAME      DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE            PROTECTION_MODE      PROTECTION_LEVEL     FOR FLASHBACK_ON       SWITCHOVER_STATUS
--------- ------------------------------ ---------------- -------------------- -------------------- -------------------- --- ------------------ --------------------
ORCL      orclstd                        PHYSICAL STANDBY READ ONLY WITH APPLY MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY YES YES                NOT ALLOWED

四、配置keepalived

本方案安装keepalived的作用是提供一个vip 218.168.129.66,当主库发生故障后,可以自动将vip漂移到备库,从而使应用不用再修改数据库连接。

4.1 安装

[root@oracle_dg1 yum.repos.d]# yum -y install gcc gcc-c++ libnl* popt* libnfnetlink-devel libnfnetlink lm_sensors-libs net-snmp-agent-libs net-snmp-libs  openssh-server openssh-clients  openssl  openssl-devel automake iproute ipvsadm keepalived 
Loaded plugins: fastestmirror, refresh-packagekit, security

4.2 主库Keepalived的配置

[root@oracle_dg1 ~]# cd /etc/keepalived/
[root@localhost keepalived]# cp keepalived.conf keepalived.conf.default
--------------------------------------------------------
--节点1(218.168.129.64)keepalived.conf文件内容:
--------------------------------------------------------
[root@oracle_dg1 ~]# vi /etc/keepalived/keepalived.conf

! Configuration File for keepalived

#全局配置
global_defs {
   notification_email {
root@oracle_dg1
   }

   notification_email_from root@oracle_dg1
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL   #身份识别(全局唯一)
}

vrrp_script chk_dg_stats {  
    script "/etc/keepalived/check_dataguard.sh"   ## 监控脚本
    interval 2       ## 监控时间
    weight -5        ## 即该节点出现故障后下面的priority的值减去5
    fall 2  
    rise 1  
}

# 配置VRRP协议
vrrp_instance VI_1 {
  state MASTER          # 状态,MASTER和BACKUP(这里仅仅是一个标记,真正确认VIP的是权重)
  interface eth0        # 绑定网卡
  mcast_src_ip 218.168.129.64
  virtual_router_id 131 # 虚拟路由标示,可以理解为分组
  priority 101  # 优先级(数字越大,权重越大)
  inopreempt    # 主库重新恢复后不抢占vip
  advert_int 1  # 监测心跳间隔时间      
  authentication {       # 配置认证
      auth_type PASS     # 认证类型
      auth_pass 888888   # 认证的密码
  }
  
  virtual_ipaddress {    # 设置VIP
      218.168.129.66/24 dev eth0 label eth0:1     # 虚拟的VIP地址
  }
    
  track_script {               
     chk_dg_stats   ###执行上面定义的chk_dg_stats           
  }
}

4.3 备库Keepalived的配置

[root@oracle_dg2 ~]# cd /etc/keepalived/
[root@oracle_dg2 keepalived]# cp keepalived.conf keepalived.conf.default
[root@oracle_dg2 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
  notification_email {
    root@oracle_dg2
  }
  notification_email_from root@oracle_dg2
  smtp_server 127.0.0.1
  smtp_connect_timeout 30
  router_id LVS_DEVEL      #集群的ID 主备两台机器的这名字得一样
}

vrrp_script chk_dg_stats {  
    script "/etc/keepalived/check_dataguard.sh" 
    interval 2
    weight -5
    fall 2  
    rise 1  
}
       
vrrp_instance VI_1 {
    state BACKUP
    interface eth0 
    mcast_src_ip 218.168.129.65
    virtual_router_id 131 
    priority 99 
    inopreempt
    advert_int 1         
    authentication {   
        auth_type PASS 
        auth_pass 888888   
    }
    virtual_ipaddress {    
        218.168.129.66/24 dev eth0 label eth0:1
    }
      
    track_script {               
       chk_dg_stats
    }
}

4.4 配置脚本

两个机器上都配置如下脚本,让keepalived每隔2s执行以下该脚本,用来监控oracle的角色,从而判断本节点的oracle运行是否正常。如果发现不是 PRIMAYR ,立即把vip切换到另一台机。
--------------------------------------------------------
--所有节点配置脚本check_dataguard.sh,并确认具有x执行权限:
--------------------------------------------------------
# cat /etc/keepalived/check_dataguard.sh
#!/bin/bash
dbrole=`echo -e 'set head off;\n select database_role from v$database;' | su - oracle -c "sqlplus -s / as sysdba"`

if [ "${dbrole}" = 'PRIMARY'  ]; then
  echo 0
  exit 0
else 
  echo 1
  exit 1
fi

五、failover 切换演练

5.1 模拟自动故障转移

--关闭主库
SQL> shutdown abort

5.2 查看监听器日志

tail -f observer.log
21:17:21.06  Saturday, September 16, 2023
Initiating Fast-Start Failover to database "orclstd"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "orclstd"
21:17:22.97  Saturday, September 16, 2023
以上提示,已自动启用备库为主库

此时查看IP情况,发现VIP已自动漂移到 orclstd

[root@oracle_dg2 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:50:56:b3:0d:68 brd ff:ff:ff:ff:ff:ff
    inet 218.168.129.65/24 brd 218.168.129.255 scope global eth0
    inet 218.168.129.66/24 scope global secondary eth0:1

5.3 查看 dgbroker 状态信息

dgmgrl
DGMGRL> conn sys/oracle@orclstd
DGMGRL> show configuration;
Configuration - orclpricfg
  Protection Mode: MaxAvailability
  Databases:
    orclstd - Primary database
      Warning: ORA-16817: unsynchronized fast-start failover configuration
    orclpri - (*) Physical standby database (disabled)
      ORA-16661: the standby database needs to be reinstated
Fast-Start Failover: ENABLED

5.4 尝试修复故障库,把故障库启动到mount状态

[oracle@oracle_dg1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat Sep 16 21:36:40 2023
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 2455228416 bytes
Fixed Size                  2255712 bytes
Variable Size             620758176 bytes
Database Buffers         1811939328 bytes
Redo Buffers               20275200 bytes
Database mounted.

5.5 监听器自动开始修复故障库

[oracle@oracle_dg2 ~]$ tail -f observer.log 
Observer started
[W000 09/16 20:54:15.57] Observer started.

21:17:21.06  Saturday, September 16, 2023
Initiating Fast-Start Failover to database "orclstd"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "orclstd"
21:17:22.97  Saturday, September 16, 2023

21:37:32.40  Saturday, September 16, 2023
Initiating reinstatement for database "orclpri"...
Reinstating database "orclpri", please wait...
Operation requires shutdown of instance "orcl" on database "orclpri"
Shutting down instance "orcl"...
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.

Please complete the following steps and reissue the REINSTATE command:
        shut down instance "orcl" of database "orclpri"
        start up and mount instance "orcl" of database "orclpri"

21:37:40.08  Saturday, September 16, 2023

根据监听器提示重启故障库至mount
SQL> shutdown immediate;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

SQL> startup mount
ORACLE instance started.
Total System Global Area 2455228416 bytes
Fixed Size                  2255712 bytes
Variable Size             620758176 bytes
Database Buffers         1811939328 bytes
Redo Buffers               20275200 bytes
Database mounted.


5.6 查看dgmgrl状态信息

DGMGRL>  show configuration;
Configuration - orclpricfg
  Protection Mode: MaxAvailability
  Databases:
    orclstd - Primary database
      Warning: ORA-16817: unsynchronized fast-start failover configuration
    orclpri - (*) Physical standby database (disabled)
      ORA-16661: the standby database needs to be reinstated
Fast-Start Failover: ENABLED
Configuration Status:
WARNING

5.7 根据提示 reinstate 故障库

DGMGRL> reinstate database orclpri;
Reinstating database "orclpri", please wait...
Reinstatement of database "orclpri" succeeded
注意:
必须把数据库启动到mout状态才能执行reinstate的恢复命令,其他状态执行恢复会失败的。
另外,如果你执行reinstate命令失败,可用尝试把要恢复的数据库重启到mount状态,再执行reinstate命令

5.8 查看dgmgrl状态信息

DGMGRL> show configuration;
Configuration - orclpricfg
  Protection Mode: MaxAvailability
  Databases:
    orclstd - Primary database
    orclpri - (*) Physical standby database
Fast-Start Failover: ENABLED
Configuration Status:
SUCCESS
状态已恢复。

5.9 开启数据库,并打开日志应用

SQL> select name,db_unique_name,database_role,open_mode,protection_mode,protection_level,force_logging,flashback_on,switchover_status from v$database;
NAME      DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE            PROTECTION_MODE      PROTECTION_LEVEL     FOR FLASHBACK_ON       SWITCHOVER_STATUS
--------- ------------------------------ ---------------- -------------------- -------------------- -------------------- --- ------------------ --------------------
ORCL      orclpri                        PHYSICAL STANDBY MOUNTED              MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY YES YES                NOT ALLOWED

SQL> alter database open;
Database altered.

SQL> alter database recover managed standby database using current logfile disconnect from session;
Database altered.

SQL> select name,db_unique_name,database_role,open_mode,protection_mode,protection_level,force_logging,flashback_on,switchover_status from v$database;
NAME      DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE            PROTECTION_MODE      PROTECTION_LEVEL     FOR FLASHBACK_ON       SWITCHOVER_STATUS
--------- ------------------------------ ---------------- -------------------- -------------------- -------------------- --- ------------------ --------------------
ORCL      orclpri                        PHYSICAL STANDBY READ ONLY WITH APPLY MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY YES YES                NOT ALLOWED
 

  • 6
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值