对于RAC来说,当Oracle Instance失败的,应用有几种高可用技术来保证可用性,这些技术对客户端来说都是透明的,客户端可能感知不到instance failure。
一.Transparent Application Failover (TAF)
二.Fast Connection Failover (FCF).
当一个节点failure的时候,TAF允许数据库session使用OCI libraries进行fail over到其他存活的一个节点。但是如果应用使用JDBC thin driver则不能使用TAF。
Failover Modes:
Session failover
Select failover
None(default)
1.select mode允许query重新在新的节点执行,并且丢弃已经fetch的行,继续返回结果给客户端。
2.none mode显式的声明不使用TAF
3.TAF不能够恢复任何DML事务,事务将在另一个节点回滚。
Failover Methods:
Basic
Preconnect
1.Basic选项意味着客户端在实例失败后重新建立新的连接,这个method可能导致节点失败时,另一节点的性能下降,因为有很多的session将要重新连接到幸存的节点。
2.客户端产生一个preconnected session作为备份,在实例失败时加速failover。
TAF是client-side feature,我们需要配置它在客户端Tnsnames.ora或者server端通过service来实现
1,客户端实现
instance failure failover。
FCF支持的客户端驱动:
Java Database Connection Driver (JDBC)
Oracle Universal Connection Pool UCP
Clients using the Oracle Call Interface
Oracle Data Providers for .Net
在11R2(11.2.0.1)中当设置了db_domain参数,FCF将会失效,bug:8779597
11gR2中默认配置客户端通过如下:
那么客户端的tnsnames.ora来做如下配置:
1.Client Side Load Balance
当客户端通过SCAN连接11gR2 RAC,load balance发生在连接阶段(可以通过sqlnet的trace来确认),Scan name被解析成最多三个不同的IP,
客户端随机的选择一个IP去连接。
对于非11gR2 client,是不能使用SCAN的好处,因为它不能处理DNS解析的SCAN ip,所以,在address_list里,它将
尝试用第一个address去连接,不会使用其他的地址,那么如果这个ip地址不能被scan listener监听或者ip本事存在问题,连接就会失败。
使用静态的SCAN IP(未使用GNS)的tnsnames.ora的条目如下:
关于rac连接管理的详细文档,崔华写过一篇很详细,可以参阅:
一.Transparent Application Failover (TAF)
二.Fast Connection Failover (FCF).
当一个节点failure的时候,TAF允许数据库session使用OCI libraries进行fail over到其他存活的一个节点。但是如果应用使用JDBC thin driver则不能使用TAF。
Failover Modes:
Session failover
Select failover
None(default)
1.select mode允许query重新在新的节点执行,并且丢弃已经fetch的行,继续返回结果给客户端。
2.none mode显式的声明不使用TAF
3.TAF不能够恢复任何DML事务,事务将在另一个节点回滚。
Failover Methods:
Basic
Preconnect
1.Basic选项意味着客户端在实例失败后重新建立新的连接,这个method可能导致节点失败时,另一节点的性能下降,因为有很多的session将要重新连接到幸存的节点。
2.客户端产生一个preconnected session作为备份,在实例失败时加速failover。
TAF是client-side feature,我们需要配置它在客户端Tnsnames.ora或者server端通过service来实现
1,客户端实现
RAC中在tnsnames.ora里面典型配置如下:
LEO1 =
(DESCRIPTION_LIST =
(LOAD_BALANCE = off)
(FAILOVER = on)
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE=OFF)
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = node1.chinamobile.com)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = leo)
(INSTANCE_NAME = leo1)
(FAILOVER_MODE=(TYPE=session)(METHOD=basic)(RETRIES=4)(DELAY=1))
)
)
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE=OFF)
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = node2.chinamobile.com)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = leo)
(INSTANCE_NAME = leo2)
(FAILOVER_MODE=(TYPE=session)(METHOD=basic)(RETRIES=4)(DELAY=1))
)
)
)
2,server端实现
srvctl add service -d leo -s cmcc -q TRUE -P BASIC -e SESSION -z 4 -w 5 具体使用参数如下:
Usage: srvctl add service -d <db_unique_name> -s <service_name> {-r "<preferred_list>" [-a "<available_list>"] [-P {BASIC | NONE | PRECONNECT}]
| -g <pool_name> [-c {UNIFORM | SINGLETON}] } [-k <net_num>] [-l [PRIMARY][,PHYSICAL_STANDBY][,LOGICAL_STANDBY][,SNAPSHOT_STANDBY]]
[-y {AUTOMATIC | MANUAL}] [-q {TRUE|FALSE}] [-x {TRUE|FALSE}] [-j {SHORT|LONG}] [-B {NONE|SERVICE_TIME|THROUGHPUT}]
[-e {NONE|SESSION|SELECT}] [-m {NONE|BASIC}] [-z <failover_retries>] [-w <failover_delay>] [-t <edition>] [-f]
-d <db_unique_name> Unique name for the database
-s <service> Service name
-r "<preferred_list>" Comma separated list of preferred instances
-a "<available_list>" Comma separated list of available instances
-g <pool_name> Server pool name
-c {UNIFORM | SINGLETON} Service runs on every active server in the server pool hosting this service (UNIFORM) or just one server (SINGLETON)
-k <net_num> network number (default number is 1)
-P {NONE | BASIC | PRECONNECT} TAF policy specification
-l <role> Role of the service (primary, physical_standby, logical_standby, snapshot_standby)
-y <policy> Management policy for the service (AUTOMATIC or MANUAL)
-e <Failover type> Failover type (NONE, SESSION, or SELECT)
-m <Failover method> Failover method (NONE or BASIC)
-w <integer> Failover delay
-z <integer> Failover retries
-t <edition> Edition (or "" for empty edition value)
-j <clb_goal> Connection Load Balancing Goal (SHORT or LONG). Default is LONG.
-B <Runtime Load Balancing Goal> Runtime Load Balancing Goal (SERVICE_TIME, THROUGHPUT, or NONE)
-x <Distributed Transaction Processing> Distributed Transaction Processing (TRUE or FALSE)
-q <AQ HA notifications> AQ HA notifications (TRUE or FALSE)
Usage: srvctl add service -d <db_unique_name> -s <service_name> -u {-r "<new_pref_inst>" | -a "<new_avail_inst>"} [-f]
-d <db_unique_name> Unique name for the database
-s <service> Service name
-u Add a new instance to service configuration
-r <new_pref_inst> Name of new preferred instance
-a <new_avail_inst> Name of new available instance
-f Force the add operation even though a listener is not configured for a network
-h Print usage
Fast Connection Failover and Fast Application Notification
FAN是一种消息发布机制,在instance up/down发布消息到客户端,来进行群集的重新配置,FCF是通过FAN来透明的进行instance failure failover。
FCF支持的客户端驱动:
Java Database Connection Driver (JDBC)
Oracle Universal Connection Pool UCP
Clients using the Oracle Call Interface
Oracle Data Providers for .Net
在11R2(11.2.0.1)中当设置了db_domain参数,FCF将会失效,bug:8779597
下面看看11gR2的情况:
11gR2出现了scan,相应的listener配置也有所变化:
$GRID_HOME/network/admin目录下:
-rw-r--r-- 1 grid oinstall 695 May 1 16:43 listener.ora
-rw-r--r-- 1 grid oinstall 369 May 1 16:43 endpoints_listener.ora
[grid@node2 admin]$ cat listener.ora
LISTENER2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER2)))) # line added by Agent
# listener.ora.node2 Network Configuration File: /u01/app/11.2.0.3/grid/network/admin/listener.ora.node2
# Generated by Oracle configuration tools.
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER_SCAN1 = ON
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER1 = ON
LISTENER =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER))
)
)
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER = ON
LISTENER_SCAN1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = LISTENER_SCAN1))
)
)
ENABLE_GLOBAL_DYNAMIC_ENDPOINT_LISTENER2=ON # line added by Agent
[grid@node2 admin]$ cat endpoints_listener.ora
LISTENER2_NODE2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node2-vip)(PORT=1522))
(ADDRESS=(PROTOCOL=TCP)(HOST=10.180.69.103)(PORT=1522)(IP=FIRST)))) # line added by Agent
LISTENER_NODE2=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=node2-vip)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=10.180.69.103)(PORT=1521)(IP=FIRST)))) # line added by Agent
查看scan ip的配置和状态:
[grid@node2 admin]$ srvctl config scan
SCAN name: scanip, Network: 1/10.180.69.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /scanip/10.180.69.105
[grid@node2 admin]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node node2
[grid@node2 admin]$ srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node node2
验证实例注册:
grid@node2 admin]$ lsnrctl status listener_scan1
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 09-JUN-2012 15:04:39
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias LISTENER_SCAN1
Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date 07-JUN-2012 19:30:34
Uptime 1 days 19 hr. 34 min. 5 sec
Trace Level support
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File /u01/app/11.2.0.3/grid/log/diag/tnslsnr/node2/listener_scan1/alert/log.xml
Listener Trace File /u01/app/11.2.0.3/grid/log/diag/tnslsnr/node2/listener_scan1/trace/ora_6446_3050497728.trc
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.180.69.105)(PORT=1521)))
Services Summary...
Service "leo" has 2 instance(s).
Instance "leo1", status READY, has 2 handler(s) for this service...
Instance "leo2", status READY, has 2 handler(s) for this service...
Service "leoXDB" has 1 instance(s).
Instance "leo1", status READY, has 1 handler(s) for this service...
Instance "leo2", status READY, has 1 handler(s) for this service...
The command completed successfully
监听listener_scan1运行在node2,两个实例是被注册的,群集内的每个节点实例通过remote listener来注册自身。
SQL> show parameter list
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
local_listener string (DESCRIPTION=(ADDRESS_LIST=(AD
DRESS=(PROTOCOL=TCP)(HOST=10.1
80.69.104)(PORT=1521))(ADDRESS
=(PROTOCOL=TCP)(HOST=10.180.69
.104)(PORT=1522))))
remote_listener string scanip:1521
这里因为我添加另外一个本地监听,所以local_listener有两个。11gR2中默认配置客户端通过如下:
LEO =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = scanip)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = leo)
)
)
现在我们来看看11gR2的TAF,之前提到TAF可以在客户端的tnsnames.ora里配置,也可以在server端通过添加service来配置。
那么客户端的tnsnames.ora来做如下配置:
LEO_TAF =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = scanip)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = leo)
(FAILOVER_MODE=
(TYPE=session)
(METHOD=basic))))
可以通过如下查询来验证客户端TAF:
SELECT MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER, COUNT(*)
FROM V$SESSION
GROUP BY MACHINE, FAILOVER_TYPE, FAILOVER_METHOD, FAILED_OVER;
MACHINE FAILOVER_TYPE FAILOVER_M FAI COUNT(*)
-------------------- ------------- ---------- --- ----------
Thinkpad-leo SESSION BASIC NO 5
如果是Server端的,来看看配置的过程:
1.create service(Use Oracel Account)
./srvctl add service -d leo -s taf -q TRUE -r leo2 -P BASIC -e SESSION -z 4 -w 5
2.start service
srvctl start service -d leo -s taf
3.check service
srvctl config service -d leo
Service name: taf
Service is enabled
Server pool: leo_taf
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: true
Failover type: SESSION
Failover method: BASIC
TAF failover retries: 4
TAF failover delay: 5
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Edition:
Preferred instances: leo2
Available instances: leo1
srvctl status service -d leo
Service taf is running on instance(s) leo2
4.查看信息
col failover_method format a11 heading 'METHOD'
col failover_type format a10 heading 'TYPE'
col failover_retries format 9999999 heading 'RETRIES'
col goal format a10
col clb_goal format a8
col AQ_HA_NOTIFICATIONS format a5 heading 'AQNOT'
select name, failover_method, failover_type, failover_retries,goal, clb_goal,aq_ha_notifications
from dba_services where service_id = 3;
NAME METHOD TYPE RETRIES GOAL CLB_GOAL AQNOT
--------------- ----------- ---------- -------- ---------- -------- -----
taf BASIC SESSION 4 LONG YES
5.查看监听
LSNRCTL> services
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
Services Summary...
Service "leo" has 1 instance(s).
Instance "leo2", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.180.69.104)(PORT=1522)))
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.180.69.104)(PORT=1521)))
Service "leoXDB" has 1 instance(s).
Instance "leo2", status READY, has 1 handler(s) for this service...
Handler(s):
"D000" established:0 refused:0 current:0 max:1022 state:ready
DISPATCHER <machine: node2, pid: 10911>
(ADDRESS=(PROTOCOL=tcp)(HOST=node2)(PORT=18689))
Service "taf" has 1 instance(s).
Instance "leo2", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.180.69.104)(PORT=1522)))
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.180.69.104)(PORT=1521)))
The command completed successfully
Load Balance:1.Client Side Load Balance
当客户端通过SCAN连接11gR2 RAC,load balance发生在连接阶段(可以通过sqlnet的trace来确认),Scan name被解析成最多三个不同的IP,
客户端随机的选择一个IP去连接。
对于非11gR2 client,是不能使用SCAN的好处,因为它不能处理DNS解析的SCAN ip,所以,在address_list里,它将
尝试用第一个address去连接,不会使用其他的地址,那么如果这个ip地址不能被scan listener监听或者ip本事存在问题,连接就会失败。
使用静态的SCAN IP(未使用GNS)的tnsnames.ora的条目如下:
LEO_LB =
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE=ON)
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = <scan-VIP1>)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = <scan-VIP2>)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = <scan-VIP3>)(PORT = 1521))
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = leo)
(FAILOVER_MODE=
(TYPE=session)
(METHOD=basic))))
10g对方式:
LEO =
(DESCRIPTION =
(LOAD_BALANCE = yes)
(ADDRESS = (PROTOCOL = TCP)(HOST = node1)
(PORT = 1522))
(ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)
(PORT = 1522))
(CONNECT_DATA =
(SERVICE_NAME = LEO)))
--JDBC--
url="jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=on)
(ADDRESS=(PROTOCOL=TCP)(HOST=node1-vip)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=node2-vip)(PORT=1521))
(CONNECT_DATA=(SERVICE_NAME=LEO)))"
关于rac连接管理的详细文档,崔华写过一篇很详细,可以参阅:
http://www.oracle.com/technetwork/cn/articles/database-performance/oracle-rac-connection-mgmt-1650424-zhs.html
http://www.oracle.com/technetwork/database/clustering/overview/scan-129069.pdf