Oracle RAC Service Resources
http://blog.163.com/donfang_jianping/blog/static/1364739512013112214836523/
http://blog.chinaunix.net/uid-23177306-id-2531222.html
http://www.oracle.com/technetwork/cn/articles/database-performance/oracle-rac-connection-mgmt-1650424-zhs.html
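Service resources are an important part of RAC's high-availability features. Their defining characteristic is that managing a Service lets you control which instances users connect to.
Before introducing Services, let's go over several names used in the database:
Service Name: the name given to clients for connecting to the database. By default the database creates a service with the same name as the global database name. It is maintained through the service_names parameter, for example service_names='orcl1,orcl2'.
DB Name: the identifier of the database as a whole, set by the db_name parameter. Changing it after creation is done with the DBNEWID utility (nid) or by recreating the control file, not with a single ALTER DATABASE statement.
Instance Name: the identifier of a database instance, by default the same as the SID. It is maintained through the instance_name parameter.
Global Name: the full identifier of the database, db_name.db_domain by default, which is also the default service name.
DB Domain: empty by default; maintained through the db_domain parameter.
SID (System Identifier): the unique system identifier, by default the same as instance_name.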
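To make the default relationship between these names concrete, here is a minimal Python sketch. The helper names default_global_name and parse_service_names are hypothetical and exist only for illustration; the domain example.com is likewise made up. It only mirrors the defaults described above (db_name.db_domain, and a comma-separated service_names value) and does not query a database.

```python
from typing import List


def default_global_name(db_name: str, db_domain: str = "") -> str:
    """Return db_name.db_domain, or just db_name when db_domain is empty."""
    return f"{db_name}.{db_domain}" if db_domain else db_name


def parse_service_names(value: str) -> List[str]:
    """Split a comma-separated service_names value such as 'orcl1,orcl2'."""
    return [name.strip() for name in value.split(",") if name.strip()]


print(default_global_name("orarac"))                 # orarac
print(default_global_name("orarac", "example.com"))  # orarac.example.com
print(parse_service_names("orcl1,orcl2"))            # ['orcl1', 'orcl2']
```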
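Service high availability is provided by Clusterware. A Service depends on the VIP and on the instance. When a VIP fails over (that is, when a node fails, its VIP moves to another node and keeps running there), Clusterware relocates the Service resource to an active available instance. For example, suppose service orarac1 originally runs on node djp01. When djp01 fails, its VIP resource moves to node djp02 and keeps running there; at the same time Clusterware relocates the orarac1 service to the standby instance orarac2, and orarac2 dynamically registers with the orarac1 service, so orarac1 can accept connections again. We can call this the Failover feature: switching over after a node failure. Failover switches sessions only; transactions are not carried over. In other words, when a node fails, any DML in transactions that had not been committed on it is rolled back.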
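The failover behaviour described above can be pictured with a small toy model in Python. This is only a sketch of the idea, not how Clusterware is implemented: the Service and Session classes, their fields, and the instance name orarac1 for the failed instance are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Service:
    name: str
    preferred: List[str]
    available: List[str]
    running_on: Set[str] = field(default_factory=set)

    def start(self) -> None:
        # A service starts on its preferred instances.
        self.running_on = set(self.preferred)

    def instance_failed(self, instance: str) -> None:
        """Move the service off a failed instance onto an available one."""
        if instance not in self.running_on:
            return
        self.running_on.discard(instance)
        for candidate in self.available:
            if candidate not in self.running_on:
                self.running_on.add(candidate)
                break


@dataclass
class Session:
    instance: str
    uncommitted: List[str] = field(default_factory=list)
    failed_over: bool = False

    def fail_over(self, new_instance: str) -> None:
        # Only the session moves; uncommitted work is rolled back.
        self.uncommitted.clear()
        self.instance = new_instance
        self.failed_over = True


svc = Service("orarac1", preferred=["orarac1"], available=["orarac2"])
svc.start()
sess = Session(instance="orarac1", uncommitted=["UPDATE t SET x = 1"])

svc.instance_failed("orarac1")
sess.fail_over(next(iter(svc.running_on)))
print(sorted(svc.running_on), sess.instance, sess.uncommitted, sess.failed_over)
# ['orarac2'] orarac2 [] True
```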
Now let's look at how to maintain Service resources.
First, check whether any service is running:
[oracle@djp01 ~]$ srvctl status service -d orarac
[oracle@djp01 ~]$
The database orarac has no Service resource yet, so let's add one.
[oracle@djp01 ~]$ srvctl add service -d orarac -s orarac -r orarac1 -a orarac2 -P basic -y automatic -e select -z 5 -w 180
PRCD-1026 : Failed to create service djpora1 for database orarac
PRCR-1006 : Failed to add resource ora.orarac.djpora1.svc for djpora1
PRCR-1071 : Failed to register or update resource ora.orarac.djpora1.svc
CRS-2566: User 'oracle' does not have sufficient permissions to operate on resource 'ora.LISTENER.lsnr', which is part of the dependency specification.
[oracle@djp01 ~]$
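We hit an error here: the oracle user does not have permission to operate on ora.LISTENER.lsnr. I tried making the change as the grid and root users, but got the error "PRKH-1014 : Current user root is not the same as oracle owner oracle of oracle home /u01/app/oracle/product/11.2.0/dbhome_1." I had not changed any permissions; they are the defaults assigned when the root.sh script ran. One workaround: if a listener exists, remove it first, then create the service, and then recreate the listener, as follows: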
[root@djp01 bin]# ./srvctl stop listener
[root@djp01 bin]# ./srvctl remove listener
[root@djp01 ~]# su - oracle
[oracle@djp01 ~]$ srvctl add service -d orarac -s orarac -r orarac1,orarac2 -P basic -y automatic -e select -z 5 -w 180
[oracle@djp01 ~]$
Add the listener back, then start the service and the listener:
[oracle@djp01 ~]$ su - root
Password:
[root@djp01 ~]# cd /u01/app/11.2.0/grid/bin/
[root@djp01 bin]# ./srvctl add listener
[root@djp01 bin]# ./srvctl start service -d orarac -s orarac
[root@djp01 bin]# ./srvctl start listener
[root@djp01 bin]#
Check the service:
[root@djp01 bin]# ./srvctl add listener
[root@djp01 bin]# ./srvctl start listener
[root@djp01 bin]# ./srvctl status service -d orarac -s orarac
Service orarac is running on instance(s) orarac1,orarac2
[root@djp01 bin]# ./srvctl config service -d orarac -s orarac
Service name: orarac
Service is enabled
Server pool: orarac_orarac
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SELECT
Failover method: NONE
TAF failover retries: 5
TAF failover delay: 180
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Preferred instances: orarac1,orarac2
Available instances:
[root@djp01 bin]#
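When this output has to be checked from a script, the "Key: value" lines are easy to turn into a dictionary. The following Python sketch assumes only the line layout shown in the transcript above; parse_srvctl_config is a hypothetical helper name, and the sample text is an excerpt of that transcript.

```python
def parse_srvctl_config(text: str) -> dict:
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


sample = """\
Service name: orarac
Service is enabled
Cardinality: 2
Failover type: SELECT
TAF failover retries: 5
TAF failover delay: 180
TAF policy specification: BASIC
Preferred instances: orarac1,orarac2
Available instances:
"""

cfg = parse_srvctl_config(sample)
print(cfg["Failover type"])                   # SELECT
print(cfg["Preferred instances"].split(","))  # ['orarac1', 'orarac2']
print(repr(cfg["Available instances"]))       # ''
```

Lines such as "Service is enabled" carry no colon and are ignored by this sketch; a fuller parser would need to handle them separately.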
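The options used in the srvctl add service command above are as follows:
-d <db_unique_name>: the database name.
-s <service_name>: the database service name.
-r <preferred_list>: the list of preferred instances.
-a <available_list>: the list of available (standby) instances.
-P {BASIC|NONE|PRECONNECT}: the TAF policy, shown as "TAF policy specification" in the srvctl config service output.
BASIC: the backup connection is created only when a failover occurs.
PRECONNECT: a backup connection is created in advance, which makes failover faster.
-e {NONE|SELECT|SESSION}: how a failure is handled; corresponds to the failover_type attribute.
SELECT: if the user's connection is lost, the new session resumes the select that was in progress when the failure happened.
SESSION: if the user's connection is lost, a new session is created automatically, but an in-progress select is not resumed.
-y {AUTOMATIC|MANUAL}: whether the service starts automatically.
-z <failover_retries>: the number of retries after a failure.
-w <failover_delay>: how long to wait before retrying, in seconds.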
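As a quick illustration of how these options fit together, the Python sketch below assembles the argument list for srvctl add service from named parameters. It uses only the flags listed above; the function name build_add_service and its defaults are assumptions for illustration, and the sketch only builds the command line without running it.

```python
import shlex


def build_add_service(db, service, preferred, available=(),
                      taf_policy="basic", management="automatic",
                      failover_type="select", retries=5, delay=180):
    """Build the argv for 'srvctl add service' from the options described above."""
    args = ["srvctl", "add", "service", "-d", db, "-s", service,
            "-r", ",".join(preferred)]
    if available:
        # -a is only emitted when standby instances are given.
        args += ["-a", ",".join(available)]
    args += ["-P", taf_policy, "-y", management, "-e", failover_type,
             "-z", str(retries), "-w", str(delay)]
    return args


cmd = build_add_service("orarac", "orarac", ["orarac1", "orarac2"])
print(shlex.join(cmd))
# srvctl add service -d orarac -s orarac -r orarac1,orarac2 -P basic -y automatic -e select -z 5 -w 180
```

Keeping the command as a list rather than a single string makes it straightforward to pass to a process launcher without shell quoting problems.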
Let's test with the following client configuration:
ORARAC =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 172.168.88.90)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 172.168.88.91)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = orarac)
)
)
Create a session:
C:\Users\Administrator>sqlplus djp01/djp01@orarac
SQL*Plus: Release 12.1.0.1.0 Production on Sun Dec 22 17:18:52 2013
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
djp01@ORARAC> select inst_id,failover_method,failover_type,failed_over
2 from gv$session
3 where username = upper('djp01')
4 /
INST_ID FAILOVER_METHOD FAILOVER_TYPE FAILED
---------- -------------------- -------------------------- ------
1 BASIC SELECT NO
1 NONE NONE NO
2 NONE NONE NO
djp01@ORARAC>
Reboot the server that hosts instance 1:
[root@djp01 ~]# reboot
Then run the following SQL:
djp01@ORARAC> select count(*) from user_objects
2 /
COUNT(*)
----------
1
djp01@ORARAC>
The SQL takes a few seconds to run and then outputs the result. Now let's look at the sessions again:
djp01@ORARAC> select inst_id,failover_method,failover_type,failed_over
2 from gv$session
3 where username = upper('djp01')
4 /
INST_ID FAILOVER_METHOD FAILOVER_TYPE FAILED
---------- -------------------- -------------------------- ------
2 BASIC SELECT YES
djp01@ORARAC>
The failover succeeded.
Next, let's look at another approach, the dbms_service package, which is simple to use:
(1) Create the Service and start it:
SQL> begin
2 dbms_service.create_service(
3 service_name => 'djpora1',
4 network_name => 'djpora1',
5 failover_method => dbms_service.failover_method_basic,
6 failover_type => dbms_service.failover_type_select,
7 failover_retries => 180,
8 failover_delay => 5);
9 end;
10 /
PL/SQL procedure successfully completed.
SQL> exec dbms_service.start_service('djpora1')
PL/SQL procedure successfully completed.
SQL>
(2) Check the service:
[oracle@djp01 ~]$ su - grid
Password:
[grid@djp01 ~]$ lsnrctl status listener_scan1
LSNRCTL for Linux: Version 11.2.0.1.0 - Production on 22-DEC-2013 23:29:35
Copyright (c) 1991, 2009, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias LISTENER_SCAN1
Version TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date 22-DEC-2013 21:22:31
Uptime 0 days 2 hr. 7 min. 3 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/11.2.0/grid/log/diag/tnslsnr/djp01/listener_scan1/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=172.168.88.149)(PORT=1521)))
Services Summary...
Service "djpora1" has 1 instance(s).
Instance "orarac1", status READY, has 1 handler(s) for this service...
Service "orarac" has 2 instance(s).
Instance "orarac1", status READY, has 1 handler(s) for this service...
Instance "orarac2", status READY, has 1 handler(s) for this service...
Service "oraracXDB" has 2 instance(s).
Instance "orarac1", status READY, has 1 handler(s) for this service...
Instance "orarac2", status READY, has 1 handler(s) for this service...
The command completed successfully
[grid@djp01 ~]$
The service we created is running. We can also check it through the gv$services view.
We can also delete the Service we created, as follows:
SQL> begin
2 dbms_service.stop_service('djpora1');
3 dbms_service.delete_service('djpora1');
4 end;
5 /
PL/SQL procedure successfully completed.
SQL>
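Stop the service first, then delete it.
What we have done so far can be called server-side failover. Failover can also be configured on the client side, through the TAF (Transparent Application Failover) mechanism.
Let's first look at a typical client net configuration: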
ORARAC =
(DESCRIPTION =
(LOAD_BALANCE=ON)
(FAILOVER=ON)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 172.168.88.90)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 172.168.88.91)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = orarac)
(FAILOVER_MODE=
(TYPE=select)
(METHOD=basic)
(RETRIES=60)
(DELAY=3)
)
)
)
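All we need to add is the FAILOVER_MODE section; the meaning of its parameters was covered above. One more parameter is BACKUP, which names the net service used for the backup connection. It should be set when METHOD is PRECONNECT. Below is a typical configuration that uses preconnect: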
ORARAC1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 172.168.88.90)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = orarac)
(INSTANCE_ROLE=primary)
(FAILOVER_MODE=
(BACKUP=ORARAC2)
(TYPE=select)
(METHOD=preconnect)
)
)
)
ORARAC2 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 172.168.88.91)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = orarac)
(INSTANCE_ROLE=secondary)
)
)
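Entries like the BASIC-method one shown earlier follow a fixed pattern, so they can be generated rather than typed by hand. The Python sketch below renders such a tnsnames.ora entry from a list of hosts; taf_entry is a hypothetical helper, the indentation is a stylistic choice, and the defaults simply mirror the values used in this article.

```python
def taf_entry(alias, hosts, service, port=1521, taf_type="select",
              method="basic", retries=60, delay=3):
    """Render a tnsnames.ora entry with a FAILOVER_MODE section."""
    lines = [
        f"{alias} =",
        "  (DESCRIPTION =",
        "    (LOAD_BALANCE=ON)",
        "    (FAILOVER=ON)",
        "    (ADDRESS_LIST =",
    ]
    # One ADDRESS line per host, all on the same port.
    for host in hosts:
        lines.append(
            f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))")
    lines += [
        "    )",
        "    (CONNECT_DATA =",
        f"      (SERVICE_NAME = {service})",
        "      (FAILOVER_MODE=",
        f"        (TYPE={taf_type})",
        f"        (METHOD={method})",
        f"        (RETRIES={retries})",
        f"        (DELAY={delay})",
        "      )",
        "    )",
        "  )",
    ]
    return "\n".join(lines)


entry = taf_entry("ORARAC", ["172.168.88.90", "172.168.88.91"], "orarac")
print(entry)
```

A simple sanity check on the result is that opening and closing parentheses balance, since an unbalanced entry is a common cause of connect-descriptor parse errors.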
Next, on the client, we connect to the database using the "typical client net configuration" above:
C:\Users\Administrator>sqlplus djp01/djp01@orarac
SQL*Plus: Release 12.1.0.1.0 Production on Sun Dec 22 16:26:34 2013
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
djp01@ORARAC> select inst_id,failover_method,failover_type,failed_over
2 from gv$session
3 where username = upper('djp01')
4 /
INST_ID FAILOVER_METHOD FAILOVER_TYPE FAILED
---------- -------------------- -------------------------- ------
1 NONE NONE NO
1 BASIC SELECT NO
2 NONE NONE NO
djp01@ORARAC> create table t as select * from all_objects where rownum <= 100
2 /
Table created.
djp01@ORARAC>
Reboot the server that hosts instance 1:
[root@djp01 ~]# reboot
Then run the following SQL:
djp01@ORARAC> select count(*) from t
2 /
COUNT(*)
----------
100
djp01@ORARAC>
The SQL waits a few seconds and then outputs the result. Now let's look at the sessions again:
djp01@ORARAC> select inst_id,failover_method,failover_type,failed_over
2 from gv$session
3 where username = upper('djp01')
4 /
INST_ID FAILOVER_METHOD FAILOVER_TYPE FAILED
---------- -------------------- -------------------------- ------
2 BASIC SELECT YES
djp01@ORARAC>
The failover succeeded.
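The RETRIES and DELAY settings in FAILOVER_MODE describe a bounded reconnect loop: try again up to a fixed number of times, waiting a fixed interval between attempts. The real logic lives inside the Oracle client library; the Python sketch below only imitates that pattern. The function reconnect_with_retries and the fake connect callable are hypothetical and not part of any Oracle API.

```python
import time


def reconnect_with_retries(connect, retries=60, delay=3, sleep=time.sleep):
    """Call connect() up to `retries` times, sleeping `delay` seconds between tries.

    Returns (connection, attempt_number) on success; raises ConnectionError
    once every attempt has failed.
    """
    for attempt in range(1, retries + 1):
        try:
            return connect(), attempt
        except ConnectionError:
            if attempt < retries:
                sleep(delay)
    raise ConnectionError(f"gave up after {retries} attempts")


# Simulated target that refuses the first two attempts.
state = {"calls": 0}


def fake_connect():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("instance not ready")
    return "session"


waits = []
print(reconnect_with_retries(fake_connect, retries=60, delay=3, sleep=waits.append))
# ('session', 3)
print(waits)
# [3, 3]
```

Under this model, RETRIES=60 with DELAY=3 bounds the total sleep time at a little under 180 seconds, not counting the time each connection attempt itself takes.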