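Preface: An Introduction to CRS and Its Origins
Starting with Oracle 10gR1 RAC, Oracle shipped its own cluster software, named Oracle Cluster Ready Services, or CRS for short. From Oracle 10gR2 onward, including the latest 11g, Oracle renamed it Clusterware, but in common usage CRS = Clusterware = Oracle Cluster Ready Services = Oracle Cluster Software.
CRS is generally used to build Oracle's parallel database, RAC. Beyond its interface to RAC, however, CRS also provides a set of high-availability application programming interfaces (APIs) that can be used to build high-availability clusters for ordinary applications, what is commonly called two-node hot standby; one example is using CRS to provide hot standby for MySQL.
This active/standby style of hot standby can cover many third-party applications as well: a virtual IP, disk groups, file systems, a MySQL database, Apache, a single-node Oracle instance, a single-node ASM, and so on can all be registered with CRS as resources. CRS then starts, stops, and monitors them, and dependencies between applications can be declared so that multiple groups of resources start in the correct order.
This article uses the protection of a single-node Oracle instance as an example to demonstrate how to do this with CRS. The main software used is Solaris 10u4, Oracle CRS 10.2.0.2, Oracle RDBMS 10.2.0.3, and VxVM 5.0; the disk array is an AMS1000.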
The system topology is roughly as follows:
[Figure: system topology diagram]
The main steps are as follows:
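I. Preparation: software installation and database creation
1. Install Solaris 10 and Veritas Volume Manager (installation steps omitted)
2. Create the dba/oinstall groups and the oracle user, and edit the corresponding profile and the /etc/hosts file
Edit the rhosts file to configure user equivalence for the oracle user;
Connect the heartbeat network cables;
In production, gigabit Ethernet is recommended for the heartbeat network, with two NICs per machine connected to two separate network switches;
Bring up the heartbeat NIC in the operating system;
Prepare the shared disk; here it is ams_wms0_0098, a 20 GB shared disk in a SAN environment;
3. Next, install the Oracle cluster software and database software in silent mode
The advantages of this approach are that it is fast and needs no graphical interface, which suits remote installation and database creation;
The disadvantages are that it is less intuitive and the response files must be written by hand;
The CRS installation in this step needs to be run on one node only: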
oracle@rac01 $ ./clusterware/runInstaller -silent -responsefile /tmp/shahand/crs.rsp
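See Section V-7 for the contents of crs.rsp.
After the CRS runInstaller finishes, run root.sh manually on both nodes,
then run $CRS_HOME/cfgtoollogs/configToolAllCommands manually.
Verify that CRS is installed and configured correctly: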
root@rac01 # crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....c01.gsd application    ONLINE    ONLINE    rac01
ora....c01.ons application    ONLINE    ONLINE    rac01
ora....c01.vip application    ONLINE    ONLINE    rac01
ora....c02.gsd application    ONLINE    ONLINE    rac02
ora....c02.ons application    ONLINE    ONLINE    rac02
ora....c02.vip application    ONLINE    ONLINE    rac02
Verify that the heartbeat network is set correctly:
root@rac01 # $CRS_HOME/bin/oifcfg getif
e1000g0 10.198.88.0 global public
e1000g1 192.168.2.0 global cluster_interconnect
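The check above can also be scripted. The following is a minimal sketch that assumes the four-column `oifcfg getif` output format shown here (interface, subnet, scope, role); the helper name `has_interconnect` is hypothetical and the sample input is copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the oifcfg getif output on stdin lists
# at least one interface whose last column is cluster_interconnect.
has_interconnect() {
    awk '$NF == "cluster_interconnect" { found = 1 } END { exit !found }'
}

# Sample input copied from the output above. On a real node, use instead:
#   $CRS_HOME/bin/oifcfg getif | has_interconnect
sample='e1000g0 10.198.88.0 global public
e1000g1 192.168.2.0 global cluster_interconnect'

if printf '%s\n' "$sample" | has_interconnect; then
    echo "heartbeat network configured"
else
    echo "no cluster_interconnect interface found"
fi
```

The function only inspects the role column, so it does not confirm that the subnet is the intended private one.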
Install the Oracle database software in silent mode; this step must be done on both nodes:
./runInstaller -silent -responsefile /tmp/shahand/db.rsp
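See Section V-8 for the contents of db.rsp.
4. Create the disk group, logical volume, and file system needed for the Oracle database files, then mount the file system and set permissions;
This needs to be done on one node only;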
root@rac01 # vxdisksetup -i ams_wms0_0098
root@rac01 # vxdg init oradata12 ams_wms0_0098
root@rac01 # vxassist -g oradata12 make oradata 18G
root@rac01 # vxedit -g oradata12 set user=oracle group=dba mode=644 oradata
root@rac01 # timex mkfs -F vxfs /dev/vx/rdsk/oradata12/oradata
version 7 layout
37748736 sectors, 18874368 blocks of size 1024, log size 65536 blocks
largefiles supported
real 10.30
user 0.07
sys 0.04
root@rac01 # mount -F vxfs -o largefiles /dev/vx/dsk/oradata12/oradata /oradata
root@rac01 # chown oracle:dba /oradata
root@rac01 # df -h /oradata/
Filesystem             size   used  avail capacity  Mounted on
/dev/vx/dsk/oradata12/oradata
                        18G    70M    17G     1%    /oradata
5. Create the Oracle database in silent mode; this needs to be done on one node only:
oracle@rac01 $ dbca -silent -createDatabase -sid orcl -sysPassword sys -systemPassword sys \
-datafileDestination /oradata -gdbName orcl -templateName General_Purpose.dbc
Copying database files
1% complete
3% complete
11% complete
18% complete
26% complete
37% complete
Creating and starting Oracle instance
40% complete
45% complete
50% complete
55% complete
56% complete
60% complete
62% complete
Completing Database Creation
66% complete
70% complete
73% complete
85% complete
96% complete
100% complete
Manually check the status of database orcl, for example by logging in to the database and running select status from v$instance.
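As a rough sketch, the status check can be run non-interactively. The helper name `instance_is_open` is hypothetical, and the commented invocation assumes the oracle user's environment (ORACLE_HOME, ORACLE_SID=orcl) is already set; the dry run feeds a stand-in for the sqlplus output so the snippet works without a database.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin contains the word OPEN,
# which is the v$instance status of a normally opened database.
instance_is_open() {
    grep -w 'OPEN' >/dev/null
}

# Assumed invocation on the database node, as the oracle user:
#   printf 'set heading off feedback off\nselect status from v$instance;\n' \
#     | sqlplus -s "/ as sysdba" | instance_is_open && echo "orcl is open"

# Dry run with a stand-in for the sqlplus output:
if printf '\nOPEN\n' | instance_is_open; then
    echo "orcl is open"
else
    echo "orcl is not open"
fi
```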
II. Manually registering Oracle cluster software resources
1. Unregister the ons, gsd, and vip resources that ship with CRS
root@rac01 # crs_stop -all
Attempting to stop `ora.rac01.gsd` on member `rac01`
Attempting to stop `ora.rac01.ons` on member `rac01`
Attempting to stop `ora.rac02.gsd` on member `rac02`
Attempting to stop `ora.rac02.ons` on member `rac02`
Stop of `ora.rac02.gsd` on member `rac02` succeeded.
Stop of `ora.rac02.ons` on member `rac02` succeeded.
Stop of `ora.rac01.gsd` on member `rac01` succeeded.
Stop of `ora.rac01.ons` on member `rac01` succeeded.
Attempting to stop `ora.rac01.vip` on member `rac01`
Attempting to stop `ora.rac02.vip` on member `rac02`
Stop of `ora.rac02.vip` on member `rac02` succeeded.
Stop of `ora.rac01.vip` on member `rac01` succeeded.
root@rac01 # crs_unregister ora.rac01.gsd
root@rac01 # crs_unregister ora.rac01.ons
root@rac01 # crs_unregister ora.rac01.vip
root@rac01 # crs_unregister ora.rac02.vip
root@rac01 # crs_unregister ora.rac02.ons
root@rac01 # crs_unregister ora.rac02.gsd
root@rac01 # crs_stat -t
CRS-0202: No resources are registered.
2. Create the virtual IP resource:
root@rac01 # crs_profile -create havip -t application -a /oracle/crs/bin/usrvip \
-o oi=e1000g0,ov=10.198.94.139,on=255.255.248.0
root@rac01 # crs_register havip
root@rac01 # crs_setperm havip -o root
root@rac01 # crs_setperm havip -u user:oracle:r-x
root@rac01 # crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
havip          application    0/1    0/0    OFFLINE   OFFLINE
root@rac01 # crs_start havip
root@rac01 # crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
havip          application    0/1    0/0    ONLINE    ONLINE    rac01
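3. Prepare the script files dg.sh/fs.sh/db.sh/lsnr.sh that control start, stop, and check for the other resources
See Sections V-3/4/5/6 for the contents of these four scripts.
A brief explanation of the options and parameters of the crs_profile command:
(1) The -r option names the resources this resource requires. In the examples below, oradata_mount requires
disk_group to start first, and disk_group can be stopped only after oradata_mount has been stopped;
orcl_db in turn depends, directly or transitively, on oradata_mount/disk_group/havip/listener;
(2) The -o option accepts, among others: ci, the interval at which CRS checks the resource state (CHECK_INTERVAL), in seconds;
ra: the number of times CRS tries to restart the resource (RESTART_ATTEMPTS); once that count is reached, the resource is relocated;
fi: the window, in seconds, over which CRS counts failures of the resource (FAILURE_INTERVAL);
ft: the number of failures within that window after which CRS marks the resource unavailable and stops monitoring it (FAILURE_THRESHOLD);
These parameters can be left at their defaults, which are 60 seconds/1/0 seconds/0 respectively.
(3) The -a option is ACTION_SCRIPT: its value is the script that starts, stops, and checks the resource. The script always takes one of three arguments:
start/stop/check;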
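To make the start/stop/check contract concrete, here is a minimal skeleton of an action script. It is a sketch, not the author's dg.sh/fs.sh/db.sh/lsnr.sh: the marker file, the `MARKER` variable, and the function names are hypothetical stand-ins for a real resource. It relies on the convention that exit status 0 from check means the resource is online and non-zero means it is offline.

```shell
#!/bin/sh
# Hypothetical skeleton of a CRS action script. CRS calls the script with
# exactly one argument: start, stop, or check.
# A marker file stands in for the real resource (disk group, mount, ...).
MARKER="${MARKER:-${TMPDIR:-/tmp}/crs_demo_resource.online}"

do_start() { touch "$MARKER"; }     # bring the resource online
do_stop()  { rm -f "$MARKER"; }     # take the resource offline
do_check() { [ -f "$MARKER" ]; }    # 0 = online, non-zero = offline

action() {
    case "$1" in
        start) do_start ;;
        stop)  do_stop ;;
        check) do_check ;;
        *)     echo "usage: $0 {start|stop|check}" >&2
               return 2 ;;
    esac
}

# Dispatch only when called with an argument, as CRS does.
if [ $# -gt 0 ]; then
    action "$1"
    exit $?
fi
```

A real script would replace the three `do_*` bodies with the commands for its resource, for example importing a disk group or mounting a file system, and keep the same exit-status convention.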
Managing the database listener:
Edit the $ORACLE_HOME/network/admin/listener.ora file,
changing (HOST = rac01) to (HOST = 10.198.94.139), the virtual IP address.
crs_profile -create listener -t application -a /oracle/crs/crs/public/lsnr.sh -r havip -o \
ci=180,ra=6,ft=2,fi=12
crs_register listener
crs_setperm listener -o root
crs_setperm listener -u user:oracle:r-x
crs_start listener
Managing the disk group and logical volume:
crs_profile -create disk_group -t application -a /oracle/crs/crs/public/dg.sh -r havip -o \
ci=180,ra=6,ft=2,fi=12
crs_register disk_group
crs_setperm disk_group -o root
crs_setperm disk_group -u user:oracle:r-x
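Note: starting the disk group does not itself depend on the virtual IP. The dependency is declared here
to prevent the virtual IP from starting on one node while the disk group starts on the other, which would leave the resources inconsistent.
Managing the file system: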
crs_profile -create oradata_mount -t application -a /oracle/crs/crs/public/fs.sh -r disk_group -o \
ci=180,ra=6,ft=2,fi=12
crs_register oradata_mount
crs_setperm oradata_mount -o root
crs_setperm oradata_mount -u user:oracle:r-x
Managing the database instance:
crs_profile -create orcl_db -t application -a /oracle/crs/crs/public/db.sh -r \
"oradata_mount listener" -o ci=180,ra=6,ft=2,fi=12
crs_register orcl_db
crs_setperm orcl_db -o root
crs_setperm orcl_db -u user:oracle:r-x
crs_start orcl_db
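The four registrations above follow the same pattern, so they can be generated from one place. The sketch below is a dry-run helper only: `print_registration` is a hypothetical name, it prints the commands instead of running them, and it uses only the commands, options, and values already shown in this section.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the registration commands for one
# resource. Arguments: resource name, action script path, and a
# space-separated list of required resources (may be empty).
print_registration() {
    name=$1
    script=$2
    deps=$3
    opts="ci=180,ra=6,ft=2,fi=12"
    if [ -n "$deps" ]; then
        echo "crs_profile -create $name -t application -a $script -r \"$deps\" -o $opts"
    else
        echo "crs_profile -create $name -t application -a $script -o $opts"
    fi
    echo "crs_register $name"
    echo "crs_setperm $name -o root"
    echo "crs_setperm $name -u user:oracle:r-x"
}

# The four resources from this section, with their dependencies.
print_registration listener      /oracle/crs/crs/public/lsnr.sh "havip"
print_registration disk_group    /oracle/crs/crs/public/dg.sh   "havip"
print_registration oradata_mount /oracle/crs/crs/public/fs.sh   "disk_group"
print_registration orcl_db       /oracle/crs/crs/public/db.sh   "oradata_mount listener"
```

Reviewing the printed commands before pasting them into a root shell keeps the dependency list for each resource visible in a single table-like block.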
4. Make sure the scripts are executable, and copy the contents of public and profile to the second node.
# chmod +x /oracle/crs/crs/public/*
# rcp -r -p /oracle/crs/crs/public/* rac02:/oracle/crs/crs/public/
5. Start all resources
As shown below, CRS starts and stops the resources in the order given by the dependencies defined earlier:
root@rac01 # crs_stop -all
Attempting to stop `orcl_db` on member `rac01`
Stop of `orcl_db` on member `rac01` succeeded.
Attempting to stop `oradata_mount` on member `rac01`
Stop of `oradata_mount` on member `rac01` succeeded.
Attempting to stop `disk_group` on member `rac01`
Stop of `disk_group` on member `rac01` succeeded.
Attempting to stop `listener` on member `rac01`
Stop of `listener` on member `rac01` succeeded.
Attempting to stop `havip` on member `rac01`
Stop of `havip` on member `rac01` succeeded.
root@rac01 # crs_start -all
Attempting to start `havip` on member `rac01`
Start of `havip` on member `rac01` succeeded.
Attempting to start `listener` on member `rac01`
Start of `listener` on member `rac01` succeeded.
Attempting to start `disk_group` on member `rac01`
Start of `disk_group` on member `rac01` succeeded.
Attempting to start `oradata_mount` on member `rac01`
Start of `oradata_mount` on member `rac01` succeeded.
Attempting to start `orcl_db` on member `rac01`
Start of `orcl_db` on member `rac01` succeeded.
Check that the resource states are normal:
oracle@rac01 $ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
disk_group     application    ONLINE    ONLINE    rac01
havip          application    ONLINE    ONLINE    rac01
listener       application    ONLINE    ONLINE    rac01
oradata_mount  application    ONLINE    ONLINE    rac01
orcl_db        application    ONLINE    ONLINE    rac01
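For repeated checks, the State column can be tested by a script. This is a sketch that assumes the `crs_stat -t` layout shown above (two header lines, then Name, Type, Target, State, Host); the helper name `not_online` is hypothetical, and the sample input is a shortened copy of that output with one resource changed to OFFLINE for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads crs_stat -t output on stdin, prints the name of
# every resource whose State column (4th field) is not ONLINE, and exits
# non-zero if there is at least one.
not_online() {
    awk 'NR > 2 && NF >= 4 && $4 != "ONLINE" { print $1; bad = 1 }
         END { exit bad + 0 }'
}

# Illustrative input; on a real node, use: crs_stat -t | not_online
sample='Name Type Target State Host
------------------------------------------------------------
disk_group application ONLINE ONLINE rac01
havip application ONLINE ONLINE rac01
orcl_db application ONLINE OFFLINE'

if printf '%s\n' "$sample" | not_online; then
    echo "all resources ONLINE"
else
    echo "the resources listed above are not ONLINE"
fi
```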
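III. Managing Oracle cluster software resources
1. To change a resource's attributes, use crs_profile -update; for a concrete example, see error 2 in Section V-1;
2. If a resource is in the UNKNOWN state and needs to be stopped, add the -f option to the crs_stop command;
3. Use crs_profile -print <resource_name> to view a resource's attributes, including its dependencies;
crs_stat -p <resource_name> does the same;
4. CRS logs: these are mainly under $CRS_HOME/log/node_name, but note that the system log also holds
important messages: /var/adm/messages on Solaris, usually /var/log/messages on Linux,
and /var/adm/syslog/syslog.log on HP-UX;
5. The commands to start, stop, and view CRS resources are crs_start, crs_stop, and crs_stat respectively;
each command prints its syntax when run with the -H option;
the srvctl start nodeapps -n rac01 command can also be used;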
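IV. Testing the cluster software
1. Manual switchover:
Run the following commands in order on either node, as oracle or root, provided the $PATH environment variable is set correctly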
crs_stop -all;
crs_start havip -c rac02;
crs_start listener -c rac02;
crs_start disk_group -c rac02;
crs_start oradata_mount -c rac02;
crs_start orcl_db -c rac02;
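Then log in to rac02 (now the primary node) and use df -h /oradata to check that the shared disk is mounted,
use ps -ef|grep ora_ to confirm that Oracle has started, and check that the alert log contains no errors.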
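The manual switchover can be wrapped in a small loop that walks the resources in dependency order. This is a sketch under stated assumptions: `run` and `switch_to` are hypothetical names, the resource list is the one registered in this article, and by default the script only prints the commands; setting RUN=1 would execute them.

```shell
#!/bin/sh
# Resources in start order (dependencies first), as registered in Section II.
RESOURCES="havip listener disk_group oradata_mount orcl_db"

# Hypothetical wrapper: prints the command by default; RUN=1 executes it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Stop everything, then start each resource on the target node in order.
# Stops at the first command that fails.
switch_to() {
    target=$1
    run crs_stop -all || return 1
    for res in $RESOURCES; do
        run crs_start "$res" -c "$target" || return 1
    done
}

switch_to rac02
```

Keeping the order in a single variable means the same list can be reused to switch back to rac01.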
2. Automatic failover:
Simulate a failure of the primary node by hand with the reboot command:
Jan 8 14:53:57 rac01 reboot: [ID 662345 auth.crit] rebooted by root
The log shows that CRS on the standby node rac02 detected the failure of the primary node and took over the services:
2008-01-08 14:30:33.929: [ CRSMAIN][1] Starting Threads
2008-01-08 14:30:33.929: [ CRSMAIN][1] CRS Daemon Started.
2008-01-08 14:52:18.777: [ CRSEVT][71] Processing member leave for rac01, incarnation: 2
2008-01-08 14:52:18.878: [ CRSEVT][71] Do failover for: rac01
2008-01-08 14:52:42.180: [ CRSRES][73] startRunnable: setting CLI values
2008-01-08 14:52:42.193: [ CRSRES][73] Attempting to start `disk_group` on member `rac02`
2008-01-08 14:52:45.722: [ CRSRES][73] Start of `disk_group` on member `rac02` succeeded.
2008-01-08 14:52:45.731: [ CRSRES][73] startRunnable: setting CLI values
2008-01-08 14:52:45.732: [ CRSRES][73] Attempting to start `oradata_mount` on member `rac02`
2008-01-08 14:52:45.986: [ CRSRES][73] Start of `oradata_mount` on member `rac02` succeeded.
2008-01-08 14:52:46.013: [ CRSRES][73] startRunnable: setting CLI values
2008-01-08 14:52:46.015: [ CRSRES][73] Attempting to start `orcl_db` on member `rac02`
2008-01-08 14:53:31.486: [ CRSRES][73] Start of `orcl_db` on member `rac02` succeeded.
2008-01-08 14:53:31.487: [ CRSEVT][71] Post recovery done evmd event for: rac01
2008-01-08 14:53:31.603: [ CRSEVT][75] Processing RecoveryDone
Then log in to rac02 again, check that the file system is mounted, and confirm that the data is in a normal state.
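The failover duration can be read off the log by subtracting two timestamps. The sketch below assumes the log line format shown above (date, then time with a trailing colon) and that both events fall on the same day; `failover_seconds` is a hypothetical helper name, and the two sample lines are copied from the log above.

```shell
#!/bin/sh
# Hypothetical helper: reads crsd log lines on stdin and prints the number of
# seconds between "Processing member leave" and "Post recovery done".
# Exits non-zero if either event is missing.
failover_seconds() {
    awk '
        /Processing member leave/ {
            split($2, a, ":"); t0 = a[1] * 3600 + a[2] * 60 + a[3]; got0 = 1
        }
        /Post recovery done/ {
            split($2, b, ":"); t1 = b[1] * 3600 + b[2] * 60 + b[3]; got1 = 1
        }
        END {
            if (got0 && got1) printf "%.1f\n", t1 - t0
            else exit 1
        }
    '
}

# Two lines copied from the log above.
sample='2008-01-08 14:52:18.777: [ CRSEVT][71] Processing member leave for rac01, incarnation: 2
2008-01-08 14:53:31.487: [ CRSEVT][71] Post recovery done evmd event for: rac01'

printf '%s\n' "$sample" | failover_seconds    # prints 72.7
```

For this test the takeover, from detecting the member leave to recovery done, took a little over a minute, most of it spent starting orcl_db.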
Source: IT168