Configuration notes:
1. Shared iSCSI storage is provided by Openfiler.
2. Fencing is implemented with the virtual fence of VMware ESXi 5.
3. The RHCS fence device uses the vmware-fence-soap agent shipped with Red Hat Enterprise Linux 5.8.
4. This is an original write-up of building an RHCS lab environment to test the RHCS Oracle HA functionality.

Original post: http://koumm.blog.51cto.com/703525/1161791

I. Preparing the base environment

1. Network setup

On both node1 and node2:

# cat /etc/hosts

192.168.14.100  node1
192.168.14.110  node2

2. Configure the YUM repositories

(1) Mount the installation DVD

# mount /dev/cdrom /mnt

(2) Configure the YUM client

Note: the local installation DVD is used as the YUM repository.

# vi /etc/yum.repos.d/rhel-debuginfo-bak.repo

[rhel-debuginfo]
name=Red Hat Enterprise Linux $releasever - $basearch - Debug
baseurl=ftp://ftp.redhat.com/pub/redhat/linux/enterprise/$releasever/en/os/$basearch/Debuginfo/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

[Server]
name=Server
baseurl=file:///mnt/Server
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

[VT]
name=VT
baseurl=file:///mnt/VT
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

[Cluster]
name=Cluster
baseurl=file:///mnt/Cluster
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

[ClusterStorage]
name=ClusterStorage
baseurl=file:///mnt/ClusterStorage
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
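
With the repository file in place, a quick sanity check (repository IDs as defined above) confirms that yum can see the DVD repositories:

# yum clean all
# yum repolist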

(3) Openfiler iSCSI storage configuration
The detailed Openfiler setup is omitted. Two LUNs are exported: a 10 GB LUN for GFS and a 1 GB LUN for the quorum disk.

(4) Attach the storage

rpm -ivh iscsi-initiator-utils-6.2.0.872-13.el5.x86_64.rpm
chkconfig iscsi --level 35 on
chkconfig iscsid --level 35 on
service iscsi start

# iscsiadm -m discovery -t st -p 192.168.14.162
# iscsiadm -m node -T iqn.2006-01.com.openfiler:tsn.b2bd5bb312a7 -p 192.168.14.162 -l
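
After logging in to the target, the two LUNs show up as local SCSI disks (sdb and sdc in this lab). A quick way to confirm, with output varying per environment:

# iscsiadm -m session
# fdisk -l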


II. Installing the RHCS packages

192.168.14.100  node1  (management node)
192.168.14.110  node2

1. Install luci and the RHCS packages on node1

Install ricci, rgmanager, gfs and cman.

(1) Install the RHCS packages on node1 (the management node). luci is the management-side package and is installed only on the management node.

yum install luci ricci cman cman-devel gfs2-utils rgmanager system-config-cluster -y


(2) Enable the RHCS services at boot

chkconfig luci on
chkconfig ricci on
chkconfig rgmanager on
chkconfig cman on
service ricci start
service cman start


2. Install the RHCS packages on node2

(1) Install the RHCS packages on node2

yum install ricci cman cman-devel gfs2-utils rgmanager system-config-cluster -y


(2) Enable the RHCS services at boot

chkconfig ricci on
chkconfig rgmanager on
chkconfig cman on
service ricci start
service cman start

Starting cman fails at this point because the node has not yet joined a cluster, so the configuration file /etc/cluster/cluster.conf has not been generated yet.
 
3. Alternatively, install the cluster components as a group, or select the clustering option during OS installation:
yum groupinstall Clustering

III. Configuring the RHCS management side

1. Initialize and start the luci service on the management node node1

Note: perform these steps on the management node.

(1) Initialize luci
# luci_admin init    
Initializing the luci server

Creating the 'admin' user

Enter password:
Confirm password:

Please wait...
The admin password has been successfully set.
Generating SSL certificates...
The luci server has been successfully initialized

(2) Start the luci service

# service luci start

(3) Management URL and credentials

https://192.168.14.100:8084
admin/111111


IV. Configuring the RHCS cluster

1. Create the cluster

Log in to the management interface and click cluster -> Create a New Cluster, then fill in the following:

Cluster Name: rhcs

node1 192.168.14.100
node2 192.168.14.110

Check the options below and submit. The cluster goes through the install, reboot, config and join stages before creation succeeds.
Use locally installed packages.
Enable shared storage support
Check if node passwords are identical

Notes:
(1) This step generates the cluster configuration file /etc/cluster/cluster.conf.
(2) The file can also be created by hand.


2. Start the cluster service on both nodes

SSH to node1 and node2 and start the cman service on each.

# service cman start 
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing... done
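
Once cman is running on both nodes, cluster membership can be checked from either node; the exact output depends on the current cluster state:

# cman_tool status
# cman_tool nodes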

3. Add a fence device

Note:
Full RHCS cluster functionality requires working fencing. Since this lab is not built on physical servers, the virtual fencing provided by VMware ESXi 5.x is used to implement the fence device.
It is precisely because a usable fence device is available that the RHCS HA functionality can be tested completely.

(1) Log in to the management interface and click cluster -> Cluster List.
(2) For node1 and node2 in turn:
(3) Choose "Add a fence device to this level" and select vmware fence soap.
(4) Fill in the fence device fields:

name     : fence device name; the virtual machine name works, or pick any other name
hostname : address of the vCenter or ESXi host
Login    : user name on the vCenter or ESXi host
Password : password on the vCenter or ESXi host
Use SSL  : check this box
Power Wait: optional
Virtual machine name: RHCS_node1
Virtual machine UUID: /vmfs/volumes/datastore3/RHCS_node1/RHCS_node1.vmx

# Fill in the virtual machine name and UUID according to your environment; enable SSH on the ESXi host and log in to look them up.

# Example of testing the fence agent manually:

# /sbin/fence_vmware_soap -a 192.168.14.70 -z -l root -p 1111111 -n RHCS_node1 -o status
Status: ON

# /sbin/fence_vmware_soap -a 192.168.14.70 -z -l root -p 1111111 -n RHCS_node1 -o list
RHCS_node2,564d5908-e8f6-99f6-18a8-a523c04111b2
RHCS_node1,564d3c96-690c-1f4b-cfbb-a880ca4bca6a


Options:
-o : action to perform, e.g. list, status, reboot
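
For reference, the fencing entries that luci writes to /etc/cluster/cluster.conf look roughly like the sketch below; the device name "esxi_fence" is a placeholder chosen here, and the file generated in your environment may differ in detail:

<clusternode name="node1" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="esxi_fence" port="RHCS_node1" ssl="on"/>
    </method>
  </fence>
</clusternode>

<fencedevices>
  <fencedevice agent="fence_vmware_soap" name="esxi_fence" ipaddr="192.168.14.70" login="root" passwd="1111111" ssl="on"/>
</fencedevices>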


4. Add a Failover Domain

Failover Domains -> Add
Name: rhcs_failover
Check Prioritized.
Set No Failback according to your own needs.
Check both nodes
and set their priorities.

Click Submit.
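
The corresponding failover domain section of /etc/cluster/cluster.conf looks roughly like this (the priorities shown are an example; a lower number means a higher priority):

<failoverdomains>
  <failoverdomain name="rhcs_failover" ordered="1" restricted="0" nofailback="0">
    <failoverdomainnode name="node1" priority="1"/>
    <failoverdomainnode name="node2" priority="2"/>
  </failoverdomain>
</failoverdomains>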


5. Configure the GFS service

(1) GFS service configuration

Enable the clustered locking of CLVM and start clvmd on both node1 and node2:

# lvmconf --enable-cluster   
# chkconfig clvmd on

# service clvmd start
Activating VG(s):   No volume groups found      [  OK  ]


(2) On either node, partition the shared disk to create sdb1, then format it as GFS2.

On node1:

# pvcreate /dev/sdb1
# vgcreate vg /dev/sdb1
# lvcreate -l +100%FREE -n var01 vg

Error locking on node node2: Volume group for uuid not found: QkM2JYKg5EfFuFL6LzJsg7oAfK4zVrkytMVzdziWDmVhBGggTsbr47W1HDEu8FdB
Failed to activate new LV.

If the message above appears, the physical volume also has to be created on node2.

On node2:

# pvcreate /dev/sdb1
Can't initialize physical volume "/dev/sdb1" of volume group "vg1" without -ff
Ignore this warning message.

Back on node1:

# lvcreate -l +100%FREE -n var01 vg
  Logical volume "var01" created    (the LV can now be created)

# /etc/init.d/clvmd start

Activating VG(s):   1 logical volume(s) in volume group "vg1" now active
                                                           [  OK  ]

(3) Create the GFS2 filesystem

On node1:

# mkfs.gfs2 -p lock_dlm -t rhcs:gfs2 -j 3 /dev/vg/var01              
This will destroy any data on /dev/vg/var01.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/vg/var01
Blocksize:                 4096
Device Size                10.00 GB (2620416 blocks)
Filesystem Size:           10.00 GB (2620416 blocks)
Journals:                  3
Resource Groups:           40
Locking Protocol:          "lock_dlm"
Lock Table:                "rhcs:gfs2"
UUID:                      A692D99D-22C4-10E9-3C0C-006CBF7574CD


Notes:
In "rhcs:gfs2", "rhcs" is the cluster name and "gfs2" is a name chosen for the filesystem, essentially a label.
-j sets the number of journals, i.e. the number of hosts that will mount this filesystem; if omitted it defaults to 1, which would only cover the management node.
This lab has two nodes plus the management host, hence 3.

6. Mount the GFS filesystem

Create the GFS mount point on both node1 and node2:

# mkdir /oradata

(1) Mount it manually on node1 and node2; once mounted, create a file to verify that the cluster filesystem works as expected.
# mount.gfs2 /dev/vg/var01 /oradata

(2) Configure automatic mounting at boot
# vi /etc/fstab
/dev/vg/var01   /oradata gfs2 defaults 0 0

(3) The mount can also be configured through the management interface (omitted).
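
If the mount is managed by the cluster rather than /etc/fstab, the matching entry in /etc/cluster/cluster.conf is a clusterfs resource roughly like the following (the resource name gfs-oradata is a placeholder):

<clusterfs device="/dev/vg/var01" fstype="gfs2" mountpoint="/oradata" name="gfs-oradata" force_unmount="0" options=""/>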

Check the mounts:
[root@node1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       14G  2.8G   10G  22% /
/dev/sda1              99M   27M   68M  29% /boot
tmpfs                 506M     0  506M   0% /dev/shm
/dev/hdc              3.1G  3.1G     0 100% /mnt
/dev/mapper/vg-var01   10G  388M  9.7G   4% /oradata

7. Configure the quorum disk

Notes:
# The quorum disk is a shared disk; roughly 10 MB is enough, it does not need to be large. This example uses /dev/sdc1.
# A two-node cluster apparently does not strictly require a quorum disk; clusters with more than two nodes must have one.

[root@node1 ~]# fdisk -l

Disk /dev/sdc: 1073 MB, 1073741824 bytes
34 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 2074 * 512 = 1061888 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        1011     1048376+  83  Linux

(1) Create the quorum disk

[root@node1 ~]# mkqdisk -c /dev/sdc1 -l myqdisk
mkqdisk v0.6.0
Writing new quorum disk label 'myqdisk' to /dev/sdc1.
WARNING: About to destroy all data on /dev/sdc1; proceed [N/y] ? y
Initializing status block for node 1...
Initializing status block for node 2...
Initializing status block for node 3...
Initializing status block for node 4...
Initializing status block for node 5...
Initializing status block for node 6...
Initializing status block for node 7...
Initializing status block for node 8...
Initializing status block for node 9...
Initializing status block for node 10...
Initializing status block for node 11...
Initializing status block for node 12...
Initializing status block for node 13...
Initializing status block for node 14...
Initializing status block for node 15...
Initializing status block for node 16...


(2) Inspect the quorum disk

[root@node1 ~]# mkqdisk -L
mkqdisk v0.6.0
/dev/disk/by-id/scsi-14f504e46494c450068656b3274732d664452562d48534f63-part1:
/dev/disk/by-path/ip-192.168.14.162:3260-iscsi-iqn.2006-01.com.openfiler:tsn.b2bd5bb312a7-lun-1-part1:
/dev/sdc1:
        Magic:                eb7a62c2
        Label:                myqdisk
        Created:              Sun Mar 24 00:18:12 2013
        Host:                 node1
        Kernel Sector Size:   512
        Recorded Sector Size: 512


(3) Configure the quorum disk (qdisk)

# In the management interface go to cluster -> cluster list and click Cluster Name: rhcs;
# open "Quorum Partition" and select "Use a Quorum Partition".

interval     : 2
votes        : 2
TKO          : 10
Minimum Score: 1
Device       : /dev/sdc1

Path to program : ping -c3 -t2 192.168.14.2
Interval        : 3
Score           : 2

# Click Apply
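
The equivalent quorum disk section in /etc/cluster/cluster.conf looks roughly like this, using the values entered above:

<quorumd interval="2" votes="2" tko="10" min_score="1" device="/dev/sdc1">
  <heuristic program="ping -c3 -t2 192.168.14.2" interval="3" score="2"/>
</quorumd>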


(4) Start the qdiskd service

chkconfig qdiskd on
service qdiskd start
clustat -l

# clustat -l            
Cluster Status for rhcs @ Sun Mar 24 00:26:26 2013
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node1                                                               1 Online, Local
 node2                                                               2 Online
 /dev/sdc1                                                           0 Offline, Quorum Disk


8. Add Resources

(1) Add the cluster IP resource

Click cluster -> rhcs -> Resources -> Add a Resource
Choose IP and enter: 192.168.14.130
Check Monitor Link
Click Submit


(2) Add the Oracle start/stop script resource

# The script below starts and stops the Oracle database. It is placed at /etc/init.d/oracle and is not registered as a regular system service; the RHCS resource manager invokes it.
# Create the script on both node1 and node2.

# vi /etc/init.d/oracle
#!/bin/bash
# Oracle 10g start/stop script driven by rgmanager as an RHCS script resource.
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
export ORACLE_SID=orcl

start() {
# The lines after "sqlplus / as sysdba" are read by sqlplus from the
# here-document, so "startup" runs inside sqlplus, not in the shell.
su - oracle<<EOF
echo "Starting Listener ..."
$ORACLE_HOME/bin/lsnrctl start
echo "Starting Oracle10g Server.."
sqlplus / as sysdba
startup
exit;
EOF
}

stop() {
su - oracle<<EOF
echo "Shutting down Listener..."
$ORACLE_HOME/bin/lsnrctl stop
echo "Shutting down Oracle10g Server..."
sqlplus / as sysdba
shutdown immediate;
exit
EOF
}

# rgmanager also calls the script with "status" periodically; the catch-all
# branch below returns 0, so status checks pass as long as the script exists.
case "$1" in
start)
    start
    ;;
stop)
    stop
    ;;
*)
    echo "Usage: $0 {start|stop}"
    ;;
esac

chmod +x /etc/init.d/oracle
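
Once the database has been installed (next section), the script is worth testing by hand before handing it to the cluster; test on one node at a time, since both nodes see the same datafiles on the shared filesystem:

# /etc/init.d/oracle start
# /etc/init.d/oracle stop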


V. Installing and configuring the Oracle 10g database

Note: the detailed installation steps are omitted.

1. On node1
(1) Prepare the Oracle installation environment
(2) Install the Oracle database software and patches
(3) Run netca
(4) Run dbca to create the database; the datafiles, control files, redo logs, flash recovery area and archive logs are all placed on the /oradata cluster filesystem.

2. On node2
(1) Prepare the Oracle installation environment
(2) Install the Oracle database software and patches
(3) Run netca

3. Copy the relevant parameter files from node1 to node2

(1) Archive the parameter files on node1
$ cd /u01/app/oracle/product/10.2.0/db_1
$ tar czvf dbs.tar.gz dbs
dbs/
dbs/init.ora
dbs/lkORCL
dbs/hc_orcl.dat
dbs/initdw.ora
dbs/spfileorcl.ora
dbs/orapworcl
$ scp dbs.tar.gz  node2:/u01/app/oracle/product/10.2.0/db_1/

(2) On node2

# su - oracle
$ mkdir -p /u01/app/oracle/admin/orcl/{adump,bdump,cdump,dpdump,udump} 
$ cd /u01/app/oracle/product/10.2.0/db_1/
$ tar zxvf dbs.tar.gz


VI. Configuring the Oracle 10g database service

1. Add the database service

Click cluster -> rhcs -> Services -> Add a Service
Service Name: oracle10g
Check Automatically start this service
For Failover domain, select the rhcs_failover domain created earlier
Set Recovery policy to restart

Click "Add a resource to this service" and add the previously created IP resource and Oracle script resource.
Click "Go" to create the oracle10g service.
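
The resulting service definition in /etc/cluster/cluster.conf looks roughly like this; the exact layout written by luci may differ:

<service autostart="1" domain="rhcs_failover" name="oracle10g" recovery="restart">
  <ip address="192.168.14.130" monitor_link="1"/>
  <script file="/etc/init.d/oracle" name="oracle"/>
</service>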


2. Check the service status

(1) Check the cluster status

[root@node1 db_1]# clustat -l
Cluster Status for rhcs @ Sun Mar 24 12:37:02 2013
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node1                                                               1 Online, Local, rgmanager
 node2                                                               2 Online, rgmanager
 /dev/sdc1                                                           0 Online, Quorum Disk

Service Information
------- -----------

Service Name      : service:oracle10g
  Current State   : started (112)
  Flags           : none (0)
  Owner           : node1
  Last Owner      : none
  Last Transition : Sun Mar 24 12:35:53 2013


(2) Check the cluster IP

[root@node1 db_1]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:25:ee:43 brd ff:ff:ff:ff:ff:ff
    inet 192.168.14.100/24 brd 192.168.14.255 scope global eth0
    inet 192.168.14.130/24 scope global secondary eth0
    inet6 fe80::20c:29ff:fe25:ee43/64 scope link
       valid_lft forever preferred_lft forever
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:25:ee:4d brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.10/24 brd 10.10.10.255 scope global eth1
    inet6 fe80::20c:29ff:fe25:ee4d/64 scope link
       valid_lft forever preferred_lft forever


3. Relocate the service manually

# clusvcadm -r "oracle10g" -m node2
Trying to relocate service:oracle10g to node2...Success
service:oracle10g is now running on node2
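
Besides relocation, a few other clusvcadm operations are useful while testing, all acting on the same service name:

# clusvcadm -d oracle10g     (disable, i.e. stop, the service)
# clusvcadm -e oracle10g     (enable, i.e. start, the service)
# clusvcadm -R oracle10g     (restart the service on its current node)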

# cat /var/log/messages
Mar 24 21:12:44 node2 clurgmgrd[3601]: <notice> Starting stopped service service:oracle10g
Mar 24 21:12:46 node2 avahi-daemon[3513]: Registering new address record for 192.168.14.130 on eth0.
Mar 24 21:13:43 node2 clurgmgrd[3601]: <notice> Service service:oracle10g started

Other notes: the RHCS HA functionality still needs more thorough testing.