Corosync/OpenAIS + Pacemaker + iSCSI + OCFS2: Building a Highly Available Web Cluster


Original article: http://blog.51cto.com/tywangpanpan/1222413


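Goal:

Build a highly available web cluster whose back-end shared storage is iSCSI (IP-SAN).

So that both nodes see the same data, the shared disk is formatted with the OCFS2 cluster file system.

Features:

The HA nodes do not need a dedicated heartbeat cable between them.

Password prompts between the nodes (ssh/scp) must be removed so that the nodes can communicate without interaction.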

---------------------------------------------

Address plan:

*HA servers*

node1.a.com eth0-ip: 192.168.102.101 eth1: 192.168.1.100

node2.a.com eth0-ip: 192.168.102.102 eth1: 192.168.1.200

VIP: 192.168.102.200

Note: eth0 is bridged; eth1 is Host-Only.

---------------------------------------------------------

*Target server*

eth0-ip: 192.168.1.10

Note: eth0 is Host-Only.

--------------------------------------------------------

***Configuration steps***

————————————————————————————

Step1: Preparation

--------------------

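① Configure a static IP address on each node, then apply it with service network restart.

② Synchronize the clocks of the nodes (hwclock / date -s "2013-06-14 **:**:**").

③ Set the hostnames of the HA nodes so that the nodes can resolve each other by name.

vim /etc/sysconfig/network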

NETWORKING=yes

NETWORKING_IPV6=yes

HOSTNAME=node1.a.com    # use node2.a.com on node2

vim /etc/hosts

127.0.0.1 localhost.localdomain localhost

::1 localhost6.localdomain6 localhost6

192.168.102.101 node1.a.com node1

192.168.102.102 node2.a.com node2

hostname node1.a.com
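Before moving on, it can help to confirm that both nodes carry the same name mappings. The following is a minimal sketch, not part of the original procedure: `hosts_has_entry` is a hypothetical helper that checks whether a hosts file maps a given IP to a given name, ignoring commented-out lines.

```shell
#!/bin/bash
# Hypothetical helper: succeed if FILE maps IP to NAME on a non-comment line.
# Usage: hosts_has_entry FILE IP NAME
hosts_has_entry() {
    local file=$1 ip=$2 name=$3
    # Column 1 must be the IP; the name may be any later column (FQDN or alias).
    awk -v ip="$ip" -v name="$name" '
        /^[[:space:]]*#/ { next }
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$file"
}
```

For example, `hosts_has_entry /etc/hosts 192.168.102.102 node2.a.com && echo "node2 is mapped"` could be run on node1, and the mirror-image check on node2.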

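④ Set up passwordless communication between the nodes (no prompt for the peer's root password).

node1:

ssh-keygen -t rsa    # generate an SSH key pair on node1

cd /root/.ssh/

ssh-copy-id -i id_rsa.pub node2    # copy node1's public key to node2

Enter node2's root password: 123456

node2:

ssh-keygen -t rsa    # generate an SSH key pair on node2

cd /root/.ssh/

ssh-copy-id -i id_rsa.pub node1    # copy node2's public key to node1

Enter node1's root password: 123456

Test from node1: scp /etc/fstab node2:/tmp/ (the root password is no longer requested)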
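The scp test can also be made non-interactive. The sketch below assumes the standard OpenSSH client; `can_login_without_password` and `check_peers` are hypothetical helper names. `BatchMode=yes` makes ssh fail instead of prompting, so a leftover password prompt shows up as an error rather than hanging a script.

```shell
#!/bin/bash
# Succeed only if key-based login to the host works without any prompt.
can_login_without_password() {
    local host=$1
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true
}

# Report the result for every peer named on the command line.
check_peers() {
    local host rc=0
    for host in "$@"; do
        if can_login_without_password "$host"; then
            echo "$host: key-based login OK"
        else
            echo "$host: still asks for a password (or is unreachable)"
            rc=1
        fi
    done
    return $rc
}
```

On node1 this would be called as `check_peers node2`, and on node2 as `check_peers node1`.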

⑤ On node1 and node2, configure a local yum repository, mount the installation disc, and install the Corosync packages.

yum localinstall cluster-glue-1.0.6-1.6.el5.i386.rpm \
    cluster-glue-libs-1.0.6-1.6.el5.i386.rpm \
    corosync-1.2.7-1.1.el5.i386.rpm \
    corosynclib-1.2.7-1.1.el5.i386.rpm \
    heartbeat-3.0.3-2.3.el5.i386.rpm \
    heartbeat-libs-3.0.3-2.3.el5.i386.rpm \
    libesmtp-1.0.4-5.el5.i386.rpm \
    pacemaker-1.1.5-1.1.el5.i386.rpm \
    pacemaker-libs-1.1.5-1.1.el5.i386.rpm \
    perl-TimeDate-1.16-5.el5.noarch.rpm \
    resource-agents-1.0.4-1.1.el5.i386.rpm --nogpgcheck

rpm -ivh openais-1.1.3-1.6.el5.i386.rpm

rpm -ivh openaislib-1.1.3-1.6.el5.i386.rpm

---------------------------------------

Step2: Configure Corosync

---------------------------------------

① Create the configuration file from the shipped example and edit it.

cd /etc/corosync/

cp -p corosync.conf.example corosync.conf

vim corosync.conf

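# Please read the corosync.conf.5 manual page

# Stay compatible with whitetank (corosync 0.86 and older); some newer features may be unavailable.
compatibility: whitetank

# totem: the protocol settings the nodes use to exchange heartbeat messages
totem {
    # configuration version
    version: 2
    # whether to enable authentication and encryption
    secauth: off
    # threads used to encrypt and send multicast messages; 0 means no threaded send
    threads: 0
    interface {
        ringnumber: 0
        # network address used for cluster traffic (a host address also works)
        bindnetaddr: 192.168.102.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}

# logging options
logging {
    # whether to print the source file and line number in log messages
    fileline: off
    # whether to send log output to standard error (the screen)
    to_stderr: no
    # write log output to the log file
    to_logfile: yes
    # also write to syslog (one of the two is enough; using both costs extra resources)
    to_syslog: yes
    # *** the log directory must be created by hand, otherwise the service will not start ***
    logfile: /var/log/cluster/corosync.log
    # enable when troubleshooting
    debug: off
    # whether to record timestamps in the log
    timestamp: on
    # the following belongs to openais and can be left disabled
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

amf {
    mode: disabled
}

# added: the sections above only configure the messaging layer; this one loads pacemaker
service {
    ver: 0
    name: pacemaker
}

# openais itself is not used, but some of its sub-options are
aisexec {
    user: root
    group: root
}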
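The `bindnetaddr` value is the network address of the interface that carries cluster traffic, that is, the node's IP address ANDed with its netmask. As a worked example, the sketch below computes it in plain bash; `network_address` is a hypothetical helper name, and the /24 netmask is an assumption (the article does not state the mask explicitly, although the ifconfig output later shows 255.255.255.0).

```shell
#!/bin/bash
# Compute the network address (IP AND netmask), octet by octet.
# Usage: network_address IP NETMASK
network_address() {
    local ip=$1 mask=$2
    local i1 i2 i3 i4 m1 m2 m3 m4
    IFS=. read -r i1 i2 i3 i4 <<< "$ip"
    IFS=. read -r m1 m2 m3 m4 <<< "$mask"
    echo "$((i1 & m1)).$((i2 & m2)).$((i3 & m3)).$((i4 & m4))"
}

network_address 192.168.102.101 255.255.255.0   # prints 192.168.102.0
```

Both nodes yield the same result, which is why the same corosync.conf can be copied to node2 unchanged.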

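② Generate an authkey so that other hosts can authenticate when they join the cluster (the key is only enforced when secauth is on).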

corosync-keygen

[root@node1 corosync]# ll

total 28

-rw-r--r-- 1 root root 5384 Jul 28 2010 amf.conf.example

-r-------- 1 root root 128 May 7 16:16 authkey

-rw-r--r-- 1 root root 513 May 7 16:14 corosync.conf

-rw-r--r-- 1 root root 436 Jul 28 2010 corosync.conf.example

drwxr-xr-x 2 root root 4096 Jul 28 2010 service.d

drwxr-xr-x 2 root root 4096 Jul 28 2010 uidgid.d

③ Create the directory for the log file.

mkdir /var/log/cluster

④ Synchronize the configuration to the other node.

[root@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/

authkey 100% 128 0.1KB/s 00:00

corosync.conf 100% 513 0.5KB/s 00:00

[root@node1 corosync]# ssh node2 'mkdir /var/log/cluster'

⑤ Start the service.

service corosync start

ssh node2 '/etc/init.d/corosync start'

⑥ Check that the Corosync cluster engine started.

grep -i -e "corosync cluster engine" -e "configuration file" /var/log/messages

⑦ Check that the initial membership notifications were sent.

grep -i totem /var/log/messages

⑧ Check whether any errors occurred during startup.

grep -i error: /var/log/messages |grep -v unpack_resources

⑨ Check that pacemaker has started.

grep -i pcmk_startup /var/log/messages

⑩ View the cluster membership status from any node.

[root@node1 ~]# crm status

============

Last updated: Fri Jun 14 22:06:21 2013

Stack: openais

Current DC: node1.a.com - partition with quorum

Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f

2 Nodes configured, 2 expected votes

0 Resources configured.

============

Online: [ node1.a.com node2.a.com ]
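Checks ⑥ to ⑨ can be bundled into one pass over the log. This is a minimal sketch that reuses the same grep patterns as above; `check_corosync_log` is a hypothetical helper, and the log path defaults to /var/log/messages as in the article.

```shell
#!/bin/bash
# Run the startup checks against a log file and return non-zero if any fails.
check_corosync_log() {
    local log=${1:-/var/log/messages} rc=0

    if grep -qi "corosync cluster engine" "$log"; then
        echo "engine: started"
    else
        echo "engine: no startup message"; rc=1
    fi

    if grep -qi "totem" "$log"; then
        echo "totem: messages present"
    else
        echo "totem: nothing logged"; rc=1
    fi

    if grep -qi "pcmk_startup" "$log"; then
        echo "pacemaker: started"
    else
        echo "pacemaker: no startup message"; rc=1
    fi

    # Errors from unpack_resources (the STONITH complaint) are filtered out,
    # exactly as in check 8; anything else counts as a failure.
    if grep -i "error:" "$log" | grep -qv "unpack_resources"; then
        echo "errors: found"; rc=1
    else
        echo "errors: none"
    fi
    return $rc
}
```

Running `check_corosync_log` on each node after `service corosync start` gives a four-line summary and an exit status that a script can act on.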

-------------------------------------------------------------------

Step3: Provide the highly available service

---------------------------------

With Corosync, services can be defined through two interfaces:

1: Graphical (hb_gui), a GUI tool from Heartbeat; it requires the Heartbeat packages to be installed.

yum localinstall heartbeat-2.1.4-9.el5.i386.rpm \
    heartbeat-gui-2.1.4-9.el5.i386.rpm \
    heartbeat-pils-2.1.4-10.el5.i386.rpm \
    heartbeat-stonith-2.1.4-10.el5.i386.rpm \
    libnet-1.1.4-3.el5.i386.rpm \
    perl-MailTools-1.77-1.el5.noarch.rpm --nogpgcheck

After installation, run hb_gui to configure the cluster graphically.

2: crm (a shell provided by pacemaker)

① Show the current configuration: crm configure show

② Check the configuration syntax: crm_verify -L

[root@node1 corosync]# crm_verify -L

crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined

crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option

crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity

Errors found during check: config not valid

-V may provide more details

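The check reports STONITH errors: in an HA cluster, no resource may be started until STONITH is either configured or disabled.

STONITH can be disabled as follows (keep in mind the note in the output above: clusters with shared data need STONITH to ensure data integrity):

[root@node1 corosync]# crm    # enter the crm shell

crm(live)# configure    # enter configuration mode

crm(live)configure# property stonith-enabled=false    # turn STONITH off

crm(live)configure# commit    # commit and save the configuration

crm(live)configure# show    # display the current configuration

crm(live)configure# exit

Run the syntax check again: crm_verify -L no longer reports errors.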
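The same change can be made without entering the interactive shell, since crm accepts a command on its command line. A minimal sketch, assuming the crm shell shipped with this pacemaker version behaves that way (single-shot `crm configure` commands are committed when the command ends); `set_cluster_property` is a hypothetical wrapper name.

```shell
#!/bin/bash
# Set one cluster property in a single shot, then re-run the syntax check.
# Usage: set_cluster_property NAME VALUE
set_cluster_property() {
    local name=$1 value=$2
    crm configure property "$name=$value" && crm_verify -L
}
```

For this step the call would be `set_cluster_property stonith-enabled false`; a non-zero exit status means either the property could not be set or crm_verify still found problems.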

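③ The four cluster resource types

[root@node1 corosync]# crm

crm(live)# configure

crm(live)configure# help

primitive   a basic resource (runs on only one node at a time)

group   collects several resources into one group for easier management

clone   a resource that must run on several nodes at once (e.g. ocfs2, stonith; no master/slave distinction)

master   a resource with master and slave roles, e.g. drbd

...

④ Services are configured through resource agents. List the agent classes:

[root@node1 corosync]# crm

crm(live)# ra

crm(live)ra# classes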

heartbeat

lsb

ocf / heartbeat pacemaker

stonith

⑤ List the scripts available in a resource agent class.

[root@node1 corosync]# crm

crm(live)# ra

crm(live)ra# list lsb

NetworkManager acpid anacron apmd

atd auditd autofs avahi-daemon

avahi-dnsconfd bluetooth capi conman

corosync cpuspeed crond cups

cups-config-daemon dnsmasq drbd dund

firstboot functions gpm haldaemon

halt heartbeat hidd hplip

httpd ip6tables ipmi iptables

irda irqbalance iscsi iscsid

isdn kdump killall krb524

kudzu lm_sensors logd lvm2-monitor

mcstrans mdmonitor mdmpd messagebus

microcode_ctl multipathd netconsole netfs

netplugd network nfs nfslock

nscd ntpd o2cb ocfs2

openais openibd pacemaker pand

pcscd portmap psacct rawdevices

rdisc readahead_early readahead_later restorecond

rhnsd rpcgssd rpcidmapd rpcsvcgssd

saslauthd sendmail setroubleshoot single

smartd sshd syslog vncserver

wdaemon winbind wpa_supplicant xfs

xinetd ypbind yum-updatesd

List the agents of the heartbeat provider in the ocf class:

crm(live)ra# list ocf heartbeat

⑥ Use info or meta to show the details of a resource agent.

crm(live)ra# meta ocf:heartbeat:IPaddr

⑦ Configure the resources (IP address: VIP 192.168.102.200; web service: httpd).

[root@node1 ~]# crm

crm(live)# configure

crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.102.200

crm(live)configure# show    # review the configuration

node node1.a.com

node node2.a.com

primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.102.200"

property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false"

crm(live)configure# commit    # apply the configuration

crm(live)# status    # query the status

============

Last updated: Mon May 7 19:39:37 2013

Stack: openais

Current DC: node1.a.com - partition with quorum

Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f

2 Nodes configured, 2 expected votes

1 Resources configured.

============

Online: [ node1.a.com node2.a.com ]

webip(ocf::heartbeat:IPaddr):Started node1.a.com

The resource has been started on node1.

Verify with ifconfig on node1:

[root@node1 ~]# ifconfig

eth0:0 Link encap:Ethernet HWaddr 00:0C:29:25:D2:BC

inet addr:192.168.102.200 Bcast:192.168.102.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Interrupt:67 Base address:0x2000

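Define the httpd resource

Install httpd on node1 and node2 and do not start it at boot (the cluster starts it).

yum install httpd

chkconfig httpd off

Find the resource agent for the httpd service in the lsb class:

[root@node1 corosync]# crm

crm(live)# ra

crm(live)ra# list lsb

View the parameters of httpd:

crm(live)ra# meta lsb:httpd

Define the httpd resource: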

crm(live)configure# primitive webserver lsb:httpd

crm(live)configure# show

node node1.a.com

node node2.a.com

primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.102.200"

primitive webserver lsb:httpd

property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false"

crm(live)# status

============

Last updated: Mon May 7 20:01:12 2013

Stack: openais

Current DC: node1.a.com - partition with quorum

Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f

2 Nodes configured, 2 expected votes

2 Resources configured.

============

Online: [ node1.a.com node2.a.com ]

webip(ocf::heartbeat:IPaddr):Started node1.a.com

webserver(lsb:httpd):Started node2.a.com

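httpd has started, but on node2.

(As a cluster carries more and more resources, it spreads them across the nodes to balance the load as far as possible.)

The two resources must be kept on the same node, so they are defined as a group.

⑧ Define a resource group to bind the resources together.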

crm(live)# configure

crm(live)configure# help group

The `group` command creates a group of resources.

Usage:

...............

group <name> <rsc> [<rsc>...]

[meta attr_list]

[params attr_list]

attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>

...............

Example:

...............

group internal_www disk0 fs0 internal_ip apache \
    meta target_role=stopped

...............

Define the group to bind the resources:

crm(live)configure# group web-res webip webserver

crm(live)configure# show

node node1.a.com

node node2.a.com

primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.102.200"

primitive webserver lsb:httpd

group web-res webip webserver

property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false"

Check the cluster status:

crm(live)# status

============

Last updated: Mon May 7 20:09:06 2013

Stack: openais

Current DC: node1.a.com - partition with quorum

Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f

2 Nodes configured, 2 expected votes

1 Resources configured.

============

Online: [ node1.a.com node2.a.com ]

Resource Group: web-res

webip(ocf::heartbeat:IPaddr):Started node1.a.com

webserver(lsb:httpd):Started node1.a.com

(The IP address and httpd now both run on node1.)
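The three definitions from this step can also be kept as a small text file and loaded in one go. The sketch below only prints the configuration; `web_group_config` is a hypothetical helper, and the resource names are the ones used above. Members of a Pacemaker group start in the order listed and stop in reverse, so the IP address is listed before the web server.

```shell
#!/bin/bash
# Print the crm configuration for the VIP, the web server and their group.
# Usage: web_group_config VIP
web_group_config() {
    local vip=$1
    cat <<EOF
primitive webip ocf:heartbeat:IPaddr params ip="$vip"
primitive webserver lsb:httpd
group web-res webip webserver
EOF
}
```

One possible way to apply it, assuming the installed crm shell supports `load update`, is `web_group_config 192.168.102.200 > /tmp/web.crm` followed by `crm configure load update /tmp/web.crm`.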

------------------------------------------------------------

Step4: Test failover between the nodes

---------------------------------------

On node1, stop the corosync service, then watch the cluster from node2:

service corosync stop

[root@node2 corosync]# crm status

============

Last updated: Mon May 7 20:16:58 2013

Stack: openais

Current DC: node2.a.com - partition WITHOUT quorum

Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f

2 Nodes configured, 2 expected votes

1 Resources configured.

============

Online: [ node2.a.com ]

OFFLINE: [ node1.a.com ]

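node2 now reports "partition WITHOUT quorum": it holds only one of the two expected votes, so the resources are not taken over.

Solution: ignore the loss of quorum through the no-quorum-policy property.

Possible values:

ignore (keep managing resources as if the partition had quorum)

freeze (resources that are already running keep running; stopped resources cannot be started)

stop (default; stop all resources in the partition)

suicide (fence all nodes in the affected partition)

Back on node1: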

service corosync start

[root@node1 corosync]# crm

crm(live)# configure

crm(live)configure# property no-quorum-policy=ignore

crm(live)configure# commit

crm(live)configure# show    # check the quorum property again

node node1.a.com

node node2.a.com

primitive webip ocf:heartbeat:IPaddr \
    params ip="192.168.102.200"

primitive webserver lsb:httpd

group web-res webip webserver

property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"

(Loss of quorum is now ignored.)

Repeat the failover test: the resources now move to the surviving node as expected.
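When scripting the failover test, the quorum state can be read from the "Current DC" line of the status output. A minimal sketch, assuming the line format shown in the `crm status` output above; `partition_has_quorum` is a hypothetical helper that reads the status text on standard input.

```shell
#!/bin/bash
# Read `crm status` output on stdin.
# Returns 0 with quorum, 1 without quorum, 2 if no matching line was found.
partition_has_quorum() {
    local line
    while IFS= read -r line; do
        case $line in
            *"partition WITHOUT quorum"*) return 1 ;;
            *"partition with quorum"*)    return 0 ;;
        esac
    done
    return 2
}
```

On a node it would be used as `crm status | partition_has_quorum`, for example to warn before a test that relies on no-quorum-policy=ignore.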

------------------------------------------------------

Step5: Common commands for the Corosync/Pacemaker cluster

--------------------------

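① crm_attribute: modify cluster-wide attributes

② crm_resource: manage resources

③ crm_node: manage nodes

crm_node -e    show the epoch during which this node joined the cluster

crm_node -q    show whether the current partition has quorum (1) or not (0)

④ cibadmin: the tool for querying and modifying the cluster configuration (CIB)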

-u, --upgrade Upgrade the configuration to the latest syntax

-Q, --query Query the contents of the CIB

-E, --erase Erase the contents of the whole CIB

-B, --bump Increase the CIB's epoch value by 1

If a resource was defined incorrectly, it can be deleted with this tool:

-D, --delete Delete the first object matching the supplied criteria, Eg.

The same can be done from the crm shell:

crm(live)configure# delete

usage: delete <id> [<id>...]

You can also run edit in this mode.

When you are done, run commit to apply the changes.
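Because `cibadmin -E` and `-D` are destructive, it is worth saving the CIB first. A minimal sketch using the `--query` option listed above; `backup_cib` is a hypothetical helper, and the default target directory /root is an assumption.

```shell
#!/bin/bash
# Save the current CIB to a timestamped XML file and print the file name.
# Usage: backup_cib [DIRECTORY]
backup_cib() {
    local dir=${1:-/root} file
    file="$dir/cib-$(date +%Y%m%d-%H%M%S).xml"
    cibadmin --query > "$file" && echo "$file"
}
```

A saved file can later be put back with `cibadmin --replace --xml-file FILE`.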

--------------------------------------------------------------------

Step6: iSCSI (IP-SAN) storage configuration

------------------------------------------------

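I: target (the back-end storage)

① Add a new disk (or partition).

fdisk -l

Partition: fdisk /dev/sda (n--p--4--+2g-w), which adds the partition sda6

Update the partition table (check the result with cat /proc/partitions):

partprobe /dev/sda    # re-read the partition table without rebooting

② Install the packages required by the target and start the service.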

cd /mnt/cdrom/ClusterStorage

rpm -ivh perl-Config-General-2.40-1.el5.noarch.rpm

rpm -ivh scsi-target-utils-0.0-5.20080917snap.el5.i386.rpm

service tgtd start

③ Add a new iSCSI target.

Create the target: tgtadm --lld iscsi --op new --mode target --tid=1 --targetname iqn.2013-06.com.a.target:disk

Show it: tgtadm --lld iscsi --op show --mode target

Attach the backing store: tgtadm --lld iscsi --op new --mode=logicalunit --tid=1 --lun=1 --backing-store /dev/sda6

--lld [driver] --op new --mode=logicalunit --tid=[id] --lun=[lun] --backing-store [path]

Allow access by initiator address: tgtadm --lld iscsi --op bind --mode=target --tid=1 --initiator-address=192.168.1.0/24

tgtadm --lld [driver] --op bind --mode=target --tid=[id] --initiator-address=[address]

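④ Add the configuration to the config file so that it is loaded automatically at boot.

vim /etc/tgt/targets.conf

<target iqn.2013-06.com.a.target:disk>
    backing-store /dev/sda6
    initiator-address 192.168.1.0/24
</target>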

II: initiator (node1 and node2)

cd /mnt/cdrom/Server

rpm -ivh iscsi-initiator-utils-6.2.0.871-0.10.el5.i386.rpm

service iscsi start

Discover the target: iscsiadm --mode discovery --type sendtargets --portal 192.168.1.10

Log in: iscsiadm --mode node --targetname iqn.2013-06.com.a.target:disk --portal 192.168.1.10:3260 --login
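Discovery and login can be chained so that the target name does not have to be typed twice. The sketch below assumes the usual sendtargets output format, one line per target of the form `ip:port,tpgt iqn`, and the default iSCSI port 3260 used above; `discovered_iqns` and `login_all_targets` are hypothetical helper names.

```shell
#!/bin/bash
# Extract the target names (second column) from sendtargets discovery output.
discovered_iqns() {
    awk 'NF >= 2 { print $2 }' | sort -u
}

# Discover the targets behind a portal and log in to each of them.
# Usage: login_all_targets PORTAL_IP
login_all_targets() {
    local portal=$1 iqn
    iscsiadm --mode discovery --type sendtargets --portal "$portal" |
        discovered_iqns |
        while IFS= read -r iqn; do
            # </dev/null keeps the login command from consuming the target list.
            iscsiadm --mode node --targetname "$iqn" \
                --portal "$portal:3260" --login < /dev/null
        done
}
```

On each initiator node the call would be `login_all_targets 192.168.1.10`.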

⑤ On the target, show the initiators that are currently connected.

tgt-admin -s

Target 1: iqn.2013-06.com.a.target:disk

System information:

Driver: iscsi

State: ready

I_T nexus information:

I_T nexus: 1

Initiator: iqn.2013-06.com.a.realserver2

Connection: 0

IP Address: 192.168.1.200

I_T nexus: 2

Initiator: iqn.2013-06.com.a.realserver1

Connection: 0

IP Address: 192.168.1.100

LUN information:

LUN: 0

Type: controller

SCSI ID: deadbeaf1:0

SCSI SN: beaf10

Size: 0 MB

Online: Yes

Removable media: No

Backing store: No backing store

LUN: 1

Type: disk

SCSI ID: deadbeaf1:1

SCSI SN: beaf11

Size: 4178 MB

Online: Yes

Removable media: No

Backing store: /dev/sda6

Account information:

ACL information:

192.168.1.0/24

⑥ On node1 and node2, list the local disks; the iSCSI LUN appears as /dev/sdb.

fdisk -l

Disk /dev/sdb: 4178 MB, 4178409984 bytes

129 heads, 62 sectors/track, 1020 cylinders

Units = cylinders of 7998 * 512 = 4094976 bytes

Disk /dev/sdb doesn't contain a valid partition table

-------------------------------------------------------------

Step7: Format the new disk sdb with the OCFS2 cluster file system

-------------------------------------------------------------

① Install the required packages on both nodes.

yum localinstall ocfs2-2.6.18-164.el5-1.4.7-1.el5.i686.rpm \
    ocfs2-tools-1.4.4-1.el5.i386.rpm \
    ocfs2console-1.4.4-1.el5.i386.rpm

② Create the main configuration file.

Method 1: create the configuration file by hand

mkdir /etc/ocfs2/

vim /etc/ocfs2/cluster.conf

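Each stanza header starts in the first column, each parameter line is indented with a tab, and a blank line separates the stanzas:

node:
	ip_port = 7777
	ip_address = 192.168.102.101
	number = 0
	name = node1.a.com
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.102.102
	number = 1
	name = node2.a.com
	cluster = ocfs2

cluster:
	node_count = 2
	name = ocfs2

Synchronize the configuration to the other node.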

scp -r /etc/ocfs2 node2:/etc/
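Since cluster.conf is sensitive to indentation, generating it avoids editor mistakes. A minimal sketch: `ocfs2_cluster_conf` is a hypothetical helper that takes the cluster name followed by `name:ip` pairs, numbers the nodes from 0, and uses the port 7777 from the file above. It assumes the O2CB layout of tab-indented parameters and blank lines between stanzas.

```shell
#!/bin/bash
# Print an O2CB cluster.conf for a cluster name and a list of "name:ip" pairs.
# Usage: ocfs2_cluster_conf CLUSTER NAME:IP [NAME:IP...]
ocfs2_cluster_conf() {
    local cluster=$1 number=0 pair
    shift
    for pair in "$@"; do
        printf 'node:\n'
        printf '\tip_port = 7777\n'
        printf '\tip_address = %s\n' "${pair#*:}"     # text after the first colon
        printf '\tnumber = %d\n' "$number"
        printf '\tname = %s\n' "${pair%%:*}"          # text before the first colon
        printf '\tcluster = %s\n\n' "$cluster"
        number=$((number + 1))
    done
    printf 'cluster:\n'
    printf '\tnode_count = %d\n' "$number"
    printf '\tname = %s\n' "$cluster"
}
```

For this setup: `ocfs2_cluster_conf ocfs2 node1.a.com:192.168.102.101 node2.a.com:192.168.102.102 > /etc/ocfs2/cluster.conf`. The node names should match the hostnames set in Step1.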

Method 2: configure through the GUI

ocfs2console

③ On both nodes, load the o2cb modules and start the services.

/etc/init.d/o2cb load

Loading module "configfs":OK

Mounting configfs filesystem at /config:OK

Loading module "ocfs2_nodemanager":OK

Loading module "ocfs2_dlm":OK

Loading module "ocfs2_dlmfs":OK

/etc/init.d/ocfs2 start

chkconfig ocfs2 on

/etc/init.d/o2cb online ocfs2

/etc/init.d/o2cb configure

Configuring the O2CB driver.

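This will configure the on-boot properties of the O2CB driver. The following questions will determine whether the driver is loaded on boot. The current values will be shown in brackets ('[]'). Hitting <ENTER> without typing an answer will keep that current value. Ctrl-C will abort.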

Load O2CB driver on boot (y/n) [n]:y

Cluster to start on boot (Enter "none" to clear) [ocfs2]:ocfs2

Writing O2CB configuration:OK

Loading module "configfs":OK

Mounting configfs filesystem at /config:OK

Loading module "ocfs2_nodemanager":OK

Loading module "ocfs2_dlm":OK

Loading module "ocfs2_dlmfs":OK

Mounting ocfs2_dlmfs filesystem at /dlm:OK

Starting cluster ocfs2:OK

/etc/init.d/o2cb status

Driver for "configfs": Loaded

Filesystem "configfs": Mounted

Driver for "ocfs2_dlmfs": Loaded

Filesystem "ocfs2_dlmfs": Mounted

Checking O2CB cluster ocfs2: Online

Heartbeat dead threshold = 31

Network idle timeout: 30000

Network keepalive delay: 2000

Network reconnect delay: 2000

Checking O2CB heartbeat: Active

④ On node1, format the disk with the OCFS2 file system.

mkfs -t ocfs2 /dev/sdb

⑤ Mount the file system on both nodes.

mount -t ocfs2 /dev/sdb /var/www/html

mount

/dev/sdb on /var/www/html type ocfs2 (rw,_netdev,heartbeat=local)

cd /var/www/html

echo "Welcome" >index.html

⑥ On both nodes, configure the file system to be mounted automatically at boot.

vim /etc/fstab

/dev/sdb /var/www/html ocfs2 _netdev,defaults 0 0
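Because the fstab edit has to be repeated on both nodes, an idempotent helper avoids duplicate lines. A minimal sketch: `ensure_fstab_entry` is a hypothetical helper that appends an entry only when the mount point is not already listed. The options string is supplied by the caller; `_netdev`, which the mount output above also shows, is used in the usage example on the assumption that the mount should wait for the network.

```shell
#!/bin/bash
# Append an fstab line only if the mount point is not already present.
# Usage: ensure_fstab_entry FSTAB DEVICE MOUNTPOINT FSTYPE OPTIONS
ensure_fstab_entry() {
    local fstab=$1 device=$2 mountpoint=$3 fstype=$4 options=$5
    if awk -v mp="$mountpoint" '
            !/^[[:space:]]*#/ && $2 == mp { found = 1 }
            END { exit(found ? 0 : 1) }
        ' "$fstab"; then
        echo "already present: $mountpoint"
    else
        printf '%s %s %s %s 0 0\n' "$device" "$mountpoint" "$fstype" "$options" >> "$fstab"
        echo "added: $mountpoint"
    fi
}
```

Usage on each node: `ensure_fstab_entry /etc/fstab /dev/sdb /var/www/html ocfs2 _netdev`.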

-------------------------------------------------------------------

Step8: Access test

--------------------

http://192.168.102.200

Welcome
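To watch a failover from a client, the page can be polled while corosync is stopped on the active node. A minimal sketch, assuming curl is available on the client; `probe_once` and `watch_vip` are hypothetical helpers, and the two-second timeout and one-second interval are arbitrary choices.

```shell
#!/bin/bash
# Fetch a URL once; print "up" if the body contains the expected text, else "down".
probe_once() {
    local url=$1 expected=$2 body
    if body=$(curl --silent --max-time 2 "$url") && [[ $body == *"$expected"* ]]; then
        echo up
    else
        echo down
    fi
}

# Probe repeatedly, one timestamped line per attempt.
# Usage: watch_vip URL EXPECTED_TEXT [COUNT]
watch_vip() {
    local url=$1 expected=$2 count=${3:-30} i
    for ((i = 1; i <= count; i++)); do
        echo "$(date +%T) $(probe_once "$url" "$expected")"
        sleep 1
    done
}
```

Running `watch_vip http://192.168.102.200 Welcome` during the Step4 test shows how long the VIP stays unreachable while the resource group moves.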
