A high-availability architecture has two core parts. One is heartbeat detection, which monitors whether the servers are alive; the other is resource transfer, which moves shared resources between the active server and a failed server. Pacemaker is a cluster resource manager. Since corosync has no general-purpose resource manager of its own, Pacemaker is used to perform the resource transfer: the heartbeat layer continuously checks the servers over the network, and as soon as a failed node is detected, resources are moved off it to keep them highly available.
Corosync is the heartbeat/messaging layer. A simple configuration file defines how messages are delivered and which protocol is used, and it provides the HA heartbeat transport.
First, install pacemaker and corosync:
[root@server1 ~]# yum install -y pacemaker corosync
[root@server1 ~]# yum install -y crmsh-1.2.6-0.rc2.2.1.x86_64.rpm pssh-2.3.1-2.1.x86_64.rpm
[root@server1 ~]# cd /etc/corosync/
[root@server1 corosync]# ls
amf.conf.example corosync.conf.example.udpu uidgid.d
corosync.conf.example service.d
[root@server1 corosync]# cp corosync.conf.example corosync.conf
Edit the corosync.conf configuration file:
[root@server1 corosync]# vim corosync.conf
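The actual edits are not shown in the transcript. A minimal totem/service section for this two-node setup might look like the following sketch; bindnetaddr, mcastaddr, and mcastport are assumptions based on the 172.25.30.0/24 network used throughout, and the service stanza loads the Pacemaker plugin (matching the "classic openais (with plugin)" infrastructure that crm reports later):

```
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                # assumed: the 172.25.30.0/24 network used by server1/server2
                bindnetaddr: 172.25.30.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
                ttl: 1
        }
}
service {
        # start pacemaker as a corosync plugin (ver: 0)
        name: pacemaker
        ver: 0
}
```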
Copy the finished configuration file straight to server2:
[root@server1 corosync]# scp corosync.conf 172.25.30.2:/etc/corosync/
Start corosync on server1:
[root@server1 corosync]# /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
Check the log:
[root@server1 corosync]# tail -f /var/log/messages
Jul 19 14:18:00 server1 pengine[3576]: notice: stage6: Delaying fencing operations until there are resources to manage
Jul 19 14:18:00 server1 pengine[3576]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-1.bz2
Jul 19 14:18:00 server1 pengine[3576]: notice: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Jul 19 14:18:00 server1 crmd[3577]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on server2.example.com - no waiting
Jul 19 14:18:00 server1 crmd[3577]: notice: run_graph: Transition 1 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1.bz2): Complete
The steps on server2 are essentially the same as on server1:
[root@server2 ~]# yum install -y pacemaker corosync
[root@server2 ~]# yum install crmsh-1.2.6-0.rc2.2.1.x86_64.rpm pssh-2.3.1-2.1.x86_64.rpm -y
[root@server2 ~]# cd /etc/corosync/
[root@server2 corosync]# /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
# Monitor the cluster status (a port conflict or other communication problem would show up here):
[root@server1 corosync]# crm_mon
Connection to the CIB terminated
Reconnecting...[root@server1 corosync]#
[root@server1 corosync]# crm_verify -LV
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
Add a fence device (the crm_verify errors above appear because STONITH is enabled but no STONITH resource has been defined yet):
[root@foundation30 cluster]# systemctl start fence_virtd
[root@foundation30 cluster]# systemctl status fence_virtd
fence_virtd.service - Fence-Virt system host daemon
Loaded: loaded (/usr/lib/systemd/system/fence_virtd.service; disabled)
Active: active (running) since Tue 2016-07-19 15:11:21 CST; 9s ago
Process: 4314 ExecStart=/usr/sbin/fence_virtd $FENCE_VIRTD_ARGS (code=exited, status=0/SUCCESS)
Main PID: 4315 (fence_virtd)
CGroup: /system.slice/fence_virtd.service
└─4315 /usr/sbin/fence_virtd -w
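fence_virtd on the host and the fence_xvm agents on the guests authenticate with a shared key, /etc/cluster/fence_xvm.key (it appears in the listing on server1 below). If the key does not exist yet, it can be generated on the host and distributed roughly like this (a sketch; the guest addresses 172.25.30.1 and 172.25.30.2 are assumed from the scp commands used elsewhere in this setup):

```
# generate a 128-byte random shared key on the host
[root@foundation30 cluster]# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=128 count=1
# copy it to both cluster nodes
[root@foundation30 cluster]# scp /etc/cluster/fence_xvm.key 172.25.30.1:/etc/cluster/
[root@foundation30 cluster]# scp /etc/cluster/fence_xvm.key 172.25.30.2:/etc/cluster/
```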
[root@server1 ~]# cd /etc/cluster/
[root@server1 cluster]# ls
cman-notify.d fence_xvm.key
[root@server1 cluster]# stonith_admin -a fence_xvm -M
[root@server1 cluster]# yum provides */fence_xvm
[root@server1 cluster]# crm
crm(live)# configure
crm(live)configure# show
node server1.example.com
node server2.example.com
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2"
crm(live)configure# primitive vmfence stonith:fence_xvm params pcmk_host_map="server1.example.com:base1;server2.example.com:base2" op monitor interval=1min
crm(live)configure# commit
The configuration is synchronized to server2 automatically:
[root@server2 cluster]# crm
crm(live)# configure
crm(live)configure# show
node server1.example.com
node server2.example.com
primitive vmfence stonith:fence_xvm \
params pcmk_host_map="server1.example.com:base1;server2.example.com:base2" \
op monitor interval="1min"
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2"
Now you can see that vmfence is already running on server1:
[root@server2 cluster]# crm_mon
Add a VIP:
[root@server1 cluster]# crm
crm(live)# configure
crm(live)configure# primitive vip ocf:heartbeat:IPaddr2 params ip=172.25.30.100 cidr_netmask=32 op monitor interval=30s
crm(live)configure# commit
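After the commit, the address can be checked on the node where the resource started (a quick sanity check, not part of the original transcript):

```
# the VIP 172.25.30.100/32 should appear among the node's addresses
[root@server1 cluster]# ip addr show
```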
[root@server1 cluster]# crm
crm(live)# configure
crm(live)configure# show
node server1.example.com
node server2.example.com
primitive vip ocf:heartbeat:IPaddr2 \
params ip="172.25.30.100" cidr_netmask="32" \
op monitor interval="30s"
primitive vmfence stonith:fence_xvm \
params pcmk_host_map="server1.example.com:base1;server2.example.com:base2" \
op monitor interval="1min"
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2"
[root@server2 cluster]# crm_mon
However, when corosync on server2 is stopped, the two-node cluster loses quorum and server1 stops managing its resources, so the fence_xvm monitor no longer runs.
Setting no-quorum-policy=ignore works around this: even without quorum, server1 keeps its resources running, takes over the VIP when corosync on server2 is stopped, and continues its health checks.
[root@server1 cluster]# crm
crm(live)# configure
crm(live)configure# show
node server1.example.com
node server2.example.com
primitive vip ocf:heartbeat:IPaddr2 \
params ip="172.25.30.100" cidr_netmask="32" \
op monitor interval="30s"
primitive vmfence stonith:fence_xvm \
params pcmk_host_map="server1.example.com:base1;server2.example.com:base2" \
op monitor interval="1min"
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
no-quorum-policy="ignore"
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
crm(live)configure# show
node server1.example.com
node server2.example.com
primitive vip ocf:heartbeat:IPaddr2 \
params ip="172.25.30.100" cidr_netmask="32" \
op monitor interval="30s"
primitive vmfence stonith:fence_xvm \
params pcmk_host_map="server1.example.com:base1;server2.example.com:base2" \
op monitor interval="1min"
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
no-quorum-policy="ignore"
[root@server2 cluster]# /etc/init.d/corosync stop
[root@server1 cluster]# crm_mon
When corosync on server2 is started again, server2 automatically takes the VIP back from server1, even though corosync on server1 has stayed up the whole time.
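This automatic failback happens because no resource stickiness is configured. If failback is not wanted, a default stickiness can be set (an optional tweak, not part of the original steps; the value 100 is an arbitrary example):

```
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# commit
```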
Add Apache:
Edit the httpd configuration file on both server1 and server2. Enabling /server-status for 127.0.0.1 lets the apache resource agent poll the service for its health checks:
[root@server2 ~]# vim /etc/httpd/conf/httpd.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 127.0.0.1
</Location>
[root@server1 ~]# vim /etc/httpd/conf/httpd.conf
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 127.0.0.1
</Location>
[root@server1 ~]# crm
crm(live)# configure
crm(live)configure# show
node server1.example.com
node server2.example.com
primitive vip ocf:heartbeat:IPaddr2 \
params ip="172.25.30.100" cidr_netmask="32" \
op monitor interval="30s"
primitive vmfence stonith:fence_xvm \
params pcmk_host_map="server1.example.com:base1;server2.example.com:base2" \
op monitor interval="1min"
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
no-quorum-policy="ignore"
crm(live)configure# primitive website ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=1min
crm(live)configure# commit
WARNING: website: default timeout 20s for start is smaller than the advised 40s
WARNING: website: default timeout 20s for stop is smaller than the advised 60s
crm(live)configure# commit
INFO: apparently there is nothing to commit
INFO: try changing something first
crm(live)configure# colocation website-with-ip inf: website vip
crm(live)configure# commit
At this point server2 is the active node and manages both the VIP and the website. When corosync on server2 is shut down, server1 automatically takes over the VIP and the website; when corosync on server2 is started again, server2 takes them back from server1.
[root@server2 ~]# crm_mon
[root@server2 ~]#/etc/init.d/corosync stop
[root@server2 ~]# crm_mon
Next, set up shared iSCSI storage, with server3 acting as the target. /dev/vda is used as the backing device (the first dd fails because /dev/vol0/demo does not exist; the second wipes the start of /dev/vda):
[root@server3 ~]# dd if=/dev/zero of=/dev/vol0/demo bs=1024 count=1
dd: opening `/dev/vol0/demo': No such file or directory
[root@server3 ~]# dd if=/dev/zero of=/dev/vda bs=1024 count=1
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000393783 s, 2.6 MB/s
[root@server3 ~]# /etc/init.d/tgtd start
Starting SCSI target daemon: [ OK ]
[root@server3 ~]# vim /etc/tgt/targets.conf
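The targets.conf edit itself is not shown. Based on the target name that later appears in the discovery output and the /dev/vda device prepared above, the definition presumably looks like:

```
<target iqn.2016-07.com.example:server.disk>
        backing-store /dev/vda
</target>
```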
[root@server3 ~]# tgt-admin -s
[root@server1 ~]# iscsiadm -m discovery -t st -p 172.25.30.3
Starting iscsid: [ OK ]
172.25.30.3:3260,1 iqn.2016-07.com.example:server.disk
[root@server1 ~]# iscsiadm -m node -l
Logging in to [iface: default, target: iqn.2016-07.com.example:server.disk, portal: 172.25.30.3,3260] (multiple)
Login to [iface: default, target: iqn.2016-07.com.example:server.disk, portal: 172.25.30.3,3260] successful.
[root@server1 ~]# fdisk -l
The configuration on server2 is the same as on server1:
[root@server2 ~]# iscsiadm -m discovery -t st -p 172.25.30.3
Starting iscsid: [ OK ]
172.25.30.3:3260,1 iqn.2016-07.com.example:server.disk
[root@server2 ~]# iscsiadm -m node -l
Logging in to [iface: default, target: iqn.2016-07.com.example:server.disk, portal: 172.25.30.3,3260] (multiple)
Login to [iface: default, target: iqn.2016-07.com.example:server.disk, portal: 172.25.30.3,3260] successful.
[root@server2 ~]# fdisk -l
Both server1 and server2 now see a shared disk, /dev/sdb. Partition and format it on server1:
[root@server1 ~]# fdisk -cu /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xbb3b10c8.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First sector (2048-16777215, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-16777215, default 16777215):
Using default value 16777215
Command (m for help): p
Disk /dev/sdb: 8589 MB, 8589934592 bytes
64 heads, 32 sectors/track, 8192 cylinders, total 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xbb3b10c8
Device Boot Start End Blocks Id System
/dev/sdb1 2048 16777215 8387584 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
Partitioning produces a /dev/sdb1 device. On server2 there is no need to partition again; re-reading the partition table is enough to pick it up:
[root@server2 ~]# partprobe
[root@server2 ~]# cat /proc/partitions
major minor #blocks name
8 0 8388608 sda
8 1 512000 sda1
8 2 7875584 sda2
253 0 7036928 dm-0
253 1 835584 dm-1
8 16 8388608 sdb
8 17 8387584 sdb1
[root@server2 ~]# ll /dev/sdb1
brw-rw---- 1 root disk 8, 17 Jul 19 16:17 /dev/sdb1
Once it is visible on both nodes, format /dev/sdb1 on server1:
[root@server1 ~]# mkfs.ext4 /dev/sdb1
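To verify the shared filesystem end to end, a test page can be placed on it before handing it over to the cluster (an extra step not in the original transcript; the mount is only temporary and the page content is a placeholder):

```
[root@server1 ~]# mount /dev/sdb1 /mnt
[root@server1 ~]# echo 'www.example.com' > /mnt/index.html
[root@server1 ~]# umount /mnt
```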
After formatting, add the filesystem as a cluster resource:
[root@server1 ~]# crm
crm(live)# configure
crm(live)configure# primitive webdata ocf:heartbeat:Filesystem params device=/dev/sdb1 directory=/var/www/html fstype=ext4 op monitor interval=1min
crm(live)configure# commit
WARNING: webdata: default timeout 20s for start is smaller than the advised 60
WARNING: webdata: default timeout 20s for stop is smaller than the advised 60
WARNING: webdata: default timeout 20s for monitor is smaller than the advised 40
crm(live)configure# show
node server1.example.com
node server2.example.com
primitive vip ocf:heartbeat:IPaddr2 \
params ip="172.25.30.100" cidr_netmask="32" \
op monitor interval="30s"
primitive vmfence stonith:fence_xvm \
params pcmk_host_map="server1.example.com:base1;server2.example.com:base2" \
op monitor interval="1min"
primitive webdata ocf:heartbeat:Filesystem \
params device="/dev/sdb1" directory="/var/www/html" fstype="ext4" \
op monitor interval="1min"
primitive website ocf:heartbeat:apache \
params configfile="/etc/httpd/conf/httpd.conf" \
op monitor interval="1min"
colocation website-with-ip inf: website vip
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
no-quorum-policy="ignore"
[root@server2 ~]# crm_mon
crm(live)configure# group apacheservice vip webdata website
INFO: resource references in colocation:website-with-ip updated
INFO: resource references in colocation:website-with-ip updated
crm(live)configure# cd
There are changes pending. Do you want to commit them? Yes
[root@server2 ~]# crm_mon
crm(live)# resource
crm(live)resource# show