Lab environment:
HA nodes: server1, server2 (VMs)
Backend servers: server3, server4 (VMs)
Fence service node: foundation5 (the physical host)
This lab builds on the earlier active/standby (hot-standby) setup by adding an internal fence device, a protection mechanism that guards against split-brain.
High availability removes the single point of failure in a service cluster, but servers can still fail in unpredictable ways. For example, the active scheduler node in an HA cluster may hang (or its system may crash). The hung node stops sending heartbeats, so the standby begins taking over the resources; but because the hung node never actually released those resources, both nodes can end up holding them at once, which causes split-brain on the shared storage.
Fence devices
To prevent split-brain, a cluster usually includes a fence device. Some use the server's own hardware management interface (internal fencing); others use an external power device such as a power switch (external fencing). When a server stops responding and times out, the fence device sends it a hardware-level command to power-cycle or shut it down, and signals the other nodes to take over its services.
Deploying fence
Listener side (the physical host)
Installing the software
First, install the fence_virtd service components on the physical host that manages the VMs:
fence-virtd.x86_64
fence-virtd-multicast.x86_64
fence-virtd-libvirt.x86_64
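The three packages above can be installed in one step; a minimal sketch, assuming yum and these package names are available on the physical host:

```shell
# Install the fence_virtd daemon plus its multicast listener and
# libvirt backend on the physical host
yum install -y fence-virtd fence-virtd-multicast fence-virtd-libvirt
```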
Configuring fence_virtd
Run fence_virtd -c to walk through the fence configuration interactively:
[root@foundation5 ~]# fence_virtd -c
Module search path [/usr/lib64/fence-virt]:
Available backends:
libvirt 0.3
Available listeners:
multicast 1.2
Listener modules are responsible for accepting requests
from fencing clients.
Listener module [multicast]:
The multicast listener module is designed for use environments
where the guests and hosts may communicate over a network using
multicast.
The multicast address is the address that a client will use to
send fencing requests to fence_virtd.
Multicast IP Address [225.0.0.12]:
Using ipv4 as family.
Multicast IP Port [1229]:
Setting a preferred interface causes fence_virtd to listen only
on that interface. Normally, it listens on all interfaces.
In environments where the virtual machines are using the host
machine as a gateway, this *must* be set (typically to virbr0).
Set to 'none' for no interface.
Interface [virbr0]: br0
The key file is the shared key information which is used to
authenticate fencing requests. The contents of this file must
be distributed to each physical host and virtual machine within
a cluster.
Key File [/etc/cluster/fence_xvm.key]:
Backend modules are responsible for routing requests to
the appropriate hypervisor or management layer.
Backend module [libvirt]:
The libvirt backend module is designed for single desktops or
servers. Do not use in environments where virtual machines
may be migrated between hosts.
Libvirt URI [qemu:///system]:
Configuration complete.
=== Begin Configuration ===
backends {
libvirt {
uri = "qemu:///system";
}
}
listeners {
multicast {
port = "1229";
family = "ipv4";
interface = "br0";
address = "225.0.0.12";
key_file = "/etc/cluster/fence_xvm.key";
}
}
fence_virtd {
module_path = "/usr/lib64/fence-virt";
backend = "libvirt";
listener = "multicast";
}
=== End Configuration ===
Replace /etc/fence_virt.conf with the above [y/N]? y
At this point the authentication key file is still missing. Per the Key File setting above (/etc/cluster/fence_xvm.key), create the file and fill it with random bytes to complete the key setup:
[root@foundation5 cluster]# dd if=/dev/urandom of=fence_xvm.key bs=128 count=1
1+0 records in
1+0 records out
128 bytes copied, 0.000145565 s, 879 kB/s
[root@foundation5 cluster]# cat fence_xvm.key
(unreadable binary output; the key is 128 bytes of raw random data)
[root@foundation5 cluster]#
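Since the key is raw binary, cat prints garbage; od gives a readable hex view instead. A self-contained sketch (using a temp directory rather than /etc/cluster so it can run anywhere):

```shell
# Generate a 128-byte key in a temp dir and inspect it in hex instead of cat
tmpdir=$(mktemp -d)
dd if=/dev/urandom of="$tmpdir/fence_xvm.key" bs=128 count=1 2>/dev/null
stat -c %s "$tmpdir/fence_xvm.key"                 # prints 128 (key size in bytes)
od -A x -t x1 "$tmpdir/fence_xvm.key" | head -n 2  # first bytes of the key, in hex
```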
Restart the fence_virtd service:
systemctl restart fence_virtd.service
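After the restart it is worth confirming that the daemon came up and is listening on the UDP port configured above; a sketch, assuming systemd and the multicast listener on port 1229:

```shell
# Confirm fence_virtd is running and bound to the multicast UDP port
systemctl is-active fence_virtd.service   # should print "active"
ss -ulnp | grep 1229                      # fence_virtd should show on UDP 1229
```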
HA nodes
Installing the software:
fence-virt.x86_64
yum install fence-virt.x86_64
Adding the fence resource
List all available fence agents:
[root@server1 ~]# pcs stonith list
fence_virt - Fence agent for virtual machines
fence_xvm - Fence agent for virtual machines
Here we use fence_xvm.
Copy the key file created on the physical host to /etc/cluster on each HA node:
[root@foundation5 cluster]# scp /etc/cluster/fence_xvm.key server1:/etc/cluster/
fence_xvm.key 100% 128 250.5KB/s 00:00
[root@foundation5 cluster]# scp /etc/cluster/fence_xvm.key server2:/etc/cluster/
fence_xvm.key 100% 128 301.0KB/s 00:00
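If /etc/cluster does not yet exist on the nodes, scp fails; creating the directory first avoids that. A sketch, assuming ssh access to both nodes:

```shell
# Make sure the target directory exists on both HA nodes before copying the key
for node in server1 server2; do
    ssh "$node" mkdir -p /etc/cluster
done
```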
Add the fence resource in pacemaker:
[root@server1 ~]# pcs stonith create vmfence fence_xvm pcmk_host_map="server1:server1;server2:server2" op monitor interval=60s
[root@server1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server1 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Aug 9 10:46:55 2020
Last change: Sun Aug 9 10:46:50 2020 by root via cibadmin on server1
2 nodes configured
3 resources configured
Online: [ server1 server2 ]
Full list of resources:
Resource Group: hagroup
vip (ocf::heartbeat:IPaddr2): Started server1
haproxy (systemd:haproxy): Started server1
vmfence (stonith:fence_xvm): Started server2
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Note: when mapping host names to VM names in pcmk_host_map, the order is hostname:VM-name (the cluster node name first, then the libvirt domain name).
This completes the fence deployment!
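If stonith-enabled was set to false during the initial cluster setup (a common step before a fence device exists), it must be switched back on or pacemaker will never invoke vmfence; a sketch using standard pcs commands:

```shell
# Re-enable STONITH and review the fence resource's configuration
pcs property set stonith-enabled=true
pcs stonith show vmfence
```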
Testing: trigger fencing manually to verify that it works.
The current pacemaker resources are vip and haproxy, grouped in the resource group hagroup:
[root@server2 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Aug 9 11:03:40 2020
Last change: Sun Aug 9 10:46:50 2020 by root via cibadmin on server1
2 nodes configured
3 resources configured
Online: [ server1 server2 ]
Full list of resources:
Resource Group: hagroup
vip (ocf::heartbeat:IPaddr2): Started server2
haproxy (systemd:haproxy): Started server2
vmfence (stonith:fence_xvm): Started server1
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@server2 ~]# fence_xvm -H server2
packet_write_wait: Connection to 172.25.5.2 port 22: Broken pipe
(the SSH session to server2 drops immediately: the node has just been fenced and is rebooting)
[kiosk@foundation5 Desktop]$ ssh root@server2
root@server2's password:
Last login: Sun Aug 9 10:57:41 2020 from foundation5
[root@server2 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: server1 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Sun Aug 9 11:04:58 2020
Last change: Sun Aug 9 10:46:50 2020 by root via cibadmin on server1
2 nodes configured
3 resources configured
Online: [ server1 server2 ]
Full list of resources:
Resource Group: hagroup
vip (ocf::heartbeat:IPaddr2): Started server1
haproxy (systemd:haproxy): Started server1
vmfence (stonith:fence_xvm): Started server2
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
The resources migrated successfully, and the fenced node powered off and came back up on its own!
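fence_xvm can also be exercised from the physical host; -o selects the action and -H names the libvirt domain (options from the fence_xvm agent):

```shell
# List the domains fence_virtd can see, then check one node's power state
fence_xvm -o list
fence_xvm -o status -H server2
```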
As the output shows, the vmfence resource and the resource group never run on the same host.