MFS high availability (pacemaker + corosync + Fence for mfsmaster)
1 MFS high availability
Pacemaker is the control center of the high-availability cluster: it manages the state and behaviour of every resource in the cluster, and administrators use Pacemaker to configure, manage and monitor the whole cluster. Pacemaker relies on the messaging and membership services provided by the cluster infrastructure layer (Corosync or Heartbeat) to detect failures at node and resource level and to recover resources, so that the cluster services stay as highly available as possible.
Pacemaker itself is only a resource manager and does not provide cluster heartbeat information. Since every HA cluster needs a heartbeat/membership mechanism, Pacemaker obtains its heartbeat from Corosync or Heartbeat.
FENCE device: the cluster may need to power a node off in order to guarantee the integrity of shared data and of resource recovery, so Pacemaker introduces node fencing, implemented by the STONITH subsystem. STONITH is a forced isolation mechanism, usually realised by driving a remote power switch to turn a node off or on. In Pacemaker a STONITH device is configured as a resource in the cluster information base (CIB), so failures of the fencing device itself can easily be monitored. When the cluster manager wants to fence a node, a stonithd client simply issues a fencing request for that node; stonithd does the rest of the work automatically, and the STONITH device that was configured as a cluster resource eventually answers the request and fences the node.
2 Sharing the master's data (iSCSI)
iSCSI is a technology for mapping a remote storage device to the local host, where it shows up as a block device; from an ordinary user's point of view the mapped disk is indistinguishable from a locally installed one. iSCSI works over TCP/IP: on the storage side the iscsi target (target end) turns the machine into a disk-serving server, and on the client side the iscsi initiator logs in to that target so the exported disk can be mounted and used like a local one.
(1) server2 acts as the iSCSI server (target)
- Unmount the chunk directory (server2 was previously serving as a chunkserver):
[root@server2 ~]# umount /mnt/chunk1/
[root@server2 ~]# fdisk /dev/vdb
Welcome to fdisk (util-linux 2.23.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): d
Selected partition 1
Partition 1 is deleted
Command (m for help): p
Disk /dev/vdb: 10.7 GB, 10737418240 bytes, 20971520 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xddafcca9
Device Boot Start End Blocks Id System
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@server2 ~]# cat /proc/partitions
major minor #blocks name
252 0 20971520 vda
252 1 1048576 vda1
252 2 19921920 vda2
253 0 17821696 dm-0
253 1 2097152 dm-1
252 16 10485760 vdb
(2) Install the iSCSI target administration tool (targetcli)
yum install targetcli -y
- Carve up and export the shared device:
[root@server2 ~]# targetcli
Warning: Could not load preferences file /root/.targetcli/prefs.bin.
targetcli shell version 2.1.fb46
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.
/> ls
o- / ......................................................................................................................... [...]
o- backstores .............................................................................................................. [...]
| o- block .................................................................................................. [Storage Objects: 0]
| o- fileio ................................................................................................. [Storage Objects: 0]
| o- pscsi .................................................................................................. [Storage Objects: 0]
| o- ramdisk ................................................................................................ [Storage Objects: 0]
o- iscsi ............................................................................................................ [Targets: 0]
o- loopback ......................................................................................................... [Targets: 0]
/> /backstores/block create my_disk /dev/vdb
Created block storage object my_disk using /dev/vdb.
/> /iscsi create iqn.2020-11.org.westos:strage1
Created target iqn.2020-11.org.westos:strage1.
Created TPG 1.
Global pref auto_add_default_portal=true
Created default portal listening on all IPs (0.0.0.0), port 3260.
/> /iscsi/iqn.2020-11.org.westos:strage1/tpg1/luns create /backstores/block/my_disk
Created LUN 0.
/> /iscsi/iqn.2020-11.org.westos:strage1/tpg1/acls create iqn.2020-11.org.westos:westoskey
Created Node ACL for iqn.2020-11.org.westos:westoskey
Created mapped LUN 0.
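The capture ends here. As an additional, assumed step (not shown in the original), the target configuration is normally saved and the target service enabled so the export survives a reboot, and port 3260/tcp is opened if firewalld is running on server2:
/> saveconfig                                   ## also saved automatically when exiting targetcli
/> exit
[root@server2 ~]# systemctl enable --now target ## restores the saved target config at boot (if the unit is available)
[root@server2 ~]# firewall-cmd --permanent --add-port=3260/tcp && firewall-cmd --reload   ## only needed if firewalld is active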
(3) Install the iSCSI initiator (on server1 and server5)
yum install iscsi-initiator-utils.x86_64 -y
- Check the key: edit the configuration file vim /etc/iscsi/initiatorname.iscsi so that it matches the key set in the ACL on the target side
InitiatorName=iqn.2020-11.org.westos:westoskey
- Start the iscsid service:
systemctl start iscsid.service
(4) Discover the target and log in
iscsiadm -m discovery -t st -p 172.25.12.2 ## discover the target exported by server2
iscsiadm -m node -T iqn.2020-11.org.westos:strage1 -p 172.25.12.2 -l ## log in to the device shared by server2
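After a successful login, the shared disk appears on the initiator as a new SCSI block device (used as /dev/sda below). A quick way to confirm this, for example:
lsblk                    ## the 10 GB iSCSI disk should show up (here as sda)
cat /proc/partitions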
(5) Partition the shared device
[root@server1 ~]# fdisk /dev/sda
Welcome to fdisk (util-linux 2.23.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-20971519, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-20971519, default 20971519):
Using default value 20971519
Partition 1 of type Linux and of size 10 GiB is set
Command (m for help): p
Disk /dev/sda: 10.7 GB, 10737418240 bytes, 20971520 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xddafcca9
Device Boot Start End Blocks Id System
/dev/sda1 2048 20971519 10484736 83 Linux
Command (m for help): w
The partition table has been altered!
- Check the partition table:
fdisk -l
- Format the partition:
mkfs.ext4 /dev/sda1
- Mount the formatted partition /dev/sda1 and copy the master data directory from server1 onto it, so the same data can later be used by the master on server5:
[root@server1 ~]# mount /dev/sda1 /mnt
[root@server1 ~]# cd /var/lib/mfs/ ## the mfs data directory
[root@server1 mfs]# ls
changelog.0.mfs changelog.3.mfs metadata.crc metadata.mfs.back.1 stats.mfs
changelog.2.mfs changelog.4.mfs metadata.mfs metadata.mfs.empty
[root@server1 mfs]# cp -p * /mnt/ ## copy all data files from /var/lib/mfs onto /dev/sda1, preserving ownership and permissions
[root@server1 mfs]# cd /mnt/
[root@server1 mnt]# ls
changelog.0.mfs changelog.3.mfs lost+found metadata.mfs metadata.mfs.empty
changelog.2.mfs changelog.4.mfs metadata.crc metadata.mfs.back.1 stats.mfs
[root@server1 mnt]# chown mfs.mfs /mnt/ ## the directory must be owned by the mfs user and group, otherwise the master cannot use it
[root@server1 mnt]# cd
[root@server1 ~]# umount /mnt/
[root@server1 ~]# mount /dev/sda1 /var/lib/mfs/ ## mount the data on /dev/sda1 at /var/lib/mfs/
[root@server1 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/rhel-root 17811456 1275780 16535676 8% /
devtmpfs 495352 0 495352 0% /dev
tmpfs 507372 54192 453180 11% /dev/shm
tmpfs 507372 13292 494080 3% /run
tmpfs 507372 0 507372 0% /sys/fs/cgroup
/dev/vda1 1038336 135076 903260 14% /boot
tmpfs 101476 0 101476 0% /run/user/0
/dev/sda1 10189076 41788 9606668 1% /var/lib/mfs
[root@server1 ~]# cd /var/lib/mfs/ ## the copied data is now visible through the mount point
[root@server1 mfs]# ls
changelog.0.mfs changelog.3.mfs lost+found metadata.mfs metadata.mfs.empty
changelog.2.mfs changelog.4.mfs metadata.crc metadata.mfs.back.1 stats.mfs
- Restart the master service; if it starts successfully, the data files were copied correctly and the shared disk is working:
systemctl restart moosefs-master
- On server1, unmount /dev/sda1 again (see the note below):
umount /dev/sda1
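Note: /var/lib/mfs cannot be unmounted while moosefs-master is still running on top of it, so presumably the service is stopped first (this step is not shown in the original capture):
systemctl stop moosefs-master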
(6) server5 shares server1's mfs data
- Rescan the partition table on server5:
partprobe
[root@server5 ~]# mount /dev/sda1 /var/lib/mfs ## manually mount the shared disk to pick up server1's master data
[root@server5 ~]# cd /var/lib/mfs
[root@server5 mfs]# ls
changelog.0.mfs changelog.3.mfs lost+found metadata.mfs.back metadata.mfs.empty
changelog.2.mfs changelog.4.mfs metadata.crc metadata.mfs.back.1 stats.mfs
[root@server5 mfs]# systemctl restart moosefs-master ## the service starts, so the data sharing works
[root@server5 mfs]# umount /dev/sda1 ## unmount again
On both server1 and server5:
- Edit the MFS service unit file (the exact change is sketched after these steps):
vim /usr/lib/systemd/system/moosefs-master.service
- Reload the systemd configuration:
systemctl daemon-reload
- Restart the service:
systemctl restart moosefs-master
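The original does not show what is changed inside the unit file. A common adjustment for this kind of shared-metadata failover setup (an assumption here, not confirmed by the capture) is to start the master with the -a option, so it can automatically recover the metadata from the changelogs after an unclean shutdown:
# in the [Service] section of /usr/lib/systemd/system/moosefs-master.service
ExecStart=/usr/sbin/mfsmaster -a    ## assumed change: -a enables automatic metadata recovery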
3 Deploying Pacemaker (high availability)
(1) Write the yum repository file used to install Pacemaker:
[dvd]
name=rhel7.6
baseurl=http://172.25.12.250/rhel7.6
gpgcheck=0
[dvd1]
name=HighAvailability
baseurl=http://172.25.12.250/rhel7.6/addons/HighAvailability
gpgcheck=0
(2) Install the software (on server1 and server5)
yum install -y pacemaker pcs psmisc policycoreutils-python
- Enable and start the pcsd service:
systemctl enable --now pcsd.service
(3) Authenticate the cluster nodes to each other
- Set the same password for the hacluster user on every cluster node:
echo westos|passwd --stdin hacluster
The hacluster user is created automatically when pcs is installed. A different user could also be created instead, but it must exist with exactly the same name and the same password on every cluster node.
- Authenticate the nodes as the hacluster user:
pcs cluster auth server1 server5
The user name entered must be the hacluster user created automatically by pcs; otherwise the nodes cannot be added.
(4) Create and start a cluster named mycluster with server1 and server5 as members
pcs cluster setup --name mycluster server1 server5
- Start the cluster and enable the pacemaker (resource management) and corosync (heartbeat) services at boot:
pcs cluster start --all
pcs cluster enable --all
- Check the node/ring status:
corosync-cfgtool -s
- Check the cluster status:
pcs status
- Check that the cluster configuration is valid:
crm_verify -LV
- While no Fencing device is configured yet, crm_verify reports a STONITH error, so disable the STONITH component for now (stonith-enabled=false); it is re-enabled after Fence is deployed in section 4:
pcs property set stonith-enabled=false
(5) Create the cluster resources with pcs
- Add a virtual IP to the cluster, defining a vip resource with the ocf:heartbeat:IPaddr2 agent:
pcs resource create vip ocf:heartbeat:IPaddr2 ip=172.25.12.100 cidr_netmask=32 op monitor interval=30s
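Once the vip resource is started, the address is bound on whichever node is running it; this can be checked, for example, with:
ip addr show | grep 172.25.12.100   ## run on the node currently holding the vip resource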
- Edit the hosts file (name resolution) on server1 through server5 so that mfsmaster points at the VIP:
vim /etc/hosts
172.25.12.100 mfsmaster
- Create the mfsdata resource (the shared ext4 filesystem on /dev/sda1):
pcs resource create mfsdata ocf:heartbeat:Filesystem device=/dev/sda1 directory=/var/lib/mfs/ fstype=ext4 op monitor interval=30s
- Create the mfsmaster resource for the moosefs-master systemd service:
pcs resource create mfsmaster systemd:moosefs-master op monitor interval=60s
- Group the resources so that they all run on the same host; the order inside the group is strict, so add them in the same order the resources were created (vip, mfsdata, mfsmaster):
pcs resource group add mfsgroup vip mfsdata mfsmaster
- At this point all of the services are concentrated on a single host; a quick check follows.
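To confirm, for example, the group's resources should all show as Started on the same node:
pcs status   ## vip, mfsdata and mfsmaster of mfsgroup should all be Started on one node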
(6) Failover test
- Put server1 into standby so it stops hosting resources; the resources running on it then migrate to the backup master, server5:
pcs node standby
- Test from the browser:
- Bring server1 back online:
pcs node unstandby
After server1 is set back to unstandby, its state shows as Online again, but the resources running on server5 do not migrate back automatically (see the example below).
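If the group should run on server1 again after it returns, it can be moved back explicitly (an extra step, not in the original):
pcs resource move mfsgroup server1  ## adds a location constraint preferring server1
pcs constraint show                 ## the generated constraint can be reviewed and removed later when no longer wanted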
4 Deploying Fence
(1) Deploy fence on the physical (virtualization) host
- Stop the firewall:
systemctl stop firewalld.service
- Disable SELinux (set SELINUX=disabled; a reboot is needed for it to take effect):
vim /etc/selinux/config
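The original only shows disabling the firewall and SELinux on the physical host. The remaining host-side steps are sketched below as an assumption (package names as commonly used on RHEL-family hosts): install fence-virtd with its libvirt and multicast modules, generate the fence_xvm key, and configure and start the fence_virtd daemon:
yum install -y fence-virtd fence-virtd-libvirt fence-virtd-multicast
mkdir -p /etc/cluster
dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=128 count=1   ## shared key, later copied to the cluster nodes
fence_virtd -c                                                    ## interactive setup: multicast listener, libvirt backend
systemctl enable --now fence_virtd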
(2) On the master nodes (server1 and server5): yum install fence-virt.x86_64 -y
- Query the metadata of the fence agent:
stonith_admin -M -a fence_xvm
- Create the key directory on server1 and server5:
mkdir /etc/cluster
- Copy the key file created on the physical host to the default key location on server1 and server5:
scp root@172.25.12.250:/etc/cluster/fence_xvm.key /etc/cluster/fence_xvm.key
(3) Add the fence resource in pacemaker
pcs stonith create vmfence fence_xvm pcmk_host_map="server1:mfs1;server5:mfs5" op monitor interval=60s
- Note: in pcmk_host_map, each mapping between a host name and a virtual-machine (domain) name is written in the order hostname:vm-name.
(4) Re-enable the STONITH component:
pcs property set stonith-enabled=true
- Verify the cluster configuration:
crm_verify -LV
(5) Test:
- Trigger a kernel crash on server1:
echo c > /proc/sysrq-trigger
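If fencing works, server1 should be powered off and rebooted by fence_xvm, and the mfsgroup resources should fail over to server5. This can be watched from server5, for example with:
crm_mon            ## or pcs status: vmfence reboots server1 and mfsgroup moves to server5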