How to Build a Highly Available Dual-Node Lustre Cluster

1. Building a dual-node Lustre high-availability cluster:

1. Environment:

Hostname   OS           Lustre targets (on shared disks)   IP addresses            Lustre fsname   RAM
mds001     CentOS 7.9   1 MGS, 1 MDT, 2 OSTs               192.168.10.21/209.21    global          1 GB
mds002     CentOS 7.9   1 MGS, 1 MDT, 2 OSTs               192.168.10.22/209.22    global          1 GB
client     CentOS 7.9   -                                  192.168.10.41           -               1 GB

mds001 and mds002 act as both MDS and OSS nodes. They share five disks (1 MGS, 2 MDTs, and 2 OSTs), which together form a highly available Lustre cluster.

2. Sharing disks between the two virtual machines:

  • Prerequisite: neither virtual machine has any snapshots.

  • On the mds001 host:

    • Add five 5 GB disks:

      SCSI > Create New Virtual Disk > specify the disk capacity, allocate all disk space now, store virtual disk as a single file

  • In the VM directory of both mds001 and mds002, find the file with the .vmx extension and append the following to the end of that file:

    scsi1.sharedBus = "virtual"
    disk.locking = "false"
    diskLib.dataCacheMaxSize = "0"
    diskLib.dataCacheMaxReadAheadSize = "0"
    diskLib.dataCacheMinReadAheadSize = "0"
    diskLib.dataCachePageSize = "4096"
    diskLib.maxUnsyncedWrites = "0"
    disk.EnableUUID = "TRUE"
  • Reboot both virtual machines and confirm that the disks were added successfully:

    [root@mds001 ~]# lsblk
    NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sda               8:0    0   20G  0 disk 
    ├─sda1            8:1    0    1G  0 part /boot
    └─sda2            8:2    0   19G  0 part 
      ├─centos-root 253:0    0   17G  0 lvm  /
      └─centos-swap 253:1    0    2G  0 lvm  [SWAP]
    sdb               8:16   0    5G  0 disk 
    sdc               8:32   0    5G  0 disk 
    sdd               8:48   0    5G  0 disk 
    sde               8:64   0    5G  0 disk 
    sdf               8:80   0    5G  0 disk 
    sr0              11:0    1  9.5G  0 rom  /mnt/cdrom
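  • Optionally, run a quick sanity check on both nodes (a minimal sketch; with disk.EnableUUID set, matching serial numbers confirm that sdb–sdf really are the same shared devices, although the SERIAL column may be empty on some setups):

    # Run on mds001 and on mds002; the five 5G disks should show identical sizes
    # (and, if reported, identical serial numbers) on both nodes.
    lsblk -d -o NAME,SIZE,SERIAL /dev/sd[b-f]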

3. Installing and configuring Lustre:

  • Both hosts need the packages required by the OSS service. The e2fsprogs packages are simply the stock ext4 RPMs with Lustre support added on top:

    mkdir ~/e2fsprogs && cd ~/e2fsprogs
    wget -c -r -nd https://downloads.whamcloud.com/public/e2fsprogs/1.44.5.wc1/el7/RPMS/x86_64/
    rm -rf index.html* unknown.gif *.gif sha256sum
  • Install all of the RPMs:

    [root@mds001 e2fsprogs]# cd ~/e2fsprogs && rpm -Uvh *
    准备中...                          ################################# [100%]
    正在升级/安装...
       1:libcom_err-1.42.12.wc1-4.el7.cent################################# [  8%]
       2:e2fsprogs-libs-1.42.12.wc1-4.el7.################################# [ 15%]
       3:libcom_err-devel-1.42.12.wc1-4.el################################# [ 23%]
       4:libss-1.42.12.wc1-4.el7.centos   ################################# [ 31%]
       5:e2fsprogs-1.42.12.wc1-4.el7.cento################################# [ 38%]
       6:libss-devel-1.42.12.wc1-4.el7.cen################################# [ 46%]
       7:e2fsprogs-devel-1.42.12.wc1-4.el7################################# [ 54%]
       8:e2fsprogs-static-1.42.12.wc1-4.el################################# [ 62%]
       9:e2fsprogs-debuginfo-1.42.12.wc1-4################################# [ 69%]
    正在清理/删除...
      10:e2fsprogs-1.42.9-19.el7          ################################# [ 77%]
      11:e2fsprogs-libs-1.42.9-19.el7     ################################# [ 85%]
      12:libss-1.42.9-19.el7              ################################# [ 92%]
      13:libcom_err-1.42.9-19.el7         ################################# [100%]
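  • A quick way to confirm that the Whamcloud build is now the active one (a minimal sketch):

    rpm -q e2fsprogs              # should report a ".wc" (Whamcloud) release
    dumpe2fs 2>&1 | head -n 1     # the version banner should match the installed package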
  • Both hosts also need the packages required by the MDS service:

    wget options:
      -c    resume interrupted downloads
      -r    recursive download
      -nd   no directory hierarchy; save everything into the current directory
    RPM packages:
      kernel-*.el7_lustre.x86_64.rpm              Linux kernel with the Lustre patches applied
      kmod-lustre-*.el7.x86_64.rpm                Lustre kernel modules for the patched kernel
      kmod-lustre-osd-ldiskfs-*.el7.x86_64.rpm    ldiskfs-based Lustre backend filesystem modules
      lustre-*.el7.x86_64.rpm                     Lustre command-line tools
      lustre-osd-ldiskfs-mount-*.el7.x86_64.rpm   mount.lustre and mkfs.lustre helpers for ldiskfs
    mkdir ~/lustre2.12.1 && cd ~/lustre2.12.1
    yum install -y wget
    wget \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kernel-3.10.0-957.10.1.el7_lustre.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kmod-lustre-2.12.1-1.el7.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kmod-lustre-osd-ldiskfs-2.12.1-1.el7.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/lustre-2.12.1-1.el7.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/lustre-osd-ldiskfs-mount-2.12.1-1.el7.x86_64.rpm
  • Install the dependencies (otherwise rpm fails with "error: Failed dependencies:"):

    yum clean all && yum repolist
    yum install -y linux-firmware dracut selinux-policy-targeted kexec-tools libyaml perl
  • Install all of the RPMs (force the installation if it fails otherwise):

    cd ~/lustre2.12.1 && rpm -ivh *.rpm --force
  • Reboot the server:

    init 6
  • Check the kernel:

    [root@master ~]# uname -r
    3.10.0-957.el7_lustre.x86_64
  • Load the Lustre modules (this only loads them for the current boot and does not persist across reboots; see the sketch after the output below for making it persistent):

    [root@master ~]# modprobe lustre && lsmod | grep lustre
    lustre                758679  0 
    lmv                   177987  1 lustre
    mdc                   232938  1 lustre
    lov                   314581  1 lustre
    ptlrpc               2264705  7 fid,fld,lmv,mdc,lov,osc,lustre
    obdclass             1962422  8 fid,fld,lmv,mdc,lov,osc,lustre,ptlrpc
    lnet                  595941  6 lmv,osc,lustre,obdclass,ptlrpc,ksocklnd
    libcfs                421295  11 fid,fld,lmv,mdc,lov,osc,lnet,lustre,obdclass,ptlrpc,ksocklnd
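  • To load the lustre module automatically at boot, a systemd modules-load.d drop-in can be used (a minimal sketch; the file name lustre.conf under /etc/modules-load.d is our own choice):

    cat << eof > /etc/modules-load.d/lustre.conf
    lustre
    eof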
  • Check the Lustre version:

    [root@mds001 ~]# modinfo lustre
    filename:       /lib/modules/3.10.0-957.10.1.el7_lustre.x86_64/extra/lustre/fs/lustre.ko
    license:        GPL
    version:        2.12.1
    description:    Lustre Client File System
    author:         OpenSFS, Inc. <http://www.lustre.org/>
    retpoline:      Y
    rhelversion:    7.6
    srcversion:     E50D950B04B4044ABCBCFA3
    depends:        obdclass,ptlrpc,libcfs,lnet,lmv,mdc,lov
    vermagic:       3.10.0-957.10.1.el7_lustre.x86_64 SMP mod_unload modversions 
  • An MGS, MDT, or OST only actually comes into existence once its device has been mounted on one of the nodes.

  • Format all five disks on mds001; because the disks are shared, the resulting format is also visible from mds002:

    mkfs.lustre options:
      --fsname        name of the Lustre filesystem; identifies the cluster and must be unique
      --servicenode   NID (IP@LNet network) of a node that can serve this target (used for failover)
      --mgsnode       NID of an MGS node
      --mgs           format the device as the MGS; the MGS (ManaGement Server) records the state of the whole Lustre filesystem
      --mdt           format the device as an MDT; an MDT (MetaData Target) stores the Lustre metadata
      --ost           format the device as an OST; an OST (Object Storage Target) stores the Lustre file data
      --index         index of the target within the filesystem, e.g. --mdt --index=1 produces LABEL="global-MDT0001"; must be unique per target type
      --reformat      reformat an existing Lustre target, skipping the check that would otherwise refuse to overwrite it
      --replace       register the target as a replacement for an existing target with the same index
    # The --index values for MDTs and OSTs must start at 0; otherwise /var/log/messages shows errors about waiting for the expected sequence/index.
    
    mkfs.lustre --mgs --servicenode=192.168.209.21@tcp2 --servicenode=192.168.209.22@tcp2 --backfstype=ldiskfs --reformat /dev/sdb
    mkfs.lustre --fsname global --mdt --index=0 --servicenode=192.168.209.21@tcp2 --servicenode=192.168.209.22@tcp2 --mgsnode=192.168.209.21@tcp2 --mgsnode=192.168.209.22@tcp2 --backfstype=ldiskfs --reformat /dev/sdc
    mkfs.lustre --fsname global --mdt --index=1 --servicenode=192.168.209.21@tcp2 --servicenode=192.168.209.22@tcp2 --mgsnode=192.168.209.21@tcp2 --mgsnode=192.168.209.22@tcp2 --backfstype=ldiskfs --reformat /dev/sdd
    mkfs.lustre --fsname global --ost --index=0 --servicenode=192.168.209.21@tcp2 --servicenode=192.168.209.22@tcp2 --mgsnode=192.168.209.21@tcp2 --mgsnode=192.168.209.22@tcp2 --backfstype=ldiskfs --reformat /dev/sde
    mkfs.lustre --fsname global --ost --index=1 --servicenode=192.168.209.21@tcp2 --servicenode=192.168.209.22@tcp2 --mgsnode=192.168.209.21@tcp2 --mgsnode=192.168.209.22@tcp2 --backfstype=ldiskfs --reformat /dev/sdf
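  • Optionally, confirm how each target was formatted without touching the data (a minimal sketch; tunefs.lustre --dryrun only prints the current parameters):

    for d in /dev/sd{b,c,d,e,f}; do
      echo "== $d =="
      tunefs.lustre --dryrun $d
    done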
  • From mds002 the format type can be checked:

    [root@mds002 ~]# blkid /dev/sdc
    /dev/sdc: LABEL="global:MDT0000" UUID="d99a724e-d199-4f59-9a4e-7164882f754b" TYPE="ext4" 
  • Create the mount points that the Corosync-managed mount resources will use:

    [root@mds001 ~]# mkdir /mnt/mgs;mkdir /mnt/mdt1;mkdir /mnt/mdt2;mkdir /mnt/ost1;mkdir /mnt/ost2
    [root@mds002 ~]# mkdir /mnt/mgs;mkdir /mnt/mdt1;mkdir /mnt/mdt2;mkdir /mnt/ost1;mkdir /mnt/ost2

4. Setting up time synchronization with chrony:

  • Install chrony on both nodes so they can synchronize with an upstream (Aliyun) time server:

    [root@mds001 ~]# yum -y install chrony
  • Enable and start the service:

    [root@mds001 ~]# systemctl enable chronyd;systemctl start chronyd
  • Edit the configuration file /etc/chrony.conf:

    [root@mds001 ~]# sed -i '/^server [0-9]/d' /etc/chrony.conf
    [root@mds001 ~]# sed -i '2a server 192.168.10.24 iburst' /etc/chrony.conf
    [root@mds001 ~]# sed -i 's/#allow 192.168.0.0\/16/allow 192.168.10.0\/24/' /etc/chrony.conf
    [root@mds001 ~]# sed -i 's/#local stratum 10/local stratum 10/' /etc/chrony.conf
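  • The relevant lines in /etc/chrony.conf should now look roughly as follows (a quick check, assuming the sed edits above applied cleanly):

    grep -E '^(server|allow|local)' /etc/chrony.conf
    # expected (roughly):
    #   server 192.168.10.24 iburst
    #   allow 192.168.10.0/24
    #   local stratum 10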
  • Restart the service:

    [root@mds001 ~]# systemctl restart chronyd
  • Check the time synchronization status:

    [root@mds001 ~]# timedatectl status
          Local time: 三 2023-07-26 23:02:14 EDT
      Universal time: 四 2023-07-27 03:02:14 UTC
            RTC time: 四 2023-07-27 03:02:14
           Time zone: America/New_York (EDT, -0400)
         NTP enabled: yes
    NTP synchronized: yes
     RTC in local TZ: no
          DST active: yes
     Last DST change: DST began at
                      日 2023-03-12 01:59:59 EST
                      日 2023-03-12 03:00:00 EDT
     Next DST change: DST ends (the clock jumps one hour backwards) at
                      日 2023-11-05 01:59:59 EDT
                      日 2023-11-05 01:00:00 EST
  • Enable network time synchronization:

    [root@mds001 ~]# timedatectl set-ntp true
  • Show detailed information about the sync sources:

    [root@mds001 ~]# chronyc sources -v
    210 Number of sources = 1
    ​
      .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
     / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
    | /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
    ||                                                 .- xxxx [ yyyy ] +/- zzzz
    ||      Reachability register (octal) -.           |  xxxx = adjusted offset,
    ||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
    ||                                \     |          |  zzzz = estimated error.
    ||                                 |    |           \
    MS Name/IP address         Stratum Poll Reach LastRx Last sample               
    ===============================================================================
    ^* 203.107.6.88                  2   6    17     2  +2105us[+1914us] +/-   20ms

5. Configuring passwordless SSH:

  • Generate a key pair on each node:

    # ssh-keygen generates, manages, and converts authentication keys; -t specifies the key type
    [root@mds001 ~]# ssh-keygen -t rsa -b 1024 
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa): 
    Created directory '/root/.ssh'.
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    SHA256:jRwgxJr1/bIRZ630UbN4QJtlVI1qsYkqQ+SOe++pwS4 root@mds001
    The key's randomart image is:
    +---[RSA 1024]----+
    |   oo .    ...+oo|
    |    o...    o=+ .|
    |   + + ..  ooO o |
    |  o   +.o+= O o  |
    |     +  SB.+ o   |
    |    ..+ + o .    |
    |     .oo +       |
    |    E.....       |
    |     oo++        |
    +----[SHA256]-----+
    [root@mds002 ~]# ssh-keygen -t rsa -b 1024
    Generating public/private rsa key pair.
    ....
  • On the host being logged in to, make sure the .ssh directory and authorized_keys file exist:

    mkdir ~/.ssh;touch ~/.ssh/authorized_keys
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.10.51
  • Copy each node's public key to the other node:

    [root@mds001 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.10.22
    The authenticity of host '192.168.10.22 (192.168.10.22)' can't be established.
    ECDSA key fingerprint is SHA256:8GotQw1f08FF9REsxJKn9ObpvvOib0h1W2sfJNClXwk.
    ECDSA key fingerprint is MD5:a4:7d:50:65:13:8d:17:dd:8e:b9:10:6f:64:e8:9b:d6.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '192.168.10.22' (ECDSA) to the list of known hosts.
    root@192.168.10.22's password: 
    id_rsa.pub                                                       100%  225   125.2KB/s   00:00  
    [root@mds002 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.10.21
    The authenticity of host '192.168.10.21 (192.168.10.21)' can't be established.
    ...
  • From each server, log in to the peer to verify:

    [root@mds001 ~]# ssh root@192.168.10.22
    Last login: Wed Jul 26 21:24:53 2023 from 192.168.10.1
    [root@mds002 ~]# 
    [root@mds002 ~]# ssh root@192.168.10.21
    Last login: Wed Jul 26 23:23:22 2023 from 192.168.10.1
    [root@mds001 ~]# 

6. Setting up Corosync high availability:

  • On both servers, configure hostname resolution, stop the firewall, and disable SELinux (the sketch after this block applies the SELinux change to the running system as well):

    cat << eof >> /etc/hosts
    192.168.10.21 mds001
    192.168.10.22 mds002
    eof
    cat << eof >> /etc/hosts
    192.168.10.24 mds004
    192.168.10.25 mds005
    eof
    systemctl stop firewalld
    systemctl disable firewalld
    sed -i -e  's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
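  • The sed edit above only takes effect after a reboot; to switch SELinux off for the running system as well (a minimal sketch):

    setenforce 0 || true    # fails harmlessly if SELinux is already disabled
    getenforce              # should print Permissive or Disabled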
  • Install on each node:

    Tool                       Description
    pacemaker                  the most widely used open-source cluster resource manager (failure detection and resource recovery to keep cluster services highly available)
    pcs                        command-line tool for managing corosync and pacemaker
    psmisc                     process-management utilities
    policycoreutils-python     Python package for inspecting and managing SELinux policy
    [root@mds001 ~]# yum install pacemaker pcs policycoreutils-python -y
  • Enable the pcsd service to start on boot on each node:

    [root@mds001 ~]# systemctl enable pcsd;systemctl restart pcsd
  • Set the same password for the hacluster user on every cluster node:

    [root@mds001 ~]# echo "110119" |passwd --stdin hacluster
    更改用户 hacluster 的密码 。
    passwd:所有的身份验证令牌已经成功更新。
  • Authenticate the nodes with the hacluster user name and password (110119); this only needs to be run on one node:

    [root@mds001 ~]# pcs cluster auth mds00{1,2}
    Username: hacluster
    Password: 
    mds001: Authorized
    mds002: Authorized
  • Create the cluster:

    [root@mds001 ~]# pcs cluster setup --name mylustre mds00{1,2}
    Destroying cluster on nodes: mds001, mds002...
    mds001: Stopping Cluster (pacemaker)...
    mds002: Stopping Cluster (pacemaker)...
    mds002: Successfully destroyed cluster
    mds001: Successfully destroyed cluster
    ​
    Sending 'pacemaker_remote authkey' to 'mds001', 'mds002'
    mds001: successful distribution of the file 'pacemaker_remote authkey'
    mds002: successful distribution of the file 'pacemaker_remote authkey'
    Sending cluster config files to the nodes...
    mds001: Succeeded
    mds002: Succeeded
    ​
    Synchronizing pcsd certificates on nodes mds001, mds002...
    mds001: Success
    mds002: Success
    Restarting pcsd on the nodes in order to reload the certificates...
    mds001: Success
    mds002: Success
  • Start the cluster:

    [root@mds001 ~]# pcs cluster start --all
    mds001: Starting Cluster (corosync)...
    mds002: Starting Cluster (corosync)...
    mds001: Starting Cluster (pacemaker)...
    mds002: Starting Cluster (pacemaker)...
  • At this point a two-node corosync + pacemaker cluster has been created and started.

  • On each node, check whether corosync and pacemaker are running. (One node's pacemaker log reports an error: the stonith option is enabled, but no STONITH device has been found. The fix is either to disable the stonith-enabled option or, as we do in section 7, to configure STONITH devices.)

    [root@mds001 ~]# systemctl status corosync pacemaker
    ● corosync.service - Corosync Cluster Engine
       Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
       Active: active (running) since 三 2023-07-26 22:21:09 EDT; 2min 41s ago
         Docs: man:corosync
               man:corosync.conf
               man:corosync_overview
      Process: 67381 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
     Main PID: 67389 (corosync)
       CGroup: /system.slice/corosync.service
               └─67389 corosync
    ​
    7月 26 22:21:09 mds001 corosync[67389]:  [TOTEM ] A new membership (192.168.10.21:9) was for...: 2
    7月 26 22:21:09 mds001 corosync[67389]:  [CPG   ] downlist left_list: 0 received
    7月 26 22:21:09 mds001 corosync[67389]:  [CPG   ] downlist left_list: 0 received
    7月 26 22:21:09 mds001 corosync[67389]:  [CPG   ] downlist left_list: 0 received
    7月 26 22:21:09 mds001 corosync[67389]:  [VOTEQ ] Waiting for all cluster members. Current v...: 2
    7月 26 22:21:09 mds001 corosync[67389]:  [QUORUM] This node is within the primary component ...ce.
    7月 26 22:21:09 mds001 corosync[67389]:  [QUORUM] Members[2]: 1 2
    7月 26 22:21:09 mds001 corosync[67389]:  [MAIN  ] Completed service synchronization, ready t...ce.
    7月 26 22:21:09 mds001 corosync[67381]: Starting Corosync Cluster Engine (corosync): [  OK  ]
    7月 26 22:21:09 mds001 systemd[1]: Started Corosync Cluster Engine.
    ​
    ● pacemaker.service - Pacemaker High Availability Cluster Manager
       Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
       Active: active (running) since 三 2023-07-26 22:21:10 EDT; 2min 40s ago
         Docs: man:pacemakerd
               https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html
     Main PID: 67424 (pacemakerd)
       CGroup: /system.slice/pacemaker.service
               ├─67424 /usr/sbin/pacemakerd -f
               ├─67425 /usr/libexec/pacemaker/cib
               ├─67426 /usr/libexec/pacemaker/stonithd
               ├─67427 /usr/libexec/pacemaker/lrmd
               ├─67428 /usr/libexec/pacemaker/attrd
               ├─67429 /usr/libexec/pacemaker/pengine
               └─67430 /usr/libexec/pacemaker/crmd
    ​
    7月 26 22:21:11 mds001 crmd[67430]:   notice: Node mds002 state is now member
    7月 26 22:21:11 mds001 stonith-ng[67426]:   notice: Node mds002 state is now member
    7月 26 22:21:11 mds001 crmd[67430]:   notice: The local CRM is operational
    7月 26 22:21:11 mds001 crmd[67430]:   notice: State transition S_STARTING -> S_PENDING
    7月 26 22:21:11 mds001 attrd[67428]:   notice: Node mds002 state is now member
    7月 26 22:21:11 mds001 attrd[67428]:   notice: Recorded local node as attribute writer (was unset)
    7月 26 22:21:13 mds001 crmd[67430]:   notice: Fencer successfully connected
    7月 26 22:21:32 mds001 crmd[67430]:  warning: Input I_DC_TIMEOUT received in state S_PENDIN...pped
    7月 26 22:21:32 mds001 crmd[67430]:   notice: State transition S_ELECTION -> S_PENDING
    7月 26 22:21:32 mds001 crmd[67430]:   notice: State transition S_PENDING -> S_NOT_DC
    Hint: Some lines were ellipsized, use -l to show in full.
  • Check the cluster status:

    [root@mds001 ~]# pcs cluster status
    Cluster Status:
     Stack: corosync
     Current DC: mds002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
     Last updated: Wed Jul 26 22:24:52 2023
     Last change: Wed Jul 26 22:21:32 2023 by hacluster via crmd on mds002
     2 nodes configured
     0 resource instances configured
    ​
    PCSD Status:
      mds002: Online
      mds001: Online
  • Validate the cluster configuration (errors are reported because no STONITH resources are defined yet; this is resolved in section 7):

    [root@mds001 ~]# crm_verify -LV
       error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
       error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
       error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
    Errors found during check: config not valid

7. Configuring fencing (STONITH):

  • Install all the fence agents:

    [root@mds001 resource.d]# yum install -y fence-agents-all
  • List the fence agents available on this host:

    [root@mds001 resource.d]# pcs stonith list
    fence_amt_ws - Fence agent for AMT (WS)
    fence_apc - Fence agent for APC over telnet/ssh
    fence_apc_snmp - Fence agent for APC, Tripplite PDU over SNMP
    fence_bladecenter - Fence agent for IBM BladeCenter
    fence_brocade - Fence agent for HP Brocade over telnet/ssh
    fence_cisco_mds - Fence agent for Cisco MDS
    fence_cisco_ucs - Fence agent for Cisco UCS
    fence_compute - Fence agent for the automatic resurrection of OpenStack compute instances
    fence_drac5 - Fence agent for Dell DRAC CMC/5
    fence_eaton_snmp - Fence agent for Eaton over SNMP
    fence_emerson - Fence agent for Emerson over SNMP
    fence_eps - Fence agent for ePowerSwitch
    fence_evacuate - Fence agent for the automatic resurrection of OpenStack compute instances
    fence_heuristics_ping - Fence agent for ping-heuristic based fencing
    fence_hpblade - Fence agent for HP BladeSystem
    fence_ibmblade - Fence agent for IBM BladeCenter over SNMP
    fence_idrac - Fence agent for IPMI
    fence_ifmib - Fence agent for IF MIB
    fence_ilo - Fence agent for HP iLO
    fence_ilo2 - Fence agent for HP iLO
    fence_ilo3 - Fence agent for IPMI
    fence_ilo3_ssh - Fence agent for HP iLO over SSH
    fence_ilo4 - Fence agent for IPMI
    fence_ilo4_ssh - Fence agent for HP iLO over SSH
    fence_ilo5 - Fence agent for IPMI
    fence_ilo5_ssh - Fence agent for HP iLO over SSH
    fence_ilo_moonshot - Fence agent for HP Moonshot iLO
    fence_ilo_mp - Fence agent for HP iLO MP
    fence_ilo_ssh - Fence agent for HP iLO over SSH
    fence_imm - Fence agent for IPMI
    fence_intelmodular - Fence agent for Intel Modular
    fence_ipdu - Fence agent for iPDU over SNMP
    fence_ipmilan - Fence agent for IPMI
    fence_kdump - fencing agent for use with kdump crash recovery service
    fence_mpath - Fence agent for multipath persistent reservation
    fence_redfish - I/O Fencing agent for Redfish
    fence_rhevm - Fence agent for RHEV-M REST API
    fence_rsa - Fence agent for IBM RSA
    fence_rsb - I/O Fencing agent for Fujitsu-Siemens RSB
    fence_sbd - Fence agent for sbd
    fence_scsi - Fence agent for SCSI persistent reservation
    fence_virt - Fence agent for virtual machines
    fence_vmware_rest - Fence agent for VMware REST API
  • Enable STONITH:

    pcs property set stonith-enabled=true
  • Configure fence_heuristics_ping:

    pcs stonith create stonith-ping-mds001 fence_heuristics_ping ping_targets=192.168.10.21
    pcs stonith create stonith-ping-mds002 fence_heuristics_ping ping_targets=192.168.10.22
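  • Verify that the fencing resources were created and that STONITH is enabled (a quick check):

    pcs stonith show
    pcs property show stonith-enabled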

8. Configuring the LNet network:

  • Two network interfaces are needed on each server (ens33 and ens38); ens33 carries the tcp LNet and ens38 carries tcp2.

  • Edit the configuration file /etc/modprobe.d/lustre.conf:

    cat << eof > /etc/modprobe.d/lustre.conf
    options lnet networks="tcp(ens33),tcp2(ens38)"
    eof
  • Copy lustre.conf to the other node:

    scp /etc/modprobe.d/lustre.conf root@mds002:/etc/modprobe.d
  • On both nodes, reload the modules:

    lustre_rmmod && modprobe -v lustre
  • Check the NIDs on each node:

    [root@mds001 ~]# lctl list_nids
    192.168.10.21@tcp
    192.168.209.21@tcp2
    [root@mds002 ~]#  lctl list_nids
    192.168.10.22@tcp
    192.168.209.22@tcp2
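  • Optionally, cross-check LNet connectivity between the two nodes (a minimal sketch; a successful lctl ping prints the peer's NIDs):

    # from mds001
    lctl ping 192.168.209.22@tcp2
    # from mds002
    lctl ping 192.168.209.21@tcp2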

9. Adding the targets with the Lustre resource agent:

  • ocf:lustre:Lustre

    • Because its scope is narrower, it is simpler than ocf:heartbeat:Filesystem and therefore better suited to managing Lustre storage resources

    • Developed specifically for Lustre OSDs; the RA is distributed by the Lustre project and is available in Lustre 2.10.0 and later

    • ocf:heartbeat:ZFS: (must be installed separately and relies on ZFS storage pools)

  • Download it (note that it must be installed on both nodes):

    wget https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/lustre-resource-agents-2.12.1-1.el7.x86_64.rpm
    rpm -ivh lustre-resource-agents-2.12.1-1.el7.x86_64.rpm
  • Create the resources:

    Parameter          Description
    <resource name>    name of the resource
    target=            path to the block device backing the Lustre target
    mountpoint=        mount point for the OSD
    pcs resource create global-mgs ocf:lustre:Lustre target=/dev/sdb mountpoint=/mnt/mgs
    pcs resource create global-mdt1 ocf:lustre:Lustre target=/dev/sdc mountpoint=/mnt/mdt1
    pcs resource create global-mdt2 ocf:lustre:Lustre target=/dev/sdd mountpoint=/mnt/mdt2
    pcs resource create global-ost1 ocf:lustre:Lustre target=/dev/sde mountpoint=/mnt/ost1
    pcs resource create global-ost2 ocf:lustre:Lustre target=/dev/sdf mountpoint=/mnt/ost2
  • Configure location preferences. Do not put the resources into a resource group: a group binds all the resources to a single node and makes these preferences meaningless.

    pcs constraint location add global-constraint-mgs global-mgs mds001 10
    pcs constraint location add global-constraint-mdt1 global-mdt1 mds001 10
    pcs constraint location add global-constraint-mdt2 global-mdt2 mds002 10
    pcs constraint location add global-constraint-ost1 global-ost1 mds001 10
    pcs constraint location add global-constraint-ost2 global-ost2 mds002 10
  • Set the start order so that the MGS starts before the MDTs and OSTs (see the verification sketch after these commands):

    pcs constraint order start global-mgs then start global-mdt1
    pcs constraint order start global-mgs then start global-mdt2
    pcs constraint order start global-mgs then start global-ost1
    pcs constraint order start global-mgs then start global-ost2
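  • Review what has been configured so far (a quick check):

    pcs constraint location show
    pcs constraint order show
    pcs status resources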

10. Creating the LNet monitoring resource:

  • Pacemaker can be configured to monitor various aspects of the cluster servers to help judge overall system health. This provides additional data points for deciding where resources should run.

  • Lustre 2.10 introduced two monitoring resource agents:

    • ocf:lustre:healthLNET – monitors LNet connectivity (requires the LNet network interfaces to be configured)

    • ocf:lustre:healthLUSTRE – monitors the health of Lustre itself (must be installed)

  • Create it (it is not tied to any particular node):

    Parameter     Description
    lctl          tells the resource agent to use lctl ping to monitor the LNet NIDs; if unset, the regular system ping command is used
    multiplier    a positive integer multiplied by the number of machines that respond to the ping; the result should be greater than the resource-stickiness value
    device        the network device to monitor, e.g. eth1 or ib0
    host_list     space-separated list of LNet NIDs to ping; if lctl=false, host_list should contain regular hostnames or IP addresses
    --clone       tells Pacemaker to start an instance of the resource on every node in the cluster
    pcs resource create ping-lnet ocf:lustre:healthLNET \
    lctl=true \
    multiplier=1001 \
    device=ens33 \
    host_list="192.168.209.21@tcp2 192.168.209.22@tcp2" \
    --clone
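  • To confirm the clone is running on every node and to see the score attribute it maintains (a sketch; the attribute name is agent-defined, so check `pcs resource describe ocf:lustre:healthLNET` if unsure which name to look for):

    pcs resource show ping-lnet-clone
    crm_mon -A1          # -A shows node attributes; look for the agent's score on each node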

11. Creating the Lustre monitoring resource:

  • ocf:lustre:healthLUSTRE follows the same implementation model as ocf:lustre:healthLNET.

  • Instead of monitoring LNet NIDs, however, ocf:lustre:healthLUSTRE monitors the contents of the Lustre health_check file and maintains a node attribute named lustred.

  • Create it:

    pcs resource create global-healthLUSTRE ocf:lustre:healthLUSTRE --clone 

12. Failover test:
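Before the failover, the resources are spread across mds004 and mds005; after mds004 goes offline, everything runs on mds005, as the two pcs status listings below show. A minimal sketch of how to trigger and undo the failover (run from the node that stays up):

# stop cluster services on mds004 (or simply power it off) and watch the resources move
pcs cluster stop mds004
pcs status
# bring the node back afterwards
pcs cluster start mds004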

[root@mds004 ~]# pcs status
Cluster name: MyUpdate
Stack: corosync
Current DC: mds004 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Aug 10 06:09:44 2023
Last change: Thu Aug 10 06:07:46 2023 by root via cibadmin on mds004
​
2 nodes configured
11 resource instances configured
​
Online: [ mds004 mds005 ]
​
Full list of resources:
​
 Clone Set: ping-lnet-clone [ping-lnet]
     Started: [ mds004 mds005 ]
 Clone Set: global-healthLUSTRE-clone [global-healthLUSTRE]
     Started: [ mds004 mds005 ]
 stonith-ping-mds004    (stonith:fence_heuristics_ping):        Started mds004
 stonith-ping-mds005    (stonith:fence_heuristics_ping):        Started mds005
 global-mgs     (ocf::lustre:Lustre):   Started mds004
 global-mdt1    (ocf::lustre:Lustre):   Started mds004
 global-mdt2    (ocf::lustre:Lustre):   Started mds005
 global-ost1    (ocf::lustre:Lustre):   Started mds004
 global-ost2    (ocf::lustre:Lustre):   Started mds005
​
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@mds005 lustre2.12.1]# pcs status
Cluster name: MyUpdate
Stack: corosync
Current DC: mds005 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Aug 10 06:11:16 2023
Last change: Thu Aug 10 06:07:46 2023 by root via cibadmin on mds004
​
2 nodes configured
11 resource instances configured
​
Online: [ mds005 ]
OFFLINE: [ mds004 ]
​
Full list of resources:
​
 Clone Set: ping-lnet-clone [ping-lnet]
     Started: [ mds005 ]
     Stopped: [ mds004 ]
 Clone Set: global-healthLUSTRE-clone [global-healthLUSTRE]
     Started: [ mds005 ]
     Stopped: [ mds004 ]
 stonith-ping-mds004    (stonith:fence_heuristics_ping):        Started mds005
 stonith-ping-mds005    (stonith:fence_heuristics_ping):        Started mds005
 global-mgs     (ocf::lustre:Lustre):   Started mds005
 global-mdt1    (ocf::lustre:Lustre):   Started mds005
 global-mdt2    (ocf::lustre:Lustre):   Started mds005
 global-ost1    (ocf::lustre:Lustre):   Started mds005
 global-ost2    (ocf::lustre:Lustre):   Started mds005
​
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

2. Remote client test:

1. Download and install:

  • Download:

    RPM package                              Description
    kernel-*.el7_lustre.x86_64.rpm           Linux kernel with the Lustre patches applied
    kernel-devel-*.el7_lustre.x86_64.rpm     parts of the kernel tree needed to build third-party modules (such as network drivers)
    kernel-headers-*.el7_lustre.x86_64.rpm   header files under /usr/include used to compile user-space code and code that interfaces with the kernel
    lustre-client-dkms-*.el7.noarch.rpm      alternative to the kmod-lustre-client RPM, with Dynamic Kernel Module Support (DKMS)
    lustre-client-*.el7.x86_64.rpm           client command-line tools
    mkdir lustre2.12 && cd lustre2.12
    wget \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kernel-3.10.0-957.10.1.el7_lustre.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kernel-devel-3.10.0-957.10.1.el7_lustre.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kernel-headers-3.10.0-957.10.1.el7_lustre.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/client/RPMS/x86_64/lustre-client-dkms-2.12.1-1.el7.noarch.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/client/RPMS/x86_64/lustre-client-2.12.1-1.el7.x86_64.rpm
  • Install (gcc and other dependencies are needed):

    [root@client lustre2.12]# rpm -ivh *.rpm
    错误:依赖检测失败:
            /usr/bin/expect 被 lustre-client-dkms-2.12.1-1.el7.noarch 需要
            dkms >= 2.2.0.3-28.git.7c3e7c5 被 lustre-client-dkms-2.12.1-1.el7.noarch 需要
            gcc 被 lustre-client-dkms-2.12.1-1.el7.noarch 需要
            kernel-devel 被 lustre-client-dkms-2.12.1-1.el7.noarch 需要
            libyaml-devel 被 lustre-client-dkms-2.12.1-1.el7.noarch 需要
  • Install the EPEL repository:

    yum install -y epel-release
  • Install the dependencies:

    yum install -y perl expect gcc kernel-devel libyaml-devel dkms
  • Install the kernel RPMs:

    [root@client lustre2.12]# rpm -ivh kernel-*.rpm --force
  • The installation succeeds, but the log shows that the Lustre kernel modules were not built automatically, because the installed kernel headers do not match the running kernel version:

    [root@client ~]# rpm -qa | grep kernel
    kernel-3.10.0-1160.el7.x86_64
    kernel-tools-3.10.0-1160.el7.x86_64
    kernel-headers-3.10.0-957.10.1.el7_lustre.x86_64
    kernel-3.10.0-957.10.1.el7_lustre.x86_64
    kernel-tools-libs-3.10.0-1160.el7.x86_64
    kernel-devel-3.10.0-957.10.1.el7_lustre.x86_64
    kernel-devel-3.10.0-1160.92.1.el7.x86_64
  • Reboot the host and check the kernel:

    [root@client ~]# init 6
    [root@client ~]# uname -r
    3.10.0-957.10.1.el7_lustre.x86_64
  • Install the client RPMs:

    [root@client ~]# rpm -ivh lustre-client-dkms-2.12.1-1.el7.noarch.rpm
    准备中...                          ################################# [100%]
    正在升级/安装...
       1:lustre-client-dkms-2.12.1-1.el7  ################################# [100%]
    Loading new lustre-client-2.12.1 DKMS files...
    Deprecated feature: REMAKE_INITRD (/usr/src/lustre-client-2.12.1/dkms.conf)
    Building for 3.10.0-957.10.1.el7_lustre.x86_64
    Building initial module for 3.10.0-957.10.1.el7_lustre.x86_64
    Deprecated feature: REMAKE_INITRD (/var/lib/dkms/lustre-client/2.12.1/source/dkms.conf)
    configure: WARNING:
    ​
    No selinux package found, unable to build selinux enabled tools
    ​
    Done.
    Deprecated feature: REMAKE_INITRD (/var/lib/dkms/lustre-client/2.12.1/source/dkms.conf)
    Deprecated feature: REMAKE_INITRD (/var/lib/dkms/lustre-client/2.12.1/source/dkms.conf)
    ​
    lnet_selftest.ko.xz:
    Running module version sanity check.
     - Original module
       - No original module exists within this kernel
     - Installation
       - Installing to /lib/modules/3.10.0-957.10.1.el7_lustre.x86_64/extra/
    ​
    lnet.ko.xz:
    .....
    [root@client ~]# rpm -ivh lustre-client-2.12.1-1.el7.x86_64.rpm
    准备中...                          ################################# [100%]
    正在升级/安装...
       1:lustre-client-2.12.1-1.el7       ################################# [100%]
  • Load the modules:

    [root@client ~]# modprobe lustre
    [root@client ~]# lsmod | grep lustre
    lustre                754256  0 
    lmv                   177987  1 lustre
    mdc                   237615  1 lustre
    lov                   314554  1 lustre
    ptlrpc               1345789  7 fid,fld,lmv,mdc,lov,osc,lustre
    obdclass             1741312  8 fid,fld,lmv,mdc,lov,osc,lustre,ptlrpc
    lnet                  600632  6 lmv,osc,lustre,obdclass,ptlrpc,ksocklnd
    libcfs                415252  11 fid,fld,lmv,mdc,lov,osc,lnet,lustre,obdclass,ptlrpc,ksocklnd
  • Check the Lustre version:

    [root@mds001 ~]# modinfo lustre
    filename:       /lib/modules/3.10.0-957.10.1.el7_lustre.x86_64/extra/lustre/fs/lustre.ko
    license:        GPL
    version:        2.12.1
    description:    Lustre Client File System
    author:         OpenSFS, Inc. <http://www.lustre.org/>
    retpoline:      Y
    rhelversion:    7.6
    srcversion:     E50D950B04B4044ABCBCFA3
    depends:        obdclass,ptlrpc,libcfs,lnet,lmv,mdc,lov
    vermagic:       3.10.0-957.10.1.el7_lustre.x86_64 SMP mod_unload modversions 

2. Mounting the client on the dual-node Lustre filesystem:

  • Create the directories and mount the remote filesystem (see the fstab sketch after these commands for making the mounts persistent):

    [root@client ~]# mkdir /mnt/global-client1;mount -t lustre 192.168.10.21@tcp:192.168.10.22@tcp:/global /mnt/global-client1
    [root@client ~]# mkdir /mnt/global-client2;mount -t lustre 192.168.10.21@tcp:192.168.10.22@tcp:/global /mnt/global-client2
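  • Optionally, make the mounts persistent across reboots via /etc/fstab (a minimal sketch; _netdev delays the mount until the network is up):

    cat << eof >> /etc/fstab
    192.168.10.21@tcp:192.168.10.22@tcp:/global /mnt/global-client1 lustre defaults,_netdev 0 0
    192.168.10.21@tcp:192.168.10.22@tcp:/global /mnt/global-client2 lustre defaults,_netdev 0 0
    eof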
  • Check:

    [root@client ~]# mount | grep lustre
    192.168.10.21@tcp:192.168.10.22@tcp:/global on /mnt/global-client1 type lustre (rw,seclabel,lazystatfs)
    192.168.10.21@tcp:192.168.10.22@tcp:/global on /mnt/global-client2 type lustre (rw,seclabel,lazystatfs)
    [root@client ~]# lfs df
    UUID                   1K-blocks        Used   Available Use% Mounted on
    global-MDT0000_UUID      2330824       16548     2106120   1% /mnt/global-client1[MDT:0]
    global-MDT0002_UUID      2330824       16468     2106200   1% /mnt/global-client1[MDT:2]
    global-OST0000_UUID      3865564       34084     3605384   1% /mnt/global-client1[OST:0]
    global-OST0002_UUID      3865564       34088     3605380   1% /mnt/global-client1[OST:2]
    ​
    filesystem_summary:      7731128       68172     7210764   1% /mnt/global-client1
    ​
    UUID                   1K-blocks        Used   Available Use% Mounted on
    global-MDT0000_UUID      2330824       16548     2106120   1% /mnt/global-client2[MDT:0]
    global-MDT0002_UUID      2330824       16468     2106200   1% /mnt/global-client2[MDT:2]
    global-OST0000_UUID      3865564       34084     3605384   1% /mnt/global-client2[OST:0]
    global-OST0002_UUID      3865564       34088     3605380   1% /mnt/global-client2[OST:2]
    ​
    filesystem_summary:      7731128       68172     7210764   1% /mnt/global-client2

3. Test:

  • From the first client mount (/mnt/global-client1), create a file and write some data:

    [root@client ~]# echo "Hello Lustre,I am Centos7.0-client  client1" > /mnt/global-client1/client-test
  • From the second client mount (/mnt/global-client2), the new file client-test is visible (alongside first_file, created earlier) and its data can be read:

    [root@client ~]# ll /mnt/global-client2
    总用量 5
    -rw-r--r--. 1 root root 44 7月  26 05:16 client-test
    -rw-r--r--. 1 root root 26 7月  26 02:51 first_file
    [root@client ~]# cat /mnt/global-client2/client-test
    Hello Lustre,I am Centos7.0-client  client1
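  • Optionally, stripe a file across both OSTs and confirm where its objects are placed (a minimal sketch; striped_file is just an illustrative name):

    lfs setstripe -c 2 /mnt/global-client1/striped_file   # stripe count 2 = use both OSTs
    dd if=/dev/zero of=/mnt/global-client1/striped_file bs=1M count=8
    lfs getstripe /mnt/global-client1/striped_file        # shows the OST objects backing the file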

    This completes the setup of the highly available Lustre cluster.

