How to Build a Highly Available Dual-Node Lustre Cluster

1. Building a dual-node Lustre high-availability cluster:

1. Environment:

Hostname   OS           Lustre targets (on shared disks)   IP addresses            Lustre fsname   RAM
mds001     CentOS 7.9   1 MGS, 1 MDT, 2 OSTs               192.168.10.21/209.21    global          1 GB
mds002     CentOS 7.9   1 MGS, 1 MDT, 2 OSTs               192.168.10.22/209.22    global          1 GB
client     CentOS 7.9   -                                  192.168.10.41           -               1 GB

mds001 and mds002 act as both MDS and OSS nodes. They share five disks (1 MGS, 2 MDTs, and 2 OSTs), which together form a highly available Lustre cluster.

2. Sharing disks between the two virtual machines:

  • Prerequisite: neither virtual machine has any snapshots.

  • On the mds001 host:

    • Add five 5 GB disks:

      SCSI > Create New Virtual Disk > specify the disk capacity, allocate all disk space now, store virtual disk as a single file

  • In the VM directory of both mds001 and mds002, find the file with the .vmx extension and append the following to the end of that file:

    scsi1.sharedBus = "virtual"
    disk.locking = "false"
    diskLib.dataCacheMaxSize = "0"
    diskLib.dataCacheMaxReadAheadSize = "0"
    diskLib.dataCacheMinReadAheadSize = "0"
    diskLib.dataCachePageSize = "4096"
    diskLib.maxUnsyncedWrites = "0"
    disk.EnableUUID = "TRUE"
  • Reboot both virtual machines and confirm that the disks were added successfully:

    [root@mds001 ~]# lsblk
    NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sda               8:0    0   20G  0 disk 
    ├─sda1            8:1    0    1G  0 part /boot
    └─sda2            8:2    0   19G  0 part 
      ├─centos-root 253:0    0   17G  0 lvm  /
      └─centos-swap 253:1    0    2G  0 lvm  [SWAP]
    sdb               8:16   0    5G  0 disk 
    sdc               8:32   0    5G  0 disk 
    sdd               8:48   0    5G  0 disk 
    sde               8:64   0    5G  0 disk 
    sdf               8:80   0    5G  0 disk 
    sr0              11:0    1  9.5G  0 rom  /mnt/cdrom
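  • Optionally, run a quick sanity check on both nodes (a minimal sketch; with disk.EnableUUID set, matching serial numbers confirm that sdb–sdf really are the same shared devices, although the SERIAL column may be empty on some setups):

    # Run on mds001 and on mds002; the five 5G disks should show identical sizes
    # (and, if reported, identical serial numbers) on both nodes.
    lsblk -d -o NAME,SIZE,SERIAL /dev/sd[b-f]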

3. Installing and configuring Lustre:

  • Both hosts need the packages required by the OSS service. The e2fsprogs packages are simply the stock ext4 RPMs with Lustre support added on top:

    mkdir ~/e2fsprogs && cd ~/e2fsprogs
    wget -c -r -nd https://downloads.whamcloud.com/public/e2fsprogs/1.44.5.wc1/el7/RPMS/x86_64/
    rm -rf index.html* unknown.gif *.gif sha256sum
  • Install all of the RPMs:

    [root@mds001 e2fsprogs]# cd ~/e2fsprogs && rpm -Uvh *
    准备中...                          ################################# [100%]
    正在升级/安装...
       1:libcom_err-1.42.12.wc1-4.el7.cent################################# [  8%]
       2:e2fsprogs-libs-1.42.12.wc1-4.el7.################################# [ 15%]
       3:libcom_err-devel-1.42.12.wc1-4.el################################# [ 23%]
       4:libss-1.42.12.wc1-4.el7.centos   ################################# [ 31%]
       5:e2fsprogs-1.42.12.wc1-4.el7.cento################################# [ 38%]
       6:libss-devel-1.42.12.wc1-4.el7.cen################################# [ 46%]
       7:e2fsprogs-devel-1.42.12.wc1-4.el7################################# [ 54%]
       8:e2fsprogs-static-1.42.12.wc1-4.el################################# [ 62%]
       9:e2fsprogs-debuginfo-1.42.12.wc1-4################################# [ 69%]
    正在清理/删除...
      10:e2fsprogs-1.42.9-19.el7          ################################# [ 77%]
      11:e2fsprogs-libs-1.42.9-19.el7     ################################# [ 85%]
      12:libss-1.42.9-19.el7              ################################# [ 92%]
      13:libcom_err-1.42.9-19.el7         ################################# [100%]
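  • A quick way to confirm that the Whamcloud build is now the active one (a minimal sketch):

    rpm -q e2fsprogs              # should report a ".wc" (Whamcloud) release
    dumpe2fs 2>&1 | head -n 1     # the version banner should match the installed package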
  • Both hosts also need the packages required by the MDS service:

    wget options:
      -c    resume interrupted downloads
      -r    recursive download
      -nd   no directory hierarchy; save everything into the current directory
    RPM packages:
      kernel-*.el7_lustre.x86_64.rpm              Linux kernel with the Lustre patches applied
      kmod-lustre-*.el7.x86_64.rpm                Lustre kernel modules for the patched kernel
      kmod-lustre-osd-ldiskfs-*.el7.x86_64.rpm    ldiskfs-based Lustre backend filesystem modules
      lustre-*.el7.x86_64.rpm                     Lustre command-line tools
      lustre-osd-ldiskfs-mount-*.el7.x86_64.rpm   mount.lustre and mkfs.lustre helpers for ldiskfs
    mkdir ~/lustre2.12.1 && cd ~/lustre2.12.1
    yum install -y wget
    wget \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kernel-3.10.0-957.10.1.el7_lustre.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kmod-lustre-2.12.1-1.el7.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kmod-lustre-osd-ldiskfs-2.12.1-1.el7.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/lustre-2.12.1-1.el7.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/lustre-osd-ldiskfs-mount-2.12.1-1.el7.x86_64.rpm
  • Install the dependencies (otherwise rpm fails with "error: Failed dependencies:"):

    yum clean all && yum repolist
    yum install -y linux-firmware dracut selinux-policy-targeted kexec-tools libyaml perl
  • Install all of the RPMs (force the installation if it fails otherwise):

    cd ~/lustre2.12.1 && rpm -ivh *.rpm --force
  • Reboot the server:

    init 6
  • Check the kernel:

    [root@master ~]# uname -r
    3.10.0-957.el7_lustre.x86_64
  • Load the Lustre modules (this only loads them for the current boot and does not persist across reboots; see the sketch after the output below for making it persistent):

    [root@master ~]# modprobe lustre && lsmod | grep lustre
    lustre                758679  0 
    lmv                   177987  1 lustre
    mdc                   232938  1 lustre
    lov                   314581  1 lustre
    ptlrpc               2264705  7 fid,fld,lmv,mdc,lov,osc,lustre
    obdclass             1962422  8 fid,fld,lmv,mdc,lov,osc,lustre,ptlrpc
    lnet                  595941  6 lmv,osc,lustre,obdclass,ptlrpc,ksocklnd
    libcfs                421295  11 fid,fld,lmv,mdc,lov,osc,lnet,lustre,obdclass,ptlrpc,ksocklnd
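  • To load the lustre module automatically at boot, a systemd modules-load.d drop-in can be used (a minimal sketch; the file name lustre.conf under /etc/modules-load.d is our own choice):

    cat << eof > /etc/modules-load.d/lustre.conf
    lustre
    eof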
  • Check the Lustre version:

    [root@mds001 ~]# modinfo lustre
    filename:       /lib/modules/3.10.0-957.10.1.el7_lustre.x86_64/extra/lustre/fs/lustre.ko
    license:        GPL
    version:        2.12.1
    description:    Lustre Client File System
    author:         OpenSFS, Inc. <http://www.lustre.org/>
    retpoline:      Y
    rhelversion:    7.6
    srcversion:     E50D950B04B4044ABCBCFA3
    depends:        obdclass,ptlrpc,libcfs,lnet,lmv,mdc,lov
    vermagic:       3.10.0-957.10.1.el7_lustre.x86_64 SMP mod_unload modversions 
  • An MGS, MDT, or OST only actually comes into existence once its device has been mounted on one of the nodes.

  • Format all five disks on mds001; because the disks are shared, the resulting format is also visible from mds002:

    mkfs.lustre options:
      --fsname        name of the Lustre filesystem; identifies the cluster and must be unique
      --servicenode   NID (IP@LNet network) of a node that can serve this target (used for failover)
      --mgsnode       NID of an MGS node
      --mgs           format the device as the MGS; the MGS (ManaGement Server) records the state of the whole Lustre filesystem
      --mdt           format the device as an MDT; an MDT (MetaData Target) stores the Lustre metadata
      --ost           format the device as an OST; an OST (Object Storage Target) stores the Lustre file data
      --index         index of the target within the filesystem, e.g. --mdt --index=1 produces LABEL="global-MDT0001"; must be unique per target type
      --reformat      reformat an existing Lustre target, skipping the check that would otherwise refuse to overwrite it
      --replace       register the target as a replacement for an existing target with the same index
    # The --index values for MDTs and OSTs must start at 0; otherwise /var/log/messages shows errors about waiting for the expected sequence/index.
    
    mkfs.lustre --mgs --servicenode=192.168.209.21@tcp2 --servicenode=192.168.209.22@tcp2 --backfstype=ldiskfs --reformat /dev/sdb
    mkfs.lustre --fsname global --mdt --index=0 --servicenode=192.168.209.21@tcp2 --servicenode=192.168.209.22@tcp2 --mgsnode=192.168.209.21@tcp2 --mgsnode=192.168.209.22@tcp2 --backfstype=ldiskfs --reformat /dev/sdc
    mkfs.lustre --fsname global --mdt --index=1 --servicenode=192.168.209.21@tcp2 --servicenode=192.168.209.22@tcp2 --mgsnode=192.168.209.21@tcp2 --mgsnode=192.168.209.22@tcp2 --backfstype=ldiskfs --reformat /dev/sdd
    mkfs.lustre --fsname global --ost --index=0 --servicenode=192.168.209.21@tcp2 --servicenode=192.168.209.22@tcp2 --mgsnode=192.168.209.21@tcp2 --mgsnode=192.168.209.22@tcp2 --backfstype=ldiskfs --reformat /dev/sde
    mkfs.lustre --fsname global --ost --index=1 --servicenode=192.168.209.21@tcp2 --servicenode=192.168.209.22@tcp2 --mgsnode=192.168.209.21@tcp2 --mgsnode=192.168.209.22@tcp2 --backfstype=ldiskfs --reformat /dev/sdf
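  • Optionally, confirm how each target was formatted without touching the data (a minimal sketch; tunefs.lustre --dryrun only prints the current parameters):

    for d in /dev/sd{b,c,d,e,f}; do
      echo "== $d =="
      tunefs.lustre --dryrun $d
    done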
  • From mds002 the format type can be checked:

    [root@mds002 ~]# blkid /dev/sdc
    /dev/sdc: LABEL="global:MDT0000" UUID="d99a724e-d199-4f59-9a4e-7164882f754b" TYPE="ext4" 
  • Create the mount points that the Corosync-managed mount resources will use:

    [root@mds001 ~]# mkdir /mnt/mgs;mkdir /mnt/mdt1;mkdir /mnt/mdt2;mkdir /mnt/ost1;mkdir /mnt/ost2
    [root@mds002 ~]# mkdir /mnt/mgs;mkdir /mnt/mdt1;mkdir /mnt/mdt2;mkdir /mnt/ost1;mkdir /mnt/ost2

4. Setting up time synchronization with chrony:

  • Install chrony on both nodes so they can synchronize with an upstream (Aliyun) time server:

    [root@mds001 ~]# yum -y install chrony
  • Enable and start the service:

    [root@mds001 ~]# systemctl enable chronyd;systemctl start chronyd
  • Edit the configuration file /etc/chrony.conf:

    [root@mds001 ~]# sed -i '/^server [0-9]/d' /etc/chrony.conf
    [root@mds001 ~]# sed -i '2a server 192.168.10.24 iburst' /etc/chrony.conf
    [root@mds001 ~]# sed -i 's/#allow 192.168.0.0\/16/allow 192.168.10.0\/24/' /etc/chrony.conf
    [root@mds001 ~]# sed -i 's/#local stratum 10/local stratum 10/' /etc/chrony.conf
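  • The relevant lines in /etc/chrony.conf should now look roughly as follows (a quick check, assuming the sed edits above applied cleanly):

    grep -E '^(server|allow|local)' /etc/chrony.conf
    # expected (roughly):
    #   server 192.168.10.24 iburst
    #   allow 192.168.10.0/24
    #   local stratum 10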
  • Restart the service:

    [root@mds001 ~]# systemctl restart chronyd
  • Check the time synchronization status:

    [root@mds001 ~]# timedatectl status
          Local time: 三 2023-07-26 23:02:14 EDT
      Universal time: 四 2023-07-27 03:02:14 UTC
            RTC time: 四 2023-07-27 03:02:14
           Time zone: America/New_York (EDT, -0400)
         NTP enabled: yes
    NTP synchronized: yes
     RTC in local TZ: no
          DST active: yes
     Last DST change: DST began at
                      日 2023-03-12 01:59:59 EST
                      日 2023-03-12 03:00:00 EDT
     Next DST change: DST ends (the clock jumps one hour backwards) at
                      日 2023-11-05 01:59:59 EDT
                      日 2023-11-05 01:00:00 EST
  • Enable network time synchronization:

    [root@mds001 ~]# timedatectl set-ntp true
  • Show detailed information about the sync sources:

    [root@mds001 ~]# chronyc sources -v
    210 Number of sources = 1
    ​
      .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
     / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
    | /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
    ||                                                 .- xxxx [ yyyy ] +/- zzzz
    ||      Reachability register (octal) -.           |  xxxx = adjusted offset,
    ||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
    ||                                \     |          |  zzzz = estimated error.
    ||                                 |    |           \
    MS Name/IP address         Stratum Poll Reach LastRx Last sample               
    ===============================================================================
    ^* 203.107.6.88                  2   6    17     2  +2105us[+1914us] +/-   20ms

5. Configuring passwordless SSH:

  • Generate a key pair on each node:

    # ssh-keygen generates, manages, and converts authentication keys; -t specifies the key type
    [root@mds001 ~]# ssh-keygen -t rsa -b 1024 
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa): 
    Created directory '/root/.ssh'.
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    SHA256:jRwgxJr1/bIRZ630UbN4QJtlVI1qsYkqQ+SOe++pwS4 root@mds001
    The key's randomart image is:
    +---[RSA 1024]----+
    |   oo .    ...+oo|
    |    o...    o=+ .|
    |   + + ..  ooO o |
    |  o   +.o+= O o  |
    |     +  SB.+ o   |
    |    ..+ + o .    |
    |     .oo +       |
    |    E.....       |
    |     oo++        |
    +----[SHA256]-----+
    [root@mds002 ~]# ssh-keygen -t rsa -b 1024
    Generating public/private rsa key pair.
    ....
  • On the host being logged in to, make sure the .ssh directory and authorized_keys file exist:

    mkdir ~/.ssh;touch ~/.ssh/authorized_keys
    ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.10.51
  • Copy each node's public key to the other node:

    [root@mds001 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.10.22
    The authenticity of host '192.168.10.22 (192.168.10.22)' can't be established.
    ECDSA key fingerprint is SHA256:8GotQw1f08FF9REsxJKn9ObpvvOib0h1W2sfJNClXwk.
    ECDSA key fingerprint is MD5:a4:7d:50:65:13:8d:17:dd:8e:b9:10:6f:64:e8:9b:d6.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '192.168.10.22' (ECDSA) to the list of known hosts.
    root@192.168.10.22's password: 
    id_rsa.pub                                                       100%  225   125.2KB/s   00:00  
    [root@mds002 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.10.21
    The authenticity of host '192.168.10.21 (192.168.10.21)' can't be established.
    ...
  • From each server, log in to the peer to verify:

    [root@mds001 ~]# ssh root@192.168.10.22
    Last login: Wed Jul 26 21:24:53 2023 from 192.168.10.1
    [root@mds002 ~]# 
    [root@mds002 ~]# ssh root@192.168.10.21
    Last login: Wed Jul 26 23:23:22 2023 from 192.168.10.1
    [root@mds001 ~]# 

6. Setting up Corosync high availability:

  • On both servers, configure hostname resolution, stop the firewall, and disable SELinux (the sketch after this block applies the SELinux change to the running system as well):

    cat << eof >> /etc/hosts
    192.168.10.21 mds001
    192.168.10.22 mds002
    eof
    cat << eof >> /etc/hosts
    192.168.10.24 mds004
    192.168.10.25 mds005
    eof
    systemctl stop firewalld
    systemctl disable firewalld
    sed -i -e  's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
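  • The sed edit above only takes effect after a reboot; to switch SELinux off for the running system as well (a minimal sketch):

    setenforce 0 || true    # fails harmlessly if SELinux is already disabled
    getenforce              # should print Permissive or Disabled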
  • Install on each node:

    Tool                       Description
    pacemaker                  the most widely used open-source cluster resource manager (failure detection and resource recovery to keep cluster services highly available)
    pcs                        command-line tool for managing corosync and pacemaker
    psmisc                     process-management utilities
    policycoreutils-python     Python package for inspecting and managing SELinux policy
    [root@mds001 ~]# yum install pacemaker pcs policycoreutils-python -y
  • Enable the pcsd service to start on boot on each node:

    [root@mds001 ~]# systemctl enable pcsd;systemctl restart pcsd
  • Set the same password for the hacluster user on every cluster node:

    [root@mds001 ~]# echo "110119" |passwd --stdin hacluster
    更改用户 hacluster 的密码 。
    passwd:所有的身份验证令牌已经成功更新。
  • Authenticate the nodes with the hacluster user name and password (110119); this only needs to be run on one node:

    [root@mds001 ~]# pcs cluster auth mds00{1,2}
    Username: hacluster
    Password: 
    mds001: Authorized
    mds002: Authorized
  • Create the cluster:

    [root@mds001 ~]# pcs cluster setup --name mylustre mds00{1,2}
    Destroying cluster on nodes: mds001, mds002...
    mds001: Stopping Cluster (pacemaker)...
    mds002: Stopping Cluster (pacemaker)...
    mds002: Successfully destroyed cluster
    mds001: Successfully destroyed cluster
    ​
    Sending 'pacemaker_remote authkey' to 'mds001', 'mds002'
    mds001: successful distribution of the file 'pacemaker_remote authkey'
    mds002: successful distribution of the file 'pacemaker_remote authkey'
    Sending cluster config files to the nodes...
    mds001: Succeeded
    mds002: Succeeded
    ​
    Synchronizing pcsd certificates on nodes mds001, mds002...
    mds001: Success
    mds002: Success
    Restarting pcsd on the nodes in order to reload the certificates...
    mds001: Success
    mds002: Success
  • Start the cluster:

    [root@mds001 ~]# pcs cluster start --all
    mds001: Starting Cluster (corosync)...
    mds002: Starting Cluster (corosync)...
    mds001: Starting Cluster (pacemaker)...
    mds002: Starting Cluster (pacemaker)...
  • At this point a two-node corosync + pacemaker cluster has been created and started.

  • On each node, check whether corosync and pacemaker are running. (One node's pacemaker log reports an error: the stonith option is enabled, but no STONITH device has been found. The fix is either to disable the stonith-enabled option or, as we do in section 7, to configure STONITH devices.)

    [root@mds001 ~]# systemctl status corosync pacemaker
    ● corosync.service - Corosync Cluster Engine
       Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
       Active: active (running) since 三 2023-07-26 22:21:09 EDT; 2min 41s ago
         Docs: man:corosync
               man:corosync.conf
               man:corosync_overview
      Process: 67381 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
     Main PID: 67389 (corosync)
       CGroup: /system.slice/corosync.service
               └─67389 corosync
    ​
    7月 26 22:21:09 mds001 corosync[67389]:  [TOTEM ] A new membership (192.168.10.21:9) was for...: 2
    7月 26 22:21:09 mds001 corosync[67389]:  [CPG   ] downlist left_list: 0 received
    7月 26 22:21:09 mds001 corosync[67389]:  [CPG   ] downlist left_list: 0 received
    7月 26 22:21:09 mds001 corosync[67389]:  [CPG   ] downlist left_list: 0 received
    7月 26 22:21:09 mds001 corosync[67389]:  [VOTEQ ] Waiting for all cluster members. Current v...: 2
    7月 26 22:21:09 mds001 corosync[67389]:  [QUORUM] This node is within the primary component ...ce.
    7月 26 22:21:09 mds001 corosync[67389]:  [QUORUM] Members[2]: 1 2
    7月 26 22:21:09 mds001 corosync[67389]:  [MAIN  ] Completed service synchronization, ready t...ce.
    7月 26 22:21:09 mds001 corosync[67381]: Starting Corosync Cluster Engine (corosync): [  OK  ]
    7月 26 22:21:09 mds001 systemd[1]: Started Corosync Cluster Engine.
    ​
    ● pacemaker.service - Pacemaker High Availability Cluster Manager
       Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
       Active: active (running) since 三 2023-07-26 22:21:10 EDT; 2min 40s ago
         Docs: man:pacemakerd
               https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html
     Main PID: 67424 (pacemakerd)
       CGroup: /system.slice/pacemaker.service
               ├─67424 /usr/sbin/pacemakerd -f
               ├─67425 /usr/libexec/pacemaker/cib
               ├─67426 /usr/libexec/pacemaker/stonithd
               ├─67427 /usr/libexec/pacemaker/lrmd
               ├─67428 /usr/libexec/pacemaker/attrd
               ├─67429 /usr/libexec/pacemaker/pengine
               └─67430 /usr/libexec/pacemaker/crmd
    ​
    7月 26 22:21:11 mds001 crmd[67430]:   notice: Node mds002 state is now member
    7月 26 22:21:11 mds001 stonith-ng[67426]:   notice: Node mds002 state is now member
    7月 26 22:21:11 mds001 crmd[67430]:   notice: The local CRM is operational
    7月 26 22:21:11 mds001 crmd[67430]:   notice: State transition S_STARTING -> S_PENDING
    7月 26 22:21:11 mds001 attrd[67428]:   notice: Node mds002 state is now member
    7月 26 22:21:11 mds001 attrd[67428]:   notice: Recorded local node as attribute writer (was unset)
    7月 26 22:21:13 mds001 crmd[67430]:   notice: Fencer successfully connected
    7月 26 22:21:32 mds001 crmd[67430]:  warning: Input I_DC_TIMEOUT received in state S_PENDIN...pped
    7月 26 22:21:32 mds001 crmd[67430]:   notice: State transition S_ELECTION -> S_PENDING
    7月 26 22:21:32 mds001 crmd[67430]:   notice: State transition S_PENDING -> S_NOT_DC
    Hint: Some lines were ellipsized, use -l to show in full.
  • Check the cluster status:

    [root@mds001 ~]# pcs cluster status
    Cluster Status:
     Stack: corosync
     Current DC: mds002 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
     Last updated: Wed Jul 26 22:24:52 2023
     Last change: Wed Jul 26 22:21:32 2023 by hacluster via crmd on mds002
     2 nodes configured
     0 resource instances configured
    ​
    PCSD Status:
      mds002: Online
      mds001: Online
  • Validate the cluster configuration (errors are reported because no STONITH resources are defined yet; this is resolved in section 7):

    [root@mds001 ~]# crm_verify -LV
       error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
       error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
       error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
    Errors found during check: config not valid

7. Configuring fencing (STONITH):

  • Install all the fence agents:

    [root@mds001 resource.d]# yum install -y fence-agents-all
  • List the fence agents available on this host:

    [root@mds001 resource.d]# pcs stonith list
    fence_amt_ws - Fence agent for AMT (WS)
    fence_apc - Fence agent for APC over telnet/ssh
    fence_apc_snmp - Fence agent for APC, Tripplite PDU over SNMP
    fence_bladecenter - Fence agent for IBM BladeCenter
    fence_brocade - Fence agent for HP Brocade over telnet/ssh
    fence_cisco_mds - Fence agent for Cisco MDS
    fence_cisco_ucs - Fence agent for Cisco UCS
    fence_compute - Fence agent for the automatic resurrection of OpenStack compute instances
    fence_drac5 - Fence agent for Dell DRAC CMC/5
    fence_eaton_snmp - Fence agent for Eaton over SNMP
    fence_emerson - Fence agent for Emerson over SNMP
    fence_eps - Fence agent for ePowerSwitch
    fence_evacuate - Fence agent for the automatic resurrection of OpenStack compute instances
    fence_heuristics_ping - Fence agent for ping-heuristic based fencing
    fence_hpblade - Fence agent for HP BladeSystem
    fence_ibmblade - Fence agent for IBM BladeCenter over SNMP
    fence_idrac - Fence agent for IPMI
    fence_ifmib - Fence agent for IF MIB
    fence_ilo - Fence agent for HP iLO
    fence_ilo2 - Fence agent for HP iLO
    fence_ilo3 - Fence agent for IPMI
    fence_ilo3_ssh - Fence agent for HP iLO over SSH
    fence_ilo4 - Fence agent for IPMI
    fence_ilo4_ssh - Fence agent for HP iLO over SSH
    fence_ilo5 - Fence agent for IPMI
    fence_ilo5_ssh - Fence agent for HP iLO over SSH
    fence_ilo_moonshot - Fence agent for HP Moonshot iLO
    fence_ilo_mp - Fence agent for HP iLO MP
    fence_ilo_ssh - Fence agent for HP iLO over SSH
    fence_imm - Fence agent for IPMI
    fence_intelmodular - Fence agent for Intel Modular
    fence_ipdu - Fence agent for iPDU over SNMP
    fence_ipmilan - Fence agent for IPMI
    fence_kdump - fencing agent for use with kdump crash recovery service
    fence_mpath - Fence agent for multipath persistent reservation
    fence_redfish - I/O Fencing agent for Redfish
    fence_rhevm - Fence agent for RHEV-M REST API
    fence_rsa - Fence agent for IBM RSA
    fence_rsb - I/O Fencing agent for Fujitsu-Siemens RSB
    fence_sbd - Fence agent for sbd
    fence_scsi - Fence agent for SCSI persistent reservation
    fence_virt - Fence agent for virtual machines
    fence_vmware_rest - Fence agent for VMware REST API
  • Enable STONITH:

    pcs property set stonith-enabled=true
  • Configure fence_heuristics_ping:

    pcs stonith create stonith-ping-mds001 fence_heuristics_ping ping_targets=192.168.10.21
    pcs stonith create stonith-ping-mds002 fence_heuristics_ping ping_targets=192.168.10.22
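  • Verify that the fencing resources were created and that STONITH is enabled (a quick check):

    pcs stonith show
    pcs property show stonith-enabled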

8. Configuring the LNet network:

  • Two network interfaces are needed on each server (ens33 and ens38); ens33 carries the tcp LNet and ens38 carries tcp2.

  • Edit the configuration file /etc/modprobe.d/lustre.conf:

    cat << eof > /etc/modprobe.d/lustre.conf
    options lnet networks="tcp(ens33),tcp2(ens38)"
    eof
  • Copy lustre.conf to the other node:

    scp /etc/modprobe.d/lustre.conf root@mds002:/etc/modprobe.d
  • On both nodes, reload the modules:

    lustre_rmmod && modprobe -v lustre
  • Check the NIDs on each node:

    [root@mds001 ~]# lctl list_nids
    192.168.10.21@tcp
    192.168.209.21@tcp2
    [root@mds002 ~]#  lctl list_nids
    192.168.10.22@tcp
    192.168.209.22@tcp2
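  • Optionally, cross-check LNet connectivity between the two nodes (a minimal sketch; a successful lctl ping prints the peer's NIDs):

    # from mds001
    lctl ping 192.168.209.22@tcp2
    # from mds002
    lctl ping 192.168.209.21@tcp2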

9. Adding the targets with the Lustre resource agent:

  • ocf:lustre:Lustre

    • Because its scope is narrower, it is simpler than ocf:heartbeat:Filesystem and therefore better suited to managing Lustre storage resources

    • Developed specifically for Lustre OSDs; the RA is distributed by the Lustre project and is available in Lustre 2.10.0 and later

    • ocf:heartbeat:ZFS: (must be installed separately and relies on ZFS storage pools)

  • Download it (note that it must be installed on both nodes):

    wget https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/lustre-resource-agents-2.12.1-1.el7.x86_64.rpm
    rpm -ivh lustre-resource-agents-2.12.1-1.el7.x86_64.rpm
  • Create the resources:

    Parameter          Description
    <resource name>    name of the resource
    target=            path to the block device backing the Lustre target
    mountpoint=        mount point for the OSD
    pcs resource create global-mgs ocf:lustre:Lustre target=/dev/sdb mountpoint=/mnt/mgs
    pcs resource create global-mdt1 ocf:lustre:Lustre target=/dev/sdc mountpoint=/mnt/mdt1
    pcs resource create global-mdt2 ocf:lustre:Lustre target=/dev/sdd mountpoint=/mnt/mdt2
    pcs resource create global-ost1 ocf:lustre:Lustre target=/dev/sde mountpoint=/mnt/ost1
    pcs resource create global-ost2 ocf:lustre:Lustre target=/dev/sdf mountpoint=/mnt/ost2
  • Configure location preferences. Do not put the resources into a resource group: a group binds all the resources to a single node and makes these preferences meaningless.

    pcs constraint location add global-constraint-mgs global-mgs mds001 10
    pcs constraint location add global-constraint-mdt1 global-mdt1 mds001 10
    pcs constraint location add global-constraint-mdt2 global-mdt2 mds002 10
    pcs constraint location add global-constraint-ost1 global-ost1 mds001 10
    pcs constraint location add global-constraint-ost2 global-ost2 mds002 10
  • Set the start order so that the MGS starts before the MDTs and OSTs (see the verification sketch after these commands):

    pcs constraint order start global-mgs then start global-mdt1
    pcs constraint order start global-mgs then start global-mdt2
    pcs constraint order start global-mgs then start global-ost1
    pcs constraint order start global-mgs then start global-ost2
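  • Review what has been configured so far (a quick check):

    pcs constraint location show
    pcs constraint order show
    pcs status resources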

10. Creating the LNet monitoring resource:

  • Pacemaker can be configured to monitor various aspects of the cluster servers to help judge overall system health. This provides additional data points for deciding where resources should run.

  • Lustre 2.10 introduced two monitoring resource agents:

    • ocf:lustre:healthLNET – monitors LNet connectivity (requires the LNet network interfaces to be configured)

    • ocf:lustre:healthLUSTRE – monitors the health of Lustre itself (must be installed)

  • Create it (it is not tied to any particular node):

    Parameter     Description
    lctl          tells the resource agent to use lctl ping to monitor the LNet NIDs; if unset, the regular system ping command is used
    multiplier    a positive integer multiplied by the number of machines that respond to the ping; the result should be greater than the resource-stickiness value
    device        the network device to monitor, e.g. eth1 or ib0
    host_list     space-separated list of LNet NIDs to ping; if lctl=false, host_list should contain regular hostnames or IP addresses
    --clone       tells Pacemaker to start an instance of the resource on every node in the cluster
    pcs resource create ping-lnet ocf:lustre:healthLNET \
    lctl=true \
    multiplier=1001 \
    device=ens33 \
    host_list="192.168.209.21@tcp2 192.168.209.22@tcp2" \
    --clone
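  • To confirm the clone is running on every node and to see the score attribute it maintains (a sketch; the attribute name is agent-defined, so check `pcs resource describe ocf:lustre:healthLNET` if unsure which name to look for):

    pcs resource show ping-lnet-clone
    crm_mon -A1          # -A shows node attributes; look for the agent's score on each node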

11. Creating the Lustre monitoring resource:

  • ocf:lustre:healthLUSTRE follows the same implementation model as ocf:lustre:healthLNET.

  • Instead of monitoring LNet NIDs, however, ocf:lustre:healthLUSTRE monitors the contents of the Lustre health_check file and maintains a node attribute named lustred.

  • Create it:

    pcs resource create global-healthLUSTRE ocf:lustre:healthLUSTRE --clone 

12. Failover test:
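Before the failover, the resources are spread across mds004 and mds005; after mds004 goes offline, everything runs on mds005, as the two pcs status listings below show. A minimal sketch of how to trigger and undo the failover (run from the node that stays up):

# stop cluster services on mds004 (or simply power it off) and watch the resources move
pcs cluster stop mds004
pcs status
# bring the node back afterwards
pcs cluster start mds004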

[root@mds004 ~]# pcs status
Cluster name: MyUpdate
Stack: corosync
Current DC: mds004 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Aug 10 06:09:44 2023
Last change: Thu Aug 10 06:07:46 2023 by root via cibadmin on mds004
​
2 nodes configured
11 resource instances configured
​
Online: [ mds004 mds005 ]
​
Full list of resources:
​
 Clone Set: ping-lnet-clone [ping-lnet]
     Started: [ mds004 mds005 ]
 Clone Set: global-healthLUSTRE-clone [global-healthLUSTRE]
     Started: [ mds004 mds005 ]
 stonith-ping-mds004    (stonith:fence_heuristics_ping):        Started mds004
 stonith-ping-mds005    (stonith:fence_heuristics_ping):        Started mds005
 global-mgs     (ocf::lustre:Lustre):   Started mds004
 global-mdt1    (ocf::lustre:Lustre):   Started mds004
 global-mdt2    (ocf::lustre:Lustre):   Started mds005
 global-ost1    (ocf::lustre:Lustre):   Started mds004
 global-ost2    (ocf::lustre:Lustre):   Started mds005
​
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@mds005 lustre2.12.1]# pcs status
Cluster name: MyUpdate
Stack: corosync
Current DC: mds005 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Thu Aug 10 06:11:16 2023
Last change: Thu Aug 10 06:07:46 2023 by root via cibadmin on mds004
​
2 nodes configured
11 resource instances configured
​
Online: [ mds005 ]
OFFLINE: [ mds004 ]
​
Full list of resources:
​
 Clone Set: ping-lnet-clone [ping-lnet]
     Started: [ mds005 ]
     Stopped: [ mds004 ]
 Clone Set: global-healthLUSTRE-clone [global-healthLUSTRE]
     Started: [ mds005 ]
     Stopped: [ mds004 ]
 stonith-ping-mds004    (stonith:fence_heuristics_ping):        Started mds005
 stonith-ping-mds005    (stonith:fence_heuristics_ping):        Started mds005
 global-mgs     (ocf::lustre:Lustre):   Started mds005
 global-mdt1    (ocf::lustre:Lustre):   Started mds005
 global-mdt2    (ocf::lustre:Lustre):   Started mds005
 global-ost1    (ocf::lustre:Lustre):   Started mds005
 global-ost2    (ocf::lustre:Lustre):   Started mds005
​
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

2. Remote client test:

1. Download and install:

  • Download:

    RPM package                              Description
    kernel-*.el7_lustre.x86_64.rpm           Linux kernel with the Lustre patches applied
    kernel-devel-*.el7_lustre.x86_64.rpm     parts of the kernel tree needed to build third-party modules (such as network drivers)
    kernel-headers-*.el7_lustre.x86_64.rpm   header files under /usr/include used to compile user-space code and code that interfaces with the kernel
    lustre-client-dkms-*.el7.noarch.rpm      alternative to the kmod-lustre-client RPM, with Dynamic Kernel Module Support (DKMS)
    lustre-client-*.el7.x86_64.rpm           client command-line tools
    mkdir lustre2.12 && cd lustre2.12
    wget \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kernel-3.10.0-957.10.1.el7_lustre.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kernel-devel-3.10.0-957.10.1.el7_lustre.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/server/RPMS/x86_64/kernel-headers-3.10.0-957.10.1.el7_lustre.x86_64.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/client/RPMS/x86_64/lustre-client-dkms-2.12.1-1.el7.noarch.rpm \
    https://downloads.whamcloud.com/public/lustre/lustre-2.12.1/el7/client/RPMS/x86_64/lustre-client-2.12.1-1.el7.x86_64.rpm
  • Install (gcc and other dependencies are needed):

    [root@client lustre2.12]# rpm -ivh *.rpm
    错误:依赖检测失败:
            /usr/bin/expect 被 lustre-client-dkms-2.12.1-1.el7.noarch 需要
            dkms >= 2.2.0.3-28.git.7c3e7c5 被 lustre-client-dkms-2.12.1-1.el7.noarch 需要
            gcc 被 lustre-client-dkms-2.12.1-1.el7.noarch 需要
            kernel-devel 被 lustre-client-dkms-2.12.1-1.el7.noarch 需要
            libyaml-devel 被 lustre-client-dkms-2.12.1-1.el7.noarch 需要
  • Install the EPEL repository:

    yum install -y epel-release
  • Install the dependencies:

    yum install -y perl expect gcc kernel-devel libyaml-devel dkms
  • Install the kernel RPMs:

    [root@client lustre2.12]# rpm -ivh kernel-*.rpm --force
  • The installation succeeds, but the log shows that the Lustre kernel modules were not built automatically, because the installed kernel headers do not match the running kernel version:

    [root@client ~]# rpm -qa | grep kernel
    kernel-3.10.0-1160.el7.x86_64
    kernel-tools-3.10.0-1160.el7.x86_64
    kernel-headers-3.10.0-957.10.1.el7_lustre.x86_64
    kernel-3.10.0-957.10.1.el7_lustre.x86_64
    kernel-tools-libs-3.10.0-1160.el7.x86_64
    kernel-devel-3.10.0-957.10.1.el7_lustre.x86_64
    kernel-devel-3.10.0-1160.92.1.el7.x86_64
  • Reboot the host and check the kernel:

    [root@client ~]# init 6
    [root@client ~]# uname -r
    3.10.0-957.10.1.el7_lustre.x86_64
  • Install the client RPMs:

    [root@client ~]# rpm -ivh lustre-client-dkms-2.12.1-1.el7.noarch.rpm
    准备中...                          ################################# [100%]
    正在升级/安装...
       1:lustre-client-dkms-2.12.1-1.el7  ################################# [100%]
    Loading new lustre-client-2.12.1 DKMS files...
    Deprecated feature: REMAKE_INITRD (/usr/src/lustre-client-2.12.1/dkms.conf)
    Building for 3.10.0-957.10.1.el7_lustre.x86_64
    Building initial module for 3.10.0-957.10.1.el7_lustre.x86_64
    Deprecated feature: REMAKE_INITRD (/var/lib/dkms/lustre-client/2.12.1/source/dkms.conf)
    configure: WARNING:
    ​
    No selinux package found, unable to build selinux enabled tools
    ​
    Done.
    Deprecated feature: REMAKE_INITRD (/var/lib/dkms/lustre-client/2.12.1/source/dkms.conf)
    Deprecated feature: REMAKE_INITRD (/var/lib/dkms/lustre-client/2.12.1/source/dkms.conf)
    ​
    lnet_selftest.ko.xz:
    Running module version sanity check.
     - Original module
       - No original module exists within this kernel
     - Installation
       - Installing to /lib/modules/3.10.0-957.10.1.el7_lustre.x86_64/extra/
    ​
    lnet.ko.xz:
    .....
    [root@client ~]# rpm -ivh lustre-client-2.12.1-1.el7.x86_64.rpm
    准备中...                          ################################# [100%]
    正在升级/安装...
       1:lustre-client-2.12.1-1.el7       ################################# [100%]
  • Load the modules:

    [root@client ~]# modprobe lustre
    [root@client ~]# lsmod | grep lustre
    lustre                754256  0 
    lmv                   177987  1 lustre
    mdc                   237615  1 lustre
    lov                   314554  1 lustre
    ptlrpc               1345789  7 fid,fld,lmv,mdc,lov,osc,lustre
    obdclass             1741312  8 fid,fld,lmv,mdc,lov,osc,lustre,ptlrpc
    lnet                  600632  6 lmv,osc,lustre,obdclass,ptlrpc,ksocklnd
    libcfs                415252  11 fid,fld,lmv,mdc,lov,osc,lnet,lustre,obdclass,ptlrpc,ksocklnd
  • Check the Lustre version:

    [root@mds001 ~]# modinfo lustre
    filename:       /lib/modules/3.10.0-957.10.1.el7_lustre.x86_64/extra/lustre/fs/lustre.ko
    license:        GPL
    version:        2.12.1
    description:    Lustre Client File System
    author:         OpenSFS, Inc. <http://www.lustre.org/>
    retpoline:      Y
    rhelversion:    7.6
    srcversion:     E50D950B04B4044ABCBCFA3
    depends:        obdclass,ptlrpc,libcfs,lnet,lmv,mdc,lov
    vermagic:       3.10.0-957.10.1.el7_lustre.x86_64 SMP mod_unload modversions 

2. Mounting the client on the dual-node Lustre filesystem:

  • Create the directories and mount the remote filesystem (see the fstab sketch after these commands for making the mounts persistent):

    [root@client ~]# mkdir /mnt/global-client1;mount -t lustre 192.168.10.21@tcp:192.168.10.22@tcp:/global /mnt/global-client1
    [root@client ~]# mkdir /mnt/global-client2;mount -t lustre 192.168.10.21@tcp:192.168.10.22@tcp:/global /mnt/global-client2
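  • Optionally, make the mounts persistent across reboots via /etc/fstab (a minimal sketch; _netdev delays the mount until the network is up):

    cat << eof >> /etc/fstab
    192.168.10.21@tcp:192.168.10.22@tcp:/global /mnt/global-client1 lustre defaults,_netdev 0 0
    192.168.10.21@tcp:192.168.10.22@tcp:/global /mnt/global-client2 lustre defaults,_netdev 0 0
    eof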
  • Check:

    [root@client ~]# mount | grep lustre
    192.168.10.21@tcp:192.168.10.22@tcp:/global on /mnt/global-client1 type lustre (rw,seclabel,lazystatfs)
    192.168.10.21@tcp:192.168.10.22@tcp:/global on /mnt/global-client2 type lustre (rw,seclabel,lazystatfs)
    [root@client ~]# lfs df
    UUID                   1K-blocks        Used   Available Use% Mounted on
    global-MDT0000_UUID      2330824       16548     2106120   1% /mnt/global-client1[MDT:0]
    global-MDT0002_UUID      2330824       16468     2106200   1% /mnt/global-client1[MDT:2]
    global-OST0000_UUID      3865564       34084     3605384   1% /mnt/global-client1[OST:0]
    global-OST0002_UUID      3865564       34088     3605380   1% /mnt/global-client1[OST:2]
    ​
    filesystem_summary:      7731128       68172     7210764   1% /mnt/global-client1
    ​
    UUID                   1K-blocks        Used   Available Use% Mounted on
    global-MDT0000_UUID      2330824       16548     2106120   1% /mnt/global-client2[MDT:0]
    global-MDT0002_UUID      2330824       16468     2106200   1% /mnt/global-client2[MDT:2]
    global-OST0000_UUID      3865564       34084     3605384   1% /mnt/global-client2[OST:0]
    global-OST0002_UUID      3865564       34088     3605380   1% /mnt/global-client2[OST:2]
    ​
    filesystem_summary:      7731128       68172     7210764   1% /mnt/global-client2

3. Test:

  • From the first client mount (/mnt/global-client1), create a file and write some data:

    [root@client ~]# echo "Hello Lustre,I am Centos7.0-client  client1" > /mnt/global-client1/client-test
  • From the second client mount (/mnt/global-client2), the new file client-test is visible (alongside first_file, created earlier) and its data can be read:

    [root@client ~]# ll /mnt/global-client2
    总用量 5
    -rw-r--r--. 1 root root 44 7月  26 05:16 client-test
    -rw-r--r--. 1 root root 26 7月  26 02:51 first_file
    [root@client ~]# cat /mnt/global-client2/client-test
    Hello Lustre,I am Centos7.0-client  client1
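  • Optionally, stripe a file across both OSTs and confirm where its objects are placed (a minimal sketch; striped_file is just an illustrative name):

    lfs setstripe -c 2 /mnt/global-client1/striped_file   # stripe count 2 = use both OSTs
    dd if=/dev/zero of=/mnt/global-client1/striped_file bs=1M count=8
    lfs getstripe /mnt/global-client1/striped_file        # shows the OST objects backing the file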

    This completes the setup of the highly available Lustre cluster.

