Deploying a GFS Cluster on CentOS 7

huzhenwei

Category: Linux. Tags: CentOS7, storage, GFS, cluster file system

Article link: https://blog.csdn.net/huzhenwei/article/details/80708592

Preparation

Learn about GFS (read the documentation carefully!)

The GFS discussed in this article is Red Hat's Global File System, not Google's Google File System. The official GFS documentation is available at:
RHEL 6 (Chinese): https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/6/html/global_file_system_2/
RHEL 7 (English): https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/global_file_system_2/index

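Set the hostname and hosts file on every cluster host

On each host in the cluster, edit /etc/hostname so that the hostname is unique within the cluster, and list the hostname and IP address of every cluster host in /etc/hosts. For example: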

[root@a01 ~]# cat /etc/hostname
a01

[root@a01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.201 a01
192.168.100.202 a02
192.168.100.203 a03
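
A quick way to catch a missing or duplicated hosts entry before going further is sketched below. The function name check_hosts_file is my own invention for illustration, not part of any tool; it only assumes the usual "IP name [alias...]" layout of a hosts file.

```shell
#!/bin/sh
# check_hosts_file HOSTS_FILE NODE...
# Hypothetical helper: succeed only if every NODE has exactly one
# uncommented entry in HOSTS_FILE.
check_hosts_file() {
    hosts_file=$1
    shift
    status=0
    for node in "$@"; do
        # Count lines whose name fields (2nd field onward) contain the node name.
        count=$(awk -v n="$node" '
            /^[[:space:]]*#/ { next }
            {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break          # trailing comment starts here
                    if ($i == n) { c++; break }
                }
            }
            END { print c + 0 }' "$hosts_file")
        if [ "$count" -ne 1 ]; then
            echo "$node: expected 1 entry in $hosts_file, found $count" >&2
            status=1
        fi
    done
    return "$status"
}
```

On each host it could be called as: check_hosts_file /etc/hosts a01 a02 a03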

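Make sure every host is connected to the SAN or IP-SAN

You can follow one of my earlier articles (but do not perform its last step, "partition and mount the iSCSI volume on the client", or you will run into many unexpected problems):
https://blog.csdn.net/huzhenwei/article/details/80690623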

Install the required packages

yum -y install gfs2-utils lvm2-cluster
yum -y install lvm2-sysvinit

Configure the corosync service

Create the file /etc/corosync/corosync.conf with content like the following:

# Please read the corosync.conf.5 manual page
totem {
        version: 2
        cluster_name: cluster0
        # crypto_cipher and crypto_hash: Used for mutual node authentication.
        # If you choose to enable this, then do remember to create a shared
        # secret with "corosync-keygen".
        # enabling crypto_cipher, requires also enabling of crypto_hash.
        crypto_cipher: none
        crypto_hash: none
        clear_node_high_bit: yes

        # interface: define at least one interface to communicate
        # over. If you define more than one interface stanza, you must
        # also set rrp_mode.
        interface {
                # Rings must be consecutively numbered, starting at 0.
                ringnumber: 0
                # This is normally the *network* address of the
                # interface to bind to. This ensures that you can use
                # identical instances of this configuration file
                # across all your cluster nodes, without having to
                # modify this option.
                bindnetaddr: 192.168.100.0
                # However, if you have multiple physical network
                # interfaces configured for the same subnet, then the
                # network address alone is not sufficient to identify
                # the interface Corosync should bind to. In that case,
                # configure the *host* address of the interface
                # instead:
                # bindnetaddr: 192.168.1.1
                # When selecting a multicast address, consider RFC
                # 2365 (which, among other things, specifies that
                # 239.255.x.x addresses are left to the discretion of
                # the network administrator). Do not reuse multicast
                # addresses across multiple Corosync clusters sharing
                # the same network.
                mcastaddr: 239.255.1.1
                # Corosync uses the port you specify here for UDP
                # messaging, and also the immediately preceding
                # port. Thus if you set this to 5405, Corosync sends
                # messages over UDP ports 5405 and 5404.
                mcastport: 5405
                # Time-to-live for cluster communication packets. The
                # number of hops (routers) that this ring will allow
                # itself to pass. Note that multicast routing must be
                # specifically enabled on most network routers.
                ttl: 1
        }
}

logging {
        # Log the source file and line where messages are being
        # generated. When in doubt, leave off. Potentially useful for
        # debugging.
        fileline: off
        # Log to standard error. When in doubt, set to no. Useful when
        # running in the foreground (when invoking "corosync -f")
        to_stderr: no
        # Log to a log file. When set to "no", the "logfile" option
        # must not be set.
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        # Log to the system log daemon. When in doubt, set to yes.
        to_syslog: yes
        # Log debug messages (very verbose). When in doubt, leave off.
        debug: on
        # Log messages with time stamps. When in doubt, set to on
        # (unless you are only logging to syslog, where double
        # timestamps can be annoying).
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        expected_votes: 3
        votes: 1
}

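Key configuration items to note:
cluster_name: the cluster name set here is needed later when formatting the logical volume as gfs2.
clear_node_high_bit: yes: required by the dlm service, which otherwise fails to start.
quorum {} section: the cluster voting settings; this section is required. For a two-node cluster, use a configuration like the following: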

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        two_node: 1
        expected_votes: 2
        #votes: 1
}
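
Since a wrong cluster_name or a missing clear_node_high_bit only shows up later (at mkfs or when dlm starts), it can help to check the file right after writing it. The sketch below is just a text check with awk; corosync_setting and check_corosync_conf are hypothetical names of mine, and the parsing assumes the simple "key: value" lines used in the example above.

```shell
#!/bin/sh
# corosync_setting CONF KEY
# Print the value of the first uncommented "KEY: value" line in CONF.
corosync_setting() {
    conf=$1
    key=$2
    awk -v k="$key" '
        /^[[:space:]]*#/ { next }
        {
            line = $0
            sub(/^[[:space:]]+/, "", line)
            if (index(line, k ":") == 1) {
                sub(/^[^:]*:[[:space:]]*/, "", line)
                sub(/[[:space:]]+$/, "", line)
                print line
                exit
            }
        }' "$conf"
}

# check_corosync_conf CONF
# Verify the three items highlighted in this article.
check_corosync_conf() {
    conf=$1
    [ -n "$(corosync_setting "$conf" cluster_name)" ] ||
        { echo "cluster_name is not set" >&2; return 1; }
    [ "$(corosync_setting "$conf" clear_node_high_bit)" = "yes" ] ||
        { echo "clear_node_high_bit is not yes" >&2; return 1; }
    [ "$(corosync_setting "$conf" provider)" = "corosync_votequorum" ] ||
        { echo "quorum provider is not corosync_votequorum" >&2; return 1; }
}
```

Typical use would be: check_corosync_conf /etc/corosync/corosync.conf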

Configure LVM

# Enable LVM cluster mode.
# This command automatically changes the locking_type and use_lvmetad options in /etc/lvm/lvm.conf
[root@a01 ~]# lvmconf --enable-cluster

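The resulting differences are shown below. On a default installation the previous locking_type is 1; in this capture the old file already contained 3, so only its indentation differs: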
[root@a01 ~]# diff /etc/lvm/lvm.conf /etc/lvm/lvm.conf.lvmconfold
771c771
<     locking_type = 3
---
>       locking_type = 3
940c940
<     use_lvmetad = 0
---
>       use_lvmetad = 1
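
To confirm the change on the other hosts without reading the diff by eye, something like the following could be used. It is only a sketch: lvm_setting and lvm_cluster_mode_enabled are names I made up, and the parsing assumes the plain "key = value" lines that lvm.conf uses.

```shell
#!/bin/sh
# lvm_setting CONF KEY
# Print the value of the first uncommented "KEY = value" line in CONF.
lvm_setting() {
    awk -v k="$2" '
        /^[[:space:]]*#/ { next }
        $1 == k && $2 == "=" { print $3; exit }' "$1"
}

# lvm_cluster_mode_enabled CONF
# Succeed if CONF has the two values that lvmconf --enable-cluster sets here.
lvm_cluster_mode_enabled() {
    [ "$(lvm_setting "$1" locking_type)" = "3" ] &&
    [ "$(lvm_setting "$1" use_lvmetad)" = "0" ]
}
```

For example: lvm_cluster_mode_enabled /etc/lvm/lvm.conf && echo "cluster locking is on"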

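Start the services

Order matters: first start corosync on all cluster hosts, and once the cluster status is healthy, start the dlm and clvmd services on each host.

# Start the corosync service on all cluster hosts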
systemctl enable corosync
systemctl start corosync

Once corosync has started, use corosync-quorumtool -s to check the cluster status and make sure Quorate is Yes. For example:

[root@a01 ~]# corosync-quorumtool -s
Quorum information
------------------
Date:             Fri Jun 15 16:56:07 2018
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1084777673
Ring ID:          1084777673/380
Quorate:          Yes
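
If this check is to be scripted, the Quorate line can be tested directly. The sketch below reads the tool's output from standard input, so it makes no assumption beyond the "Quorate: Yes" line format shown above; is_quorate is a hypothetical name.

```shell
#!/bin/sh
# is_quorate
# Read "corosync-quorumtool -s" output on stdin; succeed only if it
# contains a "Quorate:" line whose value is "Yes".
is_quorate() {
    awk '
        $1 == "Quorate:" { found = 1; ok = ($2 == "Yes") }
        END { exit !(found && ok) }'
}
```

For example: corosync-quorumtool -s | is_quorate && echo "cluster is quorate"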

Start the dlm and clvmd services on every cluster host

systemctl enable dlm
systemctl start dlm

systemctl enable clvmd
systemctl start clvmd
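
Because the start order matters, a script that brings a node up may want to wait for quorum before starting dlm and clvmd. Below is a minimal retry helper as a sketch; wait_until is my own name, and the attempt count and delay in the usage comment are arbitrary values, not recommendations.

```shell
#!/bin/sh
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARG...]
# Run COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS after each
# failure. Succeeds as soon as COMMAND succeeds, fails otherwise.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (not run here): start dlm and clvmd only once the node is quorate.
#   quorate() { corosync-quorumtool -s | grep -q '^Quorate:.*Yes'; }
#   wait_until 30 2 quorate && systemctl start dlm && systemctl start clvmd
```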

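Create the clustered volume group and format the GFS2 volume

Run the following commands on one host in the cluster:

# Show physical volume information. If the block device from the storage server is not listed, make sure every host is connected to the SAN or IP-SAN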
pvscan
pvs

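# Create the volume group and logical volume, then format the logical volume as gfs2
vgcreate -Ay -cy gfsvg /dev/mapper/mpatha
lvcreate -L 800G -n gfsvol1 gfsvg
lvs -o +devices gfsvg
# The -j 4 option below sets the number of journals to 4; this is usually the number of cluster hosts N plus 1.
# When the cluster grows, use the gfs2_jadd command to add journals.
# The "cluster0" after -t is the cluster name; make sure it matches cluster_name in corosync.conf.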
mkfs.gfs2 -p lock_dlm -t cluster0:gfsvolfs -j 4 /dev/gfsvg/gfsvol1
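
The N+1 journal rule and the cluster:fsname lock table name are easy to get wrong by hand, so here is a small sketch that builds the command line from its parts. It only prints the command and never formats anything; journal_count and gfs2_mkfs_command are hypothetical helper names.

```shell
#!/bin/sh
# journal_count NODES
# Journals per the rule of thumb in this article: cluster hosts plus one.
journal_count() {
    echo $(( $1 + 1 ))
}

# gfs2_mkfs_command CLUSTER_NAME FS_NAME NODES DEVICE
# Print (do not run) the mkfs.gfs2 command for the given parameters.
gfs2_mkfs_command() {
    cluster=$1
    fsname=$2
    nodes=$3
    device=$4
    echo "mkfs.gfs2 -p lock_dlm -t ${cluster}:${fsname} -j $(journal_count "$nodes") ${device}"
}
```

For the three-node cluster in this article, gfs2_mkfs_command cluster0 gfsvolfs 3 /dev/gfsvg/gfsvol1 prints the same command as above.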

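Mount the gfs2 logical volume

Run the following commands on every host in the cluster:

# Create the mount point directory
mkdir /mnt/iscsigfs
# Mount the gfs2 volume
# If a host reports that the device /dev/gfsvg/gfsvol1 does not exist, the host was booted before the logical volume was created; rebooting the host fixes this.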
mount -t gfs2 /dev/gfsvg/gfsvol1 /mnt/iscsigfs -o noatime,nodiratime
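
To verify the mount on each host afterwards, the mount table can be checked for a gfs2 entry at the mount point. This is a sketch; is_gfs2_mounted is a hypothetical name, and the optional second argument exists only so that the function can be tried against a sample file instead of /proc/mounts.

```shell
#!/bin/sh
# is_gfs2_mounted MOUNT_POINT [MOUNTS_FILE]
# Succeed if MOUNTS_FILE (default /proc/mounts) lists MOUNT_POINT
# with file system type gfs2.
is_gfs2_mounted() {
    awk -v mp="$1" '
        $2 == mp && $3 == "gfs2" { found = 1 }
        END { exit !found }' "${2:-/proc/mounts}"
}
```

For example: is_gfs2_mounted /mnt/iscsigfs || echo "gfs2 volume is not mounted"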

To mount automatically at boot, see:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/storage_administration_guide/iscsi-api

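Common errors

LVM commands such as pvs and vgcreate report connect() errors
An example follows; the cause is that clvmd has not started properly.
connect() failed on local socket: No such file or directory
Internal cluster locking initialisation failed.
WARNING: Falling back to local file-based locking.
Volume Groups with the clustered attribute will be inaccessible.

LVM commands such as pvs and vgcreate report "Skipping clustered volume group XXX" or "Device XXX excluded by a filter"
This is usually caused by an earlier operation such as vgremove, or by stale volume group metadata left on the block device. Reformatting the block device resolves it (note that this erases all data on the device). For example: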
mkfs.xfs /dev/mapper/mpatha
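
The two cases above can be folded into a small lookup for use in scripts or runbooks. This sketch matches only the message fragments quoted in this article; lvm_error_hint is a hypothetical name and the hints simply restate the causes given above.

```shell
#!/bin/sh
# lvm_error_hint MESSAGE
# Map an LVM error message to the likely cause described in this article.
# Returns non-zero when the message is not one of the known cases.
lvm_error_hint() {
    case $1 in
        *"connect() failed on local socket"*)
            echo "clvmd is not running properly" ;;
        *"Skipping clustered volume group"*|*"excluded by a filter"*)
            echo "stale volume group metadata on the block device" ;;
        *)
            echo "unknown"
            return 1 ;;
    esac
}
```

For example: lvm_error_hint "$(pvs 2>&1 >/dev/null)"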