Deploying a GFS Cluster on CentOS 7

Preparation

Learn about GFS (read the documentation carefully!)

The GFS discussed in this article is Red Hat's Global File System, not Google's Google File System. The official GFS documentation is available here:
RHEL 6 (Chinese): https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/6/html/global_file_system_2/
RHEL 7 (English): https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/global_file_system_2/index

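Set the hostname and hosts file on every cluster host

On every host in the cluster, edit the hostname file and make sure the host name is unique within the cluster. In the hosts file, list the hostname and IP address of every host in the cluster. For example: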

[root@a01 ~]# cat /etc/hostname
a01

[root@a01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.201 a01
192.168.100.202 a02
192.168.100.203 a03
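
Because every node must be able to resolve every other node, it can help to check the hosts file mechanically on each host. The following is a minimal sketch, not part of the original procedure; the function name missing_hosts is hypothetical, and it only checks that each name appears in the file, not that the address is correct.

```shell
# Sketch: report cluster node names that are missing from a hosts file.
# Usage: missing_hosts HOSTS_FILE NODE...
# Prints each missing node name on its own line; prints nothing if all are present.
missing_hosts() {
    hosts_file=$1
    shift
    for node in "$@"; do
        # Strip comments, then look for the node name in any field after the address.
        if ! awk -v name="$node" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$hosts_file"; then
            echo "$node"
        fi
    done
}
```

On a cluster host it could be called as: missing_hosts /etc/hosts a01 a02 a03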

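Make sure every host is connected to the SAN or IP-SAN

See an earlier article of mine (but do not perform its last step, "Partition and mount the iSCSI volume on the client", or you will run into many unexpected problems):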
https://blog.csdn.net/huzhenwei/article/details/80690623

Install the required packages

yum -y install gfs2-utils lvm2-cluster
yum -y install lvm2-sysvinit

Configure the corosync service

Create the file /etc/corosync/corosync.conf. Example contents:

# Please read the corosync.conf.5 manual page
totem {
        version: 2
        cluster_name: cluster0
        # crypto_cipher and crypto_hash: Used for mutual node authentication.
        # If you choose to enable this, then do remember to create a shared
        # secret with "corosync-keygen".
        # enabling crypto_cipher, requires also enabling of crypto_hash.
        crypto_cipher: none
        crypto_hash: none
        clear_node_high_bit: yes

        # interface: define at least one interface to communicate
        # over. If you define more than one interface stanza, you must
        # also set rrp_mode.
        interface {
                # Rings must be consecutively numbered, starting at 0.
                ringnumber: 0
                # This is normally the *network* address of the
                # interface to bind to. This ensures that you can use
                # identical instances of this configuration file
                # across all your cluster nodes, without having to
                # modify this option.
                bindnetaddr: 192.168.100.0
                # However, if you have multiple physical network
                # interfaces configured for the same subnet, then the
                # network address alone is not sufficient to identify
                # the interface Corosync should bind to. In that case,
                # configure the *host* address of the interface
                # instead:
                # bindnetaddr: 192.168.1.1
                # When selecting a multicast address, consider RFC
                # 2365 (which, among other things, specifies that
                # 239.255.x.x addresses are left to the discretion of
                # the network administrator). Do not reuse multicast
                # addresses across multiple Corosync clusters sharing
                # the same network.
                mcastaddr: 239.255.1.1
                # Corosync uses the port you specify here for UDP
                # messaging, and also the immediately preceding
                # port. Thus if you set this to 5405, Corosync sends
                # messages over UDP ports 5405 and 5404.
                mcastport: 5405
                # Time-to-live for cluster communication packets. The
                # number of hops (routers) that this ring will allow
                # itself to pass. Note that multicast routing must be
                # specifically enabled on most network routers.
                ttl: 1
        }
}

logging {
        # Log the source file and line where messages are being
        # generated. When in doubt, leave off. Potentially useful for
        # debugging.
        fileline: off
        # Log to standard error. When in doubt, set to no. Useful when
        # running in the foreground (when invoking "corosync -f")
        to_stderr: no
        # Log to a log file. When set to "no", the "logfile" option
        # must not be set.
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        # Log to the system log daemon. When in doubt, set to yes.
        to_syslog: yes
        # Log debug messages (very verbose). When in doubt, leave off.
        debug: on
        # Log messages with time stamps. When in doubt, set to on
        # (unless you are only logging to syslog, where double
        # timestamps can be annoying).
        timestamp: on
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        expected_votes: 3
        votes: 1
}

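Key configuration items to note:
cluster_name: the cluster name set here is needed later when formatting the logical volume as gfs2.
clear_node_high_bit: yes: required by the dlm service, which otherwise fails to start.
quorum {} section: cluster voting settings; this section is mandatory. For a two-node cluster, the following configuration can be used as a reference: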

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        two_node: 1
        expected_votes: 2
        #votes: 1
}
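
The reason a two-node cluster needs special handling follows from the majority rule: with one vote per node, quorum requires floor(expected_votes / 2) + 1 votes. The sketch below works through that arithmetic; the function names are hypothetical and it assumes every node carries exactly one vote.

```shell
# Votes needed for quorum under a simple majority: floor(expected_votes / 2) + 1.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# Number of one-vote nodes that can fail while the cluster stays quorate.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}
```

With expected_votes: 3 the threshold is 2, so one node may fail. With expected_votes: 2 the threshold is also 2, so no node may fail under plain majority voting, which is why the two-node example sets two_node: 1.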

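Configure LVM

# Enable LVM cluster mode.
# This command automatically changes the values of the locking_type and use_lvmetad options in /etc/lvm/lvm.conf
[root@a01 ~]# lvmconf --enable-cluster

The differences after the change are as follows: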
[root@a01 ~]# diff /etc/lvm/lvm.conf /etc/lvm/lvm.conf.lvmconfold
771c771
<     locking_type = 3
---
>       locking_type = 3
940c940
<     use_lvmetad = 0
---
>       use_lvmetad = 1
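
To confirm the result on each host without reading a diff, the active values can be pulled out of the file directly. This is a rough sketch under the assumption that each setting sits on its own line in the form "name = value"; the function name lvm_setting is hypothetical.

```shell
# Print the active (uncommented) value of a setting from an lvm.conf-style file.
# Usage: lvm_setting FILE NAME
lvm_setting() {
    # Commented lines start with "#", so their first field never equals the key.
    awk -v key="$2" '$1 == key && $2 == "=" { print $3 }' "$1"
}
```

After lvmconf --enable-cluster, calling lvm_setting /etc/lvm/lvm.conf locking_type should print 3, and use_lvmetad should print 0, matching the new side of the diff above.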

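Start the services

Mind the order: first start corosync on all hosts in the cluster; once the cluster state is healthy, start the dlm and clvmd services on each host.

# Start the corosync service on all hosts in the cluster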
systemctl enable corosync
systemctl start corosync

Once corosync has started, use corosync-quorumtool -s to check the cluster state and make sure Quorate is Yes. For example:

[root@a01 ~]# corosync-quorumtool -s
Quorum information
------------------
Date:             Fri Jun 15 16:56:07 2018
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1084777673
Ring ID:          1084777673/380
Quorate:          Yes
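
Since dlm and clvmd should only be started once the cluster is quorate, the check can be scripted. The following is a small sketch that parses the output format shown above; the function name is_quorate is hypothetical, and it assumes the "Quorate:" line looks exactly as in this example.

```shell
# Read `corosync-quorumtool -s` output on stdin.
# Succeed (exit 0) only if it reports "Quorate: Yes".
is_quorate() {
    awk '$1 == "Quorate:" { state = $2 } END { exit (state == "Yes" ? 0 : 1) }'
}

# Example use on a cluster host (commented out so the sketch has no side effects):
# corosync-quorumtool -s | is_quorate && echo "quorate, safe to start dlm and clvmd"
```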

Start the dlm and clvmd services on every host in the cluster

systemctl enable dlm
systemctl start dlm

systemctl enable clvmd
systemctl start clvmd

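Create the clustered volume group and format the gfs2 volume

Run the following commands on one host in the cluster:

# Show physical volume information. If the block device on the storage server is not listed, make sure every host is connected to the SAN or IP-SAN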
pvscan
pvs

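# Create the volume group and logical volume, then format the logical volume as gfs2
vgcreate -Ay -cy gfsvg /dev/mapper/mpatha
lvcreate -L 800G -n gfsvol1 gfsvg
lvs -o +devices gfsvg
# The -j 4 option below sets the number of journals to 4. This is usually the number of cluster hosts N plus 1.
# When the cluster grows, more journals can be added with the gfs2_jadd command.
# "cluster0" after -t is the cluster name; make sure it matches the cluster_name configured in corosync.conf.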
mkfs.gfs2 -p lock_dlm -t cluster0:gfsvolfs -j 4 /dev/gfsvg/gfsvol1
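
A mismatch between the -t cluster name and cluster_name in corosync.conf is easy to introduce by hand. As a hedged illustration, the sketch below reads the name from the config file and prints (does not run) the corresponding mkfs.gfs2 command with N+1 journals. Both function names are hypothetical.

```shell
# Print the cluster_name value from a corosync.conf-style file.
cluster_name_from_conf() {
    awk '$1 == "cluster_name:" { print $2; exit }' "$1"
}

# Print (not run) a mkfs.gfs2 command with N+1 journals for N nodes.
# Usage: gfs2_mkfs_command CONF_FILE FS_NAME NODE_COUNT DEVICE
gfs2_mkfs_command() {
    name=$(cluster_name_from_conf "$1")
    echo "mkfs.gfs2 -p lock_dlm -t ${name}:$2 -j $(( $3 + 1 )) $4"
}
```

For the three-node cluster in this article, gfs2_mkfs_command /etc/corosync/corosync.conf gfsvolfs 3 /dev/gfsvg/gfsvol1 prints the same command as shown above, which can be reviewed before it is run.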

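Mount the gfs2 logical volume

Run the following commands on all hosts in the cluster:

# Create the mount point directory
mkdir /mnt/iscsigfs
# Mount the gfs2 volume
# If a host reports that the device /dev/gfsvg/gfsvol1 does not exist, the host was booted before the logical volume was created; rebooting the host fixes this.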
mount -t gfs2 /dev/gfsvg/gfsvol1 /mnt/iscsigfs -o noatime,nodiratime
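
Because the mount has to succeed on every host, a quick check per host is useful. The sketch below looks for the mount point with file system type gfs2 in a /proc/mounts-style file; the function name is_gfs2_mounted is hypothetical, and the optional second argument exists only so the check can be tried against a sample file.

```shell
# Succeed if MOUNT_POINT appears as a gfs2 mount in a /proc/mounts-style file.
# Usage: is_gfs2_mounted MOUNT_POINT [MOUNTS_FILE]
is_gfs2_mounted() {
    # Fields in /proc/mounts: device, mount point, file system type, options, ...
    awk -v mp="$1" '$2 == mp && $3 == "gfs2" { found = 1 } END { exit (found ? 0 : 1) }' \
        "${2:-/proc/mounts}"
}
```

On a cluster host: is_gfs2_mounted /mnt/iscsigfs && echo "mounted"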

To set up automatic mounting at boot, see:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/storage_administration_guide/iscsi-api

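Common errors

  • LVM commands such as pvs and vgcreate report connect() errors
    An example of the error is shown below. The cause is that clvmd did not start properly.

    connect() failed on local socket: No such file or directory
    Internal cluster locking initialisation failed.
    WARNING: Falling back to local file-based locking.
    Volume Groups with the clustered attribute will be inaccessible.
  • LVM commands such as pvs and vgcreate report "Skipping clustered volume group XXX" or "Device XXX excluded by a filter"
    This usually happens after operations such as vgremove, or when stale volume group metadata remains on the block device. It can be resolved by reformatting the block device, which destroys all data on it. Example command: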

    mkfs.xfs /dev/mapper/mpatha