Setting up a GlusterFS cluster

Introduction to GlusterFS

GlusterFS is the core of the scale-out storage solution Gluster. It is an open-source distributed file system with strong horizontal scaling capability: it can grow to several petabytes of capacity and serve thousands of clients. GlusterFS aggregates physically distributed storage resources over TCP/IP or InfiniBand RDMA networks and manages the data under a single global namespace.

GlusterFS may sound unfamiliar; NFS, GFS, HDFS and the like are probably better known and more widely used, with NFS being the most common because it is simple and easy to manage. However, NFS (and MooseFS, mentioned later) has a single point of failure, which is usually worked around by pairing it with DRBD for block replication. GlusterFS does not have this concern at all, because it is a completely decentralized system with no central node.

GlusterFS official site

GlusterFS documentation

GlusterFS features

  • Scalability and high performance

    GlusterFS combines two properties to deliver highly scalable storage from a few terabytes up to several petabytes. Its scale-out architecture lets you grow capacity and performance simply by adding resources; disk, compute, and I/O resources can be added independently, over high-speed interconnects such as 10GbE and InfiniBand. The Gluster elastic hash removes the need for a metadata server, eliminating that single point of failure and performance bottleneck and enabling truly parallel data access.

  • High availability

    GlusterFS can automatically replicate files (mirroring or multiple copies), so data remains accessible even in the face of hardware failures. Self-healing restores data to the correct state, and repairs run incrementally in the background with almost no performance impact. GlusterFS does not define its own private on-disk data format; it stores files on standard mainstream disk file systems (such as EXT3 or ZFS), so data can also be copied and accessed with ordinary tools.

  • Elastic volume management

    Data is stored in logical volumes, which are carved out of a virtualized pool of physical storage. Storage servers can be added and removed online without interrupting applications. Logical volumes can grow and shrink across all configured servers, data can be migrated between servers to balance capacity, and servers can be added or removed, all online. File system configuration changes can also be applied online in real time, to adapt to changing workloads or for live performance tuning.

System overview

GlusterFS is split into a server side and a client side. The server side consists mainly of two kinds of processes: glusterd, which manages the GlusterFS system itself (listening on port 24007), and glusterfsd, which serves the storage bricks (one glusterfsd process per brick, listening on ports from 49152 upward).

The client side supports several access methods, including NFS, CIFS, FTP, libgfapi, and a native FUSE-based client. In production we generally use the FUSE client (feel free to try the others).

GlusterFS stores its configuration under /var/lib/glusterd and its logs under /var/log. For production it is recommended to run at least 6 server nodes, use a distributed replicated volume with a replica count of 3, and use xfs as the underlying brick file system. (When the number of bricks is a multiple of the replica count, a replicated volume automatically becomes a distributed replicated volume; see the sketch below.)
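
As a sketch of that recommendation (the node names and brick path here are hypothetical, not part of the lab below), six bricks with replica 3 produce a 2 x 3 distributed replicated volume:

# hypothetical layout: node1..node6, each with an xfs brick mounted at /data/brick1
gluster volume create dist-rep replica 3 \
  node1:/data/brick1 node2:/data/brick1 node3:/data/brick1 \
  node4:/data/brick1 node5:/data/brick1 node6:/data/brick1
gluster volume start dist-rep
gluster volume info dist-rep    # Type should read Distributed-Replicate, Number of Bricks: 2 x 3 = 6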

System installation

Environment
ip            role     storage device  FS type
192.168.2.10  server1  /dev/sdb        ext4
192.168.2.11  server2  /dev/sdb        ext4
192.168.2.12  server3  /dev/sdb        ext4
192.168.2.13  server4  /dev/sdb        ext4

Set the hostname and passwordless ssh login (server1 shown as an example; every server needs this):

[root@server1 ~]# vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.2.10 server1
192.168.2.11 server2
192.168.2.12 server3
192.168.2.13 server4

[root@server1 ~]# ssh-keygen
[root@server1 ~]# for i in {10..13}
> do
> scp /etc/hosts  root@192.168.2.$i:/etc/hosts
> ssh-copy-id root@192.168.2.$i
> done
[root@server1 ~]# ssh root@server2
[root@server2 ~]# logout
Connection to server2 closed.

Disable the firewall

[root@server1 ~]# systemctl stop firewalld  && systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.

SELinux settings

[root@server1 ~]# vim /etc/selinux/config
SELINUX=disabled
[root@server1 ~]# setenforce 0
[root@server1 ~]# getenforce 
Permissive

# Install the build dependencies
[root@server1 ~]# yum install -y flex bison openssl-devel libacl-devel sqlite-devel libxml2-devel libtool automake autoconf gcc gcc-c++ attr libuuid-devel

# liburcu-bp has to be built from source; it is not available in the yum repos
[root@server1 ~]# wget https://github.com/urcu/userspace-rcu/archive/v0.7.16.tar.gz -O userspace-rcu-0.7.16.tar.gz
[root@server1 ~]# tar -xf userspace-rcu-0.7.16.tar.gz 
[root@server1 ~]# cd /root/userspace-rcu-0.7.16

# In the source directory, run the usual build-and-install sequence
[root@server1 userspace-rcu-0.7.16]# ./bootstrap 
[root@server1 userspace-rcu-0.7.16]# ./configure && make && make install

# After the usual install, run the following two commands so the system can find urcu.
[root@server1 userspace-rcu-0.7.16]# ldconfig         # refresh the dynamic linker cache; do not skip this, or glusterd will later fail to start with errors that are very hard to trace
[root@server1 userspace-rcu-0.7.16]# pkg-config --libs --cflags liburcu-bp.pc liburcu.pc
-I/usr/local/include  -L/usr/local/lib -lurcu-bp -lurcu  


# In addition, geo-replication requires these extra packages and a running ssh service:
[root@server1 ~]# yum -y install passwd openssh-client openssh-server

With the dependencies installed, download the GlusterFS source package from the official site and build it. The build uses the standard autotools commands; adding --enable-debug at configure time produces a debug build with debugging information.

[root@server1 ~]# wget https://download.gluster.org/pub/gluster/glusterfs/8/8.2/glusterfs-8.2.tar.gz
[root@server1 ~]# tar -xf glusterfs-8.2.tar.gz 
[root@server1 ~]# cd glusterfs-8.2/
[root@server1 glusterfs-8.2]# ./autogen.sh 

... GlusterFS autogen ...

Running aclocal...
Running autoheader...
Running libtoolize...
Running autoconf...
Running automake...

Please proceed with configuring, compiling, and installing.
[root@server1 glusterfs-8.2]# ./configure --prefix=/usr/local
GlusterFS configure summary
===========================
FUSE client          : yes
epoll IO multiplex   : yes
fusermount           : yes
readline             : no
georeplication       : yes
Linux-AIO            : no
Enable Debug         : no
Enable ASAN          : no
Enable TSAN          : no
Use syslog           : yes
XML output           : yes
Unit Tests           : no
Track priv ports     : yes
POSIX ACLs           : yes
SELinux features     : yes
firewalld-config     : no
Events               : yes
EC dynamic support   : x64 sse avx
Use memory pools     : yes
Nanosecond m/atimes  : yes
Server components    : yes
Legacy gNFS server   : no
IPV6 default         : no
Use TIRPC            : missing
With Python          : 2.7
Cloudsync            : yes
Link with TCMALLOC   : no

[root@server1 glusterfs-8.2]# make && make install

When using GlusterFS, make sure that host names on the LAN are unique and that every host name can be resolved.
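
A quick way to verify this before starting glusterd (a small sketch, run on each node):

# every peer name should resolve, and the local hostname should be unique
for h in server1 server2 server3 server4; do getent hosts $h; done
hostname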

# Start glusterd
[root@server1 ~]# systemctl start glusterd.service 
[root@server1 ~]# systemctl enable glusterd.service 
Created symlink from /etc/systemd/system/multi-user.target.wants/glusterd.service to /usr/local/lib/systemd/system/glusterd.service.
[root@server1 ~]# systemctl status glusterd.service 
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/local/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-11-11 11:16:54 CST; 34s ago
     Docs: man:glusterd(8)
 Main PID: 80492 (glusterd)
   CGroup: /system.slice/glusterd.service
           └─80492 /usr/local/sbin/glusterd -p /usr/local/var/run/glusterd.pid --log-level INFO

[root@server1 ~]# ps -ef| grep gluster
root      80492      1  0 11:16 ?        00:00:00 /usr/local/sbin/glusterd -p /usr/local/var/run/glusterd.pid --log-level INFO
root      80571   2060  0 11:17 pts/0    00:00:00 grep --color=auto gluster

GlusterFS cluster planning and configuration

Overall flow: partition → format → mount

Partitioning
[root@server1 ~]# fdisk -l     # inspect the current disks and partitions

Disk /dev/sda: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000c1619

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048     2099199     1048576   83  Linux
/dev/sda2         2099200    41943039    19921920   8e  Linux LVM

Disk /dev/mapper/centos-root: 18.2 GB, 18249416704 bytes, 35643392 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/centos-swap: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@server1 ~]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   20G  0 disk 
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   19G  0 part 
  ├─centos-root 253:0    0   17G  0 lvm  /
  └─centos-swap 253:1    0    2G  0 lvm  [SWAP]
sr0              11:0    1  9.6G  0 rom  /etc/gz


# For this experiment I add a new disk; the VM runs on VMware, so just power it off and add a disk


[root@server1 ~]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   20G  0 disk 
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   19G  0 part 
  ├─centos-root 253:0    0   17G  0 lvm  /
  └─centos-swap 253:1    0    2G  0 lvm  [SWAP]
sdb               8:16   0   20G  0 disk 
sr0              11:0    1  9.6G  0 rom  /etc/gz

# Next, partition, format, and mount /dev/sdb
[root@server1 ~]# fdisk /dev/sdb       # partition the disk
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0x218b7141.

Command (m for help): n           # new partition
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p):            # partition type; press Enter to accept the default
Using default response p
Partition number (1-4, default 1):     # press Enter
First sector (2048-41943039, default 2048):     # press Enter
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-41943039, default 41943039): +5G         # choose the partition size, ending in M, G, etc.
Partition 1 of type Linux and of size 5 GiB is set

Command (m for help): p         # print the partition table

Disk /dev/sdb: 21.5 GB, 21474836480 bytes, 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x218b7141

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048    10487807     5242880   83  Linux

Command (m for help): w              # write the table and exit
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.


# Format
[root@server1 ~]# mkfs.ext4 /dev/sdb1
[root@server1 ~]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   20G  0 disk 
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   19G  0 part 
  ├─centos-root 253:0    0   17G  0 lvm  /
  └─centos-swap 253:1    0    2G  0 lvm  [SWAP]
sdb               8:16   0   20G  0 disk 
└─sdb1            8:17   0    5G  0 part 
sr0              11:0    1  9.6G  0 rom  /etc/gz

# Mount
[root@server1 ~]# mkdir /node
[root@server1 ~]# mount  /dev/sdb1 /node/
[root@server1 ~]# df -h /node
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       4.8G   20M  4.6G    1% /node

# Mount automatically at boot
[root@server1 ~]# vim /etc/fstab 
/dev/sdb1 /node ext4 defaults 0 0
[root@server1 ~]# mount -a

Configure the GlusterFS cluster
[root@server1 ~]# gluster peer status             # check cluster status; the trusted pool currently contains no other hosts
Number of Peers: 0
[root@server1 ~]# gluster peer probe server2      # build the trusted pool (probing from one node is enough)
peer probe: success. 
[root@server1 ~]# gluster peer probe server3
peer probe: success
[root@server1 ~]# gluster peer probe server4
peer probe: success

[root@server1 ~]# gluster peer status
Number of Peers: 3

Hostname: server2
Uuid: 687334ba-bec2-41eb-a51d-36779607bf59
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 11cfe205-643e-4237-b171-7569f0cf1b57
State: Peer in Cluster (Connected)

Hostname: server4
Uuid: 2282466a-14c8-4356-b645-b287c7929abd
State: Peer in Cluster (Connected)

[root@server1 ~]# gluster pool list              # list the storage pool
UUID					Hostname 	State
687334ba-bec2-41eb-a51d-36779607bf59	server2  	Connected 
11cfe205-643e-4237-b171-7569f0cf1b57	server3  	Connected 
2282466a-14c8-4356-b645-b287c7929abd	server4  	Connected 
37dc4314-61a3-4e37-be76-a4075ca59a71	localhost	Connected 



# Create volumes
[root@server1 ~]# gluster volume list            # the cluster has no volumes yet
No volumes present in cluster

[root@server1 ~]# gluster volume create data replica 4 server1:/node server2:/node server3:/node server4:/node    # create a volume; note that this command fails here!

------------------------------------------------------------------------------------------------------------
Note: the error is:
volume create: data: failed: The brick server1:/data/glusterfs is being created in the root partition. It is recommended that you don't use the system's root partition for storage backend. Or use 'force' at the end of the command if you want to override this behavior.
This happens because the brick is being created on the system (root) partition, which gluster does not allow by default; in production, keep bricks off the system disk whenever possible. If you really have to, append force to the command. By default the cluster will not create volumes under root.
[root@server1 ~]# gluster volume create data replica 4 server1:/node server2:/node server3:/node server4:/node force       # as the message suggests, append force
volume create: data: success: please start the volume to access data

Alternatively, you can create a dedicated user at build time and run GlusterFS as that user, giving it full permissions over the cluster.

------------------------------------------------------------------------------------------------------------

Single disk, recommended for debugging environments
[root@server1 ~]# gluster vol  create test server1:/test  force
volume create: test: success: please start the volume to access data

Multiple disks without RAID, recommended for experiments and test environments
[root@server1 ~]# gluster vol  create testdata server1:/testdata  server2:/testdata server3:/testdata server4:/testdata  force
volume create: testdata: success: please start the volume to access data

Multiple disks with RAID 1, recommended for high-concurrency production environments.
[root@server1 ~]#  gluster volume create data replica 4 server1:/node server2:/node server3:/node server4:/node 
volume create: data: success: please start the volume to access data

Note: in the commands above, the number of bricks must be an integer multiple of the replica count.
RAID 0, RAID 10, RAID 5, RAID 6 and similar setups are also possible, but they are not recommended for production clusters dominated by small files.
============================================================================================================

[root@server1 ~]# gluster volume list      # the newly created volumes appear in the cluster
data
test
testdata

[root@server1 ~]# gluster volume info     # show detailed information about the volumes
 
Volume Name: data
Type: Replicate
Volume ID: 3c711bfd-599d-463b-bec4-51ef49a5be21
Status: Created
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: server1:/node
Brick2: server2:/node
Brick3: server3:/node
Brick4: server4:/node
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
 
Volume Name: test
Type: Distribute
Volume ID: e0161069-8913-43f6-abb6-f172441bfe35
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/test
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
 
Volume Name: testdata
Type: Distribute
Volume ID: 15874f41-0fab-4f28-885b-747536d8ba22
Status: Created
Snapshot Count: 0
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: server1:/testdata
Brick2: server2:/testdata
Brick3: server3:/testdata
Brick4: server4:/testdata
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on

Start the volumes:
[root@server1 ~]# gluster volume start data     # start the volume
volume start: data: success
[root@server1 ~]# gluster vol start test
volume start: test: success
[root@server1 ~]# gluster vol start testdata
volume start: testdata: success
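
To confirm that the brick processes and their ports actually came up, gluster volume status can be run for each volume (a quick check, output omitted here):

gluster volume status data
gluster volume status test
gluster volume status testdata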

Mount test
[root@server1 ~]# mkdir /mount_data         
[root@server1 ~]# mount -t glusterfs  -o acl server1:/data  /mount_data/          # mount the volume on server1
[root@server1 ~]# mkdir /mount_test
[root@server1 ~]# mount -t glusterfs  -o acl server1:/test  /mount_test/
[root@server1 ~]# mkdir /mount_testdata
[root@server1 ~]# mount -t glusterfs  -o acl server1:/testdata/  /mount_testdata/

[root@server1 ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 475M     0  475M    0% /dev
tmpfs                    487M     0  487M    0% /dev/shm
tmpfs                    487M  7.7M  479M    2% /run
tmpfs                    487M     0  487M    0% /sys/fs/cgroup
/dev/mapper/centos-root   37G  1.9G   36G    5% /
/dev/sr0                 9.6G  9.6G     0  100% /mnt/gz
/dev/sda1               1014M  137M  878M   14% /boot
tmpfs                     98M     0   98M    0% /run/user/0
/dev/sdc1                4.8G   22M  4.6G    1% /node
server1:/data            4.8G   71M  4.6G    2% /mount_data
server1:/test             37G  2.2G   35G    6% /mount_test
server1:/testdata        148G  8.8G  140G    6% /mount_testdata

[root@server2 ~]# mount -t glusterfs  -o acl server1:/data  /mount_data/      # mount the volume on server2; a warning appears because the attr package is missing
WARNING: getfattr not found, certain checks will be skipped..
[root@server2 ~]# yum -y install attr                         # install the missing package
[root@server2 ~]# touch /mount_data/{1..10}test.txt           # create files on server2

[root@server1 ~]# ls /mount_data/                         # once created, the new files are visible in this directory on every host in the cluster, including the local one
10test.txt  2test.txt  4test.txt  6test.txt  8test.txt  lost+found
1test.txt   3test.txt  5test.txt  7test.txt  9test.txt
[root@server4 ~]# ls /mount_data/
10test.txt  2test.txt  4test.txt  6test.txt  8test.txt  lost+found
1test.txt   3test.txt  5test.txt  7test.txt  9test.txt
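
Since data is a replica-4 volume, each brick holds a full copy of every file; as an extra sanity check (not part of the original session), list the brick directory itself on any of the four servers and expect the same file names plus GlusterFS's internal .glusterfs directory:

ls /node/    # run on any of server1..server4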

Online expansion

As the business grows and the cluster runs out of capacity, more machines and disks need to be added to the cluster.
a. In the common case you only need to widen the distribution. The number of bricks added must be an integer multiple of the minimum expansion unit, i.e. replica × stripe, or of the disperse count:

[root@server1 ~]# mkdir /datanode
[root@server1 ~]# gluster vol  add-brick test server1:/datanode/  server2:/datanode/  server3:/datanode/ server4:/datanode/ force
volume add-brick: success

[root@server1 ~]# gluster vol info test     # inspect the test volume
 
Volume Name: test
Type: Distribute
Volume ID: e0161069-8913-43f6-abb6-f172441bfe35
Status: Started
Snapshot Count: 0
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: server1:/test
Brick2: server1:/datanode
Brick3: server2:/datanode
Brick4: server3:/datanode
Brick5: server4:/datanode
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on

After this step the newly added bricks may not actually hold any data yet, so rebalance the volume:
[root@server1 ~]# gluster vol rebalance test start
volume rebalance: test: success: Rebalance on test has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 353d9ec2-1f72-4a2d-9fb6-d6326c26ab43

After add-brick, the new bricks are not yet in use and the system does not copy data to them automatically, so trigger a heal to bring the volume up to the newly specified replica count:
[root@server1 ~]# gluster vol heal test full
Launching heal operation to perform full self heal on volume test has been unsuccessful:
Self-heal-daemon is disabled. Heal will not be triggered on volume test
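
The heal fails here because test is a pure distribute volume: it has no replicas, so there is no self-heal daemon to run. On a replicated volume such as data, the heal sequence would look like this (a sketch; cluster.self-heal-daemon is the volume option controlling the daemon):

gluster volume set data cluster.self-heal-daemon on   # make sure the daemon is enabled
gluster volume heal data full                         # full heal, runs in the background
gluster volume heal data info                         # list entries still pending heal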

Note: increase the replica count by only one at a time; adding several replicas at once may fail in current versions.
------------------------------------------------------------------------------------------------------------
Using GlusterFS
a. Once a GlusterFS volume is mounted, it can be used like local storage; application code only needs the ordinary native file APIs. This does not necessarily require root privileges; permissions on the relevant directories and files are enough (see the sketch after this list).
b. The direct API approach (libgfapi) requires root privileges, and the Java, Python, and Ruby wrappers are currently incomplete, so it is generally not recommended.
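
To illustrate point a, any standard tool or POSIX call works on the mount; a trivial sketch using the /mount_data mount from earlier:

echo "hello gluster" > /mount_data/demo.txt   # plain write through the FUSE mount
cat /mount_data/demo.txt
cp /etc/hosts /mount_data/                    # ordinary tools work unchanged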

Online shrinking

If the original capacity plan turned out to be unreasonable, or some storage machines are to be repurposed, there are, as with expansion, two cases.

a. Reduce the distribution width. The bricks removed must form one or more complete storage units, which appear as consecutive bricks in the volume info listing. This command rebalances the data automatically.
[root@server1 ~]# gluster vol remove-brick test server3:/datanode  server4:/datanode start
It is recommended that remove-brick be run with cluster.force-migration option disabled to prevent possible data corruption. Doing so will ensure that files that receive writes during migration will not be migrated and will need to be manually copied after the remove-brick commit operation. Please check the value of the option and update accordingly. 
Do you want to continue with your current cluster.force-migration settings? (y/n) y
volume remove-brick start: success
ID: 943f61e1-02da-4b79-a08a-55f06b7c468a

After starting, check the removal status (which is really the rebalance status) until it changes from in progress to completed.
[root@server1 ~]# gluster vol remove-brick test server3:/datanode  server4:/datanode status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                                 server3                0        0Bytes             0             0             0            completed        0:00:00
                                 server4                0        0Bytes             0             0             0            completed        0:00:00

Once the status shows completed, commit the removal.
[root@server1 ~]# gluster vol remove-brick test server3:/datanode  server4:/datanode commit
volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick. 

b. Reduce the replica count. The bricks removed must match the replica layout (admittedly hard to phrase precisely); in the volume info listing they are usually scattered bricks (though the IPs may be consecutive). This command does not need to rebalance data.
[root@server1 ~]# gluster vol remove-brick test server1:/datanode  server2:/datanode force    # remove the bricks
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume.
Do you want to continue? (y/n) y
volume remove-brick commit force: success

[root@server1 ~]# gluster vol info test
 
Volume Name: test
Type: Distribute
Volume ID: e0161069-8913-43f6-abb6-f172441bfe35
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: server1:/test
Options Reconfigured:
performance.client-io-threads: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on

When the replica count is reduced this way, the bricks are simply dropped, and the command even ends with the force option, so if the data was not fully replicated beforehand, part of it will be lost. This operation therefore requires extreme caution: make sure the data is complete first by running gluster volume heal vol_name full, then verify with gluster volume heal vol_name info and gluster volume status, and only proceed when everything is healthy.
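
A pre-check along those lines, shown here for the replicated data volume (a sketch):

gluster volume heal data full     # repair any missing copies first
gluster volume heal data info     # should report no entries pending heal
gluster volume status data        # every brick should be online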

Configure rebalancing
[root@server1 ~]# gluster vol reblance status             # note the typo: "reblance" is not a recognized subcommand
unrecognized word: reblance (position 1)
[root@server1 ~]# gluster vol reblance testdata status    # still misspelled
unrecognized word: reblance (position 1)
[root@server1 ~]# gluster vol rebalance testdata status    # spelled correctly; rebalance has not been started yet
volume rebalance: testdata: failed: Rebalance not started for volume testdata.
[root@server1 ~]# gluster vol rebalance testdata start     # start the rebalance
volume rebalance: testdata: success: Rebalance on testdata has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 7c7dd7d7-1515-4637-805d-dc5dc43f471b

[root@server1 ~]# gluster vol rebalance testdata status     # status after the rebalance has been started
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                                 server2                0        0Bytes             0             0             0            completed        0:00:00
                                 server3                0        0Bytes             0             0             0            completed        0:00:00
                                 server4                0        0Bytes             0             0             0            completed        0:00:00
                               localhost                0        0Bytes             0             0             0            completed        0:00:00
volume rebalance: testdata: success
Set volume options
[root@server1 ~]# gluster vol set testdata  performance.cache-size 256MB     # set the cache size (tune this to your environment; if it is too large, client mounts may fail later)
volume set: success
[root@server1 ~]# gluster vol info
 
Volume Name: testdata
Type: Distribute
Volume ID: 15874f41-0fab-4f28-885b-747536d8ba22
Status: Started
Snapshot Count: 0
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: server1:/testdata
Brick2: server2:/testdata
Brick3: server3:/testdata
Brick4: server4:/testdata
Options Reconfigured:
performance.cache-size: 256MB
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
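
To check or revert an option afterwards, gluster volume get and gluster volume reset can be used (a sketch):

gluster volume get testdata performance.cache-size    # show the current value
gluster volume reset testdata performance.cache-size  # revert to the default
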
Configure the client

Requirement: the GFS client node must be able to reach the GFS server nodes.

[root@client ~]# yum install -y glusterfs glusterfs-fuse

# On the GFS client node, create a local directory:
[root@client ~]# mkdir -p /test/gluster-test

# Mount the GFS volume at the local directory:
[root@client ~]# mount.glusterfs 192.168.2.10:/data /test/gluster-test/

# Check the mount:
[root@client ~]# df -h  /test/gluster-test/
Filesystem          Size  Used Avail Use% Mounted on
192.168.2.10:/data   34G  5.1G   29G   15% /test/gluster-test
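
To make the client mount persistent and tolerant of the primary server going down, an /etc/fstab entry with _netdev and a backup volfile server is a common pattern (a sketch; check the option names against your GlusterFS version):

# /etc/fstab on the client
192.168.2.10:/data /test/gluster-test glusterfs defaults,_netdev,backup-volfile-servers=192.168.2.11:192.168.2.12 0 0
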
Testing
Single-file test
Test method: create a 1 GB file from the client

- DHT mode, the default, also called a distributed volume: each file is placed on a single server node chosen by a hash algorithm.
[root@client ~]# time dd if=/dev/zero of=hello bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 9.7207 s, 108 MB/s

real	0m9.858s
user	0m0.002s
sys	0m7.171s

- AFR (replicated) mode: the volume is created with replica x, and each file is replicated to x nodes.
[root@client ~]# time dd if=/dev/zero of=hello.txt bs=1024M count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 5.06884 s, 212 MB/s

real	0m5.206s
user	0m0.001s
sys	0m3.194s


- Striped mode: the volume is created with stripe x, and each file is split into chunks stored across x nodes (similar to RAID 0).
[root@client ~]# time dd if=/dev/zero of=hello bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 4.92539 s, 213 MB/s

real	0m5.047s
user	0m0.001s
sys	0m3.036s

- Striped replicated mode (Number of Bricks: 1 x 2 x 2 = 4), a distributed striped combined type; needs at least 4 servers. The volume is created with stripe 2 across 4 server nodes, combining DHT with striping.
[root@client ~]# time dd if=/dev/zero of=hello bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes (1.0 GB) copied, 5.0472 s, 208 MB/s

real	0m5.173s
user	0m0.000s
sys	0m3.098s

- Distributed replicated mode (Number of Bricks: 2 x 2 = 4), a combined type; needs at least 4 servers. The volume is created with replica 2 across 4 server nodes, combining DHT with AFR.
[root@client ~]# time dd if=/dev/zero of=haha bs=100M count=10
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 1.00275 s, 1.0 GB/s

real	0m1.018s
user	0m0.001s
sys	0m0.697s

The following additional tests were run against the distributed replicated volume:

4K random I/O tests:
Write test:
# install fio
[root@client ~]# yum -y install libaio-devel.x86_64
[root@client ~]# yum -y install fio
[root@client ~]# fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite -size=10G -filename=1.txt -name="EBS 4KB randwrite test" -iodepth=32 -runtime=60
write: IOPS=4111, BW=16.1MiB/s (16.8MB/s)(964MiB/60001msec)
WRITE: bw=16.1MiB/s (16.8MB/s), 16.1MiB/s-16.1MiB/s (16.8MB/s-16.8MB/s), io=964MiB (1010MB), run=60001-60001msec

Read test:
[root@client ~]# fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randread -size=10G -filename=1.txt -name="EBS 4KB randread test" -iodepth=8 -runtime=60
read: IOPS=77.5k, BW=303MiB/s (318MB/s)(10.0GiB/33805msec)
READ: bw=303MiB/s (318MB/s), 303MiB/s-303MiB/s (318MB/s-318MB/s), io=10.0GiB (10.7GB), run=33805-33805msec

512K sequential write test
[root@client ~]# fio -ioengine=libaio -bs=512k -direct=1 -thread -rw=write -size=10G -filename=512.txt -name="EBS 512KB seqwrite test" -iodepth=64 -runtime=60
write: IOPS=1075, BW=531MiB/s (556MB/s)(2389MiB/4501msec)
WRITE: bw=531MiB/s (556MB/s), 531MiB/s-531MiB/s (556MB/s-556MB/s), io=2389MiB (2505MB), run=4501-4501msec
Other maintenance commands
Unmount a volume
Unmounting pairs with mounting. A volume can be stopped without unmounting it first, but doing so causes problems; in a large cluster it may even prevent the volume from starting again later.
[root@server1 ~]# umount  /mount_test

Stop a volume
Stopping pairs with starting. It is best to unmount all clients before stopping a volume.
[root@server1 ~]# gluster vol stop test
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: test: success

Delete a volume
[root@server1 ~]# gluster vol delete test
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: test: success

Note: after deleting a volume, you must also delete the ( .glusterfs/ .trashcan/ ) directories inside the brick directory ( /opt/gluster/data ).
Otherwise, creating a new volume on the same brick can result in files not being distributed or in volume-type confusion.
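
A cleanup along those lines might look like this (a sketch; the brick path is the example path from the note above, and the setfattr lines remove the markers GlusterFS leaves on a brick root so the directory can be reused for a new volume):

BRICK=/opt/gluster/data
rm -rf $BRICK/.glusterfs $BRICK/.trashcan
setfattr -x trusted.glusterfs.volume-id $BRICK   # drop the old volume-id extended attribute
setfattr -x trusted.gfid $BRICK                  # drop the old gfid extended attribute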

------------------------------------------------------------------------------------------------------------
Detach a node and remove its GlusterFS bricks
[root@server1 ~]# gluster peer detach server4       # the message says that a node hosting bricks must have them removed (remove-brick) before it can be detached
All clients mounted through the peer which is getting detached need to be remounted using one of the other active peers in the trusted storage pool to ensure client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y
peer detach: failed: Peer server4 hosts one or more bricks. If the peer is in not recoverable state then use either replace-brick or remove-brick command with force to remove all bricks from the peer and attempt the peer detach again.

[root@server1 ~]# gluster vol info testdata
 
Volume Name: testdata
Type: Distribute
Volume ID: 15874f41-0fab-4f28-885b-747536d8ba22
Status: Started
Snapshot Count: 0
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: server1:/testdata
Brick2: server2:/testdata
Brick3: server3:/testdata
Brick4: server4:/testdata
Options Reconfigured:
performance.cache-size: 256MB
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on

[root@server1 ~]# gluster vol remove-brick testdata server4:/testdata start        # remove-brick in this form cannot be used on a replicated volume (one with a replica count), or the commit will fail
It is recommended that remove-brick be run with cluster.force-migration option disabled to prevent possible data corruption. Doing so will ensure that files that receive writes during migration will not be migrated and will need to be manually copied after the remove-brick commit operation. Please check the value of the option and update accordingly. 
Do you want to continue with your current cluster.force-migration settings? (y/n) y
volume remove-brick start: success
ID: 24084420-9676-4f10-ac43-72ddc35caccf
[root@server1 ~]# gluster vol remove-brick testdata server4:/testdata status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status  run time in h:m:s
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                                 server4                0        0Bytes             0             0             0            completed        0:00:00
[root@server1 ~]# gluster vol remove-brick testdata server4:/testdata commit
volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick. 

[root@server1 ~]# gluster peer detach server4    # detach the node
All clients mounted through the peer which is getting detached need to be remounted using one of the other active peers in the trusted storage pool to ensure client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y
peer detach: success

