GlusterFS: Theory and Configuration

GlusterFS

Lab objective:

Master the theory and configuration of GlusterFS.

 

Lab theory:

GlusterFS (Gluster File System) and why distributed file systems exist:

Take Taobao as an example. Suppose Taobao kept all of its data in Hangzhou. What happens on Singles' Day (Nov 11) under an enormous load? (Concurrency itself is handled with load balancing.) Huge numbers of users read and write the disk array around the clock, and even a very strong array will eventually buckle under that pressure. Even with modest traffic, users far from Hangzhou get slower access than users nearby.

A distributed file system solves these problems: put a server in each city, so Shanghai users hit the Shanghai server and Beijing users hit the Beijing server. The data as a whole behaves like one big pool, split into shards, with the shards placed on servers in different cities.

     Within each city, RAID is used so that the data is not lost, and LVM is used so that storage can be grown as the amount of data increases.

     Even if the data in one city is damaged, it can be restored.

 

There are four well-known distributed file systems today:

1. GlusterFS     2. Ceph         3. MooseFS       4. Lustre

GlusterFS and Ceph are the best two of the four, and both have been acquired by Red Hat!

Today GlusterFS dominates; the future belongs to Ceph.

 

Lab topology:

Node1, Node2, Node3, Node4, Node5, with IP addresses 192.168.1.121-125

 

Notes:

1. Disable the firewall and SELinux on every machine (a sketch of the commands follows these notes).

2. Node1 through Node5 are installed from the GlusterFS 3.0 ISO image. At the installer screen, press Esc, type linux, and press Enter; the installer then copies only the files it needs from the ISO, so installation is faster. If you just press Enter at the installer screen, the whole ISO is copied over and installation is much slower.
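A minimal sketch of how this is typically done on a RHEL 6-style system (run on every node; adjust to your distribution):

[root@node1 ~]# service iptables stop ; chkconfig iptables off
[root@node1 ~]# setenforce 0
[root@node1 ~]# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config    (makes the change permanent after a reboot)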

 

Lab steps:

Step 1: Create the partitions

Node1-4: configuration to be done on all of nodes 1 through 4

[root@node1 ~]# vim  /etc/hosts

192.168.1.121   node1.example.com  node1

192.168.1.122   node2.example.com  node2

192.168.1.123   node3.example.com  node3

192.168.1.124   node4.example.com  node4

192.168.1.125   node5.example.com  node5

 

[root@node1 ~]# service glusterd  status

glusterd (pid  1207) is running...   (with the dedicated ISO install, glusterd is already running by default; on any other system you have to install and enable the glusterd service yourself, as sketched below)
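For reference, on a stock RHEL/CentOS 6 box without the storage ISO, the service would be installed and enabled by hand, roughly like this (the glusterfs-server package name assumes a GlusterFS yum repository is already configured):

[root@node1 ~]# yum  -y  install  glusterfs-server
[root@node1 ~]# service  glusterd  start
[root@node1 ~]# chkconfig  glusterd  on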

 

 

[root@node1 ~]# fdisk  -cu  /dev/sda

 

Command (m for help): n

Command action

  e   extended

  p   primary partition (1-4)

e

Selected partition 4

First sector (107005952-209715199, default 107005952):

Using default value 107005952

Last sector, +sectors or +size{K,M,G}(107005952-209715199, default 209715199):

Using default value 209715199

 

Command (m for help): n

First sector (107008000-209715199, default 107008000):

Using default value 107008000

Last sector, +sectors or +size{K,M,G}(107008000-209715199, default 209715199): +10G    

 

Command (m for help): t

Partition number (1-5): 5

Hex code (type L to list codes): 8e

Changed system type of partition 5 to 8e (Linux LVM)

 

Command (m for help): w

The partition table has been altered!

 

[root@node1 ~]# partx -a  /dev/sda

BLKPG: Device or resource busy

error adding partition 1

BLKPG: Device or resource busy

error adding partition 2

BLKPG: Device or resource busy

error adding partition 3

[root@node1 ~]# pvcreate /dev/sda5

 Physical volume "/dev/sda5" successfully created

[root@node1 ~]# vgcreate   vg_bricks    /dev/sda5

 Volume group "vg_bricks" successfully created

[root@node1 ~]# lvcreate  -L  5G  -T  vg_bricks/brickspool   (create a 5 GB thin pool named brickspool inside the vg_bricks VG)

 Logical volume "lvol0" created

 Logical volume "brickspool" created

[root@node1 ~]# lvcreate -V  1G  -T  vg_bricks/brickspool   -n  brick1   (create a 1 GB thin LV named brick1 from the brickspool pool)

 Logical volume "brick1" created

[root@node1 ~]# mkfs.xfs -i  size=512  /dev/vg_bricks/brick1   (format as XFS; -i size=512 sets a 512-byte inode size, which GlusterFS recommends for its extended attributes)

meta-data=/dev/vg_bricks/brick1  isize=512   agcount=8, agsize=32768 blks

        =                      sectsz=512   attr=2, projid32bit=0

data    =                      bsize=4096   blocks=262144,imaxpct=25

        =                      sunit=0      swidth=0 blks

naming  =version 2             bsize=4096   ascii-ci=0

log     =internal log          bsize=4096   blocks=2560,version=2

        =                       sectsz=512   sunit=0 blks, lazy-count=1

realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@node1 ~]# mkdir  -p  /bricks/brick1

[root@node1 ~]# echo "/dev/vg_bricks/brick1  /bricks/brick1   xfs   defaults 0  0"  >> /etc/fstab   (mount /dev/vg_bricks/brick1 on /bricks/brick1 automatically at boot)

[root@node1 ~]# mount  -a

[root@node1 ~]# mkdir    /bricks/brick1/brick   (after mounting, create the directory that will actually hold the brick data)
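Optionally, the thin-pool layout and the brick mount can be sanity-checked before moving on; a quick sketch:

[root@node1 ~]# lvs  vg_bricks          (should list the brickspool thin pool and the brick1 thin volume)
[root@node1 ~]# df  -h  /bricks/brick1   (confirms the XFS brick is mounted)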

 

Step 2: Build the Gluster trusted pool and volume

Node1:

 

Once each node has its partitions ready, join the nodes together into one large "pool" that manages the data as a whole, including dynamic sharding/placement of data and redundant copies. Doing this on node1 effectively treats node1 as the primary node.

[root@node1 ~]# gluster   (the gluster command drops you into the gluster CLI)

gluster> peer   probe  node1.example.com

peer probe: success. Probe on localhost not needed   (run on node1 itself, so there is no need for it to probe itself)

gluster> peer   probe  node2.example.com

peer probe: success.

gluster> peer   probe  node3.example.com

peer probe: success.

gluster> peer  probe   node4.example.com

peer probe: success.

gluster> peer  status

Number of Peers: 3

 

Hostname: node2.example.com

Uuid: ce2fe11c-94b8-4884-9d3a-271f01eff280

State: Peer in Cluster (Connected)

 

Hostname: node3.example.com

Uuid: 28db58f8-5f8a-4a7f-94a9-03a8a8137fdd

State: Peer in Cluster (Connected)

 

Hostname: node4.example.com

Uuid: 808a34d9-80cf-4077-acfa-f255f52aa9be

State: Peer in Cluster (Connected)

 

Create the first GlusterFS volume, with data mirrored in pairs:

[root@node1 ~]# gluster

gluster> volume  create   firstvol replica  2   node1.example.com:/bricks/brick1/brick   node2.example.com:/bricks/brick1/brick    node3.example.com:/bricks/brick1/brick    node4.example.com:/bricks/brick1/brick   (create the firstvol volume)

volume create: firstvol: success: please start the volume to access data

gluster> volume  start  firstvol   (after creating the firstvol volume, start it, otherwise it cannot be used)

volume start: firstvol: success

gluster> volume   info  firstvol   (show firstvol volume information)

 

Volume Name: firstvol

Type: Distributed-Replicate

Volume ID: 4087aecf-a4fe-4773-a1af-fa19d6a7aba5

Status: Started

Snap Volume: no

Number of Bricks: 2 x 2 = 4   (two replica pairs: bricks mirror each other in twos)

Transport-type: tcp

Bricks:

Brick1: node1.example.com:/bricks/brick1/brick

Brick2: node2.example.com:/bricks/brick1/brick

Brick3: node3.example.com:/bricks/brick1/brick

Brick4: node4.example.com:/bricks/brick1/brick

Options Reconfigured:

performance.readdir-ahead: on

auto-delete: disable

snap-max-soft-limit: 90

snap-max-hard-limit: 256

gluster> exit   (changes are saved automatically on exit)
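The same administration commands can also be run straight from the shell instead of the interactive gluster prompt, which is handy in scripts; for example:

[root@node1 ~]# gluster  volume  info  firstvol
[root@node1 ~]# gluster  volume  status  firstvol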

 

Client NFS mount:

[root@node5 ~]# mount  -t nfs  node1.example.com:/firstvol   /mnt

[root@node5 ~]# cd /mnt     (mount the firstvol volume from node1 over NFS)

[root@node5 mnt]# touch    file{0..9}   (create 10 files)

[root@node5 mnt]# ls

file0 file1  file2  file3 file4  file5  file6 file7  file8  file9

 

 

Node1

[root@node1 ~]# cd /bricks/brick1/brick/

[root@node1 brick]# ls

file3 file4  file7  file9

 

Node2

[root@node2 brick1]# cd   brick/

[root@node2 brick]# ls

file3 file4  file7  file9

 

Node3

[root@node3 ~]# cd /bricks/brick1/brick

[root@node3 brick]# ls

file0 file1  file2  file5 file6  file8

 

Node4

[root@node4 ~]# cd  /bricks/brick1/brick/

[root@node4 brick]# ls

file0 file1  file2  file5 file6  file8

Client Samba mount:

Node1:   Samba shares and mounts use the CIFS protocol; the cifs-utils package must be installed on both the server and the client (a sketch of the installation follows).
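A rough sketch of the package installation (package names can differ between builds; the Samba build on the storage ISO used here already carries the glusterfs VFS module referenced below):

[root@node1 ~]# yum  -y  install  samba  cifs-utils            (server side)
[root@node5 ~]# yum  -y  install  cifs-utils  samba-client      (client side: provides mount.cifs and smbclient)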

[root@node1 ~]# service   smb  restart

Shutting down SMB services:                                [FAILED]

Starting SMB services:                                     [  OK  ]

[root@node1 ~]# service nmb  restart

Shutting down NMB services:                                [FAILED]

Starting NMB services:                                     [  OK  ]

[root@node1 ~]# chkconfig  smb  on

[root@node1 ~]# chkconfig nmb  on

[root@node1 ~]# useradd user1

[root@node1 ~]# smbpasswd  -a  user1

New SMB password:

Retype new SMB password:

Added user user1.

[root@node1 ~]# vim /etc/samba/smb.conf   (you only need to start the Samba services; you do not write this share section yourself, the gluster hooks/API write it into the file automatically)

[gluster-firstvol]

comment = For samba share of volume firstvol

vfs objects = glusterfs

glusterfs:volume = firstvol

glusterfs:logfile = /var/log/samba/glusterfs-firstvol.%M.log

glusterfs:loglevel = 7

path = /

read only = no

guest ok = yes

 

Node5

[root@node5 ~]# smbclient -L  192.168.1.121   (list the share names that Samba exports over CIFS)

Enter root's password:

Anonymous login successful

Domain=[MYGROUP] OS=[Unix] Server=[Samba3.6.9-169.1.el6rhs]

 

         Sharename       Type     Comment

         ---------       ----     -------    (the share name is gluster-firstvol)

         gluster-firstvol Disk      For samba share of volume firstvol

         IPC$            IPC       IPC Service (Samba Server Version 3.6.9-169.1.el6rhs)

Anonymous login successful

Domain=[MYGROUP] OS=[Unix] Server=[Samba3.6.9-169.1.el6rhs]

 

         Server               Comment

         ---------            -------

         NODE1                Samba Server Version3.6.9-169.1.el6rhs

         NODE5                Samba Server Version3.6.9-169.1.el6rhs

 

         Workgroup            Master

         ---------            -------

         MYGROUP              NODE1

 

[root@node5 ~]#  mount  -t cifs //192.168.1.121/gluster-firstvol /mnt  -o  user=user1,password=redhat   

[root@node5 ~]# df     

Filesystem           1K-blocks    Used Available Use% Mounted on

/dev/sda2             50395844 1751408  46084436  4% /

tmpfs                   506168       0   506168   0% /dev/shm

/dev/sda1               198337   29772   158325  16% /boot

//192.168.1.121/gluster-firstvol

                       2076672   66592  2010080   4% /mnt

Mount successful!
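Besides NFS and CIFS, a client with the glusterfs-fuse package installed can also mount the volume with the native GlusterFS client, which talks to all bricks directly; a sketch:

[root@node5 ~]# yum  -y  install  glusterfs-fuse
[root@node5 ~]# mount  -t  glusterfs  node1.example.com:/firstvol  /mnt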

 

Step 3: Add nodes

 If the existing nodes are running low on disk space, new bricks (on new or existing nodes) can be added to handle a larger amount of data; here a second brick is added on node3 and node4.

Node3

[root@node3 brick]# lvcreate  -V 1G  -T vg_bricks/brickspool   -n  brick3

 Logical volume "brick3" created

[root@node3 brick]# mkfs.xfs  -i size=512    /dev/vg_bricks/brick3

meta-data=/dev/vg_bricks/brick3  isize=512   agcount=8, agsize=32768 blks

        =                      sectsz=512   attr=2, projid32bit=0

data    =                      bsize=4096   blocks=262144, imaxpct=25

        =                      sunit=0      swidth=0 blks

naming  =version 2             bsize=4096   ascii-ci=0

log     =internal log          bsize=4096   blocks=2560,version=2

        =                       sectsz=512   sunit=0 blks, lazy-count=1

realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@node3 brick]# mkdir  /bricks/brick3/

[root@node3 ~]# echo "/dev/vg_bricks/brick3  /bricks/brick3   xfs   defaults 0  0"  >> /etc/fstab

[root@node3 ~]# mount  -a

[root@node3 brick3]# mkdir /bricks/brick3/brick

 

Node4

[root@node4 brick]# lvcreate  -V 1G   -T  vg_bricks/brickspool   -n brick3

 Logical volume "brick3" created

[root@node4 brick]# mkfs.xfs  -i size=512  /dev/vg_bricks/brick3

meta-data=/dev/vg_bricks/brick3  isize=512   agcount=8, agsize=32768 blks

        =                      sectsz=512   attr=2, projid32bit=0

data    =                      bsize=4096   blocks=262144,imaxpct=25

        =                      sunit=0      swidth=0 blks

naming  =version 2             bsize=4096   ascii-ci=0

log     =internal log          bsize=4096   blocks=2560,version=2

        =                      sectsz=512   sunit=0 blks,lazy-count=1

realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@node4 brick]# mkdir /bricks/brick3/

[root@node4 ~]# echo "/dev/vg_bricks/brick3  /bricks/brick3   xfs   defaults 0  0"  >> /etc/fstab

[root@node4 ~]# mount  -a

[root@node4 brick3]# mkdir /bricks/brick3/brick

 

 

 

Node1

gluster> volume  add-brick   firstvol  node3.example.com:/bricks/brick3/brick   node4.example.com:/bricks/brick3/brick

volume add-brick: success

gluster> volume  info  firstvol

 

Volume Name: firstvol

Type: Distributed-Replicate

Volume ID:4087aecf-a4fe-4773-a1af-fa19d6a7aba5

Status: Started

Snap Volume: no

Number of Bricks: 3 x 2 = 6  

Transport-type: tcp

Bricks:

Brick1:node1.example.com:/bricks/brick1/brick

Brick2:node2.example.com:/bricks/brick1/brick

Brick3:node3.example.com:/bricks/brick1/brick

Brick4:node4.example.com:/bricks/brick1/brick

Brick5:node3.example.com:/bricks/brick3/brick

Brick6:node4.example.com:/bricks/brick3/brick

Options Reconfigured:

performance.readdir-ahead: on

auto-delete: disable

snap-max-soft-limit: 90

snap-max-hard-limit: 256

gluster>exit

 

 

Step 4: Rebalance

Although the new bricks were added successfully, looking at them shows they are empty; existing data is not redistributed automatically:

[root@node3 /]# ls    /bricks/brick3/brick

[root@node3 /]#

 

Even if new data is added now, nothing lands on the new bricks:

[root@node5 mnt]# touch    text{0..10}

[root@node5 mnt]# ls

file0 file2  file4  file6 file8  text0  text10 text3  text5  text7 text9

file1 file3  file5  file7 file9  text1  text2  text4  text6  text8

 

[root@node1 ~]# cd /bricks/brick1/brick   (the new data is distributed to Node1 and Node2)

[root@node1 brick]# ls

file3 file4  file7  file9 text1  text3  text8 text9 

 

[root@node3 /]# ls  /bricks/brick1/brick/   (and to /bricks/brick1 on Node3 and Node4)

file0 file1  file2  file5 file6  file8  text0 text10  text2  text4 text5  text6  text7

 

[root@node3 /]# ls  /bricks/brick3/brick   (but the newly added bricks on Node3 and Node4 are still empty!)

[root@node3 /]#

-------------------------No matter how much data you add, nothing appears here---------------------------------------------------------------

 

Because of this, run a rebalance so that the data is spread across all of the bricks, including the new ones:

[root@node1 brick]# gluster volume  rebalance  firstvol start

volume rebalance: firstvol: success: Starting rebalance on volume firstvol has been successful.

ID: fc79b11d-13a0-4428-9e53-79f04dbc65a4
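The rebalance runs in the background; its progress can be checked with the status sub-command before inspecting the bricks:

[root@node1 brick]# gluster  volume  rebalance  firstvol  status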

 

[root@node3 /]# ls    /bricks/brick3/brick

file0 file1  file2  text0 text10  text3

[root@node4 ~]# ls  /bricks/brick3/brick   (the new bricks on Node3 and Node4 now hold data)

file0 file1  file2  text0 text10  text3

Pre-existing data is also redistributed.

After the rebalance, the bricks do not all hold exactly the same amount of data. New files are spread across the brick pairs in a round-robin-like fashion: for example, the next file you create may land on node1 & node2, the one after that on the original node3 & node4 bricks, and the third on the new node3 & node4 bricks.

 

Step 5: ACL and Quota

ACL:

A file system mounted without the acl option cannot have ACLs set on it; the acl mount option has to be added first.

Node5:

[root@node5 mnt]# setfacl -m  u:user1:rw-  /mnt

setfacl: /mnt: Operation not supported     (ACLs are not supported yet)

 

Add the acl option to the mount entry:

[root@node5 home]#vim  /etc/fstab

node1.example.com:/firstvol /mnt  nfs    defaults,_netdev,acl   0  0

Enable ACL support on the gluster volume:

[root@node1 brick]# gluster

gluster> volume  set  firstvol acl  on

volume set: success

gluster> exit

Remount the volume:

[root@node5 home]# mount -o  acl    node1.example.com:/firstvol   /mnt

Check the mount and set the user ACL again:

[root@node5 ~]# mount

node1:/firstvol on  /mnt type  nfs  (rw,acl,addr=192.168.1.121)

[root@node5 mnt]# setfacl -m  u:user1:rw-  /mnt

[root@node5 home]# getfacl /mnt

getfacl: Removing leading '/' from absolutepath names

# file: mnt

# owner: root

# group: root

user::rwx

user:user1:rw-

group::r-x

mask::rwx

other::r-x

 

[root@node5 ~]# cd  /mnt

[root@node5 mnt]# su  user1

[user1@node5 mnt]$ touch 123

[user1@node5 mnt]$ ls

file0 file2  file4  file6 file8  text0  text10 text3  text5  text7 text9

file1 file3  file5  file7 file9  text1  text2  text4  text6  text8  123

 

Quota

Ordinary file systems offer disk quotas, and GlusterFS has a quota feature as well.

Node1:

[root@node1 /]# gluster

gluster> volume  quota firstvol  enable   (enable the quota feature)

volume quota : success

 

Node5:

[root@node5 /]# cd  /mnt

[root@node5 mnt]# mkdir  Limit

[root@node5 mnt]# ls

file0 file2  file4  file6 file8  Limit  text1  text2  text4  text6  text8

file1 file3  file5  file7 file9  text0  123  text3 text5  text7  text9 text10

 

Node1:

Limit the /mnt/Limit directory to 20 MB of disk space:

gluster> volume   quota firstvol  limit-usage  /Limit 20MB   (set the size limit)

volume quota : success

 

Verification:

Create a 25 MB file inside /mnt/Limit:

[root@node5 mnt]# cd  /mnt/Limit

[root@node5 Limit]# ls

[root@node5 Limit]# dd if=/dev/zero  of=bigfile  bs=1M count=25

25+0 records in

25+0 records out

26214400 bytes (26 MB) copied, 2.22657 s,11.8 MB/s

[root@node5 Limit]# ls

bigfile

[root@node5 Limit]# ll  -h

total 25M

-rw-r--r-- 1 root root  25M Nov 3  2015 bigfile

A 25 MB file was written without any error, which looks as if the quota did not take effect. That is because the data had not yet been flushed to disk; once it is synced, the quota is enforced:

[root@node5 Limit]# sync

[root@node5 Limit]# dd if=/dev/zero  of=bigfile1  bs=1M count=25

dd: opening `bigfile1': Disk quota exceeded

 

gluster> volume   quota  firstvol list   (view the quota settings)

Path  Hard-limit  Soft-limit   Used  Available  Soft-limit exceeded?Hard-limit exceeded?

---------------------------------------------------------------------------------------------------------------------------

/Limit 20.0MB     80%     50.0MB  0Bytes        Yes                  Yes
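If the limit needs to be raised or dropped later, the same quota sub-commands can be used; a sketch:

gluster> volume  quota  firstvol  limit-usage  /Limit  50MB     (raise the limit on /Limit)
gluster> volume  quota  firstvol  remove  /Limit                (remove the limit on /Limit)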

 

 

Step 6: Failover between nodes with CTDB

In the setup above there is a problem: node5 mounts the firstvol "pool" directly from node1, so if node1 fails, the client loses access to the whole GlusterFS volume.

    To make the setup fault tolerant and reliable, we need failover between nodes: when one node fails, another node takes over and keeps providing the service. This is what CTDB does. It requires a virtual IP (VIP); externally, GlusterFS is accessed through that VIP.

 

 

Node1-4: configuration to be done on all of nodes 1 through 4

[root@node1 /]# lvcreate  -V 1G -T vg_bricks/brickspool   -n  brick4

 Logical volume "brick4" created

[root@node1 /]# mkfs.xfs  -i size=512  /dev/vg_bricks/brick4

meta-data=/dev/vg_bricks/brick4  isize=512   agcount=8, agsize=32768 blks

        =                      sectsz=512   attr=2, projid32bit=0

data    =                      bsize=4096   blocks=262144,imaxpct=25

        =                      sunit=0      swidth=0 blks

naming  =version 2             bsize=4096   ascii-ci=0

log     =internal log          bsize=4096   blocks=2560,version=2

        =                      sectsz=512   sunit=0 blks,lazy-count=1

realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@node1 /]# mkdir   /bricks/brick4

[root@node1 /]# echo  "/dev/vg_bricks/brick4  /bricks/brick4  xfs  defaults  0  0" >>  /etc/fstab

[root@node1 /]# mount  -a

[root@node1 /]# mkdir   /bricks/brick4/brick

 

 

[root@node1 brick]# gluster

gluster> volume  create  ctdb replica  4  node1.example.com:/bricks/brick4/brick   node2.example.com:/bricks/brick4/brick   node3.example.com:/bricks/brick4/brick   node4.example.com:/bricks/brick4/brick

volume create: ctdb: success: please start the volume to access data

Do not start the ctdb volume yet; first adjust the CTDB hook configuration on all nodes.

 

 

Edit the CTDB hook scripts; nodes 1-4 all need the same change:

[root@node1 brick]# vim  /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh

# to prevent the script from running for volumes it was not intended.

# User needs to set META to the volume that serves CTDB lockfile.

META="ctdb"   (this name must match the gluster volume created above)

[root@node1 brick]# vim /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh

# to prevent the script from running for volumes it was not intended.

# User needs to set META to the volume that serves CTDB lockfile.

META="ctdb"   (this name must match the gluster volume created above)

 

Start the ctdb volume:

gluster> volume  start  ctdb

volume start: ctdb: success

 

Install the ctdb2.5 package; nodes 1-4 all need the same configuration:

[root@node1 /]# yum  -y remove  ctdb   (to install the ctdb2.5 build, first remove the ctdb package that is already installed)

[root@node1 /]# yum  -y install  ctdb2.5

 

 

Create the nodes file (it lists the hosts that take part in the failover); configure it on nodes 1-4:

[root@node1 ~]# vim  /etc/ctdb/nodes

192.168.1.121

192.168.1.122

192.168.1.123

192.168.1.124

 

Create the public-address file, which defines the VIP and the interface it floats on; configure it on nodes 1-4:

[root@node1 /]#vim /etc/ctdb/public_addresses

192.168.1.1/24   eth0
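Depending on the ctdb2.5 build, the two files above may also need to be referenced from /etc/sysconfig/ctdb, along with the recovery lock file that lives on the shared ctdb volume; a typical configuration on each node (an assumption, not taken from this lab's transcript) looks roughly like:

CTDB_RECOVERY_LOCK=/gluster/lock/lockfile          # lock file on the shared ctdb volume (mounted at /gluster/lock by the hook script)
CTDB_NODES=/etc/ctdb/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_MANAGES_SAMBA=yes                             # let CTDB start/stop smb on the healthy nodes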

 

Start the ctdb service; it must be started on all of nodes 1-4, otherwise failover will not happen:

[root@node1 /]# /etc/init.d/ctdb restart

Shutting down ctdbd service: CTDB is not running

                                                          [  OK  ]

Starting ctdbd service:                                    [  OK  ]

[root@node1 /]# chkconfig ctdb  on

 

[root@node1 ~]# ctdb status

Number of nodes:4

pnn:0 192.168.1.121    OK

pnn:1 192.168.1.122    OK

pnn:2 192.168.1.123    OK

pnn:3 192.168.1.124    OK (THIS NODE)

Generation:1192510318

Size:4

hash:0 lmaster:0

hash:1 lmaster:1

hash:2 lmaster:2

hash:3 lmaster:3

Recovery mode:NORMAL (0)

Recovery master:0

[root@node2 /]# ctdb  ip

Public IPs on node 1

192.168.1.1-1

 

CTDB can take a while to bring the VIP up... be patient.

The VIP 192.168.1.1/24 has come up on Node4, meaning Node4 is currently the node holding it:

[root@node4 ~]# ip addr  list

1: lo: <LOOPBACK,UP,LOWER_UP> mtu16436 qdisc noqueue state UNKNOWN

   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

   inet 127.0.0.1/8 scope host lo

2: eth0:<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen1000

   link/ether 00:0c:29:42:b7:f2 brd ff:ff:ff:ff:ff:ff

   inet 192.168.1.124/24 brd 192.168.1.255 scope global eth0

    inet 192.168.1.1/24 brd 192.168.1.255 scope global secondary eth0

 

Verify the VIP by pinging it from Node5:

[root@node5 ~]# ping 192.168.1.1

PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.

64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=4.64 ms

64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.609 ms

^C

--- 192.168.1.1 ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 1380ms

rtt min/avg/max/mdev =0.609/2.625/4.642/2.017 ms

The VIP is reachable.

 

NFS mount:

Node5 mounts through the VIP:

[root@node5 ~]# mount   -t  nfs     192.168.1.1:/ctdb   /mnt

[root@node5 ~]# df

Filesystem        1K-blocks    Used Available Use% Mounted on

/dev/sda2          50395844 1748972  46086872  4% /

tmpfs                506168       0   506168   0% /dev/shm

/dev/sda1            198337   29772    158325 16% /boot

192.168.1.1:/ctdb   1038336  32768   1005568   4% /mnt

 

[root@node5 ~]# cd  /mnt

[root@node5 mnt]# touch  file{1..5}

[root@node5 mnt]# ls

file1 file2  file3  file4 file5  lockfile

For failover between nodes, the participating servers must hold identical data, so that the data stays consistent when the VIP moves; CTDB failover requires the node data to be the same. With the replica-4 volume, every node has a full copy:

[root@node1 ~]# ls  /bricks/brick4/brick/

file1 file2  file3  file4 file5  lockfile

[root@node2 ~]# ls  /bricks/brick4/brick/

file1 file2  file3  file4 file5  lockfile

[root@node3 ~]# ls  /bricks/brick4/brick/

file1 file2  file3  file4 file5  lockfile

[root@node4 ~]# ls  /bricks/brick4/brick/

file1 file2  file3  file4 file5  lockfile

 

Verify failover between the nodes:

[root@node4 ~]# init  0    (shut down node4)

[root@node1 ~]# ctdb  status   (check from another node; the VIP has floated over to Node1)

Number of nodes:4

pnn:0 192.168.1.121    OK (THIS NODE)

pnn:1 192.168.1.122    OK

pnn:2 192.168.1.123    OK

pnn:3 192.168.1.124    DISCONNECTED|UNHEALTHY|INACTIVE

Generation:495021029

Size:3

hash:0 lmaster:0

hash:1 lmaster:1

hash:2 lmaster:2

Recovery mode:NORMAL (0)

Recovery master:0

[root@node1 ~]# ip  addr  list

1: lo: <LOOPBACK,UP,LOWER_UP> mtu16436 qdisc noqueue state UNKNOWN

   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

   inet 127.0.0.1/8 scope host lo

2: eth0:<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen1000

   link/ether 00:0c:29:29:d3:3a brd ff:ff:ff:ff:ff:ff

   inet 192.168.1.121/24 brd 192.168.1.255 scope global eth0

    inet 192.168.1.1/24 brd 192.168.1.255 scope global secondary eth0

Node1 has become the active node and has taken over the VIP from Node4.

 

The mount on Node5 still works, so the failover succeeded:

[root@node5 mnt]# df

Filesystem        1K-blocks    Used Available Use% Mounted on

/dev/sda2          50395844 1749132  46086712  4% /

tmpfs                506168       0   506168   0% /dev/shm

/dev/sda1            198337   29772   158325  16% /boot

192.168.1.1:/ctdb   1038336  32768   1005568   4% /mnt

Note: sometimes ctdb status shows nodes that are not OK. What is going on?

Check the log /var/log/log.ctdb. If it contains something like "don't open /gluster/lock/lockfile. No such file or directory", create that file manually on every node (create the /gluster/lock directory first), then restart the ctdb service.
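A minimal sketch of that fix, run on every node:

[root@node1 ~]# mkdir  -p  /gluster/lock
[root@node1 ~]# touch  /gluster/lock/lockfile      (only if the hook script has not already created it)
[root@node1 ~]# service  ctdb  restart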

 

Step 7: Geo-replication between sites

Geo-replication backs up a site's data: whenever the master site produces new data, it is synchronized to the slave site in near real time, so the master's data cannot be lost. The slave site is only a backup of the master; data created directly on the slave is never copied back to the master.

 

Create the mastervol volume on Node1 & Node2 and mount it for use:

[root@node1 ~]# gluster

gluster> volume   create  mastervol   replica 2  node1.example.com:/bricks/brick1/brick  node2.example.com:/bricks/brick1/brick

volume create: mastervol: success: please start the volume to access data

gluster> volume  start  mastervol

volume start: mastervol: success

gluster> exit

[root@node1 ~]# echo  "node1.example.com:/mastervol   /mnt  glusterfs  defaults,_netdev  0  0" >>  /etc/fstab

[root@node1 ~]# mount  -a

[root@node1 ~]# df

Filesystem           1K-blocks    Used Available Use% Mounted on

/dev/sda2             50395844 1793688  46042156  4% /

tmpfs                   506168       0   506168   0% /dev/shm

/dev/sda1               198337   29791   158306  16% /boot

/dev/mapper/vg_bricks-brick1

                       1038336   33296  1005040   4% /bricks/brick1

node1.example.com:/mastervol

                       1038336   33408  1004928   4% /mnt

 

Create the slavevol volume on Node5 and mount it for use:

[root@node5 mnt]# gluster

gluster> volume  create   slavevol  node5.example.com:/bricks/brick1/brick

volume create: slavevol: success: please start the volume to access data

gluster> volume  start  slavevol

volume start: slavevol: success

gluster> exit

[root@node5 /]# echo  "node5.example.com:/slavevol  /mnt  glusterfs  defaults,_netdev  0 0" >> /etc/fstab

[root@node5 /]# mount  -a

 

[root@node5 /]# df

Filesystem           1K-blocks    Used Available Use% Mounted on

/dev/sda2             50395844 1793544  46042300  4% /

tmpfs                   506168       0   506168   0% /dev/shm

/dev/sda1               198337   29772   158325  16% /boot

/dev/mapper/vg_bricks-brick1

                       1038336   33296  1005040   4% /bricks/brick1

node5.example.com:/slavevol

                       1038336   33408  1004928   4% /mnt

 

 

Create the corresponding user and group on Node5 (used for account authentication during replication):

[root@node5 /]# groupadd  geogroup

[root@node5 /]# useradd  -G  geogroup  geoaccount

[root@node5 /]# passwd geoaccount

Changing password for user geoaccount.

New password:

BAD PASSWORD: it is based on a dictionaryword

BAD PASSWORD: is too simple

Retype new password:

passwd: all authentication tokens updatedsuccessfully.

 

Replication between the master and slave sites requires the sites to connect to each other: node1 logs in to node5 with node5's account in order to copy data over. Typing node5's credentials for every connection is impractical, so key-based authentication is used for automatic login: copy the master's public key to the slave site, and the slave uses it to verify the master's private key, so the two nodes connect automatically.

[root@node1 ~]# ssh-keygen   (generate the key pair on the master site)

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

3d:af:76:30:7c:bc:5c:c3:36:82:a0:d3:98:ae:15:0droot@node1.example.com

The key's randomart image is:

+--[ RSA 2048]----+

|                 |

|                 |

|     E          |

|      o..       |

|     .=Sooo .   |

|     =.. =o+ *  |

|    ...   =.= o |

|    ..   ..+    |

|   ..   ...     |

+-----------------+

Copy the master site's public key to the slave site for the geoaccount account:

[root@node1 ~]# ssh-copy-id  geoaccount@192.168.1.125

geoaccount@192.168.1.125's password:

Now try logging into the machine, with"ssh 'geoaccount@192.168.1.125'", and check in:

 

 .ssh/authorized_keys

 

to make sure we haven't added extra keys that you weren't expecting.

On node5, add the following options to /etc/glusterfs/glusterd.vol:

[root@node5 ~]# vim  /etc/glusterfs/glusterd.vol

volume management

   type mgmt/glusterd

   option working-directory /var/lib/glusterd

   option transport-type socket,rdma

    option  mountbroker-root  /var/mountbroker-root

   option mountbroker-geo-replication.geoaccount  slavevol

   option geo-replication-log-group  geogroup

   option rpc-auth-allow-insecure  on

   option transport.socket.keepalive-time 10

   option transport.socket.keepalive-interval 2

   option transport.socket.read-fail-log off

   option ping-timeout 0

#  option base-port 49152

end-volume

Create the matching /var/mountbroker-root directory and give it mode 0711:

[root@node5 ~]# mkdir /var/mountbroker-root

[root@node5 ~]# chmod 0711  /var/mountbroker-root

 

Finally, restart the glusterd service on node5:

[root@node5 ~]# service  glusterd  restart

Starting glusterd:                                         [  OK  ]

 

Set up geo-replication on node1 (the master):

[root@node1 ~]# gluster

gluster> system::   execute   gsec_create

Common secret pub file present at /var/lib/glusterd/geo-replication/common_secret.pem.pub

gluster> volume  geo-replication   mastervol geoaccount@192.168.1.125::slavevol create  push-pem   (create a session that replicates the master's mastervol volume to the slavevol volume on the slave site, connecting as geoaccount)

Creating geo-replication session between mastervol & geoaccount@192.168.1.125::slavevol has been successful

gluster> volume  geo-replication  mastervol geoaccount@192.168.1.125::slavevol start

Starting geo-replication session between mastervol & geoaccount@192.168.1.125::slavevol has been successful   (start replication between the master and slave sites)

gluster> volume   geo-replication  mastervol geoaccount@192.168.1.125::slavevol  status             (check the status of the master/slave session)

MASTER NODE          MASTER VOL    MASTER BRICK            SLAVE                                 STATUS    CHECKPOINT STATUS    CRAWL STATUS       

---------------------------------------------------------------------------------------------------------------------------------------------

node1.example.com    mastervol     /bricks/brick1/brick    geoaccount@192.168.1.125::slavevol    faulty   N/A                  N/A                

node2.example.com    mastervol     /bricks/brick1/brick    geoaccount@192.168.1.125::slavevol    faulty   N/A                  N/A    

 

Run the following command on node5:

[root@node5 ~]# /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh   geoaccount mastervol  slavevol

Successfully copied file.

Command executed successfully.
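After the pem keys are in place on the slave, the session can be restarted and its status checked again; once it is healthy, the STATUS column should no longer read faulty. A sketch:

gluster> volume  geo-replication  mastervol  geoaccount@192.168.1.125::slavevol  stop
gluster> volume  geo-replication  mastervol  geoaccount@192.168.1.125::slavevol  start
gluster> volume  geo-replication  mastervol  geoaccount@192.168.1.125::slavevol  status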

 

Create some data on Node1:

[root@node1 /]# cd /mnt

[root@node1 mnt]# mkdir  file{0..5}

[root@node1 mnt]# ls

file0 file1  file2  file3 file4  file5

 

The data from Node1 is replicated over (the repeated ls shows the short delay before the sync arrives):

[root@node5 ~]# cd   /mnt

[root@node5 mnt]# ls

[root@node5 mnt]# ls

[root@node5 mnt]# ls

[root@node5 mnt]# ls

file0 file1  file2  file3 file4  file5

 

Note: replication from master to slave is continuous; as soon as the master site has new data, it is copied to the slave site. The slave is a backup only: data created directly on the slave volume is never synchronized back to the master.
