Solution Overview
This article uses ESXi to simulate four storage nodes. The Ceph release is ceph version 14.2.10 nautilus (stable). The operating system is BCLinux, which identifies itself as BigCloud Enterprise Linux release 8.2.2107 (Core).
Bcache Overview
bcache is a Linux kernel block-layer cache. It lets one or more fast disks (such as SSDs) act as a cache for one or more slower disks. bcache supports three cache policies:
- writeback: all data is first written to the cache device and later flushed to the backing data device by the system.
- writethrough (the default): data is written to the cache device and the backing data device simultaneously.
- writearound: data is written directly to the backing device, bypassing the cache.
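On a running system the policy of an existing bcache device can be inspected and switched at runtime through sysfs. A minimal sketch, assuming a device named bcache0 (the existence guard makes it safe to run on a machine without one):

```shell
#!/bin/sh
# Inspect and switch the cache mode of a bcache device via sysfs.
# The device name bcache0 is an assumption; adjust it to your setup.
dev=/sys/block/bcache0/bcache
if [ -d "$dev" ]; then
    cat "$dev/cache_mode"                 # current mode is shown in [brackets]
    echo writeback > "$dev/cache_mode"    # switch the policy to writeback
    cat "$dev/cache_mode"
else
    echo "no bcache0 device present"
fi
```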
Cluster Environment Planning
Physical Networking
Virtual machines are used for this simulation. Four VMs are created, each with 20 HDDs and 4 × 1.8 TB NVMe drives. The physical network layout is as follows:
IP Address Plan
The IP addresses are planned as follows:
Node | 1GbE | public | cluster
node01 | 192.168.13.1 | 188.188.13.1 | 10.10.13.1 |
node02 | 192.168.13.2 | 188.188.13.2 | 10.10.13.2 |
node03 | 192.168.13.3 | 188.188.13.3 | 10.10.13.3 |
node04 | 192.168.13.4 | 188.188.13.4 | 10.10.13.4 |
client01 | 192.168.13.101 | 188.188.13.101 |
Disk Layout
For the Ceph deployment, each NVMe drive is split into 15 partitions; groups of three partitions are assigned to one HDD.
A DB partition, a WAL partition, a bcache cache partition, and an HDD together back one OSD service.
On physical machines, all mechanical disks should use JBOD mode. Here, as these are virtual machines, each node has 4 × 1.8 TB NVMe drives and 20 × 8 TB HDDs.
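As a sanity check on this layout, the NVMe space consumed per drive can be totaled using the partition sizes chosen later (30 GiB DB, 15 GiB WAL, 300 GiB cache per HDD, five HDDs per NVMe drive):

```shell
#!/bin/sh
# Total NVMe space consumed per drive: 5 HDDs, each needing a
# 30 GiB DB, a 15 GiB WAL, and a 300 GiB cache partition.
per_hdd=$((30 + 15 + 300))
total=$((5 * per_hdd))
echo "per-HDD NVMe footprint: ${per_hdd} GiB"
echo "per-NVMe total: ${total} GiB"   # must fit in a 1.8 TB drive
```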
Storage Node Environment Setup
Basic Environment Configuration
Setting the hostname
Run the following on node01-node04; node01 is shown as an example:
[root@localhost ~]# hostnamectl set-hostname node01
Configuring the hosts file
Configure /etc/hosts on node01-node04.
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.13.1 node01
192.168.13.2 node02
192.168.13.3 node03
192.168.13.4 node04
Configuring the firewall
Disable the firewall on all storage nodes. Run the following commands on node01-node04.
[root@localhost ~]# systemctl stop firewalld
[root@localhost ~]# systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Configuring SELinux
Disable SELinux on all storage nodes: on node01-node04, set SELINUX=disabled in /etc/selinux/config, then reboot all storage nodes.
[root@localhost ~]# cat /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
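The SELINUX= edit can be scripted with sed. A sketch, demonstrated on a temporary copy of the file so it is safe to run anywhere (on a real node, target /etc/selinux/config instead):

```shell
#!/bin/sh
# Flip SELINUX=... to disabled; shown against a temp copy of the file.
cfg=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
mode=$(grep '^SELINUX=' "$cfg")
echo "$mode"   # prints SELINUX=disabled
rm -f "$cfg"
```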
Passwordless access between nodes
Establish trust by configuring passwordless access from node01 to all storage nodes.
Generate a key pair.
[root@node01 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
……
|=X= o+.+ . |
|*E*=+++ + |
|&BB+**B= |
+----[SHA256]-----+
Copy the key from node01 to each of the other nodes:
[root@node01 ~]# ssh-copy-id node02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@node02's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'node02'"
and check to make sure that only the key(s) you wanted were added.
Only node01-to-other-node access is configured here. It is recommended to set up passwordless access between all node pairs in the same way.
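The remaining key exchanges can be driven by a loop. A sketch that only echoes the commands (when actually run, each ssh-copy-id still prompts for the target's root password; hostnames follow the plan above):

```shell
#!/bin/sh
# Echo the ssh-copy-id calls needed for full-mesh passwordless access
# between node01..node04 (node01 -> others is already done above).
n=0
for src in node01 node02 node03 node04; do
    for dst in node01 node02 node03 node04; do
        [ "$src" = "$dst" ] && continue
        echo "ssh $src ssh-copy-id $dst"
        n=$((n + 1))
    done
done
echo "$n commands"   # 4 nodes x 3 targets = 12
```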
Configuring NTP time synchronization
Configure node01 as the chrony server by editing /etc/chrony.conf.
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
pool 192.168.13.1 iburst
……
# Allow NTP client access from local network.
allow 192.168.0.0/16
# Serve time even if not synchronized to a time source.
local stratum 10
……
Point node02-node04 at node01 for time sync by editing /etc/chrony.conf on each.
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
pool 192.168.13.1 iburst
……
Restart the chronyd service on node01-node04 and verify time synchronization.
[root@node01 ~]# systemctl restart chronyd
[root@node01 ~]# chronyc sources -v
.-- Source mode '^' = server, '=' = peer, '#' = local clock.
/ .- Source state '*' = current best, '+' = combined, '-' = not combined,
| / 'x' = may be in error, '~' = too variable, '?' = unusable.
|| .- xxxx [ yyyy ] +/- zzzz
|| Reachability register (octal) -. | xxxx = adjusted offset,
|| Log2(Polling interval) --. | | yyyy = measured offset,
|| \ | | zzzz = estimated error.
|| | | \
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^? node01 0 6 3 - +0ns[ +0ns] +/- 0ns
[root@node01 ~]# date
Sun Dec 24 23:31:24 CST 2023
Configuring the Yum repository
This deployment uses an offline local yum repository; an online repository can be used instead if available. Copy the prepared offline yum files to node01-node04 and add a repo file.
[root@node01 yum.repos.d]# pwd
/etc/yum.repos.d
[root@node01 yum.repos.d]# cat ceph-local.repo
[local]
name=local
baseurl=file:///etc/yum.repos.d/ceph14-rpm
enabled=1
gpgcheck=0
[root@node01 yum.repos.d]# ls
BCLinux-AppStream.repo.old BCLinux-BaseOS.repo.old BCLinux-Kernel.repo.old BCLinux-PowerTools.repo.old ceph14-rpm ceph-local.repo
[root@node01 yum.repos.d]#
Bcache Configuration
Loading the bcache module
Load the bcache module on each of node01-node04. Note that the kernel must provide the bcache module; if it cannot be loaded, download the matching kernel source, enable bcache, and rebuild the kernel.
[root@node01 ~]# modprobe bcache
[root@node01 ~]# lsmod |grep bcache
bcache 270336 0
Installing the bcache-tools utility
1. Upload the downloaded bcache-tools-1.1.tar.gz package to the /home directory using a file-transfer tool, then unpack it.
cd /home/
tar -zxvf bcache-tools-1.1.tar.gz
cd /home/bcache-tools-1.1
2. Install the build dependency.
yum install libblkid-devel -y
3. Build and install.
make
make install
4. Check the make-bcache command after installation; output like the following indicates success.
[root@node01 bcache-tools]# make-bcache
Please supply a device
Usage: make-bcache [options] device
-C, --cache Format a cache device
-B, --bdev Format a backing device
……
-h, --help display this help and exit
Disk Partitioning
Confirm the disk configuration on each node: 20 × 8 TB HDDs and 4 × 1.8 TB NVMe drives.
Each NVMe drive serves five HDDs. Each HDD backs one OSD service, with a 15 GB WAL partition, a 30 GB DB partition, and a 300 GB cache partition on the NVMe drive.
[root@node02 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
……
sdb 8:16 0 7.8T 0 disk
sdc 8:32 0 7.8T 0 disk
……
sdt 65:48 0 7.8T 0 disk
sdu 65:64 0 7.8T 0 disk
nvme0n1 259:0 0 1.8T 0 disk
nvme0n2 259:1 0 1.8T 0 disk
nvme0n3 259:2 0 1.8T 0 disk
nvme0n4 259:3 0 1.8T 0 disk
The following shows the partitioning of one NVMe drive as an example; perform the same operations on every NVMe drive.
[root@node01 ~]# parted -s /dev/nvme0n1 mklabel gpt
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 1MiB 30GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 30GiB 60GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 60GiB 90GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 90GiB 120GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 120GiB 150GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 150GiB 165GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 165GiB 180GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 180GiB 195GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 195GiB 210GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 210GiB 225GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 225GiB 525GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 525GiB 825GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 825GiB 1125GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 1125GiB 1425GiB
[root@node01 ~]# parted -s /dev/nvme0n1 mkpart primary 1425GiB 1725GiB
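The fifteen mkpart calls follow a fixed pattern (5 × 30 GiB DB, 5 × 15 GiB WAL, 5 × 300 GiB cache) and can be generated with a loop. A sketch that echoes the commands rather than running them:

```shell
#!/bin/sh
# Generate the 15 parted commands for one NVMe drive:
# partitions 1-5 are 30 GiB (DB), 6-10 are 15 GiB (WAL),
# 11-15 are 300 GiB (bcache cache).
dev=/dev/nvme0n1
start=0
for size in 30 30 30 30 30 15 15 15 15 15 300 300 300 300 300; do
    if [ "$start" -eq 0 ]; then from="1MiB"; else from="${start}GiB"; fi
    end=$((start + size))
    echo "parted -s $dev mkpart primary $from ${end}GiB"
    start=$end
done
```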
Check the NVMe partition layout.
[root@node01 ~]# parted /dev/nvme0n1
GNU Parted 3.2
Using /dev/nvme0n1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: NVMe Device (nvme)
Disk /dev/nvme0n1: 1933GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 32.2GB 32.2GB primary
2 32.2GB 64.4GB 32.2GB primary
3 64.4GB 96.6GB 32.2GB primary
4 96.6GB 129GB 32.2GB primary
5 129GB 161GB 32.2GB primary
6 161GB 177GB 16.1GB primary
7 177GB 193GB 16.1GB primary
8 193GB 209GB 16.1GB primary
9 209GB 225GB 16.1GB primary
10 225GB 242GB 16.1GB primary
11 242GB 564GB 322GB primary
12 564GB 886GB 322GB primary
13 886GB 1208GB 322GB primary
14 1208GB 1530GB 322GB primary
15 1530GB 1852GB 322GB primary
make-bcache
Bind each NVMe cache partition to its corresponding HDD with make-bcache. Taking sdb as an example, the command is as follows.
[root@node01 ~]# make-bcache -C --discard -w 4K -b 2M --wipe-bcache /dev/nvme0n1p11 -B --writeback --wipe-bcache /dev/sdb
UUID: e67b87eb-8bea-48cd-8fd1-c2eee496eed4
Set UUID: 2a3d8718-6a22-470d-8dea-a18ec59e0eb9
version: 0
nbuckets: 153600
block_size: 8
bucket_size: 4096
nr_in_set: 1
nr_this_dev: 0
first_bucket: 1
UUID: 3397aad8-3f43-48e1-a61c-9665a1f5fd36
Set UUID: 2a3d8718-6a22-470d-8dea-a18ec59e0eb9
version: 1
block_size: 8
data_offset: 16
[root@node01 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
……
sdb 8:16 0 7.8T 0 disk
└─bcache0 252:0 0 7.8T 0 disk
sdc 8:32 0 7.8T 0 disk
……
nvme0n1 259:0 0 1.8T 0 disk
├─nvme0n1p1 259:19 0 30G 0 part
├─nvme0n1p2 259:20 0 30G 0 part
……
├─nvme0n1p9 259:27 0 15G 0 part
├─nvme0n1p10 259:28 0 15G 0 part
├─nvme0n1p11 259:29 0 300G 0 part
│ └─bcache0 252:0 0 7.8T 0 disk
├─nvme0n1p12 259:30 0 300G 0 part
……
Following the same procedure, make sure every HDD is bound to its corresponding bcache cache partition.
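The twenty bindings follow a regular pattern, so the commands can be generated. A sketch that echoes each make-bcache call; it assumes sdb..sdu and nvme0n1..nvme0n4 enumerate in order, so verify each pair against lsblk before running anything:

```shell
#!/bin/sh
# Pair each HDD (sdb..sdu) with one 300 GiB cache partition
# (p11..p15 on nvme0n1..nvme0n4) and echo the make-bcache command.
i=0
for hdd in b c d e f g h i j k l m n o p q r s t u; do
    nvme=$((i / 5 + 1))        # nvme0n1 .. nvme0n4
    part=$((11 + i % 5))       # p11 .. p15 (cache partitions)
    echo "make-bcache -C --discard -w 4K -b 2M --wipe-bcache" \
         "/dev/nvme0n${nvme}p${part} -B --writeback --wipe-bcache /dev/sd${hdd}"
    i=$((i + 1))
done
```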
Ceph Cluster Deployment and Configuration
Installing the Ceph base packages
Install the Ceph base packages on node01-node04; node01 is shown as an example. Install ceph and its dependencies on every node.
[root@node01 ~]# yum install -y ceph
Unable to connect to Registration Management Service
Last metadata expiration check: 0:03:54 ago on Mon 25 Dec 2023 01:49:08 AM CST.
Dependencies resolved.
===============================================================================
Package Architecture Version Repository Size
===============================================================================
Installing:
ceph x86_64 2:14.2.10-0.el8 local 6.3 k
Installing dependencies:
ceph-base x86_64 2:14.2.10-0.el8 local 5.3 M
ceph-common x86_64 2:14.2.10-0.el8 local 19 M
……
python3-xmlsec-1.3.3-7.el8.x86_64 python3-zc-lockfile-2.0-2.el8.noarch rdma-core-26.0-8.el8.x86_64
Complete!
Installing and configuring the mon service
Generate a UUID on node01 and record it.
[root@node01 ~]# uuidgen
3fd206a8-b655-4d7d-9dab-b70b6f3e065b
Create the keyrings; run the following on node01.
[root@node01 ~]# ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
creating /tmp/ceph.mon.keyring
[root@node01 ~]# ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *' --cap mgr 'allow *'
creating /etc/ceph/ceph.client.admin.keyring
[root@node01 ~]# ceph-authtool --create-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring --gen-key -n client.bootstrap-osd --cap mon 'profile bootstrap-osd' --cap mgr 'allow r'
creating /var/lib/ceph/bootstrap-osd/ceph.keyring
[root@node01 ~]# ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
importing contents of /etc/ceph/ceph.client.admin.keyring into /tmp/ceph.mon.keyring
[root@node01 ~]# ceph-authtool /tmp/ceph.mon.keyring --import-keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
importing contents of /var/lib/ceph/bootstrap-osd/ceph.keyring into /tmp/ceph.mon.keyring
Change the owner of /tmp/ceph.mon.keyring to ceph and verify the change. Run on node01.
[root@node01 ~]# chown ceph:ceph /tmp/ceph.mon.keyring
[root@node01 ~]# ls -l /tmp/ceph.mon.keyring
-rw------- 1 ceph ceph 357 Dec 25 18:36 /tmp/ceph.mon.keyring
Build the monmap: each --add argument takes a mon node's hostname and public IP address, and --fsid takes the UUID recorded above. Run this on node01 only.
[root@node01 ~]# monmaptool --create --add node01 188.188.13.1 --add node02 188.188.13.2 --add node03 188.188.13.3 --fsid 3fd206a8-b655-4d7d-9dab-b70b6f3e065b /tmp/monmap
monmaptool: monmap file /tmp/monmap
monmaptool: set fsid to 3fd206a8-b655-4d7d-9dab-b70b6f3e065b
monmaptool: writing epoch 0 to /tmp/monmap (3 monitors)
[root@node01 ~]#
Create the ceph.conf configuration file on node01, under /etc/ceph/, with the contents below: fsid is the UUID, mon initial members lists all mon hostnames, mon host lists all mon public IPs, public network is the public subnet, and cluster network is the cluster subnet.
[root@node01 ~]# vi /etc/ceph/ceph.conf
[root@node01 ~]# cat /etc/ceph/ceph.conf
[global]
fsid = 3fd206a8-b655-4d7d-9dab-b70b6f3e065b
mon initial members = node01,node02,node03
mon host = 188.188.13.1,188.188.13.2,188.188.13.3
public network = 188.188.13.0/24
cluster network = 10.10.13.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 256
osd pool default pgp num = 256
osd crush chooseleaf type = 1
mon_allow_pool_delete = true
Copy node01's /etc/ceph directory to all other nodes. Run the following on node01.
[root@node01 ~]# for i in {2..4};do scp -r /etc/ceph/ node0$i:/etc/ ;done
rbdmap 100% 92 164.6KB/s 00:00
ceph.client.admin.keyring 100% 151 98.8KB/s 00:00
ceph.conf 100% 500 825.7KB/s 00:00
rbdmap 100% 92 98.5KB/s 00:00
ceph.client.admin.keyring 100% 151 169.6KB/s 00:00
ceph.conf 100% 500 927.0KB/s 00:00
rbdmap 100% 92 131.1KB/s 00:00
ceph.client.admin.keyring 100% 151 189.0KB/s 00:00
ceph.conf 100% 500 714.5KB/s 00:00
[root@node01 ~]#
Copy node01's /tmp/monmap and /tmp/ceph.mon.keyring files to /tmp on all other Ceph nodes, and change their owner to ceph.
[root@node01 ~]# for i in {2..4};do scp /tmp/monmap /tmp/ceph.mon.keyring node0$i:/tmp/;done
monmap 100% 313 179.3KB/s 00:00
ceph.mon.keyring 100% 357 400.3KB/s 00:00
monmap 100% 313 246.1KB/s 00:00
ceph.mon.keyring 100% 357 532.0KB/s 00:00
monmap 100% 313 119.2KB/s 00:00
ceph.mon.keyring 100% 357 341.8KB/s 00:00
[root@node01 ~]# ssh node01 'cd /tmp/;chown -R ceph:ceph /tmp/monmap /tmp/ceph.mon.keyring'
[root@node01 ~]# ssh node02 'cd /tmp/;chown -R ceph:ceph /tmp/monmap /tmp/ceph.mon.keyring'
[root@node01 ~]# ssh node03 'cd /tmp/;chown -R ceph:ceph /tmp/monmap /tmp/ceph.mon.keyring'
[root@node01 ~]# ssh node04 'cd /tmp/;chown -R ceph:ceph /tmp/monmap /tmp/ceph.mon.keyring'
[root@node01 ~]# for i in {1..4};do ssh node0$i 'ls -l /tmp/|grep ceph';done
-rw------- 1 ceph ceph 357 Dec 25 18:36 ceph.mon.keyring
-rw-r--r-- 1 ceph ceph 313 Dec 25 19:00 monmap
-rw------- 1 ceph ceph 357 Dec 25 19:09 ceph.mon.keyring
-rw-r--r-- 1 ceph ceph 313 Dec 25 19:09 monmap
-rw------- 1 ceph ceph 357 Dec 25 19:09 ceph.mon.keyring
-rw-r--r-- 1 ceph ceph 313 Dec 25 19:09 monmap
-rw------- 1 ceph ceph 357 Dec 25 19:09 ceph.mon.keyring
-rw-r--r-- 1 ceph ceph 313 Dec 25 19:09 monmap
[root@node01 ~]#
Create the mon data directory on every mon node and change its owner to ceph. Run the following on node01.
[root@node01 ~]# for i in {1..3};do ssh node0$i 'mkdir /var/lib/ceph/mon/ceph-`hostname`';done
[root@node01 ~]# for i in {1..3};do ssh node0$i 'chown -R ceph:ceph /var/lib/ceph/mon/ceph-`hostname`';done
[root@node01 ~]# for i in {1..3};do ssh node0$i 'ls -l /var/lib/ceph/mon/';done
total 0
drwxr-xr-x 2 ceph ceph 6 Dec 25 21:00 ceph-node01
total 0
drwxr-xr-x 2 ceph ceph 6 Dec 25 21:00 ceph-node02
total 0
drwxr-xr-x 2 ceph ceph 6 Dec 25 21:00 ceph-node03
[root@node01 ~]#
Initialize the mon service on all mon nodes. Run the following on node01.
[root@node01 ~]# for i in {1..3};do ssh node0$i 'ceph-mon --mkfs -i `hostname` --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring';done
[root@node01 ~]#
Change the owner under /var/lib/ceph/mon/ on all mon nodes to ceph again. Run the following on node01.
[root@node01 ~]# for i in {1..3};do ssh node0$i 'chown -R ceph:ceph /var/lib/ceph/mon/ceph-`hostname`';done
[root@node01 ~]# for i in {1..3};do ssh node0$i 'ls -l /var/lib/ceph/mon/ceph-`hostname`';done
total 8
-rw------- 1 ceph ceph 77 Dec 25 21:02 keyring
-rw------- 1 ceph ceph 8 Dec 25 21:02 kv_backend
drwxr-xr-x 2 ceph ceph 112 Dec 25 21:02 store.db
total 8
-rw------- 1 ceph ceph 77 Dec 25 21:02 keyring
-rw------- 1 ceph ceph 8 Dec 25 21:02 kv_backend
drwxr-xr-x 2 ceph ceph 112 Dec 25 21:02 store.db
total 8
-rw------- 1 ceph ceph 77 Dec 25 21:02 keyring
-rw------- 1 ceph ceph 8 Dec 25 21:02 kv_backend
drwxr-xr-x 2 ceph ceph 112 Dec 25 21:02 store.db
[root@node01 ~]#
Start the ceph-mon service on all mon nodes. Run the following on node01.
[root@node01 ~]# for i in {1..3};do ssh node0$i 'systemctl start ceph-mon@`hostname`';done
[root@node01 ~]# for i in {1..3};do ssh node0$i 'systemctl status ceph-mon@`hostname`|grep active';done
Active: active (running) since Mon 2023-12-25 21:05:18 CST; 1min 9s ago
Active: active (running) since Mon 2023-12-25 21:05:19 CST; 1min 9s ago
Active: active (running) since Mon 2023-12-25 21:05:19 CST; 1min 9s ago
[root@node01 ~]#
Enable msgr2 on all mon nodes and check the cluster status.
[root@node01 ~]# for i in {1..3};do ssh node0$i 'ceph mon enable-msgr2';done
[root@node01 ~]# ceph -s
cluster:
id: 3fd206a8-b655-4d7d-9dab-b70b6f3e065b
health: HEALTH_OK
services:
mon: 3 daemons, quorum node01,node02,node03 (age 10s)
mgr: no daemons active
osd: 0 osds: 0 up, 0 in
data:
……
Installing and configuring the mgr service
Install and configure mgr. Unless otherwise noted, the commands in this subsection run on node01.
Create the directory and register the daemon.
[root@node01 ~]# mkdir /var/lib/ceph/mgr/ceph-node01_mgr/
[root@node01 ~]# ceph auth get-or-create mgr.node01_mgr mon 'allow profile mgr' osd 'allow *' mds 'allow *' > /var/lib/ceph/mgr/ceph-node01_mgr/keyring
Configure and start the mgr service.
[root@node01 ~]# ceph-mgr -i node01_mgr
[root@node01 ~]# systemctl start ceph-mgr@node01_mgr
[root@node01 ~]# systemctl enable ceph-mgr@node01_mgr
Created symlink /etc/systemd/system/ceph-mgr.target.wants/ceph-mgr@node01_mgr.service → /usr/lib/systemd/system/ceph-mgr@.service.
[root@node01 ~]# systemctl status ceph-mgr@node01_mgr
● ceph-mgr@node01_mgr.service - Ceph cluster manager daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2023-12-25 21:17:19 CST; 16s ago
Main PID: 75567 (ceph-mgr)
……
Installing and configuring the osd services
Before creating OSDs, sync the /var/lib/ceph/bootstrap-osd/ceph.keyring file to all OSD nodes. Run the following on node01.
[root@node01 ~]# for i in {2..4};do scp /var/lib/ceph/bootstrap-osd/ceph.keyring node0$i:/var/lib/ceph/bootstrap-osd/;done
ceph.keyring 100% 129 589.3KB/s 00:00
ceph.keyring 100% 129 515.3KB/s 00:00
ceph.keyring 100% 129 527.2KB/s 00:00
Create the OSDs: each bcacheX (X = 0-19) backs one OSD service, with a 15 GB WAL partition, a 30 GB DB partition, and a 300 GB cache partition on the NVMe drive.
Taking one OSD as an example, inspect the lsblk output.
[root@node01 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
……
sdb 8:16 0 7.8T 0 disk
└─bcache0 252:0 0 7.8T 0 disk
……
sdu 65:64 0 7.8T 0 disk
└─bcache19 252:2432 0 7.8T 0 disk
nvme0n1 259:0 0 1.8T 0 disk
├─nvme0n1p1 259:19 0 30G 0 part
……
├─nvme0n1p6 259:24 0 15G 0 part
├─nvme0n1p7 259:25 0 15G 0 part
├─nvme0n1p8 259:26 0 15G 0 part
├─nvme0n1p9 259:27 0 15G 0 part
├─nvme0n1p10 259:28 0 15G 0 part
├─nvme0n1p11 259:29 0 300G 0 part
│ └─bcache0 252:0 0 7.8T 0 disk
……
The WAL partition for bcache0 is nvme0n1p6, and its DB partition is nvme0n1p1.
[root@node01 ~]# ceph-volume lvm create --data /dev/bcache0 --block.wal /dev/nvme0n1p6 --block.db /dev/nvme0n1p1
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new a8bd184d-00b2-4d84-a674-6d65e7e2904a
Running command: /usr/sbin/vgcreate --force --yes ceph-eca15c09-2ee6-42f1-a088-40f7fac1b92d /dev/bcache0
stdout: Physical volume "/dev/bcache0" successfully created.
……
Running command: /usr/bin/systemctl enable --runtime ceph-osd@0
stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@0.service → /usr/lib/systemd/system/ceph-osd@.service.
Running command: /usr/bin/systemctl start ceph-osd@0
--> ceph-volume lvm activate successful for osd ID: 0
--> ceph-volume lvm create successful for: /dev/bcache0
[root@node01 ~]#
Check the cluster status.
[root@node01 ~]# ceph -s
cluster:
id: 3fd206a8-b655-4d7d-9dab-b70b6f3e065b
health: HEALTH_WARN
OSD count 1 < osd_pool_default_size 3
services:
mon: 3 daemons, quorum node01,node02,node03 (age 26m)
mgr: node01_mgr(active, starting, since 0.990093s)
osd: 1 osds: 1 up (since 2m), 1 in (since 2m)
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 31 GiB used, 7.8 TiB / 7.8 TiB avail
pgs:
[root@node01 ~]#
Following the same procedure, create an OSD for every bcacheX on every node, specifying its WAL and DB partitions.
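The bcacheX-to-partition mapping can also be generated per node. A sketch that echoes the ceph-volume calls; it assumes bcache0..bcache19 were bound in order to sdb..sdu as above, so verify each triple against lsblk before running anything:

```shell
#!/bin/sh
# For bcache0..bcache19, derive the matching DB (p1..p5) and WAL
# (p6..p10) partitions on nvme0n1..nvme0n4 and echo the command.
for i in $(seq 0 19); do
    nvme=$((i / 5 + 1))    # nvme0n1 .. nvme0n4
    db=$((i % 5 + 1))      # p1 .. p5  (30 GiB DB partitions)
    wal=$((i % 5 + 6))     # p6 .. p10 (15 GiB WAL partitions)
    echo "ceph-volume lvm create --data /dev/bcache${i}" \
         "--block.wal /dev/nvme0n${nvme}p${wal}" \
         "--block.db /dev/nvme0n${nvme}p${db}"
done
```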
After all OSDs are created, check the cluster status.
[root@node01 ~]# ceph -s
cluster:
id: 3fd206a8-b655-4d7d-9dab-b70b6f3e065b
health: HEALTH_OK
services:
mon: 3 daemons, quorum node01,node02,node03 (age 59m)
mgr: node01_mgr(active, starting, since 1.52551s)
osd: 80 osds: 80 up (since 41s), 80 in (since 41s)
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 2.4 TiB used, 625 TiB / 627 TiB avail
pgs:
[root@node01 ~]#
This completes the deployment of the Ceph cluster. The next article covers deploying block, file, and object storage applications on top of it.