1. Environment Preparation
The lab environment used here is VMware 16.
1.1 Lab Node Planning
Server role | Hostname | IP address | OS version | Ceph roles | Hardware | Notes |
---|---|---|---|---|---|---|
Jump server | jumpserver.shiyan.com | 172.172.8.11 | CentOS 7 | NTP server | RAM: 4 GB; CPU: 4 cores; system disk: 20 GB (no partitioning requirement) | Optional VM; in this environment the other nodes can only be reached through the jump server |
ceph-node1 | ceph-node1.shiyan.com | 172.172.8.50 | CentOS 7 | mon, osd, mgr, mds, radosgw | RAM: 4 GB; CPU: 4 cores; system disk: 20 GB (no partitioning requirement); data disks: 3 × 50 GB (unpartitioned, unformatted) | |
ceph-node2 | ceph-node2.shiyan.com | 172.172.8.51 | CentOS 7 | mon, osd, mgr, mds, radosgw | RAM: 4 GB; CPU: 4 cores; system disk: 20 GB (no partitioning requirement); data disks: 3 × 50 GB (unpartitioned, unformatted) | |
ceph-node3 | ceph-node3.shiyan.com | 172.172.8.52 | CentOS 7 | mon, osd, mgr, mds, radosgw | RAM: 4 GB; CPU: 4 cores; system disk: 20 GB (no partitioning requirement); data disks: 3 × 50 GB (unpartitioned, unformatted) | |
client | client.shiyan.com | 172.172.8.100 | CentOS 7 | client | RAM: 4 GB; CPU: 4 cores; system disk: 20 GB (no partitioning requirement) | Client used to test the Ceph cluster |
1.2 OS Image Download Link
http://mirrors.aliyun.com/centos/7.9.2009/isos/x86_64/
1.3 Template VM OS Initialization
1.3.1 Change the NIC Naming Scheme
- Append net.ifnames=0 biosdevname=0 to the end of the GRUB_CMDLINE_LINUX line (line 6) in /etc/default/grub:
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet net.ifnames=0 biosdevname=0"
- Regenerate the grub configuration file and reboot:
grub2-mkconfig -o /boot/grub2/grub.cfg   # regenerate the grub config
reboot
- Check the NIC information:
ifconfig | head -1
- Delete the old connection profiles that still reference the wrong interface names:
nmcli connection show
nmcli connection delete ens33
nmcli connection delete 有线连接\ 1    # default profile name on a Chinese-locale install ("Wired connection 1" on an English locale); use whatever "nmcli connection show" lists
nmcli connection show
- Add a connection profile for the new NIC name:
nmcli connection add ifname eth0 con-name eth0 type ethernet
nmcli connection show
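The new eth0 profile has no IP address yet. A minimal sketch of assigning one (the address shown is ceph-node1's; the gateway and DNS values are assumptions for this lab's 172.172.8.0/24 network and must match your VMware network settings):
nmcli connection modify eth0 ipv4.method manual ipv4.addresses 172.172.8.50/24 ipv4.gateway 172.172.8.2 ipv4.dns 223.5.5.5
nmcli connection up eth0
nmcli connection show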
1.3.2 Yum Repository Configuration
[root@jumpserver ~]# cd /etc/yum.repos.d/
[root@jumpserver yum.repos.d]# wget https://mirrors.aliyun.com/repo/Centos-7.repo
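After replacing the repo file, rebuild the yum cache so the Aliyun mirror takes effect:
[root@jumpserver yum.repos.d]# yum clean all
[root@jumpserver yum.repos.d]# yum makecache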
1.3.2.1 Install Commonly Used Packages
yum -y install lrzsz bash-completion net-tools vim wget
1.3.3 Install Python 3
Reference for installing from source: https://blog.csdn.net/qq_37996012/article/details/134387394
yum -y install python3
1.3.4 Install Ansible
[root@localhost ~]# yum install -y epel-release
[root@localhost ~]# yum -y install ansible
1.3.5 Disable the Firewall and SELinux
[root@localhost ~]# vim /etc/selinux/config
SELINUX=disabled
[root@localhost ~]# systemctl disable firewalld --now
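The change in /etc/selinux/config only takes effect after a reboot; to also put SELinux into permissive mode for the current session, run:
[root@localhost ~]# setenforce 0
[root@localhost ~]# getenforce
Permissive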
1.3.6 Optimize SSH Remote Connections
[root@localhost ~]# vim /etc/ssh/sshd_config
UseDNS no
[root@localhost ~]# systemctl restart sshd
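To confirm that the option is active, print sshd's effective configuration:
[root@localhost ~]# sshd -T | grep -i usedns
usedns no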
After the initialization steps above are complete, shut down the template VM and take a snapshot, then use full clones to create the other virtual machines.
2. Obtaining the Ceph Packages
# Official repository
https://download.ceph.com/rpm-15.2.17/el7/x86_64/
# Tsinghua University mirror
https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-15.2.17/el7/x86_64/
Configure the Ceph yum repository:
vi /etc/yum.repos.d/ceph.repo
# add the following content:
[Ceph]
name=Ceph packages for $basearch
baseurl=https://download.ceph.com/rpm-15.2.17/el7/$basearch
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=1
[Ceph-noarch]
name=Ceph noarch packages
baseurl=https://download.ceph.com/rpm-15.2.17/el7/noarch
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=1
[ceph-source]
name=Ceph source packages
baseurl=https://download.ceph.com/rpm-15.2.17/el7/SRPMS
enabled=1
gpgcheck=0
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
priority=1
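The repo file is needed on every node that will install Ceph. To confirm the 15.2.17 packages are actually visible from the new repo, you can run, for example:
yum clean all && yum makecache
yum list ceph --showduplicates | tail -n 3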
Alternatively, you can download the RPMs from the official site and build your own local RPM repository from them, e.g. https://download.ceph.com/rpm-18.2.1/el8/x86_64/
3. Deploying the Ceph Cluster
3.1 Pre-deployment Preparation
3.1.1 On the Jump Server
3.1.2 Create a Working Directory
[root@jumpserver ~]# mkdir ceph-cluster    # create a directory for the deployment tool, to hold keys and config files
[root@jumpserver ~]# cd ceph-cluster/
[root@jumpserver ceph-cluster]# pwd
/root/ceph-cluster
3.1.3 Custom Ansible Configuration File
[root@jumpserver ceph-cluster]# vim ansible.cfg
[defaults]
inventory = iplist
host_key_checking = False #disable SSH host-key (fingerprint) checking
remote_user = root
3.1.4 Create the Host Inventory
[root@jumpserver ceph-cluster]# vim iplist
[root@jumpserver ceph-cluster]# cat iplist
[ceph_node]
172.172.8.50 hostname=ceph-node1.shiyan.com ansible_ssh_pass=123456
172.172.8.51 hostname=ceph-node2.shiyan.com ansible_ssh_pass=123456
172.172.8.52 hostname=ceph-node3.shiyan.com ansible_ssh_pass=123456
[client]
172.172.8.100 hostname=client.shiyan.com ansible_ssh_pass=123456
3.1.5 Configure Passwordless Login from the Jump Server to the Other Hosts
[root@jumpserver ceph-cluster]# ssh-keygen # press Enter at every prompt
[root@jumpserver ceph-cluster]# ansible all -m authorized_key -a "user=root state=present key='{{ lookup('file', '/root/.ssh/id_rsa.pub') }}'"
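Note: because the inventory logs in with ansible_ssh_pass, the jumpserver needs sshpass for the first password-based connections. If the authorized_key task complains about sshpass, install it and then verify connectivity:
[root@jumpserver ceph-cluster]# yum -y install sshpass
[root@jumpserver ceph-cluster]# ansible all -m ping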
3.1.6 Set the Hostnames of the Lab Nodes
[root@jumpserver ceph-cluster]# ansible all -m shell -a 'hostnamectl set-hostname {{hostname}}'
[root@jumpserver ceph-cluster]# ansible all -m shell -a 'hostname' -f 1
172.172.8.100 | CHANGED | rc=0 >>
client.shiyan.com
172.172.8.50 | CHANGED | rc=0 >>
ceph-node1.shiyan.com
172.172.8.51 | CHANGED | rc=0 >>
ceph-node2.shiyan.com
172.172.8.52 | CHANGED | rc=0 >>
ceph-node3.shiyan.com
3.1.7 Configure /etc/hosts and Sync It to All Hosts
[root@jumpserver ceph-cluster]# vim hosts.yaml
---
- name: Configure /etc/hosts
  hosts: all
  become: yes
  tasks:
    - name: Configure /etc/hosts
      blockinfile:
        path: /etc/hosts
        block: |
          172.172.8.50 ceph-node1.shiyan.com ceph-node1
          172.172.8.51 ceph-node2.shiyan.com ceph-node2
          172.172.8.52 ceph-node3.shiyan.com ceph-node3
          172.172.8.100 client.shiyan.com client
[root@jumpserver ceph-cluster]# ansible-playbook hosts.yaml
[root@jumpserver ceph-cluster]# ansible all -m shell -a 'cat /etc/hosts'
3.1.8 Configure the NTP Server
The jump server acts as the NTP server; all other nodes are clients.
# Server-side configuration
[root@jumpserver ceph-cluster]# yum -y install chrony ntpdate
[root@jumpserver ceph-cluster]# ntpdate ntp.aliyun.com # sync the clock once manually
[root@jumpserver ceph-cluster]# vim /etc/chrony.conf
[root@jumpserver ceph-cluster]# cat /etc/chrony.conf
server ntp.aliyun.com
rtcsync
allow 172.172.8.0/24
local stratum 10
logdir /var/log/chrony
[root@jumpserver ceph-cluster]# systemctl restart chronyd
[root@jumpserver ceph-cluster]# systemctl enable chronyd
# Client configuration
## Create a config file template
[root@jumpserver ceph-cluster]# vim chrony.conf.template
[root@jumpserver ceph-cluster]# cat chrony.conf.template
server 172.172.8.11
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
[root@jumpserver ceph-cluster]# vim chrony.yaml
---
- hosts: all
  gather_facts: no
  tasks:
    - name: Install chrony ntpdate
      yum:
        name:
          - chrony
          - ntpdate
        state: present
    - name: Copy file chrony.conf.template to server
      copy:
        src: chrony.conf.template
        dest: /etc/chrony.conf
    - name: ntpdate 172.172.8.11
      shell: ntpdate 172.172.8.11
    - name: Restart service chronyd
      service:
        name: chronyd
        state: restarted
        enabled: true
[root@jumpserver ceph-cluster]# ansible-playbook chrony.yaml
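To verify that the nodes are actually syncing from the jumpserver, query chrony's source list on all of them; once synchronized, 172.172.8.11 should be marked with ^* in the output:
[root@jumpserver ceph-cluster]# ansible all -m shell -a 'chronyc sources'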
3.1.9 On the ceph-node Nodes and the Client
Configure passwordless SSH between the nodes (a node must also be able to SSH to itself without a password):
[root@ceph-node1 ~]# ssh-keygen -f /root/.ssh/id_rsa -N ''
[root@ceph-node1 ~]# for i in 50 51 52 100; do ssh-copy-id 172.172.8.$i; done
[root@ceph-node2 ~]# ssh-keygen -f /root/.ssh/id_rsa -N ''
[root@ceph-node2 ~]# for i in 50 51 52 100; do ssh-copy-id 172.172.8.$i; done
[root@ceph-node3 ~]# ssh-keygen -f /root/.ssh/id_rsa -N ''
[root@ceph-node3 ~]# for i in 50 51 52 100; do ssh-copy-id 172.172.8.$i; done
[root@client ~]# ssh-keygen -f /root/.ssh/id_rsa -N ''
[root@client ~]# for i in 50 51 52 100; do ssh-copy-id 172.172.8.$i; done
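To verify the passwordless logins (and to accept the host keys for the short hostnames that ceph-deploy will use later), a quick loop like the following can be run on each node; answer yes to any host-key prompts:
[root@ceph-node1 ~]# for i in ceph-node1 ceph-node2 ceph-node3 client; do ssh $i hostname; done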
3.1.10 When Preparation Is Complete, Shut Down and Take Snapshots
[root@jumpserver ceph-cluster]# ansible all -m shell -a 'poweroff'
3.2 Deploy the Ceph Services
3.2.1 Install the Deployment Tool ceph-deploy on ceph-node1
[root@ceph-node1 ~]# pip3 install ceph-deploy==2.0.1 -i https://mirrors.aliyun.com/pypi/simple/
[root@ceph-node1 ~]# ceph-deploy --version
2.0.1
[root@ceph-node1 ~]# mkdir ceph-cluster
[root@ceph-node1 ~]# cd ceph-cluster
[root@ceph-node1 ceph-cluster]#
3.2.2 Install the Ceph Packages on All Nodes
[root@ceph-node1 ceph-cluster]# for i in ceph-node1 ceph-node2 ceph-node3
> do
> ssh $i "yum -y install ceph ceph-mon ceph-osd ceph-mds ceph-radosgw ceph-mgr"
> done
3.2.3 Create the Ceph Cluster Configuration (generates ceph.conf in the ceph-cluster directory)
[root@ceph-node1 ceph-cluster]# ceph-deploy new ceph-node1 ceph-node2 ceph-node3
[root@ceph-node1 ceph-cluster]# ll
total 16
-rw-r--r-- 1 root root 250 Feb 5 14:55 ceph.conf
-rw-r--r-- 1 root root 4396 Feb 5 14:55 ceph-deploy-ceph.log
-rw------- 1 root root 73 Feb 5 14:55 ceph.mon.keyring
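For reference, the generated ceph.conf for this layout should look roughly like the following (the fsid is generated per cluster, and the auth options are ceph-deploy defaults):
[global]
fsid = abfa0578-6ba1-44cc-9942-fac8488ab2f9
mon_initial_members = ceph-node1, ceph-node2, ceph-node3
mon_host = 172.172.8.50,172.172.8.51,172.172.8.52
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx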
3.2.4 Initialize and Start the mon Service on All Nodes
Copy the configuration files from the current directory to /etc/ceph/ on every node and start the mon service.
[root@ceph-node1 ceph-cluster]# ceph-deploy mon create-initial
# ceph.conf contains the IPs of the three mons, so the ceph-deploy script knows which hosts it needs to connect to
[root@ceph-node1 ceph-cluster]# cp -a *.keyring /etc/ceph/
[root@ceph-node1 ceph-cluster]# scp *.keyring 172.172.8.51:/etc/ceph/
[root@ceph-node1 ceph-cluster]# scp *.keyring 172.172.8.52:/etc/ceph/
# The following command can also be used to distribute the configuration
ceph-deploy admin ceph-node1 ceph-node2 ceph-node3
3.2.5 Check the Ceph Cluster Status
[root@ceph-node1 ceph-cluster]# ceph -s
cluster:
id: abfa0578-6ba1-44cc-9942-fac8488ab2f9
health: HEALTH_WARN
mons are allowing insecure global_id reclaim
services:
mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 14m)
mgr: no daemons active
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
[root@ceph-node1 ceph-cluster]#
The cluster status is HEALTH_WARN, with the warning "mons are allowing insecure global_id reclaim".
Fix: disable the insecure global_id reclaim mode.
Reference: https://blog.csdn.net/qq_40984972/article/details/123420421
[root@ceph-node1 ceph-cluster]# ceph config set mon auth_allow_insecure_global_id_reclaim false # wait a few seconds, then check again
[root@ceph-node1 ceph-cluster]# ceph -s
cluster:
id: abfa0578-6ba1-44cc-9942-fac8488ab2f9
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 16m)
mgr: no daemons active
osd: 0 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
3.3 Create the OSDs
Disk information on each node:
[root@ceph-node1 ceph-cluster]# for i in ceph-node1 ceph-node2 ceph-node3; do echo $i ; ssh $i "lsblk | grep -v sda | grep disk"; echo ; done
ceph-node1
sdb 8:16 0 50G 0 disk
sdc 8:32 0 50G 0 disk
sdd 8:48 0 50G 0 disk
ceph-node2
sdb 8:16 0 50G 0 disk
sdc 8:32 0 50G 0 disk
sdd 8:48 0 50G 0 disk
ceph-node3
sdb 8:16 0 50G 0 disk
sdc 8:32 0 50G 0 disk
sdd 8:48 0 50G 0 disk
3.3.1 Zap the Disks to Wipe Old Data (run from ceph-node1 only)
Not strictly necessary here: this lab uses a brand-new environment, so there is no stale data on the disks.
[root@ceph-node1 ceph-cluster]# ceph-deploy disk zap ceph-node1 /dev/sdb /dev/sdc
[root@ceph-node1 ceph-cluster]# ceph-deploy disk zap ceph-node2 /dev/sdb /dev/sdc
[root@ceph-node1 ceph-cluster]# ceph-deploy disk zap ceph-node3 /dev/sdb /dev/sdc
You may run into missing Python module errors here; if so, install the modules:
pip3 install six pyyaml pecan werkzeug flask-restful
3.3.2 Create the OSDs
ceph-deploy osd create --data /dev/sdb ceph-node1
ceph-deploy osd create --data /dev/sdc ceph-node1
ceph-deploy osd create --data /dev/sdb ceph-node2
ceph-deploy osd create --data /dev/sdc ceph-node2
ceph-deploy osd create --data /dev/sdb ceph-node3
ceph-deploy osd create --data /dev/sdc ceph-node3
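The six commands above can also be written as a short loop (equivalent, just less repetitive):
for node in ceph-node1 ceph-node2 ceph-node3; do
  for dev in /dev/sdb /dev/sdc; do
    ceph-deploy osd create --data $dev $node
  done
done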
3.3.3 Check the Status
[root@ceph-node2 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.29279 root default
-3 0.09760 host ceph-node1
0 hdd 0.04880 osd.0 up 1.00000 1.00000
1 hdd 0.04880 osd.1 up 1.00000 1.00000
-5 0.09760 host ceph-node2
2 hdd 0.04880 osd.2 up 1.00000 1.00000
3 hdd 0.04880 osd.3 up 1.00000 1.00000
-7 0.09760 host ceph-node3
4 hdd 0.04880 osd.4 up 1.00000 1.00000
5 hdd 0.04880 osd.5 up 1.00000 1.00000
[root@ceph-node2 ~]# ceph -s
cluster:
id: abfa0578-6ba1-44cc-9942-fac8488ab2f9
health: HEALTH_WARN
no active mgr
services:
mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 80m)
mgr: no daemons active
osd: 6 osds: 6 up (since 40s), 6 in (since 40s)
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
3.4 Deploy the MGR Service
[root@ceph-node1 ceph-cluster]# ceph-deploy mgr create ceph-node1 ceph-node2 ceph-node3
Check the status:
[root@ceph-node1 ~]# ceph -s
cluster:
id: abfa0578-6ba1-44cc-9942-fac8488ab2f9
health: HEALTH_WARN
Module 'restful' has failed dependency: No module named 'pecan'
services:
mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 87s)
mgr: ceph-node3(active, since 10m), standbys: ceph-node2, ceph-node1
osd: 6 osds: 6 up (since 10m), 6 in (since 101m)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 6.0 GiB used, 294 GiB / 300 GiB avail
pgs: 1 active+clean
The warning "Module 'restful' has failed dependency: No module named 'pecan'" is another missing-module problem; install the missing modules:
pip3 install pecan werkzeug
Reboot the servers after the installation completes.
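A full reboot is one option; a lighter alternative (assuming the standard ceph-mgr systemd units) is to restart only the mgr daemons on each node and re-check the cluster:
[root@ceph-node1 ~]# for i in ceph-node1 ceph-node2 ceph-node3; do ssh $i "systemctl restart ceph-mgr.target"; done
[root@ceph-node1 ~]# ceph -s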