Pitfall notes on installing Ceph Octopus 15.2 with cephadm on CentOS 7.9, with K8S mount instructions
0. Highlights
1 Preparation
1.1 Enable timestamped shell history
1. Edit the profile file
(1) Add date/time stamps to history: append HISTTIMEFORMAT at the end
(2) Increase the number of history entries: raise HISTSIZE from the default 1000 to 10000
# vi /etc/profile
export HISTTIMEFORMAT='%F %T '
HISTSIZE=10000
2. Apply the changes
# source /etc/profile
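The HISTTIMEFORMAT value is an ordinary strftime format string, so you can preview what the timestamps will look like with `date` before opening a new shell. A minimal sketch, mirroring the values set above:

```shell
# Preview the strftime format that history will use for timestamps.
# These values mirror the lines added to /etc/profile above.
export HISTTIMEFORMAT='%F %T '
export HISTSIZE=10000
date "+${HISTTIMEFORMAT}"   # current time in the history timestamp format
```

After sourcing the profile, `history` will prefix each entry with a date and time in exactly this format.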
1.2 Upgrade the system kernel
1. Check the current kernel:
# uname -rs
2. Enable the ELRepo repository on CentOS 7
Import the repository's signing key
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
3. Enable the repository on CentOS 7
# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
Retrieving http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
Retrieving http://elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
Preparing... ################################# [100%]
Updating / installing...
1:elrepo-release-7.0-4.el7.elrepo ################################# [100%]
(On CentOS 8, enable the repository with:)
# yum install https://www.elrepo.org/elrepo-release-8.0-1.el8.elrepo.noarch.rpm
4. List the kernel versions available for installation
# yum --disablerepo="*" --enablerepo="elrepo-kernel" list available
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* elrepo-kernel: mirrors.neusoft.edu.cn
elrepo-kernel | 3.0 kB 00:00:00
elrepo-kernel/primary_db | 2.0 MB 00:00:16
Available Packages
elrepo-release.noarch                7.0-5.el7.elrepo       elrepo-kernel
kernel-lt.x86_64                     5.4.113-1.el7.elrepo   elrepo-kernel
kernel-lt-devel.x86_64               5.4.113-1.el7.elrepo   elrepo-kernel
kernel-lt-doc.noarch                 5.4.113-1.el7.elrepo   elrepo-kernel
kernel-lt-headers.x86_64             5.4.113-1.el7.elrepo   elrepo-kernel
kernel-lt-tools.x86_64               5.4.113-1.el7.elrepo   elrepo-kernel
kernel-lt-tools-libs.x86_64          5.4.113-1.el7.elrepo   elrepo-kernel
kernel-lt-tools-libs-devel.x86_64    5.4.113-1.el7.elrepo   elrepo-kernel
kernel-ml.x86_64                     5.11.15-1.el7.elrepo   elrepo-kernel
kernel-ml-devel.x86_64               5.11.15-1.el7.elrepo   elrepo-kernel
kernel-ml-doc.noarch                 5.11.15-1.el7.elrepo   elrepo-kernel
kernel-ml-headers.x86_64             5.11.15-1.el7.elrepo   elrepo-kernel
kernel-ml-tools.x86_64               5.11.15-1.el7.elrepo   elrepo-kernel
kernel-ml-tools-libs.x86_64          5.11.15-1.el7.elrepo   elrepo-kernel
kernel-ml-tools-libs-devel.x86_64    5.11.15-1.el7.elrepo   elrepo-kernel
perf.x86_64                          5.11.15-1.el7.elrepo   elrepo-kernel
python-perf.x86_64
5. Install the long-term support (lt) kernel (recommended; do NOT install the unstable mainline version)
# yum --enablerepo=elrepo-kernel install kernel-lt -y
6. Fix the boot failure
This step is mandatory, otherwise the system will not boot: after switching kernels on CentOS 7, boot may stop at "pstore: unknown compression: deflate" and hang.
# vi /etc/default/grub
Append mgag200.modeset=0 to the end of GRUB_CMDLINE_LINUX, i.e.:
GRUB_CMDLINE_LINUX="crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet mgag200.modeset=0"
7. Regenerate the grub configuration (the path below is for UEFI systems; on legacy BIOS systems the file is /boot/grub2/grub.cfg)
# grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
8. List the boot menu entries in order
# awk -F\' '$1=="menuentry " {print i++ " : " $2}' /boot/efi/EFI/centos/grub.cfg
0 : CentOS Linux (5.4.113-1.el7.elrepo.x86_64) 7 (Core)
1 : CentOS Linux (3.10.0-1160.el7.x86_64) 7 (Core)
2 : CentOS Linux (0-rescue-022eb2e91fc04884b3ecfa2051d8e32a) 7 (Core)
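To see how the awk one-liner above produces this numbered list, you can run it against a small sample of grub.cfg content (the /tmp file name here is purely for illustration):

```shell
# Build a minimal grub.cfg fragment. Menu entry titles are wrapped in
# single quotes, which is why the awk command splits fields on '.
cat > /tmp/sample-grub.cfg <<'EOF'
menuentry 'CentOS Linux (5.4.113-1.el7.elrepo.x86_64) 7 (Core)' --class centos {
}
menuentry 'CentOS Linux (3.10.0-1160.el7.x86_64) 7 (Core)' --class centos {
}
EOF
# $1=="menuentry " matches lines whose text before the first quote is
# exactly "menuentry "; $2 is then the quoted title; i++ numbers them from 0.
awk -F\' '$1=="menuentry " {print i++ " : " $2}' /tmp/sample-grub.cfg
```

The index printed on the left is what `grub2-set-default` can use, though the next step passes the full title instead, which is more robust across regenerations.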
9. Set the default boot kernel
# grub2-set-default "CentOS Linux (5.4.113-1.el7.elrepo.x86_64) 7 (Core)"
10. Check the current default boot kernel
# grub2-editenv list
saved_entry=CentOS Linux (5.4.113-1.el7.elrepo.x86_64) 7 (Core)
11. Reboot the system
# reboot
12. Verify the running kernel version after reboot:
# uname -rs
1.3 Set up passwordless SSH
1. On every node: edit the hosts file to map the hostnames
# sudo vi /etc/hosts
10.14.83.183 ceph1 ceph1
10.14.83.184 ceph2 ceph2
10.14.83.119 ceph3 ceph3
2. On every node: create a ceph admin user (do not name it "ceph"; that name is reserved internally by Ceph and the install will fail)
ceph1:# useradd cephpmsc
ceph2:# useradd cephpmsc
ceph3:# useradd cephpmsc
3. On every node: set the user's password
ceph1:# passwd cephpmsc
ceph1:# B9N5SjkXJd*****
ceph2:# passwd cephpmsc
ceph2:# B9N5SjkXJd*****
ceph3:# passwd cephpmsc
ceph3:# B9N5SjkXJd*****
4. On every node: grant passwordless sudo
ceph1:# echo "cephpmsc ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/cephpmsc
ceph1:# sudo chmod 0440 /etc/sudoers.d/cephpmsc
ceph2:# echo "cephpmsc ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/cephpmsc
ceph2:# sudo chmod 0440 /etc/sudoers.d/cephpmsc
ceph3:# echo "cephpmsc ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/cephpmsc
ceph3:# sudo chmod 0440 /etc/sudoers.d/cephpmsc
5. On every node: switch to the new user
ceph1: # su - cephpmsc
ceph2: # su - cephpmsc
ceph3: # su - cephpmsc
6. On every node, create the SSH client config:
$ mkdir ~/.ssh
$ vi ~/.ssh/config
Host ceph1
Hostname ceph1
User cephpmsc
Host ceph2
Hostname ceph2
User cephpmsc
Host ceph3
Hostname ceph3
User cephpmsc
7. Fix the config file's permissions (otherwise ssh reports "Bad owner or permissions on ~/.ssh/config")
ceph1: $ sudo chmod 700 ~/.ssh/config
ceph2: $ sudo chmod 700 ~/.ssh/config
ceph3: $ sudo chmod 700 ~/.ssh/config
ceph1: $ sudo chmod 600 ~/.ssh/authorized_keys
ceph2: $ sudo chmod 600 ~/.ssh/authorized_keys
ceph3: $ sudo chmod 600 ~/.ssh/authorized_keys
ceph1: $ sudo chmod 755 ~/.ssh
ceph2: $ sudo chmod 755 ~/.ssh
ceph3: $ sudo chmod 755 ~/.ssh
8. On every node: generate an SSH key pair (as the cephpmsc user); press Enter twice to accept the defaults
ceph1: $ ssh-keygen
ceph2: $ ssh-keygen
ceph3: $ ssh-keygen
9. Copy the public key to the other Ceph nodes (the users already exist and the hosts file is in place; no need to copy to the local machine; answer yes, then enter the password B9N5SjkXJ******)
ceph1: $ ssh-copy-id cephpmsc@ceph2
ceph1: $ ssh-copy-id cephpmsc@ceph3
ceph2: $ ssh-copy-id cephpmsc@ceph1
ceph2: $ ssh-copy-id cephpmsc@ceph3
ceph3: $ ssh-copy-id cephpmsc@ceph1
ceph3: $ ssh-copy-id cephpmsc@ceph2
10. Test passwordless login (note: login to the local machine must work too)
ceph1: $ ssh cephpmsc@ceph1
ceph1: $ ssh cephpmsc@ceph2
ceph1: $ ssh cephpmsc@ceph3
ceph2: $ ssh cephpmsc@ceph1
ceph2: $ ssh cephpmsc@ceph2
ceph2: $ ssh cephpmsc@ceph3
ceph3: $ ssh cephpmsc@ceph1
ceph3: $ ssh cephpmsc@ceph2
ceph3: $ ssh cephpmsc@ceph3
Problem 1: passwordless login fails; check the security log
# sudo tail /var/log/secure -n 20
It shows:
Apr 19 14:56:19 ceph2 sshd[2748]: Authentication refused: bad ownership or modes for directory /home/cephpmsc/.ssh
Solution:
The .ssh directory's permissions must be 755 or 700; group-writable modes such as 77x are rejected by sshd.
Check:
$ sudo ls ~/ -la
drwxrwxr-x. 2 cephpmsc cephpmsc 94 Apr 19 14:58 .ssh
Change it to:
$ sudo chmod 755 ~/.ssh
drwxr-xr-x. 2 cephpmsc cephpmsc 94 Apr 19 14:45 .ssh
Re-run:
# ssh-copy-id cephpmsc@ceph2
ceph1: $ ssh cephpmsc@ceph2
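The permission requirement can be reproduced safely in a throwaway directory, so you can see exactly which modes sshd accepts before touching a real ~/.ssh. A sketch (the mktemp path is only for demonstration):

```shell
# Reproduce the failing mode and the fix in an isolated directory.
d=$(mktemp -d)
mkdir "$d/.ssh"
chmod 775 "$d/.ssh"        # group-writable: sshd's StrictModes rejects this
stat -c '%a' "$d/.ssh"     # prints 775
chmod 700 "$d/.ssh"        # either 700 or 755 is accepted
stat -c '%a' "$d/.ssh"     # prints 700
rm -rf "$d"
```

On a real node the same `chmod 700 ~/.ssh` (or 755) is all that is needed; the parent home directory must not be group-writable either.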
1.4 Upgrade the container runtime
On CentOS 7, Docker must be upgraded to docker-ce 20 or later, or podman can be installed instead. Note: do not mix docker and podman across the nodes of one cluster.
1. First remove any old Docker packages
# sudo yum remove docker docker-common container-selinux docker-selinux docker-engine docker-ce
2. Update yum
# yum update
3. Install yum-utils, which provides yum-config-manager for managing yum repositories
# sudo yum install -y yum-utils
4. Add the Docker yum repository
# sudo yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
Loaded plugins: fastestmirror
adding repo from: http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
grabbing file http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo to /etc/yum.repos.d/docker-ce.repo
repo saved to /etc/yum.repos.d/docker-ce.repo
5. Refresh the package index
# sudo yum makecache fast
6. Install docker-ce
# sudo yum install -y docker-ce
7. Start the docker service
# sudo systemctl start docker
8. Verify the installation
# sudo docker info
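Since this guide requires docker-ce 20 or later, the major version can also be checked programmatically. A minimal sketch of the comparison; the version string here is hard-coded for illustration, while on a live host it would come from `docker version --format '{{.Server.Version}}'`:

```shell
# Hard-coded sample value; on a live host use:
#   ver=$(docker version --format '{{.Server.Version}}')
ver="20.10.5"
major=${ver%%.*}            # strip everything after the first dot
if [ "$major" -ge 20 ]; then
  echo "docker-ce $ver is new enough"
else
  echo "docker-ce $ver is too old; upgrade to 20+"
fi
```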
9. Alternatively, install podman on every cluster node (choose either this or docker-ce, not both)
# yum install -y podman
1.5 Configure time synchronization
1. Install chrony
Time synchronization must be enabled on every node; otherwise the distributed consensus requirements cannot be met and the cluster health status will raise warnings.
The Ceph containers are based on CentOS 8, which no longer supports the ntp service; its replacement is chrony, so install it:
# yum install chrony -y
2. Start the service and enable it at boot
# systemctl enable chronyd.service
# systemctl start chronyd.service
# systemctl status chronyd.service
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-08-24 14:32:11 CST; 5s ago
Docs: man:chronyd(8)
man:chrony.conf(5)
Process: 2743 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
Process: 2737 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 2740 (chronyd)
Tasks: 1
Memory: 2.6M
CGroup: /system.slice/chronyd.service
└─2740 /usr/sbin/chronyd
3. Set the system time zone to Asia/Shanghai:
$ timedatectl set-timezone Asia/Shanghai
4. After setting the time zone, force the system clock to step to the correct time:
$ chronyc -a makestep
200 OK
5. Keep the hardware clock in local time (it defaults to UTC):
$ timedatectl set-local-rtc 1
6. Enable NTP time synchronization:
$ timedatectl set-ntp yes
7. Check tracking against the time servers:
$ chronyc tracking
Reference ID : 55D62674 (mail.light-speed.de)
Stratum : 3
Ref time (UTC) : Sat May 22 01:18:49 2021
System time : 0.016832426 seconds fast of NTP time
Last offset : +0.006914454 seconds
RMS offset : 0.014994740 seconds
Frequency : 26.872 ppm slow
Residual freq : +1.933 ppm
Skew : 132.852 ppm
Root delay : 0.404872686 seconds
Root dispersion : 0.017830199 seconds
Update interval : 64.7 seconds
Leap status : Normal
8. Switch the chrony configuration to internal NTP sources (the nodes will lose internet access in the future)
[root@ceph3 ~]# vi /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 10.14.83.52 iburst
server 10.0.64.17 iburst
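After editing, you can confirm which servers chrony will actually use by parsing the config. A small sketch against a sample file; the /tmp path is illustrative, and on a real node you would point awk at /etc/chrony.conf:

```shell
# Sample config mirroring the two internal servers configured above.
cat > /tmp/chrony-sample.conf <<'EOF'
# Use public servers from the pool.ntp.org project.
server 10.14.83.52 iburst
server 10.0.64.17 iburst
EOF
# Print only the server addresses, skipping comments and other directives.
awk '$1 == "server" { print $2 }' /tmp/chrony-sample.conf
```

This prints the two internal addresses, which should match the sources listed by `chronyc sources -v` after the restart below.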
9. Restart chronyd so the configuration takes effect
[root@ceph3 ~]# systemctl restart chronyd
10. Verify the chronyd source configuration
[root@ceph3 ~]# chronyc sources -v
210 Number of sources = 1
.-- Source mode '^' = server, '=' = peer, '#' = local clock.
/ .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| / '?' = unreachable, 'x' = time may be in error, <