1. The etcd pod reports the following error events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 42m (x4 over 97m) kubelet, node-1 Unable to attach or mount volumes: unmounted volumes=[datadir], unattached volumes=[default-token-dqm5l datadir]: timed out waiting for the condition
Warning FailedMount 8m44s (x35 over 122m) kubelet, node-1 Unable to attach or mount volumes: unmounted volumes=[datadir], unattached volumes=[datadir default-token-dqm5l]: timed out waiting for the condition
Warning FailedMount 3m36s (x70 over 121m) kubelet, node-1 (combined from similar events): MountVolume.WaitForAttach failed for volume "pvc-aa058e40-e8bd-4dc4-9baa-f4f8f09fbb52" : fail to check rbd image status with: (exit status 108), rbd output: (2020-03-31 14:14:29.833755 7fc905de3d40 -1 did not load config file, using default settings.
2020-03-31 14:14:29.842372 7fc905de3d40 -1 Errors while parsing config file!
2020-03-31 14:14:29.842382 7fc905de3d40 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2020-03-31 14:14:29.842383 7fc905de3d40 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2020-03-31 14:14:29.842385 7fc905de3d40 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2020-03-31 14:14:29.844940 7fc905de3d40 -1 Errors while parsing config file!
2020-03-31 14:14:29.844948 7fc905de3d40 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2020-03-31 14:14:29.844949 7fc905de3d40 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2020-03-31 14:14:29.844950 7fc905de3d40 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
2020-03-31 14:14:29.892299 7fc905de3d40 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2020-03-31 14:14:29.899423 7fc8e4ff9700 -1 librbd::image::OpenRequest: failed to stat v2 image header: (108) Cannot send after transport endpoint shutdown
2020-03-31 14:14:29.899602 7fc8dbfff700 -1 librbd::ImageState: 0x5614aa2e7e20 failed to open image: (108) Cannot send after transport endpoint shutdown
rbd: error opening image kubernetes-dynamic-pvc-25ef75ac-7305-11ea-8e50-0a580ae80056: (108) Cannot send after transport endpoint shutdown
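Most of the output above is noise about a missing ceph.conf and keyring (error 2); the failure that actually blocks the mount is error 108. A minimal sketch for separating the two, assuming the event text is piped in on stdin. The function name `rbd_errors` is hypothetical and not part of any tool:

```shell
# Hypothetical helper: read event/log text on stdin and print each
# distinct "(errno) message" pair that appears in it.
# Example (pod name and namespace are placeholders):
#   kubectl describe pod <etcd-pod> -n <namespace> | rbd_errors
rbd_errors() {
  # Match "(digits) Message" only; timestamps such as "(2020-03-31 ..."
  # and "(exit status 108)" do not match this pattern.
  grep -oE '\([0-9]+\) [A-Z][A-Za-z ]+' | LC_ALL=C sort -u
}
```

Run against the events shown here, this should leave only two distinct errors to look at: "No such file or directory" and "Cannot send after transport endpoint shutdown".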
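2. The events show that the volume cannot be attached or mounted. The backend storage is Ceph, and its health status is HEALTH_WARN; everything else looks normal.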
[root@busybox-openstack-d56f9fdf9-5qss7 /]$ ceph -s
cluster:
id: 5d00f823-8fd0-4160-a3c2-b5a9bc53008c
health: HEALTH_WARN
Degraded data redundancy: 17404/66867 objects degraded (26.028%), 119 pgs degraded, 288 pgs undersized
services:
mon: 3 daemons, quorum node-1,node-3,node-4
mgr: node-3(active)
osd: 8 osds: 7 up, 7 in
flags nodeep-scrub
rbd-mirror: 1 daemon active
rgw: 2 daemons active
data:
pools: 12 pools, 544 pgs
objects: 22.29k objects, 71.6GiB
usage: 164GiB used, 4.19TiB / 4.35TiB avail
pgs: 17404/66867 objects degraded (26.028%)
256 active+clean
169 active+undersized
119 active+undersized+degraded
io:
client: 25.6KiB/s rd, 685KiB/s wr, 6op/s rd, 73op/s wr
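The status output reports 8 OSDs with only 7 up, which likely explains the undersized and degraded PGs. A small sketch for pulling that number out of `ceph -s`, assuming the "osd: N osds: M up" line format shown above. The function name `osds_down` is hypothetical:

```shell
# Hypothetical helper: read `ceph -s` output on stdin and print how many
# OSDs are not up (total minus up).
# Example: ceph -s | osds_down
osds_down() {
  sed -nE 's/.*osd: ([0-9]+) osds: ([0-9]+) up.*/\1 \2/p' |
    awk '{ print $1 - $2 }'
}
```

`ceph osd tree` can then be used to see which OSD is down.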
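3. Check the Ceph blacklist. Error 108 (Cannot send after transport endpoint shutdown) is what a blacklisted Ceph client typically gets back, and node-1 is indeed in the blacklist. For possible causes, see the reference links below.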
[root@node-1 ~]# kubectl exec -it -n openstack busybox-openstack-d56f9fdf9-5qss7 bash
()[root@busybox-openstack-d56f9fdf9-5qss7 /]# ceph osd blacklist ls
192.168.31.2:0/0 2021-03-30 15:04:07.367587
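On a cluster with many entries, the check can be scripted. A minimal sketch, assuming each entry has the "ADDR:PORT/NONCE EXPIRY" layout shown above. The function name `is_blacklisted` is hypothetical, and newer Ceph releases rename the command to `ceph osd blocklist`:

```shell
# Hypothetical helper: exit 0 if the given IP has an entry in the
# blacklist output read from stdin, exit 1 otherwise.
# Example: ceph osd blacklist ls | is_blacklisted 192.168.31.2
is_blacklisted() {
  ip="$1"
  # Compare the address part exactly so 192.168.31.2 does not match 192.168.31.20.
  awk -v ip="$ip" '{ split($1, a, ":"); if (a[1] == ip) found = 1 }
                   END { exit !found }'
}
```

The second column of each entry is the time at which the entry expires on its own.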
4. Remove the node from the Ceph blacklist; etcd then starts normally.
()[root@busybox-openstack-d56f9fdf9-5qss7 /]# ceph osd blacklist rm 192.168.31.2
un-blacklisting 192.168.31.2:0/0
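A node can have more than one blacklist entry (different ports or nonces). The sketch below only prints the removal commands so they can be reviewed before running; it assumes the same "ADDR:PORT/NONCE EXPIRY" layout as above, and the function name `unblacklist_cmds` is hypothetical:

```shell
# Hypothetical helper: read blacklist output on stdin and print one
# "ceph osd blacklist rm" command per entry that belongs to the given IP.
# Nothing is executed; pipe the result to `sh` only after reviewing it.
# Example: ceph osd blacklist ls | unblacklist_cmds 192.168.31.2
unblacklist_cmds() {
  ip="$1"
  awk -v ip="$ip" '{ split($1, a, ":");
                     if (a[1] == ip) print "ceph osd blacklist rm " $1 }'
}
```

Once the entries are gone, the kubelet should retry the mount on its own, and the FailedMount events should stop.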
Reference links: