Troubleshooting log: resolving a Kubernetes node failure

Symptom 1: one node stays in the NotReady state

[root@master ~]# kubectl get nodes 
NAME     STATUS     ROLES    AGE   VERSION
master   Ready      master   60m   v1.17.0
node1    NotReady   <none>   30m   v1.17.0
node2    Ready      <none>   29m   v1.17.0
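
On a larger cluster it can be easier to filter this table than to read it by eye. The following is a minimal sketch, not part of the original troubleshooting: the helper name `not_ready_nodes` is made up for illustration, and it assumes the default column layout of `kubectl get nodes` shown above.

```shell
# Hypothetical helper: print the names of nodes whose STATUS column
# is not exactly "Ready". Reads `kubectl get nodes` output on stdin.
# A status such as "Ready,SchedulingDisabled" is also reported,
# because it is not an exact match.
not_ready_nodes() {
  # NR > 1 skips the header row; $1 is NAME, $2 is STATUS.
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage on a live cluster (not executed here):
#   kubectl get nodes | not_ready_nodes
```

Fed the table above, the filter prints only `node1`.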

Symptom 2: one flannel pod is stuck in Init:ImagePullBackOff

[root@master ~]# kubectl get pods -n kube-system 
NAME                             READY   STATUS                  RESTARTS   AGE
coredns-9d85f5447-r2dx6          1/1     Running                 0          64m
coredns-9d85f5447-zskjc          1/1     Running                 0          64m
etcd-master                      1/1     Running                 0          64m
kube-apiserver-master            1/1     Running                 0          64m
kube-controller-manager-master   1/1     Running                 0          64m
kube-flannel-ds-7bknh            1/1     Running                 0          33m
kube-flannel-ds-9xwsr            0/1     Init:ImagePullBackOff   1          35m
kube-flannel-ds-tspl2            1/1     Running                 0          44m
kube-proxy-ggd7p                 1/1     Running                 1          35m
kube-proxy-m8ljk                 1/1     Running                 0          64m
kube-proxy-xrt7c                 1/1     Running                 0          33m
kube-scheduler-master            1/1     Running                 0          64m
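
The same idea works for the pod list. This is a sketch under assumptions: `unhealthy_pods` is a hypothetical helper name, and it relies on the default five-column output of `kubectl get pods` shown above (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: print "NAME STATUS" for every pod whose STATUS
# is neither Running nor Completed. Reads `kubectl get pods` output
# on stdin.
unhealthy_pods() {
  # NR > 1 skips the header row; $1 is NAME, $3 is STATUS.
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage on a live cluster (not executed here):
#   kubectl get pods -n kube-system | unhealthy_pods
```

Running `kubectl get pods -n kube-system -o wide` additionally shows which node each pod is scheduled on, which helps tie a failing pod to a NotReady node.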

Symptom 3: describing kube-flannel-ds-9xwsr shows that the image pull timed out

[root@master ~]# kubectl describe pod -n kube-system kube-flannel-ds-9xwsr 
Name: kube-flannel-ds-9xwsr 
Namespace: kube-system 
Priority: 2000001000
.
.
Events: 
Type Reason Age From Message 
---- ------ ---- ---- ------- 
Normal Scheduled 46m default-scheduler Successfully assigned kube-system/kube-flannel-ds-9xwsr to node1 
Normal Pulling 46m kubelet, node1 Pulling image "rancher/mirrored-flannelcni-flannel-cni-plugin:v1.0.0" 
Normal Pulled 46m kubelet, node1 Successfully pulled image "rancher/mirrored-flannelcni-flannel-cni-plugin:v1.0.0" 
Normal Created 46m kubelet, node1 Created container install-cni-plugin 
Normal Started 46m kubelet, node1 Started container install-cni-plugin 
Normal Pulling 46m kubelet, node1 Pulling image "rancher/mirrored-flannelcni-flannel:v0.16.1" 
Normal SandboxChanged 17m kubelet, node1 Pod sandbox changed, it will be killed and re-created. 
Normal Started 17m kubelet, node1 Started container install-cni-plugin 
Normal Created 17m kubelet, node1 Created container install-cni-plugin 
Normal Pulled 17m kubelet, node1 Container image "rancher/mirrored-flannelcni-flannel-cni-plugin:v1.0.0" already present on machine 
Normal Pulling 10m (x4 over 17m) kubelet, node1 Pulling image "rancher/mirrored-flannelcni-flannel:v0.16.1" 
Warning Failed 9m23s (x4 over 15m) kubelet, node1 Error: ErrImagePull 
Warning Failed 9m11s (x5 over 15m) kubelet, node1 Error: ImagePullBackOff 
Warning Failed 6m24s (x5 over 15m) kubelet, node1 Failed to pull image "rancher/mirrored-flannelcni-flannel:v0.16.1": rpc error: code = Unknown desc = context canceled 
Normal BackOff 112s (x23 over 15m) kubelet, node1 Back-off pulling image "rancher/mirrored-flannelcni-flannel:v0.16.1"
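
To see at a glance which image is failing, the event list can be reduced to the image names mentioned in "Failed to pull image" messages. This is only an illustrative sketch: `failed_images` is a hypothetical helper, and it assumes the message wording shown in the events above.

```shell
# Hypothetical helper: extract the distinct image names from
# 'Failed to pull image "<image>"' event messages.
# Reads `kubectl describe pod` output on stdin.
failed_images() {
  # Keep only the text between the quotes after "Failed to pull image",
  # then remove duplicates.
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}

# Usage on a live cluster (not executed here):
#   kubectl describe pod -n kube-system kube-flannel-ds-9xwsr | failed_images
```

One possible follow-up, which was not part of the original steps, is to run `docker pull` for that image by hand on the affected node to check whether the registry is reachable from there.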

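Resolution process:

Actions tried on the faulty node:

1. Rebooted the faulty node (did not help)

2. Tried to start the stopped containers; they would not start

3. Reloaded the systemd daemon and restarted docker (did not help)

4. Stopped the running containers and removed the ones that were not running; the fault was resolved

The detailed steps are as follows.

# Check the container status (recorded here for comparison after the restart; there are 6 containers at this point)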

[root@node1 ~]# docker ps -a 
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 
b8a649455c1c 7d54289267dc "/usr/local/bin/kube…" 2 minutes ago Up 2 minutes k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2 
46ba6b578ebf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 2 minutes ago Up 2 minutes k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2 
12326131af76 cd5235cd7dc2 "cp -f /flannel /opt…" 9 minutes ago Exited (0) 2 minutes ago k8s_install-cni-plugin_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1 
2f7b7ceec68e 7d54289267dc "/usr/local/bin/kube…" 10 minutes ago Exited (2) 2 minutes ago k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_1 
268dc494222a registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 10 minutes ago Up 9 minutes k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1 
b1ac093e353f registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 10 minutes ago Exited (0) 2 minutes ago k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907 

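# Reload the systemd daemon and restart the docker service

# Reload the systemd daemon
[root@node1 ~]# systemctl daemon-reload 
# Restart the docker service
[root@node1 ~]# systemctl restart docker 
# Check the container status again: 2 more Exited containers and 1 Created container have appeared, 9 containers in total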
[root@node1 ~]# docker ps -a 
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 
6a742f121b59 cd5235cd7dc2 "cp -f /flannel /opt…" 10 seconds ago Exited (0) 9 seconds ago k8s_install-cni-plugin_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_0 
36e69e7a3dcd 7d54289267dc "/usr/local/bin/kube…" 10 seconds ago Up 9 seconds k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3 
95b0547dbf88 registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 10 seconds ago Up 10 seconds k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3 
cd90ee8a56cf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 10 seconds ago Up 9 seconds k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_3 
b1237af848cf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 11 seconds ago Created k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_2 
b8a649455c1c 7d54289267dc "/usr/local/bin/kube…" 16 minutes ago Exited (2) 11 seconds ago k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2 
46ba6b578ebf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 16 minutes ago Exited (0) 11 seconds ago k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2 
268dc494222a registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 23 minutes ago Exited (0) 11 seconds ago k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1 
b1ac093e353f registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 23 minutes ago Exited (0) 16 minutes ago k8s_POD_kube-proxy-ggd7p_kube-system_1a7d690

# After a while, the 3 extra containers have disappeared and the count is back to 6

[root@node1 ~]# docker ps -a 
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 
6a742f121b59 cd5235cd7dc2 "cp -f /flannel /opt…" 50 seconds ago Exited (0) 49 seconds ago k8s_install-cni-plugin_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_0 
36e69e7a3dcd 7d54289267dc "/usr/local/bin/kube…" 50 seconds ago Up 49 seconds k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3 
95b0547dbf88 registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 50 seconds ago Up 49 seconds k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3 
cd90ee8a56cf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 50 seconds ago Up 49 seconds k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_3 
b8a649455c1c 7d54289267dc "/usr/local/bin/kube…" 16 minutes ago Exited (2) 51 seconds ago k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2 
46ba6b578ebf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 16 minutes ago Exited (0) 51 seconds ago k8s_POD_kube-proxy-ggd7p_kube-system_1a7d690

# Stop the running containers

[root@node1 ~]# docker stop 36e69e7a3dcd 95b0547dbf88 cd90ee8a56cf 
36e69e7a3dcd 
95b0547dbf88 
cd90ee8a56cf 

# Remove all containers; 4 of them cannot be removed because they are reported as running

[root@node1 ~]# docker container rm $(docker ps -qa) 
11aaee411eb8 
e2efad2fb393 
a7ad16a4f86e 
36e69e7a3dcd 
95b0547dbf88 
cd90ee8a56cf 
Error response from daemon: You cannot remove a running container 79d354753c5d143c1b2bd95d1aa52ca48fa861e530da967c86b6537e52895647. Stop the container before attempting removal or force remove 
Error response from daemon: You cannot remove a running container 74f74b6b43e5dc02110aec24d38489412c4833e18e4ee860d8019bfcdae4aad8. Stop the container before attempting removal or force remove 
Error response from daemon: You cannot remove a running container 30f4881c1809d107a2bf717c45a780a9b04ae8cc9f40278723a74686ef3f72f2. Stop the container before attempting removal or force remove 
Error response from daemon: You cannot remove a running container 9a82cbc9374c73288bc0e3bc8205c3c1d298b4409483c38a8d2edcf7682100ec. Stop the container before attempting removal or force remove
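
The errors above appear because `docker ps -qa` also lists containers that are running. A minimal sketch of a narrower cleanup follows. It is an assumption-laden variant, not what was run in this incident: `stopped_container_ids` is a hypothetical helper, and it expects tab-separated "ID, Status" lines as produced by the `--format` string in the usage comment.

```shell
# Hypothetical helper: print the IDs of containers that are not running.
# Reads tab-separated "ID<TAB>Status" lines on stdin and keeps those
# whose status does not start with "Up" (for example Exited or Created).
stopped_container_ids() {
  awk -F '\t' '$2 !~ /^Up/ { print $1 }'
}

# Usage on the node (not executed here; `xargs -r` is a GNU extension
# that skips the command when there is no input):
#   docker ps -a --format '{{.ID}}\t{{.Status}}' \
#     | stopped_container_ids | xargs -r docker rm
```

A similar result can be had with Docker's own filter, `docker ps -aq --filter status=exited --filter status=created`, which lists only exited and created containers.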

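# Check again: there are indeed 4 containers running, and all of them are newly created (the kubelet started them again after the old ones were stopped and removed)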

[root@node1 ~]# docker ps -a 
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 
79d354753c5d 7d54289267dc "/usr/local/bin/kube…" 26 seconds ago Up 26 seconds k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_4 
74f74b6b43e5 404fc3ab6749 "/opt/bin/flanneld -…" 39 seconds ago Up 38 seconds k8s_kube-flannel_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1 
30f4881c1809 registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 41 seconds ago Up 40 seconds k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_4 
9a82cbc9374c registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 42 seconds ago Up 41 seconds k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-4

# Back on the master node, the faulty node has recovered and is now Ready

[root@master ~]# kubectl get nodes 
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   96m   v1.17.0
node1    Ready    <none>   67m   v1.17.0
node2    Ready    <none>   65m   v1.17.0

# The faulty flannel pod kube-flannel-ds-9xwsr has recovered as well

[root@master ~]# kubectl get pods -n kube-system 
NAME                             READY   STATUS    RESTARTS   AGE
coredns-9d85f5447-r2dx6          1/1     Running   0          96m
coredns-9d85f5447-zskjc          1/1     Running   0          96m
etcd-master                      1/1     Running   0          96m
kube-apiserver-master            1/1     Running   0          96m
kube-controller-manager-master   1/1     Running   0          96m
kube-flannel-ds-7bknh            1/1     Running   0          65m
kube-flannel-ds-9xwsr            1/1     Running   1          67m
kube-flannel-ds-tspl2            1/1     Running   0          76m
kube-proxy-ggd7p                 1/1     Running   4          67m
kube-proxy-m8ljk                 1/1     Running   0          96m
kube-proxy-xrt7c                 1/1     Running   0          65m
kube-scheduler-master            1/1     Running   0          96m
