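1. Problem description
New services cannot be created successfully on the K8S cluster; they always stay in the Pending state.
At first glance the calico network status also looks normal.
After deleting and recreating the calico network, its pods also stay Pending, and the nodes are in the NotReady state, which is abnormal.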
[root@K8S1 work]# kubectl get pod --all-namespaces |grep calico
kube-system calico-kube-controllers-554647c955-kqrbd 0/1 Pending 0 16s
kube-system calico-node-2n6xv 0/1 Pending 0 17s
kube-system calico-node-6sqw5 0/1 Pending 0 17s
kube-system calico-node-cnv72 0/1 Pending 0 17s
kube-system calico-node-n7gc4 0/1 Pending 0 17s
kube-system calico-node-shcx5 0/1 Pending 0 17s
kube-system calico-node-t6pk6 0/1 Pending 0 17s
kube-system calico-typha-6454f6cfd7-d2cp5 0/1 Pending 0 17s
-- The node status is abnormal as well.
[root@K8S1 work]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s1 NotReady <none> 5d20h v1.24.3
k8s2 NotReady <none> 5d20h v1.24.3
k8s3 NotReady <none> 5d20h v1.24.3
k8s4 NotReady <none> 5d20h v1.24.3
k8s5 NotReady <none> 5d20h v1.24.3
k8s6 NotReady <none> 5d20h v1.24.3
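On a larger cluster it can help to filter the listing down to the nodes that are not Ready. The following is a minimal sketch, assuming the default column layout of `kubectl get nodes --no-headers` (NAME in column 1, STATUS in column 2); the helper name `not_ready_nodes` is hypothetical and not part of kubectl.

```shell
# Hypothetical helper: print the names of nodes whose STATUS column
# is not exactly "Ready". Reads "kubectl get nodes --no-headers"
# style output on stdin.
# Note: a cordoned node ("Ready,SchedulingDisabled") is also printed.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage (requires access to the cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```

For a single node, `kubectl describe node <name>` shows the node conditions and recent events in more detail.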
2. kubelet is not running
[root@K8S1 work]# for node_all_ip in ${ALL_IPS[@]}
> do
> echo ">>> ${node_all_ip}"
> ssh root@${node_all_ip} "systemctl status kubelet|grep Active"
> done
>>> 192.168.1.12
Active: activating (auto-restart) (Result: exit-code) since Sun 2024-07-28 15:31:14 CST; 4s ago
>>> 192.168.1.13
Active: activating (auto-restart) (Result: exit-code) since Sun 2024-07-28 15:31:20 CST; 2s ago
>>> 192.168.1.14
Active: activating (auto-restart) (Result: exit-code) since Sun 2024-07-28 15:31:24 CST; 2s ago
>>> 192.168.1.15
Active: activating (auto-restart) (Result: exit-code) since Sun 2024-07-28 15:31:29 CST; 1s ago
>>> 192.168.1.16
Active: activating (auto-restart) (Result: exit-code) since Sun 2024-07-28 15:31:30 CST; 4s ago
>>> 192.168.1.17
Active: activating (auto-restart) (Result: exit-code) since Sun 2024-07-28 15:31:38 CST; 2s ago
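The output above shows kubelet stuck in an auto-restart loop on every node. A small sketch for picking the failing hosts out of that kind of output is given below, assuming the same `>>> <ip>` / `Active: ...` line format produced by the loop; the helper name `kubelet_failed_hosts` is hypothetical.

```shell
# Hypothetical helper: read the loop output (">>> <ip>" followed by an
# "Active: ..." line) on stdin and print every host whose kubelet state
# is anything other than "active".
kubelet_failed_hosts() {
  awk '/^>>> / { host = $2 }
       /Active:/ && $2 != "active" { print host }'
}

# To see why kubelet keeps exiting on one of those hosts, the systemd
# journal is the usual place to look (run on the node itself):
#   journalctl -u kubelet --no-pager -n 50
```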
Restart kubelet on every node, then check its status again:
[root@K8S1 work]# for node_all_ip in ${ALL_IPS[@]}
> do
> echo ">>> ${node_all_ip}"
> ssh root@${node_all_ip} "systemctl status kubelet|grep Active"
> done
>>> 192.168.1.12
Active: active (running) since Sun 2024-07-28 15:32:57 CST; 11s ago
>>> 192.168.1.13
Active: active (running) since Sun 2024-07-28 15:32:59 CST; 10s ago
>>> 192.168.1.14
Active: active (running) since Sun 2024-07-28 15:33:00 CST; 9s ago
>>> 192.168.1.15
Active: active (running) since Sun 2024-07-28 15:33:01 CST; 8s ago
>>> 192.168.1.16
Active: active (running) since Sun 2024-07-28 15:33:02 CST; 7s ago
>>> 192.168.1.17
Active: active (running) since Sun 2024-07-28 15:33:04 CST; 6s ago
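The restart command itself is not shown in the transcript above. A minimal sketch of how it could be run across all nodes follows, assuming passwordless root SSH as in the status loop; the function name `restart_kubelet_all` and the `SSH_CMD` override variable are hypothetical and exist only to make the sketch easy to dry-run.

```shell
# Hypothetical helper: restart kubelet on each host given as an argument
# and print its state right after the restart.
# SSH_CMD is an assumed override hook (defaults to ssh); setting it to
# "echo" prints the remote command instead of running it.
restart_kubelet_all() {
  for ip in "$@"; do
    echo ">>> ${ip}"
    ${SSH_CMD:-ssh} "root@${ip}" "systemctl restart kubelet && systemctl is-active kubelet"
  done
}

# Usage, with the ALL_IPS array from the loops above:
#   restart_kubelet_all "${ALL_IPS[@]}"
```

A unit that crashes a few seconds after starting can still report `active` immediately after the restart, so it is worth rechecking the status a little later, as the transcript does.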
3. Redeploy the calico network plugin
-- Checking again shows that the calico network is now normal.
[root@K8S1 work]# kubectl get pods -A |grep calico
kube-system calico-kube-controllers-554647c955-kqrbd 1/1 Running 0 4m24s
kube-system calico-node-2n6xv 1/1 Running 0 4m25s
kube-system calico-node-6sqw5 1/1 Running 0 4m25s
kube-system calico-node-cnv72 1/1 Running 0 4m25s
kube-system calico-node-n7gc4 1/1 Running 0 4m25s
kube-system calico-node-shcx5 1/1 Running 0 4m25s
kube-system calico-node-t6pk6 1/1 Running 0 4m25s
kube-system calico-typha-6454f6cfd7-d2cp5 1/1 Running 0 4m25s
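The same check can be scripted so that only unhealthy calico pods are printed. This is a sketch under the assumption that the input has the column layout of `kubectl get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, ...); the helper name `calico_unhealthy` is hypothetical.

```shell
# Hypothetical helper: print calico pods that are not Running or whose
# READY count (e.g. "0/1") shows fewer ready containers than expected.
# Reads "kubectl get pods -A --no-headers" style output on stdin.
calico_unhealthy() {
  awk '$2 ~ /^calico/ {
         split($3, r, "/")
         if ($4 != "Running" || r[1] != r[2]) print $2
       }'
}

# Usage (requires access to the cluster); empty output means all
# calico pods look healthy:
#   kubectl get pods -A --no-headers | calico_unhealthy
```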
-- The node status has also returned to normal.
[root@K8S1 work]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s1 Ready <none> 5d21h v1.24.3
k8s2 Ready <none> 5d21h v1.24.3
k8s3 Ready <none> 5d21h v1.24.3
k8s4 Ready <none> 5d21h v1.24.3
k8s5 Ready <none> 5d21h v1.24.3
k8s6 Ready <none> 5d21h v1.24.3
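4. Summary
This shows that when the kubelet service fails, the nodes become abnormal (NotReady), which in turn leaves the calico network pods stuck in Pending.
As a result, no services can be deployed successfully.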