A Kubernetes cluster that had been running normally on VMware suddenly failed to deploy: newly created pods were stuck in the ContainerCreating state. Running kubectl describe
on one of them showed the following error:
Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container
"d8a2283ac124556c9e0677b4c8523e65558f1cff2bffff058d4bf0bdfbda70a0" network for pod "canary-nginx01-5b97c5bd66-wmcq9": networkPlugin
cni failed to set up pod "canary-nginx01-5b97c5bd66-wmcq9_default" network: error getting ClusterInformation: connection is
unauthorized: Unauthorized, failed to clean up sandbox container "d8a2283ac124556c9e0677b4c8523e65558f1cff2bffff058d4bf0bdfbda70a0"
network for pod "canary-nginx01-5b97c5bd66-wmcq9": networkPlugin cni failed to teardown pod "canary-nginx01-5b97c5bd66-
wmcq9_default" network: error getting ClusterInformation: connection is unauthorized: Unauthorized]
My guess was that suspending the VMware virtual machines had caused this. Restarting kubelet on every node had no effect.
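For reference, the kubelet restart attempt looks like the sketch below. The worker node names are assumptions (only docker104, the control-plane node, appears in the output later); the commands are printed rather than executed so you can review them first.

```shell
# Restart kubelet on every node in the cluster.
# docker104 is the control plane; the other node names are hypothetical.
nodes="docker104 docker105 docker106"
for node in $nodes; do
    # Printed for review; drop the leading `echo` to actually run over ssh.
    echo ssh "root@$node" systemctl restart kubelet
done
```

In this case the restarts completed but the pods stayed in ContainerCreating, which pointed the blame at the CNI layer rather than kubelet itself.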
Workaround: since this is a test environment, I first listed all pods with kubectl get pod -A.
[root@docker104 ~]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default canary-nginx01-5b97c5bd66-llt75 0/1 ContainerCreating 0 2m41s
default canary-nginx01-5b97c5bd66-wmcq9 0/1 ContainerCreating 0 2m48s
default canary-nginx02-66476666d9-692gm 0/1 ContainerCreating 0 2m27s
default canary-nginx02-66476666d9-r7td7 0/1 ContainerCreating 0 2m12s
kube-system calico-kube-controllers-77959b97b9-rm8kv 1/1 Running 3 9d
kube-system calico-node-8kf68 1/1 Running 1 9d
kube-system calico-node-cn9q4 1/1 Running 1 9d
kube-system calico-node-hkfrw 1/1 Running 1 9d
kube-system coredns-57d4cbf879-6kf56 1/1 Running 1 9d
kube-system coredns-57d4cbf879-wnksq 1/1 Running 1 9d
kube-system etcd-docker104 1/1 Running 1 9d
kube-system kube-apiserver-docker104 1/1 Running 1 9d
kube-system kube-controller-manager-docker104 1/1 Running 10 9d
kube-system kube-proxy-9cqct 1/1 Running 1 9d
kube-system kube-proxy-qm4n5 1/1 Running 1 9d
kube-system kube-proxy-whkp6 1/1 Running 1 9d
kube-system kube-scheduler-docker104 1/1 Running 7 9d
kube-system metrics-server-5794ccf74d-pxvp2 1/1 Running 0 19h
kubernetes-dashboard dashboard-metrics-scraper-5594697f48-jbk49 1/1 Running 1 36h
kubernetes-dashboard kubernetes-dashboard-5c785c8bcf-72rnp 1/1 Running 1 36h
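On a larger cluster, scanning this listing by eye gets tedious. One way to pull out only the stuck pods is to filter on the STATUS column; the snippet below embeds a trimmed sample of the listing above so the pipeline can be shown without a live cluster.

```shell
# Filter a `kubectl get pod -A` listing for pods whose STATUS is not Running.
# Sample data stands in for live cluster output; with a real cluster, pipe
# `kubectl get pod -A` into the same awk command instead.
listing='NAMESPACE     NAME                              READY   STATUS              RESTARTS   AGE
default       canary-nginx01-5b97c5bd66-llt75   0/1     ContainerCreating   0          2m41s
kube-system   calico-node-8kf68                 1/1     Running             1          9d'

# NR>1 skips the header row; $4 is the STATUS column.
stuck=$(printf '%s\n' "$listing" | awk 'NR>1 && $4 != "Running"')
printf '%s\n' "$stuck"
```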
Then I deleted whichever pods looked suspect, starting with kube-proxy and then the Calico networking pods. Because Kubernetes automatically recreates pods managed by a DaemonSet or Deployment, deleting them simply forces a rebuild. Note that all of these pods live in the kube-system
namespace, so the delete commands must include -n kube-system
to specify the namespace; otherwise kubectl reports that the pods do not exist.
kubectl delete pod calico-kube-controllers-77959b97b9-rm8kv -n kube-system
kubectl delete pod calico-node-8kf68 -n kube-system
kubectl delete pod calico-node-cn9q4 -n kube-system
kubectl delete pod calico-node-hkfrw -n kube-system
kubectl delete pod kube-proxy-9cqct kube-proxy-qm4n5 kube-proxy-whkp6 -n kube-system
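Copying the hashed pod names by hand is error-prone; the same deletions can be expressed with label selectors, which survive the random name suffixes. The label values below are the defaults in the stock Calico manifest and kubeadm's kube-proxy DaemonSet; verify yours with `kubectl get pod -n kube-system --show-labels`. The commands are printed rather than executed here.

```shell
# Delete the kube-proxy and Calico pods by label selector instead of by name.
# Printed for review; drop the `echo "$cmd"` / run "$cmd" directly to execute.
for selector in k8s-app=kube-proxy k8s-app=calico-node k8s-app=calico-kube-controllers; do
    cmd="kubectl delete pod -n kube-system -l $selector"
    echo "$cmd"
done
```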
After one round of deletions everything worked again. A plausible explanation is that the VM suspend left the Calico pods holding expired service-account tokens, which matches the "connection is unauthorized: Unauthorized" error when fetching ClusterInformation; recreating the pods refreshed their credentials.