rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4196772 vs. 4194304)
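Symptom: the Kubernetes cluster was unavailable and all worker nodes were offline.
Diagnosis:
Running kubectl get node showed every worker node in the NotReady state.
Logging in to a worker node and checking its logs showed: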
Nov 1 10:32:34 izwz9a75ak59utsbrrj9crz kubelet: E1101 10:32:34.119157 1669 kuberuntime_container.go:323] getKubeletContainers failed: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4196772 vs. 4194304)
Nov 1 10:32:34 izwz9a75ak59utsbrrj9crz kubelet: E1101 10:32:34.119174 1669 generic.go:197] GenericPLEG: Unable to retrieve pods: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4196772 vs. 4194304)
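The numbers in the error are worth a quick check. 4194304 bytes is exactly 4 × 1024 × 1024 = 4 MiB, which matches the default maximum receive message size of grpc-go, the gRPC library kubelet is written against. The response kubelet received (4196772 bytes) was only 2468 bytes over that limit. The minimal POSIX sh sketch below pulls both figures out of a log line and prints the overshoot. The function name grpc_size_report is hypothetical, and the function only does text processing, so it is safe to run anywhere.

```shell
#!/bin/sh
# Hypothetical helper: parse "received message larger than max (A vs. B)"
# from a gRPC error and report the received size, the limit and the overshoot.
grpc_size_report() {
  # $1: full error text, e.g. one kubelet log line
  sizes=$(printf '%s\n' "$1" | sed -n 's/.*larger than max (\([0-9]*\) vs\. \([0-9]*\)).*/\1 \2/p')
  if [ -z "$sizes" ]; then
    echo "no gRPC size error found" >&2
    return 1
  fi
  set -- $sizes   # now $1 = received bytes, $2 = limit in bytes
  echo "received=$1 limit=$2 over_by=$(($1 - $2))"
  # shell arithmetic is integer-only, so use awk for fractional MiB
  awk -v r="$1" -v l="$2" 'BEGIN { printf "received=%.4f MiB limit=%.4f MiB\n", r / 1048576, l / 1048576 }'
}
```

For the log above this prints over_by=2468 and received=4.0024 MiB against limit=4.0000 MiB. One possible invocation, assuming (as the syslog-style format above suggests) that kubelet messages end up in /var/log/messages: grpc_size_report "$(grep -m1 'larger than max' /var/log/messages)".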
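/var/lib/docker/containers held files for over ten thousand containers (the wc -l count below also includes the header line that docker ps prints):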
[root@k8s-master-2 ~]# docker ps -a|wc -l
10180
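Because docker ps -a | wc -l also counts the header line, a cleaner count uses -q, which prints container IDs only. The sketch below, using a hypothetical helper name container_summary, splits the total by state so you can see how much of it is stopped leftovers that a prune would reclaim. It relies only on the standard docker ps flags -a, -q and --filter status=….

```shell
#!/bin/sh
# Hypothetical helper: count containers on this node, grouped by state.
# `-q` prints IDs only, so there is no header line to subtract.
container_summary() {
  total=$(docker ps -a -q | wc -l)
  exited=$(docker ps -a -q --filter status=exited | wc -l)
  created=$(docker ps -a -q --filter status=created | wc -l)
  running=$(docker ps -q | wc -l)
  # $(( )) strips any padding that some wc implementations add
  echo "total=$((total)) running=$((running)) exited=$((exited)) created=$((created))"
}
```

If exited plus created accounts for most of the total, the problem is leftover containers rather than a genuinely oversized workload.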
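This is a known Kubernetes bug: once a node accumulates enough containers, the container list kubelet requests from the runtime over gRPC exceeds the message size limit, and kubelet (including its PLEG, per the log above) can no longer list pods. See https://github.com/kubernetes/kubernetes/issues/63858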
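Fix:
Log in to each worker node and remove leftover unused containers (docker system prune also removes unused networks and dangling images, and asks for confirmation before deleting anything):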
docker system prune
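If only stopped containers are the problem, a narrower alternative to docker system prune is docker container prune, which leaves images and networks alone. The sketch below assumes a 24-hour cutoff is acceptable; the until filter keeps recently stopped containers around, for example for debugging. The function name prune_old_containers and the default cutoff are hypothetical choices, not part of the original fix.

```shell
#!/bin/sh
# Hypothetical helper: remove only stopped containers older than $1 hours
# (default 24) without a confirmation prompt, then report what is left.
prune_old_containers() {
  hours="${1:-24}"
  docker container prune --force --filter "until=${hours}h"
  remaining=$(docker ps -a -q | wc -l)
  echo "containers remaining: $((remaining))"
}
```

Usage on a worker node: prune_old_containers 48 keeps anything that stopped within the last two days.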
Restart docker, then kubelet:
systemctl restart docker && systemctl restart kubelet
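After the restart, the worker nodes should return to Ready. The sketch below checks that from a machine with kubectl access (for example a master node). The names not_ready_nodes and wait_all_ready are hypothetical, and the defaults of 30 attempts 10 seconds apart are arbitrary. It treats a STATUS such as Ready,SchedulingDisabled (a cordoned but healthy node) as ready.

```shell
#!/bin/sh
# Hypothetical helper: print nodes whose STATUS column does not start with "Ready".
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}

# Hypothetical helper: poll until every node is Ready.
# $1 = number of attempts (default 30), $2 = seconds between attempts (default 10).
wait_all_ready() {
  attempts="${1:-30}"
  interval="${2:-10}"
  while [ "$attempts" -gt 0 ]; do
    pending=$(not_ready_nodes)
    if [ -z "$pending" ]; then
      echo "all nodes Ready"
      return 0
    fi
    attempts=$((attempts - 1))
    if [ "$attempts" -gt 0 ]; then
      sleep "$interval"
    fi
  done
  echo "still not Ready: $pending" >&2
  return 1
}
```

On kubectl versions that provide it, kubectl wait --for=condition=Ready node --all --timeout=300s does a similar job in one command.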
To force-delete a pod in Kubernetes:
kubectl delete pods cloudagile-mariadb-0 -n intelligence-data-lab --grace-period=0 --force
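The command above deletes one named pod. The sketch below applies the same flags to every pod in a namespace whose STATUS is Terminating, parsing the default kubectl get pods columns (NAME READY STATUS RESTARTS AGE). The function names are hypothetical. Use it with care: force deletion removes the pod from the API server without waiting for kubelet to confirm the container has stopped, and the Kubernetes documentation warns that this can be unsafe for StatefulSet pods (which the -0 suffix of cloudagile-mariadb-0 suggests) if the old node is still running them.

```shell
#!/bin/sh
# Hypothetical helper: print pods in namespace $1 whose STATUS is "Terminating".
terminating_pods() {
  kubectl get pods -n "$1" --no-headers | awk '$3 == "Terminating" { print $1 }'
}

# Hypothetical helper: force-delete every Terminating pod in namespace $1,
# using the same --grace-period=0 --force flags as the single-pod command.
force_delete_terminating() {
  ns="$1"
  for pod in $(terminating_pods "$ns"); do
    kubectl delete pods "$pod" -n "$ns" --grace-period=0 --force
  done
}
```

Usage: force_delete_terminating intelligence-data-lab. Running terminating_pods on its own first shows what would be deleted.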
Reposted from: https://blog.csdn.net/u011181610/article/details/83623286