Problem
[root@test-master-113 ~]# kubectl get nodes
NAME STATUS AGE VERSION
test-master-113 Ready,SchedulingDisabled,master 347d v1.7.6
test-slave-114 Ready 206d v1.7.6-custom
test-slave-115 NotReady 292d v1.7.6-custom
test-slave-116 Ready 164d v1.7.6-custom
test-slave-117 Ready 292d v1.7.6-custom
Status of the pods on the NotReady node
kubectl get pods -n kube-system -owide | grep test-slave-115
kubectl-m77z1 1/1 NodeLost 1 24d 192.168.128.47 test-slave-115
kube-proxy-5h2gw 1/1 NodeLost 1 24d 10.39.0.115 test-slave-115
filebeat-lvk51 1/1 NodeLost 66 24d 192.168.128.24 test-slave-115
// The calico container on this node also fails to start
kubelet logs on the problematic node
[root@test-slave-115 ~]# journalctl -f -u kubelet
-- Logs begin at Fri 2018-05-04 13:21:34 CST. --
May 07 01:30:58 test-slave-115 kubelet[3318]: INFO:0507 01:30:50.951587 3318 fsHandler.go:131] du and find on following dirs took 4.330195473s: [ /var/lib/docker/containers/18d5af2341f7a236bf69aaef16f89928f3f75f11bb4e1940ee59ecc72892933b]
May 07 01:31:12 test-slave-115 kubelet[3318]: WARNING:0507 01:30:52.520159 3318 image_gc_manager.go:165] [imageGCManager] Failed to monitor images: rpc error: code = 4 desc = context deadline exceeded
May 07 01:32:00 test-slave-115 kubelet[3318]: INFO:0507 01:31:55.922308 3318 fsHandler.go:131] du and find on following dirs took 28.910714943s: [ /var/lib/docker/containers/5c3b452fd9335d725b04d6f298ccd01d4b3f591b3e30fc73032af9e2f97d4563]
May 07 01:32:08 test-slave-115 kubelet[3318]: INFO:0507 01:32:02.456758 3318 kubelet.go:1820] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 3h23m37.650989275s ago; threshold is 3m0s]
May 07 01:32:10 test-slave-115 kubelet[3318]: ERROR:0507 01:32:07.963481 3318 container_manager_linux.go:98] Unable to ensure the docker processes run in the desired containers
May 07 01:32:11 test-slave-115 kubelet[3318]: INFO:0507 01:32:03.185036 3318 fsHandler.go:131] du and find on following dirs took 1m12.441587857s: [ /var/lib/docker/containers/eece9416bc08782ab170ce1f7512435d8e39e092b49eed735c61e1885f20c878]
May 07 01:32:13 test-slave-115 kubelet[3318]: ERROR:0507 01:32:13.173524 3318 remote_runtime.go:411] Status from runtime service failed: rpc error: code = 4 desc = context deadline exceeded
May 07 01:32:16 test-slave-115 kubelet[3318]: INFO:0507 01:32:13.093159 3318 logs.go:41] http: TLS handshake error from 10.39.0.113:53854: write tcp 10.39.0.115:10250->10.39.0.113:53854: write: broken pipe
May 07 01:32:39 test-slave-115 kubelet[3318]: WARNING:0507 01:32:35.117411 3318 raw.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/system.slice/zabbix-agent.service": 0x40000100 == IN_CREATE|IN_ISDIR): open /sys/fs/cgroup/cpu,cpuacct/system.slice/zabbix-agent.service: no such file or directory
May 07 01:32:39 test-slave-115 kubelet[3318]: INFO:0507 01:32:38.982705 3318 logs.go:41] http: TLS handshake error from 10.39.0.113:54218: EOF
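The key lines in the log above are the "PLEG is not healthy" message and the repeated "context deadline exceeded" errors from the runtime. A minimal sketch for pulling those out of a long log follows; the function names pleg_last_active and count_runtime_timeouts are hypothetical helpers, and the message format is assumed to match the lines shown here.

```shell
# Hypothetical helpers; they read kubelet log lines on stdin.

# Print how long ago PLEG was last seen active, one value per matching line.
pleg_last_active() {
  grep -o 'pleg was last seen active [^ ]* ago' \
    | sed -e 's/^pleg was last seen active //' -e 's/ ago$//'
}

# Count lines where a call to the container runtime timed out.
count_runtime_timeouts() {
  grep -c 'context deadline exceeded'
}
```

For example: journalctl -u kubelet --no-pager --since "1 hour ago" | pleg_last_active | tail -n 1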
Running docker ps or any other docker command on the node returns very slowly
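To check the slowness without leaving a terminal hung, the command can be run under a time limit. This is a rough sketch that assumes the timeout command from GNU coreutils is installed; probe is a hypothetical helper name and the 10 second limit in the usage line is an arbitrary choice.

```shell
# Hypothetical helper: run a command under a time limit and report the outcome.
# Usage: probe <seconds> <command> [args...]
probe() {
  local limit=$1 rc=0
  shift
  # GNU timeout exits with status 124 when the time limit is hit.
  timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
  case $rc in
    0)   echo "ok" ;;
    124) echo "timeout" ;;
    *)   echo "error($rc)" ;;
  esac
}
```

For example: probe 10 docker ps
A result of "timeout" would be consistent with the kubelet's "context deadline exceeded" errors.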
Solution
// Restart docker
systemctl restart docker
Problem resolved: the pods on the node are Running again
[root@test-master-113 ~]# kubectl get pods --all-namespaces -owide | grep test-slave-115
kube-system calico-node-6fqgq 2/2 Running 22 24d 10.39.0.115 test-slave-115
kube-system filebeat-lvk51 1/1 Running 67 24d 192.168.128.24 test-slave-115
kube-system kube-proxy-5h2gw 1/1 Running 2 24d 10.39.0.115 test-slave-115
kube-system kubectl-m77z1 1/1 Running 2 24d 192.168.128.47 test-slave-115
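The node status can also be polled after the restart instead of being checked by hand. This is only a sketch: node_status and wait_ready are hypothetical helpers, and the retry count and 10 second interval are arbitrary assumptions.

```shell
# Hypothetical helper: print the STATUS column for one node,
# reading `kubectl get nodes` output on stdin.
node_status() {
  awk -v node="$1" '$1 == node { print $2 }'
}

# Hypothetical helper: poll until the node reports Ready or the attempts run out.
# Usage: wait_ready <node-name> [attempts]
wait_ready() {
  local node=$1 tries=${2:-30} status
  while [ "$tries" -gt 0 ]; do
    status=$(kubectl get nodes --no-headers | node_status "$node")
    case $status in
      Ready*) return 0 ;;   # also matches Ready,SchedulingDisabled
    esac
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}
```

For example: wait_ready test-slave-115 && echo "node is Ready"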