I. Problem description
Environment
OS: CentOS 7
Kubernetes: 1 master, 2 nodes
Cluster | Role | IP (last octet) |
---|---|---|
k8s | master | 176 |
k8s | node | 175 |
k8s | node | 176 |
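kubelet, kube-proxy, and kube-dns are installed directly on the Linux hosts.
Problem
When the monitoring stack was uninstalled, some of its components were never deleted (normally all related components are removed).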
[root@docker176 docker]# kubectl -n kube-system get pod,svc
NAME READY STATUS RESTARTS AGE
po/calico-node-hmv3f 2/2 Running 0 12d
po/calico-node-qvk9m 2/2 NodeLost 4 19d
po/calico-policy-controller-2698340612-cbvh2 1/1 Running 0 23h
po/calico-policy-controller-2698340612-kgdkk 1/1 Unknown 2 19d
po/heapster-v1.3.0-3194101127-5q02n 2/2 Unknown 4 19d
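To pick the stuck pods out of a listing like the one above, the STATUS column can be filtered with awk. This is only a sketch: the helper name unhealthy_pods is made up for illustration, and it assumes the plain column layout (NAME READY STATUS RESTARTS AGE) and the po/ name prefix shown in this output.

```shell
#!/bin/sh
# unhealthy_pods (hypothetical helper): read `kubectl get pod,svc` output on
# stdin and print name and status of every pod whose STATUS is not "Running".
# Only lines whose NAME starts with "po/" are considered, so service rows
# and header rows are skipped.
unhealthy_pods() {
    awk '$1 ~ /^po\// && $3 != "Running" { print $1, $3 }'
}

# Sample input copied from the listing above. On a live cluster use:
#   kubectl -n kube-system get pod,svc | unhealthy_pods
unhealthy_pods <<'EOF'
NAME READY STATUS RESTARTS AGE
po/calico-node-hmv3f 2/2 Running 0 12d
po/calico-node-qvk9m 2/2 NodeLost 4 19d
po/calico-policy-controller-2698340612-cbvh2 1/1 Running 0 23h
po/calico-policy-controller-2698340612-kgdkk 1/1 Unknown 2 19d
po/heapster-v1.3.0-3194101127-5q02n 2/2 Unknown 4 19d
EOF
```

The three pods it reports (NodeLost or Unknown) are the ones that could not be cleaned up.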
II. Cause
Node 175 is in the NotReady state, which is abnormal.
[root@docker176 docker]# kubectl get nodes
NAME STATUS AGE VERSION
192.168.14.175 NotReady 19d v1.6.2
192.168.14.176 Ready 12d v1.6.2
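On a larger cluster the same check can be scripted. The sketch below assumes the four-column layout shown above (NAME STATUS AGE VERSION); not_ready_nodes is a hypothetical helper name, not a kubectl feature.

```shell
#!/bin/sh
# not_ready_nodes (hypothetical helper): read `kubectl get nodes` output on
# stdin and print the NAME of every node whose STATUS is not exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is therefore reported too.
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Sample input copied from the output above. On a live cluster use:
#   kubectl get nodes | not_ready_nodes
not_ready_nodes <<'EOF'
NAME STATUS AGE VERSION
192.168.14.175 NotReady 19d v1.6.2
192.168.14.176 Ready 12d v1.6.2
EOF
```

For the sample input this prints 192.168.14.175, the node that needs attention.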
Check which node the problem pod is running on with kubectl -n kube-system describe pod calico-policy-controller-2698340612-kgdkk. The pod is deployed on the faulty node 175.
[root@docker176 ~]# kubectl -n kube-system describe pod calico-policy-controller-2698340612-kgdkk
Name: calico-policy-controller-2698340612-kgdkk
Namespace: kube-system
Node: 192.168.14.175/192.168.14.175
Start Time: Fri, 22 Feb 2019 00:24:56 +0800
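When several pods are stuck, reading the Node field by eye gets tedious. A minimal sketch for extracting it, assuming the "Node: name/IP" line format shown above; pod_node is a hypothetical helper name.

```shell
#!/bin/sh
# pod_node (hypothetical helper): read `kubectl describe pod` output on stdin
# and print the node name, i.e. the part of the "Node:" value before the "/".
pod_node() {
    awk '$1 == "Node:" { split($2, parts, "/"); print parts[1] }'
}

# Sample input copied from the describe output above. On a live cluster use:
#   kubectl -n kube-system describe pod <pod-name> | pod_node
pod_node <<'EOF'
Name: calico-policy-controller-2698340612-kgdkk
Namespace: kube-system
Node: 192.168.14.175/192.168.14.175
Start Time: Fri, 22 Feb 2019 00:24:56 +0800
EOF
```

For the sample input this prints 192.168.14.175, matching the NotReady node.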
III. Solution
1. Delete the node
kubectl delete node 192.168.14.175
[root@docker176 docker]# kubectl get nodes
NAME STATUS AGE VERSION
192.168.14.175 NotReady 19d v1.6.2
192.168.14.176 Ready 12d v1.6.2
[root@docker176 docker]# kubectl delete node 192.168.14.175
node "192.168.14.175" deleted
2. Restart kubelet
[root@docker175 ~]# systemctl restart kubelet
Check that kubelet started and re-registered. A STATUS of Ready means the node has started successfully and registered with the k8s master.
[root@docker176 docker]# kubectl get nodes
NAME STATUS AGE VERSION
192.168.14.175 Ready 13s v1.6.2
192.168.14.176 Ready 12d v1.6.2
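Registration can take a few seconds after the restart, so the check may need repeating. The sketch below polls until the node reports Ready. Both function names, the retry count, and the 2-second interval are illustrative assumptions, and the parsing assumes the column layout shown above.

```shell
#!/bin/sh
# node_status (hypothetical helper): read `kubectl get nodes` output on stdin
# and print the STATUS column for the node named in $1.
node_status() {
    awk -v node="$1" 'NR > 1 && $1 == node { print $2 }'
}

# wait_for_ready (hypothetical helper): poll kubectl until node $1 is Ready.
# $2 is the maximum number of attempts (default 30, an arbitrary choice).
# Returns 0 once the node is Ready, 1 if it never gets there.
wait_for_ready() {
    node=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        if [ "$(kubectl get nodes | node_status "$node")" = "Ready" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 2
    done
    return 1
}

# Parsing demo using the output above (does not call kubectl):
node_status 192.168.14.175 <<'EOF'
NAME STATUS AGE VERSION
192.168.14.175 Ready 13s v1.6.2
192.168.14.176 Ready 12d v1.6.2
EOF
```

On the master, this would be run as wait_for_ready 192.168.14.175 right after restarting kubelet on the node.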
Check whether the components were removed. All heapster components are now gone.
[root@docker176 ~]# kubectl -n kube-system get pod,svc
NAME READY STATUS RESTARTS AGE
po/calico-node-hmv3f 2/2 Running 0 12d
po/calico-node-z6skb 2/2 Running 0 3m
po/calico-policy-controller-2698340612-cbvh2 1/1 Running 0 23h
po/kube-dns-3412393464-csgt3 3/3 Running 2 23h
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kube-dns 10.254.0.10 <none> 53/UDP,53/TCP 75d
svc/kubelet None <none> 10250/TCP 20d