1. Problem:
Both nodes show Ready status, and the docker / kubelet / flanneld / kube-proxy services are all running normally on each node. Why are pods only being scheduled onto one of them?
2. Current state:
master]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
192.168.89.133 Ready <none> 194d v1.16.2 192.168.89.133 <none> CentOS Linux 7 (Core) 3.10.0-1062.el7.x86_64 docker://1.13.1
192.168.89.134 Ready <none> 193d v1.16.2 192.168.89.134 <none> CentOS Linux 7 (Core) 3.10.0-1062.el7.x86_64 docker://1.13.1
Both machines are in Ready status.
master]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-dns-6685cc54bd-lb5v4 3/3 Running 0 46m 10.1.84.2 192.168.89.134 <none> <none>
But all pods are scheduled only to node 2 (192.168.89.134).
3. Possible causes, given the symptoms
Guess 1: node 1 was marked unschedulable (cordoned) a long time ago
Guess 2: the service permission configuration on node 1 was changed at some point
Guess 3: a restriction on the master (such as a taint) forbids scheduling to node 1
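Guesses 1 and 3 can be ruled out quickly from the master. A minimal sketch, with the real kubectl commands in comments and a sample stand-in for `kubectl get nodes` output so the column check runs anywhere (field positions assume the default v1.16 output):

```shell
# On a live cluster (node name taken from the output above):
#   kubectl get node 192.168.89.133 -o jsonpath='{.spec.unschedulable}'  # guess 1: cordoned?
#   kubectl describe node 192.168.89.133 | grep -i taints                # guess 3: tainted?
# SAMPLE stands in for `kubectl get nodes` output so this runs offline.
SAMPLE='192.168.89.133   Ready,SchedulingDisabled   <none>   194d   v1.16.2
192.168.89.134   Ready                      <none>   193d   v1.16.2'

# A cordoned node shows SchedulingDisabled in the STATUS column (field 2).
printf '%s\n' "$SAMPLE" | awk '$2 ~ /SchedulingDisabled/ {print $1 " is cordoned"}'
```

On a real cluster, `kubectl uncordon 192.168.89.133` (for a cordon) or `kubectl taint node 192.168.89.133 <key>:NoSchedule-` (for a taint) would undo either finding.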
4. Troubleshooting:
First, compare the two nodes: which ports does the healthy node 2 listen on that node 1 does not?
node2]# ss -anptu | less
udp UNCONN 0 0 192.168.122.1:53 *:* users:(("dnsmasq",pid=2083,fd=5))
udp UNCONN 0 0 *%virbr0:67 *:* users:(("dnsmasq",pid=2083,fd=3))
tcp LISTEN 0 5 192.168.122.1:53 *:* users:(("dnsmasq",pid=2083,fd=6))
Node 1 has no dnsmasq process and nothing listening on port 53; dnsmasq is not even installed there.
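The "which ports does node 2 have that node 1 lacks" comparison can also be scripted as a diff of the two `ss` listings. The ssh hostnames below are placeholders; two sample dumps stand in so the diff itself runs anywhere:

```shell
# On real nodes, from a box that can reach both:
#   diff <(ssh node1 "ss -lntu | awk '{print \$1, \$5}' | sort") \
#        <(ssh node2 "ss -lntu | awk '{print \$1, \$5}' | sort")
# Sample stand-ins (node1 is missing the dnsmasq listeners on :53):
node1_ports='tcp 0.0.0.0:10250
tcp 0.0.0.0:22'
node2_ports='tcp 0.0.0.0:10250
tcp 0.0.0.0:22
tcp 192.168.122.1:53
udp 192.168.122.1:53'

# Lines prefixed with ">" exist only on node 2.
diff <(printf '%s\n' "$node1_ports" | sort) <(printf '%s\n' "$node2_ports" | sort) || true
```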
node1]# yum -y install dnsmasq
node1]# dnsmasq
node1]# ss -anptu | grep dnsmasq
udp UNCONN 0 0 *:53 *:* users:(("dnsmasq",pid=92515,fd=4))
udp UNCONN 0 0 [::]:53 [::]:* users:(("dnsmasq",pid=92515,fd=6))
tcp LISTEN 0 5 *:53 *:* users:(("dnsmasq",pid=92515,fd=5))
tcp LISTEN 0 5 [::]:53 [::]:* users:(("dnsmasq",pid=92515,fd=7))
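One caveat: dnsmasq launched by hand like this will not come back after a reboot. A sketch of making it persistent (assuming the systemd unit shipped with the CentOS 7 dnsmasq package), plus an offline re-check of the `ss -anptu` output format above for a port-53 listener:

```shell
# On node 1 (needs root); assumes the package provides a systemd unit:
#   systemctl enable --now dnsmasq
# Verify a listener on :53, reusing the `ss -anptu` output format shown above.
SS_OUT='udp UNCONN 0 0 *:53 *:* users:(("dnsmasq",pid=92515,fd=4))
tcp LISTEN 0 5 *:53 *:* users:(("dnsmasq",pid=92515,fd=5))'

# Field 5 of `ss -anptu` is the local address:port.
printf '%s\n' "$SS_OUT" | awk '$5 ~ /:53$/ {found=1} END {print (found ? "port 53 open" : "port 53 closed")}'
```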
master]# kubectl delete -f kubedns-controller.yaml
deployment.apps "kube-dns" deleted
master]# kubectl create -f kubedns-controller.yaml
deployment.apps/kube-dns created
master]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-dns-6685cc54bd-bqjsn 3/3 Running 0 4m22s 10.1.5.2 192.168.89.133 <none> <none>
Pods can now be scheduled to node 1 again. To confirm that node 2 is still healthy as well, create a few more pods and check where they land.
Reinstall the dashboard:
master]# kubectl create -f recommended.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created
master]# kubectl get pods -n kubernetes-dashboard -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
dashboard-metrics-scraper-5f4dc864c4-rt45p 1/1 Running 0 13s 10.1.84.2 192.168.89.134 <none> <none>
kubernetes-dashboard-687bd5c7d7-zrppg 1/1 Running 0 14s 10.1.5.3 192.168.89.133 <none> <none>
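The same "is the spread healthy" check can be done without reinstalling the dashboard: scale a throwaway deployment and count the distinct entries in the NODE column of `kubectl get pods -o wide`. The deployment name below is made up, and SAMPLE stands in for live output:

```shell
# On the master:
#   kubectl create deployment sched-test --image=nginx
#   kubectl scale deployment sched-test --replicas=4
#   kubectl get pods -l app=sched-test -o wide
# Count distinct scheduling targets in the NODE column (field 7 of `-o wide`).
SAMPLE='sched-test-1 1/1 Running 0 10s 10.1.5.4 192.168.89.133 <none> <none>
sched-test-2 1/1 Running 0 10s 10.1.84.3 192.168.89.134 <none> <none>
sched-test-3 1/1 Running 0 10s 10.1.5.5 192.168.89.133 <none> <none>
sched-test-4 1/1 Running 0 10s 10.1.84.4 192.168.89.134 <none> <none>'

printf '%s\n' "$SAMPLE" | awk '{n[$7]=1} END {c=0; for (k in n) c++; print c " distinct nodes"}'
```

If this prints fewer distinct nodes than the cluster has, the scheduler is still avoiding one of them.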
5. Summary
Compare the faulty node against a healthy one. The cause is not necessarily in the core configuration; a peripheral application can also be responsible. Node 1 had been unschedulable for a long time and I never looked into it. When I finally did, it turned out to be a DNS problem: without dnsmasq backing it, the master may be unable to schedule pods to that machine.