Resolving K8s nodes stuck in NotReady
Cause of the failure: I wanted to expose a Service via an externalIP and, at the same time, delete the old pod. After running the delete command, the worker nodes became unavailable.
Reproducing the faulty operations
- Create an externalIP-type Service (a sketch of the manifest follows this list)
- Scale the existing deployments/demo down to 0 replicas (this step was the big mistake)
- Delete the existing pod (the delete hung outright, and afterwards all the worker nodes dropped off)
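The services-externalip-demo.yaml applied in the next step is not included in these notes; below is a minimal sketch of what a Service of this shape could look like, reconstructed from the kubectl describe output further down. The name, labels, selector, port, and external IP come from that output; everything else is assumed.

apiVersion: v1
kind: Service
metadata:
  name: demo-externalip-service
  labels:
    app: demo-service
spec:
  # type defaults to ClusterIP, matching the kubectl get svc output below
  selector:
    app: demo
  ports:
  - name: http
    port: 80
    targetPort: 80
  externalIPs:
  - 192.168.15.154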
[root@k8s-master01 ~]# kubectl apply -f services-externalip-demo.yaml
service/demo-externalip-service created
[root@k8s-master01 ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
demo-externalip-service ClusterIP 10.97.55.241 192.168.15.154 80/TCP 4s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 128d
[root@k8s-master01 ~]# kubectl describe po demo-6666947f9f-ggd4h
Name: demo-6666947f9f-ggd4h
Namespace: default
Priority: 0
Node: k8s-node01/192.168.15.153
Start Time: Fri, 02 Apr 2021 21:44:42 +0800
Labels: app=demo
pod-template-hash=6666947f9f
Annotations: <none>
Status: Running
IP: 10.244.3.9
Controlled By: ReplicaSet/demo-6666947f9f
Containers:
nginx:
Container ID: docker://b18e31dcb300803ce7e611eaecfa45a70b857b403e2dd3ff92db5a341e3306bb
Image: nginx:latest
Image ID: docker-pullable://nginx@sha256:10b8cc432d56da8b61b070f4c7d2543a9ed17c2b23010b43af434fd40e2ca4aa
Port: <none>
Host Port: <none>
State: Running
Started: Fri, 02 Apr 2021 21:45:22 +0800
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-srskw (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-srskw:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-srskw
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 5h (x13 over 5h15m) default-scheduler 0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
Warning FailedScheduling 19m default-scheduler 0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
Normal Scheduled 19m default-scheduler Successfully assigned default/demo-6666947f9f-ggd4h to k8s-node01
Normal Pulling <invalid> kubelet, k8s-node01 Pulling image "nginx:latest"
Normal Pulled <invalid> kubelet, k8s-node01 Successfully pulled image "nginx:latest"
Normal Created <invalid> kubelet, k8s-node01 Created container nginx
Normal Started <invalid> kubelet, k8s-node01 Started container nginx
[root@k8s-master01 ~]# kubectl describe -f services-externalip-demo.yaml
Name: demo-externalip-service
Namespace: default
Labels: app=demo-service
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"demo-service"},"name":"demo-externalip-service","namespa...
Selector: app=demo
Type: ClusterIP
IP: 10.97.55.241
External IPs: 192.168.15.154
Port: http 80/TCP
TargetPort: 80/TCP
Endpoints:
Session Affinity: None
Events: <none>
[root@k8s-master01 ~]# kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
demo 0/1 1 0 5h16m
[root@k8s-master01 ~]# kubectl scale deployments/demo --replicas=0
deployment.extensions/demo scaled
[root@k8s-master01 ~]# kubectl get po
NAME READY STATUS RESTARTS AGE
demo-6666947f9f-ggd4h 1/1 Terminating 0 5h16m
[root@k8s-master01 ~]# kubectl delete po demo-6666947f9f-ggd4h
pod "demo-6666947f9f-ggd4h" deleted
^C
[root@k8s-master01 ~]# kubectl get po
NAME READY STATUS RESTARTS AGE
demo-6666947f9f-ggd4h 1/1 Terminating 0 5h19m
[root@k8s-master01 ~]# kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
demo 0/0 0 0 5h19m
[root@k8s-master01 ~]# kubectl create deploy demo --image=nginx:latest
Error from server (AlreadyExists): deployments.apps "demo" already exists
[root@k8s-master01 ~]# kubectl scale deployments/demo --replicas=3
deployment.extensions/demo scaled
[root@k8s-master01 ~]# kubectl get po
NAME READY STATUS RESTARTS AGE
demo-6666947f9f-ggd4h 1/1 Terminating 0 5h20m
demo-6666947f9f-m42q2 0/1 Pending 0 3s
demo-6666947f9f-t58r7 0/1 Pending 0 3s
demo-6666947f9f-xcjzs 0/1 Pending 0 3s
[root@k8s-master01 ~]# kubectl describe po demo-6666947f9f-m42q2
Name: demo-6666947f9f-m42q2
Namespace: default
Priority: 0
Node: <none>
Labels: app=demo
pod-template-hash=6666947f9f
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/demo-6666947f9f
Containers:
nginx:
Image: nginx:latest
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-srskw (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
default-token-srskw:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-srskw
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 16s (x2 over 16s) default-scheduler 0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready master 128d v1.15.1
k8s-node01 NotReady <none> 123d v1.15.1
k8s-node02 NotReady <none> 123d v1.15.1
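For reference, the pod that stayed in Terminating above can be force-removed from the API server without waiting for the unreachable kubelet. This was not done in this session; it is just the standard escape hatch for that situation:

[root@k8s-master01 ~]# kubectl delete po demo-6666947f9f-ggd4h --grace-period=0 --force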
Attempt 1: reboot all the servers (failed)
[root@k8s-master01 ~]# poweroff
Connection closed
Connection established
Last login: Thu Apr 1 21:32:56 2021 from 192.168.15.1
[root@k8s-master01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready master 128d v1.15.1
k8s-node01 NotReady <none> 123d v1.15.1
k8s-node02 NotReady <none> 123d v1.15.1
[root@k8s-master01 ~]# kubectl describe nodes k8s-node01
Name: k8s-node01
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=k8s-node01
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"96:6b:71:dc:33:3d"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.15.153
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 29 Nov 2020 00:31:58 +0800
Taints: node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure Unknown Fri, 02 Apr 2021 22:12:36 +0800 Thu, 01 Apr 2021 22:13:25 +0800 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Fri, 02 Apr 2021 22:12:36 +0800 Thu, 01 Apr 2021 22:13:25 +0800 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure Unknown Fri, 02 Apr 2021 22:12:36 +0800 Thu, 01 Apr 2021 22:13:25 +0800 NodeStatusUnknown Kubelet stopped posting node status.
Ready Unknown Fri, 02 Apr 2021 22:12:36 +0800 Thu, 01 Apr 2021 22:13:25 +0800 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: 192.168.15.153
Hostname: k8s-node01
Capacity:
cpu: 2
ephemeral-storage: 100610052Ki
hugepages-2Mi: 0
memory: 4028688Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 92722223770
hugepages-2Mi: 0
memory: 3926288Ki
pods: 110
System Info:
Machine ID: 87ff2ee4182e421680e90c865344076c
System UUID: 48D64D56-CD82-7CD9-7265-00C117529BB5
Boot ID: 9059ab9f-2607-4fe1-880d-f5ccb9f8e784
Kernel Version: 4.4.244-1.el7.elrepo.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.13
Kubelet Version: v1.15.1
Kube-Proxy Version: v1.15.1
PodCIDR: 10.244.3.0/24
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default demo-6666947f9f-m42q2 0 (0%) 0 (0%) 0 (0%) 0 (0%) 9m56s
default demo-6666947f9f-t58r7 0 (0%) 0 (0%) 0 (0%) 0 (0%) 9m56s
default demo-6666947f9f-xcjzs 0 (0%) 0 (0%) 0 (0%) 0 (0%) 9m56s
kube-system kube-flannel-ds-rk4t5 100m (5%) 100m (5%) 50Mi (1%) 50Mi (1%) 123d
kube-system kube-proxy-fbtln 0 (0%) 0 (0%) 0 (0%) 0 (0%) 123d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (5%) 100m (5%)
memory 50Mi (1%) 50Mi (1%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting <invalid> kubelet, k8s-node01 Starting kubelet.
Normal NodeAllocatableEnforced <invalid> kubelet, k8s-node01 Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory <invalid> (x2 over <invalid>) kubelet, k8s-node01 Node k8s-node01 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure <invalid> (x2 over <invalid>) kubelet, k8s-node01 Node k8s-node01 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID <invalid> (x2 over <invalid>) kubelet, k8s-node01 Node k8s-node01 status is now: NodeHasSufficientPID
Warning Rebooted <invalid> kubelet, k8s-node01 Node k8s-node01 has been rebooted, boot id: 36b21c39-94a9-437f-bf8a-191eb5181dbe
Normal NodeReady <invalid> kubelet, k8s-node01 Node k8s-node01 status is now: NodeReady
Normal Starting <invalid> kube-proxy, k8s-node01 Starting kube-proxy.
Normal Starting <invalid> kubelet, k8s-node01 Starting kubelet.
Normal NodeAllocatableEnforced <invalid> kubelet, k8s-node01 Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory <invalid> (x2 over <invalid>) kubelet, k8s-node01 Node k8s-node01 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure <invalid> (x2 over <invalid>) kubelet, k8s-node01 Node k8s-node01 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID <invalid> (x2 over <invalid>) kubelet, k8s-node01 Node k8s-node01 status is now: NodeHasSufficientPID
Warning Rebooted <invalid> kubelet, k8s-node01 Node k8s-node01 has been rebooted, boot id: 9059ab9f-2607-4fe1-880d-f5ccb9f8e784
Normal NodeReady <invalid> kubelet, k8s-node01 Node k8s-node01 status is now: NodeReady
Normal Starting <invalid> kube-proxy, k8s-node01 Starting kube-proxy.
[root@k8s-master01 ~]# kubectl describe nodes k8s-node02
Name: k8s-node02
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=k8s-node02
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"fe:73:ec:96:1c:45"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.15.152
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sat, 28 Nov 2020 23:23:32 +0800
Taints: node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure Unknown Fri, 02 Apr 2021 22:13:53 +0800 Thu, 01 Apr 2021 22:14:35 +0800 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Fri, 02 Apr 2021 22:13:53 +0800 Thu, 01 Apr 2021 22:14:35 +0800 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure Unknown Fri, 02 Apr 2021 22:13:53 +0800 Thu, 01 Apr 2021 22:14:35 +0800 NodeStatusUnknown Kubelet stopped posting node status.
Ready Unknown Fri, 02 Apr 2021 22:13:53 +0800 Thu, 01 Apr 2021 22:14:35 +0800 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: 192.168.15.152
Hostname: k8s-node02
Capacity:
cpu: 2
ephemeral-storage: 100610052Ki
hugepages-2Mi: 0
memory: 4028688Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 92722223770
hugepages-2Mi: 0
memory: 3926288Ki
pods: 110
System Info:
Machine ID: cab995456bd34aab927d7b5cb22daf5c
System UUID: 29144D56-00D2-A845-03B6-DBC78819D1F1
Boot ID: 0c9f8d85-10ad-4d7f-8c80-e752a420a7e5
Kernel Version: 4.4.244-1.el7.elrepo.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.13
Kubelet Version: v1.15.1
Kube-Proxy Version: v1.15.1
PodCIDR: 10.244.2.0/24
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system kube-flannel-ds-lpdbl 100m (5%) 100m (5%) 50Mi (1%) 50Mi (1%) 123d
kube-system kube-proxy-tkwvb 0 (0%) 0 (0%) 0 (0%) 0 (0%) 123d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (5%) 100m (5%)
memory 50Mi (1%) 50Mi (1%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeAllocatableEnforced <invalid> kubelet, k8s-node02 Updated Node Allocatable limit across pods
Normal Starting <invalid> kubelet, k8s-node02 Starting kubelet.
Warning Rebooted <invalid> kubelet, k8s-node02 Node k8s-node02 has been rebooted, boot id: 0768fc77-bb23-4fe9-ae2e-ee987dc16b54
Normal NodeReady <invalid> kubelet, k8s-node02 Node k8s-node02 status is now: NodeReady
Normal NodeHasNoDiskPressure <invalid> (x2 over <invalid>) kubelet, k8s-node02 Node k8s-node02 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID <invalid> (x2 over <invalid>) kubelet, k8s-node02 Node k8s-node02 status is now: NodeHasSufficientPID
Normal NodeHasSufficientMemory <invalid> (x2 over <invalid>) kubelet, k8s-node02 Node k8s-node02 status is now: NodeHasSufficientMemory
Normal Starting <invalid> kube-proxy, k8s-node02 Starting kube-proxy.
Normal NodeAllocatableEnforced <invalid> kubelet, k8s-node02 Updated Node Allocatable limit across pods
Normal Starting <invalid> kubelet, k8s-node02 Starting kubelet.
Normal NodeHasSufficientMemory <invalid> (x2 over <invalid>) kubelet, k8s-node02 Node k8s-node02 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure <invalid> (x2 over <invalid>) kubelet, k8s-node02 Node k8s-node02 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID <invalid> (x2 over <invalid>) kubelet, k8s-node02 Node k8s-node02 status is now: NodeHasSufficientPID
Warning Rebooted <invalid> kubelet, k8s-node02 Node k8s-node02 has been rebooted, boot id: 0c9f8d85-10ad-4d7f-8c80-e752a420a7e5
Normal NodeReady <invalid> kubelet, k8s-node02 Node k8s-node02 status is now: NodeReady
Normal Starting <invalid> kube-proxy, k8s-node02 Starting kube-proxy.
Attempt 2: delete the externalIP Service and restart
Next I suspected the Service itself had been created incorrectly.
[root@k8s-master01 ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
demo-externalip-service ClusterIP 10.97.55.241 192.168.15.154 80/TCP 50m
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 129d
[root@k8s-master01 ~]# kubectl delete -f services-externalip-demo.yaml
service "demo-externalip-service" deleted
[root@k8s-master01 ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 129d
[root@k8s-master01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready master 129d v1.15.1
k8s-node01 NotReady <none> 123d v1.15.1
k8s-node02 NotReady <none> 123d v1.15.1
[root@k8s-master01 ~]# systemctl restart kubelet
[root@k8s-master01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready master 129d v1.15.1
k8s-node01 NotReady <none> 123d v1.15.1
k8s-node02 NotReady <none> 123d v1.15.1
I then checked the kubelet logs on the worker nodes and found they could not connect to port 6443 on the master:
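The exact command is not shown in the notes; on these CentOS 7 hosts the kubelet runs under systemd, so the log below was presumably read with something like:

[root@k8s-node02 ~]# journalctl -u kubelet -f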
4月 02 22:59:06 k8s-node02 kubelet[46204]: E0402 22:59:06.527365 46204 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://192.168.15.154:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.15.154:6443: connect: connection refused
Port 6443 on the master itself was listening, though:
[root@k8s-master01 ~]# netstat -lntup|grep 6443
tcp6 0 0 :::6443 :::* LISTEN 2017/kube-apiserver
Next I turned to the pods in the kube-system namespace:
[root@k8s-master01 ~]# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-gskbg 1/1 Running 14 129d
coredns-5c98db65d4-kgrls 1/1 Running 14 129d
etcd-k8s-master01 1/1 Running 14 129d
kube-apiserver-k8s-master01 1/1 Running 18 129d
kube-controller-manager-k8s-master01 1/1 Running 22 129d
kube-flannel-ds-25ch5 1/1 Running 8 128d
kube-flannel-ds-lpdbl 0/1 Error 2 123d
kube-flannel-ds-rk4t5 0/1 Error 2 123d
kube-proxy-6ksnl 1/1 Running 14 129d
kube-proxy-fbtln 0/1 Error 2 123d
kube-proxy-tkwvb 0/1 Error 2 123d
kube-scheduler-k8s-master01 1/1 Running 26 129d
Describing the flannel pods stuck in the Error state did not show any obvious cause:
Normal SandboxChanged <invalid> kubelet, k8s-node02 Pod sandbox changed, it will be killed and re-created.
Normal Pulled <invalid> kubelet, k8s-node02 Container image "jmgao1983/flannel:latest" already present on machine
Normal Created <invalid> kubelet, k8s-node02 Created container install-cni
Normal Started <invalid> kubelet, k8s-node02 Started container install-cni
Normal Pulled <invalid> kubelet, k8s-node02 Container image "jmgao1983/flannel:latest" already present on machine
Normal Created <invalid> kubelet, k8s-node02 Created container kube-flannel
Normal Started <invalid> kubelet, k8s-node02 Started container kube-flannel
Normal SandboxChanged <invalid> kubelet, k8s-node02 Pod sandbox changed, it will be killed and re-created.
Normal Pulled <invalid> kubelet, k8s-node02 Container image "jmgao1983/flannel:latest" already present on machine
Normal Created <invalid> kubelet, k8s-node02 Created container install-cni
Looking at the pod logs, the error said that port 10250 on the worker nodes could not be reached, even though the kubelet was clearly listening on 10250 on those nodes:
[root@k8s-master01 ~]# kubectl logs -f kube-flannel-ds-lpdbl -n kube-system
Error from server: Get https://192.168.15.152:10250/containerLogs/kube-system/kube-flannel-ds-lpdbl/kube-flannel?follow=true: dial tcp 192.168.15.152:10250: connect: no route to host
[root@k8s-master01 ~]# kubectl logs -f kube-proxy-fbtln -n kube-system
Error from server: Get https://192.168.15.153:10250/containerLogs/kube-system/kube-proxy-fbtln/kube-proxy?follow=true: dial tcp 192.168.15.153:10250: connect: no route to host
[root@k8s-master01 ~]# netstat -lntup|grep 10250
tcp6 0 0 :::10250 :::* LISTEN 70516/kubelet
Port 10250 on the worker nodes:
[root@k8s-node01 ~]# netstat -lntup|grep 10250
tcp6 0 0 :::10250 :::* LISTEN 46643/kubelet
[root@k8s-node02 ~]# netstat -lntup|grep 10250
tcp6 0 0 :::10250 :::* LISTEN 46204/kubelet
That finally narrowed the problem down to the "connect: no route to host" error.
Pinging the master from the worker nodes:
[root@k8s-node01 ~]# ping 192.168.15.154
PING 192.168.15.154 (192.168.15.154) 56(84) bytes of data.
64 bytes from 192.168.15.154: icmp_seq=1 ttl=64 time=0.080 ms
64 bytes from 192.168.15.154: icmp_seq=2 ttl=64 time=0.041 ms
64 bytes from 192.168.15.154: icmp_seq=3 ttl=64 time=0.064 ms
[root@k8s-node02 ~]# ping 192.168.15.154
PING 192.168.15.154 (192.168.15.154) 56(84) bytes of data.
64 bytes from 192.168.15.154: icmp_seq=1 ttl=64 time=0.047 ms
64 bytes from 192.168.15.154: icmp_seq=2 ttl=64 time=0.040 ms
Reaching the worker nodes from the master:
[root@k8s-master01 ~]# telnet 192.168.15.153 10250
Trying 192.168.15.153...
telnet: connect to address 192.168.15.153: No route to host
[root@k8s-master01 ~]# ping 192.168.15.153
PING 192.168.15.153 (192.168.15.153) 56(84) bytes of data.
From 192.168.15.154 icmp_seq=1 Destination Host Unreachable
From 192.168.15.154 icmp_seq=2 Destination Host Unreachable
From 192.168.15.154 icmp_seq=3 Destination Host Unreachable
I then checked the network configuration on all three machines and found nothing wrong (it had been working fine until now); the firewall was disabled and the routing table looked normal as well.
[root@k8s-master01 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
[root@k8s-master01 ~]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: man:firewalld(1)
[root@k8s-master01 ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.15.2 0.0.0.0 UG 100 0 0 ens33
10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.2.0 10.244.2.0 255.255.255.0 UG 0 0 0 flannel.1
10.244.3.0 10.244.3.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.15.0 0.0.0.0 255.255.255.0 U 100 0 0 ens33
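One extra check that fits a "Destination Host Unreachable" between hosts on the same /24 (not part of the original session, just a hedged suggestion) is the neighbor/ARP table on the master, since a failed neighbor entry produces exactly this symptom while routes and firewall look fine:

[root@k8s-master01 ~]# ip neigh show | grep 192.168.15.153
# a FAILED or INCOMPLETE entry here would explain "no route to host"
# even though the routing table above looks correct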
The final conclusion: the master could not ping the worker nodes, while the worker nodes could ping the master.
After rebooting the master and one of the worker nodes, the ping succeeded and the worker node registered itself with the cluster again automatically:
[root@k8s-master01 ~]# ping www.baidu.com
PING www.a.shifen.com (14.215.177.38) 56(84) bytes of data.
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=1 ttl=128 time=33.2 ms
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=2 ttl=128 time=34.1 ms
^C
--- www.a.shifen.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 33.253/33.701/34.149/0.448 ms
[root@k8s-master01 ~]# ping 192.168.15.153
PING 192.168.15.153 (192.168.15.153) 56(84) bytes of data.
64 bytes from 192.168.15.153: icmp_seq=1 ttl=64 time=0.351 ms
64 bytes from 192.168.15.153: icmp_seq=2 ttl=64 time=0.309 ms
^C
--- 192.168.15.153 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.309/0.330/0.351/0.021 ms
[root@k8s-master01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready master 129d v1.15.1
k8s-node01 Ready <none> 123d v1.15.1
Summary: the first reboot probably did not recover the cluster because the machines were not fully restarted. After a reboot, check whether every service under kube-system has started successfully; for any that has not, use kubectl logs -f to read its logs and work out where the problem is. When kube-proxy misbehaves, the cause is usually a network connectivity problem.
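Gathered into one place, the checks used above make a reasonable checklist the next time a node goes NotReady (commands as used in this post; adjust resource names to your own cluster):

kubectl get nodes
kubectl describe node k8s-node01
kubectl get po -n kube-system
kubectl logs -f kube-proxy-fbtln -n kube-system
# and on the NotReady node itself:
systemctl status kubelet
journalctl -u kubelet -f
netstat -lntup | grep 10250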