Troubleshooting a NotReady Node in a Kubernetes Cluster

Problem

A node in the Kubernetes cluster is in the NotReady state.

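First, check whether swap is disabled; by default the kubelet refuses to start while swap is enabled. In the output of free -h, the Swap row should be all zeros: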

free -h
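free -h is meant to be read by a person. For a scripted check, the minimal sketch below reads /proc/swaps, which contains one header line followed by one line per active swap area. The function name swap_is_off and its optional file argument are illustrative assumptions, not part of the original post.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when no swap area is active.
# /proc/swaps has one header line followed by one line per active swap area.
# Another path can be passed as $1, e.g. a saved copy used for testing.
swap_is_off() {
  swaps_file="${1:-/proc/swaps}"
  # Count the non-empty lines after the header; 0 means swap is fully off.
  active=$(tail -n +2 "$swaps_file" | grep -c .)
  [ "$active" -eq 0 ]
}

if swap_is_off; then
  echo "swap is off"
else
  echo "swap is still enabled; run swapoff -a"
fi
```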

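Disable swap:

# turn off all swap areas immediately (lasts until the next reboot)
swapoff -a
# comment out the swap entries in /etc/fstab so swap stays off after a reboot
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab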
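The sed line above matches the literal text " swap " with a space on each side, so an fstab entry separated by tabs is missed, and an entry that is already commented out gets a second #. A slightly more defensive sketch follows; it assumes GNU sed (for -E and -i), and the function name comment_out_swap is hypothetical. It skips comment lines and only touches entries whose third field (the filesystem type) is swap.

```shell
#!/bin/sh
# Hypothetical helper: comment out active swap entries in an fstab-style file.
# Assumes GNU sed (-E for extended regexes, -i for in-place editing).
comment_out_swap() {
  fstab="${1:-/etc/fstab}"
  cp "$fstab" "$fstab.bak"   # keep a backup before editing in place
  # Skip lines that are already comments; for the rest, prefix "#" when the
  # third whitespace-separated field is exactly "swap" (& = the matched prefix).
  sed -E -i '/^[[:space:]]*#/!s/^[[:space:]]*[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]]/#&/' "$fstab"
}

# Example: comment_out_swap /etc/fstab
```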

From the master node, describe the failing node:

[root@wl-master /home/ubuntu]# kubectl describe node worker02-wl-2
Name:               worker02-wl-2
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=worker02-wl-2
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"c6:4d:65:5c:87:3a"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.10.10.209
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 25 Oct 2022 06:22:18 +0000
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  worker02-wl-2
  AcquireTime:     <unset>
  RenewTime:       Thu, 03 Nov 2022 02:38:51 +0000
Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Tue, 25 Oct 2022 06:24:16 +0000   Tue, 25 Oct 2022 06:24:16 +0000   FlannelIsUp         Flannel is running on this node
  MemoryPressure       Unknown   Thu, 03 Nov 2022 02:38:36 +0000   Thu, 03 Nov 2022 02:39:32 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Thu, 03 Nov 2022 02:38:36 +0000   Thu, 03 Nov 2022 02:39:32 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Thu, 03 Nov 2022 02:38:36 +0000   Thu, 03 Nov 2022 02:39:32 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Thu, 03 Nov 2022 02:38:36 +0000   Thu, 03 Nov 2022 02:39:32 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:  10.10.10.209
  Hostname:    worker02-wl-2
Capacity:
  cpu:                4
  ephemeral-storage:  40458684Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             4038644Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  37286723113
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3936244Ki
  pods:               110
System Info:
  Machine ID:                 cd1b7061a9d545dd8219916c9737143b
  System UUID:                CD1B7061-A9D5-45DD-8219-916C9737143B
  Boot ID:                    114cf509-6163-490d-9240-3e7246d0d8b7
  Kernel Version:             4.15.0-194-generic
  OS Image:                   Ubuntu 18.04.6 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://18.6.1
  Kubelet Version:            v1.19.2
  Kube-Proxy Version:         v1.19.2
PodCIDR:                      10.244.3.0/24
PodCIDRs:                     10.244.3.0/24
Non-terminated Pods:          (4 in total)
  Namespace                   Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                           ------------  ----------  ---------------  -------------  ---
  istio-system                prometheus-69f7f4d689-cllns    0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h45m
  kube-flannel                kube-flannel-ds-lwztz          100m (2%)     100m (2%)   50Mi (1%)        50Mi (1%)      9d
  kube-system                 kube-flannel-ds-p2w5q          100m (2%)     100m (2%)   50Mi (1%)        50Mi (1%)      9d
  kube-system                 kube-proxy-49mvl               0 (0%)        0 (0%)      0 (0%)           0 (0%)         9d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                200m (5%)   200m (5%)
  memory             100Mi (2%)  100Mi (2%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:              <none>

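In the Conditions section, every condition except NetworkUnavailable has Status Unknown with the message "Kubelet stopped posting node status.", meaning the kubelet on this node has stopped reporting the node's status to the API server. The node has accordingly been given the node.kubernetes.io/unreachable taints. On a healthy node the output looks like this: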

[root@wl-master /home/ubuntu]# kubectl describe node worker02-wl-1
Name:               worker02-wl-1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=worker02-wl-1
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"c6:1c:70:48:b0:cc"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 10.10.10.229
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 19 Oct 2022 14:09:42 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  worker02-wl-1
  AcquireTime:     <unset>
  RenewTime:       Thu, 03 Nov 2022 06:25:15 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Mon, 24 Oct 2022 06:17:58 +0000   Mon, 24 Oct 2022 06:17:58 +0000   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Thu, 03 Nov 2022 06:21:06 +0000   Wed, 19 Oct 2022 14:09:42 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Thu, 03 Nov 2022 06:21:06 +0000   Wed, 19 Oct 2022 14:09:42 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Thu, 03 Nov 2022 06:21:06 +0000   Wed, 19 Oct 2022 14:09:42 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 03 Nov 2022 06:21:06 +0000   Wed, 19 Oct 2022 14:09:43 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.10.10.229
  Hostname:    worker02-wl-1
Capacity:
  cpu:                4
  ephemeral-storage:  40470732Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             4038632Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  37297826550
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3936232Ki
  pods:               110
System Info:
  Machine ID:                 0d5a0ccf23de42b899ac201e08ceb571
  System UUID:                0D5A0CCF-23DE-42B8-99AC-201E08CEB571
  Boot ID:                    fb332d05-2f0d-4668-967c-27ae026644fe
  Kernel Version:             4.15.0-169-generic
  OS Image:                   Ubuntu 18.04.6 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://18.6.1
  Kubelet Version:            v1.19.2
  Kube-Proxy Version:         v1.19.2
PodCIDR:                      10.244.2.0/24
PodCIDRs:                     10.244.2.0/24
Non-terminated Pods:          (14 in total)
  Namespace                   Name                                    CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                    ------------  ----------  ---------------  -------------  ---
  default                     details-v1-79f774bdb9-m498h             10m (0%)      2 (50%)     40Mi (1%)        1Gi (26%)      9d
  default                     productpage-v1-6b746f74dc-q5p5q         10m (0%)      2 (50%)     40Mi (1%)        1Gi (26%)      9d
  default                     ratings-v1-b6994bb9-zcqlt               10m (0%)      2 (50%)     40Mi (1%)        1Gi (26%)      9d
  default                     reviews-v1-545db77b95-mvncw             10m (0%)      2 (50%)     40Mi (1%)        1Gi (26%)      9d
  default                     reviews-v2-7bf8c9648f-g69rm             10m (0%)      2 (50%)     40Mi (1%)        1Gi (26%)      9d
  default                     reviews-v3-84779c7bbc-8fq88             10m (0%)      2 (50%)     40Mi (1%)        1Gi (26%)      9d
  istio-operator              istio-operator-99f9c574d-cjt2m          50m (1%)      200m (5%)   128Mi (3%)       256Mi (6%)     9d
  istio-system                istio-egressgateway-757584858-kgfzq     10m (0%)      2 (50%)     40Mi (1%)        1Gi (26%)      10d
  istio-system                istio-ingressgateway-cf4f68b6d-h4p2q    10m (0%)      2 (50%)     40Mi (1%)        1Gi (26%)      9d
  istio-system                istiod-6c5f5698d-4pn4f                  10m (0%)      0 (0%)      100Mi (2%)       0 (0%)         10d
  kube-flannel                kube-flannel-ds-6bkj5                   100m (2%)     100m (2%)   50Mi (1%)        50Mi (1%)      14d
  kube-system                 coredns-6c76c8bb89-8hmxk                100m (2%)     0 (0%)      70Mi (1%)        170Mi (4%)     9d
  kube-system                 kube-flannel-ds-lwmn4                   100m (2%)     100m (2%)   50Mi (1%)        50Mi (1%)      10d
  kube-system                 kube-proxy-r46s6                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         14d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                440m (11%)   16400m (409%)
  memory             718Mi (18%)  8718Mi (226%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:              <none>
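Comparing the two describe outputs comes down to the Ready row of Conditions: Unknown on worker02-wl-2, True on worker02-wl-1. The sketch below pulls that status out of saved kubectl describe node text; the function name ready_status is hypothetical. On a live cluster, kubectl get node worker02-wl-2 -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' should return the same value directly.

```shell
#!/bin/sh
# Hypothetical helper: print the Status column of the "Ready" condition
# (True, False or Unknown) from `kubectl describe node` text on stdin.
ready_status() {
  awk '
    /^Conditions:/ { in_cond = 1; next }   # start of the Conditions table
    in_cond && /^[^ ]/ { in_cond = 0 }     # next top-level section ends it
    in_cond && $1 == "Ready" { print $2; exit }
  '
}

# Example: kubectl describe node worker02-wl-2 | ready_status
```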

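Next, check whether the kubelet is running: on the worker node, run systemctl status kubelet. If its state is failed, restart Docker first (sudo systemctl restart docker) and then the kubelet (sudo systemctl restart kubelet). If the kubelet is running normally, continue with the steps below.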
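The check-then-restart logic of the previous paragraph can be sketched as a small function. It assumes it is run as root on the worker node (the prompts in this post are root shells), and the SYSTEMCTL override and the function name restart_kubelet_if_down are hypothetical, added only so the logic can be dry-run with a stub.

```shell
#!/bin/sh
# Hypothetical helper: restart Docker and then the kubelet, but only when the
# kubelet unit is not active. Intended to run as root on the NotReady node.
# SYSTEMCTL defaults to systemctl and can name a stub for a dry run.
restart_kubelet_if_down() {
  systemctl_cmd="${SYSTEMCTL:-systemctl}"
  # `systemctl is-active` prints the unit state (active, failed, inactive, ...).
  state=$("$systemctl_cmd" is-active kubelet)
  if [ "$state" = "active" ]; then
    echo "kubelet is active; continue with the next checks"
    return 0
  fi
  echo "kubelet is $state; restarting docker and kubelet"
  "$systemctl_cmd" restart docker
  "$systemctl_cmd" restart kubelet
}

# Example: restart_kubelet_if_down
```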

Check the status of the pods on the NotReady node:

kubectl get pod --all-namespaces -owide |grep worker02-wl-2

The output shows that the flannel network pod on this node has a problem.
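Grepping the wide output for the node name also matches any line that merely contains the string (for example a node named worker02-wl-20). A hedged alternative is to let the API server filter by node with --field-selector spec.nodeName=worker02-wl-2 and then keep only pods whose STATUS is neither Running nor Completed. The helper name not_running is hypothetical, and it assumes the default column order of kubectl get pod -A (NAMESPACE NAME READY STATUS ...).

```shell
#!/bin/sh
# Hypothetical helper: from `kubectl get pod -A` output on stdin, print
# namespace/name and STATUS of every pod that is not Running or Completed.
# Assumes the default -A column order: NAMESPACE NAME READY STATUS ...
not_running() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Example (on the master node):
# kubectl get pod -A -o wide --field-selector spec.nodeName=worker02-wl-2 | not_running
```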

Inspect the failing pod:

[root@wl-master /home/ubuntu]# kubectl describe pod kube-flannel-ds-lwztz -n kube-flannel
Name:                 kube-flannel-ds-lwztz
Namespace:            kube-flannel
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 worker02-wl-2/10.10.10.209
Start Time:           Tue, 25 Oct 2022 06:22:19 +0000
Labels:               app=flannel
                      controller-revision-hash=745f596757
                      pod-template-generation=1
                      tier=node
Annotations:          <none>
Status:               Running
IP:                   10.10.10.209
IPs:
  IP:           10.10.10.209
Controlled By:  DaemonSet/kube-flannel-ds
Init Containers:
  install-cni-plugin:
    Container ID:  docker://060253344ea958620357439a1c1fa1c021f27f47ef832b9c68d298be93320e42
    Image:         docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
    Image ID:      docker-pullable://rancher/mirrored-flannelcni-flannel-cni-plugin@sha256:28d3a6be9f450282bf42e4dad143d41da23e3d91f66f19c01ee7fd21fd17cb2b
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /flannel
      /opt/cni/bin/flannel
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 25 Oct 2022 06:24:35 +0000
      Finished:     Tue, 25 Oct 2022 06:24:35 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /opt/cni/bin from cni-plugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flannel-token-2bnmk (ro)
  install-cni:
    Container ID:  docker://d23820d6f05f19411581a837a74a0b2634923c200704e74541fdd7cf7ca04586
    Image:         docker.io/rancher/mirrored-flannelcni-flannel:v0.20.0
    Image ID:      docker-pullable://rancher/mirrored-flannelcni-flannel@sha256:24e693e10c53c9d5dd78196f77cd5328d6b9d90aff203a37de07d3b040dc938d
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
    Args:
      -f
      /etc/kube-flannel/cni-conf.json
      /etc/cni/net.d/10-flannel.conflist
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 25 Oct 2022 06:25:13 +0000
      Finished:     Tue, 25 Oct 2022 06:25:13 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/cni/net.d from cni (rw)
      /etc/kube-flannel/ from flannel-cfg (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flannel-token-2bnmk (ro)
Containers:
  kube-flannel:
    Container ID:  docker://cc05c1d6adc620ed6c05697380977acc14cff86b885b22bde51d928ba69b4e0c
    Image:         docker.io/rancher/mirrored-flannelcni-flannel:v0.20.0
    Image ID:      docker-pullable://rancher/mirrored-flannelcni-flannel@sha256:24e693e10c53c9d5dd78196f77cd5328d6b9d90aff203a37de07d3b040dc938d
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/bin/flanneld
    Args:
      --ip-masq
      --kube-subnet-mgr
    State:          Waiting
      Reason:       CreateContainerConfigError
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      input/output error
      Exit Code:    128
      Started:      Wed, 26 Oct 2022 12:18:23 +0000
      Finished:     Wed, 26 Oct 2022 12:18:23 +0000
    Ready:          False
    Restart Count:  353
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:           kube-flannel-ds-lwztz (v1:metadata.name)
      POD_NAMESPACE:      kube-flannel (v1:metadata.namespace)
      EVENT_QUEUE_DEPTH:  5000
    Mounts:
      /etc/kube-flannel/ from flannel-cfg (rw)
      /run/flannel from run (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flannel-token-2bnmk (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  run:
    Type:          HostPath (bare host directory volume)
    Path:          /run/flannel
    HostPathType:  
  cni-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  
  cni:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  
  flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-cfg
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  flannel-token-2bnmk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flannel-token-2bnmk
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     :NoSchedule op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:          <none>
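The decisive part of this long output is the State block of the kube-flannel container. The sketch below extracts the reason of every container currently in Waiting state from saved kubectl describe pod text; the function name waiting_reasons is hypothetical. The logs of the previous attempt can be requested with kubectl logs -n kube-flannel kube-flannel-ds-lwztz -c kube-flannel --previous, although with ContainerCannotRun they may well be empty.

```shell
#!/bin/sh
# Hypothetical helper: print the Reason of every container whose current
# State is Waiting, from `kubectl describe pod` text read on stdin.
waiting_reasons() {
  awk '
    /^ *State: +Waiting/ { waiting = 1; next }    # current state is Waiting
    waiting && /^ *Reason:/ { print $2; waiting = 0 }
    /^ *(Last State|Ready):/ { waiting = 0 }      # left the State block
  '
}

# Example: kubectl describe pod kube-flannel-ds-lwztz -n kube-flannel | waiting_reasons
```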

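The kube-flannel container is stuck in Waiting with reason CreateContainerConfigError, and its last run terminated with ContainerCannotRun (input/output error) after 353 restarts. If nothing else helps, re-join the node to the cluster.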
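The post does not spell out how to re-join. A common kubeadm sequence is sketched below as a dry-run printer, under these assumptions: the cluster was built with kubeadm (suggested by the kubeadm.alpha.kubernetes.io/cri-socket annotation), the node is worker02-wl-2, and the function name rejoin_steps is hypothetical. It only prints the commands; review them before running anything, because kubeadm reset wipes the node's local Kubernetes state.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print, in order, the commands for removing a
# node from the cluster and joining it again with kubeadm. Nothing is executed.
rejoin_steps() {
  node="$1"
  echo "[master] kubectl delete node $node"
  echo "[$node] kubeadm reset -f"
  # kubeadm reset does not remove CNI configuration, so clear it explicitly.
  echo "[$node] rm -rf /etc/cni/net.d"
  echo "[master] kubeadm token create --print-join-command"
  echo "[$node] <run the kubeadm join command printed above>"
  echo "[master] kubectl get nodes"
}

rejoin_steps worker02-wl-2
```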

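When a Kubernetes node is NotReady, the following steps can help locate and fix the problem:
1. Check the node status: run `kubectl get nodes` and confirm the node is `Ready`; `NotReady` indicates a problem.
2. Check the node's conditions and events: run `kubectl describe node <node-name>` to look for failures or anomalies.
3. Check the kubelet logs: run `journalctl -u kubelet -n 100` on the node and look for errors or warnings related to the NotReady state.
4. Check the container runtime logs: with Docker as the runtime, run `journalctl -u docker -n 100`; for other runtimes, check their corresponding logs.
5. Check the network: make sure the node can reach the other nodes and the control plane, that the network configuration is correct, and that firewall rules do not block required traffic.
6. Check resource usage: make sure the node has enough CPU, memory and storage to run its pods.
7. Check the configuration: review the kubelet configuration, node labels and similar settings for errors, and make sure the node's configuration matches the cluster's requirements.
8. Restart the kubelet: run `sudo systemctl restart kubelet` and watch whether the node returns to Ready.
9. Contact the hardware vendor: if you suspect a hardware fault or an operating system crash, ask the vendor for support.
10. Check other components: if none of the above helps, inspect other node-related components such as the network plugin and the storage plugin.
Combining several commands and tools gives a more complete picture; depending on the situation, further documentation or community help may be needed.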
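The kubectl get nodes check from the checklist above can be scripted. A minimal sketch that reports every node whose readiness is not Ready, reading kubectl get nodes output on stdin; the function name not_ready_nodes is hypothetical. A cordoned node shows a STATUS such as Ready,SchedulingDisabled, so only the part before the comma is compared.

```shell
#!/bin/sh
# Hypothetical helper: from `kubectl get nodes` output on stdin, print the name
# and STATUS of every node whose readiness (the part of STATUS before any
# comma, e.g. "Ready,SchedulingDisabled") is not "Ready".
not_ready_nodes() {
  awk 'NR > 1 { split($2, s, ","); if (s[1] != "Ready") print $1, $2 }'
}

# Example: kubectl get nodes | not_ready_nodes
```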