[NotReady] Troubleshooting a newly added node in a Kubernetes cluster

1. Check the node status 🔎

The newly added node (node02) is in NotReady status.

[root@master01 ~]# kubectl get nodes
NAME       STATUS     ROLES           AGE    VERSION
master01   Ready      control-plane   24h    v1.28.0
node01     Ready      <none>          24h    v1.28.0
node02     NotReady   <none>          122m   v1.28.0

Check the node's details:

[root@master01 ~]# kubectl describe node node02
Name:               node02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node02
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 24 Aug 2023 09:59:44 +0800
Taints:             node.kubernetes.io/not-ready:NoExecute
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  node02
  AcquireTime:     <unset>
  RenewTime:       Thu, 24 Aug 2023 12:02:30 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Addresses:
  InternalIP:  192.168.20.30
  Hostname:    node02
Capacity:
  cpu:                4
  ephemeral-storage:  27245572Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8107004Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  25109519114
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8004604Ki
  pods:               110
System Info:
  Machine ID:                 8f112fe303914f1e8e27c6b68d205117
  System UUID:                cccb4d56-2724-7bd9-9a5d-25df2e878d03
  Boot ID:                    ee9e1155-e71e-41a2-b07c-d621654a7429
  Kernel Version:             5.14.0-284.25.1.el9_2.x86_64
  OS Image:                   Rocky Linux 9.2 (Blue Onyx)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.23
  Kubelet Version:            v1.28.0
  Kube-Proxy Version:         v1.28.0
PodCIDR:                      10.10.2.0/24
PodCIDRs:                     10.10.2.0/24
Non-terminated Pods:          (2 in total)
  Namespace                   Name                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                     ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-skdz2    100m (2%)     0 (0%)      50Mi (0%)        0 (0%)         122m
  kube-system                 kube-proxy-zj662         0 (0%)        0 (0%)      0 (0%)           0 (0%)         122m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (2%)  0 (0%)
  memory             50Mi (0%)  0 (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:              <none>
[root@master01 ~]#
The key message in the node's Ready condition is:

container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

So node02 is NotReady because of a networking problem: the CNI plugin has not been initialized on it. The next step is to look at the Pods that provide the cluster network.
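
A quick way to confirm this directly on node02 (a sketch, assuming the default containerd/CNI paths) is to check whether a CNI config and the plugin binaries exist, and what the kubelet is logging:

ls /etc/cni/net.d/                            # CNI network config (flannel writes one here once it runs)
ls /opt/cni/bin/                              # CNI plugin binaries
journalctl -u kubelet --no-pager | tail -n 20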

2. Check the Pod status 🔎

kubectl get pods --all-namespaces
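
Adding -o wide shows which node each Pod is scheduled on, which makes it easy to single out the Pods running on node02 (a convenience sketch):

kubectl get pods --all-namespaces -o wide | grep node02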

View the details of the failing flannel Pod (only the key warning event from the output is shown below):

kubectl describe pods/kube-flannel-ds-skdz2 -n kube-flannel
Warning  FailedCreatePodSandBox  3s (x219 over 48m)    kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/7be4fcbe7145f777339cd0a3e43223c9861058af77e2e528b58138aebcbce56d/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown

🔴 Found it. The problem is that the runc executable cannot be found: runc is not on $PATH on node02.
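
To confirm the diagnosis, a quick check on node02 (a sketch; the containerd getting-started guide installs runc to /usr/local/sbin, which is on root's default PATH):

command -v runc || echo "runc not found in PATH"
ls -l /usr/local/sbin/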


3. Root cause and fix ✅

🟢 The problem traces back to how runc was installed on this node: the install path was wrong. The installation steps that had been run are listed below; re-checking the runc install path reveals the mistake.

2️⃣ Step 2: Installing runc

# Download the release binary from https://github.com/opencontainers/runc/releases

$ wget https://github.com/opencontainers/runc/releases/download/v1.1.9/runc.amd64

$ mkdir -p /usr/local/sbin/runc

$ install -m 755 runc.amd64 /usr/local/sbin/runc

The extra `mkdir -p /usr/local/sbin/runc` step is most likely the culprit: it creates a directory at exactly the path where the runc binary should live, so `install` places the binary inside that directory instead of creating the executable /usr/local/sbin/runc, and containerd can then no longer find `runc` on $PATH. After removing the stray directory and re-running the `install` command, the binary ends up where it belongs:

[root@node02 ~]# ll /usr/local/sbin/
total 10436
-rwxr-xr-x 1 root root 10684992 Aug 24 14:53 runc
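
With the binary in place, it is worth verifying on node02 that the runtime can actually find runc now. containerd retries failed sandbox creation on its own, so usually nothing needs to be restarted, but restarting the runtime and kubelet is a harmless nudge if Pods stay stuck (a minimal sketch):

runc --version
systemctl restart containerd kubelet    # optional, only if Pods remain stuck in ContainerCreating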

At this point the flannel (network) Pods are back to Running:

[root@master01 ~]# kubectl get pods -n kube-flannel
NAME                    READY   STATUS    RESTARTS      AGE
kube-flannel-ds-jmsr9   1/1     Running   0             95m
kube-flannel-ds-jpc9k   1/1     Running   2 (45m ago)   95m
kube-flannel-ds-nlr95   1/1     Running   2 (44m ago)   95m

Check node02's details again:

[root@master01 ~]# kubectl describe node node02
Name:               node02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node02
                    kubernetes.io/os=linux
Annotations:        csi.volume.kubernetes.io/nodeid: {"rook-ceph.cephfs.csi.ceph.com":"node02","rook-ceph.rbd.csi.ceph.com":"node02"}
                    flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"12:2a:cd:4a:6a:7c"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.20.30
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 24 Aug 2023 09:59:44 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  node02
  AcquireTime:     <unset>
  RenewTime:       Thu, 24 Aug 2023 15:45:00 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 24 Aug 2023 14:59:09 +0800   Thu, 24 Aug 2023 14:59:09 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:06:28 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:06:28 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:06:28 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:59:09 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.20.30
  Hostname:    node02
Capacity:
  cpu:                4
  ephemeral-storage:  27245572Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8107012Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  25109519114
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8004612Ki
  pods:               110
System Info:
  Machine ID:                 8f112fe303914f1e8e27c6b68d205117
  System UUID:                cccb4d56-2724-7bd9-9a5d-25df2e878d03
  Boot ID:                    2f59eb8b-d2cc-41c5-874a-a2d31a2c0da6
  Kernel Version:             5.14.0-284.25.1.el9_2.x86_64
  OS Image:                   Rocky Linux 9.2 (Blue Onyx)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.23
  Kubelet Version:            v1.28.0
  Kube-Proxy Version:         v1.28.0
PodCIDR:                      10.10.2.0/24
PodCIDRs:                     10.10.2.0/24
Non-terminated Pods:          (8 in total)
  Namespace                   Name                                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-jmsr9                               100m (2%)     0 (0%)      50Mi (0%)        0 (0%)         97m
  kube-system                 kube-proxy-zj662                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         5h45m
  rook-ceph                   csi-cephfsplugin-dbfgd                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         45m
  rook-ceph                   csi-rbdplugin-xvccs                                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         45m
  rook-ceph                   rook-ceph-crashcollector-node02-796978746f-7zfm9    0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  rook-ceph                   rook-ceph-mgr-a-54bf4765f-lskgr                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         39m
  rook-ceph                   rook-ceph-mon-c-b467f78dd-7bwz4                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         45m
  rook-ceph                   rook-ceph-osd-0-7575dcff-bpglm                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (2%)  0 (0%)
  memory             50Mi (0%)  0 (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:
  Type     Reason                   Age                From             Message
  ----     ------                   ----               ----             -------
  Normal   Starting                 48m                kube-proxy
  Normal   Starting                 98m                kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity      98m                kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced  98m                kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasNoDiskPressure    98m (x2 over 98m)  kubelet          Node node02 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     98m (x2 over 98m)  kubelet          Node node02 status is now: NodeHasSufficientPID
  Warning  Rebooted                 98m                kubelet          Node node02 has been rebooted, boot id: 1a7c4fda-ca1d-4db9-8af0-186ec828da5b
  Normal   NodeNotReady             98m                kubelet          Node node02 status is now: NodeNotReady
  Normal   NodeHasSufficientMemory  98m (x2 over 98m)  kubelet          Node node02 status is now: NodeHasSufficientMemory
  Normal   RegisteredNode           48m                node-controller  Node node02 event: Registered Node node02 in Controller
  Warning  InvalidDiskCapacity      48m                kubelet          invalid capacity 0 on image filesystem
  Normal   Starting                 48m                kubelet          Starting kubelet.
  Normal   NodeHasSufficientMemory  48m                kubelet          Node node02 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    48m                kubelet          Node node02 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     48m                kubelet          Node node02 status is now: NodeHasSufficientPID
  Warning  Rebooted                 48m (x2 over 48m)  kubelet          Node node02 has been rebooted, boot id: 2f59eb8b-d2cc-41c5-874a-a2d31a2c0da6
  Normal   NodeAllocatableEnforced  48m                kubelet          Updated Node Allocatable limit across pods
  Normal   NodeReady                45m                kubelet          Node node02 status is now: NodeReady
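
As a final check from the master (a small sketch), node02 should now report Ready, and the node.kubernetes.io/not-ready taints should have been removed automatically, matching the "Taints: <none>" line above:

kubectl get node node02
kubectl get node node02 -o jsonpath='{.spec.taints}'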

4. Summary 🎇

This exercise reinforced a simple troubleshooting workflow: start from the node conditions (kubectl describe node), follow the error message to the failing component, and fix the root cause rather than the symptom. Here the chain led from the NotReady node, to the uninitialized CNI plugin, to the flannel Pod's sandbox-creation failures, and finally to a runc binary installed at the wrong path; once runc was on $PATH, the CNI plugin initialized and the node became Ready. There is always more to learn, but tracing errors back to their source pays off.

5. Appendix: General troubleshooting steps for a NotReady node

In Kubernetes (k8s), when a node shows NotReady, it cannot do its work or has lost contact with the cluster. The general steps for narrowing the problem down are:

### 1. Check the node status

```bash
kubectl get nodes
```

If a node shows `NotReady`, inspect it in detail:

```bash
kubectl describe node <nodename>
```

This shows the node's events, Pod assignments, resource usage, and other important details.

### 2. Verify network connectivity

Kubernetes traffic depends on the underlying network being configured correctly. Confirm that:

- the nodes can reach each other over the network;
- kube-proxy is running and listening on the right ports;
- the CNI plugin (Flannel, Calico, ...) is installed and has started correctly.

Ways to check:

- log in to the problem node and ping the other nodes' IP addresses;
- look for conflicts between the container network and the service CIDR;
- read the CNI daemon logs (e.g. `journalctl -u flanneld` or the Calico service) for errors.

### 3. Check kubelet and the container runtime

Every node runs the kubelet, which manages Pod lifecycles and talks to the control plane, plus a container runtime (containerd here; Docker on older setups). If either stops, the node goes NotReady.

```bash
systemctl status kubelet
systemctl restart kubelet      # if it is not running
```

```bash
systemctl status containerd    # or: systemctl status docker
systemctl restart containerd   # restart to clear transient faults
```

### 4. Watch system resources

High CPU or memory usage can slow the node down or crash it, and a full disk will also take a node out of service, so monitor the host's metrics.

Suggested tools:

- Prometheus + Grafana for cluster-wide dashboards;
- top/htop for a quick local check;
- prune unused images to free disk space.

### 5. Kernel compatibility and time sync

An old kernel or missing kernel modules can leave required features unsupported; update the OS packages or the offending drivers and retest. Finally, make sure all nodes synchronize their clocks against the same time servers.

== The End ==