[NotReady] Troubleshooting a newly added node in a Kubernetes cluster

1. Check the node status 🔎

The newly added node (node02) is in NotReady status.

[root@master01 ~]# kubectl get nodes
NAME       STATUS     ROLES           AGE    VERSION
master01   Ready      control-plane   24h    v1.28.0
node01     Ready      <none>          24h    v1.28.0
node02     NotReady   <none>          122m   v1.28.0

Check the node's details:

[root@master01 ~]# kubectl describe node node02
Name:               node02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node02
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 24 Aug 2023 09:59:44 +0800
Taints:             node.kubernetes.io/not-ready:NoExecute
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  node02
  AcquireTime:     <unset>
  RenewTime:       Thu, 24 Aug 2023 12:02:30 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Thu, 24 Aug 2023 12:00:06 +0800   Thu, 24 Aug 2023 10:07:57 +0800   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Addresses:
  InternalIP:  192.168.20.30
  Hostname:    node02
Capacity:
  cpu:                4
  ephemeral-storage:  27245572Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8107004Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  25109519114
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8004604Ki
  pods:               110
System Info:
  Machine ID:                 8f112fe303914f1e8e27c6b68d205117
  System UUID:                cccb4d56-2724-7bd9-9a5d-25df2e878d03
  Boot ID:                    ee9e1155-e71e-41a2-b07c-d621654a7429
  Kernel Version:             5.14.0-284.25.1.el9_2.x86_64
  OS Image:                   Rocky Linux 9.2 (Blue Onyx)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.23
  Kubelet Version:            v1.28.0
  Kube-Proxy Version:         v1.28.0
PodCIDR:                      10.10.2.0/24
PodCIDRs:                     10.10.2.0/24
Non-terminated Pods:          (2 in total)
  Namespace                   Name                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                     ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-skdz2    100m (2%)     0 (0%)      50Mi (0%)        0 (0%)         122m
  kube-system                 kube-proxy-zj662         0 (0%)        0 (0%)      0 (0%)           0 (0%)         122m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (2%)  0 (0%)
  memory             50Mi (0%)  0 (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:              <none>
[root@master01 ~]#
The key message in the node's Ready condition is:

container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

So node02 is NotReady because of a networking problem: the CNI plugin has not been initialized on it. The next step is to look at the Pods that provide the cluster network.
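
A quick way to confirm this directly on node02 (a sketch, assuming the default containerd/CNI paths) is to check whether a CNI config and the plugin binaries exist, and what the kubelet is logging:

ls /etc/cni/net.d/                            # CNI network config (flannel writes one here once it runs)
ls /opt/cni/bin/                              # CNI plugin binaries
journalctl -u kubelet --no-pager | tail -n 20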

2. Check the Pod status 🔎

kubectl get pods --all-namespaces
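
Adding -o wide shows which node each Pod is scheduled on, which makes it easy to single out the Pods running on node02 (a convenience sketch):

kubectl get pods --all-namespaces -o wide | grep node02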

View the details of the failing flannel Pod (only the key warning event from the output is shown below):

kubectl describe pods/kube-flannel-ds-skdz2 -n kube-flannel
Warning  FailedCreatePodSandBox  3s (x219 over 48m)    kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/7be4fcbe7145f777339cd0a3e43223c9861058af77e2e528b58138aebcbce56d/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown

🔴 Found it. The problem is that the runc executable cannot be found: runc is not on $PATH on node02.
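
To confirm the diagnosis, a quick check on node02 (a sketch; the containerd getting-started guide installs runc to /usr/local/sbin, which is on root's default PATH):

command -v runc || echo "runc not found in PATH"
ls -l /usr/local/sbin/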


3. Root cause and fix ✅

🟢 The problem traces back to how runc was installed on this node: the install path was wrong. The installation steps that had been run are listed below; re-checking the runc install path reveals the mistake.

2️⃣ Step 2: Installing runc

# Download the release binary from https://github.com/opencontainers/runc/releases

$ wget https://github.com/opencontainers/runc/releases/download/v1.1.9/runc.amd64

$ mkdir -p /usr/local/sbin/runc

$ install -m 755 runc.amd64 /usr/local/sbin/runc

The extra `mkdir -p /usr/local/sbin/runc` step is most likely the culprit: it creates a directory at exactly the path where the runc binary should live, so `install` places the binary inside that directory instead of creating the executable /usr/local/sbin/runc, and containerd can then no longer find `runc` on $PATH. After removing the stray directory and re-running the `install` command, the binary ends up where it belongs:

[root@node02 ~]# ll /usr/local/sbin/
total 10436
-rwxr-xr-x 1 root root 10684992 Aug 24 14:53 runc
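
With the binary in place, it is worth verifying on node02 that the runtime can actually find runc now. containerd retries failed sandbox creation on its own, so usually nothing needs to be restarted, but restarting the runtime and kubelet is a harmless nudge if Pods stay stuck (a minimal sketch):

runc --version
systemctl restart containerd kubelet    # optional, only if Pods remain stuck in ContainerCreating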

At this point the flannel (network) Pods are back to Running:

[root@master01 ~]# kubectl get pods -n kube-flannel
NAME                    READY   STATUS    RESTARTS      AGE
kube-flannel-ds-jmsr9   1/1     Running   0             95m
kube-flannel-ds-jpc9k   1/1     Running   2 (45m ago)   95m
kube-flannel-ds-nlr95   1/1     Running   2 (44m ago)   95m

Check node02's details again:

[root@master01 ~]# kubectl describe node node02
Name:               node02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node02
                    kubernetes.io/os=linux
Annotations:        csi.volume.kubernetes.io/nodeid: {"rook-ceph.cephfs.csi.ceph.com":"node02","rook-ceph.rbd.csi.ceph.com":"node02"}
                    flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"12:2a:cd:4a:6a:7c"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.20.30
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 24 Aug 2023 09:59:44 +0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  node02
  AcquireTime:     <unset>
  RenewTime:       Thu, 24 Aug 2023 15:45:00 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 24 Aug 2023 14:59:09 +0800   Thu, 24 Aug 2023 14:59:09 +0800   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:06:28 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:06:28 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:06:28 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 24 Aug 2023 15:40:31 +0800   Thu, 24 Aug 2023 14:59:09 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.20.30
  Hostname:    node02
Capacity:
  cpu:                4
  ephemeral-storage:  27245572Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8107012Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  25109519114
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             8004612Ki
  pods:               110
System Info:
  Machine ID:                 8f112fe303914f1e8e27c6b68d205117
  System UUID:                cccb4d56-2724-7bd9-9a5d-25df2e878d03
  Boot ID:                    2f59eb8b-d2cc-41c5-874a-a2d31a2c0da6
  Kernel Version:             5.14.0-284.25.1.el9_2.x86_64
  OS Image:                   Rocky Linux 9.2 (Blue Onyx)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.23
  Kubelet Version:            v1.28.0
  Kube-Proxy Version:         v1.28.0
PodCIDR:                      10.10.2.0/24
PodCIDRs:                     10.10.2.0/24
Non-terminated Pods:          (8 in total)
  Namespace                   Name                                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                                ------------  ----------  ---------------  -------------  ---
  kube-flannel                kube-flannel-ds-jmsr9                               100m (2%)     0 (0%)      50Mi (0%)        0 (0%)         97m
  kube-system                 kube-proxy-zj662                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         5h45m
  rook-ceph                   csi-cephfsplugin-dbfgd                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         45m
  rook-ceph                   csi-rbdplugin-xvccs                                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         45m
  rook-ceph                   rook-ceph-crashcollector-node02-796978746f-7zfm9    0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
  rook-ceph                   rook-ceph-mgr-a-54bf4765f-lskgr                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         39m
  rook-ceph                   rook-ceph-mon-c-b467f78dd-7bwz4                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         45m
  rook-ceph                   rook-ceph-osd-0-7575dcff-bpglm                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         34m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (2%)  0 (0%)
  memory             50Mi (0%)  0 (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:
  Type     Reason                   Age                From             Message
  ----     ------                   ----               ----             -------
  Normal   Starting                 48m                kube-proxy
  Normal   Starting                 98m                kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity      98m                kubelet          invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced  98m                kubelet          Updated Node Allocatable limit across pods
  Normal   NodeHasNoDiskPressure    98m (x2 over 98m)  kubelet          Node node02 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     98m (x2 over 98m)  kubelet          Node node02 status is now: NodeHasSufficientPID
  Warning  Rebooted                 98m                kubelet          Node node02 has been rebooted, boot id: 1a7c4fda-ca1d-4db9-8af0-186ec828da5b
  Normal   NodeNotReady             98m                kubelet          Node node02 status is now: NodeNotReady
  Normal   NodeHasSufficientMemory  98m (x2 over 98m)  kubelet          Node node02 status is now: NodeHasSufficientMemory
  Normal   RegisteredNode           48m                node-controller  Node node02 event: Registered Node node02 in Controller
  Warning  InvalidDiskCapacity      48m                kubelet          invalid capacity 0 on image filesystem
  Normal   Starting                 48m                kubelet          Starting kubelet.
  Normal   NodeHasSufficientMemory  48m                kubelet          Node node02 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    48m                kubelet          Node node02 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     48m                kubelet          Node node02 status is now: NodeHasSufficientPID
  Warning  Rebooted                 48m (x2 over 48m)  kubelet          Node node02 has been rebooted, boot id: 2f59eb8b-d2cc-41c5-874a-a2d31a2c0da6
  Normal   NodeAllocatableEnforced  48m                kubelet          Updated Node Allocatable limit across pods
  Normal   NodeReady                45m                kubelet          Node node02 status is now: NodeReady
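
As a final check from the master (a small sketch), node02 should now report Ready, and the node.kubernetes.io/not-ready taints should have been removed automatically, matching the "Taints: <none>" line above:

kubectl get node node02
kubectl get node node02 -o jsonpath='{.spec.taints}'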

4. Summary 🎇

This exercise reinforced a simple troubleshooting workflow: start from the node conditions (kubectl describe node), follow the error message to the failing component, and fix the root cause rather than the symptom. Here the chain led from the NotReady node, to the uninitialized CNI plugin, to the flannel Pod's sandbox-creation failures, and finally to a runc binary installed at the wrong path; once runc was on $PATH, the CNI plugin initialized and the node became Ready. There is always more to learn, but tracing errors back to their source pays off.

5. Appendix: General troubleshooting steps for a NotReady node

In Kubernetes (k8s), when a node shows NotReady, it cannot do its work or has lost contact with the cluster. The general steps for narrowing the problem down are:

### 1. Check the node status

```bash
kubectl get nodes
```

If a node shows `NotReady`, inspect it in detail:

```bash
kubectl describe node <nodename>
```

This shows the node's events, Pod assignments, resource usage, and other important details.

### 2. Verify network connectivity

Kubernetes traffic depends on the underlying network being configured correctly. Confirm that:

- the nodes can reach each other over the network;
- kube-proxy is running and listening on the right ports;
- the CNI plugin (Flannel, Calico, ...) is installed and has started correctly.

Ways to check:

- log in to the problem node and ping the other nodes' IP addresses;
- look for conflicts between the container network and the service CIDR;
- read the CNI daemon logs (e.g. `journalctl -u flanneld` or the Calico service) for errors.

### 3. Check kubelet and the container runtime

Every node runs the kubelet, which manages Pod lifecycles and talks to the control plane, plus a container runtime (containerd here; Docker on older setups). If either stops, the node goes NotReady.

```bash
systemctl status kubelet
systemctl restart kubelet      # if it is not running
```

```bash
systemctl status containerd    # or: systemctl status docker
systemctl restart containerd   # restart to clear transient faults
```

### 4. Watch system resources

High CPU or memory usage can slow the node down or crash it, and a full disk will also take a node out of service, so monitor the host's metrics.

Suggested tools:

- Prometheus + Grafana for cluster-wide dashboards;
- top/htop for a quick local check;
- prune unused images to free disk space.

### 5. Kernel compatibility and time sync

An old kernel or missing kernel modules can leave required features unsupported; update the OS packages or the offending drivers and retest. Finally, make sure all nodes synchronize their clocks against the same time servers.

== The End ==