1. Check the node overview 🔎
The newly added node is in the NotReady state.
[root@master01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master01 Ready control-plane 24h v1.28.0
node01 Ready <none> 24h v1.28.0
node02 NotReady <none> 122m v1.28.0
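On a cluster with many nodes it can help to filter the listing down to the unhealthy ones. The following is a minimal sketch that was not part of the original session. The function name `not_ready_nodes` is hypothetical, and it assumes the default column layout of `kubectl get nodes` shown above.

```shell
# Hypothetical helper: print the names of nodes whose STATUS is not exactly "Ready".
# Reads `kubectl get nodes` output on stdin; assumes the default columns (NAME STATUS ...).
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage on a live cluster (assumes kubectl is already configured):
#   kubectl get nodes | not_ready_nodes
```

Because the comparison is an exact match, a cordoned node whose status reads `Ready,SchedulingDisabled` would also be listed.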
Look at the detailed information for this node.
[root@master01 ~]# kubectl describe node node02
Name: node02
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=node02
kubernetes.io/os=linux
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Thu, 24 Aug 2023 09:59:44 +0800
Taints: node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: node02
AcquireTime: <unset>
RenewTime: Thu, 24 Aug 2023 12:02:30 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Thu, 24 Aug 2023 12:00:06 +0800 Thu, 24 Aug 2023 10:07:57 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 24 Aug 2023 12:00:06 +0800 Thu, 24 Aug 2023 10:07:57 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 24 Aug 2023 12:00:06 +0800 Thu, 24 Aug 2023 10:07:57 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Thu, 24 Aug 2023 12:00:06 +0800 Thu, 24 Aug 2023 10:07:57 +0800 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Addresses:
InternalIP: 192.168.20.30
Hostname: node02
Capacity:
cpu: 4
ephemeral-storage: 27245572Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8107004Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 25109519114
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8004604Ki
pods: 110
System Info:
Machine ID: 8f112fe303914f1e8e27c6b68d205117
System UUID: cccb4d56-2724-7bd9-9a5d-25df2e878d03
Boot ID: ee9e1155-e71e-41a2-b07c-d621654a7429
Kernel Version: 5.14.0-284.25.1.el9_2.x86_64
OS Image: Rocky Linux 9.2 (Blue Onyx)
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.23
Kubelet Version: v1.28.0
Kube-Proxy Version: v1.28.0
PodCIDR: 10.10.2.0/24
PodCIDRs: 10.10.2.0/24
Non-terminated Pods: (2 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-flannel kube-flannel-ds-skdz2 100m (2%) 0 (0%) 50Mi (0%) 0 (0%) 122m
kube-system kube-proxy-zj662 0 (0%) 0 (0%) 0 (0%) 0 (0%) 122m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (2%) 0 (0%)
memory 50Mi (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events: <none>
[root@master01 ~]#
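The key line is the message on the Ready condition:
container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
The node details point to a network problem: the CNI plugin has not been initialized. The next step is to check the Pods that provide networking.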
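The full `kubectl describe` output is long, and only the Ready condition matters here. As a sketch, the same message can be pulled out directly with kubectl's JSONPath output. The function name `ready_message` is hypothetical and was not part of the original session.

```shell
# Hypothetical helper: print only the message of the Ready condition for a node.
# Uses kubectl's JSONPath output with a filter on the condition type.
ready_message() {
  kubectl get node "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
}

# Usage (assumes kubectl is configured for the cluster):
#   ready_message node02
```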
2. Check the Pod overview 🔎
kubectl get pods --all-namespaces
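Listing every Pod in the cluster can be noisy when only one node is affected. The sketch below narrows the listing to a single node and then to Pods that are not healthy. Both function names are hypothetical, and the second one assumes STATUS is the fourth column of the `--all-namespaces` listing.

```shell
# Hypothetical helper: list Pods scheduled on one node, across all namespaces.
# spec.nodeName is a supported field selector for Pods.
pods_on_node() {
  kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}

# Hypothetical helper: from that listing (read on stdin), print namespace/name and STATUS
# of Pods that are neither Running nor Completed. Assumes STATUS is the fourth column.
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage (assumes kubectl is configured for the cluster):
#   pods_on_node node02 | unhealthy_pods
```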
Look at the details of the corresponding Pod.
kubectl describe pods/kube-flannel-ds-skdz2 -n kube-flannel
Warning FailedCreatePodSandBox 3s (x219 over 48m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/k8s.io/7be4fcbe7145f777339cd0a3e43223c9861058af77e2e528b58138aebcbce56d/log.json: no such file or directory): exec: "runc": executable file not found in $PATH: unknown
🔴 This is the problem: the runc executable cannot be found in $PATH on the node.
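The event can be confirmed on the node itself by checking whether `runc` resolves in `$PATH`. A minimal sketch follows; the function name `check_in_path` is hypothetical. Note that it checks the PATH of the current shell, which may differ from the PATH seen by the containerd service.

```shell
# Hypothetical helper: report whether an executable name resolves in the current PATH.
check_in_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found in PATH" >&2
    return 1
  fi
}

# Usage on node02:
#   check_in_path runc
```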
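3. Root cause and fix ✅
🟢 Investigation showed that runc had been installed to the wrong path on the node. The installation steps are listed below; recheck the runc install path against them.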
2️⃣ Step 2: Installing runc
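# Download the matching release from https://github.com/opencontainers/runc/releases
$ wget https://github.com/opencontainers/runc/releases/download/v1.1.9/runc.amd64
# Create only the parent directory. If /usr/local/sbin/runc exists as a directory,
# install copies the binary to /usr/local/sbin/runc/runc.amd64 and "runc" is not found in $PATH.
$ mkdir -p /usr/local/sbin
$ install -m 755 runc.amd64 /usr/local/sbin/runc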
[root@node02 ~]# ll /usr/local/sbin/
total 10436
-rwxr-xr-x 1 root root 10684992 Aug 24 14:53 runc
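The listing shows `runc` as a regular file with the execute bit set, which is what the container runtime needs. The same check can be scripted, as in the sketch below; the function name `is_executable_file` is hypothetical.

```shell
# Hypothetical helper: succeed only if the path is a regular file with the execute bit set.
# A directory at that path fails the -f test even though directories usually carry the x bit.
is_executable_file() {
  [ -f "$1" ] && [ -x "$1" ]
}

# Usage on node02:
#   is_executable_file /usr/local/sbin/runc && echo "runc binary is in place"
```

Testing for a regular file matters because a directory named `runc` would pass a plain `-x` test while still leaving no `runc` executable in `$PATH`.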
The network Pods are now back to normal.
[root@master01 ~]# kubectl get pods -n kube-flannel
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-jmsr9 1/1 Running 0 95m
kube-flannel-ds-jpc9k 1/1 Running 2 (45m ago) 95m
kube-flannel-ds-nlr95 1/1 Running 2 (44m ago) 95m
Check the details of node02 again.
[root@master01 ~]# kubectl describe node node02
Name: node02
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=node02
kubernetes.io/os=linux
Annotations: csi.volume.kubernetes.io/nodeid: {"rook-ceph.cephfs.csi.ceph.com":"node02","rook-ceph.rbd.csi.ceph.com":"node02"}
flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"12:2a:cd:4a:6a:7c"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.20.30
kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Thu, 24 Aug 2023 09:59:44 +0800
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: node02
AcquireTime: <unset>
RenewTime: Thu, 24 Aug 2023 15:45:00 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Thu, 24 Aug 2023 14:59:09 +0800 Thu, 24 Aug 2023 14:59:09 +0800 FlannelIsUp Flannel is running on this node
MemoryPressure False Thu, 24 Aug 2023 15:40:31 +0800 Thu, 24 Aug 2023 14:06:28 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 24 Aug 2023 15:40:31 +0800 Thu, 24 Aug 2023 14:06:28 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 24 Aug 2023 15:40:31 +0800 Thu, 24 Aug 2023 14:06:28 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Thu, 24 Aug 2023 15:40:31 +0800 Thu, 24 Aug 2023 14:59:09 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.20.30
Hostname: node02
Capacity:
cpu: 4
ephemeral-storage: 27245572Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8107012Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 25109519114
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8004612Ki
pods: 110
System Info:
Machine ID: 8f112fe303914f1e8e27c6b68d205117
System UUID: cccb4d56-2724-7bd9-9a5d-25df2e878d03
Boot ID: 2f59eb8b-d2cc-41c5-874a-a2d31a2c0da6
Kernel Version: 5.14.0-284.25.1.el9_2.x86_64
OS Image: Rocky Linux 9.2 (Blue Onyx)
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.23
Kubelet Version: v1.28.0
Kube-Proxy Version: v1.28.0
PodCIDR: 10.10.2.0/24
PodCIDRs: 10.10.2.0/24
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-flannel kube-flannel-ds-jmsr9 100m (2%) 0 (0%) 50Mi (0%) 0 (0%) 97m
kube-system kube-proxy-zj662 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5h45m
rook-ceph csi-cephfsplugin-dbfgd 0 (0%) 0 (0%) 0 (0%) 0 (0%) 45m
rook-ceph csi-rbdplugin-xvccs 0 (0%) 0 (0%) 0 (0%) 0 (0%) 45m
rook-ceph rook-ceph-crashcollector-node02-796978746f-7zfm9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 34m
rook-ceph rook-ceph-mgr-a-54bf4765f-lskgr 0 (0%) 0 (0%) 0 (0%) 0 (0%) 39m
rook-ceph rook-ceph-mon-c-b467f78dd-7bwz4 0 (0%) 0 (0%) 0 (0%) 0 (0%) 45m
rook-ceph rook-ceph-osd-0-7575dcff-bpglm 0 (0%) 0 (0%) 0 (0%) 0 (0%) 34m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (2%) 0 (0%)
memory 50Mi (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 48m kube-proxy
Normal Starting 98m kubelet Starting kubelet.
Warning InvalidDiskCapacity 98m kubelet invalid capacity 0 on image filesystem
Normal NodeAllocatableEnforced 98m kubelet Updated Node Allocatable limit across pods
Normal NodeHasNoDiskPressure 98m (x2 over 98m) kubelet Node node02 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 98m (x2 over 98m) kubelet Node node02 status is now: NodeHasSufficientPID
Warning Rebooted 98m kubelet Node node02 has been rebooted, boot id: 1a7c4fda-ca1d-4db9-8af0-186ec828da5b
Normal NodeNotReady 98m kubelet Node node02 status is now: NodeNotReady
Normal NodeHasSufficientMemory 98m (x2 over 98m) kubelet Node node02 status is now: NodeHasSufficientMemory
Normal RegisteredNode 48m node-controller Node node02 event: Registered Node node02 in Controller
Warning InvalidDiskCapacity 48m kubelet invalid capacity 0 on image filesystem
Normal Starting 48m kubelet Starting kubelet.
Normal NodeHasSufficientMemory 48m kubelet Node node02 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 48m kubelet Node node02 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 48m kubelet Node node02 status is now: NodeHasSufficientPID
Warning Rebooted 48m (x2 over 48m) kubelet Node node02 has been rebooted, boot id: 2f59eb8b-d2cc-41c5-874a-a2d31a2c0da6
Normal NodeAllocatableEnforced 48m kubelet Updated Node Allocatable limit across pods
Normal NodeReady 45m kubelet Node node02 status is now: NodeReady
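The two facts that matter in the output above are `Ready True` and `Taints: <none>`. As a sketch, both can be checked without reading the full description. The function names are hypothetical, and the 120s default timeout is an arbitrary assumption.

```shell
# Hypothetical helper: block until the node reports Ready, or fail after the timeout.
# The second argument is optional; 120s is an assumed default.
wait_node_ready() {
  kubectl wait --for=condition=Ready "node/$1" --timeout="${2:-120s}"
}

# Hypothetical helper: print the node's taints; empty output means no taints are set.
node_taints() {
  kubectl get node "$1" -o jsonpath='{.spec.taints}'
}

# Usage (assumes kubectl is configured for the cluster):
#   wait_node_ready node02 && node_taints node02
```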
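4. Summary 🎇
This exercise gave me a repeatable troubleshooting path: start from the node status, read the node's Ready condition, then follow the events of the affected Pod down to the root cause, which here was the runc executable missing from $PATH. Going straight to the source of the error is the fastest way to fix it. I still need to keep learning and improve my troubleshooting skills.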