Notes: a k8s node goes NotReady after a cluster certificate renewal
At first, every pod scheduled on node-1 was stuck in Terminating state.
On node-1:
kubectl get pod -A
error: You must be logged in to the server (Unauthorized)
Copy /etc/kubernetes/admin.conf from the master node to node-1, then point KUBECONFIG at it:
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
Then run source ~/.bash_profile
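Appending with >> adds another copy of the export line every time the step is repeated. A minimal sketch of an idempotent version, assuming bash and grep; the function name add_kubeconfig_export is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "export KUBECONFIG=<path>" to a profile file
# only if that exact line is not already present.
add_kubeconfig_export() {
  local profile="$1" kubeconfig="$2"
  local line="export KUBECONFIG=${kubeconfig}"
  # -q quiet, -x whole-line match, -F fixed string; a missing file counts as "not found"
  if ! grep -qxF -- "$line" "$profile" 2>/dev/null; then
    printf '%s\n' "$line" >> "$profile"
  fi
}
```

Usage: add_kubeconfig_export ~/.bash_profile /etc/kubernetes/admin.conf && source ~/.bash_profile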
Next, check the node's taints:
kubectl describe node node-1|grep Taint
The node carries the taint node.kubernetes.io/unreachable:NoExecute.
Try removing the taint:
kubectl taint node node-1 node.kubernetes.io/unreachable-
Afterwards the taint came back as node.kubernetes.io/unreachable:NoSchedule.
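kubectl taint removes a taint when the argument ends in "-": "key-" removes every taint with that key regardless of effect, while "key:Effect-" removes only that one. Removing it by hand does not stick here, because the node controller keeps re-adding the taints while the node is still unreachable. A minimal sketch, assuming bash; both function names are hypothetical, and the jsonpath query is one way to list taints as key:effect pairs:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the argument `kubectl taint` expects to REMOVE a taint.
#   taint_removal_arg KEY         -> "KEY-"        (removes all effects for KEY)
#   taint_removal_arg KEY EFFECT  -> "KEY:EFFECT-" (removes only that effect)
taint_removal_arg() {
  local key="$1" effect="${2:-}"
  if [ -n "$effect" ]; then
    printf '%s:%s-\n' "$key" "$effect"
  else
    printf '%s-\n' "$key"
  fi
}

# Hypothetical helper: list a node's taints as "key:effect", one per line.
# Needs kubectl and a working KUBECONFIG; defining it does not run it.
list_taints() {
  kubectl get node "$1" -o jsonpath='{range .spec.taints[*]}{.key}:{.effect}{"\n"}{end}'
}
```

Usage: kubectl taint node node-1 "$(taint_removal_arg node.kubernetes.io/unreachable NoExecute)", then list_taints node-1 to see what the controller has put back.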
Digging through the Kubernetes documentation turned up the list of built-in taints:
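node.kubernetes.io/not-ready: the node is not ready. This corresponds to the NodeCondition Ready being False.
node.kubernetes.io/unreachable: the node is unreachable from the node controller. This corresponds to the NodeCondition Ready being Unknown.
node.kubernetes.io/out-of-disk: the node has run out of disk.
node.kubernetes.io/memory-pressure: the node has memory pressure.
node.kubernetes.io/disk-pressure: the node has disk pressure.
node.kubernetes.io/network-unavailable: the node's network is unavailable.
node.kubernetes.io/unschedulable: the node is unschedulable.
node.cloudprovider.kubernetes.io/uninitialized: when the kubelet is started with an external cloud provider, this taint is set on the node to mark it as unusable. After a controller from the cloud-controller-manager initializes the node, the kubelet removes this taint.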
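When a node is to be evicted, the node controller or the kubelet adds the relevant taints with the NoExecute effect. When the fault condition returns to normal, the kubelet or the node controller can remove them. Documentation: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/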
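The first two entries tie the taint to the node's Ready condition. A minimal sketch of that mapping, assuming bash; both function names are hypothetical, and the jsonpath filter reads the Ready condition's status from the Node object:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the status of the Ready NodeCondition to the taint key
# described in the Kubernetes docs. Prints nothing when the node is Ready.
taint_for_ready_status() {
  case "$1" in
    True)    ;;  # healthy: no not-ready/unreachable taint expected
    False)   echo "node.kubernetes.io/not-ready" ;;
    Unknown) echo "node.kubernetes.io/unreachable" ;;
    *)       echo "unexpected Ready status: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: read the Ready condition's status for a node (needs kubectl).
ready_status() {
  kubectl get node "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}
```

Usage: taint_for_ready_status "$(ready_status node-1)". With the kubelet on node-1 no longer reporting, Ready goes to Unknown, which matches the unreachable taint seen above.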
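In short, this taint means Kubernetes considers the node unable to work and adds the taint to keep Pods from being scheduled onto it. After a long look, it turned out the node itself had an underlying fault. First, check the kubelet status, which is not healthy: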
systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; disabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: inactive (dead)
Docs: https://kubernetes.io/docs/
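Two details in this output matter: "Active: inactive (dead)" means the kubelet is not running, and "disabled" on the Loaded line means systemd will not start it at boot. In scripts, systemctl is-active kubelet and systemctl is-enabled kubelet print these states directly; a minimal sketch, assuming bash and awk, that pulls the Active field out of saved systemctl status text (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl status` output on stdin and print the
# value of the first "Active:" field, e.g. "inactive (dead)".
active_field() {
  awk '/^[ \t]*Active:/ { sub(/^[ \t]*Active:[ \t]*/, ""); print; exit }'
}
```

Usage: systemctl status kubelet | active_field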
Check the kubelet logs with:
journalctl -xefu kubelet
The same failure repeats on every restart:
9月 11 17:06:14 node-1 systemd[1]: kubelet.service holdoff time over, scheduling restart.
9月 11 17:06:14 node-1 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished shutting down.
9月 11 17:06:14 node-1 systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished starting up.
--
-- The start-up result is done.
9月 11 17:06:14 node-1 kubelet[11167]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
9月 11 17:06:14 node-1 kubelet[11167]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
9月 11 17:06:14 node-1 kubelet[11167]: I0911 17:06:14.821215 11167 server.go:417] Version: v1.18.0
9月 11 17:06:14 node-1 kubelet[11167]: I0911 17:06:14.821648 11167 plugins.go:100] No cloud provider specified.
9月 11 17:06:14 node-1 kubelet[11167]: I0911 17:06:14.821680 11167 server.go:837] Client rotation is on, will bootstrap in background
9月 11 17:06:14 node-1 kubelet[11167]: E0911 17:06:14.825197 11167 bootstrap.go:265] part of the existing bootstrap client certificate is expired: 2024-09-06 06:47:45 +0000 UTC
9月 11 17:06:14 node-1 kubelet[11167]: F0911 17:06:14.825242 11167 server.go:274] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
9月 11 17:06:14 node-1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
9月 11 17:06:14 node-1 systemd[1]: Unit kubelet.service entered failed state.
9月 11 17:06:14 node-1 systemd[1]: kubelet.service failed.
9月 11 17:06:24 node-1 systemd[1]: kubelet.service holdoff time over, scheduling restart.
9月 11 17:06:24 node-1 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished shutting down.
9月 11 17:06:24 node-1 systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished starting up.
--
-- The start-up result is done.
9月 11 17:06:25 node-1 kubelet[11236]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
9月 11 17:06:25 node-1 kubelet[11236]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
9月 11 17:06:25 node-1 kubelet[11236]: I0911 17:06:25.053194 11236 server.go:417] Version: v1.18.0
9月 11 17:06:25 node-1 kubelet[11236]: I0911 17:06:25.053574 11236 plugins.go:100] No cloud provider specified.
9月 11 17:06:25 node-1 kubelet[11236]: I0911 17:06:25.053611 11236 server.go:837] Client rotation is on, will bootstrap in background
9月 11 17:06:25 node-1 kubelet[11236]: E0911 17:06:25.057193 11236 bootstrap.go:265] part of the existing bootstrap client certificate is expired: 2024-09-06 06:47:45 +0000 UTC
9月 11 17:06:25 node-1 kubelet[11236]: F0911 17:06:25.057233 11236 server.go:274] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
9月 11 17:06:25 node-1 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
9月 11 17:06:25 node-1 systemd[1]: Unit kubelet.service entered failed state.
9月 11 17:06:25 node-1 systemd[1]: kubelet.service failed.
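Most of this journal output is systemd restart noise. The kubelet logs in klog format, where each message starts with a severity letter (I, W, E, F) followed by the date as MMDD, so the errors and fatals can be filtered out. A minimal sketch, assuming bash and grep -E; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only kubelet lines logged at klog severity
# E (error) or F (fatal), e.g. "kubelet[11167]: E0911 17:06:14.825197 ...".
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} '
}
```

Usage: journalctl -u kubelet --no-pager | kubelet_errors. On this node it would leave only the bootstrap.go and server.go lines.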
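The key lines: the existing client certificate expired at 2024-09-06 06:47:45 UTC, so the kubelet fell back to bootstrapping with /etc/kubernetes/bootstrap-kubelet.conf, and that file does not exist on the node (unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory).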
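The expiry can be confirmed before touching any files. On kubeadm nodes the kubelet client certificate usually lives at /var/lib/kubelet/pki/kubelet-client-current.pem (an assumption; check the client-certificate entry in /etc/kubernetes/kubelet.conf), and openssl x509 -checkend N exits non-zero if the certificate expires within N seconds. For the control-plane certificates, kubeadm v1.18 provides kubeadm alpha certs check-expiration on the master. A minimal sketch, assuming bash and OpenSSL; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the PEM certificate in $1 expires
# within the next $2 seconds (default 0, i.e. "has it already expired?").
# Exit 1 if it stays valid that long, 2 if the file cannot be read.
# A file openssl cannot parse is also reported as expiring (exit 0).
cert_expires_within() {
  local cert="$1" seconds="${2:-0}"
  [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 2; }
  # openssl -checkend exits 0 when the cert will NOT expire within N seconds
  if openssl x509 -noout -checkend "$seconds" -in "$cert" >/dev/null; then
    return 1
  fi
  return 0
}
```

Usage: cert_expires_within /var/lib/kubelet/pki/kubelet-client-current.pem && echo expired; openssl x509 -noout -enddate -in the same file prints the exact notAfter date.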
The fix, on node-1:
cp -a /etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf
systemctl daemon-reload && systemctl restart kubelet
After the restart, the node recovered.
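The fix above overwrites kubelet.conf in place. A minimal sketch of the same step that first keeps a timestamped backup of the file being replaced, assuming bash and cp -a; the function name is hypothetical. Note that admin.conf carries the cluster-admin credentials, so after this copy the kubelet authenticates as that admin user rather than with its own node identity.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy $1 over $2, first saving any existing $2 as
# $2.bak.<timestamp>. Prints the backup path (empty if nothing was backed up).
replace_with_backup() {
  local src="$1" dst="$2" backup=""
  [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
  if [ -e "$dst" ]; then
    backup="${dst}.bak.$(date +%Y%m%d%H%M%S)"
    cp -a "$dst" "$backup"
  fi
  cp -a "$src" "$dst"
  printf '%s\n' "$backup"
}
```

Usage: replace_with_backup /etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf, then systemctl daemon-reload && systemctl restart kubelet. Since systemctl status showed the unit as disabled, systemctl enable kubelet keeps it starting after a reboot.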