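A record of an atypical kubeadm init failure
Background
The existing Kubernetes environment was running a fairly old version, v1.19.8. A newer version was needed, so I simply reinstalled.
Steps and problem
#1. Install the new version of the components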
yum install -y kubeadm-1.23.0-0 kubelet-1.23.0-0 kubectl-1.23.0-0 --disableexcludes=kubernetes
#2. Redeploy with kubeadm
kubeadm reset
kubeadm init --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --v=5
Error output
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

    Unfortunately, an error has occurred:
        timed out waiting for the condition

    This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.

    Here is one example how you may list all Kubernetes containers running in docker:
        - 'docker ps -a | grep kube | grep -v pause'
Checking confirmed that kubelet was indeed failing to start:
systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since 五 2022-03-04 03:58:54 EST; 2s ago
Docs: http://kubernetes.io/docs/
Process: 16084 ExecStart=/usr/local/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=203/EXEC)
Main PID: 16084 (code=exited, status=203/EXEC)
Tasks: 0
Memory: 0B
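Solution
Searching online did not turn up a fix, since this error is not a typical one. Eventually I noticed this line in the kubelet status output: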
Process: 16084 ExecStart=/usr/local/bin/kubelet
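The old v1.19.8 kubelet had been installed as a standalone binary at /usr/local/bin/kubelet, whereas the new version was installed via yum and lives at /usr/bin/kubelet. I had also deleted /usr/local/bin/kubelet. In other words, when kubeadm initialized the cluster, kubelet was launched from the wrong path and could not start.
On the other hand, the kubelet service left over from the old cluster was still present, so every error reported was a kubelet service failure rather than something like "path does not exist".
In the end, cleaning up the leftovers and creating a symlink fixed it: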
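The path mismatch described above can be checked directly. systemd's exit status 203/EXEC means the executable named in ExecStart could not be run. The following is a minimal sketch, assuming a unit or drop-in file whose ExecStart= line names the binary; the function name check_execstart is hypothetical and is not part of kubeadm or systemd.

```shell
#!/bin/sh
# check_execstart FILE: print the binary named in the last non-empty
# ExecStart= line of a systemd unit/drop-in file and report whether it is
# executable. (Hypothetical helper for illustration only.)
check_execstart() {
    unit_file=$1
    # Drop-ins often contain an empty "ExecStart=" first to reset the
    # inherited command, so keep only lines that carry a value.
    # Only the optional "-" prefix is handled; other systemd prefixes are not.
    bin=$(sed -n 's/^ExecStart=-\{0,1\}\([^[:space:]]\{1,\}\).*/\1/p' "$unit_file" | tail -n 1)
    if [ -z "$bin" ]; then
        echo "no ExecStart found in $unit_file"
        return 2
    fi
    if [ -x "$bin" ]; then
        echo "ok: $bin"
        return 0
    fi
    echo "missing or not executable: $bin"
    return 1
}
```

On a machine in the state described here, running check_execstart against /etc/systemd/system/kubelet.service.d/10-kubeadm.conf would presumably report /usr/local/bin/kubelet as missing. `systemctl cat kubelet` prints the effective unit file together with its drop-ins, which is a quick way to find which file to inspect.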
# Remove the old service
systemctl disable kubelet
rm -rf /etc/systemd/system/kubelet.service
# Create the symlink
ln -s /usr/bin/kubelet /usr/local/bin/kubelet
# Run init again
kubeadm init --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --v=5
Solved without further trouble.
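To confirm a fix like this, it can help to reload systemd after removing a unit file and then poll the same healthz endpoint that the kubeadm check probes in the error output. The sketch below is an assumption-laden illustration, not part of the original procedure: retry is a hypothetical helper that reruns a command until it succeeds or the attempts run out.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds between tries. (Hypothetical helper.)
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt will follow.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Example usage on the node (commented out so the helper can be sourced safely):
# systemctl daemon-reload
# retry 10 2 curl -sSf http://localhost:10248/healthz
```

The health probe only makes sense once kubeadm init has started kubelet. The attempt count and delay shown in the usage comment are arbitrary illustrative values.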
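This article records an atypical kubelet startup failure during kubeadm init. While upgrading a Kubernetes environment, the kubelet health check failed with a connection refused error. Investigation showed that the new kubelet was installed at a different path from the old one and that a service left over from the old cluster was still present. After cleaning up the leftovers and fixing the kubelet path, the problem was resolved.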