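An atypical kubeadm init failure (kubelet fails to start)

This post records an atypical kubelet startup failure during kubeadm init. While upgrading a Kubernetes environment, the kubelet health check kept failing with "connection refused". Troubleshooting showed that the new kubelet is installed at a different path than the old one, and that services left over from the old cluster were still present. After cleaning up the leftovers and fixing the kubelet path, the problem was resolved.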


Background

The existing Kubernetes environment ran an old version, v1.19.8. I needed a newer one, so I simply reinstalled it.

Steps and problem

#1. Install the new-version components
yum install -y kubeadm-1.23.0-0  kubelet-1.23.0-0 kubectl-1.23.0-0 --disableexcludes=kubernetes

#2. Redeploy with kubeadm
kubeadm reset

kubeadm init --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --v=5
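kubeadm reset cleans up cluster state, but to my knowledge it does not remove binaries or systemd unit files left behind by an earlier binary (non-yum) install. A minimal sketch of a check for such leftovers before running kubeadm init; the path list is an assumption based on the old layout described in this post (binaries in /usr/local/bin, unit and drop-in under /etc/systemd/system), and the optional ROOT argument is a hypothetical addition so the function can be tried against a scratch directory.

```shell
# find_kube_leftovers [ROOT]
# Prints every path from an old binary-style install that still exists.
# The path list is an assumption matching this post's old v1.19.8 layout.
find_kube_leftovers() {
  local root="${1:-}" p   # empty ROOT means "check the real filesystem"
  for p in \
    /usr/local/bin/kubelet \
    /usr/local/bin/kubeadm \
    /usr/local/bin/kubectl \
    /etc/systemd/system/kubelet.service \
    /etc/systemd/system/kubelet.service.d
  do
    # -L additionally reports dangling symlinks, which -e alone would miss
    if [ -e "$root$p" ] || [ -L "$root$p" ]; then
      printf '%s\n' "$root$p"
    fi
  done
}
```

Run without arguments on the host; any printed path is a candidate leftover to review before init.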

Error:

[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.

Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
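The check that keeps failing above is kubeadm polling the kubelet's local health endpoint; the output itself names the equivalent command, curl -sSL http://localhost:10248/healthz. While iterating on a fix, it can help to run the same probe by hand after restarting the service. A minimal polling sketch; the default attempt count and interval are arbitrary choices, not kubeadm's actual timeout values.

```shell
# wait_for_healthz URL [ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 as soon as URL answers with a 2xx status, 1 after ATTEMPTS failures.
wait_for_healthz() {
  local url="$1" attempts="${2:-10}" interval="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes HTTP error statuses count as failure; --max-time bounds each attempt
    if curl -sSf --max-time 2 -o /dev/null "$url" 2>/dev/null; then
      echo "healthy after $i attempt(s): $url"
      return 0
    fi
    # do not sleep after the final attempt
    if [ "$i" -lt "$attempts" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempt(s): $url" >&2
  return 1
}
```

For example: systemctl restart kubelet && wait_for_healthz http://localhost:10248/healthz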

Checking confirmed that kubelet had indeed failed to start:

systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Fri 2022-03-04 03:58:54 EST; 2s ago
     Docs: http://kubernetes.io/docs/
  Process: 16084 ExecStart=/usr/local/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=203/EXEC)
 Main PID: 16084 (code=exited, status=203/EXEC)
    Tasks: 0
   Memory: 0B

Solution

Searching online did not turn up a fix, since this error is not a typical one. Eventually I noticed this line in the kubelet status output:

Process: 16084 ExecStart=/usr/local/bin/kubelet 
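The two useful fields of that line are the ExecStart binary and the exit status at the end of the full Process line in the status output above. A minimal sketch that extracts both; it assumes the "Process: ... ExecStart=BIN ... (code=exited, status=...)" layout shown above and is not a general parser for systemctl status output.

```shell
# parse_exec_status LINE
# Prints "<binary> <status>" for a systemctl status "Process:" line,
# or nothing if the line does not have that layout.
parse_exec_status() {
  printf '%s\n' "$1" |
    sed -n 's/.*ExecStart=\([^ ]*\).*status=\([^)]*\)).*/\1 \2/p'
}
```

On the host, something like parse_exec_status "$(systemctl status kubelet | grep 'Process:')" would print /usr/local/bin/kubelet 203/EXEC for the output above.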

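The old v1.19.8 kubelet had been installed as a standalone binary at /usr/local/bin/kubelet, whereas the new version was installed with yum and lives at /usr/bin/kubelet. On top of that, I had already deleted /usr/local/bin/kubelet. The status=203/EXEC at the end of that line is systemd reporting that it could not execute the ExecStart binary at all. In other words, the kubelet service that kubeadm started during init still pointed at the old, now nonexistent kubelet path, so kubelet could not start.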
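Given the ExecStart path, a quick sanity check is whether it exists and is executable, and where the package actually installed its binary. A minimal sketch; taking the installed path as a second argument is an assumption made so the function does not depend on the current PATH (on the host one might pass "$(command -v kubelet)", which after the yum install should be /usr/bin/kubelet).

```shell
# check_exec_binary UNIT_BIN [INSTALLED_BIN]
# Reports whether the binary a unit's ExecStart points at can actually be executed.
check_exec_binary() {
  local unit_bin="$1" installed="${2:-}"
  # -x alone is also true for searchable directories, so exclude those
  if [ -x "$unit_bin" ] && [ ! -d "$unit_bin" ]; then
    echo "ok: $unit_bin is executable"
    return 0
  fi
  echo "broken: $unit_bin is missing or not executable (systemd would report 203/EXEC)"
  if [ -n "$installed" ] && [ "$installed" != "$unit_bin" ]; then
    echo "hint: the installed binary is $installed; fix ExecStart or link $unit_bin to it"
  fi
  return 1
}
```

For example: check_exec_binary /usr/local/bin/kubelet "$(command -v kubelet)"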

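Meanwhile, the kubelet service left over from the old cluster was still in place, so every error pointed to a generic kubelet service failure rather than saying something like "path does not exist".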
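Why the old path survives: systemd gives unit files under /etc/systemd/system precedence over the package's copy under /usr/lib/systemd/system, and drop-ins in kubelet.service.d/ are applied on top, where an empty "ExecStart=" clears the earlier command and the next "ExecStart=" replaces it. Judging by the $KUBELET_... arguments, the ExecStart shown above most likely comes from the old drop-in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf. On a real host, systemctl cat kubelet prints the unit plus all drop-ins in the order they apply. A minimal sketch that computes the resulting ExecStart binary from files given in that order; it ignores line continuations and other unit syntax, so treat it as an illustration of the override rule rather than a faithful systemd parser.

```shell
# effective_execstart_bin FILE...
# FILEs: the unit file first, then its drop-ins, in the order systemd applies them.
# Prints the binary of the ExecStart that ends up in effect (nothing if cleared).
effective_execstart_bin() {
  awk '
    /^[ \t]*ExecStart=/ {
      cmd = $0
      sub(/^[ \t]*ExecStart=/, "", cmd)
      n = split(cmd, words)            # default FS: split on runs of blanks
      if (n == 0) { bin = ""; next }   # a bare ExecStart= resets earlier values
      bin = words[1]
      sub(/^[-@+!]+/, "", bin)         # drop systemd command prefixes such as -
    }
    END { if (bin != "") print bin }
  ' "$@"
}
```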

In the end, cleaning up the leftovers and creating a symlink fixed it:

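# Remove the old service
systemctl disable kubelet
rm -f /etc/systemd/system/kubelet.service
# Make systemd re-read its unit files so the removal takes effect
systemctl daemon-reload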

# Create a symlink
ln -s /usr/bin/kubelet /usr/local/bin/kubelet

# Re-run init
kubeadm init --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers --v=5
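Before re-running init, it is worth confirming that the link created with ln -s /usr/bin/kubelet /usr/local/bin/kubelet really resolves to the package's binary. A minimal sketch; it assumes readlink (without -f) is available, as in GNU coreutils, and only checks a single level of linking.

```shell
# verify_link LINK TARGET
# Succeeds only if LINK is a symlink pointing directly at TARGET
# and TARGET is an executable file.
verify_link() {
  local link="$1" target="$2"
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link" >&2; return 1
  fi
  if [ "$(readlink "$link")" != "$target" ]; then
    echo "$link points to $(readlink "$link"), expected $target" >&2; return 1
  fi
  if [ ! -x "$target" ] || [ -d "$target" ]; then
    echo "target not an executable file: $target" >&2; return 1
  fi
  echo "ok: $link -> $target"
}
```

For example: verify_link /usr/local/bin/kubelet /usr/bin/kubelet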

Problem solved.
