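Kubernetes cluster initialization: kubeadm fails to start

Before you read

Check first whether your error matches the one below before reading the solution. I am a beginner, and after a lot of searching on Baidu I also picked up a few troubleshooting techniques along the way.
To skip straight to the result, see the Solution section at the end.

Environment
Tencent Cloud, CentOS 7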

Startup command
kubeadm init \
  --apiserver-advertise-address=ip \
  --kubernetes-version v1.18.0 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16

Troubleshooting

First, this is the error output from the failed run:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

        Unfortunately, an error has occurred:
                timed out waiting for the condition

        This error is likely caused by:
                - The kubelet is not running
                - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                - 'systemctl status kubelet'
                - 'journalctl -xeu kubelet'

        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI.

        Here is one example how you may list all Kubernetes containers running in docker:
                - 'docker ps -a | grep kube | grep -v pause'
                Once you have found the failing container, you can inspect its logs with:
                - 'docker logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

The error message suggests checking the logs with journalctl -xeu kubelet. Printing them gives the following:

-- Logs begin at 二 2022-04-19 16:20:01 CST, end at 三 2022-04-20 13:28:56 CST. --
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.386568   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.486711   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.572931   31175 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: [Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: net/http: TLS handshake timeout, Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: dial tcp ip:6443: connect: connection refused]
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.586832   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.686950   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.788088   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: I0420 13:27:31.843736   31175 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
4月 20 13:27:31 k8s-master kubelet[31175]: I0420 13:27:31.845909   31175 kubelet_node_status.go:70] Attempting to register node k8s-master
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.846661   31175 kubelet_node_status.go:92] Unable to register node "k8s-master" with API server: Post https://ip:6443/api/v1/nodes: dial tcp ip:6443: connect: connection refused
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.888212   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.988321   31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:32 k8s-master kubelet[31175]: E0420 13:27:32.005184   31175 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: dial tcp ip:6443: connect: connection refused

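There are two kinds of errors in here, and I looked up both of them:

node "k8s-master" not found: this is an intermediate error. Chasing it is pointless because it is a symptom, not the root cause.
Failed to initialize CSINodeInfo, dial tcp ip:6443: connect: connection refused: this one means the kubelet cannot reach the apiserver on port 6443, so initialization fails.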
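
Because the "not found" line repeats every 100 ms, it helps to collapse the journal down to the distinct error messages. The following is only a sketch: kubelet_distinct_errors is a helper name I made up, and it assumes the klog line layout shown in the journal output above (an E-prefixed date code, then file.go:line] before the message).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubeadm or kubelet).
# Reads kubelet journal lines on stdin and prints each distinct error message
# once, skipping the repeated 'node "..." not found' noise.
kubelet_distinct_errors() {
  grep -E 'kubelet\[[0-9]+\]: E[0-9]{4} ' \
    | grep -v 'not found$' \
    | sed -E 's/^.*[a-z_]+\.go:[0-9]+\] //' \
    | sort -u
}

# Usage on the node (not run here):
#   journalctl -u kubelet --no-pager | kubelet_distinct_errors
```

With the log above, this should leave only the CSINodeInfo and "Unable to register node" messages, both of which point at port 6443.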

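By now it is fairly clear that the apiserver never came up, and docker ps -a also shows that the apiserver container has exited (Exited). While searching for connection refused, I found an article that suggested looking at docker logs, so I printed the apiserver's log: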

W0420 06:27:37.750969       1 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...

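The end of the log is just repeated attempts to connect to port 2379. As that article pointed out, 2379 is the etcd port, so the apiserver failed because etcd failed to start.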
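
The container lookups in this step can be scripted roughly as follows. This is a sketch under assumptions: failed_kube_containers is a hypothetical helper, and it relies on the naming used when the kubelet runs containers through Docker, where names start with k8s_ and the pause containers start with k8s_POD_.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Reads "ID NAME" pairs on stdin and prints the IDs of
# Kubernetes containers, skipping the pause containers (k8s_POD_...).
failed_kube_containers() {
  awk '$2 ~ /^k8s_/ && $2 !~ /^k8s_POD_/ { print $1 }'
}

# Usage on the node (not run here): show the last log lines of every exited
# control-plane container, which here would include the apiserver and etcd.
#   docker ps -a --filter status=exited --format '{{.ID}} {{.Names}}' \
#     | failed_kube_containers \
#     | while read -r id; do docker logs --tail 20 "$id"; done
```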

So next we look at the etcd container's docker log:

2022-04-20 06:31:13.533316 C | etcdmain: listen tcp ip:2380: bind: cannot assign requested address

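The last line is the error: the address cannot be assigned. I first suspected the security group, but that was not it. After more searching I found the answer in a GitHub issue.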
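
A process can only bind to an address that is configured on one of the machine's own interfaces, so "cannot assign requested address" can be checked directly. Below is a minimal sketch, assuming the one-line-per-address output of ip -4 -o addr show; addr_is_local is a hypothetical helper and both addresses in the usage comment are made-up placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Succeeds (exit 0) if ADDRESS appears as an IPv4 address
# in `ip -4 -o addr show` output supplied on stdin, and fails otherwise.
addr_is_local() {
  awk -v addr="$1" '
    $3 == "inet" { split($4, cidr, "/"); if (cidr[1] == addr) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage on the node (not run here; addresses are placeholders):
#   ip -4 -o addr show | addr_is_local 203.0.113.10 || echo "not on any interface"
#   ip -4 -o addr show | addr_is_local 10.0.4.12   && echo "bindable"
```

If the address passed to kubeadm fails this check, etcd has nothing to bind to, which matches the log line above.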

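Solution

This is a public-cloud issue: kubeadm's --apiserver-advertise-address parameter should be set to the server's private (internal) address, not its public address. etcd tries to listen on that address (port 2380 in the log above), and it can only bind to an address that is actually configured on a local network interface.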
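
As a rough sketch of the retry, the private address can be read from the interface list and passed to the same command. Everything here is an assumption to adapt: first_global_ipv4 is a hypothetical helper, it guesses that the first global-scope IPv4 address that is not on docker0 is the private address, and the kubeadm reset step is my addition for clearing what the failed attempt left behind.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Reads `ip -4 -o addr show` output on stdin and prints
# the first global-scope IPv4 address that is not on the docker0 bridge.
first_global_ipv4() {
  awk '$3 == "inet" && $2 != "docker0" && /scope global/ {
         split($4, cidr, "/"); print cidr[1]; exit
       }'
}

# Usage on the node (not run here). Verify the printed address by eye first.
#   PRIVATE_IP=$(ip -4 -o addr show | first_global_ipv4)
#   kubeadm reset -f    # assumption: clean up state from the failed init
#   kubeadm init \
#     --apiserver-advertise-address="$PRIVATE_IP" \
#     --kubernetes-version v1.18.0 \
#     --service-cidr=10.96.0.0/12 \
#     --pod-network-cidr=10.244.0.0/16
```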