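Before you begin
Check that your error matches the one shown here before trying the solution. I am a beginner, and after a lot of searching on Baidu I also picked up a few troubleshooting techniques along the way.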
Environment
Tencent Cloud, CentOS 7
Init command
kubeadm init \
  --apiserver-advertise-address=ip \
  --kubernetes-version v1.18.0 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16
Troubleshooting process
First, this is the error output from the failed init:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
The error message suggests running journalctl -xeu kubelet to view the kubelet log. Doing so printed the following:
-- Logs begin at 二 2022-04-19 16:20:01 CST, end at 三 2022-04-20 13:28:56 CST. --
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.386568 31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.486711 31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.572931 31175 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: [Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: net/http: TLS handshake timeout, Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: dial tcp ip:6443: connect: connection refused]
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.586832 31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.686950 31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.788088 31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: I0420 13:27:31.843736 31175 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
4月 20 13:27:31 k8s-master kubelet[31175]: I0420 13:27:31.845909 31175 kubelet_node_status.go:70] Attempting to register node k8s-master
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.846661 31175 kubelet_node_status.go:92] Unable to register node "k8s-master" with API server: Post https://ip:6443/api/v1/nodes: dial tcp ip:6443: connect: connection refused
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.888212 31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:31 k8s-master kubelet[31175]: E0420 13:27:31.988321 31175 kubelet.go:2267] node "k8s-master" not found
4月 20 13:27:32 k8s-master kubelet[31175]: E0420 13:27:32.005184 31175 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiting for the condition; caused by: Get https://ip:6443/apis/storage.k8s.io/v1/csinodes/k8s-master: dial tcp ip:6443: connect: connection refused
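The log contains two kinds of errors, and I searched for both of them:
node "k8s-master" not found: this is an intermediate error. Searching for it does not help, because it is not the root cause.
Failed to initialize CSINodeInfo, dial tcp ip:6443: connect: connection refused: the kubelet cannot reach the apiserver on port 6443, so initialization fails.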
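Because the kubelet log repeats the same messages many times, it can help to count error lines by the source location that emitted them before deciding which ones to search for. The following is a minimal sketch, assuming the klog line format seen above (E<MMDD> <time> <pid> <file>.go:<line>]); the function name summarize_kubelet_errors is hypothetical and not part of any Kubernetes tool.

```shell
# Count kubelet error lines by source location (file:line).
# Reads klog-formatted log text on stdin; prints "<count> <file>:<line>",
# most frequent first. Only lines at error level (E....) are counted.
summarize_kubelet_errors() {
  grep -oE 'E[0-9]{4} [0-9:.]+ +[0-9]+ [A-Za-z0-9_]+\.go:[0-9]+\]' \
    | awk '{ print $NF }' \
    | tr -d ']' \
    | sort | uniq -c | sort -rn
}

# Usage on the node (not executed here):
#   journalctl -u kubelet --no-pager | summarize_kubelet_errors
```

A location that dominates the count, such as kubelet.go:2267 here, is often just a symptom that repeats while the real failure is logged less frequently, so the rarer entries deserve a look as well.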
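At this point it is fairly clear that the apiserver did not start, and docker ps -a confirms that the apiserver container has exited (Exited). While searching for "connection refused", I found an article that suggested checking docker logs, so I printed the apiserver's log: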
W0420 06:27:37.750969 1 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
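The log ends with repeated attempts to connect to port 2379. As that article points out, 2379 is etcd's client port, so the apiserver failed because etcd failed to start.
So next we look at etcd's docker logs: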
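The chain traced so far (the kubelet cannot reach 6443, the apiserver cannot reach 2379) can be checked from two angles: which control-plane containers have exited, and which of the expected ports are actually listening. Below is a rough sketch of both checks. It assumes Docker is the container runtime and that containers follow the k8s_<container>_<pod>_... naming used by the kubelet's Docker integration, with sandbox (pause) containers named k8s_POD_...; both function names are hypothetical.

```shell
# List exited Kubernetes containers, skipping sandbox (pause) containers.
# stdin: output of  docker ps -a --format '{{.ID}}|{{.Names}}|{{.Status}}'
# Assumption: container names start with k8s_, sandboxes with k8s_POD_.
exited_kube_containers() {
  awk -F'|' '$2 ~ /^k8s_/ && $2 !~ /^k8s_POD_/ && $3 ~ /^Exited/ { print $1, $2 }'
}

# Report whether the apiserver (6443) and etcd (2379, 2380) ports are listening.
# stdin: output of  ss -lnt
control_plane_ports() {
  # Take the port after the last ":" of the local address column.
  listening=$(awk '$1 == "LISTEN" { n = split($4, a, ":"); print a[n] }' | sort -u)
  for port in 6443 2379 2380; do
    if printf '%s\n' "$listening" | grep -qx "$port"; then
      echo "$port listening"
    else
      echo "$port not listening"
    fi
  done
}

# Usage on the node (not executed here):
#   docker ps -a --format '{{.ID}}|{{.Names}}|{{.Status}}' | exited_kube_containers
#   docker logs --tail 20 <CONTAINER_ID>
#   ss -lnt | control_plane_ports
```

Reading the logs of the exited containers from the bottom of the dependency chain upward (etcd first, then the apiserver) tends to reach the root cause faster than starting from the kubelet.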
2022-04-20 06:31:13.533316 C | etcdmain: listen tcp ip:2380: bind: cannot assign requested address
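The last line reports that the address cannot be assigned. I first suspected the security group, but that was not the cause. Further searching on Baidu led to the answer in a GitHub issue.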
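"bind: cannot assign requested address" is the error the kernel returns when a process tries to listen on an address that is not configured on any local network interface. On many cloud hosts the public IP is mapped to the instance by NAT and never appears on the NIC, so this is worth checking for the address passed to --apiserver-advertise-address. A minimal sketch of such a check follows; the function name ip_is_local and the variable ADVERTISE_IP are hypothetical, and the sketch only covers IPv4.

```shell
# Succeed (exit 0) if the given IPv4 address is configured on a local interface.
# $1: address to look for
# stdin: output of  ip -4 -o addr show  (4th field is "<address>/<prefix>")
ip_is_local() {
  awk '{ split($4, a, "/"); print a[1] }' | grep -Fxq -- "$1"
}

# Usage on the node (not executed here); ADVERTISE_IP is a placeholder:
#   ADVERTISE_IP=ip
#   if ip -4 -o addr show | ip_is_local "$ADVERTISE_IP"; then
#     echo "address is bound to a local interface"
#   else
#     echo "address is NOT on any local interface; etcd cannot bind to it"
#   fi
```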