Problem 1:
Running kubectl get nodes, or any other kubectl command, prints the error: Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
Solution:
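A cluster had been created on this machine before, and tearing it down with kubeadm (kubeadm reset) does not delete $HOME/.kube. kubectl therefore keeps using the old kubeconfig, whose CA certificate does not match the new cluster.
Run rm -rf $HOME/.kube to delete the directory created during the previous initialization, then recreate it from the new cluster's admin.conf as printed by kubeadm init (mkdir -p $HOME/.kube; sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config; sudo chown $(id -u):$(id -g) $HOME/.kube/config), and kubectl works again.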
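The cleanup can be wrapped in a small function. This is a sketch under assumptions: reset_kubeconfig is a hypothetical helper name, it moves the old directory aside instead of running rm -rf (a deliberate variation so the stale config can still be inspected), and it assumes kubeadm's default admin kubeconfig path /etc/kubernetes/admin.conf. Both paths are parameters so it can be tried on scratch directories first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace a stale kubeconfig with the current cluster's admin.conf.
# Defaults assume kubeadm's /etc/kubernetes/admin.conf and the per-user ~/.kube directory.
reset_kubeconfig() {
  local src="${1:-/etc/kubernetes/admin.conf}"
  local kube_dir="${2:-$HOME/.kube}"

  if [ ! -r "$src" ]; then
    echo "cannot read $src (run kubeadm init first, or check permissions)" >&2
    return 1
  fi

  # Move the old directory aside instead of rm -rf, so it can be inspected later.
  if [ -d "$kube_dir" ]; then
    mv "$kube_dir" "$kube_dir.bak.$(date +%s)" || return 1
  fi

  # Same steps that kubeadm init prints: create the directory, copy, fix ownership.
  mkdir -p "$kube_dir" &&
    cp "$src" "$kube_dir/config" &&
    chown "$(id -u):$(id -g)" "$kube_dir/config"
}

# Usage, as a user that can read /etc/kubernetes/admin.conf (normally root-only):
#   reset_kubeconfig
```

Moving rather than deleting costs nothing and leaves the old kubeconfig available if it turns out to belong to a cluster you still need.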
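Problem 2:
Initializing Kubernetes fails with: failed to pull image coredns:v1.8.0
The error occurs because the Kubernetes version I am deploying, v1.21.0, depends on the coredns:v1.8.0 image, and the Aliyun (Alibaba Cloud) image registry does not provide coredns:v1.8.0.
Solution:
First list the image names your cluster needs; when retagging, use exactly the name that list shows for your cluster.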
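# list the images kubeadm needs (add the same --image-repository you pass to kubeadm init)
kubeadm config images list
# pull coredns from Docker Hub (its tag has no leading "v")
docker pull coredns/coredns:1.8.0
# retag it to the name kubeadm expects
docker tag coredns/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns/coredns:v1.8.0
# remove the Docker Hub tag
docker rmi -f coredns/coredns:1.8.0
# confirm the retagged image is present
docker images | grep coredns
Finally, run the initialization again.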
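The retag steps can be derived from the name kubeadm reports instead of typed by hand. A sketch under assumptions: dockerhub_coredns_for and retag_coredns are hypothetical helper names, and the mapping assumes Docker Hub's coredns/coredns tags drop the leading "v" that kubeadm uses (1.8.0 vs v1.8.0), as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the coredns retag workaround.
# Assumption: Docker Hub's coredns/coredns tags omit the leading "v" (1.8.0),
# while kubeadm asks for a "v"-prefixed tag (v1.8.0).

# Map the image name kubeadm expects to the Docker Hub image to pull.
dockerhub_coredns_for() {
  local expected="$1"            # e.g. .../coredns/coredns:v1.8.0
  local tag="${expected##*:}"    # text after the last ':' -> v1.8.0
  echo "coredns/coredns:${tag#v}"
}

# Pull from Docker Hub, retag to the expected name, then drop the Docker Hub tag.
retag_coredns() {
  local expected="$1" src
  src="$(dockerhub_coredns_for "$expected")"
  docker pull "$src" &&
    docker tag "$src" "$expected" &&
    docker rmi "$src"            # only removes this tag; the image keeps $expected
}

# Usage (pass the same --image-repository you give kubeadm init):
#   expected="$(kubeadm config images list \
#       --image-repository registry.aliyuncs.com/google_containers | grep coredns)"
#   retag_coredns "$expected"
```

Because the target name comes straight from kubeadm config images list, this also covers the nested coredns/coredns path that kubeadm produces with a custom image repository.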
Problem 3:
When installing the Calico network plugin, one pod is in the Running state but stays at 0/1 ready, i.e. it never finishes initializing:
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.6.120,172.17.6.121,172.17.6.122
Warning Unhealthy 20s kubelet Readiness probe failed
kubectl get pod -n kube-system -owide
Check which node the pod is running on, then delete the extra network interfaces on that node:
ip link
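ifconfig <interface-name> down
ip link delete <interface-name>
Finally, delete the pod (kubectl delete pod <pod-name> -n kube-system); once it is recreated, it becomes ready.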
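Reviewing interfaces is easier with one name per line. A sketch under assumptions: list_ifaces and delete_iface are hypothetical helpers, list_ifaces parses the one-line-per-interface output of ip -o link show, and deciding which interfaces are "extra" is left to you, since the original notes do not name them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print interface names from `ip -o link show` output on stdin,
# optionally filtered by an extended regular expression.
list_ifaces() {
  local pattern="${1:-.}"
  # Each line looks like "5: cali12ab@if3: <BROADCAST,...> mtu 1440 ...";
  # field 2 is the name, and any "@peer" suffix is stripped.
  awk -F': ' '{ name = $2; sub(/@.*/, "", name); print name }' | grep -E "$pattern"
}

# Bring an interface down and delete it (needs root). Double-check the name first:
# deleting the node's primary NIC cuts the node off the network.
delete_iface() {
  ip link set "$1" down && ip link delete "$1"
}

# Usage on the node hosting the 0/1 calico-node pod:
#   ip -o link show | list_ifaces          # review every interface
#   delete_iface <interface-name>          # only interfaces confirmed to be extra
#   kubectl delete pod <pod-name> -n kube-system
```

ip link set ... down is the iproute2 equivalent of ifconfig ... down, so the helper does not depend on the legacy net-tools package.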
Problem 4:
A node fails to join the cluster:
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
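This happens because kubelet and Docker use different cgroup drivers.
Make Docker use the systemd cgroup driver by adding the following to /etc/docker/daemon.json, then restart Docker and kubelet (systemctl restart docker; systemctl restart kubelet) and retry the join, running kubeadm reset on the node first to clear the failed attempt: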
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
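Before retrying the join, it can help to confirm both drivers now agree. A sketch under assumptions: the helper names are hypothetical, kubelet is assumed to read /var/lib/kubelet/config.yaml (the file kubeadm writes during init/join), and Docker's driver is read with docker info --format '{{.CgroupDriver}}'.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare kubelet's and Docker's cgroup drivers.

# Read cgroupDriver from a kubelet config file (default: kubeadm's location).
kubelet_cgroup_driver() {
  local cfg="${1:-/var/lib/kubelet/config.yaml}"
  awk '/^cgroupDriver:/ { print $2 }' "$cfg"
}

# Ask the running Docker daemon which cgroup driver it uses.
docker_cgroup_driver() {
  docker info --format '{{.CgroupDriver}}'
}

# Print both values; return 0 only if they are non-empty and equal.
cgroup_drivers_match() {
  local k="$1" d="$2"
  echo "kubelet=$k docker=$d"
  [ -n "$k" ] && [ "$k" = "$d" ]
}

# Usage, after editing /etc/docker/daemon.json and restarting Docker:
#   cgroup_drivers_match "$(kubelet_cgroup_driver)" "$(docker_cgroup_driver)"
```

If /etc/docker/daemon.json already contains other settings, add the "exec-opts" key to the existing JSON object rather than replacing the file, otherwise those settings are lost.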
Problem 5:
The following error appears:
Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 192.168.177.3,192.168.177.5
2022-06-28 01:49:47.889 [INFO][161] health.go 156: Number of node(s) with BGP peering established = 0
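Cause:
Calico's interface-detection mechanism needs adjusting through the IP_AUTODETECTION_METHOD environment variable. The official manifest does not set this IP detection method, so it defaults to first-found, which can register an IP from the wrong interface as the node IP and break the node-to-node mesh.
Change it to the can-reach method (select the address that can reach a given destination, such as the IP of a node that is Ready) or the interface method (select the address of an interface whose name matches a pattern), so that the correct IP is chosen.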
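Solution:
In the env list of the calico-node container in the Calico manifest, set:
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens.*"
The part after interface= is a regular expression matched against interface names; adjust ens.* to your nodes' NIC naming. Then re-apply the modified manifest with kubectl apply -f <manifest-file>.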
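A quick pre-check shows which interface names a given interface= value would pick up. This is a sketch under assumptions: preview_interface_method is a hypothetical helper, and grep -E is only an approximation of Calico's own regular-expression matching, so treat the result as a sanity check rather than a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper: preview which interface names match an "interface=<regex>"
# value. Reads interface names, one per line, on stdin.
preview_interface_method() {
  local method="$1"                  # e.g. interface=ens.*
  local regex="${method#interface=}"
  if [ "$regex" = "$method" ]; then
    echo "not an interface= method: $method" >&2
    return 2
  fi
  grep -E "$regex"                   # approximation of Calico's matching
}

# Usage on a node:
#   ip -o link show | awk -F': ' '{ n = $2; sub(/@.*/, "", n); print n }' \
#     | preview_interface_method 'interface=ens.*'
# Calico's documentation also shows setting the variable without editing the file
# (namespace depends on how Calico was installed):
#   kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD='interface=ens.*'
```

If the preview lists more than one interface per node, tighten the pattern (for example by anchoring it) until only the NIC carrying node-to-node traffic remains.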