Fixing a kubeadm Cluster Initialization Timeout
Reproducing the problem
Here is the init command I used:
kubeadm init \
--apiserver-advertise-address=192.168.15.128 \
--control-plane-endpoint=192.168.15.199:26443 \
--image-repository registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images \
--kubernetes-version v1.20.9 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=192.168.0.0/16
After running it, kubeadm prints "This can take up to 4m0s", waits for a long time, and then fails with a timeout.

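Troubleshooting
In my case, the same command succeeds once the --control-plane-endpoint flag is removed.
You can run kubeadm reset and retry without that flag; if it still fails, your problem is different from the one described here.
The official documentation recommends setting --control-plane-endpoint: without it, additional control-plane nodes cannot be joined to the cluster later, so for a setup with multiple control-plane nodes the flag is effectively required. Why does init time out when it is set? Run with verbose logging to find out:
# Add -v=9 to print verbose logs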
kubeadm init \
--apiserver-advertise-address=192.168.15.128 \
--control-plane-endpoint=192.168.15.199:26443 \
--image-repository registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images \
--kubernetes-version v1.20.9 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=192.168.0.0/16 \
-v=9
During the wait that previously timed out, kubeadm keeps sending the request curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s':
I0724 22:56:14.619727 12308 waitcontrolplane.go:87] [wait-control-plane] Waiting for the API server to be healthy
I0724 22:56:14.620667 12308 loader.go:379] Config loaded from file: /etc/kubernetes/admin.conf
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I0724 22:56:14.621917 12308 round_trippers.go:425] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:14.622247 12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s in 0 milliseconds
I0724 22:56:14.622261 12308 round_trippers.go:451] Response Headers:
I0724 22:56:15.122724 12308 round_trippers.go:425] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:15.122976 12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s in 0 milliseconds
I0724 22:56:15.122995 12308 round_trippers.go:451] Response Headers:
I0724 22:56:15.622684 12308 round_trippers.go:425] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:15.622870 12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s in 0 milliseconds
I0724 22:56:15.622882 12308 round_trippers.go:451] Response Headers:
I0724 22:56:16.122564 12308 round_trippers.go:425] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:16.122766 12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s in 0 milliseconds
I0724 22:56:16.122778 12308 round_trippers.go:451] Response Headers:
I0724 22:56:16.622672 12308 round_trippers.go:425] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:16.622807 12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s in 0 milliseconds
I0724 22:56:16.622813 12308 round_trippers.go:451] Response Headers:
I0724 22:56:17.122860 12308 round_trippers.go:425] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:17.123074 12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s in 0 milliseconds
I0724 22:56:17.123080 12308 round_trippers.go:451] Response Headers:
I0724 22:56:17.623017 12308 round_trippers.go:425] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:17.623226 12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s in 0 milliseconds
I0724 22:56:17.623237 12308 round_trippers.go:451] Response Headers:
I0724 22:56:18.122627 12308 round_trippers.go:425] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:18.122817 12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s in 0 milliseconds
I0724 22:56:18.122827 12308 round_trippers.go:451] Response Headers:
I0724 22:56:18.622644 12308 round_trippers.go:425] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:18.622823 12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s in 0 milliseconds
I0724 22:56:18.622847 12308 round_trippers.go:451] Response Headers:
I0724 22:56:19.122576 12308 round_trippers.go:425] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:19.123435 12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s in 0 milliseconds
I0724 22:56:19.123447 12308 round_trippers.go:451] Response Headers:
I0724 22:56:19.622659 12308 round_trippers.go:425] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:19.622860 12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s in 0 milliseconds
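Without --control-plane-endpoint, a successful init prints the control-plane address, which is port 6443 on the local machine. So the likely cause is that kubeadm keeps polling the control-plane endpoint for a response, but the configured endpoint address is wrong (or unreachable), so the connection never succeeds.
Why 192.168.15.199:26443? I set up keepalived to provide the virtual IP 192.168.15.199 and had nginx listen on port 26443. telnet 192.168.15.199 26443 connects fine, but the log above shows that kubeadm sends HTTPS requests. My nginx was built without the SSL module, so the proxy rule I had defined in the http block could not handle them.
So the direction to try is making nginx handle these HTTPS requests and forward them to the real control-plane node (this machine, 192.168.15.128:6443).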
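The difference between a TCP-level check (telnet) and an HTTPS-level check can be sketched as follows. This is a minimal sketch and not part of the original setup: the helper names healthz_url and probe_endpoint are hypothetical, and the URL format is copied from the -v=9 log above. Unlike telnet, which only proves that the port accepts connections, the probe shows whether an HTTPS response actually comes back through the proxy.

```shell
#!/usr/bin/env bash
# Build the health-check URL that kubeadm polls (format taken from the -v=9 log).
healthz_url() {
  printf 'https://%s/healthz?timeout=10s' "$1"
}

# Send the same kind of request kubeadm sends; -k skips certificate verification.
# Prints the HTTP status code, or 000 when no HTTP response was received.
probe_endpoint() {
  curl -k -s -o /dev/null --max-time 10 -w '%{http_code}\n' "$(healthz_url "$1")"
}

# Example (run on the control-plane host):
#   probe_endpoint 192.168.15.199:26443
```

If telnet connects but the probe prints 000, the listener is reachable yet nothing behind it is answering HTTPS, which matches the symptom described above.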
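Solution
Adding the nginx stream module
The nginx stream module lets nginx proxy UDP / TCP traffic. HTTPS runs over TCP, so it can be forwarded the same way, without nginx having to terminate TLS.
Go to the nginx source directory (the one containing the ./configure script)
# Compile the stream module into nginx (also pass the options of the original build, which nginx -V lists)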
./configure --with-stream
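Build (do not install)
This creates an objs directory under the current directory. Replace the original nginx binary with the one inside it.
# Build
make
# Back up the original binary
mv /usr/local/nginx/sbin/nginx /usr/local/nginx/sbin/nginx.bak
# Copy the new binary into place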
cp /root/nginx/objs/nginx /usr/local/nginx/sbin/
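Before editing the configuration, it may be worth confirming that the new binary really contains the stream module. A minimal sketch, assuming the binary path used above; has_stream_module is a hypothetical helper that reads the output of nginx -V (which prints the configure arguments on stderr) from standard input.

```shell
#!/usr/bin/env bash
# Succeeds when the configure arguments read from stdin include --with-stream
# (optionally built as a dynamic module). Related flags such as
# --with-stream_ssl_module do not count on their own.
has_stream_module() {
  grep -Eq -- '--with-stream(=dynamic)?( |$)'
}

# Example:
#   /usr/local/nginx/sbin/nginx -V 2>&1 | has_stream_module && echo "stream module present"
```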
Edit the configuration file
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
include /usr/share/nginx/modules/*.conf;
events {
    worker_connections 1024;
}
# stream module proxy
stream {
    log_format main '$remote_addr $upstream_addr - [$time_local] $status $upstream_bytes_sent';
    access_log /var/log/nginx/k8s-access.log main;
    upstream k8s-apiserver {
        server 192.168.15.128:6443; # Master1 APISERVER IP:PORT
        server 192.168.15.136:6443; # Master2 APISERVER IP:PORT
    }
    server {
        listen 26443;
        # Unlike proxy_pass in the http block, the target takes no http:// prefix
        proxy_pass k8s-apiserver;
    }
}
http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    server {
        listen 80 default_server;
    }
}
Restart nginx
nginx -c <path to the configuration file>
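A syntax error in the new stream block would keep nginx from starting, so a configuration check before the restart can help. The sketch below is an illustrative example rather than the author's procedure: start_nginx_checked and the NGINX_BIN variable are hypothetical names, the default binary path is the one used earlier, and the configuration path in the example comment is an assumption. It relies only on the standard nginx -t (test configuration) and -c (configuration file) options.

```shell
#!/usr/bin/env bash
# Path of the nginx binary; override NGINX_BIN if yours lives elsewhere.
NGINX_BIN="${NGINX_BIN:-/usr/local/nginx/sbin/nginx}"

# Test the configuration file given as $1 and start nginx only if the test passes.
start_nginx_checked() {
  if "$NGINX_BIN" -t -c "$1"; then
    "$NGINX_BIN" -c "$1"
  else
    echo "configuration check failed: $1" >&2
    return 1
  fi
}

# Example (stop the old process first if it is still running):
#   start_nginx_checked /usr/local/nginx/conf/nginx.conf
```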
Re-running kubeadm init
# Reset
kubeadm reset
# Initialize again
kubeadm init \
--apiserver-advertise-address=192.168.15.128 \
--control-plane-endpoint=192.168.15.199:26443 \
--image-repository registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images \
--kubernetes-version v1.20.9 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=192.168.0.0/16
Initialization succeeds
I0724 22:59:02.144188 14848 loader.go:379] Config loaded from file: /etc/kubernetes/admin.conf
I0724 22:59:02.144530 14848 loader.go:379] Config loaded from file: /etc/kubernetes/admin.conf
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join 192.168.15.199:26443 --token lnp2w6.2bzbufere9hx99zj \
--discovery-token-ca-cert-hash sha256:50ae6896539bfb1a2127c5cdabd8d0641f34189b1a882e5f0c5dc2290c10f629 \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.15.199:26443 --token lnp2w6.2bzbufere9hx99zj \
--discovery-token-ca-cert-hash sha256:50ae6896539bfb1a2127c5cdabd8d0641f34189b1a882e5f0c5dc2290c10f629
Joining other nodes
# The join succeeds
kubeadm join 192.168.15.199:26443 --token lnp2w6.2bzbufere9hx99zj \
--discovery-token-ca-cert-hash sha256:50ae6896539bfb1a2127c5cdabd8d0641f34189b1a882e5f0c5dc2290c10f629
# kubectl get nodes lists both nodes (they stay NotReady until a pod network add-on is deployed)
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master NotReady control-plane,master 20m v1.20.9
k8s-node1 NotReady <none> 20s v1.20.9
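Other issues
I read many articles along the way, and none of them matched my situation. Here are a few of the fixes they suggest, in case one of them helps:
- --control-plane-endpoint and --apiserver-advertise-address must be internal (private network) addresses.
- --image-repository should point to a mirror that is reachable from your network (a domestic mirror if you are in China); otherwise init may time out while pulling images. Alternatively, download the image bundle in advance, load it with docker load, and set this flag to the matching repository address.
- Other nodes join successfully, but kubectl get nodes on the control-plane node does not show them:
  This can happen when every machine has the same hostname. kubectl get nodes shows only one entry per name, so give each machine its own hostname (sudo hostnamectl set-hostname <name>).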
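One quick way to spot the duplicate-hostname case is to collect the hostname of every machine and look for repeats. The sketch below assumes the names are supplied one per line on standard input (gathered by hand or over ssh); duplicate_hostnames is a hypothetical helper, not a kubeadm or kubectl feature.

```shell
#!/usr/bin/env bash
# Print each hostname that appears more than once in the input (one name per line).
duplicate_hostnames() {
  sort | uniq -d
}

# Example:
#   printf 'k8s-master\nk8s-node1\nk8s-node1\n' | duplicate_hostnames
```

Any name it prints belongs to at least two machines and should be changed on all but one of them before joining.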



