Fixing a kubeadm init Timeout Failure

Reproducing the problem

Here is the init command I used:

kubeadm init \
--apiserver-advertise-address=192.168.15.128 \
--control-plane-endpoint=192.168.15.199:26443 \
--image-repository registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images \
--kubernetes-version v1.20.9 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=192.168.0.0/16

After running it, the output stalls at the "This can take up to 4m0s" message and, after a long wait, the command fails.

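Troubleshooting

In my case, the command above succeeded as soon as I removed the --control-plane-endpoint flag.

You can try the same: run kubeadm reset, then rerun the command without that flag. If it still fails, your problem is different from the one described here.

The official documentation recommends setting --control-plane-endpoint. A cluster initialized without it cannot have additional control-plane nodes joined later, so the flag is required for any setup that is meant to grow beyond one control-plane node. Why does init time out when the flag is present? Print verbose logs and take a look: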

# Add -v=9 to print verbose logs
kubeadm init \
--apiserver-advertise-address=192.168.15.128 \
--control-plane-endpoint=192.168.15.199:26443 \
--image-repository registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images \
--kubernetes-version v1.20.9 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=192.168.0.0/16 \
-v=9

During the phase that previously timed out, kubeadm keeps sending the request curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s':

I0724 22:56:14.619727   12308 waitcontrolplane.go:87] [wait-control-plane] Waiting for the API server to be healthy
I0724 22:56:14.620667   12308 loader.go:379] Config loaded from file:  /etc/kubernetes/admin.conf
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I0724 22:56:14.621917   12308 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:14.622247   12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s  in 0 milliseconds
I0724 22:56:14.622261   12308 round_trippers.go:451] Response Headers:
I0724 22:56:15.122724   12308 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:15.122976   12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s  in 0 milliseconds
I0724 22:56:15.122995   12308 round_trippers.go:451] Response Headers:
I0724 22:56:15.622684   12308 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:15.622870   12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s  in 0 milliseconds
I0724 22:56:15.622882   12308 round_trippers.go:451] Response Headers:
I0724 22:56:16.122564   12308 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:16.122766   12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s  in 0 milliseconds
I0724 22:56:16.122778   12308 round_trippers.go:451] Response Headers:
I0724 22:56:16.622672   12308 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:16.622807   12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s  in 0 milliseconds
I0724 22:56:16.622813   12308 round_trippers.go:451] Response Headers:
I0724 22:56:17.122860   12308 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:17.123074   12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s  in 0 milliseconds
I0724 22:56:17.123080   12308 round_trippers.go:451] Response Headers:
I0724 22:56:17.623017   12308 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:17.623226   12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s  in 0 milliseconds
I0724 22:56:17.623237   12308 round_trippers.go:451] Response Headers:
I0724 22:56:18.122627   12308 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:18.122817   12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s  in 0 milliseconds
I0724 22:56:18.122827   12308 round_trippers.go:451] Response Headers:
I0724 22:56:18.622644   12308 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:18.622823   12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s  in 0 milliseconds
I0724 22:56:18.622847   12308 round_trippers.go:451] Response Headers:
I0724 22:56:19.122576   12308 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:19.123435   12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s  in 0 milliseconds
I0724 22:56:19.123447   12308 round_trippers.go:451] Response Headers:
I0724 22:56:19.622659   12308 round_trippers.go:425] curl -k -v -XGET  -H "Accept: application/json, */*" -H "User-Agent: kubeadm/v1.20.9 (linux/amd64) kubernetes/7a576bc" 'https://192.168.15.199:26443/healthz?timeout=10s'
I0724 22:56:19.622860   12308 round_trippers.go:445] GET https://192.168.15.199:26443/healthz?timeout=10s  in 0 milliseconds

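Without --control-plane-endpoint, the success output reports the control-plane address as this machine's own port 6443. So my inference is that kubeadm keeps polling the control-plane endpoint for a reply, and because the configured endpoint is wrong (or unreachable), the connection is never established.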
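To confirm which address kubeadm is actually polling, the verbose output can be reduced to its distinct health-check URLs. This is a minimal sketch, assuming the -v=9 output was saved to a file; the file name kubeadm-init.log and the helper name healthz_targets are hypothetical and not part of kubeadm.

```shell
#!/usr/bin/env bash
# Print each distinct /healthz URL found in kubeadm's verbose output.
# Reads log text on stdin; the function name is illustrative only.
healthz_targets() {
  grep -oE "https://[^' ]+/healthz" | sort -u
}

# Example (assumes the log was captured beforehand, e.g. with
#   kubeadm init ... -v=9 2>&1 | tee kubeadm-init.log):
#   healthz_targets < kubeadm-init.log
```

If the only URL printed is the --control-plane-endpoint address, that address is the one to check for reachability.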

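Why 192.168.15.199:26443? Because I had set up keepalived with 192.168.15.199 as the virtual IP and had nginx listening on port 26443. telnet 192.168.15.199 26443 connects fine, but the log above shows that kubeadm sends its requests over HTTPS, and my nginx was built without the SSL module, so the proxy rule defined in the http block presumably cannot handle them.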
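The gap between "telnet works" and "HTTPS fails" can be checked directly. The following is a rough sketch rather than a definitive diagnostic: the function names are made up for illustration, it assumes bash (for /dev/tcp), coreutils timeout, and curl are available, and it treats any HTTP response received over TLS as success.

```shell
#!/usr/bin/env bash
# TCP-level check (what telnet verifies): succeeds if the port accepts a connection.
tcp_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# HTTPS-level check (what kubeadm needs): succeeds only if the TLS handshake
# completes and some HTTP response comes back. -k skips certificate verification.
https_healthz() {
  curl -k -sS --max-time 5 -o /dev/null "https://$1:$2/healthz" 2>/dev/null
}

# Print one of: tcp-closed, tcp-open-https-failed, https-ok
check_endpoint() {
  local host=$1 port=$2
  if ! tcp_open "$host" "$port"; then
    echo "tcp-closed"
  elif ! https_healthz "$host" "$port"; then
    echo "tcp-open-https-failed"
  else
    echo "https-ok"
  fi
}

# Example: check_endpoint 192.168.15.199 26443
```

A result of tcp-open-https-failed would match the situation described here: the port is open, but whatever is listening on it does not speak TLS.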

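So I went in that direction: make nginx able to accept this HTTPS traffic and forward it to the real control-plane node (this machine, 192.168.15.128:6443, the --apiserver-advertise-address above).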

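Solution

Add the nginx stream module

The nginx stream module lets nginx proxy UDP and TCP traffic. HTTPS runs on top of TCP, so it works here as well.

Go to the nginx source directory (the one that contains ./configure):

# Compile the stream module into nginx (keep any configure arguments your existing build used; nginx -V lists them)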
./configure --with-stream

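Build (without installing)

After make finishes, the objs directory under the current directory contains the new nginx binary. Use it to replace the original one.

# Build
make

# Back up the original binary
mv /usr/local/nginx/sbin/nginx /usr/local/nginx/sbin/nginx.bak

# Copy the new binary into place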
cp /root/nginx/objs/nginx /usr/local/nginx/sbin/
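Before touching the configuration, it can be worth confirming that the replaced binary really includes the stream module. A small sketch under the assumption that the binary lives at the path used above; has_stream_module is a hypothetical helper that only inspects the configure arguments printed by nginx -V.

```shell
#!/usr/bin/env bash
# Succeeds if the configure arguments read from stdin include --with-stream
# (either built in, or as --with-stream=dynamic).
has_stream_module() {
  grep -qE -- '--with-stream( |=|$)'
}

# Example (nginx -V writes to stderr, hence the 2>&1):
#   /usr/local/nginx/sbin/nginx -V 2>&1 | has_stream_module && echo "stream module present"
```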

Edit the configuration file

user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

# Proxy through the stream module
stream {

    log_format main '$remote_addr $upstream_addr - [$time_local] $status $upstream_bytes_sent';

    access_log /var/log/nginx/k8s-access.log main;

    upstream k8s-apiserver {
        server 192.168.15.128:6443; # Master1 APISERVER IP:PORT 
        server 192.168.15.136:6443; # Master2 APISERVER IP:PORT 
    }

    server {
        listen 26443;
        # Unlike in the http block, the proxied address takes no http:// prefix
        proxy_pass k8s-apiserver;
    }

}

http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
        '$status $body_bytes_sent "$http_referer" '
        '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    server {
        listen 80 default_server;
    }
}

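Restart nginx

# The binary was replaced, so stop any running instance first (nginx -s stop), then start with the new config
nginx -c <path to config file>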

Re-initialize with kubeadm

# Reset
kubeadm reset

# Re-initialize
kubeadm init \
--apiserver-advertise-address=192.168.15.128 \
--control-plane-endpoint=192.168.15.199:26443 \
--image-repository registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images \
--kubernetes-version v1.20.9 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=192.168.0.0/16

Initialization succeeds:

I0724 22:59:02.144188   14848 loader.go:379] Config loaded from file:  /etc/kubernetes/admin.conf
I0724 22:59:02.144530   14848 loader.go:379] Config loaded from file:  /etc/kubernetes/admin.conf

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:

  kubeadm join 192.168.15.199:26443 --token lnp2w6.2bzbufere9hx99zj \
    --discovery-token-ca-cert-hash sha256:50ae6896539bfb1a2127c5cdabd8d0641f34189b1a882e5f0c5dc2290c10f629 \
    --control-plane 

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.15.199:26443 --token lnp2w6.2bzbufere9hx99zj \
    --discovery-token-ca-cert-hash sha256:50ae6896539bfb1a2127c5cdabd8d0641f34189b1a882e5f0c5dc2290c10f629 

Joining other nodes

# Join succeeds
kubeadm join 192.168.15.199:26443 --token lnp2w6.2bzbufere9hx99zj \
    --discovery-token-ca-cert-hash sha256:50ae6896539bfb1a2127c5cdabd8d0641f34189b1a882e5f0c5dc2290c10f629 

# kubectl get nodes lists both nodes (NotReady until a pod network add-on is deployed)
[root@k8s-master ~]# kubectl get nodes
NAME         STATUS     ROLES                  AGE   VERSION
k8s-master   NotReady   control-plane,master   20m   v1.20.9
k8s-node1    NotReady   <none>                 20s   v1.20.9

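Other issues

I went through many articles along the way, and none of them applied to my case. A few of their fixes are listed here in case they help:

  1. --control-plane-endpoint and --apiserver-advertise-address must be internal (private-network) addresses.
  2. --image-repository should point to a mirror in mainland China; otherwise the image pull itself may time out. Alternatively, download the image bundle in advance, load it with docker load, and set this flag to match where those images live.
  3. Another node joins successfully but does not show up in kubectl get nodes on the control-plane node:
    This may be because every machine has the same hostname. kubectl get nodes shows only one entry for identical names, so each machine needs its own hostname (sudo hostnamectl set-hostname <name>).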
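
For the duplicate-hostname case in item 3, a quick check is to collect the hostnames and list any that occur more than once. This is only a sketch: duplicate_hostnames is a hypothetical helper, and the usage line assumes SSH access to the two machines from the nginx upstream block above.

```shell
#!/usr/bin/env bash
# Print every hostname that occurs more than once in the list on stdin
# (one name per line). No output means all names are unique.
duplicate_hostnames() {
  sort | uniq -d
}

# Example (assumes passwordless SSH to each machine):
#   for h in 192.168.15.128 192.168.15.136; do ssh "$h" hostname; done | duplicate_hostnames
```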