k8s pitfall notes

Tang World

Last modified: 2022-08-05 16:21:16

Tags: kubernetes kubelet

First published: 2022-08-05 13:19:07

Article link: https://blog.csdn.net/qq_30326609/article/details/126148654

kubeadm init times out

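kubeadm init keeps timing out because it cannot pull its images. The cause is that the default image registry is not reachable from networks inside mainland China.
Use the command below to go through a mirror instead. It pulls the images that kubeadm needs from the Aliyun registry.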

kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers
Alternatively, pass the mirror directly to init:
kubeadm init --image-repository registry.aliyuncs.com/google_containers
To see which images and versions kubeadm init needs, run:
kubeadm config images list

A more complete command

kubeadm init \
--apiserver-advertise-address 192.168.100.142 \
--image-repository registry.aliyuncs.com/google_containers \
--pod-network-cidr=10.244.0.0/16 \
--service-cidr=10.1.0.0/16

--apiserver-advertise-address: set this to the IP address of your own master node (192.168.100.142 is just the address used in this example).
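
The same flags can also be kept in a kubeadm configuration file, which is easier to review and reuse than a long command line. The following is only a sketch and not something from my original setup: it assumes the kubeadm.k8s.io/v1beta3 config API (the one kubeadm used around 1.23/1.24), and the function name write_kubeadm_config and the output file name are made up for illustration.

```shell
# Sketch: write the kubeadm init flags above into a config file.
# Assumptions: kubeadm.k8s.io/v1beta3 API; function and file names are hypothetical.
write_kubeadm_config() {
  # $1: output file, $2: IP address of the master node
  cat > "$1" << EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: $2
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/google_containers
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.1.0.0/16
EOF
}

# Usage on the master node (replace the IP with your own):
#   write_kubeadm_config kubeadm-config.yaml 192.168.100.142
#   kubeadm config images pull --config kubeadm-config.yaml
#   kubeadm init --config kubeadm-config.yaml
```

If your kubeadm version rejects the apiVersion, kubeadm config print init-defaults shows the version it expects.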

Cannot install the kubelet, kubeadm and kubectl components via apt

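By default apt cannot find these three packages, and I install almost all of my software through apt.
Configure the apt source by following the Aliyun mirror documentation: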
https://developer.aliyun.com/mirror/kubernetes/?spm=a2c6h.25603864.0.0.711325296yOktF

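After configuring the source, updating from Aliyun fails with an error saying that public key xxx is missing. Import that public key into apt so that it can verify the Aliyun source.

Search for the exact commands yourself. The goal is simply to take the key ID given in the error message (a string of random-looking characters) and add it to apt's trusted keys.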
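
As a rough illustration of that idea, the sketch below pulls the key IDs out of apt's error output. It assumes apt reports missing keys in the form "NO_PUBKEY <hex id>"; the function name extract_missing_keys and the sample key ID are made up. The import step is left as a comment because it needs root, and apt-key is deprecated on newer Debian/Ubuntu releases, so treat it as one possible way rather than the way.

```shell
# Sketch: extract missing key IDs from apt's error output.
# Assumption: apt prints missing keys as "NO_PUBKEY <hex id>".
extract_missing_keys() {
  # stdin: apt output; stdout: one unique key ID per line
  grep -oE 'NO_PUBKEY [0-9A-Fa-f]+' | awk '{print $2}' | sort -u
}

# Example with a made-up key ID (not a real key):
sample='W: GPG error: https://mirror.example/apt stable InRelease: NO_PUBKEY 0123456789ABCDEF'
printf '%s\n' "$sample" | extract_missing_keys
# prints: 0123456789ABCDEF

# On a real machine (as root), one possible way to import the keys:
#   apt-get update 2>&1 | extract_missing_keys | while read -r key; do
#     apt-key adv --keyserver keyserver.ubuntu.com --recv-keys "$key"
#   done
```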

[wait-control-plane] wait times out

Problem: the output below appeared during the kubeadm init step above, and then the process hung.

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

Checking kubelet with systemctl status kubelet shows its state as running.

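The cause is as follows:
The problem lies in the container runtime (containerd). According to the official site, Docker as a runtime (dockershim) has been deprecated since 1.20.
Official article: https://kubernetes.io/zh-cn/blog/2020/12/02/dont-panic-kubernetes-and-docker/
By default containerd tries to pull images from Google's registry, which cannot be reached from mainland China, so it keeps retrying and hangs.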

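The original blog post explains it like this:
Unless you install cri-dockerd by hand, k8s now defaults to containerd's socket and uses containerd as the container runtime. containerd's default configuration also points at Google's registry, and during initialization it pulls a sandbox pause image for its own use. If you do not change that image address, init gets stuck at this step.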
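
To check whether a node is affected, it can help to look at which pause image containerd is configured to use. The sketch below only reads the file. It assumes the config is at /etc/containerd/config.toml (containerd's default path) and that sandbox_image is written on one line with a double-quoted value; both function names are made up for illustration.

```shell
# Sketch: report the sandbox (pause) image configured for containerd.
# Assumptions: one-line, double-quoted sandbox_image entry; function names are hypothetical.
sandbox_image_of() {
  # $1: path to a containerd config.toml
  sed -nE 's/^[[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*"([^"]*)".*/\1/p' "$1"
}

uses_upstream_registry() {
  # Succeeds if the pause image comes from the upstream registry.
  # k8s.gcr.io is the old name, registry.k8s.io the newer one.
  case "$(sandbox_image_of "$1")" in
    k8s.gcr.io/*|registry.k8s.io/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a node:
#   uses_upstream_registry /etc/containerd/config.toml \
#     && echo "pause image still points at the upstream registry"
```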

The original blog's fix:
Install version 1.23 of the three components kubelet, kubeadm and kubectl.
Then follow this installation post: https://blog.csdn.net/weixin_46415378/article/details/124435362?spm=1001.2014.3001.5502

More complete tutorials

Installing an older version (below 1.24)

I have not tried this myself, because I found a tutorial for 1.24. The 1.23 installation post I kept as a fallback at the time is:
https://blog.csdn.net/qq_41538097/article/details/124869179

Installing the new version (1.24)

This post also covers why the installation fails and how the container runtime interface (CRI) is defined.
https://blog.51cto.com/flyfish225/5368116

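To keep the cgroup driver used by Docker consistent with the one used by kubelet, it is recommended to edit the following file (on Debian/Ubuntu the equivalent file is /etc/default/kubelet).
# vim /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"
Then just enable kubelet at boot. Because no configuration file has been generated yet, kubelet starts automatically once the cluster has been initialized.
# systemctl enable kubelet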
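
The quotes and the double hyphen in that line are easy to mangle when copying from a web page, so here is a small sketch that writes it with plain ASCII characters. The function name set_kubelet_cgroup_driver is made up, and note that it replaces any existing KUBELET_EXTRA_ARGS line rather than merging with it.

```shell
# Sketch: set KUBELET_EXTRA_ARGS in a kubelet environment file.
# Assumption: overwriting an existing KUBELET_EXTRA_ARGS line is acceptable.
set_kubelet_cgroup_driver() {
  # $1: env file (e.g. /etc/sysconfig/kubelet), $2: driver (default: systemd)
  file="$1"
  driver="${2:-systemd}"
  line="KUBELET_EXTRA_ARGS=\"--cgroup-driver=${driver}\""
  if [ -f "$file" ] && grep -q '^KUBELET_EXTRA_ARGS=' "$file"; then
    # Replace the existing line in place
    sed -i "s|^KUBELET_EXTRA_ARGS=.*|${line}|" "$file"
  else
    # No such line yet: append it
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (as root):
#   set_kubelet_cgroup_driver /etc/sysconfig/kubelet systemd
# To see which driver Docker itself uses:
#   docker info --format '{{.CgroupDriver}}'
```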

Key environment configuration

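#Set the time zone and synchronize time
yum install chrony -y
vim /etc/chrony.conf
#add this line to /etc/chrony.conf, then start chronyd:
#server ntp1.aliyun.com iburst
systemctl enable --now chronyd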

ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
echo 'Asia/Shanghai' > /etc/timezone

#Disable the firewall and SELinux
systemctl stop firewalld
systemctl disable firewalld
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0

#Disable swap
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
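
One thing to watch with the sed line above: it comments out every line containing "swap", including lines that are already commented, so running it twice stacks up # characters. Below is a sketch of a variant that can be run repeatedly. The function name disable_swap_entries is made up, and it assumes the fstab fields are separated by whitespace.

```shell
# Sketch: comment out active swap entries in an fstab-style file, idempotently.
# Assumption: fields are whitespace-separated and "swap" appears as its own field.
disable_swap_entries() {
  # $1: path to the fstab file
  # Skip lines that are already comments; prefix the rest with '#'
  # only if they contain "swap" as a standalone field.
  sed -ri '/^[[:space:]]*#/!s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' "$1"
}

# Usage (as root), keeping a backup first:
#   cp /etc/fstab /etc/fstab.bak
#   swapoff -a
#   disable_swap_entries /etc/fstab
# Afterwards, swapon --show should print nothing.
```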


#Kernel parameter tuning
cat > /etc/sysctl.d/k8s_better.conf << EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
vm.swappiness=0
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_instances=8192
fs.inotify.max_user_watches=1048576
fs.file-max=52706963
fs.nr_open=52706963
net.ipv6.conf.all.disable_ipv6=1
net.netfilter.nf_conntrack_max=2310720
EOF

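modprobe br_netfilter
#nf_conntrack is the current module name (ip_conntrack on very old kernels); load it before applying the settings
modprobe nf_conntrack
lsmod | grep conntrack
sysctl -p /etc/sysctl.d/k8s_better.conf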
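
If a kernel module is missing, some of these keys silently do not exist, so it is worth checking what actually took effect. The sketch below compares a conf file against the live values under /proc/sys. The function name check_sysctl_file is made up, it assumes the key=value format without spaces used in the file above, and the optional second argument exists only so the function can be tried against a scratch directory.

```shell
# Sketch: compare a sysctl conf file with the values currently in effect.
# Assumptions: lines are "key=value" with no spaces; function name is hypothetical.
check_sysctl_file() {
  # $1: sysctl conf file, $2: root of the sysctl tree (default: /proc/sys)
  conf="$1"
  root="${2:-/proc/sys}"
  rc=0
  while IFS='=' read -r key want; do
    # Skip blank lines and comments
    case "$key" in ''|'#'*) continue ;; esac
    # net.ipv4.ip_forward -> <root>/net/ipv4/ip_forward
    path="$root/$(printf '%s' "$key" | tr '.' '/')"
    if [ ! -r "$path" ]; then
      echo "MISSING $key"
      rc=1
    elif [ "$(cat "$path")" != "$want" ]; then
      echo "DIFF $key (want $want, got $(cat "$path"))"
      rc=1
    fi
  done < "$conf"
  return $rc
}

# Usage:
#   check_sysctl_file /etc/sysctl.d/k8s_better.conf && echo "all settings applied"
```

A MISSING line for one of the net.bridge keys usually means br_netfilter is not loaded.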

#Make sure every machine has a different UUID; on a cloned machine, edit the NIC configuration file and delete the UUID line
cat /sys/class/dmi/id/product_uuid
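
With more than two or three machines, comparing UUIDs by eye gets error-prone. As a sketch, the helper below takes lines of "<hostname> <uuid>" and prints any UUID that occurs more than once. The function name duplicate_uuids and the host names in the usage comment are made up, and collecting the values over ssh is just one assumed way of doing it.

```shell
# Sketch: find product UUIDs shared by more than one machine.
# Assumption: input lines look like "<hostname> <uuid>".
duplicate_uuids() {
  # stdin: "<hostname> <uuid>" per line; stdout: UUIDs seen more than once
  # UUIDs are upper-cased first so that case differences do not hide duplicates.
  awk '{print toupper($2)}' | sort | uniq -d
}

# Usage (hypothetical host names, assumes ssh access as root):
#   for h in master node1 node2; do
#     printf '%s %s\n' "$h" "$(ssh "$h" cat /sys/class/dmi/id/product_uuid)"
#   done | duplicate_uuids
# No output means every machine has its own UUID.
```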

Installing containerd

The configuration here is the most critical step. This is what actually solved my problem.

Generate the containerd configuration file
mkdir -p /etc/containerd
#generate the default configuration
containerd config default > /etc/containerd/config.toml
#edit the configuration file
vim /etc/containerd/config.toml
Change SystemdCgroup = false to SystemdCgroup = true

#The pause version to use here can be obtained with kubeadm config images list
Change:
sandbox_image = "k8s.gcr.io/pause:3.7"
to:
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.7"
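
The two edits can also be made without opening an editor. The following is a sketch under the assumption that the file was generated by containerd config default, so each setting sits on its own line with a double-quoted value; the function name patch_containerd_config is made up for illustration.

```shell
# Sketch: apply the two config.toml edits described above with sed.
# Assumptions: default-generated config.toml layout; function name is hypothetical.
patch_containerd_config() {
  # $1: path to config.toml, $2: sandbox (pause) image to use
  sed -i -E \
    -e 's/^([[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*)false/\1true/' \
    -e "s|^([[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*)\"[^\"]*\"|\1\"$2\"|" \
    "$1"
}

# Usage (as root), keeping a backup first:
#   cp /etc/containerd/config.toml /etc/containerd/config.toml.bak
#   patch_containerd_config /etc/containerd/config.toml \
#     registry.aliyuncs.com/google_containers/pause:3.7
#   systemctl restart containerd
```

Indentation is preserved, so the result should differ from the original only in those two values.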

#Enable containerd at boot and restart it so the edited configuration takes effect
systemctl enable containerd
systemctl restart containerd