Version: Kubernetes (k8s) v1.28.2
1. Preparation
- Prepare several Linux machines.
- See https://blog.csdn.net/White_Ink_/article/details/139743058 for setting up the cluster environment.
- Disable swap
Temporarily: sudo swapoff -a
Permanently: sudo sed -ri 's/.*swap.*/#&/' /etc/fstab
- Set kernel parameters
Install bridge-utils: sudo apt-get install -y bridge-utils
Load the br_netfilter module with modprobe: sudo modprobe br_netfilter
Confirm that the module is loaded: lsmod | grep br_netfilter
Check whether the kernel parameter net.bridge.bridge-nf-call-iptables is 1: sudo sysctl -a | grep bridge
If it is not 1, set it with the following commands:
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
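Both prerequisites above can be checked before moving on. The sketch below is one way to do it, assuming a Linux host where active swap areas are listed in /proc/swaps and the bridge setting is exposed at /proc/sys/net/bridge/bridge-nf-call-iptables. The function names are made up for illustration, and each takes the file to read as an argument so it can be tried on sample files.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any Kubernetes tooling.

# swap_active FILE: succeeds if FILE, in /proc/swaps format, lists a swap area.
swap_active() {
    # The first line is a column header; every further non-empty line
    # describes an active swap area.
    [ "$(tail -n +2 "$1" | grep -c .)" -gt 0 ]
}

# bridge_filter_enabled FILE: succeeds if FILE contains exactly "1".
bridge_filter_enabled() {
    [ "$(cat "$1" 2>/dev/null)" = "1" ]
}

# preflight SWAPS_FILE SYSCTL_FILE: print one line per check and
# return non-zero if either check fails.
preflight() {
    rc=0
    if swap_active "$1"; then
        echo "swap: still active"
        rc=1
    else
        echo "swap: off"
    fi
    if bridge_filter_enabled "$2"; then
        echo "bridge-nf-call-iptables: 1"
    else
        echo "bridge-nf-call-iptables: not 1"
        rc=1
    fi
    return "$rc"
}
```

On a real node the call would be: preflight /proc/swaps /proc/sys/net/bridge/bridge-nf-call-iptables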
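2. Install Docker
- For installation, see my other note: https://blog.csdn.net/White_Ink_/article/details/133548415
- Change the cgroup driver
Ubuntu, Debian, and CentOS 7 all use systemd as their init system, and systemd already acts as a cgroup manager. If the container runtime and kubelet use cgroupfs, the node ends up with two cgroup managers, cgroupfs and systemd, and therefore two views of resource allocation. When CPU, memory, or other resources run short, processes on the node can become unstable.
Add the following entry to /etc/docker/daemon.json, then restart Docker with sudo systemctl restart docker: "exec-opts": ["native.cgroupdriver=systemd"]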
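For reference, a complete daemon.json containing only this setting could be written as sketched below. This assumes the file holds no other settings; an existing file should be edited by hand instead, since the helper overwrites its target. write_daemon_json is a hypothetical name, and it takes the target path as an argument so it can be tried on a scratch file first.

```shell
#!/bin/sh
# write_daemon_json TARGET: write a minimal Docker daemon config that selects
# the systemd cgroup driver. Hypothetical helper; overwrites TARGET.
write_daemon_json() {
    cat > "$1" <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
}

# Try it on a scratch file rather than /etc/docker/daemon.json.
scratch=$(mktemp)
write_daemon_json "$scratch"
cat "$scratch"
```

Once the real file is in place and Docker has been restarted, docker info --format '{{.CgroupDriver}}' should report systemd.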
3. Install kubelet, kubeadm, and kubectl
- Configure the Aliyun mirror
sudo apt-get install -y ca-certificates curl software-properties-common apt-transport-https
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
sudo tee /etc/apt/sources.list.d/kubernetes.list <<EOF
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
sudo apt-get update
- Install kubelet, kubeadm, and kubectl
sudo apt-get install -y kubelet kubeadm kubectl
# Alternatively, pin a version. apt uses the form package=version; list the exact version strings available with: apt-cache madison kubeadm
sudo apt-get install -y kubelet=1.28.2-00 kubeadm=1.28.2-00 kubectl=1.28.2-00
# Hold the packages so that apt upgrade skips them. To upgrade later, unhold them first, upgrade, then hold them again.
sudo apt-mark hold kubelet kubeadm kubectl
4. Set up the CRI (cri-dockerd)
- Download
Download the release archive from GitHub,
or fetch it with: wget https://github.com/Mirantis/cri-dockerd/releases/download/v0.3.6/cri-dockerd-0.3.6.amd64.tgz
- Extract
tar -zxvf cri-dockerd-0.3.6.amd64.tgz
sudo mv ./cri-dockerd/cri-dockerd /usr/local/bin/
cri-dockerd --version
- Configure
Put the following in /etc/systemd/system/cri-dockerd.service:
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
[Service]
Type=notify
ExecStart=/usr/local/bin/cri-dockerd --pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9 --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --container-runtime-endpoint=unix:///var/run/cri-dockerd.sock --cri-dockerd-root-directory=/var/lib/dockershim --docker-endpoint=unix:///var/run/docker.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
StartLimitBurst=3
StartLimitInterval=60s
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
Delegate=yes
KillMode=process
[Install]
WantedBy=multi-user.target
Put the following in /etc/systemd/system/cri-dockerd.socket:
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-dockerd.service
[Socket]
ListenStream=/var/run/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker
[Install]
WantedBy=sockets.target
- Start the service
Reload the systemd configuration: sudo systemctl daemon-reload
Enable the service at boot: sudo systemctl enable cri-dockerd
Start the service: sudo systemctl start cri-dockerd
Check the service status: sudo systemctl status cri-dockerd
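kubeadm later talks to cri-dockerd through /var/run/cri-dockerd.sock, so it can help to confirm that the socket exists once the service is up. A minimal sketch, assuming the socket path from the unit file above; wait_for_path is a hypothetical helper that polls once per second.

```shell
#!/bin/sh
# wait_for_path PATH [TIMEOUT_SECONDS]: poll until PATH exists.
# Returns 0 as soon as it exists, 1 if it is still missing after the timeout
# (default 30 seconds).
wait_for_path() {
    path=$1
    remaining=${2:-30}
    while [ ! -e "$path" ]; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

For example: wait_for_path /var/run/cri-dockerd.sock 30 && echo "cri-dockerd socket is present"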
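5. Initialize the master node
- Method 1
This uses the Aliyun image registry and sets the service and pod CIDRs explicitly. They must not be the same as the CIDR of the hosts' LAN.
The command prints a kubeadm join command when it finishes. Note it down; the worker nodes need it to join the cluster.
sudo kubeadm init --kubernetes-version=1.28.2 \
--apiserver-advertise-address=192.168.221.3 \
--image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16 \
--cri-socket=unix:///var/run/cri-dockerd.sock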
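The requirement that the pod CIDR differ from the LAN's CIDR amounts to the two ranges not overlapping. The following is a rough sketch of that check for IPv4 in plain shell arithmetic. The helper names are invented here, and 192.168.221.0/24 is only an assumed LAN range inferred from the advertise address above.

```shell
#!/bin/sh
# Hypothetical helpers for comparing IPv4 CIDR ranges.

# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
    saved_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$saved_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_range ADDR/PREFIX: print "FIRST LAST", the integer bounds of the range.
cidr_range() {
    addr=${1%/*}
    prefix=${1#*/}
    ip=$(ip_to_int "$addr")
    # Netmask with the top $prefix bits set, truncated to 32 bits.
    mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
    first=$(( $ip & $mask ))
    last=$(( $first | (~$mask & 4294967295) ))
    echo "$first $last"
}

# cidrs_overlap CIDR1 CIDR2: succeeds if the two ranges share any address.
cidrs_overlap() {
    # After this, $1/$2 bound the first range and $3/$4 bound the second.
    set -- $(cidr_range "$1") $(cidr_range "$2")
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

if cidrs_overlap 10.244.0.0/16 192.168.221.0/24; then
    echo "overlap: choose another --pod-network-cidr"
else
    echo "no overlap"
fi
```

The same function can be used to compare the service CIDR against the LAN range.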
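- Method 2
Generate the default configuration file: kubeadm config print init-defaults > init.default.yaml
Optionally edit the following settings in the file (the dotted names are paths into the YAML):
# Node IP address
localAPIEndpoint.advertiseAddress: 192.168.11.190
# CRI socket
nodeRegistration.criSocket: unix:///var/run/cri-dockerd.sock
# Node name
nodeRegistration.name: k8s-master1
# Use a mirror registry that is reachable from mainland China
imageRepository: registry.aliyuncs.com/google_containers
# Add podSubnet. flannel is installed later, and it requires the pod address range to be set when the cluster is initialized.
# 10.244.0.0/16 is flannel's default podSubnet; the cluster configuration and the network plugin's configuration must agree.
networking.podSubnet: 10.244.0.0/16
Then initialize with: sudo kubeadm init --config init.default.yaml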
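The dotted setting names above are paths into the YAML file. As a rough illustration of where they sit, the sketch below writes a stripped-down configuration containing only those fields. It assumes the kubeadm.k8s.io/v1beta3 config API that kubeadm 1.28 prints by default; the file generated by kubeadm config print init-defaults contains more fields and remains the better starting point. write_kubeadm_config is a hypothetical helper.

```shell
#!/bin/sh
# write_kubeadm_config TARGET: write an illustrative kubeadm config with the
# fields discussed above. Hypothetical helper; values are the ones from this
# article and must be adapted to the actual node.
write_kubeadm_config() {
    cat > "$1" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.11.190
nodeRegistration:
  criSocket: unix:///var/run/cri-dockerd.sock
  name: k8s-master1
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.28.2
imageRepository: registry.aliyuncs.com/google_containers
networking:
  podSubnet: 10.244.0.0/16
EOF
}

# Write to a scratch file and show the result.
scratch=$(mktemp)
write_kubeadm_config "$scratch"
cat "$scratch"
```

Keeping the settings in a file rather than on the command line makes it easier to review and re-run the initialization with identical parameters.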
- Using kubectl as a non-root user
A non-root user can run kubectl after executing the following commands:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
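6. Join the worker nodes
- Join a node
Run the following on each worker node. Use the join command that kubeadm init printed in the previous step, not the token and hash shown here, and add the --cri-socket option.
sudo kubeadm join 192.168.221.3:6443 --token 16pw7a.7hp1yvbboanjv1ba \
--cri-socket=unix:///var/run/cri-dockerd.sock \
--discovery-token-ca-cert-hash sha256:5457a1a48c135a37da0e12e075e444abbbd14b30c179e6fa99c9cf47793fd62c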
- Verify
The following output means the node joined successfully:
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
On the master node, run kubectl get nodes to list the nodes that have joined.
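If the join command from kubeadm init was not saved, or its token has expired (bootstrap tokens are valid for 24 hours by default), a new one can be printed on the master with kubeadm token create --print-join-command. That output does not include the --cri-socket option used in this setup, so it has to be appended. A small sketch, where with_cri_socket is a hypothetical helper and the token and hash in the example are placeholders:

```shell
#!/bin/sh
# with_cri_socket JOIN_COMMAND [SOCKET]: print JOIN_COMMAND with a
# --cri-socket option appended. SOCKET defaults to the cri-dockerd socket
# used throughout this article.
with_cri_socket() {
    printf '%s --cri-socket=%s\n' "$1" "${2:-unix:///var/run/cri-dockerd.sock}"
}

# Placeholder join command; on a real master it would come from:
#   sudo kubeadm token create --print-join-command
join_cmd='kubeadm join 192.168.221.3:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>'
with_cri_socket "$join_cmd"
```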
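7. Configure a network plugin
Run the following on the master only. Choose one of the options below.
7.1 flannel
- Get the flannel manifest
wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
- If the manifest references images on quay.io and they cannot be pulled, replace quay.io in the file with quay-mirror.qiniu.com
- Apply the manifest to start flannel
kubectl apply -f kube-flannel.yml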
- Wait a moment, then check the node status again: kubectl get nodes
Output similar to:
NAME     STATUS   ROLES    AGE     VERSION
master   Ready    master   15m     v1.17.4
node1    Ready    <none>   8m53s   v1.17.4
node2    Ready    <none>   8m50s   v1.17.4
7.2 Weave Net
- Deploy Weave Net
kubectl apply -f https://github.com/weaveworks/weave/releases/download/v2.8.1/weave-daemonset-k8s.yaml
- Check the node status again: kubectl get nodes
Output similar to:
NAME     STATUS   ROLES    AGE     VERSION
master   Ready    master   15m     v1.17.4
node1    Ready    <none>   8m53s   v1.17.4
node2    Ready    <none>   8m50s   v1.17.4
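With either plugin, nodes take a short while to move from NotReady to Ready. Rather than reading the table by eye, the stragglers can be filtered out of the kubectl output. A minimal sketch, assuming the column layout shown above; not_ready_nodes is a hypothetical helper that reads the table on standard input.

```shell
#!/bin/sh
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the name of every node whose STATUS column is not exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is therefore listed as well.
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}
```

On the master it would be used as: kubectl get nodes --no-headers | not_ready_nodes. Empty output means every node reports Ready.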
Errors encountered
- [ERROR CRI]: container runtime is not running: output: time="2023-10-24T19:20:04+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
Fix: comment out disabled_plugins = ["cri"] in /etc/containerd/config.toml, then restart containerd with systemctl restart containerd.
- [ERROR Port-10250]: Port 10250 is in use
Fix: systemctl stop kubelet
- [kubelet-check] Initial timeout of 40s passed.
The full output was:
Unfortunately, an error has occurred:
    timed out waiting for the condition
This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
    - 'crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock ps -a | grep kube | grep -v pause'
    Once you have found the failing container, you can inspect its logs with:
    - 'crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
Fix: change the permissions of /var/run/cri-dockerd.sock with sudo chmod 777 /var/run/cri-dockerd.sock
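The first error in this list appears when containerd's configuration disables its CRI plugin. A quick way to check for that line before editing the file, sketched with a hypothetical helper that takes the config path as an argument:

```shell
#!/bin/sh
# cri_plugin_disabled FILE: succeeds if FILE has an uncommented
# disabled_plugins line that names "cri".
cri_plugin_disabled() {
    grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*"cri"' "$1"
}
```

For example: cri_plugin_disabled /etc/containerd/config.toml && echo "CRI plugin is disabled in containerd"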