Preparation
Use the Rocky Linux 8.10 operating system and prepare three nodes with the following configuration:
IP | CPU | Memory | Disk | Role | Hostname |
---|---|---|---|---|---|
192.168.91.220 | 2C | 2G | 40GB | control-plane | k8s01 |
192.168.91.221 | 2C | 2G | 40GB | worker (node) | k8s02 |
192.168.91.222 | 2C | 2G | 40GB | worker (node) | k8s03 |
Run the following on all of the nodes prepared above.
- Configure hosts
cat >> /etc/hosts << EOF
192.168.91.220 k8s01
192.168.91.221 k8s02
192.168.91.222 k8s03
EOF
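The append step can be rehearsed safely before touching the real file; `hosts_file` below is a temporary stand-in, not the system /etc/hosts:

```shell
# Rehearse the /etc/hosts append against a scratch file (a stand-in, not the real file)
hosts_file=$(mktemp)
cat >> "$hosts_file" << 'EOF'
192.168.91.220 k8s01
192.168.91.221 k8s02
192.168.91.222 k8s03
EOF
# Each hostname should appear exactly once
for h in k8s01 k8s02 k8s03; do
  count=$(grep -cw "$h" "$hosts_file")
  echo "$h appears $count time(s)"
done
```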
- Time synchronization
dnf -y install chrony
# For an internal (air-gapped) deployment, point this at an internal time server instead
echo "server ntp.aliyun.com iburst" >> /etc/chrony.conf
systemctl enable --now chronyd
# Check synchronization status
chronyc sources -v
- Kernel upgrade
# Check the current kernel version
uname -r
4.18.0-553.el8_10.x86_64
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
dnf install -y https://www.elrepo.org/elrepo-release-8.el8.elrepo.noarch.rpm
dnf makecache
dnf --enablerepo=elrepo-kernel -y install kernel-ml.x86_64
# Make the new kernel the default boot entry
sed -i 's/^GRUB_DEFAULT=.*/GRUB_DEFAULT=0/' /etc/default/grub
# Regenerate the GRUB configuration and reboot
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
# Check the kernel version again
uname -r
6.13.9-1.el8.elrepo.x86_64
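The `GRUB_DEFAULT` edit can be dry-run on a scratch copy first; the sample file contents below are an assumption, not the full /etc/default/grub:

```shell
# Dry-run the GRUB_DEFAULT edit on a scratch file (assumed sample contents)
grub_file=$(mktemp)
cat > "$grub_file" << 'EOF'
GRUB_TIMEOUT=5
GRUB_DEFAULT=saved
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"
EOF
sed -i 's/^GRUB_DEFAULT=.*/GRUB_DEFAULT=0/' "$grub_file"
grep '^GRUB_DEFAULT=' "$grub_file"   # GRUB_DEFAULT=0
```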
- Ensure the MAC address and product_uuid are unique on every node
The steps below apply only to virtual machines; on physical hardware, contact your operations team to make the change.
# Check the MAC address
ip link show ens33 | grep link/ether
link/ether 00:0c:29:bf:de:56 brd ff:ff:ff:ff:ff:ff
# If the MAC address conflicts, change it as follows
# Set a fixed MAC address
nmcli con modify "ens33" ethernet.cloned-mac-address 00:0c:29:bf:de:57
# Restart the network connection
nmcli con down "ens33" && nmcli con up "ens33"
# Verify the change
ip link show ens33 | grep link/ether
link/ether 00:0c:29:bf:de:57 brd ff:ff:ff:ff:ff:ff permaddr 00:0c:29:bf:de:56
# Check the product_uuid
cat /sys/class/dmi/id/product_uuid
# If the product_uuid conflicts, recreate the virtual machine to resolve it
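For three nodes the values can be compared by eye, but a mechanical duplicate check can be sketched as below; the host/MAC/uuid rows are made-up sample data (real values come from `ip link` and /sys/class/dmi/id/product_uuid):

```shell
# Detect duplicate MACs or product_uuids across nodes (made-up sample data)
ids_file=$(mktemp)
cat > "$ids_file" << 'EOF'
k8s01 00:0c:29:bf:de:56 uuid-aaaa
k8s02 00:0c:29:bf:de:57 uuid-bbbb
k8s03 00:0c:29:bf:de:57 uuid-cccc
EOF
# A MAC (column 2) or product_uuid (column 3) seen more than once is a conflict
dup_macs=$(awk '{print $2}' "$ids_file" | sort | uniq -d)
dup_uuids=$(awk '{print $3}' "$ids_file" | sort | uniq -d)
echo "duplicate MACs:  ${dup_macs:-none}"
echo "duplicate UUIDs: ${dup_uuids:-none}"
```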
- Check network adapters
If you have more than one network adapter and your Kubernetes components are not reachable via the default route, we recommend adding IP routes in advance so that the Kubernetes cluster can communicate through the appropriate adapter.
- Check required ports
These ports must be open for the Kubernetes components to communicate with each other. Since many ports are involved, we simply whitelist the cluster subnet here.
# Whitelist the cluster subnet
firewall-cmd --permanent --zone=trusted --add-source=192.168.91.0/24
# Reload the firewall rules
firewall-cmd --reload
- Disable swap
# Turn off the swap partition
swapoff -a
sed -i 's&/dev/mapper/rl-swap&#/dev/mapper/rl-swap&' /etc/fstab
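The sed above uses `&` as the delimiter so the slashes in the device path need no escaping; its effect can be checked on a scratch copy (the fstab lines below are an assumed sample):

```shell
# Check the swap-commenting sed on a scratch file (assumed sample fstab lines)
fstab_file=$(mktemp)
cat > "$fstab_file" << 'EOF'
/dev/mapper/rl-root /    xfs  defaults 0 0
/dev/mapper/rl-swap none swap defaults 0 0
EOF
sed -i 's&/dev/mapper/rl-swap&#/dev/mapper/rl-swap&' "$fstab_file"
grep 'swap' "$fstab_file"   # the swap entry is now commented out
```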
Install the container runtime
Run the following on all of the nodes prepared above.
- Enable IPv4 packet forwarding
# Set the required sysctl parameters; they persist across reboots
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
EOF
# Apply the sysctl parameters without rebooting
sysctl --system
- Install containerd
wget https://github.com/containerd/containerd/releases/download/v2.0.4/containerd-2.0.4-linux-amd64.tar.gz
tar Cxzvf /usr/local containerd-2.0.4-linux-amd64.tar.gz
# Start via systemd
mkdir -p /usr/local/lib/systemd/system
wget -O /usr/local/lib/systemd/system/containerd.service \
https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
systemctl reset-failed
systemctl daemon-reload
systemctl enable --now containerd
rm -f containerd-2.0.4-linux-amd64.tar.gz
Modify the configuration
mkdir /etc/containerd
containerd config default > /etc/containerd/config.toml
# Edit the configuration file /etc/containerd/config.toml
# Registry configuration directory
[plugins.'io.containerd.cri.v1.images'.registry]
config_path = '/etc/containerd/certs.d'
# cgroup configuration: use systemd
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
SystemdCgroup = true
# Registry mirror configuration
mkdir /etc/containerd/certs.d
# Mirror acceleration addresses can be found at https://github.com/DaoCloud/public-image-mirror
cd /etc/containerd/certs.d
mkdir docker.elastic.co docker.io gcr.io ghcr.io k8s.gcr.io registry.k8s.io mcr.microsoft.com nvcr.io quay.io
cat > docker.io/hosts.toml << EOF
server = "https://docker.io"
[host."https://zwyx2n3v.mirror.aliyuncs.com"]
capabilities = ["pull", "resolve"]
[host."https://docker.m.daocloud.io"]
capabilities = ["pull", "resolve"]
[host."https://dockerproxy.com"]
capabilities = ["pull", "resolve"]
[host."https://docker.mirrors.ustc.edu.cn"]
capabilities = ["pull", "resolve"]
[host."https://docker.nju.edu.cn"]
capabilities = ["pull", "resolve"]
[host."https://registry.docker-cn.com"]
capabilities = ["pull", "resolve"]
[host."https://do.nark.eu.org"]
capabilities = ["pull", "resolve"]
[host."https://dc.j8.work"]
capabilities = ["pull", "resolve"]
[host."https://registry-1.docker.io"]
capabilities = ["pull", "resolve"]
EOF
cat > docker.elastic.co/hosts.toml << EOF
server = "https://docker.elastic.co"
[host."https://elastic.m.daocloud.io"]
capabilities = ["pull", "resolve"]
[host."https://docker.elastic.co"]
capabilities = ["pull", "resolve"]
EOF
cat > gcr.io/hosts.toml << EOF
server = "https://gcr.io"
[host."https://gcr.m.daocloud.io"]
capabilities = ["pull", "resolve"]
[host."https://gcr.io"]
capabilities = ["pull", "resolve"]
EOF
cat > ghcr.io/hosts.toml << EOF
server = "https://ghcr.io"
[host."https://ghcr.m.daocloud.io"]
capabilities = ["pull", "resolve"]
[host."https://ghcr.io"]
capabilities = ["pull", "resolve"]
EOF
cat > k8s.gcr.io/hosts.toml << EOF
server = "https://k8s.gcr.io"
[host."https://k8s-gcr.m.daocloud.io"]
capabilities = ["pull", "resolve"]
[host."https://k8s.m.daocloud.io"]
capabilities = ["pull", "resolve"]
[host."https://registry.k8s.io"]
capabilities = ["pull", "resolve"]
[host."https://k8s.gcr.io"]
capabilities = ["pull", "resolve"]
EOF
cat > registry.k8s.io/hosts.toml << EOF
server = "https://registry.k8s.io"
[host."https://k8s.m.daocloud.io"]
capabilities = ["pull", "resolve"]
[host."https://registry.k8s.io"]
capabilities = ["pull", "resolve"]
EOF
cat > mcr.microsoft.com/hosts.toml << EOF
server = "https://mcr.microsoft.com"
[host."https://mcr.m.daocloud.io"]
capabilities = ["pull", "resolve"]
[host."https://mcr.microsoft.com"]
capabilities = ["pull", "resolve"]
EOF
cat > nvcr.io/hosts.toml << EOF
server = "https://nvcr.io"
[host."https://nvcr.m.daocloud.io"]
capabilities = ["pull", "resolve"]
[host."https://nvcr.io"]
capabilities = ["pull", "resolve"]
EOF
cat > quay.io/hosts.toml << EOF
server = "https://quay.io"
[host."https://quay.m.daocloud.io"]
capabilities = ["pull", "resolve"]
[host."https://quay.io"]
capabilities = ["pull", "resolve"]
EOF
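Since most of these hosts.toml files share the same shape, they can also be generated in a loop. A sketch, using a scratch directory that stands in for /etc/containerd/certs.d and listing the registry:mirror-prefix pairs that follow the m.daocloud.io naming used above (docker.io, k8s.gcr.io and registry.k8s.io have extra mirror hosts, so they are written out explicitly above):

```shell
# Generate the same-shaped hosts.toml files in a loop
# (scratch dir stands in for /etc/containerd/certs.d)
certs_dir=$(mktemp -d)
for pair in gcr.io:gcr ghcr.io:ghcr quay.io:quay nvcr.io:nvcr \
            mcr.microsoft.com:mcr docker.elastic.co:elastic; do
  reg=${pair%%:*}
  prefix=${pair##*:}
  mkdir -p "$certs_dir/$reg"
  cat > "$certs_dir/$reg/hosts.toml" << EOF
server = "https://$reg"
[host."https://$prefix.m.daocloud.io"]
  capabilities = ["pull", "resolve"]
[host."https://$reg"]
  capabilities = ["pull", "resolve"]
EOF
done
ls "$certs_dir"
```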
# Restart containerd
systemctl restart containerd
# Check the sandbox image
cat /etc/containerd/config.toml | grep sandbox
sandbox = 'registry.k8s.io/pause:3.10'
sandboxer = 'podsandbox'
# Pull the sandbox image
ctr image pull registry.k8s.io/pause:3.10
# Pulling the image is still slow here, so switch to the mirror
sed -i 's|registry.k8s.io/pause:3.10|k8s.m.daocloud.io/pause:3.10|' /etc/containerd/config.toml
# Restart containerd
systemctl restart containerd
# Pull the image
ctr image pull k8s.m.daocloud.io/pause:3.10
- Install runc
wget https://github.com/opencontainers/runc/releases/download/v1.2.6/runc.amd64
install -m 755 runc.amd64 /usr/local/sbin/runc
rm -f runc.amd64
- Install the CNI plugins
wget https://github.com/containernetworking/plugins/releases/download/v1.6.2/cni-plugins-linux-amd64-v1.6.2.tgz
mkdir -p /opt/cni/bin
tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.6.2.tgz
rm -f cni-plugins-linux-amd64-v1.6.2.tgz
Install Kubernetes
Run the following on all of the nodes prepared above.
# Set SELinux to permissive mode (effectively disabling it)
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# Add the Kubernetes yum repository; the exclude parameter in the repo definition ensures that Kubernetes-related packages are not upgraded when running yum update
cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.32/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.32/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
# Install
dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
systemctl enable --now kubelet
# Install extra packages to silence warnings during kubeadm initialization
dnf install -y iproute-tc
Cluster initialization
On the k8s01 node
# List the images. Note: the k8s.m.daocloud.io/coredns:v1.11.3 entry is wrong; it should be k8s.m.daocloud.io/coredns/coredns:v1.11.3
kubeadm config images list --image-repository=k8s.m.daocloud.io
k8s.m.daocloud.io/kube-apiserver:v1.32.3
k8s.m.daocloud.io/kube-controller-manager:v1.32.3
k8s.m.daocloud.io/kube-scheduler:v1.32.3
k8s.m.daocloud.io/kube-proxy:v1.32.3
k8s.m.daocloud.io/coredns:v1.11.3
k8s.m.daocloud.io/pause:3.10
k8s.m.daocloud.io/etcd:3.5.16-0
# Generate the default configuration file
kubeadm config print init-defaults > kubeadm-config.yaml
# In kubeadm-config.yaml, change the dns section as follows
dns:
imageRepository: k8s.m.daocloud.io/coredns
imageTag: v1.11.3
# Change imageRepository
sed -i 's/imageRepository: registry.k8s.io/imageRepository: k8s.m.daocloud.io/' kubeadm-config.yaml
# Set advertiseAddress to the host IP, i.e. the --apiserver-advertise-address setting
sed -i 's/advertiseAddress: 1.2.3.4/advertiseAddress: 192.168.91.220/' kubeadm-config.yaml
# Configure the pod and service networks
networking:
dnsDomain: cluster.local
podSubnet: 10.244.0.0/16
serviceSubnet: 10.96.0.0/16
# Set the control-plane node's host name; alternatively, delete the name: node line
nodeRegistration:
...
name: k8s01
# Verify that the changes above took effect
# All images are now correct
kubeadm config images list --config=kubeadm-config.yaml
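The two sed edits above can also be sanity-checked in isolation against a minimal stand-in file containing only the lines being rewritten (not the full kubeadm defaults):

```shell
# Verify the two seds on a minimal stand-in for kubeadm-config.yaml
cfg=$(mktemp)
cat > "$cfg" << 'EOF'
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
imageRepository: registry.k8s.io
EOF
sed -i 's/imageRepository: registry.k8s.io/imageRepository: k8s.m.daocloud.io/' "$cfg"
sed -i 's/advertiseAddress: 1.2.3.4/advertiseAddress: 192.168.91.220/' "$cfg"
cat "$cfg"
```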
# Pull the images first
kubeadm config images pull --config=kubeadm-config.yaml
# Initialize the control-plane node
kubeadm init --config=kubeadm-config.yaml
# Without the kernel upgrade, you may see the following error
kernel release 4.18.0-553.el8_10.x86_64 is unsupported. Recommended LTS version from the 4.x series is 4.19. Any 5.x or 6.x versions are also supported. For cgroups v2 support, the minimal version is 4.15 and the recommended version is 5.8+
# Error
[ERROR SystemVerification]: missing required cgroups: cpuset
# Output tmpfs means cgroups v1
stat -fc %T /sys/fs/cgroup/
# On all nodes, switch from cgroups v1 to cgroups v2
# In /etc/default/grub, append systemd.unified_cgroup_hierarchy=1 just before the closing double quote of GRUB_CMDLINE_LINUX="..."
# Apply the configuration
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
# Verify that cgroups v2 is in effect
# Output cgroup2fs means cgroups v2
stat -fc %T /sys/fs/cgroup/
# On the k8s01 node
# Run the initialization again
kubeadm init --config=kubeadm-config.yaml
...
Your Kubernetes control-plane has initialized successfully!
...
kubeadm join 192.168.91.220:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:b7102be947d993e72e9d7f1645ae4d0d3e72e76d2b5ab1c370f5ce44996f17d4
# Configure kubectl access
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=/etc/kubernetes/admin.conf
# Install the Calico network plugin
# Downloads can be slow, so pull the images from docker.m.daocloud.io instead; run this on all nodes
for img in apiserver cni csi kube-controllers node-driver-registrar node pod2daemon-flexvol typha
do
ctr -n=k8s.io image pull docker.m.daocloud.io/calico/$img:v3.29.3
ctr -n=k8s.io image tag docker.m.daocloud.io/calico/$img:v3.29.3 docker.io/calico/$img:v3.29.3
ctr -n=k8s.io image rm docker.m.daocloud.io/calico/$img:v3.29.3
done
wget https://raw.githubusercontent.com/projectcalico/calico/v3.29.3/manifests/tigera-operator.yaml
# Switch the images to the mirror
sed -i 's/quay.io/quay.m.daocloud.io/' tigera-operator.yaml
kubectl create -f tigera-operator.yaml
# Watch until the resources are created
watch kubectl get pods -n tigera-operator
wget https://raw.githubusercontent.com/projectcalico/calico/v3.29.3/manifests/custom-resources.yaml
sed -i 's/192.168/10.244/' custom-resources.yaml
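This sed rewrites Calico's default IPPool CIDR (192.168.0.0/16) to the podSubnet chosen in kubeadm-config.yaml; its effect on a stand-in for that single line looks like:

```shell
# Show the CIDR rewrite on a stand-in for the ipPools entry in custom-resources.yaml
cr=$(mktemp)
echo '      cidr: 192.168.0.0/16' > "$cr"
sed -i 's/192.168/10.244/' "$cr"
cat "$cr"   # cidr: 10.244.0.0/16
```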
kubectl create -f custom-resources.yaml
# Watch until the resources are created
watch kubectl get pods -n calico-system
# Check node status
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s01 Ready control-plane 33m v1.32.3
Join the worker nodes to the cluster
# On k8s02 and k8s03
kubeadm join 192.168.91.220:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:b7102be947d993e72e9d7f1645ae4d0d3e72e76d2b5ab1c370f5ce44996f17d4
# On k8s01
# Check node status
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s01 Ready control-plane 34m v1.32.3
k8s02 Ready <none> 69s v1.32.3
k8s03 Ready <none> 63s v1.32.3