1. Cluster Planning
1.1 Node Planning
Create 5 virtual machines on the physical host. Node roles and resource allocation are as follows:
Note on resource allocation:
- RAM: minimum 4 GB (at least 8 GB recommended)
- CPU: minimum 2 cores (at least 4 recommended)
- Disk: RKE2's performance depends on the performance of the database. Since RKE2 runs etcd embedded and stores its data directory on disk, we recommend using an SSD where possible to ensure optimal performance.
Other settings:
Virtual IP address: 192.168.200.10
server: https://192.168.200.10:19345
Operating system: CentOS 7.8
RKE2 has been tested and validated on the following operating systems and their subsequent non-major releases:
* Ubuntu 18.04, 20.04, 22.04 (amd64)
* CentOS/RHEL 7.8 (amd64)
* Rocky/RHEL 9.2 (amd64)
* SLES 15 SP3, SP4
* OpenSUSE, SLE Micro 5.1, 5.2, 5.3 (amd64)
1.2 Kubernetes Cluster Planning
- Version and runtime
Container runtime: Containerd
Cluster version: v1.25.10+rke2r1 (a newer version can also be chosen)
- Network planning
- Port requirements
Reference: https://docs.rke2.io/zh/install/requirements#%E7%BD%91%E7%BB%9C
- Other
If NetworkManager is installed and enabled on your nodes, make sure it is configured to ignore the interfaces managed by the CNI. If Wicked is installed and enabled on your nodes, make sure the forwarding sysctl configuration is enabled.
systemctl is-active NetworkManager
cat <<EOF > /etc/NetworkManager/conf.d/rke2-canal.conf
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
EOF
## NICs matching cali* and flannel* are left unmanaged by NetworkManager
systemctl daemon-reload && systemctl restart NetworkManager
2. Base Configuration
Unless otherwise stated, all nodes require the base configuration below
2.1 Configure Passwordless SSH
Run on the physical host
## Generate a public/private key pair (just press Enter at every prompt)
ssh-keygen -t rsa
## Copy id_rsa.pub to each of the cluster hosts
ssh-copy-id -i /root/.ssh/id_rsa.pub root@rke2-master-0-1
ssh-copy-id -i /root/.ssh/id_rsa.pub root@rke2-master-0-2
ssh-copy-id -i /root/.ssh/id_rsa.pub root@rke2-master-0-3
ssh-copy-id -i /root/.ssh/id_rsa.pub root@rke2-node-0-1
ssh-copy-id -i /root/.ssh/id_rsa.pub root@rke2-node-0-2
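After copying the keys, passwordless login can be verified in one pass with a small loop (a sketch; `check_ssh` is a hypothetical helper, and the node names follow the cluster plan in this guide):

```shell
## Hypothetical helper: returns 0 when passwordless login to a host works.
## BatchMode=yes makes ssh fail immediately instead of prompting for a password.
check_ssh() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true >/dev/null 2>&1
}

for host in rke2-master-0-1 rke2-master-0-2 rke2-master-0-3 \
            rke2-node-0-1 rke2-node-0-2; do
  if check_ssh "$host"; then
    echo "$host: passwordless SSH OK"
  else
    echo "$host: passwordless SSH FAILED"
  fi
done
```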
2.2 Set Hostnames and Configure /etc/hosts
- Set the hostname
hostnamectl set-hostname {NAME}
- Configure /etc/hosts
cat >> /etc/hosts << EOF
192.168.200.10 rke2-vip
192.168.200.11 rke2-master-0-1
192.168.200.12 rke2-master-0-2
192.168.200.13 rke2-master-0-3
192.168.200.14 rke2-node-0-1
192.168.200.15 rke2-node-0-2
EOF
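A quick sanity check (a sketch) that each planned name now resolves; `getent hosts` consults /etc/hosts through nsswitch, so every name should print its address:

```shell
## Report any cluster name that does not resolve after editing /etc/hosts.
for name in rke2-vip rke2-master-0-1 rke2-master-0-2 rke2-master-0-3 \
            rke2-node-0-1 rke2-node-0-2; do
  getent hosts "$name" || echo "NOT RESOLVED: $name"
done
```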
2.3 Time Synchronization
### Install
yum -y install ntp
### Configure ntp
vim /etc/ntp.conf
…omitted…
server 10.49.18.103 prefer # the clock is synchronized via the physical host
…omitted…
## Start the service (note: the service name is ntpd, not ntp)
systemctl start ntpd && systemctl enable ntpd
## Verify
ntpq -p
2.4 Disable swap
## Disable
swapoff -a
## Comment out the swap auto-mount entry
vim /etc/fstab
...omitted...
# /dev/mapper/centos-swap swap swap defaults 0 0
...omitted...
## Verify that it took effect (swap shows 0)
free
2.5 Firewall Configuration
In a test environment, the firewall can simply be disabled
References:
https://docs.rke2.io/zh/install/requirements#%E7%BD%91%E7%BB%9C
https://ranchermanager.docs.rancher.com/zh/getting-started/installation-and-upgrade/installation-requirements/port-requirements#rke2-%E4%B8%8A-rancher-server-%E8%8A%82%E7%82%B9%E7%9A%84%E7%AB%AF%E5%8F%A3#
2.6 Disable SELinux
## Disable selinux
## Edit the config file; takes effect at the next reboot
sed -ri 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
## Disable selinux temporarily
setenforce 0
## Verify
getenforce
2.7 DNS Settings
Configure according to your environment
cat /etc/resolv.conf
search ****
nameserver ****
2.8 Configure the repo Repository
Configure according to your environment
## Install base utilities
yum -y install net-tools telnet vim lsof wget lrzsz bind-utils traceroute
2.9 Configure Host Bridge Filtering
2.9.1 Add Bridge Filtering
Add bridge filtering on the Kubernetes cluster nodes so that bridged traffic passes through the kernel's packet filter
# Add bridge filtering and IP forwarding
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
EOF
# Load the br_netfilter module
[root@worker01 ~]# modprobe br_netfilter
# Check that it is loaded
[root@worker01 ~]# lsmod | grep br_netfilter
br_netfilter 22256 0
bridge 151336 1 br_netfilter
# Apply the bridge filtering settings
[root@worker01 ~]# sysctl -p /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
2.9.2 Enable IPVS
Kubernetes Services rely on either iptables or ipvs, and ipvs generally forwards more efficiently than iptables, so we set up ipvs here
- Install ipset and ipvsadm
yum -y install ipset ipvsadm
- Add the modules to be loaded
# ipvs serves as kube-proxy's forwarding mechanism; enable ipvs module support
cat > /etc/ipvs.modules << EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack
EOF
- Make it executable, run it, and check that the modules are loaded
chmod +x /etc/ipvs.modules && bash /etc/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack
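The script above loads the modules into the running kernel only. To have them reloaded on every boot, one option (assuming your distribution ships systemd-modules-load, as CentOS 7 does) is to register them under /etc/modules-load.d/:

```shell
## Modules listed here are loaded automatically at boot by
## systemd-modules-load.service.
cat > /etc/modules-load.d/ipvs.conf << EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
```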
2.10 Configure haproxy and keepalived
- haproxy+keepalived are deployed on the three master nodes of the K8s cluster
- The VIP is set to: 192.168.200.10
yum -y install haproxy keepalived
1) Configure Haproxy
The configuration is identical on all master hosts. When running systemctl start rke2-server, the api-server backend list in haproxy.cfg can initially contain only the current master host's IP with the others commented out; remove the comments one by one as each master joins the cluster
cd /etc/haproxy/
mv haproxy.cfg haproxy.cfg.bak
vim /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
# 1) configure syslog to accept network log events. This is done
# by adding the '-r' option to the SYSLOGD_OPTIONS in
# /etc/sysconfig/syslog
# 2) configure local2 events to go to the /var/log/haproxy.log
# file. A line like the following can be added to
# /etc/sysconfig/syslog
#
# local2.* /var/log/haproxy.log
#
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
#---------------------------------------------------------------------
# kubernetes apiserver frontend which proxys to the backends
#---------------------------------------------------------------------
frontend rke2-apiserver
mode tcp
bind *:19345 # listening port
option tcplog
default_backend rke2-apiserver
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend rke2-apiserver
mode tcp # TCP mode
option tcplog
option tcp-check
balance roundrobin # round-robin load balancing
default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
server rke2-master-0-1 192.168.200.11:9345 check
server rke2-master-0-2 192.168.200.12:9345 check
server rke2-master-0-3 192.168.200.13:9345 check
#---------------------------------------------------------------------
# collection haproxy statistics message
#---------------------------------------------------------------------
listen stats
bind *:1080
stats auth admin:awesomePassword
stats refresh 5s
stats realm HAProxy\ Statistics
stats uri /admin?stats
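Before restarting haproxy after any edit, the file can be syntax-checked first; `haproxy -c` only parses the configuration and exits non-zero on errors (a sketch; `validate_haproxy` is a hypothetical helper):

```shell
## Validate a haproxy config file without starting the service.
validate_haproxy() {
  haproxy -c -f "${1:-/etc/haproxy/haproxy.cfg}"
}

validate_haproxy /etc/haproxy/haproxy.cfg && systemctl restart haproxy \
  || echo "haproxy config check failed; service not restarted"
```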
2) Configure keepalived
The keepalived.conf file
- state {MASTER|BACKUP} # MASTER on the primary node, BACKUP on the standby nodes; here rke2-master-0-1 is configured as MASTER, and rke2-master-0-2 and rke2-master-0-3 as BACKUP
- Set priority higher on the primary node than on the standby nodes, e.g. 100 on the primary and 50 on the standbys
cd /etc/keepalived
mv keepalived.conf keepalived.conf.bak
vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
script_user root
enable_script_security
}
vrrp_script chk_apiserver {
script "/etc/keepalived/check_apiserver.sh" # path and name of the check script
interval 5 # run the check every 5 seconds; note that too small an interval causes problems
weight -15 # priority change on failure
fall 2 # only counts as failed after 2 consecutive failed checks
rise 1 # a single successful check counts as success
}
vrrp_instance VI_1 {
state MASTER # set to BACKUP on the backup nodes <adjust as appropriate>
interface eth0 # the server's network interface
virtual_router_id 51 # only needs to be identical across the keepalived cluster; the default is 51
priority 100 # e.g. 100 on the master and 50 on the backups; it just has to be higher on the master
advert_int 1
authentication {
auth_type PASS
auth_pass K8SHA_KA_AUTH # only needs to be identical across the keepalived cluster
}
virtual_ipaddress {
192.168.200.10 # the VIP address <adjust as appropriate>
}
track_script {
chk_apiserver
}
}
The check script, check_apiserver.sh (make it executable afterwards: chmod +x /etc/keepalived/check_apiserver.sh)
#!/bin/bash
err=0
for k in $(seq 1 3)
do
check_code=$(pgrep haproxy)
if [[ $check_code == "" ]]; then
err=$(expr $err + 1)
sleep 1
continue
else
err=0
break
fi
done
if [[ $err != "0" ]]; then
echo "systemctl stop keepalived"
/usr/bin/systemctl stop keepalived
exit 1
else
exit 0
fi
3) Start the Services
systemctl enable haproxy
systemctl restart haproxy
systemctl enable keepalived
systemctl restart keepalived
- Check
ip add
After the services are up, stop the haproxy service and verify that the VIP fails over within the 10-second window; this confirms the configuration is correct
3. RKE2 Deployment
Deploy using the offline method
v1.25.10+rke2r1 release: https://github.com/rancher/rke2/releases/tag/v1.25.10%2Brke2r1
Download the rke2 archive, the rke2-images archive, the sha256sum file and the install script into a single directory
## From the release page, download rke2-images.linux-amd64.tar.zst, rke2.linux-amd64.tar.gz and sha256sum-amd64.txt
## Download the install script: curl -sfL https://rancher-mirror.rancher.cn/rke2/install.sh -o install.sh
## Create the directory on every cluster host
mkdir /root/rke2-artifacts
## Upload the packages to the /root/rke2-artifacts/ directory
cd /root/rke2-artifacts/ && ls -al
total 952908
drwxr-xr-x. 2 root root 121 Aug 31 15:59 .
dr-xr-x---. 12 root root 4096 Aug 31 20:22 ..
-rwxr-xr-x. 1 root root 24726 Aug 31 15:59 install.sh
-rw-r--r--. 1 root root 949438964 Aug 31 15:38 rke2-images.linux-amd64.tar.zst
-rw-r--r--. 1 root root 26299677 Aug 31 15:38 rke2.linux-amd64.tar.gz
-rw-r--r--. 1 root root 3626 Aug 31 15:40 sha256sum-amd64.txt
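Before installing, the downloads can be verified against the published checksum file (a sketch; `verify_artifacts` is a hypothetical helper, and we filter to the two archives because older coreutils such as CentOS 7's lacks `sha256sum --ignore-missing`):

```shell
## Verify the downloaded archives against sha256sum-amd64.txt; a corrupted
## download fails here instead of failing mid-install.
verify_artifacts() {
  # $1: directory holding the downloads and sha256sum-amd64.txt
  (cd "$1" && \
   grep -E 'rke2-images\.linux-amd64\.tar\.zst|rke2\.linux-amd64\.tar\.gz' \
     sha256sum-amd64.txt | sha256sum -c -)
}

verify_artifacts /root/rke2-artifacts || echo "checksum verification failed"
```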
3.1 Start the First master Node
## Configure the proxy
cat > /etc/sysconfig/rke2-server <<EOF
CONTAINERD_HTTP_PROXY=http://10.49.18.103:3128
CONTAINERD_HTTPS_PROXY=http://10.49.18.103:3128
CONTAINERD_NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.rke2.local
EOF
## Install rke2
cd /root/rke2-artifacts/
INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_TYPE="server" INSTALL_RKE2_VERSION=v1.25.10+rke2r1 INSTALL_RKE2_ARTIFACT_PATH=/root/rke2-artifacts sh install.sh
INSTALL_RKE2_MIRROR: speeds up installation for users in China
INSTALL_RKE2_VERSION: the RKE2 version to deploy
INSTALL_RKE2_ARTIFACT_PATH: path to the artifacts needed during installation
By default RKE2 starts from /etc/rancher/rke2/config.yaml; this configuration file has to be created by hand
mkdir -p /etc/rancher/rke2 /var/lib/rancher/rke2/db/snapshots
cat >> /etc/rancher/rke2/config.yaml << EOF
# server: "https://192.168.200.10:19345" # once all three masters are up, uncomment this line and restart the service
write-kubeconfig: "/root/.kube/config"
write-kubeconfig-mode: "0644"
cluster-domain: "rke2.local"
data-dir: "/var/lib/rancher/rke2"
## a custom token string
token: "RKE2@Cluster"
## tls-san is set to the LB's unified entry IP address or domain name
tls-san:
- "192.168.200.10"
## use a China-local mirror registry
system-default-registry: "registry.cn-hangzhou.aliyuncs.com"
## set a distinct node-name for each node in the cluster, matching its hostname
node-name: "rke2-master-0-1"
## taint the node so user workloads are not scheduled onto it
node-taint:
- "CriticalAddonsOnly=true:NoExecute"
#### Networking
## name of the CNI plugin, e.g. calico, flannel, or canal
cni: "canal"
cluster-cidr: "10.244.0.0/16"
service-cidr: "10.96.0.0/16"
service-node-port-range: "50000-60000"
#### Database
etcd-snapshot-schedule-cron: "0 */12 * * *"
## number of snapshot files to keep; old ones are deleted as new ones are saved
etcd-snapshot-retention: "6"
etcd-snapshot-dir: "/var/lib/rancher/rke2/db/snapshots" # written out literally (matches data-dir); the directory must be created by hand
kube-proxy-arg: # defaults to iptables mode if not specified
- "proxy-mode=ipvs"
disable: # rke2 installs some charts by default; they can be disabled
- "rke2-metrics-server"
EOF
References:
- Disabling Server Charts: https://docs.rke2.io/advanced/#disabling-server-charts
- server configuration reference: https://docs.rke2.io/zh/reference/server_config
Start the rke2-server service
## Enable start on boot
systemctl enable rke2-server
## Startup takes a while: 5-10 minutes
## If startup fails, "rke2 server --config config.yaml --debug" shows the details
systemctl start rke2-server
echo "export PATH=$PATH:/var/lib/rancher/rke2/bin" >> /etc/profile && source /etc/profile
## Verify the node and services are healthy
kubectl get node
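Instead of eyeballing the output, readiness can be checked mechanically (a sketch; `all_nodes_ready` is a hypothetical helper that treats any STATUS other than exactly "Ready" as not ready):

```shell
## Returns 0 only when every node's STATUS column is exactly "Ready".
all_nodes_ready() {
  ! kubectl get nodes --no-headers | awk '{print $2}' | grep -qv '^Ready$'
}

if all_nodes_ready; then
  echo "all nodes Ready"
else
  echo "some nodes are not Ready yet"
fi
```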
3.2 Join the Remaining master Nodes to the Cluster
## Configure the proxy
cat > /etc/sysconfig/rke2-server <<EOF
CONTAINERD_HTTP_PROXY=http://10.49.18.103:3128
CONTAINERD_HTTPS_PROXY=http://10.49.18.103:3128
CONTAINERD_NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.rke2.local
EOF
## Install rke2
cd /root/rke2-artifacts/
INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_TYPE="server" INSTALL_RKE2_VERSION=v1.25.10+rke2r1 INSTALL_RKE2_ARTIFACT_PATH=/root/rke2-artifacts sh install.sh
In /etc/rancher/rke2/config.yaml, add the server parameter and change the node-name parameter; everything else stays the same.
mkdir -p /etc/rancher/rke2 /var/lib/rancher/rke2/db/snapshots
cat >> /etc/rancher/rke2/config.yaml << EOF
server: "https://192.168.200.10:19345"
write-kubeconfig: "/root/.kube/config"
write-kubeconfig-mode: "0644"
cluster-domain: "rke2.local"
data-dir: "/var/lib/rancher/rke2"
## a custom token string (must match the first master's)
token: "RKE2@Cluster"
## tls-san is set to the LB's unified entry IP address or domain name
tls-san:
- "192.168.200.10"
## use a China-local mirror registry
system-default-registry: "registry.cn-hangzhou.aliyuncs.com"
## set a distinct node-name for each node in the cluster, matching its hostname
node-name: "rke2-master-0-{2,3}" # placeholder: use the name matching this host
## taint the node so user workloads are not scheduled onto it
node-taint:
- "CriticalAddonsOnly=true:NoExecute"
#### Networking
## name of the CNI plugin, e.g. calico, flannel, or canal
cni: "canal"
cluster-cidr: "10.244.0.0/16"
service-cidr: "10.96.0.0/16"
service-node-port-range: "50000-60000"
#### Database
etcd-snapshot-schedule-cron: "0 */12 * * *"
## number of snapshot files to keep; old ones are deleted as new ones are saved
etcd-snapshot-retention: "6"
etcd-snapshot-dir: "/var/lib/rancher/rke2/db/snapshots" # written out literally (matches data-dir); the directory must be created by hand
kube-proxy-arg: # defaults to iptables mode if not specified
- "proxy-mode=ipvs"
disable: # rke2 installs some charts by default; they can be disabled
- "rke2-metrics-server"
EOF
- Start the service
## Enable start on boot
systemctl enable rke2-server
## Startup takes a while: 5-10 minutes
## If startup fails, "rke2 server --config config.yaml --debug" shows the details
systemctl start rke2-server
echo "export PATH=$PATH:/var/lib/rancher/rke2/bin" >> /etc/profile && source /etc/profile
## Verify the node and services are healthy
kubectl get node
3.3 Join the Worker (node) Nodes to the Cluster
- Configure the proxy
cat > /etc/sysconfig/rke2-agent <<EOF
CONTAINERD_HTTP_PROXY=http://10.49.18.103:3128
CONTAINERD_HTTPS_PROXY=http://10.49.18.103:3128
CONTAINERD_NO_PROXY=127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.rke2.local
EOF
- Install rke2
cd /root/rke2-artifacts
INSTALL_RKE2_MIRROR=cn INSTALL_RKE2_TYPE="agent" INSTALL_RKE2_VERSION=v1.25.10+rke2r1 INSTALL_RKE2_ARTIFACT_PATH=/root/rke2-artifacts sh install.sh
- Create the configuration file
mkdir -p /etc/rancher/rke2
cat >> /etc/rancher/rke2/config.yaml << EOF
server: https://192.168.200.10:19345
token: "RKE2@Cluster" # must match the token configured on the servers
node-name: rke2-node-0-{1,2} # placeholder: use the name matching this host
kube-proxy-arg:
- "proxy-mode=ipvs"
EOF
- Start the service
systemctl enable rke2-agent.service
systemctl start rke2-agent.service
echo "export PATH=$PATH:/var/lib/rancher/rke2/bin" >> /etc/profile && source /etc/profile
## Verify the node and services are healthy (run kubectl from a master node)
kubectl get node
4. Troubleshooting
4.1 Removing a Cluster Node
## Log in to the node to be removed and run
rke2-killall.sh
rke2-uninstall.sh
## Then delete the node object from the cluster
kubectl delete node {NODE-NAME}
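For a node still serving workloads, a gentler sequence (a sketch; `remove_node` is a hypothetical helper) is to drain first so pods are rescheduled elsewhere, then delete the node object, and only then run the uninstall scripts on the node itself:

```shell
## Drain evicts pods (DaemonSet pods are skipped; emptyDir data is discarded),
## then the node object is removed from the cluster.
remove_node() {
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data || return 1
  kubectl delete node "$1"
}

remove_node rke2-node-0-2 || echo "removal failed"   # example node name
```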
4.2 Errors
Aug 30 17:27:57 rke2-node-0-1.cngb.sz.hpc rke2[1541997]: time="2023-08-30T17:27:57+08:00" level=info msg="Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of
Delete the corresponding secret
kubectl delete secret -n kube-system {NODE-NAME}.node-password.rke2