Deploying a Highly Available Kubernetes Cluster from Binaries

大洋百度

Last modified: 2023-02-08 19:47:43


Tags: kubernetes, linux, operations

First published: 2022-11-13 14:36:32

Article link: https://blog.csdn.net/weixin_46476452/article/details/127787743


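This deployment needs three master nodes, three worker nodes, and three etcd nodes, plus one Harbor server, one deploy node, and one load balancer host.

All hosts run Ubuntu 20.04.3, with 2 CPU cores and 2 GB of RAM each.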

Node	IP address	Hostname
master1	172.31.7.101	master1
master2	172.31.7.102	master2
master3	172.31.7.103	master3
etcd1	172.31.7.106	etcd1
etcd2	172.31.7.107	etcd2
etcd3	172.31.7.108	etcd3
node1	172.31.7.111	node1
node2	172.31.7.112	node2
node3	172.31.7.113	node3
Harbor server	172.31.7.104	harbor
deploy node	172.31.7.110	k8s-deploy
HAProxy load balancer	172.31.7.109	haproxy02

The deploy node sits outside the Kubernetes cluster and is used only to manage it.

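Notes:
Hostnames must be unique. Disable iptables, SELinux, and the firewall, and keep the clocks on all nodes synchronized.
Kubernetes removed dockershim in 1.24 (the release after 1.23.x), and this deployment uses containerd as the container runtime. If Docker is installed on a cluster node, uninstall it; do not keep Docker and containerd side by side.

For the Harbor deployment, see https://blog.csdn.net/weixin_46476452/article/details/127732870
For the HAProxy load balancer deployment, see https://blog.csdn.net/weixin_46476452/article/details/127783634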

I. Preparing the Nodes Around the Cluster

1. Configure the highly available load balancer

To deploy HAProxy and keepalived, see this post:
https://blog.csdn.net/weixin_46476452/article/details/127783634


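Building on the setup from that post, add one more proxy: 172.31.7.188:6443 fronts port 6443 of the master nodes.
1. Append the proxy address, port, and backend servers to the end of the configuration file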
root@haproxy02:~# vim /etc/haproxy/haproxy.cfg

listen k8s-6443
  bind 172.31.7.188:6443
  mode tcp
  server 172.31.7.101 172.31.7.101:6443 check inter 3s fall 3 rise 3
  server 172.31.7.102 172.31.7.102:6443 check inter 3s fall 3 rise 3
  server 172.31.7.103 172.31.7.103:6443 check inter 3s fall 3 rise 3

2. Restart the service and enable it at boot
root@haproxy02:~# systemctl restart haproxy.service && systemctl enable haproxy.service

3. Check that the port is listening
root@haproxy02:~# ss -tnl  | grep 6443
LISTEN  0        490         172.31.7.188:6443           0.0.0.0:*
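
The listen block above repeats one server line per master, so it can also be generated rather than typed by hand. The following is a minimal sketch, not part of HAProxy itself; render_k8s_listen is a hypothetical helper name, and the check parameters are simply copied from the configuration above.

```shell
#!/usr/bin/env bash
# Sketch: print an HAProxy "listen" block for the kube-apiserver VIP.
# render_k8s_listen is a hypothetical helper, not an HAProxy command.
render_k8s_listen() {
  local vip="$1" port="$2" ip
  shift 2
  printf 'listen k8s-%s\n' "$port"
  printf '  bind %s:%s\n' "$vip" "$port"
  printf '  mode tcp\n'
  # One backend line per remaining argument (the master IPs).
  for ip in "$@"; do
    printf '  server %s %s:%s check inter 3s fall 3 rise 3\n' "$ip" "$ip" "$port"
  done
}

# Same VIP and masters as in this article; review the output before
# appending it to /etc/haproxy/haproxy.cfg.
render_k8s_listen 172.31.7.188 6443 172.31.7.101 172.31.7.102 172.31.7.103
```

After editing the real file, `haproxy -c -f /etc/haproxy/haproxy.cfg` checks the syntax before the service is restarted.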

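2. Prepare the Harbor server

For the Harbor deployment, see https://blog.csdn.net/weixin_46476452/article/details/127732870

Create two Harbor projects for later use:

magedu, for application images

baseimages, for base images

3. Official deployment documentation

The official documentation is worth a look.

Go to https://github.com/easzlab/

Step 1: find kubeasz and open it.

Step 2: that page links to the installation guide and the usage guide. From the same page, open the list of releases and choose the version you need.

I use kubeasz 3.3.1 here.

For the official deployment notes, open the URL below and go to the installation guide -> cluster planning and configuration overview:

https://github.com/easzlab/kubeasz

With that, the deployment can begin.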

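II. Deploying the Cluster with kubeasz 3.3.1

A multi-node highly available cluster can be installed in two ways:

1. Plan and prepare as described in this article, configure all node information up front, and install the multi-node HA cluster in one pass
2. Deploy a single-node (AllinOne) cluster first, then grow it into an HA cluster by adding nodes

I take the second approach here: deploy part of the nodes first, then join the remaining nodes by scaling out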

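1. Prepare the deploy node

1. Declare an environment variable
root@k8s-deploy:~# export release=3.3.1

2. Download the script. ${release} expands to the version defined above; you can also write 3.3.1 directly
root@k8s-deploy:~# wget https://github.com/easzlab/kubeasz/releases/download/${release}/ezdown

3. Make it executable
root@k8s-deploy:~# chmod +x ezdown

4. Install git and ansible, which are needed later to initialize the cluster
root@k8s-deploy:~# apt install git ansible

5. Download the kubeasz code, binaries, and default container images (run ./ezdown to see more of its options)
root@k8s-deploy:~# ./ezdown -D

The deploy node does not fetch these binaries and packages directly from the upstream sites. It gets them from container images:
ezdown uses Docker to pull the images, starts temporary containers, and copies the needed files from the containers to the host.
The script therefore checks whether Docker is present on the machine. If it is, ezdown uses it; if not, ezdown installs
the Docker version defined in its configuration. The ezdown file itself is worth reading if you are curious

After the download finishes, many images are present locally

Once the script succeeds, everything (kubeasz code, binaries, offline images) is laid out under /etc/kubeasz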
root@k8s-deploy:/etc/kubeasz# cd /etc/kubeasz/
root@k8s-deploy:/etc/kubeasz# ll
total 136
drwxrwxr-x 12 root root  4096 Nov 10 08:21 ./
drwxr-xr-x 98 root root  4096 Nov 10 08:22 ../
-rw-rw-r--  1 root root 20304 Jul  3 12:37 ansible.cfg
drwxr-xr-x  3 root root  4096 Nov 10 08:21 bin/
drwxrwxr-x  8 root root  4096 Jul  3 12:51 docs/
drwxr-xr-x  2 root root  4096 Nov 10 08:29 down/
drwxrwxr-x  2 root root  4096 Jul  3 12:51 example/
-rwxrwxr-x  1 root root 25012 Jul  3 12:37 ezctl*
-rwxrwxr-x  1 root root 25266 Jul  3 12:37 ezdown*
drwxrwxr-x  3 root root  4096 Jul  3 12:51 .github/
-rw-rw-r--  1 root root   301 Jul  3 12:37 .gitignore
drwxrwxr-x 10 root root  4096 Jul  3 12:51 manifests/
drwxrwxr-x  2 root root  4096 Jul  3 12:51 pics/
drwxrwxr-x  2 root root  4096 Jul  3 12:51 playbooks/
-rw-rw-r--  1 root root  5058 Jul  3 12:37 README.md
drwxrwxr-x 22 root root  4096 Jul  3 12:51 roles/
drwxrwxr-x  2 root root  4096 Jul  3 12:51 tools/

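Among these files, ezctl is the client command used to manage Kubernetes clusters. It is written as a shell script

2. Create the cluster

1. Create a cluster named k8s-cluster1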
root@k8s-deploy:/etc/kubeasz# ./ezctl new k8s-cluster1
2022-11-10 12:46:35 DEBUG generate custom cluster files in /etc/kubeasz/clusters/k8s-cluster1
2022-11-10 12:46:35 DEBUG set versions
2022-11-10 12:46:35 DEBUG cluster k8s-cluster1: files successfully created.
2022-11-10 12:46:35 INFO next steps 1: to config '/etc/kubeasz/clusters/k8s-cluster1/hosts'
2022-11-10 12:46:35 INFO next steps 2: to config '/etc/kubeasz/clusters/k8s-cluster1/config.yml'

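As the output suggests, edit '/etc/kubeasz/clusters/k8s-cluster1/hosts' and '/etc/kubeasz/clusters/k8s-cluster1/config.yml': adjust the hosts file to match the node plan above, along with
the other main cluster-level options; settings for the remaining cluster components live in config.yml.


Create a second cluster, k8s-cluster2, the same way
root@k8s-deploy:/etc/kubeasz# ./ezctl new k8s-cluster2

The rest of this article deploys k8s-cluster2

3. Edit the hosts file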

root@k8s-deploy:/etc/kubeasz# cd /etc/kubeasz/clusters/k8s-cluster2/
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# ll
total 20
drwxr-xr-x 2 root root 4096 Nov 10 12:53 ./
drwxr-xr-x 4 root root 4096 Nov 10 12:50 ../
-rw-r--r-- 1 root root 6311 Nov 10 12:50 config.yml
-rw-r--r-- 1 root root 1744 Nov 10 12:50 hosts
The hosts file parameters are explained below, with some of them modified
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# cat hosts


# 'etcd' cluster should have odd member(s) (1,3,5,...)
[etcd]        # etcd nodes
172.31.7.106
172.31.7.107
172.31.7.108

# master node(s)
[kube_master]
172.31.7.101
172.31.7.102

# work node(s)
[kube_node]
172.31.7.111
172.31.7.112

# [optional] harbor server, a private docker registry
# 'NEW_INSTALL': 'true' to install a harbor server; 'false' to integrate with existed one
[harbor]       # Harbor server; I installed mine separately, so this section is left unused
#192.168.1.8 NEW_INSTALL=false

# [optional] loadbalance for accessing k8s from outside
[ex_lb]
#192.168.1.6 LB_ROLE=backup EX_APISERVER_VIP=192.168.1.250 EX_APISERVER_PORT=8443
#192.168.1.7 LB_ROLE=master EX_APISERVER_VIP=192.168.1.250 EX_APISERVER_PORT=8443

# [optional] ntp server for the cluster
[chrony]
#192.168.1.1

[all:vars]
# --------- Main Variables ---------------
# Secure port for apiservers
SECURE_PORT="6443"     # API server port

# Cluster container-runtime supported: docker, containerd
# if k8s version >= 1.24, docker is not supported
CONTAINER_RUNTIME="containerd"   # container runtime to use: containerd

# Network plugins supported: calico, flannel, kube-router, cilium, kube-ovn
CLUSTER_NETWORK="calico"        # network plugin: calico

# Service proxy mode of kube-proxy: 'iptables' or 'ipvs'
PROXY_MODE="ipvs"            # kube-proxy implements Services in ipvs mode

# K8S Service CIDR, not overlap with node(host) networking
SERVICE_CIDR="10.100.0.0/16"       # Service address range

# Cluster CIDR (Pod CIDR), not overlap with node(host) networking
CLUSTER_CIDR="10.200.0.0/16"       # Pod address range

# NodePort Range
NODE_PORT_RANGE="30000-62767"       # range of ports available to NodePort Services; it can be widened

# Cluster DNS Domain
CLUSTER_DNS_DOMAIN="cluster.local"       # domain suffix of the cluster; the default is fine

# -------- Additional Variables (don't change the default value right now) ---
# Binaries Directory
bin_dir="/usr/local/bin"            # directory the binaries are copied to on each node

# Deploy Directory (kubeasz workspace)
base_dir="/etc/kubeasz"            # deploy directory; the default path is fine

# Directory for a specific cluster
cluster_dir="{{ base_dir }}/clusters/k8s-cluster2"    # directory holding this cluster's files (default)

# CA and other components cert/key Directory    # where certificates are stored
ca_dir="/etc/kubernetes/ssl"

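The main changes in the file above:
1. Set the IP addresses of the etcd, master, and worker hosts
2. Set the Service CIDR and the Pod CIDR
3. Adjust the NodePort range. The default 30000-32767 is usually enough, but a wider range is fine too
4. The binary directory can be /usr/bin, /bin, or /usr/local/bin, as long as it is on the system PATH; I use /usr/local/bin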
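
The comments in the hosts file require that SERVICE_CIDR and CLUSTER_CIDR do not overlap with the node network. A rough pre-flight check is sketched below in plain shell arithmetic. The helper names (ip_to_int, cidr_bounds, cidrs_overlap) are made up for this illustration, and 172.31.7.0/24 is only an assumed node subnet, since the article does not state the netmask.

```shell
#!/usr/bin/env bash
# Sketch: detect overlapping IPv4 CIDR ranges before deploying.
# All function names here are hypothetical helpers, not kubeasz commands.

# Convert a dotted-quad address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "first last" (as integers) for a CIDR such as 10.100.0.0/16.
cidr_bounds() {
  local base bits size
  base="$(ip_to_int "${1%/*}")"
  bits="${1#*/}"
  size=$(( 1 << (32 - bits) ))
  # Round the address down to the start of its block.
  echo "$(( base / size * size )) $(( base / size * size + size - 1 ))"
}

# Succeed (exit 0) when the two CIDRs share at least one address.
cidrs_overlap() {
  local a b
  a="$(cidr_bounds "$1")"
  b="$(cidr_bounds "$2")"
  # Two ranges overlap when each one starts before the other ends.
  [ "${a% *}" -le "${b#* }" ] && [ "${b% *}" -le "${a#* }" ]
}

NODE_NET="172.31.7.0/24"   # assumption: netmask of the node network
for cidr in 10.100.0.0/16 10.200.0.0/16; do
  if cidrs_overlap "$cidr" "$NODE_NET"; then
    echo "$cidr overlaps the node network $NODE_NET"
  else
    echo "$cidr does not overlap the node network $NODE_NET"
  fi
done
```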

4. Change the pause image

First, a look at some key parameters in config.yml; they are modified later

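CA_EXPIRY: "876000h"            # validity period of the CA certificate, in hours
CERT_EXPIRY: "438000h"          # validity period of issued certificates, in hours

CLUSTER_NAME: "cluster1"        # cluster name

ETCD_DATA_DIR: "/var/lib/etcd"    # etcd data directory

SANDBOX_IMAGE: "easzlab.io.local:5000/easzlab/pause:3.7"     # base (sandbox) container image used to initialize each pod

# Certificate settings for the k8s master nodes; multiple IPs and domain names can be added (for example a public IP and domain)
MASTER_CERT_HOSTS:
  - "10.1.1.1"
  - "k8s.easzlab.io"

MAX_PODS: 110     # maximum number of pods per node

CALICO_IPV4POOL_IPIP: "Always"    # Whether to enable IPIP. When enabled, Calico runs as an overlay network that encapsulates container traffic inside the host network. This costs a little performance, though throughput stays high. For more performance, change Always to off, but off requires all nodes to be in the same subnet and rules out cross-subnet traffic. In production, keep Always so the cluster can be expanded later

CALICO_NETWORKING_BACKEND: "brid"        # Calico networking backend. The default brid is fine, but public clouds do not support it and need vxlan instead

dns_install: "yes"        # install CoreDNS automatically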

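The parameter to pay attention to above is SANDBOX_IMAGE: "easzlab.io.local:5000/easzlab/pause:3.7"

Change the pause image (optional; if the nodes can reach the internet to download images, you can leave it unchanged)


(The next step is optional. With it, nodes pull the image from the local Harbor; without it, they pull from the internet.)
To make pulling pause:3.7 safe and fast, I upload the image to the local Harbor server
(I do this to mimic a production environment, which may have no internet access. Choose according to your situation.)

The image already exists locally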
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# docker images | grep pause
easzlab/pause                                        3.7       221177c6082a   8 months ago    711kB
easzlab.io.local:5000/easzlab/pause                  3.7       221177c6082a   8 months ago    711kB

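The deploy node must first be set up to trust the Harbor certificate so it can log in to the Harbor server; after that the image can be pushed
1. Create the certificate directory
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2#  mkdir /etc/docker/certs.d/harbor.magedu.net -p

2. Copy the certificate file from the Harbor server to the deploy node with scp
root@harbor:/apps/harbor# scp /apps/harbor/certs/magedu.net.crt  172.31.7.110:/etc/docker/certs.d/harbor.magedu.net

3. Log in to the Harbor server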
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# docker login harbor.magedu.net
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

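4. Retag the image
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# docker tag easzlab/pause:3.7 harbor.magedu.net/baseimages/pause:3.7

5. Push the image to Harbor
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# docker push harbor.magedu.net/baseimages/pause:3.7

Check in Harbor that the image was uploaded successfully

5. Edit the config.yml file

Edit the configuration file

1. The certificate validity periods can be made longer. The defaults are already long enough, so this is optional
CA_EXPIRY: "876000h"
CERT_EXPIRY: "438000h"

2. Replace the pause image with the one just pushed to the local Harbor server
# [containerd] base container image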
SANDBOX_IMAGE: "harbor.magedu.net/baseimages/pause:3.7"

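After switching the image, every other node needs a hosts entry for the registry. Without one, the nodes cannot resolve harbor.magedu.net to the address of the Harbor server and cannot pull the image
Add the Harbor hostname to /etc/hosts on all three master nodes and all three worker nodes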
# echo "172.31.7.104 harbor.magedu.net" >> /etc/hosts
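
Running the echo line twice leaves a duplicate entry in /etc/hosts. A small guard avoids that; the sketch below assumes nothing beyond grep and printf, and add_hosts_entry is a hypothetical helper name. The demo writes to a scratch file so it is safe to try anywhere.

```shell
#!/usr/bin/env bash
# Sketch: append "IP NAME" to a hosts file only when NAME is not mapped yet.
# add_hosts_entry is a hypothetical helper; the third argument defaults to /etc/hosts.
add_hosts_entry() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  # -F treats the name as a fixed string, -w matches it as a whole word.
  if grep -qwF -- "$name" "$file"; then
    echo "$name already present in $file"
  else
    printf '%s %s\n' "$ip" "$name" >> "$file"
    echo "added $name to $file"
  fi
}

# Demo against a scratch file; on a real node omit the third argument.
demo_hosts="$(mktemp)"
add_hosts_entry 172.31.7.104 harbor.magedu.net "$demo_hosts"
add_hosts_entry 172.31.7.104 harbor.magedu.net "$demo_hosts"   # second call changes nothing
cat "$demo_hosts"
rm -f "$demo_hosts"
```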


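3. Add the VIP and domain name of the load balancer in front of the master nodes
# Certificate settings for the k8s master nodes; multiple IPs and domain names can be added (for example a public IP and domain)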
MASTER_CERT_HOSTS:
  - "172.31.7.188"
  - "k8s.easzlab.io"

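172.31.7.188 is the VIP configured earlier on the HAProxy server as the reverse proxy for the k8s master nodes

(In production, Kubernetes is managed through a load balancer rather than by connecting to a single master; the load balancer reverse-proxies to the three masters. Listing its IP and domain name here puts them into the certificate, so connections through that IP or domain are trusted. If they are missing, access fails with an error saying the certificate was not issued for that address or domain.)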


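4. Change the maximum number of pods per node
MAX_PODS: 500


5. dns_install: "no"    # CoreDNS is installed automatically by default ("yes"); I switch to manual installation

6. ENABLE_LOCAL_DNS_CACHE: false   # whether to enable the DNS cache; it is on by default, and I turn it off

7. metricsserver_install: "no"   # metrics server is installed automatically by default; set to "no" to install it manually

8. dashboard_install: "no"      # the dashboard is installed automatically by default; set to "no" to install it manually


That completes the changes to config.yml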

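6. Configure passwordless SSH and create the python symlink

The deploy node must be able to SSH into every node without a password, and each node needs a python symlink (kubeasz is built on ansible, which needs a Python interpreter to call)

1. Generate a key pair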
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:qu6yFUuaetAwUVtclD/fz40UtcEAqByepI8YovevfNs root@k8s-deploy
The key's randomart image is:
+---[RSA 3072]----+
| ....oo.  ....o  |
|.  o. .o .     o.|
| ..   =.+      .o|
|o. . . =o     .. |
|.+. = o So .   . |
|o o= + o  . . .  |
| oo.o .      + o |
| .oo....      + .|
|...=*+o.E        |
+----[SHA256]-----+
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# ls /root/.ssh/
authorized_keys  id_rsa  id_rsa.pub

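2. Copy the public key to every managed node to establish key-based trust (ansible connects to all nodes remotely to push packages, binaries, and so on)

If you do not mind the effort, copy it to each host by hand
# ssh-copy-id 172.31.7.101
# ssh-copy-id 172.31.7.102
...................

Or write a script and use sshpass to copy it in bulk
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# apt install sshpass

A script that copies the public key to all hosts and creates the symlink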
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# cat key.sh
#!/bin/bash
# Target hosts
IP="
172.31.7.101
172.31.7.102
172.31.7.103
172.31.7.111
172.31.7.112
172.31.7.113
172.31.7.106
172.31.7.107
172.31.7.108
172.31.7.110
"
for node in ${IP}; do
        # Replace NODE_PASSWORD with the root password of the nodes
        sshpass -p 'NODE_PASSWORD' ssh-copy-id -o StrictHostKeyChecking=no "${node}"
          echo "${node}: public key copied"

        # Create the python symlink that ansible expects
        ssh "${node}" ln -sv /usr/bin/python3 /usr/bin/python
          echo "${node}: /usr/bin/python symlink created"
done

Run the script
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# bash key.sh

Verify passwordless login. The deploy node must be able to SSH into every node without a password, otherwise ansible cannot work properly
root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# ssh 172.31.7.101 hostname
k8s-master1
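
Checking one node by hand does not prove the others work. The loop below is a rough sketch that tries every node non-interactively; check_ssh and the SSH_BIN override are hypothetical names introduced for this example. It relies on the standard ssh options BatchMode=yes (fail instead of prompting for a password) and ConnectTimeout.

```shell
#!/usr/bin/env bash
# Sketch: report which nodes accept key-based SSH from the deploy node.
# check_ssh is a hypothetical helper. SSH_BIN can be pointed at a stub for testing.
SSH_BIN="${SSH_BIN:-ssh}"

check_ssh() {
  local node failed=0
  for node in "$@"; do
    # BatchMode=yes makes ssh fail rather than ask for a password.
    if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
      echo "OK   $node"
    else
      echo "FAIL $node"
      failed=1
    fi
  done
  # Non-zero exit status when at least one node failed.
  return "$failed"
}

# Usage on the deploy node (node list as in key.sh):
#   check_ssh 172.31.7.101 172.31.7.102 172.31.7.103 172.31.7.111 172.31.7.112 172.31.7.113
```

Any node reported as FAIL still needs its key copied before running ezctl.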

III. Initializing the Cluster

1. Initialize the cluster

root@k8s-deploy:/etc/kubeasz/clusters/k8s-cluster2# cd ..
root@k8s-deploy:/etc/kubeasz/clusters# cd ..
List the installation steps
root@k8s-deploy:/etc/kubeasz# ./ezctl setup --help
Usage: ezctl setup <cluster> <step>
available steps:
    01  prepare            to prepare CA/certs & kubeconfig & other system settings
    02  etcd               to setup the etcd cluster
    03  container-runtime  to setup the container runtime(docker or containerd)
    04  kube-master        to setup the master nodes
    05  kube-node          to setup the worker nodes
    06  network            to setup the network plugin
    07  cluster-addon      to setup other useful plugins
    90  all                to run 01~07 all at once
    10  ex-lb              to install external loadbalance for accessing k8s from outside
    11  harbor             to install a new harbor server or to integrate with an existed one



Initialize the cluster
root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster2 01
ansible-playbook -i clusters/k8s-cluster2/hosts -e @clusters/k8s-cluster2/config.yml  

During initialization 172.31.7.102 reported ignored=2, and the error output included: apt does not have a stable CLI interface.  
Use with caution in scripts. \n\nE: dpkg was interrupted, you must manually run 'dpkg --configure -a' to correct the problem.
So I ran dpkg --configure -a on 172.31.7.102

Running the initialization again from the deploy node then succeeded on every host
root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster2 01
172.31.7.101               : ok=27   changed=9    unreachable=0    failed=0    skipped=113  rescued=0    ignored=0
172.31.7.102               : ok=27   changed=10   unreachable=0    failed=0    skipped=113  rescued=0    ignored=0
172.31.7.106               : ok=24   changed=6    unreachable=0    failed=0    skipped=116  rescued=0    ignored=0
172.31.7.107               : ok=24   changed=20   unreachable=0    failed=0    skipped=116  rescued=0    ignored=0                            
172.31.7.108               : ok=24   changed=20   unreachable=0    failed=0    skipped=116  rescued=0    ignored=0
172.31.7.111               : ok=26   changed=8    unreachable=0    failed=0    skipped=114  rescued=0    ignored=0
172.31.7.112               : ok=26   changed=8    unreachable=0    failed=0    skipped=114  rescued=0    ignored=0
localhost                  : ok=31   changed=21   unreachable=0    failed=0    skipped=13   rescued=0    ignored=0

Initialization runs ansible with the inventory file clusters/k8s-cluster2/hosts, the Kubernetes configuration
file clusters/k8s-cluster2/config.yml, and finally the playbook for the step, playbooks/01.prepare.yml

Open playbooks/01.prepare.yml to see what it does
vim playbooks/01.prepare.yml
# [optional] to synchronize system time of nodes with 'chrony'
- hosts:        # hosts lists the machines to be initialized
  - kube_master
  - kube_node
  - etcd
  - ex_lb
  - chrony
  roles:
  - { role: os-harden, when: "OS_HARDEN|bool" }
  - { role: chrony, when: "groups['chrony']|length > 0" }

# to create CA, kubeconfig, kube-proxy.kubeconfig etc.
- hosts: localhost
  roles:
  - deploy

# prepare tasks for all nodes
- hosts:
  - kube_master
  - kube_node
  - etcd
  roles:
  - prepare

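The hosts list defines the machines to be initialized. You can remove the ex_lb load balancer and the chrony time server groups from it,
because the playbook tries to connect to those servers and reports errors when it cannot. The errors do not affect initialization, so removing them is a matter of preference

2. Deploy etcd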

root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster2 02
ansible-playbook -i clusters/k8s-cluster2/hosts -e @clusters/k8s-cluster2/config.yml  playbooks/02.etcd.yml
2022-11-11 03:31:16 INFO cluster:k8s-cluster2 setup step:02 begins in 5s, press any key to abort:
PLAY RECAP ***********************************************************************************************************************************
172.31.7.106               : ok=10   changed=8    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
172.31.7.107               : ok=10   changed=9    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
172.31.7.108               : ok=10   changed=8    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

2.1. Check the etcd node status

root@etcd1:~# export NODE_IPS="172.31.7.106 172.31.7.107 172.31.7.108"
root@etcd1:~# for n in ${NODE_IPS};do etcdctl --endpoints=https://${n}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem endpoint health;done

root@etcd1:~# for n in ${NODE_IPS};do etcdctl --endpoints=https://${n}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem --write-out=table endpoint status;done

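3. Deploy the container runtime

Because the pause image in config.yml on the deploy node now points at the local Harbor,
deploying the container runtime pushes that setting to every node, and the pause image defined in each node's
/etc/containerd/config.toml is pulled from Harbor. At this point containerd on the nodes cannot yet
pull from Harbor, so some registry configuration is needed first

Edit the config.toml.j2 template on the deploy node; the deployment step then pushes it to every node
root@k8s-deploy:/etc/kubeasz# vim roles/containerd/templates/config.toml.j2
After the {% endif %} that ends on line 156, add the following lines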
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.magedu.net"]
          endpoint = ["https://harbor.magedu.net"]
        [plugins."io.containerd.grpc.v1.cri".registry.configs."harbor.magedu.net".tls]
          insecure_skip_verify = true
        [plugins."io.containerd.grpc.v1.cri".registry.configs."harbor.magedu.net".auth]
          username = "admin"
          password = "HARBOR_PASSWORD"


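What the configuration means:
       1. [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.magedu.net"]
           plugins: the plugin section
           io.containerd.grpc.v1.cri: settings for the CRI plugin
           registry.mirrors: registry mirror settings
           harbor.magedu.net: the domain name of the registry

       2. endpoint = ["https://harbor.magedu.net"]  the address of the registry server
       3. [plugins."io.containerd.grpc.v1.cri".registry.configs."harbor.magedu.net".tls]
          registry.configs holds the settings for this registry; this section is the TLS part
       4. insecure_skip_verify = true    skips certificate verification, because the certificate is self-signed

       5. [plugins."io.containerd.grpc.v1.cri".registry.configs."harbor.magedu.net".auth]
          username = "admin"
          password = "HARBOR_PASSWORD"
          the Harbor username and password can be supplied here (HARBOR_PASSWORD is a placeholder)

Deploy the container runtime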
root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster2 03

PLAY RECAP ****************************************************************************************************************************************************
172.31.7.101               : ok=11   changed=10   unreachable=0    failed=0    skipped=18   rescued=0    ignored=0
172.31.7.102               : ok=11   changed=10   unreachable=0    failed=0    skipped=15   rescued=0    ignored=0
172.31.7.111               : ok=11   changed=10   unreachable=0    failed=0    skipped=15   rescued=0    ignored=0
172.31.7.112               : ok=11   changed=10   unreachable=0    failed=0    skipped=15   rescued=0    ignored=0


After this step containerd is ready on every node, and the [containerd] base container image in each node's
/etc/containerd/config.toml should be the Harbor image set on the deploy node:
SANDBOX_IMAGE: "harbor.magedu.net/baseimages/pause:3.7"


If config.toml.j2 was not edited before running ./ezctl setup k8s-cluster2 03,
/etc/containerd/config.toml has to be edited by hand on every node

 vim /etc/containerd/config.toml
Add the following below endpoint = ["https://quay.mirrors.ustc.edu.cn"] on line 154
154           endpoint = ["https://quay.mirrors.ustc.edu.cn"]
155
156         [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.magedu.net"]
157           endpoint = ["https://harbor.magedu.net"]
158         [plugins."io.containerd.grpc.v1.cri".registry.configs."harbor.magedu.net".tls]
159           insecure_skip_verify = true
160         [plugins."io.containerd.grpc.v1.cri".registry.configs."harbor.magedu.net".auth]
161           username = "admin"
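162           password = "HARBOR_PASSWORD"

After a manual edit, restart containerd (systemctl restart containerd) so the new registry settings take effect.

Test by hand that the node can pull the image from Harbor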
root@k8s-master1:~# crictl pull  harbor.magedu.net/baseimages/pause:3.7
Image is up to date for sha256:221177c6082a88ea4f6240ab2450d540955ac6f4d5454f0e15751b653ebda165

4. Deploy the master nodes

root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster2 04

PLAY RECAP ********************************************************************************************************************************
172.31.7.101               : ok=55   changed=50   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
172.31.7.102               : ok=53   changed=46   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Step 04 runs the tasks in /etc/kubeasz/roles/kube-master/tasks/main.yml

The master nodes are now visible
root@k8s-deploy:/etc/kubeasz# kubectl get node
NAME           STATUS                     ROLES    AGE     VERSION
172.31.7.101   Ready,SchedulingDisabled   master   9m23s   v1.24.2
172.31.7.102   Ready,SchedulingDisabled   master   9m23s   v1.24.2

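Ready means the node is ready
SchedulingDisabled means scheduling is turned off: masters play the management role and handle container lifecycle management, so they do not run application containers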

root@k8s-deploy:/etc/kubeasz# kubectl get pod -A
No resources found
No resources are listed yet, but the API server can already be reached and queried

5. Deploy the worker nodes

root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster2 05

PLAY RECAP ********************************************************************************************************************************
172.31.7.111               : ok=35   changed=33   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
172.31.7.112               : ok=35   changed=33   unreachable=0    failed=0    skipped=0    rescued=0    ignored=0


Step 05 runs the tasks in /etc/kubeasz/roles/kube-node/tasks/main.yml

List the nodes
root@k8s-deploy:/etc/kubeasz# kubectl get node
NAME           STATUS                     ROLES    AGE   VERSION
172.31.7.101   Ready,SchedulingDisabled   master   19m   v1.24.2
172.31.7.102   Ready,SchedulingDisabled   master   19m   v1.24.2
172.31.7.111   Ready                      node     68s   v1.24.2
172.31.7.112   Ready                      node     68s   v1.24.2

root@k8s-deploy:/etc/kubeasz# ./ezctl --help
Usage: ezctl COMMAND [args]
-------------------------------------------------------------------------------------
Cluster setups:
    list                             to list all of the managed clusters
    checkout    <cluster>            to switch default kubeconfig of the cluster
    new         <cluster>            to start a new k8s deploy with name 'cluster'
    setup       <cluster>  <step>    to setup a cluster, also supporting a step-by-step way
    start       <cluster>            to start all of the k8s services stopped by 'ezctl stop'
    stop        <cluster>            to stop all of the k8s services temporarily
    upgrade     <cluster>            to upgrade the k8s cluster
    destroy     <cluster>            to destroy the k8s cluster
    backup      <cluster>            to backup the cluster state (etcd snapshot)
    restore     <cluster>            to restore the cluster state from backups
    start-aio                        to quickly setup an all-in-one cluster with 'default' settings

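If the deployment fails and the cause cannot be found, use the destroy command to tear down the whole cluster and deploy again

6. Deploy the network

Do not run ./ezctl setup k8s-cluster2 06 right away; a few things need explaining first
Step 06 applies the manifest template /etc/kubeasz/roles/calico/templates/calico-v3.19.yaml.j2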
root@k8s-deploy:/etc/kubeasz# ll roles/calico/templates/
total 64
drwxrwxr-x 2 root root  4096 Jul  3 12:51 ./
drwxrwxr-x 5 root root  4096 Jul  3 12:51 ../
-rw-rw-r-- 1 root root   180 Jul  3 12:37 bgp-default.yaml.j2
-rw-rw-r-- 1 root root   162 Jul  3 12:37 bgp-rr.yaml.j2
-rw-rw-r-- 1 root root   215 Jul  3 12:37 calico-csr.json.j2
-rw-rw-r-- 1 root root   263 Jul  3 12:37 calicoctl.cfg.j2
-rw-rw-r-- 1 root root 17400 Jul  3 12:37 calico-v3.15.yaml.j2
-rw-rw-r-- 1 root root 19009 Jul  3 12:37 calico-v3.19.yaml.j2
This environment uses the calico-v3.19.yaml.j2 template
It is the 3.19 version because config.yml defines
root@k8s-deploy:/etc/kubeasz# cat  clusters/k8s-cluster2/config.yml | grep "calico_ver"
calico_ver: "v3.19.4"


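(The image changes below are optional. Without them the default images in the template are used, which works as long as the machines can download from the internet. I
mimic a production environment that may have no internet access, so the images are pulled from the local Harbor.)
Four images in this file need changing. I upload the four images to the local Harbor server, and each node pulls them from there
root@k8s-deploy:/etc/kubeasz# vim  roles/calico/templates/calico-v3.19.yaml.j2
In the file, replace the four original images with these four
Image 1: harbor.magedu.net/baseimages/calico-cni:v3.19.4
Image 2: harbor.magedu.net/baseimages/calico-pod2daemon-flexvol:v3.19.4
Image 3: harbor.magedu.net/baseimages/calico-node:v3.19.4
Image 4: harbor.magedu.net/baseimages/calico-kube-controllers:v3.19.4


1. Push the calico/node:v3.19.4 image
Pushing fails unless the deploy node has been set up to authenticate with Harbor, which was done earlier
For details see: https://blog.csdn.net/weixin_46476452/article/details/127732870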

root@k8s-deploy:/etc/kubeasz# docker images | grep calico/node
calico/node                                          v3.19.4   172a034f7297   9 months ago    155MB
root@k8s-deploy:/etc/kubeasz# docker tag  calico/node:v3.19.4 harbor.magedu.net/baseimages/calico-node:v3.19.4
root@k8s-deploy:/etc/kubeasz# docker push harbor.magedu.net/baseimages/calico-node:v3.19.4


2. Push the calico/pod2daemon-flexvol:v3.19.4 image
root@k8s-deploy:/etc/kubeasz# docker images | grep calico/pod2
calico/pod2daemon-flexvol                            v3.19.4   054ddbbe5975   9 months ago    20MB
root@k8s-deploy:/etc/kubeasz# docker tag calico/pod2daemon-flexvol:v3.19.4 harbor.magedu.net/baseimages/calico-pod2daemon-flexvol:v3.19.4
root@k8s-deploy:/etc/kubeasz#
root@k8s-deploy:/etc/kubeasz# docker push harbor.magedu.net/baseimages/calico-pod2daemon-flexvol:v3.19.4

3. Push the calico/cni:v3.19.4 image
root@k8s-deploy:/etc/kubeasz# docker tag calico/cni:v3.19.4  harbor.magedu.net/baseimages/calico-cni:v3.19.4
root@k8s-deploy:/etc/kubeasz# docker push harbor.magedu.net/baseimages/calico-cni:v3.19.4

4. Push the calico/kube-controllers:v3.19.4 image
root@k8s-deploy:/etc/kubeasz# docker images | grep kube-controllers
calico/kube-controllers                                  v3.19.4   0db60d880d2d   9 months ago    60.6MB
root@k8s-deploy:/etc/kubeasz# docker tag calico/kube-controllers:v3.19.4 harbor.magedu.net/baseimages/calico-kube-controllers:v3.19.4
root@k8s-deploy:/etc/kubeasz# docker push harbor.magedu.net/baseimages/calico-kube-controllers:v3.19.4
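
The four tag-and-push pairs above follow one naming pattern, so they can be written as a loop. This is only a sketch: run and push_calico_images are hypothetical helper names, and DRY_RUN defaults to 1 so that the script prints the docker commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: retag and push the four Calico images in one loop.
# DRY_RUN=1 (the default here) only prints the docker commands.
DRY_RUN="${DRY_RUN:-1}"
REGISTRY="harbor.magedu.net/baseimages"
CALICO_VER="v3.19.4"

# Hypothetical wrapper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

push_calico_images() {
  local name
  for name in cni pod2daemon-flexvol node kube-controllers; do
    run docker tag "calico/${name}:${CALICO_VER}" "${REGISTRY}/calico-${name}:${CALICO_VER}"
    run docker push "${REGISTRY}/calico-${name}:${CALICO_VER}"
  done
}

push_calico_images
```

Set DRY_RUN=0 on the deploy node, after docker login, to actually tag and push.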

5. Deploy the network
root@k8s-deploy:/etc/kubeasz# ./ezctl setup k8s-cluster2 06
PLAY RECAP ********************************************************************************************************************************
172.31.7.101               : ok=13   changed=11   unreachable=0    failed=0    skipped=36   rescued=0    ignored=0
172.31.7.102               : ok=9    changed=8    unreachable=0    failed=0    skipped=22   rescued=0    ignored=0
172.31.7.111               : ok=9    changed=7    unreachable=0    failed=0    skipped=22   rescued=0    ignored=0
172.31.7.112               : ok=9    changed=7    unreachable=0    failed=0    skipped=22   rescued=0    ignored=0


The Calico pods are now running
root@k8s-deploy:/etc/kubeasz# kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS        AGE
kube-system   calico-kube-controllers-69f77c86d8-2bzzt   1/1     Running   0               4m7s
kube-system   calico-node-2ff75                          1/1     Running   0               4m7s
kube-system   calico-node-7hl8l                          1/1     Running   0               4m7s
kube-system   calico-node-bgdtx                          1/1     Running   0               4m7s
kube-system   calico-node-ck4mt                          1/1     Running   2 (3m18s ago)   4m7s

IV. Testing Cluster Network Connectivity

1. Create a namespace first
root@k8s-deploy:/etc/kubeasz# kubectl create ns myserver

2. Create a few pods in that namespace to verify cross-host communication
root@k8s-deploy:/etc/kubeasz# kubectl run net-test1 --image=centos:7.9.2009 -n myserver sleep 10000000
root@k8s-deploy:/etc/kubeasz# kubectl run net-test2 --image=centos:7.9.2009 -n myserver sleep 10000000
root@k8s-deploy:/etc/kubeasz# kubectl run net-test3 --image=centos:7.9.2009 -n myserver sleep 10000000


Check the pod status
root@k8s-deploy:/etc/kubeasz# kubectl get pod -o wide -n myserver
NAME        READY   STATUS    RESTARTS   AGE    IP               NODE           NOMINATED NODE   READINESS GATES
net-test1   1/1     Running   0          105s   10.200.104.3     172.31.7.112   <none>           <none>
net-test2   1/1     Running   0          79s    10.200.104.4     172.31.7.112   <none>           <none>
net-test3   1/1     Running   0          19s    10.200.166.130   172.31.7.111   <none>           <none>


Enter a container and test the network
root@k8s-deploy:/etc/kubeasz# kubectl exec -it net-test1 -n myserver -- bash
[root@net-test1 /]# ping 223.6.6.6
PING 223.6.6.6 (223.6.6.6) 56(84) bytes of data.
64 bytes from 223.6.6.6: icmp_seq=1 ttl=127 time=5.54 ms
64 bytes from 223.6.6.6: icmp_seq=2 ttl=127 time=5.20 ms
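The pod can reach the internet, so container traffic leaves through the host

Test connectivity inside the cluster
Ping the IP of net-test2; it is reachable too (net-test2 runs on the same node as net-test1, so pinging net-test3 at 10.200.166.130 is what exercises the cross-host path)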
[root@net-test1 /]# ping 10.200.104.4
PING 10.200.104.4 (10.200.104.4) 56(84) bytes of data.
64 bytes from 10.200.104.4: icmp_seq=1 ttl=63 time=0.078 ms
64 bytes from 10.200.104.4: icmp_seq=2 ttl=63 time=0.060 ms

Pinging a domain name fails because DNS is not installed yet. That is expected; DNS is deployed later
[root@net-test1 /]# ping www.baidu.com

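V. Scaling Out the Cluster

Adding master and worker nodes

Nodes are still added by running tasks from the deploy node, which joins them to the Kubernetes cluster and changes some configuration files along the way

Each worker node runs a local load balancer, implemented with nginx, that balances requests to the API servers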
root@node1:~# cat /etc/kube-lb/conf/kube-lb.conf
user root;
worker_processes 1;

error_log  /etc/kube-lb/logs/error.log warn;

events {
    worker_connections  3000;
}

stream {
    upstream backend {
        server 172.31.7.101:6443    max_fails=2 fail_timeout=3s;
        server 172.31.7.102:6443    max_fails=2 fail_timeout=3s;
    }

    server {
        listen 127.0.0.1:6443;
        proxy_connect_timeout 1s;
        proxy_pass backend;
    }
}
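This configuration file is generated dynamically and lists the API server addresses. When a node talks to the API server it does not
connect directly; it first connects to 127.0.0.1:6443, which forwards to backend. backend defines a
server group with two members, 172.31.7.101:6443 and 172.31.7.102:6443
If the master addresses change, ansible pushes the configuration file again


You can test whether the file is updated promptly when the masters change
Check the timestamp with ll -d, or record a checksum with md5sum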
root@node1:~# ll -d /etc/kube-lb/conf/kube-lb.conf
-rw-r--r-- 1 root root 403 Nov 12 06:41 /etc/kube-lb/conf/kube-lb.conf
root@node1:~# md5sum /etc/kube-lb/conf/kube-lb.conf
2ccb346362d4731654058ad828463f0e  /etc/kube-lb/conf/kube-lb.conf


1. Add master3 to the cluster
root@k8s-deploy:/etc/kubeasz# ./ezctl add-master k8s-cluster2 172.31.7.103
Check the cluster nodes
root@k8s-deploy:/etc/kubeasz# kubectl get node
NAME           STATUS                     ROLES    AGE    VERSION
172.31.7.101   Ready,SchedulingDisabled   master   145m   v1.24.2
172.31.7.102   Ready,SchedulingDisabled   master   145m   v1.24.2
172.31.7.103   Ready,SchedulingDisabled   master   10m    v1.24.2
172.31.7.111   Ready                      node     142m   v1.24.2
172.31.7.112   Ready                      node     143m   v1.24.2

2. Add node3 to the cluster
root@k8s-deploy:/etc/kubeasz# ./ezctl add-node k8s-cluster2 172.31.7.113
PLAY RECAP ***********************************************************************************************************************************
172.31.7.113               : ok=81   changed=75   unreachable=0    failed=0    skipped=169  rescued=0    ignored=0

3. Check the cluster nodes
root@k8s-deploy:/etc/kubeasz# kubectl get node
NAME           STATUS                        ROLES    AGE     VERSION
172.31.7.101   Ready,SchedulingDisabled      master   140m    v1.24.2
172.31.7.102   Ready,SchedulingDisabled      master   140m    v1.24.2
172.31.7.103   Ready,SchedulingDisabled      master   6m13s   v1.24.2
172.31.7.111   Ready                         node     138m    v1.24.2
172.31.7.112   Ready                         node     138m    v1.24.2
172.31.7.113   Ready                         node     20m     v1.24.2

The cluster's hosts file now lists the two new nodes automatically
root@k8s-deploy:/etc/kubeasz# head -18 clusters/k8s-cluster2/hosts
# master node(s)
[kube_master]
172.31.7.103
172.31.7.101
172.31.7.102

# work node(s)
[kube_node]
172.31.7.113
172.31.7.111
172.31.7.112

kube-lb.conf has been updated as well, and its checksum has changed
root@node1:~#  md5sum /etc/kube-lb/conf/kube-lb.conf
da4e88477e0451c5309c7a5971e95bed  /etc/kube-lb/conf/kube-lb.conf
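
The before/after comparison can be scripted instead of comparing the hashes by eye. A minimal sketch follows, assuming md5sum and awk are available on the node; file_changed is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's current MD5 with a previously recorded value.
# file_changed is a hypothetical helper; exit status 0 means the file changed.
file_changed() {
  local file="$1" old_sum="$2" new_sum
  # md5sum prints "<hash>  <path>"; keep only the hash.
  new_sum="$(md5sum "$file" | awk '{print $1}')"
  [ "$new_sum" != "$old_sum" ]
}

# Usage on a worker node, with the checksum recorded before adding the master:
#   if file_changed /etc/kube-lb/conf/kube-lb.conf 2ccb346362d4731654058ad828463f0e; then
#     echo "kube-lb.conf was regenerated"
#   fi
```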

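Use the VIP in front of the API servers to keep API access highly available

Change the server address in the kubeconfig to the load balancer VIP, 172.31.7.188:6443. It becomes the management entry point of the Kubernetes cluster, and all management requests go to the VIP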
vim /root/.kube/config
server: https://172.31.7.188:6443

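From now on every request goes through the load balancer to a backend API server. If one API server goes down, nothing breaks: the load balancer takes that master out of rotation and forwards to the healthy ones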
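
The kubeconfig edit can also be done non-interactively. The sketch below rewrites the server line with sed and keeps a backup copy; set_kubeconfig_server is a hypothetical helper name, and it assumes GNU sed and a kubeconfig whose server entries each sit on their own "server:" line.

```shell
#!/usr/bin/env bash
# Sketch: point every "server:" line of a kubeconfig at a new URL.
# set_kubeconfig_server is a hypothetical helper; a .bak copy of the file is kept.
set_kubeconfig_server() {
  local file="$1" url="$2"
  # Keep the original indentation and key, replace only the value.
  sed -i.bak -E "s#^([[:space:]]*server:[[:space:]]*).*#\1${url}#" "$file"
}

# Usage on the deploy node:
#   set_kubeconfig_server /root/.kube/config https://172.31.7.188:6443
#   kubectl get node    # requests now go through the VIP
```

`kubectl config set-cluster <cluster-name> --server=https://172.31.7.188:6443` achieves the same result when the cluster name in the kubeconfig is known.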

That covers the full deployment and scale-out process. It is a lot of material; I hope it helps. Thanks!