Upgrading k8s from 1.19.10 to 1.25.12

References

Kubernetes Release Cycle
Patch Releases
Version Skew Policy
Upgrade A Cluster
Upgrading kubeadm clusters
Container Runtimes
Migrate Docker Engine nodes from dockershim to cri-dockerd
Well-Known Labels, Annotations and Taints

Overview

The cluster currently runs k8s 1.19.10 (CRI: Docker; the cgroup driver for both Docker and the kubelet is cgroupfs; OS: Ubuntu 20.04; kernel: 5.4.0). The plan is to upgrade it to 1.25.12. Current version information:

# kubectl get nodes -owide
NAME    STATUS   ROLES    AGE   VERSION    INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
node1   Ready    master   11d   v1.19.10   192.168.111.10   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7
node2   Ready    master   11d   v1.19.10   192.168.111.11   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7
node3   Ready    master   11d   v1.19.10   192.168.111.12   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7
node4   Ready    worker   11d   v1.19.10   192.168.111.21   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7
node5   Ready    worker   11d   v1.19.10   192.168.111.22   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7
node6   Ready    worker   11d   v1.19.10   192.168.111.23   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7

# kubectl describe nodes node1 node2 node3 | grep Taint
Taints:             node-role.kubernetes.io/master:NoSchedule
Taints:             node-role.kubernetes.io/master:NoSchedule
Taints:             node-role.kubernetes.io/master:NoSchedule

# kubectl get nodes -l node-role.kubernetes.io/master
NAME    STATUS   ROLES    AGE   VERSION
node1   Ready    master   11d   v1.19.10
node2   Ready    master   11d   v1.19.10
node3   Ready    master   11d   v1.19.10

# docker info | grep cgroup
 Cgroup Driver: cgroupfs
# cat /var/lib/kubelet/config.yaml  | grep cgroup
cgroupDriver: cgroupfs

Checking the upgrade plan

kubeadm does not support skipping minor versions, so the cluster has to be upgraded one minor version at a time. Run kubeadm upgrade plan first to see what is possible.

  • 1.19.10 -> 1.20.12

No minor version is skipped here, so the plan is generated correctly:

# ./kubeadm-1.20.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.19.10
[upgrade/versions] kubeadm version: v1.20.12
W0726 14:27:54.999693   69390 version.go:102] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable.txt": Get "https://storage.googleapis.com/kubernetes-release/release/stable.txt": dial tcp: lookup storage.googleapis.com on 192.168.111.10:53: server misbehaving
W0726 14:27:54.999891   69390 version.go:103] falling back to the local client version: v1.20.12
[upgrade/versions] Latest stable version: v1.20.12
[upgrade/versions] Latest stable version: v1.20.12
W0726 14:27:55.793421   69390 version.go:102] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.19.txt": Get "https://storage.googleapis.com/kubernetes-release/release/stable-1.19.txt": dial tcp: lookup storage.googleapis.com on 192.168.111.10:53: server misbehaving
W0726 14:27:55.793566   69390 version.go:103] falling back to the local client version: v1.20.12
[upgrade/versions] Latest version in the v1.19 series: v1.20.12
[upgrade/versions] Latest version in the v1.19 series: v1.20.12

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT        AVAILABLE
kubelet     6 x v1.19.10   v1.20.12

Upgrade to the latest version in the v1.19 series:

COMPONENT                 CURRENT    AVAILABLE
kube-apiserver            v1.19.10   v1.20.12
kube-controller-manager   v1.19.10   v1.20.12
kube-scheduler            v1.19.10   v1.20.12
kube-proxy                v1.19.10   v1.20.12
CoreDNS                   1.7.0      1.7.0
etcd                      3.4.13-0   3.4.13-0

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.20.12

_____________________________________________________________________


The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.

API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
kubelet.config.k8s.io     v1beta1           v1beta1             no
_____________________________________________________________________

  • 1.19.10 -> 1.21.12

This skips one minor version; the error output below explains the reason clearly enough:

# ./kubeadm-1.21.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[upgrade/config] FATAL: this version of kubeadm only supports deploying clusters with the control plane version >= 1.20.0. Current version: v1.19.10
To see the stack trace of this error execute with --v=5 or higher
  • 1.19.10 -> 1.24.12

1.24 removed dockershim, so the CRI also has to be dealt with...

# ./kubeadm-1.24.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0726 14:31:18.115581   73932 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/dockershim.sock". Please update your configuration!
[upgrade/config] FATAL: this version of kubeadm only supports deploying clusters with the control plane version >= 1.23.0. Current version: v1.19.10
To see the stack trace of this error execute with --v=5 or higher

Preparation

For various reasons a local private registry is used, so the required artifacts have to be prepared before the upgrade: the images, plus the kubectl, kubelet and kubeadm binaries. The binaries can be downloaded directly from Download Kubernetes, or installed via a package manager. The required images can be listed with kubeadm config images list.
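A minimal sketch of fetching one version's binaries directly (the URL pattern is the documented dl.k8s.io release layout; the version and architecture below are just examples, and in an offline environment the files would be downloaded elsewhere and copied over). Repeating this per version gives the layout shown below:

# VERSION=v1.20.12
# curl -L -o kubeadm-1.20.12 https://dl.k8s.io/release/${VERSION}/bin/linux/amd64/kubeadm
# curl -L -o kubectl-1.20.12 https://dl.k8s.io/release/${VERSION}/bin/linux/amd64/kubectl
# curl -L -o kubelet-1.20.12 https://dl.k8s.io/release/${VERSION}/bin/linux/amd64/kubelet
# chmod +x kubeadm-1.20.12 kubectl-1.20.12 kubelet-1.20.12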

# tree
.
├── k8s-1.20.12
│   ├── kubeadm-1.20.12
│   ├── kubectl-1.20.12
│   └── kubelet-1.20.12
├── k8s-1.21.12
│   ├── kubeadm-1.21.12
│   ├── kubectl-1.21.12
│   └── kubelet-1.21.12
├── k8s-1.22.12
│   ├── kubeadm-1.22.12
│   ├── kubectl-1.22.12
│   └── kubelet-1.22.12
├── k8s-1.23.12
│   ├── kubeadm-1.23.12
│   ├── kubectl-1.23.12
│   └── kubelet-1.23.12
├── k8s-1.24.12
│   ├── kubeadm-1.24.12
│   ├── kubectl-1.24.12
│   └── kubelet-1.24.12
└── k8s-1.25.12
    ├── kubeadm-1.25.12
    ├── kubectl-1.25.12
    └── kubelet-1.25.12

6 directories, 18 files

# ./kubeadm-1.20.12 config images list --kubernetes-version v1.20.12
k8s.gcr.io/kube-apiserver:v1.20.12
k8s.gcr.io/kube-controller-manager:v1.20.12
k8s.gcr.io/kube-scheduler:v1.20.12
k8s.gcr.io/kube-proxy:v1.20.12
k8s.gcr.io/pause:3.2
k8s.gcr.io/etcd:3.4.13-0
k8s.gcr.io/coredns:1.7.0
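All of these images also need to be mirrored into the private registry. A minimal sketch, assuming the registry prefix 172.30.3.150/k8s that appears later in this post (repeat for each image and each target version):

# REG=172.30.3.150/k8s
# docker pull k8s.gcr.io/kube-apiserver:v1.20.12
# docker tag  k8s.gcr.io/kube-apiserver:v1.20.12 ${REG}/k8s.gcr.io/kube-apiserver:v1.20.12
# docker push ${REG}/k8s.gcr.io/kube-apiserver:v1.20.12
### repeat for the remaining images listed above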

Upgrade

At this point, assume the required images and binaries are in place. The private registry address is configured via the imageRepository: field when running kubectl -n kube-system edit configmaps kubeadm-config. The upgrade is split into control-plane and worker nodes: on the first control-plane node run kubeadm upgrade apply <version>, on the remaining control-plane nodes run kubeadm upgrade node; on worker nodes run kubeadm upgrade node directly. Be careful not to mix up the kubeadm versions along the way.
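After editing, the ClusterConfiguration part of the kubeadm-config ConfigMap carries the private registry, roughly like this (a sketch only, assuming the 172.30.3.150/k8s registry used later in this post; only the imageRepository field matters here):

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
imageRepository: 172.30.3.150/k8s/k8s.gcr.io
kubernetesVersion: v1.19.10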

1.19.10 -> 1.20.12

  • Upgrading the first control-plane node

This upgrades kube-apiserver, kube-controller-manager, kube-scheduler and etcd on the node along with their certificates, as well as CoreDNS and kube-proxy. It also adds the node-role.kubernetes.io/control-plane label to the control-plane nodes, backs up the etcd data and /etc/kubernetes/manifests/ to /etc/kubernetes/tmp/, and updates the kubelet configuration.

### run the upgrade
# ./kubeadm-1.20.12 upgrade apply v1.20.12
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.20.12"
[upgrade/versions] Cluster version: v1.19.10
[upgrade/versions] kubeadm version: v1.20.12
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y #### interactive prompt: type y to confirm the upgrade
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.20.12"...
Static pod: kube-apiserver-node1 hash: 86f9d5eb415c02995e243dab09764902
Static pod: kube-controller-manager-node1 hash: e77fd5078bafd951d87c970393d28284
Static pod: kube-scheduler-node1 hash: 62dcf2eef35b837428c13af11ba57cf5
[upgrade/etcd] Upgrading to TLS for etcd
Static pod: etcd-node1 hash: 88a10dbea90896953d5bedb7da1eccce
。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.20.12". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

### the control-plane nodes now carry the node-role.kubernetes.io/control-plane label
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.19.10
node2   Ready    control-plane,master   11d   v1.19.10
node3   Ready    control-plane,master   11d   v1.19.10
node4   Ready    worker                 11d   v1.19.10
node5   Ready    worker                 11d   v1.19.10
node6   Ready    worker                 11d   v1.19.10

### backups of the original data
# tree /etc/kubernetes/tmp/
/etc/kubernetes/tmp/
├── kubeadm-backup-etcd-2023-07-26-15-01-26
│   └── etcd
│       └── member
│           ├── snap
│           │   ├── 0000000000000005-00000000003ff0df.snap
│           │   ├── 0000000000000005-00000000004017f0.snap
│           │   ├── 0000000000000005-0000000000403f01.snap
│           │   ├── 0000000000000005-0000000000406612.snap
│           │   ├── 0000000000000005-0000000000408d23.snap
│           │   └── db
│           └── wal
│               ├── 0000000000000035-00000000003b4801.wal
│               ├── 0000000000000036-00000000003c6ac4.wal
│               ├── 0000000000000037-00000000003d8c31.wal
│               ├── 0000000000000038-00000000003eae65.wal
│               ├── 0000000000000039-00000000003fd1a2.wal
│               ├── 0.tmp
│               └── 1.tmp
└── kubeadm-backup-manifests-2023-07-26-15-01-26
    ├── etcd.yaml
    ├── kube-apiserver.yaml
    ├── kube-controller-manager.yaml
    └── kube-scheduler.yaml

6 directories, 17 files

Drain the node, replace kubeadm, kubectl and kubelet, restart the kubelet, then uncordon the node:

# kubectl drain node1 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubectl-1.20.12 `which kubectl`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node1
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.20.12
node2   Ready    control-plane,master   11d   v1.19.10
node3   Ready    control-plane,master   11d   v1.19.10
node4   Ready    worker                 11d   v1.19.10
node5   Ready    worker                 11d   v1.19.10
node6   Ready    worker                 11d   v1.19.10
  • Upgrading the second control-plane node
# ./kubeadm-1.20.12 upgrade node
# kubectl drain node2 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubectl-1.20.12 `which kubectl`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node2
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.20.12
node2   Ready    control-plane,master   11d   v1.20.12
node3   Ready    control-plane,master   11d   v1.19.10
node4   Ready    worker                 11d   v1.19.10
node5   Ready    worker                 11d   v1.19.10
node6   Ready    worker                 11d   v1.19.10
  • Upgrading the third control-plane node
# ./kubeadm-1.20.12 upgrade node
# kubectl drain node3 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubectl-1.20.12 `which kubectl`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node3
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.20.12
node2   Ready    control-plane,master   11d   v1.20.12
node3   Ready    control-plane,master   11d   v1.20.12
node4   Ready    worker                 11d   v1.19.10
node5   Ready    worker                 11d   v1.19.10
node6   Ready    worker                 11d   v1.19.10
  • Upgrading the workers

It is best to upgrade the workers one at a time. The worker side is simpler: kubeadm upgrade node only updates the kubelet configuration.

# ./kubeadm-1.20.12 upgrade node

[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks
[preflight] Skipping prepull. Not a control plane node.
[upgrade] Skipping phase. Not a control plane node.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.

# kubectl drain node4 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node4
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.20.12
node2   Ready    control-plane,master   11d   v1.20.12
node3   Ready    control-plane,master   11d   v1.20.12
node4   Ready    worker                 11d   v1.20.12
node5   Ready    worker                 11d   v1.19.10
node6   Ready    worker                 11d   v1.19.10
  • Successfully upgraded to 1.20.12
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.20.12
node2   Ready    control-plane,master   11d   v1.20.12
node3   Ready    control-plane,master   11d   v1.20.12
node4   Ready    worker                 11d   v1.20.12
node5   Ready    worker                 11d   v1.20.12
node6   Ready    worker                 11d   v1.20.12

1.20.12 -> 1.21.12

Follow the same procedure as 1.19.10 -> 1.20.12.

1.21.12 -> 1.22.12

Follow the same procedure as 1.19.10 -> 1.20.12.

1.22.12 -> 1.23.12

Follow the same procedure as 1.19.10 -> 1.20.12.

1.23.12 -> 1.24.12

1.24 removed dockershim support. To minimize the impact, Docker stays as the container runtime here, with cri-dockerd replacing dockershim.

# ./kubeadm-1.24.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0726 16:36:49.267722  260221 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/dockershim.sock". Please update your configuration!
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.23.12
[upgrade/versions] kubeadm version: v1.24.12
。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。
  • Installing cri-dockerd

cri-dockerd defaults to "registry.k8s.io/pause:3.6" as its pause image, so after installing it remember to edit /lib/systemd/system/cri-docker.service and add the --pod-infra-container-image flag pointing at the pause image in the private registry.
install cri-dockerd

# cat /lib/systemd/system/cri-docker.service | grep cri-dockerd
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image=172.30.3.150/k8s/k8s.gcr.io/pause:3.8
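The installation itself follows the linked guide; on Ubuntu 20.04 it roughly comes down to something like the following (the package file name and version are only an example, the release package would be downloaded beforehand from the cri-dockerd project), plus a reload/restart after editing the unit file above:

# dpkg -i cri-dockerd_0.3.4.3-0.ubuntu-focal_amd64.deb    ### example release package
# systemctl daemon-reload
# systemctl enable --now cri-docker.socket
# systemctl restart cri-docker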
  • Configuring the kubelet

Migrate Docker Engine nodes from dockershim to cri-dockerd

# cat /var/lib/kubelet/kubeadm-flags.env 
KUBELET_KUBEADM_ARGS="--pod-infra-container-image=172.30.3.150/k8s/k8s.gcr.io/pause:3.8 --container-runtime-endpoint=unix:///var/run/cri-dockerd.sock"

# kubectl describe node | grep "kubeadm.alpha.kubernetes.io/cri-socket"
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
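The cri-socket annotation shown above does not change by itself; following the "Migrate Docker Engine nodes from dockershim to cri-dockerd" guide, it is updated on each node after kubeadm-flags.env has been pointed at the new socket. Roughly, per node (node1 here as an example):

# systemctl restart kubelet
# kubectl annotate node node1 --overwrite kubeadm.alpha.kubernetes.io/cri-socket=unix:///var/run/cri-dockerd.sock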

After this configuration, the warning "Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/dockershim.sock". Please update your configuration!" no longer appears:

# ./kubeadm-1.24.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.23.12
[upgrade/versions] kubeadm version: v1.24.12
  • Upgrading the cluster
    Follow the same procedure as 1.19.10 -> 1.20.12.
    One more issue encountered: 1.24 drops node-role.kubernetes.io/master in favor of node-role.kubernetes.io/control-plane, so the master label and taint have to be added back manually; otherwise workloads that previously selected or tolerated master nodes via that label/taint may no longer be scheduled as intended (see the sketch after the commands below).
# kubectl describe nodes node1 node2 node3 | grep Taint
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
# kubectl get nodes -l node-role.kubernetes.io/master
No resources found
# kubectl get nodes
NAME    STATUS   ROLES           AGE   VERSION
node1   Ready    control-plane   11d   v1.24.12
node2   Ready    control-plane   11d   v1.23.12
node3   Ready    control-plane   11d   v1.23.12
node4   Ready    worker          11d   v1.23.12
node5   Ready    worker          11d   v1.23.12
node6   Ready    worker          11d   v1.23.12
# kubectl label nodes node1 node2 node3 node-role.kubernetes.io/master=
node/node1 labeled
node/node2 labeled
node/node3 labeled
# kubectl taint nodes node1 node2 node3 node-role.kubernetes.io/control-plane-
node/node1 untainted
node/node2 untainted
node/node3 untainted
# kubectl taint nodes node1 node2 node3 node-role.kubernetes.io/master:NoSchedule --overwrite
node/node1 modified
node/node2 modified
node/node3 modified
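For illustration, this is the kind of workload the note above is about: a Pod pinned to master nodes with a nodeSelector and toleration on the old label/taint. Without restoring the master label, its nodeSelector no longer matches any node. This is a hypothetical example, not a manifest from this cluster:

apiVersion: v1
kind: Pod
metadata:
  name: pinned-to-master            # hypothetical workload name
spec:
  nodeSelector:
    node-role.kubernetes.io/master: ""
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: 172.30.3.150/k8s/example/app:latest   # placeholder image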

1.24.12 -> 1.25.12

Follow the same procedure as 1.19.10 -> 1.20.12.

# kubectl get nodes 
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.25.12
node2   Ready    control-plane,master   11d   v1.25.12
node3   Ready    control-plane,master   11d   v1.25.12
node4   Ready    worker                 11d   v1.25.12
node5   Ready    worker                 11d   v1.25.12
node6   Ready    worker                 11d   v1.25.12

Having gone through all of this, my guess is that upgrading further to 1.26 or 1.27 follows the same routine.
