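The cluster currently runs Kubernetes 1.19.10 (container runtime: Docker; both Docker and the kubelet use the cgroupfs cgroup driver; OS: Ubuntu 20.04, kernel 5.4.0). The plan is to upgrade it to 1.25.12. Current version information: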

# kubectl get nodes -owide
node1   Ready    master   11d   v1.19.10   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7
node2   Ready    master   11d   v1.19.10   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7
node3   Ready    master   11d   v1.19.10   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7
node4   Ready    worker   11d   v1.19.10   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7
node5   Ready    worker   11d   v1.19.10   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7
node6   Ready    worker   11d   v1.19.10   <none>        Ubuntu 20.04.4 LTS   5.4.0-153-generic   docker://20.10.7

# kubectl describe nodes node1 node2 node3 | grep Taint
Taints:             node-role.kubernetes.io/master:NoSchedule
Taints:             node-role.kubernetes.io/master:NoSchedule
Taints:             node-role.kubernetes.io/master:NoSchedule

# kubectl get nodes -l node-role.kubernetes.io/master
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    master   11d   v1.19.10
node2   Ready    master   11d   v1.19.10
node3   Ready    master   11d   v1.19.10

# docker info | grep cgroup
 Cgroup Driver: cgroupfs
# cat /var/lib/kubelet/config.yaml  | grep cgroup
cgroupDriver: cgroupfs


kubeadm does not support skipping minor versions, so the cluster has to be upgraded one minor version at a time. You can run kubeadm upgrade plan first to check:

  • 1.19.10 -> 1.20.12


# ./kubeadm-1.20.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.19.10
[upgrade/versions] kubeadm version: v1.20.12
W0726 14:27:54.999693   69390 version.go:102] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable.txt": Get "https://storage.googleapis.com/kubernetes-release/release/stable.txt": dial tcp: lookup storage.googleapis.com on server misbehaving
W0726 14:27:54.999891   69390 version.go:103] falling back to the local client version: v1.20.12
[upgrade/versions] Latest stable version: v1.20.12
[upgrade/versions] Latest stable version: v1.20.12
W0726 14:27:55.793421   69390 version.go:102] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.19.txt": Get "https://storage.googleapis.com/kubernetes-release/release/stable-1.19.txt": dial tcp: lookup storage.googleapis.com on server misbehaving
W0726 14:27:55.793566   69390 version.go:103] falling back to the local client version: v1.20.12
[upgrade/versions] Latest version in the v1.19 series: v1.20.12
[upgrade/versions] Latest version in the v1.19 series: v1.20.12

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
kubelet     6 x v1.19.10   v1.20.12

Upgrade to the latest version in the v1.19 series:

kube-apiserver            v1.19.10   v1.20.12
kube-controller-manager   v1.19.10   v1.20.12
kube-scheduler            v1.19.10   v1.20.12
kube-proxy                v1.19.10   v1.20.12
CoreDNS                   1.7.0      1.7.0
etcd                      3.4.13-0   3.4.13-0

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.20.12


The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.

kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
kubelet.config.k8s.io     v1beta1           v1beta1             no

  • 1.19.10 -> 1.21.12

This skips one minor version, and the error message states the reason clearly:

# ./kubeadm-1.21.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[upgrade/config] FATAL: this version of kubeadm only supports deploying clusters with the control plane version >= 1.20.0. Current version: v1.19.10
To see the stack trace of this error execute with --v=5 or higher

  • 1.19.10 -> 1.24.12

1.24 removes dockershim, so the CRI also has to be dealt with...

# ./kubeadm-1.24.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0726 14:31:18.115581   73932 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/dockershim.sock". Please update your configuration!
[upgrade/config] FATAL: this version of kubeadm only supports deploying clusters with the control plane version >= 1.23.0. Current version: v1.19.10
To see the stack trace of this error execute with --v=5 or higher
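
Since each hop may only raise the minor version by one, going from 1.19.10 to 1.25.12 means six separate upgrades. The small bash sketch below enumerates them. It assumes, as in this post, that every intermediate minor release uses the same patch number as the target and that the major version is 1; upgrade_hops is a hypothetical name introduced here.

```shell
#!/usr/bin/env bash
# upgrade_hops CURRENT TARGET  (hypothetical helper)
# Prints one "from -> to" line per minor-version hop, because kubeadm only
# upgrades a control plane by a single minor version at a time.
# Assumption: every intermediate release reuses the target's patch number
# (1.20.12, 1.21.12, ... as in this post) and the major version is 1.
upgrade_hops() {
  local cur=$1 target=$2
  local cur_minor target_minor patch m prev
  cur_minor=$(cut -d. -f2 <<<"$cur")
  target_minor=$(cut -d. -f2 <<<"$target")
  patch=$(cut -d. -f3 <<<"$target")
  prev=$cur
  for (( m = cur_minor + 1; m <= target_minor; m++ )); do
    echo "$prev -> 1.${m}.${patch}"
    prev="1.${m}.${patch}"
  done
}
```

For example, upgrade_hops 1.19.10 1.25.12 prints six lines, from "1.19.10 -> 1.20.12" to "1.24.12 -> 1.25.12", which matches the directory layout prepared below.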


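For various reasons a local private registry is used, so a few things have to be prepared before upgrading: the images, plus the kubectl, kubelet and kubeadm binaries. The binaries can be downloaded directly from the Download Kubernetes page or installed through the package manager. The required images can be listed with kubeadm config images list: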

# tree
├── k8s-1.20.12
│   ├── kubeadm-1.20.12
│   ├── kubectl-1.20.12
│   └── kubelet-1.20.12
├── k8s-1.21.12
│   ├── kubeadm-1.21.12
│   ├── kubectl-1.21.12
│   └── kubelet-1.21.12
├── k8s-1.22.12
│   ├── kubeadm-1.22.12
│   ├── kubectl-1.22.12
│   └── kubelet-1.22.12
├── k8s-1.23.12
│   ├── kubeadm-1.23.12
│   ├── kubectl-1.23.12
│   └── kubelet-1.23.12
├── k8s-1.24.12
│   ├── kubeadm-1.24.12
│   ├── kubectl-1.24.12
│   └── kubelet-1.24.12
└── k8s-1.25.12
    ├── kubeadm-1.25.12
    ├── kubectl-1.25.12
    └── kubelet-1.25.12

6 directories, 18 files
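
The binaries in the tree above could be fetched with a loop like the following. This is a sketch assuming the standard release URL layout https://dl.k8s.io/release/v<version>/bin/linux/<arch>/<tool>, amd64 nodes by default, and internet access from the download machine; fetch_k8s_bins, ARCH and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 prints each command instead of executing it.
run() { if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi; }

# fetch_k8s_bins VERSION...  (hypothetical helper)
# Downloads kubeadm/kubectl/kubelet for each VERSION into k8s-<version>/,
# named <tool>-<version> like the tree above. ARCH defaults to amd64.
fetch_k8s_bins() {
  local arch=${ARCH:-amd64} v t
  for v in "$@"; do
    run mkdir -p "k8s-${v}"
    for t in kubeadm kubectl kubelet; do
      # -f: fail on HTTP errors, -L: follow dl.k8s.io redirects
      run curl -fL -o "k8s-${v}/${t}-${v}" "https://dl.k8s.io/release/v${v}/bin/linux/${arch}/${t}"
    done
    run chmod +x "k8s-${v}/kubeadm-${v}" "k8s-${v}/kubectl-${v}" "k8s-${v}/kubelet-${v}"
  done
}
```

Usage: fetch_k8s_bins 1.20.12 1.21.12 1.22.12 1.23.12 1.24.12 1.25.12, then copy the directories to the nodes.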

# ./kubeadm-1.20.12 config images list --kubernetes-version v1.20.12
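
To copy those images into the private registry, one option is to list the image names twice, once with the default repository and once with --image-repository pointing at the private registry, and pair the two lists line by line. This lets kubeadm itself decide the expected names under the private registry. A sketch assuming Docker does the copy, both listings come out in the same order, and registry.example.local/k8s stands in for your registry; image_pairs, mirror_pairs and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 prints each command instead of executing it.
run() { if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi; }

# image_pairs KUBEADM VERSION REPO  (hypothetical helper)
# Prints "<default image> <image under REPO>" per line.
# Assumes both invocations list the images in the same order.
image_pairs() {
  local kubeadm=$1 version=$2 repo=$3
  paste -d ' ' \
    <("$kubeadm" config images list --kubernetes-version "$version") \
    <("$kubeadm" config images list --kubernetes-version "$version" --image-repository "$repo")
}

# mirror_pairs  (hypothetical helper)
# Reads "<src> <dst>" pairs on stdin and copies each image with docker.
mirror_pairs() {
  local src dst
  while read -r src dst; do
    [[ -z $src || -z $dst ]] && continue
    run docker pull "$src"
    run docker tag "$src" "$dst"
    run docker push "$dst"
  done
}
```

Usage, on a machine that can reach both the internet and the private registry: image_pairs ./kubeadm-1.20.12 v1.20.12 registry.example.local/k8s | mirror_pairs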


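From here on, assume the required images and binaries are ready. The private registry address is set in the imageRepository: field, via kubectl -n kube-system edit configmaps kubeadm-config. The upgrade is split into control-plane and worker nodes: on the first control-plane node run kubeadm upgrade apply <version>, and on the other control-plane nodes run kubeadm upgrade node; on workers, just run kubeadm upgrade node. Be careful not to mix up the kubeadm versions.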
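
The rule for which kubeadm subcommand runs on which node can be written down as a tiny helper, which may help avoid mixing up binaries across hops. A sketch assuming this post's file naming (./kubeadm-<version>); kubeadm_upgrade_cmd and show_image_repository are hypothetical names.

```shell
#!/usr/bin/env bash
# kubeadm_upgrade_cmd ROLE FIRST VERSION  (hypothetical helper)
#   ROLE:  control-plane | worker
#   FIRST: yes for the first control-plane node of this hop, otherwise no
# Prints the kubeadm command to run on that node for this hop.
kubeadm_upgrade_cmd() {
  local role=$1 first=$2 version=$3
  local bin="./kubeadm-${version}"
  if [[ $role == control-plane && $first == yes ]]; then
    echo "$bin upgrade apply v${version}"
  else
    echo "$bin upgrade node"
  fi
}

# Shows the imageRepository currently stored in the kubeadm-config ConfigMap.
show_image_repository() {
  kubectl -n kube-system get configmap kubeadm-config \
    -o jsonpath='{.data.ClusterConfiguration}' | grep imageRepository
}
```

For the first hop, kubeadm_upgrade_cmd control-plane yes 1.20.12 prints ./kubeadm-1.20.12 upgrade apply v1.20.12, and every other node gets ./kubeadm-1.20.12 upgrade node.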

1.19.10 -> 1.20.12

  • Upgrading the first control-plane node

This updates kube-apiserver, kube-controller-manager, kube-scheduler and etcd on this node together with their certificates, as well as coredns and kube-proxy. It also adds the node-role.kubernetes.io/control-plane label to the control-plane nodes, backs up the etcd data and /etc/kubernetes/manifests/ to /etc/kubernetes/tmp/, and updates the kubelet configuration.

### upgrade
# ./kubeadm-1.20.12 upgrade apply v1.20.12
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.20.12"
[upgrade/versions] Cluster version: v1.19.10
[upgrade/versions] kubeadm version: v1.20.12
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y #### interactive prompt: type y to confirm the upgrade
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.20.12"...
Static pod: kube-apiserver-node1 hash: 86f9d5eb415c02995e243dab09764902
Static pod: kube-controller-manager-node1 hash: e77fd5078bafd951d87c970393d28284
Static pod: kube-scheduler-node1 hash: 62dcf2eef35b837428c13af11ba57cf5
[upgrade/etcd] Upgrading to TLS for etcd
Static pod: etcd-node1 hash: 88a10dbea90896953d5bedb7da1eccce
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.20.12". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

### the control-plane nodes now carry the node-role.kubernetes.io/control-plane label
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.19.10
node2   Ready    control-plane,master   11d   v1.19.10
node3   Ready    control-plane,master   11d   v1.19.10
node4   Ready    worker                 11d   v1.19.10
node5   Ready    worker                 11d   v1.19.10
node6   Ready    worker                 11d   v1.19.10

### backups of the original data
# tree /etc/kubernetes/tmp/
├── kubeadm-backup-etcd-2023-07-26-15-01-26
│   └── etcd
│       └── member
│           ├── snap
│           │   ├── 0000000000000005-00000000003ff0df.snap
│           │   ├── 0000000000000005-00000000004017f0.snap
│           │   ├── 0000000000000005-0000000000403f01.snap
│           │   ├── 0000000000000005-0000000000406612.snap
│           │   ├── 0000000000000005-0000000000408d23.snap
│           │   └── db
│           └── wal
│               ├── 0000000000000035-00000000003b4801.wal
│               ├── 0000000000000036-00000000003c6ac4.wal
│               ├── 0000000000000037-00000000003d8c31.wal
│               ├── 0000000000000038-00000000003eae65.wal
│               ├── 0000000000000039-00000000003fd1a2.wal
│               ├── 0.tmp
│               └── 1.tmp
└── kubeadm-backup-manifests-2023-07-26-15-01-26
    ├── etcd.yaml
    ├── kube-apiserver.yaml
    ├── kube-controller-manager.yaml
    └── kube-scheduler.yaml

6 directories, 17 files

Drain the node, replace kubeadm, kubectl and kubelet, then restart the kubelet:

# kubectl drain node1 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubectl-1.20.12 `which kubectl`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node1
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.20.12
node2   Ready    control-plane,master   11d   v1.19.10
node3   Ready    control-plane,master   11d   v1.19.10
node4   Ready    worker                 11d   v1.19.10
node5   Ready    worker                 11d   v1.19.10
node6   Ready    worker                 11d   v1.19.10
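
The drain, replace, restart, uncordon sequence repeats on every node and every hop, so it can be wrapped in one function. A sketch under these assumptions: the new binaries sit in the current directory as <tool>-<version>, the installed ones live in BIN_DIR (default /usr/bin here; check which kubelet on your nodes), and kubectl on the machine running it has admin access. Unlike the transcript, it also runs chmod +x and systemctl daemon-reload. swap_binaries, BIN_DIR and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 prints each command instead of executing it.
run() { if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi; }

# swap_binaries NODE VERSION [WITH_KUBECTL: yes|no]  (hypothetical helper)
swap_binaries() {
  local node=$1 version=$2 with_kubectl=${3:-yes}
  local bin_dir=${BIN_DIR:-/usr/bin}
  local tools=(kubeadm kubelet) t
  if [[ $with_kubectl == yes ]]; then tools+=(kubectl); fi
  run kubectl drain "$node" --ignore-daemonsets
  for t in "${tools[@]}"; do
    run chmod +x "${t}-${version}"               # downloaded files may lack +x
    run mv "${t}-${version}" "${bin_dir}/${t}"  # same replacement as the mv steps above
  done
  run systemctl daemon-reload
  run systemctl restart kubelet
  run kubectl uncordon "$node"
}
```

On a control-plane node call it as swap_binaries node1 1.20.12; on a worker without kubectl, as swap_binaries node4 1.20.12 no.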

  • Upgrading the second control-plane node
# ./kubeadm-1.20.12 upgrade node
# kubectl drain node2 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubectl-1.20.12 `which kubectl`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node2
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.20.12
node2   Ready    control-plane,master   11d   v1.20.12
node3   Ready    control-plane,master   11d   v1.19.10
node4   Ready    worker                 11d   v1.19.10
node5   Ready    worker                 11d   v1.19.10
node6   Ready    worker                 11d   v1.19.10

  • Upgrading the third control-plane node
# ./kubeadm-1.20.12 upgrade node
# kubectl drain node3 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubectl-1.20.12 `which kubectl`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node3
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.20.12
node2   Ready    control-plane,master   11d   v1.20.12
node3   Ready    control-plane,master   11d   v1.20.12
node4   Ready    worker                 11d   v1.19.10
node5   Ready    worker                 11d   v1.19.10
node6   Ready    worker                 11d   v1.19.10

  • Upgrading the workers

Upgrading the workers one at a time is recommended. Workers are simpler: kubeadm upgrade node only updates the kubelet configuration.

# ./kubeadm-1.20.12 upgrade node

[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks
[preflight] Skipping prepull. Not a control plane node.
[upgrade] Skipping phase. Not a control plane node.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.

# kubectl drain node4 --ignore-daemonsets
# mv kubeadm-1.20.12 `which kubeadm`
# mv kubelet-1.20.12 `which kubelet`
# systemctl restart kubelet
# kubectl uncordon node4
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.20.12
node2   Ready    control-plane,master   11d   v1.20.12
node3   Ready    control-plane,master   11d   v1.20.12
node4   Ready    worker                 11d   v1.20.12
node5   Ready    worker                 11d   v1.19.10
node6   Ready    worker                 11d   v1.19.10

  • Upgrade to 1.20.12 completed
# kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.20.12
node2   Ready    control-plane,master   11d   v1.20.12
node3   Ready    control-plane,master   11d   v1.20.12
node4   Ready    worker                 11d   v1.20.12
node5   Ready    worker                 11d   v1.20.12
node6   Ready    worker                 11d   v1.20.12
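
Before starting the next hop, it is worth confirming that every kubelet reports the new version. A sketch that reads each node's kubeletVersion through kubectl's JSONPath output and lists the nodes still behind; node_versions and lagging_nodes are hypothetical names.

```shell
#!/usr/bin/env bash
# node_versions  (hypothetical helper)
# Prints "<node> <kubeletVersion>" for every node via kubectl JSONPath output.
node_versions() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.kubeletVersion}{"\n"}{end}'
}

# lagging_nodes WANT  (hypothetical helper)
# Reads "<node> <version>" lines on stdin, prints the nodes not at WANT and
# returns 1 if there are any, 0 if every node is already at WANT.
lagging_nodes() {
  local want=$1 name ver rc=0
  while read -r name ver; do
    [[ -z $name ]] && continue
    if [[ $ver != "$want" ]]; then
      echo "$name $ver"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: node_versions | lagging_nodes v1.20.12 && echo "all nodes are on v1.20.12"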

1.20.12 -> 1.21.12

Same as 1.19.10 -> 1.20.12.

1.21.12 -> 1.22.12

Same as 1.19.10 -> 1.20.12.

1.22.12 -> 1.23.12

Same as 1.19.10 -> 1.20.12.

1.23.12 -> 1.24.12

1.24 removes dockershim support. To minimize the impact, Docker is kept as the container runtime here, with cri-dockerd taking the place of dockershim.

# ./kubeadm-1.24.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0726 16:36:49.267722  260221 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/dockershim.sock". Please update your configuration!
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.23.12
[upgrade/versions] kubeadm version: v1.24.12

  • Install cri-dockerd

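cri-dockerd uses "registry.k8s.io/pause:3.6" as its default pause image, so after installing it remember to edit /lib/systemd/system/cri-docker.service and add the --pod-infra-container-image flag, pointing it at the pause image in the private registry.
Installation: see install cri-dockerd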

# cat /lib/systemd/system/cri-docker.service | grep cri-dockerd
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image=
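
Adding the flag can be scripted with GNU sed. A sketch: it appends the flag to the ExecStart line, or overwrites the value if the flag is already present (including the empty-valued form shown above). registry.example.local/k8s/pause:3.6 is a placeholder for the pause image in your private registry, and set_pause_image is a hypothetical name.

```shell
#!/usr/bin/env bash
# set_pause_image UNIT_FILE IMAGE  (hypothetical helper, needs GNU sed -i)
# Ensures the cri-dockerd ExecStart line carries --pod-infra-container-image=IMAGE.
set_pause_image() {
  local unit=$1 image=$2
  if grep -q -e '--pod-infra-container-image=' "$unit"; then
    # Flag already present (possibly with an empty value): overwrite its value.
    sed -i "s|--pod-infra-container-image=[^ ]*|--pod-infra-container-image=${image}|" "$unit"
  else
    # Flag missing: append it to the ExecStart line that starts cri-dockerd.
    sed -i "/^ExecStart=.*cri-dockerd/ s|\$| --pod-infra-container-image=${image}|" "$unit"
  fi
}
```

Usage: set_pause_image /lib/systemd/system/cri-docker.service registry.example.local/k8s/pause:3.6 && systemctl daemon-reload && systemctl restart cri-docker. Editing the packaged unit file in place mirrors what the post does; a systemd drop-in (systemctl edit cri-docker) would be an alternative that survives package upgrades, which this sketch does not cover.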

  • Configure the kubelet

Reference: Migrate Docker Engine nodes from dockershim to cri-dockerd

# cat /var/lib/kubelet/kubeadm-flags.env 
KUBELET_KUBEADM_ARGS="--pod-infra-container-image= --container-runtime-endpoint=unix:///var/run/cri-dockerd.sock"

# kubectl describe node | grep "kubeadm.alpha.kubernetes.io/cri-socket"
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
                    kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/cri-dockerd.sock
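
The two changes above, the kubelet flag in kubeadm-flags.env and the node's kubeadm.alpha.kubernetes.io/cri-socket annotation, can be scripted per node. A sketch using GNU sed; set_cri_endpoint, annotate_cri_socket, CRI_DOCKERD_SOCK and DRY_RUN are hypothetical names. As the linked migration guide does, drain the node before switching its runtime endpoint and restart the kubelet afterwards.

```shell
#!/usr/bin/env bash
CRI_DOCKERD_SOCK=unix:///var/run/cri-dockerd.sock

# DRY_RUN=1 prints each command instead of executing it.
run() { if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi; }

# set_cri_endpoint ENV_FILE [SOCKET]  (hypothetical helper, needs GNU sed -i)
# Replaces an existing --container-runtime-endpoint value in kubeadm-flags.env,
# or appends the flag inside the KUBELET_KUBEADM_ARGS quotes if it is missing.
set_cri_endpoint() {
  local env_file=$1 sock=${2:-$CRI_DOCKERD_SOCK}
  if grep -q -e '--container-runtime-endpoint=' "$env_file"; then
    sed -i "s|--container-runtime-endpoint=[^ \"]*|--container-runtime-endpoint=${sock}|" "$env_file"
  else
    sed -i "s|^\(KUBELET_KUBEADM_ARGS=\"[^\"]*\)\"|\1 --container-runtime-endpoint=${sock}\"|" "$env_file"
  fi
}

# annotate_cri_socket NODE [SOCKET]  (hypothetical helper)
# Updates the cri-socket annotation that kubeadm reads for this node.
annotate_cri_socket() {
  local node=$1 sock=${2:-$CRI_DOCKERD_SOCK}
  run kubectl annotate node "$node" --overwrite "kubeadm.alpha.kubernetes.io/cri-socket=${sock}"
}
```

Usage on node1: set_cri_endpoint /var/lib/kubelet/kubeadm-flags.env && systemctl restart kubelet, then annotate_cri_socket node1 from a machine with admin kubectl access.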

After this configuration, kubeadm no longer prints the warning "Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/dockershim.sock". Please update your configuration!":

# ./kubeadm-1.24.12 upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.23.12
[upgrade/versions] kubeadm version: v1.24.12
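
  • Upgrade the cluster
    Same as 1.19.10 -> 1.20.12.
    One more issue came up: 1.24 drops node-role.kubernetes.io/master in favour of node-role.kubernetes.io/control-plane, so the master label and taint have to be added back manually. Otherwise, workloads that select nodes by that label or tolerate only that taint may no longer be scheduled the way they were.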
# kubectl describe nodes node1 node2 node3 | grep Taint
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
# kubectl get nodes -l node-role.kubernetes.io/master
No resources found
# kubectl get nodes
NAME    STATUS   ROLES           AGE   VERSION
node1   Ready    control-plane   11d   v1.24.12
node2   Ready    control-plane   11d   v1.23.12
node3   Ready    control-plane   11d   v1.23.12
node4   Ready    worker          11d   v1.23.12
node5   Ready    worker          11d   v1.23.12
node6   Ready    worker          11d   v1.23.12
# kubectl label nodes node1 node2 node3 node-role.kubernetes.io/master=
node/node1 labeled
node/node2 labeled
node/node3 labeled
# kubectl taint nodes node1 node2 node3 node-role.kubernetes.io/control-plane-
node/node1 untainted
node/node2 untainted
node/node3 untainted
# kubectl taint nodes node1 node2 node3 node-role.kubernetes.io/master:NoSchedule --overwrite
node/node1 modified
node/node2 modified
node/node3 modified
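
The three commands above can be wrapped into one re-runnable helper. The post does not show whether later kubeadm upgrades touch this label or taint again, so checking kubectl describe nodes after each hop and re-running the helper if needed is a reasonable precaution. restore_master_role and DRY_RUN are hypothetical names; the "|| true" keeps the taint removal from failing when the control-plane taint is already gone.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 prints each command instead of executing it.
run() { if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi; }

# restore_master_role NODE...  (hypothetical helper)
# Re-applies the legacy master label and taint, mirroring the commands above.
restore_master_role() {
  (( $# > 0 )) || { echo "usage: restore_master_role NODE..." >&2; return 2; }
  run kubectl label nodes "$@" node-role.kubernetes.io/master= --overwrite
  # kubectl errors out if the taint is already absent; ignore that case.
  run kubectl taint nodes "$@" node-role.kubernetes.io/control-plane- || true
  run kubectl taint nodes "$@" node-role.kubernetes.io/master:NoSchedule --overwrite
}
```

Usage: restore_master_role node1 node2 node3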

1.24.12 -> 1.25.12

Same as 1.19.10 -> 1.20.12.

# kubectl get nodes 
NAME    STATUS   ROLES                  AGE   VERSION
node1   Ready    control-plane,master   11d   v1.25.12
node2   Ready    control-plane,master   11d   v1.25.12
node3   Ready    control-plane,master   11d   v1.25.12
node4   Ready    worker                 11d   v1.25.12
node5   Ready    worker                 11d   v1.25.12
node6   Ready    worker                 11d   v1.25.12

Having gone through all of this, I'd guess that upgrading to 1.26 and 1.27 follows the same routine.

