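In some situations, for example when a Deployment is unstable (say it keeps crashing), you may need to roll it back. Kubernetes keeps the Deployment's rollout history so that you can roll back at any time. By default it retains up to 10 old revisions, and you can change that number by setting the revision history limit (.spec.revisionHistoryLimit).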
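As a minimal sketch of adjusting the revision history limit, the helper below patches .spec.revisionHistoryLimit on an existing Deployment with a merge patch. The function name set_history_limit and the value 5 are illustrative assumptions, not part of the original walkthrough.

```shell
# Hypothetical helper (not from the original article): keep only the
# given number of old revisions for a Deployment.
set_history_limit() {
  local name="$1" limit="$2"
  kubectl patch deployment "$name" --type merge \
    -p "{\"spec\":{\"revisionHistoryLimit\":${limit}}}"
}

# Example call (requires access to a cluster):
#   set_history_limit nginx-deployment 5
```

Note that a limit of 0 removes all old revisions, so a rollout could then no longer be undone.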
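Kubernetes creates a new Deployment revision if and only if the Deployment's .spec.template field changes (for example, when you change the container image). Other updates, such as changing the .spec.replicas field, do not create a new revision.
To make the rollback mechanism easier to follow, the small experiment below simulates a failed update.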
Simulate a failed update
- Suppose that while updating the Deployment you leave out a dot and type nginx:19.1 instead of nginx:1.9.1:
kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:19.1 --record=true
Output:
deployment.apps/nginx-deployment image updated
- The rollout gets stuck. Check its status with kubectl rollout status:
kubectl rollout status deployment.v1.apps/nginx-deployment
Output:
Waiting for deployment "nginx-deployment" rollout to finish: 1 out of 3 new replicas have been updated...
- Run kubectl get rs. You will see two old ReplicaSets (nginx-deployment-84df99548d and nginx-deployment-8bf4959b6) and one new ReplicaSet (nginx-deployment-6549cf8856):
kubectl get rs
Output:
NAME DESIRED CURRENT READY AGE
nginx-deployment-6549cf8856 1 1 0 35s
nginx-deployment-84df99548d 0 0 0 21h
nginx-deployment-8bf4959b6 3 3 3 6h11m
- Run kubectl get pods. The one Pod created by the new ReplicaSet is stuck in an image pull loop:
kubectl get pods
Output:
NAME READY STATUS RESTARTS AGE
nginx-deployment-6549cf8856-j88tr 0/1 ImagePullBackOff 0 79s
nginx-deployment-8bf4959b6-pxf77 1/1 Running 0 6h11m
nginx-deployment-8bf4959b6-w4r85 1/1 Running 0 6h11m
nginx-deployment-8bf4959b6-w8lgr 1/1 Running 0 6h11m
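The Deployment controller stops the bad rollout automatically and does not scale up the new ReplicaSet any further. How far the rollout gets before it stalls depends on the rollingUpdate parameters maxUnavailable and maxSurge, both of which default to 25%.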
- Run kubectl describe deployment to view the Deployment's details:
kubectl describe deployment
Output:
Name: nginx-deployment
Namespace: default
CreationTimestamp: Mon, 22 May 2023 19:51:32 +0800
Labels: app=nginx
Annotations: deployment.kubernetes.io/revision: 3
kubernetes.io/change-cause: kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:19.1 --record=true
Selector: app=nginx
Replicas: 3 desired | 1 updated | 4 total | 3 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=nginx
Containers:
nginx:
Image: nginx:19.1
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True ReplicaSetUpdated
OldReplicaSets: nginx-deployment-8bf4959b6 (3/3 replicas created)
NewReplicaSet: nginx-deployment-6549cf8856 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 2m4s deployment-controller Scaled up replica set nginx-deployment-6549cf8856 to 1
We have now simulated a failed update.
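To see why the rollout stalled with exactly one new Pod, it can help to work through the rolling update bounds. The sketch below assumes the documented rounding rules (a percentage maxSurge rounds up, a percentage maxUnavailable rounds down). The function name rolling_update_bounds is hypothetical.

```shell
# Hypothetical helper: compute rolling update bounds from percentages.
# Usage: rolling_update_bounds <replicas> <maxSurge %> <maxUnavailable %>
rolling_update_bounds() {
  local replicas="$1" surge_pct="$2" unavail_pct="$3"
  # maxSurge is rounded up to a whole Pod.
  local max_surge=$(( (replicas * surge_pct + 99) / 100 ))
  # maxUnavailable is rounded down to a whole Pod.
  local max_unavailable=$(( replicas * unavail_pct / 100 ))
  echo "maxSurge=${max_surge} maxUnavailable=${max_unavailable}"
  echo "max total Pods=$(( replicas + max_surge ))"
  echo "min available Pods=$(( replicas - max_unavailable ))"
}

# The Deployment in this experiment: 3 replicas, 25% / 25%.
rolling_update_bounds 3 25 25
```

With 3 replicas this gives one extra Pod and zero unavailable Pods, which matches the "3 desired | 1 updated | 4 total | 3 available" line in the describe output: the controller may add one new Pod, but it cannot remove an old Pod until the new one becomes ready.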
The steps below walk through resolving it.
Check the Deployment's rollout history
- List the Deployment's revisions:
kubectl rollout history deployment.v1.apps/nginx-deployment
Output:
deployment.apps/nginx-deployment
REVISION CHANGE-CAUSE
1 kubectl apply --filename=nginx-deployment.yaml --record=true
2 kubectl deployment.apps/nginx-deployment set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1 --record=true
3 kubectl set image deployment.v1.apps/nginx-deployment nginx=nginx:19.1 --record=true
CHANGE-CAUSE is copied from the Deployment's kubernetes.io/change-cause annotation at the time the revision is created.
You can specify the CHANGE-CAUSE message in any of the following ways:
- Annotate the Deployment:
kubectl annotate deployment.v1.apps/nginx-deployment kubernetes.io/change-cause="image updated to 1.9.1"
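- Append the --record flag when running a kubectl command such as kubectl apply, so that the command itself is saved as the change cause (recent kubectl releases mark --record as deprecated)
- Manually edit the Deployment's .metadata.annotations
- View the details of a specific revision: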
kubectl rollout history deployment.v1.apps/nginx-deployment --revision=2
Output:
deployment.apps/nginx-deployment with revision #2
Pod Template:
Labels: app=nginx
pod-template-hash=8bf4959b6
Annotations: kubernetes.io/change-cause: kubectl deployment.apps/nginx-deployment set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1 --record=true
Containers:
nginx:
Image: nginx:1.9.1
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts: <none>
Volumes: <none>
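Roll back to a previous revision
The history above shows that the previous revision (revision 2) is a working one, so we roll back to it.
- Roll back from the current revision to the previous one: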
kubectl rollout undo deployment.v1.apps/nginx-deployment
Output:
deployment.apps/nginx-deployment rolled back
Alternatively, use the --to-revision flag to roll back to a specific earlier revision:
kubectl rollout undo deployment.v1.apps/nginx-deployment --to-revision=2
Output:
deployment.apps/nginx-deployment rolled back
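The Deployment is now rolled back to the previous stable revision. Depending on the Kubernetes version, you may also see a DeploymentRollback event generated by the Deployment controller.
- Check that the rollback succeeded and the Deployment is running as expected: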
kubectl get deployment nginx-deployment
Output:
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-deployment 3/3 3 3 22h
- View the Deployment's details:
kubectl describe deployment nginx-deployment
Output:
Name: nginx-deployment
Namespace: default
CreationTimestamp: Mon, 22 May 2023 19:51:32 +0800
Labels: app=nginx
Annotations: deployment.kubernetes.io/revision: 6
kubernetes.io/change-cause: kubectl deployment.apps/nginx-deployment set image deployment.v1.apps/nginx-deployment nginx=nginx:1.9.1 --record=true
Selector: app=nginx
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=nginx
Containers:
nginx:
Image: nginx:1.9.1
Port: 80/TCP
Host Port: 0/TCP
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: nginx-deployment-8bf4959b6 (3/3 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 51m deployment-controller Scaled up replica set nginx-deployment-6549cf8856 to 1
Normal ScalingReplicaSet 108s deployment-controller Scaled down replica set nginx-deployment-6549cf8856 to 0
Normal ScalingReplicaSet 23s deployment-controller Scaled up replica set nginx-deployment-84df99548d to 1
Normal ScalingReplicaSet 15s deployment-controller Scaled down replica set nginx-deployment-84df99548d to 0
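The rollback and verification steps above could be combined into one helper, sketched below. The function name rollback_and_wait and the 120-second timeout are assumptions made for illustration. It relies only on kubectl rollout undo and kubectl rollout status, the latter with its --timeout flag.

```shell
# Hypothetical helper: roll back a Deployment and wait for the result.
# Usage: rollback_and_wait <deployment> [revision]
rollback_and_wait() {
  local name="$1" revision="${2:-}"
  if [ -n "$revision" ]; then
    # Roll back to an explicitly chosen revision.
    kubectl rollout undo "deployment/${name}" --to-revision="$revision" || return 1
  else
    # No revision given: roll back to the previous one.
    kubectl rollout undo "deployment/${name}" || return 1
  fi
  # Blocks until the rollout completes; returns non-zero on timeout.
  kubectl rollout status "deployment/${name}" --timeout=120s
}

# Example call (requires access to a cluster):
#   rollback_and_wait nginx-deployment 2
```

Because the function returns the exit status of kubectl rollout status, a script can tell a successful rollback apart from one that did not finish in time.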