How Does Deployment Work? (Part 1)
In the previous two articles, we walked through how ReplicaSet works: to keep the number of Pods at the desired count, the ReplicaSet controller keeps creating and deleting Pods. In short, to handle each ReplicaSet quickly and flexibly, it uses a queue:
- The producer side watches ReplicaSet and Pod change events and adds the corresponding ReplicaSet key to the queue
- The consumer side takes a ReplicaSet key from the queue, fetches the object from the cache, and performs the appropriate sync
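The write/consume split above can be sketched with a tiny stdlib-only queue. This is a hypothetical stand-in, not the real client-go workqueue (which also rate-limits and tracks in-flight items); it only keeps the two properties that matter here: FIFO order and deduplication of pending keys.

```go
package main

import (
	"fmt"
	"sync"
)

// keyQueue is a minimal FIFO of object keys with deduplication,
// loosely modeled on client-go's workqueue (no rate limiting here).
type keyQueue struct {
	mu    sync.Mutex
	items []string
	dirty map[string]bool
}

func newKeyQueue() *keyQueue {
	return &keyQueue{dirty: map[string]bool{}}
}

// Add enqueues a key unless it is already pending, so a burst of
// events for the same object collapses into a single sync.
func (q *keyQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.dirty[key] {
		return
	}
	q.dirty[key] = true
	q.items = append(q.items, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *keyQueue) Get() (key string, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return "", false
	}
	key = q.items[0]
	q.items = q.items[1:]
	delete(q.dirty, key)
	return key, true
}

func main() {
	q := newKeyQueue()
	// Producer: event handlers enqueue the owning controller's key.
	q.Add("default/rs-test")
	q.Add("default/rs-test") // duplicate event, collapsed
	q.Add("kube-system/core-rs")
	// Consumer: pop keys and "sync" them.
	for {
		key, ok := q.Get()
		if !ok {
			break
		}
		fmt.Println("syncing", key)
	}
}
```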
Since we already have ReplicaSet, can we use it directly to roll out a release?
Let's try it: create 3 Pods with a ReplicaSet, image version v1.
✗ kubectl get pod
NAME READY STATUS RESTARTS AGE
rs-test-6kvbb 1/1 Running 0 58s
rs-test-qts2s 1/1 Running 0 58s
rs-test-x2h7t 1/1 Running 0 58s
Then change the image version in the ReplicaSet manifest to v2. The Pods are not updated:
✗ kubectl get pod
NAME READY STATUS RESTARTS AGE
rs-test-6kvbb 1/1 Running 0 3m32s
rs-test-qts2s 1/1 Running 0 3m32s
rs-test-x2h7t 1/1 Running 0 3m32s
As we know, ReplicaSet only matches Pods by label; it does not inspect the image version. Now delete one instance and let the ReplicaSet recreate it: the new instance runs image v2, while the original two still run v1.
✗ kubectl get pod
NAME READY STATUS RESTARTS AGE
rs-test-6kvbb 1/1 Running 0 4m57s
rs-test-qts2s 1/1 Running 0 4m57s
rs-test-tccxq 1/1 Running 0 74s # deleted and recreated
So to update the image version with a bare ReplicaSet, the workflow would be:
- Update the image version in the ReplicaSet manifest, e.g. v1 -> v2
- Delete the Pods managed by the ReplicaSet one by one, according to some policy
- Wait for the ReplicaSet to bring up new Pod instances
- Wait until all old-version Pods are gone and all new-version Pods are running
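Purely as an illustration, those manual steps can be simulated with toy types (the pod struct and the names here are made up, not real Kubernetes API types):

```go
package main

import "fmt"

// pod is a toy stand-in for a real Pod: just a name and an image tag.
type pod struct {
	name  string
	image string
}

// rollByDeletion simulates the manual flow: the ReplicaSet template is
// assumed to be already bumped to newImage, so every deleted pod comes
// back on the new version. Pods are replaced one at a time.
func rollByDeletion(pods []pod, newImage string) []pod {
	for i := range pods {
		if pods[i].image == newImage {
			continue // already on the new version
		}
		// "Delete" the old pod; the ReplicaSet "recreates" it from the
		// updated template.
		fmt.Printf("deleting %s (%s), recreating on %s\n",
			pods[i].name, pods[i].image, newImage)
		pods[i] = pod{name: pods[i].name + "-new", image: newImage}
	}
	return pods
}

func main() {
	pods := []pod{{"rs-test-a", "v1"}, {"rs-test-b", "v1"}, {"rs-test-c", "v1"}}
	pods = rollByDeletion(pods, "v2")
	for _, p := range pods {
		fmt.Println(p.name, p.image)
	}
}
```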
This process is tedious and far from automated, so Kubernetes introduces Deployment to manage ReplicaSets and roll out releases gracefully.
Deployment
A Deployment provides declarative updates for Pods and ReplicaSets.
Reading the Source
Entry Point
As usual, we start from the Deployment controller's entry point. In controller-manager's NewControllerInitializers() we find the function that starts it:
func startDeploymentController(ctx ControllerContext) (http.Handler, bool, error) {
dc, err := deployment.NewDeploymentController(
ctx.InformerFactory.Apps().V1().Deployments(),
ctx.InformerFactory.Apps().V1().ReplicaSets(),
ctx.InformerFactory.Core().V1().Pods(),
ctx.ClientBuilder.ClientOrDie("deployment-controller"),
)
if err != nil {
return nil, true, fmt.Errorf("error creating Deployment controller: %v", err)
}
go dc.Run(int(ctx.ComponentConfig.DeploymentController.ConcurrentDeploymentSyncs), ctx.Stop)
return nil, true, nil
}
Three informers are passed in via the context: one for Deployments, one for ReplicaSets, and one for Pods, delivering events for each of the three resources. Looking at the constructor NewDeploymentController(), readers familiar with the ReplicaSet articles will find the structure familiar:
- Event handlers are registered on each of the three injected informers
- A sync handler is defined:
dc.syncHandler = dc.syncDeployment
- An enqueue function is defined (the rate-limiting queue itself is created in the struct initializer):
dc.enqueueDeployment = dc.enqueue
// NewDeploymentController creates a new DeploymentController.
func NewDeploymentController(dInformer appsinformers.DeploymentInformer, rsInformer appsinformers.ReplicaSetInformer, podInformer coreinformers.PodInformer, client clientset.Interface) (*DeploymentController, error) {
eventBroadcaster := record.NewBroadcaster()
eventBroadcaster.StartStructuredLogging(0)
eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: client.CoreV1().Events("")})
if client != nil && client.CoreV1().RESTClient().GetRateLimiter() != nil {
if err := ratelimiter.RegisterMetricAndTrackRateLimiterUsage("deployment_controller", client.CoreV1().RESTClient().GetRateLimiter()); err != nil {
return nil, err
}
}
dc := &DeploymentController{
client: client,
eventRecorder: eventBroadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "deployment-controller"}),
queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "deployment"),
}
dc.rsControl = controller.RealRSControl{
KubeClient: client,
Recorder: dc.eventRecorder,
}
dInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: dc.addDeployment,
UpdateFunc: dc.updateDeployment,
// This will enter the sync loop and no-op, because the deployment has been deleted from the store.
DeleteFunc: dc.deleteDeployment,
})
rsInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: dc.addReplicaSet,
UpdateFunc: dc.updateReplicaSet,
DeleteFunc: dc.deleteReplicaSet,
})
podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
DeleteFunc: dc.deletePod,
})
dc.syncHandler = dc.syncDeployment
dc.enqueueDeployment = dc.enqueue
dc.dLister = dInformer.Lister()
dc.rsLister = rsInformer.Lister()
dc.podLister = podInformer.Lister()
dc.dListerSynced = dInformer.Informer().HasSynced
dc.rsListerSynced = rsInformer.Informer().HasSynced
dc.podListerSynced = podInformer.Informer().HasSynced
return dc, nil
}
As with ReplicaSet, we split the Deployment walkthrough into two parts: this part covers what gets written into the queue, and the next covers what the consumer does with what it reads.
Event Handling
Deployment Add/Update/Delete
The handlers for Deployment add/update/delete events are very simple: on any change, the Deployment is enqueued directly.
func (dc *DeploymentController) addDeployment(obj interface{}) {
d := obj.(*apps.Deployment)
klog.V(4).InfoS("Adding deployment", "deployment", klog.KObj(d))
dc.enqueueDeployment(d)
}
func (dc *DeploymentController) updateDeployment(old, cur interface{}) {
oldD := old.(*apps.Deployment)
curD := cur.(*apps.Deployment)
klog.V(4).InfoS("Updating deployment", "deployment", klog.KObj(oldD))
dc.enqueueDeployment(curD)
}
func (dc *DeploymentController) deleteDeployment(obj interface{}) {
d, ok := obj.(*apps.Deployment)
if !ok {
tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
if !ok {
utilruntime.HandleError(fmt.Errorf("couldn't get object from tombstone %#v", obj))
return
}
d, ok = tombstone.Obj.(*apps.Deployment)
if !ok {
utilruntime.HandleError(fmt.Errorf("tombstone contained object that is not a Deployment %#v", obj))
return
}
}
klog.V(4).InfoS("Deleting deployment", "deployment", klog.KObj(d))
dc.enqueueDeployment(d)
}
enqueueDeployment()
As mentioned during initialization, dc.enqueueDeployment = dc.enqueue. Let's look at its declaration and the implementation of enqueue:
// DeploymentController is responsible for synchronizing Deployment objects stored
// in the system with actual running replica sets and pods.
type DeploymentController struct {
// To allow injection of syncDeployment for testing.
syncHandler func(dKey string) error
// used for unit testing
enqueueDeployment func(deployment *apps.Deployment)
// Deployments that need to be synced
queue workqueue.RateLimitingInterface
}
func (dc *DeploymentController) enqueue(deployment *apps.Deployment) {
key, err := controller.KeyFunc(deployment)
if err != nil {
utilruntime.HandleError(fmt.Errorf("couldn't get key for object %#v: %v", deployment, err))
return
}
dc.queue.Add(key)
}
There is no extra logic here: enqueue computes the object's key and adds it to the rate-limiting queue. For a detailed introduction to the queue, see "How Does ReplicaSet Work? (Part 1)".
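controller.KeyFunc produces a "namespace/name" string, so the queue stores lightweight keys rather than whole objects. A simplified stand-in (ignoring the error handling of the real cache.MetaNamespaceKeyFunc) might look like:

```go
package main

import (
	"fmt"
	"strings"
)

// metaNamespaceKey mimics cache.MetaNamespaceKeyFunc for namespaced
// objects: the queue stores "namespace/name", not the object itself.
func metaNamespaceKey(namespace, name string) string {
	if namespace == "" {
		return name // cluster-scoped objects have no namespace prefix
	}
	return namespace + "/" + name
}

// splitKey is the inverse, as used by the consumer side when it pops a
// key and needs to look the object up in the lister cache.
func splitKey(key string) (namespace, name string) {
	parts := strings.SplitN(key, "/", 2)
	if len(parts) == 1 {
		return "", parts[0]
	}
	return parts[0], parts[1]
}

func main() {
	key := metaNamespaceKey("default", "nginx-deploy")
	ns, name := splitKey(key)
	fmt.Println(key, ns, name)
}
```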
ReplicaSet Add
The handler for ReplicaSet Add events is mostly routine. Interestingly, the way DeploymentController handles ReplicaSet Add events closely mirrors the way ReplicaSetController handles Pod Add events.
First, as users of client-go will know, when the controller restarts, all existing ReplicaSets are replayed as Add events. So if a ReplicaSet already marked for deletion shows up (i.e. rs.DeletionTimestamp != nil), the handler simply routes it through the delete logic.
Otherwise, it looks up the ReplicaSet's OwnerReference (the metav1.GetControllerOf(rs) call). If the owner, some Deployment, is found, that Deployment is enqueued.
// addReplicaSet enqueues the deployment that manages a ReplicaSet when the ReplicaSet is created.
func (dc *DeploymentController) addReplicaSet(obj interface{}) {
rs := obj.(*apps.ReplicaSet)
if rs.DeletionTimestamp != nil {
// On a restart of the controller manager, it's possible for an object to
// show up in a state that is already pending deletion.
dc.deleteReplicaSet(rs)
return
}
// If it has a ControllerRef, that's all that matters.
if controllerRef := metav1.GetControllerOf(rs); controllerRef != nil {
d := dc.resolveControllerRef(rs.Namespace, controllerRef)
if d == nil {
return
}
klog.V(4).InfoS("ReplicaSet added", "replicaSet", klog.KObj(rs))
dc.enqueueDeployment(d)
return
}
// Otherwise, it's an orphan. Get a list of all matching Deployments and sync
// them to see if anyone wants to adopt it.
ds := dc.getDeploymentsForReplicaSet(rs)
if len(ds) == 0 {
return
}
klog.V(4).InfoS("Orphan ReplicaSet added", "replicaSet", klog.KObj(rs))
for _, d := range ds {
dc.enqueueDeployment(d)
}
}
Let's focus on the orphan case: when the ReplicaSet from the Add event has no owner, how does the controller list all matching Deployments to see whether any of them wants to adopt it?
// getDeploymentsForReplicaSet returns a list of Deployments that potentially
// match a ReplicaSet.
func (dc *DeploymentController) getDeploymentsForReplicaSet(rs *apps.ReplicaSet) []*apps.Deployment {
deployments, err := util.GetDeploymentsForReplicaSet(dc.dLister, rs)
if err != nil || len(deployments) == 0 {
return nil
}
// Because all ReplicaSet's belonging to a deployment should have a unique label key,
// there should never be more than one deployment returned by the above method.
// If that happens we should probably dynamically repair the situation by ultimately
// trying to clean up one of the controllers, for now we just return the older one
if len(deployments) > 1 {
// ControllerRef will ensure we don't do anything crazy, but more than one
// item in this list nevertheless constitutes user error.
klog.V(4).InfoS("user error! more than one deployment is selecting replica set",
"replicaSet", klog.KObj(rs), "labels", rs.Labels, "deployment", klog.KObj(deployments[0]))
}
return deployments
}
The GetDeploymentsForReplicaSet() call matches every Deployment's selector against the ReplicaSet's labels and filters the results:
- If no Deployment matches, the ReplicaSet really is an orphan, possibly created standalone; in any case, no existing Deployment created it.
- If multiple Deployments match, a log line warns that this is a user error, and all matching Deployments are subsequently enqueued.
- If exactly one matches, that is the ideal case: the Deployment adopts the ReplicaSet and is enqueued.
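The matching itself boils down to label-selector subset checks. A simplified sketch with made-up types (plain string maps stand in for real selectors, which also support set-based expressions):

```go
package main

import "fmt"

// deployment is a toy Deployment: a name plus an equality-based selector.
type deployment struct {
	name     string
	selector map[string]string
}

// selectorMatches reports whether every key/value in the selector is
// present in the labels (real selectors also support set expressions).
func selectorMatches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// deploymentsForLabels returns the Deployments whose selector matches
// the ReplicaSet's labels, mirroring getDeploymentsForReplicaSet:
// zero matches means orphan, more than one is flagged as user error.
func deploymentsForLabels(ds []deployment, rsLabels map[string]string) []deployment {
	var matched []deployment
	for _, d := range ds {
		if selectorMatches(d.selector, rsLabels) {
			matched = append(matched, d)
		}
	}
	if len(matched) > 1 {
		fmt.Println("user error: more than one deployment selects this replica set")
	}
	return matched
}

func main() {
	ds := []deployment{
		{"web", map[string]string{"app": "web"}},
		{"api", map[string]string{"app": "api"}},
	}
	rsLabels := map[string]string{"app": "web", "pod-template-hash": "abc"}
	for _, d := range deploymentsForLabels(ds, rsLabels) {
		fmt.Println("adopter candidate:", d.name)
	}
}
```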
ReplicaSet Update
When a ReplicaSet is updated, the handler finds the Deployment(s) managing it and wakes them up:
- If the ResourceVersion is unchanged, nothing actually changed, so nothing is done
- If the OwnerReference changed, the ReplicaSet has a new owner, so the previous Deployment is woken up to sync
- If the current OwnerReference is set, the owning Deployment is woken up to reconcile the ReplicaSet's change
- If there is no OwnerReference, the ReplicaSet is an orphan; matching Deployments are checked to see whether any of them wants to adopt it
// updateReplicaSet figures out what deployment(s) manage a ReplicaSet when the ReplicaSet
// is updated and wake them up. If the anything of the ReplicaSets have changed, we need to
// awaken both the old and new deployments. old and cur must be *apps.ReplicaSet
// types.
func (dc *DeploymentController) updateReplicaSet(old, cur interface{}) {
curRS := cur.(*apps.ReplicaSet)
oldRS := old.(*apps.ReplicaSet)
if curRS.ResourceVersion == oldRS.ResourceVersion {
// Periodic resync will send update events for all known replica sets.
// Two different versions of the same replica set will always have different RVs.
return
}
curControllerRef := metav1.GetControllerOf(curRS)
oldControllerRef := metav1.GetControllerOf(oldRS)
controllerRefChanged := !reflect.DeepEqual(curControllerRef, oldControllerRef)
if controllerRefChanged && oldControllerRef != nil {
// The ControllerRef was changed. Sync the old controller, if any.
if d := dc.resolveControllerRef(oldRS.Namespace, oldControllerRef); d != nil {
dc.enqueueDeployment(d)
}
}
// If it has a ControllerRef, that's all that matters.
if curControllerRef != nil {
d := dc.resolveControllerRef(curRS.Namespace, curControllerRef)
if d == nil {
return
}
klog.V(4).InfoS("ReplicaSet updated", "replicaSet", klog.KObj(curRS))
dc.enqueueDeployment(d)
return
}
// Otherwise, it's an orphan. If anything changed, sync matching controllers
// to see if anyone wants to adopt it now.
labelChanged := !reflect.DeepEqual(curRS.Labels, oldRS.Labels)
if labelChanged || controllerRefChanged {
ds := dc.getDeploymentsForReplicaSet(curRS)
if len(ds) == 0 {
return
}
klog.V(4).InfoS("Orphan ReplicaSet updated", "replicaSet", klog.KObj(curRS))
for _, d := range ds {
dc.enqueueDeployment(d)
}
}
}
ReplicaSet Delete
Nothing special here: when a ReplicaSet is deleted, the Deployment in its OwnerReference is woken up to do the corresponding processing.
// deleteReplicaSet enqueues the deployment that manages a ReplicaSet when
// the ReplicaSet is deleted. obj could be an *apps.ReplicaSet, or
// a DeletionFinalStateUnknown marker item.
func (dc *DeploymentController) deleteReplicaSet(obj interface{}) {
rs, ok := obj.(*apps.ReplicaSet)
// When a delete is dropped, the relist will notice a pod in the store not
// in the list, leading to the insertion of a tombstone object which contains
// the deleted key/value. Note that this value might be stale. If the ReplicaSet
// changed labels the new deployment will not be woken up till the periodic resync.
if !ok {
tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
if !ok {
utilruntime.HandleError(fmt.Errorf("couldn't get object from tombstone %#v", obj))
return
}
rs, ok = tombstone.Obj.(*apps.ReplicaSet)
if !ok {
utilruntime.HandleError(fmt.Errorf("tombstone contained object that is not a ReplicaSet %#v", obj))
return
}
}
controllerRef := metav1.GetControllerOf(rs)
if controllerRef == nil {
// No controller should care about orphans being deleted.
return
}
d := dc.resolveControllerRef(rs.Namespace, controllerRef)
if d == nil {
return
}
klog.V(4).InfoS("ReplicaSet deleted", "replicaSet", klog.KObj(rs))
dc.enqueueDeployment(d)
}
Pod Delete
This is the one unusual event handler in Deployment. In principle, Deployment is the controller of ReplicaSets and only needs to watch ReplicaSet behavior; it should not need to watch Pod deletions separately.
As the comment explains, this logic is needed when the Deployment's Strategy is set to Recreate. From the definition of Recreate, all existing Pods must be killed before new ones are created.
const (
// Kill all existing pods before creating new ones.
RecreateDeploymentStrategyType DeploymentStrategyType = "Recreate"
// Replace the old ReplicaSets by new one using rolling update i.e gradually scale down the old ReplicaSets and scale up the new one.
RollingUpdateDeploymentStrategyType DeploymentStrategyType = "RollingUpdate"
)
With the default RollingUpdate strategy, the Deployment only needs to control its ReplicaSets.
// deletePod will enqueue a Recreate Deployment once all of its pods have stopped running.
func (dc *DeploymentController) deletePod(obj interface{}) {
pod, ok := obj.(*v1.Pod)
// When a delete is dropped, the relist will notice a pod in the store not
// in the list, leading to the insertion of a tombstone object which contains
// the deleted key/value. Note that this value might be stale. If the Pod
// changed labels the new deployment will not be woken up till the periodic resync.
if !ok {
tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
if !ok {
utilruntime.HandleError(fmt.Errorf("couldn't get object from tombstone %#v", obj))
return
}
pod, ok = tombstone.Obj.(*v1.Pod)
if !ok {
utilruntime.HandleError(fmt.Errorf("tombstone contained object that is not a pod %#v", obj))
return
}
}
klog.V(4).InfoS("Pod deleted", "pod", klog.KObj(pod))
if d := dc.getDeploymentForPod(pod); d != nil && d.Spec.Strategy.Type == apps.RecreateDeploymentStrategyType {
// Sync if this Deployment now has no more Pods.
rsList, err := util.ListReplicaSets(d, util.RsListFromClient(dc.client.AppsV1()))
if err != nil {
return
}
podMap, err := dc.getPodMapForDeployment(d, rsList)
if err != nil {
return
}
numPods := 0
for _, podList := range podMap {
numPods += len(podList)
}
if numPods == 0 {
dc.enqueueDeployment(d)
}
}
}
When a Pod delete event arrives, the handler traces the Pod's OwnerReference to a ReplicaSet, then that ReplicaSet's OwnerReference to a Deployment. If one is found, the Pod is ultimately managed by a Deployment.
If, in addition, that Deployment's Strategy is set to Recreate, the handler counts the Pods still managed by the Deployment. New Pods may be created only once that count reaches 0, so only then is the Deployment enqueued for further processing.
Summary
The Deployment controller mainly does the following:
- Watch for changes: the Deployment controller registers three informers in controller-manager: one for Deployments, one for ReplicaSets, and one for Pods, delivering events for each resource. In pkg/controller/deployment/deployment_controller.go, NewDeploymentController() is initialized with these three informers to watch Deployment, ReplicaSet, and Pod events: ADD, UPDATE, and DELETE for the first two, and, as a special case, only DELETE for Pods, 3 + 3 + 1 = 7 handlers in total.
- React to changes: once the Deployment controller receives a change notification, it puts the change into a queue for later processing.
- Sync state: covered in Part 2.