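When an object has an OwnerReference set, deleting its owner also deletes the object. This is the work of the Kubernetes (k8s) garbage collection mechanism.
1. K8s garbage collection policies
k8s currently supports three deletion policies: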
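(1) Foreground cascading deletion: the owner object is not removed until all of its dependents have been deleted. When the owner is deleted, it first enters a "deletion in progress" state, in which:
- the object is still visible through the REST API (for example via kubectl or Kuboard)
- the object's deletionTimestamp field is set
- the object's metadata.finalizers contains the value foregroundDeletion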
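(2) Background cascading deletion: this policy is much simpler. The owner object is deleted immediately, and the garbage collector deletes its dependents in the background. It is much faster than foreground cascading deletion because the owner does not wait for its dependents to be deleted.
(3) Orphan: only the owner is removed from the cluster, and all of its dependents are left in place as "orphans".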
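A client selects one of these policies through the propagationPolicy field of the DeleteOptions body sent with the DELETE request. The sketch below only builds that JSON body. The deleteOptions struct is a local, trimmed-down stand-in for metav1.DeleteOptions, and deleteBody is a hypothetical helper written for this illustration, not part of any Kubernetes library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deleteOptions is a local stand-in for metav1.DeleteOptions (assumption:
// only the fields needed to show propagationPolicy are mirrored here).
type deleteOptions struct {
	Kind              string `json:"kind"`
	APIVersion        string `json:"apiVersion"`
	PropagationPolicy string `json:"propagationPolicy"`
}

// deleteBody renders the JSON body of a DELETE request for the given policy.
// It is a hypothetical helper used only for this example.
func deleteBody(policy string) (string, error) {
	b, err := json.Marshal(deleteOptions{
		Kind:              "DeleteOptions",
		APIVersion:        "v1",
		PropagationPolicy: policy,
	})
	return string(b), err
}

func main() {
	// The three values correspond to the three policies described above.
	for _, policy := range []string{"Foreground", "Background", "Orphan"} {
		body, err := deleteBody(policy)
		if err != nil {
			panic(err)
		}
		fmt.Println(body)
	}
}
```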
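Example: suppose there is a Deployment deployA whose ReplicaSet is rsA and whose Pod is podA.
(1) Foreground deletion: podA is deleted first, then rsA, then deployA. If the deletion of podA gets stuck, the deletion of rsA (and therefore of deployA) is stuck as well.
(2) Background deletion: deployA is deleted first, then rsA, then podA. Whether rsA and podA are deleted successfully has no effect on deployA.
(3) Orphan deletion: only deployA is deleted. rsA and podA are unaffected, except that deployA is no longer the owner of rsA.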
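The ordering in this example can be written down as a small toy model. It is not how the garbage collector is implemented; deletionOrder is a hypothetical function that only returns the order in which the objects of a single owner chain disappear under each policy.

```go
package main

import "fmt"

// deletionOrder returns the order in which objects in an owner chain are
// removed under each policy. chain is listed from the top-level owner down
// to the last dependent. This is a toy model, not the real GC algorithm.
func deletionOrder(policy string, chain []string) []string {
	if len(chain) == 0 {
		return nil
	}
	switch policy {
	case "Foreground":
		// Dependents go first; each owner waits for its dependents.
		out := make([]string, 0, len(chain))
		for i := len(chain) - 1; i >= 0; i-- {
			out = append(out, chain[i])
		}
		return out
	case "Background":
		// The owner goes first; the GC then walks down the chain.
		return append([]string(nil), chain...)
	case "Orphan":
		// Only the owner is deleted; dependents are left in place.
		return []string{chain[0]}
	}
	return nil
}

func main() {
	chain := []string{"deployA", "rsA", "podA"}
	for _, policy := range []string{"Foreground", "Background", "Orphan"} {
		fmt.Println(policy, deletionOrder(policy, chain))
	}
}
```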
2. GC source code analysis
Like deployController and rsController, GarbageCollectorController is one of the controllers in kube-controller-manager (kcm).
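GarbageCollectorController is started by startGarbageCollectorController. Its main logic is:
(1) Initialize the clients used to discover the resources in the cluster. We can ignore this for now.
(2) Obtain deletableResources and ignoredResources.
deletableResources: all resources that support the "delete", "list", and "watch" operations
ignoredResources: specified in the GarbageCollectorController config when kcm starts
(3) Initialize the garbageCollector object.
(4) Start the garbageCollector.
(5) Periodically sync the garbageCollector.
(6) Return the debug handler.
Each step from step (3) onward is examined in depth. Step (3) corresponds to section 2.1.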
func startGarbageCollectorController(ctx ControllerContext) (http.Handler, bool, error) {
	// 1. Initialize the clients
	if !ctx.ComponentConfig.GarbageCollectorController.EnableGarbageCollector {
		return nil, false, nil
	}

	gcClientset := ctx.ClientBuilder.ClientOrDie("generic-garbage-collector")
	discoveryClient := cacheddiscovery.NewMemCacheClient(gcClientset.Discovery())

	config := ctx.ClientBuilder.ConfigOrDie("generic-garbage-collector")
	metadataClient, err := metadata.NewForConfig(config)
	if err != nil {
		return nil, true, err
	}

	// 2. Obtain deletableResources and ignoredResources
	// Get an initial set of deletable resources to prime the garbage collector.
	deletableResources := garbagecollector.GetDeletableResources(discoveryClient)
	ignoredResources := make(map[schema.GroupResource]struct{})
	for _, r := range ctx.ComponentConfig.GarbageCollectorController.GCIgnoredResources {
		ignoredResources[schema.GroupResource{Group: r.Group, Resource: r.Resource}] = struct{}{}
	}

	// 3. NewGarbageCollector
	garbageCollector, err := garbagecollector.NewGarbageCollector(
		metadataClient,
		ctx.RESTMapper,
		deletableResources,
		ignoredResources,
		ctx.ObjectOrMetadataInformerFactory,
		ctx.InformersStarted,
	)
	if err != nil {
		return nil, true, fmt.Errorf("failed to start the generic garbage collector: %v", err)
	}

	// 4. Start the garbage collector.
	workers := int(ctx.ComponentConfig.GarbageCollectorController.ConcurrentGCSyncs)
	go garbageCollector.Run(workers, ctx.Stop)

	// 5. Periodically refresh the RESTMapper with new discovery information and sync
	// the garbage collector.
	go garbageCollector.Sync(gcClientset.Discovery(), 30*time.Second, ctx.Stop)

	// 6. Return the debug handler
	return garbagecollector.NewDebugHandler(garbageCollector), true, nil
}
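Step (2) defines deletableResources as every resource that supports "delete", "list", and "watch". The following is a minimal sketch of that kind of verb filter, not the body of garbagecollector.GetDeletableResources. The resource type, the supportsAll and deletableResources helpers, and the sample data in main are all hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for one discovered API resource.
type resource struct {
	Name  string
	Verbs []string
}

// supportsAll reports whether r supports every verb in required.
func supportsAll(r resource, required ...string) bool {
	have := make(map[string]struct{}, len(r.Verbs))
	for _, v := range r.Verbs {
		have[v] = struct{}{}
	}
	for _, v := range required {
		if _, ok := have[v]; !ok {
			return false
		}
	}
	return true
}

// deletableResources keeps only the resources the GC could manage:
// those that can be deleted, listed, and watched.
func deletableResources(all []resource) []string {
	var out []string
	for _, r := range all {
		if supportsAll(r, "delete", "list", "watch") {
			out = append(out, r.Name)
		}
	}
	return out
}

func main() {
	// Illustrative sample data, not real discovery output.
	discovered := []resource{
		{Name: "pods", Verbs: []string{"create", "delete", "get", "list", "watch"}},
		{Name: "bindings", Verbs: []string{"create"}},
	}
	fmt.Println(deletableResources(discovered))
}
```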
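2.1 Initializing the garbageCollector object
2.1.1 Structs contained in garbageCollector
Besides its own fields, garbageCollector relies on a few supporting structures:
attemptToDelete, attemptToOrphan: rate-limited work queues
uidToNode: a graph that caches dependency relationships. It is a map whose key is an object's UID and whose value is a node struct.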
type GarbageCollector struct {
	restMapper     resettableRESTMapper
	metadataClient metadata.Interface

	attemptToDelete workqueue.RateLimitingInterface
	attemptToOrphan workqueue.RateLimitingInterface

	dependencyGraphBuilder *GraphBuilder
	absentOwnerCache       *UIDCache
	workerLock             sync.RWMutex
}

// GraphBuilder: based on the events supplied by the informers, GraphBuilder updates
// uidToNode, a graph that caches the dependencies as we know, and enqueues
// items to the attemptToDelete and attemptToOrphan.
type GraphBuilder struct {
	restMapper meta.RESTMapper

	// each monitor corresponds to one kind of resource
	monitors    monitors
	monitorLock sync.RWMutex

	informersStarted <-chan struct{}
	stopCh           <-chan struct{}
	running          bool

	metadataClient metadata.Interface
	graphChanges   workqueue.RateLimitingInterface
	uidToNode      *concurrentUIDToNode

	attemptToDelete workqueue.RateLimitingInterface
	attemptToOrphan workqueue.RateLimitingInterface

	absentOwnerCache *UIDCache
	sharedInformers  controller.InformerFactory
	ignoredResources map[schema.GroupResource]struct{}
}

type concurrentUIDToNode struct {
	uidToNodeLock sync.RWMutex
	uidToNode     map[types.UID]*node
}

type node struct {
	identity objectReference

	dependentsLock sync.RWMutex
	dependents     map[*node]struct{} // all dependents of this node

	deletingDependents     bool
	deletingDependentsLock sync.RWMutex

	beingDeleted     bool
	beingDeletedLock sync.RWMutex

	virtual     bool
	virtualLock sync.RWMutex

	owners []metav1.OwnerReference // all owners of this node
}
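For example:
Suppose the cluster contains three objects: deployA, rsA, and podA.
The monitors watch these three kinds of resources for changes. Each change is turned into an event on the graphChanges queue, and GraphBuilder then enqueues items to attemptToDelete or attemptToOrphan as appropriate.
GraphBuilder is also responsible for building the graph. In this case the graph contains:
Node1 (key=deployA.uid): owners is empty, dependents=rsA.
Node2 (key=rsA.uid): owners=deployA, dependents=podA.
Node3 (key=podA.uid): owners=rsA, dependents is empty.
Each node also carries key fields such as beingDeleted and deletingDependents. With this graph, the GC can easily carry out deletion under any of the policies.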
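The three-node graph above can be reproduced with a simplified version of the node struct. This is a sketch under stated assumptions: the locks and the beingDeleted, deletingDependents, and virtual flags are omitted, UIDs are plain strings, and buildGraph is a hypothetical helper rather than a GraphBuilder method. Owners that are missing from the input are simply skipped here.

```go
package main

import "fmt"

// node is a simplified version of the GC graph node; locks and the
// beingDeleted/deletingDependents/virtual flags are left out.
type node struct {
	uid        string
	owners     []string           // UIDs of this node's owners
	dependents map[*node]struct{} // nodes that name this node as an owner
}

// buildGraph creates one node per object, then links each node into the
// dependents set of every owner it names. owners maps uid -> owner UIDs.
// It is a hypothetical helper, not part of the real GraphBuilder.
func buildGraph(owners map[string][]string) map[string]*node {
	graph := make(map[string]*node, len(owners))
	for uid, ownerUIDs := range owners {
		graph[uid] = &node{
			uid:        uid,
			owners:     ownerUIDs,
			dependents: make(map[*node]struct{}),
		}
	}
	for _, n := range graph {
		for _, ownerUID := range n.owners {
			if owner, ok := graph[ownerUID]; ok {
				owner.dependents[n] = struct{}{}
			}
		}
	}
	return graph
}

func main() {
	g := buildGraph(map[string][]string{
		"deployA-uid": nil,
		"rsA-uid":     {"deployA-uid"},
		"podA-uid":    {"rsA-uid"},
	})
	for _, uid := range []string{"deployA-uid", "rsA-uid", "podA-uid"} {
		n := g[uid]
		fmt.Printf("%s: owners=%v dependents=%d\n", uid, n.owners, len(n.dependents))
	}
}
```

Keeping both directions (owners on the dependent, dependents on the owner) is what lets a deletion walk the graph either upward or downward without listing objects from the API server again.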
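2.1.2 NewGarbageCollector
NewGarbageCollector does just two things:
(1) It initializes the GarbageCollector struct.
(2) It calls controllerFor to define the handlers for object changes. Whether an add, update, or delete is observed, it is wrapped in an event and added to the graphChanges queue.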
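Step (2) can be pictured with the following simplified sketch. It is not the body of controllerFor: a buffered channel stands in for the graphChanges work queue, objects are plain strings, and the handlers type and newHandlers function are hypothetical names chosen for this illustration.

```go
package main

import "fmt"

type eventType int

const (
	addEvent eventType = iota
	updateEvent
	deleteEvent
)

// event is a simplified record of one observed change.
type event struct {
	eventType eventType
	obj       string
	oldObj    string // only set for updates
}

// handlers mirrors the shape of informer callbacks (add/update/delete).
type handlers struct {
	onAdd    func(obj string)
	onUpdate func(oldObj, newObj string)
	onDelete func(obj string)
}

// newHandlers returns callbacks that wrap every notification in an event
// and push it onto graphChanges (a channel standing in for the work queue).
func newHandlers(graphChanges chan<- event) handlers {
	return handlers{
		onAdd: func(obj string) {
			graphChanges <- event{eventType: addEvent, obj: obj}
		},
		onUpdate: func(oldObj, newObj string) {
			graphChanges <- event{eventType: updateEvent, obj: newObj, oldObj: oldObj}
		},
		onDelete: func(obj string) {
			graphChanges <- event{eventType: deleteEvent, obj: obj}
		},
	}
}

func main() {
	graphChanges := make(chan event, 3)
	h := newHandlers(graphChanges)

	h.onAdd("podA")
	h.onUpdate("podA", "podA (updated)")
	h.onDelete("podA (updated)")
	close(graphChanges)

	// A single consumer drains the queue in arrival order.
	for e := range graphChanges {
		fmt.Printf("type=%d obj=%q oldObj=%q\n", e.eventType, e.obj, e.oldObj)
	}
}
```

Funnelling all three notification kinds into one queue means the graph is updated by a single consumer in arrival order, instead of by concurrent informer callbacks.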
func NewGarbageCollector(
	metadataClient metadata.Interface,
	mapper resettableRESTMapper,
	deletableResources map[schema.GroupVersionResource]struct{},
	ignoredResources map[schema.GroupResource]struct{},
	sharedInformers controller.InformerFactory,
	informersStarted <-chan struct{},
) (*GarbageCollector, error) {
	attemptToDelete := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_collector_attempt_to_delete")
	attemptToOrphan := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_collector_attempt_to_orphan")
	absentOwnerCache := NewUIDCache(500)
	gc := &GarbageCollector{
		metadataClient:   metadataClient,
		restMapper:       mapper,
		attemptToDelete:  attemptToDelete,
		attemptToOrphan:  attemptToOrphan,
		absentOwnerCache: absentOwnerCache,
	}
	gb := &GraphBuilder{
		metadataClient:   metadataClient,
		informersStarted: informersStarted,
		restMapper:       mapper,
		graphChanges:     workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_col