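Kubelet garbage collection

Introduction

The kubelet periodically runs garbage collection to clean up unused images and containers on the node. This post covers how to configure the garbage collection policy and then walks through the flow at the code level.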

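Image collection

The policy has two thresholds, HighThresholdPercent and LowThresholdPercent. When disk usage reaches or exceeds HighThresholdPercent, collection is triggered. It deletes unused images, least recently used first, until disk usage is back down to LowThresholdPercent. Disk usage is collected through cadvisor.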
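The threshold rule can be worked through with concrete numbers. The following is a minimal sketch, not the kubelet's own code: amountToFree is a hypothetical helper name, and the thresholds of 85% and 80% are example values only.

```go
package main

import (
	"errors"
	"fmt"
)

// amountToFree is a hypothetical helper that mirrors the threshold rule.
// It returns the current usage percentage and how many bytes would have to
// be freed to bring usage down to lowPct. It returns 0 bytes when usage is
// below highPct.
func amountToFree(capacity, available int64, highPct, lowPct int) (int, int64, error) {
	if capacity <= 0 {
		return 0, 0, errors.New("invalid capacity on image filesystem")
	}
	if available > capacity {
		available = capacity
	}
	usagePct := 100 - int(available*100/capacity)
	if usagePct < highPct {
		return usagePct, 0, nil
	}
	// Free space wanted at the low threshold, minus what is already free.
	return usagePct, capacity*int64(100-lowPct)/100 - available, nil
}

func main() {
	// 100 GiB disk with 8 GiB free, example thresholds of 85% and 80%.
	usage, toFree, err := amountToFree(100<<30, 8<<30, 85, 80)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("usage=%d%% toFree=%d GiB\n", usage, toFree>>30) // usage=92% toFree=12 GiB
}
```

With 8 GiB free on a 100 GiB disk, usage is 92%. Reaching the 80% low threshold needs 20 GiB free, so 12 GiB has to be reclaimed.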

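Container collection

The policy has three user-defined variables:
MinAge: the minimum age a container must reach before it can be collected
MaxPerPodContainer: the maximum number of dead containers allowed for a single pod (in the code the limit is applied per pod and container name pair)
MaxContainers: the maximum total number of dead containers

Configuration notes

  • Setting MinAge to 0, or MaxPerPodContainer or MaxContainers to a value below 0, disables the corresponding limit
  • The oldest containers are removed first
  • The containers that get collected are mainly those left behind in cases such as their pod having been deleted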

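The policy is controlled through the following kubelet flags:
image-gc-high-threshold: default 85% in Kubernetes 1.10 (older documentation lists 90%)
image-gc-low-threshold: default 80%

minimum-container-ttl-duration: the minimum time a finished container must exist before it is collected; the default of 0 means a container can be collected as soon as it has finished
maximum-dead-containers-per-container: the maximum number of old instances kept per container; default 1
maximum-dead-containers: the maximum total number of dead containers; the default of -1 means this limit is not applied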

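Recommended settings

Dead containers are useful for tasks such as inspecting the logs of a failed container. It is therefore recommended to keep at least one dead instance per container and to set the maximum number of dead containers large enough.

Code walkthrough

The source analysis is based on Kubernetes 1.10.
In src/k8s.io/kubernetes/pkg/kubelet/kubelet.go, NewMainKubelet() initializes the ContainerGC and ImageGC structs, and both are stored in the klet struct: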

    // setup containerGC
    containerGC, err := kubecontainer.NewContainerGC(klet.containerRuntime, containerGCPolicy, klet.sourcesReady)
    if err != nil {
        return nil, err
    }
    klet.containerGC = containerGC
    .....
    // setup imageManager
    imageManager, err := images.NewImageGCManager(klet.containerRuntime, klet.StatsProvider, kubeDeps.Recorder, nodeRef, imageGCPolicy, crOptions.PodSandboxImage)
    if err != nil {
        return nil, fmt.Errorf("failed to initialize image manager: %v", err)
    }
    klet.imageManager = imageManager

After kubelet.NewMainKubelet() has finished the startup initialization, CreateAndInitKubelet() in src/k8s.io/kubernetes/cmd/kubelet/app/server.go calls k.StartGarbageCollection():

// StartGarbageCollection starts garbage collection threads.
func (kl *Kubelet) StartGarbageCollection() {
    loggedContainerGCFailure := false
    go wait.Until(func() {
        if err := kl.containerGC.GarbageCollect(); err != nil {
            glog.Errorf("Container garbage collection failed: %v", err)
            kl.recorder.Eventf(kl.nodeRef, v1.EventTypeWarning, events.ContainerGCFailed, err.Error())
            loggedContainerGCFailure = true
        } else {
            var vLevel glog.Level = 4
            if loggedContainerGCFailure {
                vLevel = 1
                loggedContainerGCFailure = false
            }

            glog.V(vLevel).Infof("Container garbage collection succeeded")
        }
    }, ContainerGCPeriod, wait.NeverStop)

    prevImageGCFailed := false
    go wait.Until(func() {
        if err := kl.imageManager.GarbageCollect(); err != nil {
            if prevImageGCFailed {
                glog.Errorf("Image garbage collection failed multiple times in a row: %v", err)
                // Only create an event for repeated failures
                kl.recorder.Eventf(kl.nodeRef, v1.EventTypeWarning, events.ImageGCFailed, err.Error())
            } else {
                glog.Errorf("Image garbage collection failed once. Stats initialization may not have completed yet: %v", err)
            }
            prevImageGCFailed = true
        } else {
            var vLevel glog.Level = 4
            if prevImageGCFailed {
                vLevel = 1
                prevImageGCFailed = false
            }

            glog.V(vLevel).Infof("Image garbage collection succeeded")
        }
    }, ImageGCPeriod, wait.NeverStop)
}

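StartGarbageCollection() starts two goroutines that periodically run container GC and image GC. The container GC period is 1m and the image GC period is 5m.
As the code shows, container GC calls kl.containerGC.GarbageCollect(), which ultimately ends up in GarbageCollect in src/k8s.io/kubernetes/pkg/kubelet/kuberuntime/kuberuntime_gc.go: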

func (cgc *containerGC) GarbageCollect(gcPolicy kubecontainer.ContainerGCPolicy, allSourcesReady bool, evictTerminatedPods bool) error {
    // Remove evictable containers
    if err := cgc.evictContainers(gcPolicy, allSourcesReady, evictTerminatedPods); err != nil {
        return err
    }

    // Remove sandboxes with zero containers
    if err := cgc.evictSandboxes(evictTerminatedPods); err != nil {
        return err
    }

    // Remove pod sandbox log directory
    return cgc.evictPodLogsDirectories(allSourcesReady)
}

Collection mainly does the following:

  • gets evictable containers which are not active and created more than gcPolicy.MinAge ago.
  • removes oldest dead containers for each pod by enforcing gcPolicy.MaxPerPodContainer.
  • removes oldest dead containers by enforcing gcPolicy.MaxContainers.
  • gets evictable sandboxes which are not ready and contains no containers.
  • removes evictable sandboxes.

The rest of the container GC flow can be followed in the corresponding code.

Back to image GC: its GarbageCollect() lives in src/k8s.io/kubernetes/pkg/kubelet/images/image_gc_manager.go:

func (im *realImageGCManager) GarbageCollect() error {
    // Get disk usage on disk holding images.
    fsStats, err := im.statsProvider.ImageFsStats()
    if err != nil {
        return err
    }

    var capacity, available int64
    if fsStats.CapacityBytes != nil {
        capacity = int64(*fsStats.CapacityBytes)
    }
    if fsStats.AvailableBytes != nil {
        available = int64(*fsStats.AvailableBytes)
    }

    if available > capacity {
        glog.Warningf("available %d is larger than capacity %d", available, capacity)
        available = capacity
    }

    // Check valid capacity.
    if capacity == 0 {
        err := goerrors.New("invalid capacity 0 on image filesystem")
        im.recorder.Eventf(im.nodeRef, v1.EventTypeWarning, events.InvalidDiskCapacity, err.Error())
        return err
    }

    // If over the max threshold, free enough to place us at the lower threshold.
    usagePercent := 100 - int(available*100/capacity)
    if usagePercent >= im.policy.HighThresholdPercent {
        amountToFree := capacity*int64(100-im.policy.LowThresholdPercent)/100 - available
        glog.Infof("[imageGCManager]: Disk usage on image filesystem is at %d%% which is over the high threshold (%d%%). Trying to free %d bytes", usagePercent, im.policy.HighThresholdPercent, amountToFree)
        freed, err := im.freeSpace(amountToFree, time.Now())
        if err != nil {
            return err
        }

        if freed < amountToFree {
            err := fmt.Errorf("failed to garbage collect required amount of images. Wanted to free %d bytes, but freed %d bytes", amountToFree, freed)
            im.recorder.Eventf(im.nodeRef, v1.EventTypeWarning, events.FreeDiskSpaceFailed, err.Error())
            return err
        }
    }

    return nil
}

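A few points are worth noting here:

  • First, fsStats, err := im.statsProvider.ImageFsStats() fetches CapacityBytes and AvailableBytes for the image filesystem, that is, the total size and the currently available space of the disk holding the images. im.statsProvider will be covered in more detail later
  • The image GC policy is the one described earlier: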
    // If over the max threshold, free enough to place us at the lower threshold.
    usagePercent := 100 - int(available*100/capacity)
    if usagePercent >= im.policy.HighThresholdPercent {
        amountToFree := capacity*int64(100-im.policy.LowThresholdPercent)/100 - available
        glog.Infof("[imageGCManager]: Disk usage on image filesystem is at %d%% which is over the high threshold (%d%%). Trying to free %d bytes", usagePercent, im.policy.HighThresholdPercent, amountToFree)
        freed, err := im.freeSpace(amountToFree, time.Now())
        .....
    }
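  • The actual work happens in im.freeSpace(amountToFree, time.Now()). Its implementation first collects the images that are not in use and then decides which of them to delete according to the policy. Deletion goes through im.runtime.RemoveImage(container.ImageSpec{Image: image.id}), which in turn calls RemoveImage() on the CRI image service. That completes the image GC flow.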

ImageGCManager has one more flow: Start(), defined in the ImageGCManager interface in src/k8s.io/kubernetes/pkg/kubelet/images/image_gc_manager.go, starts goroutines that collect information about images:

func (im *realImageGCManager) Start() {
    go wait.Until(func() {
        // Initial detection make detected time "unknown" in the past.
        var ts time.Time
        if im.initialized {
            ts = time.Now()
        }
        _, err := im.detectImages(ts)
        if err != nil {
            glog.Warningf("[imageGCManager] Failed to monitor images: %v", err)
        } else {
            im.initialized = true
        }
    }, 5*time.Minute, wait.NeverStop)

    // Start a goroutine periodically updates image cache.
    // TODO(random-liu): Merge this with the previous loop.
    go wait.Until(func() {
        images, err := im.runtime.ListImages()
        if err != nil {
            glog.Warningf("[imageGCManager] Failed to update image list: %v", err)
        } else {
            im.imageCache.set(images)
        }
    }, 30*time.Second, wait.NeverStop)

}

First, note that it is started from initializeModules() at src/k8s.io/kubernetes/pkg/kubelet/kubelet.go:1294:

    // Start the image manager.
    kl.imageManager.Start()

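Compared with earlier kubelet code, the flow in Kubernetes 1.10 is much clearer thanks to the community's continued restructuring. Start() launches two goroutines: one runs im.detectImages(ts) every 5m, and the other lists the images every 30s and stores them in imageCache. Let's look at im.detectImages(ts) in detail: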

func (im *realImageGCManager) detectImages(detectTime time.Time) (sets.String, error) {
    imagesInUse := sets.NewString()

    // Always consider the container runtime pod sandbox image in use
    imageRef, err := im.runtime.GetImageRef(container.ImageSpec{Image: im.sandboxImage})
    if err == nil && imageRef != "" {
        imagesInUse.Insert(imageRef)
    }

    images, err := im.runtime.ListImages()
    if err != nil {
        return imagesInUse, err
    }
    pods, err := im.runtime.GetPods(true)
    if err != nil {
        return imagesInUse, err
    }

    // Make a set of images in use by containers.
    for _, pod := range pods {
        for _, container := range pod.Containers {
            glog.V(5).Infof("Pod %s/%s, container %s uses image %s(%s)", pod.Namespace, pod.Name, container.Name, container.Image, container.ImageID)
            imagesInUse.Insert(container.ImageID)
        }
    }

    // Add new images and record those being used.
    now := time.Now()
    currentImages := sets.NewString()
    im.imageRecordsLock.Lock()
    defer im.imageRecordsLock.Unlock()
    for _, image := range images {
        glog.V(5).Infof("Adding image ID %s to currentImages", image.ID)
        currentImages.Insert(image.ID)

        // New image, set it as detected now.
        if _, ok := im.imageRecords[image.ID]; !ok {
            glog.V(5).Infof("Image ID %s is new", image.ID)
            im.imageRecords[image.ID] = &imageRecord{
                firstDetected: detectTime,
            }
        }

        // Set last used time to now if the image is being used.
        if isImageUsed(image.ID, imagesInUse) {
            glog.V(5).Infof("Setting Image ID %s lastUsed to %v", image.ID, now)
            im.imageRecords[image.ID].lastUsed = now
        }

        glog.V(5).Infof("Image ID %s has size %d", image.ID, image.Size)
        im.imageRecords[image.ID].size = image.Size
    }

    // Remove old images from our records.
    for image := range im.imageRecords {
        if !currentImages.Has(image) {
            glog.V(5).Infof("Image ID %s is no longer present; removing from imageRecords", image)
            delete(im.imageRecords, image)
        }
    }

    return imagesInUse, nil
}

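This function works out which images are in use, and therefore which are not, and records when each image was first detected and when it was last seen in use. The results are stored in a map of imageRecord structs: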

// Information about the images we track.
type imageRecord struct {
    // Time when this image was first detected.
    firstDetected time.Time

    // Time when we last saw this image being used.
    lastUsed time.Time

    // Size of the image in bytes.
    size int64
}

The imageRecords map is then used during image GC.
