Kubernetes Eviction Manager Source Code Analysis

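This article analyzes how the Kubernetes Eviction Manager is started, how it is defined, and how it works, focusing on its initialization inside the Kubelet, its resource reclamation strategy, and the Pod eviction flow. The core steps are reclaiming node-level resources and evicting user Pods; at most one Pod is evicted per cycle.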

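Abstract: This article is a follow-up to my analysis of the Kubernetes Eviction Manager's working mechanism. It walks through the source code to explain how that mechanism is implemented.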

Kubernetes Eviction Manager: Introduction and Working Principles

For this part, see my previous post: Analysis of the Kubernetes Eviction Manager Working Mechanism.

Kubernetes Eviction Manager Source Code Analysis

Where the Kubernetes Eviction Manager Is Started

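When the Kubelet instantiates a kubelet object, it calls eviction.NewManager to create an evictionManager object. The evictionAdmitHandler returned by the same call is registered as a Pod admit handler.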

pkg/kubelet/kubelet.go:273
func NewMainKubelet(kubeCfg *componentconfig.KubeletConfiguration, kubeDeps *KubeletDeps, standaloneMode bool) (*Kubelet, error) {

    ...

    thresholds, err := eviction.ParseThresholdConfig(kubeCfg.EvictionHard, kubeCfg.EvictionSoft, kubeCfg.EvictionSoftGracePeriod, kubeCfg.EvictionMinimumReclaim)
    if err != nil {
        return nil, err
    }
    evictionConfig := eviction.Config{
        PressureTransitionPeriod: kubeCfg.EvictionPressureTransitionPeriod.Duration,
        MaxPodGracePeriodSeconds: int64(kubeCfg.EvictionMaxPodGracePeriod),
        Thresholds:               thresholds,
        KernelMemcgNotification:  kubeCfg.ExperimentalKernelMemcgNotification,
    }
    ...

    // setup eviction manager
    evictionManager, evictionAdmitHandler, err := eviction.NewManager(klet.resourceAnalyzer, evictionConfig, killPodNow(klet.podWorkers, kubeDeps.Recorder), klet.imageManager, kubeDeps.Recorder, nodeRef, klet.clock)

    if err != nil {
        return nil, fmt.Errorf("failed to initialize eviction manager: %v", err)
    }
    klet.evictionManager = evictionManager
    klet.admitHandlers.AddPodAdmitHandler(evictionAdmitHandler)
    ...
}
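To give a feel for what eviction.ParseThresholdConfig has to do with a value such as kubeCfg.EvictionHard, here is a minimal sketch, not the real implementation. It assumes the threshold expression uses the flag-style form "signal<value" with comma-separated entries (for example "memory.available<100Mi"). The names threshold and parseThresholds are hypothetical, and the quantity is kept as a raw string instead of being parsed into a resource quantity.

```go
package main

import (
	"fmt"
	"strings"
)

// threshold is a simplified, hypothetical stand-in for eviction.Threshold.
type threshold struct {
	Signal   string // e.g. "memory.available"
	Operator string // only "<" is handled in this sketch
	Value    string // raw quantity or percentage, left unparsed here
}

// parseThresholds splits a comma-separated expression such as
// "memory.available<100Mi,nodefs.available<10%" into thresholds.
// An empty expression yields no thresholds and no error.
func parseThresholds(expr string) ([]threshold, error) {
	if strings.TrimSpace(expr) == "" {
		return nil, nil
	}
	var out []threshold
	for _, stmt := range strings.Split(expr, ",") {
		parts := strings.SplitN(stmt, "<", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("invalid eviction threshold %q", stmt)
		}
		signal := strings.TrimSpace(parts[0])
		value := strings.TrimSpace(parts[1])
		if signal == "" || value == "" {
			return nil, fmt.Errorf("invalid eviction threshold %q", stmt)
		}
		out = append(out, threshold{Signal: signal, Operator: "<", Value: value})
	}
	return out, nil
}

func main() {
	ths, err := parseThresholds("memory.available<100Mi,nodefs.available<10%")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range ths {
		fmt.Printf("%s %s %s\n", t.Signal, t.Operator, t.Value)
	}
}
```

The real function also takes the soft thresholds, their grace periods, and the minimum-reclaim settings, as the call above shows; the sketch covers only the hard-threshold string.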

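When the kubelet starts working in its Run method, it launches a goroutine that calls updateRuntimeUp every 5 seconds. Once updateRuntimeUp has confirmed that the runtime is up, it calls initializeRuntimeDependentModules to initialize the modules that depend on the runtime.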

pkg/kubelet/kubelet.go:1219
func (kl *Kubelet) Run(updates <-chan kubetypes.PodUpdate) {
    go wait.Until(kl.updateRuntimeUp, 5*time.Second, wait.NeverStop)
}


pkg/kubelet/kubelet.go:2040
func (kl *Kubelet) updateRuntimeUp() {
    ...

    kl.oneTimeInitializer.Do(kl.initializeRuntimeDependentModules)

    ...
}
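Because updateRuntimeUp runs every 5 seconds, the initialization must be guarded so that it happens only once. The sketch below illustrates that pattern, assuming oneTimeInitializer behaves like a sync.Once. The kubeletSketch type, the runtimeReady parameter, and the initCount field are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// kubeletSketch is a hypothetical, stripped-down stand-in for the Kubelet.
type kubeletSketch struct {
	oneTimeInitializer sync.Once
	initCount          int // counts how often the initializer actually ran
}

// initializeRuntimeDependentModules stands in for starting cadvisor and
// the eviction manager.
func (k *kubeletSketch) initializeRuntimeDependentModules() {
	k.initCount++
}

// updateRuntimeUp is called periodically; it initializes the dependent
// modules only after the runtime is ready, and only on the first such call.
func (k *kubeletSketch) updateRuntimeUp(runtimeReady bool) {
	if !runtimeReady {
		return
	}
	k.oneTimeInitializer.Do(k.initializeRuntimeDependentModules)
}

func main() {
	k := &kubeletSketch{}
	// Simulate four periodic ticks: the runtime is not ready on the first.
	for _, ready := range []bool{false, true, true, true} {
		k.updateRuntimeUp(ready)
	}
	fmt.Println("initialized", k.initCount, "time(s)") // prints "initialized 1 time(s)"
}
```

With this guard, the eviction manager's Start method is invoked at most once, no matter how many times the periodic check succeeds afterwards.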

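Following the code into initializeRuntimeDependentModules, we can see that the runtime-dependent modules are cadvisor and evictionManager, and that initialization simply means calling each one's Start method in turn. The eviction manager is started after cadvisor because it needs to know whether the container runtime has a dedicated imagefs.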

pkg/kubelet/kubelet.go:1206
func (kl *Kubelet) initializeRuntimeDependentModules() {
    if err := kl.cadvisor.Start(); err != nil {
        // Fail kubelet and rely on the babysitter to retry starting kubelet.
        // TODO(random-liu): Add backoff logic in the babysitter
        glog.Fatalf("Failed to start cAdvisor %v", err)
    }
    // eviction manager must start after cadvisor because it needs to know if the container runtime has a dedicated imagefs
    if err := kl.evictionManager.Start(kl, kl.getActivePods, evictionMonitoringPeriod); err != nil {
        kl.runtimeState.setInternalError(fmt.Errorf("failed to start eviction manager %v", err))
    }
}

From this point on, we move into the analysis of the evictionManager itself.

Definition of the Kubernetes Eviction Manager

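As the analysis above shows, the kubelet starts the evictionManager while initializing its runtime-dependent modules during startup. Before going any further, we first need to look at how the Eviction Manager is defined.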

pkg/kubelet/eviction/eviction_manager.go:40
// managerImpl implements Manager
type managerImpl struct {
    //  used to track time
    clock clock.Clock
    // config is how the manager is configured
    config Config
    // the function to invoke to kill a pod
    killPodFunc KillPodFunc
    // the interface that knows how to do image gc
    imageGC ImageGC
    // protects access to internal state
    sync.RWMutex
    // node conditions are the set of conditions present
    nodeConditions []v1.NodeConditionType
    // captures when a node condition was last observed based on a threshold being met
    nodeConditionsLastObservedAt nodeConditionsObservedAt
    // nodeRef is a reference to the node
    nodeRef *v1.ObjectReference
    // used to record events about the node
    recorder record.EventRecorder
    // used to measure usage stats on system
    summaryProvider stats.SummaryProvider
    // records when a threshold was first observed
    thresholdsFirstObservedAt thresholdsObservedAt
    // records the set of thresholds that have been met (including graceperiod) but not yet resolved
    thresholdsMet []Threshold
    // resourceToRankFunc maps a resource to ranking function for that resource.
    resourceToRankFunc map[v1.ResourceName]rankFunc
    // resourceToNodeReclaimFuncs maps a resource to an ordered list of functions that know how to reclaim that resource.
    resourceToNodeReclaimFuncs map[v1.ResourceName]nodeReclaimFuncs
    // last observations from synchronize