1. kube-scheduler Architecture
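The scheduler's core job is to find the most suitable node for each Pod. In a small cluster, every scheduling cycle evaluates all nodes in the cluster and picks the best one. In a large cluster, each scheduling cycle evaluates only a subset of the nodes and picks the best node within that subset.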
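The small-versus-large distinction can be sketched as a function that decides how many nodes to evaluate in one cycle. This is a simplified, standalone illustration of adaptive node sampling, not a copy of the scheduler's code: the function name numNodesToFind, the 100-node floor, the 5% minimum, and the 50% base percentage are assumptions made for the example and should be checked against the v1.18.10 source.

```go
package main

import "fmt"

const (
	minNodesToFind   = 100 // assumed floor: clusters below this size are scanned fully
	minPercentToFind = 5   // assumed lower bound for the adaptive percentage
)

// numNodesToFind returns how many nodes one scheduling cycle should evaluate.
// percentageOfNodesToScore <= 0 means "pick a percentage adaptively".
func numNodesToFind(numAllNodes, percentageOfNodesToScore int32) int32 {
	if numAllNodes < minNodesToFind || percentageOfNodesToScore >= 100 {
		return numAllNodes
	}
	pct := percentageOfNodesToScore
	if pct <= 0 {
		// Assumed heuristic: start at 50% and shrink as the cluster grows.
		pct = 50 - numAllNodes/125
		if pct < minPercentToFind {
			pct = minPercentToFind
		}
	}
	n := numAllNodes * pct / 100
	if n < minNodesToFind {
		return minNodesToFind
	}
	return n
}

func main() {
	for _, size := range []int32{50, 1000, 5000} {
		fmt.Printf("cluster=%d nodes -> evaluate up to %d\n", size, numNodesToFind(size, 0))
	}
}
```

Under these assumed constants, a 50-node cluster is scanned in full, while a 5000-node cluster evaluates at most 500 nodes per cycle.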
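The scheduling flow has three main phases: filtering (predicates), scoring (priorities), and binding. The filtering phase removes nodes that do not satisfy the Pod's requirements, the scoring phase scores the nodes that survived filtering, and the binding phase binds the Pod to the highest-scoring node, which completes scheduling.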
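The three phases can be sketched as a tiny standalone program. The Node and Pod types, the CPU-only feasibility check, and the "most leftover CPU wins" score are hypothetical simplifications for illustration; the real scheduler runs many filter and score plugins and binds through the API server.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical, stripped-down stand-in for a cluster node.
type Node struct {
	Name    string
	FreeCPU int // free CPU in millicores
}

// Pod is a hypothetical stand-in for a Pod waiting to be scheduled.
type Pod struct {
	Name     string
	CPU      int    // requested CPU in millicores
	NodeName string // filled in by the binding phase
}

// filter keeps only the nodes with enough free CPU for the pod.
func filter(pod *Pod, nodes []Node) []Node {
	var feasible []Node
	for _, n := range nodes {
		if n.FreeCPU >= pod.CPU {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

// score ranks a feasible node; here, more leftover CPU is better.
func score(pod *Pod, n Node) int {
	return n.FreeCPU - pod.CPU
}

// schedule runs filtering, scoring, and binding in order.
func schedule(pod *Pod, nodes []Node) error {
	feasible := filter(pod, nodes)
	if len(feasible) == 0 {
		return errors.New("no feasible node for pod " + pod.Name)
	}
	best := feasible[0]
	for _, n := range feasible[1:] {
		if score(pod, n) > score(pod, best) {
			best = n
		}
	}
	pod.NodeName = best.Name // binding: record the chosen node on the pod
	return nil
}

func main() {
	nodes := []Node{{"node-a", 500}, {"node-b", 2000}, {"node-c", 1200}}
	pod := &Pod{Name: "web", CPU: 1000}
	if err := schedule(pod, nodes); err != nil {
		fmt.Println("scheduling failed:", err)
		return
	}
	fmt.Printf("pod %s bound to %s\n", pod.Name, pod.NodeName) // pod web bound to node-b
}
```

Here node-a is filtered out because it cannot fit the request, and node-b outscores node-c, so the pod is bound to node-b.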
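This walkthrough is based on the Kubernetes v1.18.10 source.
2. kube-scheduler Startup Flow
2.1 Registering the built-in scheduling algorithms
The createFromProvider function calls algorithmprovider.NewRegistry() to register the scheduler's algorithm plugins: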
// pkg/scheduler/factory.go
func (c *Configurator) createFromProvider(providerName string) (*Scheduler, error) {
	klog.V(2).Infof("Creating scheduler from algorithm provider '%v'", providerName)
	// Build the registry of built-in providers and look up the requested one.
	r := algorithmprovider.NewRegistry()
	defaultPlugins, exist := r[providerName]
	if !exist {
		return nil, fmt.Errorf("algorithm provider %q is not registered", providerName)
	}
	// For every profile, start from the provider defaults, then apply the profile's own plugin settings.
	for i := range c.profiles {
		prof := &c.profiles[i]
		plugins := &schedulerapi.Plugins{}
		plugins.Append(defaultPlugins)
		plugins.Apply(prof.Plugins)
		prof.Plugins = plugins
	}
	return c.create()
}
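The Append-then-Apply step in the loop above merges the provider defaults with each profile's own plugin settings. A minimal sketch of that kind of merge for a single extension point follows. The PluginSet type here is a simplified stand-in, mergePluginSet and MyCustomFilter are hypothetical names, and the exact semantics (including treating "*" as "disable all defaults") are assumptions to verify against the real Plugins.Apply implementation.

```go
package main

import "fmt"

// PluginSet is a simplified, hypothetical stand-in for schedulerapi.PluginSet,
// holding plain plugin names for one extension point.
type PluginSet struct {
	Enabled  []string
	Disabled []string
}

// mergePluginSet returns the defaults minus anything the custom set disables,
// followed by the plugins the custom set enables.
// Assumption: a "*" entry in Disabled turns off every default plugin.
func mergePluginSet(defaults, custom PluginSet) PluginSet {
	disabled := make(map[string]bool, len(custom.Disabled))
	for _, name := range custom.Disabled {
		disabled[name] = true
	}
	var enabled []string
	if !disabled["*"] {
		for _, name := range defaults.Enabled {
			if !disabled[name] {
				enabled = append(enabled, name)
			}
		}
	}
	enabled = append(enabled, custom.Enabled...)
	return PluginSet{Enabled: enabled}
}

func main() {
	defaults := PluginSet{Enabled: []string{"NodeResourcesFit", "NodePorts", "NodeAffinity"}}
	custom := PluginSet{
		Enabled:  []string{"MyCustomFilter"}, // hypothetical out-of-tree plugin
		Disabled: []string{"NodePorts"},
	}
	fmt.Println(mergePluginSet(defaults, custom).Enabled)
	// [NodeResourcesFit NodeAffinity MyCustomFilter]
}
```

Keeping the defaults first and appending custom plugins afterwards preserves the default ordering, which matters for extension points where plugins run in sequence.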
algorithmprovider.NewRegistry() calls getDefaultConfig() to obtain the default set of scheduling plugins and registers it.
// pkg/scheduler/algorithmprovider/registry.go
func getDefaultConfig() *schedulerapi.Plugins {
	return &schedulerapi.Plugins{
		QueueSort: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: queuesort.Name},
			},
		},
		PreFilter: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: noderesources.FitName},
				{Name: nodeports.Name},
				{Name: podtopologyspread.Name},
				{Name: interpodaffinity.Name},
				{Name: volumebinding.Name},
			},
		},
		Filter: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: nodeunschedulable.Name},
				{Name: noderesources.FitName},
				{Name: nodename.Name},
				{Name: nodeports.Name},
				{Name: nodeaffinity.Name},
				{Name: volumerestrictions.Name},
				{Name: tainttoleration.Name},
				{Name: nodevolumelimits.EBSName},
				{Name: nodevolumelimits.GCEPDName},
				{Name: nodevolumelimits.CSIName},
				{Name: nodevolumelimits.AzureDiskName},
				{Name: volumebinding.Name},
				{Name: volumezone.Name},
{