[kubeflow] controller-runtime Source Code Analysis

[TODO] Restructure the article's outline using the official controller-runtime documentation.

In the previous article, [kubeflow] 从零搭建training-operator项目, we built a simple training-operator project from scratch, with only the controller's Reconcile logic left to implement. This time, taking TFJob's Reconcile function as the entry point, we explore how training-operator actually works. Before doing that, we need to understand how controller-runtime works.

controller-runtime Source Code Analysis

controller-runtime is a controller framework maintained by the community. Together with tooling such as kubebuilder, it lets developers focus solely on implementing the Reconcile function, which is very convenient. The diagram below is not controller-runtime itself, but it comes close: a Worker can be understood as a reconciler, and the reconciler consumes reconcile.Requests taken off the work queue. "Readonly" refers to listers such as podLister and serviceLister.
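
For reference, the reconcile.Request that flows through the work queue is nothing more than the namespaced name of the object to reconcile, as defined in controller-runtime@v0.15.0/pkg/reconcile/reconcile.go:

```go
// Request contains the information necessary to reconcile a Kubernetes object.  This includes the
// information to uniquely identify the object - its Name and Namespace.  It does NOT contain information about
// any specific Event or the object contents itself.
type Request struct {
	// NamespacedName is the name and namespace of the object to reconcile.
	types.NamespacedName
}
```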

[Figure: controller work-queue architecture — workers consuming reconcile.Requests from a work queue fed by informers]

Controller

The controller-runtime version analyzed below is v0.15.0; other versions may differ slightly. After downloading training-operator, run go mod tidy to fetch the dependencies. Open the project in VS Code, find the controller-related code, and Ctrl+click to jump into the source. My analysis of controller-runtime below mainly follows the article operator:controller-runtime 原理之控制器, so the walkthrough is largely similar.

Let's start with the definition of the Controller abstract interface, in controller-runtime@v0.15.0/pkg/controller/controller.go. The core consists of these four functions. Among them, reconcile.Reconciler is itself an abstract interface containing only the definition of a single Reconcile() function, which is implemented by the user.

// Controller implements a Kubernetes API.  A Controller manages a work queue fed reconcile.Requests
// from source.Sources.  Work is performed through the reconcile.Reconciler for each enqueued item.
// Work typically is reads and writes Kubernetes objects to make the system state match the state specified
// in the object Spec.
type Controller interface {
	// Reconciler is called to reconcile an object by Namespace/Name
	reconcile.Reconciler

	// Watch takes events provided by a Source and uses the EventHandler to
	// enqueue reconcile.Requests in response to the events.
	//
	// Watch may be provided one or more Predicates to filter events before
	// they are given to the EventHandler.  Events will be passed to the
	// EventHandler if all provided Predicates evaluate to true.
	Watch(src source.Source, eventhandler handler.EventHandler, predicates ...predicate.Predicate) error

	// Start starts the controller.  Start blocks until the context is closed or a
	// controller has an error starting.
	Start(ctx context.Context) error

	// GetLogger returns this controller logger prefilled with basic information.
	GetLogger() logr.Logger
}
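
For orientation, here is a minimal sketch of the user side: a hypothetical MyReconciler that satisfies reconcile.Reconciler (the type name and the no-op body are illustrative, not training-operator code):

```go
package sketch

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// MyReconciler is a hypothetical reconciler; implementing the single
// Reconcile method is all it takes to satisfy reconcile.Reconciler.
type MyReconciler struct {
	client.Client
}

// Reconcile receives the Namespace/Name of an object and is expected to drive
// the actual cluster state toward the state declared in the object's Spec.
func (r *MyReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	// Fetch the object by req.NamespacedName, compare Spec with reality,
	// create/update/delete dependent objects... (omitted in this sketch)
	return reconcile.Result{}, nil
}
```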

Now the concrete implementation, in controller-runtime@v0.15.0/pkg/internal/controller/controller.go. MakeQueue initializes the rate-limited work queue Queue. The member Do is the reconciler, which will eventually run the user's reconcile code. mu is a mutex that synchronizes controller setup and ensures the controller is started only once. Started marks whether the controller is running. startWatches stores all the watchDescription objects; a watchDescription consists of three parts: src, handler, and predicates.

// Controller implements controller.Controller.
type Controller struct {
	// Name is used to uniquely identify a Controller in tracing, logging and monitoring.  Name is required.
	Name string

	// MaxConcurrentReconciles is the maximum number of concurrent Reconciles which can be run. Defaults to 1.
	MaxConcurrentReconciles int

	// Reconciler is a function that can be called at any time with the Name / Namespace of an object and
	// ensures that the state of the system matches the state specified in the object.
	// Defaults to the DefaultReconcileFunc.
	Do reconcile.Reconciler

	// MakeQueue constructs the queue for this controller once the controller is ready to start.
	// This exists because the standard Kubernetes workqueues start themselves immediately, which
	// leads to goroutine leaks if something calls controller.New repeatedly.
	MakeQueue func() workqueue.RateLimitingInterface

	// Queue is an listeningQueue that listens for events from Informers and adds object keys to
	// the Queue for processing
	Queue workqueue.RateLimitingInterface

	// mu is used to synchronize Controller setup
	mu sync.Mutex

	// Started is true if the Controller has been Started
	Started bool

	// ctx is the context that was passed to Start() and used when starting watches.
	//
	// According to the docs, contexts should not be stored in a struct: https://golang.org/pkg/context,
	// while we usually always strive to follow best practices, we consider this a legacy case and it should
	// undergo a major refactoring and redesign to allow for context to not be stored in a struct.
	ctx context.Context

	// CacheSyncTimeout refers to the time limit set on waiting for cache to sync
	// Defaults to 2 minutes if not set.
	CacheSyncTimeout time.Duration

	// startWatches maintains a list of sources, handlers, and predicates to start when the controller is started.
	startWatches []watchDescription

	// LogConstructor is used to construct a logger to then log messages to users during reconciliation,
	// or for example when a watch is started.
	// Note: LogConstructor has to be able to handle nil requests as we are also using it
	// outside the context of a reconciliation.
	LogConstructor func(request *reconcile.Request) logr.Logger

	// RecoverPanic indicates whether the panic caused by reconcile should be recovered.
	RecoverPanic *bool

	// LeaderElected indicates whether the controller is leader elected or always running.
	LeaderElected *bool
}

// watchDescription contains all the information necessary to start a watch.
type watchDescription struct {
	src        source.Source
	handler    handler.EventHandler
	predicates []predicate.Predicate
}

Controller.Watch

Let's look at the concrete implementation of Watch, in controller-runtime@v0.15.0/pkg/internal/controller/controller.go. You can see it actually delegates to Source.Start. What is src? It is the object we want to observe: we want to see the create/update/delete events on src and have the eventHandler process them accordingly. Note that when Watch runs, the work queue Queue has not been initialized yet; that is fine, because src.Start merely uses the Cache to initialize an informer and register the event handler functions, as we will see later.

// Watch implements controller.Controller.
func (c *Controller) Watch(src source.Source, evthdler handler.EventHandler, prct ...predicate.Predicate) error {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Controller hasn't started yet, store the watches locally and return.
	//
	// These watches are going to be held on the controller struct until the manager or user calls Start(...).
	if !c.Started {
		c.startWatches = append(c.startWatches, watchDescription{src: src, handler: evthdler, predicates: prct})
		return nil
	}

	c.LogConstructor(nil).Info("Starting EventSource", "source", src)
	return src.Start(c.ctx, evthdler, c.Queue, prct...)
}

Source is also an abstract interface; its definition is in controller-runtime@v0.15.0/pkg/source/source.go. It contains only the definition of a Start function.

// Source is a source of events (e.g. Create, Update, Delete operations on Kubernetes Objects, Webhook callbacks, etc)
// which should be processed by event.EventHandlers to enqueue reconcile.Requests.
//
// * Use Kind for events originating in the cluster (e.g. Pod Create, Pod Update, Deployment Update).
//
// * Use Channel for events originating outside the cluster (e.g. GitHub Webhook callback, Polling external urls).
//
// Users may build their own Source implementations.
type Source interface {
	// Start is internal and should be called only by the Controller to register an EventHandler with the Informer
	// to enqueue reconcile.Requests.
	Start(context.Context, handler.EventHandler, workqueue.RateLimitingInterface, ...predicate.Predicate) error
}
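
As an aside on the Channel case mentioned in the comment, a hedged sketch of watching events that originate outside the cluster might look like the following (the events channel and whatever feeds it are assumptions, not anything training-operator does):

```go
package sketch

import (
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/source"
)

// watchExternalEvents wires a channel of externally produced events (fed by a
// webhook, a polling loop, etc.; the producer is not shown) into a controller.
func watchExternalEvents(c controller.Controller, events chan event.GenericEvent) error {
	// Each GenericEvent carries an object; EnqueueRequestForObject enqueues
	// that object's Namespace/Name as a reconcile.Request.
	return c.Watch(&source.Channel{Source: events}, &handler.EnqueueRequestForObject{})
}
```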

Now let's see how Watch is called by training-operator, in pkg/controller.v1/tensorflow/tfjob_controller.go. We can see that:

  • A Kind struct is passed as the argument; Kind is an implementation of the Source abstract interface. kubeflowv1.TFJob{} is the resource object we want to watch, and Kind wraps kubeflowv1.TFJob{}.
  • handler.EnqueueRequestForObject{} is a concrete implementation of the EventHandler abstract interface, with Create, Delete, Update, and other functions, covered later.
  • predicate.Funcs{CreateFunc: r.onOwnerCreateFunc()} is a user-supplied predicate used to decide whether an event needs to be pushed onto the queue, also covered later.

	// using onOwnerCreateFunc is easier to set defaults
	if err = c.Watch(source.Kind(mgr.GetCache(), &kubeflowv1.TFJob{}), &handler.EnqueueRequestForObject{},
		predicate.Funcs{CreateFunc: r.onOwnerCreateFunc()},
	); err != nil {
		return err
	}

Now the concrete implementation of Kind, in controller-runtime@v0.15.0/pkg/internal/source/kind.go. Type is the concrete type we want to watch; Kind wraps it, adding a Cache that comes from the manager and can provide informers. After all, Kind has to implement a function as important as Start, so it needs the corresponding toolkit.

// Kind is used to provide a source of events originating inside the cluster from Watches (e.g. Pod Create).
type Kind struct {
	// Type is the type of object to watch.  e.g. &v1.Pod{}
	Type client.Object

	// Cache used to watch APIs
	Cache cache.Cache

	// started may contain an error if one was encountered during startup. If its closed and does not
	// contain an error, startup and syncing finished.
	started     chan error
	startCancel func()
}

Combining the analysis above, we know that Controller.Watch ultimately calls Kind.Start, whose implementation is below. The core is the call to i.AddEventHandler, which subscribes to changes of the resource and handles them through the handler. Anyone who has used informers should be familiar with AddEventHandler; note that the informer's add/update/delete callbacks fire after the fact, i.e. by the time they run, the resource object has already been created, updated, or deleted. i is an informer, initialized via Kind.Cache.GetInformer.

// Start is internal and should be called only by the Controller to register an EventHandler with the Informer
// to enqueue reconcile.Requests.
func (ks *Kind) Start(ctx context.Context, handler handler.EventHandler, queue workqueue.RateLimitingInterface,
	prct ...predicate.Predicate) error {
	// ...

	// cache.GetInformer will block until its context is cancelled if the cache was already started and it can not
	// sync that informer (most commonly due to RBAC issues).
	ctx, ks.startCancel = context.WithCancel(ctx)
	ks.started = make(chan error)
	go func() {
		var (
			i       cache.Informer
			lastErr error
		)

		// ...
		i, lastErr = ks.Cache.GetInformer(ctx, ks.Type)
		// ...

		_, err := i.AddEventHandler(NewEventHandler(ctx, queue, handler, prct).HandlerFuncs())
		// ...
	}()

	return nil
}

The NewEventHandler function initializes an EventHandler whose members are an event handler (handler), a rate-limited work queue (queue), and a set of predicate functions (predicates). Note that this EventHandler is a struct, while its member handler.EventHandler is the abstract interface; despite the identical names, they are different things, so don't confuse them. handler is the handler.EnqueueRequestForObject{} mentioned earlier, and predicates is the predicate.Funcs{CreateFunc: r.onOwnerCreateFunc()} mentioned earlier; both are supplied by the user. OnAdd first evaluates the predicates; once the event passes all of them, it finally calls the handler's Create function, which pushes a reconcile.Request onto the queue. OnUpdate and OnDelete follow the same logic.

// EventHandler adapts a handler.EventHandler interface to a cache.ResourceEventHandler interface.
type EventHandler struct {
	// ctx stores the context that created the event handler
	// that is used to propagate cancellation signals to each handler function.
	ctx context.Context

	handler    handler.EventHandler
	queue      workqueue.RateLimitingInterface
	predicates []predicate.Predicate
}

// HandlerFuncs converts EventHandler to a ResourceEventHandlerFuncs
// TODO: switch to ResourceEventHandlerDetailedFuncs with client-go 1.27
func (e *EventHandler) HandlerFuncs() cache.ResourceEventHandlerFuncs {
	return cache.ResourceEventHandlerFuncs{
		AddFunc:    e.OnAdd,
		UpdateFunc: e.OnUpdate,
		DeleteFunc: e.OnDelete,
	}
}

// OnAdd creates CreateEvent and calls Create on EventHandler.
func (e *EventHandler) OnAdd(obj interface{}) {
	c := event.CreateEvent{}

	// ...

	for _, p := range e.predicates {
		if !p.Create(c) {
			return
		}
	}

	// Invoke create handler
	ctx, cancel := context.WithCancel(e.ctx)
	defer cancel()
	e.handler.Create(ctx, c, e.queue)
}

Let's see how EnqueueRequestForObject implements Create, in controller-runtime@v0.15.0/pkg/handler/enqueue.go. The logic is very simple and exactly what the name says: it pushes the object's own namespace and name onto the work queue as a reconcile.Request. Its counterpart is enqueueRequestForOwner, which instead pushes the namespace and name of the object's owner onto the work queue as the reconcile.Request. Watches on Pods and Services need enqueueRequestForOwner, because Pod and Service objects carry an ownerReference field that marks their owner (the TFJob); for the watch on the TFJob itself, EnqueueRequestForObject is used. That is exactly how tfjob_controller.go does it.

// EnqueueRequestForObject enqueues a Request containing the Name and Namespace of the object that is the source of the Event.
// (e.g. the created / deleted / updated objects Name and Namespace).  handler.EnqueueRequestForObject is used by almost all
// Controllers that have associated Resources (e.g. CRDs) to reconcile the associated Resource.
type EnqueueRequestForObject struct{}

// Create implements EventHandler.
func (e *EnqueueRequestForObject) Create(ctx context.Context, evt event.CreateEvent, q workqueue.RateLimitingInterface) {
	if evt.Object == nil {
		enqueueLog.Error(nil, "CreateEvent received with no metadata", "event", evt)
		return
	}
	q.Add(reconcile.Request{NamespacedName: types.NamespacedName{
		Name:      evt.Object.GetName(),
		Namespace: evt.Object.GetNamespace(),
	}})
}
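
For the Pod/Service case just mentioned, the watch in tfjob_controller.go looks roughly like the sketch below (simplified: the predicates for dependent resources are elided). In v0.15, handler.EnqueueRequestForOwner is a function that builds the EventHandler from the scheme, the RESTMapper, and the owner type:

```go
	// Watch Pods, but enqueue the Namespace/Name of the controlling TFJob
	// owner rather than that of the Pod itself (predicates elided here).
	if err = c.Watch(source.Kind(mgr.GetCache(), &corev1.Pod{}), handler.EnqueueRequestForOwner(
		mgr.GetScheme(), mgr.GetRESTMapper(), &kubeflowv1.TFJob{}, handler.OnlyControllerOwner(),
	)); err != nil {
		return err
	}
```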

Now the predicate that training-operator supplies, predicate.Funcs{CreateFunc: r.onOwnerCreateFunc()}. The logic is straightforward: it always returns true, and whenever the object is a TFJob it additionally applies defaults and updates the job's conditions. Because the informer notifies only after the resource has already changed, the status is marked JobCreated here.

// onOwnerCreateFunc modify creation condition.
func (r *TFJobReconciler) onOwnerCreateFunc() func(event.CreateEvent) bool {
	return func(e event.CreateEvent) bool {
		tfJob, ok := e.Object.(*kubeflowv1.TFJob)
		if !ok {
			return true
		}

		r.Scheme.Default(tfJob)
		msg := fmt.Sprintf("TFJob %s is created.", e.Object.GetName())
		logrus.Info(msg)
		trainingoperatorcommon.CreatedJobsCounterInc(tfJob.Namespace, r.GetFrameworkName())
		commonutil.UpdateJobConditions(&tfJob.Status, kubeflowv1.JobCreated, corev1.ConditionTrue, commonutil.NewReason(kubeflowv1.TFJobKind, commonutil.JobCreatedReason), msg)
		return true
	}
}
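
Predicates are also the natural place to filter out noisy events. As an illustration (not training-operator code), a predicate that drops update events whose generation did not change could look like the sketch below; controller-runtime also ships this behavior as predicate.GenerationChangedPredicate{}:

```go
// Ignore updates that did not modify .spec: the API server only bumps
// metadata.generation on spec changes, so status-only updates return false
// and never reach the EventHandler.
var generationChanged = predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		return e.ObjectOld.GetGeneration() != e.ObjectNew.GetGeneration()
	},
}
```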

Controller.Start

That covers Watch; in one sentence, it registers informers. Now let's look at the Start function, in controller-runtime@v0.15.0/pkg/internal/controller/controller.go. It takes the lock first, ensuring the controller is started only once. It initializes Queue via MakeQueue (it was not initialized during Watch). Then, for each entry in startWatches, it calls src.Start (recall that Watch, invoked before the controller started, only stored the watches without starting them). Finally, up to MaxConcurrentReconciles goroutines run processNextWorkItem concurrently, each taking reconcile.Requests off the Queue and consuming them.

// Start implements controller.Controller.
func (c *Controller) Start(ctx context.Context) error {
	// use an IIFE to get proper lock handling
	// but lock outside to get proper handling of the queue shutdown
	c.mu.Lock()
	if c.Started {
		return errors.New("controller was started more than once. This is likely to be caused by being added to a manager multiple times")
	}

	c.initMetrics()

	// Set the internal context.
	c.ctx = ctx

	c.Queue = c.MakeQueue()
	go func() {
		<-ctx.Done()
		c.Queue.ShutDown()
	}()

	wg := &sync.WaitGroup{}
	err := func() error {
		defer c.mu.Unlock()

		// TODO(pwittrock): Reconsider HandleCrash
		defer utilruntime.HandleCrash()

		// NB(directxman12): launch the sources *before* trying to wait for the
		// caches to sync so that they have a chance to register their intendeded
		// caches.
		for _, watch := range c.startWatches {
			c.LogConstructor(nil).Info("Starting EventSource", "source", fmt.Sprintf("%s", watch.src))

			if err := watch.src.Start(ctx, watch.handler, c.Queue, watch.predicates...); err != nil {
				return err
			}
		}

		// Start the SharedIndexInformer factories to begin populating the SharedIndexInformer caches
		c.LogConstructor(nil).Info("Starting Controller")

		for _, watch := range c.startWatches {
			syncingSource, ok := watch.src.(source.SyncingSource)
			if !ok {
				continue
			}

			if err := func() error {
				// use a context with timeout for launching sources and syncing caches.
				sourceStartCtx, cancel := context.WithTimeout(ctx, c.CacheSyncTimeout)
				defer cancel()

				// WaitForSync waits for a definitive timeout, and returns if there
				// is an error or a timeout
				if err := syncingSource.WaitForSync(sourceStartCtx); err != nil {
					err := fmt.Errorf("failed to wait for %s caches to sync: %w", c.Name, err)
					c.LogConstructor(nil).Error(err, "Could not wait for Cache to sync")
					return err
				}

				return nil
			}(); err != nil {
				return err
			}
		}

		// All the watches have been started, we can reset the local slice.
		//
		// We should never hold watches more than necessary, each watch source can hold a backing cache,
		// which won't be garbage collected if we hold a reference to it.
		c.startWatches = nil

		// Launch workers to process resources
		c.LogConstructor(nil).Info("Starting workers", "worker count", c.MaxConcurrentReconciles)
		wg.Add(c.MaxConcurrentReconciles)
		for i := 0; i < c.MaxConcurrentReconciles; i++ {
			go func() {
				defer wg.Done()
				// Run a worker thread that just dequeues items, processes them, and marks them done.
				// It enforces that the reconcileHandler is never invoked concurrently with the same object.
				for c.processNextWorkItem(ctx) {
				}
			}()
		}

		c.Started = true
		return nil
	}()
	if err != nil {
		return err
	}

	<-ctx.Done()
	c.LogConstructor(nil).Info("Shutdown signal received, waiting for all workers to finish")
	wg.Wait()
	c.LogConstructor(nil).Info("All workers finished")
	return nil
}
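
Before moving on, it is worth seeing what MakeQueue builds. Paraphrasing the default closure set up in controller.NewUnmanaged (under the assumption that no custom RateLimiter was configured in the controller options):

```go
	// The default MakeQueue closure: a rate-limited workqueue named after the
	// controller (the name feeds the workqueue metrics). The default rate
	// limiter combines per-item exponential backoff with a global token bucket.
	MakeQueue: func() workqueue.RateLimitingInterface {
		return workqueue.NewRateLimitingQueueWithConfig(
			workqueue.DefaultControllerRateLimiter(),
			workqueue.RateLimitingQueueConfig{Name: name},
		)
	},
```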

The code of processNextWorkItem is below; it essentially calls reconcileHandler.

// processNextWorkItem will read a single work item off the workqueue and
// attempt to process it, by calling the reconcileHandler.
func (c *Controller) processNextWorkItem(ctx context.Context) bool {
	obj, shutdown := c.Queue.Get()
	if shutdown {
		// Stop working
		return false
	}

	// We call Done here so the workqueue knows we have finished
	// processing this item. We also must remember to call Forget if we
	// do not want this work item being re-queued. For example, we do
	// not call Forget if a transient error occurs, instead the item is
	// put back on the workqueue and attempted again after a back-off
	// period.
	defer c.Queue.Done(obj)

	ctrlmetrics.ActiveWorkers.WithLabelValues(c.Name).Add(1)
	defer ctrlmetrics.ActiveWorkers.WithLabelValues(c.Name).Add(-1)

	c.reconcileHandler(ctx, obj)
	return true
}

The code of reconcileHandler is below; it in turn calls the Reconcile function.

func (c *Controller) reconcileHandler(ctx context.Context, obj interface{}) {
	// Update metrics after processing each item
	reconcileStartTS := time.Now()
	defer func() {
		c.updateMetrics(time.Since(reconcileStartTS))
	}()

	// Make sure that the object is a valid request.
	req, ok := obj.(reconcile.Request)
	// ...

	log := c.LogConstructor(&req)
	reconcileID := uuid.NewUUID()

	log = log.WithValues("reconcileID", reconcileID)
	ctx = logf.IntoContext(ctx, log)
	ctx = addReconcileID(ctx, reconcileID)

	// RunInformersAndControllers the syncHandler, passing it the Namespace/Name string of the
	// resource to be synced.
	result, err := c.Reconcile(ctx, req)
	// ...
}
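
The part elided above is where the requeue semantics live. Paraphrased rather than quoted verbatim, the tail of reconcileHandler dispatches on result and err roughly like this:

```go
	// Errors and Requeue go back on the queue with rate-limited backoff;
	// RequeueAfter schedules a delayed retry; on success, Forget resets the
	// per-item backoff.
	switch {
	case err != nil:
		c.Queue.AddRateLimited(req)
	case result.RequeueAfter > 0:
		c.Queue.Forget(obj)
		c.Queue.AddAfter(req, result.RequeueAfter)
	case result.Requeue:
		c.Queue.AddRateLimited(req)
	default:
		c.Queue.Forget(obj)
	}
```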

Finally, c.Do.Reconcile(ctx, req) is the Reconcile function written by the user.

// Reconcile implements reconcile.Reconciler.
func (c *Controller) Reconcile(ctx context.Context, req reconcile.Request) (_ reconcile.Result, err error) {
	defer func() {
		if r := recover(); r != nil {
			if c.RecoverPanic != nil && *c.RecoverPanic {
				for _, fn := range utilruntime.PanicHandlers {
					fn(r)
				}
				err = fmt.Errorf("panic: %v [recovered]", r)
				return
			}

			log := logf.FromContext(ctx)
			log.Info(fmt.Sprintf("Observed a panic in reconciler: %v", r))
			panic(r)
		}
	}()
	return c.Do.Reconcile(ctx, req)
}
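
Fields such as MaxConcurrentReconciles and RecoverPanic are normally set through controller.Options when the controller is built. A sketch using the builder API (the option values here are illustrative):

```go
package sketch

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	kubeflowv1 "github.com/kubeflow/training-operator/pkg/apis/kubeflow.org/v1"
)

// setupWithOptions registers a reconciler for TFJob with illustrative options:
// four concurrent workers and panic recovery enabled.
func setupWithOptions(mgr ctrl.Manager, r reconcile.Reconciler) error {
	recoverPanic := true
	return ctrl.NewControllerManagedBy(mgr).
		For(&kubeflowv1.TFJob{}).
		WithOptions(controller.Options{
			MaxConcurrentReconciles: 4,
			RecoverPanic:            &recoverPanic,
		}).
		Complete(r)
}
```
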
In the controller-runtime library, the List method of the Client interface can be used to list resources of a given type. Here is a concrete example:

```go
package main

import (
	"context"
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
	// Get the Kubernetes configuration (kubeconfig or in-cluster).
	kubeconfig := config.GetConfigOrDie()

	// Create a controller-runtime client from the configuration.
	c, err := client.New(kubeconfig, client.Options{})
	if err != nil {
		fmt.Fprintf(os.Stderr, "Failed to create client: %v\n", err)
		os.Exit(1)
	}

	// The namespace whose resources we want to list.
	namespace := "default"

	// An object list to receive the results.
	podList := &corev1.PodList{}

	// Use the client's List method to list resources of the given type.
	if err := c.List(context.Background(), podList, client.InNamespace(namespace)); err != nil {
		fmt.Fprintf(os.Stderr, "Failed to list resources: %v\n", err)
		os.Exit(1)
	}

	// Print the results.
	fmt.Printf("Found %d pods in namespace %s:\n", len(podList.Items), namespace)
	for _, pod := range podList.Items {
		fmt.Printf("  %s\n", pod.Name)
	}
}
```

This example demonstrates how to use the Client interface from controller-runtime to list the Pod resources in a given namespace; you can adapt the parameters to fetch other resource types. Note that a standalone client created this way reads directly from the API server, whereas inside a controller you would normally use the client injected by the manager, whose reads are served from the cache.