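Problem description
This afternoon I was supposed to be fixing a bug, but I happened to notice that a piece of code written by a colleague kept logging errors. My colleague said he had never seen this kind of error before, so at first I assumed something I had changed had introduced the bug.
The code is below: a Kubernetes Reconcile struct that syncs node-related information.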
type ReconcileNode struct {
	Clientset kubernetes.Interface // client-go: "k8s.io/client-go/kubernetes"
	Client    client.Client        // controller-runtime: "sigs.k8s.io/controller-runtime/pkg/client"
}
// Inside the Reconcile method:
err = rn.Client.Update(ctx, node)
if err != nil {
	log.Print(err) // this call kept failing
	return reconcile.Result{}, err
}
_, err = rn.Clientset.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
if err != nil {
	log.Print(err)
	return reconcile.Result{}, err
}
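Error message
The first Update call (marked in the code above) kept logging the following error:
Error: Operation cannot be fulfilled on nodes "sxt007-ai71vp": the object has been modified; please apply your changes to the latest version and try again
My first step was to search online, and it turned out others had already run into a similar problem: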
https://github.com/kubernetes-sigs/controller-runtime/issues/1748
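Cause
Roughly, another controller modified the same resource at the same time, so our update could not be applied. As the issue puts it: "This should be caused by the modification of other controllers at the same time." More precisely, the API server uses optimistic concurrency: every object carries a metadata.resourceVersion, and an Update whose resourceVersion no longer matches the stored object is rejected with a 409 Conflict, which is the error shown above.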
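Solution
There are two ways to fix it:
- Don't use Update; perform all updates with Patch instead (a merge patch that does not carry a resourceVersion is not subject to the optimistic-concurrency check).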
- Use the RetryOnConflict function provided in "k8s.io/client-go/util/retry" to get the object again and re-apply the update when running into a conflict.
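Digging deeper
With the cause understood, we thought it through more carefully. Our service runs multiple instances, but only the leader instance performs Reconcile, so in theory no other controller should be operating on the same resource. My colleague also confirmed that this error had never appeared before.
However, when we looked at how the resource was being updated, we found that the update clients were mixed; both of the following were in use: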
- First, controller-runtime: client.Client from "sigs.k8s.io/controller-runtime/pkg/client"
- Second, client-go: kubernetes.Interface from "k8s.io/client-go/kubernetes"
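So we turned our suspicion to these two clients: is mixing them the cause, and what exactly is the difference between them?
We went back online to see whether anyone had hit a related problem and found this article: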
https://gardener.cloud/docs/gardener/kubernetes-clients/
client-go
client-go is the default/official client for talking to the Kubernetes API in Golang. It features the so called “client sets” for all built-in Kubernetes API groups and versions (e.g. v1 (aka core/v1), apps/v1). client-go clients are generated from the built-in API types using client-gen and are composed of interfaces for every known API GroupVersionKind. A typical client-go usage looks like this:
var (
	ctx        context.Context
	c          kubernetes.Interface // "k8s.io/client-go/kubernetes"
	deployment *appsv1.Deployment   // "k8s.io/api/apps/v1"
)
updatedDeployment, err := c.AppsV1().Deployments("default").Update(ctx, deployment, metav1.UpdateOptions{})
Important characteristics of client-go clients:
- clients are specific to a given API GroupVersionKind, i.e., clients are hard-coded to corresponding API-paths (don’t need to use the discovery API to map GVK to a REST endpoint path).
- clients don’t modify the passed in-memory object (e.g. deployment in the above example). Instead, they return a new in-memory object. This means that controllers have to continue working with the new in-memory object or overwrite the shared object to not lose any state updates.
controller-runtime
controller-runtime is a Kubernetes community project (kubebuilder subproject) for building controllers and operators for custom resources. Therefore, it features a generic client that follows a different approach and does not rely on generated client sets. Instead, the client can be used for managing any Kubernetes resources (built-in or custom) homogeneously. For example:
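var (
	ctx        context.Context
	c          client.Client            // "sigs.k8s.io/controller-runtime/pkg/client"
	deployment *appsv1.Deployment       // "k8s.io/api/apps/v1"
	shoot      *gardencorev1beta1.Shoot // "github.com/gardener/gardener/pkg/apis/core/v1beta1" (a Gardener custom resource)
)
err := c.Update(ctx, deployment)
// or
err = c.Update(ctx, shoot)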
Important characteristics of controller-runtime clients:
- The client writes back results from the API server into the passed in-memory object.
  - This means that controllers don’t have to worry about copying back the results and should just continue to work on the given in-memory object.
  - This is a nice and flexible pattern, and helper functions should try to follow it wherever applicable. Meaning, if possible accept an object param, pass it down to clients and keep working on the same in-memory object instead of creating a new one in your helper function.
  - The benefit is that you don’t lose updates to the API object and always have the last-known state in memory. Therefore, you don’t have to read it again, e.g., for getting the current resourceVersion when working with optimistic locking, and thus minimize the chances for running into conflicts.
- Note that the underlying Informers of a controller-runtime cache (cache.Cache) and the ones of a SharedInformerFactory (client-go) are not related in any way. Both create Informers and watch objects on the API server individually. This means that if you read the same object from different cache implementations, you may receive different versions of the object because the watch connections of the individual Informers are not synced.
  - Because of this, controllers/reconcilers should get the object from the same cache in the reconcile loop, where the EventHandler was also added to set up the controller. For example, if a SharedInformerFactory is used for setting up the controller then read the object in the reconciler from the Lister instead of from a cached controller-runtime client.
- By default, the client.Client created by a controller-runtime Manager is a DelegatingClient. It delegates Get and List calls to a Cache, and all other calls to a client that talks directly to the API server.
Important characteristics of cached controller-runtime clients:
- Like for Listers, objects read from a controller-runtime cache can always be slightly out of date. Hence, don’t base any important decisions on data read from the cache.
- There is no interaction with the cache on writing calls (Create, Update, Patch and Delete).
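Conclusion
From the above, we can take away the following points:
- A client-go client does not modify the in-memory object you pass in; it returns a new, updated object instead.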
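- controller-runtime's cache (cache.Cache) and client-go's informer cache (SharedInformerFactory) are two completely separate things with their own watch connections, so they can hold different versions of the same object.
- If you use controller-runtime, don't read cached objects from a client-go informer; if you use client-go, don't read them from the controller-runtime cache. In other words, don't mix the two: read objects from the same cache that drives the controller.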
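- By default, controller-runtime's Get and List are served from the cache, while Create, Update, Patch and Delete talk to the API server directly. So with controller-runtime, if you need the latest cluster state you can read it through kubernetes.Interface, which goes to the API server, and still perform the update with client.Update. In our case, the most likely sequence is that Clientset.Update bumped the node's resourceVersion on the API server, the next reconcile read the node from the controller-runtime cache before the watch event arrived, and client.Update then failed with the conflict above.
- Because the controller-runtime cache can lag behind the real state in the API server, don't base any important decision on data read from it.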