kubebuilder Hands-On Notes

About kubebuilder

  1. In real-world work it is common to need custom configuration and control over kubernetes resources, e.g. controlling the replica count or master/slave relationship of pods running custom images, or managing all kinds of custom resources;
  2. The Operator pattern is a good fit for these needs; the official introduction is at https://kubernetes.io/zh/docs/concepts/extend-kubernetes/operator/ . In short, an Operator watches custom resources and drives the cluster toward their desired state in a control loop.

Environment used in these notes:

kubectl v1.24.2

golang v1.18.5

docker v1.20.9, containerd v1.6.6

kustomize v3.8.7

kubebuilder v3.6.0

Using kubebuilder

In this environment we create a CRD and Controller, deploy them to kubernetes, and verify they take effect. The steps:

  1. Create the API (CRD and Controller)
  2. Build and deploy the CRD
  3. Compile and run the controller
  4. Create an instance of the CRD
  5. Delete the instance and stop the controller
  6. Build the controller into a docker image
  7. Uninstall and clean up

Case 1: Create the helloworld project

[root@k8s-worker02 k8s-operator]# mkdir -p $GOPATH/src/helloworld
[root@k8s-worker02 k8s-operator]# cd $GOPATH/src/helloworld
[root@k8s-worker02 helloworld]# kubebuilder init --domain wu123
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
Get controller runtime:
$ go get sigs.k8s.io/controller-runtime@v0.12.2
Update dependencies:
$ go mod tidy
Next: define a resource with:
$ kubebuilder create api
[root@k8s-worker02 helloworld]# tree $GOPATH/src/helloworld
/home/gopath/src/helloworld
├── config
│   ├── default
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml
│   │   └── manager_config_patch.yaml
│   ├── manager
│   │   ├── controller_manager_config.yaml
│   │   ├── kustomization.yaml
│   │   └── manager.yaml
│   ├── prometheus
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml
│   └── rbac
│       ├── auth_proxy_client_clusterrole.yaml
│       ├── auth_proxy_role_binding.yaml
│       ├── auth_proxy_role.yaml
│       ├── auth_proxy_service.yaml
│       ├── kustomization.yaml
│       ├── leader_election_role_binding.yaml
│       ├── leader_election_role.yaml
│       ├── role_binding.yaml
│       └── service_account.yaml
├── Dockerfile
├── go.mod
├── go.sum
├── hack
│   └── boilerplate.go.txt
├── main.go
├── Makefile
├── PROJECT
└── README.md

6 directories, 25 files

Create the API (CRD and Controller)

  1. Next we create the resource itself. The group/version/kind triple uniquely identifies a resource; the command is:
[root@k8s-worker02 helloworld]# kubebuilder create api \
> --group webapp \
> --version v1 \
> --kind Guestbook
Create Resource [y/n]
y
Create Controller [y/n]
y
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
api/v1/guestbook_types.go
controllers/guestbook_controller.go
Update dependencies:
$ go mod tidy
Running make:
$ make generate
mkdir -p /home/gopath/src/helloworld/bin
test -s /home/gopath/src/helloworld/bin/controller-gen || GOBIN=/home/gopath/src/helloworld/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/src/helloworld/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
Next: implement your new API and generate the manifests (e.g. CRDs,CRs) with:
$ make manifests

Build and deploy the CRD

  1. The Makefile kubebuilder provides greatly simplifies building and deployment; the following command deploys the freshly built CRD to kubernetes:
[root@k8s-worker02 helloworld]# make install
test -s /home/gopath/src/helloworld/bin/controller-gen || GOBIN=/home/gopath/src/helloworld/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/src/helloworld/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/src/helloworld/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/guestbooks.webapp.wu123 created

Compile and run the controller

  1. The controller source generated by kubebuilder is at $GOPATH/src/helloworld/controllers/guestbook_controller.go ; its content is as follows:
package controllers

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	webappv1 "helloworld/api/v1"
)

// GuestbookReconciler reconciles a Guestbook object
type GuestbookReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks/finalizers,verbs=update

// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the Guestbook object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.12.2/pkg/reconcile
func (r *GuestbookReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	_ = log.FromContext(ctx)

	// TODO(user): your logic here

	return ctrl.Result{}, nil
}

// SetupWithManager sets up the controller with the Manager.
func (r *GuestbookReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&webappv1.Guestbook{}).
		Complete(r)
}
[root@k8s-worker02 helloworld]# make run
test -s /home/gopath/src/helloworld/bin/controller-gen || GOBIN=/home/gopath/src/helloworld/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/src/helloworld/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/src/helloworld/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./main.go
I0420 13:38:25.948445   88197 request.go:601] Waited for 1.046481468s due to client-side throttling, not priority and fairness, request: GET:https://192.168.204.129:6443/apis/crd.projectcalico.org/v1?timeout=32s
1.681969106652736e+09	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": ":8080"}
1.681969106654165e+09	INFO	setup	starting manager
1.6819691066547165e+09	INFO	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.681969106654819e+09	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
1.6819691066549478e+09	INFO	Starting EventSource	{"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook", "source": "kind source: *v1.Guestbook"}
1.6819691066549792e+09	INFO	Starting Controller	{"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook"}
1.681969106756659e+09	INFO	Starting workers	{"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook", "worker count": 1}

We now make a few small changes to the code above, just to verify that the controller reacts.

1. Add two dependency packages — runtime/debug (to print the call stack) and k8s.io/klog/v2 (needed by the klog.Info call below):

import (
	"context"
	"runtime/debug"

	// "github.com/go-logr/logr"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/klog/v2"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	webappv1 "helloworld/api/v1"
)
// GuestbookReconciler reconciles a Guestbook object
type GuestbookReconciler struct {
	client.Client
	// Log    logr.Logger
	Scheme *runtime.Scheme
}

2. Print the incoming request, and optionally the call stack:

func (r *GuestbookReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	_ = log.FromContext(ctx)

	// TODO(user): your logic here
	// print the request (namespace/name of the changed Guestbook object)
	klog.Info(req)
	// print the call stack, to trace how Reconcile gets invoked
	debug.PrintStack()

	return ctrl.Result{}, nil
}

Run make run again; it compiles and starts the modified controller:

[root@k8s-worker02 helloworld]# make run
test -s /home/gopath/src/helloworld/bin/controller-gen || GOBIN=/home/gopath/src/helloworld/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/src/helloworld/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/src/helloworld/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./main.go
I0420 14:30:50.652360  105538 request.go:601] Waited for 1.047814477s due to client-side throttling, not priority and fairness, request: GET:https://192.168.204.129:6443/apis/storage.k8s.io/v1beta1?timeout=32s
1.6819722513574438e+09	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": ":8080"}
1.6819722513579834e+09	INFO	setup	starting manager
1.6819722513587043e+09	INFO	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.681972251358751e+09	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
1.681972251358884e+09	INFO	Starting EventSource	{"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook", "source": "kind source: *v1.Guestbook"}
1.6819722513589082e+09	INFO	Starting Controller	{"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook"}
1.68197225146097e+09	INFO	Starting workers	{"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook", "worker count": 1}
I0420 14:30:51.461177  105538 guestbook_controller.go:54] default/guestbook-sample

Create an instance of the Guestbook resource

  1. kubernetes now has the Guestbook CRD deployed and its controller is running, so we can try creating a Guestbook instance (just as pods can only be created once the pod type is defined);
  2. kubebuilder has already generated a sample manifest at $GOPATH/src/helloworld/config/samples/webapp_v1_guestbook.yaml ; it is very simple, and we use it to create the Guestbook instance:
```yaml
apiVersion: webapp.wu123/v1
kind: Guestbook
metadata:
  name: guestbook-sample
spec:
  # TODO(user): Add fields here
```

Run the following command to create a Guestbook instance:

[root@k8s-worker02 helloworld]# kubectl apply -f config/samples/
guestbook.webapp.wu123/guestbook-sample created

kubectl get shows the instance has been created:

[root@k8s-worker02 helloworld]# kubectl get Guestbook
NAME               AGE
guestbook-sample   34s

Edit the instance with kubectl edit Guestbook guestbook-sample.

Check the controller logs to see the changes being reconciled.

Delete the instance and stop the controller

kubectl delete -f config/samples/

[root@k8s-worker02 helloworld]# kubectl get Guestbook
NAME               AGE
guestbook-sample   59m
[root@k8s-worker02 helloworld]# kubectl apply -f config/samples
guestbook.webapp.wu123/guestbook-sample configured
[root@k8s-worker02 helloworld]# kubectl delete -f config/samples/
guestbook.webapp.wu123 "guestbook-sample" deleted

Build the controller into a docker image

The approach above ran the controller locally, outside of kubernetes.

Now we build it into a docker image and run it inside the kubernetes cluster:

cd $GOPATH/src/helloworld
make docker-build docker-push IMG=wu123/guestbook:v0.1

Once the image is ready, the following command deploys the controller to kubernetes:

make deploy IMG=wu123/guestbook:v0.1

The console lists the resources being created (mostly rbac).

Looking at the cluster's pods, a new controller pod has indeed appeared.

The pod actually holds two containers; kubectl describe shows them as kube-rbac-proxy and manager.

With two containers we must name one when fetching logs; our controller runs in the manager container, so the log command is:

kubectl logs -f \
helloworld-controller-manager-689d4b6f5b-h9pzg \
-n helloworld-system \
-c manager

Create the Guestbook instance again, still with kubectl apply -f config/samples/ ; the manager container's log now shows the output we added.

To clean up all the resources and the CRD created above, run:

cd $GOPATH/src/helloworld
make uninstall

Background knowledge

Kubernetes Group, Version and Resource

How to operate kubernetes resources with client objects

RESTClient, Clientset, DynamicClient, DiscoveryClient

Case 2: elasticweb

Reference: "Developing an operator from 0 to 1 with kubebuilder" (Bilibili)

Create a new module named elasticweb with go mod init elasticweb:

[root@k8s-worker02 gopath]# mkdir elasticweb
[root@k8s-worker02 gopath]# cd elasticweb
[root@k8s-worker02 elasticweb]# go mod init elasticweb
go: creating new go.mod: module elasticweb

Run kubebuilder init --domain wu123.com to scaffold the operator project:

[root@k8s-worker02 elasticweb]# kubebuilder init --domain wu123.com
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
Get controller runtime:
$ go get sigs.k8s.io/controller-runtime@v0.12.2
Update dependencies:
$ go mod tidy
Next: define a resource with:
$ kubebuilder create api

Infrastructure

  • After the operator project is scaffolded, many files and directories appear; the following are the pieces of infrastructure the official docs call out:
  1. go.mod: the module's configuration file, pre-populated with the important dependencies;
  2. Makefile: a very important tool, already used above; compiling, building, deploying and running all go through it;
  3. PROJECT: the kubebuilder project's metadata, consulted when generating the various APIs;
  4. config/default: kustomize-based configuration providing standard settings for the controller; adjust as needed;
  5. config/manager: manager-related details, e.g. the image's resource limits;
  6. config/rbac: as the name implies, if you want to restrict the operator's permissions inside kubernetes, the fine-grained rbac configuration lives here;

main.go

  • main.go is generated by kubebuilder and is the operator's entry point; a few things are worth noting:
  1. Two global variables, shown below: setupLog is simply for logging, and scheme is a commonly used utility that maps Kinds to the Go structs in our code (the scaffold registers types into it in an init function, sketched after this block):
var (
	scheme   = runtime.NewScheme()
	setupLog = ctrl.Log.WithName("setup")
)
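
For reference, the scaffolded main.go registers types into this scheme roughly like this (a sketch of the kubebuilder scaffold; utilruntime and clientgoscheme are the scaffold's aliases for k8s.io/apimachinery/pkg/util/runtime and k8s.io/client-go/kubernetes/scheme):

func init() {
	// register the built-in kubernetes types (Pod, Deployment, Service, ...)
	utilruntime.Must(clientgoscheme.AddToScheme(scheme))
	// after `kubebuilder create api`, the generated group package is registered the same way:
	// utilruntime.Must(elasticwebv1.AddToScheme(scheme))
}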

There are further settings, e.g. for metrics, and for the manager that hosts the controllers and webhooks and runs until terminated from outside. One thing to note about this manager is its options: the defaults are shown below. If you want the operator to act only within a specific namespace, add a Namespace option here; to watch several namespaces, use cache.MultiNamespacedCacheBuilder(namespaces) instead (a sketch follows the code):

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme:                 scheme,
		MetricsBindAddress:     metricsAddr,
		Port:                   9443,
		HealthProbeBindAddress: probeAddr,
		LeaderElection:         enableLeaderElection,
		LeaderElectionID:       "65990fce.wu123.com",
		// LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
		// when the Manager ends. This requires the binary to immediately end when the
		// Manager is stopped, otherwise, this setting is unsafe. Setting this significantly
		// speeds up voluntary leader transitions as the new leader don't have to wait
		// LeaseDuration time first.
		//
		// In the default scaffold provided, the program ends immediately after
		// the manager stops, so would be fine to enable this option. However,
		// if you are doing or is intended to do any operation such as perform cleanups
		// after the manager stops then its usage might be unsafe.
		// LeaderElectionReleaseOnCancel: true,
	})
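
A minimal sketch of namespace scoping, assuming the controller-runtime v0.12 API (the namespace names are just examples; cache is sigs.k8s.io/controller-runtime/pkg/cache):

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Scheme: scheme,
		// watch a single namespace only
		Namespace: "dev",
		// or watch several namespaces through a custom cache:
		// NewCache: cache.MultiNamespacedCacheBuilder([]string{"dev", "test"}),
	})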

API (the data core)

  • The API is the heart of the operator. When you decide to build one, start from the real requirements and design the whole CRD; that design ultimately lives in the CRD's data structures and in the logic that reconciles actual values with desired values;

We created the API earlier; the command was:

kubebuilder create api \
--group webapp \
--version v1 \
--kind Guestbook
  • Of the newly added content, the core is of course the CRD itself, i.e. the Guestbook struct in guestbook_types.go; this key data structure is:
// Guestbook is the Schema for the guestbooks API
type Guestbook struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   GuestbookSpec   `json:"spec,omitempty"`
	Status GuestbookStatus `json:"status,omitempty"`
}
  1. metav1.TypeMeta: holds the resource's Group, Version and Kind
  2. metav1.ObjectMeta: holds the resource object's name and namespace
  3. Spec: the desired state, e.g. a deployment created with three pod replicas
  4. Status: the actual state, e.g. the deployment currently has only one replica (the others not yet created); most resource objects have this field, ConfigMap being an exception (configuration is whatever you set it to, there is no desired vs. actual);
  • There is one more data structure, GuestbookList, the collection of individual Guestbook objects;
  • Two more files sit next to guestbook_types.go: groupversion_info.go defines the Group and Version plus the SchemeBuilder instance used for scheme registration, and zz_generated.deepcopy.go implements deep copies of the instances; neither needs modification, just know what they do (a sketch of adding a Spec field follows);
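
As a sketch, adding a field to the Spec looks like this (Foo is the scaffold's own placeholder; every field needs a json tag to be serialized):

// GuestbookSpec defines the desired state of Guestbook
type GuestbookSpec struct {
	// Foo is an example field of Guestbook. Edit guestbook_types.go to remove/update
	Foo string `json:"foo,omitempty"`
}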

controller (the business core)

  • However business requirements differ, operator development shares two commonalities:
  1. Status (the actual state) is a data structure whose fields are defined by the business, and whose values are computed by your own controller logic;
  2. The core goal of the business code is to make Status agree with Spec: e.g. a deployment specifies 3 pod replicas; if fewer than three real pods exist, the deployment controller creates pods, and if more than three exist, it deletes them;
  • That is what a controller does. Now to the code details: guestbook_controller.go created by kubebuilder is the controller, and all the business code goes in this file.

The struct definition is shown below; everything needed is already wired in: the client.Client for operating on resource objects, and the Scheme mapping Kinds to data structures:

// GuestbookReconciler reconciles a Guestbook object
type GuestbookReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}
  1. The SetupWithManager method, called from main.go, tells the manager to watch the Guestbook resource and trigger Reconcile on changes (the main.go wiring is sketched after the code):
// SetupWithManager sets up the controller with the Manager.
func (r *GuestbookReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&webappv1.Guestbook{}).
		Complete(r)
}
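
For reference, main.go wires the reconciler into the manager roughly like this (the kubebuilder scaffold; a sketch):

	if err = (&controllers.GuestbookReconciler{
		Client: mgr.GetClient(),
		Scheme: mgr.GetScheme(),
	}).SetupWithManager(mgr); err != nil {
		setupLog.Error(err, "unable to create controller", "controller", "Guestbook")
		os.Exit(1)
	}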

As shown below, Reconcile is preceded by comments with the +kubebuilder:rbac prefix; they ensure the controller has the required permissions on resources at runtime. For example, //+kubebuilder:rbac:groups=core,resources=pods,verbs=get was added by hand so the controller may query pod resources:

//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks/finalizers,verbs=update
//+kubebuilder:rbac:groups=core,resources=pods,verbs=get

// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the Guestbook object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.12.2/pkg/reconcile
func (r *GuestbookReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	_ = log.FromContext(ctx)

	// TODO(user): your logic here
	klog.Info(req)

	return ctrl.Result{}, nil
}
  • guestbook_controller.go is the heart of the operator, and the heart of the controller is its Reconcile method; most future code lives there, and its main job is to obtain the status and then drive the status and the spec into agreement;
  • Regarding status, one piece of official guidance deserves emphasis: a resource object's status should be recomputed from scratch on every reconcile. Take deployment as an example: to know how many pods currently exist you could (a) keep a counter field updated on every pod add and delete and read that, or (b) query the live pod count with the client each time; the official recommendation is clearly (b), as sketched below:
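
A minimal sketch of approach (b), assuming the controller-runtime client embedded in the reconciler, corev1 as the alias for k8s.io/api/core/v1, and an "app" label on the pods:

	// recompute the actual state from the live cluster instead of keeping a counter
	var pods corev1.PodList
	if err := r.List(ctx, &pods,
		client.InNamespace(req.Namespace),
		client.MatchingLabels{"app": "guestbook"}); err != nil {
		return ctrl.Result{}, err
	}
	// pods.Items now reflects reality; compare len(pods.Items) with the desired
	// replica count from the Spec and act on any difference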

operator requirements and design

We now develop an operator with real behavior, named elasticweb, i.e. an elastic web service.

elasticweb has clear business meaning from CRD design through controller behavior; it executes actual business logic.

What does this operator do, what problem does it solve, and what are its core pieces?

Background

  • QPS: Queries-per-second, i.e. how many requests the server handles in one second;
  • Scenario: anyone who has built websites knows horizontal scaling. Simply put, if one tomcat instance tops out at 500 QPS and external traffic reaches 600 QPS, a second identical tomcat must be started to share the load, to keep the whole site healthy (for simplicity we assume the backend is stateless, i.e. it does not depend on host IP, local disk and the like);

That is the usual horizontal-scaling approach. In a kubernetes environment, when external requests exceed what a single pod can handle, we increase the pod count to scale out.

Requirements

We want to deploy a springboot application to kubernetes; the current situation and constraints:

  1. the springboot application is already packaged as a docker image;
  2. load testing established a single pod's QPS as 500;
  3. the estimated total QPS after launch is around 800;
  4. as the operations strategy changes, the total QPS will keep being adjusted;

In short there are only three pieces of data: the docker image, the per-pod QPS, and the total QPS. The user knows little about kubernetes and needs a scheme that deploys the service and sustains high external concurrency at runtime. So:

  1. develop an operator named elasticweb; the user only hands it the three parameters (docker image, per-pod QPS, total QPS);
  2. elasticweb creates the pods in kubernetes, computing the pod count automatically so the QPS requirement is met; in the situation above, two pods are needed for 800 QPS;
  3. both the per-pod QPS and the total QPS may change at any time; when they do, elasticweb must adjust the pod count automatically to keep the service healthy;
  4. to make the service reachable from outside, it also creates the service;

For this requirement kubernetes already has ready-made options: changing a deployment's replica count, scaling a single pod vertically, autoscale, and more. Implementing it as an operator here is purely to demonstrate operator development, not to claim a custom operator is the only solution.

With this operator you no longer think about pod counts; you focus on per-instance QPS and total QPS, parameters much closer to the business.

To keep things simple, each pod's CPU and memory requirements are fixed constants in the operator code; you can change the code to make them externally configurable, like the image name parameter.

With the requirements clear, we move on to design, starting with the CRD, the core data structure.

CRD design: the Spec

Spec holds the user's desired values, i.e. the three parameters (docker image, per-pod QPS, total QPS) plus a port:

  1. image: the image of the business service
  2. port: the host port occupied by the service; external requests reach the pods' service through it
  3. singlePodQPS: the QPS ceiling of a single pod
  4. totalQPS: the current total QPS of the whole business
  • These four parameters are the input

CRD design: the Status

  • Status holds the actual values. Here it is designed with a single field, realQPS, the total QPS the operator can actually sustain right now; at any moment a kubectl describe on the object reveals how much QPS the system currently supports;

CRD source code

The complete elasticweb_types.go is shown in the "CRD coding" section below.

Business logic design

  • With the CRD done, the core data structure is fixed; next we design the business logic, i.e. what the controller's Reconcile method actually does. The core logic is quite simple: compute how many pods are needed, then update the deployment until the pod count matches; around this core we also create the deployment and service and update the status;
  • The flow of the whole business logic, which guides the implementation, is given in outline below (the original flow diagram is not reproduced here):
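
In outline (derived from the main Reconcile code shown later):

  1. fetch the elasticweb instance named in the request; if it no longer exists, return;
  2. look up the deployment of the same name; if it does not exist and totalQPS < 1, there is nothing to do;
  3. if it does not exist but QPS is required, create the service, then the deployment with the computed replica count, then update the status and return;
  4. if the deployment exists, compare its replica count with the expected count computed from the QPS numbers; if they match, return;
  5. otherwise update the deployment's replicas, then update the status.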

operator coding

  • The elasticweb project has already been created;
  • Next the CRD; the following command generates the API:
[root@k8s-worker02 elasticweb]# kubebuilder create api \
> --group elasticweb \
> --version v1 \
> --kind ElasticWeb
Create Resource [y/n]
y
Create Controller [y/n]
y
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
api/v1/elasticweb_types.go
controllers/elasticweb_controller.go
Update dependencies:
$ go mod tidy
Running make:
$ make generate
mkdir -p /home/gopath/elasticweb/bin
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
Next: implement your new API and generate the manifests (e.g. CRDs,CRs) with:
$ make manifests

CRD coding

  • Open api/v1/elasticweb_types.go and make the following changes:
  1. add the four fields designed above to the ElasticWebSpec struct;
  2. add the one field designed above to the ElasticWebStatus struct;
  3. add a String method for convenient logging; note the RealQPS field is a pointer and may be nil, so it needs a nil check;
  • The complete elasticweb_types.go:
package v1

import (
	"fmt"
	"strconv"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EDIT THIS FILE!  THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required.  Any new fields you add must have json tags for the fields to be serialized.

// ElasticWebSpec defines the desired state of ElasticWeb
type ElasticWebSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// Foo is an example field of ElasticWeb. Edit elasticweb_types.go to remove/update
	// Foo string `json:"foo,omitempty"`

	// image of the business service, including name:tag
	Image string `json:"image"`
	// host port occupied by the service; external requests reach the pods through this port
	Port *int32 `json:"port"`

	// QPS ceiling of a single pod
	SinglePodQPS *int32 `json:"singlePodQPS"`
	// current total QPS of the whole business
	TotalQPS *int32 `json:"totalQPS"`
}

// ElasticWebStatus defines the observed state of ElasticWeb
// (the actual state; the values here are computed by the business code)
type ElasticWebStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// the total QPS actually supported in kubernetes right now
	RealQPS *int32 `json:"realQPS"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status

// ElasticWeb is the Schema for the elasticwebs API
type ElasticWeb struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ElasticWebSpec   `json:"spec,omitempty"`
	Status ElasticWebStatus `json:"status,omitempty"`
}

func (in *ElasticWeb) String() string {
	var realQPS string

	if nil == in.Status.RealQPS {
		realQPS = "nil"
	} else {
		realQPS = strconv.Itoa(int(*(in.Status.RealQPS)))
	}

	return fmt.Sprintf("Image [%s], Port [%d], SinglePodQPS [%d], TotalQPS [%d], RealQPS [%s]",
		in.Spec.Image,
		*(in.Spec.Port),
		*(in.Spec.SinglePodQPS),
		*(in.Spec.TotalQPS),
		realQPS)
}

//+kubebuilder:object:root=true

// ElasticWebList contains a list of ElasticWeb
type ElasticWebList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []ElasticWeb `json:"items"`
}

func init() {
	SchemeBuilder.Register(&ElasticWeb{}, &ElasticWebList{})
}
  • Run make install in the elasticweb directory to deploy the CRD to kubernetes:
[root@k8s-worker02 elasticweb]# make install
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/elasticweb/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/elasticwebs.elasticweb.wu123.com created

Once deployed, kubectl api-versions shows the new GV:

[root@k8s-worker02 elasticweb]# kubectl api-versions|grep elasticweb
elasticweb.wu123.com/v1

The core data structure is done; next comes the business logic.

Open elasticweb_controller.go; we add the following pieces step by step.

Add resource access permissions

  • elasticweb queries, creates and modifies service and deployment resources, so it needs permissions on them; add the deployments and services marker lines below, and the code generator will add the corresponding permissions to the RBAC configuration:
//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete


// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the ElasticWeb object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.12.2/pkg/reconcile
func (r *ElasticWebReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	_ = log.FromContext(ctx)

	// TODO(user): your logic here

	return ctrl.Result{}, nil
}

Constant definitions

  • First the constants. Each pod's CPU and memory are fixed here; you could instead define them in the Spec so they can be passed in from outside. Note that each pod is allotted only 0.1 CPU; adjust as appropriate:
const (
	// value of the app label in the deployment
	APP_NAME = "elastic-app"
	// port of the tomcat container
	CONTAINER_PORT = 8080
	// CPU request of a single pod
	CPU_REQUEST = "100m"
	// CPU limit of a single pod
	CPU_LIMIT = "100m"
	// memory request of a single pod
	MEM_REQUEST = "512Mi"
	// memory limit of a single pod
	MEM_LIMIT = "512Mi"
)

Method getExpectReplicas

  • A key piece of logic: compute the required pod count from the per-pod QPS and the total QPS. It is wrapped in its own method for reuse; a worked example follows the code:
// compute the pod count from the per-pod QPS and the total QPS
func getExpectReplicas(elasticWeb *elasticwebv1.ElasticWeb) int32 {
	// QPS of a single pod
	singlePodQPS := *(elasticWeb.Spec.SinglePodQPS)

	// desired total QPS
	totalQPS := *(elasticWeb.Spec.TotalQPS)

	// replicas is the number of pods to create
	replicas := totalQPS / singlePodQPS

	// round up if there is a remainder
	if totalQPS%singlePodQPS > 0 {
		replicas++
	}

	return replicas
}
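
For example, with singlePodQPS=500 and totalQPS=600 (the sample values used later), 600/500 = 1 with remainder 100, so replicas becomes 2; with totalQPS=2600 and singlePodQPS=800, 2600/800 = 3 with remainder 200, so replicas is 4. Both values match the controller logs shown further down.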

Method createServiceIfNotExists

  • Creating the service is wrapped in its own method so the main flow stays clear and readable;
  • Points to note when creating the service:
  1. check whether the service exists first; create it only if it does not;
  2. tie the service to the elasticWeb CRD instance (the controllerutil.SetControllerReference method), so that deleting the elasticWeb automatically deletes the service without our intervention;
  3. the service is created through the kubernetes client (r.Create);
  • The complete method is below (a note on ownerReferences follows it):
// create the service
func createServiceIfNotExists(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb, req ctrl.Request) error {
	// get the logger
	l := log.FromContext(ctx, "func", "createService")

	service := &corev1.Service{}

	err := r.Get(ctx, req.NamespacedName, service)

	// no error means the service already exists; nothing to do
	if err == nil {
		l.Info("service exists")
		return nil
	}

	// any error other than NotFound is returned to the caller
	if !apierrors.IsNotFound(err) {
		l.Error(err, "query service error")
		return err
	}

	// build the service object
	service = &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: elasticWeb.Namespace,
			Name:      elasticWeb.Name,
		},
		Spec: corev1.ServiceSpec{
			Ports: []corev1.ServicePort{{
				Name:     "http",
				Port:     8080,
				NodePort: *elasticWeb.Spec.Port,
			},
			},
			Selector: map[string]string{
				"app": APP_NAME,
			},
			Type: corev1.ServiceTypeNodePort,
		},
	}

	// this step is essential!
	// once the reference is set, deleting the elasticweb resource deletes this service too
	l.Info("set reference")
	if err := controllerutil.SetControllerReference(elasticWeb, service, r.Scheme); err != nil {
		l.Error(err, "SetControllerReference error")
		return err
	}

	// create the service
	l.Info("start create service")
	if err := r.Create(ctx, service); err != nil {
		l.Error(err, "create service error")
		return err
	}

	l.Info("create service success")

	return nil
}
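
Under the hood, SetControllerReference writes an entry into the child object's metadata.ownerReferences; kubernetes' garbage collector then removes the service automatically when its owning elasticweb object is deleted, which is exactly the cascade we rely on in the delete verification later.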

Method createDeployment

  • Creating the deployment is likewise wrapped in its own method to keep the main logic lean;
  • Points to note:
  1. getExpectReplicas supplies the pod count, an important parameter of the deployment;
  2. each pod's CPU and memory are also deployment parameters;
  3. the deployment is tied to the elasticweb instance, so deleting the elasticweb automatically deletes the deployment;
  4. the deployment is likewise created through the kubernetes client:
// create the deployment
func createDeployment(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb) error {
	l := log.FromContext(ctx, "func", "createDeployment")

	// compute the expected pod count
	expectReplicas := getExpectReplicas(elasticWeb)

	l.Info(fmt.Sprintf("expectReplicas [%d]", expectReplicas))

	// build the deployment object
	deployment := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: elasticWeb.Namespace,
			Name:      elasticWeb.Name,
		},
		Spec: appsv1.DeploymentSpec{
			// the replica count is computed from the QPS numbers
			Replicas: pointer.Int32Ptr(expectReplicas),
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{
					"app": APP_NAME,
				},
			},

			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{
						"app": APP_NAME,
					},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{
						{
							Name: APP_NAME,
							// use the image specified in the Spec
							Image:           elasticWeb.Spec.Image,
							ImagePullPolicy: "IfNotPresent",
							Ports: []corev1.ContainerPort{
								{
									Name:          "http",
									Protocol:      corev1.ProtocolTCP,
									ContainerPort: CONTAINER_PORT,
								},
							},
							Resources: corev1.ResourceRequirements{
								Requests: corev1.ResourceList{
									"cpu":    resource.MustParse(CPU_REQUEST),
									"memory": resource.MustParse(MEM_REQUEST),
								},
								Limits: corev1.ResourceList{
									"cpu":    resource.MustParse(CPU_LIMIT),
									"memory": resource.MustParse(MEM_LIMIT),
								},
							},
						},
					},
				},
			},
		},
	}

	// this step is essential!
	// once the reference is set, deleting the elasticweb resource deletes this deployment too
	l.Info("set reference")
	if err := controllerutil.SetControllerReference(elasticWeb, deployment, r.Scheme); err != nil {
		l.Error(err, "SetControllerReference error")
		return err
	}

	// create the deployment
	l.Info("start create deployment")
	if err := r.Create(ctx, deployment); err != nil {
		l.Error(err, "create deployment error")
		return err
	}

	l.Info("create deployment success")

	return nil
}

Method updateStatus

  • Whether we have just created the deployment or adjusted an existing deployment's pod count, afterwards we must refresh Status, i.e. the actual state, so the outside world can always see the QPS the current elasticweb supports. The update is therefore wrapped in a method used from several places. The computation is simple: total QPS = per-pod QPS × pod count (see the note after the code):
// after the pods have been handled, refresh the status
func updateStatus(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb) error {
	l := log.FromContext(ctx, "func", "updateStatus")

	// QPS of a single pod
	singlePodQPS := *(elasticWeb.Spec.SinglePodQPS)

	// total pod count
	replicas := getExpectReplicas(elasticWeb)

	// once the pods exist, the system's actual QPS is singlePodQPS * replicas;
	// initialize the field first if it is still nil
	if nil == elasticWeb.Status.RealQPS {
		elasticWeb.Status.RealQPS = new(int32)
	}

	*(elasticWeb.Status.RealQPS) = singlePodQPS * replicas

	l.Info(fmt.Sprintf("singlePodQPS [%d], replicas [%d], realQPS[%d]", singlePodQPS, replicas, *(elasticWeb.Status.RealQPS)))

	if err := r.Update(ctx, elasticWeb); err != nil {
		l.Error(err, "update instance error")
		return err
	}

	return nil
}
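
One caveat: the CRD enables the status subresource (+kubebuilder:subresource:status), and on clusters enforcing it the canonical call for persisting status is r.Status().Update(ctx, elasticWeb); a plain r.Update targets the main resource and may silently drop status changes. If the status does not persist in your environment, switch to r.Status().Update.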

The main Reconcile logic

func (r *ElasticWebReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	l := log.FromContext(ctx, "elasticweb", req.NamespacedName)

	// your logic here

	l.Info("1. start reconcile logic")

	// instantiate the data structure
	instance := &elasticwebv1.ElasticWeb{}

	// query via the client, keyed by the request's namespace/name
	err := r.Get(ctx, req.NamespacedName, instance)

	if err != nil {

		// no instance: return an empty result so the framework does not immediately call Reconcile again
		if apierrors.IsNotFound(err) {
			l.Info("2.1. instance not found, maybe removed")
			return reconcile.Result{}, nil
		}

		l.Error(err, "2.2 error")
		// hand the error back to the framework
		return ctrl.Result{}, err
	}

	l.Info("3. instance : " + instance.String())

	// look up the deployment
	deployment := &appsv1.Deployment{}

	// query via the client
	err = r.Get(ctx, req.NamespacedName, deployment)

	// handle query errors, including the not-found case
	if err != nil {
		// no deployment yet: it has to be created
		if apierrors.IsNotFound(err) {
			l.Info("4. deployment not exists")

			// no QPS requirement and no deployment: nothing to do
			if *(instance.Spec.TotalQPS) < 1 {
				l.Info("5.1 not need deployment")
				return ctrl.Result{}, nil
			}

			// create the service first
			if err = createServiceIfNotExists(ctx, r, instance, req); err != nil {
				l.Error(err, "5.2 error")
				return ctrl.Result{}, err
			}

			// then create the deployment
			if err = createDeployment(ctx, r, instance); err != nil {
				l.Error(err, "5.3 error")
				return ctrl.Result{}, err
			}

			// creation succeeded: update the status
			if err = updateStatus(ctx, r, instance); err != nil {
				l.Error(err, "5.4. error")
				return ctrl.Result{}, err
			}

			// everything created: done
			return ctrl.Result{}, nil
		} else {
			l.Error(err, "7. error")
			return ctrl.Result{}, err
		}
	}

	// the deployment exists and the query returned no error; continue below

	// compute the expected replica count from the per-pod QPS and the total QPS
	expectReplicas := getExpectReplicas(instance)

	// the deployment's current desired replica count
	realReplicas := *deployment.Spec.Replicas

	l.Info(fmt.Sprintf("9. expectReplicas [%d], realReplicas [%d]", expectReplicas, realReplicas))

	// equal: nothing to adjust
	if expectReplicas == realReplicas {
		l.Info("10. return now")
		return ctrl.Result{}, nil
	}

	// not equal: adjust
	*(deployment.Spec.Replicas) = expectReplicas

	l.Info("11. update deployment's Replicas")
	// update the deployment via the client
	if err = r.Update(ctx, deployment); err != nil {
		l.Error(err, "12. update deployment replicas error")
		return ctrl.Result{}, err
	}

	l.Info("13. update status")

	// the replica update succeeded: update the status as well
	if err = updateStatus(ctx, r, instance); err != nil {
		l.Error(err, "14. update status error")
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}

Build, deploy, run

  1. deploy the CRD
  2. run the Controller locally
  3. create an elasticweb resource object from a yaml file
  4. verify through logs and kubectl that elasticweb behaves correctly
  5. open the web service in a browser to verify the business service
  6. change singlePodQPS and watch whether elasticweb adjusts the pod count
  7. change totalQPS and watch whether elasticweb adjusts the pod count
  8. delete the elasticweb and watch the related service and deployment be deleted automatically
  9. build the Controller image, run the Controller in kubernetes, and verify all of the above again

Deploy the CRD

  • From the directory containing the Makefile, run make install to deploy the CRD to kubernetes:
[root@k8s-worker02 elasticweb]# make install
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/elasticweb/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/elasticwebs.elasticweb.wu123.com unchanged
  • As the output shows, the actual operation is kustomize merging the yaml under config/crd and applying the result to kubernetes;
  • kubectl api-versions verifies the CRD deployed successfully:
[root@k8s-worker02 elasticweb]# kubectl api-versions|grep elasticweb
elasticweb.wu123.com/v1

Run the Controller locally

From the directory containing the Makefile, make run compiles and runs the controller:

[root@k8s-worker02 elasticweb]# make run
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/elasticweb/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./main.go
I0420 16:36:21.480955    7843 request.go:601] Waited for 1.047000322s due to client-side throttling, not priority and fairness, request: GET:https://192.168.204.129:6443/apis/discovery.k8s.io/v1beta1?timeout=32s
1.6819797822337084e+09	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": ":8080"}
1.681979782234283e+09	INFO	setup	starting manager
1.6819797822347057e+09	INFO	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.6819797822348225e+09	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
1.6819797822349567e+09	INFO	Starting EventSource	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "source": "kind source: *v1.ElasticWeb"}
1.6819797822349946e+09	INFO	Starting Controller	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb"}
1.6819797823369792e+09	INFO	Starting workers	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "worker count": 1}

Create an elasticweb resource object

  • With the Controller responsible for elasticweb running, we now create an elasticweb resource object from a yaml file;
  • kubebuilder generated the demo file elasticweb_v1_elasticweb.yaml under config/samples, but its spec does not contain the four fields we defined; change it to the following:
```yaml
apiVersion: elasticweb.wu123.com/v1
kind: ElasticWeb
metadata:
  name: elasticweb-sample
spec:
  # TODO(user): Add fields here
  image: tomcat:8.0.18-jre8
  port: 30003
  singlePodQPS: 500
  totalQPS: 600
```
  • Notes on these parameters:
  1. the namespace used is dev
  2. the application deployed for this test is tomcat
  3. the service exposes tomcat's service on host port 30003
  4. we assume a single pod sustains 500 QPS, and external demand is 600 QPS
  • Run kubectl apply -f config/samples/elasticweb_v1_elasticweb.yaml to create the elasticweb instance in kubernetes:
[root@k8s-worker02 elasticweb]# kubectl apply -f config/samples/elasticweb_v1_elasticweb.yaml
elasticweb.elasticweb.wu123.com/elasticweb-sample created

The controller log reports an error:

1.6819799323592849e+09	ERROR	query service error	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "8d1a072b-af47-46ce-92aa-b1bc3a964e64", "func": "createService", "error": "Service \"elasticweb-sample\" not found"}

Two fixes to the controller code follow.

In createServiceIfNotExists, the NotFound check was inverted. Change

    // return the error if it is not NotFound
    if apierrors.IsNotFound(err) {
        l.Error(err, "query service error")
        return err
    }

to

    // return the error only if it is something other than NotFound
    if !apierrors.IsNotFound(err) {
        l.Error(err, "query service error")
        return err
    }

Also change how the logger is obtained,

    l := log.FromContext(ctx, "func", "createService")

to

    l := log.FromContext(ctx)
    l = l.WithValues("func", "createService")

(WithValues returns a new logger, so its result must be assigned). The complete corrected elasticweb_controller.go:

package controllers

import (
	"context"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/utils/pointer"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
	"sigs.k8s.io/controller-runtime/pkg/log"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	elasticwebv1 "elasticweb/api/v1"
)

// ElasticWebReconciler reconciles a ElasticWeb object
type ElasticWebReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

const (
	// value of the app label in the deployment
	APP_NAME = "elastic-app"
	// port of the tomcat container
	CONTAINER_PORT = 8080
	// CPU request of a single pod
	CPU_REQUEST = "100m"
	// CPU limit of a single pod
	CPU_LIMIT = "100m"
	// memory request of a single pod
	MEM_REQUEST = "512Mi"
	// memory limit of a single pod
	MEM_LIMIT = "512Mi"
)

// compute the pod count from the per-pod QPS and the total QPS
func getExpectReplicas(elasticWeb *elasticwebv1.ElasticWeb) int32 {
	// QPS of a single pod
	singlePodQPS := *(elasticWeb.Spec.SinglePodQPS)

	// desired total QPS
	totalQPS := *(elasticWeb.Spec.TotalQPS)

	// replicas is the number of pods to create
	replicas := totalQPS / singlePodQPS

	// round up if there is a remainder
	if totalQPS%singlePodQPS > 0 {
		replicas++
	}

	return replicas
}

//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete

// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the ElasticWeb object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.12.2/pkg/reconcile
func (r *ElasticWebReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	l := log.FromContext(ctx, "elasticweb", req.NamespacedName)
	// alternatively:
	// l := log.FromContext(ctx)
	// l = l.WithValues("elasticweb", req.NamespacedName)
	// your logic here

	l.Info("1. start reconcile logic")

	// 实例化数据结构
	instance := &elasticwebv1.ElasticWeb{}

	// 通过客户端工具查询,查询条件是
	err := r.Get(ctx, req.NamespacedName, instance)

	if err != nil {

		// 如果没有实例,就返回空结果,这样外部就不再立即调用Reconcile方法了
		if apierrors.IsNotFound(err) {
			l.Info("2.1. instance not found, maybe removed")
			return reconcile.Result{}, nil
		}

		l.Error(err, "2.2 error")
		// 返回错误信息给外部
		return ctrl.Result{}, err
	}

	l.Info("3. instance : " + instance.String())

	// 查找deployment
	deployment := &appsv1.Deployment{}

	// 用客户端工具查询
	err = r.Get(ctx, req.NamespacedName, deployment)

	// 查找时发生异常,以及查出来没有结果的处理逻辑
	if err != nil {
		// 如果没有实例就要创建了
		if apierrors.IsNotFound(err) {
			l.Info("4. deployment not exists")

			// 如果对QPS没有需求,此时又没有deployment,就啥事都不做了
			if *(instance.Spec.TotalQPS) < 1 {
				l.Info("5.1 not need deployment")
				// 返回
				return ctrl.Result{}, nil
			}

			// 先要创建service
			if err = createServiceIfNotExists(ctx, r, instance, req); err != nil {
				l.Error(err, "5.2 error")
				// 返回错误信息给外部
				return ctrl.Result{}, err
			}

			// 立即创建deployment
			if err = createDeployment(ctx, r, instance); err != nil {
				l.Error(err, "5.3 error")
				// 返回错误信息给外部
				return ctrl.Result{}, err
			}

			// 如果创建成功就更新状态
			if err = updateStatus(ctx, r, instance); err != nil {
				l.Error(err, "5.4. error")
				// 返回错误信息给外部
				return ctrl.Result{}, err
			}

			// 创建成功就可以返回了
			return ctrl.Result{}, nil
		} else {
			l.Error(err, "7. error")
			// 返回错误信息给外部
			return ctrl.Result{}, err
		}
	}

	// 如果查到了deployment,并且没有返回错误,就走下面的逻辑

	// 根据单QPS和总QPS计算期望的副本数
	expectReplicas := getExpectReplicas(instance)

	// 当前deployment的期望副本数
	realReplicas := *deployment.Spec.Replicas

	l.Info(fmt.Sprintf("9. expectReplicas [%d], realReplicas [%d]", expectReplicas, realReplicas))

	// 如果相等,就直接返回了
	if expectReplicas == realReplicas {
		l.Info("10. return now")
		return ctrl.Result{}, nil
	}

	// 如果不等,就要调整
	*(deployment.Spec.Replicas) = expectReplicas

	l.Info("11. update deployment's Replicas")
	// 通过客户端更新deployment
	if err = r.Update(ctx, deployment); err != nil {
		l.Error(err, "12. update deployment replicas error")
		// 返回错误信息给外部
		return ctrl.Result{}, err
	}

	l.Info("13. update status")

	// 如果更新deployment的Replicas成功,就更新状态
	if err = updateStatus(ctx, r, instance); err != nil {
		l.Error(err, "14. update status error")
		// 返回错误信息给外部
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}

// create the service
func createServiceIfNotExists(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb, req ctrl.Request) error {
	// get the logger and tag it with the method name
	l := log.FromContext(ctx)
	l = l.WithValues("func", "createService")

	service := &corev1.Service{}

	err := r.Get(ctx, req.NamespacedName, service)

	// no error means the service already exists; nothing to do
	if err == nil {
		l.Info("service exists")
		return nil
	}

	// any error other than NotFound is returned to the caller
	if !apierrors.IsNotFound(err) {
		l.Error(err, "query service error")
		return err
	}

	// build the service object
	service = &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: elasticWeb.Namespace,
			Name:      elasticWeb.Name,
		},
		Spec: corev1.ServiceSpec{
			Ports: []corev1.ServicePort{{
				Name:     "http",
				Port:     8080,
				NodePort: *elasticWeb.Spec.Port,
			},
			},
			Selector: map[string]string{
				"app": APP_NAME,
			},
			Type: corev1.ServiceTypeNodePort,
		},
	}

	// this step is essential!
	// once the reference is set, deleting the elasticweb resource deletes this service too
	l.Info("set reference")
	if err := controllerutil.SetControllerReference(elasticWeb, service, r.Scheme); err != nil {
		l.Error(err, "SetControllerReference error")
		return err
	}

	// create the service
	l.Info("start create service")
	if err := r.Create(ctx, service); err != nil {
		l.Error(err, "create service error")
		return err
	}

	l.Info("create service success")

	return nil
}

// create the deployment
func createDeployment(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb) error {
	l := log.FromContext(ctx)
	l = l.WithValues("func", "createDeployment")

	// compute the expected pod count
	expectReplicas := getExpectReplicas(elasticWeb)

	l.Info(fmt.Sprintf("expectReplicas [%d]", expectReplicas))

	// build the deployment object
	deployment := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: elasticWeb.Namespace,
			Name:      elasticWeb.Name,
		},
		Spec: appsv1.DeploymentSpec{
			// the replica count is computed from the QPS numbers
			Replicas: pointer.Int32Ptr(expectReplicas),
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{
					"app": APP_NAME,
				},
			},

			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{
						"app": APP_NAME,
					},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{
						{
							Name: APP_NAME,
							// use the image specified in the Spec
							Image:           elasticWeb.Spec.Image,
							ImagePullPolicy: "IfNotPresent",
							Ports: []corev1.ContainerPort{
								{
									Name:          "http",
									Protocol:      corev1.ProtocolTCP,
									ContainerPort: CONTAINER_PORT,
								},
							},
							Resources: corev1.ResourceRequirements{
								Requests: corev1.ResourceList{
									"cpu":    resource.MustParse(CPU_REQUEST),
									"memory": resource.MustParse(MEM_REQUEST),
								},
								Limits: corev1.ResourceList{
									"cpu":    resource.MustParse(CPU_LIMIT),
									"memory": resource.MustParse(MEM_LIMIT),
								},
							},
						},
					},
				},
			},
		},
	}

	// this step is essential!
	// once the reference is set, deleting the elasticweb resource deletes this deployment too
	l.Info("set reference")
	if err := controllerutil.SetControllerReference(elasticWeb, deployment, r.Scheme); err != nil {
		l.Error(err, "SetControllerReference error")
		return err
	}

	// create the deployment
	l.Info("start create deployment")
	if err := r.Create(ctx, deployment); err != nil {
		l.Error(err, "create deployment error")
		return err
	}

	l.Info("create deployment success")

	return nil
}

// after the pods have been handled, refresh the status
func updateStatus(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb) error {
	l := log.FromContext(ctx)
	l = l.WithValues("func", "updateStatus")

	// QPS of a single pod
	singlePodQPS := *(elasticWeb.Spec.SinglePodQPS)

	// total pod count
	replicas := getExpectReplicas(elasticWeb)

	// once the pods exist, the system's actual QPS is singlePodQPS * replicas;
	// initialize the field first if it is still nil
	if nil == elasticWeb.Status.RealQPS {
		elasticWeb.Status.RealQPS = new(int32)
	}

	*(elasticWeb.Status.RealQPS) = singlePodQPS * replicas

	l.Info(fmt.Sprintf("singlePodQPS [%d], replicas [%d], realQPS[%d]", singlePodQPS, replicas, *(elasticWeb.Status.RealQPS)))

	if err := r.Update(ctx, elasticWeb); err != nil {
		l.Error(err, "update instance error")
		return err
	}

	return nil
}

// SetupWithManager sets up the controller with the Manager.
func (r *ElasticWebReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&elasticwebv1.ElasticWeb{}).
		Complete(r)
}
[root@k8s-worker02 elasticweb]# make run
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/elasticweb/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./main.go
I0420 17:43:49.182759   23575 request.go:601] Waited for 1.047609583s due to client-side throttling, not priority and fairness, request: GET:https://192.168.204.129:6443/apis/discovery.k8s.io/v1?timeout=32s
1.681983829936318e+09	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": ":8080"}
1.6819838299367585e+09	INFO	setup	starting manager
1.6819838299371967e+09	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
1.6819838299371972e+09	INFO	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.6819838299374092e+09	INFO	Starting EventSource	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "source": "kind source: *v1.ElasticWeb"}
1.681983829937432e+09	INFO	Starting Controller	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb"}
1.681983830039215e+09	INFO	Starting workers	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "worker count": 1}
1.681983830039482e+09	INFO	1. start reconcile logic	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219", "elasticweb": "default/elasticweb-sample"}
1.6819838300396683e+09	INFO	3. instance : Image [tomcat:8.0.18-jre8], Port [30003], SinglePodQPS [500], TotalQPS [600], RealQPS [nil]	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219", "elasticweb": "default/elasticweb-sample"}
1.681983830141667e+09	INFO	4. deployment not exists	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219", "elasticweb": "default/elasticweb-sample"}
1.6819838302434275e+09	INFO	set reference	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.6819838302434833e+09	INFO	start create service	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.68198383029691e+09	INFO	create service success	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.6819838302969563e+09	INFO	expectReplicas [2]	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.681983830296983e+09	INFO	set reference	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.681983830296997e+09	INFO	start create deployment	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.6819838303819373e+09	INFO	create deployment success	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.6819838303819807e+09	INFO	singlePodQPS [500], replicas [2], realQPS[1000]	{"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}

Analysis of the logs shows Reconcile ran twice; the first run created the deployment, service and other resources:

2023-04-20T16:41:59.108+0800    INFO    controllers.ElasticWeb  1. start reconcile logic        {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.108+0800    INFO    controllers.ElasticWeb  3. instance : Image [tomcat:8.0.18-jre8], Port [30003], SinglePodQPS [500], TotalQPS [600], RealQPS [nil]       {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.210+0800    INFO    controllers.ElasticWeb  4. deployment not exists        {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.313+0800    INFO    controllers.ElasticWeb  set reference   {"func": "createService"}
2023-04-20T16:41:59.313+0800    INFO    controllers.ElasticWeb  start create service    {"func": "createService"}
2023-04-20T16:41:59.364+0800    INFO    controllers.ElasticWeb  create service success  {"func": "createService"}
2023-04-20T16:41:59.365+0800    INFO    controllers.ElasticWeb  expectReplicas [2]      {"func": "createDeployment"}
2023-04-20T16:41:59.365+0800    INFO    controllers.ElasticWeb  set reference   {"func": "createDeployment"}
2023-04-20T16:41:59.365+0800    INFO    controllers.ElasticWeb  start create deployment {"func": "createDeployment"}
2023-04-20T16:41:59.382+0800    INFO    controllers.ElasticWeb  create deployment success       {"func": "createDeployment"}
2023-04-20T16:41:59.382+0800    INFO    controllers.ElasticWeb  singlePodQPS [500], replicas [2], realQPS[1000] {"func": "updateStatus"}
2023-04-20T16:41:59.407+0800    DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "elasticweb", "request": "dev/elasticweb-sample"}
2023-04-20T16:41:59.407+0800    INFO    controllers.ElasticWeb  1. start reconcile logic        {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.407+0800    INFO    controllers.ElasticWeb  3. instance : Image [tomcat:8.0.18-jre8], Port [30003], SinglePodQPS [500], TotalQPS [600], RealQPS [1000]      {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.407+0800    INFO    controllers.ElasticWeb  9. expectReplicas [2], realReplicas [2] {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.407+0800    INFO    controllers.ElasticWeb  10. return now  {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.407+0800    DEBUG   controller-runtime.controller   Successfully Reconciled {"controller": "elasticweb", "request": "dev/elasticweb-sample"}
  • Checking the resource objects in detail with kubectl get, everything matches expectations: the elasticweb, service, deployment and pods are all healthy.

Verify the business function in a browser

Open http://192.168.204.131:30003 in a browser.

Change the QPS of a single Pod

  • Optimizations in the service itself, or changes in its dependencies (cache or database scaling, say), can raise the per-pod QPS. Suppose it rises from 500 to 800; let's see whether our Operator adjusts on its own (total QPS is 600, so the pod count should drop from 2 to 1);
  • Create a file named update_single_pod_qps.yaml under config/samples/ with this content:
```yaml
spec:
  singlePodQPS: 800
```

Apply it with the following command to raise the per-pod QPS from 500 to 800:

kubectl patch elasticweb elasticweb-sample \
-n dev \
--type merge \
--patch "$(cat config/samples/update_single_pod_qps.yaml)"

The controller log shows the spec updated and the pod count recomputed from the new parameters, as expected.

  • kubectl get shows the pod count has dropped to 1:
kubectl get pod -n dev
  • Remember to check in the browser that tomcat still responds;

Change the total QPS

  • External QPS changes constantly too, and the operator must adjust the pod count accordingly to maintain overall service quality. Now change the total QPS and see whether the operator reacts:
  • Create a file named update_total_qps.yaml under config/samples/ with this content:
```yaml
spec:
  totalQPS: 2600
```

Apply it with the following command to raise the total QPS from 600 to 2600:

kubectl patch elasticweb elasticweb-sample \
-n dev \
--type merge \
--patch "$(cat config/samples/update_total_qps.yaml)"

The controller log shows the spec updated and the pod count recomputed from the new parameters, as expected.

Checking with kubectl get, the pods have grown to 4; four pods sustain 4 × 800 = 3200 QPS, which covers the current requirement of 2600:

 kubectl get pod -n dev
  • Remember to check in the browser that tomcat still responds;

Adjusting the pod count by hand-editing yaml like this is crude, but you could build a small application that observes the live QPS and calls client-go to update the elasticweb's totalQPS, letting the operator adjust the pods in time; that amounts to autoscaling. A minimal sketch of the idea follows.
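
A sketch of that idea, assuming the dynamic client from client-go; the GVR matches our CRD, while the function name and kubeconfig handling are illustrative:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

// setTotalQPS patches spec.totalQPS of an elasticweb object, leaving the
// operator to recompute the pod count on its next reconcile
func setTotalQPS(ns, name string, qps int32) error {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		return err
	}
	dc, err := dynamic.NewForConfig(cfg)
	if err != nil {
		return err
	}
	gvr := schema.GroupVersionResource{Group: "elasticweb.wu123.com", Version: "v1", Resource: "elasticwebs"}
	patch := []byte(fmt.Sprintf(`{"spec":{"totalQPS":%d}}`, qps))
	_, err = dc.Resource(gvr).Namespace(ns).Patch(
		context.TODO(), name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}

func main() {
	// example: declare that the business now needs 2600 QPS in namespace dev
	if err := setTotalQPS("dev", "elasticweb-sample", 2600); err != nil {
		panic(err)
	}
}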

Delete and verify

  • The dev namespace currently holds service, deployment, pod and elasticweb objects. To remove them all it is enough to delete the elasticweb, because the service and deployment were tied to it by owner references in this code:
	// this step is essential!
	// once the reference is set, deleting the elasticweb resource deletes the deployment too
	l.Info("set reference")
	if err := controllerutil.SetControllerReference(elasticWeb, deployment, r.Scheme); err != nil {
		l.Error(err, "SetControllerReference error")
		return err
	}
  • Delete the elasticweb:
kubectl delete elasticweb elasticweb-sample -n dev
  • Checking the remaining resources: all of them were deleted automatically.

Build the image

make docker-build docker-push IMG=wu123.com/elasticweb:v0.1

Once the image is ready, the following command deploys the controller to kubernetes:

make deploy IMG=wu123.com/elasticweb:v0.1

Create an elasticweb resource object and verify that all the resources are created successfully.

Check the controller's logs:

kubectl logs -f \
elasticweb-controller-manager-5795d4d98d-t6jvc \
-c manager \
-n elasticweb-system

Then verify in the browser that tomcat started successfully.

Uninstall and clean up

To remove everything created above, run:

make uninstall

webhook

  1. Introduce webhooks;
  2. Design a webhook scenario on top of the elasticweb project;
  3. Preparation
  4. Generate the webhook
  5. Development (configuration)
  6. Development (coding)
  7. Deployment
  8. Verify the Defaulter (filling in default values)
  9. Verify the Validator (validity checks)

An Operator's webhook acts much like a filter: every external change to the CRD resource passes through the webhook before the Controller handles it.

A webhook can do two things: mutate (mutating) and validate (validating).

  • kubebuilder provides tooling to generate the webhook's base files and code, much like the API tooling, greatly reducing the work; you only fill in the business logic;
  • For kubebuilder-built webhooks and controllers serving the same resource, both run in the same process;

Design the scenario

  • To make the exercise meaningful, we add requirements to the elasticweb project so the webhook does real work:
  1. if the user forgets to set the total QPS, the webhook fills in a default of 1300;
  2. to protect the system, the per-pod QPS is capped at 1000: if the submitted singlePodQPS exceeds 1000, creating the resource object fails (a sketch of both rules follows).
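
As a sketch of what the webhook methods will contain (method names follow controller-runtime v0.12's webhook.Defaulter and webhook.Validator interfaces, which kubebuilder generates into api/v1/elasticweb_webhook.go; pointer is k8s.io/utils/pointer and apierrors is k8s.io/apimachinery/pkg/api/errors; the constants 1300 and 1000 come from the two rules above):

// Default implements webhook.Defaulter: fill in totalQPS when the user omitted it
func (r *ElasticWeb) Default() {
	if r.Spec.TotalQPS == nil {
		r.Spec.TotalQPS = pointer.Int32Ptr(1300)
	}
}

// ValidateCreate implements webhook.Validator: reject a singlePodQPS above 1000
func (r *ElasticWeb) ValidateCreate() error {
	if r.Spec.SinglePodQPS != nil && *r.Spec.SinglePodQPS > 1000 {
		return apierrors.NewBadRequest("singlePodQPS must not exceed 1000")
	}
	return nil
}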

Preparation

  • Like the controller, a webhook can run either inside the kubernetes environment or outside it;
  • If the webhook runs outside kubernetes, its serving certificate must be placed on that machine; the default location is:
/tmp/k8s-webhook-server/serving-certs/tls.{crt,key}
  • Here the webhook will be deployed inside the kubernetes environment;
  • To let the webhook run inside kubernetes, one piece of preparation is needed: install cert-manager with the following command:
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.2.0/cert-manager.yaml
  • This creates many resources, such as a namespace, RBAC objects, and pods; the pods, for example:
[root@k8s-worker02 k8s-exercise]# kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.2.0/cert-manager.yaml
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
namespace/cert-manager created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
[root@k8s-worker02 k8s-exercise]# kubectl get pods --all-namespaces
NAMESPACE      NAME                                       READY   STATUS    RESTARTS       AGE
cert-manager   cert-manager-cainjector-54887dfcbc-prkh5   1/1     Running   2 (29s ago)    9m9s
cert-manager   cert-manager-f746879f6-mp7zz               1/1     Running   0              9m9s
cert-manager   cert-manager-webhook-575ccb5c7b-rdpv2      1/1     Running   0              9m9s

Generate the webhook

  • In the elasticweb project directory, run the following command to create the webhook:
[root@k8s-worker02 elasticweb]# kubebuilder create webhook \
> --group elasticweb \
> --version v1 \
> --kind ElasticWeb \
> --defaulting \
> --programmatic-validation
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
api/v1/elasticweb_webhook.go
Update dependencies:
$ go mod tidy
Running make:
$ make generate
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
Next: implement your new Webhook and generate the manifests with:
$ make manifests

Once the command finishes, look at main.go first: as shown below, a block of code was added automatically, whose job is to make the webhook take effect:

	if err = (&elasticwebv1.ElasticWeb{}).SetupWebhookWithManager(mgr); err != nil {
		setupLog.Error(err, "unable to create webhook", "webhook", "ElasticWeb")
		os.Exit(1)
	}

elasticweb_webhook.go is the newly added file.
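Its SetupWebhookWithManager method is what the main.go snippet above calls; for reference, the kubebuilder scaffold for it looks like this (generated code, not hand-written):

```go
// SetupWebhookWithManager registers the webhook for the ElasticWeb type with the manager.
func (r *ElasticWeb) SetupWebhookWithManager(mgr ctrl.Manager) error {
	return ctrl.NewWebhookManagedBy(mgr).
		For(r).
		Complete()
}
```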

There are two places in this new file that deserve attention. The first concerns filling in default values.

1. This marker declares the mutating webhook's scope; if the webhook needs to operate on more resources or verbs, extend the declaration here:

//+kubebuilder:webhook:path=/mutate-elasticweb-wu123-com-v1-elasticweb,mutating=true,failurePolicy=fail,sideEffects=None,groups=elasticweb.wu123.com,resources=elasticwebs,verbs=create;update,versions=v1,name=melasticweb.kb.io,admissionReviewVersions=v1

2. Only with this assertion in place is the defaulting logic enabled:

var _ webhook.Defaulter = &ElasticWeb{}

3. The defaulting code is added here:

// Default implements webhook.Defaulter so a webhook will be registered for the type
func (r *ElasticWeb) Default() {
	elasticweblog.Info("default", "name", r.Name)

	// TODO(user): fill in your defaulting logic.
}

The second place concerns validation.

1. If validation needs to cover other operations (for example deletion), change the verbs in this marker, as the TODO notes:

// TODO(user): change verbs to "verbs=create;update;delete" if you want to enable deletion validation.
//+kubebuilder:webhook:path=/validate-elasticweb-wu123-com-v1-elasticweb,mutating=false,failurePolicy=fail,sideEffects=None,groups=elasticweb.wu123.com,resources=elasticwebs,verbs=create;update,versions=v1,name=velasticweb.kb.io,admissionReviewVersions=v1

2. Only with this assertion in place does the validation logic take effect:

var _ webhook.Validator = &ElasticWeb{}

3. On create, this method is called to perform validation:

// ValidateCreate implements webhook.Validator so a webhook will be registered for the type
func (r *ElasticWeb) ValidateCreate() error {
	elasticweblog.Info("validate create", "name", r.Name)

	// TODO(user): fill in your validation logic upon object creation.
	return nil
}

The business requirements will be implemented by modifying elasticweb_webhook.go as described above; the code comes a little later, though. First, get the configuration in place.

Development (configuration)

  • Open config/default/kustomization.yaml. The following four entries were originally commented out; remove the comment markers so they take effect:
- ../webhook
- ../certmanager
- manager_webhook_patch.yaml
- webhookcainjection_patch.yaml

In the same config/default/kustomization.yaml, the content under the vars node was also fully commented out; uncomment all of it:

# the following config is for teaching kustomize how to do var substitution
vars:
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix.
- name: CERTIFICATE_NAMESPACE # namespace of the certificate CR
 objref:
   kind: Certificate
   group: cert-manager.io
   version: v1
   name: serving-cert # this name should match the one in certificate.yaml
 fieldref:
   fieldpath: metadata.namespace
- name: CERTIFICATE_NAME
 objref:
   kind: Certificate
   group: cert-manager.io
   version: v1
   name: serving-cert # this name should match the one in certificate.yaml
- name: SERVICE_NAMESPACE # namespace of the service
 objref:
   kind: Service
   version: v1
   name: webhook-service
 fieldref:
   fieldpath: metadata.namespace
- name: SERVICE_NAME
 objref:
   kind: Service
   version: v1
   name: webhook-service
  • The configuration is complete; time to write the code.

Development (coding)

  • Open elasticweb_webhook.go;
  • Add the new dependency (the validation code below also needs field "k8s.io/apimachinery/pkg/util/validation/field" and schema "k8s.io/apimachinery/pkg/runtime/schema" among the imports):
apierrors "k8s.io/apimachinery/pkg/api/errors"
  • Find the Default method and change it to the following. The code is simple: it checks whether TotalQPS is set, writes in the default value if not, and adds two lines of logging:
// Default implements webhook.Defaulter so a webhook will be registered for the type
func (r *ElasticWeb) Default() {
	elasticweblog.Info("default", "name", r.Name)

	// TODO(user): fill in your defaulting logic.
	// If no total QPS was supplied at creation time, set a default value
	if r.Spec.TotalQPS == nil {
		r.Spec.TotalQPS = new(int32)
		*r.Spec.TotalQPS = 1300
		elasticweblog.Info("a. TotalQPS is nil, set default value now", "TotalQPS", *r.Spec.TotalQPS)
	} else {
		elasticweblog.Info("b. TotalQPS exists", "TotalQPS", *r.Spec.TotalQPS)
	}
}
  • Next, develop the validation. Wrap it in a validateElasticWeb method and call it from both the create and update hooks, as shown below. The error instance is ultimately built by apierrors.NewInvalid, which accepts multiple errors, so a field.ErrorList slice is prepared as its argument; if several parameters fail validation, all of the errors can go into that slice:
func (r *ElasticWeb) validateElasticWeb() error {
	var allErrs field.ErrorList

	if *r.Spec.SinglePodQPS > 1000 {
		elasticweblog.Info("c. Invalid SinglePodQPS")

		err := field.Invalid(field.NewPath("spec").Child("singlePodQPS"),
			*r.Spec.SinglePodQPS,
			"d. must be less than 1000")

		allErrs = append(allErrs, err)

		return apierrors.NewInvalid(
			schema.GroupKind{Group: "elasticweb.com.bolingcavalry", Kind: "ElasticWeb"},
			r.Name,
			allErrs)
	} else {
		elasticweblog.Info("e. SinglePodQPS is valid")
		return nil
	}
}

  • Then find the methods invoked when a resource object is created or updated, and call validateElasticWeb in each:
// ValidateCreate implements webhook.Validator so a webhook will be registered for the type
func (r *ElasticWeb) ValidateCreate() error {
	elasticweblog.Info("validate create", "name", r.Name)

	// TODO(user): fill in your validation logic upon object creation.

	return r.validateElasticWeb()
}

// ValidateUpdate implements webhook.Validator so a webhook will be registered for the type
func (r *ElasticWeb) ValidateUpdate(old runtime.Object) error {
	elasticweblog.Info("validate update", "name", r.Name)

	// TODO(user): fill in your validation logic upon object update.
	return r.validateElasticWeb()
}
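The scaffold also generates a ValidateDelete method to complete the webhook.Validator interface; since deletion needs no QPS check here, it can stay as the generated pass-through (shown for completeness):

```go
// ValidateDelete implements webhook.Validator so a webhook will be registered for the type
func (r *ElasticWeb) ValidateDelete() error {
	elasticweblog.Info("validate delete", "name", r.Name)

	// TODO(user): fill in your validation logic upon object deletion.
	return nil
}
```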
  • Coding is done. Next, clean up the leftovers from before and start a fresh deployment and verification.

Clean up

  1. Delete the elasticweb resource object:
kubectl delete -f config/samples/elasticweb_v1_elasticweb.yaml
  2. Delete the controller:
kustomize build config/default | kubectl delete -f -
  3. Delete the CRD:
make uninstall

Deploy

  1. Deploy the CRD:
make install
  2. Build the image and push it to the registry:
make docker-build docker-push IMG=registry.cn-hangzhou.aliyuncs.com/wu123/elasticweb:v0.1
  3. Deploy the controller with the webhook built in:
make deploy IMG=registry.cn-hangzhou.aliyuncs.com/wu123/elasticweb:v0.1
  4. Check the pods and confirm everything started successfully:
kubectl get pods --all-namespaces
NAMESPACE           NAME                                             READY   STATUS    RESTARTS   AGE
cert-manager        cert-manager-6588898cb4-nvnz8                    1/1     Running   1          5d21h
cert-manager        cert-manager-cainjector-7bcbdbd99f-q645r         1/1     Running   1          5d21h
cert-manager        cert-manager-webhook-5fd9f9dd86-98tm9            1/1     Running   1          5d21h
elasticweb-system   elasticweb-controller-manager-7dcbfd4675-898gb   2/2     Running   0          20s

Verify the Defaulter (filling in default values)

  • Modify config/samples/elasticweb_v1_elasticweb.yaml as follows; note that the totalQPS field is now commented out:
apiVersion: elasticweb.wu123.com/v1
kind: ElasticWeb
metadata:
  name: elasticweb-sample
spec:
  # TODO(user): Add fields here
  image: tomcat:8.0.18-jre8
  port: 30003
  singlePodQPS: 500
  # totalQPS: 600
  • Create an elasticweb resource object:
kubectl apply -f config/samples/elasticweb_v1_elasticweb.yaml
  • The single-pod QPS is 500, so if the webhook code takes effect, the total QPS defaults to 1300 and the expected pod count is 3 (1300 / 500 = 2.6, rounded up). Let's see whether that holds;
  • First confirm that the elasticweb, deployment, service, and pod resource objects are all normal, which they are:
kubectl get elasticweb
kubectl get deployments
kubectl get service
kubectl get pod
  • Use kubectl describe to inspect the elasticweb resource object: the TotalQPS field has been set to 1300 by the webhook, and RealQPS is computed correctly as well (3 pods × 500 = 1500).

Check the webhook part of the controller logs: it found TotalQPS empty and set the default, and during validation SinglePodQPS did not exceed 1000, as expected.

Verify in a browser that the web service works: http://192.168.204.131:30003/

Verify the Validator

  • Next, verify the webhook's parameter validation, starting with the update path;
  • Edit config/samples/update_single_pod_qps.yaml to the following value:
spec:
  singlePodQPS: 1100
  • Apply it with kubectl patch:
kubectl patch elasticweb elasticweb-sample \
--type merge \
--patch "$(cat config/samples/update_single_pod_qps.yaml)"
  • The console now prints an error:
Error from server (ElasticWeb.elasticweb.com.bolingcavalry "elasticweb-sample" is invalid: spec.singlePodQPS: Invalid value: 1100: d. must be less than 1000): admission webhook "velasticweb.kb.io" denied the request: ElasticWeb.elasticweb.wu123.com "elasticweb-sample" is invalid: spec.singlePodQPS: Invalid value: 1100: d. must be less than 1000

Use kubectl describe to inspect the elasticweb resource object: the value is still 500, so the webhook took effect and blocked the bad update.

Check the controller logs.

Validation on create

Clean up the elasticweb resource object created earlier:

kubectl delete -f config/samples/elasticweb_v1_elasticweb.yaml

Modify the sample file, setting singlePodQPS above 1000, to see whether the webhook catches the error and blocks creation of the resource object:

apiVersion: elasticweb.wu123.com/v1
kind: ElasticWeb
metadata:
  name: elasticweb-sample
spec:
  # TODO(user): Add fields here
  image: tomcat:8.0.18-jre8
  port: 30003
  singlePodQPS: 1500
  # totalQPS: 600
  • Run the following command to create the elasticweb resource object:
kubectl apply -f config/samples/elasticweb_v1_elasticweb.yaml
  • The console prints the message below, which includes the error description written in our code: creating the elasticweb resource object failed, so the webhook's Validator is working:
Error from server (ElasticWeb.elasticweb.com.bolingcavalry "elasticweb-sample" is invalid: spec.singlePodQPS: Invalid value: 1500: d. must be less than 1000): error when creating "config/samples/elasticweb_v1_elasticweb.yaml": admission webhook "velasticweb.kb.io" denied the request: ElasticWeb.elasticweb.wu123.com "elasticweb-sample" is invalid: spec.singlePodQPS: Invalid value: 1500: d. must be less than 1000

Check the controller logs.

Summary

  1. The CRD's Status field;
  2. Choosing a suitable image registry;
  3. Skipping the webhook when running the controller locally;
  4. The controller pod has two containers.

The CRD's Status field

The data structure of the elasticweb CRD is as follows:

// Desired state
// ElasticWebSpec defines the desired state of ElasticWeb
type ElasticWebSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// Foo is an example field of ElasticWeb. Edit elasticweb_types.go to remove/update
	// Foo string `json:"foo,omitempty"`
	// Image of the business service, including name:tag
	Image string `json:"image"`
	// Host port occupied by the service; external requests reach the pods through this port
	Port *int32 `json:"port"`

	// QPS upper limit of a single pod
	SinglePodQPS *int32 `json:"singlePodQPS"`
	// Current total QPS of the whole business
	TotalQPS *int32 `json:"totalQPS"`
}

// Actual state; the values in this struct are computed by the business code
// ElasticWebStatus defines the observed state of ElasticWeb
type ElasticWebStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file
	// Total QPS actually supported in kubernetes at the moment
	RealQPS *int32 `json:"realQPS,omitempty"`
}

//+kubebuilder:object:root=true
//+kubebuilder:subresource:status

// ElasticWeb is the Schema for the elasticwebs API
type ElasticWeb struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ElasticWebSpec   `json:"spec,omitempty"`
	Status ElasticWebStatus `json:"status,omitempty"`
}
  • The Status structure of this CRD has only one field, RealQPS, and the omitempty attribute in its tag `json:"realQPS,omitempty"` is very important;
  • What happens if the RealQPS tag is missing omitempty?
  • In fact, before the webhook was developed, the omitempty attribute on RealQPS was missing, yet the controller worked fine and elasticweb behaved as expected; in other words, a status field without omitempty does not affect the operator's functionality;
  • But after the webhook was enabled, creating a resource object failed:
kubectl apply -f config/samples/elasticweb_v1_elasticweb.yaml
The ElasticWeb "elasticweb-sample" is invalid: status.realQPS: Invalid value: "null": status.realQPS in body must be of type integer: "null"

In other words: if a field in the Status structure lacks the omitempty attribute in its json tag, creating resource objects fails once the webhook is enabled.
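The effect of omitempty is easy to see with plain JSON marshalling; a minimal standalone demo (illustration only, not from the project):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Two versions of the status struct, differing only in the omitempty tag.
type statusWithOmitempty struct {
	RealQPS *int32 `json:"realQPS,omitempty"`
}
type statusWithoutOmitempty struct {
	RealQPS *int32 `json:"realQPS"`
}

func main() {
	a, _ := json.Marshal(statusWithOmitempty{})    // nil field is dropped entirely
	b, _ := json.Marshal(statusWithoutOmitempty{}) // nil field becomes an explicit null
	fmt.Println(string(a)) // {}
	fmt.Println(string(b)) // {"realQPS":null}
}
```

It is that explicit null which the CRD's schema validation rejects as not being an integer, producing the error above.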

Skipping the webhook when running the controller locally

  • There are two ways to deploy the controller: inside the kubernetes environment, or standalone outside it;
  • During development, running the controller locally is the usual choice, since it avoids all the image-related work;
  • But once a webhook is involved, its certificate-based authentication means the kubernetes-signed certificate must be placed locally (under /tmp/k8s-webhook-server/serving-certs/), so either way there is extra work:
  1. deploying into kubernetes means building and pushing an image;
  2. running outside kubernetes means issuing a certificate and placing it in the expected directory;
  • Faced with this dilemma, the official advice is: if the webhook is not needed during development (note this precondition), it can be disabled when running the controller locally, in two steps;
  • First, modify main.go as shown below; the new code is simply a check that skips all webhook logic when the environment variable ENABLE_WEBHOOKS equals false:

if os.Getenv("ENABLE_WEBHOOKS") != "false" {
	if err = (&elasticwebv1.ElasticWeb{}).SetupWebhookWithManager(mgr); err != nil {
		setupLog.Error(err, "unable to create webhook", "webhook", "ElasticWeb")
		os.Exit(1)
	}
}

Where the controller used to be started locally with plain make run, now add one variable to the command:

make run ENABLE_WEBHOOKS=false
  • The controller now starts and works normally; only the webhook-related features are inactive.

The controller pod has two containers

  • When the controller is deployed inside the kubernetes environment it exists as a pod, and the webhook and reconcile code both run in that pod;
  • The pod actually holds two containers; kubectl describe shows that the one named manager is where the controller code runs:

1. The webhook and reconcile logic run in the manager container;

2. kube-rbac-proxy is a small HTTP proxy used for RBAC authorization.

Source code repository: https://github.com/yunixiangfeng/k8s-exercise.git (this article's code is in the k8s-exercise/elasticweb directory)
