About kubebuilder
- In real-world work, applying custom configuration and control to kubernetes resources is a common need: controlling the replica count and master/slave relationships of pods built from custom images, managing all kinds of custom resources, and so on;
- The Operator pattern is a good fit for such needs; the official introduction to Operators is here: https://kubernetes.io/zh/docs/concepts/extend-kubernetes/operator/ . The environment used in this article:
kubectl v1.24.2
golang v1.18.5
docker v1.20.9 containerd v1.6.6
kustomize v3.8.7
kubebuilder v3.6.0
Using kubebuilder
In this environment we will create a CRD and Controller, deploy them to kubernetes, and verify that they take effect:
- Create the API (CRD and Controller)
- Build and deploy the CRD
- Build and run the controller
- Create an instance of the CRD
- Delete the instance and stop the controller
- Build the controller into a docker image
- Uninstall and clean up
Case 1: the helloworld project
[root@k8s-worker02 k8s-operator]# mkdir -p $GOPATH/src/helloworld
[root@k8s-worker02 k8s-operator]# cd $GOPATH/src/helloworld
[root@k8s-worker02 helloworld]# kubebuilder init --domain wu123
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
Get controller runtime:
$ go get sigs.k8s.io/controller-runtime@v0.12.2
Update dependencies:
$ go mod tidy
Next: define a resource with:
$ kubebuilder create api
[root@k8s-worker02 helloworld]# tree $GOPATH/src/helloworld
/home/gopath/src/helloworld
├── config
│   ├── default
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml
│   │   └── manager_config_patch.yaml
│   ├── manager
│   │   ├── controller_manager_config.yaml
│   │   ├── kustomization.yaml
│   │   └── manager.yaml
│   ├── prometheus
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml
│   └── rbac
│       ├── auth_proxy_client_clusterrole.yaml
│       ├── auth_proxy_role_binding.yaml
│       ├── auth_proxy_role.yaml
│       ├── auth_proxy_service.yaml
│       ├── kustomization.yaml
│       ├── leader_election_role_binding.yaml
│       ├── leader_election_role.yaml
│       ├── role_binding.yaml
│       └── service_account.yaml
├── Dockerfile
├── go.mod
├── go.sum
├── hack
│   └── boilerplate.go.txt
├── main.go
├── Makefile
├── PROJECT
└── README.md
6 directories, 25 files
Creating the API (CRD and Controller)
- Next comes the resource itself. The group/version/kind triple uniquely identifies a resource; run:
[root@k8s-worker02 helloworld]# kubebuilder create api \
> --group webapp \
> --version v1 \
> --kind Guestbook
Create Resource [y/n]
y
Create Controller [y/n]
y
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
api/v1/guestbook_types.go
controllers/guestbook_controller.go
Update dependencies:
$ go mod tidy
Running make:
$ make generate
mkdir -p /home/gopath/src/helloworld/bin
test -s /home/gopath/src/helloworld/bin/controller-gen || GOBIN=/home/gopath/src/helloworld/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/src/helloworld/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
Next: implement your new API and generate the manifests (e.g. CRDs,CRs) with:
$ make manifests
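As a side note, the group from this command is combined with the domain passed to `kubebuilder init` to form the full API group, and with the version to form the `apiVersion` that later appears in manifests (`webapp.wu123/v1` above). A minimal sketch of that string assembly, using plain formatting (the helper name is made up for illustration):

```go
package main

import "fmt"

// apiVersion builds the apiVersion string a manifest would use:
// the CRD group is "<group>.<domain>", and apiVersion is "<group>.<domain>/<version>".
func apiVersion(group, domain, version string) string {
	return fmt.Sprintf("%s.%s/%s", group, domain, version)
}

func main() {
	// group=webapp, domain=wu123, version=v1, as in the commands above
	fmt.Println(apiVersion("webapp", "wu123", "v1")) // webapp.wu123/v1
}
```
This also explains the CRD name seen below: `guestbooks.webapp.wu123` is `<plural>.<group>.<domain>`.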
Building and deploying the CRD
- The Makefile that kubebuilder provides simplifies building and deployment considerably; the following command deploys the freshly built CRD to kubernetes:
[root@k8s-worker02 helloworld]# make install
test -s /home/gopath/src/helloworld/bin/controller-gen || GOBIN=/home/gopath/src/helloworld/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/src/helloworld/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/src/helloworld/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/guestbooks.webapp.wu123 created
Building and running the controller
- The controller source generated by kubebuilder is at $GOPATH/src/helloworld/controllers/guestbook_controller.go and looks like this:
package controllers
import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	webappv1 "helloworld/api/v1"
)
// GuestbookReconciler reconciles a Guestbook object
type GuestbookReconciler struct {
client.Client
Scheme *runtime.Scheme
}
//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks/finalizers,verbs=update
// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the Guestbook object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.12.2/pkg/reconcile
func (r *GuestbookReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	_ = log.FromContext(ctx)

	// TODO(user): your logic here

	return ctrl.Result{}, nil
}
// SetupWithManager sets up the controller with the Manager.
func (r *GuestbookReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&webappv1.Guestbook{}).
Complete(r)
}
[root@k8s-worker02 helloworld]# make run
test -s /home/gopath/src/helloworld/bin/controller-gen || GOBIN=/home/gopath/src/helloworld/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/src/helloworld/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/src/helloworld/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./main.go
I0420 13:38:25.948445 88197 request.go:601] Waited for 1.046481468s due to client-side throttling, not priority and fairness, request: GET:https://192.168.204.129:6443/apis/crd.projectcalico.org/v1?timeout=32s
1.681969106652736e+09 INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8080"}
1.681969106654165e+09 INFO setup starting manager
1.6819691066547165e+09 INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.681969106654819e+09 INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.6819691066549478e+09 INFO Starting EventSource {"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook", "source": "kind source: *v1.Guestbook"}
1.6819691066549792e+09 INFO Starting Controller {"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook"}
1.681969106756659e+09 INFO Starting workers {"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook", "worker count": 1}
Only minor changes are made to the code above, just enough to verify that the controller reacts to resource changes.
1. Add two dependency packages: k8s.io/klog/v2 for logging and runtime/debug for printing the call stack:
import (
	"context"
	"runtime/debug"

	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/klog/v2"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"

	webappv1 "helloworld/api/v1"
)
// GuestbookReconciler reconciles a Guestbook object
type GuestbookReconciler struct {
client.Client
// Log logr.Logger
Scheme *runtime.Scheme
}
2. Print the incoming request and the call stack in Reconcile:
func (r *GuestbookReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	_ = log.FromContext(ctx)

	// print the incoming request
	klog.Info(req)
	// print the call stack
	debug.PrintStack()

	return ctrl.Result{}, nil
}
Run make run to compile and start the modified controller:
[root@k8s-worker02 helloworld]# make run
test -s /home/gopath/src/helloworld/bin/controller-gen || GOBIN=/home/gopath/src/helloworld/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/src/helloworld/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/src/helloworld/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./main.go
I0420 14:30:50.652360 105538 request.go:601] Waited for 1.047814477s due to client-side throttling, not priority and fairness, request: GET:https://192.168.204.129:6443/apis/storage.k8s.io/v1beta1?timeout=32s
1.6819722513574438e+09 INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8080"}
1.6819722513579834e+09 INFO setup starting manager
1.6819722513587043e+09 INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.681972251358751e+09 INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.681972251358884e+09 INFO Starting EventSource {"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook", "source": "kind source: *v1.Guestbook"}
1.6819722513589082e+09 INFO Starting Controller {"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook"}
1.68197225146097e+09 INFO Starting workers {"controller": "guestbook", "controllerGroup": "webapp.wu123", "controllerKind": "Guestbook", "worker count": 1}
I0420 14:30:51.461177 105538 guestbook_controller.go:54] default/guestbook-sample
Creating an instance of the Guestbook resource
- kubernetes now has the Guestbook CRD deployed and the corresponding controller running, so we can create a Guestbook instance (just as a pod can only be created once the pod type is defined);
- kubebuilder has already generated a sample manifest at $GOPATH/src/helloworld/config/samples/webapp_v1_guestbook.yaml. It is very simple, and we will use it as-is to create the Guestbook instance:
```yaml
apiVersion: webapp.wu123/v1
kind: Guestbook
metadata:
name: guestbook-sample
spec:
# TODO(user): Add fields here
```
Create the Guestbook instance with:
[root@k8s-worker02 helloworld]# kubectl apply -f config/samples/
guestbook.webapp.wu123/guestbook-sample created
kubectl get shows that the instance has been created:
[root@k8s-worker02 helloworld]# kubectl get Guestbook
NAME AGE
guestbook-sample 34s
The instance can be edited with kubectl edit Guestbook guestbook-sample.
Check the controller logs to see the changes being reconciled.
Deleting the instance and stopping the controller
kubectl delete -f config/samples/
[root@k8s-worker02 helloworld]# kubectl get Guestbook
NAME AGE
guestbook-sample 59m
[root@k8s-worker02 helloworld]# kubectl apply -f config/samples
guestbook.webapp.wu123/guestbook-sample configured
[root@k8s-worker02 helloworld]# kubectl delete -f config/samples/
guestbook.webapp.wu123 "guestbook-sample" deleted
Building the controller into a docker image
The approach above runs the controller outside kubernetes (locally).
Next, build it into a docker image and run it inside the kubernetes cluster:
cd $GOPATH/src/helloworld
make docker-build docker-push IMG=wu123/guestbook:v0.1
Once the image is ready, deploy the controller into kubernetes with:
make deploy IMG=wu123/guestbook:v0.1
The console prints the resources being created (mostly rbac).
Looking at the pods in the kubernetes cluster, a new controller pod has indeed appeared.
This pod actually contains two containers; kubectl describe shows them to be kube-rbac-proxy and manager.
With two containers, log queries must name one of them. Our controller is the manager container, so the log command is:
kubectl logs -f \
helloworld-controller-manager-689d4b6f5b-h9pzg \
-n helloworld-system \
-c manager
Create the Guestbook instance again, still with kubectl apply -f config/samples/, then check the manager container's logs: the output added by our modification is printed.
To clean up all the resources and the CRD created above, run:
cd $GOPATH/src/helloworld
make uninstall
Background knowledge
Kubernetes Group, Version, and Resource
Operating on kubernetes resources with client objects:
RESTClient, Clientset, dynamicClient, DiscoveryClient
Case 2: elasticweb
Reference: "Developing an operator from 0 to 1 with kubebuilder" (Bilibili)
Create a new project named elasticweb with go mod init elasticweb:
[root@k8s-worker02 gopath]# mkdir elasticweb
[root@k8s-worker02 gopath]# cd elasticweb
[root@k8s-worker02 elasticweb]# go mod init elasticweb
go: creating new go.mod: module elasticweb
Run kubebuilder init --domain wu123.com to scaffold the operator project:
[root@k8s-worker02 elasticweb]# kubebuilder init --domain wu123.com
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
Get controller runtime:
$ go get sigs.k8s.io/controller-runtime@v0.12.2
Update dependencies:
$ go mod tidy
Next: define a resource with:
$ kubebuilder create api
Infrastructure
- After the operator project is scaffolded, quite a few files and directories appear. The official docs highlight these pieces of infrastructure:
- go.mod: the module configuration file, already populated with the important dependencies;
- Makefile: a very important tool, already used above; it drives building, deployment, and running;
- PROJECT: metadata of the kubebuilder project, used when generating the various APIs;
- config/default: kustomize-based configuration providing a standard setup for the controller; adjust it as needed;
- config/manager: detailed settings related to the manager, such as the image's resource limits;
- config/rbac: as the name suggests, if you want to restrict the operator's permissions inside kubernetes, the fine-grained RBAC configuration lives here;
main.go
- main.go is generated by kubebuilder and is the operator's entry point. A few things are worth noting:
- Two global variables, shown below: setupLog is for log output, and scheme is another common utility, providing the mapping between Kinds and the Go types that represent them:
var (
scheme = runtime.NewScheme()
setupLog = ctrl.Log.WithName("setup")
)
There are also further settings, e.g. for metrics, plus the manager that hosts the controllers and webhooks; it keeps running until terminated from outside. One thing to note about the manager is its options. The defaults are shown below; to scope the operator to a single namespace, add a Namespace option here, and to target several namespaces use cache.MultiNamespacedCacheBuilder(namespaces) instead:
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
Scheme: scheme,
MetricsBindAddress: metricsAddr,
Port: 9443,
HealthProbeBindAddress: probeAddr,
LeaderElection: enableLeaderElection,
LeaderElectionID: "65990fce.wu123.com",
// LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
// when the Manager ends. This requires the binary to immediately end when the
// Manager is stopped, otherwise, this setting is unsafe. Setting this significantly
// speeds up voluntary leader transitions as the new leader don't have to wait
// LeaseDuration time first.
//
// In the default scaffold provided, the program ends immediately after
// the manager stops, so would be fine to enable this option. However,
// if you are doing or is intended to do any operation such as perform cleanups
// after the manager stops then its usage might be unsafe.
// LeaderElectionReleaseOnCancel: true,
})
The API (the data core)
- The API is the heart of an operator. When you decide to build one, start from the real requirements and design the whole CRD around them; that design ultimately shows up in the CRD's data structures and in the logic that reconciles actual state with desired state;
We already created the API earlier, with this command:
kubebuilder create api \
--group webapp \
--version v1 \
--kind Guestbook
- The most important part of what was generated is of course the CRD itself, i.e. the Guestbook struct in guestbook_types.go:
// Guestbook is the Schema for the guestbooks API
type Guestbook struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec GuestbookSpec `json:"spec,omitempty"`
Status GuestbookStatus `json:"status,omitempty"`
}
- metav1.TypeMeta: holds the resource's Group, Version, and Kind
- metav1.ObjectMeta: holds the resource object's name and namespace
- Spec: the desired state, e.g. a deployment created with three pod replicas
- Status: the actual state, e.g. that deployment currently having only one replica (the others not yet created). Most resource objects have this field, though ConfigMap is an exception (configuration simply is what it is; there is no desired-versus-actual distinction);
- There is one more type, GuestbookList, the list counterpart of Guestbook, i.e. a collection of the single resource objects;
- Two more files sit next to guestbook_types.go: groupversion_info.go defines the Group and Version plus the SchemeBuilder instance used to register with the scheme, and zz_generated.deepcopy.go implements deep copies of the instances. Neither needs to be modified; just know what they do;
The controller (the business core)
- However much business logic differs between operators, two things are common to all of them:
- Status (the actual state) is a data structure whose fields are defined by the business, and whose values are computed by the business code's custom logic;
- The goal of the business code is to drive Status into agreement with Spec. For example, if a deployment specifies three pod replicas and fewer exist, the deployment controller creates pods; if more than three exist, it deletes pods;
- That is all a controller does. Now for the code details: guestbook_controller.go, created by kubebuilder, is the controller, and all the business code goes into this file.
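Before diving into the scaffold, that desired-versus-actual loop can be sketched cluster-free. This is plain Go with a made-up function name, not the controller-runtime API; it only illustrates the decision a reconcile loop keeps making:

```go
package main

import "fmt"

// reconcileReplicas compares the desired replica count (from the spec)
// with the observed one (from the status) and returns the delta to apply:
// positive means create pods, negative means delete pods, zero means done.
func reconcileReplicas(desired, observed int) int {
	return desired - observed
}

func main() {
	// deployment-like example: the spec asks for 3 replicas
	fmt.Println(reconcileReplicas(3, 1)) // 2  -> create two pods
	fmt.Println(reconcileReplicas(3, 5)) // -2 -> delete two pods
	fmt.Println(reconcileReplicas(3, 3)) // 0  -> converged, nothing to do
}
```
A real controller performs this comparison on every Reconcile call and then issues the corresponding create/update/delete requests.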
The struct definition is shown below. The client for operating on resource objects (client.Client), logging, and the Kind-to-type mapping (Scheme) are all wired in already:
// GuestbookReconciler reconciles a Guestbook object
type GuestbookReconciler struct {
client.Client
Scheme *runtime.Scheme
}
- The SetupWithManager method, called from main.go, tells the manager to watch the Guestbook resource; changes to it trigger the Reconcile method:
// SetupWithManager sets up the controller with the Manager.
func (r *GuestbookReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&webappv1.Guestbook{}).
Complete(r)
}
As shown below, the Reconcile method is preceded by comments with the +kubebuilder:rbac prefix; they ensure the controller has the corresponding resource permissions at runtime. For example //+kubebuilder:rbac:groups=core,resources=pods,verbs=get was added by hand, giving the controller permission to query pod resources:
//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=webapp.wu123,resources=guestbooks/finalizers,verbs=update
//+kubebuilder:rbac:groups=core,resources=pods,verbs=get
// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the Guestbook object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.12.2/pkg/reconcile
func (r *GuestbookReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
_ = log.FromContext(ctx)
// TODO(user): your logic here
klog.Info(req)
return ctrl.Result{}, nil
}
- guestbook_controller.go is the business heart of the operator, and the heart of the controller is its Reconcile method; most future code goes there, and its main job is to obtain the status and then drive the status into agreement with the spec;
- Regarding status, one piece of official guidance deserves attention: a resource object's status should be recomputed from observation every time. Take deployment as an example: to know how many pods currently exist, there are two approaches. The first keeps a counter field, updated on every pod create and delete, so reading the field yields the pod count. The second queries the live pod count through the client every time. The official recommendation is clearly the second:
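A cluster-free sketch of that recommendation, with a string slice standing in for a live pod List call (the pod names are made up):

```go
package main

import "fmt"

// observedReplicas recomputes the pod count from the objects actually
// observed right now, instead of maintaining a counter that is incremented
// and decremented on every create/delete (a counter can silently drift).
func observedReplicas(pods []string) int {
	return len(pods)
}

func main() {
	pods := []string{"web-0", "web-1"} // result of a fresh query
	fmt.Println(observedReplicas(pods)) // 2
}
```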
Operator requirements and design
Let's develop an operator that does something real. It is named elasticweb, i.e. elastic web service.
elasticweb has a concrete business meaning from CRD design through controller behavior, and executes real business logic.
What does this operator do, what problem does it solve, and what are its core parts?
Background
- QPS: Queries-per-second, i.e. how many requests the server handles in one second;
- Background: anyone who has done web development knows horizontal scaling. In short, suppose a tomcat tops out at 500 QPS; if external traffic reaches 600 QPS, a second identical tomcat must be started to share the load in order to preserve overall service quality (for simplicity, assume the backend service is stateless, i.e. it does not depend on host IP, local disk, and so on);
That is the conventional approach to horizontal scaling. In a kubernetes environment, when external traffic exceeds what a single pod can handle, we scale horizontally by increasing the pod count:
Requirements
A springboot application must be deployed to kubernetes. The current situation and constraints:
- the springboot application is already packaged as a docker image;
- load testing puts a single pod's QPS at 500;
- the estimated total QPS after launch is around 800;
- QPS will keep changing as the business strategy changes;
- In short there are only three inputs: the docker image, the single-pod QPS, and the total QPS. The owner is unfamiliar with kubernetes and needs a way to deploy the service and sustain high concurrent external traffic at runtime;
- So we develop an operator named elasticweb; you only hand it the three parameters (docker image, single-pod QPS, total QPS);
- elasticweb creates pods in kubernetes, computing the pod count automatically to satisfy the QPS requirement; in the scenario above, two pods are needed for 800 QPS;
- both the single-pod QPS and the total QPS may change at any time; whenever they do, elasticweb adjusts the pod count automatically to preserve service quality;
- to keep the service reachable from outside, it also creates a service;
For the requirement above, kubernetes already has ready-made QPS-scaling options: changing a deployment's replica count, vertically scaling a single pod, autoscale, and so on. We use an operator here purely to demonstrate operator development, not because a custom operator is the only solution.
With this operator you no longer think about pod counts; you only focus on the single-instance QPS and the total QPS, two parameters much closer to the business.
To keep things simple, assume each pod's CPU and memory needs are fixed and hard-coded in the operator; you can change the code to make them externally configurable, like the image name.
With the requirements clear, on to the design: the CRD, the core data structure.
CRD design: the Spec
The Spec holds the user's desired values, i.e. the three parameters (docker image, single-pod QPS, total QPS), plus a port:
- image: the image of the business service
- port: the host port occupied by the service; external requests reach the pods through it
- singlePodQPS: the QPS upper limit of a single pod
- totalQPS: the current total QPS of the whole business
- These four parameters are the input.
CRD design: the Status
- The Status holds the actual values. Here it is designed with a single field, realQPS, the total QPS the system can actually support right now; at any moment, a kubectl describe reveals how much QPS the current deployment can sustain;
CRD source code
See the code:
···
Business logic design
- With the CRD done, the core data structure is settled; next comes the business logic design, mainly working out what the controller's Reconcile method does. The core logic is actually very simple: compute how many pods are needed, then update the deployment so the pod count matches; around that core, also handle creating the deployment and the service, and updating the status;
- The flow that guides the development below: fetch the elasticweb instance, then fetch the deployment; if the deployment is absent, create the service and the deployment and write the status; otherwise compare expected and actual replica counts, adjust the deployment if they differ, and update the status.
Coding the operator
- Create the elasticweb project (done above);
- Then the CRD; the following command creates the related resources:
[root@k8s-worker02 elasticweb]# kubebuilder create api \
> --group elasticweb \
> --version v1 \
> --kind ElasticWeb
Create Resource [y/n]
y
Create Controller [y/n]
y
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
api/v1/elasticweb_types.go
controllers/elasticweb_controller.go
Update dependencies:
$ go mod tidy
Running make:
$ make generate
mkdir -p /home/gopath/elasticweb/bin
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
Next: implement your new API and generate the manifests (e.g. CRDs,CRs) with:
$ make manifests
Coding the CRD
- Open api/v1/elasticweb_types.go and make the following changes:
- extend the ElasticWebSpec struct with the four fields designed earlier;
- extend the ElasticWebStatus struct with the one field designed earlier;
- add a String method for convenient logging; note that RealQPS is a pointer and may be nil, so it must be nil-checked;
- The complete elasticweb_types.go:
package v1

import (
	"fmt"
	"strconv"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.

// ElasticWebSpec defines the desired state of ElasticWeb
type ElasticWebSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// Image of the business service, including name:tag
	Image string `json:"image"`
	// Host port occupied by the service; external requests reach the pods through it
	Port *int32 `json:"port"`
	// QPS upper limit of a single pod
	SinglePodQPS *int32 `json:"singlePodQPS"`
	// Current total QPS of the whole business
	TotalQPS *int32 `json:"totalQPS"`
}

// ElasticWebStatus defines the observed state of ElasticWeb;
// all values in this struct are computed by the business code
type ElasticWebStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// Total QPS actually supported in kubernetes right now
	RealQPS *int32 `json:"realQPS"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
// ElasticWeb is the Schema for the elasticwebs API
type ElasticWeb struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec ElasticWebSpec `json:"spec,omitempty"`
Status ElasticWebStatus `json:"status,omitempty"`
}
func (in *ElasticWeb) String() string {
var realQPS string
if nil == in.Status.RealQPS {
realQPS = "nil"
} else {
realQPS = strconv.Itoa(int(*(in.Status.RealQPS)))
}
return fmt.Sprintf("Image [%s], Port [%d], SinglePodQPS [%d], TotalQPS [%d], RealQPS [%s]",
in.Spec.Image,
*(in.Spec.Port),
*(in.Spec.SinglePodQPS),
*(in.Spec.TotalQPS),
realQPS)
}
//+kubebuilder:object:root=true
// ElasticWebList contains a list of ElasticWeb
type ElasticWebList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []ElasticWeb `json:"items"`
}
func init() {
SchemeBuilder.Register(&ElasticWeb{}, &ElasticWebList{})
}
- Run make install in the elasticweb directory to deploy the CRD to kubernetes:
[root@k8s-worker02 elasticweb]# make install
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/elasticweb/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/elasticwebs.elasticweb.wu123.com created
After deployment, the new GV shows up via the api-versions command:
[root@k8s-worker02 elasticweb]# kubectl api-versions|grep elasticweb
elasticweb.wu123.com/v1
The core data structure is designed and coded; next comes the business logic.
Open elasticweb_controller.go and add the following pieces step by step.
Adding resource access permissions
- elasticweb queries, creates, and modifies service and deployment resources, so it needs permissions on them. Add the two extra marker comments below so the code generator adds the corresponding permissions to the RBAC configuration:
//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the ElasticWeb object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.12.2/pkg/reconcile
func (r *ElasticWebReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
_ = log.FromContext(ctx)
// TODO(user): your logic here
return ctrl.Result{}, nil
}
Constant definitions
- First, the constants. Each pod's CPU and memory are fixed here; they could instead be defined in the Spec and passed in from outside. Note that each pod is allocated only 0.1 CPU; adjust as appropriate:
const (
	// APP label value used by the deployment
	APP_NAME = "elastic-app"
	// port of the tomcat container
	CONTAINER_PORT = 8080
	// CPU request of a single pod
	CPU_REQUEST = "100m"
	// CPU limit of a single pod
	CPU_LIMIT = "100m"
	// memory request of a single pod
	MEM_REQUEST = "512Mi"
	// memory limit of a single pod
	MEM_LIMIT = "512Mi"
)
The getExpectReplicas method
- One important piece of logic: computing the required pod count from the single-pod QPS and the total QPS. Wrap it in a method for reuse:
// getExpectReplicas computes the pod count from single-pod QPS and total QPS
func getExpectReplicas(elasticWeb *elasticwebv1.ElasticWeb) int32 {
	// QPS of a single pod
	singlePodQPS := *(elasticWeb.Spec.SinglePodQPS)
	// desired total QPS
	totalQPS := *(elasticWeb.Spec.TotalQPS)
	// replicas is the number of pods to create, rounded up
	replicas := totalQPS / singlePodQPS
	if totalQPS%singlePodQPS > 0 {
		replicas++
	}
	return replicas
}
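The rounding-up logic can be sanity-checked standalone. This sketch mirrors the method above against plain int32 values (the CRD uses pointers), so it runs without a cluster:

```go
package main

import "fmt"

// expectReplicas mirrors getExpectReplicas: integer division of total QPS
// by single-pod QPS, rounded up whenever there is a remainder.
func expectReplicas(totalQPS, singlePodQPS int32) int32 {
	replicas := totalQPS / singlePodQPS
	if totalQPS%singlePodQPS > 0 {
		replicas++
	}
	return replicas
}

func main() {
	fmt.Println(expectReplicas(800, 500))  // 2 (the scenario from the requirements)
	fmt.Println(expectReplicas(1000, 500)) // 2 (exact fit, no rounding)
	fmt.Println(expectReplicas(1001, 500)) // 3 (one extra pod for the remainder)
}
```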
The createServiceIfNotExists method
- Wrapping service creation in a method keeps the main flow clearer and more readable;
- A few points about creating the service:
- check whether the service exists first, and create it only if it does not;
- tie the service to the CRD instance elasticWeb (the controllerutil.SetControllerReference method), so that deleting elasticWeb automatically deletes the service without any extra work;
- the service is created with the client-go tooling;
- The complete method:
// createServiceIfNotExists creates the service only if it does not exist yet
func createServiceIfNotExists(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb, req ctrl.Request) error {
	// get a logger
	l := log.FromContext(ctx, "func", "createService")

	service := &corev1.Service{}
	err := r.Get(ctx, req.NamespacedName, service)

	// if the query returned no error, the service already exists; nothing to do
	if err == nil {
		l.Info("service exists")
		return nil
	}

	// if the error is anything other than NotFound, return it
	if !apierrors.IsNotFound(err) {
		l.Error(err, "query service error")
		return err
	}

	// instantiate the service
	service = &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: elasticWeb.Namespace,
			Name:      elasticWeb.Name,
		},
		Spec: corev1.ServiceSpec{
			Ports: []corev1.ServicePort{{
				Name:     "http",
				Port:     8080,
				NodePort: *elasticWeb.Spec.Port,
			}},
			Selector: map[string]string{
				"app": APP_NAME,
			},
			Type: corev1.ServiceTypeNodePort,
		},
	}

	// This step is crucial!
	// With the owner reference in place, deleting the elasticweb resource also deletes the service
	l.Info("set reference")
	if err := controllerutil.SetControllerReference(elasticWeb, service, r.Scheme); err != nil {
		l.Error(err, "SetControllerReference error")
		return err
	}

	// create the service
	l.Info("start create service")
	if err := r.Create(ctx, service); err != nil {
		l.Error(err, "create service error")
		return err
	}

	l.Info("create service success")
	return nil
}
The createDeployment method
- Deployment creation is likewise wrapped in a method to keep the main flow concise;
- A few points about it:
- getExpectReplicas supplies the pod count, an important parameter when creating the deployment;
- each pod's required CPU and memory are also deployment parameters;
- the deployment is tied to elasticweb, so deleting elasticweb automatically deletes the deployment;
- again the client-go tooling creates the deployment resource;
// createDeployment creates the deployment
func createDeployment(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb) error {
	l := log.FromContext(ctx, "func", "createDeployment")

	// compute the expected pod count
	expectReplicas := getExpectReplicas(elasticWeb)
	l.Info(fmt.Sprintf("expectReplicas [%d]", expectReplicas))

	// instantiate the deployment
	deployment := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: elasticWeb.Namespace,
			Name:      elasticWeb.Name,
		},
		Spec: appsv1.DeploymentSpec{
			// the replica count is computed
			Replicas: pointer.Int32Ptr(expectReplicas),
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{
					"app": APP_NAME,
				},
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{
						"app": APP_NAME,
					},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{
						{
							Name: APP_NAME,
							// use the specified image
							Image:           elasticWeb.Spec.Image,
							ImagePullPolicy: "IfNotPresent",
							Ports: []corev1.ContainerPort{
								{
									Name:          "http",
									Protocol:      corev1.ProtocolTCP,
									ContainerPort: CONTAINER_PORT,
								},
							},
							Resources: corev1.ResourceRequirements{
								Requests: corev1.ResourceList{
									"cpu":    resource.MustParse(CPU_REQUEST),
									"memory": resource.MustParse(MEM_REQUEST),
								},
								Limits: corev1.ResourceList{
									"cpu":    resource.MustParse(CPU_LIMIT),
									"memory": resource.MustParse(MEM_LIMIT),
								},
							},
						},
					},
				},
			},
		},
	}

	// This step is crucial!
	// With the owner reference in place, deleting the elasticweb resource also deletes the deployment
	l.Info("set reference")
	if err := controllerutil.SetControllerReference(elasticWeb, deployment, r.Scheme); err != nil {
		l.Error(err, "SetControllerReference error")
		return err
	}

	// create the deployment
	l.Info("start create deployment")
	if err := r.Create(ctx, deployment); err != nil {
		l.Error(err, "create deployment error")
		return err
	}

	l.Info("create deployment success")
	return nil
}
The updateStatus method
- Whether a deployment was just created or an existing deployment's pod count was adjusted, the Status, i.e. the actual state, must be updated afterwards, so that the outside world can always see how much QPS the current elasticweb supports. Updating Status is therefore wrapped in a method shared by several call sites. The Status computation is simple: pod count times per-pod QPS equals total QPS:
// updateStatus refreshes the status after the pods have been handled
func updateStatus(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb) error {
	l := log.FromContext(ctx, "func", "updateStatus")

	// QPS of a single pod
	singlePodQPS := *(elasticWeb.Spec.SinglePodQPS)

	// total pod count
	replicas := getExpectReplicas(elasticWeb)

	// once the pods are ready, the actual QPS of the system is: single-pod QPS * pod count
	// initialize the field first if it is still nil
	if nil == elasticWeb.Status.RealQPS {
		elasticWeb.Status.RealQPS = new(int32)
	}

	*(elasticWeb.Status.RealQPS) = singlePodQPS * replicas

	l.Info(fmt.Sprintf("singlePodQPS [%d], replicas [%d], realQPS[%d]", singlePodQPS, replicas, *(elasticWeb.Status.RealQPS)))

	// status is declared as a subresource (+kubebuilder:subresource:status),
	// so it must be written through the status client
	if err := r.Status().Update(ctx, elasticWeb); err != nil {
		l.Error(err, "update instance error")
		return err
	}

	return nil
}
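The status arithmetic itself can be sanity-checked in isolation (plain int32 values instead of the CRD pointers, helper name made up for illustration):

```go
package main

import "fmt"

// realQPS mirrors the computation in updateStatus: the actual capacity is
// the single-pod QPS times the replica count that getExpectReplicas produced.
func realQPS(singlePodQPS, replicas int32) int32 {
	return singlePodQPS * replicas
}

func main() {
	// 2 pods at 500 QPS each can actually serve 1000 QPS,
	// even though only 800 QPS were requested in the spec
	fmt.Println(realQPS(500, 2)) // 1000
}
```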
The main Reconcile flow
func (r *ElasticWebReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	l := log.FromContext(ctx, "elasticweb", req.NamespacedName)

	// your logic here

	l.Info("1. start reconcile logic")

	// instantiate the data structure
	instance := &elasticwebv1.ElasticWeb{}

	// query via the client, keyed by namespace/name
	err := r.Get(ctx, req.NamespacedName, instance)

	if err != nil {
		// if the instance does not exist, return an empty result so Reconcile
		// is not immediately re-invoked
		if apierrors.IsNotFound(err) {
			l.Info("2.1. instance not found, maybe removed")
			return reconcile.Result{}, nil
		}

		l.Error(err, "2.2 error")
		// propagate the error
		return ctrl.Result{}, err
	}

	l.Info("3. instance : " + instance.String())
	// look up the deployment
	deployment := &appsv1.Deployment{}

	// query via the client
	err = r.Get(ctx, req.NamespacedName, deployment)

	// handle query errors and the not-found case
	if err != nil {
		// the deployment does not exist yet, so it may need to be created
		if apierrors.IsNotFound(err) {
			l.Info("4. deployment not exists")

			// if no QPS is required and there is no deployment, there is nothing to do
			if *(instance.Spec.TotalQPS) < 1 {
				l.Info("5.1 not need deployment")
				// return
				return ctrl.Result{}, nil
			}

			// create the service first
			if err = createServiceIfNotExists(ctx, r, instance, req); err != nil {
				l.Error(err, "5.2 error")
				// propagate the error
				return ctrl.Result{}, err
			}

			// then create the deployment
			if err = createDeployment(ctx, r, instance); err != nil {
				l.Error(err, "5.3 error")
				// propagate the error
				return ctrl.Result{}, err
			}

			// creation succeeded, so update the status
			if err = updateStatus(ctx, r, instance); err != nil {
				l.Error(err, "5.4. error")
				// propagate the error
				return ctrl.Result{}, err
			}

			// everything created, return
			return ctrl.Result{}, nil
		} else {
			l.Error(err, "7. error")
			// propagate the error
			return ctrl.Result{}, err
		}
	}
	// the deployment was found without error; continue below

	// compute the expected replica count from single-pod QPS and total QPS
	expectReplicas := getExpectReplicas(instance)

	// the deployment's current desired replica count
	realReplicas := *deployment.Spec.Replicas

	l.Info(fmt.Sprintf("9. expectReplicas [%d], realReplicas [%d]", expectReplicas, realReplicas))

	// if they are equal, return immediately
	if expectReplicas == realReplicas {
		l.Info("10. return now")
		return ctrl.Result{}, nil
	}

	// otherwise, adjust the replica count
	*(deployment.Spec.Replicas) = expectReplicas

	l.Info("11. update deployment's Replicas")
	// update the deployment via the client
	if err = r.Update(ctx, deployment); err != nil {
		l.Error(err, "12. update deployment replicas error")
		// propagate the error
		return ctrl.Result{}, err
	}

	l.Info("13. update status")
	// the replica update succeeded, so update the status
	if err = updateStatus(ctx, r, instance); err != nil {
		l.Error(err, "14. update status error")
		// propagate the error
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil
}
Build, deploy, run
- deploy the CRD
- run the Controller locally
- create an elasticweb resource object from a yaml file
- verify that elasticweb works, via its logs and kubectl commands
- hit the web service from a browser to verify the business service
- change singlePodQPS and watch elasticweb adjust the pod count automatically
- change totalQPS and watch elasticweb adjust the pod count automatically
- delete elasticweb and watch the associated service and deployment be deleted automatically
- build the Controller image, run the Controller inside kubernetes, and verify all of the above again
Deploying the CRD
- From a console, enter the directory containing the Makefile and run make install to deploy the CRD to kubernetes:
[root@k8s-worker02 elasticweb]# make install
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/elasticweb/bin/kustomize build config/crd | kubectl apply -f -
customresourcedefinition.apiextensions.k8s.io/elasticwebs.elasticweb.wu123.com unchanged
- As the output shows, this actually uses kustomize to merge the yaml resources under config/crd and create them in kubernetes;
- kubectl api-versions verifies that the CRD deployed successfully:
[root@k8s-worker02 elasticweb]# kubectl api-versions|grep elasticweb
elasticweb.wu123.com/v1
本地运行Controller
进入Makefile文件所在目录,执行命令make run即可编译运行controller:
[root@k8s-worker02 elasticweb]# make run
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/elasticweb/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./main.go
I0420 16:36:21.480955 7843 request.go:601] Waited for 1.047000322s due to client-side throttling, not priority and fairness, request: GET:https://192.168.204.129:6443/apis/discovery.k8s.io/v1beta1?timeout=32s
1.6819797822337084e+09 INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8080"}
1.681979782234283e+09 INFO setup starting manager
1.6819797822347057e+09 INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.6819797822348225e+09 INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.6819797822349567e+09 INFO Starting EventSource {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "source": "kind source: *v1.ElasticWeb"}
1.6819797822349946e+09 INFO Starting Controller {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb"}
1.6819797823369792e+09 INFO Starting workers {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "worker count": 1}
Create an elasticweb resource object
- The Controller responsible for elasticweb is now running, so the next step is to create an elasticweb resource object from a yaml file;
- Under config/samples, kubebuilder generated the demo file elasticweb_v1_elasticweb.yaml, but its spec does not contain the four fields we defined; change it to the following:
```yaml
apiVersion: elasticweb.wu123.com/v1
kind: ElasticWeb
metadata:
name: elasticweb-sample
spec:
# TODO(user): Add fields here
image: tomcat:8.0.18-jre8
port: 30003
singlePodQPS: 500
totalQPS: 600
```
- A few notes on the parameters above:
- the walkthrough was designed around the dev namespace (the sample above sets no namespace, so kubectl applies it to default, as the logs below show)
- the application deployed in this test is tomcat
- the service exposes tomcat on the host's port 30003
- assume a single pod can sustain 500 QPS and the external load is 600 QPS
- Run kubectl apply -f config/samples/elasticweb_v1_elasticweb.yaml to create the elasticweb instance in kubernetes:
[root@k8s-worker02 elasticweb]# kubectl apply -f config/samples/elasticweb_v1_elasticweb.yaml
elasticweb.elasticweb.wu123.com/elasticweb-sample created
The controller log reports an error:
1.6819799323592849e+09 ERROR query service error {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "8d1a072b-af47-46ce-92aa-b1bc3a964e64", "func": "createService", "error": "Service \"elasticweb-sample\" not found"}
Fix the main-line code on the Reconcile path
The error check in createService was inverted; change

// return the error if it is anything other than NotFound
if apierrors.IsNotFound(err) {
	l.Error(err, "query service error")
	return err
}

to

// return the error if it is anything other than NotFound
if !apierrors.IsNotFound(err) {
	l.Error(err, "query service error")
	return err
}

and change the logger setup

// get the logger
l := log.FromContext(ctx, "func", "createService")

to

l := log.FromContext(ctx).WithValues("func", "createService")

(logr's WithValues returns a new Logger, so its result must be assigned, not discarded).
package controllers
import (
"context"
"fmt"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/utils/pointer"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/reconcile"
elasticwebv1 "elasticweb/api/v1"
)
// ElasticWebReconciler reconciles a ElasticWeb object
type ElasticWebReconciler struct {
client.Client
Scheme *runtime.Scheme
}
const (
	// value of the App label on the deployment
	APP_NAME = "elastic-app"
	// port of the tomcat container
	CONTAINER_PORT = 8080
	// CPU request of a single pod
	CPU_REQUEST = "100m"
	// CPU limit of a single pod
	CPU_LIMIT = "100m"
	// memory request of a single pod
	MEM_REQUEST = "512Mi"
	// memory limit of a single pod
	MEM_LIMIT = "512Mi"
)

// getExpectReplicas calculates the pod count from single-pod QPS and total QPS
func getExpectReplicas(elasticWeb *elasticwebv1.ElasticWeb) int32 {
	// QPS a single pod can sustain
	singlePodQPS := *(elasticWeb.Spec.SinglePodQPS)
	// desired total QPS
	totalQPS := *(elasticWeb.Spec.TotalQPS)
	// replicas is the number of pods to create, rounding up so that
	// capacity never falls below the requested total
	replicas := totalQPS / singlePodQPS
	if totalQPS%singlePodQPS > 0 {
		replicas++
	}
	return replicas
}
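As a quick sanity check, the ceiling-division logic can be exercised standalone with the values that appear later in this walkthrough (this is a hypothetical copy of the calculation for illustration, not the controller code itself):

```go
package main

import "fmt"

// expectReplicas mirrors getExpectReplicas: divide total QPS by single-pod QPS
// and round up, so capacity never falls below the requested total.
func expectReplicas(totalQPS, singlePodQPS int32) int32 {
	replicas := totalQPS / singlePodQPS
	if totalQPS%singlePodQPS > 0 {
		replicas++
	}
	return replicas
}

func main() {
	fmt.Println(expectReplicas(600, 500))  // 2 pods: 600/500 rounds up
	fmt.Println(expectReplicas(600, 800))  // 1 pod after singlePodQPS rises to 800
	fmt.Println(expectReplicas(2600, 800)) // 4 pods after totalQPS rises to 2600
}
```

These three cases match the pod counts observed in the scaling experiments below.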
//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=elasticweb.wu123.com,resources=elasticwebs/finalizers,verbs=update
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
// Reconcile is part of the main kubernetes reconciliation loop which aims to
// move the current state of the cluster closer to the desired state.
// TODO(user): Modify the Reconcile function to compare the state specified by
// the ElasticWeb object against the actual cluster state, and then
// perform operations to make the cluster state reflect the state specified by
// the user.
//
// For more details, check Reconcile and its Result here:
// - https://pkg.go.dev/sigs.k8s.io/controller-runtime@v0.12.2/pkg/reconcile
func (r *ElasticWebReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	l := log.FromContext(ctx, "elasticweb", req.NamespacedName)
	// your logic here
	l.Info("1. start reconcile logic")
	// instantiate the data structure
	instance := &elasticwebv1.ElasticWeb{}
	// query through the client, keyed on the request's namespace/name
	err := r.Get(ctx, req.NamespacedName, instance)
	if err != nil {
		// if the instance does not exist, return an empty result so the
		// caller does not immediately call Reconcile again
		if apierrors.IsNotFound(err) {
			l.Info("2.1. instance not found, maybe removed")
			return reconcile.Result{}, nil
		}
		l.Error(err, "2.2 error")
		// return the error to the caller
		return ctrl.Result{}, err
	}
	l.Info("3. instance : " + instance.String())
	// look up the deployment
	deployment := &appsv1.Deployment{}
	// query through the client
	err = r.Get(ctx, req.NamespacedName, deployment)
	// handle query errors, including the not-found case
	if err != nil {
		// if there is no deployment yet, it has to be created
		if apierrors.IsNotFound(err) {
			l.Info("4. deployment not exists")
			// if no QPS is required and there is no deployment, nothing to do
			if *(instance.Spec.TotalQPS) < 1 {
				l.Info("5.1 not need deployment")
				return ctrl.Result{}, nil
			}
			// create the service first
			if err = createServiceIfNotExists(ctx, r, instance, req); err != nil {
				l.Error(err, "5.2 error")
				// return the error to the caller
				return ctrl.Result{}, err
			}
			// then create the deployment
			if err = createDeployment(ctx, r, instance); err != nil {
				l.Error(err, "5.3 error")
				// return the error to the caller
				return ctrl.Result{}, err
			}
			// creation succeeded, so update the status
			if err = updateStatus(ctx, r, instance); err != nil {
				l.Error(err, "5.4. error")
				// return the error to the caller
				return ctrl.Result{}, err
			}
			// creation succeeded, so we can return now
			return ctrl.Result{}, nil
		} else {
			l.Error(err, "7. error")
			// return the error to the caller
			return ctrl.Result{}, err
		}
	}
	// the deployment exists and no error was returned, so continue below
	// calculate the expected replica count from single-pod QPS and total QPS
	expectReplicas := getExpectReplicas(instance)
	// the deployment's current desired replica count
	realReplicas := *deployment.Spec.Replicas
	l.Info(fmt.Sprintf("9. expectReplicas [%d], realReplicas [%d]", expectReplicas, realReplicas))
	// if they are equal, nothing to do
	if expectReplicas == realReplicas {
		l.Info("10. return now")
		return ctrl.Result{}, nil
	}
	// otherwise, adjust the replica count
	*(deployment.Spec.Replicas) = expectReplicas
	l.Info("11. update deployment's Replicas")
	// update the deployment through the client
	if err = r.Update(ctx, deployment); err != nil {
		l.Error(err, "12. update deployment replicas error")
		// return the error to the caller
		return ctrl.Result{}, err
	}
	l.Info("13. update status")
	// the deployment's Replicas was updated successfully, so update the status
	if err = updateStatus(ctx, r, instance); err != nil {
		l.Error(err, "14. update status error")
		// return the error to the caller
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}
// create the service if it does not exist yet
func createServiceIfNotExists(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb, req ctrl.Request) error {
	// get the logger
	l := log.FromContext(ctx).WithValues("func", "createService")
	service := &corev1.Service{}
	err := r.Get(ctx, req.NamespacedName, service)
	// if the query returned no error, the service already exists; nothing to do
	if err == nil {
		l.Info("service exists")
		return nil
	}
	// if the error is anything other than NotFound, return it
	if !apierrors.IsNotFound(err) {
		l.Error(err, "query service error")
		return err
	}
	// instantiate the service
	service = &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: elasticWeb.Namespace,
			Name:      elasticWeb.Name,
		},
		Spec: corev1.ServiceSpec{
			Ports: []corev1.ServicePort{{
				Name:     "http",
				Port:     8080,
				NodePort: *elasticWeb.Spec.Port,
			},
			},
			Selector: map[string]string{
				"app": APP_NAME,
			},
			Type: corev1.ServiceTypeNodePort,
		},
	}
	// this step is essential!
	// with the owner reference in place, deleting the elasticweb resource
	// also deletes this service
	l.Info("set reference")
	if err := controllerutil.SetControllerReference(elasticWeb, service, r.Scheme); err != nil {
		l.Error(err, "SetControllerReference error")
		return err
	}
	// create the service
	l.Info("start create service")
	if err := r.Create(ctx, service); err != nil {
		l.Error(err, "create service error")
		return err
	}
	l.Info("create service success")
	return nil
}
// create the deployment
func createDeployment(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb) error {
	l := log.FromContext(ctx).WithValues("func", "createDeployment")
	// calculate the expected pod count
	expectReplicas := getExpectReplicas(elasticWeb)
	l.Info(fmt.Sprintf("expectReplicas [%d]", expectReplicas))
	// instantiate the deployment
	deployment := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: elasticWeb.Namespace,
			Name:      elasticWeb.Name,
		},
		Spec: appsv1.DeploymentSpec{
			// the replica count is calculated from the QPS figures
			Replicas: pointer.Int32Ptr(expectReplicas),
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{
					"app": APP_NAME,
				},
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{
						"app": APP_NAME,
					},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{
						{
							Name: APP_NAME,
							// use the image from the spec
							Image:           elasticWeb.Spec.Image,
							ImagePullPolicy: "IfNotPresent",
							Ports: []corev1.ContainerPort{
								{
									Name:          "http",
									Protocol:      corev1.ProtocolTCP,
									ContainerPort: CONTAINER_PORT,
								},
							},
							Resources: corev1.ResourceRequirements{
								Requests: corev1.ResourceList{
									"cpu":    resource.MustParse(CPU_REQUEST),
									"memory": resource.MustParse(MEM_REQUEST),
								},
								Limits: corev1.ResourceList{
									"cpu":    resource.MustParse(CPU_LIMIT),
									"memory": resource.MustParse(MEM_LIMIT),
								},
							},
						},
					},
				},
			},
		},
	}
	// this step is essential!
	// with the owner reference in place, deleting the elasticweb resource
	// also deletes this deployment
	l.Info("set reference")
	if err := controllerutil.SetControllerReference(elasticWeb, deployment, r.Scheme); err != nil {
		l.Error(err, "SetControllerReference error")
		return err
	}
	// create the deployment
	l.Info("start create deployment")
	if err := r.Create(ctx, deployment); err != nil {
		l.Error(err, "create deployment error")
		return err
	}
	l.Info("create deployment success")
	return nil
}
// after the pods are handled, refresh the status with the latest values
func updateStatus(ctx context.Context, r *ElasticWebReconciler, elasticWeb *elasticwebv1.ElasticWeb) error {
	l := log.FromContext(ctx).WithValues("func", "updateStatus")
	// QPS of a single pod
	singlePodQPS := *(elasticWeb.Spec.SinglePodQPS)
	// total pod count
	replicas := getExpectReplicas(elasticWeb)
	// once the pods are created, the system's real QPS is
	// single-pod QPS * pod count;
	// initialize the field first if it has not been set yet
	if nil == elasticWeb.Status.RealQPS {
		elasticWeb.Status.RealQPS = new(int32)
	}
	*(elasticWeb.Status.RealQPS) = singlePodQPS * replicas
	l.Info(fmt.Sprintf("singlePodQPS [%d], replicas [%d], realQPS[%d]", singlePodQPS, replicas, *(elasticWeb.Status.RealQPS)))
	if err := r.Update(ctx, elasticWeb); err != nil {
		l.Error(err, "update instance error")
		return err
	}
	return nil
}
// SetupWithManager sets up the controller with the Manager.
func (r *ElasticWebReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&elasticwebv1.ElasticWeb{}).
Complete(r)
}
[root@k8s-worker02 elasticweb]# make run
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/crd/bases
/home/gopath/elasticweb/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go run ./main.go
I0420 17:43:49.182759 23575 request.go:601] Waited for 1.047609583s due to client-side throttling, not priority and fairness, request: GET:https://192.168.204.129:6443/apis/discovery.k8s.io/v1?timeout=32s
1.681983829936318e+09 INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8080"}
1.6819838299367585e+09 INFO setup starting manager
1.6819838299371967e+09 INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
1.6819838299371972e+09 INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.6819838299374092e+09 INFO Starting EventSource {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "source": "kind source: *v1.ElasticWeb"}
1.681983829937432e+09 INFO Starting Controller {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb"}
1.681983830039215e+09 INFO Starting workers {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "worker count": 1}
1.681983830039482e+09 INFO 1. start reconcile logic {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219", "elasticweb": "default/elasticweb-sample"}
1.6819838300396683e+09 INFO 3. instance : Image [tomcat:8.0.18-jre8], Port [30003], SinglePodQPS [500], TotalQPS [600], RealQPS [nil] {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219", "elasticweb": "default/elasticweb-sample"}
1.681983830141667e+09 INFO 4. deployment not exists {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219", "elasticweb": "default/elasticweb-sample"}
1.6819838302434275e+09 INFO set reference {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.6819838302434833e+09 INFO start create service {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.68198383029691e+09 INFO create service success {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.6819838302969563e+09 INFO expectReplicas [2] {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.681983830296983e+09 INFO set reference {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.681983830296997e+09 INFO start create deployment {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.6819838303819373e+09 INFO create deployment success {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
1.6819838303819807e+09 INFO singlePodQPS [500], replicas [2], realQPS[1000] {"controller": "elasticweb", "controllerGroup": "elasticweb.wu123.com", "controllerKind": "ElasticWeb", "elasticWeb": {"name":"elasticweb-sample","namespace":"default"}, "namespace": "default", "name": "elasticweb-sample", "reconcileID": "b327be5c-a428-487d-b6d6-ac78de5a0219"}
The logs show that Reconcile ran twice; the first run created the deployment, service, and related resources:
2023-04-20T16:41:59.108+0800 INFO controllers.ElasticWeb 1. start reconcile logic {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.108+0800 INFO controllers.ElasticWeb 3. instance : Image [tomcat:8.0.18-jre8], Port [30003], SinglePodQPS [500], TotalQPS [600], RealQPS [nil] {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.210+0800 INFO controllers.ElasticWeb 4. deployment not exists {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.313+0800 INFO controllers.ElasticWeb set reference {"func": "createService"}
2023-04-20T16:41:59.313+0800 INFO controllers.ElasticWeb start create service {"func": "createService"}
2023-04-20T16:41:59.364+0800 INFO controllers.ElasticWeb create service success {"func": "createService"}
2023-04-20T16:41:59.365+0800 INFO controllers.ElasticWeb expectReplicas [2] {"func": "createDeployment"}
2023-04-20T16:41:59.365+0800 INFO controllers.ElasticWeb set reference {"func": "createDeployment"}
2023-04-20T16:41:59.365+0800 INFO controllers.ElasticWeb start create deployment {"func": "createDeployment"}
2023-04-20T16:41:59.382+0800 INFO controllers.ElasticWeb create deployment success {"func": "createDeployment"}
2023-04-20T16:41:59.382+0800 INFO controllers.ElasticWeb singlePodQPS [500], replicas [2], realQPS[1000] {"func": "updateStatus"}
2023-04-20T16:41:59.407+0800 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "elasticweb", "request": "dev/elasticweb-sample"}
2023-04-20T16:41:59.407+0800 INFO controllers.ElasticWeb 1. start reconcile logic {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.407+0800 INFO controllers.ElasticWeb 3. instance : Image [tomcat:8.0.18-jre8], Port [30003], SinglePodQPS [500], TotalQPS [600], RealQPS [1000] {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.407+0800 INFO controllers.ElasticWeb 9. expectReplicas [2], realReplicas [2] {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.407+0800 INFO controllers.ElasticWeb 10. return now {"elasticweb": "dev/elasticweb-sample"}
2023-04-20T16:41:59.407+0800 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "elasticweb", "request": "dev/elasticweb-sample"}
- Check the resource objects in detail with kubectl get: everything matches expectations, and the elasticweb, service, deployment, and pods are all healthy:
Verify the business function in a browser
Open http://192.168.204.131:30003 in a browser
Modify the QPS of a single Pod
- Optimizations in the service itself, or changes in external dependencies (say, a cache or database scale-up), can raise the QPS a single pod can handle. Suppose the single-pod QPS rises from 500 to 800: with a total QPS of 600, let's see whether our Operator adjusts automatically (the pod count should drop from 2 to 1)
- Create a file named update_single_pod_qps.yaml in the config/samples/ directory with the following content:
```yaml
spec:
singlePodQPS: 800
```
Run the following command to update the single-pod QPS from 500 to 800:
kubectl patch elasticweb elasticweb-sample \
-n dev \
--type merge \
--patch "$(cat config/samples/update_single_pod_qps.yaml)"
The controller log now shows that the spec was updated and that the pod count was recomputed from the new parameters, as expected:
- Check the pods with kubectl get; the count has dropped to 1:
kubectl get pod -n dev
- Remember to check in a browser that tomcat is still healthy;
Modify the total QPS
- The external QPS also changes frequently, and the operator needs to adjust the pod count accordingly to maintain the overall quality of service. Next, modify the total QPS and see whether the operator reacts:
- Create a file named update_total_qps.yaml in the config/samples/ directory with the following content:
```yaml
spec:
totalQPS: 2600
```
Run the following command to update the total QPS from 600 to 2600:
kubectl patch elasticweb elasticweb-sample \
-n dev \
--type merge \
--patch "$(cat config/samples/update_total_qps.yaml)"
The controller log now shows that the spec was updated and that the pod count was recomputed from the new parameters, as expected:
Check the pods with kubectl get; the count has grown to 4, and 4 pods can sustain 3200 QPS, which covers the current requirement of 2600:
kubectl get pod -n dev
- Remember to check in a browser that tomcat is still healthy;
Adjusting the pod count by hand-editing totalQPS like this is crude, but you could build a small app that measures the current QPS and calls client-go to update the elasticweb's totalQPS, letting the operator adjust the pod count automatically.
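Such an app would send exactly the same merge patch that kubectl patch --type merge sends. A minimal sketch of building that patch body (totalQPSPatch is a hypothetical helper; applying it would go through client-go's dynamic client with types.MergePatchType, which is omitted here because it needs a live cluster):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// totalQPSPatch builds the merge-patch body equivalent to
// `kubectl patch elasticweb ... --type merge --patch '{"spec":{"totalQPS":N}}'`.
// A QPS-monitoring app could send these bytes through client-go's dynamic
// client against the elasticwebs resource instead of shelling out to kubectl.
func totalQPSPatch(totalQPS int32) ([]byte, error) {
	body := map[string]any{
		"spec": map[string]any{"totalQPS": totalQPS},
	}
	return json.Marshal(body)
}

func main() {
	patch, err := totalQPSPatch(2600)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch)) // {"spec":{"totalQPS":2600}}
}
```

Because it is a merge patch, only totalQPS changes; the other spec fields on the server are left untouched.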
Verify deletion
- The dev namespace now holds service, deployment, pod, and elasticweb resource objects. To delete them all, deleting the elasticweb alone is enough, because the service and deployment are owned by it, a relationship established by this code:
// this step is essential!
// with the owner reference in place, deleting the elasticweb resource
// also deletes this deployment
l.Info("set reference")
if err := controllerutil.SetControllerReference(elasticWeb, deployment, r.Scheme); err != nil {
	l.Error(err, "SetControllerReference error")
	return err
}
- Delete the elasticweb:
kubectl delete elasticweb elasticweb-sample -n dev
- Check the other resources again; they have all been deleted automatically:
Build the image
make docker-build docker-push IMG=wu123.com/elasticweb:v0.1
Once the image is ready, run the following command to deploy the controller into kubernetes:
make deploy IMG=wu123.com/elasticweb:v0.1
Create an elasticweb resource object and verify that all resources are created successfully
Check the controller's log:
kubectl logs -f \
elasticweb-controller-manager-5795d4d98d-t6jvc \
-c manager \
-n elasticweb-system
Then verify in a browser that tomcat started successfully
Uninstall and clean up
To remove all the resources created above, run:
make uninstall
webhook
- Introduce webhooks;
- Design a scenario that uses a webhook, building on the elasticweb project;
- Preparation
- Generate the webhook
- Development (configuration)
- Development (code)
- Deployment
- Verify the Defaulter (filling in default values)
- Verify the Validator (validity checks)
A webhook in an Operator acts much like a filter: every external change to a CRD resource is handed to the webhook for pre-processing before the Controller sees it, as shown in the diagram below
A webhook can do two things: mutating and validating
- kubebuilder provides tooling that generates the webhook's scaffolding files and code, similar to the API tooling, which greatly reduces the workload and lets you focus on the business logic;
- with kubebuilder, if a webhook and a controller serve the same resource, they run in the same process;
Designing the scenario
- To make the exercise meaningful, add requirements to the elasticweb project so the webhook does real work;
- If the user forgets to specify the total QPS, the webhook sets a default value of 1300, as shown below:
To protect the system, cap the QPS of a single pod at 1000: if the submitted singlePodQPS exceeds 1000, creating the resource object fails, as shown below:
Preparation
- Like the controller, the webhook can run either inside or outside the kubernetes cluster;
- If the webhook runs outside the cluster, its certificates must be placed on that host; the default location is:
/tmp/k8s-webhook-server/serving-certs/tls.{crt,key}
- Here the webhook will be deployed inside the kubernetes cluster
- To run the webhook in kubernetes, one piece of preparation is needed: install cert-manager with the following command:
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.2.0/cert-manager.yaml
- This creates many resources: namespaces, rbac objects, pods, and so on; the pods, for example:
[root@k8s-worker02 k8s-exercise]# kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.2.0/cert-manager.yaml
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
namespace/cert-manager created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
[root@k8s-worker02 k8s-exercise]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-cainjector-54887dfcbc-prkh5 1/1 Running 2 (29s ago) 9m9s
cert-manager cert-manager-f746879f6-mp7zz 1/1 Running 0 9m9s
cert-manager cert-manager-webhook-575ccb5c7b-rdpv2 1/1 Running 0 9m9s
Generate the webhook
- From the elasticweb project directory, run the following command to create the webhook:
[root@k8s-worker02 elasticweb]# kubebuilder create webhook \
> --group elasticweb \
> --version v1 \
> --kind ElasticWeb \
> --defaulting \
> --programmatic-validation
Writing kustomize manifests for you to edit...
Writing scaffold for you to edit...
api/v1/elasticweb_webhook.go
Update dependencies:
$ go mod tidy
Running make:
$ make generate
test -s /home/gopath/elasticweb/bin/controller-gen || GOBIN=/home/gopath/elasticweb/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.9.2
/home/gopath/elasticweb/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
Next: implement your new Webhook and generate the manifests with:
$ make manifests
- After the command finishes, look at main.go: a code block has been added automatically to enable the webhook:
if err = (&elasticwebv1.ElasticWeb{}).SetupWebhookWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create webhook", "webhook", "ElasticWeb")
os.Exit(1)
}
elasticweb_webhook.go is the newly added file
Two places in this code deserve attention. The first is related to filling in default values:
1. to operate on other resources here, add the required permissions to this marker
//+kubebuilder:webhook:path=/mutate-elasticweb-wu123-com-v1-elasticweb,mutating=true,failurePolicy=fail,sideEffects=None,groups=elasticweb.wu123.com,resources=elasticwebs,verbs=create;update,versions=v1,name=melasticweb.kb.io,admissionReviewVersions=v1
2. this assertion is what enables the defaulting logic
var _ webhook.Defaulter = &ElasticWeb{}
3. the defaulting code goes here
// Default implements webhook.Defaulter so a webhook will be registered for the type
func (r *ElasticWeb) Default() {
	elasticweblog.Info("default", "name", r.Name)
	// TODO(user): fill in your defaulting logic.
}
The second is related to validation:
1. to operate on other resources during validation, change this marker
// TODO(user): change verbs to "verbs=create;update;delete" if you want to enable deletion validation.
//+kubebuilder:webhook:path=/validate-elasticweb-wu123-com-v1-elasticweb,mutating=false,failurePolicy=fail,sideEffects=None,groups=elasticweb.wu123.com,resources=elasticwebs,verbs=create;update,versions=v1,name=velasticweb.kb.io,admissionReviewVersions=v1
2. this assertion is what enables the validation logic
var _ webhook.Validator = &ElasticWeb{}
3. this method is called to validate on creation
// ValidateCreate implements webhook.Validator so a webhook will be registered for the type
func (r *ElasticWeb) ValidateCreate() error {
	elasticweblog.Info("validate create", "name", r.Name)
	// TODO(user): fill in your validation logic upon object creation.
	return nil
}
The business requirement is implemented by modifying elasticweb_webhook.go, but the code comes later; first get the configuration right;
Development (configuration)
- Open config/default/kustomization.yaml; the following four entries are commented out by default, so remove the comment markers to enable them:
- ../webhook
- ../certmanager
- manager_webhook_patch.yaml
- webhookcainjection_patch.yaml
In the same config/default/kustomization.yaml, the content under the vars node is also fully commented out; uncomment all of it:
# the following config is for teaching kustomize how to do var substitution
vars:
# [CERTMANAGER] To enable cert-manager, uncomment all sections with 'CERTMANAGER' prefix.
- name: CERTIFICATE_NAMESPACE # namespace of the certificate CR
objref:
kind: Certificate
group: cert-manager.io
version: v1
name: serving-cert # this name should match the one in certificate.yaml
fieldref:
fieldpath: metadata.namespace
- name: CERTIFICATE_NAME
objref:
kind: Certificate
group: cert-manager.io
version: v1
name: serving-cert # this name should match the one in certificate.yaml
- name: SERVICE_NAMESPACE # namespace of the service
objref:
kind: Service
version: v1
name: webhook-service
fieldref:
fieldpath: metadata.namespace
- name: SERVICE_NAME
objref:
kind: Service
version: v1
name: webhook-service
- The configuration is done; on to the code;
Development (code)
- Open elasticweb_webhook.go
- Add a dependency:
apierrors "k8s.io/apimachinery/pkg/api/errors"
- Find the Default method and change it as follows. The code is simple: check whether TotalQPS is set, and write the default value if it is not; two log lines are also added:
// Default implements webhook.Defaulter so a webhook will be registered for the type
func (r *ElasticWeb) Default() {
	elasticweblog.Info("default", "name", r.Name)
	// TODO(user): fill in your defaulting logic.
	// if no total QPS was provided at creation time, set a default value
	if r.Spec.TotalQPS == nil {
		r.Spec.TotalQPS = new(int32)
		*r.Spec.TotalQPS = 1300
		elasticweblog.Info("a. TotalQPS is nil, set default value now", "TotalQPS", *r.Spec.TotalQPS)
	} else {
		elasticweblog.Info("b. TotalQPS exists", "TotalQPS", *r.Spec.TotalQPS)
	}
}
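Stripped of the webhook plumbing, the defaulting and validation decisions are two small pure functions, which makes them easy to reason about and test in isolation. A minimal sketch (applyDefault and checkSinglePodQPS are illustrative names, not part of the kubebuilder scaffolding):

```go
package main

import "fmt"

const (
	defaultTotalQPS = 1300 // value the Defaulter writes when totalQPS is omitted
	maxSinglePodQPS = 1000 // cap the Validator enforces
)

// applyDefault mirrors Default(): only a nil pointer is replaced,
// an explicit user-supplied value is left untouched.
func applyDefault(totalQPS *int32) *int32 {
	if totalQPS == nil {
		v := int32(defaultTotalQPS)
		return &v
	}
	return totalQPS
}

// checkSinglePodQPS mirrors validateElasticWeb's rule.
func checkSinglePodQPS(qps int32) error {
	if qps > maxSinglePodQPS {
		return fmt.Errorf("spec.singlePodQPS: invalid value %d: must not exceed %d", qps, maxSinglePodQPS)
	}
	return nil
}

func main() {
	fmt.Println(*applyDefault(nil))             // 1300
	fmt.Println(checkSinglePodQPS(500) == nil)  // true
	fmt.Println(checkSinglePodQPS(1100) == nil) // false
}
```

These are exactly the behaviors verified against the cluster in the Defaulter and Validator sections below.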
- Next, implement the validation. Wrap it in a validateElasticWeb method and call it from both the create and the update hooks, as shown below. Note that the error instance is ultimately built with apierrors.NewInvalid, which takes a list of errors, so a field.ErrorList slice is prepared as its argument; if several parameters fail validation, all the errors can go into that slice:
func (r *ElasticWeb) validateElasticWeb() error {
	var allErrs field.ErrorList
	if *r.Spec.SinglePodQPS > 1000 {
		elasticweblog.Info("c. Invalid SinglePodQPS")
		err := field.Invalid(field.NewPath("spec").Child("singlePodQPS"),
			*r.Spec.SinglePodQPS,
			"d. must be less than 1000")
		allErrs = append(allErrs, err)
		return apierrors.NewInvalid(
			// group and kind of this CRD
			schema.GroupKind{Group: "elasticweb.wu123.com", Kind: "ElasticWeb"},
			r.Name,
			allErrs)
	} else {
		elasticweblog.Info("e. SinglePodQPS is valid")
		return nil
	}
}
- Then find the methods invoked on create and update, and call validateElasticWeb from both:
// ValidateCreate implements webhook.Validator so a webhook will be registered for the type
func (r *ElasticWeb) ValidateCreate() error {
elasticweblog.Info("validate create", "name", r.Name)
// TODO(user): fill in your validation logic upon object creation.
return r.validateElasticWeb()
}
// ValidateUpdate implements webhook.Validator so a webhook will be registered for the type
func (r *ElasticWeb) ValidateUpdate(old runtime.Object) error {
elasticweblog.Info("validate update", "name", r.Name)
// TODO(user): fill in your validation logic upon object update.
return r.validateElasticWeb()
}
- Coding is done; next, clean up the leftovers from earlier and start a fresh deployment and verification;
Cleanup
- Delete the elasticweb resource object:
kubectl delete -f config/samples/elasticweb_v1_elasticweb.yaml
- Delete the controller:
kustomize build config/default | kubectl delete -f -
- Delete the CRD:
make uninstall
Deployment
- Deploy the CRD:
make install
Build the image and push it to the registry:
make docker-build docker-push IMG=registry.cn-hangzhou.aliyuncs.com/wu123/elasticweb:v0.1
Deploy the controller with the webhook built in:
make deploy IMG=registry.cn-hangzhou.aliyuncs.com/wu123/elasticweb:v0.1
- Check the pods to confirm it started successfully:
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-6588898cb4-nvnz8 1/1 Running 1 5d21h
cert-manager cert-manager-cainjector-7bcbdbd99f-q645r 1/1 Running 1 5d21h
cert-manager cert-manager-webhook-5fd9f9dd86-98tm9 1/1 Running 1 5d21h
elasticweb-system elasticweb-controller-manager-7dcbfd4675-898gb 2/2 Running 0 20s
Verify the Defaulter (filling in default values)
- Edit config/samples/elasticweb_v1_elasticweb.yaml as shown below; note that the totalQPS field is now commented out:
apiVersion: elasticweb.wu123.com/v1
kind: ElasticWeb
metadata:
  name: elasticweb-sample
spec:
  # TODO(user): Add fields here
  image: tomcat:8.0.18-jre8
  port: 30003
  singlePodQPS: 500
  # totalQPS: 600
- Create an elasticweb resource object:
kubectl apply -f config/samples/elasticweb_v1_elasticweb.yaml
- The QPS of a single pod is now 500. If the webhook code took effect, the total QPS should be 1300 and the pod count should be 3. Let's check whether this matches expectations;
- First check whether the elasticweb, deployment, service, and pod resource objects are healthy. As shown below, everything matches expectations:
kubectl get elasticweb
kubectl get deployments
kubectl get service
kubectl get pod
- Use kubectl describe to view the details of the elasticweb resource object. As shown below, the TotalQPS field was set to 1300 by the webhook, and RealQPS was computed correctly;
- Check the controller logs to confirm the webhook behaved as expected: it found the TotalQPS field empty and set it to the default value, and during validation the SinglePodQPS value did not exceed 1000;
- Use a browser to verify that the web service works: http://192.168.204.131:30003/
Verify the Validator
- Next, verify the webhook's parameter validation, starting with the update logic;
- Edit config/samples/update_single_pod_qps.yaml with the following content:
spec:
  singlePodQPS: 1100
- Apply it with the patch command:
kubectl patch elasticweb elasticweb-sample \
--type merge \
--patch "$(cat config/samples/update_single_pod_qps.yaml)"
- The console then prints the following error message:
Error from server (ElasticWeb.elasticweb.com.bolingcavalry "elasticweb-sample" is invalid: spec.singlePodQPS: Invalid value: 1100: d. must be less than 1000): admission webhook "velasticweb.kb.io" denied the request: ElasticWeb.elasticweb.wu123.com "elasticweb-sample" is invalid: spec.singlePodQPS: Invalid value: 1100: d. must be less than 1000
- Use kubectl describe to view the elasticweb resource object again: singlePodQPS is still 500, so the webhook took effect and blocked the invalid update;
- Check the controller logs;
- Next, verify the webhook's validation on create;
- Clean up the previously created elasticweb resource object:
kubectl delete -f config/samples/elasticweb_v1_elasticweb.yaml
- Edit the file and set singlePodQPS to a value above 1000 to see whether the webhook catches the error and blocks the creation of the resource object:
apiVersion: elasticweb.wu123.com/v1
kind: ElasticWeb
metadata:
  name: elasticweb-sample
spec:
  # TODO(user): Add fields here
  image: tomcat:8.0.18-jre8
  port: 30003
  singlePodQPS: 1500
  # totalQPS: 600
- Run the following command to create the elasticweb resource object:
kubectl apply -f config/samples/elasticweb_v1_elasticweb.yaml
- The console prints the following message, which contains the error description written in our code. The elasticweb resource object failed to be created, proving that the webhook's Validator works:
Error from server (ElasticWeb.elasticweb.com.bolingcavalry "elasticweb-sample" is invalid: spec.singlePodQPS: Invalid value: 1500: d. must be less than 1000): error when creating "config/samples/elasticweb_v1_elasticweb.yaml": admission webhook "velasticweb.kb.io" denied the request: ElasticWeb.elasticweb.wu123.com "elasticweb-sample" is invalid: spec.singlePodQPS: Invalid value: 1500: d. must be less than 1000
- Check the controller logs;
Summary
The CRD's Status field
- The data structure of the elasticweb CRD is as follows:
// ElasticWebSpec defines the desired state of ElasticWeb
type ElasticWebSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// Foo is an example field of ElasticWeb. Edit elasticweb_types.go to remove/update
	// Foo string `json:"foo,omitempty"`

	// Image is the business service's image, including name:tag
	Image string `json:"image"`

	// Port is the host port occupied by the service; external requests access the pods' service through it
	Port *int32 `json:"port"`

	// SinglePodQPS is the QPS upper limit of a single pod
	SinglePodQPS *int32 `json:"singlePodQPS"`

	// TotalQPS is the current total QPS of the whole business
	TotalQPS *int32 `json:"totalQPS"`
}

// ElasticWebStatus defines the observed state of ElasticWeb;
// the values in this struct are computed by the business code
type ElasticWebStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file

	// RealQPS is the total QPS actually supported in kubernetes right now
	RealQPS *int32 `json:"realQPS,omitempty"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status

// ElasticWeb is the Schema for the elasticwebs API
type ElasticWeb struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ElasticWebSpec   `json:"spec,omitempty"`
	Status ElasticWebStatus `json:"status,omitempty"`
}
- The Status of this CRD has only one field, RealQPS, and the omitempty property in its tag `json:"realQPS,omitempty"` is very important;
- What happens if the RealQPS tag has no omitempty property?
- In fact, before the webhook was developed, the omitempty property of RealQPS had been left out, yet the whole controller worked fine and elasticweb behaved as expected. In other words, a status field without omitempty does not affect the operator's functionality;
- But after the webhook was enabled, creating a resource object failed with an error:
kubectl apply -f config/samples/elasticweb_v1_elasticweb.yaml
The ElasticWeb "elasticweb-sample" is invalid: status.realQPS: Invalid value: "null": status.realQPS in body must be of type integer: "null"
- In other words, if a field of the Status struct lacks the omitempty property in its json tag, creating a resource object fails once the webhook is enabled;
Skipping the webhook when running the controller locally
- There are two ways to deploy the controller: inside the kubernetes environment, or running independently outside of it;
- During development, the controller is usually run locally, which avoids all the image-related work;
- However, once a webhook is used, its special authentication scheme requires the certificates issued by kubernetes to be placed locally (in the /tmp/k8s-webhook-server/serving-certs/ directory):
- deploying inside kubernetes requires building and pushing an image;
- running outside kubernetes requires issuing certificates and placing them in the designated directory;
- Faced with this dilemma, the official advice is: if the webhook is not yet needed during development (note this precondition), it can be disabled when running the controller locally, in the following two steps;
- First, modify main.go as shown below. The new code simply adds a check: if the environment variable ENABLE_WEBHOOKS equals false, the webhook-related logic is skipped:
	if os.Getenv("ENABLE_WEBHOOKS") != "false" {
		if err = (&elasticwebv1.ElasticWeb{}).SetupWebhookWithManager(mgr); err != nil {
			setupLog.Error(err, "unable to create webhook", "webhook", "ElasticWeb")
			os.Exit(1)
		}
	}
- Second, change the command used to start the controller locally. It used to be make run; now an extra variable is added:
make run ENABLE_WEBHOOKS=false
- The controller now starts and works normally, except that none of the webhook features take effect;
The controller's pod has two containers
- When the controller is deployed inside the kubernetes environment, it exists as a pod, i.e. the webhook and reconcile code we wrote runs in this pod;
- This pod actually contains two containers. Inspect it with kubectl describe, and you will see that the container named manager is where the controller code runs:
1. the webhook and reconcile features run in the manager container;
2. kube-rbac-proxy is a small HTTP proxy used for RBAC authorization;
Code repository: https://github.com/yunixiangfeng/k8s-exercise.git