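Background:
Our production TKE cluster was running 1.20.6, which is roughly equivalent to upstream Kubernetes 1.20. A few days ago I clicked upgrade in the console, which upgraded the master nodes to 1.22. I had assumed a cluster upgrade would check API compatibility first, so when the upgrade went through I took that to mean nothing was broken. Yesterday I scaled in the cluster's nodes and the pods were rescheduled. That is when the problem showed up.
An ECK cluster had been set up on this cluster earlier (see the earlier post "TKE 1.20.6: setting up Elasticsearch on Kubernetes"). Neither elastic-operator-0 nor the Kibana service would run properly any more. The pod logs were as follows.
The elastic-operator log consisted mostly of: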
unable to setup and fill the webhook certificates","service.version":"1.6.0+8326ca8a","service.type":"eck","ecs.version":"1.4.0","error":"the server could not find the requested resource","error.stack_trace":"github.com/elastic/cloud-on-k8s/cmd/manager.startOperator\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/manager/main.go:558\ngithub.com/elastic/cloud-on-k8s/cmd/manager.doRun.func2\n\t/go/src/github.com/elastic/cloud-on-k8s/cmd/manager/main.go:328
The key part is: **unable to setup and fill the webhook certificates**
The Kibana logs were as follows:
Resolution process:
Search engine keywords
What I came up with:
kubernetes 1.20 upgrade 1.22 eck unable to setup and fill the webhook certificates
The second result I clicked: https://github.com/elastic/cloud-on-k8s/issues/3958
A quick skim turned up this comment: "can you give us more information about the kind of cluster your are using (self managed, Azure, EKS…) ? admissionregistration.k8s.io/v1beta1 is supposed to be removed in 1.22 only: kubernetes/kubernetes#82021"
That more or less confirmed it was an admissionregistration.k8s.io API version problem.
Take a look:
kubectl get validatingwebhookconfiguration
NAME WEBHOOKS AGE
elastic-webhook.k8s.elastic.co 8 717d
gloo-gateway-validation-webhook-zadig 1 129d
kubectl describe validatingwebhookconfiguration elastic-webhook.k8s.elastic.co
admissionregistration.k8s.io/v1beta1
API changes in 1.22
As always, refer to the official Kubernetes documentation:
https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-22
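One way to confirm this from the command line is to check whether the API server still lists the old group/version. The sketch below is not something from the original troubleshooting session: `api_served` is a hypothetical helper name, and it only reads a version list on stdin, such as the output of `kubectl api-versions`.

```shell
# Hypothetical helper (not part of kubectl): reads a list of served API
# group/versions on stdin, one per line, and reports whether the version
# given as $1 is among them.
api_served() {
  # -x: match the whole line, -F: treat the pattern as a fixed string
  if grep -qxF -- "$1"; then
    echo "served: $1"
  else
    echo "NOT served: $1"
  fi
}

# Against a live cluster (needs kubectl and cluster access):
#   kubectl api-versions | api_served admissionregistration.k8s.io/v1beta1
```

On a 1.22 cluster the v1beta1 version should come back as not served, while `admissionregistration.k8s.io/v1` should still be listed.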
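Solving the problem:
You could try the approach suggested by charith-elastic in https://github.com/elastic/cloud-on-k8s/issues/3958.
I chose to upgrade ECK directly:
At this point ECK was 1.6, Elasticsearch was 7.6.2, and the Kubernetes cluster was 1.22. According to the supported versions, Kubernetes 1.22 needs at least ECK 1.8:
Upgrade the ECK operator directly: https://www.elastic.co/guide/en/cloud-on-k8s/1.8/k8s-upgrading-eck.html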
kubectl replace -f https://download.elastic.co/downloads/eck/1.8.0/crds.yaml
kubectl apply -f https://download.elastic.co/downloads/eck/1.8.0/operator.yaml
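elastic-operator-0 is managed by a StatefulSet, and because the pod had failed to start it kept retrying. If I understand correctly, the StatefulSet waits for the pod to start successfully before applying the next update, so I deleted the elastic-operator-0 pod manually: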
kubectl delete pods elastic-operator-0 -n elastic-system
Delete the Kibana pod and wait for the new pod to reach Running:
kubectl delete pods elastic-kb-677b867cf7-5vb2v -n logging
Kibana was reachable from the browser again.
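To double-check that nothing is still stuck after the pods are recreated, a small filter over the pod listing can help. This is a minimal sketch under stated assumptions: `pods_not_running` is a hypothetical helper, and it assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: reads default `kubectl get pods` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and prints the name of every
# pod whose STATUS is neither Running nor Completed.
pods_not_running() {
  # NR > 1 skips the header row; $3 is the STATUS column
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Against a live cluster (needs kubectl and cluster access):
#   kubectl get pods -n elastic-system | pods_not_running
#   kubectl get pods -n logging | pods_not_running
```

Empty output for both namespaces suggests the operator and Kibana pods have come back up.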
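Summary:
- TKE falls short on API validation during Kubernetes version upgrades: it gave the user no warning at all. Even a kubeadm upgrade validates resources first!
- Before upgrading, check the official Kubernetes deprecated API migration guide: https://kubernetes.io/docs/reference/using-api/deprecation-guide/#removed-apis-by-release
- Check the API versions used by your application components carefully!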
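The last two points can be partly automated before an upgrade. The following is a rough sketch, not a complete checker: `scan_removed_apis` and `REMOVED_IN_1_22` are hypothetical names, the list covers only a few of the versions removed in 1.22, and it only looks at local manifest files rather than live objects in the cluster.

```shell
# Partial list of apiVersions that Kubernetes 1.22 no longer serves;
# the deprecation guide has the complete list.
REMOVED_IN_1_22='admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1beta1
networking.k8s.io/v1beta1
rbac.authorization.k8s.io/v1beta1'

# Hypothetical helper: for the directory given as $1, print every
# manifest line (as file:line:text) that declares one of those versions.
scan_removed_apis() {
  printf '%s\n' "$REMOVED_IN_1_22" | while IFS= read -r version; do
    # -r: recurse, -n: show line numbers, -F: fixed-string match
    grep -rnF "apiVersion: ${version}" "$1" || true
  done
}

# Usage (./manifests is a placeholder path):
#   scan_removed_apis ./manifests
```

Any hit points to a component, such as the ECK 1.6 webhook here, that needs upgrading before the control plane moves to 1.22.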