Setting Up Kubernetes Federation (KubeFed)

Latest recommended article published on 2024-03-23 09:40:05

ybz@

Tags: kubernetes, cloud native, cloud computing, operations

Article link: https://blog.csdn.net/weixin_46357079/article/details/125576651

Kubernetes federation setup (KubeFed) for hybrid cloud

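Cluster federation (Federation) provides a mechanism for managing multiple Kubernetes clusters from a single control plane. The member clusters may span regions, run on different public cloud providers, or be clusters built in-house.

Once clusters are federated, you can use Federation API resources to manage the Kubernetes API resources of all of them in one place, for example to define how a Deployment is rolled out to different clusters and how many replicas each cluster should run.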

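With cluster federation, we can:

Simplify management of Kubernetes objects such as Deployment and Service across multiple clusters
Spread workloads across clusters to improve application reliability
Orchestrate resources across clusters, deploying applications to multiple clusters according to a placement policy
Migrate applications between clusters more quickly and easily
Discover services across clusters, with location awareness to reduce latency (this does not have to be implemented with KubeFed)
Put multi-cloud or hybrid cloud deployments into practice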

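Prerequisites: I use a combination of Huawei Cloud CCE and a local Kubernetes cluster.

The Huawei Cloud cluster must be reachable through a public address.

Download kubeconfig.json from Huawei Cloud. Because it is JSON, convert it to YAML and merge its entries into $HOME/.kube/config: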

apiVersion: v1
clusters:
- cluster: # add this entry; it is the externalCluster entry from the Huawei Cloud file
    insecure-skip-tls-verify: true
    server: https://${PUBLIC_IP}:5443
  name: cce
- cluster:
    certificate-authority-data: 
     ...
    server: https://192.168.12.12:6443
  name: kubernetes
contexts:
- context: # this entry is the external context from the Huawei Cloud file
    cluster: cce
    user: user
  name: cce
- context: 
    cluster: kubernetes
    user: kubernetes-admin
  name: local
current-context: local
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data:  
    ....
    client-key-data: 
    ....
- name: user  # add this entry; it is the user referenced by the external context in the Huawei Cloud file
  user:
    client-certificate-data: 
    ....
    client-key-data: 
    ....
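
The manual merge above can also be sketched in code. The following is a minimal sketch, assuming both kubeconfig files have already been loaded into dictionaries (kubeconfig.json can be read with the standard json module; reading and writing the YAML file would need a YAML library, which is left out here). The function name merge_context, the sample server address, and the entry names internalCluster and internal are illustrative assumptions, not part of any Kubernetes tooling.

```python
import copy


def _by_name(entries, name):
    """Return the entry whose "name" field equals `name`."""
    for entry in entries:
        if entry["name"] == name:
            return entry
    raise KeyError(name)


def merge_context(base, extra, context_name):
    """Copy one context from `extra`, plus the cluster and user it
    references, into a copy of `base`. Neither input is modified."""
    merged = copy.deepcopy(base)
    context = _by_name(extra["contexts"], context_name)
    cluster = _by_name(extra["clusters"], context["context"]["cluster"])
    user = _by_name(extra["users"], context["context"]["user"])
    for key, entry in (("clusters", cluster), ("contexts", context), ("users", user)):
        existing = {e["name"] for e in merged.setdefault(key, [])}
        if entry["name"] in existing:
            raise ValueError(f"{key} already contains {entry['name']!r}")
        merged[key].append(copy.deepcopy(entry))
    return merged


# Hypothetical, heavily shortened kubeconfig contents for illustration only.
local = {
    "clusters": [{"name": "kubernetes", "cluster": {"server": "https://192.168.12.12:6443"}}],
    "contexts": [{"name": "local", "context": {"cluster": "kubernetes", "user": "kubernetes-admin"}}],
    "users": [{"name": "kubernetes-admin", "user": {}}],
    "current-context": "local",
}
cce = {
    "clusters": [
        {"name": "internalCluster", "cluster": {"server": "https://10.0.0.10:5443"}},
        {"name": "externalCluster", "cluster": {"server": "https://203.0.113.10:5443",
                                                "insecure-skip-tls-verify": True}},
    ],
    "contexts": [
        {"name": "internal", "context": {"cluster": "internalCluster", "user": "user"}},
        {"name": "external", "context": {"cluster": "externalCluster", "user": "user"}},
    ],
    "users": [{"name": "user", "user": {}}],
}

merged = merge_context(local, cce, "external")
print([c["name"] for c in merged["contexts"]])  # ['local', 'external']
```

The sketch keeps the original entry names; renaming them to cce, as in the hand-edited file above, is a separate step.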

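After editing, verify the result:

kubectl config get-contexts  # the added cluster should be listed
    CURRENT   NAME    CLUSTER      AUTHINFO           NAMESPACE
              cce     cce          user
    *         local   kubernetes   kubernetes-admin
# You can switch to cce
kubectl config use-context cce
# I deploy the KubeFed control plane on the local cluster, so I do not switch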

Next, install KubeFed with Helm; the chart's README documents its parameters.

 helm repo add kubefed-charts https://raw.githubusercontent.com/kubernetes-sigs/kubefed/master/charts
 # Install
 helm --namespace kube-federation-system upgrade -i kubefed kubefed-charts/kubefed --version=<x.x.x> --create-namespace
 # Uninstall
 helm --namespace kube-federation-system uninstall kubefed
 # Wait for the deployment to finish
 kubectl get pods -n kube-federation-system  # all pods should be Ready

Joining the federation

kubefedctl join <cluster name to register in the federation> --cluster-context <context NAME of the cluster to join> --host-cluster-context <context NAME of the host cluster, i.e. the one running the control plane>
kubefedctl  join  cce  --cluster-context cce   --host-cluster-context local   --v=2
kubefedctl  join  local1  --cluster-context local   --host-cluster-context local   --v=2
kubectl get kubefedclusters.core.kubefed.io  -n kube-federation-system # check that READY is True
NAME     AGE     READY   KUBERNETES-VERSION
cce      87m     True    v1.21.7-r0-CCE22.3.1.B012
cce11    3m16s   True    v1.21.7-r0-CCE22.3.1.B012
local1   80m     True    v1.22.1
# Unjoin
kubefedctl unjoin cce  --cluster-context cce   --host-cluster-context local   --v=2

In essence, kubefedctl join converts the configuration in the kubeconfig into a KubeFedCluster custom resource stored in the kube-federation-system namespace.
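
The READY column shown earlier can also be checked programmatically. Below is a minimal sketch, assuming the JSON printed by kubectl get kubefedclusters -n kube-federation-system -o json carries a standard Kubernetes conditions list under status.conditions with a condition of type Ready; verify that layout against your own -o yaml output before relying on it. The helper name not_ready and the sample data, including the cluster offline-demo, are hypothetical.

```python
def not_ready(cluster_list):
    """Return the names of KubeFedCluster items whose Ready condition is not "True"."""
    names = []
    for item in cluster_list.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.append(item["metadata"]["name"])
    return names


# Hypothetical, trimmed output for illustration; real objects contain more fields.
sample = {
    "items": [
        {"metadata": {"name": "cce"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "local1"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "offline-demo"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    ]
}

print(not_ready(sample))  # ['offline-demo']
```

To use it against a live cluster, pipe the kubectl JSON output into a script that passes json.load(sys.stdin) to not_ready.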

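Synchronizing resources across clusters

Enabling federation for a resource type

KubeFed manages resources at two levels: the resource types, and the federated resources themselves.

For resource types, kubefedctl provides the enable command, which makes a new resource type available for federation.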

kubefedctl enable <target kubernetes API type>
# For example, to enable federation of Deployments (already enabled by default)
kubefedctl enable   deploy
# To federate CRDs themselves, enable the crd type
kubefedctl enable  crd

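The target type can be given in any of the following forms:

Kind (for example Deployment)
Plural name (for example deployments)
Resource name qualified with its API group (for example deployment.apps)
Short name (for example deploy)

Because KubeFed manages resources through CRDs, each enable adds a corresponding federated CRD to the host cluster. After enabling a type such as VirtualService, for example, a CRD named federatedvirtualservices appears: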

kubectl get crd|grep federated

Deployment example

https://github.com/kubernetes-sigs/kubefed/tree/master/example/sample1

# Create the namespace on the host cluster
kubectl create ns  test-namespace
kubectl apply -f -<<EOF
apiVersion: types.kubefed.io/v1beta1
kind: FederatedNamespace
metadata:
  name: test-namespace
  namespace: test-namespace
spec:
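  placement: # propagates the resource to the clusters listed here; names must match those shown by kubectl get kubefedclusters (the join example above registered the local cluster as local1)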
    clusters:
    - name: local
    - name: cce
EOF
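# Both clusters now have this namespace
kubectl apply -f sample1/ # deploy all resources
# This reports an error because one of the resource types is not yet federated
unable to recognize "sample1/federated<type>.yaml": no matches for kind "Federated<type>" in version "types.kubefed.io/v1beta1"
# Enable federation for that type
kubefedctl enable ClusterRoleBinding
kubectl apply -f sample1/ # deploy all resources again
kubectl get all -n test-namespace # nginx should now be reachable on port 30080 in both clusters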

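Resource fields

op defines the operation to perform (add, remove and replace are supported):

replace replaces a value; if op is not specified, it defaults to replace
add adds a value to an object or array
remove removes a value from an object or array

path specifies a valid location in the managed resource to target for modification; it must start with a leading / and use / to separate entries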

kind: FederatedDeployment
...
spec:
  ...
  overrides:
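  # Override resources for the cce cluster only, instead of updating every cluster
    - clusterName: cce
      clusterOverrides:
        # Set replicas to 5
        - path: "/spec/replicas"
          value: 5
        # Change the image
        - path: "/spec/template/spec/containers/0/image"
          value: "nginx:1.17.0-alpine"
        # Add the annotation foo: bar (nothing changes if it already exists)
        - path: "/metadata/annotations"
          op: "add"
          value:
            foo: bar
        # Remove the annotation with key foo
        - path: "/metadata/annotations/foo"
          op: "remove"
        # Add the argument -q at index 0 of the args list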
        - path: "/spec/template/spec/containers/0/args/0"
          op: "add"
          value: "-q"
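
To make the op and path semantics concrete, here is a minimal sketch that applies the overrides listed above to a plain dictionary. It is a simplified model for illustration, not KubeFed's implementation: apply_overrides is a hypothetical helper, JSON Pointer escape sequences are ignored, and replace does not check that the target already exists. The sample Deployment is invented for the demo.

```python
import copy


def apply_overrides(obj, overrides):
    """Apply clusterOverrides-style patches to a copy of `obj`."""
    result = copy.deepcopy(obj)
    for override in overrides:
        op = override.get("op", "replace")  # op defaults to replace
        path = override["path"]
        if not path.startswith("/"):
            raise ValueError("path must start with '/'")
        keys = path[1:].split("/")
        # Walk to the parent of the target location.
        parent = result
        for key in keys[:-1]:
            parent = parent[int(key)] if isinstance(parent, list) else parent[key]
        last = keys[-1]
        if isinstance(parent, list):
            index = int(last)
            if op == "add":
                parent.insert(index, override["value"])  # shifts later items
            elif op == "remove":
                del parent[index]
            else:
                parent[index] = override["value"]
        elif op == "remove":
            del parent[last]
        else:
            # On a mapping, add and replace both set the key.
            parent[last] = override["value"]
    return result


# Hypothetical Deployment, trimmed to the fields the overrides touch.
deployment = {
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 3,
        "template": {"spec": {"containers": [{"image": "nginx", "args": ["-v"]}]}},
    },
}
overrides = [
    {"path": "/spec/replicas", "value": 5},
    {"path": "/spec/template/spec/containers/0/image", "value": "nginx:1.17.0-alpine"},
    {"path": "/metadata/annotations", "op": "add", "value": {"foo": "bar"}},
    {"path": "/metadata/annotations/foo", "op": "remove"},
    {"path": "/spec/template/spec/containers/0/args/0", "op": "add", "value": "-q"},
]

patched = apply_overrides(deployment, overrides)
container = patched["spec"]["template"]["spec"]["containers"][0]
print(patched["spec"]["replicas"], container["image"], container["args"])
# 5 nginx:1.17.0-alpine ['-q', '-v']
```

Because the annotation is added and then removed again, the patched object ends up with an empty annotations mapping.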

spec:
  placement: 
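    clusters: # explicit list of target clusters; when this field is present, clusterSelector is ignored, even if clusters is []
      - name: cluster2
      - name: cluster1
    clusterSelector: # selects clusters by label; if clusters is absent and clusterSelector is {}, the resource propagates to all clusters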
      matchLabels:
        foo: bar
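
The precedence between clusters and clusterSelector described in the comments above can be modelled in a few lines. This is a minimal sketch under stated assumptions: select_clusters is a hypothetical helper, only matchLabels is handled, and returning no clusters when neither field is present is an assumption that the text above does not state.

```python
def select_clusters(placement, cluster_labels):
    """Return the cluster names a placement selects.

    `cluster_labels` maps each registered cluster name to its labels.
    """
    if "clusters" in placement:
        # An explicit list wins, even when it is empty.
        wanted = [c["name"] for c in placement["clusters"]]
        return [name for name in wanted if name in cluster_labels]
    if "clusterSelector" in placement:
        match = placement["clusterSelector"].get("matchLabels", {})
        # An empty selector matches every cluster.
        return sorted(
            name for name, labels in cluster_labels.items()
            if all(labels.get(k) == v for k, v in match.items())
        )
    return []  # assumption: with neither field, nothing is propagated


# Hypothetical registered clusters and their labels.
registered = {"cluster1": {"foo": "bar"}, "cluster2": {}}

both = {
    "clusters": [{"name": "cluster2"}, {"name": "cluster1"}],
    "clusterSelector": {"matchLabels": {"foo": "bar"}},
}
print(select_clusters(both, registered))                       # ['cluster2', 'cluster1']
print(select_clusters({"clusters": [], "clusterSelector": {}}, registered))  # []
print(select_clusters({"clusterSelector": {"matchLabels": {"foo": "bar"}}}, registered))  # ['cluster1']
print(select_clusters({"clusterSelector": {}}, registered))    # ['cluster1', 'cluster2']
```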

ReplicaSchedulingPreference

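Controls the total number of replicas of a Deployment across all clusters and moves replicas to other clusters on failure.

Distribute the total replicas evenly across all available clusters: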

apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: test-deployment
  namespace: test-ns
spec:
  targetKind: FederatedDeployment
  totalReplicas: 8
# or, equivalently
apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: test-deployment
  namespace: test-ns
spec:
  targetKind: FederatedDeployment
  totalReplicas: 8
  clusters:
    "*":
      weight: 1
# With two clusters, each one runs 4 replicas

Distribute replicas in weighted proportions while enforcing per-cluster replica limits:

apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: test-deployment
  namespace: test-ns
spec:
  targetKind: FederatedDeployment
  totalReplicas: 9
  clusters:
    local:
      minReplicas: 4
      maxReplicas: 6
      weight: 1 
    cce:
      minReplicas: 4
      maxReplicas: 8
      weight: 2
# The local cluster gets 4 pods and cce gets 5

Distribute replicas evenly across all clusters, but with no more than 20 in local:

apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: test-deployment
  namespace: test-ns
spec:
  targetKind: FederatedDeployment
  totalReplicas: 50
  clusters:
    "*":
      weight: 1
    "local":
      maxReplicas: 20
      weight: 1
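
The arithmetic behind the three preferences in this section can be reproduced with a small model. The sketch below is a simplified illustration, not KubeFed's actual planner: plan_replicas is a hypothetical function, it assumes exactly two healthy clusters named local and cce, treats a missing weight as 0, and does not handle the case where the minimums exceed the total.

```python
def plan_replicas(total, prefs, clusters):
    """Split `total` replicas across `clusters` by weight, minReplicas and maxReplicas."""
    prefs = prefs or {"*": {"weight": 1}}  # no preferences: equal weights
    # Resolve each cluster's preference, falling back to the "*" entry.
    resolved = {}
    for name in sorted(clusters):
        pref = prefs.get(name, prefs.get("*"))
        if pref is not None:
            resolved[name] = pref
    total_weight = sum(p.get("weight", 0) for p in resolved.values())
    # Every cluster first receives its minimum.
    plan = {name: p.get("minReplicas", 0) for name, p in resolved.items()}
    remaining = total - sum(plan.values())
    while remaining > 0:
        # Clusters with a positive weight that are still below maxReplicas.
        candidates = [
            name for name, p in resolved.items()
            if p.get("weight", 0) > 0 and plan[name] < p.get("maxReplicas", total)
        ]
        if not candidates:
            break  # every cluster is capped; the rest stays unscheduled
        # Give one replica to the cluster furthest below its weighted share.
        # Integer math (scaled by total_weight) avoids floating-point ties.
        best = max(
            candidates,
            key=lambda n: resolved[n].get("weight", 0) * total - plan[n] * total_weight,
        )
        plan[best] += 1
        remaining -= 1
    return plan


members = ["local", "cce"]

even = plan_replicas(8, None, members)
weighted = plan_replicas(9, {
    "local": {"minReplicas": 4, "maxReplicas": 6, "weight": 1},
    "cce": {"minReplicas": 4, "maxReplicas": 8, "weight": 2},
}, members)
capped = plan_replicas(50, {
    "*": {"weight": 1},
    "local": {"maxReplicas": 20, "weight": 1},
}, members)

print(even)      # {'cce': 4, 'local': 4}
print(weighted)  # {'cce': 5, 'local': 4}
print(capped)    # {'cce': 30, 'local': 20}
```

In the capped case both clusters fill evenly up to 20 replicas, after which local is at its limit and the remaining 10 replicas go to cce.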

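Profiling for troubleshooting

pprof is a tool for visualizing and analyzing profiling data.

To collect KubeFed profiling data, you need access to the kubefed controller manager on port 8080. You can set up port forwarding to reach the pod's pprof debug endpoints.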

kubectl -n kube-federation-system port-forward pod/kubefed-controller-manager-XXXXX 8080:8080

In another terminal, use curl to collect the pprof profiles.

Collect the goroutine profile, the goroutine report with stack traces, and the full stack dump.

curl localhost:8080/debug/pprof/goroutine -o goroutine.pprof
curl "localhost:8080/debug/pprof/goroutine?debug=1" -o goroutine-debug-1.pprof
curl "localhost:8080/debug/pprof/goroutine?debug=2" -o goroutine-debug-2.pprof

Collect a 30-second CPU profile

curl localhost:8080/debug/pprof/profile -o cpu-profile.pprof

Collect the heap profile

curl localhost:8080/debug/pprof/heap -o heap.pprof
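
The curl commands above can be gathered into one script. This is a minimal sketch using only the Python standard library; it assumes the port-forward from the earlier step is still running on localhost:8080, and the names PROFILES, profile_urls and collect are illustrative. The endpoints and output file names are the ones used in the curl commands.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Output file name -> (pprof endpoint, query parameters), as in the curl commands.
PROFILES = {
    "goroutine.pprof": ("goroutine", {}),
    "goroutine-debug-1.pprof": ("goroutine", {"debug": 1}),
    "goroutine-debug-2.pprof": ("goroutine", {"debug": 2}),
    "cpu-profile.pprof": ("profile", {}),
    "heap.pprof": ("heap", {}),
}


def profile_urls(base="http://localhost:8080"):
    """Build the download URL for every profile."""
    urls = {}
    for filename, (endpoint, query) in PROFILES.items():
        url = f"{base}/debug/pprof/{endpoint}"
        if query:
            url += "?" + urlencode(query)
        urls[filename] = url
    return urls


def collect(base="http://localhost:8080", timeout=60):
    """Download every profile into the current directory.

    The CPU profile samples for about 30 seconds, so the timeout must be longer.
    """
    for filename, url in profile_urls(base).items():
        with urlopen(url, timeout=timeout) as response, open(filename, "wb") as out:
            out.write(response.read())
```

Calling collect() while the port-forward is active writes the five files; the binary ones can then be opened with go tool pprof.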