Deploying gitlab-runner on top of the Karmada cluster federation

Table of Contents

Background

Overview

Problem

Solution

karmada-controller source changes

gitlab-runner source changes

Summary


Background


After our company adopted karmada as the cluster federation to manage all member clusters and provide a single access entry point, business workloads began migrating to the federation one after another: Spark, GitLab CI, online services, and so on. Migrating the online services was fairly straightforward, but GitLab, Spark, and the like all ran into quite a few problems. Over the next few posts I will write down these problems and how we solved them.

Overview

This post records the problems encountered while migrating gitlab ci. gitlab-runner supports running CI jobs on kubernetes (see the documentation: Kubernetes executor | GitLab). We deploy the runner as a Deployment through the federation's apiserver, and the kubernetes host configured inside the runner is likewise the address of karmada-apiserver. When a CI job is triggered on gitlab, the runner pod starts a job pod to carry out the CI steps. Such a pod would ordinarily have three containers; in our environment it has only two, a build container and a helper container. Once the pod is up, the runner performs some preliminary steps, which require exec'ing into the containers to run scripts, collect logs, and so on.
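For concreteness, here is a minimal sketch of the relevant part of the runner configuration, with the kubernetes host pointed at karmada-apiserver. The address, token, namespace, and image below are placeholders for illustration, not values from our environment:

# config.toml (sketch): only the parts relevant to this setup are shown.
[[runners]]
  name     = "federated-runner"
  executor = "kubernetes"
  [runners.kubernetes]
    # Points at karmada-apiserver, not at any single member cluster's apiserver.
    host         = "https://karmada-apiserver.example.com:5443"
    bearer_token = "<token with pod create/exec permissions>"
    namespace    = "gitlab-runner"
    image        = "ubuntu:22.04"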

Problem

A pod created through karmada-apiserver does not actually run on any node known to the control plane: its status is synced back from the corresponding pod running in the member cluster, so the pod on the control plane has no host assignment (no hostIP in its status). As a result, when the gitlab-runner pod issues an exec API request, it fails with "pod xxx does not have a host assigned".
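The following minimal sketch reproduces the observation; the kubeconfig path and the example pod (nginx in namespace gitlab-runner) are assumptions taken from the examples later in this post:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client against karmada-apiserver (the path is an assumption from our setup).
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/karmada/karmada-apiserver.config")
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	pod, err := client.CoreV1().Pods("gitlab-runner").Get(context.TODO(), "nginx", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	// On the control plane both of these are empty, even though the synced
	// status reports the pod as Running; this is what makes exec fail.
	fmt.Printf("phase=%s nodeName=%q hostIP=%q\n", pod.Status.Phase, pod.Spec.NodeName, pod.Status.HostIP)
}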

To execute a command inside a pod through karmada-apiserver, you have to go through karmada's cluster proxy interface. On the command line this looks like:

kubectl karmada exec -it nginx sh -n gitlab-runner --cluster=east --kubeconfig /etc/karmada/karmada-apiserver.config

The underlying API URL is:

/apis/cluster.karmada.io/v1alpha1/clusters/east/proxy/api/v1/namespaces/gitlab-runner/pods/nginx/exec?command=sh&container=nginx&stdin=true&stdout=true&tty=true

Compared to the native kubernetes API path, the only difference is the added prefix /apis/cluster.karmada.io/v1alpha1/clusters/east/proxy. So we need to modify the gitlab-runner source wherever the executor and the log handling build their requests. There is one catch, though: the "east" in that prefix is the id of a specific member cluster in the federation. How do we find out, from karmada-apiserver, which member cluster a pod landed in? Stock karmada-controller has no such feature, so we also modify karmada-controller: for a pod submitted through karmada-apiserver, once the controller has finished processing the ResourceBinding, it stamps the pod with the id of the cluster it was dispatched to. gitlab-runner can then fetch the pod from karmada-apiserver, learn which member cluster it was scheduled to, and dynamically build the correct API URL for the exec and log calls.

Solution

Modify the source code of both karmada-controller and gitlab-runner.

karmada-controller source changes

Code file: github.com/karmada-io/karmada/pkg/controllers/execution/execution_controller.go

func (c *Controller) syncToClusters(clusterName string, work *workv1alpha1.Work) error {
	var errs []error
	syncSucceedNum := 0
	for _, manifest := range work.Spec.Workload.Manifests {
		workload := &unstructured.Unstructured{}
		err := workload.UnmarshalJSON(manifest.Raw)
		if err != nil {
			klog.Errorf("Failed to unmarshal workload of the work(%s/%s), error is: %v", work.GetNamespace(), work.GetName(), err)
			errs = append(errs, err)
			continue
		}

		if err = c.tryCreateOrUpdateWorkload(clusterName, workload); err != nil {
			klog.Errorf("Failed to create or update resource(%v/%v) in the given member cluster %s, err is %v", workload.GetNamespace(), workload.GetName(), clusterName, err)
			c.eventf(workload, corev1.EventTypeWarning, events.EventReasonSyncWorkloadFailed, "Failed to create or update resource(%s) in member cluster(%s): %v", klog.KObj(workload), clusterName, err)
			errs = append(errs, err)
			continue
		}
		// By the time a Pod's Work is synced to a member cluster, the scheduling
		// decision (which member cluster the Pod lands in) is already known, so
		// this is the natural place to record the target cluster on the Pod.
		// --- begin added logic ---
		if workload.GetKind() == "Pod" {
			// util.ClusterNameLabel = "cluster.karmada.io/name"
			// Best-effort: if the Pod cannot be fetched yet, labeling is skipped.
			_ = retry.RetryOnConflict(retry.DefaultRetry, func() error {
				var pod corev1.Pod
				if err := c.Client.Get(context.TODO(), client.ObjectKey{Namespace: workload.GetNamespace(), Name: workload.GetName()}, &pod); err != nil {
					return err
				}
				if pod.Labels == nil {
					pod.Labels = map[string]string{}
				}
				pod.Labels[util.ClusterNameLabel] = clusterName
				return c.Client.Update(context.TODO(), &pod)
			})
		}
		// --- end added logic ---
		c.eventf(workload, corev1.EventTypeNormal, events.EventReasonSyncWorkloadSucceed, "Successfully applied resource(%v/%v) to cluster %s", workload.GetNamespace(), workload.GetName(), clusterName)
		syncSucceedNum++
	}

	if len(errs) > 0 {
		total := len(work.Spec.Workload.Manifests)
		message := fmt.Sprintf("Failed to apply all manifests (%d/%d): %s", syncSucceedNum, total, errors.NewAggregate(errs).Error())
		err := c.updateAppliedCondition(work, metav1.ConditionFalse, "AppliedFailed", message)
		if err != nil {
			klog.Errorf("Failed to update applied status for given work %v, namespace is %v, err is %v", work.Name, work.Namespace, err)
			errs = append(errs, err)
		}
		return errors.NewAggregate(errs)
	}

	err := c.updateAppliedCondition(work, metav1.ConditionTrue, "AppliedSuccessful", "Manifest has been successfully applied")
	if err != nil {
		klog.Errorf("Failed to update applied status for given work %v, namespace is %v, err is %v", work.Name, work.Namespace, err)
		return err
	}

	return nil
}
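If the patch works as intended, a pod fetched back through karmada-apiserver now carries the cluster label once its Work has been synced. This can be spot-checked with `kubectl --kubeconfig /etc/karmada/karmada-apiserver.config get pod nginx -n gitlab-runner --show-labels`, which should list something like cluster.karmada.io/name=east. The gitlab-runner changes below rely on exactly this label.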

gitlab-runner source changes

Code file: github.com/gitlabhq/gitlab-runner/executors/kubernetes/exec.go

func (p *AttachOptions) Run() error {
	// TODO: handle the context properly with https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27932
	pod, err := p.Client.CoreV1().Pods(p.Namespace).Get(p.Context, p.PodName, metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("couldn't get pod details: %w", err)
	}

	if pod.Status.Phase != api.PodRunning {
		return fmt.Errorf(
			"pod %q (on namespace %q) is not running and cannot execute commands; current phase is %q",
			p.PodName, p.Namespace, pod.Status.Phase,
		)
	}

	// Ending with a newline is important to actually run the script
	stdin := strings.NewReader(strings.Join(p.Command, " ") + "\n")

	req := p.Client.CoreV1().RESTClient().Post()
	// Route through the karmada cluster proxy when the pod carries the cluster
	// label set by our patched karmada-controller; otherwise this is the stock
	// request against a plain kubernetes apiserver.
	if clusterName := pod.ObjectMeta.Labels["cluster.karmada.io/name"]; clusterName != "" {
		req = req.AbsPath(fmt.Sprintf("/apis/cluster.karmada.io/v1alpha1/clusters/%s/proxy/api/v1", clusterName))
	}
	req = req.
		Resource("pods").
		Name(pod.Name).
		Namespace(pod.Namespace).
		SubResource("attach").
		VersionedParams(&api.PodAttachOptions{
			Container: p.ContainerName,
			Stdin:     true,
			Stdout:    false,
			Stderr:    false,
			TTY:       false,
		}, scheme.ParameterCodec)

	return p.Executor.Execute(p.Context, http.MethodPost, req.URL(), p.Config, stdin, nil, nil, false)
}
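The same cluster-proxy branch recurs in every call site patched below (attach, exec, log streaming, terminal). If you prefer to avoid the duplication, a small shared helper along the following lines would work; this is only a suggestion of ours and exists in neither codebase:

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	restclient "k8s.io/client-go/rest"
)

// clusterScopedPost returns a POST request builder that is routed through the
// karmada cluster proxy when clusterName is non-empty, and is exactly the
// native request otherwise.
func clusterScopedPost(c *kubernetes.Clientset, clusterName string) *restclient.Request {
	req := c.CoreV1().RESTClient().Post()
	if clusterName != "" {
		req = req.AbsPath(fmt.Sprintf("/apis/cluster.karmada.io/v1alpha1/clusters/%s/proxy/api/v1", clusterName))
	}
	return req
}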

// ExecOptions declare the arguments accepted by the Exec command
type ExecOptions struct {
	Namespace     string
	PodName       string
	ContainerName string
	// Added field: the member cluster the pod was scheduled to.
	ClusterName   string
	Stdin         bool
	Command       []string

	In  io.Reader
	Out io.Writer
	Err io.Writer

	Executor RemoteExecutor
	Client   *kubernetes.Clientset
	Config   *restclient.Config

	Context context.Context
}

// Run executes a validated remote execution against a pod.
func (p *ExecOptions) Run() error {
	// TODO: handle the context properly with https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27932
	pod, err := p.Client.CoreV1().Pods(p.Namespace).Get(p.Context, p.PodName, metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("couldn't get pod details: %w", err)
	}

	if pod.Status.Phase != api.PodRunning {
		return fmt.Errorf(
			"pod %q (on namespace '%s') is not running and cannot execute commands; current phase is %q",
			p.PodName, p.Namespace, pod.Status.Phase,
		)
	}

	if p.ContainerName == "" {
		logrus.Infof("defaulting container name to '%s'", pod.Spec.Containers[0].Name)
		p.ContainerName = pod.Spec.Containers[0].Name
	}
    
	// Pick up the member cluster recorded on the pod by the patched controller.
	if p.ClusterName == "" && pod.ObjectMeta.Labels["cluster.karmada.io/name"] != "" {
		p.ClusterName = pod.ObjectMeta.Labels["cluster.karmada.io/name"]
	}

	return p.executeRequest()
}

func (p *ExecOptions) executeRequest() error {
	// Branch on ClusterName: when it is empty (no karmada involved) the request
	// is identical to the stock runner's, so the change stays fully backward
	// compatible with a plain kubernetes apiserver.
	req := p.Client.CoreV1().RESTClient().Post()
	if p.ClusterName != "" {
		req = req.AbsPath(fmt.Sprintf("/apis/cluster.karmada.io/v1alpha1/clusters/%s/proxy/api/v1", p.ClusterName))
	}
	req = req.
		Resource("pods").
		Name(p.PodName).
		Namespace(p.Namespace).
		SubResource("exec").
		Param("container", p.ContainerName)

	var stdin io.Reader
	if p.Stdin {
		stdin = p.In
	}

	req.VersionedParams(&api.PodExecOptions{
		Container: p.ContainerName,
		Command:   p.Command,
		Stdin:     stdin != nil,
		Stdout:    p.Out != nil,
		Stderr:    p.Err != nil,
	}, scheme.ParameterCodec)

	return p.Executor.Execute(p.Context, http.MethodPost, req.URL(), p.Config, stdin, p.Out, p.Err, false)
}
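Note the compatibility property this preserves: ClusterName is only ever non-empty when the pod actually carries the cluster.karmada.io/name label, so a runner built from this patch behaves exactly like the stock runner when it talks to a plain kubernetes apiserver.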

Code file: github.com/gitlabhq/gitlab-runner/executors/kubernetes/kubernetes.go

func (s *executor) runInContainerWithExec(
	ctx context.Context,
	name string,
	command []string,
	script string,
) <-chan error {
	errCh := make(chan error, 1)
	go func() {
		defer close(errCh)

		var out io.Writer = s.Trace
		if s.Build.IsFeatureFlagOn(featureflags.PrintPodEvents) {
			out = io.Discard
		}

		status, err := waitForPodRunning(ctx, s.kubeClient, s.pod, out, s.Config.Kubernetes)
		if err != nil {
			errCh <- err
			return
		}

		if status != api.PodRunning {
			errCh <- fmt.Errorf("pod failed to enter running state: %s", status)
			return
		}

		exec := ExecOptions{
			PodName:       s.pod.Name,
			Namespace:     s.pod.Namespace,
			ContainerName: name,
			// Set the member cluster. Note: the patched controller records it as a
			// label, not an annotation, so it must be read from Labels here.
			ClusterName:   s.pod.ObjectMeta.Labels["cluster.karmada.io/name"],
			Command:       command,
			In:            strings.NewReader(script),
			Out:           s.Trace,
			Err:           s.Trace,
			Stdin:         true,
			Config:        s.kubeConfig,
			Client:        s.kubeClient,
			Executor:      &DefaultRemoteExecutor{},

			Context: ctx,
		}

		kubeRequest := newRetryableKubeAPICallWithValue(func() (any, error) {
			return nil, exec.Run()
		})
		errCh <- kubeRequest.Run()
	}()

	return errCh
}


func newExecutor() *executor {
	e := &executor{
		AbstractExecutor: executors.AbstractExecutor{
			ExecutorOptions: executorOptions,
		},
		remoteProcessTerminated: make(chan shells.StageCommandStatus),
	}

	e.newLogProcessor = func() logProcessor {
		return newKubernetesLogProcessor(
			e.kubeClient,
			e.kubeConfig,
			&backoff.Backoff{Min: time.Second, Max: 30 * time.Second},
			e.Build.Log(),
			kubernetesLogProcessorPodConfig{
				namespace:          e.pod.Namespace,
				pod:                e.pod.Name,
				container:          helperContainerName,
				logPath:            e.logFile(),
				// Propagate the member cluster label so log streaming also goes
				// through the karmada cluster proxy.
				clusterName:        e.pod.ObjectMeta.Labels["cluster.karmada.io/name"],
				waitLogFileTimeout: waitLogFileTimeout,
			},
		)
	}

	return e
}
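One subtlety: newLogProcessor is a closure, and e.pod is still nil when newExecutor runs; the label lookup only happens when the closure is invoked for a job, by which point the pod should have been created and synced back from karmada-apiserver. If the label is absent for any reason, clusterName is simply empty and log streaming falls back to the native request path.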

Code file: github.com/gitlabhq/gitlab-runner/executors/kubernetes/log_processor.go

type kubernetesLogProcessorPodConfig struct {
	namespace          string
	pod                string
	container          string
	logPath            string
	// Added field: the member cluster the pod was scheduled to.
	clusterName        string
	waitLogFileTimeout time.Duration
}


func (s *kubernetesLogStreamer) Stream(ctx context.Context, offset int64, output io.Writer) error {
	exec := ExecOptions{
		Namespace:     s.namespace,
		PodName:       s.pod,
		ContainerName: s.container,
		// Set the member cluster so executeRequest builds the cluster-proxy URL.
		ClusterName:   s.kubernetesLogProcessorPodConfig.clusterName,
		Stdin:         false,
		Command: []string{
			"gitlab-runner-helper",
			"read-logs",
			"--path",
			s.logPath,
			"--offset",
			strconv.FormatInt(offset, 10),
			"--wait-file-timeout",
			s.waitLogFileTimeout.String(),
		},
		Out:      output,
		Err:      output,
		Executor: s.executor,
		Client:   s.client,
		Config:   s.clientConfig,

		Context: ctx,
	}

	return exec.executeRequest()
}
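Stream calls executeRequest directly rather than Run, so it never re-fetches the pod from the apiserver; that is why clusterName has to be plumbed through kubernetesLogProcessorPodConfig instead of being re-read from the pod's labels at this point.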

Code file: github.com/gitlabhq/gitlab-runner/executors/kubernetes/terminal.go

func (s *executor) getTerminalWebSocketURL() *url.URL {
	req := s.kubeClient.CoreV1().RESTClient().Post()
	// Route the terminal session through the karmada cluster proxy when the
	// pod carries the cluster label.
	if clusterName := s.pod.ObjectMeta.Labels["cluster.karmada.io/name"]; clusterName != "" {
		req = req.AbsPath(fmt.Sprintf("/apis/cluster.karmada.io/v1alpha1/clusters/%s/proxy/api/v1", clusterName))
	}
	wsURL := req.
		Namespace(s.pod.Namespace).
		Resource("pods").
		Name(s.pod.Name).
		SubResource("exec").
		VersionedParams(&api.PodExecOptions{
			Stdin:     true,
			Stdout:    true,
			Stderr:    true,
			TTY:       true,
			Container: "build",
			Command:   []string{"sh", "-c", "bash || sh"},
		}, scheme.ParameterCodec).URL()

	wsURL.Scheme = proxy.WebsocketProtocolFor(wsURL.Scheme)
	return wsURL
}

Summary

There are many open-source cluster-federation frameworks today, and all of them claim full compatibility with the native kubernetes API. In practice the API compatibility is real, but open-source applications that were adapted to cloud native (kubernetes) environments almost always assume that all of their components run inside a single cluster. Once such an application moves onto a cluster federation, it runs into all kinds of problems, many of which can only be solved by getting familiar with its source code and patching it, so the cost of operating and using a cluster federation is considerable. gitlab-runner is just one example. Spark on kubernetes, for instance, works fine inside a single cluster, but on a federation the executor pods and the driver pod face cross-cluster service discovery and container network connectivity problems, which makes a smooth migration to a federated environment outright impossible. So adopting a cluster federation deserves careful consideration: understand the characteristics of your workloads, whether they will break on a federation, and whether the cost of adapting them is justified by the benefit.
