(4) Chaining MLOps Pipelines with Jenkins Workflows

Contents

Building Jenkins Workflows

Workflow 1

Workflow 2

Workflow 3

Workflow 5

Triggering Workflows with GitHub Webhooks

Next Steps


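In this article, we build four automated Jenkins workflows.

In a previous series of articles, we explained how to write the scripts that run in our group of Docker containers as part of a CI/CD MLOps pipeline. In this series, we set up a Google Kubernetes Engine (GKE) cluster to deploy these containers.

This series assumes that you are familiar with the basics of deep learning, DevOps, Jenkins, and Kubernetes.

In the previous article of this series, we configured Jenkins to help us chain the Docker containers into an actual pipeline, in which the containers are built, pushed, and run automatically in the right order. In this article, we will build the following Jenkins workflows (the steps required to implement the Jenkins pipelines):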

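  • If a push is detected in the AutomaticTraining-CodeCommit repository (Pipeline 1 in the diagram), immediately pull the code and build a container (2) with it, push the image to Google Container Registry (GCR), and use this image to start a training job in Google Kubernetes Engine. Once the training has ended, push the trained model to our GCS/testing registry. Next, pull the AutomaticTraining-UnitTesting repository and build a container (3) with it. Follow the same process to test the model previously saved in the model testing registry. Send a notification with the pipeline result. If the result is positive, start a semi-automated deployment to production (4), then deploy the container that serves as the prediction service API (5) and, optionally, an interface (6).
  • If a push is detected in the AutomaticTraining-Dataset repository (Pipeline 2 in the diagram), immediately pull it, then pull the AutomaticTraining-DataCommit repository, which holds the code required to build this container (2). Use it to retrain the model stored in GCS (if the previous pipeline has already been triggered), and save the model again if a certain performance metric is reached. Afterwards, trigger the UnitTesting step mentioned above (3) and repeat the cycle (4).
  • If a push is detected in the AutomaticTraining-UnitTesting repository (Pipeline 3 in the diagram), pull it and build a container (3) with it to test the model that resides in the GCS/testing registry. The goal of this pipeline is to let data scientists integrate new tests into the recently deployed model without repeating the previous workflows.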
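The trigger relationships listed above can be summarized as a small lookup table. The following Python sketch is only an illustration of that chaining logic, not part of the Jenkins setup itself. The job names workflow-1-training and workflow-2-retraining are hypothetical placeholders (use whatever names you give your Jenkins items); only AutomaticTraining-UnitTesting appears as a job name in the pipeline scripts.

```python
from typing import Dict, List

# Repository whose push starts a workflow -> name of the Jenkins job it starts.
# "workflow-1-training" and "workflow-2-retraining" are hypothetical job names.
PUSH_TRIGGERS: Dict[str, str] = {
    "AutomaticTraining-CodeCommit": "workflow-1-training",
    "AutomaticTraining-Dataset": "workflow-2-retraining",
    "AutomaticTraining-UnitTesting": "AutomaticTraining-UnitTesting",
}

# Job that a workflow starts when it ends successfully.
ON_SUCCESS: Dict[str, str] = {
    "workflow-1-training": "AutomaticTraining-UnitTesting",
    "workflow-2-retraining": "AutomaticTraining-UnitTesting",
}


def workflow_chain(repository: str) -> List[str]:
    """Return the ordered jobs started by a push to `repository`."""
    chain: List[str] = []
    job = PUSH_TRIGGERS.get(repository)
    while job is not None:
        chain.append(job)
        job = ON_SUCCESS.get(job)  # None ends the chain
    return chain


if __name__ == "__main__":
    for repo in PUSH_TRIGGERS:
        print(repo, "->", " -> ".join(workflow_chain(repo)))
```

Note that AutomaticTraining-DataCommit is absent from the table on purpose: it is only pulled during Workflow 2 and a push to it does not start anything.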

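Building Jenkins Workflows

To get our 3 Jenkins pipelines, we need to develop 6 underlying workflows, which are Jenkins Pipeline scripts that carry out specific tasks. Let's build workflows 1, 2, 3, and 5. Workflow 4 is slightly more complex, so we will discuss it in the next article.

Workflow 1

On the Jenkins dashboard, select New Item, give the project a name, select Pipeline, and click OK.

On the next page, select Configure from the left-hand menu. In the Build Triggers section, select the GitHub hook trigger for GITScm polling checkbox. This allows the workflow to be triggered by GitHub pushes.

Scroll down and paste the following script (which handles this workflow's execution) into the Pipeline section: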

properties([pipelineTriggers([githubPush()])])
pipeline {
    agent any
    environment {
        PROJECT_ID = 'automatictrainingcicd'
        CLUSTER_NAME = 'training-cluster'
        LOCATION = 'us-central1-a'
        CREDENTIALS_ID = 'AutomaticTrainingCICD'
    }
    stages {
        stage('Cloning our GitHub repo') {
          steps {
            checkout([
              $class: 'GitSCM',
              branches: [[name: 'main']],
              userRemoteConfigs: [[
                url: 'https://github.com/sergiovirahonda/AutomaticTraining-CodeCommit.git',
                credentialsId: '',
              ]]
             ])
           }
        }
        stage('Building and pushing image to GCR') {
            steps {
                script {
                    docker.withRegistry('https://gcr.io', 'gcr:AutomaticTrainingCICD') {
                        app = docker.build('automatictrainingcicd/code-commit:latest')
                        app.push("latest")
                    }
                }
            }
        }
        stage('Deploying to GKE') {
            steps{
                step([$class: 'KubernetesEngineBuilder', projectId: env.PROJECT_ID, clusterName: env.CLUSTER_NAME, location: env.LOCATION, manifestPattern: 'pod.yaml', credentialsId: env.CREDENTIALS_ID, verifyDeployments: true])
            }
        }
    }
    post {
        unsuccessful {
            echo 'The Jenkins pipeline execution has failed.'
            emailext body: "The '${env.JOB_NAME}' job has failed during its execution. Check the logs for more information.", recipientProviders: [[$class: 'DevelopersRecipientProvider'], [$class: 'RequesterRecipientProvider']], subject: 'A Jenkins pipeline execution has failed.'
        }
        success {
            echo 'The Jenkins pipeline execution has ended successfully, triggering the next one.'
            build job: 'AutomaticTraining-UnitTesting', propagate: true, wait: false
        }
    }
}

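Let's look at the key components of the code above:

  • properties([pipelineTriggers([githubPush()])]) indicates that the workflow is triggered by a push to the repository mentioned in the code.
  • environment defines the environment variables used during the workflow execution. These are the variables required to work with GCP.
  • stage('Cloning our GitHub repo') pulls the AutomaticTraining-CodeCommit repository and defines it as the repository that triggers the workflow execution.
  • stage('Building and pushing image to GCR') builds the container using the Dockerfile downloaded from that repository and pushes the image to GCR.
  • stage('Deploying to GKE') creates the Kubernetes job on GKE using the container image recently pushed to GCR, as defined in the pod.yaml file that is also downloaded from the repository.
  • If the job ends successfully, it triggers the AutomaticTraining-UnitTesting workflow (3); otherwise, it notifies the product owner by email.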

Workflow 2

Follow the same steps you took to create Workflow 1. Enter the following script to build this pipeline:

properties([pipelineTriggers([githubPush()])])
pipeline {
    agent any
    environment {
        PROJECT_ID = 'automatictrainingcicd'
        CLUSTER_NAME = 'training-cluster'
        LOCATION = 'us-central1-a'
        CREDENTIALS_ID = 'AutomaticTrainingCICD'
    }
    stages {
        stage('Webhook trigger received. Cloning 1st repository.') {
          steps {
            checkout([
              $class: 'GitSCM',
              branches: [[name: 'main']],
              userRemoteConfigs: [[
                url: 'https://github.com/sergiovirahonda/AutomaticTraining-Dataset.git',
                credentialsId: '',
              ]]
             ])
           }
        }
        stage('Cloning GitHub repo that contains Dockerfile.') {
            steps {
                git url: 'https://github.com/sergiovirahonda/AutomaticTraining-DataCommit.git', branch: 'main'
            }
        }
        stage('Building and pushing image') {
            steps {
                script {
                    docker.withRegistry('https://gcr.io', 'gcr:AutomaticTrainingCICD') {
                        app = docker.build('automatictrainingcicd/data-commit:latest')
                        app.push("latest")
                    }
                }
            }
        }
        stage('Deploying to GKE') {
            steps{
                step([$class: 'KubernetesEngineBuilder', projectId: env.PROJECT_ID, clusterName: env.CLUSTER_NAME, location: env.LOCATION, manifestPattern: 'pod.yaml', credentialsId: env.CREDENTIALS_ID, verifyDeployments: true])
            }
        }
    }
    post {
        unsuccessful {
            echo 'The Jenkins pipeline execution has failed.'
            emailext body: "The '${env.JOB_NAME}' job has failed during its execution. Check the logs for more information.", recipientProviders: [[$class: 'DevelopersRecipientProvider'], [$class: 'RequesterRecipientProvider']], subject: 'A Jenkins pipeline execution has failed.'
        }
        success {
            echo 'The Jenkins pipeline execution has ended successfully, triggering the next one.'
            build job: 'AutomaticTraining-UnitTesting', propagate: true, wait: false
        }
    }
}

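The script above performs almost the same process as Workflow 1, except that it builds the container with the code from the AutomaticTraining-DataCommit repository. In addition, this pipeline is triggered by any push to the AutomaticTraining-Dataset repository. Finally, if it succeeds, it triggers the AutomaticTraining-UnitTesting workflow (3).

Workflow 3

This workflow runs the unit tests against the model available in the GCS/testing registry. It is triggered when changes are pushed to the AutomaticTraining-UnitTesting repository. It can also be triggered by Workflow 1 or Workflow 2. The script that builds this pipeline is as follows: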

properties([pipelineTriggers([githubPush()])])
pipeline {
    agent any
    environment {
        PROJECT_ID = 'automatictrainingcicd'
        CLUSTER_NAME = 'training-cluster'
        LOCATION = 'us-central1-a'
        CREDENTIALS_ID = 'AutomaticTrainingCICD'
    }
    stages {
        stage('Awaiting for previous training to be completed.'){
            steps{
                echo "Initializing prudential time"
                sleep(1200)
                echo "Ended"
            }
        }
        stage('Cloning our GitHub repo') {
          steps {
            checkout([
              $class: 'GitSCM',
              branches: [[name: 'main']],
              userRemoteConfigs: [[
                url: 'https://github.com/sergiovirahonda/AutomaticTraining-UnitTesting.git',
                credentialsId: '',
              ]]
             ])
           }
        }
        stage('Building and pushing image') {
            steps {
                script {
                    docker.withRegistry('https://gcr.io', 'gcr:AutomaticTrainingCICD') {
                        app = docker.build('automatictrainingcicd/unit-testing:latest')
                        app.push("latest")
                    }
                }
            }
        }
        stage('Deploying to GKE') {
            steps{
                step([$class: 'KubernetesEngineBuilder', projectId: env.PROJECT_ID, clusterName: env.CLUSTER_NAME, location: env.LOCATION, manifestPattern: 'pod.yaml', credentialsId: env.CREDENTIALS_ID, verifyDeployments: true])
            }
        }
    }
    post {
        unsuccessful {
            echo 'The Jenkins pipeline execution has failed.'
            emailext body: "The '${env.JOB_NAME}' job has failed during its execution. Check the logs for more information.", recipientProviders: [[$class: 'DevelopersRecipientProvider'], [$class: 'RequesterRecipientProvider']], subject: 'A Jenkins pipeline execution has failed.'
        }
        success {
            echo 'The Jenkins pipeline execution has ended successfully, check the GCP logs for more information.'
        }
    }
}

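The script above differs slightly from the previous ones. Before it starts, it waits 1,200 seconds to give the training (from Workflow 1 or 2) enough time to finish. Also, once the model unit testing is done, the result is emailed to the product owner, letting them know whether they need to launch the semi-automated deployment to production (4).

Workflow 5

This workflow pulls the code from the AutomaticTraining-PredictionAPI repository (5) and builds the container that provides the prediction service. This container loads the model from the production registry, where the semi-automated deployment to production (4) has copied it, and accepts POST requests, responding with predictions in JSON format. The workflow script is as follows: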

pipeline {
    agent any
    environment {
        PROJECT_ID = 'automatictrainingcicd'
        CLUSTER_NAME = 'training-cluster'
        LOCATION = 'us-central1-a'
        CREDENTIALS_ID = 'AutomaticTrainingCICD'
    }
    stages {
        stage('Cloning our Git') {
            steps {
                git url: 'https://github.com/sergiovirahonda/AutomaticTraining-PredictionAPI.git', branch: 'main'
            }
        }
        stage('Building and deploying image') {
            steps {
                script {
                    docker.withRegistry('https://gcr.io', 'gcr:AutomaticTrainingCICD') {
                        app = docker.build('automatictrainingcicd/prediction-api')
                        app.push("latest")
                    }
                }
            }
        }
        stage('Deploying to GKE') {
            steps{
                step([$class: 'KubernetesEngineBuilder', projectId: env.PROJECT_ID, clusterName: env.CLUSTER_NAME, location: env.LOCATION, manifestPattern: 'pod.yaml', credentialsId: env.CREDENTIALS_ID, verifyDeployments: true])
            }
        }
    }
    post {
        unsuccessful {
            echo 'The Jenkins pipeline execution has failed.'
            emailext body: "The '${env.JOB_NAME}' job has failed during its execution. Check the logs for more information.", recipientProviders: [[$class: 'DevelopersRecipientProvider'], [$class: 'RequesterRecipientProvider']], subject: 'A Jenkins pipeline execution has failed.'
        }
        success {
            echo 'The Jenkins pipeline execution has ended successfully, triggering the next one.'
            build job: 'AutomaticTraining-Interface', propagate: true, wait: false
        }
    }
}

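At the end, the script triggers a workflow called AutomaticTraining-Interface. This is a bonus that provides a web interface for the end user. We won't discuss it in this series; however, you can find all the relevant files in the workflow's repository.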
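To make the prediction service from Workflow 5 more concrete, here is a minimal Python sketch of how a client could prepare a POST request carrying JSON. Everything specific in it is an assumption made for illustration: the article does not give the endpoint address, the URL path, or the request schema, so the /predict path, the example address, and the "instances" field are hypothetical and must be replaced with whatever the AutomaticTraining-PredictionAPI code actually expects.

```python
import json
import urllib.request
from typing import Any, List


def build_prediction_request(endpoint: str, instances: List[Any]) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request for the prediction service.

    The {"instances": ...} body layout is a hypothetical schema, not the one
    defined by the AutomaticTraining-PredictionAPI repository.
    """
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical address; use the external IP of your GKE service instead.
    request = build_prediction_request("http://203.0.113.10/predict", [[0.1, 0.2, 0.3]])
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Sending the request is then a matter of passing it to urllib.request.urlopen and decoding the JSON response, once the real endpoint and schema are known.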

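Triggering Workflows with GitHub Webhooks

If you run Jenkins locally within a private network, you need to install SocketXP or a similar service. This exposes the Jenkins server, reachable at http://localhost:8080, to the outside world, including GitHub.

To install SocketXP, select your OS type and follow the instructions on the website.

To create a secure tunnel for Jenkins, issue the following command: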

socketxp connect http://localhost:8080

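The response gives you a public URL.

To trigger our workflows on the local Jenkins server, we need to create GitHub webhooks. Go to the repository that triggers the workflow and select Settings > Webhooks > Add webhook. Paste the public URL into the Payload URL field, append "/github-webhook/" to it, and click Add webhook.

Once this is done, you should be able to trigger the workflow whenever new code is pushed to the corresponding repository.

Configure webhooks for the AutomaticTraining-CodeCommit, AutomaticTraining-Dataset, and AutomaticTraining-UnitTesting repositories so that continuous integration is triggered correctly in our pipeline.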
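Since the same Payload URL has to be entered for three repositories, a small helper can keep it consistent. The Python sketch below only builds the string described above (public tunnel URL plus "/github-webhook/"); it does not call GitHub or SocketXP. The tunnel hostname in the usage example is a made-up placeholder, as the real one comes from the socketxp connect output.

```python
from urllib.parse import urlsplit, urlunsplit

# Path that the Payload URL must end with, as described in the text above.
GITHUB_WEBHOOK_PATH = "/github-webhook/"


def payload_url(public_url: str) -> str:
    """Build the webhook Payload URL from the tunnel's public URL."""
    parts = urlsplit(public_url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {public_url!r}")
    # Drop any trailing slash so the result never contains "//github-webhook/".
    base_path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, base_path + GITHUB_WEBHOOK_PATH, "", ""))


if __name__ == "__main__":
    # Placeholder hostname; substitute the public URL printed by the tunnel.
    print(payload_url("https://my-tunnel.example.com/"))
```

The trailing slash in "/github-webhook/" is kept deliberately, because the text above specifies the path in exactly that form.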

Next Steps

In the next article, we will develop a script for the semi-automated deployment to production, which will complete our project.

Chaining MLOps Pipelines with Jenkins Workflows - CodeProject
