(4) Chaining MLOps Pipelines with Jenkins Workflows

Latest recommended article published on 2024-08-26 18:23:25

寒冰屋

Category: Docker. Tags: jenkins, docker, MLOps

Original article: https://www.codeproject.com/Articles/5302284/Chaining-MLOps-Pipelines-with-Jenkins-Workflows

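Here we build four automated Jenkins workflows.

In a previous series of articles, we explained how to code the scripts to be executed in our group of Docker containers as part of a CI/CD MLOps pipeline. In this series, we'll set up a Google Kubernetes Engine (GKE) cluster to deploy these containers.

This article series assumes that you are familiar with the basics of Deep Learning, DevOps, Jenkins, and Kubernetes.

In the previous article of this series, we configured Jenkins to help us chain Docker containers into an actual pipeline, where the containers are automatically built, pushed, and run in the right order. In this article, we'll build the following Jenkins workflows (the steps required to implement the Jenkins pipelines):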

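If a push is detected in the AutomaticTraining-CodeCommit repository (Pipeline 1 in the diagram below), immediately pull the code and build a container with it (2), push it to Google Container Registry (GCR), and use this image to start a training job in Google Kubernetes Engine. Once training ends, push the trained model to our GCS /testing registry. Next, pull the AutomaticTraining-UnitTesting repository and build a container with it (3). Follow the same process to test the model previously saved in the model testing registry. Send a notification with the pipeline results. If the results are positive, start the semi-automated deployment to production (4), deploying the container that serves as the prediction service API (5) and, optionally, the interface (6).
If a push is detected in the AutomaticTraining-Dataset repository (Pipeline 2 in the diagram below), immediately pull it, then pull the AutomaticTraining-DataCommit repository, which holds the code needed to build this container (2). Use it to retrain the model stored in GCS (if the previous pipeline has already been triggered), and save the model again if it reaches a certain performance metric. Afterwards, trigger the UnitTesting step mentioned earlier (3) and repeat the cycle (4).
If a push is detected in the AutomaticTraining-UnitTesting repository (Pipeline 3 in the diagram below), pull it and build a container with it (3) to test the model residing in the GCS /testing registry. The goal of this pipeline is to let data scientists integrate new tests into a recently deployed model without repeating the previous workflows.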
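
To make the chaining easier to follow, here is a minimal Python sketch of which workflows run automatically after a push to each repository, based only on the three descriptions above. The workflow numbers are the ones used in the rest of this article. The PUSH_CHAINS table and the workflows_for_push helper are hypothetical names introduced for illustration; Jenkins handles this routing itself, so nothing like this script exists in the project.

```python
from typing import Dict, List

# Hypothetical lookup table (not part of the project): the workflows that run
# automatically, in order, after a push to each repository.
PUSH_CHAINS: Dict[str, List[str]] = {
    "AutomaticTraining-CodeCommit": ["Workflow 1", "Workflow 3"],
    "AutomaticTraining-Dataset": ["Workflow 2", "Workflow 3"],
    "AutomaticTraining-UnitTesting": ["Workflow 3"],
}


def workflows_for_push(repository: str) -> List[str]:
    """Return the workflows set off by a push, or an empty list if none."""
    return list(PUSH_CHAINS.get(repository, []))


if __name__ == "__main__":
    for repo in PUSH_CHAINS:
        print(repo, "->", " -> ".join(workflows_for_push(repo)))
```

The semi-automated deployment (4) is deliberately left out of the table, because it starts only after someone has reviewed the test results.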

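Building Jenkins Workflows

To get our 3 Jenkins pipelines, we need to develop 6 underlying workflows, which are Jenkins pipeline scripts that perform certain tasks. Let's build workflows 1, 2, 3, and 5. Workflow 4 is a bit more complex; we'll discuss it in the next article.

Workflow 1

On the Jenkins dashboard, select New Item, give the project a name, select Pipeline, and click OK.

On the next page, select Configure from the left-hand menu. In the Build Triggers section, select the GitHub hook trigger for GITScm polling check box. This enables the workflow to be triggered by GitHub pushes.

Scroll down and paste the following script, which handles this workflow's execution, into the Pipeline section: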

// Trigger this workflow on GitHub push events.
properties([pipelineTriggers([githubPush()])])
pipeline {
    agent any
    // GCP settings used by the GKE deployment stage.
    environment {
        PROJECT_ID = 'automatictrainingcicd'
        CLUSTER_NAME = 'training-cluster'
        LOCATION = 'us-central1-a'
        CREDENTIALS_ID = 'AutomaticTrainingCICD'
    }
    stages {
        // Pull the repository whose pushes trigger this workflow.
        stage('Cloning our GitHub repo') {
            steps {
                checkout([
                    $class: 'GitSCM',
                    branches: [[name: 'main']],
                    userRemoteConfigs: [[
                        url: 'https://github.com/sergiovirahonda/AutomaticTraining-CodeCommit.git',
                        credentialsId: '',
                    ]]
                ])
            }
        }
        // Build the image from the repository's Dockerfile and push it to GCR.
        stage('Building and pushing image to GCR') {
            steps {
                script {
                    docker.withRegistry('https://gcr.io', 'gcr:AutomaticTrainingCICD') {
                        app = docker.build('automatictrainingcicd/code-commit:latest')
                        app.push("latest")
                    }
                }
            }
        }
        // Apply pod.yaml (also pulled from the repository) to the GKE cluster.
        stage('Deploying to GKE') {
            steps {
                step([$class: 'KubernetesEngineBuilder', projectId: env.PROJECT_ID, clusterName: env.CLUSTER_NAME, location: env.LOCATION, manifestPattern: 'pod.yaml', credentialsId: env.CREDENTIALS_ID, verifyDeployments: true])
            }
        }
    }
    post {
        // Email the developers and the requester on any non-successful result.
        unsuccessful {
            echo 'The Jenkins pipeline execution has failed.'
            emailext body: "The '${env.JOB_NAME}' job has failed during its execution. Check the logs for more information.", recipientProviders: [[$class: 'DevelopersRecipientProvider'], [$class: 'RequesterRecipientProvider']], subject: 'A Jenkins pipeline execution has failed.'
        }
        // Start the unit-testing workflow without waiting for it to finish.
        success {
            echo 'The Jenkins pipeline execution has ended successfully, triggering the next one.'
            build job: 'AutomaticTraining-UnitTesting', propagate: true, wait: false
        }
    }
}

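Let's look at the key components of the code above. properties([pipelineTriggers([githubPush()])]) means that the workflow is triggered by a push to the repository referenced in the code. environment defines the environment variables used during the workflow's execution; these are the variables needed to work with GCP. The stage('Cloning our GitHub repo') stage pulls the AutomaticTraining-CodeCommit repository and establishes it as the repository that triggers the workflow. The stage('Building and pushing image to GCR') stage builds the container from the Dockerfile downloaded with that repository and pushes it to GCR. The stage('Deploying to GKE') stage creates a Kubernetes job on GKE from the container image just pushed to GCR, as defined in the pod.yaml file that is also downloaded from the repository. If the job ends successfully, it triggers the AutomaticTraining-UnitTesting workflow (3); otherwise, it notifies the product owner by email.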

Workflow 2

Follow the same steps you used to build Workflow 1. Enter the following script to build this pipeline:

properties([pipelineTriggers([githubPush()])])
pipeline {
    agent any
    environment {
        PROJECT_ID = 'automatictrainingcicd'
        CLUSTER_NAME = 'training-cluster'
        LOCATION = 'us-central1-a'
        CREDENTIALS_ID = 'AutomaticTrainingCICD'
    }
    stages {
        // Pull the dataset repository, whose pushes trigger this workflow.
        stage('Webhook trigger received. Cloning 1st repository.') {
          steps {
            checkout([
              $class: 'GitSCM',
              branches: [[name: 'main']],
              userRemoteConfigs: [[
                url: 'https://github.com/sergiovirahonda/AutomaticTraining-Dataset.git',
                credentialsId: '',
              ]]
             ])
           }
        }
        // Pull the repository that holds the Dockerfile and code used to build the container.
        stage('Cloning GitHub repo that contains Dockerfile.') {
            steps {
                git url: 'https://github.com/sergiovirahonda/AutomaticTraining-DataCommit.git', branch: 'main'
            }
        }
        stage('Building and pushing image') {
            steps {
                script {
                    docker.withRegistry('https://gcr.io', 'gcr:AutomaticTrainingCICD') {
                        app = docker.build('automatictrainingcicd/data-commit:latest')
                        app.push("latest")
                    }
                }
            }
        }
        stage('Deploying to GKE') {
            steps{
                step([$class: 'KubernetesEngineBuilder', projectId: env.PROJECT_ID, clusterName: env.CLUSTER_NAME, location: env.LOCATION, manifestPattern: 'pod.yaml', credentialsId: env.CREDENTIALS_ID, verifyDeployments: true])
            }
        }
    }
    post {
        unsuccessful {
            echo 'The Jenkins pipeline execution has failed.'
            emailext body: "The '${env.JOB_NAME}' job has failed during its execution. Check the logs for more information.", recipientProviders: [[$class: 'DevelopersRecipientProvider'], [$class: 'RequesterRecipientProvider']], subject: 'A Jenkins pipeline execution has failed.'
        }
        success {
            echo 'The Jenkins pipeline execution has ended successfully, triggering the next one.'
            build job: 'AutomaticTraining-UnitTesting', propagate: true, wait: false
        }
    }
}

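Apart from using the code from the AutomaticTraining-DataCommit repository to build the container, the script above performs almost the same process as Workflow 1. Also, this pipeline is triggered by any push to the AutomaticTraining-Dataset repository. Finally, if it succeeds, it triggers the AutomaticTraining-UnitTesting workflow (3).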

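Workflow 3

This workflow runs unit tests on the model available in the GCS /testing registry. It is triggered when changes are pushed to the AutomaticTraining-UnitTesting repository. It can also be triggered by Workflow 1 or Workflow 2. The script that builds this pipeline is as follows: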

properties([pipelineTriggers([githubPush()])])
pipeline {
    agent any
    environment {
        PROJECT_ID = 'automatictrainingcicd'
        CLUSTER_NAME = 'training-cluster'
        LOCATION = 'us-central1-a'
        CREDENTIALS_ID = 'AutomaticTrainingCICD'
    }
    stages {
        // Fixed delay so that a training job started by Workflow 1 or 2 can finish first.
        stage('Awaiting for previous training to be completed.'){
            steps{
                echo "Initializing prudential time"
                sleep(1200)
                echo "Ended"
            }
        }
        stage('Cloning our GitHub repo') {
          steps {
            checkout([
              $class: 'GitSCM',
              branches: [[name: 'main']],
              userRemoteConfigs: [[
                url: 'https://github.com/sergiovirahonda/AutomaticTraining-UnitTesting.git',
                credentialsId: '',
              ]]
             ])
           }
        }
        stage('Building and pushing image') {
            steps {
                script {
                    docker.withRegistry('https://gcr.io', 'gcr:AutomaticTrainingCICD') {
                        app = docker.build('automatictrainingcicd/unit-testing:latest')
                        app.push("latest")
                    }
                }
            }
        }
        stage('Deploying to GKE') {
            steps{
                step([$class: 'KubernetesEngineBuilder', projectId: env.PROJECT_ID, clusterName: env.CLUSTER_NAME, location: env.LOCATION, manifestPattern: 'pod.yaml', credentialsId: env.CREDENTIALS_ID, verifyDeployments: true])
            }
        }
    }
    post {
        unsuccessful {
            echo 'The Jenkins pipeline execution has failed.'
            emailext body: "The '${env.JOB_NAME}' job has failed during its execution. Check the logs for more information.", recipientProviders: [[$class: 'DevelopersRecipientProvider'], [$class: 'RequesterRecipientProvider']], subject: 'A Jenkins pipeline execution has failed.'
        }
        success {
            echo 'The Jenkins pipeline execution has ended successfully, check the GCP logs for more information.'
        }
    }
}

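The script above differs slightly from the previous ones. It waits 1,200 seconds before starting, to give the training (from Workflow 1 or 2) enough time to complete. In addition, after the model unit tests, the results are emailed to the product owner so that they know whether to start the semi-automated deployment to production (4).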
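
A fixed delay is simple, but it either wastes time when training finishes early or ends too soon when training runs long. As an alternative (this is not what the script above does), a wait can poll for a completion signal and give up after a timeout. The sketch below is a generic Python illustration of that idea. The wait_until function and its is_done callback are hypothetical, and how you detect that training has finished is left as an assumption.

```python
import time
from typing import Callable


def wait_until(
    is_done: Callable[[], bool],
    timeout_s: float,
    poll_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_done() until it returns True or timeout_s seconds have passed.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = clock() + timeout_s
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Hypothetical completion check: pretend training is done after 3 polls.
    polls = {"count": 0}

    def training_finished() -> bool:
        polls["count"] += 1
        return polls["count"] >= 3

    finished = wait_until(training_finished, timeout_s=1200, poll_s=0.1)
    print("training finished:", finished)
```

With this approach, the 1,200 seconds become an upper bound rather than a delay that is always paid in full.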

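Workflow 5

This workflow pulls the code from the AutomaticTraining-PredictionAPI repository (5) and builds the container that provides the prediction service. This container loads, from the production registry, the model that the semi-automated deployment to production (4) has copied there. It accepts POST requests and responds with predictions in JSON format. The workflow script is as follows: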

pipeline {
    agent any
    environment {
        PROJECT_ID = 'automatictrainingcicd'
        CLUSTER_NAME = 'training-cluster'
        LOCATION = 'us-central1-a'
        CREDENTIALS_ID = 'AutomaticTrainingCICD'
    }
    stages {
        stage('Cloning our Git') {
            steps {
                git url: 'https://github.com/sergiovirahonda/AutomaticTraining-PredictionAPI.git', branch: 'main'
            }
        }
        stage('Building and deploying image') {
            steps {
                script {
                    docker.withRegistry('https://gcr.io', 'gcr:AutomaticTrainingCICD') {
                        app = docker.build('automatictrainingcicd/prediction-api')
                        app.push("latest")
                    }
                }
            }
        }
        stage('Deploying to GKE') {
            steps{
                step([$class: 'KubernetesEngineBuilder', projectId: env.PROJECT_ID, clusterName: env.CLUSTER_NAME, location: env.LOCATION, manifestPattern: 'pod.yaml', credentialsId: env.CREDENTIALS_ID, verifyDeployments: true])
            }
        }
    }
    post {
        unsuccessful {
            echo 'The Jenkins pipeline execution has failed.'
            emailext body: "The '${env.JOB_NAME}' job has failed during its execution. Check the logs for more information.", recipientProviders: [[$class: 'DevelopersRecipientProvider'], [$class: 'RequesterRecipientProvider']], subject: 'A Jenkins pipeline execution has failed.'
        }
        success {
            echo 'The Jenkins pipeline execution has ended successfully, triggering the next one.'
            // Start the optional web-interface workflow without waiting for it to finish.
            build job: 'AutomaticTraining-Interface', propagate: true, wait: false
        }
    }
}

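Finally, the script triggers a workflow called AutomaticTraining-Interface. This is a "bonus" that provides a web interface for end users. We won't discuss it in this series; however, you can find all the relevant files in the workflow's repository.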
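
Since the prediction service accepts POST requests and answers in JSON, a client could call it roughly as sketched below in Python. This is only an illustration under assumptions: the endpoint URL, the payload fields, and the build_prediction_request helper are all hypothetical, because this article does not show the API's actual route or request format.

```python
import json
import urllib.request


def build_prediction_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request that carries the payload as a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint and payload; replace them with your service's values.
    request = build_prediction_request(
        "http://localhost:5000/predict", {"file": "example.png"}
    )
    # Uncomment to send the request once the service is running:
    # with urllib.request.urlopen(request, timeout=30) as response:
    #     print(json.load(response))
    print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it easy to inspect the method, headers, and body before the service is reachable.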

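Triggering Workflows with GitHub Webhooks

If you run Jenkins locally on a private network, you need to install SocketXP or a similar service. This exposes the Jenkins server, reachable at http://localhost:8080, to the outside world, including GitHub.

To install SocketXP, select your operating system type and follow the instructions on the website.

To create a secure tunnel for Jenkins, issue the following command: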

socketxp connect http://localhost:8080

The response gives you a public URL.

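To trigger our workflows on the local Jenkins server, we need to create GitHub webhooks. Go to the repository that triggers the workflow and select Settings > Webhooks > Add webhook. Paste the public URL into the Payload URL field, append "/github-webhook/" to it, and click Add webhook.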
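
Composing the Payload URL by hand makes it easy to end up with a doubled or missing slash. The short Python sketch below shows one way to build it consistently. The github_webhook_url helper is a hypothetical name, and the URL in the usage example is a placeholder, not the address your tunnel will return.

```python
def github_webhook_url(public_url: str) -> str:
    """Append the Jenkins GitHub webhook path to a public base URL."""
    return public_url.rstrip("/") + "/github-webhook/"


if __name__ == "__main__":
    # Placeholder URL for illustration; use the one printed by your tunnel.
    print(github_webhook_url("https://tunnel.example.com"))
```

Keep the trailing slash: Jenkins expects the webhook path to be exactly /github-webhook/.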