Deploy an Autoscaling Machine Learning Model Inference Stack to AWS with CDK


For several years now, data science and machine learning have been all the rage, and many companies have embraced "artificial intelligence" as a powerful new tool to automate complex tasks as a black box. In this area, deep learning appears to be the Holy Grail for building models that detect and classify images or text, among other cool things. Once models are trained, it is common to deploy them on a small web server and expose a simple REST API to perform inference on a given sample. This is very convenient because, whereas training a model needs lots of computation power and lots of GPUs, inference only needs a relatively low-power CPU to make a prediction on a single sample. You can find numerous articles and blog posts on this approach; even open-source projects and companies exist to make the whole thing painless.


There exist cases, however, where you need to perform inference not just on a single sample, but on a batch of samples. In this case, inference can take a lot of time on a CPU and you need a GPU to parallelize the job. A small web server is no longer relevant, as you don't want to pay for an always-running machine with an attached GPU. For example, imagine you have to run predictions on large files that are uploaded to your S3 bucket. Each file is first split into small chunks on which the model makes a prediction. What you want is a stack that automatically launches when a file is ready to process, uses a machine with a GPU, and shuts the whole thing down when there is nothing left to process. As there is no serverless stack available for this (will there be one, one day?), we need to build a stack that scales automatically to our needs.


Architecture

The proposed architecture is the following: we get an SQS message with the task to perform. For example, it could be the name of a file on S3 to process. Depending on a configuration parameter found in the AWS Parameter Store, this message is forwarded to another SQS queue: one for a CPU-based stack, one for a GPU-based stack. An ECS cluster then uses CloudWatch alarms to trigger the scaling of two autoscaling groups, one with CPU-only instances and one with GPU-enabled instances. The result is then written to an S3 bucket.


[Figure: proposed architecture]

Automation

To automate the whole thing a little more, we can use a CDK application. Just as Terraform or Kubernetes lets you build a stack as code, CDK tends to be the reference for deploying stacks on AWS. You can of course use CloudFormation, but the mere idea of managing a stack with very large YAML files makes me run away. Using a modern programming language like TypeScript, C# or Python is way more interesting. In the repository source code, you can find two classes that build the stack: DeployModelEcsMediumStackCore and DeployModelEcsMediumStack. The first one, as its name suggests, builds the core of the stack, i.e. the main SQS queue; attaches a Lambda function to it; creates an ECS cluster; and defines some IAM policies. The DeployModelEcsMediumStack class then builds a stack for either the CPU or the GPU architecture: an SQS queue and its associated metrics, an autoscaling group with the right AMI, instance type and scaling policies, and the ECS task with the right ECR image to pull.
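The entry point of a CDK app along these lines might look as follows. This is a sketch only: the two class names come from the article, but the file layout and the constructor props (`cluster`, `useGpu`) are assumptions, not the repo's exact API.

```typescript
#!/usr/bin/env node
import * as cdk from 'aws-cdk-lib';
// Hypothetical import paths -- the real repo defines these two classes.
import { DeployModelEcsMediumStackCore } from '../lib/core-stack';
import { DeployModelEcsMediumStack } from '../lib/worker-stack';

const app = new cdk.App();

// Core stack: main SQS queue, routing Lambda, ECS cluster, IAM policies.
const core = new DeployModelEcsMediumStackCore(app, 'CoreStack');

// One worker stack per hardware flavour; `useGpu` is a hypothetical prop
// selecting the AMI, instance type and gpuCount.
new DeployModelEcsMediumStack(app, 'CpuStack', { cluster: core.cluster, useGpu: false });
new DeployModelEcsMediumStack(app, 'GpuStack', { cluster: core.cluster, useGpu: true });
```

Splitting the core from the per-architecture stack keeps the SQS/Lambda/cluster plumbing in one place while letting you instantiate as many hardware flavours as you need.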


Cluster

First, a cluster is created with the ecs.Cluster construct. From there, we create an autoscaling group that adds a capacity provider to the cluster with the cluster.addCapacity method. We then need to create a task with the ecs.Ec2TaskDefinition construct, providing the proper ECR image retrieved with the static method ecs.ContainerImage.fromEcrRepository. The image has to be based on an ECS-optimized AMI to work properly. At the time of writing there was no official AWS AMI supporting GPUs for ECS, but you can find custom ones. We also need to pay attention to the gpuCount property and set it to 1 when we want to use a GPU. Finally, the service is created with the ecs.Ec2Service construct and attached to the cluster.
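Put together, the GPU flavour of such a stack could look like the following sketch. The construct names, instance type and sizes are illustrative, not the repo's exact code; the CDK calls themselves (Cluster, addCapacity, Ec2TaskDefinition, ContainerImage.fromEcrRepository, gpuCount, Ec2Service) are the ones discussed above.

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecr from 'aws-cdk-lib/aws-ecr';

export class GpuInferenceStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2 });
    const cluster = new ecs.Cluster(this, 'Cluster', { vpc });

    // Capacity for the cluster: an autoscaling group of GPU instances,
    // allowed to scale down to zero when there is nothing to process.
    cluster.addCapacity('GpuCapacity', {
      instanceType: new ec2.InstanceType('g4dn.xlarge'),
      // Recent CDK versions expose an ECS-optimized GPU AMI; the original
      // article relied on a custom AMI instead.
      machineImage: ecs.EcsOptimizedImage.amazonLinux2(ecs.AmiHardwareType.GPU),
      minCapacity: 0,
      maxCapacity: 2,
    });

    // Task definition pulling the worker image from ECR; the repository
    // name is a placeholder.
    const repo = ecr.Repository.fromRepositoryName(this, 'Repo', 'inference-worker');
    const taskDefinition = new ecs.Ec2TaskDefinition(this, 'TaskDef');
    taskDefinition.addContainer('worker', {
      image: ecs.ContainerImage.fromEcrRepository(repo, 'latest'),
      memoryLimitMiB: 4096,
      gpuCount: 1, // reserve one GPU for the container
    });

    new ecs.Ec2Service(this, 'Service', { cluster, taskDefinition, desiredCount: 0 });
  }
}
```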


Work with the stack

Once everything has been deployed, all you need to do is send a message to the main SQS queue with the task to execute, i.e. the name of the file to retrieve from S3 in my example. To decide whether to use a GPU-based instance or a CPU-only instance, we only need to change the configuration found in the Parameter Store of the Systems Manager. An example of such a message could be:


```shell
aws sqs send-message \
  --queue-url https://sqs.ca-central-1.amazonaws.com/1234567890/MediumArticleQueue \
  --message-body '{"filename": "file_to_process.bin"}'
```

Queues

There are three SQS queues in the proposed architecture, but the user only needs to send a message to the main queue, MediumArticleQueue. Once a message is received, a Lambda function is triggered and, depending on the configuration (a parameter in the SSM Parameter Store), the message is forwarded to the proper queue: GPUQueue for the autoscaling group managing GPU-based instances, or CPUQueue for CPU-only instances.


Code

The TypeScript/Python code for this CDK stack is a bit too large to publish here, but you can find the sources in this repo. Feel free to copy or fork the code; a thumbs up or a little comment would be appreciated.


Final note

In the source code, I defined my alarms to scale the autoscaling group, but not the task count. The reason is that when adding an ECS service, we also set its autoscaling behaviour with the ecsService.autoScaleTaskCount method. However, AWS CDK does not properly link task scaling and instance scaling, which is the role of the capacity provider. This behaviour can be achieved when you work directly in the console, but not programmatically. There is a PR to correct it, but it was not available at the time this article was published. To support this feature later, I added a commented code section to illustrate what the code could look like when the feature is released.
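For reference, the task-count side of the scaling would look roughly like the sketch below once task and instance scaling are properly linked. The metric, capacity bounds and scaling steps are illustrative assumptions, not the repo's commented code; autoScaleTaskCount and scaleOnMetric are the real CDK methods.

```typescript
// Assumes `ecsService` (an ecs.Ec2Service) and `visibleMessages` (a
// cloudwatch.Metric on the queue's ApproximateNumberOfMessagesVisible)
// are defined elsewhere in the stack.
const scaling = ecsService.autoScaleTaskCount({ minCapacity: 0, maxCapacity: 4 });

scaling.scaleOnMetric('ScaleWithQueueDepth', {
  metric: visibleMessages,
  scalingSteps: [
    { upper: 0, change: -1 }, // queue empty: scale tasks in
    { lower: 1, change: +1 }, // work waiting: scale tasks out
  ],
});
```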


Translated from: https://towardsdatascience.com/deploy-autoscaling-machine-learning-model-inference-stack-to-aws-with-cdk-84d26acf8a99

