Containerizing Multi-Arch Base Accelerator Run-Time Environments

In the SODALITE H2020 project we deal with the deployment and reconfiguration of application deployments that make use of various heterogeneous accelerators. While most accelerators provide an easy to install environment to support their operation, we observed the following limitations:

  • Most accelerator run-time environments are not provided in a ready-to-use containerized format.

  • Many accelerator run-time environments are not multi-arch-aware out of the box, despite having no specific architecture dependencies.

  • Accelerated applications spend as much time in their Dockerfiles setting up the runtime as they do addressing the application requirements, resulting in a lot of unnecessary boilerplate that quickly becomes outdated.

  • Accelerated applications may be able to leverage different accelerators for accomplishing the same task, especially as accelerator-agnostic frameworks, such as ONNX, are increasingly adopted.

  • Supporting pre-configured image variants targeting a specific accelerator type allows for fine-grained placement and execution of application containers in heterogeneous Kubernetes clusters, e.g. based on node label placement.

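As an illustrative sketch of this last point, an application container can be pinned to accelerator-equipped nodes with a nodeSelector; the node label and image name used here are hypothetical and depend on how a given cluster labels its nodes:

```yaml
# Hypothetical example: assumes nodes with an attached EdgeTPU carry the
# label "accelerator: edgetpu" (cluster-specific; set via kubectl label).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: accelerated-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: accelerated-app
  template:
    metadata:
      labels:
        app: accelerated-app
    spec:
      nodeSelector:
        accelerator: edgetpu          # place only on EdgeTPU-equipped nodes
      containers:
        - name: accelerated-app
          image: adaptant/edgetpu-devicequery:std   # accelerator-specific image variant
```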
With this in mind, we decided it was prudent to try and decouple the setup of the accelerator run-time environment and the application logic, establishing an easily extensible set of minimal accelerator-specific base containers upon which to layer accelerated applications.

Step 1: Preparing minideb for multi-arch

A popular option for a minimal Debian environment is bitnami/minideb. As most vendors provide Debian packages for their runtime environments, we decided to stick with minideb (instead of other solutions, like Alpine) as a base for the accelerator images. Up until now, however, multi-arch support has not been a priority for minideb; there is a GitHub issue specifically tracking this.

We were, therefore, forced to create our own multi-arch builds and publish them behind a manifest. The end result is the adaptant/minideb image, which includes support for amd64 and arm64 out of the box (if you find this useful and would like to see additional target architectures supported, please open an issue in our issue tracker).

Step 2: Preparing accelerator base containers

With the multi-arch starting point established, the next step is to dig into the run-time environments for the different accelerators. For this purpose, and to save others from having to go through the same process, we established a repository specifically for managing these.

While this is only a start, we hope that this will be useful to others, and welcome PRs adding new configurations!

Step 3: Using the base containers in practice

Using the EdgeTPU as an example, we can use the EdgeTPU Python API (which is made available in the accelerator base container) to write a simple python script for querying the runtime version and displaying detected devices:

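A minimal sketch of such a script, hereafter devicequery.py, might look like the following (it assumes the edgetpu Python package provided by the base container, and needs an attached EdgeTPU and its runtime library in order to run):

```python
#!/usr/bin/env python3
# devicequery.py - query the EdgeTPU runtime version and list detected devices.
from edgetpu.basic import edgetpu_utils

# Report the version of the installed EdgeTPU runtime library.
print(edgetpu_utils.GetRuntimeVersion())

# Enumerate EdgeTPU devices not currently assigned to an inference engine.
print('Available EdgeTPU Devices:')
for path in edgetpu_utils.ListEdgeTpuPaths(edgetpu_utils.EDGE_TPU_STATE_UNASSIGNED):
    print(path)
```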
In order to containerize this, we need only build on an appropriate base container (in this case, acceleratorbase/edgetpu-std) and only need to add the parts that pertain to our application logic:

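A minimal Dockerfile along these lines might look as follows (a sketch, assuming the device-query script described above is saved as devicequery.py):

```dockerfile
# Build upon the accelerator base container; only the application-specific
# pieces are layered on top.
FROM acceleratorbase/edgetpu-std
ADD devicequery.py /
CMD ["python3", "/devicequery.py"]
```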
The complete code example can be found here.

In order to generate multi-arch images, we use Docker Buildx to carry out the build:

$ docker buildx build -t acceleratorbase/example-edgetpu-devicequery --platform linux/amd64,linux/arm64 .
[+] Building 3.2s (9/9) FINISHED
=> [internal] load .dockerignore
=> => transferring context: 2B
=> [internal] load build definition from Dockerfile
=> => transferring dockerfile: 31B
=> [linux/amd64 internal] load metadata for docker.io/acceleratorbase/edgetpu-std:latest
=> [linux/arm64 internal] load metadata for docker.io/acceleratorbase/edgetpu-std:latest
=> [internal] load build context
=> => transferring context: 36B
=> [linux/arm64 1/2] FROM docker.io/acceleratorbase/edgetpu-std@sha256:6f4103ab071d4b6fd7f8797e8858554ceb6d3c255c9c190276bf77716158267c
=> => resolve docker.io/acceleratorbase/edgetpu-std@sha256:6f4103ab071d4b6fd7f8797e8858554ceb6d3c255c9c190276bf77716158267c
=> [linux/amd64 1/2] FROM docker.io/acceleratorbase/edgetpu-std@sha256:6f4103ab071d4b6fd7f8797e8858554ceb6d3c255c9c190276bf77716158267c
=> => resolve docker.io/acceleratorbase/edgetpu-std@sha256:6f4103ab071d4b6fd7f8797e8858554ceb6d3c255c9c190276bf77716158267c
=> CACHED [linux/amd64 2/2] ADD devicequery.py /
=> CACHED [linux/arm64 2/2] ADD devicequery.py /

With multi-arch images generated under the acceleratorbase/example-edgetpu-devicequery image name, we can now run the same image on an amd64 machine with a USB-connected EdgeTPU:

$ uname -m
x86_64
$ docker run --privileged acceleratorbase/example-edgetpu-devicequery
BuildLabel(COMPILER=5.4.0 20160609,DATE=redacted,TIME=redacted,CL_NUMBER=291256449), RuntimeVersion(13)
Available EdgeTPU Devices:
/sys/bus/usb/devices/1-9

or on an arm64-based Coral Dev Board with an integrated EdgeTPU:

$ uname -m
aarch64
$ docker run --privileged acceleratorbase/example-edgetpu-devicequery
BuildLabel(COMPILER=6.3.0 20170516,DATE=redacted,TIME=redacted,CL_NUMBER=291256449), RuntimeVersion(13)
Available EdgeTPU Device(s):
/dev/apex_0

Step 4: Generating image variants with SODALITE

While the example in Step 3 demonstrates how we can provide a multi-arch-aware base upon which to build, what if we want to support different accelerators or accelerator configurations for the same application?

Dynamic Reconfiguration of a Static Device Clock (EdgeTPU)

The USB-based EdgeTPU, for example, relies on static clocking through proprietary libraries to configure the accelerator clock rate. Two variants of the library are provided:

  • libedgetpu1-std for running at the standard clock rate.

  • libedgetpu1-max for running at the maximum clock rate.

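Inside the base containers, choosing between the two comes down to which Debian package is installed (a sketch; it assumes the vendor's apt repository has already been configured, and the two packages conflict with one another, so only one can be present at a time):

```shell
# Standard clock rate runtime (generates less heat):
apt-get update && apt-get install -y libedgetpu1-std

# ...or the maximum clock rate runtime (faster, but runs hotter):
# apt-get install -y libedgetpu1-max
```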
The -std variant, as expected, generates less heat than the -max variant, but also runs at half the speed. As the EdgeTPU has a rather narrow temperature tolerance, it’s imperative to monitor the temperature and clock down (or up) depending on the thermal properties and tolerances of the workloads. Allowing the device to exceed its tolerances can result in hard-to-debug issues including random inference failures, or worse, damage to the physical device itself.

An application must, therefore, be able to provide image variants that are pre-configured for either clocking scenario in order to achieve the best performance possible while simultaneously mitigating the risk of run-time inference failure.

In this case, the desired clock rate and performance mode are configured at application start, determined by whichever of these libraries the application happens to be linked against.

Image variants in the SODALITE image builder

The SODALITE image builder, fortunately, provides us with a mechanism for generating multiple container image variants from an application Dockerfile by means of base image overloading:

[Figure: SODALITE Image Builder — Image Variant Generation Workflow]

At its core, the image builder works by parsing an input template that provides it with basic information about the location of the Dockerfile, the repository to clone the source code from (or the local path, if not using git), the Docker registry to push to, image variants to generate, etc. Each image variant includes its own tag and a base image override. The format of the input file is:

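A rough sketch of what such an input template can contain is shown below; the field names are purely illustrative and not the image builder's actual schema, and the repository URL is a placeholder:

```yaml
# Illustrative sketch only - consult the SODALITE image builder
# documentation for the actual input schema.
source:
  git: https://github.com/example/my-app   # repository to clone (or a local path)
  dockerfile: Dockerfile
registry: registry.example.com             # registry to push the images to
image: example/my-app
variants:
  - tag: latest                            # default base image from the Dockerfile
  - tag: variant-a
    base_image: example/base-a             # base image override for this variant
```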
Under the hood, the image builder orchestrates and deploys an Ansible playbook for the actual building of the images. Image variants are enabled by fetching the primary Dockerfile and dynamically rewriting it for build-time overloading of the base image for each of the variant configurations (leaving the default base image for the latest tag, unless otherwise named). The precise process by which this is accomplished is outlined in more detail below:

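The rewriting step can be sketched as follows; this is an illustrative approximation of the behaviour described above, not the image builder's actual implementation:

```python
import re

def overload_base_image(dockerfile: str, base_image: str) -> str:
    """Rewrite the FROM line(s) of a Dockerfile to use a new base image.

    Multi-stage builds are handled by tracking stage aliases: FROM lines
    that reference an earlier build stage are left untouched.
    """
    stages = set()
    out = []
    for line in dockerfile.splitlines():
        m = re.match(r'^\s*FROM\s+(\S+)(?:\s+AS\s+(\S+))?\s*$', line, re.IGNORECASE)
        if m:
            image, alias = m.groups()
            if image in stages:
                out.append(line)  # references an earlier stage; keep as-is
            else:
                out.append(f'FROM {base_image}' + (f' AS {alias}' if alias else ''))
            if alias:
                stages.add(alias)
        else:
            out.append(line)
    return '\n'.join(out)

original = (
    'FROM acceleratorbase/edgetpu-std\n'
    'ADD devicequery.py /\n'
    'CMD ["python3", "/devicequery.py"]'
)
# Rewrite the default -std base to the -max variant:
print(overload_base_image(original, 'acceleratorbase/edgetpu-max'))
```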
[Figure: Providing Dockerfiles with an overloadable BASE_IMAGE]
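Equivalently, the same effect can be had in a hand-written Dockerfile by accepting the base image as a build argument, sketched here with the EdgeTPU base containers:

```dockerfile
# Sketch of a Dockerfile with an overridable base image: the default base
# can be swapped out at build time without editing the file.
ARG BASE_IMAGE=acceleratorbase/edgetpu-std
FROM ${BASE_IMAGE}
ADD devicequery.py /
CMD ["python3", "/devicequery.py"]
```

A variant can then be built directly with, e.g., `docker build --build-arg BASE_IMAGE=acceleratorbase/edgetpu-max .`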

The image builder, notably, takes care of modifying the Dockerfile for base image overloading at build time, and requires no manual modification of the Dockerfile. It is further able to handle both single-stage and multi-stage Dockerfiles, and should be able to handle anything that gets thrown at it (if, however, you manage to break it, we’d be happy to receive your report in the issue tracker).

Building image variants with the SODALITE image builder

While not strictly necessary, for the purpose of experimentation we spin up a local image registry before continuing, and will push our images to it:

$ docker run -d -p 5000:5000 --restart=always --name registry registry:2

We can now prepare an input file for the image builder using our local registry and generate -std and -max variants:

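Such an input file might look roughly like the following; the field names are illustrative rather than the builder's actual schema, and the repository URL is a placeholder:

```yaml
# Illustrative sketch only - field names and repository URL are placeholders.
source:
  git: https://github.com/example/edgetpu-devicequery
  dockerfile: Dockerfile
registry: localhost:5000
image: adaptant/edgetpu-devicequery
variants:
  - tag: std
    base_image: acceleratorbase/edgetpu-std
  - tag: max
    base_image: acceleratorbase/edgetpu-max
```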
Note that in this case we have already pushed our application out to GitHub, and will have the image builder fetch the repository directly at build time, as this most closely resembles a real-world workflow. A local build context can, however, also be used:

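With illustrative field names again, a local build context might be expressed as:

```yaml
source:
  path: .                  # illustrative: build from the local working directory
  dockerfile: Dockerfile
```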
In order to invoke the image builder, we can use a convenience wrapper that leverages a self-contained version of the image builder.

We can now build the image variants:

$ ./image-builder-cli.sh edgetpu-devicequery-image-variants.yaml
[Worker_0] Deploying my-workstation_0
[Worker_0] Deployment of my-workstation_0 complete
[Worker_0] Deploying image-builder_0
[Worker_0] Executing create on image-builder_0
[Worker_0] Deployment of image-builder_0 complete

This will generate two versions of the application container:

  • adaptant/edgetpu-devicequery:std layered upon acceleratorbase/edgetpu-std (the default).

  • adaptant/edgetpu-devicequery:max layered upon acceleratorbase/edgetpu-max.

and push them to the local registry. We can now query the repository and tags from the registry, confirming that each variant has been successfully generated and pushed:

$ curl -X GET localhost:5000/v2/_catalog
{"repositories":["adaptant/edgetpu-devicequery"]}
$ curl -X GET localhost:5000/v2/adaptant/edgetpu-devicequery/tags/list
{"name":"adaptant/edgetpu-devicequery","tags":["std","max"]}

Limitations

As the current version of the Ansible Docker plugin does not support Docker Buildx, multi-arch manifests must still, for the moment, be created and pushed out by hand. We presently do this by running an instance of the image builder on each target architecture while using a shared registry, and then manually reconciling the manifest. While this works as a stop-gap solution, it’s not intended for the long-term, and we will be updating the image builder for multi-arch as the plugin support in Ansible improves.

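As a sketch of the manual reconciliation step (the per-architecture tags here are hypothetical; adjust to match whatever each builder instance pushed), the standard docker manifest subcommands can be used:

```shell
# Combine the per-architecture images into a single multi-arch manifest.
docker manifest create localhost:5000/adaptant/edgetpu-devicequery:std \
  localhost:5000/adaptant/edgetpu-devicequery:std-amd64 \
  localhost:5000/adaptant/edgetpu-devicequery:std-arm64

# Push the reconciled manifest to the shared registry.
docker manifest push localhost:5000/adaptant/edgetpu-devicequery:std
```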
Conclusion

In this article we have tried to address the gaps we observed in ready-to-use accelerator base containers, and demonstrated how applications can build directly on our base containers and layering methodology in order to focus on the aspects specific to their application logic, while streamlining their Dockerfiles.

We have also demonstrated how the SODALITE image builder complements this workflow by providing a mechanism for applications to create image variants built on different base images, allowing an application to provide different container images ready to run on specific accelerator types.

Next Up

In the next article, we will look at how the SODALITE run-time monitor and refactorer can leverage Prometheus and AlertManager for thermal monitoring of the deployed application in order to dynamically reconfigure the deployment to stay within the thermal tolerances of the inference engine.

Acknowledgements

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825480 (SODALITE).

Originally published at: https://itnext.io/containerizing-multi-arch-base-accelerator-run-time-environments-f814aa02706e
