Configuring NVIDIA Multi-Instance GPU (MIG) in Kubernetes

Contents

Chapter 1: Introduction

Chapter 2: MIG Strategies

2.1 Using MIG Strategies in Kubernetes

2.2 Testing with Different Strategies

2.3 The none Strategy

2.3.1 Testing

2.4 The single Strategy

2.4.1 Testing

2.5 The mixed Strategy

2.5.1 Testing


Chapter 1: Introduction

The Multi-Instance GPU (MIG) feature allows a GPU, such as the NVIDIA A100, to be securely partitioned into multiple independent GPU instances for use by CUDA applications. For example, the NVIDIA A100 supports up to seven separate GPU instances.

MIG gives multiple users isolated GPU resources for optimal GPU utilization. The feature is especially beneficial for workloads that do not fully saturate the GPU's compute capacity, where users may want to run different workloads in parallel to maximize resource utilization.

This document provides an overview of the software required to enable MIG support in Kubernetes. For more details on the technical concepts, on setting up MIG, and on the NVIDIA Container Toolkit for running containers with MIG, refer to the MIG User Guide.

The deployment workflow requires the following prerequisites:

  1. You have installed the NVIDIA R450+ data center driver (450.80.02+) for the NVIDIA A100.
  2. You have installed the NVIDIA Container Toolkit v2.5.0+.
  3. You have a Kubernetes deployment up and running with access to at least one NVIDIA A100 GPU.

Once these prerequisites are met, you can proceed to deploy a MIG-capable version of the NVIDIA k8s-device-plugin in your cluster, and (optionally) the gpu-feature-discovery component, so that Kubernetes can schedule Pods on the available MIG devices.

The minimum required versions of the software components are listed below:

  • NVIDIA R450+ data center driver: 450.80.02+
  • NVIDIA Container Toolkit (nvidia-docker2): v2.5.0+
  • NVIDIA k8s-device-plugin: v0.7.0+
  • NVIDIA gpu-feature-discovery: v0.2.0+

Chapter 2: MIG Strategies

NVIDIA provides two strategies for exposing MIG devices on a Kubernetes node. For more details on the strategies, see the design document.

2.1 Using MIG Strategies in Kubernetes

This section walks through the steps required to deploy and run the k8s-device-plugin and gpu-feature-discovery components to support the various MIG strategies. The recommended deployment method is via Helm.

For alternative deployment methods, see the installation instructions in the following GitHub repositories:

  • k8s-device-plugin
  • gpu-feature-discovery

First, add the nvidia-device-plugin and gpu-feature-discovery Helm repositories:

# helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
# helm repo add nvgfd https://nvidia.github.io/gpu-feature-discovery
# helm repo update

Then, verify that version v0.7.0 of nvidia-device-plugin and version v0.2.0 of gpu-feature-discovery are available:

# helm search repo nvdp --devel
NAME                           CHART VERSION  APP VERSION    DESCRIPTION
nvdp/nvidia-device-plugin      0.7.0          0.7.0          A Helm chart for ...
# helm search repo nvgfd --devel
NAME                           CHART VERSION  APP VERSION    DESCRIPTION
nvgfd/gpu-feature-discovery    0.2.0          0.2.0          A Helm chart for ...

Finally, select a MIG strategy and deploy the nvidia-device-plugin and gpu-feature-discovery components:

# export MIG_STRATEGY=<none | single | mixed>
# helm install \
   --version=0.7.0 \
   --generate-name \
   --set migStrategy=${MIG_STRATEGY} \
   nvdp/nvidia-device-plugin
# helm install \
   --version=0.2.0 \
   --generate-name \
   --set migStrategy=${MIG_STRATEGY} \
   nvgfd/gpu-feature-discovery
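The two `--set` flags above can equally be captured in a small shared values file, as a sketch (the file name is hypothetical; `migStrategy` is the chart value used above):

```yaml
# values.yaml (hypothetical file name) -- shared by both charts
migStrategy: single   # one of: none | single | mixed
```

Both charts would then be installed with `-f values.yaml` in place of `--set migStrategy=${MIG_STRATEGY}`, which keeps the two deployments from drifting apart.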

2.2 Testing with Different Strategies

This section walks through the steps required to test each MIG strategy.

Note:

With the default settings, under the mixed strategy a container can only request one device type at a time. If a container requests more than one device type, the devices it receives are undefined. For example, a container cannot request both nvidia.com/gpu and nvidia.com/mig-3g.20gb. It can, however, request multiple instances of the same resource type without restriction (e.g., nvidia.com/gpu: 2 or nvidia.com/mig-3g.20gb: 2).

To mitigate this behavior, we recommend following the guidance outlined in the documentation.
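As a sketch of that rule, a Pod under the mixed strategy should request only one device type per container; multiple instances of that one type are fine (the Pod name below is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-mixed-ok        # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/mig-3g.20gb: 2   # several instances of the SAME type: allowed
        # nvidia.com/gpu: 1         # mixing a second type here: devices undefined
```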

2.3 The none Strategy

The none strategy is designed to keep nvidia-device-plugin running unchanged. The plugin makes no distinction between GPUs with and without MIG enabled; it enumerates all GPUs on the node and exposes them through the nvidia.com/gpu resource type.

2.3.1 Testing

To test this strategy, we check the enumeration of the GPU both with and without MIG enabled, and verify that it is visible in both cases. The test assumes a single GPU on a single node in the cluster.

1. Verify that MIG is disabled on the GPU:

# nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      Off  | 00000000:36:00.0 Off |                    0 |
| N/A   29C    P0    62W / 400W |      0MiB / 40537MiB |      6%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

2. Start nvidia-device-plugin with the none strategy, as described in the previous section. If the plugin is already running, restart it.

3. Observe that 1 GPU is available on the node under the nvidia.com/gpu resource type:

kubectl describe node
...
Capacity:
nvidia.com/gpu:          1
...
Allocatable:
nvidia.com/gpu:          1
...

4. Start gpu-feature-discovery with the none strategy, as described in the previous section. If it is already running, restart it.

5. Observe that the correct set of labels has been applied for this MIG strategy:

kubectl get node -o json | \
   jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
{
  "nvidia.com/cuda.driver.major": "450",
  "nvidia.com/cuda.driver.minor": "80",
  "nvidia.com/cuda.driver.rev": "02",
  "nvidia.com/cuda.runtime.major": "11",
  "nvidia.com/cuda.runtime.minor": "0",
  "nvidia.com/gfd.timestamp": "1605312111",
  "nvidia.com/gpu.compute.major": "8",
  "nvidia.com/gpu.compute.minor": "0",
  "nvidia.com/gpu.count": "1",
  "nvidia.com/gpu.family": "ampere",
  "nvidia.com/gpu.machine": "NVIDIA DGX",
  "nvidia.com/gpu.memory": "40537",
  "nvidia.com/gpu.product": "A100-SXM4-40GB"
}

6. Deploy a Pod to consume the GPU and run nvidia-smi:

kubectl run -it --rm \
   --image=nvidia/cuda:11.0-base \
   --restart=Never \
   --limits=nvidia.com/gpu=1 \
   mig-none-example -- nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-15f0798d-c807-231d-6525-a7827081f0f1)

7. Enable MIG on the GPU (all GPU clients must be stopped first):

# sudo systemctl stop kubelet
# sudo nvidia-smi -mig 1
Enabled MIG Mode for GPU 00000000:36:00.0
All done.
# nvidia-smi --query-gpu=mig.mode.current --format=csv,noheader
Enabled

8. Restart the kubelet and the plugin:

# sudo systemctl start kubelet

9. Observe again that 1 GPU is available on the node under the nvidia.com/gpu resource type:

kubectl describe node
...
Capacity:
nvidia.com/gpu:          1
...
Allocatable:
nvidia.com/gpu:          1
...

10. Observe that the labels have not changed:

kubectl get node -o json | \
   jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
{
  "nvidia.com/cuda.driver.major": "450",
  "nvidia.com/cuda.driver.minor": "80",
  "nvidia.com/cuda.driver.rev": "02",
  "nvidia.com/cuda.runtime.major": "11",
  "nvidia.com/cuda.runtime.minor": "0",
  "nvidia.com/gfd.timestamp": "1605312111",
  "nvidia.com/gpu.compute.major": "8",
  "nvidia.com/gpu.compute.minor": "0",
  "nvidia.com/gpu.count": "1",
  "nvidia.com/gpu.family": "ampere",
  "nvidia.com/gpu.machine": "NVIDIA DGX",
  "nvidia.com/gpu.memory": "40537",
  "nvidia.com/gpu.product": "A100-SXM4-40GB"
}

11. Deploy a Pod again to consume the GPU and run nvidia-smi:

kubectl run -it --rm \
   --image=nvidia/cuda:11.0-base \
   --restart=Never \
   --limits=nvidia.com/gpu=1 \
   mig-none-example -- nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-15f0798d-c807-231d-6525-a7827081f0f1)

2.4 The single Strategy

The single strategy is designed to keep the user experience of working with GPUs in Kubernetes the same as before. MIG devices are enumerated under the traditional nvidia.com/gpu resource type, just as they always have been. However, the properties associated with that resource type now map to the MIG devices available on the node, rather than to full GPUs.

2.4.1 Testing

To test this strategy, we check that MIG devices of a single type are enumerated under the traditional nvidia.com/gpu resource type. The test assumes a single GPU on a single node in the cluster, with MIG already enabled.

1. Verify that MIG is enabled on the GPU and that no MIG devices yet exist:

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:00:04.0 Off |                   On |
| N/A   32C    P0    43W / 400W |      0MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  No MIG devices found                                                       |
+-----------------------------------------------------------------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

2. Create 7 single-slice MIG devices on the GPU:

sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C
nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
  MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/7/0)
  MIG 1g.5gb Device 1: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/8/0)
  MIG 1g.5gb Device 2: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)
  MIG 1g.5gb Device 3: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/10/0)
  MIG 1g.5gb Device 4: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/11/0)
  MIG 1g.5gb Device 5: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/12/0)
  MIG 1g.5gb Device 6: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/13/0)
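Checks like the one above can also be scripted. As a minimal sketch, the MIG entries in `nvidia-smi -L` style output can be counted with grep; the sample below is embedded for illustration (UUIDs abbreviated from the transcript above), and on a live node you would pipe `nvidia-smi -L` in instead:

```shell
# Sample `nvidia-smi -L` style output, embedded for illustration only.
sample='GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
  MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/7/0)
  MIG 1g.5gb Device 1: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/8/0)
  MIG 1g.5gb Device 2: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)'

# Count the indented "MIG ..." device lines; on a live node:
#   count=$(nvidia-smi -L | grep -c '^ *MIG ')
count=$(printf '%s\n' "$sample" | grep -c '^ *MIG ')
echo "mig devices: $count"
```

Comparing this count with the node's nvidia.com/gpu capacity is a quick sanity check that the plugin picked up every slice.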

3. Start nvidia-device-plugin with the single strategy, as described in the previous section. If the plugin is already running, restart it.

4. Observe that 7 MIG devices are available on the node under the nvidia.com/gpu resource type:

kubectl describe node
...
Capacity:
nvidia.com/gpu:          7
...
Allocatable:
nvidia.com/gpu:          7
...

5. Start gpu-feature-discovery with the single strategy, as described in the previous section. If it is already running, restart it.

6. Observe that the correct set of labels has been applied for this MIG strategy:

kubectl get node -o json | \
   jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'

{
  "nvidia.com/cuda.driver.major": "450",
  "nvidia.com/cuda.driver.minor": "80",
  "nvidia.com/cuda.driver.rev": "02",
  "nvidia.com/cuda.runtime.major": "11",
  "nvidia.com/cuda.runtime.minor": "0",
  "nvidia.com/gfd.timestamp": "1605657366",
  "nvidia.com/gpu.compute.major": "8",
  "nvidia.com/gpu.compute.minor": "0",
  "nvidia.com/gpu.count": "7",
  "nvidia.com/gpu.engines.copy": "1",
  "nvidia.com/gpu.engines.decoder": "0",
  "nvidia.com/gpu.engines.encoder": "0",
  "nvidia.com/gpu.engines.jpeg": "0",
  "nvidia.com/gpu.engines.ofa": "0",
  "nvidia.com/gpu.family": "ampere",
  "nvidia.com/gpu.machine": "NVIDIA DGX",
  "nvidia.com/gpu.memory": "4864",
  "nvidia.com/gpu.multiprocessors": "14",
  "nvidia.com/gpu.product": "A100-SXM4-40GB-MIG-1g.5gb",
  "nvidia.com/gpu.slices.ci": "1",
  "nvidia.com/gpu.slices.gi": "1",
  "nvidia.com/mig.strategy": "single"
}
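Because nvidia.com/gpu now refers to MIG slices under this strategy, the gpu-feature-discovery labels above can steer Pods to nodes with a specific MIG geometry. A sketch using a nodeSelector (the Pod name is hypothetical; the label value matches the output above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-single-selector-example   # hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    # Label published by gpu-feature-discovery under the single strategy
    nvidia.com/gpu.product: A100-SXM4-40GB-MIG-1g.5gb
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/gpu: 1   # one MIG 1g.5gb slice under the single strategy
```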

7. Deploy 7 Pods, each consuming one MIG device (then read their logs and delete them):

for i in $(seq 7); do
   kubectl run \
      --image=nvidia/cuda:11.0-base \
      --restart=Never \
      --limits=nvidia.com/gpu=1 \
      mig-single-example-${i} -- bash -c "nvidia-smi -L; sleep infinity"
done
pod/mig-single-example-1 created
pod/mig-single-example-2 created
pod/mig-single-example-3 created
pod/mig-single-example-4 created
pod/mig-single-example-5 created
pod/mig-single-example-6 created
pod/mig-single-example-7 created
for i in $(seq 7); do
   echo "mig-single-example-${i}";
   kubectl logs mig-single-example-${i}
   echo "";
done
mig-single-example-1
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
   MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/7/0)

mig-single-example-2
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
   MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)

...
for i in $(seq 7); do
   kubectl delete pod mig-single-example-${i};
done
pod "mig-single-example-1" deleted
pod "mig-single-example-2" deleted
...

2.5 The mixed Strategy

The mixed strategy is designed to enumerate a different resource type for every MIG device configuration available in the cluster.

2.5.1 Testing

To test this strategy, we check that all MIG devices are enumerated under their fully qualified names of the form nvidia.com/mig-<slice_count>g.<memory_size>gb. The test assumes a single GPU on a single node in the cluster, with MIG already enabled.

1. Verify that MIG is enabled on the GPU and that no MIG devices yet exist:

nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:00:04.0 Off |                   On |
| N/A   32C    P0    43W / 400W |      0MiB / 40537MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  No MIG devices found                                                       |
+-----------------------------------------------------------------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

2. Create 3 MIG devices of different sizes on the GPU:

sudo nvidia-smi mig -cgi 9,14,19 -C
nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
  MIG 3g.20gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/2/0)
  MIG 2g.10gb Device 1: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/3/0)
  MIG 1g.5gb Device 2: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)
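The numeric arguments to `nvidia-smi mig -cgi` are GPU-instance profile IDs, which `nvidia-smi mig -lgip` lists per GPU. As a sketch, a small helper mapping the IDs used in this document to their profile names (the mapping is assumed from an A100-SXM4-40GB and is not exhaustive; other GPUs differ):

```shell
# Map GPU-instance profile IDs (as passed to `nvidia-smi mig -cgi`) to
# MIG profile names, as assumed for an A100-SXM4-40GB.
profile_name() {
  case "$1" in
    0)  echo "7g.40gb" ;;
    5)  echo "4g.20gb" ;;
    9)  echo "3g.20gb" ;;
    14) echo "2g.10gb" ;;
    19) echo "1g.5gb" ;;
    *)  echo "unknown" ;;
  esac
}

# The three IDs used in step 2 above:
for id in 9 14 19; do
  echo "$id -> $(profile_name "$id")"
done
```

This matches the `nvidia-smi -L` output above: `-cgi 9,14,19` yields one 3g.20gb, one 2g.10gb, and one 1g.5gb device.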

3. Start nvidia-device-plugin with the mixed strategy, as described in the previous section. If the plugin is already running, restart it.

4. Observe that 3 MIG devices are available on the node, each under its own resource type:

kubectl describe node
...
Capacity:
nvidia.com/mig-1g.5gb:   1
nvidia.com/mig-2g.10gb:  1
nvidia.com/mig-3g.20gb:  1
...
Allocatable:
nvidia.com/mig-1g.5gb:   1
nvidia.com/mig-2g.10gb:  1
nvidia.com/mig-3g.20gb:  1
...

5. Start gpu-feature-discovery with the mixed strategy, as described in the previous section. If it is already running, restart it.

6. Observe that the correct set of labels has been applied for this MIG strategy:

kubectl get node -o json | \
   jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
{
  "nvidia.com/cuda.driver.major": "450",
  "nvidia.com/cuda.driver.minor": "80",
  "nvidia.com/cuda.driver.rev": "02",
  "nvidia.com/cuda.runtime.major": "11",
  "nvidia.com/cuda.runtime.minor": "0",
  "nvidia.com/gfd.timestamp": "1605658841",
  "nvidia.com/gpu.compute.major": "8",
  "nvidia.com/gpu.compute.minor": "0",
  "nvidia.com/gpu.count": "1",
  "nvidia.com/gpu.family": "ampere",
  "nvidia.com/gpu.machine": "NVIDIA DGX",
  "nvidia.com/gpu.memory": "40537",
  "nvidia.com/gpu.product": "A100-SXM4-40GB",
  "nvidia.com/mig-1g.5gb.count": "1",
  "nvidia.com/mig-1g.5gb.engines.copy": "1",
  "nvidia.com/mig-1g.5gb.engines.decoder": "0",
  "nvidia.com/mig-1g.5gb.engines.encoder": "0",
  "nvidia.com/mig-1g.5gb.engines.jpeg": "0",
  "nvidia.com/mig-1g.5gb.engines.ofa": "0",
  "nvidia.com/mig-1g.5gb.memory": "4864",
  "nvidia.com/mig-1g.5gb.multiprocessors": "14",
  "nvidia.com/mig-1g.5gb.slices.ci": "1",
  "nvidia.com/mig-1g.5gb.slices.gi": "1",
  "nvidia.com/mig-2g.10gb.count": "1",
  "nvidia.com/mig-2g.10gb.engines.copy": "2",
  "nvidia.com/mig-2g.10gb.engines.decoder": "1",
  "nvidia.com/mig-2g.10gb.engines.encoder": "0",
  "nvidia.com/mig-2g.10gb.engines.jpeg": "0",
  "nvidia.com/mig-2g.10gb.engines.ofa": "0",
  "nvidia.com/mig-2g.10gb.memory": "9984",
  "nvidia.com/mig-2g.10gb.multiprocessors": "28",
  "nvidia.com/mig-2g.10gb.slices.ci": "2",
  "nvidia.com/mig-2g.10gb.slices.gi": "2",
  "nvidia.com/mig-3g.20gb.count": "1",
  "nvidia.com/mig-3g.20gb.engines.copy": "3",
  "nvidia.com/mig-3g.20gb.engines.decoder": "2",
  "nvidia.com/mig-3g.20gb.engines.encoder": "0",
  "nvidia.com/mig-3g.20gb.engines.jpeg": "0",
  "nvidia.com/mig-3g.20gb.engines.ofa": "0",
  "nvidia.com/mig-3g.20gb.memory": "20096",
  "nvidia.com/mig-3g.20gb.multiprocessors": "42",
  "nvidia.com/mig-3g.20gb.slices.ci": "3",
  "nvidia.com/mig-3g.20gb.slices.gi": "3",
  "nvidia.com/mig.strategy": "mixed"
}

7. Deploy 3 Pods, each consuming one of the available MIG devices:

kubectl run -it --rm \
   --image=nvidia/cuda:11.0-base \
   --restart=Never \
   --limits=nvidia.com/mig-1g.5gb=1 \
   mig-mixed-example -- nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
  MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)
pod "mig-mixed-example" deleted
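Step 7 shows only the 1g.5gb Pod; the 2g.10gb and 3g.20gb devices are consumed the same way with the matching resource name. As a sketch, all three could be requested from one manifest (Pod names are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-mixed-1g   # hypothetical names throughout
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: mig-mixed-2g
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/mig-2g.10gb: 1
---
apiVersion: v1
kind: Pod
metadata:
  name: mig-mixed-3g
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/mig-3g.20gb: 1
```

Each container requests exactly one resource type, which keeps the manifest within the single-device-type rule noted in section 2.2.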
