Chapter 1: Introduction
The Multi-Instance GPU (MIG) feature allows a GPU (such as the NVIDIA A100) to be securely partitioned into multiple separate GPU instances for CUDA applications. For example, the NVIDIA A100 supports up to seven separate GPU instances.
MIG provides multiple users with separate GPU resources for optimal GPU utilization. The feature is particularly beneficial for workloads that do not fully saturate the GPU's compute capacity, where users may want to run different workloads in parallel to maximize utilization.
This document provides an overview of the software needed to enable MIG support in Kubernetes. For more details on the technical concepts, setting up MIG, and using the NVIDIA Container Toolkit to run containers with MIG, refer to the MIG User Guide.
The deployment workflow requires the following prerequisites:
- You have installed the NVIDIA R450+ datacenter driver (450.80.02+) for the NVIDIA A100.
- You have installed the NVIDIA Container Toolkit v2.5.0+.
- You have a Kubernetes deployment up and running with access to at least one NVIDIA A100 GPU.
Once these prerequisites are met, you can proceed to deploy a MIG-capable version of the NVIDIA k8s-device-plugin in your cluster and, optionally, the gpu-feature-discovery component, so that Kubernetes can schedule Pods on the available MIG devices.
The minimum versions of the required software components are listed below:
- NVIDIA R450+ datacenter driver: 450.80.02+
- NVIDIA Container Toolkit (nvidia-docker2): v2.5.0+
- NVIDIA k8s-device-plugin: v0.7.0+
- NVIDIA gpu-feature-discovery: v0.2.0+
Chapter 2: MIG Strategies
NVIDIA provides two strategies for exposing MIG devices on a Kubernetes node. For more details on the strategies, refer to the design document.
2.1 Using MIG Strategies in Kubernetes
This section walks through the steps required to deploy and run the k8s-device-plugin and gpu-feature-discovery components to support the various MIG strategies. The recommended deployment method is via Helm.
For alternative deployment methods, see the installation instructions in the following GitHub repositories:
- k8s-device-plugin
- gpu-feature-discovery
First, add the Helm repositories for nvidia-device-plugin and gpu-feature-discovery:
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo add nvgfd https://nvidia.github.io/gpu-feature-discovery
helm repo update
Then verify that v0.7.0 of nvidia-device-plugin and v0.2.0 of gpu-feature-discovery are available:
helm search repo nvdp --devel
NAME CHART VERSION APP VERSION DESCRIPTION
nvdp/nvidia-device-plugin 0.7.0 0.7.0 A Helm chart for ...
helm search repo nvgfd --devel
NAME CHART VERSION APP VERSION DESCRIPTION
nvgfd/gpu-feature-discovery 0.2.0 0.2.0 A Helm chart for ...
Finally, select a MIG strategy and deploy the nvidia-device-plugin and gpu-feature-discovery components:
export MIG_STRATEGY=<none | single | mixed>
helm install \
--version=0.7.0 \
--generate-name \
--set migStrategy=${MIG_STRATEGY} \
nvdp/nvidia-device-plugin
helm install \
--version=0.2.0 \
--generate-name \
--set migStrategy=${MIG_STRATEGY} \
nvgfd/gpu-feature-discovery
2.2 Testing with Different Strategies
This section walks through the steps required to test each of the MIG strategies.
Note:
With the default settings of the mixed strategy, a container can request only one device type at a time. If it requests more than one device type, the devices it receives are undefined. For example, a container cannot request both nvidia.com/gpu and nvidia.com/mig-3g.20gb at the same time. However, it can request multiple instances of the same resource type without restriction (for example, nvidia.com/gpu: 2 or nvidia.com/mig-3g.20gb: 2).
To mitigate this behavior, we recommend following the guidance outlined in the documentation.
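The note above can be made concrete with a Pod manifest. A minimal sketch (the pod name is illustrative; adjust the MIG profile to one your node actually exposes):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-request-example   # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/mig-3g.20gb: 1   # request exactly one MIG resource type
```

Requesting two different device types (for example nvidia.com/gpu and nvidia.com/mig-3g.20gb) in the same container would make the devices received undefined, as noted above.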
2.3 The none Strategy
The none strategy is designed to keep nvidia-device-plugin running unchanged. The plugin makes no distinction between GPUs with MIG enabled and those without; it enumerates all GPUs on the node and makes them available through the nvidia.com/gpu resource type.
2.3.1 Testing
To test this strategy, we check the enumeration of a GPU with MIG enabled and with MIG disabled, and ensure the GPU is visible in both cases. The test assumes a single GPU on a single node in the cluster.
1. Verify that MIG is disabled on the GPU:
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100-SXM4-40GB Off | 00000000:36:00.0 Off | 0 |
| N/A 29C P0 62W / 400W | 0MiB / 40537MiB | 6% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
2. Start nvidia-device-plugin with the none strategy, as described in the previous section. If the plugin is already running, restart it.
3. Observe that 1 GPU is available on the node under the nvidia.com/gpu resource type:
kubectl describe node
...
Capacity:
nvidia.com/gpu: 1
...
Allocatable:
nvidia.com/gpu: 1
...
4. Start gpu-feature-discovery with the none strategy, as described in the previous section. If it is already running, restart it.
5. Observe that the correct set of labels has been applied for this MIG strategy:
kubectl get node -o json | \
jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
{
"nvidia.com/cuda.driver.major": "450",
"nvidia.com/cuda.driver.minor": "80",
"nvidia.com/cuda.driver.rev": "02",
"nvidia.com/cuda.runtime.major": "11",
"nvidia.com/cuda.runtime.minor": "0",
"nvidia.com/gfd.timestamp": "1605312111",
"nvidia.com/gpu.compute.major": "8",
"nvidia.com/gpu.compute.minor": "0",
"nvidia.com/gpu.count": "1",
"nvidia.com/gpu.family": "ampere",
"nvidia.com/gpu.machine": "NVIDIA DGX",
"nvidia.com/gpu.memory": "40537",
"nvidia.com/gpu.product": "A100-SXM4-40GB"
}
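Beyond scheduling by resource type, the labels above can steer Pods to nodes with particular GPU properties via a nodeSelector. A minimal sketch, assuming the labels shown above have been applied (the pod name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-selector-example   # illustrative name
spec:
  restartPolicy: Never
  nodeSelector:
    nvidia.com/gpu.product: A100-SXM4-40GB   # only nodes exposing this GPU product
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/gpu: 1
```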
6. Deploy a Pod to consume the GPU and run nvidia-smi:
kubectl run -it --rm \
--image=nvidia/cuda:11.0-base \
--restart=Never \
--limits=nvidia.com/gpu=1 \
mig-none-example -- nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-15f0798d-c807-231d-6525-a7827081f0f1)
7. Enable MIG on the GPU (all GPU clients must be stopped first):
sudo systemctl stop kubelet
sudo nvidia-smi -mig 1
Enabled MIG Mode for GPU 00000000:36:00.0
All done.
nvidia-smi --query-gpu=mig.mode.current --format=csv,noheader
Enabled
8. Restart the kubelet and the plugin:
sudo systemctl start kubelet
9. Observe again that 1 GPU is available on the node under the nvidia.com/gpu resource type:
kubectl describe node
...
Capacity:
nvidia.com/gpu: 1
...
Allocatable:
nvidia.com/gpu: 1
...
10. Observe that the labels have not changed:
kubectl get node -o json | \
jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
{
"nvidia.com/cuda.driver.major": "450",
"nvidia.com/cuda.driver.minor": "80",
"nvidia.com/cuda.driver.rev": "02",
"nvidia.com/cuda.runtime.major": "11",
"nvidia.com/cuda.runtime.minor": "0",
"nvidia.com/gfd.timestamp": "1605312111",
"nvidia.com/gpu.compute.major": "8",
"nvidia.com/gpu.compute.minor": "0",
"nvidia.com/gpu.count": "1",
"nvidia.com/gpu.family": "ampere",
"nvidia.com/gpu.machine": "NVIDIA DGX",
"nvidia.com/gpu.memory": "40537",
"nvidia.com/gpu.product": "A100-SXM4-40GB"
}
11. Deploy a Pod again to consume the GPU and run nvidia-smi:
kubectl run -it --rm \
--image=nvidia/cuda:11.0-base \
--restart=Never \
--limits=nvidia.com/gpu=1 \
mig-none-example -- nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-15f0798d-c807-231d-6525-a7827081f0f1)
2.4 The single Strategy
The single strategy is designed to keep the experience of consuming GPUs in Kubernetes the same as before. MIG devices are enumerated under the traditional nvidia.com/gpu resource type, just as they always have been. However, the properties associated with that resource type now map to the MIG devices available on the node, rather than to full GPUs.
2.4.1 Testing
To test this strategy, we check that a single type of MIG device is enumerated under the traditional nvidia.com/gpu resource type. The test assumes a single GPU with MIG enabled on a single node in the cluster.
1. Verify that MIG is enabled on the GPU and that no MIG devices exist:
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100-SXM4-40GB On | 00000000:00:04.0 Off | On |
| N/A 32C P0 43W / 400W | 0MiB / 40537MiB | N/A Default |
| | | Enabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| No MIG devices found |
+-----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
2. Create 7 single-slice MIG devices on the GPU:
sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C
nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/7/0)
MIG 1g.5gb Device 1: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/8/0)
MIG 1g.5gb Device 2: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)
MIG 1g.5gb Device 3: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/10/0)
MIG 1g.5gb Device 4: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/11/0)
MIG 1g.5gb Device 5: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/12/0)
MIG 1g.5gb Device 6: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/13/0)
3. Start nvidia-device-plugin with the single strategy, as described in the previous section. If the plugin is already running, restart it.
4. Observe that 7 MIG devices are available on the node under the nvidia.com/gpu resource type:
kubectl describe node
...
Capacity:
nvidia.com/gpu: 7
...
Allocatable:
nvidia.com/gpu: 7
...
5. Start gpu-feature-discovery with the single strategy, as described in the previous section. If it is already running, restart it.
6. Observe that the correct set of labels has been applied for this MIG strategy:
kubectl get node -o json | \
jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
{
"nvidia.com/cuda.driver.major": "450",
"nvidia.com/cuda.driver.minor": "80",
"nvidia.com/cuda.driver.rev": "02",
"nvidia.com/cuda.runtime.major": "11",
"nvidia.com/cuda.runtime.minor": "0",
"nvidia.com/gfd.timestamp": "1605657366",
"nvidia.com/gpu.compute.major": "8",
"nvidia.com/gpu.compute.minor": "0",
"nvidia.com/gpu.count": "7",
"nvidia.com/gpu.engines.copy": "1",
"nvidia.com/gpu.engines.decoder": "0",
"nvidia.com/gpu.engines.encoder": "0",
"nvidia.com/gpu.engines.jpeg": "0",
"nvidia.com/gpu.engines.ofa": "0",
"nvidia.com/gpu.family": "ampere",
"nvidia.com/gpu.machine": "NVIDIA DGX",
"nvidia.com/gpu.memory": "4864",
"nvidia.com/gpu.multiprocessors": "14",
"nvidia.com/gpu.product": "A100-SXM4-40GB-MIG-1g.5gb",
"nvidia.com/gpu.slices.ci": "1",
"nvidia.com/gpu.slices.gi": "1",
"nvidia.com/mig.strategy": "single"
}
7. Deploy 7 Pods, each consuming one MIG device (then read their logs and delete them):
for i in $(seq 7); do
kubectl run \
--image=nvidia/cuda:11.0-base \
--restart=Never \
--limits=nvidia.com/gpu=1 \
mig-single-example-${i} -- bash -c "nvidia-smi -L; sleep infinity"
done
pod/mig-single-example-1 created
pod/mig-single-example-2 created
pod/mig-single-example-3 created
pod/mig-single-example-4 created
pod/mig-single-example-5 created
pod/mig-single-example-6 created
pod/mig-single-example-7 created
for i in $(seq 7); do
echo "mig-single-example-${i}";
kubectl logs mig-single-example-${i}
echo "";
done
mig-single-example-1
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/7/0)
mig-single-example-2
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)
...
for i in $(seq 7); do
kubectl delete pod mig-single-example-${i};
done
pod "mig-single-example-1" deleted
pod "mig-single-example-2" deleted
...
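The kubectl run loop above can also be written declaratively. A sketch of one such Pod manifest (the name mirrors the loop above; under the single strategy the nvidia.com/gpu limit maps to one MIG device):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-single-example-1   # mirrors the loop above
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["bash", "-c", "nvidia-smi -L; sleep infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1   # one MIG device under the single strategy
```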
2.5 The mixed Strategy
The mixed strategy is designed to enumerate a different resource type for each MIG device configuration available in the cluster.
2.5.1 Testing
To test this strategy, we check that all MIG devices are enumerated under their fully qualified names of the form nvidia.com/mig-<slice_count>g.<memory_size>gb. The test assumes a single GPU with MIG enabled on a single node in the cluster.
1. Verify that MIG is enabled on the GPU and that no MIG devices exist:
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100-SXM4-40GB On | 00000000:00:04.0 Off | On |
| N/A 32C P0 43W / 400W | 0MiB / 40537MiB | N/A Default |
| | | Enabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG|
| | | ECC| |
|==================+======================+===========+=======================|
| No MIG devices found |
+-----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
2. Create 3 MIG devices of different sizes on the GPU:
sudo nvidia-smi mig -cgi 9,14,19 -C
nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
MIG 3g.20gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/2/0)
MIG 2g.10gb Device 1: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/3/0)
MIG 1g.5gb Device 2: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)
3. Start nvidia-device-plugin with the mixed strategy, as described in the previous section. If the plugin is already running, restart it.
4. Observe that 3 MIG devices are available on the node, each under its own resource type:
kubectl describe node
...
Capacity:
nvidia.com/mig-1g.5gb: 1
nvidia.com/mig-2g.10gb: 1
nvidia.com/mig-3g.20gb: 1
...
Allocatable:
nvidia.com/mig-1g.5gb: 1
nvidia.com/mig-2g.10gb: 1
nvidia.com/mig-3g.20gb: 1
...
5. Start gpu-feature-discovery with the mixed strategy, as described in the previous section. If it is already running, restart it.
6. Observe that the correct set of labels has been applied for this MIG strategy:
kubectl get node -o json | \
jq '.items[0].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'
{
"nvidia.com/cuda.driver.major": "450",
"nvidia.com/cuda.driver.minor": "80",
"nvidia.com/cuda.driver.rev": "02",
"nvidia.com/cuda.runtime.major": "11",
"nvidia.com/cuda.runtime.minor": "0",
"nvidia.com/gfd.timestamp": "1605658841",
"nvidia.com/gpu.compute.major": "8",
"nvidia.com/gpu.compute.minor": "0",
"nvidia.com/gpu.count": "1",
"nvidia.com/gpu.family": "ampere",
"nvidia.com/gpu.machine": "NVIDIA DGX",
"nvidia.com/gpu.memory": "40537",
"nvidia.com/gpu.product": "A100-SXM4-40GB",
"nvidia.com/mig-1g.5gb.count": "1",
"nvidia.com/mig-1g.5gb.engines.copy": "1",
"nvidia.com/mig-1g.5gb.engines.decoder": "0",
"nvidia.com/mig-1g.5gb.engines.encoder": "0",
"nvidia.com/mig-1g.5gb.engines.jpeg": "0",
"nvidia.com/mig-1g.5gb.engines.ofa": "0",
"nvidia.com/mig-1g.5gb.memory": "4864",
"nvidia.com/mig-1g.5gb.multiprocessors": "14",
"nvidia.com/mig-1g.5gb.slices.ci": "1",
"nvidia.com/mig-1g.5gb.slices.gi": "1",
"nvidia.com/mig-2g.10gb.count": "1",
"nvidia.com/mig-2g.10gb.engines.copy": "2",
"nvidia.com/mig-2g.10gb.engines.decoder": "1",
"nvidia.com/mig-2g.10gb.engines.encoder": "0",
"nvidia.com/mig-2g.10gb.engines.jpeg": "0",
"nvidia.com/mig-2g.10gb.engines.ofa": "0",
"nvidia.com/mig-2g.10gb.memory": "9984",
"nvidia.com/mig-2g.10gb.multiprocessors": "28",
"nvidia.com/mig-2g.10gb.slices.ci": "2",
"nvidia.com/mig-2g.10gb.slices.gi": "2",
"nvidia.com/mig-3g.20gb.count": "1",
"nvidia.com/mig-3g.20gb.engines.copy": "3",
"nvidia.com/mig-3g.20gb.engines.decoder": "2",
"nvidia.com/mig-3g.20gb.engines.encoder": "0",
"nvidia.com/mig-3g.20gb.engines.jpeg": "0",
"nvidia.com/mig-3g.20gb.engines.ofa": "0",
"nvidia.com/mig-3g.20gb.memory": "20096",
"nvidia.com/mig-3g.20gb.multiprocessors": "42",
"nvidia.com/mig-3g.20gb.slices.ci": "3",
"nvidia.com/mig-3g.20gb.slices.gi": "3",
"nvidia.com/mig.strategy": "mixed"
}
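To combine the mixed-strategy resource types with the labels above, a Pod can both request a specific MIG device and constrain scheduling to nodes running the mixed strategy. A minimal sketch (the pod name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-mixed-selector-example   # illustrative name
spec:
  restartPolicy: Never
  nodeSelector:
    nvidia.com/mig.strategy: mixed   # only nodes running the mixed strategy
  containers:
  - name: cuda
    image: nvidia/cuda:11.0-base
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1   # one 1g.5gb MIG device
```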
7. Deploy 3 Pods, each consuming one of the available MIG devices (only the first is shown below):
kubectl run -it --rm \
--image=nvidia/cuda:11.0-base \
--restart=Never \
--limits=nvidia.com/mig-1g.5gb=1 \
mig-mixed-example -- nvidia-smi -L
GPU 0: A100-SXM4-40GB (UUID: GPU-4200ccc0-2667-d4cb-9137-f932c716232a)
MIG 1g.5gb Device 0: (UUID: MIG-GPU-4200ccc0-2667-d4cb-9137-f932c716232a/9/0)
pod "mig-mixed-example" deleted