GPU scheduling approaches
Setting the CUDA_VISIBLE_DEVICES environment variable
$ deviceQuery |& grep ^Device
Device 0: "Tesla M2090"
Device 1: "Tesla M2090"
$ CUDA_VISIBLE_DEVICES=0 deviceQuery |& grep ^Device
Device 0: "Tesla M2090"
If this does not take effect, also try setting
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
so that CUDA enumerates devices in the same PCI bus order that nvidia-smi reports. When setting these environment variables from Python, make sure to do it before importing tensorflow or pycuda, as in the sketch below.
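A minimal sketch of the required ordering (the GPU id "0" is a placeholder):

import os

# Both variables must be set before the first import of tensorflow/pycuda,
# because CUDA enumerates devices when the library initializes.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only physical GPU 0

import tensorflow as tf  # this process now sees a single device as /gpu:0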
Note that CUDA_VISIBLE_DEVICES does not necessarily isolate GPUs completely.
Reference: https://stackoverflow.com/a/58445444/6010781
In docker/kubernetes, the environment variable that is usually set is NVIDIA_VISIBLE_DEVICES instead.
If you do not rely on an external scheduling framework such as k8s or YARN 3.x, then before setting the environment variable you must maintain a resource table yourself that records which GPUs on each node are allocated and which are free, along the lines of the sketch below.
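A minimal single-process sketch (GpuTable and its methods are hypothetical names, not an existing library):

import threading

class GpuTable:
    # Tracks free GPU indices per node for a single scheduler process.
    def __init__(self, node_gpus):
        # node_gpus: e.g. {"node1": [0, 1, 2, 3], "node2": [0, 1]}
        self._free = {node: set(gpus) for node, gpus in node_gpus.items()}
        self._lock = threading.Lock()

    def acquire(self, node):
        # Pop a free GPU index on the node, or return None if all are busy.
        with self._lock:
            return self._free[node].pop() if self._free[node] else None

    def release(self, node, gpu):
        with self._lock:
            self._free[node].add(gpu)

table = GpuTable({"node1": [0, 1]})
gpu = table.acquire("node1")
if gpu is not None:
    # launch the task with CUDA_VISIBLE_DEVICES set to this index
    print("launch with CUDA_VISIBLE_DEVICES=%d" % gpu)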
Specifying the GPU directly in code (strongly discouraged)
import tensorflow as tf

# The device string is hardcoded; this op will only ever run on GPU 0.
with tf.device('/gpu:0'):
    a = tf.constant(3.0)

with tf.Session() as sess:
    while True:  # loop forever so the GPU stays occupied
        print(sess.run(a))
As before, there is no way to know which GPUs are already allocated and which are free, and the hardcoded device string makes the code non-portable.
Detect in code which GPUs have enough free memory, then set the environment variable
import subprocess as sp
import os

def mask_unused_gpus(leave_unmasked=1):
    # Hide all but `leave_unmasked` GPUs that have enough free memory.
    ACCEPTABLE_AVAILABLE_MEMORY = 1024  # MiB
    COMMAND = "nvidia-smi --query-gpu=memory.free --format=csv"
    try:
        _output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
        # Skip the CSV header line, then parse each GPU's free-memory value.
        memory_free_info = _output_to_list(sp.check_output(COMMAND.split()))[1:]
        memory_free_values = [int(x.split()[0]) for x in memory_free_info]
        available_gpus = [i for i, x in enumerate(memory_free_values) if x > ACCEPTABLE_AVAILABLE_MEMORY]
        if len(available_gpus) < leave_unmasked:
            raise ValueError('Found only %d usable GPUs in the system' % len(available_gpus))
        os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(map(str, available_gpus[:leave_unmasked]))
    except Exception as e:
        print('"nvidia-smi" is probably not installed. GPUs are not masked.', e)

mask_unused_gpus(2)
Reference: https://stackoverflow.com/a/47998168/6010781
Because free GPU memory changes dynamically, two processes can race and pick the same GPU.
Also, the check only runs after the task has already been placed on a node: a task may miss the idle nodes entirely and end up starving for resources on a busy one.
Request a proportional amount of memory with every GPU request
leewyang commented on Dec 9, 2017:
This is dependent on your Spark setup. For instance, in our case, we run Spark on top of Hadoop/YARN, so YARN is responsible for allocating containers to run the Spark executors (which in turn run the TensorFlow nodes). Unfortunately, YARN currently does not have GPUs as a schedulable resource. Instead, YARN schedules generally on CPU and Memory, so in our case, we use Memory as a proxy for GPU. So, in your example, if we assume that your nodes are 64GB nodes with 4 GPUs each, then we’d schedule a GPU by requesting 16GB of memory. And, if this proxy is consistently used, then a node with all four GPUs in use (i.e. 64GB memory) would not be scheduled for any new executors/containers by YARN.
For example, on a node with 4 GPUs and 64 GB of memory, every request for 1 GPU also requests 16 GB of memory, so no node can ever be assigned more than 4 GPU-requesting tasks; see the sketch below.
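A minimal sketch of the proxy request in PySpark (the figures assume the 64 GB / 4 GPU node above):

from pyspark import SparkConf

# Memory stands in for the GPU: 16 GB per executor means at most
# 4 executors (and hence 4 GPU tasks) fit on a 64 GB node.
conf = (SparkConf()
        .set("spark.executor.memory", "16g")
        .set("spark.executor.instances", "4"))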
This approach is very hacky: it does not prevent OOMs caused by GPU contention between tasks, and mixed workloads lead to wasted resources.
Reference: https://github.com/yahoo/TensorFlowOnSpark/issues/185
Use YARN 3.1.0+
Request GPUs by setting
spark.yarn.driver.resource.yarn.io/gpu.amount
spark.yarn.executor.resource.yarn.io/gpu.amount
YARN does not tell Spark which GPUs it assigned to a given container, so you must also set
spark.{driver/executor}.resource.gpu.discoveryScript
to a resource discovery script that the driver/executor runs at startup to find the resources available to it.
An example script:
#!/usr/bin/env bash
# Print the indices of all GPUs on this host as the JSON Spark expects.
ADDRS=`nvidia-smi --query-gpu=index --format=csv,noheader | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/","/g'`
echo {\"name\": \"gpu\", \"addresses\":[\"$ADDRS\"]}
The script must print JSON in a fixed format, for example
{"name": "gpu", "addresses":["0","1","2","3","4","5","6","7"]}
Use k8s cluster scheduling
k8s automatically sets the NVIDIA_VISIBLE_DEVICES environment variable for every pod, so each pod sees only the GPUs allocated to it and cannot conflict with other pods; a minimal pod spec is sketched below.
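A minimal pod-spec sketch requesting one GPU (assumes the NVIDIA device plugin is deployed on the cluster; the pod name and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  containers:
    - name: trainer
      image: tensorflow/tensorflow:latest-gpu
      resources:
        limits:
          nvidia.com/gpu: 1  # the scheduler places the pod on a node with a free GPU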
Reference: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
NVIDIA Triton
A high-performance inference server.
It lets multiple models share a GPU, but it is for serving only and cannot be used for training.
Triton runs multiple models from the same or different frameworks concurrently on a single GPU or CPU. In a multi-GPU server, it automatically creates an instance of each model on each GPU to increase utilization without extra coding.
Reference: https://developer.nvidia.com/nvidia-triton-inference-server
Summary
GPU scheduling is best delegated to an existing framework such as k8s or YARN (since 3.x).
If only serving is needed, use dedicated servers (with no GPU training jobs) and deploy multiple models with Triton; alternatively, containers + environment variables + tfserving also works.