GPU Scheduling Approaches


Setting the CUDA_VISIBLE_DEVICES environment variable

$ deviceQuery |& grep ^Device
Device 0: "Tesla M2090"
Device 1: "Tesla M2090"
$ CUDA_VISIBLE_DEVICES=0 deviceQuery |& grep ^Device
Device 0: "Tesla M2090"

If this alone does not take effect, also try setting

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

If the environment variables are set from Python, make sure they are set before importing tensorflow or pycuda.
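
A minimal sketch of that ordering (the device index "0" below is just an example):

import os

# Both variables must be set before any CUDA-aware library is imported
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # enumerate GPUs in the same order as nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "0"         # expose only GPU 0 to this process

import tensorflow as tf  # imported afterwards, so it only sees GPU 0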

CUDA_VISIBLE_DEVICES does not necessarily isolate GPUs completely.

Reference: https://stackoverflow.com/a/58445444/6010781

Under Docker/Kubernetes, the variable to set is usually NVIDIA_VISIBLE_DEVICES instead.

Without an external scheduling framework such as k8s or YARN 3.x, you have to maintain your own resource table before setting the environment variables, recording which GPUs on each node are allocated and which are free.
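
A minimal in-memory sketch of such a table (node names and GPU counts are made up; a real deployment would need persistent, concurrency-safe storage shared by all submitters):

# Hypothetical table: node -> {gpu index: allocated?}
gpu_table = {
    "node-a": {0: False, 1: False},
    "node-b": {0: False, 1: False},
}

def allocate_gpu(node):
    """Return a free GPU index on `node` and mark it allocated, or None if the node is full."""
    for idx, busy in gpu_table[node].items():
        if not busy:
            gpu_table[node][idx] = True
            return idx
    return None

def release_gpu(node, idx):
    gpu_table[node][idx] = False

# The chosen index is then exported before launching the job, e.g.
# CUDA_VISIBLE_DEVICES=<index returned by allocate_gpu> python train.py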

Specifying the GPU directly in code (strongly discouraged)

import tensorflow as tf

# Pin the op to GPU 0 explicitly (TensorFlow 1.x API)
with tf.device('/gpu:0'):
    a = tf.constant(3.0)

with tf.Session() as sess:
    while True:
        print(sess.run(a))

As before, there is no way to know which GPUs are allocated or free, and the hardcoded device makes the code non-portable.

Detecting in code which GPUs have enough free memory, then setting the environment variable

import subprocess as sp
import os

def mask_unused_gpus(leave_unmasked=1):
  ACCEPTABLE_AVAILABLE_MEMORY = 1024  # MiB of free memory required to count a GPU as usable
  COMMAND = "nvidia-smi --query-gpu=memory.free --format=csv"

  try:
    _output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
    # Skip the CSV header; each remaining line looks like "11178 MiB", one per GPU
    memory_free_info = _output_to_list(sp.check_output(COMMAND.split()))[1:]
    memory_free_values = [int(x.split()[0]) for x in memory_free_info]
    available_gpus = [i for i, x in enumerate(memory_free_values) if x > ACCEPTABLE_AVAILABLE_MEMORY]

    if len(available_gpus) < leave_unmasked:
      raise ValueError('Found only %d usable GPUs in the system' % len(available_gpus))
    # Expose only the first `leave_unmasked` usable GPUs to this process
    os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(map(str, available_gpus[:leave_unmasked]))
  except Exception as e:
    print('"nvidia-smi" is probably not installed. GPUs are not masked', e)

mask_unused_gpus(2)

Reference: https://stackoverflow.com/a/47998168/6010781
Since free GPU memory changes dynamically, this check can race with other processes and still lead to conflicts.

The check also happens only after the task has been placed, so a task may end up stuck without resources on a busy node even though an idle node was available.

Requesting a proportional amount of memory with every GPU request

leewyang commented on Dec 9, 2017
This is dependent on your Spark setup. For instance, in our case, we run Spark on top of Hadoop/YARN, so YARN is responsible for allocating containers to run the Spark executors (which in turn run the TensorFlow nodes). Unfortunately, YARN currently does not have GPUs as a schedulable resource. Instead, YARN schedules generally on CPU and Memory, so in our case, we use Memory as a proxy for GPU.

So, in your example, if we assume that your nodes are 64GB nodes with 4 GPUs each, then we’d schedule a GPU by requesting 16GB of memory. And, if this proxy is consistently used, then a node with all four GPUs in use (i.e. 64GB memory) would not be scheduled for any new executors/containers by YARN.

For example, on a node with 4 GPUs and 64 GB of memory, every request for 1 GPU also requests 16 GB of memory, so no node can be assigned more than 4 GPU-requesting tasks.
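
A hedged PySpark sketch of this proxy, using the numbers from the example above (the cores setting is only illustrative, and YARN memory overhead is ignored):

from pyspark.sql import SparkSession

# Memory as a GPU proxy: 64 GB node / 4 GPUs => request 16 GB per GPU,
# so YARN can place at most 4 such executors on one node.
spark = (SparkSession.builder
         .appName("gpu-via-memory-proxy")
         .config("spark.executor.memory", "16g")
         .config("spark.executor.cores", "1")
         .getOrCreate())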

This approach is too much of a hack: it cannot prevent OOM caused by tasks colliding on the same GPU, and it wastes resources under mixed workloads.
Reference: https://github.com/yahoo/TensorFlowOnSpark/issues/185

Using YARN 3.1.0+

Request GPU resources by setting

spark.yarn.driver.resource.yarn.io/gpu.amount
spark.yarn.executor.resource.yarn.io/gpu.amount

YARN does not tell Spark which GPUs were assigned to a given container, so a resource discovery script must be supplied via
spark.{driver/executor}.resource.gpu.discoveryScript; the driver/executor runs it at startup to discover the available devices on its own.

An example script:

ADDRS=`nvidia-smi --query-gpu=index --format=csv,noheader | sed -e ':a' -e 'N' -e'$!ba' -e 's/\n/","/g'`
echo {\"name\": \"gpu\", \"addresses\":[\"$ADDRS\"]}

The script must print JSON in a fixed format, for example:

{"name": "gpu", "addresses":["0","1","2","3","4","5","6","7"]}

Reference: https://spark.apache.org/docs/latest/running-on-yarn.html#resource-allocation-and-configuration-overview

Using k8s cluster scheduling

k8s automatically sets the NVIDIA_VISIBLE_DEVICES environment variable for each pod, so every pod only sees the GPUs assigned to it and does not conflict with other pods.
Reference: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
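
For illustration, a trivial check from inside a pod that was granted a GPU:

import os

# The NVIDIA device plugin injects the IDs of the granted GPUs into the container
print(os.environ.get("NVIDIA_VISIBLE_DEVICES"))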

NVIDIA Triton

A high-performance inference server.

It supports multiple models sharing one GPU, but it is a serving solution only and cannot be used for training.

Triton runs multiple models from the same or different frameworks concurrently on a single GPU or CPU. In a multi-GPU server, it automatically creates an instance of each model on each GPU to increase utilization without extra coding.

Reference: https://developer.nvidia.com/nvidia-triton-inference-server

Summary

GPU scheduling is best delegated to an existing framework such as k8s or YARN (3.x and later).

If only serving matters, use dedicated inference servers (with no GPU training jobs) and deploy multiple models with Triton; containers plus environment variables plus TF Serving is another workable approach.
