TensorFlow显存设置

最新推荐文章于 2024-07-17 21:20:07 发布

齐豪

最新推荐文章于 2024-07-17 21:20:07 发布

阅读量4.7k

点赞数

分类专栏： DP框架

本文链接：https://blog.csdn.net/qq_33096883/article/details/77479754

版权

DP框架专栏收录该内容

12 篇文章 0 订阅

订阅专栏

源码

message GPUOptions {
  // A value between 0 and 1 that indicates what fraction of the
  // available GPU memory to pre-allocate for each process.  1 means
  // to pre-allocate all of the GPU memory, 0.5 means the process
  // allocates ~50% of the available GPU memory.
  double per_process_gpu_memory_fraction = 1;

  // The type of GPU allocation strategy to use.
  //
  // Allowed values:
  // "": The empty string (default) uses a system-chosen default
  //     which may change over time.
  //
  // "BFC": A "Best-fit with coalescing" algorithm, simplified from a
  //        version of dlmalloc.
  string allocator_type = 2;

  // Delay deletion of up to this many bytes to reduce the number of
  // interactions with gpu driver code.  If 0, the system chooses
  // a reasonable default (several MBs).
  int64 deferred_deletion_bytes = 3;

  // If true, the allocator does not pre-allocate the entire specified
  // GPU memory region, instead starting small and growing as needed.
  bool allow_growth = 4;

  // A comma-separated list of GPU ids that determines the 'visible'
  // to 'virtual' mapping of GPU devices.  For example, if TensorFlow
  // can see 8 GPU devices in the process, and one wanted to map
  // visible GPU devices 5 and 3 as "/gpu:0", and "/gpu:1", then one
  // would specify this field as "5,3".  This field is similar in
  // spirit to the CUDA_VISIBLE_DEVICES environment variable, except
  // it applies to the visible GPU devices in the process.
  //
  // NOTE: The GPU driver provides the process with the visible GPUs
  // in an order which is not guaranteed to have any correlation to
  // the *physical* GPU id in the machine.  This field is used for
  // remapping "visible" to "virtual", which means this operates only
  // after the process starts.  Users are required to use vendor
  // specific mechanisms (e.g., CUDA_VISIBLE_DEVICES) to control the
  // physical to visible device mapping prior to invoking TensorFlow.
  // 用逗号分隔的一组 GPU ID，决定进程可见的 GPU 设备。
  string visible_device_list = 5;

  // In the event polling loop sleep this many microseconds between
  // PollEvents calls, when the queue is not empty.  If value is not
  // set or set to 0, gets set to a non-zero default.
  int32 polling_active_delay_usecs = 6;

  // In the event polling loop sleep this many millisconds between
  // PollEvents calls, when the queue is empty.  If value is not
  // set or set to 0, gets set to a non-zero default.
  int32 polling_inactive_delay_msecs = 7;

  // Force all tensors to be gpu_compatible. On a GPU-enabled TensorFlow,
  // enabling this option forces all CPU tensors to be allocated with Cuda
  // pinned memory. Normally, TensorFlow will infer which tensors should be
  // allocated as the pinned memory. But in case where the inference is
  // incomplete, this option can significantly speed up the cross-device memory
  // copy performance as long as it fits the memory.
  // Note that this option is not something that should be
  // enabled by default for unknown or very large models, since all Cuda pinned
  // memory is unpageable, having too much pinned memory might negatively impact
  // the overall host system performance.
  bool force_gpu_compatible = 8;
};

使用

By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process.
In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process. TensorFlow provides two Config options on the Session to control this.
- The first is the allow_growth option, which attempts to allocate only as much GPU memory based on runtime allocations: it starts out allocating very little memory, and as Sessions get run and more GPU memory is needed, we extend the GPU memory region needed by the TensorFlow process.
- The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated.
使用bfc算法，算法细节详见http://blog.csdn.net/qq_33096883/article/details/76598786.
样例代码

import tensorflow as tf  
import os  
os.environ["CUDA_VISIBLE_DEVICES"] = '0, 1'        #指定前二块GPU可用  
config = tf.ConfigProto()  
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # 程序最多只能占用指定gpu50%的显存  
config.gpu_options.allow_growth = True      #程序按需申请内存  
config.gpu_options.allocator_type = 'BFC'   #使用BFC算法
sess = tf.Session(config = config, ...)


with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)):   
其中allow_soft_placement能让tensorflow遇到无法用GPU跑的数据时，自动切换成CPU进行。
log_device_placement则记录一些日志。