message GPUOptions {
// A value between 0 and 1 that indicates what fraction of the
// available GPU memory to pre-allocate for each process. 1 means
// to pre-allocate all of the GPU memory, 0.5 means the process
// allocates ~50% of the available GPU memory.
double per_process_gpu_memory_fraction = 1;
// The type of GPU allocation strategy to use.
//
// Allowed values:
// "": The empty string (default) uses a system-chosen default
//     which may change over time.
//
// "BFC": A "Best-fit with coalescing" algorithm, simplified from a
//        version of dlmalloc.
string allocator_type = 2;
// Delay deletion of up to this many bytes to reduce the number of
// interactions with GPU driver code. If 0, the system chooses
// a reasonable default (several MBs).
int64 deferred_deletion_bytes = 3;
// If true, the allocator does not pre-allocate the entire specified
// GPU memory region, instead starting small and growing as needed.
bool allow_growth = 4;
// A comma-separated list of GPU ids that determines the 'visible'
// to 'virtual' mapping of GPU devices. For example, if TensorFlow
// can see 8 GPU devices in the process, and one wanted to map
// visible GPU devices 5 and 3 as "/gpu:0" and "/gpu:1", then one
// would specify this field as "5,3". This field is similar in
// spirit to the CUDA_VISIBLE_DEVICES environment variable, except
// it applies to the visible GPU devices in the process.
//
// NOTE: The GPU driver provides the process with the visible GPUs
// in an order which is not guaranteed to have any correlation to
// the *physical* GPU id in the machine. This field is used for
// remapping "visible" to "virtual", which means this operates only
// after the process starts. Users are required to use vendor
// specific mechanisms (e.g., CUDA_VISIBLE_DEVICES) to control the
// physical to visible device mapping prior to invoking TensorFlow.
//
// In short: a comma-separated list of GPU ids that determines which
// GPU devices are visible to the process.
string visible_device_list = 5;
// In the event polling loop sleep this many microseconds between
// PollEvents calls, when the queue is not empty. If value is not
// set or set to 0, gets set to a non-zero default.
int32 polling_active_delay_usecs = 6;
// In the event polling loop sleep this many milliseconds between
// PollEvents calls, when the queue is empty. If value is not
// set or set to 0, gets set to a non-zero default.
int32 polling_inactive_delay_msecs = 7;
// Force all tensors to be gpu_compatible. On a GPU-enabled TensorFlow,
// enabling this option forces all CPU tensors to be allocated with Cuda
// pinned memory. Normally, TensorFlow will infer which tensors should be
// allocated as pinned memory. But in cases where the inference is
// incomplete, this option can significantly speed up cross-device memory
// copies, as long as the tensors fit in memory.
// Note that this option is not something that should be
// enabled by default for unknown or very large models: all Cuda pinned
// memory is unpageable, so having too much pinned memory might negatively
// impact the overall host system performance.
bool force_gpu_compatible = 8;
};
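The 'visible' to 'virtual' remapping described in the visible_device_list comment can be illustrated with a small standalone sketch. The helper virtual_device_map below is hypothetical and is not a TensorFlow API; it only mimics the mapping from the comment. Its handling of an empty string (identity mapping) and its rejection of duplicate or out-of-range ids are assumptions made for the illustration.

```python
def virtual_device_map(visible_device_list, num_visible):
    """Return {virtual device name: visible GPU id} for a list such as "5,3".

    Hypothetical helper for illustration only; not a TensorFlow API.
    """
    if not visible_device_list:
        # Assumption: an empty list keeps every visible GPU in its original order.
        ids = list(range(num_visible))
    else:
        ids = [int(token) for token in visible_device_list.split(",")]

    # Assumption: ids must refer to devices the process can actually see.
    for gpu_id in ids:
        if not 0 <= gpu_id < num_visible:
            raise ValueError("GPU id %d is not among the %d visible devices"
                             % (gpu_id, num_visible))
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate GPU id in %r" % visible_device_list)

    # Position in the list becomes the virtual index: first entry -> /gpu:0.
    return {"/gpu:%d" % virtual: visible for virtual, visible in enumerate(ids)}


mapping = virtual_device_map("5,3", num_visible=8)
print(mapping)  # {'/gpu:0': 5, '/gpu:1': 3}
```

With 8 visible devices and the field set to "5,3", visible GPU 5 is addressed as "/gpu:0" and visible GPU 3 as "/gpu:1", matching the example in the field comment.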
Usage
By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process.
In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process. TensorFlow provides two Config options on the Session to control this.
The first is the allow_growth option, which attempts to allocate only as much GPU memory as runtime allocations require: it starts out allocating very little memory, and as Sessions get run and more GPU memory is needed, the GPU memory region used by the TensorFlow process is extended.
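A minimal sketch of enabling allow_growth, assuming the TensorFlow 1.x Session API (reachable in 2.x through tensorflow.compat.v1). The wrapper function growth_config is a hypothetical name used only for this illustration; the import is placed inside it so the snippet can be loaded on a machine without TensorFlow.

```python
def growth_config():
    """Build a session config whose GPU allocator starts small and grows."""
    # Imported lazily so this sketch loads even where TensorFlow is absent.
    # Assumption: TensorFlow 1.x, or 2.x through its compat.v1 module.
    import tensorflow.compat.v1 as tf

    config = tf.ConfigProto()
    # Field 4 of GPUOptions: do not pre-allocate the whole memory region.
    config.gpu_options.allow_growth = True
    return config
```

The returned object would then be passed as the config argument when creating the session, for example tf.Session(config=growth_config()).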
The second is the per_process_gpu_memory_fraction option, which sets the fraction of each visible GPU's total memory that the process should allocate.
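The following sketch shows how per_process_gpu_memory_fraction might be set, again assuming the TensorFlow 1.x Session API (tensorflow.compat.v1 in 2.x). Both function names are hypothetical. The range check follows the "between 0 and 1" wording of the field comment and is this sketch's own choice, and approx_reserved_bytes is only a rough estimate, since the field comment itself describes the amount as approximate.

```python
def fraction_config(fraction):
    """Build a session config that caps GPU memory at `fraction` per GPU."""
    # Sketch-level validation based on the field comment (a value between 0 and 1).
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1], got %r" % (fraction,))

    # Imported lazily so this sketch loads even where TensorFlow is absent.
    # Assumption: TensorFlow 1.x, or 2.x through its compat.v1 module.
    import tensorflow.compat.v1 as tf

    config = tf.ConfigProto()
    # Field 1 of GPUOptions.
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    return config


def approx_reserved_bytes(total_bytes, fraction):
    """Rough estimate of the bytes the process pre-allocates on one GPU."""
    return int(total_bytes * fraction)
```

For instance, on a hypothetical GPU with 12 GiB of memory, a fraction of 0.4 corresponds to roughly 4.8 GiB per process. The config would be passed to the session in the usual way, for example tf.Session(config=fraction_config(0.4)).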