During initialization, TensorFlow allocates all available memory on all visible GPUs. To restrict it to a subset of GPUs, list the ones you want as a comma-delimited string in the CUDA_VISIBLE_DEVICES environment variable. To cap how much memory it grabs on those GPUs, pass optional arguments to the session constructor.
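Something along these lines should work with the TensorFlow 1.x Session API; the device indices and the memory fraction are just illustrative values:

import os

# Restrict TensorFlow to GPUs 0 and 2 (hypothetical choice).
# This must be set before TensorFlow initializes the GPUs.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"

import tensorflow as tf

# Cap per-process GPU memory, or let it grow on demand instead of
# reserving everything up front.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # use at most ~40% of each visible GPU
# config.gpu_options.allow_growth = True                  # alternative: allocate as needed

sess = tf.Session(config=config)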
Although TensorFlow reserves memory in this way, it will not actually use all of it by default. All ops get placed on a single GPU unless you manually specify which ops go on which devices, which you do by creating those ops inside a Python context manager, e.g. with tf.device("/gpu:0").
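A minimal sketch of manual placement, assuming the 1.x graph/session API:

import tensorflow as tf

# Pin specific ops to specific devices with tf.device.
with tf.device("/gpu:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name="a")
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]], name="b")
    c = tf.matmul(a, b)  # this matmul is placed on GPU 0

with tf.device("/cpu:0"):
    d = tf.reduce_sum(c)  # this reduction runs on the CPU

with tf.Session() as sess:
    print(sess.run(d))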
And if you don't trust my answer, or you suspect future versions of TensorFlow will do something different, you can tell TensorFlow to log device placement and check what happens yourself.
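A small example, again assuming the 1.x API; the placement of each op is printed when the session runs:

import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0], name="a")
b = tf.constant([4.0, 5.0, 6.0], name="b")
c = a * b

# log_device_placement=True reports which device each op was assigned to.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))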