question 2:
1. Creating one TensorFlow device (GPU:0)
Ignoring GPU device (GPU:1)
2019-05-14 11:27:29.060182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2019-05-14 11:27:29.324548: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x1e655b0 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2019-05-14 11:27:29.325686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: Quadro K620
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:02:00.0
Total memory: 1.95GiB
Free memory: 1.56GiB
2019-05-14 11:27:29.325740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2019-05-14 11:27:29.325753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
2019-05-14 11:27:29.325770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1
2019-05-14 11:27:29.325781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y N
2019-05-14 11:27:29.325789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: N Y
2019-05-14 11:27:29.325815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
2019-05-14 11:27:29.325827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1017] Ignoring gpu device (device: 1, name: Quadro K620, pci bus id: 0000:02:00.0) with Cuda multiprocessor count: 3. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
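As the last log line states, TensorFlow ignores the Quadro K620 because it has only 3 CUDA multiprocessors, fewer than the default minimum of 8, so only one TensorFlow device is created. To use the second GPU as well, lower that threshold with the environment variable TF_MIN_GPU_MULTIPROCESSOR_COUNT. You can set TF_MIN_GPU_MULTIPROCESSOR_COUNT=n at the system level; inside a project, set the environment variable from code:
import os
os.environ['TF_MIN_GPU_MULTIPROCESSOR_COUNT'] = '3'  # set this before TensorFlow initializes its GPU devices
Result: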
2019-05-14 11:33:58.389872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
2019-05-14 11:33:58.389880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Quadro K620, pci bus id: 0000:02:00.0)
Reference:
(http://www.idataskys.com/2018/04/25/CentOS7(1708)下基于双显卡的TensorFlow深度学习环境配置/)
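The fix from item 1 can be wrapped in a small helper, shown here as a minimal sketch. The function names `allow_small_gpus` and `list_gpu_names` are hypothetical and not part of TensorFlow. The sketch assumes that setting the variable before `tensorflow` is imported is the safest ordering, and that `device_lib.list_local_devices()` from `tensorflow.python.client` is available in the installed version.

```python
import os


def allow_small_gpus(min_multiprocessors=3):
    """Lower TensorFlow's minimum multiprocessor count (hypothetical helper).

    Call this before TensorFlow initializes its GPU devices; doing it
    before `import tensorflow` is the safest ordering.
    """
    os.environ["TF_MIN_GPU_MULTIPROCESSOR_COUNT"] = str(min_multiprocessors)
    return os.environ["TF_MIN_GPU_MULTIPROCESSOR_COUNT"]


def list_gpu_names():
    """Return the names of the GPU devices TensorFlow created (needs TensorFlow)."""
    # Imported lazily so the environment variable is already in place.
    from tensorflow.python.client import device_lib
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


# 3 matches the multiprocessor count the log reports for the Quadro K620.
allow_small_gpus(3)
```

Calling `list_gpu_names()` afterwards should list both GPUs if the threshold was the only obstacle. The exact name format depends on the TensorFlow version.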
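2. The following code only sets the order in which GPUs are used; it does not solve the problem above. The CUDA_VISIBLE_DEVICES lines are alternatives, so use only one of them.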
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # number the GPU devices from 0 in PCI bus ID order
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only device 0; it is named '/gpu:0'
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only device 1; it is named '/gpu:0'
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # expose devices 0 and 1, named '/gpu:0' and '/gpu:1' respectively
os.environ["CUDA_VISIBLE_DEVICES"] = "1,0"  # expose devices 1 and 0, named '/gpu:0' and '/gpu:1' respectively, so device 1 is used first, then device 0
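The renumbering described in these comments can be illustrated with a small pure-Python sketch. The function `logical_gpu_names` is hypothetical: it only mimics the mapping rule (the i-th listed physical device becomes '/gpu:i') and does not query CUDA or TensorFlow. It assumes the variable holds numeric indices only.

```python
def logical_gpu_names(cuda_visible_devices):
    """Map each logical TensorFlow name to the physical GPU index behind it."""
    # Split the comma-separated list and drop empty entries.
    ids = [s.strip() for s in cuda_visible_devices.split(",") if s.strip()]
    # The position in the list becomes the logical device number.
    return {"/gpu:%d" % i: int(physical) for i, physical in enumerate(ids)}


print(logical_gpu_names("1,0"))  # {'/gpu:0': 1, '/gpu:1': 0}
```

With "1,0", physical device 1 becomes '/gpu:0', which is why code that places work on '/gpu:0' uses device 1 first.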
3. Peer access is not supported between device ordinals 0 and 1: what does this mean, and how can it be fixed?
2019-05-14 11:27:29.325740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2019-05-14 11:27:29.325753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
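This only means that the two GPUs cannot communicate directly: data cannot pass from gpu0 to gpu1, or the other way around, without first going back through the CPU.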