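1 Installing matching versions of CUDA, cuDNN, and TensorFlow
GPU-enabled TensorFlow only works if the correct versions of CUDA and cuDNN are installed.
For installing CUDA and cuDNN themselves, see the NVIDIA website and the many tutorials online; that is not repeated here. The point this article wants to stress is the order of decisions: install a CUDA version that supports your GPU, then the cuDNN version that matches that CUDA version, and finally the TensorFlow version that matches both. Otherwise you end up where the author did: TensorFlow installs fine but cannot use the GPU, and programs run on the CPU only.
1.1 About CUDA:
As of this writing, tensorflow-gpu 1.5 and later require CUDA 9.0.
To check the CUDA version installed on the machine: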
cat /usr/local/cuda/version.txt
Output:
CUDA Version 8.0.61
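The same check can be scripted. The following is a minimal sketch, assuming the file content has the `CUDA Version X.Y.Z` form shown in the output; the function name `parse_cuda_version` is made up for illustration.

```python
import re


def parse_cuda_version(text):
    """Return (major, minor, patch) parsed from version.txt content, or None."""
    match = re.search(r"CUDA Version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


# Sample text copied from the output shown in this section.
sample = "CUDA Version 8.0.61"
print(parse_cuda_version(sample))
```

On a real machine the text would come from reading /usr/local/cuda/version.txt instead of the sample string.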
1.2 About cuDNN:
tensorflow-gpu 1.3 and later require cuDNN v6 or later.
To check the cuDNN version installed on the machine:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
Output:
#define CUDNN_MAJOR 5
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
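The three `#define` lines can also be read programmatically. This is a rough sketch, assuming the header uses exactly the `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` macros seen in the output; `parse_cudnn_version` is a hypothetical helper name.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn.h #define lines, or None."""
    parts = []
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(r"#define\s+CUDNN_%s\s+(\d+)" % name, header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Sample text copied from the grep output shown in this section.
sample = """#define CUDNN_MAJOR 5
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""
major, minor, patch = parse_cudnn_version(sample)
# Same formula as the CUDNN_VERSION macro in the header.
print(major, minor, patch, major * 1000 + minor * 100 + patch)
```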
1.3 About TensorFlow
The output above shows that this machine has CUDA 8 and cuDNN v5 installed. Given these two versions, TensorFlow 1.2 is the right choice; install it with pip:
sudo pip install tensorflow-gpu==1.2
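The two version rules quoted in sections 1.1 and 1.2 can be written down as a small check. This is only a sketch of those two rules as stated in this article, not a complete compatibility table; `check_tensorflow_gpu` and `version_tuple` are illustrative names.

```python
def version_tuple(text):
    """Turn a version string such as '1.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_tensorflow_gpu(tf_version, cuda_major, cudnn_major):
    """Return a list of problems according to the two rules in this article."""
    problems = []
    tf = version_tuple(tf_version)
    # Rule from section 1.1: tensorflow-gpu 1.5 and later expect CUDA 9.0.
    if tf >= (1, 5) and cuda_major != 9:
        problems.append("tensorflow-gpu >= 1.5 expects CUDA 9.0")
    # Rule from section 1.2: tensorflow-gpu 1.3 and later expect cuDNN v6+.
    if tf >= (1, 3) and cudnn_major < 6:
        problems.append("tensorflow-gpu >= 1.3 expects cuDNN v6 or later")
    return problems


# The machine in this article: CUDA 8 and cuDNN v5.
print(check_tensorflow_gpu("1.2", cuda_major=8, cudnn_major=5))
print(check_tensorflow_gpu("1.8", cuda_major=8, cudnn_major=5))
```

An empty list means neither rule is violated, which is the case for TensorFlow 1.2 on this machine.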
If a newer version of TensorFlow was installed earlier, remove it completely with pip first:
sudo pip uninstall tensorflow
To test whether TensorFlow can use the GPU:
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
If the output contains GPU information like the following, the GPU is available:
2018-07-18 11:56:40.180612: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-07-18 11:56:40.180702: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-07-18 11:56:40.180721: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-07-18 11:56:40.180736: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-07-18 11:56:40.180749: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-07-18 11:56:40.406153: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-18 11:56:40.406783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Tesla K20m
major: 3 minor: 5 memoryClockRate (GHz) 0.7055
pciBusID 0000:02:00.0
Total memory: 4.94GiB
Free memory: 4.87GiB
2018-07-18 11:56:40.538537: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x43d07e0 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2018-07-18 11:56:40.538896: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-18 11:56:40.539341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 1 with properties:
name: Quadro K620
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:03:00.0
Total memory: 1.95GiB
Free memory: 1.34GiB
2018-07-18 11:56:40.539420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 0 and 1
2018-07-18 11:56:40.539441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:832] Peer access not supported between device ordinals 1 and 0
2018-07-18 11:56:40.539462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 1
2018-07-18 11:56:40.539477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y N
2018-07-18 11:56:40.539491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 1: N Y
2018-07-18 11:56:40.539538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0)
2018-07-18 11:56:40.539562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1017] Ignoring gpu device (device: 1, name: Quadro K620, pci bus id: 0000:03:00.0) with Cuda multiprocessor count: 3. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0
2018-07-18 11:56:40.614997: I tensorflow/core/common_runtime/direct_session.cc:265] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0
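In a long log the two lines that matter are "Creating TensorFlow device" and "Ignoring gpu device". The sketch below pulls them out, assuming the message wording of the TensorFlow 1.2 log shown here (other versions may word these lines differently); `summarize_gpu_log` is a hypothetical helper.

```python
import re

CREATED = re.compile(
    r"Creating TensorFlow device \((/gpu:\d+)\) -> "
    r"\(device: (\d+), name: ([^,]+),"
)
IGNORED = re.compile(r"Ignoring gpu device \(device: (\d+), name: ([^,]+),")


def summarize_gpu_log(log_text):
    """Split GPUs mentioned in the log into created and ignored devices."""
    summary = {"created": [], "ignored": []}
    for line in log_text.splitlines():
        match = CREATED.search(line)
        if match:
            summary["created"].append(
                (int(match.group(2)), match.group(3), match.group(1))
            )
            continue
        match = IGNORED.search(line)
        if match:
            summary["ignored"].append((int(match.group(1)), match.group(2)))
    return summary


# Two lines copied from the log in this section (timestamps omitted).
sample_log = (
    "Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20m, "
    "pci bus id: 0000:02:00.0)\n"
    "Ignoring gpu device (device: 1, name: Quadro K620, pci bus id: "
    "0000:03:00.0) with Cuda multiprocessor count: 3. "
    "The minimum required count is 8.\n"
)
print(summarize_gpu_log(sample_log))
```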
2 Viewing NVIDIA GPU information and usage
2.1 Viewing graphics card information on Ubuntu:
lspci | grep -i vga
Output:
03:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K620] (rev a2)
2.2 Listing NVIDIA GPUs on Ubuntu:
lspci | grep -i nvidia
Output:
02:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K620] (rev a2)
03:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
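In the output, 02:00.0, 03:00.0, and 03:00.1 are the PCI addresses (bus:device.function) that identify each device; 03:00.1 is the audio function on the same card as 03:00.0.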
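To pick out only the GPU entries from this listing in a script, the lines can be split into address, device class, and description. A minimal sketch, assuming the `address class: description` layout shown in the output; the helper name and the choice of which classes count as GPUs are illustrative.

```python
import re

LINE = re.compile(r"^(\S+) ([^:]+): (.+)$")
# Device classes treated as GPUs in this sketch.
GPU_CLASSES = ("VGA compatible controller", "3D controller")


def parse_lspci_gpus(lspci_text):
    """Return (address, device_class, description) for GPU entries only."""
    gpus = []
    for line in lspci_text.splitlines():
        match = LINE.match(line.strip())
        if match and match.group(2) in GPU_CLASSES:
            gpus.append(match.groups())
    return gpus


# Sample text copied from the lspci output in this section.
sample = """02:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K620] (rev a2)
03:00.1 Audio device: NVIDIA Corporation Device 0fbc (rev a1)
"""
for address, device_class, description in parse_lspci_gpus(sample):
    print(address, "|", device_class, "|", description)
```

The audio function 03:00.1 is filtered out because its class is not in `GPU_CLASSES`.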
To see detailed information about a particular card, use the following command; the first card is used as the example here:
lspci -v -s 02:00.0
Output:
02:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20m] (rev a1)
Subsystem: NVIDIA Corporation Device 1015
Physical Slot: 2
Flags: bus master, fast devsel, latency 0, IRQ 100
Memory at f4000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=32M]
Capabilities: <access denied>
Kernel driver in use: nvidia
2.3 Viewing NVIDIA GPU information and usage on Ubuntu
The NVIDIA driver ships with a command-line tool, nvidia-smi, which shows GPU memory usage among other things:
nvidia-smi
Output:
Wed Jul 18 12:12:07 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          Off  | 00000000:02:00.0 Off |                  Off |
| N/A   47C    P8    16W / 225W |      1MiB /  5061MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K620         Off  | 00000000:03:00.0  On |                  N/A |
| 34%   44C    P8     1W /  30W |    598MiB /  1995MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1785      G   /usr/bin/X                                   336MiB |
|    1      3321      G   compiz                                       252MiB |
|    1     79153      G   /usr/lib/firefox/firefox                       1MiB |
|    1    119338      G   /usr/lib/firefox/firefox                       1MiB |
|    1    120313      G   /usr/lib/firefox/firefox                       1MiB |
+-----------------------------------------------------------------------------+
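What the header fields mean:
- Fan: fan speed, from 0 to 100%. This is the speed the system requests; if the card is not fan-cooled or the fan has failed, N/A is shown;
- Temp: temperature inside the GPU, in degrees Celsius;
- Perf: performance state, from P0 to P12; P0 is maximum performance and P12 is minimum performance;
- Pwr: power draw;
- Bus-Id: the GPU's PCI bus address;
- Disp.A: Display Active, whether a display is initialized on the GPU;
- Memory-Usage: GPU memory in use out of the total;
- Volatile GPU-Util: GPU utilization, which fluctuates over time;
- Compute M.: compute mode;
The Processes table below shows how much GPU memory each process uses on each GPU.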
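For scripts, nvidia-smi can also print selected fields as CSV through its `--query-gpu` and `--format` options, which is easier to parse than the table. The sketch below is one way to read that output; the sample text is typed by hand from the table values in this section rather than captured from a real run, and the function names are illustrative.

```python
import csv
import io
import subprocess

QUERY_FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]


def parse_query_output(text):
    """Parse CSV rows produced with --format=csv,noheader,nounits."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue
        index, name, used, total, util = [field.strip() for field in record]
        # Assumption: every field is numeric. Some GPUs report values such
        # as "[Not Supported]", which this sketch does not handle.
        rows.append({
            "index": int(index),
            "name": name,
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
            "gpu_util_percent": int(util),
        })
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (needs the NVIDIA driver)."""
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return parse_query_output(subprocess.check_output(command).decode())


# Hand-written sample mirroring the table values in this section.
sample = "0, Tesla K20m, 1, 5061, 0\n1, Quadro K620, 598, 1995, 0\n"
for gpu in parse_query_output(sample):
    print(gpu["index"], gpu["name"], gpu["memory_used_mib"], "MiB used")
```

Calling `query_gpus()` on a machine with the driver installed would replace the sample string with live values.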
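2.4 Displaying GPU usage periodically
A snapshot at one moment is often not enough; to keep track of GPU usage over time, the output needs to be refreshed periodically, for example every 10 s. The watch command can run nvidia-smi at a fixed interval.
To see what watch does: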
whatis watch
Output:
watch (1) - execute a program periodically, showing output fullscreen
Purpose: run a command periodically and display its output.
Basic usage of watch:
watch [options] command
The most commonly used option is -n, followed by the number of seconds between runs.
Monitoring GPU memory: here the interval is set to 5 s:
watch -n 5 nvidia-smi
Output:
Every 5,0s: nvidia-smi Wed Jul 18 12:20:24 2018
Wed Jul 18 12:20:24 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          Off  | 00000000:02:00.0 Off |                  Off |
| N/A   43C    P8    16W / 225W |      1MiB /  5061MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K620         Off  | 00000000:03:00.0  On |                  N/A |
| 34%   44C    P8     1W /  30W |    595MiB /  1995MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1785      G   /usr/bin/X                                   336MiB |
|    1      3321      G   compiz                                       248MiB |
|    1     79153      G   /usr/lib/firefox/firefox                       1MiB |
|    1    119338      G   /usr/lib/firefox/firefox                       1MiB |
|    1    120313      G   /usr/lib/firefox/firefox                       1MiB |
+-----------------------------------------------------------------------------+
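When the readings should be collected from inside a Python program rather than watched in a terminal, the same idea is a loop with a sleep between runs. A minimal sketch; the function `watch` and its parameters are invented here and only mimic what `watch -n` does, without the full-screen display.

```python
import time


def watch(action, interval_seconds, iterations, sleep=time.sleep):
    """Call action() a fixed number of times, pausing between calls."""
    results = []
    for i in range(iterations):
        results.append(action())
        # No pause after the final run.
        if i + 1 < iterations:
            sleep(interval_seconds)
    return results


# Possible use on a machine with the NVIDIA driver installed:
#   import subprocess
#   snapshots = watch(lambda: subprocess.check_output(["nvidia-smi"]), 5, 3)
```

The `sleep` parameter exists only so the pause can be replaced in tests; in normal use the default `time.sleep` applies.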
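3 Using a specific GPU
3.1 Using a specific GPU in TensorFlow (CUDA_VISIBLE_DEVICES)
3.1.1 Selecting the GPU when running a Python program from the command line
If the machine has several GPUs, TensorFlow uses all of them by default. To use only some of them, set CUDA_VISIBLE_DEVICES. When launching a Python program, this can be done with: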
CUDA_VISIBLE_DEVICES=1 python example.py
Some usage guidance:
Environment Variable Syntax      Results
CUDA_VISIBLE_DEVICES=1           Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1         Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1"       Same as above, quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3       Devices 0, 2, 3 will be visible; device 1 is masked
CUDA_VISIBLE_DEVICES=""          No GPU will be visible
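One consequence of masking that the table does not spell out is renumbering: inside the program the visible devices are numbered from 0 again, so with CUDA_VISIBLE_DEVICES=0,2,3 physical device 2 appears as device 1. The sketch below models that mapping for integer indices only; `visible_devices` is a hypothetical helper, not part of CUDA or TensorFlow, and its whitespace handling is an assumption.

```python
def visible_devices(value, physical_count):
    """Map in-program device index -> physical index for a given setting.

    value is the CUDA_VISIBLE_DEVICES string, or None when it is unset.
    """
    if value is None:
        # Unset: every physical device is visible under its own index.
        return {i: i for i in range(physical_count)}
    mapping = {}
    for token in value.split(","):
        token = token.strip()  # assumption: surrounding spaces are ignored
        if not token.isdigit() or int(token) >= physical_count:
            # An invalid entry hides itself and every entry after it.
            break
        mapping[len(mapping)] = int(token)
    return mapping


print(visible_devices("0,2,3", 4))
print(visible_devices("1", 2))
print(visible_devices("", 2))
```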
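3.1.2 Selecting the GPU in Python code
Add the following to the Python code, before TensorFlow (or any other CUDA library) initializes the GPU: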
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
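The variable is read when CUDA is initialized, so setting it afterwards has no effect. A small helper makes the ordering explicit; this is a sketch, `select_gpus` is a made-up name, and setting CUDA_DEVICE_ORDER to PCI_BUS_ID follows the snippet in section 4 of this article.

```python
import os


def select_gpus(indices):
    """Set CUDA_VISIBLE_DEVICES from a list of integer GPU indices."""
    value = ",".join(str(index) for index in indices)
    # Number devices by PCI bus ID, as in section 4 of this article.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


select_gpus([1])
# Import the framework only after the variables are set, for example:
#   import tensorflow as tf
print(os.environ["CUDA_VISIBLE_DEVICES"])
```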
3.1.3 Setting how much GPU memory TensorFlow uses
3.1.3.1 Fixed fraction of GPU memory
By default TensorFlow takes as much GPU memory as it can. The amount can be limited as follows:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
With this setting, the GPU memory allocated to TensorFlow is: total GPU memory * 0.7.
Set a different value as needed.
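As a worked example of that multiplication, here is the budget for the Tesla K20m, taking the 5061 MiB total that nvidia-smi reports in section 2.3. The helper `memory_budget_mib` is illustrative only and follows the formula as stated in this article.

```python
def memory_budget_mib(total_mib, fraction):
    """Return the memory budget implied by per_process_gpu_memory_fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return total_mib * fraction


# Tesla K20m total memory as reported by nvidia-smi in this article.
budget = memory_budget_mib(5061, 0.7)
print("about %.0f MiB of 5061 MiB" % budget)
```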
3.1.3.2 Allocating GPU memory on demand
The setting above only fixes a size. To allocate memory as it is needed, use the allow_growth option:
gpu_options = tf.GPUOptions(allow_growth=True)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
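3.2 Using a specific GPU in PyTorch
PyTorch uses GPUs starting from index 0 by default; if GPU 0 is busy running another program, a different GPU has to be specified.
There are two ways to specify the GPU to use.
3.2.1 Using CUDA_VISIBLE_DEVICES (as with TensorFlow)
3.2.1.1 Selecting the GPU when running a Python program from the command line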
CUDA_VISIBLE_DEVICES=1 python example.py
3.2.1.2 Selecting the GPU in Python code
Add the following to the Python code:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
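3.2.2 Using torch.cuda.set_device
import torch
device_id = 1  # index of the GPU to use
torch.cuda.set_device(device_id)
This function is defined in pytorch-master\torch\cuda\__init__.py.
However, the official documentation recommends CUDA_VISIBLE_DEVICES over the set_device function.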
4 Open issue
The Python code was set up to use both GPUs at once:
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0, 1"
But when the program ran, only one GPU was used and the other was ignored, as shown below:
Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20m, pci bus id: 0000:02:00.0)
Ignoring gpu device (device: 1, name: Quadro K620, pci bus id: 0000:03:00.0) with Cuda multiprocessor count: 3. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
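At first this looked like a multithreading problem, but the log line itself points elsewhere: the Quadro K620 reports 3 CUDA multiprocessors, fewer than the minimum of 8 that TensorFlow requires by default, and the message names the environment variable TF_MIN_GPU_MULTIPROCESSOR_COUNT for adjusting that threshold. This has not been tried here and is left for later.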
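The "Ignoring gpu device" message names the environment variable TF_MIN_GPU_MULTIPROCESSOR_COUNT, so one thing to try is lowering that threshold to the count the message reports. The sketch below reads both numbers from the log line and sets the variable; it is untested on the machine described here, the variable would presumably have to be set before TensorFlow creates its devices, and `multiprocessor_counts` is a made-up helper.

```python
import os
import re

PATTERN = re.compile(
    r"Cuda multiprocessor count: (\d+)\. The minimum required count is (\d+)\."
)


def multiprocessor_counts(log_line):
    """Return (found, required) from an 'Ignoring gpu device' line, or None."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Log line copied from this section.
line = (
    "Ignoring gpu device (device: 1, name: Quadro K620, pci bus id: "
    "0000:03:00.0) with Cuda multiprocessor count: 3. The minimum required "
    "count is 8. You can adjust this requirement with the env var "
    "TF_MIN_GPU_MULTIPROCESSOR_COUNT."
)
found, required = multiprocessor_counts(line)
if found < required:
    # Lower the threshold named in the log message to the reported count.
    os.environ["TF_MIN_GPU_MULTIPROCESSOR_COUNT"] = str(found)
print(found, required, os.environ.get("TF_MIN_GPU_MULTIPROCESSOR_COUNT"))
```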