Using GPU Acceleration for Deep Learning on Windows and Comparing It with the CPU (Setup Walkthrough + Examples)


Author: chao3150

Categories: Python, Machine Learning. Tags: tensorflow, gpu, cuda, deep learning

Article link: https://blog.csdn.net/chaoge_dgqb/article/details/108550548

This article was written and tested with the following versions:

numpy: 1.18.4
keras: 2.3.1
tensorflow-gpu: 2.2.0
CUDA: 10.2
cuDNN: 7.6.5

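For installing and managing Python, this article does not use Anaconda. It uses virtualenv to manage environments and pip to manage packages. For details, see "(Windows) Setting Up and Managing a Python Environment, and Setting Up Deep Learning and Web Crawler Environments".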

1 Install tensorflow-gpu

A mirror of tensorflow-gpu hosted in China: tensorflow-gpu. This article uses tensorflow-gpu 2.2.0.

pip install tensorflow-gpu==2.2.0
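After installing, it can help to confirm that the active virtualenv really contains the versions listed at the top of this article. The following is a minimal sketch using only the standard library (Python 3.8+). The `EXPECTED` dictionary and the helper names are illustrative, not part of any library; adjust the pinned versions to your own setup.

```python
from importlib import metadata  # standard library, Python 3.8+

# Versions pinned in this article (illustrative; change to match your setup).
EXPECTED = {
    "numpy": "1.18.4",
    "keras": "2.3.1",
    "tensorflow-gpu": "2.2.0",
}


def installed_version(package):
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected):
    """Map each package name to (expected, installed, matches)."""
    report = {}
    for name, want in expected.items():
        have = installed_version(name)
        report[name] = (want, have, have == want)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, installed {have} -> {status}")
```

A package reported as `None` is simply not installed in the environment that ran the script, which usually means the wrong virtualenv is active.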

2 Configure CUDA and cuDNN

The versions used together in this article are:

TensorFlow-GPU	CUDA	cuDNN
2.2.0	10.2.89	7.6.5

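Method 1: look up the matching versions at https://docs.floydhub.com/guides/tensorflow/#tensorflow-22

Method 2: if you have not yet worked out which versions go together, follow these steps:

First fix your TensorFlow version, then pick the minimum CUDA and cuDNN versions listed for it, and make sure the compiler and build tools in the same table row are installed.
Source: https://tensorflow.google.cn/install/source_windows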


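Next, check that your installed graphics driver version is new enough for the CUDA version you chose.
Source: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

Alternatively, check at https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html whether your cuDNN, CUDA, and driver versions match.

You can also check the correspondence at https://docs.floydhub.com/guides/environments/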
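The driver check can also be scripted. The sketch below assumes `nvidia-smi` (installed with the NVIDIA driver) is on the Path; on some Windows installations it is not, in which case the function returns `None`. The minimum version is deliberately left as a parameter: read it from the release-notes table linked above rather than from this example. All function names are illustrative.

```python
import shutil
import subprocess


def parse_version(text):
    """Turn a dotted version string such as '441.22' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_driver_version():
    """Ask nvidia-smi for the driver version; return None if it is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    lines = result.stdout.strip().splitlines()
    # With several GPUs there is one line per GPU; they share one driver.
    return lines[0].strip() if lines else None


def driver_is_sufficient(installed, minimum):
    """True if the installed driver version is at least the required minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

Typical use is `driver_is_sufficient(installed_driver_version(), minimum)` after confirming that the first call did not return `None`.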

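2.1 Download the CUDA Toolkit

Download the latest CUDA Toolkit from the NVIDIA website. If the download is slow, right-click to copy the link and fetch it with a download manager such as Xunlei. If you need a specific version, get it from cuda-toolkit-archive.

Choose the offline (local) installer. This walkthrough uses CUDA Toolkit 10.2.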


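2.2 Download cuDNN

Download cuDNN from cudnn-download (login required). The version must match the CUDA Toolkit you just downloaded. For a specific cuDNN version, go to cudnn-archive; again, you can right-click to copy the link and use a download manager such as Xunlei. This walkthrough uses version 7.6.5.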


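2.3 Install

Install the CUDA Toolkit
Quit antivirus software such as 360, then right-click the installer and run it as administrator.
Install cuDNN
Extract the archive and copy the folders it contains (bin, include, and lib in the cuDNN 7.x archive) into the CUDA installation directory, for example C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2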
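The copy step can be scripted instead of done by hand. This is a minimal sketch, assuming the extracted cuDNN archive has `bin`, `include`, and `lib` subfolders. That layout is an assumption here, as are the function and parameter names. Writing into `C:\Program Files` normally requires running Python from an administrator prompt.

```python
import shutil
from pathlib import Path


def copy_cudnn(cudnn_root, cuda_root, subdirs=("bin", "include", "lib")):
    """Merge cuDNN's folders into the CUDA directory; return the copied paths."""
    cudnn_root, cuda_root = Path(cudnn_root), Path(cuda_root)
    copied = []
    for sub in subdirs:
        src = cudnn_root / sub
        if not src.is_dir():
            continue  # skip folders this archive does not contain
        for file in src.rglob("*"):
            if file.is_file():
                # Keep the same relative layout, e.g. lib\x64\cudnn.lib
                target = cuda_root / file.relative_to(cudnn_root)
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(file, target)
                copied.append(target)
    return copied
```

Existing CUDA files are left alone unless cuDNN ships a file with the same relative path, in which case that file is overwritten.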


Add the following directories to the Path environment variable:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp
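To confirm that the four directories are actually visible to a newly started process, the Path can be inspected from Python. The sketch below is illustrative (the helper names are not from any library). It uses `ntpath` so that the comparison follows Windows rules, ignoring case and trailing separators, regardless of where the script runs.

```python
import ntpath
import os

CUDA_DIRS = [
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin",
    r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\libnvvp",
]


def _norm(path):
    """Normalise a Windows path: drop quotes, trailing slashes, and case."""
    return ntpath.normcase(ntpath.normpath(path.strip().strip('"')))


def missing_from_path(required, path_value, sep=";"):
    """Return the required directories that do not appear in path_value."""
    present = {_norm(entry) for entry in path_value.split(sep) if entry.strip()}
    return [d for d in required if _norm(d) not in present]


if __name__ == "__main__":
    for directory in missing_from_path(CUDA_DIRS, os.environ.get("PATH", "")):
        print("Not on Path:", directory)
```

Environment variable changes only reach processes started afterwards, so run this from a new terminal or a restarted IDE.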

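3 Test

Restart PyCharm, Jupyter Notebook, or whichever IDE you use, then check that tensorflow-gpu works. Original source: TensorFlow: Use a GPU

Check how many GPUs are available and log which device TensorFlow runs each operation on.
If something goes wrong here, see the fixes at the end of this section.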

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
tf.debugging.set_log_device_placement(True)  # log which device each op runs on

# Runs on the GPU automatically when one is available
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

Num GPUs Available: 1
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

Placing work explicitly on the CPU or the GPU

# Place tensors on the CPU
with tf.device('/CPU:0'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# matmul is outside the with-block, so TensorFlow picks its device itself
c = tf.matmul(a, b)
print(c)

# Place tensors on the GPU
with tf.device('/GPU:0'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Again outside the with-block: only the tensors were pinned, not the matmul
c = tf.matmul(a, b)
print(c)

tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

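Limiting GPU memory growth
By default, TensorFlow maps nearly all of the memory of every GPU visible to the process (subject to CUDA_VISIBLE_DEVICES). It does this to reduce memory fragmentation and make more efficient use of the relatively scarce GPU memory on the devices.

To restrict TensorFlow to specific GPUs, use the tf.config.experimental.set_visible_devices method.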

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only use the first GPU
  try:
    tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
  except RuntimeError as e:
    # Visible devices must be set before GPUs have been initialized
    print(e)

1 Physical GPUs, 1 Logical GPU
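The CUDA_VISIBLE_DEVICES variable mentioned above offers another way to restrict which GPUs a process sees, at the CUDA level rather than through the TensorFlow API. A minimal sketch follows; the helper name is illustrative. The variable has to be set before the GPUs are initialised, so the safest place is before `import tensorflow`.

```python
import os


def restrict_visible_gpus(indices):
    """Expose only the given GPU indices to CUDA.

    Call this before importing TensorFlow; once the GPUs have been
    initialised, changing the variable has no effect on the process.
    """
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    restrict_visible_gpus([0])  # only the first GPU stays visible
    # import tensorflow as tf   # import TensorFlow only after this point
```

Inside the process the remaining devices are renumbered from zero, so the first visible GPU is always `/GPU:0`.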

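If you want memory use to grow only as the process needs it, turn on memory growth with tf.config.experimental.set_memory_growth (the code below must be run on its own after restarting the Python environment).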

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

1 Physical GPUs, 1 Logical GPUs

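For using multiple GPUs, see the original guide linked above.

Problems encountered:

Every dynamic library must be reported as successfully opened in the log.

The console reports that cudart64_101.dll was not found.

Fix: search C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin for cudart64 and find the closest match (cudart64_102.dll for CUDA 10.2), copy it, rename the copy to cudart64_101.dll, and reactivate the environment. The prebuilt tensorflow-gpu 2.2.0 package looks for the CUDA 10.1 runtime, which is why it asks for the _101 file name.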
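The copy-and-rename workaround can be written as a small script. This is a sketch of the manual steps described above, not an officially supported fix: a runtime from a different CUDA release is not guaranteed to be compatible. The function name is illustrative, and "closest match" is interpreted here as the highest-numbered `cudart64_*.dll` in the folder, which is an assumption.

```python
import shutil
from pathlib import Path


def provide_cudart_alias(bin_dir, wanted="cudart64_101.dll"):
    """Copy the nearest cudart64_*.dll to the file name TensorFlow asks for."""
    bin_dir = Path(bin_dir)
    target = bin_dir / wanted
    if target.exists():
        return target  # nothing to do
    candidates = sorted(bin_dir.glob("cudart64_*.dll"))
    if not candidates:
        raise FileNotFoundError(f"no cudart64_*.dll found in {bin_dir}")
    # Copy rather than rename so the original CUDA installation stays intact.
    shutil.copy2(candidates[-1], target)
    return target
```

For the setup in this article the call would be `provide_cudart_alias(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin")`, run from an administrator prompt.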

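4 CPU vs. GPU deep learning comparison examples

Example 1: recognizing handwritten digits with a convolutional network

Data download: https://www.kaggle.com/c/digit-recognizer/data

Notebook download: https://download.csdn.net/download/chaoge_dgqb/13087591

Kaggle notebook: https://www.kaggle.com/hansonchao/gpu-vs-cpu-in-digit-recognizer

While the GPU is accelerating the computation, Task Manager shows high GPU utilization.

After training, the loss and accuracy curves of the two runs are essentially the same, and validation accuracy is above 0.99 for both.
The difference between the two is shown in the figure:

[Figure: comparison of the CPU and GPU runs]

The GPU trains the convolutional neural network faster than the CPU, while loss and accuracy differ little.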
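To get a rough feel for the speed difference on your own machine without running the full notebook, a small timing harness is enough. The sketch below times a large matrix multiplication on each device. It is a stand-in microbenchmark, not the convolutional network from the notebook, and the function names and matrix size are illustrative. TensorFlow is imported lazily inside `bench_matmul`, so the timer itself has no dependencies.

```python
import time


def best_time(fn, repeats=3):
    """Run fn several times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def bench_matmul(device, n=2000, repeats=3):
    """Time an n x n matrix multiplication on the given TensorFlow device."""
    import tensorflow as tf  # imported here so best_time works without it

    with tf.device(device):
        x = tf.random.uniform((n, n))
        # .numpy() copies the result to the host, so the timing includes
        # the completed computation rather than just the kernel launch.
        return best_time(lambda: tf.matmul(x, x).numpy(), repeats)


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU run was than the CPU run."""
    return cpu_seconds / gpu_seconds
```

With a working GPU setup, `speedup(bench_matmul('/CPU:0'), bench_matmul('/GPU:0'))` gives the ratio. Taking the best of several runs keeps one-time GPU start-up cost from dominating the result.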

Example 2: recognizing CAPTCHAs

Original source: captcha_break
