Deep Learning for Computer Vision with Python: setting up the Keras GPU environment

Preface:

I came across a really well-written book, Deep Learning for Computer Vision with Python by Adrian Rosebrock (the author of PyImageSearch), which is built on the Keras deep learning framework. This post records how I set up the keras + tensorflow-gpu environment for it.

1. Things to watch out for:

1) Keras uses TensorFlow as its backend by default, so the first thing to install is tensorflow-gpu.

Since TensorFlow 1.6 onwards only works with CUDA 9.0, I chose to install TensorFlow 1.3. For the same reason I could not use the latest Keras 2.1.6 and had to downgrade to Keras 2.0.8; otherwise a softmax error is raised, caused purely by the version incompatibility.

Reference: https://github.com/keras-team/keras/issues/9621

Hi,
I get the same problem but I got a better solution.
Just downgrade the tensorflow and keras.
My previous tensorflow version is 1.4.1 and keras version 2.1.5.
I downgrade to tensorflow version 1.4.0 and keras version 2.0.8.
The error doesn't appear anymore.

pip install tensorflow installs the CPU-only build.

pip install tensorflow-gpu installs the GPU build.

A specific version can be pinned with pip install tensorflow-gpu==1.3.0, which installs tensorflow-gpu 1.3.0.
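
Keras can be pinned the same way (pip install keras==2.0.8). After installing both packages, a minimal sanity check, assuming the version pair discussed above (tensorflow-gpu 1.3.0 + keras 2.0.8), is to print the versions and confirm the GPU build is the one being imported:

# Sanity check for the assumed target pair: tensorflow-gpu 1.3.0 + keras 2.0.8
import tensorflow as tf
import keras

print("tensorflow:", tf.__version__)                     # expect 1.3.0
print("built with CUDA:", tf.test.is_built_with_cuda())  # True for the GPU build
print("keras:", keras.__version__)                       # expect 2.0.8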

2) tensorflow-gpu 1.3.0 requires cuDNN 6.0, so you may need to change the cuDNN version installed on your machine.

Installing cuDNN

Download cuDNN and extract the archive. Then copy the libraries and the header into the CUDA installation (run the commands below from the extracted archive's lib64/ and include/ directories respectively):

sudo cp lib* /usr/local/cuda/lib64/

sudo cp cudnn.h /usr/local/cuda/include/


Update the symlinks (the filenames below are from a cuDNN 5.1 install; for tensorflow-gpu 1.3.0 substitute the corresponding 6.x filenames):

cd /usr/local/cuda/lib64/

sudo rm -rf libcudnn.so libcudnn.so.5

sudo ln -s libcudnn.so.5.1.5 libcudnn.so.5

sudo ln -s libcudnn.so.5 libcudnn.so

To switch cuDNN versions later, replace the existing libcudnn* files and recreate the symlinks.

Then refresh the shared-library cache:

sudo ldconfig
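
A quick way to confirm that the linker can now resolve cuDNN is to try loading it from Python. This is only a sketch: tensorflow-gpu 1.3.0 links against libcudnn.so.6, so that is the name tried here; adjust it to whatever cuDNN version you actually installed.

import ctypes

try:
    ctypes.CDLL("libcudnn.so.6")   # assumes cuDNN 6.x; change the suffix if needed
    print("libcudnn.so.6 is resolvable")
except OSError as exc:
    print("cuDNN is not resolvable:", exc)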

-------------------------------

Problems encountered:

Error message:

$/sbin/ldconfig.real: /usr/lib/nvidia-375/libEGL.so.1 is not a symbolic link

$/sbin/ldconfig.real: /usr/lib32/nvidia-375/libEGL.so.1 is not a symbolic link

Cause:

ldconfig expects a symbolic link at this path, not a regular file. This is most likely a bug in the driver packaging...

Fix:

1. Rename the two files
2. Recreate the symbolic links

$sudo mv /usr/lib/nvidia-375/libEGL.so.1 /usr/lib/nvidia-375/libEGL.so.1.org

$sudo mv /usr/lib32/nvidia-375/libEGL.so.1 /usr/lib32/nvidia-375/libEGL.so.1.org

$sudo ln -s /usr/lib/nvidia-375/libEGL.so.375.39 /usr/lib/nvidia-375/libEGL.so.1

$sudo ln -s /usr/lib32/nvidia-375/libEGL.so.375.39 /usr/lib32/nvidia-375/libEGL.so.1
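
To verify that the fix took effect, a small check with Python's standard library confirms that both paths are now symlinks resolving to the real driver libraries (the paths assume the nvidia-375 driver used above; adjust to your driver version):

import os

for p in ("/usr/lib/nvidia-375/libEGL.so.1",
          "/usr/lib32/nvidia-375/libEGL.so.1"):
    # islink should now be True, and realpath should resolve to libEGL.so.375.39
    print(p, os.path.islink(p), "->", os.path.realpath(p))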

-----------------------------------------------

Check the CUDA version:

cat /usr/local/cuda/version.txt

Check the cuDNN version:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

==============================================================

2. Checking whether TensorFlow is training on the GPU

Adapted from: https://www.jianshu.com/p/ff851114384a

def get_available_gpus():
    """
    code from http://stackoverflow.com/questions/38559755/how-to-get-current-available-gpus-in-tensorflow
    """
    from tensorflow.python.client import device_lib as _device_lib
    local_device_protos = _device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

print(get_available_gpus())  # a non-empty list means TensorFlow can see the GPU

Keras's TensorFlow backend detects any available GPU automatically and uses it directly, so there is no need to write code selecting a particular GPU.

All that is needed is a working tensorflow-gpu installation; if the code above lists at least one GPU, the Keras GPU environment is set up correctly.
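
Another quick check, independent of Keras, is to ask TensorFlow to log device placement; if the GPU environment works, the session log shows operations assigned to the GPU. A minimal sketch using the TF 1.x session API:

import tensorflow as tf

# log_device_placement=True makes TensorFlow print which device each op runs on
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([1.0, 2.0, 3.0], name="a")
    b = tf.constant([4.0, 5.0, 6.0], name="b")
    print(sess.run(a + b))   # the log should show these ops placed on the GPU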

3. Results:

Now each epoch takes about 3 seconds, which feels great; the CPU-only TensorFlow build I was stuck with before needed more than 30 seconds per epoch.

Epoch 1/20
2018-05-09 16:32:35.949427: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-05-09 16:32:35.949776: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.2405
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.52GiB
2018-05-09 16:32:35.949794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2018-05-09 16:32:35.949802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2018-05-09 16:32:35.949814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
Epoch 2/20
  128/52500 [..............................] - ETA: 3s - loss: 0.4635 - acc: 0.8203
 1152/52500 [..............................] - ETA: 2s - loss: 0.3890 - acc: 0.8872
 2304/52500 [>.............................] - ETA: 2s - loss: 0.4060 - acc: 0.8793
 3456/52500 [>.............................] - ETA: 2s - loss: 0.3925 - acc: 0.8817
 4480/52500 [=>............................] - ETA: 2s - loss: 0.3929 - acc: 0.8839
 5632/52500 [==>...........................] - ETA: 2s - loss: 0.3883 - acc: 0.8848
 6656/52500 [==>...........................] - ETA: 2s - loss: 0.3894 - acc: 0.8830
 7808/52500 [===>..........................] - ETA: 2s - loss: 0.3802 - acc: 0.8859
 8960/52500 [====>.........................] - ETA: 2s - loss: 0.3731 - acc: 0.8893
10112/52500 [====>.........................] - ETA: 2s - loss: 0.3642 - acc: 0.8921
11264/52500 [=====>........................] - ETA: 1s - loss: 0.3581 - acc: 0.8936
12416/52500 [======>.......................] - ETA: 1s - loss: 0.3538 - acc: 0.8954
13440/52500 [======>.......................] - ETA: 1s - loss: 0.3541 - acc: 0.8955
14592/52500 [=======>......................] - ETA: 1s - loss: 0.3492 - acc: 0.8969
15744/52500 [=======>......................] - ETA: 1s - loss: 0.3455 - acc: 0.8981
16896/52500 [========>.....................] - ETA: 1s - loss: 0.3444 - acc: 0.8975
18048/52500 [=========>....................] - ETA: 1s - loss: 0.3414 - acc: 0.8984
19200/52500 [=========>....................] - ETA: 1s - loss: 0.3403 - acc: 0.8988
20224/52500 [==========>...................] - ETA: 1s - loss: 0.3368 - acc: 0.9001
21376/52500 [===========>..................] - ETA: 1s - loss: 0.3328 - acc: 0.9012
22400/52500 [===========>..................] - ETA: 1s - loss: 0.3302 - acc: 0.9021
23424/52500 [============>.................] - ETA: 1s - loss: 0.3285 - acc: 0.9028
24448/52500 [============>.................] - ETA: 1s - loss: 0.3273 - acc: 0.9038
25600/52500 [=============>................] - ETA: 1s - loss: 0.3243 - acc: 0.9047
26752/52500 [==============>...............] - ETA: 1s - loss: 0.3224 - acc: 0.9051
27776/52500 [==============>...............] - ETA: 1s - loss: 0.3191 - acc: 0.9063
28800/52500 [===============>..............] - ETA: 1s - loss: 0.3195 - acc: 0.9064
29952/52500 [================>.............] - ETA: 1s - loss: 0.3163 - acc: 0.9075
31104/52500 [================>.............] - ETA: 1s - loss: 0.3138 - acc: 0.9086
32256/52500 [=================>............] - ETA: 0s - loss: 0.3126 - acc: 0.9087
33280/52500 [==================>...........] - ETA: 0s - loss: 0.3115 - acc: 0.9092
34304/52500 [==================>...........] - ETA: 0s - loss: 0.3111 - acc: 0.9092
35328/52500 [===================>..........] - ETA: 0s - loss: 0.3098 - acc: 0.9099
36352/52500 [===================>..........] - ETA: 0s - loss: 0.3074 - acc: 0.9107
37376/52500 [====================>.........] - ETA: 0s - loss: 0.3057 - acc: 0.9114
38400/52500 [====================>.........] - ETA: 0s - loss: 0.3051 - acc: 0.9112
39168/52500 [=====================>........] - ETA: 0s - loss: 0.3039 - acc: 0.9115
40320/52500 [======================>.......] - ETA: 0s - loss: 0.3027 - acc: 0.9116
41344/52500 [======================>.......] - ETA: 0s - loss: 0.3009 - acc: 0.9121
42496/52500 [=======================>......] - ETA: 0s - loss: 0.2999 - acc: 0.9122
43648/52500 [=======================>......] - ETA: 0s - loss: 0.2992 - acc: 0.9125
44800/52500 [========================>.....] - ETA: 0s - loss: 0.2982 - acc: 0.9128
45824/52500 [=========================>....] - ETA: 0s - loss: 0.2965 - acc: 0.9133
46976/52500 [=========================>....] - ETA: 0s - loss: 0.2951 - acc: 0.9137
48000/52500 [==========================>...] - ETA: 0s - loss: 0.2928 - acc: 0.9144
49152/52500 [===========================>..] - ETA: 0s - loss: 0.2912 - acc: 0.9148
50304/52500 [===========================>..] - ETA: 0s - loss: 0.2906 - acc: 0.9150
51456/52500 [============================>.] - ETA: 0s - loss: 0.2888 - acc: 0.9153
52480/52500 [============================>.] - ETA: 0s - loss: 0.2876 - acc: 0.9156
52500/52500 [==============================] - 2s - loss: 0.2875 - acc: 0.9156 - val_loss: 0.2426 - val_acc: 0.9284
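
For reference, output like the log above comes from an ordinary Keras model.fit() call; a rough sketch in the Keras 2.0.8-era API is shown below. The model, optimizer and validation split are illustrative guesses, not the book's actual code (the 0.125 split simply reproduces the 52500 training samples seen in the log).

# Illustrative sketch only: a small fully connected network on MNIST,
# written against the Keras 2.0.8-era API; not the script that produced
# the exact numbers above.
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)

model = Sequential()
model.add(Dense(256, activation="relu", input_shape=(784,)))
model.add(Dense(128, activation="relu"))
model.add(Dense(10, activation="softmax"))

model.compile(loss="categorical_crossentropy", optimizer="sgd",
              metrics=["accuracy"])

# With tensorflow-gpu installed, Keras runs this on the GPU automatically.
# validation_split=0.125 leaves 52500 of the 60000 MNIST images for training.
model.fit(x_train, y_train, batch_size=128, epochs=20,
          validation_split=0.125, verbose=1)

print(model.evaluate(x_test, y_test, verbose=0))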
