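Background:
I recently downloaded a deep learning example whose code targets TF 2.0. Since I had only used TF 1.x before, I needed to set up a new 2.0 environment. The 2.0 code structure differs considerably from 1.x: many steps have been streamlined, which makes model training more convenient.
Here are a few small pitfalls I ran into during the upgrade:
1. First, create a virtual environment: conda create -n tf20 python=3.6 and activate it: conda activate tf20 (python=3.6 matches the Python 3.6.6 interpreter shown in the logs below; without a Python version, conda creates an empty environment)
2. Local GPU driver and library installation
Graphics driver:
A machine with an NVIDIA GPU usually already has a graphics driver installed, so it does not need to be reinstalled as long as it is recent enough for the CUDA version used (NVIDIA lists 411.31 as the minimum Windows driver for CUDA 10.0).
CUDA:
NVIDIA driver versions and CUDA versions are not in one-to-one correspondence,
CUDA is just a toolkit,
and several different CUDA versions can be installed on top of the same driver.
cuDNN:
cuDNN is an SDK, a library dedicated to accelerating neural networks.
cuDNN and CUDA are not in one-to-one correspondence either, although each cuDNN download is built for a specific CUDA version.
One CUDA version can be paired with several different cuDNN versions.
This machine is a GTX 1050 Ti on Windows 10; the versions used are CUDA 10.0 and cuDNN 7, which is what TensorFlow 2.0 expects.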
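Since the CUDA/cuDNN pair is dictated by the TensorFlow release rather than by the GPU, it can help to keep the pairing in one place. The sketch below is a small lookup table; the entries are the tested GPU build configurations as I recall them from the TensorFlow install guide, so treat them as an assumption and confirm against the official table. The names TESTED_CONFIGS and required_versions are made up for this example.

```python
# Tested GPU build configurations, recalled from the TensorFlow install guide.
# Assumption: verify these against the official table before relying on them.
TESTED_CONFIGS = {
    "1.15": {"cuda": "10.0", "cudnn": "7.4"},
    "2.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
}


def required_versions(tf_version):
    """Return the CUDA/cuDNN pair for a TensorFlow version such as '2.0.0rc1'."""
    # Keep only "major.minor", e.g. "2.0.0rc1" -> "2.0"
    major_minor = ".".join(tf_version.split(".")[:2])
    if major_minor not in TESTED_CONFIGS:
        raise KeyError("no entry for TensorFlow " + major_minor)
    return TESTED_CONFIGS[major_minor]


print(required_versions("2.0.0rc1"))
```

The cuDNN value is the version listed in the tested configuration; this write-up uses a newer 7.x build (v7.6.5 for CUDA 10.0).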
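Method 1: command-line install (not tried):
conda install cudatoolkit=10.0 cudnn=7
Method 2: local install
CUDA installation:
Download CUDA 10.0.
The local installer is recommended (faster to download and install); in my experience downloads were also faster before noon.
Run the exe; the default extraction path is c:\users\xx\AppData\Local\Temp\CUDA
Choose the Express installation option.
If prompted to install Visual Studio, you can skip it (this machine showed no prompt, possibly because VS2015 was already installed).
Check that the installation succeeded by running the two commands below in cmd. Note that the "CUDA Version" shown in the nvidia-smi header is the highest CUDA version the driver supports, not the installed toolkit version, so 11.0 there does not conflict with the 10.0 toolkit reported by nvcc: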
nvcc -V
C:\Users\wym>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
Cuda compilation tools, release 10.0, V10.0.130
nvidia-smi
C:\Users\wym>nvidia-smi
Fri Sep 11 11:08:25 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 451.82 Driver Version: 451.82 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... WDDM | 00000000:01:00.0 On | N/A |
| 29% 33C P8 N/A / 75W | 548MiB / 4096MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1360 C+G Insufficient Permissions N/A |
| 0 N/A N/A 9704 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 10148 C+G ...es.TextInput.InputApp.exe N/A |
| 0 N/A N/A 12432 C+G ...y\ShellExperienceHost.exe N/A |
| 0 N/A N/A 12852 C+G ...w5n1h2txyewy\SearchUI.exe N/A |
| 0 N/A N/A 14360 C+G ...ekyb3d8bbwe\YourPhone.exe N/A |
| 0 N/A N/A 14444 C+G ...zf8qxf38zg5c\SkypeApp.exe N/A |
| 0 N/A N/A 14860 C+G ...cw5n1h2txyewy\LockApp.exe N/A |
| 0 N/A N/A 18736 C+G ...se6\Application\360se.exe N/A |
+-----------------------------------------------------------------------------+
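The same check can be scripted. The sketch below runs nvcc -V and pulls the release number out of its output with a regular expression; the function names are made up for this example, and the pattern assumes the "release 10.0" wording seen in the output above.

```python
import re
import subprocess


def parse_nvcc_release(text):
    """Extract the toolkit release (e.g. '10.0') from `nvcc -V` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc -V`; return its release string, or None if nvcc is unavailable."""
    try:
        result = subprocess.run(
            ["nvcc", "-V"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        # nvcc is not on PATH (or cannot be executed)
        return None
    return parse_nvcc_release(result.stdout)


# Line copied from the nvcc output shown above
sample = "Cuda compilation tools, release 10.0, V10.0.130"
print(parse_nvcc_release(sample))
print(installed_cuda_release())
```

If the second print shows None, nvcc is probably not on PATH, which usually means the toolkit install did not complete or a new cmd window is needed.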
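cuDNN installation:
Download cuDNN v7.6.5 (the build for CUDA 10.0).
Unzip it, which produces a cuda directory, and copy that directory to the location below.
Create the directory c:\tools and copy the cuda directory into it, giving c:\tools\cuda
cuda\bin contains a dynamic-link library named cudnn64_7.dll. This DLL is the core of cuDNN, so its directory must be added to the PATH environment variable for it to be found.
Add the environment variable: System Properties - Environment Variables - Path: c:\tools\cuda\bin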
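To confirm that the PATH change took effect, a short script can search each PATH entry for the DLL. This is a minimal sketch; find_on_path is a hypothetical helper, not part of any library. Open a new cmd window first, because existing windows keep the old PATH.

```python
import os


def find_on_path(filename, path_value=None):
    """Return the directories in a PATH-style string that contain filename."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # Skip empty entries left by stray separators
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


# An empty list means the DLL is not reachable through PATH
print(find_on_path("cudnn64_7.dll"))
```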
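3. Install the required libraries:
tensorflow-gpu installation:
Method 1: pip install tensorflow-gpu==2.0.0rc1 -i https://pypi.tuna.tsinghua.edu.cn/simple
Method 2: if the project has a requirements.txt file: pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple --user
--user is added to avoid permission errors.
pip install --default-timeout=100 --ignore-installed --upgrade tensorflow-gpu==2.0.0rc1 -i https://pypi.tuna.tsinghua.edu.cn/simple (enter as is; the Tsinghua mirror gives faster downloads, --ignore-installed works around errors where an already installed version blocks the install, and --default-timeout works around timeout errors)
Keras installation: if the project's examples use Keras, the Keras version must match the TF version. For TF 2.0:
pip install keras==2.3.1 -i https://pypi.tuna.tsinghua.edu.cn/simple --user
OpenCV installation: no version needs to be specified; pip installs the latest build compatible with the local Python by default.
pip install opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple --user
Verify the installation:
Activate the environment (activate tensorflow; the environment in the logs below is named tensorflow, so substitute your own name, e.g. tf20), start python, and import tensorflow.
If you see: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
the CUDA 10.0 runtime was found, i.e. CUDA is installed and its version matches what this TensorFlow build expects.
Test commands: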
import tensorflow as tf
a=tf.constant([10])
print(a)
Output of a successful installation:
C:\Users\wym>activate tensorflow
(tensorflow) C:\Users\wym>python
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-09-11 11:11:24.792867: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
>>> a=tf.constant([10])
2020-09-11 11:13:01.846228: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-09-11 11:13:01.940688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:00.0
2020-09-11 11:13:01.958516: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-09-11 11:13:01.980137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-11 11:13:02.178945: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-09-11 11:13:02.243931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:00.0
2020-09-11 11:13:02.261308: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-09-11 11:13:02.267009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-11 11:13:11.202419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-11 11:13:11.213008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-09-11 11:13:11.215213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-09-11 11:13:11.308757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2996 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
>>> print(a)
tf.Tensor([10], shape=(1,), dtype=int32)
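Instead of reading through the log, TensorFlow can also be asked directly which GPUs it sees. The sketch below uses tf.config.experimental.list_physical_devices, which is available in TF 2.0; the wrapper visible_gpus is a made-up name, and the import is guarded so the script still runs in an environment without TensorFlow.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow could not be imported in this environment")
        return []
    # Each entry is a PhysicalDevice with a name such as /physical_device:GPU:0
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return [gpu.name for gpu in gpus]


print(visible_gpus())
```

An empty list on a machine with a working driver usually points at a missing or mismatched CUDA or cuDNN DLL.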
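4. Common problems:
Running the code produces: Non-OK-status: CudaLaunchKernel(FillPhiloxRandomKernelLaunch, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: out of memory
Solution: in my case this error was not caused by insufficient memory but by an incorrect CUDA version; installing the CUDA version that matches the TensorFlow build resolved it.
5. Other problems:
The code uses the pycocotools module, which could not be installed directly on Windows:
Method 1:
Get the source: git clone https://github.com/pdollar/coco.git
git is available on Linux but not by default on Windows; Git for Windows can be installed (it led to path problems later that were hard to resolve, so I do not recommend it).
Instead, download the source directly from https://github.com/pdollar/coco.git
unzip it,
activate the virtual environment,
change into coco/PythonAPI,
and run: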
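The fix that worked here was installing the matching CUDA version. If the versions are confirmed to match and an out-of-memory error still appears, one option TensorFlow offers is to let GPU memory grow on demand instead of reserving it up front. The sketch below shows that setting; it is an assumption that it helps in any particular case, it was not needed on this machine, and enable_memory_growth is a made-up wrapper name.

```python
def enable_memory_growth():
    """Turn on on-demand GPU memory allocation; return how many GPUs were set."""
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow could not be imported in this environment")
        return 0
    configured = 0
    for gpu in tf.config.experimental.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError as err:
            # Raised when the GPU has already been initialised by earlier TF calls
            print("could not change memory growth:", err)
    return configured


print(enable_memory_growth())
```

This has to run at the top of the script, before any tensors or models are created.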
pip install -U cython -i https://pypi.tuna.tsinghua.edu.cn/simple
# install pycocotools locally
python setup.py build_ext --inplace
python setup.py build_ext install
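Once the build finishes, a quick way to confirm that pycocotools and the other libraries are visible to the active environment is to look each one up without importing it. This is a minimal sketch; missing_modules is a hypothetical helper, and the names passed in are import names, not pip package names.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# opencv-python is imported as cv2
print(missing_modules(["tensorflow", "keras", "cv2", "pycocotools"]))
```

An empty list means every module was found. Run it with the python of the activated environment, otherwise it checks a different interpreter.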
References:
1、https://www.cnblogs.com/xiaosongshine/p/11615639.html
2、https://blog.csdn.net/qq_27825451/article/details/89082978
Output of the code after the upgrade:
D:\anaconda\envs\tensorflow\python.exe C:/Users/wym/Desktop/cjr/Centernet-Tensorflow2.0/TF2-CenterNet/ctdet_image.py
2020-09-11 11:32:24.971183: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-09-11 11:32:37.660610: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-09-11 11:32:37.696192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:00.0
2020-09-11 11:32:37.696390: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-09-11 11:32:37.696559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-11 11:32:37.696905: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-09-11 11:32:37.699315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:00.0
2020-09-11 11:32:37.699564: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-09-11 11:32:37.700104: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-09-11 11:32:38.330225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-11 11:32:38.330366: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-09-11 11:32:38.330446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-09-11 11:32:38.330655: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2996 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
0%| | 0/1 [00:00<?, ?it/s]2020-09-11 11:33:01.627843: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-09-11 11:33:02.992058: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-09-11 11:33:05.944135: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.12GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2020-09-11 11:33:06.362537: W tensorflow/core/common_runtime/bfc_allocator.cc:239] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.59GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Image saved to: output\ctdet.demo.jpg
100%|██████████| 1/1 [00:12<00:00, 12.83s/it]
Original image:
Detection result: