Install Anaconda and PyCharm
Download Anaconda from its official website. PyCharm is likewise downloaded from its official website and then cracked.
Install the DEEPLABCUT environment
Download the DeepLabCut package from https://deeplabcut.github.io/DeepLabCut/docs/installation.html, then open CMD and change into the conda-environments folder, for example:
cd C:\Users\YourUserName\Desktop\DeepLabCut\conda-environments
Then create the environment:
conda env create -f DEEPLABCUT.yaml
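When a message containing something like "activate DEEPLABCUT" appears, the installation has succeeded.
Configure the GPU environment
Many online tutorials have you download the installers from the official website and configure everything by hand. That is tedious and unnecessary; a few commands are enough.
First check which versions of the NVIDIA driver, CUDA Toolkit, cuDNN, and tensorflow-gpu are compatible with each other: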
https://www.tensorflow.org/install/source_windows
https://www.tensorflow.org/install/source#common_installation_problems
Here I installed TensorFlow 2.2.0, which corresponds to CUDA Toolkit 10.1 and cuDNN 7.6.
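As a rough illustration of the version lookup, the sketch below hard-codes a few rows from the tested build configuration tables linked above and prints the matching install commands. The names TESTED_CONFIGS and install_commands are hypothetical helpers introduced here, and the table covers only a handful of 2.x releases; check the linked pages for the version you actually need.

```python
# A few rows from TensorFlow's "tested build configurations" tables.
# Assumption: only these 2.x releases are recorded here; extend as needed.
TESTED_CONFIGS = {
    "2.0": {"cudatoolkit": "10.0", "cudnn": "7.4"},
    "2.1": {"cudatoolkit": "10.1", "cudnn": "7.6"},
    "2.2": {"cudatoolkit": "10.1", "cudnn": "7.6"},
    "2.3": {"cudatoolkit": "10.1", "cudnn": "7.6"},
    "2.4": {"cudatoolkit": "11.0", "cudnn": "8.0"},
}


def install_commands(tf_version):
    """Return the pip/conda commands for a version such as '2.2' or '2.2.0'."""
    # Only major.minor decides the CUDA/cuDNN pairing in this table.
    key = ".".join(tf_version.split(".")[:2])
    if key not in TESTED_CONFIGS:
        raise KeyError(f"No tested configuration recorded for TensorFlow {tf_version}")
    cfg = TESTED_CONFIGS[key]
    return [
        f"pip install tensorflow=={tf_version}",
        f"conda install cudatoolkit={cfg['cudatoolkit']}",
        f"conda install cudnn={cfg['cudnn']}",
    ]


if __name__ == "__main__":
    for cmd in install_commands("2.2"):
        print(cmd)
```

For "2.2" this yields the same three commands that are run by hand in this post.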
Run the following commands in the conda command prompt.
First activate the environment:
activate DEEPLABCUT
Then install TensorFlow:
pip install tensorflow==2.2
Then install cudatoolkit:
conda install cudatoolkit=10.1
Then install cudnn:
conda install cudnn=7.6
Answer y to every prompt along the way.
After the installation finishes, you can run
conda list cudnn
to check whether the installation succeeded.
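conda list only confirms that the package is present. To see whether TensorFlow itself can find the GPU, a small check along these lines may help. This is a minimal sketch: gpu_report is a hypothetical helper name, and it assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available.

```python
def gpu_report():
    """Summarise whether the installed TensorFlow can see a GPU."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is missing or failed to load in this environment.
        return {"tensorflow": None, "built_with_cuda": False, "gpus": []}
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    print(gpu_report())
```

An empty "gpus" list while "built_with_cuda" is True usually points at a driver, cudatoolkit, or cudnn mismatch rather than at TensorFlow itself.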
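Common errors
I first got the program running on the GPU on my laptop, but after moving it to a desktop machine the following error appeared during neural network training: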
Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node SegNet/block1_conv2/Relu (defined at media/ac/ubuntu train/Semantic-Segmentation-main/train.py:337) ]]
[[confusion_matrix/assert_less_1/Assert/AssertGuard/pivot_f/_31/_77]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node SegNet/block1_conv2/Relu (defined at media/ac/ubuntu train/Semantic-Segmentation-main/train.py:337) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_6169]
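At first I thought Anaconda lacked permission to use the GPU because it was not installed on the C drive, so I uninstalled and reinstalled it, and then tried several matching cuDNN versions, but the same error kept appearing. That made me suspect the real problem was insufficient GPU memory.
Adding the following code at the very top of the script, where the libraries are imported, fixed it: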
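import os
# Set these before importing TensorFlow so that they take effect
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import tensorflow as tf
# Allocate GPU memory on demand
config = tf.compat.v1.ConfigProto()  # session configuration
config.allow_soft_placement = True  # if the requested device does not exist, let TF choose one automatically
config.gpu_options.per_process_gpu_memory_fraction = 0.8  # cap this process at 80% of GPU memory to avoid running out of memory; adjust as needed
config.gpu_options.allow_growth = True  # grow GPU memory usage on demand; this is the important setting
session = tf.compat.v1.Session(config=config)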
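The snippet above goes through the TF1 compatibility API. TensorFlow 2.x also exposes on-demand allocation directly via tf.config.experimental.set_memory_growth; the sketch below shows that variant. It is an alternative I am suggesting, not what was run for this post, and enable_memory_growth is a hypothetical helper name.

```python
def enable_memory_growth():
    """Enable on-demand GPU memory allocation; return how many GPUs were configured."""
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow is not installed in this environment
    gpus = tf.config.list_physical_devices("GPU")
    configured = 0
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError as err:
            # Raised when the GPU has already been initialised.
            print(f"Could not set memory growth for {gpu.name}: {err}")
    return configured


if __name__ == "__main__":
    print("GPUs configured:", enable_memory_growth())
```

Like the session-based version, this has to run at the start of the program, before any operation has touched the GPU.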
The number in os.environ["CUDA_VISIBLE_DEVICES"] = "0" is the GPU index.
It can be looked up like this:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 15086337937146025929,
name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 17520671211051338070
physical_device_desc: "device: XLA_CPU device",
name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4294967296
locality {
bus_id: 1
links {
}
}
incarnation: 3920005438857805077
physical_device_desc: "device: 0, name: GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5",
name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 5021749807776093744
physical_device_desc: "device: XLA_GPU device"]
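If the index does not match a GPU, TensorFlow falls back to the CPU. I want to use the GTX 1650, which is listed as device 0 in the output above, so the value is "0".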
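Reading the index off the device listing by eye works for one card. With several cards, something like the sketch below could pull it out of the physical_device_desc strings. The helper names parse_gpu_desc and pick_device are hypothetical. The sketch also assumes the listing was produced without CUDA_VISIBLE_DEVICES already set, since that variable renumbers the visible devices.

```python
import re

# Matches descriptions such as
# "device: 0, name: GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5"
_DESC = re.compile(r"device:\s*(\d+),\s*name:\s*([^,]+),")


def parse_gpu_desc(desc):
    """Return (index, name) for a GPU description, or None if it is not a GPU."""
    match = _DESC.search(desc)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


def pick_device(descs, wanted_name):
    """Return the index (as a string) of the first GPU whose name contains wanted_name."""
    for desc in descs:
        parsed = parse_gpu_desc(desc)
        if parsed is not None and wanted_name in parsed[1]:
            return str(parsed[0])
    return None


if __name__ == "__main__":
    listing = [
        "device: XLA_CPU device",
        "device: 0, name: GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5",
        "device: XLA_GPU device",
    ]
    print(pick_device(listing, "GTX 1650"))
```

The returned string is what would be assigned to os.environ["CUDA_VISIBLE_DEVICES"] before TensorFlow is imported.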