I recently installed Ubuntu 20.04 on a physical machine and got nvidia-driver-435 + CUDA 10.0 + cuDNN v7.6.5 for CUDA 10.0 working.
Installing CUDA and cuDNN
See this post: https://blog.csdn.net/ashome123/article/details/105822040
Errors encountered at runtime
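- A file such as libcudnn.so.10.1 cannot be found or cannot be opened. For this kind of problem, creating a symlink is enough: sudo ln -s libcudnn.so.10 libcudnn.so.10.1 (note that the symlink must be created in the directory where the file is actually looked up. This is usually the lib directory of the virtual environment, or TensorFlow's own lib module. The error message states the location, so handle it case by case.)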
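- Errors such as "out of memory" or "cannot create cudnn" mean there is not enough GPU memory (provided no error about a file failing to load appeared alongside them). Ways to fix this:
  - Let TensorFlow allocate GPU memory on demand (memory growth)
  - Reduce batch_size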
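For the missing-library error in the first item above, the symlink step can also be sketched in Python. This is only a rough illustration: the helper `link_library` and the file names `libexample.so.10` / `libexample.so.10.1` are made up for the demo, and the real directory and file names have to come from your own error message.

```python
import os
import tempfile


def link_library(lib_dir, existing_name, requested_name):
    """Make lib_dir/requested_name a symlink to existing_name in the same directory."""
    target = os.path.join(lib_dir, existing_name)
    link = os.path.join(lib_dir, requested_name)
    if not os.path.exists(target):
        raise FileNotFoundError(f"library to link against not found: {target}")
    if os.path.lexists(link):
        # Leave an existing file or link alone rather than overwrite it.
        return link
    # Relative target: same effect as `ln -s existing_name requested_name` run inside lib_dir.
    os.symlink(existing_name, link)
    return link


if __name__ == "__main__":
    # Demo in a throwaway directory with an empty placeholder file (hypothetical names).
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "libexample.so.10"), "w").close()
        created = link_library(demo_dir, "libexample.so.10", "libexample.so.10.1")
        print(os.path.basename(created), "->", os.readlink(created))
```

Writing into a system directory would still need root privileges, just as the `sudo ln -s` command does. Inside a virtual environment's lib directory, normal user permissions are usually enough.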
To enable on-demand GPU memory allocation, run the following at the start of the program, before the GPUs are initialized:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            # Allocate GPU memory on demand instead of reserving it all up front
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
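The other fix, reducing batch_size, can be tried automatically. Below is a minimal sketch, not something from the original setup: the helper `run_with_smaller_batches` and the `fake_train_step` stand-in are hypothetical, and `MemoryError` is used only so the example runs without a GPU. With TensorFlow you would presumably pass `tf.errors.ResourceExhaustedError` as `oom_error` instead.

```python
def run_with_smaller_batches(train_step, batch_size, min_batch_size=1, oom_error=MemoryError):
    """Call train_step(batch_size), halving batch_size each time it raises oom_error.

    Returns the first batch size that ran without the error.
    """
    while True:
        try:
            train_step(batch_size)
            return batch_size
        except oom_error:
            if batch_size <= min_batch_size:
                # Already at the smallest allowed size, so give up.
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def fake_train_step(batch_size):
    # Stand-in for a real training step: pretend anything above 16 does not fit in memory.
    if batch_size > 16:
        raise MemoryError(f"batch of {batch_size} does not fit")


if __name__ == "__main__":
    print("usable batch size:", run_with_smaller_batches(fake_train_step, 128))
```

Retrying inside the same process may not always recover after a real GPU out-of-memory error, so treat this as a way to find a workable batch_size and then set that value explicitly.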