After trying many approaches, I finally found a solution:
1. Find the server's CUDA version. The lab server runs Linux (Ubuntu 20.04) with CUDA 11.4 (you can check under /usr/local/).
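As a rough illustration of the check in step 1, the sketch below lists the CUDA versions found under a directory. It assumes installs follow the usual cuda-<version> directory naming; the function name find_cuda_installs is made up for this example and is not part of any library.

```python
import re
from pathlib import Path


def find_cuda_installs(root="/usr/local"):
    """Return version strings for directories named cuda-<version> under root.

    Assumption: CUDA toolkits live in directories such as cuda-11.4.
    A bare "cuda" symlink or directory is ignored because it carries no version.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []

    versions = []
    for entry in root_path.iterdir():
        match = re.fullmatch(r"cuda-(\d+(?:\.\d+)*)", entry.name)
        if match and entry.is_dir():
            versions.append(match.group(1))

    # Sort numerically so that 10.1 comes before 11.4 and 9.0 before 10.1.
    return sorted(versions, key=lambda v: [int(part) for part in v.split(".")])


if __name__ == "__main__":
    print(find_cuda_installs())
```

If several versions show up, the one referenced in PATH and LD_LIBRARY_PATH is the one the shell tools will pick up.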
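2. Find a tensorflow-gpu version that works on the CUDA 11.4 machine. Create a new environment, or uninstall every previously installed TensorFlow-related package, and install with conda rather than pip (conda also pulls in its own cudatoolkit and cuDNN packages, which a pip install of this version does not). Version 2.1.0 was installed here: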
~$ conda install tensorflow-gpu==2.1.0
3. Set the LD_LIBRARY_PATH environment variable:
~$ vi ~/.bashrc
Press a to enter insert mode and add the following lines:
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu
export PATH=/usr/local/cuda-11.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
Press Esc to leave insert mode and type :wq to save and quit. Then run the following command so the changes to .bashrc take effect:
~$ source ~/.bashrc
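To confirm that the new shell actually sees the CUDA directories, something like the following sketch can help. It only inspects the text of the variable and does not verify that the libraries exist; the helper names split_search_path and cuda_entries are hypothetical, and matching on the substring "cuda" is an assumption about how the directories are named.

```python
import os


def split_search_path(value):
    """Split a colon-separated search path into its non-empty entries."""
    return [entry for entry in value.split(":") if entry]


def cuda_entries(value, marker="cuda"):
    """Return the entries whose path contains the marker (case-insensitive)."""
    return [entry for entry in split_search_path(value) if marker in entry.lower()]


if __name__ == "__main__":
    for name in ("PATH", "LD_LIBRARY_PATH"):
        found = cuda_entries(os.environ.get(name, ""))
        print(name, "->", found if found else "no CUDA entry found")
```

An empty result for LD_LIBRARY_PATH usually means the current shell has not sourced .bashrc yet.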
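4. After making the changes, run a quick test:
import os
# Set the log level before importing TensorFlow so that import-time messages are filtered too
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf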
gpu_device_name = tf.test.gpu_device_name()
print(gpu_device_name)
gpus = tf.config.list_physical_devices('GPU')
cpus = tf.config.list_physical_devices('CPU')
print(gpus, cpus)
If the output shows that the GPUs are detected, as below, the setup works:
/device:GPU:0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:4', device_type='GPU')] [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
Process finished with exit code 0