Solving TensorFlow and PyTorch installation problems on an RTX 3080 with CUDA 11.1

I. Installing the driver, CUDA, and cuDNN

The RTX 3080 uses the new Ampere architecture (GA102-200), so it requires a recent driver; check the CUDA-to-driver compatibility table to pick matching versions.

I installed driver 455.23.04, CUDA 11.1, and cuDNN 8.0.4.30. For detailed installation steps, see the following articles:

Installing the NVIDIA driver on Ubuntu 16.04 via the .run file
Uninstalling and upgrading CUDA and cuDNN on Ubuntu 16.04

The methods are all broadly similar. When done, run nvidia-smi to confirm the driver is installed, and nvcc --version to confirm CUDA.
(Screenshots: nvidia-smi driver version and nvcc CUDA version output.)
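As a quick sanity check, the two commands above can be located from a script as well. This is a minimal sketch; it only confirms the tools are on PATH, not that the driver actually loads:

```python
# Minimal sketch: confirm nvidia-smi and nvcc are reachable on PATH.
# The actual version strings depend on your install (455.23.04 / CUDA 11.1 here).
import shutil

def check_tool(name):
    """Return a short status line for a command-line tool."""
    path = shutil.which(name)
    return f"{name}: {path if path else 'not found'}"

for tool in ("nvidia-smi", "nvcc"):
    print(check_tool(tool))
```

If either line prints "not found", revisit the driver/CUDA install before moving on to the frameworks.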

II. Installing PyTorch

1. Building from source

PyTorch is the easier of the two: you can either build it from source or install it directly with pip. My network connection was too unreliable — fetching the source kept failing (the dependent submodules are genuinely hard to download), so I gave up on that route. For source builds, see:
RTX3080/RTX3090 driver installation with CUDA 11.1 + cuDNN 8.0.4.30 + building PyTorch from source
A concise guide to building PyTorch from source

2. Installing with pip

Go to the PyTorch website, which shows the install command:

pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html



Alternatively, browse the wheel index directly, download the matching .whl file, and install PyTorch from that file.
(Screenshots: the wheel download page and the completed installation.)

After installing, verify the version and check that CUDA is usable.
(Screenshot: version and CUDA availability check.)
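The check in the screenshot can be scripted roughly as follows. This is a minimal sketch that degrades gracefully when torch is not installed, so it is safe to run anywhere:

```python
# Minimal sketch: report the installed torch version and CUDA availability.
# Degrades gracefully if torch is not installed at all.
import importlib.util

def torch_cuda_summary():
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    return f"torch {torch.__version__}, cuda available: {torch.cuda.is_available()}"

print(torch_cuda_summary())
```

On a correctly configured RTX 3080 setup you should see torch 1.7.0+cu110 and cuda available: True.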

III. Installing TensorFlow

※ Installing nvidia-tensorflow==1.15.4+nv20.10

After a lot of searching, I learned that upstream TensorFlow has stopped updating the 1.x line. So is there a TensorFlow 1.x that is still maintained? Yes — see Accelerating TensorFlow on NVIDIA A100 GPUs. NVIDIA publishes a TensorFlow 1.15 build compiled for the Ampere-architecture A100, with a brief introduction and install instructions; since the 3080 is also Ampere, the same build works for it.

The installation steps are as follows:

1. Install the index for the TensorFlow wheels

 pip install nvidia-pyindex

This will very likely fail on the first attempt: there are many dependency packages, some of them large, and without a good connection it is hard to succeed in one pass. So install the dependencies first.

2. Install the dependency packages for nvidia-tensorflow

The required packages and versions are:

nvidia-cublas           11.2.1.74
nvidia-cuda-cupti       11.1.69
nvidia-cuda-nvcc        11.1.74
nvidia-cuda-nvrtc       11.1.74
nvidia-cuda-runtime     11.1.74
nvidia-cudnn            8.0.4.30
nvidia-cufft            10.3.0.74
nvidia-curand           10.2.2.74
nvidia-cusolver         11.0.0.74
nvidia-cusparse         11.2.0.275
nvidia-dali-cuda110     0.26.0
nvidia-dali-nvtf-plugin 0.26.0+nv20.10
nvidia-nccl             2.7.8
nvidia-pyindex          1.0.5
nvidia-tensorboard      1.15.0+nv20.10
nvidia-tensorrt         7.2.1.4
tensorflow-estimator    1.15.1

The dependency versions must match exactly (the versions above correspond to TensorFlow 1.15.4+nv20.10), or installation will fail. Finally, install nvidia-tensorflow itself, which should report version 1.15.4+nv20.10.
(Screenshot: installation result.)
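Since the pins must match exactly, it can help to verify the installed versions against the table. A small sketch using the standard-library package metadata (a few entries shown; fill in the rest of the table as needed):

```python
# Sketch: check installed nvidia-* wheels against the pinned versions above.
# Returns a dict of mismatches; an empty dict means everything matches.
from importlib.metadata import version, PackageNotFoundError

PINS = {
    "nvidia-cublas": "11.2.1.74",
    "nvidia-cudnn": "8.0.4.30",
    "nvidia-tensorrt": "7.2.1.4",
    # ... remaining pinned wheels from the table above
}

def check_pins(pins):
    mismatches = {}
    for pkg, wanted in pins.items():
        try:
            got = version(pkg)
        except PackageNotFoundError:
            got = None  # not installed at all
        if got != wanted:
            mismatches[pkg] = {"wanted": wanted, "installed": got}
    return mismatches

print(check_pins(PINS))
```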

Since these packages are hard to download, I have uploaded them to a network drive for anyone to grab: dependency package download link (extraction code 5tgm). Some packages must be installed in a specific order: install nvidia-tensorboard and nvidia-tensorrt first and nvidia-tensorflow last; the remaining packages can be installed in any order in between.
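The ordering rule above can be sketched as a small helper that emits pip commands in a safe sequence (package names and pins taken from the table; the exact set you pass in is up to you):

```python
# Sketch of the install order described above: nvidia-pyindex first,
# then nvidia-tensorboard and nvidia-tensorrt, nvidia-tensorflow last,
# everything else in between in any order.
PINNED = {
    "nvidia-cublas": "11.2.1.74",
    "nvidia-cuda-cupti": "11.1.69",
    "nvidia-cudnn": "8.0.4.30",
    "nvidia-tensorboard": "1.15.0+nv20.10",
    "nvidia-tensorrt": "7.2.1.4",
    # ... remaining pinned wheels from the table above
}

def install_plan(pins):
    """Return pip install commands in a safe order."""
    first = ["nvidia-pyindex", "nvidia-tensorboard", "nvidia-tensorrt"]
    middle = sorted(p for p in pins if p not in first)
    cmds = []
    for pkg in first + middle:
        spec = f"{pkg}=={pins[pkg]}" if pkg in pins else pkg
        cmds.append(f"pip install {spec}")
    cmds.append("pip install nvidia-tensorflow==1.15.4+nv20.10")
    return cmds

for cmd in install_plan(PINNED):
    print(cmd)
```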

3. Testing TensorFlow

Finally, with TensorFlow installed, check that it works.
(Screenshots: import test and device log.)
TensorFlow imports successfully, detects the GPU, and picks up cuDNN.
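The screenshot check can be scripted much like the PyTorch one. A minimal sketch that also runs safely when tensorflow is absent:

```python
# Minimal sketch: confirm TensorFlow imports and can see a GPU.
# Degrades gracefully when tensorflow is not installed.
import importlib.util

def tf_gpu_summary():
    if importlib.util.find_spec("tensorflow") is None:
        return "tensorflow not installed"
    import tensorflow as tf
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return f"tensorflow {tf.__version__}, GPUs detected: {len(gpus)}"

print(tf_gpu_summary())
```

On the setup described here, this should report tensorflow 1.15.4 with one GPU detected.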

Reposted from:

https://blog.csdn.net/wu496963386/article/details/109583045
