Setting up CUDA + MXNet + Jupyter + PyTorch + TensorFlow on Ubuntu

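Contents

1. Install the NVIDIA driver

1.1 Download the driver

1.2 Uninstall the old driver

1.3 Disable the nouveau driver

1.4 Stop the X Window service

1.5 Install from the command line

1.6 Test

2. Install CUDA

2.1 Download CUDA

2.2 Install

2.3 Configure environment variables

2.4 Test

3. Install cuDNN

3.1 Download cuDNN

3.2 Install

3.3 Download the runtime library, developer library, code samples, and the cuDNN library

3.4 Install the runtime library, developer library, code samples, and the cuDNN library

3.5 Test

4. Install Anaconda

5. Get the MXNet code

6. Install PyTorch

7. Install TensorFlow

8. Configure Jupyter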


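1. Install the NVIDIA driver

1.1 Download the driver

Find the driver that matches your GPU on the "NVIDIA Driver Downloads - Advanced Search" page.

This guide uses version 410 as an example: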

wget http://cn.download.nvidia.com/tesla/410.104/NVIDIA-Linux-x86_64-410.104.run

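Downloading directly in the browser is sometimes just as fast.

1.2 Uninstall the old driver

sudo apt-get update
# Quote the pattern so the shell does not expand it against files in the current directory
sudo apt-get remove --purge 'nvidia*'

# If an older driver was installed from a .run file, uninstall it with that same installer
sudo chmod +x *.run
sudo ./NVIDIA-Linux-x86_64-384.59.run --uninstall

1.3 Disable the nouveau driver

sudo vim /etc/modprobe.d/blacklist.conf

Append the following at the end of the file: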

blacklist nouveau
options nouveau modeset=0

Then run:

sudo update-initramfs -u

Reboot with sudo reboot, then check:

lsmod | grep nouveau

If nothing is printed, nouveau has been disabled successfully.
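The same check can be scripted. Below is a minimal Python sketch, not part of the original steps; the helper names module_loaded and read_proc_modules are made up for illustration. It reads /proc/modules (the file that lsmod formats) and matches only the module-name column, which is slightly stricter than grep.

```python
from pathlib import Path


def module_loaded(name, modules_text):
    """Return True if `name` is the first field of any /proc/modules-style line."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def read_proc_modules(path="/proc/modules"):
    """Return the contents of the module list, or an empty string if it is absent."""
    p = Path(path)
    return p.read_text() if p.exists() else ""


if __name__ == "__main__":
    if module_loaded("nouveau", read_proc_modules()):
        print("nouveau is still loaded")
    else:
        print("nouveau is not loaded")
```

If it reports that nouveau is still loaded, recheck the blacklist file and rerun update-initramfs before rebooting again.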

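1.4 Stop the X Window service

Press Ctrl+Alt+F3 to switch to a text console and log in with your username and password. (This step is optional; use it if the machine misbehaves after you run the command below.)

sudo service lightdm stop

If Ctrl+Alt+F3 does not work (no login prompt appears), consider installing gdm3:

sudo apt install gdm3

When asked, select gdm3.

Reboot.

Then run the stop command again:

sudo service lightdm stop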

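1.5 Install from the command line

2022/05/27: If the installation fails, consider disabling Secure Boot in the BIOS.

2022/03/12: If the original method below really does not work, try this instead:

sudo ubuntu-drivers devices

sudo ubuntu-drivers autoinstall
# sudo apt install -y nvidia-driver-460

Original method: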

sudo chmod +x NVIDIA-Linux-x86_64-410.104.run
sudo ./NVIDIA-Linux-x86_64-410.104.run -no-nouveau-check -no-opengl-files

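If the installer reports that gcc is missing, install it first.

If it asks about the 32-bit libraries, do not install them.

Partway through there is a prompt about the X server; this one can be accepted.

For all other prompts, choose OK.

1.6 Test

# If this lists the GPU information, the driver is installed correctly
nvidia-smi
# If this opens the settings dialog, the driver is also installed correctly
nvidia-settings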
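For unattended setups, the nvidia-smi check can be wrapped in a script. This is a rough Python sketch under the assumption that nvidia-smi is on PATH; the function names parse_gpu_csv and query_gpus are illustrative only. It uses the --query-gpu and --format=csv,noheader options of nvidia-smi and returns an empty list if the tool is missing or fails.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, driver_version' lines into (name, driver_version) tuples."""
    gpus = []
    for line in text.splitlines():
        if "," not in line:
            continue
        name, driver = line.rsplit(",", 1)
        gpus.append((name.strip(), driver.strip()))
    return gpus


def query_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No GPU reported; the driver may not be installed correctly")
    for name, driver in gpus:
        print("GPU: {} (driver {})".format(name, driver))
```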

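2. Install CUDA

2.1 Download CUDA

Older releases are listed on the "CUDA Toolkit Archive | NVIDIA Developer" page.

Choose the version that matches your setup and download it.

This guide uses 10.0 as an example: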

wget https://developer.nvidia.cn/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux

mv cuda_10.0.130_410.48_linux cuda_10.0.130_410.48_linux.run 

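Again, downloading directly in the browser is sometimes just as fast.

2.2 Install

chmod a+x cuda_10.0.130_410.48_linux.run
sudo ./cuda_10.0.130_410.48_linux.run --no-opengl-libs

Update 2021-09-12:

The installer now looks a little different: clear the X in front of the Driver entry (press Enter on it) so that the driver is not installed. The remaining steps are much the same, so adapt as needed.

If you are upgrading an existing CUDA installation, the process is similar, except that you leave the X in place and then choose install -> upgrade all.

Press q to skip past the documentation text.

accept #accept the license
n #do not install the driver
y #install the CUDA Toolkit
<Enter> #install to the default directory
y #create a symbolic link to the install directory
y #copy the Samples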

2.3 Configure environment variables

vim ~/.bashrc

Append the following at the end, replacing cuda-10.0 with your own CUDA version:

export CUDA_HOME=/usr/local/cuda-10.0

export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH

export PATH=/usr/local/cuda-10.0/bin:$PATH

Save and exit, then reload the file:

source ~/.bashrc
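To confirm that the three exports took effect in the current shell, a small check like the following can help. It is a hedged Python sketch; the function name cuda_env_problems and the default path /usr/local/cuda-10.0 are assumptions taken from the example above, so adjust the path to your own version.

```python
import os


def cuda_env_problems(env, cuda_home="/usr/local/cuda-10.0"):
    """Return a list of human-readable problems with the CUDA-related variables."""
    problems = []
    if env.get("CUDA_HOME") != cuda_home:
        problems.append("CUDA_HOME is not set to " + cuda_home)
    if cuda_home + "/bin" not in env.get("PATH", "").split(":"):
        problems.append(cuda_home + "/bin is missing from PATH")
    if cuda_home + "/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append(cuda_home + "/lib64 is missing from LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    found = cuda_env_problems(os.environ)
    for problem in found:
        print("Problem:", problem)
    if not found:
        print("CUDA environment variables look consistent")
```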

2.4 Test

# Build and run the deviceQuery sample:
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery

# Build and run the bandwidthTest sample:
cd ../bandwidthTest
sudo make
./bandwidthTest

If both report Result = PASS, CUDA is most likely installed correctly.

# Check the version
nvcc -V
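The release number printed by nvcc -V is what later steps need when picking matching cuDNN and framework builds. As a sketch only (the function name parse_nvcc_release is made up here), the version can be extracted from the "release X.Y" part of the output like this:

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from nvcc -V output, or None if no release is found."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH; check the environment variables from 2.3")
    else:
        result = subprocess.run(
            ["nvcc", "-V"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        print("CUDA release:", parse_nvcc_release(result.stdout))
```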

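3. Install cuDNN

3.1 Download cuDNN

https://developer.nvidia.cn/rdp/cudnn-archive (older versions)

https://developer.nvidia.cn/rdp/cudnn-download (latest version)

You must log in to download.

Check carefully that the cuDNN version matches your CUDA version.

Because a login is required, you can download the file on your own machine and then transfer it to the server, for example with Xshell.

3.2 Install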

tar -zxvf cudnn-10.0-linux-x64-v7.5.0.56.tgz

sudo cp cuda/include/*.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/*.h /usr/local/cuda/lib64/libcudnn*
sudo ldconfig

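If ldconfig reports "xxx is not a symbolic link", recreate the links as follows.

Adjust the versions to match your system, for example cuda-11.2, and whether cuDNN is 8.1.1, 8.2.2, or something else.

First list the installed libraries (again, change cuda-11.2 as needed):

ls /usr/local/cuda-11.2/targets/x86_64-linux/lib|grep libcudnn

After checking the version, change 8.1.1 and cuda-11.2 in the commands below accordingly: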

sudo ln -sf /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn.so.8.1.1 /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn.so.8
sudo ln -sf /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.1 /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
sudo ln -sf /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.1 /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
sudo ln -sf /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.1 /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
sudo ln -sf /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.1 /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
sudo ln -sf /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.1 /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
sudo ln -sf /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.1 /usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
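Since the seven commands differ only in the library name, they can be generated instead of edited by hand. The following Python sketch is one way to do that; plan_cudnn_links and apply_links are hypothetical helper names, and the directory and version are whatever you found with the ls command above. Writing into /usr/local requires root, so in practice the script would be run with sudo.

```python
import os
from pathlib import Path


def plan_cudnn_links(lib_dir, full_version):
    """Return (link, target) pairs mapping libX.so.<major> to libX.so.<full_version>."""
    major = full_version.split(".")[0]
    pairs = []
    for target in sorted(Path(lib_dir).glob("libcudnn*.so." + full_version)):
        # Swap the trailing full version (e.g. 8.1.1) for the major version (e.g. 8)
        link_name = target.name[: -len(full_version)] + major
        pairs.append((target.with_name(link_name), target))
    return pairs


def apply_links(pairs):
    """Replace each link with a symlink to its target, like ln -sf."""
    for link, target in pairs:
        if link.is_symlink() or link.exists():
            link.unlink()
        os.symlink(str(target), str(link))


if __name__ == "__main__":
    # Example values from the text above; change them to match your system
    planned = plan_cudnn_links("/usr/local/cuda-11.2/targets/x86_64-linux/lib", "8.1.1")
    for link, target in planned:
        print("{} -> {}".format(link, target))
    # Call apply_links(planned) only after reviewing the printed plan
```

Printing the plan first and applying it afterwards keeps the step reviewable before anything under /usr/local is changed.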

3.3 Download the runtime library, developer library, code samples, and the cuDNN library

These are on the same page as before.

As before, you must log in to download.

3.4 Install the runtime library, developer library, code samples, and the cuDNN library

sudo dpkg -i libcudnn7_7.5.0.56-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.5.0.56-1+cuda10.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.5.0.56-1+cuda10.0_amd64.deb

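3.5 Test

cp -r /usr/src/cudnn_samples_v7/ $HOME
cd  $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN

On success, the output includes Test passed!

If an error appears at this point (the screenshot from the original post is not available), install the FreeImage packages below and then run the steps again:

sudo apt-get install libfreeimage3 libfreeimage-dev -y

4. Install Anaconda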

wget https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
bash Anaconda3-5.1.0-Linux-x86_64.sh

Press Enter or answer yes at each prompt.

Then reload the shell environment:

source ~/.bashrc

5. Get the MXNet code

sudo apt-get install unzip
mkdir d2l-zh && cd d2l-zh
curl https://zh.d2l.ai/d2l-zh-1.0.zip -o d2l-zh.zip
unzip d2l-zh.zip && rm d2l-zh.zip

Edit environment.yml:

vim environment.yml

Using CUDA version 10.0 as an example (check yours with nvidia-smi), append -cu100 to the mxnet package name.

After editing, the file looks like this:

name: gluon
dependencies:
- python=3.6
- pip:
  - mxnet-cu100==1.5.0
  - d2lzh==0.8.11
  - jupyter==1.0.0
  - matplotlib==2.2.2
  - pandas==0.23.4
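The -cu100 suffix is the CUDA version with the dot removed. A tiny Python sketch of that naming rule follows; the function name mxnet_package_for_cuda is illustrative, and whether a given package and version combination is actually published on PyPI is an assumption you should verify before relying on it.

```python
def mxnet_package_for_cuda(cuda_version):
    """Map a CUDA version string such as '10.0' to a name such as 'mxnet-cu100'."""
    major, minor = cuda_version.split(".")[:2]
    return "mxnet-cu{}{}".format(int(major), int(minor))


if __name__ == "__main__":
    for version in ("9.2", "10.0", "10.1"):
        print(version, "->", mxnet_package_for_cuda(version))
```

Note that nvcc -V reports the installed toolkit version, which is the one that matters when choosing the suffix.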

Install MXNet

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

conda env create -f environment.yml

Activate the environment:

source activate gluon

6. Install PyTorch

pip install torch torchvision

Run a quick test:

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
import torch

print(torch.cuda.is_available())

If it prints True, PyTorch can use CUDA.
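When the result is False, it helps to know a little more than a bare boolean. The sketch below is a slightly more verbose variant of the test above (the function name describe_cuda is made up); it uses torch.cuda.device_count, torch.cuda.get_device_name, and torch.version.cuda, and degrades to a message if torch is not installed.

```python
def describe_cuda():
    """Return a one-line summary of what PyTorch can see, without raising."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return "torch {} is installed, but CUDA is not available".format(torch.__version__)
    return "{} GPU(s) visible, first is {}, torch built for CUDA {}".format(
        torch.cuda.device_count(),
        torch.cuda.get_device_name(0),
        torch.version.cuda,
    )


if __name__ == "__main__":
    print(describe_cuda())
```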

7. Install TensorFlow

pip install tensorflow-gpu==1.14.0

# If this error appears:
# ERROR: Cannot uninstall 'wrapt'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
# reinstall the affected packages, ignoring the installed copies:
pip install -U --ignore-installed wrapt enum34 simplejson netaddr

# Then install again
pip install tensorflow-gpu==1.14.0

Test code

It prints a 3x3 matrix of zeros:

#!/usr/bin/env python
# _*_ coding:utf-8 _*_
import tensorflow as tf

# Build a 3x3 tensor of zeros and evaluate it in a TF 1.x session
a = tf.zeros([3, 3])
with tf.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    print(sess.run(a))

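8. Configure Jupyter

Start Python and enter:

from notebook.auth import passwd
passwd()

Enter the password you want to use for Jupyter.

This returns a string of the form sha1:xxxxxx.

# Switch to the Python environment you want to use
source activate xxx

jupyter notebook --generate-config --allow-root

If asked whether to overwrite the existing file, answer y.

The command prints the path of the config file.

Open that path in vim, for example:

vim /home/nightmare/.jupyter/jupyter_notebook_config.py

Change the following settings: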

c.NotebookApp.allow_root = True

c.NotebookApp.ip = '*'

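c.NotebookApp.password = 'sha1:...' # replace with the hash you just generated

c.NotebookApp.port= 8888 # port; remember to open it

c.NotebookApp.notebook_dir = '/home/nightmare/d2l-zh'  # directory Jupyter starts in, e.g. where the MXNet code was downloaded

c.NotebookApp.open_browser = False

Start the server:

jupyter notebook --allow-root

Then open a browser and go to ip:8888.

The password is the one you just set.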
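If the page does not load, a common cause is that the port is not reachable from your machine. As a rough Python sketch (the function name port_open is illustrative, and 127.0.0.1 is a placeholder for your server's address), you can test whether something is accepting connections on the port:

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the server's IP when checking from another machine
    host, port = "127.0.0.1", 8888
    state = "reachable" if port_open(host, port) else "not reachable"
    print("{}:{} is {}".format(host, port, state))
```

If the port is reachable on the server itself but not from another machine, the firewall or security group is the likely culprit rather than the Jupyter configuration.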
