Building and Installing PyTorch on Ubuntu 18.04, and the Problems Encountered

Latest recommended article published on 2024-08-03 11:32:27

高精度计算机视觉


Categories: pytorch, 机器学习杂记. Tags: pytorch setup

Article link: https://blog.csdn.net/tanmx219/article/details/86505741

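This post is mainly about building and installing the develop (development) version. If you do not need the development version, installation is very easy, because Anaconda already handles it well; see the later part of this post:
https://blog.csdn.net/tanmx219/article/details/82831964

When I build it myself, I always end up missing something compared with the official instructions, so I decided to write the steps down.

https://github.com/pytorch/pytorch
https://m.oldpan.me/archives/pytorch-build-simple-instruction

The current version is PyTorch v1.0.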
------------------------------------------------------------------------------------

conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing pybind11
conda install cudnn    # installed cudatoolkit 9.2-0 and cudnn 7.2.1-cuda9.2_0 here
conda install ninja    # optional; without it the build warns that C++ will not be compiled incrementally
conda install -c pytorch magma-cuda92    # optional; adds LAPACK support for the GPU if needed
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
If you do not need CUDA, also add this line: export NO_CUDA=1

python setup.py install
For developer mode, you can use this instead:
python setup.py build develop
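
Since something always turns out to be missing when I rebuild, a small preflight check before starting can help. This is only a minimal sketch: the helper name missing_tools and the list of tools passed to it are my own assumptions based on the steps above, not part of the PyTorch build.

```shell
# Print every command from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call; the tool list is an assumption based on the steps above.
missing=$(missing_tools git conda cmake python)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
fi
```

It only checks that the commands exist, not that their versions are suitable.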

------------------------------------------------------------------------------------

To uninstall a PyTorch that was installed from source, run:

pip uninstall torch
python setup.py clean

------------------------------------------------------------------------------------

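Note: before running git clone, make sure git is installed. If it is not, install it with sudo apt-get install git-all.

Problems encountered

Problems on Linux are endlessly varied and different every time, so I record them here as well.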

=========================================================================

Problem 1:

---------------------------------------------------------------------------------------------------------------------------------

There are many ways to install CUDA described online, including this rather interesting one:
https://askubuntu.com/questions/1028830/how-do-i-install-cuda-on-ubuntu-18-04

-----how to install-----
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo ubuntu-drivers autoinstall
reboot
sudo apt install nvidia-cuda-toolkit gcc-6
nvcc --version

-----replace------
sudo sh cuda_9.1.85_387.26_linux.run (--override)
to uninstall the NVIDIA Driver, run nvidia-uninstall

-----how to remove-----
sudo apt-get remove --purge 'nvidia-*'
sudo apt-get remove --purge 'libnvidia-*'

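This method installs packages built by the PPA proprietary graphics driver team; see:
https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa
Information about the team is here:
https://launchpad.net/~graphics-drivers
When I wrote this, it was also my first time installing this way. From what I checked, it appears to always install the latest version, and I have not found detailed documentation yet.
The PPA page states:
Current long-lived branch release: `nvidia-410` (410.66)
Dropped support for Fermi series (https://nvidia.custhelp.com/app/answers/detail/a_id/4656)
Old long-lived branch release: `nvidia-390` (390.87)
The default is nvidia-410.
In short, this method installs a lot of packages, and I do not know how many of them are unnecessary.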

--------------------

What if I specifically want nvidia-390? The old way is still the safer choice:
sudo apt-get install nvidia-driver-390
sudo apt-get install nvidia-driver-390-dev
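
To see which driver branch actually ended up installed, something like the following can be used. It is a sketch, assuming nvidia-smi is on PATH after the driver installation; driver_branch is a helper name I made up for illustration.

```shell
# Reduce a driver version string such as "410.66" to its branch number ("410").
driver_branch() {
    printf '%s\n' "${1%%.*}"
}

# Query the installed driver only if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    echo "driver $version (branch nvidia-$(driver_branch "$version"))"
fi
```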

=========================================================================

Problem 2:

---------------------------------------------------------------------------------------------------------------------------------

CMake Error at cmake/public/cuda.cmake:318 (message):
CUDA 9.1 is not compatible with std::tuple from GCC version >= 6. Please
upgrade to CUDA 9.2 or use the following option to use another version (for
example):
-DCUDA_HOST_COMPILER=/usr/bin/gcc-5
.....
-- Configuring incomplete, errors occurred!
See also "/home/matthew/dev/pytorch/build/CMakeFiles/CMakeOutput.log".
See also "/home/matthew/dev/pytorch/build/CMakeFiles/CMakeError.log".
Failed to run 'bash ../tools/build_pytorch_libs.sh --use-cuda --use-fbgemm --use-nnpack --use-mkldnn --use-qnnpack caffe2'

There are quite a few solutions online; see: https://github.com/pytorch/pytorch/issues/14152

Solution 1:

--------------------------------------------------------------------------------------------------------------------------------

The versions do not match. Roughly, the fix is to uninstall first and then reinstall. For reference:

step1: remove the previous cuda
sudo apt-get --purge remove nvidia-cuda-toolkit
conda remove cudnn (only if a version different from the one needed is installed)

step2: install cudnn (this will install cudnn7_cuda9.2)
conda install cudnn

https://blog.csdn.net/tanmx219/article/details/86210023
https://github.com/pytorch/pytorch/issues/14152
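
The combination named in the error message can be checked before starting a long build. The sketch below encodes only the single rule quoted in that message (CUDA 9.1 together with GCC 6 or newer) and no other compatibility rules; the function names are hypothetical.

```shell
# Extract "9.1" from a line such as
# "Cuda compilation tools, release 9.1, V9.1.85" read on stdin.
parse_nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed (exit status 0) when the pair matches the rule quoted in the error:
# CUDA 9.1 together with a GCC major version of 6 or newer.
hits_tuple_error() {
    cuda_version=$1
    gcc_major=$2
    [ "$cuda_version" = "9.1" ] && [ "$gcc_major" -ge 6 ]
}

# Only inspect the local toolchain when both compilers are present.
if command -v nvcc >/dev/null 2>&1 && command -v gcc >/dev/null 2>&1; then
    cuda=$(nvcc --version | parse_nvcc_release)
    gcc_major=$(gcc -dumpversion | cut -d. -f1)
    if hits_tuple_error "$cuda" "$gcc_major"; then
        echo "CUDA $cuda with GCC $gcc_major: expect the std::tuple error"
    fi
fi
```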

=========================================================================

Problem 3:

---------------------------------------------------------------------------------------------------------------------------------

$ python setup.py build develop
Building wheel torch-1.1.0a0+964732f
running build
running build_deps
setup.py::build_deps::run()
+ SYNC_COMMAND=cp
......
-- USE_GLOO_IBVERBS : OFF
-- Public Dependencies : Threads::Threads;caffe2::mkl;caffe2::mkldnn
-- Private Dependencies : qnnpack;nnpack;cpuinfo;fbgemm;fp16;gloo;aten_op_header_gen;onnxifi_loader;rt;gcc_s;gcc;dl
-- Configuring done
CMake Error in caffe2/CMakeLists.txt:
Imported target "caffe2::mkl" includes non-existent path
"/home/matthew/anaconda3/envs/torchdev/include"
in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
* The path was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and references files it does not
provide.

-- Generating done
-- Build files have been written to: /home/matthew/dev/pytorch/build
Failed to run 'bash ../tools/build_pytorch_libs.sh --use-fbgemm --use-nnpack --use-mkldnn --use-qnnpack caffe2'

解决办法
---------------------------------------------------------------------------------------------------------------------------------

这个问题是因为上一次编译没编译完，这次接着编译，但命令路径变了，所以必须清空重来
python setup.py clean
python setup.py build develop
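
One way to notice a leftover build directory before it fails is to compare the source path recorded in the CMake cache with the current checkout. This is a rough sketch under two assumptions of mine: that the cache lives at build/CMakeCache.txt and that a changed source path is what made it stale. It will not catch other causes, such as a conda environment that was removed. The function names are made up.

```shell
# Print the source directory recorded in a CMake cache file.
cache_source_dir() {
    sed -n 's/^CMAKE_HOME_DIRECTORY:INTERNAL=//p' "$1"
}

# Succeed when the cache exists and was generated for a different source tree.
cache_is_stale() {
    cache_file=$1
    source_dir=$2
    [ -f "$cache_file" ] && [ "$(cache_source_dir "$cache_file")" != "$source_dir" ]
}

# Run from the top of the pytorch checkout.
if cache_is_stale build/CMakeCache.txt "$(pwd)"; then
    echo "build/ was configured for another path; run: python setup.py clean"
fi
```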

=========================================================================

Problem 3:

--------------------------------------------------------------------------------------------------------------------------------

The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver,

Reference:
https://tutorials.technology/tutorials/85-How-to-remove-Nouveau-kernel-driver-Nvidia-install-error.html
How to remove Nouveau kernel driver (fix Nvidia install error)

sudo apt-get remove 'nvidia*'
sudo apt autoremove
sudo apt-get install dkms build-essential linux-headers-generic
sudo vim /etc/modprobe.d/blacklist.conf

Add the following lines to that file:

blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
reboot

sudo apt-get --purge remove xserver-xorg-video-nouveau
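
After the reboot it is worth confirming that the nouveau module is really gone before running the NVIDIA installer again. A minimal sketch, assuming lsmod is available; module_listed is a hypothetical helper that only scans the first column of lsmod-style output.

```shell
# Read lsmod-style output on stdin; succeed when the named module is listed.
module_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if command -v lsmod >/dev/null 2>&1; then
    if lsmod | module_listed nouveau; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is not loaded"
    fi
fi
```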

=========================================================================

Problem 4:

--------------------------------------------------------------------------------------------------------------------------------

-- Caffe2: Cannot find glog automatically. Using legacy find.
-- Could NOT find glog (missing: GLOG_INCLUDE_DIR GLOG_LIBRARY)
CMake Warning at cmake/public/glog.cmake:66 (message):
Caffe2: glog cannot be found. Depending on whether you are building Caffe2
or a Caffe2 dependent library, the next warning / error will give you more
info.
Call Stack (most recent call first):
cmake/Dependencies.cmake:291 (include)
CMakeLists.txt:219 (include)

===>>>

$ sudo apt-get install libgflags-dev libgoogle-glog-dev

=========================================================================

Problem 5:

--------------------------------------------------------------------------------------------------------------------------------

-- Found CUDA: /usr/local/cuda (found version "10.0")
-- Caffe2: CUDA detected: 10.0
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 10.0
-- Could NOT find CUDNN (missing: CUDNN_INCLUDE_DIR CUDNN_LIBRARY)
CMake Warning at cmake/public/cuda.cmake:101 (message):
Caffe2: Cannot find cuDNN library. Turning the option off
Call Stack (most recent call first):
cmake/Dependencies.cmake:685 (include)
CMakeLists.txt:219 (include)

===>>>

This warning disappears once CUDA and cuDNN are both installed.
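
To check whether a cuDNN header is present at all, the directories can be searched by hand. The sketch below is only an illustration: the candidate directories are my guesses at common locations, not the list the build itself searches, and find_cudnn_include is a made-up name.

```shell
# Print the first directory from the arguments that contains cudnn.h.
find_cudnn_include() {
    for dir in "$@"; do
        if [ -f "$dir/cudnn.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate directories are assumptions; adjust them to your installation.
if found=$(find_cudnn_include "${CONDA_PREFIX:-/nonexistent}/include" /usr/local/cuda/include /usr/include); then
    echo "cudnn.h found in $found"
else
    echo "cudnn.h not found in the candidate directories"
fi
```

This looks only for the header; the CMake message also lists CUDNN_LIBRARY as missing, so the library itself still has to be checked separately.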