Compile PyTorch and Torchvision from source code with CUDA enabled.

KyrieYe

Last modified 2022-08-28 21:41:11

Tags: pytorch, python, deep learning

First published 2022-08-12 17:19:36

Article link: https://blog.csdn.net/KyrieYe/article/details/126303202

Compile PyTorch from source code with CUDA enabled.

Background
Building PyTorch from source on Ubuntu
Troubleshooting
- nvlink error
Building Torchvision from source on Ubuntu
Optional

After each round of debugging, rebuild:

python setup.py clean
python setup.py install

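Background

The project mixes Python and C++, and the C++ side depends on libtorch. Building the C++ code into a Python package with pybind11 worked fine, but importing the C++ extension in Python kept failing with errors such as: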

undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

TypeError: test(): incompatible function arguments

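After consulting a related GitHub issue and a PyTorch Discuss thread,
I traced the problem to the C++11 ABI: the C++ extension and PyTorch were built with different _GLIBCXX_USE_CXX11_ABI settings. At the time of writing, the PyTorch packages installed via pip/conda use _GLIBCXX_USE_CXX11_ABI=0, while the C++ compiler defaults to 1.
In most cases, setting _GLIBCXX_USE_CXX11_ABI=0 when compiling the C++ code is therefore enough to solve the problem.
In my project, however, the C++ code also depends on other libraries built with the C++11 ABI, so I had to build PyTorch from source myself.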
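
As a quick diagnostic, the mismatch can usually be read off the mangled symbol itself: names built against the new libstdc++ ABI carry the `__cxx11` namespace tag. The sketch below is only illustrative. `uses_cxx11_abi` and `abi_flag` are hypothetical helper names, and the `torch.compiled_with_cxx11_abi()` query assumes PyTorch is importable in the current environment.

```python
import importlib.util


def uses_cxx11_abi(mangled_symbol: str) -> bool:
    """Return True if a mangled C++ symbol references the libstdc++ __cxx11 namespace."""
    return "__cxx11" in mangled_symbol


def abi_flag(use_cxx11_abi: bool) -> str:
    """Build the compiler define that selects the libstdc++ ABI."""
    return f"-D_GLIBCXX_USE_CXX11_ABI={int(use_cxx11_abi)}"


# The undefined symbol from the error message above.
symbol = (
    "_ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_"
    "RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"
)
print("extension expects the C++11 ABI:", uses_cxx11_abi(symbol))

# Only query PyTorch if it is installed in this environment.
if importlib.util.find_spec("torch") is not None:
    import torch

    torch_abi = torch.compiled_with_cxx11_abi()
    print("PyTorch built with the C++11 ABI:", torch_abi)
    print("flag to match PyTorch:", abi_flag(torch_abi))
```

If the symbol contains `__cxx11` while PyTorch reports `False`, the extension was compiled with the new ABI against an old-ABI PyTorch, which matches the failure described here.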

Building PyTorch from source on Ubuntu

Create a new conda environment and activate it

conda create -n Pytorch python=3.8.3

conda activate Pytorch

Download the PyTorch source

git clone https://github.com/pytorch/pytorch
cd pytorch

git tag # list the available release tags
git checkout v1.11.0 # switch to the desired version

# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive

If the download is too slow, see the guide linked here.

Install dependencies
Follow the PyTorch GitHub README:

conda install astunparse numpy ninja pyyaml setuptools cmake cffi typing_extensions future six requests dataclasses

conda install mkl mkl-include
# CUDA only: Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda110  # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo

Start the build

# Select the CUDA version. With CUDA 11.7 the build failed with: /usr/local/cuda/lib64/libcublas.so: undefined reference to `cublasLtGetStatusString@libcublasLt.so.11'. With CUDA 11.2 it compiled without problems.

export PATH=/usr/local/cuda-11.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH 

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install
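
Once the install finishes, a short sanity check can confirm that the build picked up CUDA and the intended ABI. This is a minimal sketch, with `torch_build_report` as a hypothetical helper name. It should be run from outside the pytorch source directory, since Python would otherwise try to import the `torch` folder in the source tree instead of the installed package.

```python
import importlib.util


def torch_build_report():
    """Collect basic facts about the installed PyTorch build, or None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "version": torch.__version__,
        "cuda_toolkit": torch.version.cuda,  # None for a CPU-only build
        "cuda_available": torch.cuda.is_available(),
        "cxx11_abi": torch.compiled_with_cxx11_abi(),
    }


report = torch_build_report()
if report is None:
    print("torch is not importable in this environment")
else:
    for key, value in report.items():
        print(f"{key}: {value}")
```

For the setup described here, one would expect `cuda_toolkit` to read 11.2 and `cxx11_abi` to be `True`, assuming the default GCC settings were left untouched during the build.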

Troubleshooting

nvlink error

This looks like an NCCL problem. This issue suggests reinstalling NCCL; install it following the official NVIDIA guide, then run:

sudo dpkg -i nccl-repo-ubuntu1604-2.2.13-ga-cuda8.0_1-1_amd64.deb
sudo apt-get install libnccl2
sudo apt-get install libnccl-dev
export USE_SYSTEM_NCCL=ON

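The installation then hit another problem: the installer reported "The public CUDA GPG key does not appear to be installed." Running the suggested command "sudo cp /var/cuda-repo-wsl-ubuntu-11-7-local/cuda-B81839D3-keyring.gpg /usr/share/keyrings/" did not make the message go away. The fix that eventually worked is described in the post linked here.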
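
Before rebuilding with USE_SYSTEM_NCCL=ON, it may help to check whether the dynamic loader can see a system NCCL at all. The sketch below uses only the standard library; `find_system_nccl` is a hypothetical helper name. Finding the runtime library does not prove that the development headers from libnccl-dev are present.

```python
import ctypes.util


def find_system_nccl():
    """Return the shared-library name the loader resolves for NCCL, or None if not found."""
    return ctypes.util.find_library("nccl")


library = find_system_nccl()
if library is None:
    print("no system NCCL found; a build with USE_SYSTEM_NCCL=ON will likely fail to locate it")
else:
    print("system NCCL found:", library)
```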

Building Torchvision from source on Ubuntu

Running pip install torchvision==x.x.x or conda install torchvision=x.x.x -c pytorch directly reinstalls PyTorch, so Torchvision also has to be built from source.

Download the Torchvision source

git clone https://github.com/pytorch/vision
cd vision

List the available versions
```
git tag
```
Switch to the required version
```
git checkout v0.12.0
```
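
The tag has to pair with the PyTorch release that was just built: v0.12.0 is the Torchvision release that goes with PyTorch v1.11.0. The lookup below is a small sketch of that pairing. `matching_torchvision` is a hypothetical helper, the table covers only three neighbouring releases, and the git hash in the example version string is made up.

```python
# Release pairs for three neighbouring versions; extend as needed.
TORCH_TO_TORCHVISION = {
    "1.10": "0.11",
    "1.11": "0.12",
    "1.12": "0.13",
}


def matching_torchvision(torch_version: str):
    """Map a PyTorch version string to the matching Torchvision minor series, or None."""
    major_minor = ".".join(torch_version.split(".")[:2])
    return TORCH_TO_TORCHVISION.get(major_minor)


# Source builds typically report a suffixed version string; this one is hypothetical.
print(matching_torchvision("1.11.0a0+git1234567"))
```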

Start the build

# Select the CUDA version; it must match the one used for PyTorch

export PATH=/usr/local/cuda-11.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH 

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install
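
Because both builds must use the same toolkit, it can be worth confirming which nvcc the PATH exports actually select. The sketch below parses the release number from `nvcc --version`. The helper names are hypothetical and the sample line's build number is illustrative.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_output: str):
    """Extract the 'major.minor' release from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", version_output)
    return match.group(1) if match else None


def active_nvcc_release():
    """Return the release of the first nvcc on PATH, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


# Sample line in the format nvcc prints; the build number is illustrative.
sample = "Cuda compilation tools, release 11.2, V11.2.152"
print(parse_nvcc_release(sample))
print("nvcc on PATH:", active_nvcc_release())
```

The value returned by `active_nvcc_release()` can then be compared with `torch.version.cuda` from the PyTorch build; a mismatch suggests the exports above were not applied in the current shell.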

Optional

Settings I did not use but that may be useful

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++
export USE_CUDA="True"
