System: Linux, Ubuntu 20.04.3 LTS
1. Bug
RuntimeError: Error building extension 'bias_act_plugin':
[1/3] /usr/local/cuda-11.3:/usr/local/cuda-11.1/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/TH -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.3:/usr/local/cuda-11.1/include -isystem /home/ajx/anaconda3/envs/eg3d/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/ajx/.cache/torch_extensions/py39_cu113/bias_act_plugin/07a69d3712eccf8260eb07abf5d5e2a3-nvidia-geforce-rtx-3090/bias_act.cu -o bias_act.cuda.o
FAILED: bias_act.cuda.o
/usr/local/cuda-11.3:/usr/local/cuda-11.1/bin/nvcc -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/TH -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.3:/usr/local/cuda-11.1/include -isystem /home/ajx/anaconda3/envs/eg3d/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' --use_fast_math -std=c++14 -c /home/ajx/.cache/torch_extensions/py39_cu113/bias_act_plugin/07a69d3712eccf8260eb07abf5d5e2a3-nvidia-geforce-rtx-3090/bias_act.cu -o bias_act.cuda.o
/bin/sh: 1: /usr/local/cuda-11.3:/usr/local/cuda-11.1/bin/nvcc: not found
[2/3] c++ -MMD -MF bias_act.o.d -DTORCH_EXTENSION_NAME=bias_act_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/TH -isystem /home/ajx/anaconda3/envs/eg3d/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.3:/usr/local/cuda-11.1/include -isystem /home/ajx/anaconda3/envs/eg3d/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/ajx/.cache/torch_extensions/py39_cu113/bias_act_plugin/07a69d3712eccf8260eb07abf5d5e2a3-nvidia-geforce-rtx-3090/bias_act.cpp -o bias_act.o
ninja: build stopped: subcommand failed.
2. Solution
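Reading the error log, the failure comes from nvcc (the log ends with "nvcc: not found" for the CUDA path used in the build), so the CUDA version used for the build has to be consistent with the one PyTorch expects.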
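As a quick diagnostic, the sketch below checks the value that the build uses as its CUDA root. PyTorch's extension builder takes the CUDA root from the CUDA_HOME environment variable and appends bin/nvcc to it, which is why the log shows a path with two directories joined by a colon. The function name check_cuda_home is a hypothetical helper introduced here for illustration, not part of any tool.

```shell
# Hypothetical helper: classify a CUDA_HOME-style value.
# Prints one of: unset, multiple-paths, nvcc-missing, ok
check_cuda_home() {
    value="$1"
    case "$value" in
        "")
            # Nothing was exported at all.
            echo "unset"
            ;;
        *:*)
            # CUDA_HOME must be a single directory, not a PATH-like list.
            echo "multiple-paths"
            ;;
        *)
            # A single directory: nvcc should be an executable under bin/.
            if [ -x "$value/bin/nvcc" ]; then
                echo "ok"
            else
                echo "nvcc-missing"
            fi
            ;;
    esac
}

# Check the value in the current shell (empty string if it is not set).
check_cuda_home "${CUDA_HOME:-}"
```

With the value seen in the log, /usr/local/cuda-11.3:/usr/local/cuda-11.1, this prints multiple-paths.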
- CUDA version that my PyTorch build corresponds to: CUDA 11.3
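One way to confirm this value is to ask PyTorch directly. The sketch below assumes that python on the PATH is the interpreter of the active conda environment; the wrapper name torch_cuda_version is hypothetical and only there to keep the command failure-safe.

```shell
# Hypothetical wrapper: print the CUDA version PyTorch was built with,
# or a fallback message if torch cannot be imported from this interpreter.
torch_cuda_version() {
    python -c 'import torch; print(torch.version.cuda)' 2>/dev/null \
        || echo "torch-not-importable"
}

torch_cuda_version
```

The cu113 component in the extension cache path from the log (py39_cu113) points to the same version.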
- CUDA version in my environment:
(1) Environment configuration file (I configured 11.3):
$ vim ~/.bashrc
(After editing, remember to run $ source ~/.bashrc)
(2) Run nvcc -V to check:
- System nvcc:
Running nvcc by its absolute path reports 11.1, which differs from the version configured in my environment.
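The comparison above can be sketched as follows. It assumes the system-wide toolkit is reachable through /usr/local/cuda, which is the usual default but is not stated in the original, and nvcc_release is a hypothetical helper that extracts the version from the "release X.Y" line that nvcc -V prints.

```shell
# Hypothetical helper: read `nvcc -V` output on stdin and print only
# the release number, e.g. "11.3" from "release 11.3, V11.3.109".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Compare the nvcc found through PATH with the one at the absolute path.
# /usr/local/cuda is an assumed location; adjust it to your system.
for candidate in nvcc /usr/local/cuda/bin/nvcc; do
    if command -v "$candidate" >/dev/null 2>&1; then
        echo "$candidate -> $("$candidate" -V | nvcc_release)"
    else
        echo "$candidate -> not found"
    fi
done
```

If the two lines report different releases, the shell environment and the system path point at different toolkits.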
At this point, re-create the symbolic link:
After that, running nvcc by its absolute path reports the same version everywhere.
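The original commands for this step are not preserved here, so the following is only a sketch of how such a link is commonly re-pointed. It assumes the link is /usr/local/cuda and the desired toolkit is installed at /usr/local/cuda-11.3; relink_cuda is a hypothetical helper.

```shell
# Hypothetical helper: point an existing symlink at a different toolkit.
#   $1 = directory of the toolkit to use
#   $2 = path of the symlink to (re)create
# -s creates a symlink, -f replaces an existing one, and -n stops ln from
# descending into the old link's target directory.
relink_cuda() {
    target="$1"
    link="$2"
    ln -sfn "$target" "$link"
}

# Assumed real-world invocation (needs root, so it is left commented out):
# sudo ln -sfn /usr/local/cuda-11.3 /usr/local/cuda
# readlink -f /usr/local/cuda    # shows where the link resolves now
```

Using -n matters here: without it, ln would create the new link inside the directory the old link points to instead of replacing the link itself.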
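- When I later repeated the steps above, they did not solve the problem. Changing the environment settings in ~/.bashrc then fixed it, somewhat to my surprise.
I changed it as follows, removing the trailing part from CUDA_HOME (judging from the log, this is the ":/usr/local/cuda-11.1" suffix):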
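The exact file contents are not preserved here, so this is a minimal sketch of what such a ~/.bashrc section could look like. It assumes CUDA 11.3 is installed at /usr/local/cuda-11.3 (the first directory in the log); the PATH and LD_LIBRARY_PATH lines are the conventional companions and are also an assumption.

```shell
# Assumed install location: CUDA_HOME must name exactly one directory.
# The build appends bin/nvcc and include to it, so a PATH-like list such as
# /usr/local/cuda-11.3:/usr/local/cuda-11.1 yields a path that does not exist.
export CUDA_HOME=/usr/local/cuda-11.3

# Lists of directories belong in PATH and LD_LIBRARY_PATH instead.
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

After saving, run source ~/.bashrc (or open a new terminal) so the running shell picks up the new values before rebuilding the extension.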