Environment:
Linux (Ubuntu 20.04)
CUDA 10.2
GPU: T4
Background:
I needed yolov5_obb to detect objects with rotated bounding boxes.
——————————————————————————————————————
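Pitfall 1: CUDA was not fully configured. Both nvcc -V and nvidia-smi produced the expected output, but running setup.py failed.
Steps to reproduce:
Follow the project's install.md until the following command fails:
python setup.py develop  # or "pip install -v -e ."
Error message: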
$ python setup.py develop
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/dist.py:770: UserWarning: Usage of dash-separated 'index-url' will not be supported in future versions. Please use the underscore name 'index_url' instead
warnings.warn(
running develop
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running egg_info
writing nms_rotated.egg-info/PKG-INFO
writing dependency_links to nms_rotated.egg-info/dependency_links.txt
writing top-level names to nms_rotated.egg-info/top_level.txt
reading manifest file 'nms_rotated.egg-info/SOURCES.txt'
writing manifest file 'nms_rotated.egg-info/SOURCES.txt'
running build_ext
error: [Errno 2] No such file or directory: ':/usr/local/cuda/bin/nvcc'
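Fix:
First confirm that CUDA is installed correctly:
nvcc -V
If it is, run this directly on the command line:
export CUDA_HOME=/usr/local/cuda
Source: "/usr/local/cuda/bin/nvcc: No such file or directory" error, qq_39031960's blog (CSDN)
Addendum:
I later found that the error came back every time I entered the environment. The lasting fix is to open ~/.bashrc (vim ~/.bashrc) and add this line, or correct it if it is already there. (If you are not familiar with the vim editor, search for "linux vim".)
In my case the file previously contained
export CUDA_HOME=$CUDA_HOME:/usr/local/cuda
When CUDA_HOME is still unset, this expands to :/usr/local/cuda, which matches the ':/usr/local/cuda/bin/nvcc' path in the error above. Changing it to the form shown in the screenshot below fixed it.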
![](https://img-blog.csdnimg.cn/img_convert/7d65f28708fc2d3d46a3a8239ae774a2.png)
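The check described above can also be scripted. This is a minimal sketch, not part of the original fix: the function name check_cuda_home is hypothetical, and it only assumes that CUDA_HOME should be a single directory containing bin/nvcc.

```python
import os
from pathlib import Path


def check_cuda_home(value):
    """Return a list of problems found in a CUDA_HOME value (empty list = looks fine)."""
    if not value:
        return ["CUDA_HOME is not set"]
    problems = []
    if ":" in value:
        # CUDA_HOME must be one directory, not a PATH-style list such as ":/usr/local/cuda".
        problems.append(f"CUDA_HOME contains ':' ({value!r}); it should be a single directory")
    nvcc = Path(value) / "bin" / "nvcc"
    if not nvcc.is_file():
        problems.append(f"nvcc not found at {nvcc}")
    return problems


if __name__ == "__main__":
    # Check the value exported in the current shell.
    for problem in check_cuda_home(os.environ.get("CUDA_HOME", "")):
        print("WARNING:", problem)
```

Running it inside the conda environment before python setup.py develop prints a warning for the broken :/usr/local/cuda value and nothing for a clean /usr/local/cuda.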
——————————————————————
Pitfall 2: the g++ version is too new
Same reproduction steps as before; this appeared once the CUDA problem was solved.
Error message:
$ python setup.py develop
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/dist.py:770: UserWarning: Usage of dash-separated 'index-url' will not be supported in future versions. Please use the underscore name 'index_url' instead
warnings.warn(
running develop
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running egg_info
writing nms_rotated.egg-info/PKG-INFO
writing dependency_links to nms_rotated.egg-info/dependency_links.txt
writing top-level names to nms_rotated.egg-info/top_level.txt
reading manifest file 'nms_rotated.egg-info/SOURCES.txt'
writing manifest file 'nms_rotated.egg-info/SOURCES.txt'
running build_ext
Traceback (most recent call last):
File "/data/yolov5_obb/utils/nms_rotated/setup.py", line 38, in <module>
setup(
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/develop.py", line 114, in install_for_development
self.run_command('build_ext')
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 434, in build_extensions
self._check_cuda_version(compiler_name, compiler_version)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 836, in _check_cuda_version
raise RuntimeError(
RuntimeError: The current installed version of g++ (9.4.0) is greater than the maximum required version by CUDA 10.2 (8.0.0). Please make sure to use an adequate version of g++ (>=5.0.0, <=8.0.0).
Fix: install GCC 7 (gcc-7 and g++-7):
sudo apt-get install -y software-properties-common
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update
sudo apt install g++-7 -y
Set up the alternatives so that the gcc and g++ symbolic links point to version 7:
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 60 \
--slave /usr/bin/g++ g++ /usr/bin/g++-7
sudo update-alternatives --config gcc
gcc --version
g++ --version
# Run this one if you want all the toolchain programs (with the triplet names) to point to gcc-7 as well.
# For example, this is needed if building Debian packages.
# If you are already root (e.g. inside a docker image), remove the "sudo" below.
ls -la /usr/bin/ | grep -oP "[\S]*(gcc|g\+\+)(-[a-z]+)*[\s]" | xargs sudo bash -c 'for link in ${@:1}; do ln -s -f "/usr/bin/${link}-${0}" "/usr/bin/${link}"; done' 7
(I did not run the last line of the code above.)
Reference: Installing gcc-7 & g++-7 in Ubuntu 16.04LTS Xenial (github.com)
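To confirm that the switch took effect before rebuilding, the version range from the RuntimeError above (>=5.0.0, <=8.0.0 for CUDA 10.2) can be checked in a few lines. This is a minimal sketch: the helper names are hypothetical, and the bounds are copied from that error message rather than from any official compatibility table.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract the first x.y.z version number from text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def gxx_ok_for_cuda_10_2(version, minimum=(5, 0, 0), maximum=(8, 0, 0)):
    """True if version lies inside the range quoted in the RuntimeError."""
    return minimum <= version <= maximum


if __name__ == "__main__":
    if shutil.which("g++") is None:
        print("g++ not found on PATH")
    else:
        # Ask the g++ that is currently first on PATH for its version banner.
        banner = subprocess.run(["g++", "--version"], capture_output=True, text=True).stdout
        version = parse_version(banner)
        verdict = "ok" if gxx_ok_for_cuda_10_2(version) else "outside the range accepted for CUDA 10.2"
        print("g++", ".".join(map(str, version)), verdict)
```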
————————————
Pitfall 3: after fixing that, setup.py failed again because the ninja -v command failed
Error message:
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/include/c10/core/SymInt.h(84): warning: integer conversion resulted in a change of sign
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/include/ATen/Context.h(25): warning: attribute "__visibility__" does not apply here
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/include/c10/core/SymInt.h(84): warning: integer conversion resulted in a change of sign
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/include/ATen/Context.h(25): warning: attribute "__visibility__" does not apply here
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
subprocess.run(
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/yolov5_obb/utils/nms_rotated/setup.py", line 38, in <module>
setup(
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/develop.py", line 114, in install_for_development
self.run_command('build_ext')
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 765, in build_extensions
build_ext.build_extensions(self)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 468, in build_extensions
self._build_extensions_serial()
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 494, in _build_extensions_serial
self.build_extension(ext)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 549, in build_extension
objects = self.compiler.compile(
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 586, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1487, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
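Running ninja -v by hand also reports an error (this particular message only says that the current directory contains no build.ninja):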
# ninja -v
ninja: error: loading 'build.ninja': No such file or directory
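From what I found, the command may be invoked incorrectly, so I edited the file that raised the error (torch/utils/cpp_extension.py in the traceback) and changed the command that checks the ninja version, as shown below. PS: in vim you can search for 'ninja'; it only appears in this one place.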
![](https://img-blog.csdnimg.cn/img_convert/10ab849e239f6d9e9ad65a81d628e6bb.png)
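When ninja reports "build stopped: subcommand failed", the compiler's own error message normally appears earlier in the output, buried among the warnings. The following is a minimal sketch, independent of the workaround above, for pulling those lines out of a saved build log. The function name, the keywords and the build.log file name are all assumptions for illustration.

```python
from pathlib import Path


def find_compile_errors(log_text, keywords=("error:", "FAILED:")):
    """Return the lines of a build log that contain any of the keywords."""
    return [
        line
        for line in log_text.splitlines()
        if any(keyword in line for keyword in keywords)
    ]


if __name__ == "__main__":
    # Assumes the build output was saved first, for example with:
    #   python setup.py develop 2>&1 | tee build.log
    log_file = Path("build.log")
    if log_file.is_file():
        for line in find_compile_errors(log_file.read_text(errors="replace")):
            print(line)
```

The match is case-sensitive on purpose, so the Python traceback lines ending in "RuntimeError:" or "CalledProcessError:" are skipped and only compiler and ninja messages remain.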
————————————
Pitfall 4: running setup.py again, g++ now failed
# python setup.py develop
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/dist.py:770: UserWarning: Usage of dash-separated 'index-url' will not be supported in future versions. Please use the underscore name 'index_url' instead
warnings.warn(
running develop
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running egg_info
writing nms_rotated.egg-info/PKG-INFO
writing dependency_links to nms_rotated.egg-info/dependency_links.txt
writing top-level names to nms_rotated.egg-info/top_level.txt
reading manifest file 'nms_rotated.egg-info/SOURCES.txt'
writing manifest file 'nms_rotated.egg-info/SOURCES.txt'
running build_ext
building '.nms_rotated_ext' extension
Emitting ninja build file /data/yolov5_obb/utils/nms_rotated/build/temp.linux-x86_64-cpython-39/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.11.1.git.kitware.jobserver-1
creating build/lib.linux-x86_64-cpython-39
g++ -pthread -B /data/anaconda3/envs/yolov5_obb/compiler_compat -shared -Wl,-rpath,/data/anaconda3/envs/yolov5_obb/lib -Wl,-rpath-link,/data/anaconda3/envs/yolov5_obb/lib -L/data/anaconda3/envs/yolov5_obb/lib -L/data/anaconda3/envs/yolov5_obb/lib -Wl,-rpath,/data/anaconda3/envs/yolov5_obb/lib -Wl,-rpath-link,/data/anaconda3/envs/yolov5_obb/lib -L/data/anaconda3/envs/yolov5_obb/lib /data/yolov5_obb/utils/nms_rotated/build/temp.linux-x86_64-cpython-39/src/nms_rotated_cpu.o /data/yolov5_obb/utils/nms_rotated/build/temp.linux-x86_64-cpython-39/src/nms_rotated_cuda.o /data/yolov5_obb/utils/nms_rotated/build/temp.linux-x86_64-cpython-39/src/nms_rotated_ext.o /data/yolov5_obb/utils/nms_rotated/build/temp.linux-x86_64-cpython-39/src/poly_nms_cuda.o -L/data/anaconda3/envs/yolov5_obb/lib/python3.9/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-39/nms_rotated_ext.cpython-39-x86_64-linux-gnu.so
g++: error: /data/yolov5_obb/utils/nms_rotated/build/temp.linux-x86_64-cpython-39/src/poly_nms_cuda.o: No such file or directory
error: command '/usr/bin/g++' failed with exit code 1
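One suggested fix is to lower the torch version (for example, the issue says cuda11.4 needs to be paired with torch1.12). See the issue: [Torch1.11 error] src/poly_nms_cuda.cu:4:10: fatal error: THC/THC.h: No such file or directory · Issue #408 · hukaixuan19970627/yolov5_obb (github.com)
Since the project itself is built on yolov5, the likely cause is that I had installed torch 1.12 when setting up the environment, while the project only requires torch>=1.7.
I reinstalled torch 1.10; conda automatically downgrades the related packages to matching versions.
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=10.2 -c pytorch
Output: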
## Package Plan ##
environment location: /data/anaconda3/envs/yolov5_obb
added / updated specs:
- cudatoolkit=10.2
- pytorch==1.10.1
- torchaudio==0.10.1
- torchvision==0.11.2
The following packages will be downloaded:
package | build
---------------------------|-----------------
pytorch-1.10.1 |py3.9_cuda10.2_cudnn7.6.5_0 768.4 MB pytorch
torchaudio-0.10.1 | py39_cu102 4.5 MB pytorch
torchvision-0.11.2 | py39_cu102 8.7 MB pytorch
------------------------------------------------------------
Total: 781.5 MB
The following NEW packages will be INSTALLED:
libuv pkgs/main/linux-64::libuv-1.44.2-h5eee18b_0
The following packages will be DOWNGRADED:
pytorch 1.12.1-py3.9_cuda10.2_cudnn7.6.5_0 --> 1.10.1-py3.9_cuda10.2_cudnn7.6.5_0
torchaudio 0.12.1-py39_cu102 --> 0.10.1-py39_cu102
torchvision 0.13.1-py39_cu102 --> 0.11.2-py39_cu102
References:
PyTorch's official version selection guide: Previous PyTorch Versions | PyTorch
This project's requirements file: yolov5_obb/requirements.txt at master · hukaixuan19970627/yolov5_obb (github.com)
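A quick way to see whether the installed torch is old enough for this extension is to compare its version with the one named in the issue title (Torch 1.11). This is a minimal sketch with hypothetical helper names; the (1, 11) threshold is an assumption taken from that title, not from the PyTorch documentation.

```python
def torch_major_minor(version):
    """Parse a string such as '1.12.1' or '1.10.1+cu102' into (major, minor)."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def older_than_issue_version(version, first_failing=(1, 11)):
    """True if version is below the torch version named in the issue title."""
    return torch_major_minor(version) < first_failing


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        installed = str(torch.__version__)
        if older_than_issue_version(installed):
            print(f"torch {installed}: older than the version named in the issue")
        else:
            print(f"torch {installed}: consider downgrading, e.g. to 1.10.1 as above")
```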
Finally!
Running
python setup.py develop
no longer reports an error! (The screenshot only captures the last part of the output.)
![](https://img-blog.csdnimg.cn/img_convert/72b9f837101c9e8d0f3e7a08fae116d9.png)
————————————————————————
Pitfall when running the training code: ImportError: /data/yolov5_obb/utils/nms_rotated/nms_rotated_ext.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZN3c1015SmallVectorBaseIjE8grow_podEPvmm
Run the training code from the yolov5_obb directory:
python train.py \
--weights 'weights/yolov5n_s_m_l_x.pt' \
--data 'data/yolov5obb_demo_split.yaml' \
--hyp 'data/hyps/obb/hyp.finetune_dota.yaml' \
--epochs 10 \
--batch-size 2 \
--img 1024 \
--device 0
Error message:
Traceback (most recent call last):
File "/data/yolov5_obb/train.py", line 34, in <module>
import val # for end-of-epoch mAP
File "/data/yolov5_obb/val.py", line 28, in <module>
from models.common import DetectMultiBackend
File "/data/yolov5_obb/models/common.py", line 23, in <module>
from utils.datasets import exif_transpose, letterbox
File "/data/yolov5_obb/utils/datasets.py", line 28, in <module>
from utils.augmentations import Albumentations, augment_hsv, copy_paste, letterbox, mixup, random_perspective
File "/data/yolov5_obb/utils/augmentations.py", line 12, in <module>
from utils.general import LOGGER, check_version, colorstr, resample_segments, segment2box
File "/data/yolov5_obb/utils/general.py", line 35, in <module>
from utils.nms_rotated import obb_nms
File "/data/yolov5_obb/utils/nms_rotated/__init__.py", line 1, in <module>
from .nms_rotated_wrapper import obb_nms, poly_nms
File "/data/yolov5_obb/utils/nms_rotated/nms_rotated_wrapper.py", line 4, in <module>
from . import nms_rotated_ext
ImportError: /data/yolov5_obb/utils/nms_rotated/nms_rotated_ext.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZN3c1015SmallVectorBaseIjE8grow_podEPvmm
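I looked through the project's issues; the original author replied that this means nms was not compiled properly.
Deleting only nms_rotated_ext.cpython-39-x86_64-linux-gnu.so and running the installation again still did not work.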
![](https://img-blog.csdnimg.cn/img_convert/c39856659e652ffff9ba62a61f554648.png)
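Delete both nms_rotated_ext.cpython-39-x86_64-linux-gnu.so and the build directory under /utils/nms_rotated,
run python setup.py develop again,
and then run train again. That solved it!
Cause: if build is not deleted, the step where ninja does the actual build work is skipped, so the object files compiled earlier are presumably reused. As far as my limited understanding goes, this step is where the extension is compiled for the Linux system.
Without deleting build: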
![](https://img-blog.csdnimg.cn/img_convert/0a7393b0504c49f7ac8a928feef94003.png)
After deleting build: (I am not sure whether leaving the output unmasked is risky, so I masked it to be safe, haha)
![](https://img-blog.csdnimg.cn/img_convert/4e5a3e06b592ff8748c20486c163a42a.png)
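The cleanup step can be written as a small helper so that both the compiled .so and the build directory are always removed together. This is a minimal sketch: the function name clean_nms_build is hypothetical, and it assumes the layout described above (a build directory and nms_rotated_ext*.so files directly under utils/nms_rotated).

```python
import shutil
from pathlib import Path


def clean_nms_build(ext_dir):
    """Delete build/ and nms_rotated_ext*.so under ext_dir; return the names removed."""
    ext_dir = Path(ext_dir)
    removed = []
    build_dir = ext_dir / "build"
    if build_dir.is_dir():
        # Remove cached object files so that ninja recompiles everything.
        shutil.rmtree(build_dir)
        removed.append(build_dir.name)
    for so_file in sorted(ext_dir.glob("nms_rotated_ext*.so")):
        so_file.unlink()
        removed.append(so_file.name)
    return removed


# Example, run from the yolov5_obb directory (this really deletes files):
#   print(clean_nms_build("utils/nms_rotated"))
# and afterwards rerun "python setup.py develop" inside utils/nms_rotated.
```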
————————————————————————
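Pitfall when running the test case (continuing from the previous problem)
After running the training code again, it complained that the pretrained model file was missing.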
![](https://img-blog.csdnimg.cn/img_convert/889489291e2d1aa60ad35b00e2cbac62.png)
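Find the corresponding .pt model file on the official site and download it; that is all it takes.
After that, everything ran normally!!!
Good luck, everyone~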