TransFusion: Introduction, Environment Setup and Installation, and Fixes for Common Errors

AI Player

Last modified: 2024-03-27 10:12:42


First published: 2023-07-17 09:53:17

Article link: https://blog.csdn.net/weixin_43603658/article/details/131756871


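TransFusion is a method for LiDAR-camera fusion that uses a "soft association" to stay robust when image quality degrades and to limit information loss. This article walks through the environment setup, including the required software versions, and lists eight typical errors encountered from installation through model training together with their fixes, such as installing a pinned version of Cython, patching the source code, and adjusting GPU settings.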


Introduction to TransFusion

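TransFusion targets the following two problems:

Fusing LiDAR and camera data by concatenation or addition degrades perception performance when image quality deteriorates;
Searching for a "hard association" between sparse point clouds and dense images loses part of the image's semantic information, and the inherent spatio-temporal differences between sensors make high-quality calibration difficult.

TransFusion proposes a "soft association" method for fusing LiDAR and camera data. Object queries are initialized from point cloud features and image features (which speeds up training and convergence). The object queries first interact with the LiDAR features, then interact with the 2D image features to update the queries, and finally an FFN produces the detection results. The network architecture of TransFusion is shown in the figure below:
[Figure: TransFusion network architecture]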
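To make the "soft association" idea concrete, here is a minimal pure-Python sketch of one object query attending first to LiDAR features and then to image features. It is only an illustration under simplifying assumptions (single head, no learned projections, no positional encodings, no normalization layers), not the TransFusion implementation; the names `cross_attention` and `two_stage_decode` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Every key receives a non-zero weight, so the query is softly
    associated with all features instead of being tied to one location.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


def two_stage_decode(query, lidar_feats, image_feats):
    """Update the query with LiDAR features first, then with image features."""
    update, _ = cross_attention(query, lidar_feats, lidar_feats)
    query = [q + u for q, u in zip(query, update)]  # residual update (simplified)
    update, image_weights = cross_attention(query, image_feats, image_feats)
    query = [q + u for q, u in zip(query, update)]
    return query, image_weights


# Toy features, made up for illustration only.
final_query, image_weights = two_stage_decode(
    [1.0, 0.0],
    lidar_feats=[[1.0, 0.0], [0.0, 1.0]],
    image_feats=[[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]],
)
```

Because the image weights come from a softmax, a poorly matching or low-quality image region simply receives a small weight rather than being forced onto the query through a fixed point-to-pixel mapping.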

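TransFusion Environment Setup and Installation

The base environment is as follows:

Linux (Ubuntu 18.04)
NVIDIA GeForce RTX 2080Ti
NVIDIA driver version: 11.4
CUDA version: 10.2
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

Setting up and installing the TransFusion environment: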

# Create the conda environment
conda create -n transfusion python=3.7 -y
conda activate transfusion
# Install PyTorch
pip install torch==1.10.0+cu102 torchvision==0.11.0+cu102 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
# Install mmcv
pip install mmcv-full==1.3.11 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html
# Install mmdetection
pip install mmdet==2.11.0
# Clone the TransFusion GitHub repository
git clone https://github.com/XuyangBai/TransFusion.git
cd TransFusion
# Build and install mmdetection3d
pip install -v -e .
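Since most of the errors below come from version drift, it can help to check the installed packages against the pins used above. The following is a minimal sketch, not part of TransFusion: `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names, and on Python 3.7 it assumes the `importlib_metadata` backport is installed.

```python
# Versions pinned in the installation commands above.
PINNED = {
    "torch": "1.10.0+cu102",
    "torchvision": "0.11.0+cu102",
    "torchaudio": "0.10.0",
    "mmcv-full": "1.3.11",
    "mmdet": "2.11.0",
}


def installed_version(name):
    """Return the installed version string of a distribution, or None."""
    try:
        from importlib import metadata  # Python 3.8+
    except ImportError:
        import importlib_metadata as metadata  # backport, assumed present on 3.7
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def matches(pin, actual):
    """Exact match; a pin without a local tag ignores one like '+cu102'."""
    if actual is None:
        return False
    if "+" in pin:
        return actual == pin
    return actual.split("+")[0] == pin


def find_mismatches(pins, lookup=installed_version):
    """Map each mismatching package name to (pinned, installed)."""
    return {
        name: (pin, lookup(name))
        for name, pin in pins.items()
        if not matches(pin, lookup(name))
    }


if __name__ == "__main__":
    for name, (pin, actual) in find_mismatches(PINNED).items():
        print(f"{name}: expected {pin}, found {actual}")
```

Running the script inside the activated `transfusion` environment prints one line per package that differs from the pin and prints nothing when everything matches.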

Errors

Error 1

error: subprocess-exited-with-error
× Running setup.py install for mmpycocotools did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
running install
/opt/conda/envs/pycoco/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
running build
running build_py
creating build/lib.linux-x86_64-cpython-37
creating build/lib.linux-x86_64-cpython-37/pycocotools
copying pycocotools/__init__.py -> build/lib.linux-x86_64-cpython-37/pycocotools
copying pycocotools/cocoeval.py -> build/lib.linux-x86_64-cpython-37/pycocotools
copying pycocotools/mask.py -> build/lib.linux-x86_64-cpython-37/pycocotools
copying pycocotools/coco.py -> build/lib.linux-x86_64-cpython-37/pycocotools
running build_ext
building 'pycocotools._mask' extension
creating build/temp.linux-x86_64-cpython-37
creating build/temp.linux-x86_64-cpython-37/common
creating build/temp.linux-x86_64-cpython-37/pycocotools
gcc -pthread -B /opt/conda/envs/pycoco/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/conda/envs/pycoco/lib/python3.7/site-packages/numpy/core/include -Icommon -I/opt/conda/envs/pycoco/include/python3.7m -c …/common/maskApi.c -o build/temp.linux-x86_64-cpython-37/…/common/maskApi.o
gcc: error: …/common/maskApi.c: No such file or directory
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> mmpycocotools

Solution:
Install a pinned version of Cython:

pip install Cython==0.29.36

Error 2

Running pip install -v -e . produced the following error:

/XXX/XXX/TransFusion/mmdet3d/ops/voxel/src/scatter_points_cuda.cu(272): error: no instance of overloaded function "at::Tensor::index_put_" matches the argument list
argument types are: (at::Tensor, at::Tensor)
object type is: at::Tensor
1 error detected in the compilation of "/XXX/XXX/TransFusion/mmdet3d/ops/voxel/src/scatter_points_cuda.cu".
ninja: build stopped: subcommand failed.

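Solution:
Open scatter_points_cuda.cu:

vim mmdet3d/ops/voxel/src/scatter_points_cuda.cu

Change line 272 to the following, so that the index tensor is wrapped in braces and passed as a list of indices rather than as a bare tensor:

coors_map.index_put_({coors_id_argsort}, coors_map_sorted);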
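For repeated setups, the same edit could be scripted instead of made by hand in vim. This is a rough sketch: `OLD_CALL` is an assumption about how line 272 looks before the change (inferred from the `(at::Tensor, at::Tensor)` argument types in the error message), and `patch_index_put` and `patch_file` are hypothetical helper names.

```python
# Assumed original form of line 272 (inferred from the compiler error).
OLD_CALL = "coors_map.index_put_(coors_id_argsort, coors_map_sorted);"
# Fixed form given in the article.
NEW_CALL = "coors_map.index_put_({coors_id_argsort}, coors_map_sorted);"


def patch_index_put(source: str) -> str:
    """Return the source text with the index tensor wrapped in braces."""
    return source.replace(OLD_CALL, NEW_CALL)


def patch_file(path: str) -> bool:
    """Patch the file in place; return True only if something changed."""
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    patched = patch_index_put(original)
    if patched == original:
        return False
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(patched)
    return True


# Example call, run from the repository root:
# patch_file("mmdet3d/ops/voxel/src/scatter_points_cuda.cu")
```

The replacement is idempotent, so running it a second time leaves an already patched file untouched. If `patch_file` returns False on an unpatched checkout, the assumed `OLD_CALL` does not match and the line should be edited manually.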

Error 3

While preparing the nuScenes data, I ran into the following error:

AttributeError: module 'pycocotools' has no attribute '__version__'

First uninstall pycocotools:

pip uninstall pycocotools

Then install mmpycocotools:

pip install mmpycocotools

This was followed by:

ModuleNotFoundError: No module named 'pycocotools'

Reinstall mmpycocotools:

pip uninstall mmpycocotools
pip install mmpycocotools
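The two messages above correspond to two distinct states: the `pycocotools` module is importable but defines no `__version__`, or it cannot be imported at all. A small diagnostic along these lines can tell which state the environment is in before and after reinstalling; `describe_pycocotools` is a hypothetical helper, not an mmdetection API.

```python
import importlib


def describe_pycocotools(module=None) -> str:
    """Report whether pycocotools is importable and defines __version__.

    A module object can be passed in directly, which is mainly useful
    for testing; by default the real package is imported.
    """
    if module is None:
        try:
            module = importlib.import_module("pycocotools")
        except ImportError:  # also covers ModuleNotFoundError
            return "missing"
    if hasattr(module, "__version__"):
        return "ok: __version__ = {}".format(module.__version__)
    return "importable, but no __version__"


if __name__ == "__main__":
    print(describe_pycocotools())
```

A result of "missing" matches the ModuleNotFoundError case, and "importable, but no __version__" matches the AttributeError case.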

Error 4

AttributeError: module 'distutils' has no attribute 'version'

Solution:
Install a pinned version of setuptools:

pip install setuptools==59.5.0

Error 5

This error occurred during model training:

RuntimeError: /XXX/XXX/TransFusion/mmdet3d/ops/spconv/src/indice_cuda.cu 118
cuda execution failed with error 700
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at …/c10/cuda/CUDACachingAllocator.cpp:1211 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f2292e25d62 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x1c4d3 (0x7f22930884d3 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a2 (0x7f2293088ee2 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0xa4 (0x7f2292e0f314 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: + 0x299ee9 (0x7f2181b7aee9 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0xae8069 (0x7f21823c9069 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x2b9 (0x7f21823c9389 in /XXX/XXX/miniconda3/envs/transfusion/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: python() [0x497017]
frame #8: python() [0x4a0a87]
frame #9: python() [0x4b5cfb]
frame #10: python() [0x4b5cfb]
frame #11: python() [0x4b0858]
frame #12: python() [0x4c5b50]
frame #13: python() [0x4c5b66]
frame #14: python() [0x4c5b66]
frame #15: python() [0x4c5b66]
frame #16: python() [0x4c5b66]
frame #17: python() [0x4c5b66]
frame #18: python() [0x4c5b66]
frame #19: python() [0x4946f7]

frame #23: python() [0x53fc79]
frame #25: __libc_start_main + 0xe7 (0x7f22a7bc9c87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #26: python() [0x53f9ee]

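Solution:
For single-GPU training, train with --gpu-id 0; using GPU 1, 2, or 3 triggers this error every time.
For multi-GPU training, the GPU ids must start from 0.

Appendix:
Some users on GitHub resolved this error by changing every occurrence of 4096 to 256 in mmdet3d/ops/spconv/src/indice_cuda.cu. I have not tried this myself, but it is worth a try if the approach above does not solve the problem.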
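If GPU 0 is busy and another physical card has to be used, one possible workaround (an assumption on my part, not something verified against this error) is to hide the other cards with the CUDA_VISIBLE_DEVICES environment variable, so that the chosen card is enumerated as device 0 inside the process and the --gpu-id 0 rule above still holds. In the sketch below, `build_train_invocation` and the config path are hypothetical, and `tools/train.py` is assumed to be the training entry point.

```python
import os


def build_train_invocation(physical_gpu: int, config_path: str, base_env=None):
    """Build the environment and command for training on one physical GPU.

    With CUDA_VISIBLE_DEVICES set to a single index, that card is the
    only one the process can see, and CUDA numbers it as device 0.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(physical_gpu)
    cmd = ["python", "tools/train.py", config_path, "--gpu-id", "0"]
    return env, cmd


# Hypothetical usage from the repository root:
# import subprocess
# env, cmd = build_train_invocation(2, "configs/my_transfusion_config.py")
# subprocess.run(cmd, env=env, check=True)
```

The helper only prepares the invocation, so the command and environment can be inspected before anything is launched.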

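Error 6

This error occurred during model training:

RuntimeError: CUDA error: out of memory

Solution:
(1) Reduce the batch size;
(2) For single-GPU training, check whether the memory of GPU 0 is already full, because training on another card (not GPU 0) still takes up some of GPU 0's memory;
(3) Switch to a GPU with more memory (at least 16 GB is recommended).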
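Point (2) can be checked from a script by querying nvidia-smi in CSV mode. The sketch below parses that output and lists the cards whose memory is nearly exhausted; the 0.9 threshold and the helper names are arbitrary choices for illustration, and `query_gpu_memory` assumes nvidia-smi is on the PATH.

```python
import subprocess

# Prints one line per GPU such as "0, 10500, 11019" (values in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV output of QUERY into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append({"index": index, "used_mib": used, "total_mib": total})
    return rows


def nearly_full(rows, threshold=0.9):
    """Return the indices of GPUs whose used/total ratio reaches threshold."""
    return [r["index"] for r in rows if r["used_mib"] / r["total_mib"] >= threshold]


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Example: print(nearly_full(query_gpu_memory()))
```

If index 0 shows up in the result before training starts, the out-of-memory error is likely to occur even when the job is nominally placed on another card.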

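Error 7

This error occurred during model training:

RuntimeError: shape '[-1, 4, 16]' is invalid for input of size 2160

Solution:
Check the point cloud dimensions (N, C) and confirm that C matches the model parameters. Printing the shape of the relevant variables with print() helps narrow down the problem.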
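The message itself already narrows things down: reshaping to [-1, 4, 16] only works when the element count is a multiple of 4 × 16 = 64, and 2160 is not (2160 = 33 × 64 + 48). The following sketch reproduces that check in plain Python, without PyTorch, so candidate shapes can be tested quickly; `reshape_is_valid` is a hypothetical helper.

```python
def reshape_is_valid(numel, shape):
    """Check whether numel elements can be viewed with the given shape.

    At most one dimension may be -1 (inferred). The product of the
    remaining dimensions must divide numel, or equal it when no
    dimension is inferred.
    """
    known = 1
    inferred = 0
    for dim in shape:
        if dim == -1:
            inferred += 1
        else:
            known *= dim
    if inferred > 1:
        return False
    if inferred == 1:
        return known != 0 and numel % known == 0
    return numel == known


print(reshape_is_valid(2160, (-1, 4, 16)))  # the failing case from the log
print(reshape_is_valid(2176, (-1, 4, 16)))  # 2176 = 34 * 64
```

When the check fails, the tensor feeding the reshape has an unexpected number of elements, which is consistent with a feature dimension C that differs from what the model configuration expects.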

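Error 8

File "/XXX/XXX/mmdet3d/ops/spconv/ops.py", line 92, in get_indice_pairs
return get_indice_pairs_func(
RuntimeError: mmdet3d/ops/spconv/src/indice_cuda.cu 124
cuda execution failed with error 2

After repeated debugging I could not find the cause. I later found an explanation online: this error is caused by insufficient GPU memory. Switching to a GPU with more memory resolved it.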
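The numeric codes in these spconv messages are CUDA runtime error codes, which makes them easier to interpret: code 2 is cudaErrorMemoryAllocation, matching the out-of-memory explanation here, and code 700 from Error 5 is cudaErrorIllegalAddress. The sketch below extracts the code from a log and names it; it only covers the two codes seen in this article, and `extract_cuda_error` is a hypothetical helper.

```python
import re

# Only the two codes that appear in this article; other codes are not covered.
CUDA_ERROR_NAMES = {
    2: "cudaErrorMemoryAllocation",  # out of memory
    700: "cudaErrorIllegalAddress",  # illegal memory access
}

PATTERN = re.compile(r"cuda execution failed with error (\d+)")


def extract_cuda_error(log_text):
    """Return (code, name) for the first spconv CUDA failure, or None."""
    match = PATTERN.search(log_text)
    if match is None:
        return None
    code = int(match.group(1))
    return code, CUDA_ERROR_NAMES.get(code, "unknown")


log = (
    "RuntimeError: mmdet3d/ops/spconv/src/indice_cuda.cu 124\n"
    "cuda execution failed with error 2\n"
)
print(extract_cuda_error(log))
```

Looking up the code first avoids a long debugging session when the underlying problem is simply that the card has run out of memory.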