Environment Perception Algorithms, Part 5: Training BEVFormer on nuScenes-mini

Augenstern-YaoYao

Last modified: 2024-11-17 10:40:21

First published: 2024-11-16 23:29:46

Article link: https://blog.csdn.net/wenquantongxin/article/details/143824403

1. Introduction

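BEVFormer, published at ECCV 2022, was among the first works to bring the Transformer architecture to multi-view BEV (bird's-eye-view) representation learning.

BEVFormer introduces learnable BEV queries as the queries of its attention mechanism and aggregates scene context along both the spatial and the temporal dimension. Spatially, it uses Deformable Attention so that each BEV query interacts with dynamic regions of interest in the camera views. Temporally, it borrows the idea of recurrent neural networks: temporal Self-Attention extracts temporal dependencies from the historical BEV features, which avoids the redundancy and compute cost of simply stacking multiple frames of BEV features.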
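To make the BEV query idea more concrete: each query is tied to one cell of a 2D grid laid over the ground plane around the ego vehicle, and a cell index is mapped to a metric position before camera features are sampled. Below is a minimal sketch of that index-to-position mapping. The 200 × 200 grid and the 0.512 m cell size are assumptions chosen for illustration, and the function name is hypothetical, not part of the BEVFormer code base.

```python
def bev_query_to_ego_xy(col, row, bev_w=200, bev_h=200, cell_size_m=0.512):
    """Map a BEV grid cell (col, row) to metric (x, y) offsets from the ego vehicle.

    Uses x' = (x - W/2) * s and y' = (y - H/2) * s, where s is the size of one
    grid cell in metres. The default grid size and cell size are assumptions.
    """
    x_m = (col - bev_w / 2) * cell_size_m
    y_m = (row - bev_h / 2) * cell_size_m
    return x_m, y_m


# The centre cell of the grid sits at the ego vehicle.
centre = bev_query_to_ego_xy(100, 100)
# The corner cell (0, 0) lies half the grid extent away along each axis.
corner = bev_query_to_ego_xy(0, 0)
print(centre, corner)
```

With these assumed values the grid covers roughly 51.2 m in every direction around the vehicle.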

As a general-purpose representation of the environment, BEV features also adapt readily to several downstream autonomous-driving perception tasks. On the 3D object detection and map segmentation tasks, BEVFormer achieved state-of-the-art results on the nuScenes benchmark.

Paper: BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers, https://arxiv.org/pdf/2203.17270
Project: BEVFormer: a Cutting-edge Baseline for Camera-based Detection, https://github.com/fundamentalvision/BEVFormer/blob/master/docs/install.md

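2. Environment Setup

Official BEVFormer installation guide: Step-by-step installation instructions, http://github.com/fundamentalvision/BEVFormer/blob/master/docs/install.md

Following that guide can run into some awkward dependency problems, especially when compiling and installing mmcv.

This article gives detailed steps for creating a conda virtual environment on Ubuntu 20.04 and installing BEVFormer in it, without Docker and without repeatedly adjusting the versions of individual dependencies.

1) Go to the working directory and create the BEVFormer virtual environment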

cd /home/yaoyao/Documents/BEV/
conda create -n bevformer python=3.8.19 -y
conda activate bevformer

2) Install the system-level dependencies

conda install -c defaults _libgcc_mutex=0.1 _openmp_mutex=5.1 ca-certificates=2024.3.11 ld_impl_linux-64=2.38 libffi=3.4.4 libgcc-ng=11.2.0 libgomp=11.2.0 libstdcxx-ng=11.2.0 ncurses=6.4 openssl=3.0.13 readline=8.2 sqlite=3.45.3 tk=8.6.14 wheel=0.43.0 xz=5.4.6 zlib=1.2.13

conda install -c omgarcia gcc-6

3) Install the pinned versions of PyTorch and its companion libraries

pip install torch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1

4) Install mmcv and the related OpenMMLab packages at pinned versions, in development (editable) mode

# pip install mmcv-full==1.4.0
# See: https://github.com/open-mmlab/mmcv/issues/204
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout v1.4.0
MMCV_WITH_OPS=1 pip install -e .

# Download and install mmdet in a suitable location (here: back in the parent directory)
cd ..
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
git checkout v2.14.0
pip install -r requirements/build.txt
pip install -v -e . 

# Download and install mmdet3d in a suitable location (here: back in the parent directory)
cd ..
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 
pip install -v -e .

# Download and install mmsegmentation in a suitable location (here: back in the parent directory)
cd ..
git clone https://github.com/open-mmlab/mmsegmentation.git
cd mmsegmentation
git checkout v0.14.1
pip install -v -e .
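Once the four OpenMMLab packages are installed, it can help to confirm that the versions in the environment are the pinned ones before going further. The following is a minimal sketch using only the standard library. The distribution names in `PINNED` are assumptions (a source build with MMCV_WITH_OPS=1 is expected to register as "mmcv-full"), and the helper names are hypothetical.

```python
from importlib import metadata

# Versions pinned in the steps above. The distribution names are assumptions.
PINNED = {
    "torch": "1.9.1",
    "torchvision": "0.10.1",
    "mmcv-full": "1.4.0",
    "mmdet": "2.14.0",
    "mmdet3d": "0.17.1",
    "mmsegmentation": "0.14.1",
}


def installed_version(name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned):
    """Map each package name to (expected, found, matches)."""
    report = {}
    for name, expected in pinned.items():
        found = installed_version(name)
        # Local builds may carry a suffix such as "+cu111"; compare the part before "+".
        matches = found is not None and found.split("+")[0] == expected
        report[name] = (expected, found, matches)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in check_pins(PINNED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:15s} expected {expected:8s} found {found!s:14s} {status}")
```

Run it inside the activated bevformer environment; any MISMATCH line points at a package to reinstall.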

5) Install the other required Python libraries

pip install numpy==1.19.5 matplotlib==3.5.2 scikit-image==0.19.3 scikit-learn==1.3.2 scipy==1.9.0 pillow==10.3.0 opencv-python==4.9.0.80 tensorboard pycocotools==2.0.7 black==24.4.2 flake8==7.0.0 pytest==8.2.0

pip install einops==0.8.0 fvcore seaborn==0.12.2 iopath==0.1.9 timm==0.6.13  typing-extensions==4.5.0 pylint ipython==8.12  numba==0.48.0 pandas==1.4.4  setuptools==59.5.0

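What these libraries are for:

einops: flexible tensor operations
fvcore: Facebook's open-source library of common deep learning utilities
seaborn: data visualization
iopath: file I/O interface that simplifies file handling
timm: PyTorch image model library with many pretrained models
typing-extensions: backports of type hints across Python versions
pylint: static code checker
ipython: interactive Python interpreter
numba: JIT compiler that accelerates numerical code
pandas: data analysis
setuptools: Python packaging tool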

Install Detectron2:

python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Install the code formatter:

pip install yapf==0.33.0

6) Download the BEVFormer source code

# Download to a suitable location (here: back in the parent directory)
cd ..
git clone https://github.com/fundamentalvision/BEVFormer.git

7) Prepare the pretrained model

cd BEVFormer

mkdir ckpts

# This step is optional; you can download a pretrained model yourself
# The official installation guide downloads the r101_dcn_fcos3d_pretrain.pth pretrained model
cd ckpts && wget https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth

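You can also consult the Model Zoo table of the BEVFormer project and pick a pretrained model that fits your GPU memory.

The download link for each pretrained model is listed there. Place the downloaded .pth file in the ./BEVFormer/ckpts directory.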
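If wget is not available, the same download can be scripted. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the URL is the one from the wget command above.

```python
import os
import urllib.request


def checkpoint_path(url, ckpt_dir="ckpts"):
    """Local path a checkpoint URL would be saved to (file name taken from the URL)."""
    return os.path.join(ckpt_dir, os.path.basename(url))


def download_if_missing(url, ckpt_dir="ckpts"):
    """Download the checkpoint unless a file with that name already exists."""
    target = checkpoint_path(url, ckpt_dir)
    if not os.path.exists(target):
        os.makedirs(ckpt_dir, exist_ok=True)
        urllib.request.urlretrieve(url, target)
    return target


URL = "https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth"
print(checkpoint_path(URL))
```

Calling `download_if_missing(URL)` from the BEVFormer directory would place the file under ckpts and skip the download on later runs.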

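3. Downloading and Organizing the Dataset

Download the dataset from the official nuScenes website, https://www.nuscenes.org/nuscenes. This article uses the smaller v1.0-mini version (about 5 GB).

1) Download the mini dataset

The mini setup consists of two folders, nuscenes and can_bus. Download, copy, or symlink them into the ./BEVFormer/data directory.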
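Before generating the annotation files, a quick check that the folders landed in the right place can save a failed run. The sketch below assumes the sub-directory names shown in the directory tree later in this article; the helper name is hypothetical. `os.path.isdir` follows symlinks, so symlinked folders pass the check too.

```python
import os

# Sub-directories expected under ./data before the annotation files are generated.
EXPECTED_DIRS = [
    "can_bus",
    os.path.join("nuscenes", "maps"),
    os.path.join("nuscenes", "samples"),
    os.path.join("nuscenes", "sweeps"),
    os.path.join("nuscenes", "v1.0-mini"),
]


def missing_dataset_dirs(data_root="./data"):
    """Return the expected sub-directories that do not exist under data_root."""
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(data_root, d))]


if __name__ == "__main__":
    missing = missing_dataset_dirs("./data")
    print("layout OK" if not missing else f"missing: {missing}")
```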

2) Modify tools/data_converter/indoor_converter.py

If you hit the error ModuleNotFoundError: No module named 'tools.data_converter', see the mmdetection3d issue "[Bug] ModuleNotFoundError: No module named 'tools.data_converter' #2352" (https://github.com/open-mmlab/mmdetection3d/issues/2352).
Edit the first few lines of ./BEVFormer/tools/data_converter/indoor_converter.py, changing the following code:

from tools.data_converter.s3dis_data_utils import S3DISData, S3DISSegData
from tools.data_converter.scannet_data_utils import ScanNetData, ScanNetSegData
from tools.data_converter.sunrgbd_data_utils import SUNRGBDData

to:

import sys
sys.path.append('/home/yaoyao/Documents/Bev/BEVFormer/tools/') # actual path of the tools folder
from data_converter.s3dis_data_utils import S3DISData, S3DISSegData
from data_converter.scannet_data_utils import ScanNetData, ScanNetSegData
from data_converter.sunrgbd_data_utils import SUNRGBDData

Replace the path passed to sys.path.append with the actual path of your tools folder.
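As an alternative to a hard-coded absolute path, the tools directory can be derived from the location of the converter file itself, so the edit keeps working if the project is moved. This is a sketch of that idea, not the fix proposed in the linked issue; the helper names are hypothetical.

```python
import os
import sys


def tools_dir_for(converter_file):
    """Return the tools/ directory that contains data_converter/<converter_file>."""
    converter_dir = os.path.dirname(os.path.abspath(converter_file))
    return os.path.dirname(converter_dir)


def add_tools_to_path(converter_file):
    """Append the tools/ directory to sys.path once, and return it."""
    tools_dir = tools_dir_for(converter_file)
    if tools_dir not in sys.path:
        sys.path.append(tools_dir)
    return tools_dir


# Inside indoor_converter.py this would be called as add_tools_to_path(__file__).
print(tools_dir_for("/home/yaoyao/Documents/Bev/BEVFormer/tools/data_converter/indoor_converter.py"))
```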

3) Generate the annotation files

Generate the annotation files with the following command (note that the dataset version is set to v1.0-mini):

python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes --version v1.0-mini --canbus ./data

The resulting directory layout is:

(bevformer) yaoyao@yaoyao:~/Documents/Bev/BEVFormer/data$ tree -L 2
.
|-- can_bus
|   |-- scene-0001_meta.json
|   |-- scene-0001_ms_imu.json
|   |-- scene-0001_pose.json
|   |-- scene-0001_route.json
|   |-- scene-0001_steeranglefeedback.json
|   |-- scene-0001_vehicle_monitor.json
|   |-- scene-0001_zoe_veh_info.json
|   |-- scene-0001_zoesensors.json
|   |-- scene-0002_meta.json
|   |-- scene-0002_ms_imu.json
|   |-- scene-0002_pose.json
|   |-- scene-0002_route.json
|   |-- scene-0002_steeranglefeedback.json
(omitted)
|   |-- scene-1110_pose.json
|   |-- scene-1110_route.json
|   |-- scene-1110_steeranglefeedback.json
|   |-- scene-1110_vehicle_monitor.json
|   |-- scene-1110_zoe_veh_info.json
|   `-- scene-1110_zoesensors.json
|-- can_bus.zip (this archive is optional)
|-- nuscenes
|   |-- maps
|   |-- nuscenes_infos_temporal_train.pkl
|   |-- nuscenes_infos_temporal_train_mono3d.coco.json
|   |-- nuscenes_infos_temporal_val.pkl
|   |-- nuscenes_infos_temporal_val_mono3d.coco.json
|   |-- samples
|   |-- sweeps
|   `-- v1.0-mini
`-- nuscenes-mini.zip (this archive is optional)
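To confirm that the generated .pkl files are usable, they can be opened directly and the number of samples counted. The sketch below assumes each info file holds a dict with an 'infos' list and a 'metadata' entry; that layout is an assumption about the converter output, and the helper name is hypothetical.

```python
import os
import pickle


def count_infos(pkl_path):
    """Return (number of samples, metadata) from an info file.

    Assumes the file holds a dict with an 'infos' list and a 'metadata' entry.
    """
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)
    return len(data["infos"]), data.get("metadata")


if __name__ == "__main__":
    for split in ("train", "val"):
        path = f"./data/nuscenes/nuscenes_infos_temporal_{split}.pkl"
        if os.path.exists(path):
            print(split, *count_infos(path))
```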

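4. Training and Testing

1) Set the number of data-loading workers

In projects/configs/bevformer/bevformer_small.py, change workers_per_gpu=4 to 0; otherwise training fails with an error in this setup.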

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=0,

2) Set the number of training epochs and the pretrained model path

# learning policy
lr_config = dict(
    policy='CosineAnnealing',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    min_lr_ratio=1e-3)
total_epochs = 12 # total number of training epochs
evaluation = dict(interval=1, pipeline=test_pipeline)

runner = dict(type='EpochBasedRunner', max_epochs=total_epochs)
load_from = 'ckpts/epoch_24.pth' # path to the pretrained model
log_config = dict(
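The effect of the CosineAnnealing policy with total_epochs = 12 can be reproduced by hand. The sketch below applies epoch-wise cosine annealing after warmup has finished. The base learning rate of 2e-4 is an assumption (it does not appear in the excerpt above), and the function name is hypothetical.

```python
import math


def cosine_annealing_lr(epoch_index, max_epochs, base_lr, min_lr_ratio):
    """Learning rate at a zero-based epoch index under epoch-wise cosine annealing."""
    target_lr = base_lr * min_lr_ratio
    cos_out = math.cos(math.pi * epoch_index / max_epochs) + 1
    return target_lr + 0.5 * (base_lr - target_lr) * cos_out


# Assumption: base learning rate 2e-4 (not shown in the excerpt above).
# "Epoch [6]" in a training log is the sixth epoch, i.e. zero-based index 5.
lr = cosine_annealing_lr(epoch_index=5, max_epochs=12, base_lr=2e-4, min_lr_ratio=1e-3)
print(f"{lr:.3e}")
```

Under these assumptions the sixth epoch comes out at about 1.260e-04, which agrees with the lr value in the sample training log shown in the next step.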

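3) Start training

Use the following command to train the bevformer_small model (the trailing number is the number of GPUs to use):

./tools/dist_train.sh ./projects/configs/bevformer/bevformer_small.py 2 # the last argument is the number of GPUs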
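The GPU count also determines how many iterations make up one epoch, because each GPU processes samples_per_gpu samples per step. A small sketch of that arithmetic follows; the figure of 323 training samples in the v1.0-mini split is an assumption, and the function name is hypothetical.

```python
import math


def iters_per_epoch(num_train_samples, num_gpus, samples_per_gpu=1):
    """Iterations per epoch when every GPU processes samples_per_gpu samples per step."""
    return math.ceil(num_train_samples / (num_gpus * samples_per_gpu))


# Assumption: the v1.0-mini training split contains 323 samples.
print(iters_per_epoch(323, num_gpus=2))
```

With 2 GPUs and samples_per_gpu=1 this gives 162 iterations per epoch, the same count that appears as [150/162] in the training log.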

During training, log lines similar to the following are printed:

2024-11-04 00:28:30,886 - mmdet - INFO - Epoch [6][150/162]     
lr: 1.260e-04, eta: 1:05:03, time: 4.183, data_time: 1.980, 
memory: 7354, loss_cls: 0.5116, loss_bbox: 0.9478, d0.loss_cls: 
0.5447, d0.loss_bbox: 1.0079, d1.loss_cls: 0.5185, d1.loss_bbox: 
0.9737, d2.loss_cls: 0.5101, d2.loss_bbox: 0.9595, d3.loss_cls: 
0.5151, d3.loss_bbox: 0.9566, d4.loss_cls: 0.5157, d4.loss_bbox: 
0.9499, loss: 8.9112, grad_norm: 47.8471

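Field explanations:

Epoch [6][150/162]: the 6th training epoch, at iteration 150 of the 162 iterations in this epoch
lr: 1.260e-04: current learning rate, i.e. 1.260 × 10^-4 in scientific notation
eta: 1:05:03: estimated time remaining until training finishes, here 1 hour 5 minutes 3 seconds
time: 4.183: time taken per iteration, in seconds
data_time: 1.980: the part of the iteration time spent loading data, in seconds
memory: 7354: GPU memory in use, in MB
loss_cls: 0.5116: classification loss of the final decoder layer for the current iteration
loss_bbox: 0.9478: bounding-box regression loss of the final decoder layer for the current iteration
d0.loss_cls, d0.loss_bbox: auxiliary classification and bounding-box losses computed on the outputs of the intermediate decoder layers (d0, d1, d2, d3, d4), not on different feature levels. Here d0 is the first decoder layer, with losses of 0.5447 (classification) and 1.0079 (bounding box)
d1.loss_cls, d1.loss_bbox: likewise, the classification and bounding-box losses of the second decoder layer
loss: 8.9112: total loss, the sum of all loss terms above; this is the value that is backpropagated
grad_norm: 47.8471: norm of the gradients of all parameters in the current iteration, a measure of gradient magnitude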
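The statement that loss is the sum of the individual terms can be checked against the sample log. The sketch below parses the key/value pairs of that log line (joined into one string) and adds up every loss_cls and loss_bbox entry; the parsing helper is hypothetical and only handles this "key: value, key: value" layout.

```python
import re

LOG_LINE = (
    "lr: 1.260e-04, eta: 1:05:03, time: 4.183, data_time: 1.980, memory: 7354, "
    "loss_cls: 0.5116, loss_bbox: 0.9478, d0.loss_cls: 0.5447, d0.loss_bbox: 1.0079, "
    "d1.loss_cls: 0.5185, d1.loss_bbox: 0.9737, d2.loss_cls: 0.5101, d2.loss_bbox: 0.9595, "
    "d3.loss_cls: 0.5151, d3.loss_bbox: 0.9566, d4.loss_cls: 0.5157, d4.loss_bbox: 0.9499, "
    "loss: 8.9112, grad_norm: 47.8471"
)


def parse_fields(line):
    """Split 'key: value, key: value' pairs into a dict of strings."""
    return dict(re.findall(r"([\w.]+): ([^,]+)", line))


fields = parse_fields(LOG_LINE)
# Every key ending in loss_cls or loss_bbox is one term of the total loss.
terms = {k: float(v) for k, v in fields.items() if k.endswith(("loss_cls", "loss_bbox"))}
total = sum(terms.values())
print(len(terms), round(total, 4), fields["loss"])
```

The twelve terms add up to the logged total up to rounding in the last printed digit.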

During training, the evaluation results on the validation set are also printed, for example:

mAP: 0.0243
mATE: 1.0582
mASE: 0.7854
mAOE: 1.1357
mAVE: 0.9730
mAAE: 0.9004
NDS: 0.0463
Eval time: 17.9s

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.107   1.202   0.226   1.359   1.114   0.652
truck   0.000   1.054   0.566   1.265   0.964   0.815
bus     0.000   1.000   1.000   1.000   1.000   1.000
trailer 0.000   1.000   1.000   1.000   1.000   1.000
construction_vehicle    0.000   1.000   1.000   1.000   1.000   1.000
pedestrian      0.136   1.049   0.287   1.597   0.706   0.736
motorcycle      0.000   1.000   1.000   1.000   1.000   1.000
bicycle 0.000   1.000   1.000   1.000   1.000   1.000
traffic_cone    0.000   1.278   0.775   nan     nan     nan
barrier 0.000   1.000   1.000   1.000   nan     nan
2024-11-04 00:18:02,889 - mmdet - INFO - Exp name: bevformer_small.py

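Explanation of the evaluation metrics:

i. Overall metrics

mAP (mean Average Precision): detection precision averaged over all classes. Higher is better. The value here is low, which shows that the model detects objects poorly.
mATE (mean Average Translation Error): average distance between the predicted and ground-truth box centers, in meters. Lower means more accurate localization.
mASE (mean Average Scale Error): error in predicted object size, computed as 1 − IoU after aligning center and orientation. Lower is better.
mAOE (mean Average Orientation Error): average yaw angle difference between prediction and ground truth, in radians. Lower is better.
mAVE (mean Average Velocity Error): average absolute difference between predicted and ground-truth velocity, in m/s.
mAAE (mean Average Attribute Error): error in the predicted object attribute (1 − attribute classification accuracy). Note that this is an attribute error, not an acceleration error.
NDS (nuScenes Detection Score): combined metric that merges mAP with the five error metrics above. Higher means better overall detection performance.
Eval time: time taken by the evaluation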
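How NDS combines the other metrics can be sketched in a few lines: each of the five errors is clipped to 1 and converted into a score, and mAP is weighted five times as much as any single error term. The function name is hypothetical; the input values are the ones from the evaluation output above.

```python
def nds(mean_ap, tp_errors):
    """nuScenes detection score from mAP and the five mean true-positive errors."""
    # Each error is clipped to 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries weight 5, each of the five TP scores weight 1; the sum is divided by 10.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


errors = {"mATE": 1.0582, "mASE": 0.7854, "mAOE": 1.1357, "mAVE": 0.9730, "mAAE": 0.9004}
score = nds(0.0243, errors.values())
print(round(score, 4))
```

For the values above this reproduces the logged NDS of 0.0463. Errors of 1 or more, such as mATE and mAOE here, contribute nothing to the score.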

ii. Per-class results

Object Class: object category, e.g. car, truck, pedestrian
AP (Average Precision): average precision for the class
ATE (Average Translation Error): translation error for the class
ASE (Average Scale Error): scale error for the class
AOE (Average Orientation Error): orientation error for the class
AVE (Average Velocity Error): velocity error for the class
AAE (Average Attribute Error): attribute error for the class
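The overall metrics are the per-class values averaged over classes, with nan entries (classes for which a metric is not defined) left out of the average. A minimal sketch using the per-class rows from the evaluation output above; the helper name is hypothetical.

```python
import math

NAN = float("nan")
COLUMNS = ("AP", "ATE", "ASE", "AOE", "AVE", "AAE")
# Per-class rows copied from the evaluation output above, in COLUMNS order.
ROWS = {
    "car": (0.107, 1.202, 0.226, 1.359, 1.114, 0.652),
    "truck": (0.000, 1.054, 0.566, 1.265, 0.964, 0.815),
    "bus": (0.000, 1.000, 1.000, 1.000, 1.000, 1.000),
    "trailer": (0.000, 1.000, 1.000, 1.000, 1.000, 1.000),
    "construction_vehicle": (0.000, 1.000, 1.000, 1.000, 1.000, 1.000),
    "pedestrian": (0.136, 1.049, 0.287, 1.597, 0.706, 0.736),
    "motorcycle": (0.000, 1.000, 1.000, 1.000, 1.000, 1.000),
    "bicycle": (0.000, 1.000, 1.000, 1.000, 1.000, 1.000),
    "traffic_cone": (0.000, 1.278, 0.775, NAN, NAN, NAN),
    "barrier": (0.000, 1.000, 1.000, 1.000, NAN, NAN),
}


def column_mean(rows, index):
    """Mean of one column over all classes, skipping nan entries."""
    values = [row[index] for row in rows.values() if not math.isnan(row[index])]
    return sum(values) / len(values)


means = {name: round(column_mean(ROWS, i), 4) for i, name in enumerate(COLUMNS)}
print(means)
```

The results agree with the overall metrics in the log; a difference in the last digit (for example in ATE) can appear because the per-class table is rounded to three decimals.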

4) After training, run inference with the trained .pth weights

./tools/dist_test.sh ./projects/configs/bevformer/bevformer_small.py work_dirs/bevformer_small/latest.pth 1

The result files are written under the test directory, i.e. ./bevformer/test/bevformer_small/..._timestamp_2024/pts_bbox/results_nusc.json. The file results_nusc.json holds the detection results.
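A quick way to get a feel for results_nusc.json is to count how many boxes per class pass a score threshold. The sketch below assumes each box carries 'detection_name' and 'detection_score' fields, as in the nuScenes detection result format; the helper names, the threshold of 0.3, and the hand-made example data are all illustrative.

```python
import json
from collections import Counter


def load_results(path):
    """Parse a results_nusc.json file into a dict."""
    with open(path) as f:
        return json.load(f)


def summarise_results(results, score_threshold=0.3):
    """Count samples and per-class detections at or above a score threshold."""
    per_class = Counter()
    for boxes in results["results"].values():
        for box in boxes:
            if box["detection_score"] >= score_threshold:
                per_class[box["detection_name"]] += 1
    return len(results["results"]), dict(per_class)


# Tiny hand-made example with the same nesting as a real results file.
example = {
    "meta": {"use_camera": True},
    "results": {
        "token_a": [
            {"detection_name": "car", "detection_score": 0.62},
            {"detection_name": "pedestrian", "detection_score": 0.12},
        ],
        "token_b": [{"detection_name": "car", "detection_score": 0.41}],
    },
}
print(summarise_results(example))
```

On a real run, `summarise_results(load_results(path))` with the path of your own results_nusc.json gives the same kind of summary.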

5) Visualize the results

Edit tools/analysis_tools/visual.py and replace the main block with:

if __name__ == '__main__':
    import os  # needed below, in case visual.py does not already import it

    # Dataset path: use 'v1.0-mini' for the mini split, 'v1.0-trainval' for the full dataset
    nusc = NuScenes(version='v1.0-mini', dataroot='./data/nuscenes', verbose=False)
    # Path to results_nusc.json
    bevformer_results = mmcv.load('test/bevformer_base/Thu_Sep_28_09_35_31_2023/pts_bbox/results_nusc.json')
    # Create the result directory if it does not exist yet
    save_dir = "result"
    os.makedirs(save_dir, exist_ok=True)

    sample_token_list = list(bevformer_results['results'].keys())

    # Render the first 10 samples
    for idx in range(0, 10):
        render_sample_data(sample_token_list[idx], pred_data=bevformer_results, out_path=os.path.join(save_dir, sample_token_list[idx]))
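Because the results path contains a timestamped folder, it changes with every test run. As an alternative to editing the path by hand, the newest results file can be located automatically. This is a sketch under the assumption that all results live somewhere below the test directory; the helper name is hypothetical.

```python
import glob
import os


def find_latest_results(test_root="test"):
    """Return the most recently modified results_nusc.json under test_root, or None."""
    pattern = os.path.join(test_root, "**", "results_nusc.json")
    candidates = glob.glob(pattern, recursive=True)
    return max(candidates, key=os.path.getmtime) if candidates else None


if __name__ == "__main__":
    print(find_latest_results("test"))
```

The returned path could then be passed to mmcv.load in place of the hard-coded string.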

Comment out the code that would otherwise require closing the plot window by hand:

# if verbose:
#     plt.show()

After changing the results_nusc.json path on the bevformer_results = line above to point at your own result file, run the visualization script:

python tools/analysis_tools/visual.py

The terminal prints:

Loading NuScenes tables for version v1.0-mini...
23 category,
8 attribute,
4 visibility,
911 instance,
12 sensor,
120 calibrated_sensor,
31206 ego_pose,
8 log,
10 scene,
404 sample,
31206 sample_data,
18538 sample_annotation,
4 map,
Done loading in 0.526 seconds.
======
Reverse indexing ...
Done reverse indexing in 0.2 seconds.
======
green is ground truth
blue is the predited result

After a few minutes, the following message appears, which indicates that the first BEV test image has been generated:

Rendering sample token 3e8750f331d7499e9b5123e9eb70f2e2

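The whole run takes a while. When it finishes, the result folder contains 20 test images (BEV and camera views for the 10 rendered samples).

Because the model is small and was trained for only a few epochs, the visualizations are not very good. Green marks the ground truth and blue marks the model's predictions.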

5. References

"A 10,000-Word Guide to the Vision-Only Perception Algorithm BEVFormer", https://zhuanlan.zhihu.com/p/543335939
"Reproducing BEVFormer (Setting Up the Training Environment with Docker)", https://blog.csdn.net/m0_55127902/article/details/141938490
"Reproducing the BEVFormer Code in Practice", https://blog.csdn.net/h904798869/article/details/133377388
"[Bug] ModuleNotFoundError: No module named 'tools.data_converter' #2352", https://github.com/open-mmlab/mmdetection3d/issues/2352