BEVDet Environment Setup and Code Study

Environment setup

Setting up the environment took many attempts (most of them ended in failure). Here is a record of the problems (pitfalls) I ran into and how I solved them.

Mainly followed this column as a reference.

Lab server
conda create -n bevdet python=3.8
conda activate bevdet

# 3 Install torch in the bevdet virtual environment
pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html

# 4 Install the OpenMMLab-related libraries in the bevdet environment; building mmcv-full takes quite a while -- as long as the terminal is not frozen, it is still working
pip install mmcv-full==1.5.3 onnxruntime-gpu==1.8.1 mmdet==2.25.1 mmsegmentation==0.25.0

# 5 Enter the BEVDet project directory and install mmdet3d
pip install -v -e .

# 6 Install the remaining dependencies such as numpy==1.23.4 and setuptools==58.2.0
pip install pycuda lyft_dataset_sdk networkx==2.2 numba==0.53.0 numpy==1.23.4 nuscenes-devkit plyfile scikit-image tensorboard trimesh==2.35.39 setuptools==58.2.0

Error:

RuntimeError: The detected CUDA version (10.2) mismatches the version that was used to compile PyTorch (11.3). Please make sure to use the same CUDA versions.

Check the CUDA version:

cat /usr/local/cuda/version.txt
---
CUDA Version 10.2.89

Damn, the lab server only goes up to CUDA 10.2, so I gave up on it (am I allowed to say that?).

Setting up the environment on my own machine:

On a Mac you can't even install CUDA, so forget about that.

Trying the AutoDL cloud server

Install torch

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu113

Installing mmcv-full failed with an error.

Tried installing it again:

conda install -c conda-forge mmcv-full==1.5.3

Still not working?? What is going on??

It turned out that the torch install above (via the cu113 index) pulls in the latest PyTorch 1.12.0 instead of the required 1.10.0, so everything had to be redone.

All of it wasted!

Install torch and the CUDA toolkit:

conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install mmcv-full==1.5.3 onnxruntime-gpu==1.8.1 mmdet==2.25.1 mmsegmentation==0.25.0
# first cd into the project directory
pip install -v -e .
pip install pycuda lyft_dataset_sdk networkx==2.2 numba==0.53.0 numpy==1.23.4 nuscenes-devkit plyfile scikit-image tensorboard trimesh==2.35.39 setuptools==58.2.0

This time the installation went through without errors, so the environment is more or less set up.
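
Before moving on, a quick sanity check of the environment doesn't hurt. This is my own minimal sketch, not part of the original setup steps:

# Minimal sanity check of the freshly built environment.
import torch
import mmcv
import mmdet
import mmseg

print("torch:", torch.__version__)                    # expected 1.10.0
print("cuda available:", torch.cuda.is_available())
print("torch built with cuda:", torch.version.cuda)   # expected 11.3
print("mmcv:", mmcv.__version__)                      # expected 1.5.3
print("mmdet:", mmdet.__version__)                    # expected 2.25.1
print("mmseg:", mmseg.__version__)                    # expected 0.25.0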

Time to train!

  1. First prepare the dataset, following section 2 of that blog post and the README.

# 1 Create the ckpts folder and cd into it
mkdir ckpts && cd ckpts

# 2 Download resnet50-0676ba61.pth
wget https://download.pytorch.org/models/resnet50-0676ba61.pth

The first download is often very slow; retrying usually goes faster.
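
Since an interrupted download leaves a corrupt file behind (which bit me later, see the PytorchStreamReader error below), a quick load test on the checkpoint is worth doing. A minimal sketch of my own:

# Check that the downloaded checkpoint is intact; a broken zip archive raises RuntimeError here.
import torch

state_dict = torch.load("ckpts/resnet50-0676ba61.pth", map_location="cpu")
print(len(state_dict), "tensors loaded")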

  2. Modify the parameters in configs/bevdet/bevdet-r50.py

pretrained='./ckpts/resnet50-0676ba61.pth'

samples_per_gpu=1,  # number of samples per GPU, i.e. the per-GPU batch size
workers_per_gpu=0,  # number of data-loading worker processes per GPU;
                    # this determines how fast and efficiently data is loaded

max_epochs = 2
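
For orientation, these fields live in the data dict of the config, and max_epochs is controlled by the runner. A sketch of the relevant structure only (standard mmdet-style layout; the surrounding contents of bevdet-r50.py are omitted and may differ slightly):

# Sketch of where the modified fields sit in the config (not the full file).
data = dict(
    samples_per_gpu=1,   # per-GPU batch size
    workers_per_gpu=0,   # data-loading worker processes per GPU
    # train=..., val=..., test=... dataset definitions stay as in the original config
)

runner = dict(type='EpochBasedRunner', max_epochs=2)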
  3. Launch!

python tools/train.py ./configs/bevdet/bevdet-r50.py

Error:

File "/root/miniconda3/envs/bevdet/lib/python3.8/site-packages/numba/np/ufunc/decorators.py", line 3, in <module>
    from numba.np.ufunc import _internal
SystemError: initialization of _internal failed without raising an exception

Reinstalling numpy==1.23.4 fixed it.

File "/root/miniconda3/envs/bevdet/lib/python3.8/site-packages/mmcv/utils/config.py", line 508, in pretty_text
    text, _ = FormatCode(text, style_config=yapf_style, verify=True)
TypeError: FormatCode() got an unexpected keyword argument 'verify'

Following this issue, pip install yapf==0.40.1 fixed it.

File "/root/miniconda3/envs/bevdet/lib/python3.8/site-packages/torch/serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

The cause was a corrupt model file: the interrupted first download of the resnet50 weights had left a partial file behind. Deleting it and renaming the later, fully downloaded checkpoint solved the problem.

AttributeError: module 'distutils' has no attribute 'version'

Reinstalling setuptools==58.0.4 fixed it (I'm about to lose my mind).

It finally runs! Persistence pays off.

Reading the code

The project is an extension of the mmdetection3d codebase.

mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-test
│   │   ├── v1.0-trainval

A lot of it is not actually used.

Files to read:

tools/create_data_bevdet.py
configs/bevdet/bevdet-r50.py
tools/train.py
mmdet3d/apis/train.py
mmdet3d/datasets/nuscenes_dataset.py
mmdet3d/models/detectors/bevdet.py
mmdet3d/models/necks/view_transformer.py

tools/train.py

Questions

  1. In the config, model.type='BEVDet', which is not the type checked in the snippet below:

if cfg.model.type in ['EncoderDecoder3D']:
    logger_name = 'mmseg'
else:
    logger_name = 'mmdet'
logger = get_root_logger(
    log_file=log_file, log_level=cfg.log_level, name=logger_name)

mmdet3d/apis/train.py contains the same check. Since 'BEVDet' is not in ['EncoderDecoder3D'], the else branch is taken and logger_name ends up as 'mmdet', which is the expected behaviour for a detection model:

# TODO: ugly workaround to judge whether we are training det or seg model
if cfg.model.type in ['EncoderDecoder3D']:
    logger_name = 'mmseg'
else:
    logger_name = 'mmdet'

mmdet3d/apis/train.py

FP16 hook

Fp16OptimizerHook()

Reference: 入门mmdetection(捌)---聊一聊FP16 (an article on FP16 in mmdetection)

# fp16 setting
# Read the fp16 field from the config; None if it is not set.
fp16_cfg = cfg.get('fp16', None)
if fp16_cfg is not None:
    # If fp16 is set, build a Fp16OptimizerHook instance.
    optimizer_config = Fp16OptimizerHook(
        **cfg.optimizer_config, **fp16_cfg, distributed=distributed)
elif distributed and 'type' not in cfg.optimizer_config:
    # Otherwise read optimizer_config from the config as usual.
    # bevdet-r50, for example, sets grad_clip: optimizer_config = dict(grad_clip=dict(max_norm=5, norm_type=2))
    optimizer_config = OptimizerHook(**cfg.optimizer_config)
else:
    optimizer_config = cfg.optimizer_config

# Then register the training hooks; optimizer_config is passed in as an argument.
# register hooks
runner.register_training_hooks(
    cfg.lr_config,
    optimizer_config,
    cfg.checkpoint_config,
    cfg.log_config,
    cfg.get('momentum_config', None),
    custom_hooks_config=cfg.get('custom_hooks', None))
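
To actually take the Fp16OptimizerHook branch, the config only needs an fp16 field. A minimal example of the standard mmcv-style setting (my own illustration; whether bevdet-r50.py enables it depends on the config version):

# Standard mmcv fp16 switch: with this field present, the branch above builds a Fp16OptimizerHook.
fp16 = dict(loss_scale=512.)          # fixed loss scale
# fp16 = dict(loss_scale='dynamic')   # or dynamic loss scaling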

Data preprocessing pipeline

First download the data.

After downloading, since we are testing with the mini dataset while the code expects the full dataset, copy v1.0-mini and rename the copy to v1.0-trainval. The data directory then looks like this:

data
  └──nuscenes
    ├── gts
    ├── maps
    ├── samples
    ├── sweeps
    ├── v1.0-mini
    └── v1.0-trainval  # the copied folder

Run tools/create_data_bevdet.py to generate the dataset info files.

When it finishes, two files are generated under ./data/nuscenes: bevdetv2-nuscenes_infos_train.pkl and bevdetv2-nuscenes_infos_val.pkl.
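
To get a feel for what the converter produced, the info files can be opened directly. A minimal sketch of my own (the key layout follows the usual mmdet3d nuScenes infos format, so treat the printed keys as whatever your version actually generates):

import pickle

with open("data/nuscenes/bevdetv2-nuscenes_infos_train.pkl", "rb") as f:
    data = pickle.load(f)

# Usually a dict with 'infos' (one entry per sample) and 'metadata'.
print(type(data))
if isinstance(data, dict) and "infos" in data:
    print("num samples:", len(data["infos"]))
    print("keys of one sample:", list(data["infos"][0].keys()))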

Loading the dataset

In train.py, the training set is built from the data.train field of the config via build_dataset(cfg.data.train).

mmdet3d converts the raw nuScenes data into a unified format shared across datasets; this conversion is done offline.

The mmdet3d version used by BEVDet keeps the coordinate-system definition of the processed data consistent with the original nuScenes. If the data is processed with a newer mmdet3d, mAOE can become very poor, because the newer mmdet3d changes the coordinate-system definition relative to the original nuScenes.
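
A minimal sketch of building the training dataset the same way train.py does and pulling one sample (my own sketch; which keys the sample contains depends on the configured pipeline):

from mmcv import Config
from mmdet3d.datasets import build_dataset

cfg = Config.fromfile("configs/bevdet/bevdet-r50.py")
dataset = build_dataset(cfg.data.train)

print(len(dataset), "training samples")
sample = dataset[0]            # runs the full training pipeline on one sample
print(list(sample.keys()))     # e.g. img_inputs, gt_bboxes_3d, gt_labels_3d, ...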

Model

model = dict(
    type='BEVDet',
    img_backbone=dict(
        pretrained='torchvision://resnet50',
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=False,
        with_cp=True,
        style='pytorch'),
    img_neck=dict(
        type='CustomFPN',
        in_channels=[1024, 2048],
        out_channels=256,
        num_outs=1,
        start_level=0,
        out_ids=[0]),
    img_view_transformer=dict(
        type='LSSViewTransformer',
        grid_config=grid_config,
        input_size=data_config['input_size'],
        in_channels=256,
        out_channels=numC_Trans,
        downsample=16),
    img_bev_encoder_backbone=dict(
        type='CustomResNet',
        numC_input=numC_Trans,
        num_channels=[numC_Trans * 2, numC_Trans * 4, numC_Trans * 8]),
    img_bev_encoder_neck=dict(
        type='FPN_LSS',
        in_channels=numC_Trans * 8 + numC_Trans * 2,
        out_channels=256),
    pts_bbox_head=dict(
        type='CenterHead',
        in_channels=256,
        tasks=[
            dict(num_class=10, class_names=['car', 'truck',
                                            'construction_vehicle',
                                            'bus', 'trailer',
                                            'barrier',
                                            'motorcycle', 'bicycle',
                                            'pedestrian', 'traffic_cone']),
        ],
        common_heads=dict(
            reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2), vel=(2, 2)),
        share_conv_channel=64,
        bbox_coder=dict(
            type='CenterPointBBoxCoder',
            pc_range=point_cloud_range[:2],
            post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            max_num=500,
            score_threshold=0.1,
            out_size_factor=8,
            voxel_size=voxel_size[:2],
            code_size=9),
        separate_head=dict(
            type='SeparateHead', init_bias=-2.19, final_kernel=3),
        loss_cls=dict(type='GaussianFocalLoss', reduction='mean'),
        loss_bbox=dict(type='L1Loss', reduction='mean', loss_weight=0.25),
        norm_bbox=True),
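
Since I mainly used the training log to see the overall model structure, the assembled detector can also be built and printed directly. A minimal sketch of my own, mirroring what tools/train.py does:

# Build the BEVDet detector from the config and print its module tree.
from mmcv import Config
from mmdet3d.models import build_model

cfg = Config.fromfile("configs/bevdet/bevdet-r50.py")
model = build_model(
    cfg.model,
    train_cfg=cfg.get("train_cfg"),
    test_cfg=cfg.get("test_cfg"))
print(model)   # img_backbone, img_neck, img_view_transformer, ... show up as submodules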

img_backbone: ResNet

# located in mmdet/models/backbones/resnet.py
img_backbone=dict(
        pretrained='torchvision://resnet50',
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=False,
        with_cp=True,
        style='pytorch'),

Note that this is the ResNet from mmdet, not the one in mmdet3d.
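
A quick way to see why the neck's in_channels is [1024, 2048]: with out_indices=(2, 3), ResNet-50 returns its stage-3 and stage-4 feature maps (strides 16 and 32). A minimal sketch of my own, feeding a dummy tensor through the mmdet backbone:

import torch
from mmdet.models.backbones import ResNet

backbone = ResNet(depth=50, num_stages=4, out_indices=(2, 3), style='pytorch')
backbone.eval()

x = torch.randn(1, 3, 256, 704)     # a BEVDet-style input resolution
with torch.no_grad():
    feats = backbone(x)

for f in feats:
    print(f.shape)
# torch.Size([1, 1024, 16, 44])  -> stride 16
# torch.Size([1, 2048, 8, 22])   -> stride 32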

img_neck: CustomFPN

# located in mmdet3d/models/necks/fpn.py
img_neck=dict(
        type='CustomFPN',
        in_channels=[1024, 2048],
        out_channels=256,
        num_outs=1,
        start_level=0,
        out_ids=[0]),

img_view_transformer: LSSViewTransformer

In the model definition:

img_view_transformer=dict(
        type='LSSViewTransformer',
        grid_config=grid_config,
        input_size=data_config['input_size'],
        in_channels=256,
        out_channels=numC_Trans,
        downsample=16),
# located in mmdet3d/models/necks/view_transformer.py
forward()

Forward pass

Returns torch.Tensor: bird's-eye-view feature in shape (B, C, H_BEV, W_BEV)

The input first goes through the depth estimation network to get the depth features:

x = self.depth_net(x) 
###
self.depth_net = nn.Conv2d(in_channels, self.D + self.out_channels, kernel_size=1, padding=0)
view_transform(self, input, depth, tran_feat)
view_transform_core(input, depth, tran_feat)
self.get_lidar_coor()

view_transform()

view_transform_core()
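
The core of the lift step is easy to write down in plain PyTorch: split the depth_net output into D depth logits and C context channels, softmax over depth, then take the outer product so each pixel's context feature is spread over its depth bins. A minimal sketch of that step only (my own illustration; the real view_transform_core additionally projects the frustum features into BEV via get_lidar_coor and voxel pooling):

import torch
import torch.nn as nn

B_N, C_in, H, W = 6, 256, 16, 44     # (batch * num_cams, channels, feature height, feature width)
D, C_out = 59, 64                    # number of depth bins, context channels

depth_net = nn.Conv2d(C_in, D + C_out, kernel_size=1, padding=0)

x = torch.randn(B_N, C_in, H, W)
x = depth_net(x)                                # (B*N, D + C, H, W)
depth = x[:, :D].softmax(dim=1)                 # (B*N, D, H, W)  per-pixel depth distribution
tran_feat = x[:, D:D + C_out]                   # (B*N, C, H, W)  context features

# Outer product: frustum feature volume.
frustum_feat = depth.unsqueeze(1) * tran_feat.unsqueeze(2)   # (B*N, C, D, H, W)
print(frustum_feat.shape)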

CustomResNet

# located in mmdet3d/models/backbones/resnet.py
img_bev_encoder_backbone=dict(
        type='CustomResNet',
        numC_input=numC_Trans, # 64
        num_channels=[numC_Trans * 2, numC_Trans * 4, numC_Trans * 8]),

FPN_LSS

# located in mmdet3d/models/necks/lss_fpn.py
img_bev_encoder_neck=dict(
        type='FPN_LSS',
        in_channels=numC_Trans * 8 + numC_Trans * 2, # 640 = 512 + 128: deepest BEV feature upsampled and concatenated with a shallower one
        out_channels=256), 

CenterHead

# located in mmdet3d/models/dense_heads/centerpoint_head.py
# the pts_ prefix comes from mmdet3d's point-cloud (pts) branch naming; BEVDet reuses that head for its BEV features
pts_bbox_head=dict(
        type='CenterHead',
        in_channels=256,
        tasks=[
            dict(num_class=10, class_names=['car', 'truck',
                                            'construction_vehicle',
                                            'bus', 'trailer',
                                            'barrier',
                                            'motorcycle', 'bicycle',
                                            'pedestrian', 'traffic_cone']),
        ],
        common_heads=dict(
            reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2), vel=(2, 2)),
        share_conv_channel=64,
        bbox_coder=dict(
            type='CenterPointBBoxCoder',
            pc_range=point_cloud_range[:2],
            post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            max_num=500,
            score_threshold=0.1,
            out_size_factor=8,
            voxel_size=voxel_size[:2],
            code_size=9),
        separate_head=dict(
            type='SeparateHead', init_bias=-2.19, final_kernel=3),
        loss_cls=dict(type='GaussianFocalLoss', reduction='mean'),
        loss_bbox=dict(type='L1Loss', reduction='mean', loss_weight=0.25),
        norm_bbox=True),
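
The common_heads entry is worth decoding: each (a, b) pair means a output channels produced by a branch with b conv layers, and the regression targets together make up the 9-dim box of code_size=9. A small bookkeeping sketch of my own:

# Each entry is (output_channels, num_conv_layers) of a SeparateHead branch.
common_heads = dict(
    reg=(2, 2),      # (dx, dy) sub-pixel center offset on the BEV grid
    height=(1, 2),   # z of the box center
    dim=(3, 2),      # box size (w, l, h), predicted in log scale
    rot=(2, 2),      # (sin(yaw), cos(yaw))
    vel=(2, 2),      # (vx, vy) velocity
)

channels = sum(c for c, _ in common_heads.values())
print(channels)      # 10 raw regression channels ...
# ... which decode to the 9-dim box [x, y, z, w, l, h, yaw, vx, vy] (code_size=9),
# since (sin, cos) collapses into a single yaw angle.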

I have gone through the code of the main model modules: the image-view encoder, the view transformation, and the BEV encoder.

I now roughly understand how the mmdet3d framework runs: config, pipeline, module, registry. The hooks are still not entirely clear to me, and I probably need to review some basics for the detection-head part, such as CenterPoint and its heatmaps.

I set up the environment on a cloud server and ran the model once; logging the output makes the overall model structure easy to see.

Open question about the model structure:

The channel numbers in the image-encoder part of the architecture figure in the paper don't match up: the backbone is ResNet-50 and the neck is an FPN, but the channel counts differ from the figure. Did I misunderstand something?

The mmdet3d framework feels very powerful, but the learning curve is not low and the level of encapsulation is very high. Worth digging into further.
