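Setting up the environment
It took many attempts to set up the environment (most of them ended in failure), so here is a record of the problems (pitfalls) I ran into and how I solved them.
I mainly followed this column.
Lab server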
conda create -n bevdet python=3.8
conda activate bevdet
# 3 Install torch in the bevdet virtual environment
pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
# 4 Install the OpenMMLab-related libraries in the bevdet virtual environment.
#   mmcv-full takes a long time to install; as long as the machine has not frozen, that is normal
pip install mmcv-full==1.5.3 onnxruntime-gpu==1.8.1 mmdet==2.25.1 mmsegmentation==0.25.0
# 5 Enter the BEVDet project directory and install mmdet3d
#   (-e expects the path right after it, so -v goes first)
pip install -v -e .
# 6 Install the other dependencies: numpy==1.23.4, setuptools==58.2.0, etc.
pip install pycuda lyft_dataset_sdk networkx==2.2 numba==0.53.0 numpy==1.23.4 nuscenes-devkit plyfile scikit-image tensorboard trimesh==2.35.39 setuptools==58.2.0
Error:
RuntimeError: The detected CUDA version (10.2) mismatches the version that was used to compile PyTorch (11.3). Please make sure to use the same CUDA versions.
Check the CUDA version:
cat /usr/local/cuda/version.txt
CUDA Version 10.2.89
Damn, the lab server only goes up to CUDA 10.2, so I gave up on it (am I even allowed to say that?).
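The mismatch above can be spotted before building anything. Below is a minimal sketch that compares the system CUDA version with the version a PyTorch build reports. The helper names are my own, and comparing major.minor is a simplified stand-in, not PyTorch's own check.

```python
import re


def parse_cuda_version(text):
    """Return (major, minor) from text such as 'CUDA Version 10.2.89' or '11.3'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no CUDA version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def cuda_versions_match(system_text, torch_text):
    """Compare major.minor of the system toolkit and the PyTorch build."""
    return parse_cuda_version(system_text) == parse_cuda_version(torch_text)


system_cuda = "CUDA Version 10.2.89"  # contents of /usr/local/cuda/version.txt
torch_cuda = "11.3"                   # what torch.version.cuda reports for a cu113 build
print(cuda_versions_match(system_cuda, torch_cuda))  # False
```

On a real machine the second string would come from `torch.version.cuda` instead of being hard-coded.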
Setting it up on my own computer:
A Mac cannot even install CUDA, so forget it.
Trying a cloud server (AutoDL)
Installing Torch
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu113
Installing mmcv-full failed with an error.
I tried reinstalling it:
conda install -c conda-forge mmcv-full==1.5.3
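Still not working?? What is going on??
It turned out that the torch install command above (pointing at the cu113 index, with no version pinned) installs the latest PyTorch, 1.12.0, instead of the 1.10.0 I needed, so I had to redo everything.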
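A quick check right after installing would have caught this earlier. The sketch below splits a version string of the kind `torch.__version__` returns and compares the release part with the one the project needs. The function names are hypothetical.

```python
def split_torch_version(version):
    """Split '1.10.0+cu113' into ('1.10.0', 'cu113'); the build tag may be empty."""
    release, _, build = version.partition("+")
    return release, build


def is_expected_release(version, expected_release="1.10.0"):
    """True when the release part (before '+') matches the required release."""
    release, _build = split_torch_version(version)
    return release == expected_release


print(is_expected_release("1.12.0+cu113"))  # False
print(is_expected_release("1.10.0+cu113"))  # True
```

In practice the argument would be `torch.__version__`. Only the release part is compared here, since an install may report the version without a `+cu113` build tag.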
All for nothing!
Install torch and CUDA:
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install mmcv-full==1.5.3 onnxruntime-gpu==1.8.1 mmdet==2.25.1 mmsegmentation==0.25.0
# Enter the project folder first (-e expects the path right after it, so -v goes first)
pip install -v -e .
pip install pycuda lyft_dataset_sdk networkx==2.2 numba==0.53.0 numpy==1.23.4 nuscenes-devkit plyfile scikit-image tensorboard trimesh==2.35.39 setuptools==58.2.0
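This time the install at least finished without errors, so I will call the environment configured.
Time to train!
- First prepare the dataset, following section 2 of this blog post and the README.
# 1 Create the ckpts folder and enter it
mkdir ckpts && cd ckpts
# 2 Download resnet50-0676ba61.pth
wget https://download.pytorch.org/models/resnet50-0676ba61.pth
The first download is often very slow; restarting it tends to be faster.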
- Modify the parameters in configs/bevdet/bevdet-r50.py:
pretrained='./ckpts/resnet50-0676ba61.pth'
samples_per_gpu=1,  # number of samples per GPU, i.e. the per-GPU batch size
workers_per_gpu=0,  # number of data-loading worker subprocesses per GPU;
                    # this determines how fast and efficiently data is loaded
max_epochs = 2
- OO, launch!
python tools/train.py ./configs/bevdet/bevdet-r50.py
Error:
File "/root/miniconda3/envs/bevdet/lib/python3.8/site-packages/numba/np/ufunc/decorators.py", line 3, in <module>
  from numba.np.ufunc import _internal
SystemError: initialization of _internal failed without raising an exception
Reinstalling numpy==1.23.4 fixed it.
File "/root/miniconda3/envs/bevdet/lib/python3.8/site-packages/mmcv/utils/config.py", line 508, in pretty_text
  text, _ = FormatCode(text, style_config=yapf_style, verify=True)
TypeError: FormatCode() got an unexpected keyword argument 'verify'
Following this issue, pip install yapf==0.40.1 fixed it.
File "/root/miniconda3/envs/bevdet/lib/python3.8/site-packages/torch/serialization.py", line 242, in __init__
  super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
The cause was a corrupted model file. When I downloaded the resnet50 weights earlier, the first download was so slow that I interrupted it, but the incomplete file was still saved. I deleted it and renamed the later, complete download to the expected name, which fixed it.
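A truncated download like this can be detected without loading it in PyTorch. File names such as resnet50-0676ba61.pth follow the torch.hub convention of embedding the first hex digits of the file's SHA256 hash, so the sketch below recomputes the hash and compares it with the name. The function name is my own.

```python
import hashlib
import re
from pathlib import Path


def matches_name_hash(path):
    """Check a checkpoint against the SHA256 prefix embedded in its file name.

    Expects names of the form '<name>-<sha256 prefix>.<ext>',
    for example 'resnet50-0676ba61.pth'.
    """
    path = Path(path)
    found = re.search(r"-([a-f0-9]{8,})\.", path.name)
    if found is None:
        raise ValueError(f"no hash prefix in file name {path.name!r}")
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().startswith(found.group(1))
```

Calling `matches_name_hash("ckpts/resnet50-0676ba61.pth")` should return False for an interrupted download. Note that the partial file must still carry the original name; a copy that wget saved under a different name would need renaming first.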
AttributeError: module 'distutils' has no attribute 'version'
Reinstalling setuptools==58.0.4 fixed it (I was about to lose it).
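Several of the errors above came down to a package version drifting from the one the project expects. Here is a small sketch that lists such mismatches in the current environment. The pins are the versions that resolved the errors recorded above, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pins, get_version=installed_version):
    """Return {package: (wanted, installed)} for every pin that is not satisfied."""
    mismatches = {}
    for package, wanted in pins.items():
        installed = get_version(package)
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


# Versions that fixed the numba, yapf and distutils errors in these notes.
PINS = {"numpy": "1.23.4", "yapf": "0.40.1", "setuptools": "58.0.4"}
print(find_mismatches(PINS))
```

An empty dict means every pin is satisfied. Running this once after each `pip install` step shows whether a later install silently upgraded something.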
It finally runs! Hard work pays off; I could cry.
Reading the code
The project is built as an extension of the mmdetection3d project:
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-test
│   │   ├── v1.0-trainval
Much of it is not actually used.
Files to read:
tools/create_data_bevdet.py
configs/bevdet/bevdet-r50.py
tools/train.py
mmdet3d/apis/train.py
mmdet3d/datasets/nuscenes_dataset.py
mmdet3d/models/detectors/bevdet.py
mmdet3d/models/necks/view_transformer.py
tools/train.py
Question
- In the config, model.type='BEVDet', which is not the type checked in the code below:
if cfg.model.type in ['EncoderDecoder3D']:
    logger_name = 'mmseg'
else:
    logger_name = 'mmdet'
logger = get_root_logger(
    log_file=log_file, log_level=cfg.log_level, name=logger_name)
mmdet3d/apis/train.py also has a place that checks this type:
# TODO: ugly workaround to judge whether we are training det or seg model
if cfg.model.type in ['EncoderDecoder3D']:
    logger_name = 'mmseg'
else:
    logger_name = 'mmdet'
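The branch quoted above can be tried in isolation to see what happens for BEVDet. This is a minimal re-statement of the same condition as a standalone function (the function name is my own): any type other than 'EncoderDecoder3D' simply falls through to the 'mmdet' logger.

```python
def pick_logger_name(model_type):
    """Mirror the branch quoted above: only 'EncoderDecoder3D' selects 'mmseg'."""
    if model_type in ['EncoderDecoder3D']:
        return 'mmseg'
    return 'mmdet'


print(pick_logger_name('BEVDet'))            # mmdet
print(pick_logger_name('EncoderDecoder3D'))  # mmseg
```

So the check is not meant to match 'BEVDet'. It only singles out the segmentation model, and every detector takes the else branch.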
mmdet3d/apis/train.py
HOOK FP16
Fp16OptimizerHook()
# fp16 setting
# Read the fp16 field from the config; it is None if absent
fp16_cfg = cfg.get('fp16', None)
if fp16_cfg is not None:
    # If fp16 is set, an Fp16OptimizerHook instance is created
    optimizer_config = Fp16OptimizerHook(
        **cfg.optimizer_config, **fp16_cfg, distributed=distributed)
elif distributed and 'type' not in cfg.optimizer_config:
    # If fp16 is not set, optimizer_config is read from the config as usual.
    # For example, bevdet-r50 sets grad_clip:
    # optimizer_config = dict(grad_clip=dict(max_norm=5, norm_type=2))
    optimizer_config = OptimizerHook(**cfg.optimizer_config)
else:
    optimizer_config = cfg.optimizer_config
# Then register the training hooks; optimizer_config is passed in as an argument
# register hooks
runner.register_training_hooks(
    cfg.lr_config,
    optimizer_config,
    cfg.checkpoint_config,
    cfg.log_config,
    cfg.get('momentum_config', None),
    custom_hooks_config=cfg.get('custom_hooks', None))
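To make the three branches easier to follow, here is a small sketch that only reports which branch would run for a given config. Plain dicts stand in for the mmcv Config object, the function name is my own, and the `loss_scale` value is a hypothetical fp16 setting.

```python
def choose_optimizer_hook(cfg, distributed):
    """Return a label for the branch the snippet above would take."""
    optimizer_config = cfg.get('optimizer_config', {})
    if cfg.get('fp16', None) is not None:
        return 'Fp16OptimizerHook'
    if distributed and 'type' not in optimizer_config:
        return 'OptimizerHook'
    return 'plain config dict'


# grad_clip setting as in bevdet-r50; no fp16 field.
bevdet_like = {'optimizer_config': {'grad_clip': {'max_norm': 5, 'norm_type': 2}}}
print(choose_optimizer_hook(bevdet_like, distributed=False))  # plain config dict
print(choose_optimizer_hook(bevdet_like, distributed=True))   # OptimizerHook

with_fp16 = {**bevdet_like, 'fp16': {'loss_scale': 512.0}}  # hypothetical fp16 field
print(choose_optimizer_hook(with_fp16, distributed=False))   # Fp16OptimizerHook
```

In the last branch the dict is handed to the runner unchanged, and as far as I understand the runner then builds the optimizer hook from it during registration.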
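Data preprocessing pipeline
Download the data first.
After downloading, since we test with the mini dataset while the source code expects the full dataset, we copy v1.0-mini and name the copy v1.0-trainval.
The dataset's data directory then looks like this:
data
└──nuscenes
    ├── gts
    ├── maps
    ├── samples
    ├── sweeps
    ├── v1.0-mini
    └── v1.0-trainval  # the copied folder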
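The copy step can also be scripted so it is safe to run more than once. This is a minimal sketch using only the standard library. The function name is my own, and the root path in the usage note is the data/nuscenes folder shown in the tree.

```python
import shutil
from pathlib import Path


def copy_mini_as_trainval(nuscenes_root):
    """Copy <root>/v1.0-mini to <root>/v1.0-trainval unless the target exists."""
    root = Path(nuscenes_root)
    source = root / "v1.0-mini"
    target = root / "v1.0-trainval"
    if not source.is_dir():
        raise FileNotFoundError(f"{source} does not exist")
    if not target.exists():
        shutil.copytree(source, target)
    return target
```

Calling `copy_mini_as_trainval("data/nuscenes")` from the project root performs the copy. An existing v1.0-trainval folder is left untouched, so a real full dataset would not be overwritten.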
Run tools/create_data_bevdet.py to generate the dataset.
When it finishes, two files, bevdetv2-nuscenes_infos_train.pkl and bevdetv2-nuscenes_infos_val.pkl, are generated under ./data/nuscenes.
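To confirm the generated .pkl files are sane, they can be opened with pickle and summarized. The sketch below assumes the file holds a dict with an 'infos' list (plus other keys such as metadata). That layout is my assumption about the output of the script, so the code falls back gracefully if the top level is a plain list.

```python
import pickle


def summarize_infos(pkl_path):
    """Load an infos .pkl and report its top-level keys and number of samples.

    Assumes (not verified here) a dict with an 'infos' list at the top level.
    """
    with open(pkl_path, "rb") as handle:
        data = pickle.load(handle)
    if isinstance(data, dict):
        keys = sorted(data)
        infos = data.get("infos", [])
    else:
        keys = []
        infos = data
    return {"keys": keys, "num_infos": len(infos)}
```

For example, `summarize_infos("data/nuscenes/bevdetv2-nuscenes_infos_train.pkl")` returns the key names and the number of training samples. A count of zero would point to a problem in the data generation step.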
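Loading the dataset
In train.py, the dataset is built from the data.train entry of the config via build_dataset(cfg.data.train).
mmdet3d converts the raw nuScenes data so that all datasets share a unified format; this conversion is done offline.
With the mmdet3d version that BEVDet uses, the processed data keeps the same coordinate-system definition as the original nuScenes. If data processed by a newer mmdet3d is used, mAOE may come out very poor, because the data processed by the newer mmdet3d no longer matches the original nuScenes coordinate-system definition.
Model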
model = dict(
    type='BEVDet',
    img_backbone=dict(
        pretrained='torchvision://resnet50',
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=False,
        with_cp=True,
        style='pytorch'),
    img_neck=dict(
        type='CustomFPN',
        in_channels=[1024, 2048],
        out_channels=256,
        num_outs=1,
        start_level=0,
        out_ids=[0]),
    img_view_transformer=dict(
        type='LSSViewTransformer',
        grid_config=grid_config,
        input_size=data_config['input_size'],
        in_channels=256,
        out_channels=numC_Trans,
        downsample=16),
    img_bev_encoder_backbone=dict(
        type='CustomResNet',
        numC_input=numC_Trans,
        num_channels=[numC_Trans * 2, numC_Trans * 4, numC_Trans * 8]),
    img_bev_encoder_neck=dict(
        type='FPN_LSS',
        in_channels=numC_Trans * 8 + numC_Trans * 2,
        out_channels=256),
    pts_bbox_head=dict(
        type='CenterHead',
        in_channels=256,
        tasks=[
            dict(num_class=10,
                 class_names=['car', 'truck', 'construction_vehicle', 'bus',
                              'trailer', 'barrier', 'motorcycle', 'bicycle',
                              'pedestrian', 'traffic_cone']),
        ],
        common_heads=dict(
            reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2), vel=(2, 2)),
        share_conv_channel=64,
        bbox_coder=dict(
            type='CenterPointBBoxCoder',
            pc_range=point_cloud_range[:2],
            post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
            max_num=500,
            score_threshold=0.1,
            out_size_factor=8,
            voxel_size=voxel_size[:2],
            code_size=9),
        separate_head=dict(
            type='SeparateHead', init_bias=-2.19, final_kernel=3),
        loss_cls=dict(type='GaussianFocalLoss', reduction='mean'),
        loss_bbox=dict(type='L1Loss', reduction='mean', loss_weight=0.25),
        norm_bbox=True),
img_backbone: ResNet
# Location: /mmdet/models/backbones/resnet.py
img_backbone=dict(
    pretrained='torchvision://resnet50',
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(2, 3),
    frozen_stages=-1,
    norm_cfg=dict(type='BN', requires_grad=True),
    norm_eval=False,
    with_cp=True,
    style='pytorch'),
This is the ResNet from mmdet; mmdet3d's own ResNet is not used.
img_neck: CustomFPN
# Location: mmdet3d/models/necks/fpn.py
img_neck=dict(
    type='CustomFPN',
    in_channels=[1024, 2048],
    out_channels=256,
    num_outs=1,
    start_level=0,
    out_ids=[0]),
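The link between out_indices=(2, 3) in the backbone and in_channels=[1024, 2048] in the neck can be worked out by hand, which may also help with the channel-count question at the end of these notes. The sketch below uses the standard ResNet-50 stage widths and strides. The input size (256, 704) is an assumed value for data_config['input_size'], and the function name is my own.

```python
# (output channels, stride relative to the input) of the four ResNet-50 stages.
RESNET50_STAGES = [(256, 4), (512, 8), (1024, 16), (2048, 32)]


def backbone_outputs(input_size, out_indices):
    """Return (channels, height, width) for each selected backbone stage."""
    height, width = input_size
    shapes = []
    for index in out_indices:
        channels, stride = RESNET50_STAGES[index]
        shapes.append((channels, height // stride, width // stride))
    return shapes


# input_size=(256, 704) is an assumption, not taken from the notes above.
for shape in backbone_outputs((256, 704), out_indices=(2, 3)):
    print(shape)
# (1024, 16, 44)
# (2048, 8, 22)
```

Only the last two stages are passed on, so the neck sees 1024 and 2048 channels. With out_ids=[0] and out_channels=256 it then appears to emit a single 256-channel map at stride 16, which would agree with in_channels=256 and downsample=16 in the view transformer config.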
view_transformation: LSSViewTransformer
From the model definition:
img_view_transformer=dict(
    type='LSSViewTransformer',
    grid_config=grid_config,
    input_size=data_config['input_size'],
    in_channels=256,
    out_channels=numC_Trans,
    downsample=16),
# Located in mmdet3d/models/necks/view_transformer.py
forward()
Forward pass
Returns torch.tensor: Bird-eye-view feature in shape (B, C, H_BEV, W_BEV)
The input first goes through the depth estimation network to obtain the depth features:
x = self.depth_net(x)
# self.depth_net = nn.Conv2d(in_channels, self.D + self.out_channels, kernel_size=1, padding=0)
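Since depth_net outputs D + out_channels channels per pixel, the output has to be split into D depth scores and out_channels context features. Below is a minimal pure-Python sketch of that split for a single pixel, following the Lift-Splat-Shoot idea of weighting the features by a softmax depth distribution. It illustrates the idea and is not the project's implementation; the function name and toy sizes are my own.

```python
import math


def lift_pixel(depth_net_output, num_depth_bins):
    """Split one pixel's D + C channels and weight the C features by depth.

    Returns a list of D rows, each holding C depth-weighted feature values.
    """
    depth_logits = depth_net_output[:num_depth_bins]
    features = depth_net_output[num_depth_bins:]
    # Softmax over the depth bins (shifted by the max for numerical stability).
    peak = max(depth_logits)
    exps = [math.exp(value - peak) for value in depth_logits]
    total = sum(exps)
    depth_probs = [e / total for e in exps]
    # Outer product: every depth bin gets a scaled copy of the feature vector.
    return [[p * f for f in features] for p in depth_probs]


# Toy sizes (D=3 depth bins, C=2 feature channels), not the real config values.
frustum = lift_pixel([0.0, 1.0, 2.0, 0.5, -1.5], num_depth_bins=3)
print(len(frustum), len(frustum[0]))  # 3 2
```

Because the depth probabilities sum to 1, summing the rows gives back the original feature vector. The depth distribution only decides how the feature is spread along the camera ray.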
view_transform(self, input, depth, tran_feat)
view_transform_core(input, depth, tran_feat)
self.get_lidar_coor()
view_transform()
view_transform_core()
CustomResNet
# Location: mmdet3d/models/backbones/resnet.py
img_bev_encoder_backbone=dict(
    type='CustomResNet',
    numC_input=numC_Trans,  # 64
    num_channels=[numC_Trans * 2, numC_Trans * 4, numC_Trans * 8]),
FPN_LSS
# Location: mmdet3d/models/necks/lss_fpn.py
img_bev_encoder_neck=dict(
    type='FPN_LSS',
    in_channels=numC_Trans * 8 + numC_Trans * 2,  # 640
    out_channels=256),
CenterHead
# Location: mmdet3d/models/dense_heads/centerpoint_head.py
# pts -> Proposal, Tracking, Segmentation
pts_bbox_head=dict(
    type='CenterHead',
    in_channels=256,
    tasks=[
        dict(num_class=10,
             class_names=['car', 'truck', 'construction_vehicle', 'bus',
                          'trailer', 'barrier', 'motorcycle', 'bicycle',
                          'pedestrian', 'traffic_cone']),
    ],
    common_heads=dict(
        reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2), vel=(2, 2)),
    share_conv_channel=64,
    bbox_coder=dict(
        type='CenterPointBBoxCoder',
        pc_range=point_cloud_range[:2],
        post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
        max_num=500,
        score_threshold=0.1,
        out_size_factor=8,
        voxel_size=voxel_size[:2],
        code_size=9),
    separate_head=dict(
        type='SeparateHead', init_bias=-2.19, final_kernel=3),
    loss_cls=dict(type='GaussianFocalLoss', reduction='mean'),
    loss_bbox=dict(type='L1Loss', reduction='mean', loss_weight=0.25),
    norm_bbox=True),
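One detail of the head config is that common_heads predicts 10 regression channels while code_size is 9. As far as I understand the CenterPoint-style encoding, the two rot channels hold the sine and cosine of the yaw angle and collapse into one angle when boxes are decoded. The sketch below checks the arithmetic; the helper names are my own.

```python
import math

# (output channels, number of conv layers) per branch, as in common_heads above.
COMMON_HEADS = {'reg': (2, 2), 'height': (1, 2), 'dim': (3, 2),
                'rot': (2, 2), 'vel': (2, 2)}


def regression_channels(common_heads):
    """Total number of regression channels predicted per BEV location."""
    return sum(channels for channels, _num_conv in common_heads.values())


def decode_yaw(rot_sine, rot_cosine):
    """Turn the two 'rot' channels into a single yaw angle in radians."""
    return math.atan2(rot_sine, rot_cosine)


print(regression_channels(COMMON_HEADS))  # 10
```

After decoding, a box consists of x, y, z, three dimensions, one yaw angle and two velocity components, which is 9 values and matches code_size=9.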
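I have now read the code of the main model modules: the img view encoder, the view transformation, and the BEV encoder.
I roughly understand how the mmdet3d framework runs (config, pipeline, module, registry), but I have not fully figured out hooks yet, and for the detection head I probably still need some background knowledge: CenterPoint and heatmaps.
I set up the environment on a cloud server and ran the model; the output log makes it easy to see the overall model structure.
Question about the model structure:
The channel counts in the img encoder part of the paper's architecture figure do not match the code: the backbone is ResNet-50 and the neck is an FPN, yet the channel counts differ from the figure. Am I misunderstanding something?
The mmdet3d framework feels very powerful, but the learning curve is steep because it is so heavily encapsulated. I will study it in more depth.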