Multi-View 3D Object Detection Network for Autonomous Driving

Motivation:

We propose Multi-View 3D networks (MV3D), a sensory-fusion framework that takes both LIDAR point cloud and RGB images as input and predicts oriented 3D bounding boxes
【CC】A detection scheme that fuses LIDAR and camera

The main idea for utilizing multimodal information is to perform region-based feature fusion
【CC】Region-based feature fusion over multimodal data
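The fusion idea can be sketched as: ROI-pool a fixed-size feature for the same 3D proposal from each view, then join them element-wise. A minimal sketch assuming an element-wise-mean join (the join operation MV3D uses in its deep-fusion variant); the feature size of 512 and the variable names are illustrative, not from the authors' code.

```python
import numpy as np

def fuse_region_features(bv_feat, fv_feat, rgb_feat):
    """Element-wise mean over the per-view ROI features of one proposal."""
    return (bv_feat + fv_feat + rgb_feat) / 3.0

# Illustrative fixed-size ROI features for one 3D proposal
bv = np.ones(512)          # bird's eye view branch
fv = np.full(512, 2.0)     # front view branch
rgb = np.full(512, 3.0)    # RGB image branch

fused = fuse_region_features(bv, fv, rgb)
print(fused[0])  # 2.0
```

In the actual network this join is interleaved with fully connected layers rather than applied once at the end.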

Network Overview:

The network is composed of two subnetworks: one for 3D object proposal generation and another for multi-view feature fusion.
[Figure: MV3D architecture — 3D proposal network plus multi-view fusion network]
The 3D proposal network utilizes a bird's eye view representation of the point cloud to generate highly accurate 3D candidate boxes
The multi-view fusion network extracts region-wise features by projecting 3D proposals to the feature maps from multiple views
【CC】3D proposals are generated from the LIDAR BEV, then projected onto the LIDAR front view (FV) and the camera view; see the orange/red/blue/green regions in the figure. Feels somewhat like a two-stage approach??
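Projecting a proposal into the BEV feature map reduces to converting metric box extents into grid cells. A toy sketch assuming MV3D's 0.1 m BEV discretization; the box parameterization (center x, y and length/width in metres) and the helper name are illustrative assumptions.

```python
RES = 0.1  # metres per BEV cell (assumed discretization resolution)

def box3d_to_bev(x, y, l, w, res=RES):
    """Map an axis-aligned 3D box footprint to (x_min, y_min, x_max, y_max)
    in BEV cell coordinates. round() avoids float truncation artifacts."""
    x_min = round((x - l / 2) / res)
    x_max = round((x + l / 2) / res)
    y_min = round((y - w / 2) / res)
    y_max = round((y + w / 2) / res)
    return x_min, y_min, x_max, y_max

# A 4 m x 2 m footprint centred at (10 m, 5 m)
print(box3d_to_bev(10.0, 5.0, 4.0, 2.0))  # (80, 40, 120, 60)
```

The FV and RGB projections additionally need the LIDAR-to-camera calibration, which is omitted here.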

In our multi-view network, each view has the same architecture. The base network is built on the 16-layer VGG net with the following modifications:
【CC】All three views use the same backbone: a 16-layer VGG with slight modifications
• To handle extra-small objects, we insert a 2x bilinear upsampling layer before feeding the last convolution feature map to the 3D Proposal Network. Similarly, we insert a 4x/4x/2x upsampling layer before the ROI pooling layer for the BV/FV/RGB branch.
【CC】Bilinear-interpolation upsampling layers are added
• We remove the 4th pooling operation in the original VGG network, thus the convolution parts of our network proceed 8x downsampling.
【CC】The 4th pooling layer is removed, so the convolutional part downsamples by only 8x
• In the multi-view fusion network, we add an extra fully connected layer fc8 in addition to the original fc6 and fc7 layers.
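The effect of these modifications on feature-map resolution can be checked with simple stride arithmetic: removing the 4th pooling leaves an 8x conv downsample, and the inserted upsampling layers raise the effective resolution again. A sketch with an assumed BEV input width of 704 cells; the helper name is illustrative, not from the authors' code.

```python
def feature_map_size(input_size, downsample=8, upsample=1):
    """Spatial size after the conv downsample and an inserted upsample."""
    return input_size * upsample // downsample

bev_input = 704                                    # assumed BEV input width
conv_out = feature_map_size(bev_input)             # plain 8x downsample
proposal_in = feature_map_size(bev_input, 8, 2)    # 2x upsample -> proposal net
roi_in_bv = feature_map_size(bev_input, 8, 4)      # 4x upsample -> BV ROI pooling
print(conv_out, proposal_in, roi_in_bv)  # 88 176 352
```

With the original VGG (16x downsample) the proposal-net input would be half as large, which is why small objects motivated both changes.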
