SECOND: Sparsely Embedded Convolutional Detection
Paper: SECOND
The paper was published by 主线科技 (TrunkTech), the company founded by Bo Li after he left Baidu.
Abstract
Problems:
- Inference speed is slow
- Orientation estimation performs poorly
Methods:
- An improved sparse convolution method that significantly speeds up training and inference
- A novel angle loss regression method that improves orientation estimation
- A new data augmentation method that improves convergence speed and performance
Results:
- State-of-the-art performance on KITTI at the time
- Relatively fast inference: 20 FPS for the larger model and 40 FPS for the smaller one
Introduction
VoxelNet: a single-stage, end-to-end network with good performance, but very time-consuming; its inference speed is too slow.
- The paper introduces the SECOND method, which applies sparse convolution to address this problem, together with a GPU-based rule-generation algorithm for sparse convolution that further accelerates it.
- Another advantage of point clouds is that objects can be scaled, rotated, and translated by transforming their points directly, so the paper introduces a data augmentation method that significantly improves convergence speed and final performance.
- A novel angle loss regression method that resolves the large loss incurred when the difference between the ground-truth and predicted angles equals π.
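The effect of this loss can be sketched with scalars; the paper's actual loss is vectorized over all positive anchors, and the function names here are illustrative:

```python
import math

def smooth_l1(x, beta=1.0):
    # Standard smooth-L1 (Huber-style) loss on a scalar.
    ax = abs(x)
    return 0.5 * x * x / beta if ax < beta else ax - 0.5 * beta

def sine_error_loss(theta_gt, theta_pred):
    # SECOND regresses sin(theta_gt - theta_pred): when the two angles
    # differ by pi, sin(pi) = 0, so opposite-facing boxes are no longer
    # punished with a huge loss.
    return smooth_l1(math.sin(theta_gt - theta_pred))

print(sine_error_loss(math.pi, 0.0))  # effectively 0
print(sine_error_loss(0.5, 0.0))      # a normal positive loss
```

Because this loss treats opposite orientations as identical, the paper adds a separate direction classifier to recover the facing direction.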
Related Work
Front-View- and Image-Based Methods
Image-based methods: Monocular 3D object detection for autonomous driving; 3D bounding box estimation using deep learning and geometry
Front-view-based methods: Vehicle detection from 3D lidar using fully convolutional network
Bird’s-Eye-View-Based Methods
MV3D, Complex-YOLO, PIXOR
3D-Based Methods
VoteNet, Vote3deep, PointNet, PointNet++, PointCNN, 3D FCN, VoxelNet
Fusion-Based Methods
Deep sliding shapes for amodal 3D object detection in RGB-D images, MV3D, Joint 3D Proposal Generation and Object Detection from View Aggregation, F-PointNet
SECOND Detector
Network Architecture
The SECOND detector consists of:
- a voxelwise feature extractor
- a sparse convolutional middle layer
- an RPN
Point Cloud Grouping
Car detection: points are cropped at [−3, 1] × [−40, 40] × [0, 70.4] m along the z × y × x axes
Pedestrian and cyclist detection: points are cropped at [−3, 1] × [−20, 20] × [0, 48] m
Smaller model: only points within [−3, 1] × [−32, 32] × [0, 52.8] m are used
Voxel size: vD = 0.4 × vH = 0.2 × vW = 0.2 m
Max points per voxel: T = 35 for car detection, T = 45 for pedestrian and cyclist detection
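The cropping and grouping step can be sketched as follows; `group_points` and its argument layout are assumptions for illustration, not the paper's API:

```python
import numpy as np

def group_points(points, pc_range, voxel_size, max_points=35):
    # Assign each (x, y, z) point to a voxel; keep at most `max_points`
    # points per voxel, as in SECOND (T = 35 for cars, 45 otherwise).
    pc_range = np.asarray(pc_range, dtype=np.float64)   # [x0,y0,z0,x1,y1,z1]
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    grid = np.round((pc_range[3:] - pc_range[:3]) / voxel_size).astype(int)
    voxels = {}
    for p in points:
        if np.any(p < pc_range[:3]) or np.any(p >= pc_range[3:]):
            continue  # crop points outside the detection range
        idx = tuple(((p - pc_range[:3]) / voxel_size).astype(int))
        bucket = voxels.setdefault(idx, [])
        if len(bucket) < max_points:
            bucket.append(p)
    return grid, voxels

# Car-detection range and voxel size yield a 352 x 400 x 10 grid.
grid, voxels = group_points(np.array([[10.0, 0.0, 0.0]]),
                            [0, -40, -3, 70.4, 40, 1], [0.2, 0.2, 0.4])
print(grid)  # [352 400  10]
```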
Voxelwise Feature Extractor
This part follows VoxelNet's voxel feature encoding (VFE) layers.
The resulting feature tensor has dimensions:
Car: D' × H' × W' = 10 × 400 × 352
Pedestrian, cyclist: D' × H' × W' = 10 × 200 × 240
Small model: D' × H' × W' = 10 × 320 × 264
Sparse Convolutional Middle Extractor
Sparse Convolution Algorithm
See 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks for how sparse convolution is implemented.
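A toy 2D illustration of the gather-GEMM-scatter idea behind sparse convolution (here in submanifold style, producing outputs only at active sites): for each kernel offset, a "rule" lists which input features contribute to which outputs. All names are illustrative; the real implementation builds these rule tables on the GPU.

```python
import numpy as np

def sparse_conv2d(features, coords, weights):
    # features: (N, c_in) feature vectors of the N active sites.
    # coords:   list of N (x, y) integer coordinates.
    # weights:  (k, k, c_in, c_out) dense filter.
    k = weights.shape[0] // 2
    idx_of = {c: i for i, c in enumerate(coords)}
    out = np.zeros((len(coords), weights.shape[-1]))
    for dx in range(-k, k + 1):
        for dy in range(-k, k + 1):
            # Rule for this offset: (input index, output index) pairs.
            pairs = [(i, idx_of[(x + dx, y + dy)])
                     for (x, y), i in idx_of.items()
                     if (x + dx, y + dy) in idx_of]
            if not pairs:
                continue
            ins, outs = zip(*pairs)
            # Gather the inputs, multiply by this offset's filter slice
            # (GEMM), and scatter-add into the outputs.
            np.add.at(out, list(outs),
                      features[list(ins)] @ weights[dx + k, dy + k])
    return out
```

Because computation only touches active sites and each offset's work is one dense matrix multiply, this avoids convolving over the overwhelmingly empty voxel grid.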
Region Proposal Network
An RPN structure similar to SSD's.
Anchors and Targets
Car anchor: w = 1.6 × l = 3.9 × h = 1.56 m, centered at z = −1.0 m
Pedestrian anchor: w = 0.6 × l = 0.8 × h = 1.73 m, centered at z = −0.6 m
Cyclist anchor: w = 0.6 × l = 1.76 × h = 1.73 m, centered at z = −0.6 m
Car matching: positive above IoU 0.6, negative below 0.45, ignored in between
Pedestrian and cyclist matching: positive above IoU 0.5, negative below 0.35, ignored in between
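The two-threshold matching can be sketched as follows (function name and label encoding are illustrative):

```python
def assign_anchors(ious, pos_thresh=0.6, neg_thresh=0.45):
    # Per-anchor assignment as in SECOND: IoU above pos_thresh is a
    # positive, below neg_thresh a negative, in between is ignored
    # during training. Thresholds shown are the car-class values.
    labels = []
    for iou in ious:
        if iou >= pos_thresh:
            labels.append(1)    # positive anchor
        elif iou < neg_thresh:
            labels.append(0)    # negative anchor
        else:
            labels.append(-1)   # ignored
    return labels

print(assign_anchors([0.7, 0.5, 0.2]))  # [1, -1, 0]
```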
Training and Inference
Loss
Sine-Error Loss for Angle Regression
Focal Loss for Classification
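SECOND adopts the focal loss for classification with α = 0.25 and γ = 2; a scalar sketch:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t): the (1 - p_t)^gamma
    # factor down-weights easy, well-classified examples so the many
    # background anchors do not dominate training.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) contributes far less than a hard one (p = 0.1).
print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```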
Total Training Loss
Data Augmentation
Sample Ground Truths from the Database
- Build an offline database from the training set containing all ground-truth labels and their associated point cloud data
- During training, randomly select some ground-truth samples from this database and place them into the current training point cloud
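A minimal BEV sketch of the sampling step; the function names, box layout `(cx, cy, w, l)`, and axis-aligned collision test are simplifying assumptions:

```python
import random

def boxes_overlap(a, b):
    # Axis-aligned BEV overlap test; box = (cx, cy, w, l).
    return (abs(a[0] - b[0]) * 2 < a[2] + b[2] and
            abs(a[1] - b[1]) * 2 < a[3] + b[3])

def sample_ground_truths(scene_boxes, database, num_samples=10, seed=0):
    # Draw (box, points) entries from the offline database and paste only
    # the ones that do not collide with boxes already in the scene,
    # mirroring the paper's collision check after sampling.
    rng = random.Random(seed)
    added = []
    for box, pts in rng.sample(database, min(num_samples, len(database))):
        placed = scene_boxes + [b for b, _ in added]
        if all(not boxes_overlap(box, b) for b in placed):
            added.append((box, pts))
    return added
```

Pasting extra ground truths into every scene raises the number of positive anchors per iteration, which is what speeds up convergence.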
Object Noise
Each ground-truth box and its associated points are perturbed independently: a random rotation with ∆θ sampled uniformly from [−π/2, π/2] and a random linear translation sampled from a Gaussian distribution with zero mean and standard deviation 1.0.
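A minimal 2D (BEV) sketch of this per-object noise, assuming `points` are already grouped per object; the collision check the paper applies afterwards is omitted:

```python
import math
import random

def perturb_object(points, center, rng=random):
    # Rotate the object's points about its own center by
    # dtheta ~ U[-pi/2, pi/2] and translate them by Gaussian noise
    # (zero mean, std 1.0), as described above.
    dtheta = rng.uniform(-math.pi / 2, math.pi / 2)
    dx, dy = rng.gauss(0, 1.0), rng.gauss(0, 1.0)
    c, s = math.cos(dtheta), math.sin(dtheta)
    out = []
    for x, y in points:
        rx, ry = x - center[0], y - center[1]          # object frame
        out.append((center[0] + c * rx - s * ry + dx,  # rotate + translate
                    center[1] + s * rx + c * ry + dy))
    return out
```

Rotating about each object's own center keeps the object rigid: pairwise distances between its points are preserved.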
Global Rotation and Scaling
Global scaling and rotation are applied to the whole point cloud and all ground-truth boxes, with scaling sampled from the uniform distribution [0.95, 1.05] and rotation from [−π/4, π/4].
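A BEV sketch of the global augmentation, with an assumed box layout `(cx, cy, w, l, yaw)`; one scale and one rotation are shared by every point and box in the scene:

```python
import math
import random

def global_augment(points, boxes, rng=random):
    # One global scale ~ U[0.95, 1.05] and rotation ~ U[-pi/4, pi/4]
    # applied jointly to all points and ground-truth boxes.
    scale = rng.uniform(0.95, 1.05)
    theta = rng.uniform(-math.pi / 4, math.pi / 4)
    c, s = math.cos(theta), math.sin(theta)
    tf = lambda x, y: ((c * x - s * y) * scale, (s * x + c * y) * scale)
    new_points = [tf(x, y) for x, y in points]
    new_boxes = [tf(bx, by) + (bw * scale, bl * scale, byaw + theta)
                 for bx, by, bw, bl, byaw in boxes]
    return new_points, new_boxes
```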
Optimization
Adam optimizer, trained on a GTX 1080 Ti GPU for 160 epochs
Initial learning rate 0.0002, with an exponential decay factor of 0.8 applied every 15 epochs
Weight decay 0.0001, beta1 = 0.9, beta2 = 0.999
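The stated schedule is a step-wise exponential decay; as a formula:

```python
def learning_rate(epoch, base_lr=2e-4, decay=0.8, step=15):
    # Multiply the base learning rate by 0.8 every 15 epochs,
    # matching the training setup described above.
    return base_lr * decay ** (epoch // step)

print(learning_rate(0))   # 0.0002
print(learning_rate(15))  # 0.0002 * 0.8
```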
Network Details
RPN details:
Experiments
Training set: 3712 samples; evaluation set: 3769 samples
Difficulty levels: easy, moderate, and hard
Evaluation Using the KITTI Test Set
3D detection performance:
BEV detection performance:
Evaluation Using the KITTI Validation Set
3D detection performance:
BEV detection performance:
Ablation Studies
Sparse Convolution Performance
Sampling Ground Truths for Faster Convergence
Conclusions
- An improved sparse convolution method that significantly speeds up training and inference
- A novel angle loss regression method that improves orientation estimation
- A new data augmentation method that improves convergence speed and performance