【论文阅读】RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects-CSDN博客

本文链接：https://blog.csdn.net/ninnyyan/article/details/130511785

RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects

Abstract

Radar provides complementary information in the form of Doppler velocity.

propose a new solution that exploits both LiDAR and Radar sensors for perception.

1 Introduction

cameras capture rich appearance features, LiDAR provides direct and accurate 3D measurements.
Challenges:

sparsity of LiDAR measurements (e.g., at long range)
sensor’s sensitivity to weather (e.g., fog, rain and snow)
estimating their velocities is also of vital importance.
cameras and Lidar provides static information only.

Radar’s limitations:

data is very sparse (typically much more so than LiDAR)
the measurements are ambiguous in terms of position and velocity
the readings lack tangential information and often contain false positives

What this paper do:

design a novel neural network architecture, dubbed RadarNet, which can exploit both LiDAR and Radar to provide accurate detections and velocity estimates for the actors in the scene.
propose a multi-level fusion scheme that can fully exploit both geomet- ric and dynamic information of Radar data.

Steps:

first fuse Radar data with LiDAR point clouds via a novel voxel-based early fusion approach to leverage the Radar’s long sensing range.
Furthermore, after we get object detections, we fuse Radar data again via an attention-based late fusion approach to leverage the Radar’s velocity readings.

attention module transforming the 1D radial velocities from Radar to accurate 2D object velocity estimates.

2 Related Work

3 Review of LiDAR and Radar Sensors

LiDAR (light detection and ranging) sensors can be divided into three main types: spinning LiDAR(旋转激光雷达), solid state LiDAR(固态激光雷达), and flash LiDAR(闪光激光雷达).
In this paper we focus on the most common type: spinning LiDAR. This type of LiDAR emits and receives laser light pulses in 360◦ and exploits the time of flight (ToF) to calculate the distance to the obstacles.

radar outputs can be organized in three different levels:

raw data in the form of time-frequency spectrograms(频谱图)
clusters from applying DBSCAN [12] or CFAR [37] on raw data=> this paper use this, for its good balance between information richness and noise
tracks from performing object tracking on the clusters

denote Radar target:
Q = (q,v∥,m,t) => a vector

q = (x,y) : 2D position in BEV
v∥ : scalar value representing the radial velocity
m : binary value indicating whether the target is moving or not
t : timestamp

advantages and disadvantages of Radar:
good: instantaneous(瞬间) velocity measurements and is robust to various weather conditions.
bad: low resolution and thus it is difficult to detect small objects

attention:
It is also worth noting that the objects’ real-world velocities (2D vectors in BEV) are ambiguous given only the radial velocity. Therefore we need to additionally estimate the tangential velocity or the 2D velocity direction in order to properly utilize the radial velocity.

4 Exploiting LiDAR and Radar for Robust Perception

early fusion and late fusion:
early fusion learns joint representations from both sensor observations;
late fusion refines object velocities via an attention-based association and aggregation mechanism between object detections and Radar targets.
early fusion exploits the position and density information of Radar targets, late fusion is designed to explicitly exploit the Radar’s radial velocity evidence.

4.1 Exploiting Geometric Information via Early Fusion

LiDAR Voxel Representation
Radar Voxel Representation
Early Fusion: same BEV voxel size, concatenating them together along the channel dimension.

4.2 Detection Network

Backbone Network 骨干网
We adopt the same backbone network architecture as PnPNet.
composed of three initial convolution layers, three con- secutive multi-scale inception blocks [40], and a feature pyramid network.
Detection Header
a fully-convolutional detection header，consists of a classification branch and a regression branch.

4.3 Exploiting Dynamic Information via Late Fusion

two tasks:
(1) association of each Radar target with the correct object detection for velocity alignment;
(2) aggregation to combine the velocity estimates from detection and associated Radar targets robustly.

late fusion is performed on dynamic Radar targets only.

Pairwise Detection-Radar Association:
formulas in paper
Multi-Layer Perceptron (MLP)
Velocity Aggregation

4.4 Learning and Inference

NMS?

5 Experimental Evaluation

5.1 Datasets and Evaluation Metrics

nuScenes:
1 LiDAR and 5 Radars, with object labels at 2Hz
cars and motorcycles, as their velocities have high variance

parameters:

Average Precision (AP)
final AP is averaged over four different distance thresholds (0.5m, 1m, 2m and 4m)
Average Velocity Error (AVE)

DenseRadar:
a self-collected dataset to show the advantage of Radar over LiDAR is its longer sensing range.

parameters:

AP
Average Dynamic Velocity Error (ADVE) on dynamic objects only

5.2 Implementation Details

trained a two-class model on nuScenes with a shared backbone network and class- specific detection headers.
trianed a single-class model on DenseRadar.

5.3 Comparison with the State-of-the-Art

Precision output: Table 2. Comparison with the state-of-the-art on nuScenes validation set

Q: why CBGS performs so well, as it is Lidar-based function?

5.4 Ablation Study 消融研究

build a strong baseline with carefully designed heuristics
compare peformance of early fusion, late fusion, etc.

Table 3. Ablation study on nuScenes validation set
Table 4. Ablation study on DenseRadar validation set

5.5 Fine-Grained Analysis

5.6 Qualitative Results

Fig. 5

(1) the association is sparse in that only relevant Radar targets are associated;
(2) the association is quite robust to noisy locations of the Radar targets;
(3) the model captures the uncertainty of Radar targets very well.

6 Conclusion

proposed a new method to exploit Radar in combination with LiDAR for robust perception of dynamic objects in self-driving.

use a voxel(体素)-based early fusion approach.
propose an attention-based late fusion approach to exploit dynamic information.

dubbed RadarNet, features a voxel- based early fusion and an attention-based late fusion, which learn from data to exploit both geometric and dynamic information of Radar data.

exploiting Radar improves the perception capabilities of detecting faraway objects and understanding the motion of dynamic objects.

Words in paper

voxel |ˈväksəl| 体素
ambiguities 歧义
dubbed 配音
geometric 几何的
exploit 开发
sparsity 稀疏性
appealing 吸引人的
Doppler effect 多普勒效应
typically 通常
tangential 切向的
novel 新颖的
leverate 杠杆n，利用v
surpass 超越v
perceiving 知觉
notation 符号n
intuitions 直觉
consecutive 连续的
sweep 扫，扫荡，扫视v
dirt 污垢
exhaust plumes 排气羽流
electromagnetic 电磁
spectrograms 频谱图
instantaneous 瞬间
modulo 模数
aliasing 混叠
clutter 杂乱无章
modalities 方式
granularities 粒度
consecutive 连续的
inception 开始
pyramid 金字塔
by stride of 大步前进
regression 回归
back-projection 反射，反投影
exaggerated 夸张的
aggregation 聚合
nontrivial 不平凡
capped 封顶
omitted for brevity 为简洁省略
scalar 标量
post-processing 后期处理
kinematic 运动学
imbalance 不平衡
heuristic 启发式
counterpart 竞争对手
cross-validation 交叉验证
depicted 描绘的
temporal change 时间变化