Original note link: https://mp.weixin.qq.com/s?__biz=Mzg4MjgxMjgyMg==&mid=2247486705&idx=1&sn=2e9c8be25d079fcf9dca9a9b67a90651&chksm=cf51be08f826371edc3226955f1acff0bb560f5256611b43ea91f9db6a579d5313b97ec598e5#rd
↑ Open the link above to read the full note
CVPR 2023 | Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision
Reading notes on mmWave radar perception papers: CVPR 2023, Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision
0 Abstract
- This paper
  - proposes a novel approach to 4D radar-based scene flow estimation via cross-modal learning
- Motivation
  - Co-located sensing redundancy in modern autonomous vehicles
    ✅ provides various forms of supervision cues for radar scene flow estimation
- Methods
  - presents a multi-task model architecture for the identified cross-modal learning problem
  - proposes specific loss functions to engage scene flow estimation with multiple cross-modal constraints for effective model training
- Experiments
  - SOTA performance
  - proves effective in inferring more accurate 4D radar scene flow via cross-modal supervised learning
  - shown to be useful for two subtasks
    ✅ motion segmentation
    ✅ ego-motion estimation
- Code: https://github.com/Toytiny/CMFlow
1 Introduction
- Scene flow estimation
  - Definition of scene flow estimation
    ✅ obtaining a 3D motion vector field of the static and dynamic environment relative to an ego-agent
  - Importance of scene flow in the context of self-driving
    ✅ provides motion cues for various downstream tasks
- Current scene flow estimation approaches
  - rely on fully-supervised or weakly-supervised learning, or on self-supervised signals
  - Challenges of these approaches
    ✅ the labor-intensive process of annotating scene flow for supervised learning
    ✅ the often subpar performance of self-supervised learning methods
- Specific challenges in 4D radar scene flow learning
  - Rise of 4D automotive radars
    ✅ resistant to adverse conditions and able to measure object velocity
  - Complications with 4D radar point clouds
    ❌ sparsity and noise in the point clouds complicate the scene flow annotation process for supervised learning
- Solution: Hidden Gems
  - exploiting cross-modal supervision signals in autonomous vehicles
    ✅ modern autonomous vehicles are equipped with multiple sensors that provide complementary and redundant perception results
  - The authors use this co-located perception redundancy to provide multiple supervision cues that improve radar scene flow learning.
    🚩 The primary research question: how to retrieve and apply cross-modal supervision signals from co-located sensors on a vehicle to improve radar scene flow learning
    ✅ exploits useful supervision signals from the odometer (GPS/INS), LiDAR, and RGB camera
    ✅ Train: multi-modal data; Test: radar data only (see the sketch after this list)
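How this train/test asymmetry looks in code, as a minimal sketch: the cross-modal signals enter only through the loss terms, so the co-located sensors can be dropped at inference. The function names and the `cues` dictionary are illustrative assumptions, not CMFlow's actual API.

```python
import torch
import torch.nn as nn

def training_step(model: nn.Module,
                  radar_src: torch.Tensor,
                  radar_tgt: torch.Tensor,
                  cues: dict) -> torch.Tensor:
    """One cross-modal training step (hypothetical loss bookkeeping).

    The forward pass consumes radar only; odometer/LiDAR/camera cues
    appear purely as extra loss terms during training.
    """
    flow = model(radar_src, radar_tgt)       # predicted radar scene flow, (N, 3)
    loss = flow.new_zeros(())
    for loss_fn, target in cues.values():    # e.g. odometry / LiDAR / camera cues
        loss = loss + loss_fn(flow, target)
    return loss

@torch.no_grad()
def inference(model: nn.Module,
              radar_src: torch.Tensor,
              radar_tgt: torch.Tensor) -> torch.Tensor:
    """At test time only two consecutive radar point clouds are needed."""
    return model(radar_src, radar_tgt)
```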
- Contributions
  - 1. the first work on 4D radar scene flow learning using cross-modal supervision
  - 2. a multi-task model architecture and dedicated loss functions for this cross-modal learning problem
  - 3. SOTA performance, with demonstrated effectiveness in downstream tasks
2 Related Work
- Scene flow
  - Scene flow was first defined as the 3D uplift of optical flow
  - Traditional approaches estimate scene flow from RGB or RGB-D images
    ✅ based on prior-knowledge assumptions, or by training deep networks in a supervised or unsupervised way
  - Some methods directly infer point-wise scene flow from sparse 3D point clouds
    ✅ these methods may rely on online optimization
    ✅ DL-based methods are now dominant for point cloud-based scene flow estimation
- Deep scene flow on point clouds
  - Current SOTA methods leverage large amounts of data for training (supervised)
    ✅ fully-supervised training with GT flow: labor-intensive and costly scene flow annotation
    ✅ training on simulated datasets: may result in poor generalization
  - Self-supervised learning frameworks avoid the labor of annotation and the pitfalls of synthetic data
    ✅ exploit supervision signals from the input data itself
    ❌ performance is limited: no real labels are used to supervise the models
  - A trade-off between annotation effort and performance
    ✅ combine ego-motion with manually annotated background segmentation labels (ego-motion yields a free rigid-flow cue for static points; see the sketch below)
    ✅ ego-motion is easily obtained from odometry sensors
    ❌ however, the segmentation labels are still manually annotated and expensive
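To see why ego-motion is such a cheap cue: for every static point, the scene flow induced purely by ego-motion is fixed by the relative ego pose between the two frames, so odometry readings become free pseudo-labels once static points are identified. A minimal sketch, assuming a source-to-target pose convention (the function name is hypothetical):

```python
import numpy as np

def ego_motion_flow(coords_src: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Rigid flow that ego-motion induces on static points.

    coords_src: (N, 3) point coordinates in the source frame.
    R (3, 3) and t (3,): relative ego pose from the source frame to the
    target frame, e.g. read off a GPS/INS odometry sensor.
    Returns (N, 3) flow vectors f_i = R @ c_i + t - c_i.
    """
    return coords_src @ R.T + t - coords_src
```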
- Radar scene flow
  - Previous works cannot be directly extended to the sparse and noisy radar point clouds
    ❌ they mostly estimate scene flow on dense point clouds captured by LiDAR or rendered from stereo images
  - A recent work proposes a self-supervised pipeline for radar scene flow estimation
    ❌ however, the lack of real supervision signals limits its scene flow estimation performance
- This work's proposal
  - Solution to the supervision problem:
    ✅ retrieve supervision signals from co-located sensors automatically
    ✅ without resorting to any human intervention during training
  - Other modalities are required only during the training stage, not during inference.
3 Method
3.1 Problem Definition
Defines the task of scene flow estimation.
- Scene flow estimation
  - aims to solve for a motion field that describes the non-rigid transformations induced by both the motion of the ego-vehicle and the dynamic objects in the scene
- The inputs of point cloud-based scene flow: two consecutive point clouds
  - the source one $\mathbf{P}^s=\left\{\mathbf{p}_i^s=\left\{\mathbf{c}_i^s, \mathbf{x}_i^s\right\}\right\}_{i=1}^N$
  - the target one $\mathbf{P}^t=\left\{\mathbf{p}_i^t=\left\{\mathbf{c}_i^t, \mathbf{x}_i^t\right\}\right\}_{i=1}^M$
  - $\mathbf{c}_i^s, \mathbf{c}_i^t \in \mathbb{R}^3$: the 3D coordinates of each point
  - $\mathbf{x}_i^s, \mathbf{x}_i^t \in \mathbb{R}^C$: the raw features of each point
- The output: point-wise 3D motion vectors $\mathbf{F}$
  - $\mathbf{F}=\left\{\mathbf{f}_i \in \mathbb{R}^3\right\}_{i=1}^N$ aligns each point in $\mathbf{P}^s$ with its corresponding position $\mathbf{c}_i^{\prime}=\mathbf{c}_i^s+\mathbf{f}_i$ in the target frame
- Note: $\mathbf{P}^s$ and $\mathbf{P}^t$ are not necessarily the same size ($N \neq M$ in general), so there is no exact point-to-point correspondence between them (see the shape-level sketch below).
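To make the interface concrete, here is a shape-level sketch under the definitions above; the toy network body is a hypothetical stand-in, not CMFlow's actual architecture:

```python
import torch
import torch.nn as nn

class ToySceneFlowNet(nn.Module):
    """Maps (P^s, P^t) to per-source-point flow F of shape (N, 3)."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * (3 + feat_dim), hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, c_s, x_s, c_t, x_t):
        # c_s: (N, 3), x_s: (N, C); c_t: (M, 3), x_t: (M, C); N != M is allowed.
        tgt_global = torch.cat([c_t, x_t], dim=-1).mean(dim=0)   # (3 + C,) summary of P^t
        src = torch.cat([c_s, x_s], dim=-1)                      # (N, 3 + C)
        fused = torch.cat([src, tgt_global.expand(src.size(0), -1)], dim=-1)
        return self.mlp(fused)                                   # F: (N, 3)

# Usage: warp the source frame with the predicted flow, c'_i = c_i^s + f_i.
c_s, x_s = torch.randn(128, 3), torch.randn(128, 4)   # N = 128 source radar points
c_t, x_t = torch.randn(96, 3), torch.randn(96, 4)     # M = 96 target radar points
flow = ToySceneFlowNet(feat_dim=4)(c_s, x_s, c_t, x_t)
c_warped = c_s + flow
```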