[CVPR2021] Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation
- Overview
Transforms the front-view monocular image into a top-view road layout.
Proposes a cross-view transformation module that exploits the correlation between the two views plus cycle consistency to strengthen the view transformation.
Proposes a context-aware discriminator that further refines the results, bringing the spatial relationship between vehicles and the road surface into the vehicle occupancy estimation task.
- Method
Cross-view Transformation = Cycled View Projection (CVP) + Cross-View Transformer (CVT)
Cycled View Projection (CVP): first project front-view features to BEV with MLPs, then add cycled self-supervision by projecting the BEV features back to the front view, with a cycle loss constraining the difference between the two front-view features.
Cross-View Transformer (CVT):
Context-aware Discriminator:
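The CVP step described above can be sketched in PyTorch. This is a minimal illustration of the cycle idea, not the paper's implementation; the MLP sizes, the use of an L1 cycle loss, and the flattened-feature interface are all assumptions.

```python
# Hypothetical sketch of Cycled View Projection: MLPs map flattened
# front-view features to BEV and back, and a cycle loss ties the
# reconstructed front-view features to the input. Sizes are illustrative.
import torch
import torch.nn as nn

class CycledViewProjection(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        # MLP projecting front-view features to BEV features
        self.fv_to_bev = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # MLP projecting BEV features back to the front view (cycled branch)
        self.bev_to_fv = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x_fv):
        x_bev = self.fv_to_bev(x_fv)         # front view -> BEV
        x_fv_cycled = self.bev_to_fv(x_bev)  # BEV -> front view
        # cycle loss: reconstructed front-view features should match the input
        cycle_loss = nn.functional.l1_loss(x_fv_cycled, x_fv)
        return x_bev, x_fv_cycled, cycle_loss

cvp = CycledViewProjection(dim=512)
x = torch.randn(2, 512)
bev, fv_cycled, loss = cvp(x)
```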
- Experiments
Achieves state of the art on road layout estimation and vehicle occupancy estimation.
[CVPR2021] Categorical Depth Distribution Network for Monocular 3D Object Detection
- Overview
Prior methods that aid 3D detection by directly estimating depth are limited by the inaccuracy of depth estimation. This paper discretizes depth and proposes a monocular 3D object detector that estimates a categorical depth distribution for every pixel.
- Method
CaDDN's four modules: estimate a categorical depth distribution for every pixel of the input image to build Frustum Features; convert these to Voxel Features via the camera parameters and interpolated sampling; concatenate the z and c dimensions of the voxel grid and reduce the channels to get BEV features; finally run 3D detection on the BEV.
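The first and third steps of this pipeline can be sketched as follows. This is a simplified illustration: tensor sizes are made up, the voxel resampling step (which uses the camera calibration) is skipped, and the 1x1-conv channel reduction is an assumption about how the dimensionality is reduced.

```python
# Minimal sketch of two CaDDN steps: frustum features as the outer product
# of image features with a per-pixel categorical depth distribution, then
# collapsing the depth axis into channels for a BEV-style map.
# Shapes are illustrative, not the paper's.
import torch
import torch.nn as nn

B, C, H, W, D = 2, 16, 8, 10, 12           # batch, channels, height, width, depth bins

feat = torch.randn(B, C, H, W)             # image features
depth_logits = torch.randn(B, D, H, W)     # per-pixel depth-bin scores
depth_dist = depth_logits.softmax(dim=1)   # categorical depth distribution

# frustum features: weight each pixel's feature vector by its depth probabilities
frustum = feat.unsqueeze(2) * depth_dist.unsqueeze(1)   # (B, C, D, H, W)

# (CaDDN resamples frustum features into a voxel grid with the camera
#  calibration first; here we collapse the depth axis directly for brevity)
collapsed = frustum.flatten(1, 2)            # (B, C*D, H, W): concat z and c
reduce = nn.Conv2d(C * D, C, kernel_size=1)  # reduce channels -> BEV features
bev = reduce(collapsed)
```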
- Experiments
1st on KITTI & first monocular 3D detection results on Waymo
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
DETR3D uses several tricks to boost performance. First is the iterative refinement of object queries. Essentially, the predicted bbox centers in BEV are reprojected back to the images with the camera transformation matrices (intrinsics and extrinsics), and multi-camera image features are sampled and aggregated to refine the queries. This process can be repeated multiple times (6 in this paper) to boost performance.
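The 3D-to-2D sampling step above can be sketched as below. This is a simplified assumption-laden version: a single feature level per camera, 3x4 projection matrices (intrinsics @ extrinsics), bilinear sampling via `grid_sample`, and a plain average over valid cameras instead of the paper's exact aggregation.

```python
# Sketch of DETR3D-style 3D-to-2D feature sampling: project each query's
# 3D reference point into every camera, bilinearly sample features there,
# and average over the cameras where the point is visible.
import torch
import torch.nn.functional as F

def sample_multicam_features(ref_points, feats, proj_mats, img_hw):
    """ref_points: (Q, 3) 3D reference points (decoded query centers)
    feats: (N, C, Hf, Wf) per-camera feature maps
    proj_mats: (N, 3, 4) per-camera projection matrices
    img_hw: (H, W) of the original images"""
    Q = ref_points.shape[0]
    homo = torch.cat([ref_points, torch.ones(Q, 1)], dim=1)  # (Q, 4) homogeneous
    pts = torch.einsum('nij,qj->nqi', proj_mats, homo)       # (N, Q, 3)
    depth = pts[..., 2:3]                                    # (N, Q, 1)
    uv = pts[..., :2] / depth.clamp(min=1e-5)                # pixel coordinates
    H, W = img_hw
    # normalize pixel coordinates to [-1, 1] for grid_sample
    grid = torch.stack([uv[..., 0] / W * 2 - 1,
                        uv[..., 1] / H * 2 - 1], dim=-1)     # (N, Q, 2)
    sampled = F.grid_sample(feats, grid.unsqueeze(2),
                            align_corners=False)             # (N, C, Q, 1)
    # a point is valid only if it is in front of the camera and inside the image
    valid = (depth.squeeze(-1) > 1e-5) & (grid.abs() <= 1).all(dim=-1)  # (N, Q)
    sampled = sampled.squeeze(-1) * valid.unsqueeze(1)       # zero out invalid cams
    # average over valid cameras -> one feature vector per query
    return sampled.sum(0) / valid.sum(0).clamp(min=1).unsqueeze(0)  # (C, Q)
```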
The second trick is to use a pretrained mono3D network backbone to boost performance. Initialization seems to matter quite a lot for Transformer-based BEV perception networks.
End-to-End Object Detection with Transformers
The object query is an encoding of an anchor, and this anchor is fully learnable (all of its parameters are learned).
[CVPR2020] SampleNet: Differentiable Point Cloud Sampling
1. Overview:
The growing computational cost of larger point clouds can be mitigated by sampling the cloud before running the downstream task. Classic sampling methods ignore the downstream task, while existing task-aware sampling methods do not handle the non-differentiability of the sampling operation. The paper therefore proposes a differentiable relaxation of point cloud sampling that outputs a smaller point cloud optimized for the downstream task, outperforming all non-learned and learned sampling methods on point cloud classification, registration and reconstruction.
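The differentiable relaxation mentioned above can be illustrated with a soft-projection sketch. This is my own assumed rendering of the idea, not SampleNet's code: each point produced by the sampling network is replaced with a temperature-weighted average of its k nearest neighbors in the input cloud, which keeps the operation differentiable while pulling outputs toward real input points.

```python
# Sketch of a differentiable "soft projection": replace each generated point
# with a softmax-weighted average of its k nearest input points, so gradients
# flow through the (otherwise non-differentiable) sampling step.
import torch

def soft_project(generated, cloud, k=3, temperature=0.1):
    """generated: (M, 3) points produced by the sampling network
    cloud: (N, 3) original point cloud"""
    d2 = torch.cdist(generated, cloud).pow(2)        # (M, N) squared distances
    knn_d2, knn_idx = d2.topk(k, dim=1, largest=False)
    w = torch.softmax(-knn_d2 / temperature, dim=1)  # weights favor closer points
    neighbors = cloud[knn_idx]                       # (M, k, 3)
    return (w.unsqueeze(-1) * neighbors).sum(dim=1)  # (M, 3) soft-projected points
```

As the temperature goes to zero the weights collapse onto the single nearest neighbor, recovering hard sampling.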
2. Method: