GUPNet: A Monocular 3D Detection Network Based on Geometry Uncertainty Projection (ICCV 2021)


Author丨柒柒 @ Zhihu

Source丨https://zhuanlan.zhihu.com/p/397105796

Editor丨3D视觉工坊

Paper title: Geometry Uncertainty Projection Network for Monocular 3D Object Detection
Affiliations: The University of Sydney, SenseTime Computer Vision Group, and others
Paper: https://arxiv.org/pdf/2107.13774.pdf

The paper in one sentence:

It uses the geometric projection relationship to quantify the uncertainty of the estimated depth.

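1. The authors' starting point:

Existing methods with the projection model usually estimate the height of 2D and 3D bounding box first and then infer the depth via the projection formula.

The authors also illustrate this with a figure.

The figure shows that even a height estimation error of only 0.1 m can shift the projected depth by as much as 4 m.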

We can find that a slight bias (0.1m) of 3D heights could cause a significant shift (even 4m) in the projected depth.
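
The amplification can be checked with the pinhole relation d = f · h3d / h2d. The sketch below is only an illustration: the focal length (roughly that of a KITTI color camera), the 1.5 m object height, and the 60 m distance are assumed values, not numbers read from the paper's figure.

```python
def projected_depth(focal_px: float, h3d_m: float, h2d_px: float) -> float:
    """Depth from the pinhole projection model: d = f * h3d / h2d."""
    return focal_px * h3d_m / h2d_px


# All values below are assumptions chosen for illustration.
FOCAL_PX = 721.5     # focal length in pixels
TRUE_H3D_M = 1.5     # true physical height of the object
TRUE_DEPTH_M = 60.0  # true distance to the object

# 2D box height (pixels) that the object would have at the true depth.
h2d_px = FOCAL_PX * TRUE_H3D_M / TRUE_DEPTH_M

# Overestimate the 3D height by 0.1 m and project again.
biased_depth = projected_depth(FOCAL_PX, TRUE_H3D_M + 0.1, h2d_px)
depth_shift = biased_depth - TRUE_DEPTH_M

print(f"2D height: {h2d_px:.2f} px, depth shift: {depth_shift:.2f} m")
```

Because depth is linear in h3d, the shift equals d · Δh / h3d, which is 60 × 0.1 / 1.5 = 4 m here. The farther the object, the larger the shift for the same height error.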

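2. The first problem the authors address is inference reliability. Why does it matter? The reason was already given in point 1: "slight height bias → significant depth shift". In other words, uncertainty in the predicted height translates into uncertainty in the estimated depth.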

The first problem is inference reliability. A small quality change in the 3D height estimation would cause a large change in the depth estimation quality. This makes the model cannot predict reliable uncertainty or confidence easily, leading to uncontrollable outputs.

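3. The second problem is the stability of model training. The root cause is again inaccurate height prediction. Early in training, the predicted object heights are often far off, so the projected depths are far off as well. Such large errors make the network hard to train and hurt the final performance.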

Another problem is the instability of model training. In particular, at the beginning of the training phase, the estimation of 2D/3D height tends to be noisy, and the errors will be amplified and cause outrageous depth estimation. Consequently, the training process of the network will be misled, which will lead to the degradation of the final performance

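The overall design therefore targets two problems: inference reliability and training stability. Geometry Uncertainty Projection (GUP) handles inference reliability, and Hierarchical Task Learning (HTL) handles training stability. The pipeline can be read as:

Input 2D image → predict 2D and 3D boxes → GUP module refines the depth → detection results, as in the figure below.

Figure: network architecture

The backbone is the same as in typical monocular 3D detectors, so I will skip it. The rest of this note records how the two main modules, GUP and HTL, work.

First, the Geometry Uncertainty Projection (GUP) module. How does it differ from a conventional localization module? In short, earlier methods output a single depth value, whereas GUP outputs a depth value plus an uncertainty. The uncertainty indicates how reliable the depth value is, which addresses the inference reliability problem raised above.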

The overall module builds the projection process in the probability framework rather than single values so that the model can compute the theoretical uncertainty for the inferred depth, which can indicate the depth inference reliability and also be helpful for the depth learning.

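Concretely, the procedure is:

Predict the object's 3D height → project it to a depth value → predict a depth offset → combine the projected depth and the offset to get the final depth and its uncertainty.

a) Predict the object's 3D height.

To achieve this goal, we first assume the prediction of the 3D height for each object is a Laplace distribution. The distribution parameters are predicted by the 3D size stream in an end-to-end way. The average denotes the regression target output and the variation is the uncertainty of the inference

Next, how do we steer the network in the desired direction? That is the job of the loss function. The authors design a loss specifically for 3D height prediction, the negative log-likelihood of the Laplace distribution. Writing $\mu_h$ for the predicted mean, $\sigma_h$ for the predicted standard deviation, and $h_{3d}^{gt}$ for the ground-truth height:

$$L_{h3d} = \frac{\sqrt{2}}{\sigma_h}\,\lvert \mu_h - h_{3d}^{gt} \rvert + \log \sigma_h$$

From this expression, the loss is smallest when the mean equals the ground truth and the uncertainty shrinks toward 0. When the mean is off, a larger $\sigma_h$ reduces the first term, so the predicted uncertainty has to grow with the error.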
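
To make the behavior of this kind of loss concrete, here is a minimal sketch of a Laplace negative log-likelihood with additive constants dropped. The function name and the sample numbers are assumptions for illustration. Real implementations often predict log σ rather than σ for numerical stability, which this sketch does not do.

```python
import math


def laplace_nll(mu: float, sigma: float, target: float) -> float:
    """Laplace negative log-likelihood, constants dropped.

    sigma is the standard deviation, i.e. sqrt(2) times the Laplace scale.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.sqrt(2.0) / sigma * abs(mu - target) + math.log(sigma)


target = 1.50  # hypothetical ground-truth 3D height in metres

# At a fixed sigma, a more accurate mean gives a lower loss.
loss_far = laplace_nll(1.80, 0.2, target)
loss_near = laplace_nll(1.55, 0.2, target)

# At a fixed error, scan sigma on a coarse grid: the best value sits near
# sqrt(2) * |error|, so the learned uncertainty tracks the error size.
err = 0.3
sigmas = [0.05 * k for k in range(1, 41)]
best_sigma = min(sigmas, key=lambda s: laplace_nll(target + err, s, target))

print(f"loss_far={loss_far:.3f}, loss_near={loss_near:.3f}")
print(f"best sigma on grid={best_sigma:.2f}, sqrt(2)*err={math.sqrt(2) * err:.3f}")
```

Setting the derivative with respect to σ to zero gives σ = √2 · |μ − target|, which is why a network trained this way cannot drive σ to zero unless its mean prediction is also accurate.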

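b) Project to obtain the depth. Recovering depth from geometry is a well-worn topic, so I will not repeat it here. With focal length $f$, 3D height $h_{3d}$, and 2D height $h_{2d}$, the projection is:

$$d = \frac{f \cdot h_{3d}}{h_{2d}}$$

Substituting the 3D height predicted above gives the depth. Because the 3D height is modeled as Laplace-distributed and the projection only rescales it, the projected depth is Laplace-distributed as well. Denoting it $d_p$, with $\mu_h$ and $\lambda_h$ the mean and scale of the height distribution:

$$d_p = \frac{f \cdot (\lambda_h \cdot X + \mu_h)}{h_{2d}} = \frac{f \cdot \lambda_h}{h_{2d}} \cdot X + \frac{f \cdot \mu_h}{h_{2d}}$$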

Based on the learned height distribution, the depth distribution of the projection output can be approximated as above, where X is the standard Laplace distribution.
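
The scaling claim can be checked numerically. The sketch below samples 3D heights from a Laplace distribution, projects each sample, and compares the empirical mean and standard deviation of the depths with the closed-form values f · μ / h2d and f · σ / h2d. The camera and object numbers are assumptions, and the 2D height is treated as a fixed value.

```python
import math
import random


def sample_laplace(rng: random.Random, mean: float, std: float) -> float:
    """Draw one Laplace sample; the difference of two Exp(1) draws is Laplace(0, 1)."""
    scale = std / math.sqrt(2.0)
    return mean + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def project_height_distribution(focal_px, h2d_px, mu_h, sigma_h):
    """Closed-form mean and std of the projected depth for a fixed 2D height."""
    factor = focal_px / h2d_px
    return factor * mu_h, factor * sigma_h


rng = random.Random(0)
focal_px, h2d_px = 721.5, 36.0  # assumed camera focal length and 2D box height
mu_h, sigma_h = 1.5, 0.1        # assumed height prediction (mean, std) in metres

mu_p, sigma_p = project_height_distribution(focal_px, h2d_px, mu_h, sigma_h)

# Monte Carlo: project many sampled heights and measure the depth statistics.
depths = [
    focal_px * sample_laplace(rng, mu_h, sigma_h) / h2d_px
    for _ in range(100_000)
]
emp_mean = sum(depths) / len(depths)
emp_std = math.sqrt(sum((d - emp_mean) ** 2 for d in depths) / len(depths))

print(f"closed form: mean={mu_p:.3f}, std={sigma_p:.3f}")
print(f"sampled:     mean={emp_mean:.3f}, std={emp_std:.3f}")
```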

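c) Predict the depth offset. There is not much to say here: it simply adds one more layer of protection to the depth estimate, with its own uncertainty.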

We also assume that the learned bias is a Laplace distribution and independent with the projection one.

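The two are then simply added, with the mean and variance following the usual rules for a sum of independent variables: the means add and the variances add. Writing $\mu_p, \sigma_p$ for the projected depth and $\mu_b, \sigma_b$ for the learned bias:

$$\mu_d = \mu_p + \mu_b, \qquad \sigma_d = \sqrt{\sigma_p^2 + \sigma_b^2}$$

What do we want from this estimated depth? The same as for the predicted 3D height: its mean should approach the ground truth and its uncertainty should shrink toward 0. This gives the depth loss:

$$L_{depth} = \frac{\sqrt{2}}{\sigma_d}\,\lvert \mu_d - d^{gt} \rvert + \log \sigma_d$$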

The overall loss would push the projection results close to the ground truth and the gradient would affect the depth bias, the 2D height and the 3D height simultaneously. Besides, the uncertainty of 3D height and depth bias is also trained in the optimization process.
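
Putting the projection and the bias together, the depth branch can be sketched in a few lines. This is a scalar illustration under the independence assumption quoted above, not the authors' implementation. The names `DepthEstimate`, `gup_depth`, and `depth_loss`, and all numeric inputs, are made up for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class DepthEstimate:
    mean: float  # final depth in metres
    std: float   # uncertainty of that depth in metres


def gup_depth(focal_px, h2d_px, mu_h, sigma_h, mu_b, sigma_b) -> DepthEstimate:
    """Project the height distribution to depth, then add an independent bias."""
    factor = focal_px / h2d_px
    mu_p, sigma_p = factor * mu_h, factor * sigma_h
    # Means add; variances add because the two terms are assumed independent.
    return DepthEstimate(mean=mu_p + mu_b, std=math.hypot(sigma_p, sigma_b))


def depth_loss(est: DepthEstimate, depth_gt: float) -> float:
    """Laplace negative log-likelihood of the ground-truth depth, constants dropped."""
    return math.sqrt(2.0) / est.std * abs(est.mean - depth_gt) + math.log(est.std)


# The same height prediction (1.5 m, std 0.1 m) seen at two distances.
near = gup_depth(721.5, 72.15, 1.5, 0.1, mu_b=0.2, sigma_b=0.3)
far = gup_depth(721.5, 18.0375, 1.5, 0.1, mu_b=0.2, sigma_b=0.3)

print(f"near: depth={near.mean:.2f} m, std={near.std:.2f} m")
print(f"far:  depth={far.mean:.2f} m, std={far.std:.2f} m")
print(f"loss for the far object at gt 60 m: {depth_loss(far, 60.0):.3f}")
```

The same height uncertainty produces a much larger depth uncertainty for the distant object, because the factor f / h2d grows as the 2D box gets smaller.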

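That completes the first module, GUP.

Second, the Hierarchical Task Learning (HTL) module. As mentioned above, it addresses instability during training. The idea is simple: since training all components jointly is unstable, train them in stages. Each task gets its own loss weight, which controls how much it contributes at each point in training.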

The GUP module mainly addresses the error amplification effect in the inference stage. Yet, this effect also damages the training procedure. Specifically, at the beginning of the training, the prediction of both h2d and h3d are far from accurate, which will mislead the overall training and damage the performance. To tackle this problem, we design a Hierarchical Task Learning (HTL) to control weights for each task at each epoch.
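
One way to picture per-epoch task weighting is the sketch below. It is a simplified stand-in, not the schedule from the paper: the function `task_weight`, the power-law form, and the idea of feeding in a "how well trained" score in [0, 1] for each prerequisite task are all assumptions made for illustration. The exact definition of the weights and of the learning-progress measure should be taken from the paper itself.

```python
from typing import Sequence


def task_weight(epoch: int, total_epochs: int, prereq_scores: Sequence[float]) -> float:
    """Loss weight for one task at a given epoch (illustrative schedule).

    prereq_scores holds one value in [0, 1] per prerequisite task, where 1 means
    that prerequisite is considered well trained. A task with no prerequisites
    always gets weight 1.
    """
    if total_epochs <= 0 or not 0 <= epoch <= total_epochs:
        raise ValueError("need 0 <= epoch <= total_epochs and total_epochs > 0")
    alpha = 1.0
    for score in prereq_scores:
        alpha *= min(max(score, 0.0), 1.0)  # clamp each score to [0, 1]
    progress = epoch / total_epochs
    # alpha = 1 gives weight 1; alpha = 0 gives a weight that ramps up linearly.
    return progress ** (1.0 - alpha)


TOTAL_EPOCHS = 140  # arbitrary value for the example

# Hypothetical dependency: depth waits for the 2D height and 3D height tasks.
for epoch, scores in [(10, [0.1, 0.0]), (70, [0.8, 0.5]), (130, [1.0, 0.9])]:
    w_2d = task_weight(epoch, TOTAL_EPOCHS, [])
    w_depth = task_weight(epoch, TOTAL_EPOCHS, scores)
    print(f"epoch {epoch:3d}: 2D weight={w_2d:.2f}, depth weight={w_depth:.2f}")
```

With this shape, the depth loss contributes little while its inputs are still noisy and approaches full weight once they have stabilized or training is near its end.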

Experimental results:

KITTI test set

Nothing much to add here: as usual, the method improves over prior work.
