Survey Reading Notes: Deep Learning for Omnidirectional Vision

人工智睿

Last modified: 2022-06-21 17:16:29

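Categories: 文献拜读 (literature reading), 自动驾驶与三维重建 (autonomous driving and 3D reconstruction). Tags: deep learning, artificial intelligence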

First published: 2022-06-21 17:08:41

Article link: https://blog.csdn.net/weixin_44671418/article/details/125387788

Paper: Deep Learning for Omnidirectional Vision: A Survey and New Perspectives

Background

OmniDirectional image (ODI) data:

360° image
panoramic image
spherical image data

ODI advantages:

wide FoV of spherical imaging
rich geometric information
multiple projection types

ODI challenges:

severe distortion in the equirectangular projection (ERP) type
content discontinuities in the cubemap projection (CP) format

DL types:

CNN
RNN
GANs
GNN
ViT

DL-methods focus on four major aspects:

(I) convolutional filters used to extract features from the ODI data (omnidirectional video (ODV) can be considered as a temporal set of ODIs)
(II) network design, considering the number of inputs and the projection types
(III) novel learning strategies
(IV) practical applications

Omnidirectional Imaging

Coordinate

Spherical coordinates $(\rho,\theta,\phi)$: radius, longitude, and latitude
Cartesian coordinates $(x, y, z)$
Relation between the two:
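
A minimal sketch of the conversion, assuming $\theta$ is the longitude measured in the x-y plane and $\phi$ is the latitude measured from that plane. Axis conventions differ between papers, so this choice is an assumption, and the function names are hypothetical.

```python
import math


def spherical_to_cartesian(rho, theta, phi):
    """Convert (radius, longitude, latitude) to (x, y, z).

    Assumed convention: theta in [-pi, pi] around the z axis,
    phi in [-pi/2, pi/2] above or below the x-y plane.
    """
    x = rho * math.cos(phi) * math.cos(theta)
    y = rho * math.cos(phi) * math.sin(theta)
    z = rho * math.sin(phi)
    return x, y, z


def cartesian_to_spherical(x, y, z):
    """Inverse of spherical_to_cartesian for a non-zero point."""
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)
    # Clamp guards against rounding slightly outside [-1, 1].
    phi = math.asin(max(-1.0, min(1.0, z / rho)))
    return rho, theta, phi
```

With this convention, a point on the equator at longitude $\pi/2$ lies on the positive y axis.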

Spherical representation types:

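ERP: for a $w \times h$ image, the horizontal angular step is $\frac{2\pi}{w}$ and the vertical angular step is $\frac{\pi}{h}$
CP: six cube faces with 90° FoV each, face side length $w$, focal length $\frac{w}{2}$
- Equi-Angular Cubemap (EAC) projection, designed to address the non-uniform spatial sampling rate of CP
Tangent: establishes a one-to-one mapping between spherical coordinates and pixels on the tangent images
Icosahedron: subdivides the sphere into smaller faces than CP or ERP; the subdivision level must trade off accuracy against efficiency
- CNNs designed for it: “Spherephd: Applying cnns on a spherical polyhedron representation of 360° images,” CVPR, 2019, and “Spheresr,” CVPR, 2022
Transforming the sphere into a non-spatial domain
- 3D rotation group (SO(3))
- spherical Fourier transformation (SFT)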
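
To make the ERP layout concrete, here is a small sketch of the pixel-to-angle mapping. An ERP image spans $2\pi$ of longitude across its $w$ columns and $\pi$ of latitude across its $h$ rows. The function names, the pixel-centre offset of 0.5, and placing latitude $+\pi/2$ at the top row are assumptions made for illustration.

```python
import math


def erp_pixel_to_angles(u, v, w, h):
    """Map an ERP pixel index (u, v) to (longitude, latitude) in radians.

    Assumes pixel centres sit at index + 0.5, longitude runs from -pi
    (left edge) to +pi (right edge), and latitude runs from +pi/2 (top)
    to -pi/2 (bottom).
    """
    theta = (u + 0.5) * (2.0 * math.pi / w) - math.pi
    phi = math.pi / 2.0 - (v + 0.5) * (math.pi / h)
    return theta, phi


def angles_to_erp_pixel(theta, phi, w, h):
    """Inverse mapping; returns fractional pixel indices."""
    u = (theta + math.pi) / (2.0 * math.pi / w) - 0.5
    v = (math.pi / 2.0 - phi) / (math.pi / h) - 0.5
    return u, v
```

Fractional indices from `angles_to_erp_pixel` are what a bilinear sampler would consume when resampling between projections.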

CP and tangent images suffer from the following challenges:

higher computational cost
discrepancy
discontinuity

Spherical Stereo

The baseline between the two viewpoints is $\mathbf b=(\delta x, \delta y, \delta z)$
The resulting angular difference in spherical coordinates $(\rho,\theta,\phi)$ is $(\delta_\theta,\delta_\phi)$
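
As an illustration of how a baseline turns into an angular difference, the sketch below observes one 3D point from two viewpoints and subtracts the resulting angles. It assumes both viewpoints share the same orientation, that $\theta$ is longitude and $\phi$ is latitude, and that the difference is taken as second view minus first view; these conventions and the function names are assumptions, not taken from the survey.

```python
import math


def to_spherical(x, y, z):
    """Return (rho, theta, phi) for a non-zero point (longitude, latitude)."""
    rho = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)
    phi = math.asin(max(-1.0, min(1.0, z / rho)))
    return rho, theta, phi


def angular_disparity(point, baseline):
    """Angular difference (d_theta, d_phi) of one point seen from two views.

    point:    (x, y, z) in the first viewpoint's frame.
    baseline: position of the second viewpoint in the first frame.
    """
    _, theta1, phi1 = to_spherical(*point)
    shifted = tuple(p - b for p, b in zip(point, baseline))
    _, theta2, phi2 = to_spherical(*shifted)
    # Wrap the longitude difference into [-pi, pi).
    d_theta = (theta2 - theta1 + math.pi) % (2.0 * math.pi) - math.pi
    d_phi = phi2 - phi1
    return d_theta, d_phi
```

For a fixed baseline, the angular difference shrinks as the point moves farther away, which is what makes depth recoverable from it.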

Convolution Methods on ODI

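When a spherical image is projected back onto a plane, CNN adaptations that handle the distortion fall into two broad categories:

2D convolution on planar projections
spherical convolution in the spherical domain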

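2D convolution on planar projections:

For the ERP type
- Use regular convolution filters whose kernel size adapts to the spherical coordinates. However, as shown in Fig. 4(a), the regular convolution weights are shared only along each row, and the model cannot be trained from scratch. “Learning spherical convolution for fast features from 360° imagery,” in NIPS, 2017
- SphereNet is another representative approach: it handles ERP by directly adjusting the sampling grid locations of the convolution filters to achieve distortion invariance, and it can be trained end to end, as shown in Fig. 4(b). Panels (c) and (d) follow a similar idea
- There are also methods based on SO(3) and on weighted graph structures
Directly applying 2D CNNs to the CP and tangent types
- “Bifuse: Monocular 360 depth estimation via bi-projection fusion,” in CVPR, 2020.
- “360MonoDepth: High-resolution 360° monocular depth estimation,” in CVPR, 2022.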
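
The idea of adapting kernel size to spherical coordinates can be illustrated with the horizontal stretch of ERP, which grows as $1/\cos(\text{latitude})$ toward the poles. The sketch below is only that illustration, not the method of any cited paper; the function name, the odd-width rounding, and the `max_width` cap are assumptions.

```python
import math


def erp_row_kernel_width(row, height, base_width=3, max_width=None):
    """Horizontal kernel width (pixels) for one ERP row.

    The width is scaled by 1 / cos(latitude) so that the kernel covers
    roughly the same angular extent on the sphere at every row.
    """
    lat = math.pi / 2.0 - (row + 0.5) * math.pi / height
    stretch = 1.0 / max(math.cos(lat), 1e-6)  # avoid division by ~0 at poles
    width = base_width * stretch
    if max_width is not None:
        width = min(width, max_width)
    k = int(round(width))
    if k % 2 == 0:  # keep the kernel centred on a pixel
        k += 1
    return k
```

Near the equator the width stays at `base_width`, while rows close to the poles need much wider kernels, which is one reason a single shared planar kernel handles ERP poorly.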

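Spherical convolution in the spherical domain:

“Learning so(3) equivariant representations with spherical cnns,” in ECCV, 2018, proposed the first spherical CNN architecture; it defines the convolution filters in the spherical harmonic domain to address 3D rotation equivariance, which standard CNNs lack
Graph Convolutional Network
SpherePHD: designs the spherical polyhedron shown in panel (b)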

Dataset

[Figure: dataset overview]

The following covers only the areas I care about; the paper itself covers many more.

OMNIDIRECTIONAL VISION TASKS

Depth Completion:

“Cross-modal 360° depth completion and reconstruction for large-scale indoor environment,” IEEE Trans. Intell. Transp. Syst., 2022.

Proposes a representative two-stage framework for panoramic depth completion
“Bips: Bimodal indoor panorama synthesis via residual depth-aided adversarial learning,” arXiv, 2021
GAN-based framework

ODI Completion

…

View Synthesis

OmniNeRF
Pathdreamer (indoor): takes two nodes from a video walkthrough and generates views along a new path between them

Cross-view Synthesis and Geo-localization

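Cross-view synthesis aims to synthesize ground-view ODIs from satellite-view images, while geo-localization aims to match ground-view ODIs with satellite-view images to determine the relationship between them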
…

Lighting Estimation

…

Human Behavior Understanding

…

Stereo Matching

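Human binocular disparity depends on the difference between the projections on the two retinas, which are spherical rather than planar projections. Stereo matching on ODIs is therefore closer to the human visual system.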

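“A study on the influence of omnidirectional distortion on cnn-based stereo vision,” in VISIGRAPP, 2021.

Discusses the influence of omnidirectional distortion on CNN-based methods
and compares the quality of disparity maps predicted from perspective and omnidirectional stereo images. The experiments indicate that ODI-based stereo matching is more advantageous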

General stereo matching algorithms follow four steps:

matching cost computation
cost aggregation
disparity computation with optimization
disparity refinement
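
The four steps above can be sketched with a classical block-matching baseline on a rectified perspective pair. This is a generic illustration, not one of the learned methods in the survey; the choice of absolute difference as cost, a box filter for aggregation, winner-takes-all selection, and a 3x3 median for refinement are all assumptions, as are the function names.

```python
import numpy as np


def box_filter(img, radius):
    """Mean filter over a (2r+1) x (2r+1) window with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def block_matching(left, right, max_disp, radius=2):
    """Disparity map for the left image of a rectified grayscale pair."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    cost = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        # 1) Matching cost: |left(x) - right(x - d)| where x - d is valid.
        diff = np.abs(left[:, d:] - right[:, :w - d])
        # 2) Cost aggregation: average the cost over a local window.
        cost[d, :, d:] = box_filter(diff, radius)
    # 3) Disparity computation: winner takes all over the cost volume.
    disp = np.argmin(cost, axis=0).astype(np.float64)
    # 4) Disparity refinement: 3x3 median to suppress isolated outliers.
    padded = np.pad(disp, 1, mode="edge")
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return np.median(windows, axis=0)
```

Learned pipelines typically replace step 1 with learned features and steps 2 to 4 with a network operating on the cost volume, but the overall structure is the same.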

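SweepNet proposes a wide-baseline stereo system that computes matching cost maps from a pair of images captured by cameras with ultra-wide FoV lenses, and uses global sphere sweeping in the rig coordinate system to generate an omnidirectional depth map directly.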
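
The geometric core of sphere sweeping can be sketched as follows: rays from the rig origin are placed on spheres of candidate inverse depths, and each resulting 3D point is re-expressed as a viewing direction from a camera on the rig. This is a simplified illustration rather than SweepNet's implementation; it assumes the camera axes are aligned with the rig frame, and the function name is hypothetical.

```python
import numpy as np


def sphere_sweep_directions(rays, cam_center, inv_depths):
    """Viewing directions from one camera to points on the sweep spheres.

    rays:       (N, 3) unit rays from the rig origin.
    cam_center: (3,) camera position in rig coordinates
                (camera assumed to share the rig's orientation).
    inv_depths: (D,) positive candidate inverse depths.
    Returns an array of shape (D, N, 3) with unit direction vectors.
    """
    rays = np.asarray(rays, dtype=np.float64)
    inv_depths = np.asarray(inv_depths, dtype=np.float64)
    # Point on the sphere of radius 1 / inv_depth along each rig ray.
    points = rays[None, :, :] / inv_depths[:, None, None]
    rel = points - np.asarray(cam_center, dtype=np.float64)
    return rel / np.linalg.norm(rel, axis=-1, keepdims=True)
```

Sampling each camera's image (or feature map) at these directions for every candidate depth yields the stack of warped views from which matching costs are computed.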

OmniMVS takes four 220° FoV fisheye views as input to train an end-to-end DNN model and uses 3D encoder-decoder blocks to regularize the cost volume.

“End-to-end learning for omnidirectional stereo matching with uncertainty prior,” IEEE TPAMI, 2021, improves on OmniMVS with a regularization based on the uncertainty of prior guidance.

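ODV: omnidirectional video

Only a few methods exist in this research area; it is largely unexplored.

Since this is a time-dependent task, applying transformer mechanisms to ODV summarization may be beneficial.
In addition, previous work considers only the ERP format, which suffers from the most severe distortion. It would therefore be better to use CP, tangent projection, or a spherical format as the ODV input.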
