Motivation:
We propose a novel self-supervised monocular depth estimation method combining geometry with a new deep network.
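【CC】A self-supervised monocular depth estimation network that incorporates geometric properties (i.e., the view-frustum model described later).
Key innovations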
Our architecture leverages novel symmetrical packing and unpacking blocks to jointly learn to compress and decompress detail-preserving representations using 3D convolutions.
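【CC】New blocks are designed to ENCODE/DECODE (compress/decompress) features while preserving fine detail. Because they rely on 3D convolutions, they are unlikely to be fast. Could they be accelerated by feeding the "height" information into 2D conv blocks as extra channels instead?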
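To make the cost question concrete, here is a shape walk-through of a packing block, assuming the order Space2Depth, add a singleton dimension, Conv3D, fold the 3D filters back into channels, Conv2D. The function name `packing_shapes`, channels-first layout, "same" padding, and the values of `r`, `d`, and `c_out` are hypothetical choices for illustration, not taken from the paper.

```python
# Hypothetical shape bookkeeping for a packing block (no real convolutions run).
# Assumptions: channels-first tensors, downsampling factor r, a Conv3D with d
# output filters and "same" padding, and a final Conv2D to c_out channels.

def packing_shapes(c, h, w, r=2, d=8, c_out=64):
    """Return (step_name, shape) pairs after each stage of the packing block."""
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    steps = []
    s2d = (c * r * r, h // r, w // r)           # Space2Depth: r x r patches -> channels
    steps.append(("space2depth", s2d))
    steps.append(("unsqueeze", (1,) + s2d))      # singleton feature dim for Conv3D
    conv3d = (d,) + s2d                          # Conv3D treats the channel axis as depth
    steps.append(("conv3d", conv3d))
    flat = (d * s2d[0], s2d[1], s2d[2])          # merge the d filters into channels
    steps.append(("reshape", flat))
    steps.append(("conv2d", (c_out, s2d[1], s2d[2])))  # project to c_out channels
    return steps


for name, shape in packing_shapes(3, 64, 64, r=2, d=8, c_out=32):
    print(name, shape)
```

The "reshape" step shows why the 2D alternative in the note above is plausible: after folding, the Conv3D output is just `d * C * r * r` ordinary channels, so a 2D convolution over a wider channel axis could stand in for it, at the cost of losing the weight sharing along that axis.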
we address the problem of jointly estimating scene structure and camera motion across RGB image sequences using a self-supervised deep network
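【CC】Given an RGB image sequence as input, the network estimates in a self-supervised way both the scene structure (depth of the target frame) and the camera motion (rotation R and translation t).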
All, however, take advantage of constraints in monocular Structure-from-Motion (SfM) training that only allow the estimation of depth and pose up to an unknown scale factor, and rely on the ground-truth LiDAR measurements to scale their depth estimates appropriately for evaluation purposes.
【CC】Depth estimated under monocular SfM constraints still carries no scale information; this paper overcomes that by using the vehicle's speed.
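One way to inject scale from vehicle speed is to penalize the gap between the length of the predicted inter-frame translation and the distance actually driven, $|\,\lVert\hat{t}\rVert - |v|\,\Delta t\,|$. The sketch below assumes the speed `speed` (m/s) and frame gap `dt` (s) are known; the function and argument names are hypothetical.

```python
import math

def velocity_loss(t_hat, speed, dt):
    """Scale supervision from odometry: | ||t_hat|| - |speed| * dt |.

    t_hat : predicted translation (tx, ty, tz) between two frames (hypothetical name)
    speed : measured vehicle speed, e.g. in m/s
    dt    : time elapsed between the two frames, in s
    """
    predicted_distance = math.sqrt(sum(x * x for x in t_hat))
    travelled_distance = abs(speed) * dt
    return abs(predicted_distance - travelled_distance)


print(velocity_loss((3.0, 4.0, 0.0), speed=10.0, dt=0.5))  # 0.0: scale already metric
print(velocity_loss((0.0, 0.0, 1.0), speed=2.0, dt=1.0))   # 1.0: prediction too short
```

Only the translation norm is constrained, so direction and rotation are still learned from the photometric signal; the speed term pins the overall scale.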
Referenced papers
Unsupervised monocular depth estimation with left-right consistency
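【CC】A classic, pioneering work on self-supervised depth estimation: of the two images from the left and right cameras, one is used for prediction and the other for supervision.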
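The "one predicts, the other supervises" idea can be sketched on a single image row: the predicted left disparity warps the right image into a reconstruction of the left image, and the photometric difference to the real left image is the loss. This is a 1D simplification with linear interpolation and border clamping; the helper names are hypothetical and the left-right consistency term itself is omitted.

```python
import math

def sample_linear(row, xs):
    """Linearly interpolate a 1D row at fractional position xs, clamping at the borders."""
    xs = min(max(xs, 0.0), len(row) - 1)
    x0 = int(math.floor(xs))
    x1 = min(x0 + 1, len(row) - 1)
    w = xs - x0
    return (1.0 - w) * row[x0] + w * row[x1]

def reconstruct_left_row(right_row, left_disp_row):
    """Rebuild one left-image row by sampling the right image at x - d(x)."""
    return [sample_linear(right_row, x - d) for x, d in enumerate(left_disp_row)]

def l1_photometric(a, b):
    """Mean absolute difference between the reconstruction and the real image row."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


right = [0, 10, 20, 30, 40]
left_hat = reconstruct_left_row(right, [1.0] * 5)
print(left_hat)  # [0.0, 0.0, 10.0, 20.0, 30.0]
```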
Unsupervised learning of depth and ego-motion from video
【CC】Both the self-supervised architecture and the objective function of this paper build on the paper above.
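The core of that video-based self-supervision is view synthesis: a target pixel is back-projected with its predicted depth, moved by the predicted ego-motion, and re-projected into the source frame, $p_s \sim K\,(R\,D_t(p_t)\,K^{-1}p_t + t)$. A minimal pinhole-camera sketch for one pixel follows; packing intrinsics as `(fx, fy, cx, cy)` and the name `project_pixel` are assumptions for illustration.

```python
def project_pixel(u, v, depth, K, R, t):
    """Map target pixel (u, v) with predicted depth to source-view pixel coordinates.

    K = (fx, fy, cx, cy) pinhole intrinsics (hypothetical packing),
    R = 3x3 rotation as nested lists, t = 3-vector translation (target -> source).
    """
    fx, fy, cx, cy = K
    # Back-project: K^{-1} [u, v, 1] scaled by depth gives a 3D point in the target camera
    X = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigid motion into the source camera frame
    Xs = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective projection back onto the image plane
    return fx * Xs[0] / Xs[2] + cx, fy * Xs[1] / Xs[2] + cy


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(project_pixel(50, 50, 10.0, (100, 100, 50, 50), I3, (1, 0, 0)))  # (60.0, 50.0)
```

The source image sampled at these coordinates gives the synthesized target view that the photometric loss compares against.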
Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network
【CC】The Space2Depth/Depth2Space operations are borrowed from the paper above.
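Space2Depth folds each r × r spatial block into channels without discarding any values, and Depth2Space undoes it exactly, which is what lets the packing/unpacking blocks downsample without losing detail. A pure-Python sketch on `[C][H][W]` nested lists; the channel ordering `c*r*r + i*r + j` is a choice made here, and other implementations may order channels differently.

```python
def space2depth(x, r):
    """[C][H][W] -> [C*r*r][H/r][W/r]; out channel c*r*r + i*r + j holds offset (i, j)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    assert H % r == 0 and W % r == 0, "H and W must be divisible by r"
    return [
        [[x[c][y * r + i][xx * r + j] for xx in range(W // r)] for y in range(H // r)]
        for c in range(C) for i in range(r) for j in range(r)
    ]

def depth2space(x, r):
    """Exact inverse of space2depth: [C*r*r][H][W] -> [C][H*r][W*r]."""
    Crr, H, W = len(x), len(x[0]), len(x[0][0])
    assert Crr % (r * r) == 0, "channel count must be divisible by r*r"
    C = Crr // (r * r)
    return [
        [[x[c * r * r + (y % r) * r + (xx % r)][y // r][xx // r] for xx in range(W * r)]
         for y in range(H * r)]
        for c in range(C)
    ]


img = [[[1, 2], [3, 4]]]          # 1 channel, 2x2
print(space2depth(img, 2))        # [[[1]], [[2]], [[3]], [[4]]]
print(depth2space(space2depth(img, 2), 2) == img)  # True
```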
Self-Supervised Scale-Aware SfM
We learn to recover the inverse depth $f_d : I \to f_D^{-1}(I)$ instead, along with the ego-motion estimator $f_x$.
【CC】This is somewhat reminiscent of an autoencoder (e.g. a VAE).
The overall loss function is averaged per pixel, over pyramid scales, and over the image batch during training:
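A minimal sketch of that three-level averaging, assuming each image in the batch has one 2D per-pixel loss map per pyramid scale (resolutions differ across scales). The data layout and function names are hypothetical, and the individual loss terms themselves are not reproduced here.

```python
def mean2d(m):
    """Mean of a 2D per-pixel loss map given as a list of rows."""
    return sum(sum(row) for row in m) / sum(len(row) for row in m)

def total_loss(per_pixel_losses):
    """Average per pixel, then over pyramid scales, then over the batch.

    per_pixel_losses[b][s] is the 2D loss map of image b at scale s.
    """
    per_image = [
        sum(mean2d(m) for m in scales) / len(scales)   # per pixel, then per scale
        for scales in per_pixel_losses
    ]
    return sum(per_image) / len(per_image)             # over the batch


batch = [[[[1.0, 1.0], [1.0, 1.0]], [[3.0]]]]  # 1 image, scales of 2x2 and 1x1
print(total_loss(batch))  # 2.0
```

Averaging per pixel inside each scale first means every scale contributes equally, regardless of how many pixels it has; summing raw pixels across scales would instead let the full-resolution map dominate.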