UnDeepVO


Lebhoryi


Category: Monocular Depth Estimation

Article link: https://blog.csdn.net/weixin_37598106/article/details/90655000


0x00 Outline

paper: "UnDeepVO: Monocular Visual Odometry through Unsupervised Deep Learning" https://arxiv.org/abs/1709.06841
code: https://github.com/drmaj/UnDeepVO
Reference: "UnDeepVO: monocular visual odometry based on unsupervised deep learning" https://cloud.tencent.com/developer/news/210831

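The authors propose a new monocular VO system called UnDeepVO. Training uses stereo data (consecutive frames of stereo image pairs) and testing uses monocular images, which is why the title says "monocular".
Two points stand out:
- Unsupervised
- Scale recovery (I do not fully understand this yet)
The loss is built on dense temporal and spatial information: view synthesis between the left and right frames (the spatial loss) plus view synthesis between consecutive frames as a supervisory signal (the temporal loss).
Spatial geometric consistency is the reprojection constraint between corresponding points in a left-right image pair; temporal geometric consistency is the reprojection constraint between corresponding points across a monocular image sequence.
[16-17] use a photometric consistency loss to estimate depth, and [18] uses the photometric consistency of monocular image sequences to estimate pose and depth. The novelty of this paper is that it adds the 3D geometric registration loss.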
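Q: Why are r and t trained separately?

A: Rotation (represented as Euler angles) is highly nonlinear and harder to train than translation. To make unsupervised training work better, the translation and rotation parameters are decoupled by two separate groups of fully connected layers after the last convolutional layer. This makes it possible to introduce a weight that normalizes the rotation and translation predictions, which gives better predictions.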
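To make the answer concrete, here is a minimal NumPy sketch of a pose head that splits into two independent fully connected branches after the last convolutional feature map. The layer sizes, the initialization, and all function names are my own assumptions for illustration; they are not taken from the paper or the linked code.

```python
import numpy as np

def init_pose_heads(feat_dim, hidden=512, seed=0):
    """Create two independent FC stacks: one for translation, one for rotation.

    hidden=512 is a hypothetical size chosen only for this sketch.
    """
    rng = np.random.default_rng(seed)

    def fc(n_in, n_out):
        return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)

    return {
        "trans": [fc(feat_dim, hidden), fc(hidden, 3)],  # outputs (tx, ty, tz)
        "rot": [fc(feat_dim, hidden), fc(hidden, 3)],    # outputs 3 Euler angles
    }

def run_head(layers, x):
    """Apply a stack of FC layers with ReLU on every layer except the last."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def predict_pose(params, conv_features):
    """Flatten the shared conv features and feed them to both branches."""
    flat = conv_features.reshape(conv_features.shape[0], -1)
    translation = run_head(params["trans"], flat)
    rotation = run_head(params["rot"], flat)
    return translation, rotation
```

Because the two branches share no parameters after the convolutional trunk, the rotation and translation terms of a loss can be weighted separately without one output directly pulling on the other's final layers.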

Q: What is the difference between a spatial transformer and bilinear interpolation?
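As far as I understand [15], the two are not alternatives. A spatial transformer is a whole module (a network that predicts a transformation, a grid generator, and a sampler), while bilinear interpolation is the kernel the sampler typically uses to read the image at non-integer coordinates, which keeps the warp differentiable. The sketch below shows only that sampling step in NumPy; the function name and the border clamping are my own choices.

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Sample a 2-D image at float coordinates (xs, ys) with bilinear weights.

    Coordinates outside the image are clamped to the border.
    """
    h, w = img.shape
    xs = np.clip(xs, 0.0, w - 1.0)
    ys = np.clip(ys, 0.0, h - 1.0)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = xs - x0  # horizontal weight of the right neighbour
    wy = ys - y0  # vertical weight of the lower neighbour
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom
```

In a view-synthesis loss, the sampling coordinates come from depth and pose (or disparity), and a function like this produces the synthesized image that is compared with the real one.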

0x01 Recent related work & references to read

Feature-based methods (geometric):

[1] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, “MonoSLAM: Real-time single camera SLAM,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1052–1067, 2007.
[2] G. Klein and D. Murray, “Parallel tracking and mapping for small AR workspaces,” in Mixed and Augmented Reality, 2007. ISMAR 2007. 6th IEEE and ACM International Symposium on. IEEE, 2007, pp. 225–234.
[3] R. Mur-Artal, J. Montiel, and J. D. Tardos, “ORB-SLAM: a versatile and accurate monocular SLAM system,” IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.

Direct methods (geometric):

[4] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison, “DTAM: Dense tracking and mapping in real-time,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV). IEEE, 2011, pp. 2320–2327.
[5] J. Engel, T. Schöps, and D. Cremers, “LSD-SLAM: Large-scale direct monocular SLAM,” in European Conference on Computer Vision (ECCV). Springer, 2014, pp. 834–849.
[6] J. Engel, V. Koltun, and D. Cremers, “Direct sparse odometry,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.

Learning-based methods

The first paper is the pioneer; everything in the following list is supervised.

[7] A. Kendall, M. Grimes, and R. Cipolla, “PoseNet: A convolutional network for real-time 6-DOF camera relocalization,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 2938–2946.
[8] R. Li, Q. Liu, J. Gui, D. Gu, and H. Hu, “Indoor relocalization in challenging environments with dual-stream convolutional neural networks,” IEEE Transactions on Automation Science and Engineering, 2017.
*[9] R. Clark, S. Wang, A. Markham, N. Trigoni, and H. Wen, “VidLoc: 6-DoF video-clip relocalization,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[10] G. Costante, M. Mancini, P. Valigi, and T. A. Ciarfuglia, “Exploring representation learning with CNNs for frame-to-frame ego-motion estimation,” IEEE Robotics and Automation Letters, vol. 1, no. 1, pp. 18–25, 2016.
[11] S. Wang, R. Clark, H. Wen, and N. Trigoni, “DeepVO: Towards end-to-end visual odometry with deep recurrent convolutional neural networks,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 2043–2050.
[12] B. Ummenhofer, H. Zhou, J. Uhrig, N. Mayer, E. Ilg, A. Dosovitskiy, and T. Brox, “DeMoN: Depth and motion network for learning monocular stereo,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[13] R. Clark, S. Wang, H. Wen, A. Markham, and N. Trigoni, “VINet: Visual-Inertial odometry as a sequence-to-sequence learning problem.” in AAAI, 2017, pp. 3995–4001.
[14] S. Pillai and J. J. Leonard, “Towards visual ego-motion learning in robots,” arXiv preprint arXiv:1705.10279, 2017.

Unsupervised work has mostly focused on depth estimation, made possible by the spatial transformer image-warping technique [15]; only [18] deals with VO.
[15] M. Jaderberg, K. Simonyan, A. Zisserman, et al., “Spatial transformer networks,” in Advances in Neural Information Processing Systems, 2015, pp. 2017–2025.
[16] R. Garg, G. Carneiro, and I. Reid, “Unsupervised CNN for single view depth estimation: Geometry to the rescue,” in European Conference on Computer Vision (ECCV). Springer, 2016, pp. 740–756.
[17] C. Godard, O. Mac Aodha, and G. J. Brostow, “Unsupervised monocular depth estimation with left-right consistency,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[18] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe, “Unsupervised learning of depth and ego-motion from video,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

0x02 Network


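There are two networks, one estimating depth and one estimating pose; the pose network is VGG-based.
The spatial loss is built from the left and right images of a stereo pair together with their depth maps: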

PS: Specifically, for the overlapped area between two stereo images, every pixel in one image can find its correspondence in the other with horizontal distance Dp.
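The horizontal distance in the PS above follows from depth: for a rectified stereo pair, D_p = B * f / D_dep, with baseline B and focal length f in pixels. The sketch below converts a depth map to disparity, synthesizes the left image from the right one, and scores it with a plain L1 photometric error. The function names and the convention left(x) = right(x - D_p) are my assumptions, and I use L1 only, whereas the paper's photometric loss also includes an SSIM term.

```python
import numpy as np

def depth_to_disparity(depth, baseline, focal_px):
    """Horizontal pixel offset D_p = B * f / D_dep for a rectified stereo pair."""
    return baseline * focal_px / depth

def synthesize_left_from_right(right, disp_left):
    """Build a left view by sampling the right image at x - disparity.

    Uses 1-D linear interpolation along each row; samples are clamped at the border.
    """
    h, w = right.shape
    xs = np.arange(w, dtype=float)[None, :] - disp_left
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    wx = xs - x0
    rows = np.arange(h)[:, None]
    return (1 - wx) * right[rows, x0] + wx * right[rows, x1]

def photometric_l1(synth, target):
    """Mean absolute intensity difference between synthesized and real image."""
    return np.abs(synth - target).mean()
```

Since the baseline B is known in metric units during training, the depth that minimizes this error is tied to real scale, which seems to be the core of the scale-recovery idea.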

Photometric consistency loss between the left and right frames
Disparity consistency loss
Pose consistency loss
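Of the three spatial losses, the pose consistency loss is the easiest to write down: the pose predicted from the left image sequence and the pose predicted from the right image sequence should agree. Below is a minimal sketch with separate weights for the translation and rotation terms. The argument names, the L1 form, and the default weights are my assumptions, not values from the paper.

```python
import numpy as np

def pose_consistency_loss(t_left, r_left, t_right, r_right,
                          w_trans=1.0, w_rot=1.0):
    """Penalize disagreement between poses predicted from left and right sequences.

    t_* are (N, 3) translations, r_* are (N, 3) Euler angles.
    w_trans and w_rot are hypothetical weights balancing the two terms.
    """
    trans_term = np.abs(t_left - t_right).sum(axis=-1).mean()
    rot_term = np.abs(r_left - r_right).sum(axis=-1).mean()
    return w_trans * trans_term + w_rot * rot_term
```

For example, a 0.5 m disagreement in translation with `w_trans=1.0` plus a 0.01 rad disagreement in one angle with `w_rot=10.0` gives a loss of about 0.6, which shows how the rotation weight lifts the numerically small angle errors to a comparable magnitude.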

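The depth maps and the pose are combined into a rigid transformation, which gives the temporal loss:

Photometric consistency loss between consecutive frames: the image is synthesized from the pose with a spatial transformer. Q: what is the difference between the pose used here and a synthesized rigid flow?
3D Geometric Registration Loss: related to point clouds; I do not fully understand it yet.
A point in frame k can be mapped by the estimated transformation to a point in frame k+1; likewise, a point in frame k+1 can be mapped back to frame k.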
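The mapping between frames can be sketched with a pinhole camera model: back-project each pixel with its depth to a 3-D point, apply the rigid transformation between frame k and frame k+1, and either project the result back to pixels (for the photometric loss) or compare the point clouds directly (for the 3D geometric registration loss). The intrinsics K, the 4x4 transform T, and all function names below are assumptions for illustration, and the registration term assumes the two point clouds are already in pixel-wise correspondence.

```python
import numpy as np

def backproject(depth, K):
    """Lift every pixel (u, v) with depth d to a 3-D point d * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    rays = np.linalg.inv(K) @ pix
    return rays * depth.reshape(1, -1)  # shape (3, N)

def transform(points, T):
    """Apply a 4x4 rigid transformation to (3, N) points."""
    return T[:3, :3] @ points + T[:3, 3:4]

def project(points, K):
    """Project (3, N) camera-frame points to (2, N) pixel coordinates."""
    uvw = K @ points
    return uvw[:2] / uvw[2:3]

def registration_l1(points_a, points_b):
    """Mean absolute difference between two aligned point clouds."""
    return np.abs(points_a - points_b).mean()
```

With `P_k = backproject(depth_k, K)`, the call `project(transform(P_k, T), K)` gives the pixel locations in frame k+1 at which the image would be sampled, and `registration_l1(transform(P_k, T), P_k1)` compares the transformed cloud with the cloud built from frame k+1.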

0x03 The authors' experiments

A very interesting video:
Video: https://www.youtube.com/watch?v=5RdjO93wJqo&t
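Two frames are fed in at a time to estimate the pose; the image size is 416×128.
Baselines: SfMLearner [18], monocular VISO2-M and ORB-SLAM-M (without loop closure).
Evaluated on trajectory lengths of 100 m to 800 m; none of the methods use loop closure detection.
Quantitative evaluation: average translational root-mean-square error (RMSE) drift (%) and average rotational RMSE drift (°/100 m) on lengths of 100 m to 800 m.
Depth is evaluated both qualitatively and quantitatively; depth maps with real scale are obtained from the stereo pairs during training.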

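0x04 Personal summary

On top of pose and depth, the paper adds a new pose constraint and achieves real-scale monocular visual odometry in an unsupervised way.

Scratch that, no pose was really "added". UnDeepVO builds on Tinghui Zhou's 2017 work [18], changes the input from monocular to stereo, adds the 2017 left-right consistency loss [17], and fuses the two.
Zhou 2017 [18]: consecutive frames as input, producing pose and depth.

Left-right consistency 2017 [17].
I have not fully understood the network's inputs and outputs, or the newly added pose.

Being able to explain Figure 3 in my own words would mean I really understand it.
I also have not fully understood how real scale is recovered during training (+1).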
