In this work, we propose VLocNet, a new convolutional neural network architecture for 6-DoF global pose regression and odometry estimation from consecutive monocular images.
Our multitask model incorporates hard parameter sharing, thus being compact and enabling real-time inference, in addition to being end-to-end trainable.
We propose a novel loss function that utilizes auxiliary learning to leverage relative pose information during training, thereby constraining the search space to obtain consistent pose estimates.
Even our single-task model exceeds the performance of state-of-the-art deep architectures for global localization, while achieving competitive performance for visual odometry estimation.
Furthermore, we present extensive experimental evaluations utilizing our proposed Geometric Consistency Loss that show the effectiveness of multitask learning and demonstrate that our model is the first deep learning technique to be on par with, and in some cases outperform, state-of-the-art SIFT-based approaches.
From a robot’s learning perspective, it is uneconomical and unscalable to maintain multiple specialized single-task models, as they inhibit both inter-task and auxiliary learning. This has led to a recent surge in research on frameworks for learning unified models for a range of tasks across different domains.
An evident advantage is the resulting compact model size in comparison to having multiple task-specific models. Auxiliary learning approaches, on the other hand, aim at improving the prediction accuracy of a primary task by supervising the model to additionally learn a secondary task.
For instance, in the context of localization, humans often describe their location to each other with respect to some reference landmark in the scene, giving their position relative to it. Here, the primary task is to localize and the auxiliary task is to identify landmarks.
Similarly, we can leverage the complementary relative motion information from odometry to constrain the search space while training the global localization model.
In this work, we address the problem of global pose regression by simultaneously learning to estimate visual odometry as an auxiliary task. We propose the VLocNet architecture, consisting of a global pose regression sub-network and a Siamese-type relative pose estimation sub-network. Our network, based on the residual learning framework, takes two consecutive monocular images as input and jointly regresses the 6-DoF global pose as well as the 6-DoF relative pose between the images. We incorporate a hard parameter sharing scheme to learn inter-task correlations within the network and present a multitask alternating optimization strategy for learning shared features across the network. Furthermore, we devise a new loss function for global pose regression that incorporates the relative motion information during training and enforces the predicted poses to be geometrically consistent with respect to the true motion model.
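To make the hard parameter sharing and alternating optimization concrete, here is a minimal PyTorch-style sketch: both task heads reuse a shared trunk, and each training step alternates one gradient update per objective. The tiny trunk, head dimensions, and L1 pose loss are illustrative stand-ins, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Tiny stand-ins for the real ResNet-50 streams; `shared` represents the
# early residual blocks reused by both tasks (hard parameter sharing).
shared = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=7, stride=4), nn.ELU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
global_head = nn.Linear(16, 7)   # regresses p_t = [x_t, q_t]
odom_head = nn.Linear(32, 7)     # regresses p_{t,t-1} from both frames

def global_forward(img_t):
    return global_head(shared(img_t))

def odom_forward(img_t, img_t1):
    # Siamese streams: the same shared weights encode both frames.
    return odom_head(torch.cat([shared(img_t), shared(img_t1)], dim=1))

def pose_loss(pred, target):
    # Placeholder L1 pose loss; the paper weights the translational and
    # rotational components separately.
    return (pred - target).abs().mean()

opt_g = torch.optim.Adam(
    list(shared.parameters()) + list(global_head.parameters()), lr=1e-4)
opt_o = torch.optim.Adam(
    list(shared.parameters()) + list(odom_head.parameters()), lr=1e-4)

# One alternating-optimization step on dummy data: each objective takes
# its own gradient step, and both update the shared blocks.
img_t, img_t1 = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
p_t_gt, p_rel_gt = torch.randn(2, 7), torch.randn(2, 7)

loss_g = pose_loss(global_forward(img_t), p_t_gt)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

loss_o = pose_loss(odom_forward(img_t, img_t1), p_rel_gt)
opt_o.zero_grad(); loss_o.backward(); opt_o.step()
```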
We present extensive experimental evaluations on both indoor and outdoor datasets comparing the proposed method to state-of-the-art approaches for global pose regression and visual odometry estimation. We empirically show that our proposed VLocNet architecture achieves state-of-the-art performance compared to existing CNN-based techniques. To the best of our knowledge, our presented approach is the first deep learning-based localization method to perform on par with local feature-based techniques. Moreover, our work is the first to show that a joint multitask model can outperform its task-specific counterparts in both accuracy and efficiency for global pose regression and visual odometry estimation.
The primary objective of the proposed framework is to accurately estimate the global pose by minimizing the proposed Geometric Consistency Loss function, while exploiting the relative motion between two consecutive frames to constrain the search space of global localization. We formulate this as an auxiliary learning problem, with estimating the global pose as the primary objective and estimating the relative motion as the secondary objective. The features learned through relative motion estimation are then used by the global pose regression part to learn descriptors that are more discriminative across different scenes.
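As a rough sketch of what such a Geometric Consistency Loss can look like, the function below combines an absolute-pose term with a term penalizing disagreement between the relative motion implied by consecutive predictions and the ground-truth relative motion. The fixed `beta`/`gamma` weights and the translation-only consistency term are simplifications of the paper's full formulation.

```python
import torch

def geometric_consistency_loss(x_t, q_t, x_gt, q_gt, x_prev, x_rel_gt,
                               beta=1.0, gamma=1.0):
    # Absolute-pose term: Euclidean error on translation and rotation.
    loss_abs = (x_t - x_gt).norm(dim=-1).mean() \
        + beta * (q_t - q_gt).norm(dim=-1).mean()

    # Consistency term: the relative motion implied by the predictions at
    # t and t-1 should agree with the ground-truth relative motion.  Shown
    # for translation only; expressing it in the frame of the previous
    # pose would additionally require rotating by the quaternion at t-1.
    loss_rel = gamma * ((x_t - x_prev) - x_rel_gt).norm(dim=-1).mean()

    return loss_abs + loss_rel

# Dummy usage with a batch of 2 predictions.
B = 2
loss = geometric_consistency_loss(
    torch.randn(B, 3), torch.randn(B, 4),   # predicted x_t, q_t
    torch.randn(B, 3), torch.randn(B, 4),   # ground-truth x_t, q_t
    torch.randn(B, 3), torch.randn(B, 3))   # predicted x_{t-1}, gt relative
```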
The proposed framework consists of a three-stream neural network: one stream performs global pose regression, and the other two form the Siamese-type odometry estimation network. The overall architecture is shown in Fig. 1. Given a pair of consecutive images $(I_t, I_{t-1})$, the network predicts the global poses of both images, $p_t = [x_t, q_t]$ and $p_{t-1} = [x_{t-1}, q_{t-1}]$, as well as the relative pose $p_{t,t-1} = [x_{t,t-1}, q_{t,t-1}]$. The global pose regression stream takes the image $I_t$ as input, while the Siamese odometry streams take the consecutive pair $(I_t, I_{t-1})$.
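To fix notation, the predictions can be summarized as follows, assuming (as in PoseNet-style formulations) that each pose consists of a translation vector and a unit quaternion:

$$
(\hat{p}_t,\ \hat{p}_{t-1},\ \hat{p}_{t,t-1}) = f_\theta(I_t, I_{t-1}), \qquad p = [x, q],\quad x \in \mathbb{R}^3,\ q \in \mathbb{R}^4,\ \lVert q \rVert = 1 .
$$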
A. Global Pose Regression
The global localization sub-network takes as input the image $I_t$ and the previously predicted pose $\hat{p}_{t-1}$, and outputs the new prediction of the current pose $\hat{p}_t$.
1) Sub-network architecture: The sub-network is built upon ResNet-50 and is identical to it up to the last average pooling layer, comprising five residual blocks with multiple residual units, where each unit has a bottleneck structure of three convolutional layers, each followed by batch normalization, a scale layer, and Rectified Linear Units (ReLUs). We modify the residual units by replacing the ReLUs with Exponential Linear Units (ELUs), which reduce the bias shift in the neurons, avoid the vanishing gradient problem, and lead to faster convergence. We replace the last average pooling layer with global average pooling and append three inner-product layers $fc_1$, $fc_2$, and $fc_3$ after it.
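A condensed PyTorch sketch of this sub-network is given below: a ResNet-50 trunk up to global average pooling with every ReLU swapped for an ELU, followed by three inner-product layers. The head dimensions and the split into translation/rotation outputs are assumptions in the PoseNet style, and the previous-pose input $\hat{p}_{t-1}$ is omitted for brevity.

```python
import torch
import torch.nn as nn
from torchvision import models

class GlobalPoseRegressor(nn.Module):
    """Sketch of the global pose sub-network described above."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Drop the classifier; keep the conv stem, the five residual
        # stages, and the (now global) average pooling.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self._relu_to_elu(self.features)
        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, 3)   # assumed: translation x
        self.fc3 = nn.Linear(1024, 4)   # assumed: rotation q (quaternion)

    def _relu_to_elu(self, module):
        # Recursively replace ReLUs with ELUs, as described above.
        for name, child in module.named_children():
            if isinstance(child, nn.ReLU):
                setattr(module, name, nn.ELU(inplace=True))
            else:
                self._relu_to_elu(child)

    def forward(self, img):
        feat = torch.flatten(self.features(img), 1)
        feat = nn.functional.elu(self.fc1(feat))
        return self.fc2(feat), self.fc3(feat)

model = GlobalPoseRegressor()
x, q = model(torch.randn(1, 3, 224, 224))  # x: (1, 3), q: (1, 4)
```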