TVCG-2019-HeteroFusion: Dense Scene Reconstruction Integrating Multi-sensors

Abstract

  • We present a novel approach to integrate data from multiple sensor types for dense 3D reconstruction of indoor scenes in real time.
  • Existing algorithms are mainly based on a single RGBD camera and thus require continuous scanning of areas with sufficient geometric features. Otherwise, tracking may fail due to unreliable frame registration.
  • Inspired by the fact that the fusion of multiple sensors can combine their strengths towards a more robust and accurate self-localization, we incorporate multiple types of sensors which are prevalent in modern robot systems, including a 2D range sensor, an inertial measurement unit (IMU), and wheel encoders. We fuse their measurements to reinforce the tracking process and to eventually obtain better 3D reconstructions.
  • Specifically, we develop a 2D truncated signed distance field (TSDF) volume representation for the integration and ray-casting of laser frames, leading to a unified cost function in the pose estimation stage (a minimal sketch of this idea follows the abstract).
  • For validation of the estimated poses in the loop-closure optimization process, we train a classifier on features extracted from the heterogeneous sensors during the registration process. To evaluate our method on challenging use-case scenarios, we assembled a scanning platform prototype to acquire real-world scans. We further simulated synthetic scans based on high-fidelity synthetic scenes for quantitative evaluation.
  • Extensive experimental evaluation on these two types of scans demonstrates that our system is capable of robustly acquiring dense 3D reconstructions and outperforms state-of-the-art RGBD and LiDAR systems.
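
The bullet above on the 2D TSDF is the core representational change. Below is a minimal Python sketch, under assumed grid parameters and a simple projective-distance update, of how a horizontal laser frame could be fused into a planar TSDF and how the interpolated field then acts as the laser term of a unified registration cost. The class `Tsdf2D`, the function `laser_tsdf_cost`, and all parameter values are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

class Tsdf2D:
    """Minimal 2D truncated signed distance field on a regular grid.
    Cell (row i, col j) covers world point origin + (j * res, i * res)."""

    def __init__(self, size=(400, 400), res=0.05, trunc=0.3, origin=(-10.0, -10.0)):
        self.res, self.trunc = res, trunc
        self.origin = np.asarray(origin, dtype=float)
        self.D = np.full(size, trunc)   # truncated signed distances
        self.W = np.zeros(size)         # integration weights

    def world_to_grid(self, pts):
        return (np.asarray(pts, dtype=float) - self.origin) / self.res

    def integrate_scan(self, pose, ranges, angles):
        """Fuse one laser frame taken from pose = (x, y, theta) in the world frame."""
        x, y, th = pose
        for r, a in zip(ranges, angles):
            if not np.isfinite(r):
                continue
            # sample the beam only inside the truncation band around the hit point
            for d in np.arange(max(r - self.trunc, 0.0), r + self.trunc, self.res):
                px = x + d * np.cos(th + a)
                py = y + d * np.sin(th + a)
                gx, gy = self.world_to_grid((px, py))
                i, j = int(round(gy)), int(round(gx))
                if 0 <= i < self.D.shape[0] and 0 <= j < self.D.shape[1]:
                    sdf = np.clip(r - d, -self.trunc, self.trunc)  # projective signed distance
                    self.D[i, j] = (self.W[i, j] * self.D[i, j] + sdf) / (self.W[i, j] + 1.0)
                    self.W[i, j] += 1.0

    def interpolate(self, pts):
        """Bilinear TSDF lookup for an (N, 2) array of world points."""
        g = self.world_to_grid(pts)
        x0, y0 = np.floor(g[:, 0]).astype(int), np.floor(g[:, 1]).astype(int)
        fx, fy = g[:, 0] - x0, g[:, 1] - y0
        def val(i, j):  # clamped grid access, row index from y, column from x
            return self.D[np.clip(j, 0, self.D.shape[0] - 1),
                          np.clip(i, 0, self.D.shape[1] - 1)]
        return ((1 - fx) * (1 - fy) * val(x0, y0) + fx * (1 - fy) * val(x0 + 1, y0)
                + (1 - fx) * fy * val(x0, y0 + 1) + fx * fy * val(x0 + 1, y0 + 1))

def laser_tsdf_cost(tsdf, pose, ranges, angles):
    """Laser term of a unified pose cost: scan endpoints transformed by the
    candidate pose should lie on the TSDF zero crossing, so the summed squared
    field values can be minimized jointly with the RGBD frame-to-model terms."""
    ranges, angles = np.asarray(ranges, dtype=float), np.asarray(angles, dtype=float)
    x, y, th = pose
    ok = np.isfinite(ranges)
    ex = x + ranges[ok] * np.cos(th + angles[ok])
    ey = y + ranges[ok] * np.sin(th + angles[ok])
    return float(np.sum(tsdf.interpolate(np.stack([ex, ey], axis=1)) ** 2))
```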

1. Introduction

  • The continuing development of 3D reconstruction systems [1], [2], [3], [4] and the growth of available dense scenes [5], [6] have significantly improved modern scene understanding and manipulation techniques [7], [8].
  • However, the current data acquisition process still heavily relies on experienced users to hold and smoothly move the RGBD camera, which involves high labor costs.
  • To support cost-effective mass acquisition of 3D scenes, delegating scanning missions to robots is highly desirable.
  • To achieve an automatic acquisition and reconstruction scheme, these modern reconstruction algorithms must also be improved to cooperate with modern motion planning strategies [9], [10], since their performance may decline significantly when deployed for vehicle scanning instead of hand-held scanning.
  • This is essentially due to the reduced degrees of freedom of camera motion, which prevent the RGBD sensor from continuously focusing on regions with sufficient geometric detail; i.e., robots are not as flexible as humans at staying in good shooting views that contain sufficient registration hints for localization and mapping.
  • For example, when crossing through different regions of interest in indoor scenes, the robot may choose a relatively clear path, where currently available commodity depth sensors, with their short precise scanning range and narrow field of view, cannot capture sufficient registration hints for tracking.
  • In robotics, with the aim of enhancing localization, pioneering work integrates multiple sensors for a wide range of robotic perception tasks, as summarized in [11], which shows a feasible strategy to solve the above-mentioned challenge.
  • For indoor scenarios, 2D laser scanners are the preferable choice for localizing the chassis of wheeled robots on account of both cost and effectiveness.
  • Gmapping [12] and the state-of-the-art Cartographer [13] are two influential systems coupling laser frames, inertial measurements, and wheel encoders for reconstructing planar occupancy maps.
  • Recently, several systems considering both visual and laser information were also proposed for a wide variety of scenarios such as Unmanned Aerial Vehicles (UAVs) [14], autonomous driving vehicles [15], [16], and indoor positioning tasks [17], [18].
  • In the field of reconstruction, by comparison, generating high-fidelity dense representations requires both an appropriate data structure for precisely fusing multiple measurements and higher localization accuracy. Current multi-sensor fusion methods [11], however, do not place these sensors in a unified optimization process, so the advantages of the different sensor types are under-exploited in both pose estimation and loop-closure handling.
  • Inspired by the general idea that multi-sensor fusion is beneficial for promoting the quality and robustness of localization and mapping [11], we present a robust real-time dense reconstruction system coupling information from an RGBD camera, a horizontally placed 2D laser scanner, an inertial measurement unit (IMU), and wheel encoders, which are typically mounted on indoor robots. The main contributions of this paper are:
    1. We present a novel real-time dense scene reconstruction system for robotic scanning using multimodal sensors, which outperforms the generic way of multi-sensor fusion [11], [19]. Specifically, we replace the occupancy grid with a truncated signed distance field (TSDF) representation for 2D laser frames, so as to reformulate the cost function in the pose estimation stage and maintain better accuracy.
    2. We propose a new pose evaluation classifier considering features derived from both sensor readings and the progress of pose estimation. Such a classifier helps to determine correct loop closures, which are used to reduce the accumulated drift from sequential frame-to-model registration (see the illustrative sketch at the end of this section).
    3. A benchmark for evaluating the quality of mesh reconstruction is developed, where the robotic scanning process is simulated using synthetic scenes for quantitative evaluation. The benchmark will be made publicly available to facilitate future research.
  • In order to test the performance of our proposed algorithm, we also assembled a simple robot platform (Fig. 7) for scanning real-world scenes.
  • Extensive evaluations on both real and simulated scans demonstrate that our proposed system, which tightly couples laser and RGBD measurements for a unified tracking and loop optimization process, is capable of maintaining sufficient accuracy for indoor scenarios, outperforming several state-of-the-art reconstruction methods [2], [3], [13], [20], even when they are enhanced with initial pose hints directly provided by a classical probabilistic approach [19] for coupling multiple sensors.
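
As a companion to contribution 2, the sketch below shows, with scikit-learn and an entirely hypothetical feature set, how statistics gathered from the heterogeneous sensors during registration could be fed to a classifier that accepts or rejects candidate loop closures before loop optimization. The feature names, the choice of a random forest, and the acceptance threshold are assumptions for illustration; the paper only states that such a classifier is trained on features from the registration process.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def registration_features(stats):
    """One feature vector per candidate loop closure. Every entry here is a
    hypothetical example of a signal a multi-sensor platform could provide."""
    return np.array([
        stats['icp_rmse'],               # final RGBD point-to-plane residual
        stats['icp_inlier_ratio'],       # fraction of depth pixels matched
        stats['laser_tsdf_cost'],        # residual of the 2D laser term
        stats['num_iterations'],         # how hard the optimizer had to work
        stats['imu_yaw_diff'],           # disagreement with IMU-integrated heading
        stats['odom_translation_diff'],  # disagreement with wheel odometry
    ])

def train_pose_validator(labeled_stats, labels):
    """Train on previously labeled registrations (1 = correct closure, 0 = wrong)."""
    X = np.vstack([registration_features(s) for s in labeled_stats])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, np.asarray(labels))
    return clf

def accept_loop_closure(clf, stats, threshold=0.5):
    """Only closures the classifier deems reliable enter the loop optimization."""
    p = clf.predict_proba(registration_features(stats).reshape(1, -1))[0, 1]
    return p >= threshold
```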

2. Related Work

  • In this section, we first review dense reconstruction techniques for indoor scenes, and further discuss relevant multisensor systems in the field of simultaneous localization and mapping (SLAM).
  • Dense Scene Reconstruction. As a milestone in real-time dense reconstruction, KinectFusion [1], using consumer-level RGBD cameras, has aroused great interest in the graphics community. This system uses TSDF volumes to store the reconstructed scenes, achieving real-time tracking and integration with the help of the GPU.
  • To enlarge the size of the reconstruction, the original KinectFusion has to reduce the resolution of the volumes due to the limited capacity of graphics memory.
  • Whelan et al. [21] present a strategy to perform memory swapping according to the current sensor pose, where distant voxels are stored on the host and nearby voxels are loaded on the GPU.
  • Such a strategy is further enhanced by a voxel hashing data structure [22] that significantly improves memory utilization, based on the observation that surface voxels are sparsely distributed in common scenes.
  • In another respect, the accuracy of sequential tracking has also been continually improved. In the field of reconstruction, sensor poses are estimated through frame-to-model registration (in practice against ray-cast frames), and the geometric cost from depth frames is enhanced with an additional photometric cost [23] based on color frames. Image pyramids [24] are also employed for coarse-to-fine registration and faster convergence. Recently, sparse keypoint correspondences were also taken into consideration [3] to avoid falling into erroneous correspondences (see the illustrative cost sketch at the end of this section).
  • For these pipelines, the most common way of integrating heterogeneous sensors is to use them for providing a good initial estimation of relative transformations.
  • This strategy has been used in reconstruction with mobile devices [25], where 3 degree-of-freedom prediction of consecutive rotations inferred via a gyroscope is used to initialize the pose iteration.
  • However, since they are only used for initialization, there is no guarantee that such hints will eliminate poor alignment and tracking loss during the optimization process (see Sec. 4.2 for a comparison of derived systems utilizing such a strategy).
  • Therefore, it is necessary to enhance the prediction in a tightly-coupled manner, which we study in this paper.
  • This is especially appropriate for deploying such systems on robots, where the scanning path can easily contain regions with limited features.
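
To make the registration costs discussed above concrete, here is a minimal sketch, assuming NumPy and a pinhole camera model, of a joint geometric (point-to-plane) and photometric frame-to-model residual together with a simple coarse-to-fine pyramid schedule. It treats the ray-cast model maps as if they share the live frame's intrinsics and reference frame, and all function names and weights are illustrative assumptions rather than a reproduction of any particular system.

```python
import numpy as np

def backproject(depth, K):
    """Depth image (H, W) in meters -> (H, W, 3) camera-space points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)

def joint_residual(T, frame, model, K, w_photo=0.1):
    """Frame-to-model cost for a candidate pose T (4x4 homogeneous matrix).

    frame: dict with 'depth' and 'gray' images of the live RGBD frame.
    model: dict with 'points', 'normals', 'gray' ray-cast from the fused model.
    Geometric term: point-to-plane distance of transformed frame points against
    projectively associated model points; photometric term: intensity difference."""
    pts = backproject(frame['depth'], K).reshape(-1, 3)
    valid = pts[:, 2] > 0
    pts_w = (T[:3, :3] @ pts[valid].T).T + T[:3, 3]

    uv = (K @ pts_w.T).T                       # project into the model maps
    z = np.maximum(uv[:, 2], 1e-6)
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)
    h, w = frame['depth'].shape
    inside = (uv[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, pts_w = u[inside], v[inside], pts_w[inside]

    q = model['points'][v, u]                  # associated model points
    n = model['normals'][v, u]                 # and their normals
    r_geo = np.einsum('ij,ij->i', pts_w - q, n)            # point-to-plane
    r_photo = frame['gray'].reshape(-1)[valid][inside] - model['gray'][v, u]
    return float(np.nansum(r_geo ** 2) + w_photo * np.nansum(r_photo ** 2))

def coarse_to_fine(frame, model, K, solve, levels=3):
    """Image-pyramid schedule: optimize on subsampled maps first, then refine.
    `solve` is any routine minimizing joint_residual at a single pyramid level."""
    T = np.eye(4)
    for lvl in reversed(range(levels)):
        s = 2 ** lvl
        K_l = K.astype(float); K_l[:2] /= s    # intrinsics scale with subsampling
        frame_l = {k: im[::s, ::s] for k, im in frame.items()}
        model_l = {k: im[::s, ::s] for k, im in model.items()}
        T = solve(T, frame_l, model_l, K_l)
    return T
```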