Phase One Literature Summary

(I) Robust Ego and Object 6-DoF Motion Estimation and Tracking

1. System Function and Highlights

Function: the paper proposes a robust solution for dynamic multi-body visual odometry, achieving accurate motion estimation and consistent trackability of both the camera ego-motion and the objects moving in its field of view.

Highlights: a compact and effective framework is proposed, leveraging recent advances in semantic instance-level segmentation and accurate optical flow estimation; a novel formulation that jointly optimises SE(3) motion and optical flow improves both the quality of the tracked points and the accuracy of the motion estimation.

2. System Architecture and Pipeline

(1) Image Preprocessing

  • Dense optical flow computation with PWC-Net [6]; the model is trained on the FlyingChairs dataset [22] and then fine-tuned on the Sintel [23] and KITTI [20] training sets.
  • Instance-level semantic segmentation with Mask R-CNN [4]; the model is trained on the COCO dataset and used directly, without fine-tuning.
  • For a stereo camera, a depth map is computed as well.
  • Output: the image mask, depth and dense flow for the static parts of the scene {I_s, D_s, Φ_s} and for the dynamic parts {I_o, D_o, Φ_o}.

(2) Ego-motion Estimation

  • Sparse FAST features are extracted on the regions left unlabelled by the instance segmentation, forming a sparse feature set P_s in each frame; frame-to-frame matches are read off the dense optical flow.
  • Two motion models are computed (one propagated from the previous frame, one obtained via P3P); the one with the most inliers is kept as the initial estimate and refined with Eq. (10); see the sketch after this list.
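A minimal sketch of this model-selection step, assuming calibrated 3D-2D correspondences; the helper name and thresholds are illustrative, with OpenCV's `solvePnPRansac` standing in for the paper's P3P hypothesis generation:

```python
import cv2
import numpy as np

def select_initial_motion(pts3d, pts2d, K, T_prev, reproj_thresh=2.0):
    """Keep whichever motion model (propagated vs. P3P) has more inliers."""
    # Candidate 1: P3P inside a RANSAC loop.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, None,
        reprojectionError=reproj_thresh, flags=cv2.SOLVEPNP_P3P)
    n_p3p = 0 if inliers is None else len(inliers)
    # Candidate 2: the pose propagated from the previous frame,
    # scored by its own reprojection error.
    proj, _ = cv2.projectPoints(pts3d, cv2.Rodrigues(T_prev[:3, :3])[0],
                                T_prev[:3, 3], K, None)
    err = np.linalg.norm(proj.reshape(-1, 2) - pts2d.reshape(-1, 2), axis=1)
    n_prop = int((err < reproj_thresh).sum())
    if ok and n_p3p > n_prop:
        T = np.eye(4)
        T[:3, :3] = cv2.Rodrigues(rvec)[0]
        T[:3, 3] = tvec.ravel()
        return T  # later refined jointly with the flow, Eq. (10)
    return T_prev
```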

(3) Object Motion Tracking

  • Deciding whether a labelled object is dynamic: compute the scene flow of every point on the object; the object is classified as dynamic if the proportion of "dynamic" points exceeds a threshold, and as static otherwise (see the sketch after this list).
  • Object tracking: optical flow is used to associate point labels across frames.
  • Object motion estimation: dense feature points are extracted on the dynamic object (every third pixel of the mask is sampled), initialised with the same motion models as in the ego-motion estimation, and refined with Eq. (10).
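A minimal sketch of the dynamic/static decision, assuming object points are available in the camera frames at k-1 and k; the thresholds and the camera-motion convention are assumptions:

```python
import numpy as np

def is_dynamic(p3d_prev, p3d_curr, T_cam, flow_thresh=0.12, ratio_thresh=0.3):
    """Classify a segmented object as dynamic from per-point scene flow.

    p3d_prev, p3d_curr: (N, 3) object points in the camera frames at k-1, k.
    T_cam: 4x4 camera motion from frame k-1 to frame k.
    """
    homog = np.hstack([p3d_prev, np.ones((len(p3d_prev), 1))])
    # Where the points would be observed at time k if the object were static.
    p_static = (np.linalg.inv(T_cam) @ homog.T).T[:, :3]
    scene_flow = np.linalg.norm(p3d_curr - p_static, axis=1)
    return float((scene_flow > flow_thresh).mean()) > ratio_thresh
```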

3. Experiments and Results

(1) Evaluation Metrics

  • Pose evaluation for the ego-motion and the objects: the pose-change error is E = X̂⁻¹ X. The translation error E_t is the L2 norm of the translational component of E; the rotation error E_R is the angle of the axis-angle representation of the rotational part of E (see the sketch after this list).
  • Optical flow is evaluated using the end-point error (EPE) [26].
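A small worked example of the two pose metrics, assuming 4x4 homogeneous matrices:

```python
import numpy as np

def pose_change_error(X_est, X_gt):
    """E = inv(X_est) @ X_gt; translation error = ||t(E)||_2,
    rotation error = angle of the axis-angle form of R(E)."""
    E = np.linalg.inv(X_est) @ X_gt
    E_t = np.linalg.norm(E[:3, 3])
    # Rotation angle recovered from the trace of the rotation matrix.
    cos_theta = np.clip((np.trace(E[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    E_R = np.arccos(cos_theta)  # radians
    return E_t, E_R
```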

(2) Virtual KITTI Dataset Experiments

  • Purpose: by injecting noise into the ground truth, analyse how the accuracy of the optical flow and of the depth maps affects the motion estimation.
  • Dataset: a representative set of Virtual KITTI sequences containing multiple moving objects was selected: S18-F124-134, whose motions are mainly translational; S01-F225-235, in which the agent car (the camera) turns left into the main street; and S01-F410-418, in which a static camera observes a car turning left at a crossroads.
  • Results: depth accuracy has only a small influence on the motion-estimation accuracy, whereas optical flow accuracy has a large influence, especially when distant objects are in view.

(3) Real KITTI Dataset Experiments

  • Purpose: a comparison against ORB-SLAM2 to show that the proposed motion estimation is more accurate.
  • Dataset: the KITTI tracking dataset, which provides 21 sequences with ground-truth camera and object poses; the ego-motion error is computed on all sequences in which the camera actually moves (12 in total).

4. Future work: integrate the pipeline into a complete SLAM system, i.e. VDO-SLAM.

5. Further Reading

  • Other dynamic SLAM approaches, e.g. a multi-camera SLAM system: [8] D. Zou and P. Tan, "CoSLAM: Collaborative Visual SLAM in Dynamic Environments," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 2, pp. 354-366, 2013.
  • Dynamic object identification: [18] Z. Lv, K. Kim, A. Troccoli, D. Sun, J. M. Rehg, and J. Kautz, "Learning Rigidity in Dynamic Scenes with A Moving Camera for 3D Motion Field Estimation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 468-484.
  • Networks used: [4] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961-2969. [6] D. Sun, X. Yang, M.-Y. Liu, and J. Kautz, "PWC-Net: CNNs for Optical Flow using Pyramid, Warping, and Cost Volume," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8934-8943.
  • Datasets: [19] A. Gaidon, Q. Wang, Y. Cabon, and E. Vig, "Virtual Worlds as Proxy for Multi-Object Tracking Analysis," in CVPR, 2016 (the Virtual KITTI dataset). [20] A. Geiger, P. Lenz, and R. Urtasun, "Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite," in Conference on Computer Vision and Pattern Recognition (CVPR), 2012 (the real KITTI dataset).

(II) Dynamic SLAM: The Need for Speed

1. System Function and Highlights

Function: the paper proposes a novel model-free, object-aware, point-based dynamic SLAM approach that leverages image-based semantic information to simultaneously localise the robot, map the static structure, estimate the full SE(3) pose change of moving objects, and build a dynamic representation of the world. It also fully exploits rigid object motion to extract the velocity of objects in the scene (Fig. 1), an emerging task in autonomous driving that has not yet been thoroughly explored [9].

Highlights:

  • A novel pose-change representation that models the motion of a collection of points pertaining to a given rigid body, integrated into a SLAM optimisation framework.
  • Because the system segments and tracks object-level semantic information, the algorithm is agnostic to the underlying 3D model of the object: object motion can be estimated as long as the semantic detection and segmentation of the object can be tracked.

2. System Architecture and Pipeline

(1) Front end (essentially the pipeline of "Robust Ego and Object 6-DoF Motion Estimation and Tracking" above)

  • An RGB-D or stereo camera provides the RGB image and the depth map.
  • Instance segmentation and optical flow are computed on the RGB image (same methods as above).
  • Feature extraction and object tracking: here sparse feature points are extracted on both the static background and the objects; the distribution of the features on an object, and the fraction of the scene the object occupies, both influence the estimation result.
  • The ego and object motion estimates are obtained from this odometry front end.

(2) Rigid-body motion model (modelling the motion of points on a dynamic object)

  • The transformation, in the world frame, between the same point on the same object at two consecutive times is given by Eq. (5).
  • The object velocity v is obtained from Eq. (6); see the sketch after this list.
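A minimal numpy sketch of the velocity extraction, assuming the object pose change H is expressed in the world frame: writing m_k = R m_{k-1} + t for an object point, the displacement is t - (I - R) m_{k-1}, and averaging over the object's points gives an Eq. (6)-style velocity via the centroid:

```python
import numpy as np

def object_velocity(H, centroid, dt):
    """Velocity of a rigid object from its SE(3) pose change.

    H: 4x4 world-frame pose change of the object between frames,
    centroid: (3,) mean of the object's 3D points, dt: frame interval.
    """
    R, t = H[:3, :3], H[:3, 3]
    v = (t - (np.eye(3) - R) @ centroid) / dt
    return v, np.linalg.norm(v)  # velocity vector and scalar speed
```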

(3) Joint Optimisation Model

  • First term: the camera observation model, optimising the camera poses and all 3D points.
  • Second term: the inertial (pre-integration) model, optimising the camera poses.
  • Third term: the rigid-body motion model, optimising the object motions and the dynamic points; a toy sketch of the three-term cost follows this list.
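A toy sketch of how the three residual groups add up. This is illustrative only (the paper's actual cost operates on SE(3) manifolds with robust kernels), and all names here are assumptions:

```python
import numpy as np

def total_cost(cam_poses, points, odometry, obj_motions, dyn_pts, obs):
    """Sum of camera-observation, inertial/odometry and rigid-motion residuals."""
    cost = 0.0
    # 1) Camera observation: a 3D point seen from a camera pose.
    for (k, i, z) in obs['static']:            # frame k sees point i at z
        pred = (np.linalg.inv(cam_poses[k]) @ np.append(points[i], 1.0))[:3]
        cost += np.sum((z - pred) ** 2)
    # 2) Inertial/odometry: relative pose between consecutive frames.
    for k, T_meas in enumerate(odometry):
        T_pred = np.linalg.inv(cam_poses[k]) @ cam_poses[k + 1]
        cost += np.sum((T_meas - T_pred) ** 2)  # crude matrix residual
    # 3) Rigid-body motion: dynamic point i propagated by object motion H.
    for (k, j, i) in obs['dynamic']:
        pred = (obj_motions[k][j] @ np.append(dyn_pts[k - 1][i], 1.0))[:3]
        cost += np.sum((dyn_pts[k][i] - pred) ** 2)
    return cost
```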

(4) Factor-Graph Optimisation

The problem above is handled with an optimisation toolbox such as Ceres, GTSAM or g2o, and the motion model is chosen according to the application scenario:

  • In city scenarios the velocity of dynamic objects changes constantly, so the object motion H at every time step is a separate node in the optimisation graph.
  • In a highway scenario the vehicles travel at roughly constant speed, so a constant-motion model can be used and the number of nodes drops considerably; a sketch of such a factor follows.
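A minimal sketch of a constant-motion residual tying consecutive object motions together (when the motions are collapsed into a single shared node, this residual disappears entirely); the parameterisation is an assumption:

```python
import numpy as np

def constant_motion_residual(H_k, H_k1):
    """Penalty on the change between consecutive object motions H_k, H_{k+1}.

    If the object truly moves at constant velocity, inv(H_k) @ H_k1
    is the identity and the residual vanishes.
    """
    E = np.linalg.inv(H_k) @ H_k1
    r_t = E[:3, 3]                                   # translation residual
    cos_theta = np.clip((np.trace(E[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return np.concatenate([r_t, [np.arccos(cos_theta)]])
```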

3. Experiments and Results

(1) Evaluation Metrics

  • Ego trajectory: the Relative Translational Error (RTE, in %) is the translational component of the error between the estimated and ground-truth robot pose changes; the Relative Rotational Error (RRE, in °/m) is the rotational component of the same error. The Relative Structure Error (RSE, in %) is also evaluated for all static and dynamic landmarks.
  • Object trajectory: the Object Motion Translation Error (OMTE, in %), the Object Motion Rotational Error (OMRE, in °/m) and, for driving scenarios, the Object Motion Speed Error (OMSE, in %).

(2) Virtual KITTI Dataset Experiments

  • Purpose: by injecting noise into the ground truth, analyse how the accuracies of the optical flow, the depth maps and the semantic segmentation affect the motion estimation.
  • Data: the same sequences and procedure as above.
  • Results: segmentation accuracy has the smallest influence on the SLAM results (errors in the camera and object motion estimation due to the use of Mask R-CNN instead of ground-truth segmentation appear to be minimal); depth accuracy has a small influence; optical flow accuracy has the largest influence, especially when distant objects are in view.

(3) KITTI dataset: the system's ego- and object-motion estimation is compared against classic SLAM (where dynamic objects are treated as outliers) and SLAM+MOT solutions.

4. Future Work

  • The paper targets RGB-D/stereo input; a monocular variant could be built on top of monocular depth estimation.
  • In long-term operation the number of optimisation variables keeps growing, hurting efficiency; marginalisation or similar techniques could bound the size of the optimisation graph.
  • A fixed-length sliding window assuming constant ego/object velocity inside the window would simplify the model, and the same assumption could be used to handle object occlusions.
  • The back-end ego and object motion estimates could be fed back to the front-end tracking to improve tracking accuracy, and coupling them with the segmentation could likewise improve the semantic segmentation.

5. Further Reading

  • Front-end pose results supporting dynamic detection/segmentation: [8] P. Wohlhart and V. Lepetit, "Learning descriptors for object recognition and 3D pose estimation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3109-3118.
  • Other dynamic SLAM approaches: [25] K. M. Judd, J. D. Gammell, and P. Newman, "Multimotion visual odometry (MVO): Simultaneous estimation of camera and third-party motions," in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 3949-3956. [26] S. Yang and S. Scherer, "CubeSLAM: Monocular 3-D object SLAM," IEEE Transactions on Robotics, 2019.
  • Efficient incremental SLAM: [33] V. Ila, L. Polok, M. Šolony, and P. Svoboda, "SLAM++: A highly efficient and temporally scalable incremental SLAM framework," International Journal of Robotics Research, vol. Online First, no. 0, pp. 1-21, 2017.
  • A recent monocular depth estimation network: [36] H. Ren, M. El-Khamy, and J. Lee, "Deep robust single image depth estimation neural network using scene understanding," arXiv preprint arXiv:1906.03279, 2019.

(III) VDO-SLAM

1. System Function and Highlights

Function: the paper proposes VDO-SLAM, a novel feature-based stereo/RGB-D dynamic SLAM system that leverages image-based semantic information to simultaneously localise the robot, map the static and dynamic structure, and track the motions of rigid objects in the scene.

Highlights (the paper's stated contributions):

  • a novel formulation to model dynamic scenes in a unified estimation framework over robot poses, static and dynamic 3D points, and object motions;
  • accurate estimation of the SE(3) pose change of dynamic objects that outperforms state-of-the-art algorithms, together with a way to extract the velocities of moving objects in the scene;
  • a robust method for tracking moving objects that exploits semantic information and can handle indirect occlusion resulting from failures of the semantic object segmentation.

2. System Architecture and Pipeline

Essentially a combination of the two papers above.

(1) Pre-processing

  • Instance segmentation to obtain labels: Mask R-CNN (He et al. (2017)) generates the object segmentation masks; the model is trained on the COCO dataset (Lin et al. (2014)) and used directly, without any fine-tuning.
  • Optical flow estimation: dense optical flow is used again (dynamic objects often occupy only a small fraction of the image, where sparse flow tends to perform poorly, and dense flow can also recover the mask when segmentation fails). Of the classical and learning-based families, the latter is chosen: PWC-Net (Sun et al. (2018)), trained on the FlyingChairs dataset (Mayer et al. (2016)) and fine-tuned on Sintel (Butler et al. (2012)) and the KITTI training set (Geiger et al. (2012)).
  • Depth maps: obtained directly from RGB-D/stereo input; a monocular system would rely on depth estimation, here MonoDepth2 (Godard et al. (2019)), trained on the Eigen depth split (Eigen et al. (2014)) excluding the data tested in the paper.

(2) Tracking

  • Feature extraction for camera estimation: a sparse set of FAST corners is extracted and tracked with optical flow; see the sketch after this list.
  • Camera pose estimation: two models are computed (one propagated from the previous frame, one obtained via P3P); the one with the most inliers is kept as the initial estimate, and pose and optical flow are then optimised jointly with Eq. (13).
  • Dynamic object tracking: a scene-flow test decides whether an object is actually moving (estimating motion only for dynamic objects improves efficiency); optical flow associates object labels across frames.
  • Object motion estimation: dense feature points are extracted on the dynamic object (every third pixel of the mask is sampled), initialised with the same motion models as the ego-motion estimation, and refined with Eq. (15).
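A minimal sketch of the FAST-plus-flow tracking on the static regions, with OpenCV's Farneback flow standing in for PWC-Net; the mask convention and parameter values are assumptions:

```python
import cv2
import numpy as np

def track_static_features(gray_prev, gray_curr, static_mask, max_pts=1000):
    """Detect FAST corners on static regions and carry them to the next
    frame by sampling a dense optical flow field.

    gray_prev, gray_curr: uint8 grayscale images;
    static_mask: uint8 mask, 255 on non-object (static) pixels.
    """
    fast = cv2.FastFeatureDetector_create(threshold=20)
    kps = fast.detect(gray_prev, mask=static_mask)
    pts = np.array([kp.pt for kp in kps[:max_pts]], dtype=np.float32)
    flow = cv2.calcOpticalFlowFarneback(
        gray_prev, gray_curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    u = pts[:, 0].astype(int)
    v = pts[:, 1].astype(int)
    pts_next = pts + flow[v, u]   # correspondence = position + sampled flow
    return pts, pts_next
```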

(3) Mapping

  • Factor-graph structure of the back-end optimisation: there are four kinds of edges; besides the three described above, smooth-motion factors are introduced "to minimise the change in consecutive object motions".

  • Local optimisation: within a fixed-size sliding window only the camera poses and the static structure are optimised; optimising the motion and structure of dynamic objects locally is not worth the cost (without strong constraints on the object motion it is time-consuming and yields poor results).
  • Global optimisation: Eq. (21) is solved; a 3D map point is added to the factor graph only after it has been tracked at least three times.
  • How the map helps tracking: inlier points from the previous frame provide the 3D points for motion estimation; the previous camera and object motions serve as a motion-model prior for the current estimate; and the dense points on dynamic objects help propagate the mask when the semantic segmentation fails.

3. Experiments and Results

(1) Data: the Oxford Multimotion Dataset (Judd and Gammell (2019)) for indoor scenes, and the KITTI tracking dataset (Geiger et al. (2013)) for outdoor scenarios.

(2) Baselines: MVO (Judd et al. (2018)) and CubeSLAM (Yang and Scherer (2019)).

(3) Results: the system outperforms MVO and CubeSLAM in all cases except the motion estimation of purely rotating objects, where MVO is better (the optical-flow-based tracking loses accuracy on rotated objects). Runtime efficiency is low; the experiments were carried out on an Intel Core i7 2.6 GHz laptop with 16 GB of RAM.

4. Future work: bounding the size of the factor graph; managing (summarising/deleting) the feature points of past dynamic objects, which are currently all retained.

5. Further Reading

  • Front-end pose results supporting dynamic detection/segmentation (read one of the two): [8] P. Wohlhart and V. Lepetit, "Learning descriptors for object recognition and 3D pose estimation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3109-3118; Byravan, A. and Fox, D. (2017) "SE3-Nets: Learning rigid body motion using deep neural networks," in IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 173-180.
  • Object tracking: STAM-MOT: Chu, Q., Ouyang, W., Li, H., Wang, X., Liu, B. and Yu, N. (2017) "Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism," in Proceedings of the IEEE International Conference on Computer Vision, pp. 4836-4845.
  • Dynamic object detection and segmentation: Xu, X., Fah Cheong, L. and Li, Z. (2018) "Motion segmentation by exploiting complementary geometric models," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2859-2867; the EM approach: Hähnel, D., Triebel, R., Burgard, W. and Thrun, S. (2003) "Map building with mobile robots in dynamic environments," in IEEE International Conference on Robotics and Automation (ICRA'03), vol. 2, pp. 1557-1563.
  • Other dynamic SLAM approaches: Judd, K. M., Gammell, J. D. and Newman, P. (2018) "Multimotion visual odometry (MVO): Simultaneous estimation of camera and third-party motions," in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3949-3956; Kundu, A., Krishna, K. M. and Jawahar, C. (2011) "Realtime multibody visual SLAM with a smoothly moving monocular camera," in IEEE International Conference on Computer Vision (ICCV), 2011, pp. 2080-2087; Yang, S. and Scherer, S. (2019) "CubeSLAM: Monocular 3-D Object SLAM," IEEE Transactions on Robotics 35(4): 925-938.

(IV) DynaSLAM

1. System Function and Highlights

Function: the paper presents a visual SLAM system built on ORB-SLAM2 that adds dynamic object detection and background inpainting.

Highlights:

  • DynaSLAM is highly robust in dynamic scenes in the monocular, stereo and RGB-D configurations.
  • Moving objects can be detected by multi-view geometry, by deep learning, or by both: in the monocular and stereo cases only semantic segmentation provides the moving-object prior; in RGB-D mode multi-view geometry is combined with deep learning, which also catches moving instances of a priori static object classes.
  • The parts of the 3D map occluded by dynamic objects are completed using multi-view geometry.

2. System Architecture and Pipeline

(1) Semantic segmentation ("Segmentation of Potentially Dynamic Content Using a CNN"): Mask R-CNN [19] is used, although this work does not need the instance labels; the Matterport TensorFlow implementation (https://github.com/matterport/Mask_RCNN) is used, and the network, trained on MS COCO [20], could be fine-tuned with new training data.

(2) Low-cost tracking: a first rough camera pose is obtained with a simplified version of the ORB-SLAM2 tracking module: map features are projected onto the image and matched, and the reprojection error is minimised.

(3) Joint segmentation by Mask R-CNN and multi-view geometry

  • Select the five keyframes that share the most covisibility with the current frame while being far apart from one another;
  • Project the keypoints of these keyframes into the current frame, obtaining the projected pixel x' and projected depth z_proj; comparing against the matched keypoint of the current frame, with pixel x and measured depth z, gives the parallax a = x' − x and the depth difference d = z_proj − z; when both differences exceed their thresholds, the keypoint is classified as dynamic (see the sketch after this list).
  • Edge points are then rejected from the dynamic set: if the depth map has a large gradient in the neighbourhood of a keypoint, the keypoint is reclassified as static.
  • The resulting dynamic keypoints are sparse; region growing in the depth image around them recovers all pixels belonging to the dynamic objects [21].
  • Finally, the multi-view dynamic detection result is fused with the CNN segmentation.
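A minimal sketch of the per-keypoint multi-view test described above; the threshold values are illustrative assumptions:

```python
import numpy as np

def is_dynamic_keypoint(x_proj, z_proj, x_match, z_match,
                        parallax_thresh=1.0, depth_thresh=0.4):
    """Multi-view dynamic test for one keypoint.

    x_proj, z_proj: pixel and depth of a keyframe keypoint projected into
    the current frame; x_match, z_match: the matched pixel and measured
    depth in the current frame.
    """
    a = np.linalg.norm(np.asarray(x_proj) - np.asarray(x_match))  # parallax
    d = z_proj - z_match                                          # depth diff
    return a > parallax_thresh and d > depth_thresh
```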

(4) Tracking and mapping: ORB features are extracted on the regions labelled static, edge points are removed, and the usual tracking and mapping follow.

(5) Background inpainting

  • Using the camera poses obtained from tracking, the RGB images and depth maps of the 20 keyframes covisible with the current frame are warped, via multi-view geometry, onto the current RGB image and depth map, inpainting the background behind the dynamic objects (see the sketch after this list).
  • Some areas are left blank after this reprojection; filling these holes would require more elaborate methods, such as learning-based inpainting (e.g. GANs).
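A minimal numpy sketch of the reprojection step for a single keyframe; the intrinsics K and the relative pose are assumed given, and pixels that no keyframe can explain remain as holes:

```python
import numpy as np

def warp_keyframe_into_current(depth_kf, rgb_kf, T_kf_to_cur, K):
    """Splat one keyframe's RGB-D pixels into the current view."""
    h, w = depth_kf.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_kf.ravel()
    # Back-project keyframe pixels to 3D rays, scale by depth.
    rays = np.linalg.inv(K) @ np.vstack([u.ravel(), v.ravel(), np.ones(h * w)])
    pts = rays * z
    # Move the points into the current camera frame and project them.
    pts = T_kf_to_cur[:3, :3] @ pts + T_kf_to_cur[:3, 3:4]
    uvw = K @ pts
    uc = np.round(uvw[0] / (uvw[2] + 1e-9)).astype(int)
    vc = np.round(uvw[1] / (uvw[2] + 1e-9)).astype(int)
    ok = (z > 0) & (uvw[2] > 0) & (uc >= 0) & (uc < w) & (vc >= 0) & (vc < h)
    canvas = np.zeros_like(rgb_kf)
    canvas[vc[ok], uc[ok]] = rgb_kf.reshape(-1, rgb_kf.shape[-1])[ok]
    return canvas  # zero pixels are the holes that remain
```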

3. Experiments and results: the localisation results improve only marginally.

4. Future work: hole filling.

5. Further Reading

  • Dynamic object detection methods: [12] S. Li and D. Lee, "RGB-D SLAM in dynamic environments using static point weighting," IEEE Robot. Autom. Lett., vol. 2, no. 4, pp. 2263-2270, Oct. 2017. [16] Y. Sun, M. Liu, and M. Q.-H. Meng, "Improving RGB-D SLAM in dynamic environments: A motion removal approach," Robot. Auton. Syst., vol. 89, pp. 110-122, 2017. [18] R. Ambrus, J. Folkesson, and P. Jensfelt, "Unsupervised object segmentation through change detection in a long term autonomy scenario," in Proc. IEEE-RAS 16th Int. Conf. Humanoid Robots (Humanoids), 2016.
  • Region-growing algorithm: [21] N. L. Gerlach, G. J. Meijer, D.-J. Kroon, E. M. Bronkhorst, S. J. Bergé, and T. J. J. Maal, "Evaluation of the potential of automatic segmentation of the mandibular canal," Brit. J. Oral Maxillofacial Surgery, vol. 52, no. 9, pp. 838-844, 2014.
  • Hole-filling algorithm: [24] D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell, and A. A. Efros, "Context encoders: Feature learning by inpainting," in Proc. Comput. Vision Pattern Recognit., 2016, pp. 2536-2544.

(V) DynaSLAM II

1. System Function and Highlights

Function: DynaSLAM II uses instance semantic segmentation and ORB features to track dynamic objects; the structure of the static scene and of the dynamic objects is optimised jointly with the trajectories of both the camera and the moving agents within a novel bundle-adjustment proposal, and the 3D bounding boxes of the objects are also estimated and loosely optimised within a fixed temporal window.

Highlights:

  • A bundle-adjustment solution that tightly optimises the scene structure, the camera poses and the object trajectories within a local temporal window; the formulation keeps the computational complexity and the number of optimised parameters low enough for the system to run in real time.
  • Previous work either picks a reference frame specific to one class of objects, or tracks all dynamic objects but with arbitrary reference frames for the object trajectories; this work combines the advantages of both.
  • The object bounding boxes are also optimised in a decoupled formulation that allows the dimensions and 6-DoF poses of the objects to be estimated without being geared to any particular use case.

2. System Architecture and Pipeline

(1) Object information extraction and data association:

  • Single-frame processing: pixel-wise instance segmentation, plus ORB feature extraction and stereo matching.
  • Object creation: if a segmented instance belongs to a dynamic class and carries enough ORB features, it is instantiated as an object, and its ORB features are assigned to both the instance and the object.
  • Initial camera pose estimation: the camera pose is first estimated from the static-point matches (against the keyframe and the map).
  • Associating current-frame dynamic points with the dynamic objects in the map: the dynamic points are matched against the local map in one of two ways: if the velocities of the map objects are known, a constant-velocity assumption drives reprojection matching; if they are unknown, or the constant-velocity matching performs poorly, brute-force matching is used instead.
  • Associating current-frame instances with objects: an instance takes the ID of the object with which it shares the most matched points; the IoU between the 2D bounding boxes of the current frame and of covisible frames is also used to propagate instance IDs.

(2) Object modelling

Unlike VDO-SLAM, which represents the points on an object in the world frame, this paper expresses them in a coordinate frame attached to the object (its reference point is the object's centre when it is first observed). This reduces the number of dynamic points to be optimised and hence the number of parameters: N = 6Nc + Nc × No × 3Nop shrinks to N' = 6Nc + Nc × 6No + No × 3Nop, as the worked example below illustrates.
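To make the saving concrete, a quick count with illustrative numbers (Nc cameras, No objects, Nop points per object):

```python
def param_count(Nc, No, Nop):
    """Parameter counts: world-frame vs. object-frame point modelling."""
    world_frame = 6 * Nc + Nc * No * 3 * Nop             # points re-estimated every frame
    object_frame = 6 * Nc + Nc * 6 * No + No * 3 * Nop   # one point set + per-frame object pose
    return world_frame, object_frame

# e.g. 50 cameras, 5 objects, 100 points per object:
print(param_count(50, 5, 100))  # -> (75300, 3300)
```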

(3) Joint Bundle Adjustment

  • BA factor graph and residual terms: the optimisation variables (the nodes of the factor graph) and the total objective (the edges of the factor graph) are given in the paper; two of the residual terms involve the object velocities.

  • Keyframe insertion: a keyframe is inserted whenever camera tracking is weak or the tracking of any object is weak.
  • Case-dependent optimisation: if a keyframe is inserted only because camera tracking is weak, local BA optimises the current keyframe, all keyframes connected to it in the covisibility graph, and the map points seen by those keyframes; if it is inserted only because object tracking is weak, only the object pose and velocity, the dynamic points on it, and the camera poses of the previous two seconds are optimised; if both conditions hold, both sets are optimised.

(4) Bounding-box generation and optimisation

  • The object bounding box is initialised by searching for two perpendicular planes that roughly fit the majority of the object points.
  • If only one plane is found, a prior on the rough dimension of the unobservable direction, tied to the object class, is added. The search runs inside a RANSAC scheme: the CNN 2D bounding box is used to select the computed 3D bounding box whose image projection has the largest IoU (see the sketch after this list).
  • This bounding box is computed once per object.
  • To refine the box dimensions and its pose relative to the object's tracking reference frame, an image-based optimisation is performed over a temporal window; it minimises the distance between the projection of the 3D bounding box and the CNN 2D bounding-box predictions.
  • For unobservable views (e.g. only the rear of a car is visible), class-level dimension priors are again used to generate the box.
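A minimal sketch of the IoU scoring between one 3D box hypothesis and the CNN 2D detection; the axis-aligned hull of the projection and all names here are assumptions:

```python
import numpy as np

def box_projection_iou(corners3d, T_cam_obj, K, box2d):
    """IoU between the image projection of a 3D box and a 2D detection.

    corners3d: (8, 3) box corners in the object frame;
    box2d: (x1, y1, x2, y2) CNN detection in pixels.
    """
    pts = T_cam_obj[:3, :3] @ corners3d.T + T_cam_obj[:3, 3:4]
    uvw = K @ pts
    uv = uvw[:2] / uvw[2]
    x1, y1 = uv[0].min(), uv[1].min()   # axis-aligned hull of the projection
    x2, y2 = uv[0].max(), uv[1].max()
    ix = max(0.0, min(x2, box2d[2]) - max(x1, box2d[0]))
    iy = max(0.0, min(y2, box2d[3]) - max(y1, box2d[1]))
    inter = ix * iy
    union = ((x2 - x1) * (y2 - y1)
             + (box2d[2] - box2d[0]) * (box2d[3] - box2d[1]) - inter)
    return inter / union
```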

3. Experiments and Results

  • Data: the KITTI tracking and raw datasets.
  • Baselines: VDO-SLAM, ClusterSLAM and ClusterVO.
  • Object-tracking metrics: the CLEAR MOT metric MOTP is reported in two variants (MOTP_BV and MOTP_3D), along with the common trajectory error metrics. The CLEAR MOT metrics allow an objective comparison of tracker characteristics: precision in estimating object locations, accuracy in recognising object configurations, and the ability to label objects consistently over time. MOTP (multiple object tracking precision) is the precision of the predictions, computed with a given cost function over the number of true positives.

4. Future Work

  • Explore multi-object tracking and SLAM with a monocular camera; an interesting direction, since dynamic object tracking can provide rich cues about the map scale.
  • Exploit dense information on the dynamic objects to improve the accuracy of the 3D bounding-box estimates.

5. Further Reading

  • Schemes that keep SLAM and object tracking separate (read two of the three): 1. the EM approach: [16] J. G. Rogers, A. J. Trevor, C. Nieto-Granda, and H. I. Christensen, "SLAM with expectation maximization for moveable object tracking," in IEEE International Conf. on Intelligent Robots and Systems, 2010; 2. dense mapping of dynamic scenes: [17] I. A. Bârsan, P. Liu, M. Pollefeys, and A. Geiger, "Robust dense mapping for large-scale dynamic environments," in IEEE ICRA, 2018; 3. a visual-inertial scheme: [18] A. Rosinol, A. Gupta, M. Abate, J. Shi, and L. Carlone, "3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans," arXiv:2002.06289, 2020.
  • SLAM fused with deep learning (read one of the three): [21] M. Rünz and L. Agapito, "Co-Fusion: Real-time segmentation, tracking and fusion of multiple objects," in IEEE ICRA, pp. 4471-4478, 2017; [22] M. Rünz, M. Buffier, and L. Agapito, "MaskFusion: Real-time recognition, tracking and reconstruction of multiple moving objects," in IEEE International Symposium on Mixed and Augmented Reality, 2018; [23] B. Xu, W. Li, D. Tzoumanikas, M. Bloesch, A. Davison, and S. Leutenegger, "MID-Fusion: Octree-based object-level multi-instance dynamic SLAM," in IEEE ICRA, pp. 5231-5237, 2019.
  • Other similar dynamic SLAM schemes: 1. multi-view enhancement for strong object detection: [10] P. Li, T. Qin, et al., "Stereo vision-based semantic 3D object and ego-motion tracking for autonomous driving," in ECCV, 2018; 2. [11] S. Yang and S. Scherer, "CubeSLAM: Monocular 3-D object SLAM," IEEE Transactions on Robotics, vol. 35, no. 4, pp. 925-938, 2019; 3. strong bounding-box detection: [12] J. Huang, S. Yang, T.-J. Mu, and S.-M. Hu, "ClusterVO: Clustering Moving Instances and Estimating Visual Odometry for Self and Surroundings," in IEEE CVPR, pp. 2168-2177, 2020; 4. [24] M. Henein, G. Kennedy, R. Mahony, and V. Ila, "Exploiting rigid body motion for SLAM in dynamic environments," IEEE ICRA, 2018; 5. [25] J. Huang, S. Yang, Z. Zhao, Y.-K. Lai, and S.-M. Hu, "ClusterSLAM: A SLAM Backend for Simultaneous Rigid Body Clustering and Motion Estimation," in IEEE ICCV, pp. 5875-5884, 2019.
  • Segmentation network: [29] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, "YOLACT: Real-time instance segmentation," in ICCV, pp. 9157-9166, IEEE, 2019.