Contents
1. Can metrics such as ATE RMSE and RPE RMSE indirectly indicate how well different SLAM algorithms map?
1.1 In favor: trajectory accuracy can stand in for mapping accuracy
a. Reference [4] Robust reconstruction of indoor scenes
b. Reference [8] Co-Fusion
c. Multi-resolution surfel maps for efficient dense 3D modeling and tracking
1.2 Against: trajectory accuracy does not represent mapping accuracy
a. Reference [7] Real-time large-scale dense RGB-D SLAM with volumetric fusion
b. The ICL-NUIM dataset [1]
2. Mapping metrics, i.e., how to evaluate dense reconstruction results
[1] A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM
[2] ElasticFusion: Dense SLAM without a pose graph
[3] Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration
[4] Robust reconstruction of indoor scenes
[5] Maskfusion: Real-time recognition, tracking and reconstruction of multiple moving objects
[6] Real-time non-rigid reconstruction using an RGB-D camera
[7] Real-time large-scale dense RGB-D SLAM with volumetric fusion
[8] Co-fusion: Real-time segmentation, tracking and fusion of multiple objects
[9] Refusion: 3d reconstruction in dynamic environments for rgb-d cameras exploiting residuals
[10] Kimera: an open-source library for real-time metric-semantic localization and mapping
[11] Densifying sparse vio: a mesh-based approach using structural regularities
[12] Fusion4d: Real-time performance capture of challenging scenes
Question: How should one evaluate SLAM mapping quality? Can metrics such as ATE RMSE and RPE RMSE indirectly indicate how well different SLAM algorithms map? Thanks!
Answer: Thanks for the question! I was also very interested in this, so I spent some time looking into it. Since 知识星球 (the hosting platform) has limited rich-text support and allows only a few attached images, I have put the complete answer in the attached images.
----
Here I take "SLAM mapping" to mean dense mapping of the kind performed by RGB-D SLAM systems.
1. Can metrics such as ATE RMSE and RPE RMSE indirectly indicate how well different SLAM algorithms map?
Answer: Generally, yes.
1.1 In favor: trajectory accuracy can stand in for mapping accuracy
a. Reference [4] Robust reconstruction of indoor scenes
When evaluating trajectory accuracy, reference [4] states this directly:
- "Trajectories estimated by our approach are considerably more accurate, with average RMSE reduced by a factor of 2.2 relative to the closest alternative approach. Note that trajectory accuracy is only an indirect measure of reconstruction accuracy."
- That is, it treats trajectory accuracy as an indirect measure of reconstruction accuracy, yet still uses a dedicated metric (see below) to measure reconstruction accuracy directly.
b. Reference [8] Co-Fusion
Co-Fusion [8] puts it more implicitly:
- "To assess the quality of the fusion, one could either inspect the 3D reconstruction errors of each object separately or jointly, by exporting the geometry in a unified coordinate system. We used the latter on the synthetic sequences. This error is strongly conditioned on the tracking, but nicely highlights the quality of the overall system."
- Here "this error" refers to the mapping error, so we can read Co-Fusion as saying that better trajectory accuracy brings better mapping, and vice versa. Still, Co-Fusion chooses a dedicated metric (discussed later in this answer) to measure the overall mapping accuracy of the system more directly.
c. Multi-resolution surfel maps for efficient dense 3D modeling and tracking
- Section 6.2 of [Stückler, Jörg, and Sven Behnke. "Multi-resolution surfel maps for efficient dense 3D modeling and tracking." Journal of Visual Communication and Image Representation 25.1 (2014): 137-147.] is nominally about evaluating scene reconstruction, yet its quantitative experiments still use trajectory accuracy.
1.2 Against: trajectory accuracy does not represent mapping accuracy
Some papers, however, take a firmer stance: trajectory error cannot be used directly to express mapping accuracy.
a. Reference [7] Real-time large-scale dense RGB-D SLAM with volumetric fusion
Reference [7] argues that high trajectory-estimation accuracy does not imply high mapping accuracy:
- "We present a number of quantitative and qualitative results on evaluating the surface reconstructions produced by our system. In our experience a high score on a camera trajectory benchmark does not always imply a high-quality surface reconstruction due to the frame-to-model tracking component of the system. In previous work we found that although other methods for camera pose estimation may score better on benchmarks, the resulting reconstructions are not as accurate if frame-to-model tracking is not being utilized (Whelan et al., 2013a)."
b. The ICL-NUIM dataset [1]
- The closing section of the ICL-NUIM dataset paper [1] also notes: "Further, we have evaluated a number of existing visual odometry methods within the Kintinuous pipeline and shown through experimentation that a good trajectory estimate, which previous to this paper was the only viable benchmark measure, is not indicative of a good surface reconstruction."
- In other words, good trajectory accuracy does not imply an accurate surface reconstruction.
2. Mapping metrics, i.e., how to evaluate dense reconstruction results
Essentially all of them first align the dense reconstructed model with the ground-truth model; then, for each point or small triangle of the reconstruction (when the model is a mesh built from small triangles), they compute the distance to the nearest ground-truth vertex, or to the nearest ground-truth triangle. Given these distances, different evaluation schemes are used:
- Method 1: compute statistics such as the mean / RMSE / median as a description of mapping accuracy (see the sketch after this list);
- Method 2: render a heat map, presenting the quantitative results qualitatively;
- Method 3: report the distribution of points across distance intervals.
There are also some other schemes, for example:
- Method 4: take the current reconstruction as the reference model and apply the above procedure to each point of the ground-truth model, yielding a measure of mapping completeness (?);
- Method 5: for each frame, "observe" the reconstructed model from that frame's pose to obtain so-called "predicted" depth and color images (i.e., project the model into the image plane at the corresponding camera pose), then take the difference between the depth of every valid pixel of this predicted depth image and the ground-truth depth of the corresponding pixel (measuring dense depth values after projection onto the 2D plane);
- Method 6: use the Hausdorff distance instead of the Euclidean distance.
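As a concrete illustration of Method 1, here is a minimal sketch using Open3D (my choice of library; the file names and the ICP correspondence distance are assumptions, not taken from any of the cited papers):

```python
# Minimal sketch of Method 1: align the reconstruction to the ground truth,
# then summarize per-point nearest-neighbor distances with simple statistics.
import numpy as np
import open3d as o3d

recon = o3d.io.read_point_cloud("reconstruction.ply")  # hypothetical file names
gt = o3d.io.read_point_cloud("ground_truth.ply")

# Fine alignment via ICP (a coarse initial alignment is assumed already done).
reg = o3d.pipelines.registration.registration_icp(
    recon, gt, max_correspondence_distance=0.05)
recon.transform(reg.transformation)

# For each reconstructed point, distance to the nearest ground-truth point.
dists = np.asarray(recon.compute_point_cloud_distance(gt))

print(f"mean:   {dists.mean():.4f} m")
print(f"RMSE:   {np.sqrt((dists ** 2).mean()):.4f} m")
print(f"median: {np.median(dists):.4f} m")
```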
As for qualitative evaluation, nearly every dense-mapping paper includes some, usually to highlight its own advantages in particular scenarios, so I will not list them separately.
[1] A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM
[1] Handa, Ankur, et al. "A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM." 2014 IEEE international conference on Robotics and automation (ICRA). IEEE, 2014.
Evaluation method: 1
This is the ICL-NUIM dataset paper. From Section VI.B, Error metrics:
"We quantify the accuracy of surface reconstruction by using the “cloud/mesh” distance metric provided by CloudCompare. The process involves firstly coarsely aligning the reconstruction with the source model by manually selecting point correspondences. From here, the mesh model is densely sampled to create a point cloud model which the reconstruction is finely aligned to using ICP. Finally, for each vertex in the reconstruction, the closest triangle in the model is located and the perpendicular distance between the vertex and closest triangle is recorded. Five standard statistics are computed over the distances for all vertices in the reconstruction: Mean, Median, Std., Min and Max. We provide a tutorial on executing this process at http://www.youtube.com/watch?v=X9gDAElt8HQ."
(In CloudCompare: 1. select the two point clouds; 2. click Tools -> Distance -> cloud/cloud; 3. choose a color scale and compute.)
So unlike the tools that compute pose-trajectory accuracy automatically, ICL-NUIM recommends supplying the initial model alignment by hand, refining it automatically with ICP, and then computing point-to-triangle distances whose statistics characterize mapping accuracy. The linked video demonstrates the complete procedure, and the quantitative evaluations of many later papers were designed with reference to this ICL-NUIM scheme.
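For readers who prefer code over CloudCompare's GUI, here is a sketch of the same vertex-to-closest-triangle measurement using Open3D's RaycastingScene (my substitution; ICL-NUIM itself uses CloudCompare, and the file names are placeholders):

```python
# Sketch of ICL-NUIM-style evaluation: distance from each reconstructed vertex
# to the closest triangle of the ground-truth mesh, after alignment is done.
import numpy as np
import open3d as o3d

recon = o3d.io.read_point_cloud("reconstruction_aligned.ply")  # placeholder names
gt_mesh = o3d.io.read_triangle_mesh("ground_truth_mesh.ply")

scene = o3d.t.geometry.RaycastingScene()
scene.add_triangles(o3d.t.geometry.TriangleMesh.from_legacy(gt_mesh))

# Unsigned distance from every query point to the nearest mesh triangle.
query = o3d.core.Tensor(np.asarray(recon.points), dtype=o3d.core.Dtype.Float32)
d = scene.compute_distance(query).numpy()

# The five statistics reported in [1]: Mean, Median, Std., Min, Max.
for name, v in [("Mean", d.mean()), ("Median", np.median(d)),
                ("Std", d.std()), ("Min", d.min()), ("Max", d.max())]:
    print(f"{name}: {v:.4f} m")
```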
[2] ElasticFusion: Dense SLAM without a pose graph
[2] Whelan, Thomas, et al. "ElasticFusion: Dense SLAM without a pose graph." Robotics: Science and Systems, 2015.
Evaluation method: 1 (mean distance from each reconstructed point to the nearest ground-truth surface)
For each paper's result tables and figures, see the images attached to this answer (same below).
[3] Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration
[3] Dai, Angela, et al. "Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration." ACM Transactions on Graphics (ToG) 36.4 (2017): 1.
Evaluation method: 1
[4] Robust reconstruction of indoor scenes
[4] Choi, Sungjoon, Qian-Yi Zhou, and Vladlen Koltun. "Robust reconstruction of indoor scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
Evaluation method: 1
“To evaluate surface reconstruction accuracy on ICL-NUIM scenes we use the error measures proposed by Handa et al., specifically the mean and median of the distances of the reconstructed surfaces to the ground-truth surfaces.
...
Note that this is a direct evaluation of the metric accuracy of reconstructed models.”
[5] Maskfusion: Real-time recognition, tracking and reconstruction of multiple moving objects
[5] Runz, Martin, Maud Buffier, and Lourdes Agapito. "Maskfusion: Real-time recognition, tracking and reconstruction of multiple moving objects." 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 2018.
Evaluation method: 2 (heat-map visualization)
“The average 3D error for the bleach bottle was 7.0mm with a standard deviation of 5.8mm (where the GT bottle is 250mm tall and 100mm across).”
Unlike ElasticFusion and BundleFusion, MaskFusion does not use a single number to measure mapping accuracy; instead, it presents the quantitative results qualitatively as heat maps.
[6] Real-time non-rigid reconstruction using an RGB-D camera
[6] Zollhöfer, Michael, et al. "Real-time non-rigid reconstruction using an RGB-D camera." ACM Transactions on Graphics (ToG) 33.4 (2014): 1-12.
Although this reconstruction system was not designed for the SLAM problem, I have seen other RGB-D SLAM systems compare against it, so I include it here.
“The figure compares renderings of the original mesh and our reconstruction, as well as plots of the deviation for three frames of the animation, where red corresponds to a fitting error of 3mm. ”
The paper only says the comparison is against ground truth and does not explain in detail how this fitting error is computed. Also, its Fig. 10 shows heat maps of human faces; honestly, looking at them cost me quite a few SAN points, so to protect your eyes I will not paste them here. Interested readers can find them in the paper.
[7] Real-time large-scale dense RGB-D SLAM with volumetric fusion
[7] Whelan, Thomas, et al. "Real-time large-scale dense RGB-D SLAM with volumetric fusion." The International Journal of Robotics Research 34.4-5 (2015): 598-626.
Evaluation methods: 2 + 3 + 5
- Method 2: compare the reconstructed model against the ground-truth model and render a heat map, presenting the quantitative results qualitatively;
- Method 3: compare the reconstructed model against the ground-truth model and report the distribution of points across distance intervals;
- Method 5: for each frame, "observe" the reconstructed model from that frame's pose to obtain so-called "predicted" depth and color images (i.e., project the model into the image plane at the corresponding camera pose), then take the difference between the depth of every valid pixel of this predicted depth image and the ground-truth depth of the corresponding pixel (measuring dense depth values after projection onto the 2D plane).
"Given that both maps lie in the global coordinate frame we can iteratively minimize nearest-neighbor point-wise correspondences between the two maps using standard point-to-plane ICP. ... We measure the remaining root-mean-square residual error between point correspondences as the residual similarity error between the two maps."
Note, though, that the heat map here compares maps built in two different ways against each other; even so, it roughly illustrates how mapping accuracy can be evaluated.
"However, each RGB-D frame does have ground truth depth information which we compare against. For each frame in a dataset we compute a histogram of the per-depth-pixel L1-norm error between the ground truth depth map and the predicted surface depth map raycast from the TSDF, normalizing by the number of valid pixels before aligning all histograms into a two-dimensional area plot."
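A minimal sketch of the per-pixel depth comparison in Method 5 might look as follows (it assumes the ground-truth and predicted depth maps are already available as NumPy arrays in meters; raycasting the model itself is system-specific and omitted):

```python
# Sketch of Method 5: histogram of per-pixel L1 depth error between a ground-truth
# depth map and a depth map raycast from the reconstructed model, normalized by
# the number of valid pixels, as described in [7].
import numpy as np

def depth_l1_histogram(gt_depth, pred_depth, bins=np.linspace(0.0, 0.1, 51)):
    valid = (gt_depth > 0) & (pred_depth > 0)        # pixels valid in both maps
    err = np.abs(gt_depth[valid] - pred_depth[valid])
    hist, _ = np.histogram(err, bins=bins)
    return hist / valid.sum()                        # fraction of valid pixels per bin
```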
[8] Co-fusion: Real-time segmentation, tracking and fusion of multiple objects
[8] Rünz, Martin, and Lourdes Agapito. "Co-fusion: Real-time segmentation, tracking and fusion of multiple objects." 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017.
Evaluation methods: 2 + 3
- Method 2: render a heat map, presenting the quantitative results qualitatively;
- Method 3: report the distribution of points across distance intervals.
"This error is strongly conditioned on the tracking, but nicely highlights the quality of the overall system. For each surfel in the unified map of active models, we compute the distance to the closest point on the ground-truth meshes, after aligning the two representations. Figure 8 visualizes the reconstruction error as a heat-map and highlights differences to Elastic-Fusion. For the real scene Esone1 we computed the 3D reconstruction errors of each object independently. The results are shown in Table I and Figure 10."
Again, heat maps are used to present the quantitative results qualitatively.
The table additionally computes and compares the fraction of points with error above 1 cm and above 5 cm, which reflects mapping accuracy from another angle (a small sketch follows).
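A sketch of those threshold statistics, assuming `dists` holds the per-surfel nearest-ground-truth distances computed as in the earlier examples:

```python
# Fraction of points whose reconstruction error exceeds each threshold (1 cm, 5 cm),
# mirroring the extra columns in Co-Fusion's table.
import numpy as np

def fraction_above(dists, thresholds=(0.01, 0.05)):
    dists = np.asarray(dists)
    return {t: float((dists > t).mean()) for t in thresholds}
```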
[9] Refusion: 3d reconstruction in dynamic environments for rgb-d cameras exploiting residuals
[9] Palazzolo, Emanuele, et al. "Refusion: 3d reconstruction in dynamic environments for rgb-d cameras exploiting residuals." arXiv preprint arXiv:1905.02082 (2019).
Evaluation methods: 2 + 3
- Method 2: render a heat map, presenting the quantitative results qualitatively;
- Method 3: report the distribution of points across distance intervals.
"We compare the models built by our algorithm and by StaticFusion [14] for the sequences crowd3 and removing nonobstructing box w.r.t. the ground truth. For each point of the evaluated model, we measure its distance from the ground truth."
"For a quantitative evaluation, Fig. 11 shows the cumulative percentage of points at a certain distance from the ground truth for the models of the two considered sequences. The plots show in both cases that the reconstructed model by our approach is more accurate."
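The cumulative percentage curve described above can be sketched as follows (the variable and function names are mine):

```python
# Cumulative percentage of points within each distance of the ground truth,
# i.e., the kind of curve plotted in Fig. 11 of [9].
import numpy as np

def cumulative_accuracy(dists, max_dist=0.10, steps=100):
    thresholds = np.linspace(0.0, max_dist, steps)
    dists = np.asarray(dists)
    fractions = np.array([(dists <= t).mean() for t in thresholds])
    return thresholds, fractions
```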
[10] Kimera: an open-source library for real-time metric-semantic localization and mapping
[10] Rosinol, Antoni, et al. "Kimera: an open-source library for real-time metric-semantic localization and mapping." 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020.
Evaluation methods: 2 + 4
- Method 2: render a heat map, presenting the quantitative results qualitatively;
- Method 4: the average distance from each point of the reconstruction to the nearest point of the ground-truth model evaluates accuracy, and the average distance from the ground truth to the nearest point of the reconstruction gives mapping completeness (?).
"...and (iii) we evaluate the average distance from ground truth point cloud to its nearest neighbor in the estimated point cloud (accuracy), and vice-versa (completeness)."
Rather than describing mapping accuracy and completeness with a single number each, as ElasticFusion and BundleFusion do, Kimera also presents them qualitatively as heat maps.
[11] Densifying sparse vio: a mesh-based approach using structural regularities
[11] Rosinol, Antoni. Densifying sparse vio: a mesh-based approach using structural regularities. MS thesis. ETH Zurich; Massachusetts Institute of Technology (MIT), 2018.
Evaluation methods: 1 + 4
Accuracy:
"With the newly registered point cloud, we can compute a cloud to cloud distance to assess the accuracy of the mesh relative to the ground-truth point cloud. Used Approach We compute the cloud to cloud absolute distance using the nearest neighbour distance. For each point of the estimated cloud from the mesh, we search the nearest point in the reference cloud and compute their Euclidean distance."
Completeness:
"Similarly to the accuracy, we define the completeness of the mesh as the percentage of points within a threshold of the ground truth."
Kimera's two metrics above were designed with reference to this thesis; a sketch of the pair follows.
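A sketch of the accuracy/completeness pair, following the definitions in [11] (Open3D and the 5 cm threshold are my assumptions; note that [10] and [11] assign the directions slightly differently, so the convention below is one common reading, not a definitive reproduction of either paper):

```python
# Accuracy/completeness from bidirectional nearest-neighbor distances.
# Convention used here (an assumption): accuracy = mean estimate-to-GT distance,
# completeness = fraction of GT points within the threshold of the estimate.
import numpy as np
import open3d as o3d

def accuracy_completeness(est_cloud, gt_cloud, threshold=0.05):
    d_est_to_gt = np.asarray(est_cloud.compute_point_cloud_distance(gt_cloud))
    d_gt_to_est = np.asarray(gt_cloud.compute_point_cloud_distance(est_cloud))
    accuracy = d_est_to_gt.mean()                             # cloud-to-cloud distance
    completeness = float((d_gt_to_est <= threshold).mean())   # covered fraction of GT
    return accuracy, completeness
```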
[12] Fusion4d: Real-time performance capture of challenging scenes
[12] Dou, Mingsong, et al. "Fusion4d: Real-time performance capture of challenging scenes." ACM Transactions on Graphics (TOG) 35.4 (2016): 1-13.
Evaluation methods: 3 + 6
"In Fig. 15, we compare to the dataset of [Collet et al. 2015] for a sequence with extremely high motions. The figure compares renderings of the original meshes and multiple reconstructions, where red corresponds to a fitting error of 15mm. In particular, we compare our method with [Zollhöfer et al. 2014] and [Newcombe et al. 2015], showing our superior reconstructions in these challenging situations."
The average Hausdorff distance is used as the distance metric.
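For Method 6, a symmetric "average Hausdorff" distance between two clouds can be sketched as below (my formulation: the directed per-point means are averaged rather than taking the classical maximum; Open3D is assumed):

```python
# Average (symmetric) Hausdorff-style distance: mean nearest-neighbor distance in
# each direction, then averaged. The classical Hausdorff distance would take the
# max() of the per-point distances instead of the mean().
import numpy as np
import open3d as o3d

def average_hausdorff(cloud_a, cloud_b):
    d_ab = np.asarray(cloud_a.compute_point_cloud_distance(cloud_b))
    d_ba = np.asarray(cloud_b.compute_point_cloud_distance(cloud_a))
    return 0.5 * (d_ab.mean() + d_ba.mean())
```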