[Localization Paper Series Reading Notes] Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition (Part 2)

4.4. Comparison to State-of-the-art Methods

Paragraph 1 (comparison with the four baseline solutions)

We compare against several benchmark localization solutions: AP-GEM [56], DenseVLAD [76], NetVLAD [3], and SuperGlue [59].

In the recently proposed AP-GEM, the Average Precision is directly optimized via Generalized Mean pooling and a listwise loss. The key idea behind DenseVLAD is to densely sample SIFT features across the image at four different scales, and then aggregate the SIFT features using intra-normalized VLAD.
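The DenseVLAD aggregation step described above can be sketched as follows. This is a minimal NumPy illustration of VLAD with intra-normalization; the function name and toy dimensions are our own, not from the paper:

```python
import numpy as np

def vlad_aggregate(descriptors, centroids):
    """Aggregate local descriptors into a VLAD vector with
    intra-normalization, as used by DenseVLAD.

    descriptors: (N, D) local features (e.g. densely sampled SIFT)
    centroids:   (K, D) visual-word centroids from k-means
    """
    K, D = centroids.shape
    # Assign each descriptor to its nearest centroid.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)

    vlad = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assignments == k]
        if len(members):
            # Sum of residuals between members and their centroid.
            vlad[k] = (members - centroids[k]).sum(axis=0)
        # Intra-normalization: L2-normalize each cluster's residual sum.
        norm = np.linalg.norm(vlad[k])
        if norm > 0:
            vlad[k] /= norm

    v = vlad.ravel()
    return v / np.linalg.norm(v)  # final global L2 normalization
```

The intra-normalization per cluster is what reduces the dominance of bursty visual words before the final global normalization.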

Paragraph 2 (SuperGlue as a baseline)

Finally, we propose an optimistic baseline that utilizes SuperGlue [59] as a VPR system.

SuperGlue elegantly matches features extracted with SuperPoint [18] using a graph neural network that solves an assignment optimization problem.

While originally proposed for homography and pose estimation, our proposed SuperGlue VPR baseline achieves what would have been state-of-the-art performance, second only to Patch-NetVLAD.

As in Patch-NetVLAD, we provide SuperGlue with the same k = 100 candidate images extracted using vanilla NetVLAD, and re-rank the candidates by the number of inlier matches.
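The retrieve-then-re-rank pipeline used for this SuperGlue baseline (and by Patch-NetVLAD itself) can be sketched as below; `count_inliers` is a hypothetical stand-in for running SuperPoint+SuperGlue (or patch matching) between the query and one candidate:

```python
def rerank_by_inliers(query, candidates, count_inliers):
    """Re-rank the top-k candidates retrieved by vanilla NetVLAD,
    scoring each candidate by its number of inlier matches to the
    query (more inliers = better place match)."""
    return sorted(candidates,
                  key=lambda cand: count_inliers(query, cand),
                  reverse=True)
```

The global descriptor does the cheap coarse retrieval; the expensive local matcher only touches the k = 100 shortlisted images.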

Paragraph 3 (quantitative comparison of Patch-NetVLAD and the baseline methods)

Table 1 and Fig. 3 contain quantitative comparisons of Patch-NetVLAD and the baseline methods.

Figure 3. Comparison with state-of-the-art. We show the Recall@N performance of Ours (Multi-RANSAC-Patch-NetVLAD) compared to AP-GEM [56], DenseVLAD [76], NetVLAD [3] and SuperGlue [59], on the Mapillary validation set.

Patch-NetVLAD outperforms the best performing global descriptor methods, NetVLAD, DenseVLAD, and AP-GEM, on average by 17.5%, 14.8%, and 22.3% (all percentages stated are absolute differences for R@1), respectively.

The differences are particularly pronounced in datasets with large appearance variations, i.e. Nordland and Extended CMU Seasons (both seasonal changes), Tokyo 24/7 (including images captured at night time), and RobotCar Seasons as well as Mapillary (both seasonal changes and night time imagery). On the Nordland dataset, the difference between Patch-NetVLAD and the original NetVLAD is 34.5%.

Paragraph 4 (comparing Patch-NetVLAD and SuperGlue)

A similar trend can be seen when comparing Patch-NetVLAD with SuperGlue. Patch-NetVLAD performs on average 3.1% better (a relative increase of 6.0%), which demonstrates that Patch-NetVLAD outperforms a system that benefits from both learned local feature descriptors and a learned feature matcher.
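The distinction between the absolute and relative recall differences quoted here can be made concrete with a tiny helper (the recall values in the test are hypothetical, chosen only to reproduce the 3.1-point / 6.0% relationship):

```python
def absolute_diff(ours, baseline):
    """Absolute R@1 difference, in percentage points."""
    return ours - baseline

def relative_increase(ours, baseline):
    """Relative increase over the baseline, in percent."""
    return (ours - baseline) / baseline * 100.0
```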

We hypothesize that Patch-NetVLAD's performance could further benefit from SuperGlue's learned matcher, and discuss this opportunity further in Section 5.

SuperGlue is a landmark system and our approach does not beat it in every case, with SuperGlue edging slightly ahead on R@1 and R@5 on Tokyo 24/7 and on R@5 and R@10 on Pittsburgh.

Patch-NetVLAD's performance edge is particularly significant when large appearance variations are encountered in unseen environments, which are not typically used for training local feature methods like SuperGlue (or the underlying SuperPoint).

Thus, Patch-NetVLAD achieves superior performance on Nordland with an absolute percentage difference of 15.8%.

Interestingly, the performance difference between Patch-NetVLAD and SuperGlue increases with increasing N, from 1.1% for R@1 to 5.1% for R@25 on the Mapillary dataset (Fig. 3).

Paragraph 5 (advantages of Patch-NetVLAD)

Patch-NetVLAD won the Mapillary Challenge at the ECCV 2020 workshops (not yet publicly announced, to comply with CVPR's double-blind policy), with Table 1 showing that Patch-NetVLAD outperformed the baseline method, NetVLAD, by 13.0% (absolute R@1 increase) on the withheld test dataset.

The test set was more challenging than the validation set (48.1% R@1 and 79.5% R@1 respectively; note that no fine-tuning was performed on any of the datasets), indicating that the Mapillary test set is a good benchmarking target for further research compared to "near-solved" datasets like Pittsburgh and Tokyo 24/7, where both Patch-NetVLAD and SuperGlue achieve near perfect performance.

Paragraph 6 (high place-match accuracy of our method)

In Fig. 5 we show a set of example images, illustrating the matches retrieved with our method compared to NetVLAD and SuperGlue, along with the patch matches that our algorithm detects.

Figure 5. Qualitative Results. In these examples, the proposed Patch-NetVLAD successfully retrieves the matching reference image, while both NetVLAD and SuperGlue produce incorrect place matches. The retrieved image with our approach on the Tokyo 24/7 dataset is a particularly challenging match, with a combination of day vs night-time, severe viewpoint shift and occlusions.

4.5. Ablation Studies

Paragraph 1 (single-scale and spatial scoring)

Single-scale and Spatial Scoring: To analyze the effectiveness of Patch-NetVLAD, we compare with the following variations:

  1. Single-RANSAC-Patch-NetVLAD uses a single patch size (i.e. 5) instead of multi-scale fusion.

  2. Single-Spatial-Patch-NetVLAD employs a simple but rapid spatial verification method applied to a single patch size (see Section 3.4).

  3. Multi-Spatial-Patch-NetVLAD uses the same rapid spatial verification method, however applied to three patch sizes rather than a single patch size as in the previous variant.

Paragraph 2 (comparison of these three variations)

The comparison results with these three variations are shown in Table 2.

The following numeric results are based on R@1 (recall@1); the conclusions generally apply to R@5 and R@10 as well.

Our proposed multi-fusion approach (Multi-RANSAC-Patch-NetVLAD) performs on average 2.0% better than Single-RANSAC-Patch-NetVLAD, demonstrating that a fusion of multiple patch sizes significantly improves task performance.
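At its core, the multi-scale fusion being ablated here is a convex combination of the spatial match scores obtained at each patch size. A minimal sketch (the weights and scores below are illustrative; the actual weights are tuned by the authors):

```python
def fuse_multi_scale(scores, weights):
    """Convex combination of per-patch-size match scores.

    scores:  list of match scores, one per patch size (e.g. sizes 2, 5, 8)
    weights: non-negative weights summing to 1
    """
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * s for w, s in zip(weights, scores))
```

Candidates are then re-ranked by this fused score instead of by any single patch size's score.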

Our approach also provides some compelling options for compute-constrained applications; our rapid spatial verification approach is 2.9 times faster on a single patch size (Single-Spatial-Patch-NetVLAD), with only a 0.6% performance reduction.

Rapid spatial verification applied to multiple patch sizes (Multi-Spatial-Patch-NetVLAD) is 3.1 times faster, with only a 1.1% performance degradation.


Paragraph 3 (patch descriptor dimension)

Patch Descriptor Dimension: In addition to disabling multi-scale fusion and using our rapid spatial scoring method, the descriptor dimension can be arbitrarily reduced using PCA (as with the original NetVLAD).

Here, we choose D_PCA = {128, 512, 2048, 4096}.
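The PCA reduction used for the D_PCA settings above can be sketched in NumPy. This is a from-scratch illustration via SVD of the centered data; the paper's pipeline learns the projection from training descriptors, as in the original NetVLAD:

```python
import numpy as np

def pca_reduce(X, d_pca):
    """Project row-wise descriptors X of shape (n, D) onto their
    top d_pca principal components."""
    Xc = X - X.mean(axis=0)                       # center the data
    # Rows of Vt are the principal axes, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d_pca].T                      # shape (n, d_pca)
```

In deployment the mean and projection matrix would be fitted once on training descriptors and reused for queries.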

Fig. 4 shows the number of queries that can be processed per second by various configurations, and the resulting R@1 on the Mapillary validation set.

Figure 4. Computational time requirements. The time taken to process one query image is shown on the x-axis, with the resulting R@1 shown on the y-axis, for the Mapillary dataset. Our pipeline enables a range of system configurations that achieve different performance and computational balances, and that either outperform or are far faster than the current state-of-the-art.

Our proposed Multi-RANSAC-Patch-NetVLAD in a performance-focused configuration (red star in Fig. 4) achieves 1.1% higher recall than SuperGlue (yellow dot) while being slightly (3%) faster.

A balanced configuration (orange triangle) is more than 3 times faster than SuperGlue with comparable performance, while a speed-oriented configuration (blue triangle) is 15 times faster at the expense of just 0.6% and 1.7% recall when compared to SuperGlue and our performance-focused configuration, respectively.

A storage-focused configuration (D_PCA = 128) still largely outperforms NetVLAD while having similar memory requirements as a SIFT-like descriptor.

Our speed-oriented and storage-focused configurations provide practical options for applications like time-critical robotics.

Our approach can also run on consumer GPUs, with our performance configuration requiring 7GB of GPU memory (batch size of 5).

4.6. Further Analysis

We further study the robustness of our approach to the choice of hyperparameters.

In Fig. 6 (left) we highlight that Single-Patch-NetVLAD is robust to the choice of the patch size dp: the performance gradually decays from a peak at dp = 4. Fig. 6 (right) similarly shows that Patch-NetVLAD is robust to the convex combination of the multi-patch fusion in terms of the patch sizes that are fused.

The Supplementary Material provides additional ablation studies, including matching across different patch sizes, complementarity of patch sizes, and comparison to other pooling strategies.

Figure 6. Robustness studies for single patch sizes and combined patch sizes. Left: Recall performance of Single-Patch-NetVLAD with varying patch size, using the Mapillary validation dataset. Performance gradually degrades around a peak at dp = 4. The smallest and largest patch sizes perform most poorly, indicating that both local features and global areas are inferior to intermediate size features. An additional issue with large patch sizes is that there are too few patches for effective spatial verification. Right: Recall performance of Multi-Patch-NetVLAD against an indicative measure of cumulative patch dimensions. Our proposed combination of patch dimensions 2, 5 and 8 corresponds to an x-axis value of 15; data points to the left show a reduction in the cumulative patch dimension (e.g. Σᵢ dp,ᵢ = 14 for patch sizes 1, 5 and 8; sizes 2, 4 and 8; and sizes 2, 5 and 7) and so forth; and similarly for increasing patch size combinations to the right. As for variations on the single patch size, performance gracefully degrades around the peak and remains high over a large range.
Discussion and Conclusion

Paragraph 1 (the newly proposed method)

In this work we have proposed a novel locally-global feature descriptor, which uses global descriptor techniques to further improve the appearance robustness of local descriptors.

Unlike prior keypoint-based local feature descriptors [41, 18], our approach considers all the visual content within a larger patch of the image, using techniques that facilitate further performance improvements through an efficient multi-scale fusion of patches.

Our proposed Patch-NetVLAD's average performance across key benchmarks is superior by 17.5% over the original NetVLAD, and by 3.1% (absolute recall increase) over the state-of-the-art SuperPoint and SuperGlue-enabled VPR pipeline.

Our experiments reveal an inherent benefit to fusing multiple patch sizes simultaneously, where the fused recall is greater than any single patch size recall, and provide a means by which to do so with minimal computational penalty compared to single-scale techniques.

Paragraph 2 (outlook)

While this demonstration of Patch-NetVLAD occurred in a place recognition context, further applications and extensions are possible.

One avenue for future work is the following: while we match Patch-NetVLAD features using mutual nearest neighbors with subsequent spatial verification using RANSAC, recent deep learned matchers [59, 85] could further improve the global re-localization performance of the algorithm.
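The mutual nearest neighbor matching mentioned here (the step a learned matcher could replace) can be sketched as follows, assuming L2-normalized descriptors so the dot product is cosine similarity:

```python
import numpy as np

def mutual_nearest_neighbors(desc_a, desc_b):
    """Return (i, j) index pairs where descriptor a_i and b_j are
    each other's nearest neighbour under cosine similarity
    (descriptors assumed L2-normalized row-wise)."""
    sim = desc_a @ desc_b.T
    nn_ab = sim.argmax(axis=1)   # best match in b for each a
    nn_ba = sim.argmax(axis=0)   # best match in a for each b
    # Keep only matches that agree in both directions.
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```

The surviving pairs would then be passed to RANSAC for spatial verification, and the number of inliers used as the match score.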

Although our method is by no means biologically inspired, it is worth noting that the brain processes visual information over multiple receptive fields [33].

As a result, another potentially promising direction for future research is to explore and draw inspiration from how the task of visual place recognition, rather than the more commonly studied object or face recognition tasks, is achieved in the brain.

Finally, another line of work could consider the correlation between the learned VLAD clustering and semantic classes (e.g. car, pedestrian, building), in order to identify and remove patches that contain dynamic objects.

More detailed experimental supplements will follow in later posts.
