Paper translation and notes -- "Visual Place Recognition: A Survey"



Visual place recognition is a challenging problem due to the vast range of ways in which the appearance of real-world places can vary. In recent years, improvements in visual sensing capabilities, an ever-increasing focus on long-term mobile robot autonomy, and the ability to draw on state-of-the-art research in other disciplines—particularly recognition in computer vision and animal navigation in neuroscience—have all contributed to significant advances in visual place recognition systems. This paper presents a survey of the visual place recognition research landscape. We start by introducing the concepts behind place recognition—the role of place recognition in the animal kingdom, how a “place” is defined in a robotics context, and the major components of a place recognition system. Long-term robot operations have revealed that changing appearance can be a significant factor in visual place recognition failure; therefore, we discuss how place recognition solutions can implicitly or explicitly account for appearance change within the environment. Finally, we close with a discussion on the future of visual place recognition, in particular with respect to the rapid advances being made in the related fields of deep learning, semantic scene understanding, and video description.

Index Terms—Visual place recognition, place recognition.


Visual place recognition is a well-defined but extremely challenging problem to solve in the general sense: given an image of a place, can a human, animal, or robot decide whether or not this image is of a place it has already seen? Whether referring to humans, animals, computers, or robots, there are some fundamental things a place recognition system must have and must do. First, a place recognition system must have an internal representation—a map—of the environment to compare to the incoming visual data. Second, the place recognition system must report a belief about whether or not the current visual information is from a place already included in the map, and if so, which one. Performing visual place recognition can be difficult due to a range of challenges: the appearance of a place can change drastically (see Fig. 1); multiple places in an environment may look very similar, a problem known as perceptual aliasing; and places may not always be revisited from the same viewpoint and position as before.




Fig. 1. Visual place recognition systems must be able to (a) successfully match very perceptually different images while (b) also rejecting incorrect matches between aliased image pairs of different places.




This paper discusses what qualifies as a place in the context of robotic navigation. It then looks at the three key modules that make up the place recognition system: the image processing module, the mapping framework, and the belief generation module. The paper then turns to the problem of changing environments. It revisits each of the modules—the image processing module, the mapping module, and the belief generation module—and investigates how each has to be adapted to incorporate the notion of appearance change into the place recognition system’s model of the world.





The definition of a place depends on the navigation context, and may either be considered as a precise position—“a place describes part of the environment as a zero-dimensional point” (see [26]), or as a larger area—“a place may also be defined as the abstraction of a region” where a region “represents a two-dimensional subset of the environment” (see [26]). For example, a room in a building might in some cases qualify as a single place, while in other cases, it might contain many different places. A region could also be defined as a 3-D area, depending on the requirements of the environment or robot. Unlike a robot pose, a place does not have an orientation, and an ongoing challenge in place recognition is pose invariance—ensuring recognition regardless of the orientation of the robot within the place.

1. Depending on the navigation context, a place can be defined either as a precise position (a zero-dimensional point in the environment) or as an abstract region (a two- or three-dimensional subset of the environment).
2. Pose invariance, i.e. recognizing a place regardless of the robot's pose within it, is an ongoing challenge in place recognition.

The location of each place—whether a zero-dimensional point or a larger region—can be selected based on spatial or temporal density. In this approach, a new place is added according to a particular time step, or when the robot has travelled a certain distance. Alternatively, a place can be defined in terms of its appearance. Kuipers and Byun [25] defined a place as somewhere distinctive relative to other nearby locations, according to some associated sensory information known as a place signature or place description. While the distinctiveness criterion is not always required, a topological place is defined as having a certain appearance configuration [45], [46] and the physical bounds of a place occur where the appearance changes significantly, called a “gateway” [47].

1. Vocabulary: "alternatively"; "qualitative" (descriptive rather than numerical).
2. A topological place is defined as having a certain appearance configuration; the physical bounds of a place lie where the appearance changes significantly (a "gateway").

This qualitative concept of topological places as regions that are visually homogeneous needs to be quantified—that is, how can a place recognition system actually segment the world into distinct places? Ranganathan [48] noted that there are similarities with the problem of change-point detection in video segmentation [49], [50], and used change-point detection algorithms such as Bayesian surprise [50] and segmented regression [51] to define places within a topological map [48], [52]. A new place is created when the appearance of the environment, determined from the sensor measurements, becomes sufficiently different from the current model of the environment. Similarly, Korrapati et al. [53] used image sequence partitioning techniques to group visually similar images together as topological graph nodes, while Chapoulie et al. [54] combined Kalman filtering with the Neyman–Pearson Lemma. Murphy and Sibley [55] combined dynamic vocabulary building [56] and incremental topic modeling [57] to continually learn new topological places in an environment, and Volkov et al. [58] used coresets [59] to segment the environment. Topic modeling, coresets, and Bayesian surprise techniques can also be used for other aspects of robotic navigation, such as summarizing a robot’s past experience [60]–[62], or determining exploration strategies [63].

1. Vocabulary: "homogeneous"; "needs to be quantified".
2. This paragraph explains how topological places are quantified, i.e. how a place recognition system divides the world into distinct places that become the nodes of a topological map.
3. The problem is similar to change-point detection in video segmentation; example change-point detection algorithms are Bayesian surprise and segmented regression.
4. Korrapati et al. used image sequence partitioning techniques to generate the nodes of a topological graph.
5. Chapoulie et al. combined Kalman filtering with the Neyman–Pearson Lemma.
6. Murphy and Sibley combined dynamic vocabulary building with incremental topic modeling to learn new topological places.
7. Volkov et al. used coresets [59] to segment the environment.
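The idea shared by these segmentation methods (open a new place when the appearance drifts far enough from the current place model) can be sketched with a much simpler stand-in for the cited change-point detectors. Everything below (the function name, the cosine-distance test, the threshold) is illustrative and not taken from the survey:

```python
import numpy as np

def segment_places(descriptors, threshold=0.5):
    """Split a stream of image descriptors into topological places.

    A new place is opened whenever the current descriptor differs
    sufficiently (cosine distance > threshold) from the running mean
    of the current place -- a simplified stand-in for the change-point
    detection methods (e.g. Bayesian surprise) cited in the text.
    """
    places = []    # list of lists of frame indices, one list per place
    current = []   # frame indices of the place being built
    mean = None    # running mean descriptor of the current place

    for i, d in enumerate(descriptors):
        d = np.asarray(d, dtype=float)
        if mean is None:
            current, mean = [i], d.copy()
            continue
        # cosine similarity between the new frame and the place model
        cos = np.dot(d, mean) / (np.linalg.norm(d) * np.linalg.norm(mean) + 1e-12)
        if 1.0 - cos > threshold:
            places.append(current)            # close the old place
            current, mean = [i], d.copy()     # open a new one
        else:
            current.append(i)
            mean = mean + (d - mean) / len(current)  # incremental mean update
    if current:
        places.append(current)
    return places
```

Running it on four toy descriptors forming two visually distinct groups yields two places, e.g. `segment_places([[1, 0], [1, 0.1], [0, 1], [0, 1.1]])` returns `[[0, 1], [2, 3]]`.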

Appearance-based and density-based place selection methods are practical to implement as they depend on measurable quantities such as distance, time, or sensor values [64]. An ongoing challenge is the enhancement of appearance information with semantic labels such as “door” or “intersection” so places can be selected online based on their value as decision points. The addition of semantic data to maps can improve planning and navigation tasks [65] and requires place recognition to be linked with other recognition and classification tasks, especially scene classification and object recognition. These relationships are symbiotic: place recognition can improve object detection by providing contextual priming for object detection as well as contextual priors for object localization [66], and conversely, object recognition can also aid place recognition [67]–[70], particularly in indoor environments where the function of a place such as “kitchen” or “office” can be inferred from the objects within it, and used to infer the location from a labeled semantic map [71].

1. Vocabulary: "are practical to implement"; "symbiotic".
2. The challenge for appearance-based methods is to enhance appearance information with semantic labels.
3. Place recognition, scene classification, and object recognition are symbiotic: they reinforce one another.
4. This feels less related to loop closure; in loop closure, revisited places lie within a small area, sometimes even within a single image sequence.


Visual place description techniques fall into two broad categories: one extracts interesting or salient parts of the image; the other describes the whole scene without making any selection. See Fig. 4 below.
Fig. 4. Visual place description techniques fall into two broad categories. (a) Interesting or salient parts of the image are selected for extraction, description and storage. For example, SURF [73] extracts interest points in an image for description. The circles are interest points selected by SURF within this image. The number of possible features may vary depending on the number of interest points detected in the image. (b) Image is described in a predefined way such as the grid shown here without first detecting interest points. Whole-image descriptors such as Gist [74], [75] then process each block regardless of its content.

A. Local Feature Descriptors

Local feature descriptors presuppose that keypoints have been detected, so there are two steps in total: keypoint detection + feature description, and many methods are combinations of choices for these two steps. (Note: both keypoint detection and feature description already have many mature methods; it is unclear whether there is still room for innovation here.)

1、The bag-of words model [92], [93] increases efficiency by quantizing local features into a vocabulary that can be compared using text retrieval techniques [94].
2、Images described using the bag-of-words model can be efficiently compared using binary string comparison such as a Hamming distance or histogram comparison techniques.
3、Vocabulary trees [95] can make the process for large-scale place recognition even more efficient.
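As a rough illustration of points 1-2, the sketch below quantizes local features against a toy vocabulary and compares the resulting word histograms. Nearest-neighbour assignment and histogram intersection are simple stand-ins; the function names and details are assumptions, not the survey's implementation:

```python
import numpy as np

def bow_histogram(features, vocabulary):
    """Quantize local feature descriptors against a visual vocabulary.

    Each feature is assigned to its nearest vocabulary word, and the
    image is summarized as a normalized histogram of word counts, as
    in the bag-of-words model described above.
    """
    features = np.asarray(features, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # squared distance from every feature to every word: (n_feat, n_words)
    d2 = ((features[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return float(np.minimum(h1, h2).sum())
```

Two images whose features fall on the same visual words produce similar histograms regardless of where in the image those features appeared, which is why the representation is pose invariant but discards geometry.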

1. The bag-of-words model ignores the geometric structure of the place it describes, so the resulting place description is pose invariant; adding geometric information to a place improves the robustness of place matching, especially in dynamic conditions.
The trade-off between pose invariance—recognizing places regardless of the robot orientation—and condition invariance—recognizing places when the visual appearance changes—has not yet been resolved and is a current challenge in place recognition research.
Nicosevici and Garcia [56] propose an online method to continuously update the vocabulary based on observations.

B. Global Descriptors

Global place descriptors used in early localization systems included color histograms [5] and descriptors based on principal component analysis [104].

Global descriptors can be generated from local feature descriptors by predefining the keypoints in the image—for example, using a grid-based pattern—and then using the chosen feature description method on the preselected keypoints.
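A minimal sketch of this grid-based construction, using mean cell intensity as a stand-in for a real local description method applied at preselected keypoints (the function name, grid size, and per-cell statistic are all illustrative):

```python
import numpy as np

def grid_descriptor(image, grid=(4, 4)):
    """Whole-image descriptor built from a predefined grid.

    Instead of detecting interest points, the image is divided into a
    fixed grid and a simple statistic (mean intensity per cell) is
    computed for each cell and concatenated into one global vector.
    A real system would run a feature descriptor at each grid cell.
    """
    image = np.asarray(image, dtype=float)
    rows = np.array_split(image, grid[0], axis=0)          # split vertically
    cells = [np.array_split(r, grid[1], axis=1) for r in rows]  # then horizontally
    return np.array([c.mean() for row in cells for c in row])
```

Because the grid positions are fixed relative to the image frame, the resulting descriptor inherits the viewpoint sensitivity of whole-image methods discussed below.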

A popular whole-image descriptor is Gist [74], [75], which has been used for place recognition on a number of occasions[110]–[113].

C. Describing Places Using Local and Global Techniques

Local and global descriptors each have their own advantages and disadvantages; a compromise between the two approaches is one direction for improvement.
1、Local feature descriptors are not restricted to defining a place only in terms of a previous robot pose, but can be recombined to create new places that have not previously been explicitly observed by the robot.
2、Local features can also be combined with metric information to enable metric corrections to localization [2], [7], [76]. Global descriptors do not have the same flexibility, and furthermore, whole-image descriptors are more susceptible to change in the robot’s pose than local descriptor methods, as whole-image descriptor comparison methods tend to assume that the camera viewpoint remains similar. This problem can be somewhat ameliorated by the use of circular shifts as in [116] or by combining a bag-of-words approach with a Gist descriptor on segments of the image [17], [110].
3、While global descriptors are more pose dependent than local feature descriptors, local feature descriptors perform poorly when lighting conditions change [117] and are comprehensively outperformed by global descriptors at performing place recognition in changing conditions [118], [119]. Using global descriptors on image segments rather than whole images may provide a compromise between the two approaches, as sufficiently large image segments exhibit some of the condition invariance of whole images, and sufficiently small image segments exhibit the pose invariance of local features.

Local features combined with metric information can be used for metric corrections to localization.
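The circular-shift idea mentioned for [116] can be illustrated for a 1-D whole-image descriptor (e.g. column-averaged intensities): compare the two descriptors at every horizontal circular shift and keep the best score. This is a simplified sketch of the idea, not the cited implementation:

```python
import numpy as np

def min_shift_distance(desc_a, desc_b):
    """Compare two 1-D whole-image descriptors under circular shifts.

    The descriptors are compared at every horizontal circular shift and
    the smallest sum of absolute differences is kept, which reduces the
    sensitivity of whole-image comparison to changes in camera heading.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    best = np.inf
    for s in range(len(desc_b)):
        # np.roll implements the circular shift of the second descriptor
        best = min(best, np.abs(desc_a - np.roll(desc_b, s)).sum())
    return float(best)
```

Two descriptors that are circular shifts of each other (the same scene viewed with a rotated heading, for a panoramic-style descriptor) score a distance of zero, whereas a plain element-wise comparison would not.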

D. Including Three-Dimensional Information in Place Descriptions


In metric localization systems, the appearance-based models must be extended with metric information.


For a place recognition or navigation task, the system needs to refer to a map—a stored representation of the robot’s knowledge of the world—to which the current observation is compared. The map framework differs depending on what data are available and what type of place recognition is being performed. Table I displays a taxonomy of mapping approaches, which depends on the level of physical abstraction in the map and whether or not metric information is included in the place description. The most concrete mapping framework listed is the topological-metric or topometric map. Although it is possible to have a globally metric map, such maps are only feasible in small geographical areas, and there are mechanisms for fusing topometric maps into globally metric maps [140]. Thus, for the purposes of place recognition, any globally metric map can be considered as a one-node topometric map.

Table I: map frameworks used for place recognition.
Types of place description in the table: appearance only, with sparse metric information, and with dense metric information.

A. Pure Image Retrieval

The most abstract form of mapping framework for place recognition only stores appearance information about each place in the environment, with no associated position information. Pure image retrieval assumes that matching is based solely on appearance similarity and applies image retrieval techniques from computer vision that are not specific to place-based information [3]. Although valuable information is lost by not including relative position information, there are computationally efficient indexing techniques that can be exploited.

B. Topological Maps

Pure topological maps contain information about relative positions of places but do not store metric information regarding how these places are related [5], [6], [118], [119]. Topological information can be used to both increase the number of correct place matches and filter out incorrect matches [14], [84].

Topological information can be used both to increase the number of correct place matches and to filter out incorrect matches.

Topological maps can use a location prior to speed up matching, that is, the place recognition system only has to search places known to be close to the robot’s current position.

Topological maps can speed up place matching.
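A hypothetical sketch of such a location prior: restrict the descriptor search to map nodes within a few hops of the robot's last known node. The adjacency structure, hop radius, and distance measure are all assumptions made for illustration:

```python
def match_with_prior(query_desc, map_nodes, neighbors, current_node, radius=2):
    """Restrict place matching to nodes near the robot's last known position.

    `map_nodes` maps node id -> stored descriptor; `neighbors` is the
    adjacency dict of the topological map. Only nodes within `radius`
    hops of `current_node` are compared against the query descriptor
    (squared Euclidean distance), illustrating the location prior.
    """
    # breadth-first expansion to collect nodes within `radius` hops
    frontier, reachable = {current_node}, {current_node}
    for _ in range(radius):
        frontier = {n for f in frontier for n in neighbors[f]} - reachable
        reachable |= frontier

    # return the reachable node whose stored descriptor is closest
    def dist(node):
        stored = map_nodes[node]
        return sum((a - b) ** 2 for a, b in zip(stored, query_desc))

    return min(reachable, key=dist)
```

Besides the speed-up, the prior also suppresses perceptual aliasing: a distant node that happens to look identical to the query is never even considered.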

The addition of topological information into the recognition process allows place recognition using low-resolution data and thus lower memory requirements.

Topological information can reduce memory requirements, e.g. using the sparse convex L1-minimization formulation.

C. Topological-Metric Maps

As image retrieval can be enhanced by adding topological information, topological maps can be enhanced by including metric information—distance, direction, or both—on the map edges.

Metric information can be added to topological maps. For example, FAB-MAP and SeqSLAM are purely topological systems; adding odometry information to them improves performance, as in CAT-SLAM and SMART.


The central goal of any place recognition system is reconciling visual input with the stored map data to generate a belief distribution. This distribution provides a measure of likelihood or confidence that the current visual input matches a particular location in the robot’s map representation of the world.

Belief is another key concept in probabilistic robotics: it reflects the robot's knowledge about the state of the environment.
A belief distribution assigns a probability (or density value) to each possible hypothesis about the true state.
The belief distribution is the posterior probability over the state variables, conditioned on the data.

A. Place Recognition and Simultaneous Localization and Mapping

B. Topological Place Recognition

C. Evaluation of Place Recognition Systems

Place recognition results are evaluated with precision-recall metrics. Typically, precision is first held at 100% and recall values are then compared, i.e. false positives are eliminated as far as possible, so that recall at 100% precision was the key metric for place recognition success.

However, several methods have since been proposed that use topological information to correct false-positive matches [189]-[191], and attention has shifted from eliminating all false positives to finding many potential place matches and then correcting any mismatches in a topological post-processing step.
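The recall-at-100%-precision protocol described above can be sketched as follows: sweep the match-score threshold from high to low and report the recall reached just before the first false positive appears. The function name and interface are illustrative:

```python
def recall_at_full_precision(scores, labels):
    """Recall at 100% precision for a list of candidate place matches.

    `scores` are match confidences and `labels` are True for genuine
    matches. Candidates are accepted in order of decreasing score; the
    recall is reported at the lowest threshold that still yields zero
    false positives, i.e. the evaluation protocol described above.
    """
    total_true = sum(labels)
    if total_true == 0:
        return 0.0
    best_recall, true_pos = 0.0, 0
    for score, is_true in sorted(zip(scores, labels), reverse=True):
        if not is_true:          # first false positive: precision drops below 100%
            break
        true_pos += 1
        best_recall = true_pos / total_true
    return best_recall
```

A single confidently wrong match (a high-scoring false positive) drives this metric toward zero, which is exactly why the field has moved toward tolerating false positives and correcting them topologically afterwards.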
