Paper Notes: "Visual Place Recognition: A Survey"
- I. INTRODUCTION
- II. CONCEPT OF PLACE IN ROBOTICS AND THE NATURAL KINGDOM
- III. WHAT IS A PLACE?
- IV. DESCRIBING PLACES: THE IMAGE PROCESSING MODULE
- V. REMEMBERING PLACES: THE MAPPING MODULE
- VI. RECOGNIZING PLACES: THE BELIEF GENERATION MODULE ★
Visual place recognition is a challenging problem due to the vast range of ways in which the appearance of real-world places can vary. In recent years, improvements in visual sensing capabilities, an ever-increasing focus on long-term mobile robot autonomy, and the ability to draw on state-of-the-art research in other disciplines—particularly recognition in computer vision and animal navigation in neuroscience—have all contributed to significant advances in visual place recognition systems. This paper presents a survey of the visual place recognition research landscape. We start by introducing the concepts behind place recognition—the role of place recognition in the animal kingdom, how a “place” is defined in a robotics context, and the major components of a place recognition system. Long-term robot operations have revealed that changing appearance can be a significant factor in visual place recognition failure; therefore, we discuss how place recognition solutions can implicitly or explicitly account for appearance change within the environment. Finally, we close with a discussion on the future of visual place recognition, in particular with respect to the rapid advances being made in the related fields of deep learning, semantic scene understanding, and video description.
Index Terms—Visual place recognition, place recognition.
Visual place recognition is a well-defined but extremely challenging problem to solve in the general sense: given an image of a place, can a human, animal, or robot decide whether or not this image is of a place it has already seen? Whether referring to humans, animals, computers, or robots, there are some fundamental things a place recognition system must have and must do. First, a place recognition system must have an internal representation—a map—of the environment to compare to the incoming visual data. Second, the place recognition system must report a belief about whether or not the current visual information is from a place already included in the map, and if so, which one. Performing visual place recognition can be difficult due to a range of challenges; the appearance of a place can change drastically (see Fig. 1), multiple places in an environment may look very similar, a problem known as perceptual aliasing, and places may not always be revisited from the same viewpoint and position as before.
Fig. 1. Visual place recognition systems must be able to (a) successfully match very perceptually different images while (b) also rejecting incorrect matches between aliased image pairs of different places.
This paper discusses what qualifies as a place in the context of robotic navigation. It then looks at the three key modules that make up the place recognition system: the image processing module, the mapping framework, and the belief generation module. The paper then turns to the problem of changing environments. It revisits each of the modules—the image processing module, the mapping module, and the belief generation module—and investigates how each has to be adapted to incorporate the notion of appearance change into the place recognition system’s model of the world.
The definition of a place depends on the navigation context, and may either be considered as a precise position—“a place describes part of the environment as a zero-dimensional point” (see ), or as a larger area—“a place may also be defined as the abstraction of a region” where a region “represents a two-dimensional subset of the environment” (see ). For example, a room in a building might in some cases qualify as a single place, while in other cases, it might contain many different places. A region could also be defined as a 3-D area, depending on the requirements of the environment or robot. Unlike a robot pose, a place does not have an orientation, and an ongoing challenge in place recognition is pose invariance—ensuring recognition regardless of the orientation of the robot within the place.
2. Pose invariance, i.e. recognizing a place regardless of the robot's orientation, is an open challenge in place recognition.
The location of each place—whether a 1-D point or a larger region—can be selected based on spatial or temporal density. In this approach, a new place is added according to a particular time step, or when the robot has travelled a certain distance. Alternatively, a place can be defined in terms of its appearance. Kuipers and Byun  defined a place as somewhere distinctive relative to other nearby locations, according to some associated sensory information known as a place signature or place description. While the distinctiveness criterion is not always required, a topological place is defined as having a certain appearance configuration ,  and the physical bounds of a place occur where the appearance changes significantly, called a “gateway” .
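1. Vocabulary: "alternatively" means "or, as another option"; "qualitative" means "descriptive rather than numerical".
4. A topological place is defined by a particular appearance configuration? (Not entirely clear what this means.)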
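The density-based selection rule (a new place every time step, or after a certain distance travelled) can be sketched as follows. This is a minimal illustration, assuming poses arrive as hypothetical `(timestamp_s, x_m, y_m)` tuples; the thresholds `min_distance_m` and `max_interval_s` are made-up values, not taken from the survey.

```python
import math
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (timestamp_s, x_m, y_m), hypothetical format


def select_places_by_density(
    poses: List[Pose],
    min_distance_m: float = 2.0,   # illustrative spatial density threshold
    max_interval_s: float = 5.0,   # illustrative temporal density threshold
) -> List[int]:
    """Return the indices of poses at which a new place is created."""
    if not poses:
        return []
    places = [0]                 # the first pose always starts a place
    last_place_t = poses[0][0]
    travelled = 0.0              # path length accumulated since the last place
    for i in range(1, len(poses)):
        t, x, y = poses[i]
        _, px, py = poses[i - 1]
        travelled += math.hypot(x - px, y - py)
        # Create a place when either the distance or the time budget is used up.
        if travelled >= min_distance_m or t - last_place_t >= max_interval_s:
            places.append(i)
            last_place_t = t
            travelled = 0.0
    return places
```

Using path length rather than straight-line distance means a robot circling in place still eventually creates new places; either choice is a design decision, not something the survey fixes.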
This qualitative concept of topological places as regions that are visually homogeneous needs to be quantified—that is, how can a place recognition system actually segment the world into distinct places? Ranganathan  noted that there are similarities with the problem of change-point detection in video segmentation , , and used change-point detection algorithms such as Bayesian surprise  and segmented regression  to define places within a topological map , . A new place is created when the appearance of the environment, determined from the sensor measurements, becomes sufficiently different from the current model of the environment. Similarly, Korrapati et al.  used image sequence partitioning techniques to group visually similar images together as topological graph nodes, while Chapoulie et al.  combined Kalman filtering with the Neyman–Pearson Lemma. Murphy and Sibley  combined dynamic vocabulary building  and incremental topic modeling  to continually learn new topological places in an environment, and Volkov et al.  used coresets  to segment the environment. Topic modeling, coresets, and Bayesian surprise techniques can also be used for other aspects of robotic navigation, such as summarizing a robot’s past experience –, or determining exploration strategies .
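1. Vocabulary: "homogeneous" means uniform in character; "needs to be quantified" means it must be turned into something measurable.
2. This paragraph explains how to quantify topological places, i.e. how a place recognition system segments the world into distinct places that become the nodes of a topological map.
3. The problem resembles change-point detection in video segmentation; change-point detection algorithms include Bayesian surprise and segmented regression.
4. Image sequence partitioning techniques are used to generate the nodes of the topological graph.
5. Kalman filtering combined with the Neyman–Pearson lemma.
6. Dynamic vocabulary building combined with incremental topic modeling to learn new place nodes.
7. Coresets used to segment the environment.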
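The core idea above (create a new place once the current appearance is sufficiently different from the current model) can be illustrated with a much-simplified rule. This is not Bayesian surprise, segmented regression, or any cited method; it is a hypothetical threshold test that compares each frame's descriptor with the running mean of the current place, assuming descriptors are plain numeric vectors and an arbitrary cosine-distance `threshold`.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def segment_places(descriptors: Sequence[Sequence[float]], threshold: float = 0.3) -> List[int]:
    """Assign a place label to each frame; a new label starts when appearance changes."""
    labels: List[int] = []
    place_id = 0
    mean = None  # running mean descriptor of the current place (the "model")
    count = 0
    for raw in descriptors:
        d = _normalize(raw)
        if mean is not None:
            similarity = sum(a * b for a, b in zip(_normalize(mean), d))
            if 1.0 - similarity > threshold:  # appearance differs too much
                place_id += 1
                mean = None                   # start a fresh model for the new place
        if mean is None:
            mean, count = list(d), 1
        else:
            count += 1
            mean = [m + (x - m) / count for m, x in zip(mean, d)]  # incremental mean
        labels.append(place_id)
    return labels
```

A fixed threshold is the crudest possible change-point test; the probabilistic methods cited above replace it with a principled measure of how unexpected the new observation is.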
Appearance-based and density-based place selection methods are practical to implement as they depend on measurable quantities such as distance, time, or sensor values . An ongoing challenge is the enhancement of appearance information with semantic labels such as “door” or “intersection” so places can be selected online based on their value as decision points. The addition of semantic data to maps can improve planning and navigation tasks  and requires place recognition to be linked with other recognition and classification tasks, especially scene classification and object recognition. These relationships are symbiotic: place recognition can improve object detection by providing contextual priming for object detection as well as contextual priors for object localization , and conversely, object recognition can also aid place recognition –, particularly in indoor environments where the function of a place such as “kitchen” or “office” can be inferred from the objects within it, and used to infer the location from a labeled semantic map .
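1. Vocabulary: "are practical to implement" means easy to use in practice; "symbiotic" means mutually beneficial.
2. A challenge for appearance-based methods is enhancing appearance information with semantic labels.
3. Place recognition, scene classification, and object recognition are symbiotic: each can improve the others.
4. This part seems unrelated to loop closure; in loop closure, revisited places lie within a small area, sometimes even within a single image sequence.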
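Visual place description techniques fall into two main categories: one extracts interesting or salient parts of the image, while the other describes the whole scene without any selection, as shown in Fig. 4.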
Fig. 4. Visual place description techniques fall into two broad categories. (a) Interesting or salient parts of the image are selected for extraction, description and storage. For example, SURF  extracts interest points in an image for description. The circles are interest points selected by SURF within this image. The number of possible features may vary depending on the number of interest points detected in the image. (b) Image is described in a predefined way such as the grid shown here without first detecting interest points. Whole-image descriptors such as Gist ,  then process each block regardless of its content.
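Using local feature descriptors first requires detecting keypoints, so there are two steps in total: keypoint detection + feature description. As a result, there are many method combinations built on these two steps. PS: Both keypoint detection and feature description already have many fairly mature methods; I wonder whether there is still room for innovation.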
1. The bag-of-words model ,  increases efficiency by quantizing local features into a vocabulary that can be compared using text retrieval techniques .
2. Images described using the bag-of-words model can be efficiently compared using binary string comparison such as a Hamming distance or histogram comparison techniques.
3. Vocabulary trees  can make the process for large-scale place recognition even more efficient.
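The bag-of-words steps above (quantize local features into visual words, then compare by histogram or binary string) can be sketched as below. This is a toy illustration, assuming a tiny hand-made vocabulary; real systems typically learn the vocabulary by clustering training descriptors (for example with k-means) and add tf-idf weighting and inverted files. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def quantize(features: Sequence[Vector], vocabulary: Sequence[Vector]) -> List[int]:
    """Map each local descriptor to the index of its nearest visual word."""
    return [min(range(len(vocabulary)), key=lambda k: math.dist(f, vocabulary[k]))
            for f in features]


def bow_histogram(features: Sequence[Vector], vocabulary: Sequence[Vector]) -> List[float]:
    """Normalized word-frequency histogram; word positions (geometry) are discarded."""
    counts = Counter(quantize(features, vocabulary))
    total = sum(counts.values()) or 1
    return [counts.get(k, 0) / total for k in range(len(vocabulary))]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram comparison: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def binary_signature(hist: Sequence[float]) -> List[int]:
    """Binary string: 1 if the visual word occurs in the image at all."""
    return [1 if h > 0 else 0 for h in hist]


def hamming(b1: Sequence[int], b2: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(b1, b2))
```

Usage: with `vocab = [[0, 0], [10, 0], [0, 10]]`, an image whose features hit all three words and one that hits only the first two have a histogram intersection of 2/3 and a Hamming distance of 1 between their binary signatures. Because only word counts survive, the description ignores where features lie in the image, which is exactly why it is pose invariant.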
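1. The bag-of-words model ignores the geometric structure of the place being described, so the resulting place description is pose invariant; however, adding geometric information to the place description can improve the robustness of place matching, especially under changing conditions.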
The trade-off between pose invariance—recognizing places regardless of the robot orientation—and condition invariance—recognizing places when the visual appearance changes—has not yet been resolved and is a current challenge in place recognition research.
Nicosevici and Garcia  propose an online method to continuously update the vocabulary based on observations.
Global place descriptors used in early localization systems included color histograms  and descriptors based on principal component analysis .
Global descriptors can be generated from local feature descriptors by predefining the keypoints in the image—for example, using a grid-based pattern—and then using the chosen feature description method on the preselected keypoints.
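Generating a global descriptor from predefined grid keypoints can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a grayscale list of rows, and the hypothetical `describe_patch` (mean and standard deviation of intensities) stands in for whatever real local descriptor, such as SURF, a system would apply at each preselected location.

```python
import statistics
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # grayscale image as a list of rows (assumption)


def describe_patch(patch: Image) -> List[float]:
    """Hypothetical stand-in descriptor: mean and population std of intensities."""
    values = [v for row in patch for v in row]
    return [statistics.fmean(values), statistics.pstdev(values)]


def grid_global_descriptor(image: Image, rows: int, cols: int) -> List[float]:
    """Describe every cell of a fixed grid (no keypoint detection) and concatenate."""
    h, w = len(image), len(image[0])
    ch, cw = h // rows, w // cols  # leftover pixels at the borders are ignored
    descriptor: List[float] = []
    for r in range(rows):
        for c in range(cols):
            patch = [row[c * cw:(c + 1) * cw] for row in image[r * ch:(r + 1) * ch]]
            descriptor.extend(describe_patch(patch))
    return descriptor
```

Because the grid is fixed, the descriptor has the same length for every image and can be compared element by element, but it also ties each element to a fixed image location, which is why such descriptors are sensitive to changes in viewpoint.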
A popular whole-image descriptor is Gist , , which has been used for place recognition on a number of occasions–.
Local and global descriptors each have their own strengths and weaknesses; finding a compromise between the two is one direction for improvement.
1. Local feature descriptors are not restricted to defining a place only in terms of a previous robot pose, but can be recombined to create new places that have not previously been explicitly observed by the robot.
2. Local features can also be combined with metric information to enable metric corrections to localization , , . Global descriptors do not have the same flexibility, and furthermore, whole-image descriptors are more susceptible to change in the robot’s pose than local descriptor methods, as whole-image descriptor comparison methods tend to assume that the camera viewpoint remains similar. This problem can be somewhat ameliorated by the use of circular shifts as in  or by combining a bag-of-words approach with a Gist descriptor on segments of the image , .
3. While global descriptors are more pose dependent than local feature descriptors, local feature descriptors perform poorly when lighting conditions change  and are comprehensively outperformed by global descriptors at performing place recognition in changing conditions , . Using global descriptors on image segments rather than whole images may provide a compromise between the two approaches, as sufficiently large image segments exhibit some of the condition invariance of whole images, and sufficiently small image segments exhibit the pose invariance of local features.
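The circular-shift idea mentioned in point 2 can be sketched as follows. This is an illustration, not the cited system: it assumes a hypothetical whole-image descriptor whose elements are ordered by azimuth (for example one value per column of a panoramic image), so a change in heading shows up as a rotation of the descriptor. Trying every rotation and keeping the smallest distance then makes the comparison insensitive to orientation.

```python
from typing import Sequence, Tuple


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length descriptors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_circular_match(query: Sequence[float], reference: Sequence[float]) -> Tuple[int, float]:
    """Return (best_shift, distance) over all circular shifts of the query descriptor."""
    q = list(query)
    best_shift, best_dist = 0, l1(q, reference)
    for s in range(1, len(q)):
        shifted = q[s:] + q[:s]          # rotate left by s positions
        d = l1(shifted, reference)
        if d < best_dist:
            best_shift, best_dist = s, d
    return best_shift, best_dist
```

Usage: `best_circular_match([4, 5, 1, 2, 3], [1, 2, 3, 4, 5])` recovers a shift of 2 with distance 0, whereas the unshifted L1 distance is 12. The cost is one comparison per possible shift, and it only compensates for rotation about the vertical axis, not for translation.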
In metric localization systems, the appearance-based models must be extended with metric information.
For a place recognition or navigation task, the system needs to refer to a map—a stored representation of the robot’s knowledge of the world—to which the current observation is compared. The map framework differs depending on what data are available and what type of place recognition is being performed. Table I displays a taxonomy of mapping approaches, which depends on the level of physical abstraction in the map and whether or not metric information is included in the place description. The most concrete mapping framework listed is the topological-metric or topometric map. Although it is possible to have a globally metric map, such maps are only feasible in small geographical areas, and there are mechanisms for fusing topometric maps into globally metric maps . Thus, for the purposes of place recognition, any globally metric map can be considered as a one-node topometric map.
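Table I. Mapping frameworks for place recognition.
Middle column: place description types, namely appearance only, appearance with sparse metric information, and appearance with dense metric information.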
The most abstract form of mapping framework for place recognition only stores appearance information about each place in the environment, with no associated position information. Pure image retrieval assumes that matching is based solely on appearance similarity and applies image retrieval techniques from computer vision that are not specific to place-based information . Although valuable information is lost by not including relative position information, there are computationally efficient indexing techniques that can be exploited.
Pure topological maps contain information about relative positions of places but do not store metric information regarding how these places are related , , , . Topological information can be used to both increase the number of correct place matches and filter out incorrect matches , .
Topological information can be used both to increase the number of correct place matches and to filter out incorrect matches.
Topological maps can use a location prior to speed up matching, that is, the place recognition system only has to search places known to be close to the robot’s current position.
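Using a location prior in a topological map can be sketched as a breadth-first search that restricts matching to places within a few edges of the robot's current place. This is a minimal illustration, assuming a hypothetical adjacency-dict graph, one descriptor per place, and a caller-supplied `distance` function; `max_hops` is an illustrative parameter.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Sequence, Set

Graph = Dict[Hashable, List[Hashable]]  # place -> neighbouring places (assumption)


def candidate_places(graph: Graph, current: Hashable, max_hops: int) -> Set[Hashable]:
    """All places reachable from `current` in at most `max_hops` edges (BFS)."""
    hops = {current: 0}
    queue = deque([current])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the search radius
        for nb in graph.get(node, ()):
            if nb not in hops:
                hops[nb] = hops[node] + 1
                queue.append(nb)
    return set(hops)


def match_with_prior(query: Sequence[float], descriptors: Dict[Hashable, Sequence[float]],
                     graph: Graph, current: Hashable, max_hops: int,
                     distance: Callable[[Sequence[float], Sequence[float]], float]) -> Hashable:
    """Best-matching place, searching only the neighbourhood of the current place."""
    candidates = candidate_places(graph, current, max_hops)
    return min(candidates, key=lambda p: distance(query, descriptors[p]))
```

On a chain of places 0-1-2-3-4 with the robot at place 2 and `max_hops=1`, only places {1, 2, 3} are compared, so a query closest to place 4 is matched to place 3 instead. This illustrates the trade-off: the prior saves computation but can miss the true match if the robot's position estimate is wrong (for example after kidnapping), so a global search is still needed as a fallback.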
The addition of topological information into the recognition process allows place recognition using low-resolution data and thus lower memory requirements.
Topological information can reduce memory requirements, using the sparse convex L1-minimization formulation.
As image retrieval can be enhanced by adding topological information, topological maps can be enhanced by including metric information—distance, direction, or both—on the map edges.
Metric information can be added to topological maps. For example, FAB-MAP and SeqSLAM are purely topological systems, but adding odometry information to them improves performance, as in CAT-SLAM and SMART.
The central goal of any place recognition system is reconciling visual input with the stored map data to generate a belief distribution. This distribution provides a measure of likelihood or confidence that the current visual input matches a particular location in the robot’s map representation of the world.
Belief is another key concept in probabilistic robotics: it reflects the robot's internal knowledge about the state of the environment.
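One common way to maintain such a belief distribution over the places in a topological map is a discrete Bayes filter, sketched below. This is an illustration under loudly stated assumptions: the motion model (stay with probability `p_stay`, otherwise move uniformly to a neighbour) and the likelihood `exp(-distance / sigma)` are hypothetical choices, not taken from the survey, and every neighbour in `graph` is assumed to be a key in `belief`.

```python
import math
from typing import Dict, Hashable, List

Belief = Dict[Hashable, float]


def predict(belief: Belief, graph: Dict[Hashable, List[Hashable]], p_stay: float = 0.5) -> Belief:
    """Prediction step: spread probability mass along topological edges."""
    new = {p: 0.0 for p in belief}
    for p, b in belief.items():
        neighbours = graph.get(p, [])
        if not neighbours:
            new[p] += b  # isolated place: the mass stays where it is
            continue
        new[p] += p_stay * b
        for nb in neighbours:
            new[nb] += (1.0 - p_stay) * b / len(neighbours)
    return new


def update(belief: Belief, distances: Dict[Hashable, float], sigma: float = 1.0) -> Belief:
    """Update step: weight by appearance likelihood, then renormalize to sum to 1."""
    posterior = {p: b * math.exp(-distances[p] / sigma) for p, b in belief.items()}
    total = sum(posterior.values())
    if total == 0.0:  # every likelihood underflowed: fall back to a uniform belief
        return {p: 1.0 / len(posterior) for p in posterior}
    return {p: v / total for p, v in posterior.items()}
```

Usage: on a three-place chain with a uniform prior, `predict` concentrates mass on the middle place (0.25, 0.5, 0.25), and an `update` with descriptor distances `{0: 3.0, 1: 0.2, 2: 2.0}` keeps place 1 as the most likely match. The normalized posterior is the "measure of likelihood or confidence" described above; a match is typically reported only when its posterior exceeds a threshold.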
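Results are evaluated with precision-recall: typically precision is held at 100% and the recall values are then compared. In other words, false positives are eliminated as far as possible, and recall at 100% precision was the key metric for place recognition success.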
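Recall at 100% precision can be computed directly from a list of scored match proposals, as sketched below. The function name and inputs are hypothetical: `scores` are the system's confidences, `is_correct` marks whether each proposed match is a true revisit, and `num_positives` is the total number of ground-truth revisits (including revisits for which no match was proposed). The best threshold sits just above the highest-scoring false positive, so recall is the fraction of true matches scored strictly above it.

```python
from typing import Sequence


def recall_at_100_precision(scores: Sequence[float], is_correct: Sequence[bool],
                            num_positives: int) -> float:
    """Maximum recall achievable by a score threshold that admits no false positives."""
    false_scores = [s for s, ok in zip(scores, is_correct) if not ok]
    highest_false = max(false_scores) if false_scores else float("-inf")
    # Any threshold just above the best false positive keeps precision at 100%.
    true_positives = sum(1 for s, ok in zip(scores, is_correct) if ok and s > highest_false)
    return true_positives / num_positives if num_positives else 0.0
```

Usage: with scores `[0.9, 0.8, 0.7, 0.6, 0.5]`, where only the 0.7 match is wrong, and 5 true revisits, the result is 2/5 = 0.4. A single confident false positive caps the metric, which is why this criterion pushes systems toward conservative matching.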
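However, several methods that use topological information to correct false-positive matches have since been proposed –, and attention has shifted from eliminating all false positives to finding many potential place matches and then correcting any mismatches in a topological post-processing step.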
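A topological post-processing step of this kind can be illustrated with a simple sequence-consistency filter. This is a hypothetical example, not any of the cited methods: it assumes place indices follow the order of the mapped route and the robot retraverses it in the same direction, and it keeps a match only if the last `window` matches advance through the map by at most `max_step` places per frame.

```python
from typing import List, Optional, Sequence


def sequence_filter(matches: Sequence[Optional[int]], window: int = 3,
                    max_step: int = 1) -> List[Optional[int]]:
    """Keep a frame's matched place index only if recent matches form a consistent sequence."""
    kept: List[Optional[int]] = []
    for t in range(len(matches)):
        if t + 1 < window:            # not enough history yet: reject
            kept.append(None)
            continue
        seq = matches[t - window + 1:t + 1]
        if None in seq:               # a missing match breaks the sequence
            kept.append(None)
            continue
        consistent = all(0 <= b - a <= max_step for a, b in zip(seq, seq[1:]))
        kept.append(matches[t] if consistent else None)
    return kept
```

Usage: for raw matches `[10, 11, 12, 40, 13, 14, 15]`, the aliased match to place 40 is rejected, as are the frames whose window still contains it, giving `[None, None, 12, None, None, None, 15]`. The filter trades some recall for precision, which is the shift in emphasis described above: generate many candidate matches, then let topology remove the inconsistent ones.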