Learning Matchable Image Transformations for Long-term Metric Visual Localization

color constancy theory (color constancy, automatic white balance)

https://blog.csdn.net/jtop0/article/details/7209702?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-5&spm=1001.2101.3001.4242

Illuminant estimation resources:

http://colorconstancy.com/

 

 

2019 RAL Learning Matchable Image Transformations for Long-term Metric Visual Localization

The paper learns a drop-in replacement for the standard RGB-to-grayscale colorspace mapping used to pre-process RGB images for conventional feature detection/matching algorithms.

It builds upon prior work on color constancy theory.

 

The core idea is a mapping from the RGB colorspace onto a grayscale colorspace, trained with an appropriate objective function that should ideally be tied to the performance of the target localization pipeline.

 

The goal is to map the RGB colorspace onto a grayscale colorspace that explicitly maximizes a chosen performance metric of a vision-based localization pipeline.
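For reference, the fixed mapping being replaced can be sketched as follows. This is a minimal NumPy illustration; the BT.601 luma weights shown are the common default (e.g., in OpenCV's grayscale conversion) and are an assumption here, not taken from the paper.

```python
import numpy as np

# Standard fixed RGB-to-grayscale projection (ITU-R BT.601 luma weights).
# The paper's learned mapping is a drop-in replacement for this step.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])  # weights for R, G, B

def rgb_to_gray(rgb):
    """Baseline mapping: per-pixel dot product with fixed luma weights."""
    return rgb @ LUMA_WEIGHTS

rgb = np.random.rand(4, 4, 3)   # toy H x W x 3 image in [0, 1]
gray = rgb_to_gray(rgb)
assert gray.shape == (4, 4)
```

Because this projection is fixed and illumination-agnostic, the same scene imaged under different lighting can produce very different grayscale images, which is exactly what the learned mapping tries to avoid.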

We investigate two approaches to formulating such a mapping:

1. a single function, similar to [11], [13], [14]:

  • Robust monocular visual teach and repeat aided by local ground planarity and color-constant imagery, 2017
  • Robust, long-term visual localisation using illumination invariance, 2014
  • Expanding the limits of vision-based localization for long-term route-following autonomy, 2017

2. a parametrized function tailored to the specific image pair.

Additionally, the functional form of either mapping may be specified analytically (e.g., from physics) or learned from data using a function approximator such as a neural network.

 

In the absence of accurate ground truth data, we might instead choose to maximize the number or quality of feature matches in the front-end of a feature-based localization pipeline. 

 

 In this work we learn an objective function by training a deep convolutional neural network (CNN) to act as a differentiable proxy to the localization front-end.

This proxy network can then be used to define a fully differentiable objective function, allowing us to train a nonlinear colorspace mapping using gradient-based methods.

 

Related Work

Appearance robustness in metric visual localization has previously been studied from the perspective of illumination invariance 

 

[11]–[14] use hand-engineered image transformations to improve feature matching over time:

  • [12] Dealing with shadows: Capturing intrinsic scene appearance for image-based outdoor localisation, 2013

[15], [16] use affine models and other simple analytical transformations:

  • [16] Illumination change robustness in direct visual SLAM

[17]–[20] have focused on learning feature descriptors that are robust to certain types of appearance change:

  • Learning place-dependant features for long-term vision-based localisation
  • Made to measure: Bespoke landmarks for 24-hour, all-weather localisation with a camera
  • Image features for visual teach-and-repeat navigation in changing environments
  • Learning place-and-time-dependent binary descriptors for long-term visual localization

 

Image-to-image translation [7], [8]:

  • Image-to-image translation with conditional adversarial networks
  • Unpaired image-to-image translation using cycle-consistent adversarial networks

In [21], the authors train a convolutional encoder-decoder network to enhance the temporal consistency of image streams; here the main source of appearance change is the camera itself.

 

Closest approaches include [5], which learns a many-to-one mapping onto a privileged appearance condition, and [6], which learns multiple pairwise mappings between appearance categories such as day and night.

  • How to train a CAT: Learning canonical appearance transformations for direct visual localization under illumination change
  • Adversarial training for adverse conditions: Robust metric localisation using appearance transfer

Related is appearance-invariant place recognition [22], [23], which typically relies on patch matching or whole-image statistics to identify images corresponding to nearby physical locations:

  • Addressing challenging place recognition tasks using generative adversarial networks
  • Night-to-day image translation for retrieval-based localization

However, [5], [6], [21] require well-aligned training images exhibiting appearance variation, which are difficult to obtain at scale in the real world, and it is not clear how categorical appearance mappings such as [6], [22], [23] should be applied to continuous appearance change in long-term deployments.

 


 

A. Differentiable Matcher Proxy

We consider the task of training a CNN Mθ, with parameters θ, to predict the number of inlier feature matches returned by a non-differentiable feature detector/matcher M for a given image pair.
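The proxy idea can be sketched as follows. This is a toy illustration, not the paper's architecture: a small two-layer dense network stands in for the CNN Mθ, and a synthetic black box stands in for the matcher M. The proxy is fit by gradient descent to regress the matcher's inlier counts, which is what makes the downstream objective differentiable.

```python
import numpy as np

rng = np.random.default_rng(0)

def matcher(pair_feat):
    # Non-differentiable stand-in for detect / match / RANSAC inlier counting.
    return float(pair_feat.sum())

X = rng.random((256, 8))               # synthetic image-pair features
y = np.array([matcher(x) for x in X])  # labels from the black-box matcher

# Tiny proxy network: one ReLU hidden layer, linear output.
W1 = rng.normal(0, 0.1, (8, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (16, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.maximum(0, X @ W1 + b1)     # ReLU hidden layer
    return h, (h @ W2 + b2).ravel()

_, pred0 = forward(X)
loss_before = np.mean((pred0 - y) ** 2)

lr = 0.01
for _ in range(3000):                  # plain gradient descent on MSE
    h, pred = forward(X)
    err = pred - y
    gW2 = h.T @ err[:, None] / len(X)
    gh = (err[:, None] @ W2.T) * (h > 0)
    gW1 = X.T @ gh / len(X)
    W2 -= lr * gW2; b2 -= lr * err.mean()
    W1 -= lr * gW1; b1 -= lr * gh.mean(0)

_, pred1 = forward(X)
loss_after = np.mean((pred1 - y) ** 2)
```

Once trained, the proxy's prediction (not the matcher itself) can be differentiated with respect to its input images, so gradients can flow back into a colorspace mapping placed in front of it.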


B. Physically Motivated Transformation
Prior work in [9] has shown that under the assumptions of a single black-body illuminant and an infinitely narrow sensor response function, an appropriately weighted linear combination of the log-responses of a three-channel (e.g., RGB) camera represents a projection onto an invariant one-dimensional chromaticity space that is independent of both the intensity and color temperature of the illuminant, and depends only on the imaging sensor and the materials in the scene.

  • [9] Study of the photodetector characteristics of a camera for color constancy in natural scenes, 2010

Grayscale images generated using this procedure are somewhat resistant to variations in lighting and shadow, and have been shown to improve stereo localization quality in the presence of shadows and changing daytime lighting conditions [11], [13], [14], but have not been successful in adapting to nighttime navigation with headlights.
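Under those assumptions, the invariant can be sketched as a weighted combination of per-channel log-responses. This is a hedged NumPy illustration; the channel ordering and the value of alpha below are illustrative assumptions (alpha is a sensor-dependent constant), not values taken from the paper.

```python
import numpy as np

ALPHA = 0.48  # illustrative sensor-dependent weight

def color_constant_gray(rgb, alpha=ALPHA, eps=1e-6):
    """rgb: H x W x 3 float image in (0, 1]. Returns the H x W invariant image."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Weighted log-channel combination: the log of a per-pixel illuminant
    # scale factor k cancels, since 1 - alpha - (1 - alpha) = 0.
    return np.log(g + eps) - alpha * np.log(b + eps) - (1 - alpha) * np.log(r + eps)

# Invariance check: globally dimming the image (constant illuminant
# intensity change) leaves the mapped image essentially unchanged.
img = np.random.rand(8, 8, 3) * 0.9 + 0.05
out1 = color_constant_gray(img)
out2 = color_constant_gray(0.5 * img)
assert np.allclose(out1, out2, atol=1e-3)
```

The cancellation only holds per multiplicative illuminant factor; scene content and sensor properties still determine the output, which is the point of the invariant.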

 

The authors relax the constraints in equation (3) and generalize equation (2) to a broader parametrized form.

 

C. Learned Nonlinear Transformations

While the assumption of a single black-body illuminant in [9] is reasonable for daytime navigation where the dominant light source is the sun, it does not hold in many common navigation scenarios such as nighttime driving with headlights. 

Moreover, the assumption of an infinitely narrow sensor response is unrealistic for real cameras.

 

We investigate the possibility of learning a bespoke nonlinear mapping that maximizes matchability for a particular combination of imaging sensor, estimator, and environment.

We consider two versions of this MLP-based transformation, both with and without an additional pairwise context feature obtained from the encoder network Eφ.
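A minimal sketch of such a pixel-wise MLP mapping, with an optional pairwise context vector, might look as follows. The layer sizes, the sigmoid output, and the 4-dimensional context are assumptions for illustration; the encoder Eφ itself is not modeled here, only the context vector it would produce.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pixel-wise MLP: each RGB pixel (optionally concatenated with a shared
# pairwise context vector) is mapped independently to a grayscale value.
W1 = rng.normal(0, 0.5, (3 + 4, 16)); b1 = np.zeros(16)  # 4-dim context assumed
W2 = rng.normal(0, 0.5, (16, 1));     b2 = np.zeros(1)

def mlp_mapping(rgb, context=None):
    """rgb: H x W x 3. context: optional 4-vector shared by every pixel."""
    h, w, _ = rgb.shape
    x = rgb.reshape(-1, 3)
    ctx = np.zeros(4) if context is None else context
    x = np.concatenate([x, np.broadcast_to(ctx, (x.shape[0], 4))], axis=1)
    hidden = np.tanh(x @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))  # sigmoid keeps output in (0, 1)
    return out.reshape(h, w)

img = rng.random((6, 6, 3))
g_plain = mlp_mapping(img)                        # pixel-wise mapping only
g_ctx = mlp_mapping(img, context=rng.random(4))   # pair-conditioned variant
assert g_plain.shape == (6, 6) and g_ctx.shape == (6, 6)
```

The context-free version corresponds to a single shared mapping, while feeding a per-pair context vector lets the same network realize a different grayscale mapping for each image pair.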

 

IV. EXPERIMENTS

 

Match counts improved in most experiments, with the greatest improvements generally obtained from the SumLog and SumLog-E transformations.

We saw little improvement in match counts on the RobotCar/Overcast-Night experiment, which we attribute to motion blur in the nighttime images making feature matching exceptionally difficult.

 

Fig. 4 shows the outputs of each image transformation for sample RGB image pairs in the VKITTI/0020 Morning and Sunset sequences (Fig. 4(a)) and the challenging sequence InTheDark/0041 (Fig. 4(b)).

We see that each model produced image pairs that are visually more consistent than standard Gray images, and that local illumination variations such as shadows, uneven lighting, and specular reflections were minimized by optimizing equation (6).

 

 

 

 
