Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
For camera pose estimation in known scenes, we argue that deep networks should focus on learning robust and invariant visual features, while the geometric estimation should be left to principled algorithms.
Our approach is based on the direct alignment of multiscale deep features, casting camera localization as metric learning.
The system can not only localize given a coarse pose prior, but also improve the accuracy of sparse feature matching.
Inspired by direct image alignment [22, 26, 27, 63, 90, 91] and learned image representations for outlier rejection [42], we argue that end-to-end visual localization algorithms
should focus on representation learning.
Thus, the network does not need to learn pose regression itself, but only to extract suitable features, which makes the algorithm accurate and scene-agnostic.
PixLoc localizes by aligning query and reference images.
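The direct feature alignment described above can be illustrated with a minimal sketch: Gauss-Newton minimization of a feature-metric residual between reference features and features sampled in the query image at projected 3D points. Everything below is an illustrative assumption rather than the paper's implementation: a hand-built smooth "feature map" stands in for learned CNN features, the pose is reduced to a 3-DoF translation for brevity, and the Jacobian is computed numerically, whereas PixLoc optimizes a full 6-DoF pose over multiscale learned features with analytic derivatives.

```python
import numpy as np

def project(points, t, fx=100.0, cx=32.0, cy=32.0):
    """Pinhole projection of 3D points after a candidate camera translation t."""
    p = points + t
    return np.stack([fx * p[:, 0] / p[:, 2] + cx,
                     fx * p[:, 1] / p[:, 2] + cy], axis=1)

def bilinear(fmap, uv):
    """Bilinearly sample an (H, W, C) feature map at continuous (u, v) locations."""
    H, W, _ = fmap.shape
    u = np.clip(uv[:, 0], 0, W - 2)
    v = np.clip(uv[:, 1], 0, H - 2)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    return ((1 - du) * (1 - dv) * fmap[v0, u0] +
            du * (1 - dv) * fmap[v0, u0 + 1] +
            (1 - du) * dv * fmap[v0 + 1, u0] +
            du * dv * fmap[v0 + 1, u0 + 1])

def align(points, ref_feats, query_fmap, t_init, iters=50):
    """Gauss-Newton on the feature-metric residual, here over translation only."""
    t, eps = t_init.astype(float).copy(), 1e-4
    for _ in range(iters):
        r = (bilinear(query_fmap, project(points, t)) - ref_feats).ravel()
        J = np.empty((r.size, 3))
        for k in range(3):  # numerical Jacobian of the residual w.r.t. t
            dt = np.zeros(3)
            dt[k] = eps
            r_k = (bilinear(query_fmap, project(points, t + dt)) - ref_feats).ravel()
            J[:, k] = (r_k - r) / eps
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        t += step
        if np.linalg.norm(step) < 1e-9:
            break
    return t

# Synthetic example: a smooth 3-channel "feature map" standing in for CNN features.
rng = np.random.default_rng(0)
H = W = 64
uu, vv = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
fmap = np.stack([np.sin(uu / 6), np.cos(vv / 6), np.sin((uu + vv) / 9)], axis=-1)

points = np.column_stack([rng.uniform(-0.5, 0.5, 50),
                          rng.uniform(-0.5, 0.5, 50),
                          rng.uniform(3.0, 5.0, 50)])
t_true = np.array([0.10, -0.05, 0.20])
ref_feats = bilinear(fmap, project(points, t_true))  # features at the true pose

# Start from a perturbed pose prior and recover the true translation.
t_est = align(points, ref_feats, fmap, t_true + np.array([0.05, -0.05, 0.10]))
print(np.round(t_est, 4))
```

The sketch also hints at why smooth, robust features matter: the smoother the feature map, the wider the basin of convergence of the alignment, which is exactly the property the paper proposes to learn end-to-end.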