We use a Bayesian convolutional neural network to regress the 6-DOF camera pose from a single RGB image. It is trained in an end-to-end manner with no need of additional engineering or graph optimisation.
We leverage the uncertainty measure to estimate metric relocalization error and to detect the presence or absence of the scene in the input image. We show that the model’s uncertainty is caused by images being dissimilar to the training dataset in either pose or appearance.
PoseNet removes the need for separate mechanisms for appearance based relocalization and metric pose estimation. Furthermore it does not need to store key frames, or establish frame to frame correspondence. It does this by mapping monocular images to a high dimensional space which is linear in pose and robust to annoyance variables. We can then regress the full 6-DOF camera pose from this representation. This allows us to regress camera pose from the image directly, without the need of tracking or landmark matching.
The main contribution of this paper is extending this framework to a Bayesian model which is able to determine the uncertainty of localization. Our Bayesian convolutional neural network requires no additional memory, and can relocalize in under 6ms per frame on a GPU. By leveraging this probabilistic approach, we