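Motivation
To solve the loop closure detection problem in SLAM, this is a network that uses LiDAR as its main sensor. It drops the traditional bag-of-words model and extracts descriptors directly with a neural network (NN), and it judges data similarity on the basis of optimal transport (OT) theory. In addition, the NN estimates the rotation/translation (R/t) transformation between two frames.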
Overall network architecture
LCDNet is composed of a shared encoder, a place recognition head that extracts global descriptors, and a relative pose head that estimates the transformation between two point clouds. We introduce a novel relative pose head based on the unbalanced optimal transport theory that we implement in a differentiable manner to allow for end-to-end training.
Feature extraction
We build the feature extractor stream of our network based upon the PV-RCNN.
The input to the network is a point cloud $P \in \mathbb{R}^{J \times 4}$ ($J$ points with 4 values each: x, y, z, and intensity). The output of our feature extractor network is a set of $N$ keypoints' features $F^P = \{ f_1^P, \ldots, f_N^P \}$, where $f_i^P \in \mathbb{R}^D$ is the $D$-dimensional feature vector for the $i$-th keypoint.
The input is the raw point cloud; the output is the feature vectors of the N points that remain after downsampling.
We downsample the point cloud using the Farthest Point Sampling (FPS) algorithm [54] to select N uniformly distributed keypoints.
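The FPS algorithm downsamples the raw point cloud to N points. In the VSA (Voxel Set Abstraction) module, the raw points and the outputs of the 4 feature layers are combined to form the feature vector of each keypoint.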
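To make the downsampling step concrete, here is a minimal pure-Python sketch of greedy farthest point sampling. It is an illustration only, not the LCDNet or PV-RCNN implementation: the function name `farthest_point_sampling`, the `start_index` parameter, and the choice to measure distance on x, y, z only (ignoring intensity) are assumptions made for this example.

```python
import math


def farthest_point_sampling(points, n_samples, start_index=0):
    """Return the indices of n_samples points picked by greedy FPS.

    points: list of tuples whose first three values are x, y, z.
    Any further channels (e.g. intensity) are ignored for the distance
    computation; that is an assumption of this sketch.
    """
    if not 0 < n_samples <= len(points):
        raise ValueError("n_samples must be between 1 and len(points)")

    selected = [start_index]
    # min_dist[j]: squared distance from point j to its nearest selected keypoint
    min_dist = [math.inf] * len(points)

    while len(selected) < n_samples:
        last = points[selected[-1]][:3]
        # Only the most recently added keypoint can shrink the distances.
        for j, p in enumerate(points):
            d = sum((a - b) ** 2 for a, b in zip(p[:3], last))
            if d < min_dist[j]:
                min_dist[j] = d
        # The next keypoint is the point farthest from the selected set.
        selected.append(max(range(len(points)), key=min_dist.__getitem__))
    return selected


# Toy cloud: five points with (x, y, z, intensity)
cloud = [
    (0.0, 0.0, 0.0, 0.1),
    (1.0, 0.0, 0.0, 0.2),
    (10.0, 0.0, 0.0, 0.3),
    (0.0, 10.0, 0.0, 0.4),
    (0.5, 0.0, 0.0, 0.5),
]
keypoint_indices = farthest_point_sampling(cloud, 3)
```

Each iteration scans all J points, so the cost is O(N·J); for real LiDAR scans this step is normally run with a vectorized or GPU implementation rather than a Python loop.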
Formal description of feature extraction:
For every selected keypoint $kp_i$ and every layer $l$ of the pyramid feature map, the keypoint features $f_i^l$ are computed as
$$f_i^l = \mathrm{MP}\big(G\big(M(S_i^l)\big)\big)$$
where MP is the max-pooling operation, $G$ denotes a Multi-Layer Perceptron (MLP), and $M$ randomly samples the set of neighbor voxel features $S_i^l$, which is computed as