Strategy 2. Local Feature Aggregation (LFA)
LocSE + Attentive Pooling + residual block
1. Network Architecture Overview
The full network consists of an encoder, a decoder, and a head. The LFA module is used in the encoder, where it extracts features for the randomly sampled points and compensates for the information that random sampling throws away. Besides LFA, the author also uses residual blocks. The code for each component is given below, starting with a rough sketch of the encoder loop.
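As orientation before the individual modules, the encoder can be sketched as a loop that alternates a dilated residual block with random subsampling. This is a minimal sketch assuming the helper names used later in this post (dilated_res_block, random_sample) and an inputs dict of precomputed per-layer coordinates and neighbour indices; the layer widths are illustrative, not the official configuration.

def encoder(self, feature, inputs, is_training):
    f_encoder_list = []
    # d_out per encoder layer -- illustrative values, not the official config
    for i, d_out in enumerate([16, 64, 128, 256]):
        # feature extraction on the current point set (Section 2 below)
        f = self.dilated_res_block(feature, inputs['xyz'][i], inputs['neigh_idx'][i],
                                   d_out, 'Encoder_layer_' + str(i), is_training)
        # drop to the precomputed random subset of points for the next layer
        feature = self.random_sample(f, inputs['sub_idx'][i])
        f_encoder_list.append(feature)
    return f_encoder_list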
2. Dilated Residual Block
The block splits into two paths, a mainstream and a shortcut, which are summed at the end to compensate for information loss.
mainstream: MLP(d_out/2) + LFA(d_out) + MLP(d_out*2)
shortcut: MLP(d_out*2)
residual_output = LeakyReLU(mainstream + shortcut)
After the residual block, the point cloud is downsampled and passed to the next layer (the point subset each layer keeps, together with its N nearest neighbours, is recorded before training starts); see the sketch after the code block below.
Code:
def dilated_res_block(self, feature, xyz, neigh_idx, d_out, name, is_training):
    # mainstream: 1x1 conv down to d_out//2, LFA, then expand to d_out*2 (no activation)
    f_pc = helper_tf_util.conv2d(feature, d_out // 2, [1, 1], name + 'mlp1', [1, 1], 'VALID', True, is_training)
    f_pc = self.building_block(xyz, f_pc, neigh_idx, d_out, name + 'LFA', is_training)
    f_pc = helper_tf_util.conv2d(f_pc, d_out * 2, [1, 1], name + 'mlp2', [1, 1], 'VALID', True, is_training,
                                 activation_fn=None)
    # shortcut: 1x1 conv straight from the input features to d_out*2 (no activation)
    shortcut = helper_tf_util.conv2d(feature, d_out * 2, [1, 1], name + 'shortcut', [1, 1], 'VALID',
                                     activation_fn=None, bn=True, is_training=is_training)
    # sum the two paths and apply LeakyReLU
    return tf.nn.leaky_relu(f_pc + shortcut)
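The downsampling step itself is not shown in this post. Below is a minimal sketch, assuming the precomputed pool_idx described above (for each kept point, the indices of its nearest neighbours in the denser set) and following the gather-then-max-pool pattern of the RandLA-Net reference code; the exact signature is an assumption.

@staticmethod
def random_sample(feature, pool_idx):
    # feature:  [B, N, 1, d] features of the current (denser) point set
    # pool_idx: [B, N', K]   precomputed neighbour indices of the N' kept points
    # returns:  [B, N', 1, d]
    feature = tf.squeeze(feature, axis=2)
    num_neigh = tf.shape(pool_idx)[-1]
    d = feature.get_shape()[-1]
    batch_size = tf.shape(pool_idx)[0]
    pool_idx = tf.reshape(pool_idx, [batch_size, -1])
    # gather the K neighbours of every kept point, then max-pool over them
    pool_features = tf.batch_gather(feature, pool_idx)
    pool_features = tf.reshape(pool_features, [batch_size, -1, num_neigh, d])
    pool_features = tf.reduce_max(pool_features, axis=2, keepdims=True)
    return pool_features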
3. The LFA Module
The LFA module is composed of Local Spatial Encoding (LocSE) and Attentive Pooling, stacked twice:
LocSE + Attentive Pooling + LocSE + Attentive Pooling
3.1 LocSE (Local Spatial Encoding):
Step 1: relative position encoding. Concatenate the centre point, its neighbour points, their relative coordinates, and the Euclidean distances, then apply a 1x1 convolution so the result has the same dimension as the input point features, giving the encoded relative-position feature.
Step 2: spatially encoded feature. Concatenating the encoded relative-position feature with the neighbours' point features yields the spatially encoded feature.
LocSE code:
def relative_pos_encoding(self, xyz, neigh_idx):
    # gather the xyz coordinates of each point's K neighbours: [B, N, K, 3]
    neighbor_xyz = self.gather_neighbour(xyz, neigh_idx)
    # tile each centre point so it lines up with its K neighbours: [B, N, K, 3]
    xyz_tile = tf.tile(tf.expand_dims(xyz, axis=2), [1, 1, tf.shape(neigh_idx)[-1], 1])
    relative_xyz = xyz_tile - neighbor_xyz
    # Euclidean distance from centre to each neighbour: [B, N, K, 1]
    relative_dis = tf.sqrt(tf.reduce_sum(tf.square(relative_xyz), axis=-1, keepdims=True))
    # concat distance, relative offset, centre and neighbour coordinates: [B, N, K, 10]
    relative_feature = tf.concat([relative_dis, relative_xyz, xyz_tile, neighbor_xyz], axis=-1)
    return relative_feature
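The gather_neighbour helper used above is not shown in this post. Here is a minimal sketch consistent with how it is called, indexing a [B, N, d] tensor with [B, N, K] neighbour indices; the reference implementation may differ in detail.

@staticmethod
def gather_neighbour(pc, neighbor_idx):
    # pc:           [B, N, d]  per-point coordinates or features
    # neighbor_idx: [B, N, K]  indices of each point's K nearest neighbours
    # returns:      [B, N, K, d]
    batch_size = tf.shape(pc)[0]
    num_points = tf.shape(pc)[1]
    d = pc.get_shape()[2].value
    index_input = tf.reshape(neighbor_idx, shape=[batch_size, -1])
    features = tf.batch_gather(pc, index_input)
    features = tf.reshape(features, [batch_size, num_points, tf.shape(neighbor_idx)[-1], d])
    return features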
def building_block(self, xyz, feature, neigh_idx, d_out, name, is_training):
    d_in = feature.get_shape()[-1].value
    # --- first LocSE + Attentive Pooling ---
    # encode relative positions, then project them to d_in so dimensions match
    f_xyz = self.relative_pos_encoding(xyz, neigh_idx)
    f_xyz = helper_tf_util.conv2d(f_xyz, d_in, [1, 1], name + 'mlp1', [1, 1], 'VALID', True, is_training)
    # concat neighbour features with the position encoding, pool down to d_out//2
    f_neighbours = self.gather_neighbour(tf.squeeze(feature, axis=2), neigh_idx)
    f_concat = tf.concat([f_neighbours, f_xyz], axis=-1)
    f_pc_agg = self.att_pooling(f_concat, d_out // 2, name + 'att_pooling_1', is_training)
    # --- second LocSE + Attentive Pooling ---
    # re-project the position encoding to d_out//2 (xyz -> d_in -> d_out//2)
    f_xyz = helper_tf_util.conv2d(f_xyz, d_out // 2, [1, 1], name + 'mlp2', [1, 1], 'VALID', True, is_training)
    f_neighbours = self.gather_neighbour(tf.squeeze(f_pc_agg, axis=2), neigh_idx)
    f_concat = tf.concat([f_neighbours, f_xyz], axis=-1)
    f_pc_agg = self.att_pooling(f_concat, d_out, name + 'att_pooling_2', is_training)
    return f_pc_agg
As the code shows, the whole building_block (LFA) runs LocSE and Attentive Pooling twice.
The second round re-encodes the relative positions of the input point coordinates, this time projecting the encoding to length d_out//2 (xyz --> f_xyz(d_in) --> f_xyz(d_out//2)). It is concatenated with the features produced by the first fuse-and-pool round (also of length d_out//2) and pooled again, now to d_out.
3.2 Attentive Pooling:
Idea: generate weights of the same dimension from the points' own features, so each neighbour gets its own importance, then aggregate with a weighted sum.
Structure: MLP + weighted sum + MLP
Steps: the features first pass through a simple MLP (a dense layer), softmax then scores each of the K neighbour points, the features are multiplied by these attention scores, and reduce_sum fuses all neighbour features into one. Finally a shared MLP reshapes the dimension, producing a single local feature for the centre point.
Attentive Pooling code:
def att_pooling(feature_set, d_out, name, is_training):
    # feature_set: [B, N, K, d] concatenated neighbour features
    batch_size = tf.shape(feature_set)[0]
    num_points = tf.shape(feature_set)[1]
    num_neigh = tf.shape(feature_set)[2]
    d = feature_set.get_shape()[3].value
    f_reshaped = tf.reshape(feature_set, shape=[-1, num_neigh, d])
    # score each of the K neighbours with a shared dense layer + softmax over K
    att_activation = tf.layers.dense(f_reshaped, d, activation=None, use_bias=False, name=name + 'fc')
    att_scores = tf.nn.softmax(att_activation, axis=1)
    # attention-weighted sum over the K neighbours
    f_agg = f_reshaped * att_scores
    f_agg = tf.reduce_sum(f_agg, axis=1)
    f_agg = tf.reshape(f_agg, [batch_size, num_points, 1, d])
    # shared MLP maps the aggregated feature to d_out
    f_agg = helper_tf_util.conv2d(f_agg, d_out, [1, 1], name + 'mlp', [1, 1], 'VALID', True, is_training)
    return f_agg
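To make the weighting concrete, here is a tiny NumPy re-enactment of the core of att_pooling (dense scoring, softmax over the neighbours, weighted sum) for a single centre point with K=3 neighbours and d=2 features. The weight matrix W is random here purely for illustration; in the real layer it is learned.

import numpy as np

rng = np.random.default_rng(0)

# one centre point with K=3 neighbours, each carrying d=2 features
f = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])                 # [K, d]

W = rng.standard_normal((2, 2))            # learned in the real layer

scores = f @ W                             # dense layer without bias: [K, d]
scores = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)  # softmax over K

f_agg = (f * scores).sum(axis=0)           # attention-weighted sum over neighbours
print(f_agg)                               # [d] aggregated feature for the centre point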