Code source: https://github.com/QingyongHu/RandLA-Net/blob/master/RandLANet.py
The function dilated_res_block(self, feature, xyz, neigh_idx, d_out, name, is_training) implements the Dilated Residual Block described in the paper. The code is as follows:
def dilated_res_block(self, feature, xyz, neigh_idx, d_out, name, is_training):
    f_pc = helper_tf_util.conv2d(feature, d_out // 2, [1, 1], name + 'mlp1', [1, 1], 'VALID', True, is_training)
    f_pc = self.building_block(xyz, f_pc, neigh_idx, d_out, name + 'LFA', is_training)
    f_pc = helper_tf_util.conv2d(f_pc, d_out * 2, [1, 1], name + 'mlp2', [1, 1], 'VALID', True, is_training,
                                 activation_fn=None)
    shortcut = helper_tf_util.conv2d(feature, d_out * 2, [1, 1], name + 'shortcut', [1, 1], 'VALID',
                                     activation_fn=None, bn=True, is_training=is_training)
    return tf.nn.leaky_relu(f_pc + shortcut)
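Here feature has shape [batch_size, N, 1, d_in] (the code squeezes axis 2, so that axis must have size 1), where N is the number of points, d_in is the input feature dimension, d_out is the output-dimension argument, and K is the number of neighbours. Note that the block's final output has 2 * d_out channels, because both mlp2 and the shortcut map to d_out * 2.
1 After the shared MLP, f_pc has shape [batch_size, N, 1, d_out/2].
2 The main part is the building_block module. Its inputs are xyz with shape [b, N, 3] and neigh_idx with shape [b, N, K]; inside it, d_in = d_out/2 and feature = f_pc.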
def building_block(self, xyz, feature, neigh_idx, d_out, name, is_training):
    d_in = feature.get_shape()[-1].value
    # First unit: encode positions to d_in channels, then concatenate with neighbour features
    f_xyz = self.relative_pos_encoding(xyz, neigh_idx)  # [b, N, K, 10]
    f_xyz = helper_tf_util.conv2d(f_xyz, d_in, [1, 1], name + 'mlp1', [1, 1], 'VALID', True, is_training)
    f_neighbours = self.gather_neighbour(tf.squeeze(feature, axis=2), neigh_idx)  # [b, N, K, d_in]
    f_concat = tf.concat([f_neighbours, f_xyz], axis=-1)  # [b, N, K, 2 * d_in]
    f_pc_agg = self.att_pooling(f_concat, d_out // 2, name + 'att_pooling_1', is_training)  # [b, N, 1, d_out // 2]
    # Second unit: same pattern, applied to the pooled features
    f_xyz = helper_tf_util.conv2d(f_xyz, d_out // 2, [1, 1], name + 'mlp2', [1, 1], 'VALID', True, is_training)
    f_neighbours = self.gather_neighbour(tf.squeeze(f_pc_agg, axis=2), neigh_idx)  # [b, N, K, d_out // 2]
    f_concat = tf.concat([f_neighbours, f_xyz], axis=-1)  # [b, N, K, d_out]
    f_pc_agg = self.att_pooling(f_concat, d_out, name + 'att_pooling_2', is_training)  # [b, N, 1, d_out]
    return f_pc_agg
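To make the channel bookkeeping easier to follow, here is a minimal sketch in plain Python that traces only the last-axis width through dilated_res_block and building_block. The function name trace_channels and the example values d_in = 8, d_out = 16 are hypothetical, chosen for illustration; the sketch does no tensor computation.

```python
def trace_channels(d_in, d_out):
    """Return (stage, channel width) pairs for one dilated_res_block call.

    Hypothetical helper for illustration: it only mirrors the channel
    arguments passed to conv2d / att_pooling in the code above.
    """
    stages = [("input", d_in)]
    c = d_out // 2                                   # mlp1: d_in -> d_out // 2
    stages.append(("mlp1", c))
    # building_block, first unit: encoded positions get the same width as the features
    stages.append(("concat_1", c + c))               # f_neighbours + f_xyz
    c = d_out // 2                                   # att_pooling_1 output width
    stages.append(("att_pooling_1", c))
    # building_block, second unit
    stages.append(("concat_2", c + d_out // 2))      # f_neighbours + f_xyz
    c = d_out                                        # att_pooling_2 output width
    stages.append(("att_pooling_2", c))
    stages.append(("mlp2 + shortcut", d_out * 2))    # both branches map to d_out * 2
    return stages


trace = trace_channels(d_in=8, d_out=16)
for stage, width in trace:
    print(f"{stage:>16}: {width}")
```

With these example values the width goes 8, 8, 16, 8, 16, 16, 32, which shows that the block returns 2 * d_out channels rather than d_out.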
2.1 relative_pos_encoding(self, xyz, neigh_idx)
def relative_pos_encoding(self, xyz, neigh_idx):
    neighbor_xyz = self.gather_neighbour(xyz, neigh_idx)
    xyz_tile = tf.tile(tf.expand_dims(xyz, axis=2), [1, 1, tf.shape(neigh_idx)[-1], 1])
    relative_xyz = xyz_tile - neighbor_xyz
    relative_dis = tf.sqrt(tf.reduce_sum(tf.square(relative_xyz), axis=-1, keepdims=True))
    relative_feature = tf.concat([relative_dis, relative_xyz, xyz_tile, neighbor_xyz], axis=-1)
    return relative_feature
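The same computation can be sketched in NumPy to check the shapes on a toy input. This is an illustrative re-implementation, not the repository's code: the name relative_pos_encoding_np is hypothetical, and neighbours are gathered with NumPy advanced indexing instead of gather_neighbour.

```python
import numpy as np


def relative_pos_encoding_np(xyz, neigh_idx):
    """xyz: [B, N, 3] floats, neigh_idx: [B, N, K] ints -> [B, N, K, 10]."""
    B, N, K = neigh_idx.shape
    batch = np.arange(B)[:, None, None]                  # [B, 1, 1], broadcasts to [B, N, K]
    neighbor_xyz = xyz[batch, neigh_idx]                 # [B, N, K, 3]
    xyz_tile = np.broadcast_to(xyz[:, :, None, :], (B, N, K, 3))
    relative_xyz = xyz_tile - neighbor_xyz               # centre minus neighbour
    relative_dis = np.sqrt(np.sum(relative_xyz ** 2, axis=-1, keepdims=True))
    # Channel order matches the TensorFlow code: distance, offset, centre, neighbour
    return np.concatenate([relative_dis, relative_xyz, xyz_tile, neighbor_xyz], axis=-1)


# Toy input: one cloud of three points, two neighbours per point
xyz = np.array([[[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]]])
neigh_idx = np.array([[[0, 1], [1, 0], [2, 0]]])
feat = relative_pos_encoding_np(xyz, neigh_idx)
print(feat.shape)       # (1, 3, 2, 10)
print(feat[0, 0, 1])    # point 0 paired with its neighbour, point 1
```

For point 0 and neighbour 1 the encoding is distance 5, offset (-3, -4, 0), centre (0, 0, 0) and neighbour (3, 4, 0), i.e. 1 + 3 + 3 + 3 = 10 channels.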
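2.1.1 gather_neighbour: pc has shape [b, N, 3] and neighbor_idx has shape [b, N, K]. neighbor_idx is first reshaped to [b, N*K]; the corresponding rows (coordinates here) are then gathered from pc, giving features with shape [b, N*K, 3]; a final reshape turns features into [b, N, K, 3], which is returned.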
@staticmethod
def gather_neighbour(pc, neighbor_idx):
    # gather the coordinates or features of neighboring points
    batch_size = tf.shape(pc)[0]
    num_points = tf.shape(pc)[1]
    d = pc.get_shape()[2].value
    index_input = tf.reshape(neighbor_idx, shape=[batch_size, -1])
    features = tf.batch_gather(pc, index_input)
    features = tf.reshape(features, [batch_size, num_points, tf.shape(neighbor_idx)[-1], d])
    return features
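The reshape, gather, reshape pattern can be reproduced in NumPy as a rough sketch. Here np.take_along_axis stands in for tf.batch_gather; that substitution and the name gather_neighbour_np are assumptions made for illustration.

```python
import numpy as np


def gather_neighbour_np(pc, neighbor_idx):
    """pc: [B, N, d], neighbor_idx: [B, N, K] -> [B, N, K, d]."""
    B, N, d = pc.shape
    K = neighbor_idx.shape[-1]
    index_input = neighbor_idx.reshape(B, -1)                             # [B, N*K]
    # The trailing axis of size 1 lets the indices broadcast over the d channels
    features = np.take_along_axis(pc, index_input[:, :, None], axis=1)    # [B, N*K, d]
    return features.reshape(B, N, K, d)


# Toy input: 2 clouds, 4 points each, 3 channels, 2 neighbours per point
pc = np.arange(2 * 4 * 3, dtype=np.float64).reshape(2, 4, 3)
idx = np.array([[[1, 3], [0, 0], [2, 1], [3, 3]],
                [[0, 1], [1, 2], [2, 3], [3, 0]]])
out = gather_neighbour_np(pc, idx)
print(out.shape)    # (2, 4, 2, 3)
```

Each out[b, n, k] is the row pc[b, idx[b, n, k]], so the same function works for coordinates (d = 3) and for feature vectors of any width.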
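gather_neighbour() returns the xyz coordinates of each point's neighbours; these are then concatenated with the distance, the relative offset and the centre coordinates to give relative_feature with shape [b, N, K, 10] (1 + 3 + 3 + 3 channels).
Back in building_block, f_xyz = relative_feature is passed through a shared MLP, which converts it to [b, N, K, d_out/2]. gather_neighbour() is then applied to the point features to obtain the neighbour features f_neighbours with shape [b, N, K, d_out/2], which are concatenated with f_xyz to give f_concat with shape [b, N, K, d_out].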
2.2 att_pooling(f_concat, d_out // 2, name + 'att_pooling_1', is_training)
@staticmethod
def att_pooling(feature_set, d_out, name, is_training):
    batch_size = tf.shape(feature_set)[0]
    num_points = tf.shape(feature_set)[1]
    num_neigh = tf.shape(feature_set)[2]
    d = feature_set.get_shape()[3].value
    f_reshaped = tf.reshape(feature_set, shape=[-1, num_neigh, d])
    # Shared fully connected layer, then softmax over the neighbour axis
    att_activation = tf.layers.dense(f_reshaped, d, activation=None, use_bias=False, name=name + 'fc')
    att_scores = tf.nn.softmax(att_activation, axis=1)
    # Weighted sum over the neighbours
    f_agg = f_reshaped * att_scores
    f_agg = tf.reduce_sum(f_agg, axis=1)
    f_agg = tf.reshape(f_agg, [batch_size, num_points, 1, d])
    f_agg = helper_tf_util.conv2d(f_agg, d_out, [1, 1], name + 'mlp', [1, 1], 'VALID', True, is_training)
    return f_agg
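Here feature_set = f_concat, with shape [b, N, K, d_out]; the d_out argument of this function is d_out/2 of the outer function.
reshape converts feature_set into f_reshaped with shape [b * N, K, d_out].
Next comes a dense layer, i.e. a fully connected layer that acts only on the last dimension of its input; its output dimension is the number of units, here the argument d. Because the same single layer is applied to every neighbour of every point, this implements the shared-parameter fully connected layer described in the paper. A softmax over the neighbour axis (axis=1) then gives att_scores with shape [b * N, K, d_out].
f_reshaped is multiplied element-wise by att_scores, and reduce_sum sums the result over axis=1. Since keepdims is not set, the summed axis is removed rather than kept with size 1, so f_agg has shape [b * N, d_out].
A reshape then converts it to [b, N, 1, d_out], and a shared MLP maps it to the output of shape [b, N, 1, d_out/2].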
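The pooling steps can be checked with a small NumPy sketch. It is a simplified stand-in, not the repository's implementation: the name att_pooling_np is hypothetical, the weight matrix W plays the role of the bias-free dense layer, and the final shared MLP (conv2d with batch norm) is left out.

```python
import numpy as np


def att_pooling_np(feature_set, W):
    """feature_set: [B, N, K, d]; W: [d, d] shared score weights, no bias.

    Returns the pooled features [B, N, 1, d] and the scores [B*N, K, d].
    """
    B, N, K, d = feature_set.shape
    f_reshaped = feature_set.reshape(-1, K, d)                     # [B*N, K, d]
    att_activation = f_reshaped @ W                                # same W for every neighbour
    # Subtract the per-channel maximum for numerical stability; softmax is unchanged
    att_activation = att_activation - att_activation.max(axis=1, keepdims=True)
    e = np.exp(att_activation)
    att_scores = e / e.sum(axis=1, keepdims=True)                  # softmax over the K neighbours
    f_agg = (f_reshaped * att_scores).sum(axis=1)                  # [B*N, d], axis 1 is removed
    return f_agg.reshape(B, N, 1, d), att_scores


rng = np.random.default_rng(0)
fs = rng.normal(size=(2, 5, 4, 6))                                 # B=2, N=5, K=4, d=6
pooled, scores = att_pooling_np(fs, rng.normal(size=(6, 6)))
print(pooled.shape, scores.shape)                                  # (2, 5, 1, 6) (10, 4, 6)

# With all-zero weights every neighbour gets score 1/K, so pooling reduces to a mean
pooled_uniform, scores_uniform = att_pooling_np(fs, np.zeros((6, 6)))
```

The scores sum to 1 over the neighbour axis for every point and every channel, so each output channel is a convex combination of that channel's K neighbour values.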