[Point Cloud] Code Walkthrough of the Dilated Residual Block in RandLA-Net


zzl_1998


Article link: https://blog.csdn.net/qq_40731332/article/details/106312968


Code source: https://github.com/QingyongHu/RandLA-Net/blob/master/RandLANet.py

The function dilated_res_block(self, feature, xyz, neigh_idx, d_out, name, is_training) implements the Dilated Residual Block module described in the paper. The code is as follows:

def dilated_res_block(self, feature, xyz, neigh_idx, d_out, name, is_training):
    # Shared MLP: [B, N, 1, d_in] -> [B, N, 1, d_out/2]
    f_pc = helper_tf_util.conv2d(feature, d_out // 2, [1, 1], name + 'mlp1', [1, 1], 'VALID', True, is_training)
    # Local feature aggregation: -> [B, N, 1, d_out]
    f_pc = self.building_block(xyz, f_pc, neigh_idx, d_out, name + 'LFA', is_training)
    # Shared MLP without activation: -> [B, N, 1, 2*d_out]
    f_pc = helper_tf_util.conv2d(f_pc, d_out * 2, [1, 1], name + 'mlp2', [1, 1], 'VALID', True, is_training,
                                 activation_fn=None)
    # Shortcut branch projects the block input to the same width: -> [B, N, 1, 2*d_out]
    shortcut = helper_tf_util.conv2d(feature, d_out * 2, [1, 1], name + 'shortcut', [1, 1], 'VALID',
                                     activation_fn=None, bn=True, is_training=is_training)
    # Residual sum, then the activation
    return tf.nn.leaky_relu(f_pc + shortcut)

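Here feature has shape [batch_size, N, 1, d_in], where N is the number of points, d_in is the input feature dimension, d_out is the width parameter passed to the block, and K is the number of neighboring points. The singleton axis sits at position 2, which is why building_block calls tf.squeeze(feature, axis=2). Note that the block's final output has 2*d_out channels, since both mlp2 and the shortcut use d_out * 2.

1 After the shared MLP, f_pc has shape [batch_size, N, 1, d_out/2].

2 The main part is the building_block module. Its inputs are xyz with shape [b, N, 3] and neigh_idx with shape [b, N, K]; inside this function d_in = d_out/2 and feature = f_pc.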

def building_block(self, xyz, feature, neigh_idx, d_out, name, is_training):
    d_in = feature.get_shape()[-1].value  # d_out/2 when called from dilated_res_block
    # Relative position encoding: [B, N, K, 10], then shared MLP -> [B, N, K, d_in]
    f_xyz = self.relative_pos_encoding(xyz, neigh_idx)
    f_xyz = helper_tf_util.conv2d(f_xyz, d_in, [1, 1], name + 'mlp1', [1, 1], 'VALID', True, is_training)
    # Neighbor features: [B, N, d_in] -> [B, N, K, d_in]
    f_neighbours = self.gather_neighbour(tf.squeeze(feature, axis=2), neigh_idx)
    f_concat = tf.concat([f_neighbours, f_xyz], axis=-1)  # [B, N, K, 2*d_in]
    # First attentive pooling: -> [B, N, 1, d_out/2]
    f_pc_agg = self.att_pooling(f_concat, d_out // 2, name + 'att_pooling_1', is_training)

    # Second round: re-encode positions, gather the aggregated features, pool again
    f_xyz = helper_tf_util.conv2d(f_xyz, d_out // 2, [1, 1], name + 'mlp2', [1, 1], 'VALID', True, is_training)
    f_neighbours = self.gather_neighbour(tf.squeeze(f_pc_agg, axis=2), neigh_idx)
    f_concat = tf.concat([f_neighbours, f_xyz], axis=-1)  # [B, N, K, d_out]
    # Second attentive pooling: -> [B, N, 1, d_out]
    f_pc_agg = self.att_pooling(f_concat, d_out, name + 'att_pooling_2', is_training)
    return f_pc_agg

2.1 relative_pos_encoding(self, xyz, neigh_idx)

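def relative_pos_encoding(self, xyz, neigh_idx):
    # Coordinates of each point's K neighbors: [B, N, K, 3]
    neighbor_xyz = self.gather_neighbour(xyz, neigh_idx)
    # Center point repeated K times along a new axis: [B, N, K, 3]
    xyz_tile = tf.tile(tf.expand_dims(xyz, axis=2), [1, 1, tf.shape(neigh_idx)[-1], 1])
    # Offset from each neighbor to its center point: [B, N, K, 3]
    relative_xyz = xyz_tile - neighbor_xyz
    # Euclidean distance: [B, N, K, 1]
    relative_dis = tf.sqrt(tf.reduce_sum(tf.square(relative_xyz), axis=-1, keepdims=True))
    # 1 + 3 + 3 + 3 = 10 channels: [B, N, K, 10]
    relative_feature = tf.concat([relative_dis, relative_xyz, xyz_tile, neighbor_xyz], axis=-1)
    return relative_feature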

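2.1.1 gather_neighbour: pc has shape [b, N, 3] and neighbor_idx has shape [b, N, K]. neighbor_idx is reshaped to [b, N*K] and used to pick the corresponding rows of pc (coordinates here), giving features with shape [b, N*K, 3]. A second reshape then returns features as [b, N, K, 3].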

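@staticmethod  # declared static in the repository, so self.gather_neighbour(pc, idx) passes two arguments
def gather_neighbour(pc, neighbor_idx):
    # gather the coordinates or features of neighboring points
    batch_size = tf.shape(pc)[0]
    num_points = tf.shape(pc)[1]
    d = pc.get_shape()[2].value
    # [B, N, K] -> [B, N*K]
    index_input = tf.reshape(neighbor_idx, shape=[batch_size, -1])
    # Per-batch row lookup: [B, N*K, d]
    features = tf.batch_gather(pc, index_input)
    # Restore the neighbor axis: [B, N, K, d]
    features = tf.reshape(features, [batch_size, num_points, tf.shape(neighbor_idx)[-1], d])
    return features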
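To make the indexing concrete, here is a minimal NumPy sketch of the same gather. It is an illustration only, not code from the repository: the name gather_neighbour_np and the toy data are hypothetical, and NumPy advanced indexing stands in for tf.batch_gather.

```python
import numpy as np

def gather_neighbour_np(pc, neighbor_idx):
    """Hypothetical NumPy analogue of gather_neighbour.

    pc:           [B, N, d] coordinates or features
    neighbor_idx: [B, N, K] integer indices into the N axis of pc
    returns:      [B, N, K, d]
    """
    B, N, d = pc.shape
    K = neighbor_idx.shape[-1]
    # Flatten the neighbor indices to [B, N*K], as the TF code does.
    index_input = neighbor_idx.reshape(B, -1)
    # For each batch b, pick rows pc[b, index_input[b, j], :] -> [B, N*K, d].
    batch_ids = np.arange(B)[:, None]
    features = pc[batch_ids, index_input]
    # Restore the neighbor axis -> [B, N, K, d].
    return features.reshape(B, N, K, d)

# Toy example: 1 batch, 4 points with 3 coordinates, 2 neighbors per point.
pc = np.arange(12, dtype=np.float32).reshape(1, 4, 3)
neighbor_idx = np.array([[[0, 1], [1, 2], [2, 3], [3, 0]]])
out = gather_neighbour_np(pc, neighbor_idx)
print(out.shape)
```

Entry out[b, i, k] holds the row of pc selected by neighbor_idx[b, i, k], so the same routine works for coordinates and for feature vectors of any width d.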

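gather_neighbour() returns the xyz coordinates of each point's neighbors. These are then concatenated with the Euclidean distances, the relative offsets, and the tiled center coordinates to form relative_feature, which has shape [b, N, K, 10] (1 + 3 + 3 + 3 channels).

Back in building_block, f_xyz = relative_feature is passed through a shared MLP, giving [b, N, K, d_out/2]. gather_neighbour() then collects the neighbors' features, f_neighbours with shape [b, N, K, d_out/2], which is concatenated with f_xyz to give f_concat with shape [b, N, K, d_out].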
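The encoding can be checked on a tiny point set with the following NumPy sketch. It mirrors the TF function step by step, but the name relative_pos_encoding_np and the three example points are assumptions made for illustration.

```python
import numpy as np

def relative_pos_encoding_np(xyz, neigh_idx):
    """Hypothetical NumPy analogue of relative_pos_encoding.

    xyz:       [B, N, 3] point coordinates
    neigh_idx: [B, N, K] neighbor indices
    returns:   [B, N, K, 10]
    """
    B = xyz.shape[0]
    K = neigh_idx.shape[-1]
    # Neighbor coordinates via advanced indexing: [B, N, K, 3].
    batch_ids = np.arange(B)[:, None, None]
    neighbor_xyz = xyz[batch_ids, neigh_idx]
    # Repeat each center point K times: [B, N, K, 3].
    xyz_tile = np.tile(xyz[:, :, None, :], (1, 1, K, 1))
    # Offsets and Euclidean distances.
    relative_xyz = xyz_tile - neighbor_xyz
    relative_dis = np.sqrt(np.sum(relative_xyz ** 2, axis=-1, keepdims=True))
    # Channel order matches the TF code: distance, offset, center, neighbor.
    return np.concatenate([relative_dis, relative_xyz, xyz_tile, neighbor_xyz], axis=-1)

# Three points; point 0 has neighbors 0 and 1, and so on.
xyz = np.array([[[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]]])
neigh_idx = np.array([[[0, 1], [1, 0], [2, 0]]])
feat = relative_pos_encoding_np(xyz, neigh_idx)
print(feat.shape)
print(feat[0, 0, 1])  # encoding of point 0 with respect to its neighbor, point 1
```

For point 0 and its neighbor at (3, 4, 0), the first channel is the distance 5 and the next three are the offset (-3, -4, 0), followed by the center and neighbor coordinates.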

2.2 att_pooling(f_concat, d_out // 2, name + 'att_pooling_1', is_training)

@staticmethod  # declared static in the repository
def att_pooling(feature_set, d_out, name, is_training):
    batch_size = tf.shape(feature_set)[0]
    num_points = tf.shape(feature_set)[1]
    num_neigh = tf.shape(feature_set)[2]
    d = feature_set.get_shape()[3].value
    # Merge batch and point axes: [B*N, K, d]
    f_reshaped = tf.reshape(feature_set, shape=[-1, num_neigh, d])
    # Shared, bias-free fully connected layer on the last axis: [B*N, K, d]
    att_activation = tf.layers.dense(f_reshaped, d, activation=None, use_bias=False, name=name + 'fc')
    # Normalize the scores across the K neighbors
    att_scores = tf.nn.softmax(att_activation, axis=1)
    # Weight the features and sum over the neighbors: [B*N, d]
    f_agg = f_reshaped * att_scores
    f_agg = tf.reduce_sum(f_agg, axis=1)
    f_agg = tf.reshape(f_agg, [batch_size, num_points, 1, d])
    # Shared MLP to the requested width: [B, N, 1, d_out]
    f_agg = helper_tf_util.conv2d(f_agg, d_out, [1, 1], name + 'mlp', [1, 1], 'VALID', True, is_training)
    return f_agg

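For the first call, feature_set = f_concat with shape [b, N, K, d_out], and the d_out argument of att_pooling is d_out/2 of the enclosing function. In the shapes below, d_out always refers to the enclosing function's value.

reshape turns feature_set into f_reshaped with shape [b*N, K, d_out].

Next comes a dense layer, i.e. a fully connected layer, which only transforms the last dimension; its output width is the number of units, here the argument d. Because one fully connected layer is applied to every neighbor, this realizes the shared-parameter function described in the paper. A softmax over axis 1 (the K neighbors) then gives att_scores with shape [b*N, K, d_out].

The features are then multiplied element-wise by att_scores and summed with reduce_sum over axis 1. Since keepdims is not set, the summed axis is removed rather than kept with size 1, so f_agg has shape [b*N, d_out].

A reshape then gives [b, N, 1, d_out], and the final shared MLP produces the output with shape [b, N, 1, d_out/2].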
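The score, softmax, and weighted-sum steps can be sketched in NumPy as follows. This is a simplified illustration, not the repository code: att_pooling_core_np is a hypothetical name, the matrix W stands in for the weights of the bias-free dense layer, and the final shared MLP is omitted.

```python
import numpy as np

def att_pooling_core_np(feature_set, W):
    """Core of attentive pooling, without the final shared MLP.

    feature_set: [B, N, K, d] neighbor features
    W:           [d, d] hypothetical weights standing in for the dense layer
    returns:     (att_scores [B*N, K, d], f_agg [B, N, 1, d])
    """
    B, N, K, d = feature_set.shape
    f_reshaped = feature_set.reshape(-1, K, d)       # [B*N, K, d]
    # The same W is applied to every neighbor of every point.
    att_activation = f_reshaped @ W                  # [B*N, K, d]
    # Numerically stable softmax over the neighbor axis (axis 1).
    shifted = att_activation - att_activation.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    att_scores = exp / exp.sum(axis=1, keepdims=True)
    # Weighted sum over neighbors; the K axis disappears: [B*N, d].
    f_agg = (f_reshaped * att_scores).sum(axis=1)
    return att_scores, f_agg.reshape(B, N, 1, d)

rng = np.random.default_rng(0)
feature_set = rng.normal(size=(2, 5, 4, 8))          # B=2, N=5, K=4, d=8
W = rng.normal(size=(8, 8))
att_scores, f_agg = att_pooling_core_np(feature_set, W)
print(att_scores.shape, f_agg.shape)
```

Each channel gets its own set of K weights that sum to 1. With an all-zero W the weights become uniform and the pooling reduces to the mean over the K neighbors, which is a convenient sanity check.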
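Putting the pieces together, the shapes through one call of dilated_res_block can be traced with a small helper. This is bookkeeping only and runs no TensorFlow code; the function name and the example sizes are hypothetical, and the singleton axis is placed at position 2, as implied by tf.squeeze(feature, axis=2) in building_block.

```python
def trace_dilated_res_block(B, N, K, d_in, d_out):
    """Return (tensor name, shape) pairs for one dilated_res_block call."""
    half = d_out // 2
    return [
        ("feature (input)",              (B, N, 1, d_in)),
        ("f_pc after mlp1",              (B, N, 1, half)),
        ("relative_feature",             (B, N, K, 10)),
        ("f_xyz after LFA mlp1",         (B, N, K, half)),
        ("f_neighbours (1st gather)",    (B, N, K, half)),
        ("f_concat (1st)",               (B, N, K, 2 * half)),
        ("f_pc_agg after att_pooling_1", (B, N, 1, half)),
        ("f_xyz after LFA mlp2",         (B, N, K, half)),
        ("f_neighbours (2nd gather)",    (B, N, K, half)),
        ("f_concat (2nd)",               (B, N, K, 2 * half)),
        ("f_pc_agg after att_pooling_2", (B, N, 1, d_out)),
        ("f_pc after mlp2",              (B, N, 1, 2 * d_out)),
        ("shortcut",                     (B, N, 1, 2 * d_out)),
        ("output",                       (B, N, 1, 2 * d_out)),
    ]

# Illustrative sizes only.
for name, shape in trace_dilated_res_block(B=4, N=1024, K=16, d_in=8, d_out=16):
    print(f"{name:30s} {shape}")
```

The trace makes two points easy to see: the K axis only exists between a gather and the following attentive pooling, and the block returns 2*d_out channels because both mlp2 and the shortcut use d_out * 2.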