PointNet

layB

Last modified: 2023-05-31 16:59:18

Category: PointCloud Fusion. Tags: deep learning, machine learning, python

First published: 2023-02-21 14:32:19

Article link: https://blog.csdn.net/darlingb/article/details/129141097


Introduction

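The input is the set of all points in one frame, represented as an n $\times$ 3 2D tensor, where n is the number of points and 3 corresponds to the xyz coordinates (the code below stores each sample transposed, as 3 $\times$ n).
The input is first aligned by multiplying it with a transformation matrix learned by a T-Net, which makes the model invariant to certain spatial transformations. (input transform)
After several MLP layers extract features from each point, a second T-Net aligns the features.
Max pooling is applied to each feature dimension, across all points, to obtain the final global feature.
For classification, the global feature is passed through an MLP to predict the final class scores; for segmentation, the global feature is concatenated with the per-point local features learned earlier, and an MLP then produces a class prediction for every point.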
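The core of the steps above (a shared per-point MLP followed by max pooling) can be sketched in a few lines. This is a minimal illustration rather than the full network: the layer widths and the name `shared_mlp` are assumptions, and the T-Nets and batch normalization are left out. It also checks one useful consequence of max pooling: shuffling the points leaves the global feature unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Shared per-point MLP: Conv1d with kernel size 1 applies the same
# weights to every point independently. Layer widths are illustrative.
shared_mlp = nn.Sequential(
    nn.Conv1d(3, 64, 1), nn.ReLU(),
    nn.Conv1d(64, 1024, 1), nn.ReLU(),
)

points = torch.rand(2, 3, 500)               # (batch_size, 3, n)

with torch.no_grad():
    per_point = shared_mlp(points)           # (batch_size, 1024, n)
    global_feat = per_point.max(dim=2)[0]    # (batch_size, 1024)

    # Feed the same points in a different order.
    perm = torch.randperm(points.size(2))
    global_feat_shuffled = shared_mlp(points[:, :, perm]).max(dim=2)[0]

same = torch.allclose(global_feat, global_feat_shuffled, atol=1e-5)
print(global_feat.shape, same)
```

Because the maximum over a set does not depend on the order of its elements, the global feature should be the same for both orderings, up to floating-point tolerance.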

Network architecture:

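[Figure: PointNet network architecture]
Depending on the task (classification or segmentation), the network can be viewed as two networks: the blue region performs classification, and the light-yellow region performs segmentation.
mlp: https://blog.csdn.net/fg13821267836/article/details/93405572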

Network input: an n $\times$ 3 tensor, where n is the number of points in the point cloud and 3 is the spatial position (x, y, z).

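input transform and feature transform: To make the network invariant to rigid transformations of the input point cloud (rotation, translation, and so on), the authors align the point cloud before feature extraction (this is the input transform). The alignment trains a small network (T-Net) to predict a transformation matrix, which is then multiplied with the input point cloud.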

import torch
import torch.nn as nn
import torch.nn.functional as F


class STN3d(nn.Module):
    def __init__(self, channel):
        super(STN3d, self).__init__()
        self.conv1 = torch.nn.Conv1d(channel, 64, 1)
        self.conv2 = torch.nn.Conv1d(64, 128, 1)
        self.conv3 = torch.nn.Conv1d(128, 1024, 1)
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 9)
        self.relu = nn.ReLU()

        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(512)
        self.bn5 = nn.BatchNorm1d(256)

    def forward(self, x):
        batchsize = x.size()[0] # shape (batch_size,3,point_nums)
        x = F.relu(self.bn1(self.conv1(x))) # shape (batch_size,64,point_nums)
        x = F.relu(self.bn2(self.conv2(x))) # shape (batch_size,128,point_nums)
        x = F.relu(self.bn3(self.conv3(x))) # shape (batch_size,1024,point_nums)
        x = torch.max(x, 2, keepdim=True)[0] # shape (batch_size,1024,1)
        x = x.view(-1, 1024) # shape (batch_size,1024)

        x = F.relu(self.bn4(self.fc1(x))) # shape (batch_size,512)
        x = F.relu(self.bn5(self.fc2(x))) # shape (batch_size,256)
        x = self.fc3(x) # shape (batch_size,9)

        # Add the flattened 3x3 identity so the predicted transform starts near
        # the identity matrix (fc3 only has to learn a residual).
        iden = torch.eye(3, dtype=x.dtype, device=x.device).view(1, 9).repeat(batchsize, 1)  # shape (batch_size,9)
        x = x + iden
        x = x.view(-1, 3, 3) # shape (batch_size,3,3)
        return x

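Obtaining the 3 $\times$ 3 input transform matrix is fairly simple. Because the matrix is predicted by the network rather than fixed, it adapts dynamically to each input.
The 64 $\times$ 64 feature transform matrix is obtained in a similar way to the input transform matrix above; the code is as follows: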
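To show how the predicted matrix is actually used, here is a minimal sketch of applying a batch of 3 $\times$ 3 transforms to a batch of point clouds with `torch.bmm`. The tensor `trans` is a stand-in for the output of `STN3d` (identity plus a small random perturbation), and the sizes are arbitrary; both are assumptions for illustration.

```python
import torch

torch.manual_seed(0)
batch_size, n = 4, 1024

points = torch.rand(batch_size, 3, n)        # (batch_size, 3, n), the layout STN3d expects

# Stand-in for the T-Net output: one 3x3 matrix per sample.
trans = torch.eye(3).repeat(batch_size, 1, 1) + 0.01 * torch.randn(batch_size, 3, 3)

# Multiply every point (a row vector of length 3) by its sample's matrix.
aligned = torch.bmm(points.transpose(2, 1), trans)    # (batch_size, n, 3)
aligned = aligned.transpose(2, 1)                      # back to (batch_size, 3, n)

# Sanity check: an identity transform leaves the points untouched.
identity = torch.eye(3).repeat(batch_size, 1, 1)
unchanged = torch.bmm(points.transpose(2, 1), identity).transpose(2, 1)
print(aligned.shape, torch.allclose(unchanged, points))
```

The transposes are needed only because the point clouds are stored channel-first; `torch.bmm` multiplies matching samples in the two batches.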

class STNkd(nn.Module):
    def __init__(self, k=64):
        super(STNkd, self).__init__()
        self.conv1 = torch.nn.Conv1d(k, 64, 1)
        self.conv2 = torch.nn.Conv1d(64, 128, 1)
        self.conv3 = torch.nn.Conv1d(128, 1024, 1)
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, k * k)
        self.relu = nn.ReLU()

        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(512)
        self.bn5 = nn.BatchNorm1d(256)

        self.k = k

    def forward(self, x):
        batchsize = x.size()[0]
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = torch.max(x, 2, keepdim=True)[0]
        x = x.view(-1, 1024)

        x = F.relu(self.bn4(self.fc1(x)))
        x = F.relu(self.bn5(self.fc2(x)))
        x = self.fc3(x)

        # Add the flattened k x k identity so the predicted transform starts near identity
        iden = torch.eye(self.k, dtype=x.dtype, device=x.device).view(1, self.k * self.k).repeat(batchsize, 1)
        x = x + iden
        x = x.view(-1, self.k, self.k)
        return x

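The 64 $\times$ 64 feature transform matrix is hard to optimize, but the authors found that optimization becomes much easier and more stable if the matrix is close to orthogonal. An orthogonal matrix multiplied by its transpose equals the identity matrix, so the authors add an extra loss term that penalizes deviation from this property, where A is the feature transform matrix predicted by the T-Net:
$L_{reg} = \|I - AA^T\|_F^2$
The code implements it as follows (it averages the unsquared Frobenius norm over the batch):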

def feature_transform_reguliarzer(trans):
    """Penalize the deviation of each transform matrix from an orthogonal matrix."""
    d = trans.size()[1]
    I = torch.eye(d, dtype=trans.dtype, device=trans.device)[None, :, :]  # shape (1,d,d)
    # Frobenius norm of (A * A^T - I) per sample, averaged over the batch
    loss = torch.mean(torch.norm(torch.bmm(trans, trans.transpose(2, 1)) - I, dim=(1, 2)))
    return loss
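A quick sanity check makes the idea behind the regularizer concrete: an orthogonal matrix such as a rotation should score close to zero, while a non-orthogonal matrix should be penalized. The sketch below restates the penalty under a hypothetical name, `orthogonality_penalty`, so that it runs on its own. The way it is combined with a task loss and the value of `reg_weight` are assumptions for illustration.

```python
import math
import torch

def orthogonality_penalty(trans):
    """Mean Frobenius norm of (A A^T - I) over a batch of square matrices."""
    d = trans.size(1)
    eye = torch.eye(d, dtype=trans.dtype, device=trans.device)[None, :, :]
    diff = torch.bmm(trans, trans.transpose(2, 1)) - eye
    return torch.mean(torch.norm(diff, dim=(1, 2)))

# A rotation about the z axis is orthogonal.
theta = math.pi / 6
rotation = torch.tensor([[math.cos(theta), -math.sin(theta), 0.0],
                         [math.sin(theta),  math.cos(theta), 0.0],
                         [0.0, 0.0, 1.0]]).unsqueeze(0)
# Uniform scaling by 2 is not: A A^T = 4I, so the penalty is ||3I||_F = 3 * sqrt(3).
scaled = 2.0 * torch.eye(3).unsqueeze(0)

penalty_rotation = orthogonality_penalty(rotation).item()
penalty_scaled = orthogonality_penalty(scaled).item()
print(penalty_rotation, penalty_scaled)

# Hypothetical combination with a task loss; reg_weight is an assumed value.
task_loss = torch.tensor(1.0)
reg_weight = 0.001
total_loss = task_loss + reg_weight * orthogonality_penalty(scaled)
```

Keeping the weight small means the penalty nudges the transform toward orthogonality without dominating the task loss.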

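Why is the feature fed into the segmentation network 1088-dimensional?
This n $\times$ 1088 tensor consists of two parts: the output of the feature extraction network (size n $\times$ 64) and the global feature obtained by max pooling (size 1024). When the two are fused, the global feature is broadcast to every point, so each point gets 64 + 1024 = 1088 channels. Why do this? The paper says: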

After computing the global point cloud feature vector, we feed it back to per point features by concatenating the global feature with each of the point features. Then we extract new per point features based on the combined point features - this time the per point feature is aware of both the local and global information.

In other words, the authors want to fuse each point's own features (from the feature extraction network) with the global feature.
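The 64 + 1024 = 1088 arithmetic is easy to reproduce with tensors. In this minimal sketch the random tensors stand in for real network outputs, the layout is channel-first to match the code in this post, and the concatenation order (per-point first, global second) is an arbitrary choice.

```python
import torch

batch_size, n = 2, 2048
point_feat = torch.rand(batch_size, 64, n)      # per-point features
global_feat = torch.rand(batch_size, 1024)      # global feature after max pooling

# Copy the global feature to every point, then concatenate along channels.
global_expanded = global_feat.unsqueeze(2).expand(-1, -1, n)    # (batch_size, 1024, n)
seg_input = torch.cat([point_feat, global_expanded], dim=1)     # (batch_size, 1088, n)
print(seg_input.shape)
```

Every point now carries the same 1024 global channels next to its own 64 channels, which is what lets the later per-point MLP use both kinds of information.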

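Q: Throughout this pipeline, features are extracted from each point separately. Apart from the final max pooling, which gathers global information, nothing seems to fuse a point with its neighbors to extract local features?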

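Indeed, PointNet does not extract local features layer by layer the way a CNN does. In a CNN, each output point is a weighted sum of a neighborhood of input points (its size depends on the kernel size); as the network gets deeper, a point in a deep layer corresponds to a region of the original image, which is the notion of a receptive field. In PointNet, however, feature extraction is done independently for each point, which inevitably causes some problems. The authors address them in PointNet++.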
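The per-point independence can be observed directly. In this small sketch (a single `Conv1d` layer with kernel size 1, without batch normalization; sizes are arbitrary assumptions), moving one input point changes the features of that point only, which is the opposite of what a convolution with a larger kernel would do.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Conv1d(3, 64, 1)        # kernel size 1: each output column sees one point

points = torch.rand(1, 3, 100)     # (batch_size, 3, n)
moved = points.clone()
moved[:, :, 0] += 1.0              # move only point 0

with torch.no_grad():
    # Total absolute feature change per point, shape (n,)
    diff = (layer(moved) - layer(points)).abs().sum(dim=1).squeeze(0)

changed = int((diff > 1e-6).sum())
print(changed)
```

Only point 0 should be reported as changed, so any interaction between points has to come from the max pooling step.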