[6-PACK Code Annotations] Network Structure

momo_vv

Last modified: 2022-08-06 13:15:12

First published: 2022-08-04 10:42:27

Article link: https://blog.csdn.net/weixin_44695308/article/details/125950048

Contents

Preface
KeyNet

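Preface

[6-PACK Full Record] 6-PACK paper study and reproduction notes
In network.py, KeyNet is the main network. It relies on two further networks: ModifiedResnet, which extracts color features, and PoseNetFeat, which fuses the color features with spatial offset features. This post annotates KeyNet in detail and covers the other two networks as supplementary notes at the points where KeyNet calls them.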

KeyNet

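1. __init__

In train.py, the line model = KeyNet(num_points = opt.num_points, num_key = opt.num_kp, num_cates = opt.num_cates) calls __init__ to build the network. The code is structured as follows.
Input parameters:

- num_points: number of points in the point cloud
- num_key: number of keypoints
- num_cates: number of object categories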

def __init__(self, num_points, num_key, num_cates):
        super(KeyNet, self).__init__()
        self.num_points = num_points
        self.cnn = ModifiedResnet()
        # extracts color features
        # summary(self.cnn, (a, b, c)) prints every layer and its output size, where (a, b, c) is the shape of img

        self.feat = PoseNetFeat(num_points)
        # fuses color features with spatial offset features
        self.num_cates = num_cates

        self.sm = torch.nn.Softmax(dim=2)
        # layers that generate the keypoints
        self.kp_1 = torch.nn.Conv1d(160, 90, 1)
        self.kp_2 = torch.nn.Conv1d(90, 3*num_key, 1)

        # selects the target anchor; a 2-layer MLP serves as the attention network
        self.att_1 = torch.nn.Conv1d(160, 90, 1)
        self.att_2 = torch.nn.Conv1d(90, 1, 1)

        self.sm2 = torch.nn.Softmax(dim=1)

        self.num_key = num_key

        # all-zero tensor of shape [1, num_points, 3]
        self.threezero = Variable(torch.from_numpy(np.array([0, 0, 0]).astype(np.float32))).cuda().view(1, 1, 3).repeat(1, self.num_points, 1)

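2. forward (key part)

In train.py, the line Kp_fr, anc_fr, att_fr = model(img_fr, choose_fr, cloud_fr, anchor, scale, cate, t_fr) calls forward.

Inputs and outputs

Input parameters:

- img: color crop
- choose: pixel indices in the 2D image that correspond to the sampled point cloud points
- x: point cloud
- anchor: anchor grid
- scale: normalization factor
- cate: category index
- gt_t: translation T of the current pose

Outputs:

 - all_kp_x: keypoint coordinates
 - output_anchor: anchor coordinates
 - att_x: anchor confidence scores

Line-by-line walkthrough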

def forward(self, img, choose, x, anchor, scale, cate, gt_t):
        num_anc = len(anchor[0])  # number of anchors in the anchor grid (125)
        out_img = self.cnn(img)  # color features; for an img of size [1, 3, 160, 160] the output is [1, 32, 160, 160]
        bs, di, _, _ = out_img.size()  # bs: batch size, di: number of channels

Here out_img = self.cnn(img) calls the ModifiedResnet network.

2.1 ModifiedResnet

This network is the same as the one in DenseFusion and extracts color features. It is implemented as follows:

psp_models = {
    'resnet18': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=512, deep_features_size=256, backend='resnet18'),
    'resnet34': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=512, deep_features_size=256, backend='resnet34'),
    'resnet50': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=2048, deep_features_size=1024, backend='resnet50'),
    'resnet101': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=2048, deep_features_size=1024, backend='resnet101'),
    'resnet152': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=2048, deep_features_size=1024, backend='resnet152')
}

class ModifiedResnet(nn.Module):

    def __init__(self, usegpu=True):
        super(ModifiedResnet, self).__init__()

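        # ResNet-18 is used as the encoder; this can be changed
        self.model = psp_models['resnet18'.lower()]()  # .lower() converts the string to lowercase
        # wrap the model in DataParallel so training can use multiple GPUs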
        self.model = nn.DataParallel(self.model)

    def forward(self, x):
        x = self.model(x)
        return x

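It is an encoder-decoder structure: the encoder can be ResNet-18 and the decoder consists of the upsampling layers from PSPNet. When called, the input x is the segmented color image of the instance (color_crop) and the output is the color feature of that instance. This corresponds to the color feature extraction part of the DenseFusion paper.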

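Adding the following after self.cnn in __init__ prints the output shape of every internal layer for an img of size (3, 160, 160):

self.cnn = ModifiedResnet()
summary(self.cnn,(3,160,160))  # requires: from torchsummary import summary

Here (3, 160, 160) is the size of the input img. The summary shows that the output color feature is a tensor of size (32, 160, 160), so bs=1 and di=32.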

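emb = out_img.view(bs, di, -1)  # [1, 32, 160*160]
choose = choose.repeat(1, di, 1)

The first line builds the color embedding by flattening the two spatial dimensions into one; in the example above the result has size [1, 32, 160*160].
The second line repeats choose di times so that every feature channel gets its own copy of the indices; in the example above the size is [bs=1, di=32, num_points=500].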
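To make the flattening concrete, the sketch below shows how a pixel position maps to an index in the flattened feature map, which is the kind of index choose is expected to hold. It is plain Python rather than PyTorch, the helper names flat_index and unflatten are made up for this example, and it assumes row-major flattening, which is what view does on a contiguous tensor.

```python
def flat_index(row, col, width):
    """Row-major index of pixel (row, col) in a flattened H*W feature map."""
    return row * width + col


def unflatten(idx, width):
    """Inverse mapping: flat index back to (row, col)."""
    return divmod(idx, width)


W = 160                      # width of the 160x160 crop used in the example above
idx = flat_index(2, 5, W)    # pixel in row 2, column 5
print(idx)                   # 325
print(unflatten(idx, W))     # (2, 5)
```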

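emb = torch.gather(emb, 2, choose).contiguous()  # [1, 32, 500]
emb = emb.repeat(1, 1, num_anc).contiguous()  # [1, 32, 500*125]

The first line uses gather to collect the color features at the pixel indices stored in choose, which discards the features of pixels we do not care about. gather already returns a new tensor, so the result is independent of the previous emb; contiguous() only guarantees a contiguous memory layout and is not what creates the copy. For more on gather, see: PyTorch series: torch.gather()

The second line repeats emb 125 times so that every anchor is paired with every point, in preparation for feature fusion.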
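The indexing rule behind these two lines can be sketched in plain Python, without PyTorch. The function gather_last_dim is a hypothetical stand-in that applies the same rule torch.gather uses along the last dimension, out[c][k] = emb[c][choose[c][k]]; the batch dimension is left out and the sizes are toy values.

```python
def gather_last_dim(emb, choose):
    """out[c][k] = emb[c][choose[c][k]], the rule torch.gather applies along the last dim."""
    return [[row[i] for i in idx_row] for row, idx_row in zip(emb, choose)]


# Toy feature map: 2 channels, 6 flattened pixels (batch dimension omitted).
emb = [[10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25]]
picked = [4, 0, 3]                     # pixel indices of 3 sampled points
choose = [picked for _ in emb]         # same indices for every channel, like choose.repeat(1, di, 1)

selected = gather_last_dim(emb, choose)       # features of the sampled points only
num_anc = 2
tiled = [row * num_anc for row in selected]   # like emb.repeat(1, 1, num_anc)

print(selected)   # [[14, 10, 13], [24, 20, 23]]
print(tiled)      # [[14, 10, 13, 14, 10, 13], [24, 20, 23, 24, 20, 23]]
```

After tiling, position k along the last axis belongs to point k % num_points and anchor k // num_points.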

output_anchor = anchor.view(1, num_anc, 3)  # reshape so that output_anchor is [1, 125, 3]
anchor_for_key = anchor.view(1, num_anc, 1, 3).repeat(1, 1, self.num_key, 1)  # [1, num_anc=125, num_key=8, 3]: one copy of each anchor per keypoint
anchor = anchor.view(1, num_anc, 1, 3).repeat(1, 1, self.num_points, 1)  # [1, num_anc=125, num_points=500, 3]: one copy of each anchor per point, holding anchor coordinates
x = x.view(1, 1, self.num_points, 3).repeat(1, num_anc, 1, 1)  # [1, num_anc=125, 500, 3]: one copy of the point cloud per anchor, holding point coordinates
x = (x - anchor).view(1, num_anc * self.num_points, 3).contiguous()  # offset from every anchor to every point, [1, 125*500, 3]
x = x.transpose(2, 1).contiguous()  # [1, 3, 125*500]

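A series of reshapes (view) and repeats produces:

output_anchor: the anchor coordinates themselves, [1, 125, 3]
anchor_for_key: one copy of the anchors per keypoint, [1, 125, 8, 3]
anchor: one copy of the anchors per point, [1, 125, 500, 3]
x: the offset between every point and every anchor, with x, y, z as separate channels, [1, 3, 125*500]; it is used later for distance weighting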
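The layout of x is easier to see on a tiny example. The following plain-Python sketch (no PyTorch; point_anchor_offsets is a name invented here) computes the same per-axis offsets in anchor-major order, so that entry a*num_points + p belongs to anchor a and point p, and then splits them into one list per coordinate axis, which mirrors the final transpose.

```python
def point_anchor_offsets(points, anchors):
    """Offsets in anchor-major order: entry a*len(points)+p is points[p] - anchors[a]."""
    return [tuple(pc - ac for pc, ac in zip(p, a)) for a in anchors for p in points]


points = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
anchors = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]

offsets = point_anchor_offsets(points, anchors)    # 2 anchors * 3 points = 6 entries
channels = [list(axis) for axis in zip(*offsets)]  # like x.transpose(2, 1): 3 lists of 6 values

print(offsets[3])    # (0.0, -1.0, -1.0): point 0 relative to anchor 1
print(channels[0])   # [1.0, 0.0, 0.0, 0.0, -1.0, -1.0]
```

This ordering matches the tiled color features, where the block of per-point features is repeated once per anchor.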

After this, the statement feat_x = self.feat(x, emb) calls the PoseNetFeat network:

2.2 PoseNetFeat

The network is structured as follows:

class PoseNetFeat(nn.Module):
    def __init__(self, num_points):
        super(PoseNetFeat, self).__init__()
        self.conv1 = torch.nn.Conv1d(3, 64, 1)
        self.conv2 = torch.nn.Conv1d(64, 128, 1)

        self.e_conv1 = torch.nn.Conv1d(32, 64, 1)
        self.e_conv2 = torch.nn.Conv1d(64, 128, 1)

        self.conv5 = torch.nn.Conv1d(256, 256, 1)

        self.all_conv1 = torch.nn.Conv1d(640, 320, 1)
        self.all_conv2 = torch.nn.Conv1d(320, 160, 1)

        self.num_points = num_points

    def forward(self, x, emb):
        # x: [1, 3, 500*125], offset from every point to every anchor (spatial feature)
        # emb: [1, 32, 500*125], color features of the 500 points repeated 125 times (color feature)
        x = F.relu(self.conv1(x))  # [1, 64, 500*125]
        emb = F.relu(self.e_conv1(emb))  # [1, 64, 500*125]
        pointfeat_1 = torch.cat((x, emb), dim=1)  # concatenate x and emb along dim 1 (channels), [1, 128, 500*125]

        x = F.relu(self.conv2(x))  # [1, 128, 500*125]
        emb = F.relu(self.e_conv2(emb))  # [1, 128, 500*125]
        pointfeat_2 = torch.cat((x, emb), dim=1)  # [1, 256, 500*125]

        x = F.relu(self.conv5(pointfeat_2))  # [1, 256, 500*125]
        x = torch.cat([pointfeat_1, pointfeat_2, x], dim=1).contiguous()  # 128 + 256 + 256, [1, 640, 500*125]

        x = F.leaky_relu(self.all_conv1(x))  # [1, 320, 500*125]
        x = self.all_conv2(x)  # [1, 160, 500*125]

        return x

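The network convolves and activates the color features and the spatial features separately, then concatenates them with cat to form the fused features pointfeat_1 and pointfeat_2. These two are concatenated with the conv5 output (640 channels in total) and reduced by two more convolutions to the fused feature x of size [1, 160, 500*125], where 160 is the feature dimension.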
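Every layer in PoseNetFeat is a Conv1d with kernel size 1, which applies the same linear map to each point independently. The plain-Python sketch below illustrates that idea together with channel-wise concatenation; conv1d_k1 and concat_channels are illustrative helpers, not part of the 6-PACK code, and the weights are toy values.

```python
def conv1d_k1(x, weight, bias):
    """Kernel-size-1 convolution. x: [C_in][N], weight: [C_out][C_in], bias: [C_out]."""
    n = len(x[0])
    return [[sum(w * x[c][j] for c, w in enumerate(w_row)) + b for j in range(n)]
            for w_row, b in zip(weight, bias)]


def concat_channels(*feats):
    """Concatenate along the channel axis, like torch.cat(..., dim=1) without the batch dim."""
    return [row for f in feats for row in f]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]       # 3 input channels, 2 points
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]    # 2 output channels
bias = [0.5, 0.0]

y = conv1d_k1(x, weight, bias)
fused = concat_channels(x, y)                  # 3 + 2 = 5 channels

print(y)            # [[1.5, 2.5], [8.0, 10.0]]
print(len(fused))   # 5
```

The same bookkeeping gives the channel counts in the annotated code: 64 + 64 = 128, 128 + 128 = 256, and 128 + 256 + 256 = 640.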

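feat_x = self.feat(x, emb)  # [1, 160, 500*125]
feat_x = feat_x.transpose(2, 1).contiguous()  # [1, 500*125, 160]
feat_x = feat_x.view(1, num_anc, self.num_points, 160).contiguous()  # [1, 125, 500, 160]

These lines reshape the fused features. At this point we have the embedding vector $\phi_j$ that the DenseFusion encoder produces for each point $x_j$, which completes step 3 of the paper.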

loc = x.transpose(2, 1).contiguous().view(1, num_anc, self.num_points, 3)  # [1, 125, 500, 3]: offsets from each anchor to the 500 points
weight = self.sm(-1.0 * torch.norm(loc, dim=3)).contiguous()  # weights w = softmax(-d), taken over the points of each anchor
weight = weight.view(1, num_anc, self.num_points, 1).repeat(1, 1, 1, 160).contiguous()  # [1, 125, 500, 160]

feat_x = torch.sum((feat_x * weight), dim=2).contiguous().view(1, num_anc, 160)  # [1, 125, 160]
feat_x = feat_x.transpose(2, 1).contiguous()  # [1, 160, 125]

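This code corresponds to step 4 of the paper.
That is, distance-weighted average pooling yields the anchor embedding $\psi_i = \sum_j w_j \phi_j$. Here loc holds the offsets between every anchor and every point, and the weights are w = softmax(-d), where d is the Euclidean norm of the offset and the softmax runs over the points of each anchor, so closer points receive larger weights. The distance-weighted anchor embedding is feat_x.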
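For a single anchor, the pooling step can be written out in a few lines of plain Python. This is only a sketch with toy numbers and invented helper names (softmax, pool_anchor); it follows the weighting used in the code, a softmax over the negated distances.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def pool_anchor(offsets, feats):
    """offsets: per-point (dx, dy, dz) relative to one anchor; feats: per-point feature vectors."""
    dists = [math.sqrt(sum(c * c for c in o)) for o in offsets]
    weights = softmax([-d for d in dists])          # closer points get larger weights
    dim = len(feats[0])
    pooled = [sum(w * f[k] for w, f in zip(weights, feats)) for k in range(dim)]
    return weights, pooled


offsets = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]   # distances 0 and 5
feats = [[1.0, 0.0], [0.0, 1.0]]               # toy 2-dimensional features
weights, pooled = pool_anchor(offsets, feats)
print(weights)   # the first (closest) point dominates
print(pooled)
```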

kp_feat = F.leaky_relu(self.kp_1(feat_x))  # after convolution and activation, [1, 90, 125]
kp_feat = self.kp_2(kp_feat)  # after convolution (no activation), [1, 3*8, 125]
kp_feat = kp_feat.transpose(2, 1).contiguous()  # [1, 125, 3*8]
kp_x = kp_feat.view(1, num_anc, self.num_key, 3).contiguous()  # [1, 125, 8, 3]
kp_x = (kp_x + anchor_for_key).contiguous()  # [1, 125, 8, 3]: one set of 8 keypoints per anchor

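This part is the keypoint generation network. Its input is the anchor embedding feat_x and its output is an ordered list of keypoints with 8*3 values. It corresponds to the part labeled 6 in the paper's figure, except that here the keypoints are generated for all 125 anchors (8 each); the keypoints of the target anchor are picked out once that anchor has been selected.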
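The reshape and the anchor addition can be illustrated for one anchor in plain Python. The helper keypoints_for_anchor is hypothetical and uses num_key = 2 with made-up values; it only shows that the 3*num_key outputs are read as consecutive (x, y, z) triples and interpreted as offsets from the anchor position.

```python
def keypoints_for_anchor(flat_offsets, anchor, num_key):
    """Reshape 3*num_key values into (x, y, z) offsets and add the anchor position."""
    if len(flat_offsets) != 3 * num_key:
        raise ValueError("expected 3 * num_key values")
    return [tuple(flat_offsets[3 * k + i] + anchor[i] for i in range(3))
            for k in range(num_key)]


anchor = (0.5, 0.0, -0.5)
flat = [0.25, 0.5, 0.75, -0.25, -0.5, -0.75]   # toy network output for num_key = 2
kps = keypoints_for_anchor(flat, anchor, 2)
print(kps)   # [(0.75, 0.5, 0.25), (0.25, -0.5, -1.25)]
```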

# a 2-layer MLP serves as the attention network
att_feat = F.leaky_relu(self.att_1(feat_x))  # [1, 90, 125]
att_feat = self.att_2(att_feat)  # [1, 1, 125]
att_feat = att_feat.transpose(2, 1).contiguous()  # [1, 125, 1]
att_feat = att_feat.view(1, num_anc).contiguous()  # [1, 125]
att_x = self.sm2(att_feat).contiguous()  # [1, 125] confidence scores

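This part corresponds to step 5 of the paper: a 2-layer MLP serves as the attention network, taking the anchor embedding $\psi_i$ as input and producing the confidence score $c_i$ (att_x here).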

# scale holds one normalization factor per axis, [1, 3]
scale_anc = scale.view(1, 1, 3).repeat(1, num_anc, 1)  # [1, 125, 3]
output_anchor = (output_anchor * scale_anc).contiguous()  # denormalize to get the original anchor coordinates
min_choose = torch.argmin(torch.norm(output_anchor - gt_t, dim=2).view(-1))

Among the anchors, the one closest to the object's true centroid gt_t is taken as the target anchor. Here min_choose is the index of the target anchor.
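The selection rule amounts to a per-axis rescaling followed by an argmin over Euclidean distances. A plain-Python sketch under those assumptions follows; select_target_anchor and the three toy anchors are invented for illustration.

```python
import math


def select_target_anchor(anchors, scale, gt_t):
    """Rescale anchors per axis, then return the index of the one closest to gt_t."""
    scaled = [tuple(a * s for a, s in zip(anchor, scale)) for anchor in anchors]
    dists = [math.sqrt(sum((a - t) ** 2 for a, t in zip(anchor, gt_t)))
             for anchor in scaled]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best, scaled


anchors = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # normalized coordinates
scale = (2.0, 2.0, 2.0)
gt_t = (1.5, 0.0, 0.0)

idx, scaled = select_target_anchor(anchors, scale, gt_t)
print(idx)      # 2
print(scaled)   # [(-2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
```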

all_kp_x = kp_x.view(1, num_anc, 3*self.num_key).contiguous()  # [1, 125, 3*8]
all_kp_x = all_kp_x[:, min_choose, :].contiguous()  # take the keypoints of the target anchor, [1, 3*8]
all_kp_x = all_kp_x.view(1, self.num_key, 3).contiguous()  # [1, 8, 3]

scale_kp = scale.view(1, 1, 3).repeat(1, self.num_key, 1)  # [1, 8, 3]
all_kp_x = (all_kp_x * scale_kp).contiguous()  # denormalize to get the actual keypoint coordinates

The 8 keypoints of the target anchor are taken from kp_x and denormalized to obtain their actual coordinates.

return all_kp_x, output_anchor, att_x

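At this point we have the actual coordinates of the 8 keypoints (all_kp_x), the actual coordinates of the 125 anchors (output_anchor), and the confidence scores of the 125 anchors (att_x).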

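3. eval_forward

In eval.py, the line Kp_fr, att_fr = model.eval_forward(img_fr, choose_fr, cloud_fr, anchor, scale, 0.0, True) calls this function, which plays the same role as forward.

Inputs and outputs

Input parameters:

 - img: color_crop
 - choose: pixel indices of the point cloud points
 - ori_x: normalized point cloud
 - anchor: anchor grid
 - scale: normalization factor
 - space: range of the added translation noise
 - first: whether this is the first frame

Outputs:

 - all_kp_x: list of keypoints, one entry per translation-noise direction (27 directions, or a single entry when no noise is added)
 - all_att_choose: for each noise direction, the index of the most likely target anchor (the anchor with the highest confidence score)

Line-by-line walkthrough

The main difference from forward() is the added translation noise. For a fair comparison with other methods, translation noise is added to the initial frame during evaluation. To cope with this, for every frame except the first, the point cloud is shifted in different directions by a length determined by space before the spatial offset features are computed and fused with the color features.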

def eval_forward(self, img, choose, ori_x, anchor, scale, space, first):
        num_anc = len(anchor[0])
        out_img = self.cnn(img)  # extract color features; for an img of [3, 120, 120] the output out_img is [32, 120, 120]
        
        bs, di, _, _ = out_img.size()  # bs=1, di=32

        emb = out_img.view(bs, di, -1)  # [1, 32, 120*120]
        choose = choose.repeat(1, di, 1)
        emb = torch.gather(emb, 2, choose)  # [1, 32, 500]: color features of the point cloud points
        emb = emb.repeat(1, 1, num_anc).detach()  # repeat emb 125 times, [1, 32, 500*125]
        #print(emb.size())

        output_anchor = anchor.view(1, num_anc, 3)
        anchor_for_key = anchor.view(1, num_anc, 1, 3).repeat(1, 1, self.num_key, 1)  # [1, num_anc=125, num_key=8, 3]: one copy of each anchor per keypoint
        anchor = anchor.view(1, num_anc, 1, 3).repeat(1, 1, self.num_points, 1)  # [1, num_anc=125, num_points=500, 3]: one copy of each anchor per point, holding anchor coordinates

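Up to here the computation matches forward(). Next, to account for the shift, offsets in 27 directions are built from the value of space and applied to the point cloud.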
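Before the actual code, here is a compact plain-Python sketch of the offset grid on its own. The function name offset_grid is made up; itertools.product visits the combinations in the same order as three nested loops over x, y and z, so the zero offset sits at index 13.

```python
from itertools import product


def offset_grid(space):
    """All (dx, dy, dz) shifts: 27 combinations, or a single zero shift when space is 0."""
    if space == 0.0:
        return [(0.0, 0.0, 0.0)]
    candidates = [-10 * space, 0.0, 10 * space]
    return list(product(candidates, repeat=3))


grid = offset_grid(0.5)
print(len(grid))   # 27
print(grid[0])     # (-5.0, -5.0, -5.0)
print(grid[13])    # (0.0, 0.0, 0.0), the unshifted case
```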

candidate_list = [-10*space, 0.0, 10*space]  # range of the shift
        if space != 0.0:
            add_on = []
            for add_x in candidate_list:
                for add_y in candidate_list:
                    for add_z in candidate_list:
                        add_on.append([add_x, add_y, add_z])  # cubic grid determined by space (3*3*3 = 27 points)

            add_on = Variable(torch.from_numpy(np.array(add_on).astype(np.float32))).cuda().view(27, 1, 3)
        else:
            add_on = Variable(torch.from_numpy(np.array([0.0, 0.0, 0.0]).astype(np.float32))).cuda().view(1, 1, 3)

        all_kp_x = []
        all_att_choose = []
        scale_add_on = scale.view(1, 3)

        for tmp_add_on in add_on:  # process each shift direction separately (27, or 1 when space is 0)
            tmp_add_on_scale = (tmp_add_on / scale_add_on).view(1, 1, 3).repeat(1, self.num_points, 1)
            # [1, 500, 3]: add_on normalized and repeated 500 times
            tmp_add_on_key = (tmp_add_on / scale_add_on).view(1, 1, 3).repeat(1, self.num_key, 1)
            # [1, 8, 3]: add_on normalized and repeated 8 times
            x = ori_x - tmp_add_on_scale  # [1, 500, 3]: subtract the offset from every point, i.e. shift the point cloud
            

From here on the computation matches forward: the offsets between the shifted point cloud x and every anchor are computed and the fused features are derived. The code is as follows (each shift direction is processed separately, so this is still inside the loop).

#-------- same as forward from here ------
            x = x.view(1, 1, self.num_points, 3).repeat(1, num_anc, 1, 1)  # [1, 125, 500, 3]
            x = (x - anchor).view(1, num_anc * self.num_points, 3)  # offset from every anchor to every point, [1, 125*500, 3]

            x = x.transpose(2, 1)  # [1, 3, 125*500]
            feat_x = self.feat(x, emb)  # [1, 160, 500*125] fused features
            feat_x = feat_x.transpose(2, 1)
            feat_x = feat_x.view(1, num_anc, self.num_points, 160).detach()

            loc = x.transpose(2, 1).view(1, num_anc, self.num_points, 3)
            weight = self.sm(-1.0 * torch.norm(loc, dim=3))
            weight = weight.view(1, num_anc, self.num_points, 1).repeat(1, 1, 1, 160)

            feat_x = torch.sum((feat_x * weight), dim=2).view(1, num_anc, 160)
            feat_x = feat_x.transpose(2, 1).detach()  # distance-weighted anchor embedding

            kp_feat = F.leaky_relu(self.kp_1(feat_x))
            kp_feat = self.kp_2(kp_feat)
            kp_feat = kp_feat.transpose(2, 1)
            kp_x = kp_feat.view(1, num_anc, self.num_key, 3)
            kp_x = (kp_x + anchor_for_key).detach()  # [1, 125, 8, 3]: one set of 8 keypoints per anchor

            # a 2-layer MLP serves as the attention network
            att_feat = F.leaky_relu(self.att_1(feat_x))
            att_feat = self.att_2(att_feat)
            att_feat = att_feat.transpose(2, 1)
            att_feat = att_feat.view(1, num_anc)
            att_x = self.sm2(att_feat).detach()  # [1, 125] confidence scores
            #-------- same as forward up to here ------

            if not first:  # not the first frame
                att_choose = torch.argmax(att_x.view(-1))  # index of the highest confidence score
            else:
                att_choose = Variable(torch.from_numpy(np.array([62])).long()).cuda().view(-1)  # tensor [62], the middle index

            
            scale_anc = scale.view(1, 1, 3).repeat(1, num_anc, 1)  # [1, 125, 3]
            output_anchor = (output_anchor * scale_anc)  # denormalize to get the original anchor coordinates

            scale_kp = scale.view(1, 1, 3).repeat(1, self.num_key, 1)  # [1, 8, 3]
            kp_x = kp_x.view(1, num_anc, 3*self.num_key).detach()  # [1, 125, 24]
            kp_x = (kp_x[:, att_choose, :].view(1, self.num_key, 3) + tmp_add_on_key).detach()
            # for frames other than the first, take the 8 keypoints of the anchor with the highest confidence score
            # for the first frame, simply take the middle anchor as the target anchor

            kp_x = kp_x * scale_kp  # [1, 8, 3]: denormalize to get the actual keypoint coordinates

            all_kp_x.append(copy.deepcopy(kp_x.detach()))
            all_att_choose.append(copy.deepcopy(att_choose.detach()))
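The anchor choice at the end of the loop can be summarized in plain Python. This sketch assumes the 125 anchors form a 5x5x5 grid flattened in row-major order, which would explain why index 62 is called the middle index; that layout is an assumption, not something the code above states. The helpers grid_center_index and choose_anchor are illustrative names.

```python
def grid_center_index(side):
    """Flat row-major index of the center cell of a side x side x side grid."""
    c = side // 2
    return c * side * side + c * side + c


def choose_anchor(att, first, side=5):
    """First frame: center anchor. Later frames: anchor with the highest confidence."""
    if first:
        return grid_center_index(side)
    return max(range(len(att)), key=att.__getitem__)


print(grid_center_index(5))                           # 62 = 2*25 + 2*5 + 2
print(choose_anchor([0.1, 0.7, 0.2], first=False))    # 1
```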