Code reproduction: VoxelNet paper and code walkthrough (PyTorch version)

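The previous post translated the VoxelNet paper and gave a brief reading of it.
While it is still fresh, this post gets the model running and reads through the code.
A file-by-file walkthrough, including the training code, is available here.
The code comes from a PyTorch implementation on GitHub; thanks to its author.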

Environment:

RTX 3060 | Ubuntu 20.04 | Python 3.7 | torch-gpu 1.11.0 | cudatoolkit 11.3.1 | GPU driver 11.7

1. VoxelNet network structure

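As the name suggests, VoxelNet treats the 3D point cloud as a set of voxels (volumetric cells) and processes it voxel by voxel. Overall, the network has three parts: (1) the feature learning network, (2) the convolutional middle layers, and (3) the RPN, as shown in the figure below.
[Figure: VoxelNet architecture]

The overall network model: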

class VoxelNet(nn.Module):

    def __init__(self):
        super(VoxelNet, self).__init__()
        self.svfe = SVFE()
        self.cml = CML()
        self.rpn = RPN()

    def voxel_indexing(self, sparse_features, coords):
        dim = sparse_features.shape[-1]
        dense_feature = torch.zeros(cfg.N, cfg.D, cfg.H, cfg.W, dim).to(cfg.device)
        dense_feature[coords[:,0], coords[:,1], coords[:,2], coords[:,3], :]= sparse_features
        return dense_feature.permute(0, 4, 1, 2, 3)

    def forward(self, voxel_features, voxel_coords):
        # feature learning network
        vwfs = self.svfe(voxel_features)
        vwfs = self.voxel_indexing(vwfs,voxel_coords)

        # convolutional middle network
        cml_out = self.cml(vwfs)
        # region proposal network
        # merge the depth and feature dim into one, output probability score map and regression map
        score, reg = self.rpn(cml_out.reshape(cfg.N,-1,cfg.H, cfg.W))
        score = torch.sigmoid(score)
        score = score.permute((0, 2, 3, 1))

        return score, reg
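
The voxel_indexing step is the least obvious part of this forward pass: SVFE returns one feature vector per non-empty voxel, and voxel_coords says which cell of the dense grid each vector belongs to. Below is a minimal sketch of the same scatter in plain Python, so it runs without PyTorch. The helper name scatter_to_dense and the tiny grid sizes are made up for illustration and are not from the repository.

```python
def scatter_to_dense(sparse_features, coords, n, d, h, w):
    """Place each voxel feature at its (batch, d, h, w) cell; empty cells stay zero."""
    dim = len(sparse_features[0])
    # dense[b][z][y][x] is a feature vector of length dim, initialised to zeros
    dense = [[[[[0.0] * dim for _ in range(w)]
               for _ in range(h)]
              for _ in range(d)]
             for _ in range(n)]
    for feat, (b, z, y, x) in zip(sparse_features, coords):
        dense[b][z][y][x] = list(feat)
    return dense


# Two non-empty voxels with 3-dim features in a 1 x 2 x 2 x 2 grid (toy sizes).
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
coords = [(0, 0, 1, 1), (0, 1, 0, 0)]
dense = scatter_to_dense(features, coords, n=1, d=2, h=2, w=2)

# Count the cells that actually received a feature vector.
non_empty = sum(any(v != 0.0 for v in cell)
                for batch in dense for plane in batch for row in plane for cell in row)
```

In the real method the dense tensor has layout (N, D, H, W, dim) and is then permuted to (N, dim, D, H, W), which is the channel-first layout that 3D convolutions expect.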

The stacked VFE module:

# Stacked Voxel Feature Encoding
class SVFE(nn.Module):

    def __init__(self):
        super(SVFE, self).__init__()
        self.vfe_1 = VFE(7,32) # VFE-1
        self.vfe_2 = VFE(32,128) # VFE-2
        self.fcn = FCN(128,128) # fully connected layer

    def forward(self, x):
        mask = torch.ne(torch.max(x,2)[0], 0) # mask out zero-padded points
        x = self.vfe_1(x, mask)
        x = self.vfe_2(x, mask)
        x = self.fcn(x)
        # element-wise max pooling
        x = torch.max(x,1)[0]
        return x

The VFE module:

class VFE(nn.Module):

    def __init__(self,cin,cout):
        super(VFE, self).__init__()
        assert cout % 2 == 0
        self.units = cout // 2
        self.fcn = FCN(cin,self.units)

    def forward(self, x, mask):
        # point-wise feature
        pwf = self.fcn(x) # the FCN maps each point into the voxel feature space
        # locally aggregated feature
        laf = torch.max(pwf,1)[0].unsqueeze(1).repeat(1,cfg.T,1)
        # point-wise concat feature
        pwcf = torch.cat((pwf,laf),dim=2)
        # apply mask
        mask = mask.unsqueeze(2).repeat(1, 1, self.units * 2)
        pwcf = pwcf * mask.float()

        return pwcf

Here, the mask argument of forward(self, x, mask) corresponds to the green part of the figure in the paper's section on GPU acceleration, as shown:
[Figure: GPU acceleration diagram from the paper]
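
To make the role of the mask concrete, here is a rough sketch of what one VFE layer does to a single voxel, written in plain Python rather than PyTorch. The function name vfe_single_voxel and all the numbers are illustrative assumptions; the FCN output is simply made up, since the point is only the max, concat, and mask steps.

```python
def vfe_single_voxel(point_features, mask):
    """point_features: T lists of C floats (the FCN output); mask: T booleans."""
    channels = len(point_features[0])
    # locally aggregated feature: element-wise max over all T slots
    # (like the code above, this max also sees the padded slots)
    aggregated = [max(p[c] for p in point_features) for c in range(channels)]
    out = []
    for feat, keep in zip(point_features, mask):
        concat = feat + aggregated            # point-wise concat, length 2 * C
        out.append(concat if keep else [0.0] * (2 * channels))
    return out


# T = 3 slots, the last one is zero padding (toy values, C = 2).
raw_points = [[0.5, 1.0], [2.0, 0.1], [0.0, 0.0]]
# Same test as torch.ne(torch.max(x, 2)[0], 0): a slot is real if its max is non-zero.
mask = [max(p) != 0 for p in raw_points]
# Pretend FCN output; padded rows are generally non-zero after Linear + BN + ReLU.
pwf = [[0.2, 0.9], [0.7, 0.3], [0.4, 0.4]]
result = vfe_single_voxel(pwf, mask)
```

The padded slot comes out as all zeros again, which is why the same mask can be reused unchanged by vfe_2.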

The FCN layer:

class FCN(nn.Module):

    def __init__(self,cin,cout):
        super(FCN, self).__init__()
        self.cout = cout
        self.linear = nn.Linear(cin, cout)
        self.bn = nn.BatchNorm1d(cout)

    def forward(self,x):
        # KK is the stacked k across batch
        kk, t, _ = x.shape
        x = self.linear(x.view(kk*t,-1))
        x = F.relu(self.bn(x))
        return x.view(kk,t,-1)

Next come the convolutional middle layers (CML):

class CML(nn.Module):
    def __init__(self):
        super(CML, self).__init__()
        self.conv3d_1 = Conv3d(128, 64, 3, s=(2, 1, 1), p=(1, 1, 1))
        self.conv3d_2 = Conv3d(64, 64, 3, s=(1, 1, 1), p=(0, 1, 1))
        self.conv3d_3 = Conv3d(64, 64, 3, s=(2, 1, 1), p=(1, 1, 1))

    def forward(self, x):
        x = self.conv3d_1(x)
        x = self.conv3d_2(x)
        x = self.conv3d_3(x)
        return x

This is simply a stack of 3D convolution blocks that extracts progressively deeper features.
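
One thing the strides and paddings do decide is how many channels the RPN receives, because the forward pass reshapes the CML output to (N, -1, H, W) and so merges the channel and depth axes. The sketch below checks that arithmetic with the standard convolution output-size formula. The value D = 10 is an assumption taken from the paper's car-detection setting; cfg.D itself is not shown in this post.

```python
def conv_out(size, kernel, stride, pad):
    """Output length of a convolution along one axis (no dilation)."""
    return (size + 2 * pad - kernel) // stride + 1


depth = 10                                    # assumed cfg.D (paper's car setting)
# depth-axis (stride, padding) of conv3d_1, conv3d_2, conv3d_3; kernel is 3 everywhere
for stride, pad in [(2, 1), (1, 0), (2, 1)]:
    depth = conv_out(depth, kernel=3, stride=stride, pad=pad)

# cml_out.reshape(N, -1, H, W) merges the 64 channels with the remaining depth
rpn_in_channels = 64 * depth
```

Under that assumption the depth shrinks 10 -> 5 -> 3 -> 2, so the merged tensor has 64 * 2 = 128 channels, matching the 128 input channels of the first RPN convolution. H and W are untouched because all three layers use stride 1 and padding 1 on those axes.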

Last is the RPN, which the authors modified to make detection end-to-end. It is somewhat like YOLO, but the loss still follows the Faster R-CNN approach.

class RPN(nn.Module):
    def __init__(self):
        super(RPN, self).__init__()
        self.block_1 = [Conv2d(128, 128, 3, 2, 1)]
        self.block_1 += [Conv2d(128, 128, 3, 1, 1) for _ in range(3)]
        self.block_1 = nn.Sequential(*self.block_1)

        self.block_2 = [Conv2d(128, 128, 3, 2, 1)]
        self.block_2 += [Conv2d(128, 128, 3, 1, 1) for _ in range(5)]
        self.block_2 = nn.Sequential(*self.block_2)

        self.block_3 = [Conv2d(128, 256, 3, 2, 1)]
        # note: plain nn.Conv2d here, not the Conv2d wrapper used in block_1 and block_2
        self.block_3 += [nn.Conv2d(256, 256, 3, 1, 1) for _ in range(5)]
        self.block_3 = nn.Sequential(*self.block_3)

        self.deconv_1 = nn.Sequential(nn.ConvTranspose2d(256, 256, 4, 4, 0),nn.BatchNorm2d(256))
        self.deconv_2 = nn.Sequential(nn.ConvTranspose2d(128, 256, 2, 2, 0),nn.BatchNorm2d(256))
        self.deconv_3 = nn.Sequential(nn.ConvTranspose2d(128, 256, 1, 1, 0),nn.BatchNorm2d(256))

        self.score_head = Conv2d(768, cfg.anchors_per_position, 1, 1, 0, activation=False, batch_norm=False)
        self.reg_head = Conv2d(768, 7 * cfg.anchors_per_position, 1, 1, 0, activation=False, batch_norm=False)

    def forward(self,x):
        x = self.block_1(x)
        x_skip_1 = x
        x = self.block_2(x)
        x_skip_2 = x
        x = self.block_3(x)
        x_0 = self.deconv_1(x)
        x_1 = self.deconv_2(x_skip_2)
        x_2 = self.deconv_3(x_skip_1)
        x = torch.cat((x_0,x_1,x_2), dim = 1)
        return self.score_head(x), self.reg_head(x)
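
The three deconvolutions only concatenate cleanly if they bring all three feature maps back to the same spatial size. The following sketch checks this with the standard output-size formulas for convolution and transposed convolution. H = 400 and W = 352 are assumed from the paper's car-detection setting rather than read from cfg.

```python
def conv_out(size, kernel, stride, pad):
    """Output length of a convolution along one axis (no dilation)."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size, kernel, stride, pad):
    """Output length of a transposed convolution (no output padding, no dilation)."""
    return (size - 1) * stride - 2 * pad + kernel


h, w = 400, 352                                        # assumed cfg.H, cfg.W
# Only the first layer of each block has stride 2; the rest keep the size.
b1 = tuple(conv_out(s, 3, 2, 1) for s in (h, w))       # after block_1
b2 = tuple(conv_out(s, 3, 2, 1) for s in b1)           # after block_2
b3 = tuple(conv_out(s, 3, 2, 1) for s in b2)           # after block_3

up_0 = tuple(deconv_out(s, 4, 4, 0) for s in b3)       # deconv_1 on block_3 output
up_1 = tuple(deconv_out(s, 2, 2, 0) for s in b2)       # deconv_2 on block_2 output
up_2 = tuple(deconv_out(s, 1, 1, 0) for s in b1)       # deconv_3 on block_1 output

concat_channels = 256 * 3                              # torch.cat along dim=1
```

With these sizes the blocks produce 200x176, 100x88, and 50x44 maps, and all three upsampled maps come back to 200x176, which is why the two heads take 768 input channels.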

The part that is harder to understand is

        self.score_head = Conv2d(768, cfg.anchors_per_position, 1, 1, 0, activation=False, batch_norm=False)
        self.reg_head = Conv2d(768, 7 * cfg.anchors_per_position, 1, 1, 0, activation=False, batch_norm=False)

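score_head outputs the confidence, and reg_head outputs the offsets of the predicted boxes. Note that score_head and reg_head are computed in almost the same way, both being plain convolution outputs, which reflects the end-to-end design. The second dimension of the reg_head output therefore has

7 * cfg.anchors_per_position = 14

values: 7 offsets for each of the 2 anchors at every position, the two anchors being rotated by 0 and 90 degrees. Note also that the RPN does not output object coordinates; it outputs the offsets between the predefined anchors and the GT boxes.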
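
As an illustration of what those 7 values per anchor mean, here is a small sketch of the offset parameterization given in the VoxelNet paper, together with its inverse. The function names and the sample boxes are made up, and whether the repository's target-building code matches this term by term is not shown in this post, so treat it as a sketch of the paper's formula only.

```python
import math


def encode_box(gt, anchor):
    """gt, anchor: (x, y, z, l, w, h, theta). Returns the 7 regression targets."""
    xg, yg, zg, lg, wg, hg, tg = gt
    xa, ya, za, la, wa, ha, ta = anchor
    da = math.sqrt(la ** 2 + wa ** 2)          # diagonal of the anchor's base
    return ((xg - xa) / da, (yg - ya) / da, (zg - za) / ha,
            math.log(lg / la), math.log(wg / wa), math.log(hg / ha),
            tg - ta)


def decode_box(delta, anchor):
    """Inverse of encode_box: turn 7 predicted offsets back into a box."""
    dx, dy, dz, dl, dw, dh, dt = delta
    xa, ya, za, la, wa, ha, ta = anchor
    da = math.sqrt(la ** 2 + wa ** 2)
    return (dx * da + xa, dy * da + ya, dz * ha + za,
            math.exp(dl) * la, math.exp(dw) * wa, math.exp(dh) * ha,
            dt + ta)


# Made-up ground-truth box and a car-sized anchor at 0 degrees.
gt = (10.0, 2.0, -1.2, 4.2, 1.7, 1.5, 0.1)
anchor = (9.6, 2.4, -1.0, 3.9, 1.6, 1.56, 0.0)
delta = encode_box(gt, anchor)
restored = decode_box(delta, anchor)
```

Decoding the encoded offsets recovers the original box up to floating-point error, and a GT box identical to its anchor encodes to seven zeros.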

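This brings in anchor generation. VoxelNet fixes by hand, from the start, how many anchors there are as well as their sizes and positions; whatever the input point cloud, the anchors are identical. Consequently, in the RPN stage and throughout training and inference, the network only has to work with the offsets between the anchors and the GT boxes.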
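
A minimal sketch of such a fixed anchor grid follows. The anchor size (3.9, 1.6, 1.56), the center height of -1.0, the two rotations, and the detection range are the paper's car settings. Placing centers at the middle of each output cell is an assumption of this sketch, and the repository may space them differently. make_anchors is a hypothetical helper name.

```python
import math


def make_anchors(x_range, y_range, grid_w, grid_h,
                 size=(3.9, 1.6, 1.56), z_center=-1.0,
                 rotations=(0.0, math.pi / 2)):
    """Return one (x, y, z, l, w, h, theta) tuple per grid cell and rotation."""
    step_x = (x_range[1] - x_range[0]) / grid_w
    step_y = (y_range[1] - y_range[0]) / grid_h
    anchors = []
    for iy in range(grid_h):
        cy = y_range[0] + (iy + 0.5) * step_y      # cell-center assumption
        for ix in range(grid_w):
            cx = x_range[0] + (ix + 0.5) * step_x
            for theta in rotations:
                anchors.append((cx, cy, z_center, *size, theta))
    return anchors


# 200 x 176 output map over x in [0, 70.4] m and y in [-40, 40] m.
anchors = make_anchors((0.0, 70.4), (-40.0, 40.0), grid_w=176, grid_h=200)
num_anchors = len(anchors)
```

Nothing in this function depends on the point cloud, which is the point of the paragraph above: the 200 * 176 * 2 = 70400 anchors are the same for every input.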
