[3D Object Detection] [Reading the Code] VoxelRCNN RPN (Part 1)

Hatake卡卡龙

Last modified 2022-11-21 14:42:59


Tags: object detection, artificial intelligence

First published 2022-11-19 17:59:24

Article link: https://blog.csdn.net/weixin_48453503/article/details/127935025


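I've noticed that many recent 3D detection papers build on PV-RCNN and Voxel R-CNN, so it's time to go through the source code of these two networks.

CODEBASE: based on OpenPCDet 0.5.2

Let's start with the VoxelRCNN config file.

VFE, BACKBONE_3D, MAP_TO_BEV, BACKBONE_2D, and DENSE_HEAD are the same as the ones SECOND uses. In other words, AnchorHeadSingle serves as the first-stage head. So let's go straight to this DENSE_HEAD.

The AnchorHeadSingle source file is at: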

OpenPCDet/pcdet/models/dense_heads/anchor_head_single.py

Start with the initialization.

class AnchorHeadSingle(AnchorHeadTemplate):
    def __init__(self, model_cfg, input_channels, num_class, class_names, grid_size, point_cloud_range,
                 predict_boxes_when_training=True, **kwargs):
        super().__init__(
            model_cfg=model_cfg, num_class=num_class, class_names=class_names, grid_size=grid_size, point_cloud_range=point_cloud_range,
            predict_boxes_when_training=predict_boxes_when_training
        )

        self.num_anchors_per_location = sum(self.num_anchors_per_location)

        self.conv_cls = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.num_class,
            kernel_size=1
        )
        self.conv_box = nn.Conv2d(
            input_channels, self.num_anchors_per_location * self.box_coder.code_size,
            kernel_size=1
        )

        if self.model_cfg.get('USE_DIRECTION_CLASSIFIER', None) is not None:
            self.conv_dir_cls = nn.Conv2d(
                input_channels,
                self.num_anchors_per_location * self.model_cfg.NUM_DIR_BINS,
                kernel_size=1
            )
        else:
            self.conv_dir_cls = None
        self.init_weights()

    def init_weights(self):
        pi = 0.01
        nn.init.constant_(self.conv_cls.bias, -np.log((1 - pi) / pi))
        nn.init.normal_(self.conv_box.weight, mean=0, std=0.001)

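The class inherits from AnchorHeadTemplate. The template mainly sets up functionality shared by all anchor heads (for example: generating anchors, and initializing target_assigner and box_coder). #TODO: walk through these three modules.

Then comes AnchorHeadSingle's own initialization, which is simple: __init__ only creates a few nn.Conv2d layers that produce the classification and regression predictions.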
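One detail in init_weights is worth a quick check: the classification bias is set to -log((1 - pi) / pi) with pi = 0.01. The sketch below uses only the standard library (the helper names are mine, not OpenPCDet's) to show that this bias makes the initial sigmoid score equal to pi, which is the prior-probability initialization commonly paired with focal loss.

```python
import math


def focal_prior_bias(pi: float) -> float:
    """Return the bias b for which sigmoid(b) == pi (same formula as init_weights)."""
    return -math.log((1.0 - pi) / pi)


def sigmoid(x: float) -> float:
    """Plain logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


bias = focal_prior_bias(0.01)
initial_score = sigmoid(bias)  # score every anchor starts with, before any training
print(bias, initial_score)
```

With this start, almost every anchor is predicted as background at the beginning of training, so the huge number of negative anchors does not dominate the early loss.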

Print the module to take a look.

(dense_head): AnchorHeadSingle(
(cls_loss_func): SigmoidFocalClassificationLoss()
(reg_loss_func): WeightedSmoothL1Loss()
(dir_loss_func): WeightedCrossEntropyLoss()
(conv_cls): Conv2d(256, 2, kernel_size=(1, 1), stride=(1, 1))
(conv_box): Conv2d(256, 14, kernel_size=(1, 1), stride=(1, 1))
(conv_dir_cls): Conv2d(256, 4, kernel_size=(1, 1), stride=(1, 1))
)

A note on the in/out channels of these Conv2d layers:

# After initialization, each location of the feature map has 2 anchors, i.e. every pixel of the BEV feature map carries 2 anchors.

self.num_anchors_per_location = sum(self.num_anchors_per_location)

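conv_cls predicts the classification confidence of each anchor. It maps the 256-channel feature map to 2 channels (2 anchors x 1 class in this run).

conv_box regresses the offsets of each anchor (anchor-based methods predict the offset between an anchor and its ground-truth box). The 14 channels hold the offsets for 2 anchors (a 3D bounding box is defined by 7 parameters: x, y, z, dx, dy, dz, heading).

conv_dir_cls predicts the direction (SECOND proposed classifying the direction first and then adding the angle offset). Its 4 channels are 2 anchors x NUM_DIR_BINS, so NUM_DIR_BINS is 2 here.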
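The channel counts in the printed module can be reproduced with a few multiplications. This is a minimal sketch; head_out_channels is a hypothetical helper, and the values passed in (2 anchors per location, 1 class, code size 7, 2 direction bins) are inferred from the printed shapes rather than read from the config.

```python
def head_out_channels(anchors_per_location, num_class, code_size, num_dir_bins):
    """Output channels of the three 1x1 convs, mirroring the formulas in __init__."""
    return {
        "conv_cls": anchors_per_location * num_class,
        "conv_box": anchors_per_location * code_size,
        "conv_dir_cls": anchors_per_location * num_dir_bins,
    }


# Values inferred from the printout above (assumption, not read from the yaml).
channels = head_out_channels(
    anchors_per_location=2, num_class=1, code_size=7, num_dir_bins=2
)
print(channels)
```

With a three-class config and the same number of anchors per class, all three numbers would scale accordingly.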

Now the forward pass.

    def forward(self, data_dict):
        spatial_features_2d = data_dict['spatial_features_2d']

        cls_preds = self.conv_cls(spatial_features_2d)
        box_preds = self.conv_box(spatial_features_2d)

        cls_preds = cls_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]
        box_preds = box_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]

        self.forward_ret_dict['cls_preds'] = cls_preds
        self.forward_ret_dict['box_preds'] = box_preds

        if self.conv_dir_cls is not None:
            dir_cls_preds = self.conv_dir_cls(spatial_features_2d)
            dir_cls_preds = dir_cls_preds.permute(0, 2, 3, 1).contiguous()
            self.forward_ret_dict['dir_cls_preds'] = dir_cls_preds
        else:
            dir_cls_preds = None

        if self.training:
            targets_dict = self.assign_targets(
                gt_boxes=data_dict['gt_boxes']
            )
            self.forward_ret_dict.update(targets_dict)

        if not self.training or self.predict_boxes_when_training:
            batch_cls_preds, batch_box_preds = self.generate_predicted_boxes(
                batch_size=data_dict['batch_size'],
                cls_preds=cls_preds, box_preds=box_preds, dir_cls_preds=dir_cls_preds
            )
            data_dict['batch_cls_preds'] = batch_cls_preds
            data_dict['batch_box_preds'] = batch_box_preds
            data_dict['cls_preds_normalized'] = False

        return data_dict

First, see what the incoming data_dict contains.

(Pdb) data_dict.keys()
dict_keys(['frame_id', 'gt_boxes', 'points', 'points_before_aug', 'flip_x', 'noise_rot', 'noise_scale', 'calib', 'use_lead_xyz', 'voxels', 'voxel_coords', 'voxel_num_points', 'image_shape', 'batch_size', 'voxel_features', 'encoded_spconv_tensor', 'encoded_spconv_tensor_stride', 'multi_scale_3d_features', 'multi_scale_3d_strides', 'spatial_features', 'spatial_features_stride', 'spatial_features_2d'])

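spatial_features : the 3D backbone's output feature map, after MAP_TO_BEV has compressed it into BEV

spatial_features_stride : the downsampling stride (factor) of the 3D backbone

spatial_features_2d : the feature map output by the 2D backbone (the BEV feature map that the head consumes)

Check the shape of this BEV feature map.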

(Pdb) spatial_features_2d.shape
torch.Size([2, 256, 200, 176])

It is a 2D feature map with 256 channels, which matches the input channels of the three 1x1 convs described above.

        cls_preds = self.conv_cls(spatial_features_2d)
        box_preds = self.conv_box(spatial_features_2d)

        cls_preds = cls_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]
        box_preds = box_preds.permute(0, 2, 3, 1).contiguous()  # [N, H, W, C]

        self.forward_ret_dict['cls_preds'] = cls_preds
        self.forward_ret_dict['box_preds'] = box_preds

        if self.conv_dir_cls is not None:
            dir_cls_preds = self.conv_dir_cls(spatial_features_2d)
            dir_cls_preds = dir_cls_preds.permute(0, 2, 3, 1).contiguous()
            self.forward_ret_dict['dir_cls_preds'] = dir_cls_preds
        else:
            dir_cls_preds = None

Look at the outputs of these three 1x1 convs.

(Pdb) cls_preds.shape, box_preds.shape, dir_cls_preds.shape
(torch.Size([2, 200, 176, 2]), torch.Size([2, 200, 176, 14]), torch.Size([2, 200, 176, 4]))
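Why permute to [N, H, W, C]? With channels last, the values that belong to one pixel sit next to each other, so a later reshape to per-anchor predictions is just a matter of cutting each pixel's channel vector into equal pieces. The sketch below imitates that on a plain list. It assumes an anchor-major channel layout (the first 7 channels belong to anchor 0, the next 7 to anchor 1), which is what a view to [N, num_anchors, 7] on a contiguous channel-last tensor implies; split_per_anchor is a hypothetical helper.

```python
def split_per_anchor(channel_vector, num_anchors):
    """Cut one pixel's channel vector into num_anchors equal, consecutive chunks."""
    if len(channel_vector) % num_anchors != 0:
        raise ValueError("channel count must be divisible by num_anchors")
    size = len(channel_vector) // num_anchors
    return [channel_vector[a * size:(a + 1) * size] for a in range(num_anchors)]


# Stand-in for the 14 box channels of a single BEV pixel.
pixel_box_pred = list(range(14))
per_anchor = split_per_anchor(pixel_box_pred, num_anchors=2)
print(per_anchor)
```

Without the permute, the 14 values of one pixel would be spread H*W elements apart in memory, and the same reshape would mix values from different pixels.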

Next, every anchor is matched against the gt boxes (note: the original anchors, not anchor + prediction).

        if self.training:
            targets_dict = self.assign_targets(
                gt_boxes=data_dict['gt_boxes']
            )
            self.forward_ret_dict.update(targets_dict)

(Pdb) targets_dict['box_cls_labels'].shape
torch.Size([2, 70400])
(Pdb) targets_dict['box_reg_targets'].shape
torch.Size([2, 70400, 7])
(Pdb) targets_dict['reg_weights'].shape
torch.Size([2, 70400])

2 is the batch_size, and 70400 is the total number of anchors (200 x 176 x 2).
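The 70400 follows directly from the feature map size. The sketch below also shows one plausible mapping from (row, column, anchor) to a position in the flattened anchor list, assuming the flattening goes row by row, then column, then anchor. Both helpers are hypothetical and only illustrate the arithmetic.

```python
def total_anchors(height, width, anchors_per_location):
    """Number of anchors on an H x W BEV map."""
    return height * width * anchors_per_location


def flat_anchor_index(row, col, anchor, width, anchors_per_location):
    """Index into the flattened anchor list, assuming (row, col, anchor) order."""
    return (row * width + col) * anchors_per_location + anchor


num_anchors = total_anchors(height=200, width=176, anchors_per_location=2)
first_index = flat_anchor_index(0, 0, 0, width=176, anchors_per_location=2)
last_index = flat_anchor_index(199, 175, 1, width=176, anchors_per_location=2)
print(num_anchors, first_index, last_index)
```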

At this point most of the values are 0, though; only anchors matched to a gt box carry values.

(Pdb) targets_dict['box_reg_targets']
tensor([[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],

[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]], device='cuda:0')
(Pdb) targets_dict['box_cls_labels'].nonzero().shape
torch.Size([508, 2])

Next, the predicted boxes are generated.

        if not self.training or self.predict_boxes_when_training:
            batch_cls_preds, batch_box_preds = self.generate_predicted_boxes( # not read yet
                batch_size=data_dict['batch_size'],
                cls_preds=cls_preds, box_preds=box_preds, dir_cls_preds=dir_cls_preds
            )
            data_dict['batch_cls_preds'] = batch_cls_preds
            data_dict['batch_box_preds'] = batch_box_preds # [2, 70400, 7]
            data_dict['cls_preds_normalized'] = False

(Pdb) batch_cls_preds.shape
torch.Size([2, 70400, 1])
(Pdb) batch_box_preds.shape
torch.Size([2, 70400, 7])
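I have not read generate_predicted_boxes yet, but the core idea of turning 7 regression values plus an anchor back into a box can be sketched. The code below assumes the SECOND-style residual coding (center offsets normalized by the anchor's BEV diagonal, log-ratio sizes, additive heading), which is my understanding of what the box_coder does. The function names and the example numbers are made up for illustration, and the direction-classifier correction is left out.

```python
import math


def encode(gt, anchor):
    """Box (x, y, z, dx, dy, dz, heading) -> residuals relative to an anchor."""
    xg, yg, zg, dxg, dyg, dzg, rg = gt
    xa, ya, za, dxa, dya, dza, ra = anchor
    diagonal = math.hypot(dxa, dya)  # BEV diagonal of the anchor
    return [
        (xg - xa) / diagonal, (yg - ya) / diagonal, (zg - za) / dza,
        math.log(dxg / dxa), math.log(dyg / dya), math.log(dzg / dza),
        rg - ra,
    ]


def decode(residual, anchor):
    """Inverse of encode: residuals + anchor -> box."""
    xt, yt, zt, dxt, dyt, dzt, rt = residual
    xa, ya, za, dxa, dya, dza, ra = anchor
    diagonal = math.hypot(dxa, dya)
    return [
        xt * diagonal + xa, yt * diagonal + ya, zt * dza + za,
        math.exp(dxt) * dxa, math.exp(dyt) * dya, math.exp(dzt) * dza,
        rt + ra,
    ]


# Illustrative numbers only.
anchor = [10.0, 2.0, -1.0, 3.9, 1.6, 1.56, 0.0]
gt = [10.5, 2.2, -0.9, 4.2, 1.7, 1.5, 0.1]
residual = encode(gt, anchor)
recovered = decode(residual, anchor)
print(residual)
print(recovered)
```

encode is what fills box_reg_targets for matched anchors during training, and decode is the direction used at this step, applied to all 70400 anchors at once.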

That completes the forward pass of the first-stage head.
