A Detailed Look at anchor_target_3d

大头蘑菇汤

Category: 3D Recognition. Tags: deep learning, object detection, pytorch

Article link: https://blog.csdn.net/ll594282475/article/details/121357547

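I have recently been looking into the 3D IoU loss. The principle is simple, but parts of the implementation were hard to follow, so I am writing them down here.

In mmdetection3d, the IoU should be computed between the predicted 3D bbox and the ground-truth 3D bbox.
In terms of implementation, since it has to take part in forward and backward propagation, it should live in pts_bbox_head.
One thing to know up front is that anchors.shape = torch.Size([1, 200, 176, 3, 2, 7]).
200 x 176 is the BEV grid on which the anchors are placed, 3 is the number of anchor sizes (one per class), 2 is the number of orientations, and 7 is the number of bbox parameters.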
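
As a quick check of these numbers, the plain-Python sketch below counts the anchors and maps a flat row index back to a grid cell, anchor size and orientation. It assumes the row-major flattening implied by the shape above; the helper name unravel_anchor_index is made up for illustration.

```python
# Anchor layout taken from the shape above: [1, 200, 176, 3, 2, 7]
H, W, NUM_SIZES, NUM_ROTS, BOX_CODE_SIZE = 200, 176, 3, 2, 7

anchors_per_cell = NUM_SIZES * NUM_ROTS            # 6 anchors at each BEV location
num_anchors = H * W * anchors_per_cell             # rows after reshape(-1, 7)
bbox_channels = anchors_per_cell * BOX_CODE_SIZE   # channels of the bbox_pred map


def unravel_anchor_index(flat_index):
    """Hypothetical helper: map a flat anchor index to (row, col, size, rot),
    assuming row-major (C-order) flattening of [H, W, NUM_SIZES, NUM_ROTS]."""
    flat_index, rot = divmod(flat_index, NUM_ROTS)
    flat_index, size = divmod(flat_index, NUM_SIZES)
    row, col = divmod(flat_index, W)
    return row, col, size, rot


print(num_anchors)                   # 211200
print(bbox_channels)                 # 42
print(unravel_anchor_index(211199))  # (199, 175, 2, 1)
```

This is where the 211200 rows and the 42 channels in the code excerpts come from.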

        # regression loss
        # torch.Size([1, 42, 200, 176]) -> [211200, 7]
        bbox_pred = bbox_pred.permute(0, 2, 3, 1).reshape(-1, self.box_code_size)
        bbox_targets = bbox_targets.reshape(-1, self.box_code_size)
        bbox_weights = bbox_weights.reshape(-1, self.box_code_size)

        # Indices of the positive anchors among all 211200 (200 x 176 x 6)
        # anchors, i.e. those whose label is a foreground class: 0, 1 or 2.
        pos_inds = ((labels >= 0)
                    & (labels < bg_class_ind)).nonzero(
                        as_tuple=False).reshape(-1)

Next, the predictions, targets and weights are gathered with these indices:

        pos_bbox_pred = bbox_pred[pos_inds]
        pos_bbox_targets = bbox_targets[pos_inds]
        pos_bbox_weights = bbox_weights[pos_inds]
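
The selection of positive anchors can be mimicked without PyTorch. The sketch below uses plain lists and made-up labels, and assumes, as the comparison with bg_class_ind suggests, that the background is labelled with the number of classes (3 here). It only illustrates the masking logic and is not the head's actual code.

```python
# Toy stand-ins for the flattened tensors (all values are made up).
num_classes = 3
bg_class_ind = num_classes          # assumption: background label == num_classes
labels = [3, 0, 3, 2, 3, 1]         # one label per anchor
bbox_pred = [[0.1 * i] * 7 for i in range(len(labels))]  # one 7-vector per anchor

# Equivalent of ((labels >= 0) & (labels < bg_class_ind)).nonzero().reshape(-1)
pos_inds = [i for i, lab in enumerate(labels) if 0 <= lab < bg_class_ind]

# Equivalent of bbox_pred[pos_inds]
pos_bbox_pred = [bbox_pred[i] for i in pos_inds]

print(pos_inds)            # [1, 3, 5]
print(len(pos_bbox_pred))  # 3
```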

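What I mainly want to understand is how a bbox can be recovered from pos_bbox_pred and pos_bbox_targets, and from that work out how the IoU should be computed.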

In the code that produces the final results at inference time:

            # torch.Size([42, 200, 176]) ->[211200, 7]
            bbox_pred = bbox_pred.permute(1, 2, 0).reshape(-1, self.box_code_size)

            nms_pre = cfg.get('nms_pre', -1)
            if nms_pre > 0 and scores.shape[0] > nms_pre:
                if self.use_sigmoid_cls:
                    max_scores, _ = scores.max(dim=1)
                else:
                    max_scores, _ = scores[:, :-1].max(dim=1)
                _, topk_inds = max_scores.topk(nms_pre)
                # keep the nms_pre highest-scoring predictions (100 here)
                anchors = anchors[topk_inds, :]
                bbox_pred = bbox_pred[topk_inds, :]
                scores = scores[topk_inds, :]
                dir_cls_score = dir_cls_score[topk_inds]
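
The pre-NMS filtering above boils down to a top-k on the best class score of each anchor. A small list-based sketch with made-up scores, for the sigmoid case (use_sigmoid_cls is True):

```python
# Made-up per-anchor class scores: 5 anchors x 3 classes.
scores = [
    [0.10, 0.20, 0.05],
    [0.90, 0.10, 0.30],
    [0.40, 0.80, 0.20],
    [0.05, 0.02, 0.01],
    [0.30, 0.60, 0.70],
]
nms_pre = 3  # keep this many anchors before NMS

# Equivalent of scores.max(dim=1): best class score of every anchor.
max_scores = [max(row) for row in scores]

# Equivalent of max_scores.topk(nms_pre): indices of the highest scores.
topk_inds = sorted(range(len(max_scores)),
                   key=lambda i: max_scores[i], reverse=True)[:nms_pre]

# The same indices are then applied to anchors, bbox_pred, scores and dir_cls_score.
kept_scores = [scores[i] for i in topk_inds]

print(topk_inds)  # [1, 2, 4]
```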

The selected predictions are then decoded into box values:

            bboxes = self.bbox_coder.decode(anchors, bbox_pred)
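
To see why an anchor plus a 7-dimensional prediction yields a box, here is a plain-Python sketch of a SECOND-style delta decoding. I am assuming that bbox_coder is a delta coder of this kind, that boxes are ordered (x, y, z, w, l, h, r) and that z is measured at the bottom face. The function name decode_box is hypothetical, and the real coder works on batched tensors.

```python
import math


def decode_box(anchor, deltas):
    """Hypothetical sketch of a SECOND-style delta decoding for one box.

    Both inputs are 7-element sequences; the anchor is assumed to be
    (x, y, z, w, l, h, r) with z measured at the bottom face.
    """
    xa, ya, za, wa, la, ha, ra = anchor
    xt, yt, zt, wt, lt, ht, rt = deltas

    za = za + ha / 2                  # work with the anchor's vertical centre
    diagonal = math.hypot(la, wa)     # BEV diagonal normalises the x/y offsets

    xg = xt * diagonal + xa
    yg = yt * diagonal + ya
    zg = zt * ha + za
    wg = math.exp(wt) * wa            # sizes are predicted as log-ratios
    lg = math.exp(lt) * la
    hg = math.exp(ht) * ha
    rg = rt + ra                      # yaw is a plain offset
    zg = zg - hg / 2                  # back to the bottom-face convention
    return [xg, yg, zg, wg, lg, hg, rg]


# One of the anchors printed later in this post; all-zero deltas give it back.
anchor = [2.4137, 3.8191, -1.78, 1.6, 3.9, 1.56, 1.57]
print(decode_box(anchor, [0.0] * 7))
```

Under these assumptions the network never predicts absolute coordinates: it only predicts how far to move and scale each anchor.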

Then:

        mlvl_bboxes_for_nms = xywhr2xyxyr(input_meta['box_type_3d'](mlvl_bboxes, box_dim=self.box_code_size).bev)

Before NMS, the 7-parameter boxes are first reduced to their BEV form (xywhr) via .bev and then converted to xyxyr.
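
A minimal single-box sketch of that last conversion, assuming the BEV box is given as centre, size and yaw. The function name is made up; the library function operates on whole tensors.

```python
def xywhr_to_xyxyr(box):
    """Hypothetical single-box version: (cx, cy, w, h, r) -> (x1, y1, x2, y2, r).

    The corners describe the box before rotation; the yaw r is carried along
    unchanged so that a rotated NMS can apply it.
    """
    cx, cy, w, h, r = box
    half_w, half_h = w / 2, h / 2
    return [cx - half_w, cy - half_h, cx + half_w, cy + half_h, r]


print(xywhr_to_xyxyr([10.0, 4.0, 2.0, 4.0, 1.57]))  # [9.0, 2.0, 11.0, 6.0, 1.57]
```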

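Shapes and values seen while debugging (for a single sample):
[18, 200, 176]  (class scores: 6 anchors x 3 classes)
[42, 200, 176]  (bbox_pred: 6 anchors x 7 box parameters)
[12, 200, 176]  (direction scores: 6 anchors x 2 directions)
torch.Size([211200, 7])  (bbox_pred after the reshape)
input_meta
rescale = true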

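Here bbox_targets has shape torch.Size([70400, 7]) and starts out as an all-zero matrix. The first dimension is the anchor layout, 200 x 176 x 2 = 70400; the second holds the 7 offset values, which self.bbox_coder.encode computes from pos_gt_bboxes and pos_bboxes (an encoded difference between the two, not a plain subtraction).
The final target has 211200 rows because there are three self.bbox_assigner instances, one per anchor size (the IoU thresholds differ between the three classes), and 3 x 70400 = 211200: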
min_pos_iou:0.2
neg_iou_thr:0.2
pos_iou_thr:0.35
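
The encoding step can be sketched in plain Python as well. As before, this assumes a SECOND-style delta coder, the (x, y, z, w, l, h, r) layout and z at the bottom face; encode_box is a hypothetical name, and the two boxes are the first rows of the anchor and ground-truth tensors printed further down.

```python
import math


def encode_box(anchor, gt):
    """Hypothetical sketch of a SECOND-style delta encoding for one anchor/GT pair.

    Both boxes are assumed to be (x, y, z, w, l, h, r) with z at the bottom face.
    """
    xa, ya, za, wa, la, ha, ra = anchor
    xg, yg, zg, wg, lg, hg, rg = gt

    za = za + ha / 2                  # compare vertical centres, not bottoms
    zg = zg + hg / 2
    diagonal = math.hypot(la, wa)     # normalise x/y by the anchor's BEV diagonal

    return [
        (xg - xa) / diagonal,
        (yg - ya) / diagonal,
        (zg - za) / ha,
        math.log(wg / wa),            # sizes as log-ratios
        math.log(lg / la),
        math.log(hg / ha),
        rg - ra,                      # yaw as a plain difference
    ]


anchor = [2.0114, 3.8191, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700]
gt = [2.3841, 4.0668, -1.6432, 1.6776, 4.0138, 1.7290, 1.3369]
target = encode_box(anchor, gt)
print([round(t, 4) for t in target])
```

An anchor encoded against itself gives seven zeros, which is why a zero-initialised bbox_targets is a natural default for anchors that have no matched ground truth.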

Comparing the predicted boxes with the ground-truth boxes, we find:
Ground-truth boxes:
tensor([[ 2.3841, 4.0668, -1.6432, 1.6776, 4.0138, 1.7290, 1.3369],
[ 2.3841, 4.0668, -1.6432, 1.6776, 4.0138, 1.7290, 1.3369],
[ 2.3841, 4.0668, -1.6432, 1.6776, 4.0138, 1.7290, 1.3369],
[ 2.3841, 4.0668, -1.6432, 1.6776, 4.0138, 1.7290, 1.3369],
[ 2.3841, 4.0668, -1.6432, 1.6776, 4.0138, 1.7290, 1.3369],
[ 2.3841, 4.0668, -1.6432, 1.6776, 4.0138, 1.7290, 1.3369],
[15.1863, 7.3920, -1.5572, 1.7496, 4.5181, 1.5129, 1.2769],
[15.1863, 7.3920, -1.5572, 1.7496, 4.5181, 1.5129, 1.2769],
[15.1863, 7.3920, -1.5572, 1.7496, 4.5181, 1.5129, 1.2769],
[15.1863, 7.3920, -1.5572, 1.7496, 4.5181, 1.5129, 1.2769],
[44.4734, 24.8744, -1.9314, 1.8011, 3.4889, 1.6981, 0.9469],
[44.4734, 24.8744, -1.9314, 1.8011, 3.4889, 1.6981, 0.9469],
[44.4734, 24.8744, -1.9314, 1.8011, 3.4889, 1.6981, 0.9469],
[44.4734, 24.8744, -1.9314, 1.8011, 3.4889, 1.6981, 0.9469],
[44.4734, 24.8744, -1.9314, 1.8011, 3.4889, 1.6981, 0.9469],
[44.4734, 24.8744, -1.9314, 1.8011, 3.4889, 1.6981, 0.9469]],
device='cuda:0')
Predicted boxes:
tensor([[ 2.0114, 3.8191, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[ 2.4137, 3.8191, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[ 2.8160, 3.8191, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[ 2.0114, 4.2211, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[ 2.4137, 4.2211, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[ 2.8160, 4.2211, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[14.4823, 7.4372, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[14.8846, 7.4372, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[15.2869, 7.4372, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[15.6891, 7.4372, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[43.8491, 24.7236, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[44.2514, 24.7236, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[44.6537, 24.7236, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[45.0560, 24.7236, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[44.2514, 25.1256, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700],
[44.6537, 25.1256, -1.7800, 1.6000, 3.9000, 1.5600, 1.5700]],
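device='cuda:0')
Each ground-truth row is an actual ground-truth box, repeated once for every anchor matched to it.
Each "predicted" row, on the other hand, is a single anchor: z, the size and the yaw are fixed (-1.78, 1.6 x 3.9 x 1.56, 1.57) and only x and y vary.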
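
The x and y values in the second tensor can be reproduced by laying anchor centres evenly over the BEV range. The sketch below assumes the usual KITTI range of x in [0, 70.4] and y in [-40, 40] with both endpoints included. This is an assumption, since the config is not shown here, and the indices 5, 6, 7, 109 and 110 were picked to match the first rows.

```python
# Assumed BEV range and grid size (the config itself is not shown in this post).
X_MIN, X_MAX, NUM_X = 0.0, 70.4, 176
Y_MIN, Y_MAX, NUM_Y = -40.0, 40.0, 200


def linspace_value(start, stop, num, index):
    """Value number `index` of `num` evenly spaced points from start to stop."""
    return start + index * (stop - start) / (num - 1)


xs = [round(linspace_value(X_MIN, X_MAX, NUM_X, i), 4) for i in (5, 6, 7)]
ys = [round(linspace_value(Y_MIN, Y_MAX, NUM_Y, j), 4) for j in (109, 110)]

print(xs)  # [2.0114, 2.4137, 2.816]
print(ys)  # [3.8191, 4.2211]
```

The match with the printed rows supports reading that tensor as raw anchors rather than network output.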
When computing the IoU:
iou_bbox_pred_xyz:
tensor([[ 16.5084, -12.0242, -1.4971, 1.6237, 4.0466, 1.4356, 0.2724],
[ 16.5435, -11.8211, -1.4988, 1.6494, 4.3110, 1.4305, 0.2859],
[ 16.5761, -11.8400, -1.5100, 1.6789, 4.3495, 1.4355, 0.2741],
[ 16.5754, -11.8323, -1.4928, 1.6733, 4.3177, 1.4270, 0.2728],
[ 16.5152, -11.7441, -1.4888, 1.6896, 4.3513, 1.4431, 0.2851],
[ 14.6430, -7.8684, -1.5783, 0.6639, 0.9870, 1.8041, -1.4695],
[ 14.6432, -7.8733, -1.5769, 0.6611, 0.9864, 1.8070, 1.6531],
[ 14.6598, -7.8735, -1.5899, 0.6282, 1.0187, 1.8009, 1.6100],
[ 15.4246, -6.0949, -1.8054, 0.6283, 1.7626, 1.8466, 1.8692],
[ 15.3690, -6.0648, -1.7394, 0.5986, 1.7214, 1.8281, 1.8346],
[ 15.4138, -6.1166, -1.7065, 0.5922, 1.7588, 1.7961, 1.8254],
[ 15.8956, -2.2017, -1.6940, 0.6018, 1.8273, 1.7625, 1.9136],
[ 15.9363, -2.1736, -1.7032, 0.5824, 1.7944, 1.7932, 1.8970],
[ 15.9095, -2.1962, -1.7189, 0.5758, 1.7861, 1.7966, 1.8979],
[ 15.8770, -2.1996, -1.7384, 0.6050, 1.7874, 1.7796, 1.8670]],
device='cuda:0', grad_fn=<...>)
iou_bbox_targets_xyz:
tensor([[ 16.5635, -11.8167, -1.4796, 1.6468, 4.4654, 1.4245, 0.3080],
[ 16.5635, -11.8167, -1.4796, 1.6468, 4.4654, 1.4245, 0.3080],
[ 16.5635, -11.8167, -1.4796, 1.6468, 4.4654, 1.4245, 0.3080],
[ 16.5635, -11.8167, -1.4796, 1.6468, 4.4654, 1.4245, 0.3080],
[ 16.5635, -11.8167, -1.4796, 1.6468, 4.4654, 1.4245, 0.3080],
[ 14.5729, -7.8654, -1.5648, 0.5354, 0.8890, 1.8387, -1.2720],
[ 14.5729, -7.8654, -1.5648, 0.5354, 0.8890, 1.8387, -1.2720],
[ 14.5729, -7.8654, -1.5648, 0.5354, 0.8890, 1.8387, -1.2720],
[ 15.2363, -6.1369, -1.7029, 0.5860, 1.7175, 1.8286, -1.2620],
[ 15.2363, -6.1369, -1.7029, 0.5860, 1.7175, 1.8286, -1.2620],
[ 15.2363, -6.1369, -1.7029, 0.5860, 1.7175, 1.8286, -1.2620],
[ 15.9368, -2.1525, -1.6482, 0.6163, 1.7175, 1.7478, 1.8680],
[ 15.9368, -2.1525, -1.6482, 0.6163, 1.7175, 1.7478, 1.8680],
[ 15.9368, -2.1525, -1.6482, 0.6163, 1.7175, 1.7478, 1.8680],
[ 15.9368, -2.1525, -1.6482, 0.6163, 1.7175, 1.7478, 1.8680]],
device='cuda:0')
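
As a rough illustration of what an IoU between two of these rows looks like, the sketch below computes an axis-aligned 3D IoU. It deliberately ignores the yaw angle, so it is not the rotated IoU that a real 3D IoU loss needs. It also assumes the (x, y, z, w, l, h, r) layout with z at the bottom face, takes w along x and l along y, and uses a made-up function name.

```python
def axis_aligned_iou_3d(box_a, box_b):
    """Hypothetical, simplified IoU: yaw is ignored, w is taken along x and
    l along y, and z is assumed to be the bottom face of the box."""

    def overlap(centre_a, size_a, centre_b, size_b):
        low = max(centre_a - size_a / 2, centre_b - size_b / 2)
        high = min(centre_a + size_a / 2, centre_b + size_b / 2)
        return max(0.0, high - low)

    xa, ya, za, wa, la, ha, _ = box_a
    xb, yb, zb, wb, lb, hb, _ = box_b

    dx = overlap(xa, wa, xb, wb)
    dy = overlap(ya, la, yb, lb)
    # z is a bottom coordinate, so shift to the vertical centre first.
    dz = overlap(za + ha / 2, ha, zb + hb / 2, hb)

    intersection = dx * dy * dz
    union = wa * la * ha + wb * lb * hb - intersection
    return intersection / union if union > 0 else 0.0


# First prediction / target pair from the tensors above.
pred = [16.5084, -12.0242, -1.4971, 1.6237, 4.0466, 1.4356, 0.2724]
target = [16.5635, -11.8167, -1.4796, 1.6468, 4.4654, 1.4245, 0.3080]
print(round(axis_aligned_iou_3d(pred, target), 3))
```

Both inputs here are decoded boxes in metric coordinates, which is the form an IoU needs: the encoded offsets have to be decoded against their anchors before any overlap can be measured.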
