Deep Learning | 2D Object Detection | Anchor Box Labeling | Assigning gt_bbox, Classes, and Offsets

Latest recommended article published on 2025-04-22 11:03:24

Lily_Mei

Tags: deep learning, object detection, artificial intelligence

Article link: https://blog.csdn.net/m0_57527624/article/details/134630752

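This post is based on Mu Li's course Dive into Deep Learning (《动手学深度学习》). It explains and annotates the code provided in the course and serves as a personal reference. Original course page: 13.4. 锚框 (Anchor Boxes) - 动手学深度学习 2.0.0 documentation (d2l.ai)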

I. Assigning ground-truth bounding boxes (gt_bbox) to anchors

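In the previous step (Deep Learning | 2D Object Detection | Anchor Box Implementation and IoU Computation, CSDN blog) we built a matrix whose rows (i) are indexed by anchors, whose columns (j) are indexed by gt_bbox, and whose entries are the IoU values. This is the matrix X mentioned in the course.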
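As a reminder of what X looks like, here is a minimal pure-Python sketch that builds such a matrix for a few made-up boxes. The helper name `box_iou_single` and the box coordinates are assumptions chosen for illustration; the actual post uses the tensor-based `box_iou` from the previous step.

```python
def box_iou_single(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the intersection are clamped at zero
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# Hypothetical boxes: 3 anchors (rows of X) and 2 ground-truth boxes (columns of X)
anchors = [(0, 0, 2, 2), (1, 1, 3, 3), (4, 4, 5, 5)]
gts = [(0, 0, 2, 2), (2, 2, 4, 4)]

X = [[box_iou_single(a, g) for g in gts] for a in anchors]
for row in X:
    print(row)
```

Each row belongs to one anchor and each column to one gt, so X has shape (number of anchors, number of gt boxes).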

Assignment algorithm

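1. Find the largest IoU among the entries of X that have not been discarded, and assign the corresponding gt to the corresponding anchor, for example $x_{23}$ in the figure (image source: d2l.ai). In the code below this step is not filtered by the threshold; iou_threshold is only used in step 4.

2. Discard (that is, exclude from further consideration) the column of the gt and the row of the anchor that were just matched, here row i=2 and column j=3.

3. Repeat the two steps above until every gt has been assigned.

4. After the last gt has been assigned, some anchors still have no gt (there are usually more anchors than gts). For each of these anchors, traverse its row, find the largest IoU, and assign the corresponding gt only if that value reaches the preset IoU_threshold. In the figure these are the anchors in rows i=1, 3, 4, 6, 8.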
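The four steps can be traced on a small example. The following is a pure-Python sketch that follows the order of the steps as listed, not the d2l implementation; the function name `assign_anchors` and the IoU values are made up for illustration.

```python
def assign_anchors(iou, iou_threshold=0.5):
    """Return, for each anchor (row), the index of its assigned gt, or -1."""
    num_anchors, num_gt = len(iou), len(iou[0])
    assignment = [-1] * num_anchors
    free_rows = set(range(num_anchors))
    free_cols = set(range(num_gt))

    # Steps 1-3: repeatedly take the largest IoU among the undiscarded cells,
    # assign that gt to that anchor, then discard the row and the column.
    while free_rows and free_cols:
        i, j = max(((r, c) for r in free_rows for c in free_cols),
                   key=lambda rc: iou[rc[0]][rc[1]])
        assignment[i] = j
        free_rows.discard(i)
        free_cols.discard(j)

    # Step 4: every remaining anchor takes the gt with its largest IoU,
    # but only if that IoU reaches the threshold.
    for i in sorted(free_rows):
        j = max(range(num_gt), key=lambda c: iou[i][c])
        if iou[i][j] >= iou_threshold:
            assignment[i] = j
    return assignment


# Hypothetical IoU matrix: 4 anchors (rows) x 2 ground-truth boxes (columns)
iou = [
    [0.10, 0.60],
    [0.80, 0.20],
    [0.30, 0.40],
    [0.70, 0.05],
]
result = assign_anchors(iou)
print(result)  # [1, 0, -1, 0]
```

Anchor 1 wins gt 0 (0.80) and anchor 0 wins gt 1 (0.60) in the greedy phase. Anchor 3 then receives gt 0 through the threshold (0.70), while anchor 2 stays unassigned because its best IoU is only 0.40.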

Code implementation

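import torch

# box_iou is the IoU function implemented in the previous post


def assign_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5):
    """Assign the closest ground-truth bounding box to each anchor."""
    # Number of rows (anchors, i) and columns (gt boxes, j) of the IoU matrix
    num_anchors, num_gt_boxes = anchors.shape[0], ground_truth.shape[0]

    # Element x_ij in row i and column j is the IoU of anchor i and gt box j
    jaccard = box_iou(anchors, ground_truth)

    # anchors_bbox_map[i] holds the index of the gt assigned to anchor i.
    # It is a 1-D tensor of length num_anchors, initialised to -1 (unassigned).
    anchors_bbox_map = torch.full((num_anchors,), -1, dtype=torch.long,
                                  device=device)

    # Threshold-based assignment (step 4 of the algorithm), done up front.
    # Max over dim=1 gives, for each anchor row, its best IoU and the column
    # of that IoU; both results have shape (num_anchors,).
    max_ious, indices = torch.max(jaccard, dim=1)
    anc_i = torch.nonzero(max_ious >= iou_threshold).reshape(-1)  # anchors whose best IoU reaches the threshold
    box_j = indices[max_ious >= iou_threshold]  # gt column index for each of those anchors
    anchors_bbox_map[anc_i] = box_j
    col_discard = torch.full((num_anchors,), -1)
    row_discard = torch.full((num_gt_boxes,), -1)
    # Greedy assignment (steps 1-3): one iteration per gt box.
    # These assignments overwrite the threshold-based ones above.
    for _ in range(num_gt_boxes):
        max_idx = torch.argmax(jaccard)  # flat index of the largest remaining IoU
        box_idx = (max_idx % num_gt_boxes).long()  # column (gt) index
        anc_idx = torch.div(max_idx, num_gt_boxes,
                            rounding_mode='floor').long()  # row (anchor) index
        anchors_bbox_map[anc_idx] = box_idx
        jaccard[:, box_idx] = col_discard  # discard this column by setting it to -1
        jaccard[anc_idx, :] = row_discard  # discard this row by setting it to -1
    return anchors_bbox_map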
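One detail in the loop that is easy to miss: torch.argmax without a dim argument returns an index into the flattened matrix, so the code recovers the column with a modulo and the row with a division by num_gt_boxes. The pure-Python sketch below reproduces that decoding on a made-up 3 x 4 matrix (the values are assumptions for illustration).

```python
num_anchors, num_gt_boxes = 3, 4

# Hypothetical IoU matrix: rows are anchors, columns are gt boxes
jaccard = [
    [0.1, 0.2, 0.0, 0.3],
    [0.0, 0.5, 0.9, 0.1],
    [0.4, 0.0, 0.2, 0.6],
]

# Flatten row by row, which is the order a flat argmax walks through
flat = [v for row in jaccard for v in row]
max_idx = max(range(len(flat)), key=flat.__getitem__)

# Row = flat index // number of columns, column = flat index % number of columns
anc_idx, box_idx = divmod(max_idx, num_gt_boxes)
print(max_idx, anc_idx, box_idx)  # 6 1 2
```

The largest value 0.9 sits in row 1, column 2, which corresponds to flat index 1 * 4 + 2 = 6.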

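II. Labeling classes and offsets

Offset transformation

An offset can be understood as how far an anchor deviates from its gt (that is, how inaccurate the anchor is). It can be written as a vector (center x offset, center y offset, width offset, height offset). This also shows that the offset is just a transformation of the last four values, the box coordinates, that we obtain for each anchor.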

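d2l defines the offset as follows, with the parameters set to commonly used values. Let anchor A have center $(x_a, y_a)$, width $w_a$ and height $h_a$, and let its assigned gt B have center $(x_b, y_b)$, width $w_b$ and height $h_b$. The offset of A is then

$$\left( \frac{\frac{x_b - x_a}{w_a} - \mu_x}{\sigma_x},\ \frac{\frac{y_b - y_a}{h_a} - \mu_y}{\sigma_y},\ \frac{\log\frac{w_b}{w_a} - \mu_w}{\sigma_w},\ \frac{\log\frac{h_b}{h_a} - \mu_h}{\sigma_h} \right)$$

with $\mu_x = \mu_y = \mu_w = \mu_h = 0$, $\sigma_x = \sigma_y = 0.1$ and $\sigma_w = \sigma_h = 0.2$, which gives the factors 10 and 5 in the code.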

The corresponding code is:

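import torch
from d2l import torch as d2l


def offset_boxes(anchors, assigned_bb, eps=1e-6):
    """Transform anchors into offsets relative to their assigned gt boxes."""
    c_anc = d2l.box_corner_to_center(anchors)  # d2l helper: corner format (x1, y1, x2, y2) -> center format (cx, cy, w, h)
    c_assigned_bb = d2l.box_corner_to_center(assigned_bb)
    offset_xy = 10 * (c_assigned_bb[:, :2] - c_anc[:, :2]) / c_anc[:, 2:]  # first two terms: center offsets
    offset_wh = 5 * torch.log(eps + c_assigned_bb[:, 2:] / c_anc[:, 2:])  # last two terms: width and height offsets
    offset = torch.cat([offset_xy, offset_wh], dim=1)  # concatenate along columns: one 4-element offset per anchor
    return offset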
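To make the numbers concrete, here is a pure-Python sketch that applies the same arithmetic to a single anchor and gt, and then inverts it to check that the gt box is recovered. The names `encode_offset` and `decode_offset` and the box values are assumptions for illustration; the inverse is included only to verify the round trip.

```python
import math


def corner_to_center(box):
    """(x1, y1, x2, y2) -> (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def encode_offset(anchor, gt, eps=1e-6):
    """Same arithmetic as offset_boxes, for one anchor and one gt box."""
    xa, ya, wa, ha = corner_to_center(anchor)
    xb, yb, wb, hb = corner_to_center(gt)
    return (10 * (xb - xa) / wa,
            10 * (yb - ya) / ha,
            5 * math.log(eps + wb / wa),
            5 * math.log(eps + hb / ha))


def decode_offset(anchor, offset):
    """Invert encode_offset (up to the small eps term)."""
    xa, ya, wa, ha = corner_to_center(anchor)
    tx, ty, tw, th = offset
    cx, cy = tx * wa / 10 + xa, ty * ha / 10 + ya
    w, h = math.exp(tw / 5) * wa, math.exp(th / 5) * ha
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Hypothetical boxes: the gt has the same size as the anchor, shifted by 0.1
anchor = (0.0, 0.0, 0.4, 0.4)
gt = (0.1, 0.1, 0.5, 0.5)

offset = encode_offset(anchor, gt)
recovered = decode_offset(anchor, offset)
print(offset)
print(recovered)
```

The center shift of 0.1 relative to an anchor width of 0.4 gives 0.25, which the factor 10 scales to about 2.5. The width and height terms are close to zero because both boxes have the same size.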

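Labeling classes

Once an anchor has been assigned a gt, it also takes on that gt's class (a positive example, taken from labels). Anchors that were not assigned any gt are labeled as background (negative examples). Here background = 0, and each class index from the labels is shifted up by 1 to give the class index assigned to the anchor.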
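The index shift can be seen in a few lines of plain Python. The assignment list and the class ids below are made up for illustration; in the post they come from assign_anchor_to_bbox and from the first column of labels.

```python
# Hypothetical assignment for 5 anchors: -1 means no gt was assigned
anchors_bbox_map = [-1, 0, 1, -1, 1]

# Hypothetical class id of each gt box, as stored in the labels (0-based)
gt_classes = [0, 1]

# Background stays 0; every real class id is shifted up by 1
class_labels = [0 if j < 0 else gt_classes[j] + 1 for j in anchors_bbox_map]
print(class_labels)  # [0, 1, 2, 0, 2]
```

Shifting by 1 keeps index 0 free for the background, so a class id of 0 in the labels does not collide with it.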

import torch


#@save
def multibox_target(anchors, labels):
    """Label anchors using the ground-truth bounding boxes."""
    # labels has shape (batch_size, num_gt_boxes, 5): class id + 4 box coordinates
    batch_size, anchors = labels.shape[0], anchors.squeeze(0)
    batch_offset, batch_mask, batch_class_labels = [], [], []
    device, num_anchors = anchors.device, anchors.shape[0]

    # Assign gt boxes to anchors, one image at a time
    for i in range(batch_size):
        label = labels[i, :, :]
        anchors_bbox_map = assign_anchor_to_bbox(
            label[:, 1:], anchors, device)
        # 1 for anchors that received a gt, 0 for background, repeated for the 4 offsets
        bbox_mask = ((anchors_bbox_map >= 0).float().unsqueeze(-1)).repeat(
            1, 4)

        # Initialise the class labels and the assigned box coordinates to zero
        class_labels = torch.zeros(num_anchors, dtype=torch.long,
                                   device=device)
        assigned_bb = torch.zeros((num_anchors, 4), dtype=torch.float32,
                                  device=device)
        # Use the gt boxes to label the classes of the anchors.
        # An anchor that was not assigned a gt keeps the background label (zero).
        indices_true = torch.nonzero(anchors_bbox_map >= 0)
        bb_idx = anchors_bbox_map[indices_true]
        class_labels[indices_true] = label[bb_idx, 0].long() + 1
        assigned_bb[indices_true] = label[bb_idx, 1:]
        # Offset transformation; the mask zeroes the offsets of background anchors
        offset = offset_boxes(anchors, assigned_bb) * bbox_mask
        batch_offset.append(offset.reshape(-1))
        batch_mask.append(bbox_mask.reshape(-1))
        batch_class_labels.append(class_labels)
    bbox_offset = torch.stack(batch_offset)
    bbox_mask = torch.stack(batch_mask)
    class_labels = torch.stack(batch_class_labels)
    return (bbox_offset, bbox_mask, class_labels)
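The role of bbox_mask may be easier to see on a tiny example. The pure-Python sketch below mimics the masking and flattening for three anchors, with made-up assignments and offset values (all numbers are assumptions for illustration).

```python
# Hypothetical assignment for 3 anchors: only anchor 1 received a gt
anchors_bbox_map = [-1, 0, -1]

# One mask value per anchor, repeated 4 times (one per offset component)
mask_rows = [[1.0 if j >= 0 else 0.0] * 4 for j in anchors_bbox_map]
flat_mask = [m for row in mask_rows for m in row]

# Hypothetical raw offsets, one row of 4 values per anchor
raw_offsets = [
    [0.5, -0.2, 0.1, 0.3],
    [1.4, 2.0, 0.2, -0.1],
    [0.0, 0.9, -0.4, 0.6],
]

# Element-wise product, then flatten to a single vector per image
masked = [o * m
          for row_o, row_m in zip(raw_offsets, mask_rows)
          for o, m in zip(row_o, row_m)]
print(flat_mask)
print(masked)
```

Only the four offsets of anchor 1 survive. The offsets of the background anchors are zeroed, and the flattened mask has length 4 x (number of anchors), matching the shape of the flattened offsets.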

That completes the assignment of offsets and labels. Mu Li's course also includes a worked example that runs the code above: 13.4. 锚框 (Anchor Boxes) - 动手学深度学习 2.0.0 documentation (d2l.ai)

(Corrections and suggestions are welcome.)