Learning AI with Li Mu: PyTorch Anchor Box Code Walkthrough, Part 2

orient2019

Article link: https://blog.csdn.net/qq_34992900/article/details/120724341

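This article walks through how anchor boxes are matched to ground-truth bounding boxes when training an object detection model, covering the IoU-based maximum matching strategy and the offset computation. It explains the matching logic implemented in `assign_anchor_to_bbox` and the `offset_boxes` function, which computes the offset of each anchor box relative to its ground-truth box. It also describes how classes are matched and offsets are computed sample by sample, producing the key inputs for subsequent model training.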

Anchor boxes in practice

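In the training set, each anchor box is treated as one training example. To train an object detection model, each anchor box first has to be matched to a ground-truth box, which then gives it two labels:
- Assign to each anchor box the ground-truth bounding box closest to it, based on IoU
- Anchor-to-class correspondence: the class label of the anchor box
- Anchor-to-position correspondence: the offset of the anchor box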

Assigning the closest ground-truth bounding box to an anchor box

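Let the anchor boxes be $A_1,A_2,A_3,\dots,A_n$ and the ground-truth bounding boxes be $B_1,B_2,B_3,\dots,B_m$, where there are more anchor boxes than ground-truth boxes. Computing the IoU of every anchor box with every ground-truth box gives a matrix IoU_mat of shape [n, m]. The assignment is then determined by operating on this matrix as follows:
- 1. Find the largest element of IoU_mat, with coordinates $(i_{max}, j_{max})$, and assign ground-truth box $B_{j_{max}}$ to anchor box $A_{i_{max}}$. Then discard all entries in row $i_{max}$ and column $j_{max}$.
- 2. Find the largest element in what remains of IoU_mat and repeat the operation, until $m$ anchor boxes have been selected, one for each ground-truth box.
- 3. Some anchor boxes are left over that cannot be handled this way, because every column has already been discarded. For each of them, take the ground-truth box with the largest IoU in its row and assign it only if that IoU reaches a given threshold. In the example here, the remaining anchor boxes are $A_1,A_3,A_4,A_6,A_8$.
- The procedure above is implemented by the following function: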
- ```
  import torch

  def assign_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5):
      """Assign the closest ground-truth bounding box to each anchor box."""
      num_anchors, num_gt_boxes = anchors.shape[0], ground_truth.shape[0]
      # Element x_ij in row i, column j is the IoU of anchor box i and
      # ground-truth box j (box_iou is not shown in this post)
      jaccard = box_iou(anchors, ground_truth)
      # For each anchor box, the index of its assigned ground-truth box (-1: none)
      anchors_bbox_map = torch.full((num_anchors,), -1, dtype=torch.long,
                                    device=device)
      # Decide by threshold whether to assign a ground-truth box:
      # take the maximum of each row and the column where it occurs
      max_ious, indices = torch.max(jaccard, dim=1)
      # Row indices (anchor boxes) whose best IoU reaches the threshold
      anc_i = torch.nonzero(max_ious >= iou_threshold).reshape(-1)
      # Column indices of jaccard (ground-truth box numbers) for those anchors
      box_j = indices[max_ious >= iou_threshold]
      anchors_bbox_map[anc_i] = box_j
      # "Deleting" keeps the matrix shape: the whole row and column become -1
      col_discard = torch.full((num_anchors,), -1)
      row_discard = torch.full((num_gt_boxes,), -1)
      for _ in range(num_gt_boxes):
          # argmax indexes the flattened matrix; recover column and row from it
          max_idx = torch.argmax(jaccard)
          box_idx = (max_idx % num_gt_boxes).long()
          anc_idx = (max_idx / num_gt_boxes).long()
          # Record the matched ground-truth box at the anchor's position
          anchors_bbox_map[anc_idx] = box_idx
          jaccard[:, box_idx] = col_discard
          jaccard[anc_idx, :] = row_discard
      return anchors_bbox_map
```
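- While studying this function, I got stuck on the following parts:
  - max_ious, indices = torch.max(jaccard, dim=1)
  - max_ious is the maximum of each row, returned as a 1-D tensor
  - indices is the column in which each row maximum occurs, also returned as a 1-D tensor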
  - ```
  import torch

  # The values are random, so the output below is one possible run
  jaccard = torch.randint(0, 10, size=(5, 3)) / 10
  max_ious, indices = torch.max(jaccard, dim=1)

  print(jaccard)
  # tensor([[0.3000, 0.1000, 0.8000],
  #         [0.9000, 0.3000, 0.0000],
  #         [0.3000, 0.0000, 0.9000],
  #         [0.4000, 0.5000, 0.1000],
  #         [0.0000, 0.2000, 0.3000]])
  print(indices)
  # tensor([2, 0, 2, 1, 2])
  print(max_ious)
  # tensor([0.8000, 0.9000, 0.9000, 0.5000, 0.3000])
```
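- max_idx = torch.argmax(jaccard)
- torch.argmax first flattens the matrix and then returns the index of the maximum in the flattened tensor. The two lines that follow rely on this: num_gt_boxes is the number of columns of jaccard, so the remainder (%) gives the column index and the integer quotient (division followed by truncation) gives the row index.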
- ```
box_idx = (max_idx % num_gt_boxes).long()
anc_idx = (max_idx / num_gt_boxes).long()
```

  import torch

  # The values are random, so the output below is one possible run
  x = torch.randint(1, 20, size=(4, 5))
  print(x)
  # tensor([[17,  8, 13,  8,  5],
  #         [18, 17,  5,  5, 14],
  #         [17,  7, 14, 11, 10],
  #         [ 3, 10,  9, 14,  7]])
  print(torch.argmax(x))
  # index in the flattened tensor is 5
  # tensor(5)
  print((torch.argmax(x) / 5).long())
  # row index
  # tensor(1)
  print((torch.argmax(x) % 5).long())
  # column index
  # tensor(0)
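
The two stages of the matching can be traced on a small hand-written IoU matrix. The following is a minimal sketch rather than the code from the course: `greedy_assign` is a hypothetical helper that takes the IoU matrix directly, so no `box_iou` is needed, and the matrix values are made up for illustration. It uses `torch.div` with `rounding_mode="floor"` for the row index, which assumes PyTorch 1.8 or later.

```python
import torch

def greedy_assign(jaccard, iou_threshold=0.5):
    """Hypothetical helper: the same two stages as assign_anchor_to_bbox,
    starting from a precomputed IoU matrix of shape [n, m]."""
    jaccard = jaccard.clone()  # leave the caller's matrix untouched
    num_anchors, num_gt_boxes = jaccard.shape
    anchors_bbox_map = torch.full((num_anchors,), -1, dtype=torch.long)

    # Stage 1: every anchor whose best IoU reaches the threshold
    max_ious, indices = torch.max(jaccard, dim=1)
    keep = max_ious >= iou_threshold
    anchors_bbox_map[keep] = indices[keep]

    # Stage 2: each ground-truth box claims its best remaining anchor
    for _ in range(num_gt_boxes):
        max_idx = torch.argmax(jaccard)      # index into the flattened matrix
        box_idx = max_idx % num_gt_boxes     # column
        anc_idx = torch.div(max_idx, num_gt_boxes, rounding_mode="floor")  # row
        anchors_bbox_map[anc_idx] = box_idx
        jaccard[:, box_idx] = -1             # discard the column
        jaccard[anc_idx, :] = -1             # discard the row
    return anchors_bbox_map

# 5 anchors (rows) and 3 ground-truth boxes (columns), illustrative values
iou = torch.tensor([[0.30, 0.10, 0.80],
                    [0.90, 0.30, 0.00],
                    [0.30, 0.00, 0.85],
                    [0.40, 0.45, 0.10],
                    [0.00, 0.20, 0.30]])
result = greedy_assign(iou)
print(result.tolist())  # [2, 0, 2, 1, -1]
```

Anchor 3 only reaches an IoU of 0.45 with box 1, which is below the threshold, yet it is still matched in the second stage because box 1 would otherwise be left without any anchor. Anchor 4 never reaches the threshold and wins no column, so it stays at -1.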

Labeling classes and offsets

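When anchor box $A$ is assigned the ground-truth bounding box $B$:
- Anchor box $A$ is labeled with the same class as $B$
- The offset of anchor box $A$ is labeled from the relative position of the center coordinates of $B$ and $A$, together with the relative size of the two boxes:
  - Because boxes in a dataset differ in position and size, a transformation is applied to these relative positions and sizes so that the offsets are more evenly distributed and easier to fit. The offset of anchor box $A$ is computed from the box centers as:
  - $\left( \frac{ \frac{x_b - x_a}{w_a} - \mu_x }{\sigma_x}, \frac{ \frac{y_b - y_a}{h_a} - \mu_y }{\sigma_y}, \frac{ \log \frac{w_b}{w_a} - \mu_w }{\sigma_w}, \frac{ \log \frac{h_b}{h_a} - \mu_h }{\sigma_h} \right)$
  - where $(x_a, y_a)$ and $(x_b, y_b)$ are the center coordinates of $A$ and $B$, their widths are $w_a$ and $w_b$, and their heights are $h_a$ and $h_b$
  - Here $\mu_x=\mu_y=\mu_w=\mu_h=0$, $\sigma_x=\sigma_y=0.1$ and $\sigma_w=\sigma_h=0.2$, which is why the code below multiplies by 10 and by 5
  - If an anchor box is not assigned a ground-truth bounding box, its class is simply labeled "background". Background anchor boxes are usually called negative anchor boxes, and the rest are called positive anchor boxes.
- The code is as follows: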
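- ```
import torch

def offset_boxes(anchors, assigned_bb, eps=1e-6):
    """Compute the offsets of anchor boxes relative to their assigned boxes."""
    # box_corner_to_center (not shown in this post) converts
    # (x1, y1, x2, y2) boxes to (cx, cy, w, h)
    c_anc = box_corner_to_center(anchors)
    c_assigned_bb = box_corner_to_center(assigned_bb)
    # Center offsets, normalized by the anchor size and divided by sigma = 0.1
    offset_xy = 10 * (c_assigned_bb[:, :2] - c_anc[:, :2]) / c_anc[:, 2:]
    # Log size ratios divided by sigma = 0.2; eps keeps the log finite
    offset_wh = 5 * torch.log(eps + c_assigned_bb[:, 2:] / c_anc[:, 2:])
    offset = torch.cat([offset_xy, offset_wh], dim=1)
    return offset
```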
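
To make the scaling concrete, the sketch below evaluates the offsets for two hand-picked boxes. It is only an illustration under stated assumptions: `box_corner_to_center` is not shown in the post, so the version here is an assumed implementation that converts (x1, y1, x2, y2) to (cx, cy, w, h), and the box coordinates are made up.

```python
import math
import torch

def box_corner_to_center(boxes):
    """Assumed helper: (x1, y1, x2, y2) -> (cx, cy, w, h)."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return torch.stack(((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1), dim=-1)

def offset_boxes(anchors, assigned_bb, eps=1e-6):
    c_anc = box_corner_to_center(anchors)
    c_assigned_bb = box_corner_to_center(assigned_bb)
    offset_xy = 10 * (c_assigned_bb[:, :2] - c_anc[:, :2]) / c_anc[:, 2:]
    offset_wh = 5 * torch.log(eps + c_assigned_bb[:, 2:] / c_anc[:, 2:])
    return torch.cat([offset_xy, offset_wh], dim=1)

# The same anchor twice: center (0.2, 0.2), width 0.4, height 0.4
anchors = torch.tensor([[0.0, 0.0, 0.4, 0.4],
                        [0.0, 0.0, 0.4, 0.4]])
gt = torch.tensor([[0.1, 0.1, 0.5, 0.5],   # same size, center shifted by 0.1
                   [0.0, 0.0, 0.8, 0.4]])  # twice as wide, same height
offsets = offset_boxes(anchors, gt)

# Row 0: 10 * 0.1 / 0.4 = 2.5 for x and y, log(1) = 0 for w and h
# Row 1: 10 * 0.2 / 0.4 = 5 for x, 0 for y, 5 * log(2) for w, 0 for h
expected = torch.tensor([[2.5, 2.5, 0.0, 0.0],
                         [5.0, 0.0, 5 * math.log(2), 0.0]])
print(torch.allclose(offsets, expected, atol=1e-3))
```

A shift of a quarter of the anchor width becomes an offset of 2.5, and a doubling in width becomes roughly 3.47, so position and size targets end up on a comparable scale.
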
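- With the offset computation in place, the ground-truth boxes are assigned to the anchor boxes, and each anchor box gets the class of its ground-truth box and the offset relative to it. The function is as follows: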
- ```
import torch

def multibox_target(anchors, labels):
    """Label anchor boxes with classes and offsets, one sample at a time.

    Args:
        anchors: tensor of anchor boxes, shape [1, num_anchors, 4]
        labels: tensor of ground-truth boxes,
            shape [batch_size, num_gt_boxes, 5];
            each row is [class_label, x1, y1, x2, y2]
    """
    # squeeze(0) drops the leading dimension: anchors becomes [num_anchors, 4]
    batch_size, anchors = labels.shape[0], anchors.squeeze(0)
    batch_offset, batch_mask, batch_class_labels = [], [], []
    device, num_anchors = anchors.device, anchors.shape[0]
    for i in range(batch_size):
        label = labels[i, :, :]
        # Mapping from each anchor box to a ground-truth box, -1 for no match
        anchors_bbox_map = assign_anchor_to_bbox(label[:, 1:], anchors, device)
        # Repeat the match flag across 4 columns so it can be multiplied
        # element-wise with the offsets; unmatched anchors get zero offsets
        bbox_mask = ((anchors_bbox_map >= 0).float().unsqueeze(-1)).repeat(1, 4)
        class_labels = torch.zeros(
            num_anchors, dtype=torch.long, device=device)
        assigned_bb = torch.zeros(
            (num_anchors, 4), dtype=torch.float32, device=device)
        # Use the ground-truth boxes to label the class of each anchor box.
        # An anchor box that was not assigned is labeled background (value 0)
        indices_true = torch.nonzero(anchors_bbox_map >= 0)
        # bb_idx: index of the ground-truth box for each matched anchor box
        bb_idx = anchors_bbox_map[indices_true]
        # class_labels: class of each anchor box, shifted by 1 so 0 is background
        class_labels[indices_true] = label[bb_idx, 0].long() + 1
        # assigned_bb: ground-truth coordinates at the position of each
        # matched anchor box; rows of unmatched anchor boxes stay 0
        assigned_bb[indices_true] = label[bb_idx, 1:]
        # anchors and assigned_bb both have shape [num_anchors, 4]; a row of
        # bbox_mask is all 1 if the anchor is matched, all 0 otherwise
        offset = offset_boxes(anchors, assigned_bb) * bbox_mask
        batch_offset.append(offset.reshape(-1))
        batch_mask.append(bbox_mask.reshape(-1))
        batch_class_labels.append(class_labels)
    bbox_offset = torch.stack(batch_offset)
    bbox_mask = torch.stack(batch_mask)
    class_labels = torch.stack(batch_class_labels)
    return (bbox_offset, bbox_mask, class_labels)
```
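The main idea of this function is to match anchor boxes to ground-truth boxes and to classes one sample at a time:
- a. Extract the ground-truth boxes of a single sample
- b. Use the IoU matrix to obtain the correspondence between anchor boxes and ground-truth boxes, that is, which ground-truth box each anchor box is assigned. The returned anchors_bbox_map is a 1-D tensor of length anchors_num: where a match exists it holds the index of the ground-truth box (starting from zero), otherwise -1
- c. Match the class of the assigned ground-truth box to each anchor box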

Details of the program:

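 # This step can be broken into three parts:
 #   mark the entries of anchors_bbox_map that are >= 0 and convert them to float
 #   add a trailing dimension and repeat it four times along that dimension
 #   the result is bbox_mask ---> [anchors_num, 4]
 bbox_mask = ((anchors_bbox_map >= 0).float().unsqueeze(-1)).repeat(1, 4)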
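
The effect of the mask is easiest to see on a tiny input. The sketch below is illustrative only: it reuses a five-element `anchors_bbox_map` and a placeholder tensor of ones standing in for real offsets.

```python
import torch

anchors_bbox_map = torch.tensor([-1, 0, 1, -1, 1])

matched = anchors_bbox_map >= 0           # boolean, shape [5]
as_float = matched.float()                # 1.0 for matched, 0.0 otherwise
column = as_float.unsqueeze(-1)           # shape [5, 1]
bbox_mask = column.repeat(1, 4)           # shape [5, 4]

offsets = torch.ones(5, 4)                # placeholder standing in for offsets
masked = offsets * bbox_mask              # rows of unmatched anchors become 0
print(masked.sum(dim=1).tolist())         # [0.0, 4.0, 4.0, 0.0, 4.0]
```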

indices_true is computed with the torch.nonzero function

It returns the coordinates of the non-zero elements of a tensor

 indices_true = torch.nonzero(anchors_bbox_map >= 0)
 # anchors_bbox_map:
 #     tensor([-1,  0,  1, -1,  1])
 # indices_true:
 #     tensor([[1], [2], [4]])
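
Because torch.nonzero returns one row per hit, indices_true has shape [k, 1] rather than [k], and that extra dimension carries through the indexing that follows. The sketch below traces the shapes on the mapping shown above; the `label` values are illustrative and not taken from the post.

```python
import torch

anchors_bbox_map = torch.tensor([-1, 0, 1, -1, 1])
# One row per ground-truth box: [class, x1, y1, x2, y2] (illustrative values)
label = torch.tensor([[0, 0.10, 0.08, 0.52, 0.92],
                      [1, 0.55, 0.20, 0.90, 0.88]])

indices_true = torch.nonzero(anchors_bbox_map >= 0)   # shape [3, 1]
bb_idx = anchors_bbox_map[indices_true]               # shape [3, 1]: 0, 1, 1

class_labels = torch.zeros(5, dtype=torch.long)
assigned_bb = torch.zeros((5, 4), dtype=torch.float32)

# Both sides of each assignment have matching shapes:
# [3, 1] for the classes and [3, 1, 4] for the coordinates
class_labels[indices_true] = label[bb_idx, 0].long() + 1
assigned_bb[indices_true] = label[bb_idx, 1:]

print(class_labels.tolist())  # [0, 1, 2, 0, 2]
```

Anchors 0 and 3 keep class 0 (background) and an all-zero row in assigned_bb, while anchors 2 and 4 both receive the coordinates of ground-truth box 1.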

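Summary:

This post covered three things:
- 1. How to use the IoU matrix (jaccard) to build the correspondence between anchor boxes and ground-truth bounding boxes. The result is a 1-D tensor of length anchors_num that holds, for each anchor box, the index of its assigned ground-truth box, or -1 if it has none
- 2. How to compute the offset of an anchor box relative to its ground-truth box: first convert the boxes to center coordinates, then apply the offset formula
- 3. How to match anchor boxes to ground-truth classes sample by sample, compute the offsets, and store the results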
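
To tie the three parts together, here is a compact end-to-end sketch that can be run as a single file. Several pieces are assumptions rather than code from the post: `box_iou` and `box_corner_to_center` are not shown in this part, so minimal versions are written out here; the threshold stage uses a boolean mask and `torch.div(..., rounding_mode="floor")` (PyTorch 1.8 or later) instead of the original indexing; and the anchor and ground-truth coordinates are illustrative.

```python
import torch

def box_iou(boxes1, boxes2):
    """Assumed helper: pairwise IoU of (x1, y1, x2, y2) boxes, shape [n, m]."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    top_left = torch.max(boxes1[:, None, :2], boxes2[:, :2])
    bottom_right = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])
    inter = (bottom_right - top_left).clamp(min=0).prod(dim=2)
    return inter / (area1[:, None] + area2 - inter)

def box_corner_to_center(boxes):
    """Assumed helper: (x1, y1, x2, y2) -> (cx, cy, w, h)."""
    x1, y1, x2, y2 = boxes.unbind(dim=1)
    return torch.stack(((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1), dim=-1)

def assign_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5):
    num_anchors, num_gt_boxes = anchors.shape[0], ground_truth.shape[0]
    jaccard = box_iou(anchors, ground_truth)
    anchors_bbox_map = torch.full((num_anchors,), -1, dtype=torch.long,
                                  device=device)
    max_ious, indices = torch.max(jaccard, dim=1)
    keep = max_ious >= iou_threshold
    anchors_bbox_map[keep] = indices[keep]
    for _ in range(num_gt_boxes):
        max_idx = torch.argmax(jaccard)
        box_idx = max_idx % num_gt_boxes
        anc_idx = torch.div(max_idx, num_gt_boxes, rounding_mode="floor")
        anchors_bbox_map[anc_idx] = box_idx
        jaccard[:, box_idx] = -1
        jaccard[anc_idx, :] = -1
    return anchors_bbox_map

def offset_boxes(anchors, assigned_bb, eps=1e-6):
    c_anc = box_corner_to_center(anchors)
    c_bb = box_corner_to_center(assigned_bb)
    offset_xy = 10 * (c_bb[:, :2] - c_anc[:, :2]) / c_anc[:, 2:]
    offset_wh = 5 * torch.log(eps + c_bb[:, 2:] / c_anc[:, 2:])
    return torch.cat([offset_xy, offset_wh], dim=1)

def multibox_target(anchors, labels):
    batch_size, anchors = labels.shape[0], anchors.squeeze(0)
    device, num_anchors = anchors.device, anchors.shape[0]
    batch_offset, batch_mask, batch_class_labels = [], [], []
    for i in range(batch_size):
        label = labels[i]
        anchors_bbox_map = assign_anchor_to_bbox(label[:, 1:], anchors, device)
        bbox_mask = (anchors_bbox_map >= 0).float().unsqueeze(-1).repeat(1, 4)
        class_labels = torch.zeros(num_anchors, dtype=torch.long, device=device)
        assigned_bb = torch.zeros((num_anchors, 4), dtype=torch.float32,
                                  device=device)
        indices_true = torch.nonzero(anchors_bbox_map >= 0)
        bb_idx = anchors_bbox_map[indices_true]
        class_labels[indices_true] = label[bb_idx, 0].long() + 1
        assigned_bb[indices_true] = label[bb_idx, 1:]
        offset = offset_boxes(anchors, assigned_bb) * bbox_mask
        batch_offset.append(offset.reshape(-1))
        batch_mask.append(bbox_mask.reshape(-1))
        batch_class_labels.append(class_labels)
    return (torch.stack(batch_offset), torch.stack(batch_mask),
            torch.stack(batch_class_labels))

# One image with two ground-truth boxes (classes 0 and 1) and five anchors
ground_truth = torch.tensor([[0, 0.10, 0.08, 0.52, 0.92],
                             [1, 0.55, 0.20, 0.90, 0.88]])
anchors = torch.tensor([[0.00, 0.10, 0.20, 0.30], [0.15, 0.20, 0.40, 0.40],
                        [0.63, 0.05, 0.88, 0.98], [0.66, 0.45, 0.80, 0.80],
                        [0.57, 0.30, 0.92, 0.90]])
bbox_offset, bbox_mask, class_labels = multibox_target(
    anchors.unsqueeze(0), ground_truth.unsqueeze(0))
print(class_labels.tolist())  # [[0, 1, 2, 0, 2]]
```

Anchors 2 and 4 pass the 0.5 threshold against the second box, and anchor 1 is picked up by the greedy stage as the best remaining anchor for the first box even though its IoU is only about 0.14. This reproduces the mapping `[-1, 0, 1, -1, 1]` used in the details above.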