[Study Notes] Anchor Boxes

regret～

Last modified 2024-07-27 20:54:37

First published 2024-07-27 20:53:49

Article link: https://blog.csdn.net/Word_And_Me_/article/details/140739687

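This post mainly explains the program code; the detailed explanations are written as comments in the code.

For the theory of anchor boxes, see https://zh-v2.d2l.ai/chapter_computer-vision/anchor.html#iou

A walkthrough of the corresponding code: https://fkjkkll.github.io/2021/11/23/%E7%9B%AE%E6%A0%87%E6%A3%80%E6%B5%8BSSD/#more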

%matplotlib inline
import torch
from d2l import torch as d2l

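torch.set_printoptions(2)  # print values with two digits after the decimal point
'''
torch.set_printoptions has many other options; some common parameters:

precision: number of digits after the decimal point.
threshold: total number of elements above which the output is summarized; larger tensors are printed with an ellipsis in place of part of the content.
edgeitems: when a tensor is summarized (more elements than threshold), the number of items shown at each end of each dimension.
linewidth: number of characters per line.
sci_mode: whether to print numbers in scientific notation.
'''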

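# Arguments: input image, list of sizes and list of aspect ratios
def multibox_prior(data, sizes, ratios):
    '''Generate anchor boxes of different shapes centered on each pixel.

    s scales the side lengths, not the area: for a square image, s=0.5 gives an
    anchor whose area is 0.5^2 = 0.25 times the image area.
    r is the anchor's width-to-height ratio in pixels. Because w is multiplied by
    in_height / in_width below, an anchor is s*sqrt(r)*H pixels wide and
    s*H/sqrt(r) pixels tall, so its area is s^2 * H^2 and its aspect ratio is
    exactly r, whatever the aspect ratio of the image.
    '''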
    in_height, in_width = data.shape[-2:]
    device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)
    boxes_per_pixel = (num_sizes + num_ratios - 1)
    size_tensor = torch.tensor(sizes, device=device)
    ratio_tensor = torch.tensor(ratios, device=device)
    
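    # To move each anchor to the center of its pixel we need an offset.
    # Since a pixel has height 1 and width 1, we offset the centers by 0.5
    offset_h, offset_w = 0.5, 0.5
    steps_h = 1.0 / in_height  # scaled step along the y axis
    steps_w = 1.0 / in_width  # scaled step along the x axis

    # Generate the center points of all anchor boxes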
    center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
    center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w
    shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing='ij')
    shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)
    
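    # Generate boxes_per_pixel widths and heights,
    # later used to build the corner coordinates (xmin, ymin, xmax, ymax);
    # multiplying w by in_height / in_width handles non-square (rectangular) inputs
    w = torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),
                   sizes[0] * torch.sqrt(ratio_tensor[1:]))) * in_height / in_width
    h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),
                   sizes[0] / torch.sqrt(ratio_tensor[1:])))
    # Divide by 2 to get half widths and half heights: these are each anchor's offsets
    # from its center, repeated for every pixel position in the image
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(in_height * in_width, 1) / 2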
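    '''
    .repeat(in_height * in_width, 1): copy in_height * in_width times along dim 0 and once along dim 1 (i.e. unchanged).
    torch.stack((-w, -h, w, h)) has shape (4, boxes_per_pixel).
    Transposing with .T gives shape (boxes_per_pixel, 4); each row holds the four coordinate offsets (xmin, ymin, xmax, ymax) of one anchor.
    .repeat(in_height * in_width, 1) then tiles this whole block in_height * in_width times, i.e. once per pixel,
    so every pixel in the image receives the offsets of all of its anchors.
    The repeated tensor has shape (in_height * in_width * boxes_per_pixel, 4); each consecutive group of boxes_per_pixel rows belongs to one pixel position.
    '''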
    
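    # Each center point gets boxes_per_pixel anchor boxes,
    # so build a grid of all anchor centers with each center repeated boxes_per_pixel times
    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
                dim=1).repeat_interleave(boxes_per_pixel, dim=0)
    '''
    repeat_interleave(repeats, dim=None)
    repeats: number of repetitions
    dim: dimension along which to repeat

    Difference between repeat and repeat_interleave:
    repeat_interleave(): repeats each element (here, each row) in place, e.g. rows [a, b] -> [a, a, b, b].
    repeat(): tiles the whole tensor n times and concatenates the copies, e.g. rows [a, b] -> [a, b, a, b].
    '''
    output = out_grid + anchor_manipulations
    # Add each pixel's offsets to its center to get the positions of all anchors at every pixel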
    return output.unsqueeze(0)
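Pairing `repeat` (for the offsets) with `repeat_interleave` (for the centers) is what lines each pixel's center up with all of its anchor shapes. The following is a minimal pure-Python sketch of that pairing, not PyTorch code: the helpers `tile` and `interleave` are hypothetical stand-ins for `Tensor.repeat(n, 1)` and `Tensor.repeat_interleave(n, dim=0)` acting on rows, shown with 2 pixel centers and 3 anchor shapes.

```python
def tile(rows, n):
    """Stand-in for tensor.repeat(n, 1): the whole block of rows, n times."""
    return rows * n


def interleave(rows, n):
    """Stand-in for tensor.repeat_interleave(n, dim=0): each row n times in a row."""
    return [row for row in rows for _ in range(n)]


centers = ["c0", "c1"]        # one center per pixel (2 pixels)
shapes = ["s0", "s1", "s2"]   # half-size offsets of boxes_per_pixel = 3 anchors

grid = interleave(centers, len(shapes))   # c0 c0 c0 c1 c1 c1
offsets = tile(shapes, len(centers))      # s0 s1 s2 s0 s1 s2
pairs = list(zip(grid, offsets))          # row-wise "out_grid + anchor_manipulations"
print(pairs)
```

Each consecutive group of 3 pairs belongs to one pixel and covers all 3 shapes. Swapping the two calls would still produce every combination, but ordered by shape rather than by pixel, so the later `reshape(h, w, 5, 4)` would no longer group the anchors of one pixel together.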

Shape of the returned anchor-box variable y

img = d2l.plt.imread('../image/catdog.jpg')
h, w = img.shape[:2]

print(h, w)
x = torch.rand(size=(1, 3, h, w))
y = multibox_prior(x, sizes=[0.75, 0.5, 0.25], ratios=[1, 2, 0.5])
y.shape  # (batch size, number of anchors, 4)


# 561 728
# torch.Size([1, 2042040, 4])
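The size of the second dimension can be checked by hand: every pixel gets n + m - 1 anchors (all sizes paired with ratios[0], plus sizes[0] paired with the remaining ratios), so 561 × 728 × (3 + 3 - 1) = 2,042,040. The plain-Python sketch below (the function `anchor_shapes` is hypothetical, written only to mirror the `w` and `h` formulas of `multibox_prior`) reproduces that count and checks that each anchor's width-to-height ratio in pixels equals r.

```python
import math


def anchor_shapes(sizes, ratios, in_height, in_width):
    """Return (s, r, w, h) for the n + m - 1 anchors per pixel, w and h normalized to [0, 1]."""
    pairs = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]
    return [(s, r, s * math.sqrt(r) * in_height / in_width, s / math.sqrt(r))
            for s, r in pairs]


h, w = 561, 728
shapes = anchor_shapes([0.75, 0.5, 0.25], [1, 2, 0.5], h, w)
print(len(shapes), h * w * len(shapes))  # anchors per pixel, total anchors
for s, r, wn, hn in shapes:
    # scale back to pixels: width by the image width, height by the image height
    print(f"s={s}, r={r}: {wn * w:.1f} x {hn * h:.1f} px, aspect {wn * w / (hn * h):.2f}")
```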

Access the first anchor box centered at (250, 250)

boxes = y.reshape(h, w, 5, 4)
boxes[250, 250, 0, :]


# tensor([0.06, 0.07, 0.63, 0.82])
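The printed values can be reproduced by hand from the formulas in `multibox_prior`. Anchor 0 at a pixel uses s = 0.75 and r = 1, and the pixel in row 250, column 250 has its center at ((250 + 0.5) / W, (250 + 0.5) / H). A minimal sketch (the helper `first_anchor` is hypothetical and only handles r = 1):

```python
def first_anchor(row, col, size, in_height, in_width):
    """Corner coordinates (xmin, ymin, xmax, ymax) of the r = 1 anchor at pixel (row, col)."""
    cx = (col + 0.5) / in_width               # pixel center, normalized by the width
    cy = (row + 0.5) / in_height              # pixel center, normalized by the height
    half_w = size * in_height / in_width / 2  # w = s * sqrt(1) * H / W
    half_h = size / 2                         # h = s / sqrt(1)
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


box = first_anchor(250, 250, 0.75, 561, 728)
print([round(v, 2) for v in box])  # [0.06, 0.07, 0.63, 0.82]
```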

Show all anchor boxes centered at one pixel of the image

def show_bboxes(axes, bboxes, labels=None, colors=None):
    '''Show all bounding boxes'''
    def _make_list(obj, default_value=None): # make sure the argument is a list or tuple
        if obj is None:
            obj = default_value
        elif not isinstance(obj, (tuple, list)):
            obj = [obj]
        return obj

    labels = _make_list(labels)
    colors = _make_list(colors, ['b', 'g', 'r', 'm', 'c'])
    for i, bbox in enumerate(bboxes):
        color = colors[i % len(colors)]  # cycle through the colors
        rect = d2l.bbox_to_rect(bbox.detach().numpy(), color)  # convert the box to a matplotlib rectangle
        axes.add_patch(rect)
        if labels and len(labels) > i:
            text_color = 'k' if color == 'w' else 'w'
            axes.text(rect.xy[0], rect.xy[1], labels[i],
                      va='center', ha='center', fontsize=9, color=text_color,
                      bbox=dict(facecolor=color, lw=0))
            
d2l.set_figsize()
bbox_scale = torch.tensor((w, h, w, h))
fig = d2l.plt.imshow(img)
show_bboxes(fig.axes, boxes[250, 250, :, :] * bbox_scale,
            ['s=0.75, r=1', 's=0.5, r=1', 's=0.25, r=1', 's=0.75, r=2',
             's=0.75, r=0.5'])
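`show_bboxes` multiplies the normalized boxes by `bbox_scale = (w, h, w, h)` to get pixel coordinates. `d2l.bbox_to_rect` then turns the corner format (xmin, ymin, xmax, ymax) into the (upper-left corner, width, height) form that a matplotlib `Rectangle` expects. A plain-Python sketch of those two steps (`to_rect_params` is a hypothetical helper, not part of d2l):

```python
def to_rect_params(bbox, img_w, img_h):
    """Scale a normalized (xmin, ymin, xmax, ymax) box to pixels and return ((x, y), width, height)."""
    xmin, ymin, xmax, ymax = (bbox[0] * img_w, bbox[1] * img_h,
                              bbox[2] * img_w, bbox[3] * img_h)
    return (xmin, ymin), xmax - xmin, ymax - ymin


colors = ['b', 'g', 'r', 'm', 'c']
for i, bbox in enumerate([(0.1, 0.2, 0.5, 0.6), (0.0, 0.0, 1.0, 1.0)]):
    color = colors[i % len(colors)]  # same color cycling as show_bboxes
    print(color, to_rect_params(bbox, 100, 50))
```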

Intersection over union (IoU)

def box_iou(boxes1, boxes2):
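    '''Compute pairwise IoU between two lists of anchor or bounding boxes'''
    box_area = lambda boxes: ((boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]))
    # Shapes of boxes1, boxes2, areas1, areas2:
    # boxes1: (number of boxes1, 4),
    # boxes2: (number of boxes2, 4),
    # areas1: (number of boxes1,),
    # areas2: (number of boxes2,)
    areas1 = box_area(boxes1)
    areas2 = box_area(boxes2)
    # Shapes of inter_upperlefts, inter_lowerrights, inters:
    # (number of boxes1, number of boxes2, 2)
    # print('boxes1 shape:', boxes1.shape) # (5, 4)
    # print('boxes2 shape:', boxes2.shape) # (2, 4)
    inter_upperlefts = torch.max(boxes1[:, None, :2], boxes2[:, :2])
    inter_lowerrights = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])
    '''
    This relies on broadcasting. Shapes are compared dimension by dimension from
    right to left; two sizes are compatible if they are equal or one of them is 1
    (the size-1 dimension is stretched), and a missing dimension is treated as 1.
    Example with the shapes above: boxes1 is (5, 4), boxes2 is (2, 4)
    boxes1[:, None, :2] has shape (5, 1, 2)
    boxes2[:, :2] has shape (2, 2), treated as (1, 2, 2)
    From right to left:
    2 == 2: unchanged
    1 vs 2: the size-1 dimension of boxes1 is broadcast to 2
    5 vs missing (i.e. 1): that dimension of boxes2 is broadcast to 5
    Final shape for both: (5, 2, 2)

    Note: torch.max sometimes returns one tensor and sometimes two, because it has two forms:
        Element-wise comparison of two tensors (as here): returns one tensor holding the larger value at each position.
        Reduction along a dimension, torch.max(x, dim): returns two tensors, the maximum values and their indices.
    '''
    inters = (inter_lowerrights - inter_upperlefts).clamp(min=0)
    # print('inter_upperlefts shape', inter_upperlefts.shape)  # (5,2,2)
    # print('inter_lowerrights shape', inter_lowerrights.shape)  # (5,2,2)
    # print('inters shape:', inters.shape)  # (5,2,2)
    # Shape of inter_areas and union_areas: (number of boxes1, number of boxes2)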
    inter_areas = inters[:, :, 0] * inters[:, :, 1]
    union_areas = areas1[:, None] + areas2 - inter_areas
    return inter_areas / union_areas
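To check the tensor code against something simpler, here is a minimal pure-Python sketch of the same pairwise IoU (the name `pairwise_iou` is hypothetical). The double loop plays the role of broadcasting `boxes1[:, None, :]` against `boxes2`, so entry [i][j] corresponds to `inter_areas[i, j] / union_areas[i, j]`, and `max(0.0, ...)` plays the role of `clamp(min=0)`.

```python
def pairwise_iou(boxes1, boxes2):
    """IoU matrix of size len(boxes1) x len(boxes2); boxes are (xmin, ymin, xmax, ymax)."""
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    result = []
    for a in boxes1:
        row = []
        for b in boxes2:
            iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width, clamped at 0
            ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height, clamped at 0
            inter = iw * ih
            row.append(inter / (area(a) + area(b) - inter))
        result.append(row)
    return result


m = pairwise_iou([(0, 0, 2, 2), (5, 5, 6, 6)], [(1, 1, 3, 3), (0, 0, 2, 2)])
print(m)  # row 0: partial overlap (1/7) and identical box (1.0); row 1: no overlap
```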

Assign ground-truth bounding boxes to anchor boxes

def assign_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5):
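    '''Assign the closest ground-truth bounding box to each anchor box'''
    num_anchors, num_gt_boxes = anchors.shape[0], ground_truth.shape[0]
    # Element x_ij in row i and column j is the IoU of anchor box i and ground-truth box j
    jaccard = box_iou(anchors, ground_truth)
    # print('jaccard shape:', jaccard.shape)  # (5, 2)
    # Tensor holding the ground-truth box assigned to each anchor box
    anchors_bbox_map = torch.full((num_anchors, ), -1, dtype=torch.long, device=device)
    # The size argument of torch.full must be a tuple: for 1-D write (n,) with a trailing comma, otherwise it raises an error
    # print('anchor_bbox_map shape:', anchors_bbox_map.shape) # (5, )
    # Assign ground-truth boxes according to the threshold
    max_ious, indices = torch.max(jaccard, dim=1)
    # print('max_iou shape:', max_ious.shape)  # (5, )
    # print('indices shape:', indices.shape)  # (5, )
    '''
    Reduce along dim 1, returning the maximum values and their indices:
    the maximum of each row, i.e. the ground-truth box each anchor box overlaps most
    '''
    anc_i = torch.nonzero(max_ious >= iou_threshold).reshape(-1)
    '''
    torch.nonzero():
    1. For a 1-D input, it returns the indices of the non-zero elements of input, one index per row;
       the output is a 2-D tensor of size (z, 1), where z is the number of non-zero elements in input.
    2. For an n-D input, the output index tensor has size (z, n), where z is the number of non-zero elements in input.
    '''
    box_j = indices[max_ious >= iou_threshold]
    anchors_bbox_map[anc_i] = box_j  # anchors above the threshold get their box index; the rest stay -1
    col_discard = torch.full((num_anchors, ), -1)
    row_discard = torch.full((num_gt_boxes, ), -1)
    # print(anchors_bbox_map)
    # Then make sure every ground-truth box gets an anchor: repeatedly take the largest
    # remaining IoU, assign that pair, and discard its column (box) and row (anchor)
    for _ in range(num_gt_boxes):
        max_idx = torch.argmax(jaccard)  # index into the flattened IoU matrix
        box_idx = (max_idx % num_gt_boxes).long()
        anc_idx = (max_idx / num_gt_boxes).long()
        anchors_bbox_map[anc_idx] = box_idx
        jaccard[:, box_idx] = col_discard
        jaccard[anc_idx, :] = row_discard
        # print(anchors_bbox_map)
    return anchors_bbox_map  # only qualifying anchors are assigned; index = anchor box, value = ground-truth box (-1 if none)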
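The two stages of `assign_anchor_to_bbox` are easier to follow on a small IoU matrix. The sketch below is a hypothetical pure-Python mirror (`assign` is not part of d2l): first every anchor whose best IoU reaches the threshold takes that box, then the greedy loop hands each ground-truth box its best remaining anchor and discards that row and column. As a result every ground-truth box gets at least one anchor, even when its best IoU is below the threshold.

```python
def assign(jaccard, iou_threshold=0.5):
    """Mirror of assign_anchor_to_bbox on an IoU matrix (rows = anchors, columns = ground-truth boxes)."""
    jac = [row[:] for row in jaccard]  # copy, because rows and columns get discarded
    num_anchors, num_gt = len(jac), len(jac[0])
    # stage 1: every anchor whose best IoU reaches the threshold takes that box
    mapping = []
    for row in jac:
        best = max(range(num_gt), key=lambda j: row[j])
        mapping.append(best if row[best] >= iou_threshold else -1)
    # stage 2: greedily give every ground-truth box its best remaining anchor
    for _ in range(num_gt):
        i, j = max(((i, j) for i in range(num_anchors) for j in range(num_gt)),
                   key=lambda ij: jac[ij[0]][ij[1]])
        mapping[i] = j
        for r in range(num_anchors):
            jac[r][j] = -1.0       # discard the column: this box is taken
        jac[i] = [-1.0] * num_gt   # discard the row: this anchor is taken
    return mapping


print(assign([[0.9, 0.8], [0.1, 0.3]]))              # [0, 1]: box 1 still gets anchor 1
print(assign([[0.1, 0.6], [0.7, 0.2], [0.3, 0.4]]))  # [1, 0, -1]
```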

def offset_boxes(anchors, assigned_bb, eps=1e-6):
    '''Offset transformation for anchor boxes'''
    c_anc = d2l.box_corner_to_center(anchors)
    c_assigned_bb = d2l.box_corner_to_center(assigned_bb)
    offset_xy = 10 * (c_assigned_bb[:, :2] - c_anc[:, :2])/ c_anc[:, 2:]
    offset_wh = 5 * torch.log(eps + c_assigned_bb[:, 2:] / c_anc[:, 2:])
    offset = torch.cat([offset_xy, offset_wh], dim=1)
    return offset
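`offset_boxes` encodes the assigned box relative to its anchor in center format. The center shift is divided by the anchor's width or height and multiplied by 10, and the width or height ratio goes through a log and is multiplied by 5 (the d2l chapter explains these as the constants σ_x = σ_y = 0.1 and σ_w = σ_h = 0.2). For a single anchor/box pair, a plain-Python sketch (the helpers are hypothetical) reproduces the values printed for anchor 1 and the dog box in `labels[0]` further down:

```python
import math


def corner_to_center(box):
    """(xmin, ymin, xmax, ymax) -> (cx, cy, w, h)"""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin)


def encode_offset(anchor, assigned, eps=1e-6):
    """Same formula as offset_boxes, for one anchor and its assigned box."""
    ax, ay, aw, ah = corner_to_center(anchor)
    bx, by, bw, bh = corner_to_center(assigned)
    return (10 * (bx - ax) / aw, 10 * (by - ay) / ah,
            5 * math.log(eps + bw / aw), 5 * math.log(eps + bh / ah))


# anchor 1 and the dog box from the example further down
offset = encode_offset((0.15, 0.2, 0.4, 0.4), (0.1, 0.08, 0.52, 0.92))
print([round(v, 2) for v in offset])  # [1.4, 10.0, 2.59, 7.18]
```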

Label the classes and offsets of anchor boxes

def multibox_target(anchors, labels):
    '''Label anchor boxes using ground-truth bounding boxes'''
    batch_size, anchors = labels.shape[0], anchors.squeeze(0)  # remove one dimension, i.e. the batch dimension
    batch_offset, batch_mask, batch_class_labels = [], [], []
    device, num_anchors = anchors.device, anchors.shape[0]
    # print('anchors shape:', anchors.shape)
    for i in range(batch_size):
        label = labels[i, :, :]
        anchors_bbox_map = assign_anchor_to_bbox(label[:, 1:], anchors, device)
        # print('anchors_bbox_map shape:', anchors_bbox_map.shape)
        # print('anchors_bbox_map:', anchors_bbox_map)
        bbox_mask = ((anchors_bbox_map >= 0).float().unsqueeze(-1)).repeat(1, 4)
        # print('bbox_mask shape:', bbox_mask.shape)  # (5, 4)
        '''
        Builds a matrix of size (number of anchors, 4) holding 1.0 for anchors that were
        assigned a ground-truth box and 0.0 for background anchors
        '''
        # Initialize class labels and assigned bounding-box coordinates to zero
        class_labels = torch.zeros(num_anchors, dtype=torch.long, device=device)
        assigned_bb = torch.zeros((num_anchors, 4), dtype=torch.float32, device=device)
        # Label anchor-box classes using the ground-truth bounding boxes.
        # An anchor box that was not assigned is labeled as background (value zero)
        indices_true = torch.nonzero(anchors_bbox_map >= 0)
        # print('indices_true shape', indices_true.shape)  # (3, 1)
        bb_idx = anchors_bbox_map[indices_true]
        # print(bb_idx)
        # print('bb_idx shape:', bb_idx.shape)  # (3, 1)
        # print('label shape:', label.shape)  # (2, 5)
        # print(label)
        # class IDs are shifted by 1 because 0 is reserved for background
        class_labels[indices_true] = label[bb_idx, 0].long() + 1
        # print(class_labels)
        assigned_bb[indices_true] = label[bb_idx, 1:]
        # Offset transformation
        offset = offset_boxes(anchors, assigned_bb) * bbox_mask
        batch_offset.append(offset.reshape(-1))
        batch_mask.append(bbox_mask.reshape(-1))
        batch_class_labels.append(class_labels)
    bbox_offset = torch.stack(batch_offset)
    bbox_mask = torch.stack(batch_mask)
    class_labels = torch.stack(batch_class_labels)
    '''
    Difference between torch.cat() and torch.stack():
    torch.cat (concatenate) joins a sequence of tensors along an existing dimension; the tensors must have the same size in every dimension except the one being joined.
    torch.stack joins a sequence of tensors along a new dimension; all tensors must have exactly the same shape.
    '''
    return (bbox_offset, bbox_mask, class_labels)
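Once `anchors_bbox_map` is known, the class labels and the mask follow directly. An assigned anchor gets the box's class ID plus 1 (0 is reserved for background), and its four mask entries are 1.0, so the offsets of background anchors are zeroed out. A hypothetical plain-Python sketch (`label_anchors` is not part of d2l), using the mapping [-1, 0, 1, -1, 1] implied by the `labels[2]` output of the dog/cat example:

```python
def label_anchors(anchors_bbox_map, gt_classes):
    """Class label per anchor (0 = background) and the flattened 4-entries-per-anchor mask."""
    class_labels = [gt_classes[j] + 1 if j >= 0 else 0 for j in anchors_bbox_map]
    mask = [1.0 if j >= 0 else 0.0 for j in anchors_bbox_map for _ in range(4)]
    return class_labels, mask


# ground-truth classes: 0 = dog, 1 = cat
class_labels, mask = label_anchors([-1, 0, 1, -1, 1], [0, 1])
print(class_labels)  # [0, 1, 2, 0, 2]
print(mask)
```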

Draw these ground-truth bounding boxes and anchor boxes on the image

ground_truth = torch.tensor([[0, 0.1, 0.08, 0.52, 0.92],
                             [1, 0.55, 0.2, 0.9, 0.88]])
anchors = torch.tensor([[0, 0.1, 0.2, 0.3], [0.15, 0.2, 0.4, 0.4],
                        [0.63, 0.05, 0.88, 0.98], [0.66, 0.45, 0.8, 0.8],
                        [0.57, 0.3, 0.92, 0.9]])

fig = d2l.plt.imshow(img)
show_bboxes(fig.axes, ground_truth[:, 1:] * bbox_scale, ['dog', 'cat'], 'k')
show_bboxes(fig.axes, anchors * bbox_scale, ['0', '1', '2', '3', '4'])

Label the classes and offsets of these anchor boxes based on the ground-truth boxes of the dog and the cat

labels = multibox_target(anchors.unsqueeze(dim=0),
                         ground_truth.unsqueeze(dim=0))

labels[2]

# tensor([[0, 1, 2, 0, 2]])

labels[1]

# tensor([[0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 1., 1.,
#          1., 1.]])

labels[0]

# tensor([[-0.00e+00, -0.00e+00, -0.00e+00, -0.00e+00,  1.40e+00,  1.00e+01,
#           2.59e+00,  7.18e+00, -1.20e+00,  2.69e-01,  1.68e+00, -1.57e+00,
#          -0.00e+00, -0.00e+00, -0.00e+00, -0.00e+00, -5.71e-01, -1.00e+00,
#           4.17e-06,  6.26e-01]])

Apply the inverse offset transformation to return the predicted bounding-box coordinates

def offset_inverse(anchors, offset_preds):
    '''Predict bounding boxes from anchor boxes with predicted offsets'''
    anc = d2l.box_corner_to_center(anchors)
    pred_bbox_xy = (offset_preds[:, :2] * anc[:, 2:] / 10) + anc[:, :2]
    pred_bbox_wh = torch.exp(offset_preds[:, 2:] / 5) * anc[:, 2:]
    pred_bbox = torch.cat((pred_bbox_xy, pred_bbox_wh), dim=1)
    predicted_bbox = d2l.box_center_to_corner(pred_bbox)
    return predicted_bbox
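`offset_inverse` undoes `offset_boxes`: it scales the offsets back by 1/10 and 1/5, applies them to the anchor in center format and converts back to corners. Two quick checks in plain Python (the helper names are hypothetical): a zero offset returns the anchor itself, which is why an all-zero `offset_preds` yields predicted boxes identical to the anchors, and decoding an encoded box recovers it up to the `eps` used in the log.

```python
import math


def corner_to_center(box):
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin)


def center_to_corner(box):
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def encode_offset(anchor, assigned, eps=1e-6):
    """Same formula as offset_boxes."""
    ax, ay, aw, ah = corner_to_center(anchor)
    bx, by, bw, bh = corner_to_center(assigned)
    return (10 * (bx - ax) / aw, 10 * (by - ay) / ah,
            5 * math.log(eps + bw / aw), 5 * math.log(eps + bh / ah))


def decode_offset(anchor, offset):
    """Same formula as offset_inverse, for one anchor."""
    ax, ay, aw, ah = corner_to_center(anchor)
    cx = offset[0] * aw / 10 + ax
    cy = offset[1] * ah / 10 + ay
    w = math.exp(offset[2] / 5) * aw
    h = math.exp(offset[3] / 5) * ah
    return center_to_corner((cx, cy, w, h))


anchor = (0.15, 0.2, 0.4, 0.4)
dog = (0.1, 0.08, 0.52, 0.92)
print(decode_offset(anchor, (0, 0, 0, 0)))               # the anchor itself
print(decode_offset(anchor, encode_offset(anchor, dog)))  # approximately the dog box
```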

The nms function below sorts the confidences in descending order and returns the indices of the boxes it keeps

def nms(boxes, scores, iou_threshold):
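    '''Sort predicted bounding boxes by confidence and apply non-maximum suppression'''
    B = torch.argsort(scores, dim=-1, descending=True)
    '''
    torch.argsort():
    input: the tensor to sort.
    dim: the dimension to sort along; defaults to the last dimension (-1).
    descending: whether to sort in descending order; defaults to False, i.e. ascending.
    '''
    keep = []
    while B.numel() > 0:
        i = B[0]
        keep.append(i)
        if B.numel() == 1: break
        iou = box_iou(boxes[i, :].reshape(-1, 4),
                      boxes[B[1:], :].reshape(-1, 4)).reshape(-1)
        # keep only the remaining boxes whose IoU with box i does not exceed the threshold
        inds = torch.nonzero(iou <= iou_threshold).reshape(-1)
        B = B[inds + 1]   # inds indexes into B[1:], so add 1 to index into B itself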
    return torch.tensor(keep, device=boxes.device)
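The same greedy procedure in plain Python, as a hypothetical sketch (`iou` and `nms_py` are illustrative names). Like `nms` above, it ignores class labels and compares all boxes with each other. Python's stable `sorted` breaks score ties by original index, whereas `torch.argsort` does not promise any particular order for ties. Run on the four boxes and confidences of the four-anchor example, it keeps boxes 0 and 3: boxes 1 and 2 overlap box 0 with IoU of about 0.74 and 0.55, both above 0.5.

```python
def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def nms_py(boxes, scores, iou_threshold):
    """Greedy non-maximum suppression; returns the kept indices, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)  # highest remaining score
        keep.append(i)
        # drop every remaining box that overlaps box i by more than the threshold
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_threshold]
    return keep


boxes = [(0.1, 0.08, 0.52, 0.92), (0.08, 0.2, 0.56, 0.95),
         (0.15, 0.3, 0.62, 0.91), (0.55, 0.2, 0.9, 0.88)]
scores = [0.9, 0.8, 0.7, 0.9]  # best of dog/cat for each box
print(nms_py(boxes, scores, 0.5))  # [0, 3]
```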

Apply non-maximum suppression to the predicted bounding boxes

def multibox_detection(cls_probs, offset_preds, anchors, nms_threshold=0.5,
                       pos_threshold=0.009999999):
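    '''
    Predict bounding boxes using non-maximum suppression
    cls_probs shape: (batch size, number of classes including background, number of anchors); the confidence predicted for each class (row 0 = background) at each anchor box
    offset_preds shape: (batch size, number of anchors * 4) in the example below; the offsets predicted for each anchor box, reshaped to (number of anchors, 4) per sample
    anchors shape: (1, number of anchors, 4); the initial coordinates of each anchor box, shared across the batch
    '''
    device, batch_size = cls_probs.device, cls_probs.shape[0]
    anchors = anchors.squeeze(0)  # drop the batch dimension: (number of anchors, 4)
    num_classes, num_anchors = cls_probs.shape[1], cls_probs.shape[2]
    out = []
    for i in range(batch_size):
        cls_prob, offset_pred = cls_probs[i], offset_preds[i].reshape(-1, 4)
        # print('cls_prob:', cls_prob)
        conf, class_id = torch.max(cls_prob[1:], 0)    # best non-background class and its confidence for each anchor
        # print('conf:', conf)
        # print('class_id:', class_id)
        '''
        cls_prob: tensor([[0.00, 0.00, 0.00, 0.00],
                          [0.90, 0.80, 0.70, 0.10],
                          [0.10, 0.20, 0.30, 0.90]])
        conf: tensor([0.90, 0.80, 0.70, 0.90])
        class_id: tensor([0, 0, 0, 1])
        '''
        predicted_bb = offset_inverse(anchors, offset_pred)  # apply the predicted offsets to get the predicted boxes
        keep = nms(predicted_bb, conf, nms_threshold)  # remove redundant boxes with non-maximum suppression

        # Find all non_keep indices and set their class to background
        all_idx = torch.arange(num_anchors, dtype=torch.long, device=device)
        combined = torch.cat((keep, all_idx))
        uniques, counts = combined.unique(return_counts=True)
        non_keep = uniques[counts == 1]  # indices that appear only once were not kept by NMS
        all_id_sorted = torch.cat((keep, non_keep))
        class_id[non_keep] = -1
        # print(class_id)
        class_id = class_id[all_id_sorted]
        # print(class_id)
        '''
        tensor([ 0, -1, -1,  1])
        tensor([ 0,  1, -1, -1])
        '''
        conf, predicted_bb = conf[all_id_sorted], predicted_bb[all_id_sorted]
        # pos_threshold is the threshold for non-background predictions
        below_min_idx = (conf < pos_threshold)
        class_id[below_min_idx] = -1
        conf[below_min_idx] = 1 - conf[below_min_idx]  # treat as background, with confidence 1 - conf
        pred_info = torch.cat(
            (class_id.unsqueeze(1), conf.unsqueeze(1), predicted_bb), dim=1)
        out.append(pred_info)
    return torch.stack(out)  # (batch size, number of anchors, 6)
'''
The first element is the predicted class index, starting from 0 (0 is dog, 1 is cat); the value -1 means background or removed by non-maximum suppression.
The second element is the confidence of the predicted bounding box.
The remaining four elements are the x/y coordinates of the upper-left and lower-right corners of the predicted bounding box (in the range 0 to 1).
'''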
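The bookkeeping after `nms` in `multibox_detection` (finding the non-kept indices, moving the kept boxes to the front, marking the others as -1, and applying `pos_threshold`) can be traced with plain Python lists. This is a hypothetical sketch (`reorder_after_nms` is not part of d2l) that uses `collections.Counter` in place of `unique(return_counts=True)`. With `keep = [0, 3]` from the four-anchor example it reproduces the class and confidence columns of the printed output.

```python
from collections import Counter


def reorder_after_nms(keep, class_id, conf, pos_threshold=0.009999999):
    """Return (order, class_id, conf) arranged like the rows of multibox_detection's output."""
    counts = Counter(list(keep) + list(range(len(class_id))))
    non_keep = sorted(i for i, c in counts.items() if c == 1)  # seen once: removed by NMS
    class_id = list(class_id)
    for i in non_keep:
        class_id[i] = -1                  # removed boxes become background
    order = list(keep) + non_keep         # kept boxes first
    class_id = [class_id[i] for i in order]
    conf = [conf[i] for i in order]
    for k, c in enumerate(conf):
        if c < pos_threshold:             # too unsure to call it an object
            class_id[k] = -1
            conf[k] = 1 - c
    return order, class_id, conf


order, cls, conf = reorder_after_nms([0, 3], [0, 0, 0, 1], [0.9, 0.8, 0.7, 0.9])
print(order, cls, conf)  # [0, 3, 1, 2] [0, 1, -1, -1] [0.9, 0.9, 0.8, 0.7]
```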

Apply the above algorithm to a concrete example with four anchor boxes

anchors = torch.tensor([[0.1, 0.08, 0.52, 0.92], [0.08, 0.2, 0.56, 0.95],
                        [0.15, 0.3, 0.62, 0.91], [0.55, 0.2, 0.9, 0.88]])
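offset_preds = torch.tensor([0] * anchors.numel())  # all offsets are 0, so the predicted boxes equal the anchors
cls_probs = torch.tensor([[0] * 4,  # predicted background probability
                          [0.9, 0.8, 0.7, 0.1],     # predicted dog probability
                          [0.1, 0.2, 0.3, 0.9]])    # predicted cat probability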

Draw these predicted bounding boxes and their confidences on the image

fig = d2l.plt.imshow(img)
show_bboxes(fig.axes, anchors * bbox_scale,
            ['dog=0.9', 'dog=0.8', 'dog=0.7', 'cat=0.9'])

The returned result has shape (batch size, number of anchors, 6)

output = multibox_detection(cls_probs.unsqueeze(dim=0),
                            offset_preds.unsqueeze(dim=0),
                            anchors.unsqueeze(dim=0), nms_threshold=0.5)
output


# tensor([[[ 0.00,  0.90,  0.10,  0.08,  0.52,  0.92],
#          [ 1.00,  0.90,  0.55,  0.20,  0.90,  0.88],
#          [-1.00,  0.80,  0.08,  0.20,  0.56,  0.95],
#          [-1.00,  0.70,  0.15,  0.30,  0.62,  0.91]]])

Output the final predicted bounding boxes kept by non-maximum suppression

fig = d2l.plt.imshow(img)
for i in output[0].detach().numpy():
    if i[0] == -1:
        continue
    label = ('dog=', 'cat=')[int(i[0])] + str(i[1])
    show_bboxes(fig.axes, [torch.tensor(i[2:]) * bbox_scale], label)
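The drawing loop skips rows whose class is -1 and builds a label from the class name and the confidence. Applied in plain Python to the four rows of the printed output (copied from that output; `kept_labels` is a hypothetical helper), only the dog and the cat remain:

```python
def kept_labels(rows, names=('dog=', 'cat=')):
    """Labels for the rows that survived NMS (class index != -1)."""
    return [names[int(row[0])] + str(row[1]) for row in rows if row[0] != -1]


# rows of output[0] as printed: class, confidence, xmin, ymin, xmax, ymax
rows = [[0.00, 0.90, 0.10, 0.08, 0.52, 0.92],
        [1.00, 0.90, 0.55, 0.20, 0.90, 0.88],
        [-1.00, 0.80, 0.08, 0.20, 0.56, 0.95],
        [-1.00, 0.70, 0.15, 0.30, 0.62, 0.91]]
print(kept_labels(rows))  # ['dog=0.9', 'cat=0.9']
```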