PyTorch: An Introduction to Object Detection

峡谷的小鱼

Article link: https://blog.csdn.net/weixin_43276033/article/details/124623669

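Object Detection

1. Introduction

In image classification, we assume that an image contains a single main object and care only about identifying its category. Often, however, an image contains several objects of interest, and we want to know not only their categories but also where each one is located in the image. In computer vision, this task is called object detection (or object recognition).
Object detection addresses the following questions:

Classification: which category a region of the image belongs to;
Localization: where in the image the object appears;
Size: how large the object is.

Bounding boxes
In object detection, a rectangular bounding box describes the spatial location of an object. The origin (0, 0) of the image is its top-left corner; the x-axis points right and the y-axis points down.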

%matplotlib inline
import torch
from d2l import torch as d2l

d2l.set_figsize()
img = d2l.plt.imread('../img/catdog.jpg')
d2l.plt.imshow(img);

[Figure: the example image catdog.jpg, showing a dog and a cat]

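A bounding box can be represented in two ways:

by the coordinates of its top-left and bottom-right corners in the image;
or by the coordinates of the rectangle's center together with its width and height.

The two representations can be converted into each other: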

# Bounding box: describes the spatial location of an object.
# A bounding box is a rectangle with two common representations:
# 1. the x and y coordinates of the top-left and bottom-right corners;
# 2. the x and y coordinates of the center plus the box width and height.

# Functions that convert between the two representations:
def box_corner_to_center(boxes):
    """
    Convert from (top-left, bottom-right) to (center, width, height).
    Args:
        boxes: tensor of shape (num_boxes, 4)
    Returns:
        boxes: tensor of shape (num_boxes, 4)
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]

    # Center coordinates
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2

    # Width and height
    w = x2 - x1
    h = y2 - y1

    # torch.stack joins a sequence of tensors along a new dimension
    boxes = torch.stack((cx, cy, w, h), dim=-1)
    return boxes

def box_center_to_corner(boxes):
    """
    Convert from (center, width, height) to (top-left, bottom-right).
    Args:
        boxes: tensor of shape (num_boxes, 4)
    Returns:
        boxes: tensor of shape (num_boxes, 4)
    """
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]

    # Top-left corner
    x1 = cx - 0.5 * w
    y1 = cy - 0.5 * h

    # Bottom-right corner
    x2 = cx + 0.5 * w
    y2 = cy + 0.5 * h

    # Stack along a new last dimension
    boxes = torch.stack((x1, y1, x2, y2), dim=-1)
    return boxes

# Check the functions on an example
dog_bbox, cat_bbox = [60.0, 45.0, 378.0, 516.0], [400.0, 112.0, 655.0, 493.0]
boxes = torch.tensor((dog_bbox, cat_bbox))
box_center_to_corner(box_corner_to_center(boxes)) == boxes

Output:

tensor([[True, True, True, True],
        [True, True, True, True]])
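
The conversion can also be checked by hand on the dog box. The following is a minimal sketch in plain Python (no PyTorch), using the dog coordinates from the example; the variable names are chosen here for illustration only.

```python
# Dog box from the example, in corner form: (x1, y1, x2, y2)
x1, y1, x2, y2 = 60.0, 45.0, 378.0, 516.0

# Corner form -> center form
cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
w, h = x2 - x1, y2 - y1
print(cx, cy, w, h)  # 219.0 280.5 318.0 471.0

# Center form -> corner form (round trip)
corner = (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)
print(corner == (x1, y1, x2, y2))  # True
```

The round trip is exact here because every intermediate value is representable in floating point; for arbitrary boxes an approximate comparison would be safer.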

After the bounding boxes are drawn on the image, the main outline of each object lies essentially inside its box.

# Draw a bounding box
def bbox_to_rect(bbox, color):
    # bbox: sequence (top-left x, top-left y, bottom-right x, bottom-right y)
    # color: edge color, e.g. "blue", "red"
    # d2l.plt.Rectangle draws a rectangle patch

    return d2l.plt.Rectangle(
        xy=(bbox[0], bbox[1]),      # anchor point: the top-left corner
        width=bbox[2]-bbox[0],      # rectangle width
        height=bbox[3]-bbox[1],     # rectangle height
        fill=False,
        edgecolor=color,
        linewidth=2
    )

fig = d2l.plt.imshow(img)
fig.axes.add_patch(bbox_to_rect(dog_bbox, 'blue'))
fig.axes.add_patch(bbox_to_rect(cat_bbox, 'red'));

[Figure: the example image with the dog's bounding box in blue and the cat's in red]

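2. Detection Regions: Anchor Boxes

Object detection algorithms usually begin by extracting candidate regions. Typically, a large number of regions are sampled from the input image; the algorithm then decides whether each region contains an object of interest and adjusts the region's boundaries so that it predicts the object's ground-truth bounding box more accurately.
This section introduces one such method, anchor boxes: bounding boxes with different scales and aspect ratios are generated around every pixel.

Generating multiple anchor boxes
Suppose the input image has height $h$ and width $w$.
For each pixel of the image, anchor boxes of different shapes are generated, with scale $s\in(0,1]$ and aspect ratio (width to height) $r > 0$.
With $s=\frac{w_ah_a}{wh}$ and $r=\frac{w_a}{h_a}$, the anchor box width $w_a$ and height $h_a$ are $w_a=\sqrt{rswh}$ and $h_a=\sqrt{\frac{swh}{r}}$.

The code below works in coordinates normalized by the image width and height, and the anchor box coordinates it returns are normalized in the same way. The width and height are therefore computed in the code as
$w_{as}=\sqrt{\frac{rsh}{w}}$ , $h_{as}=\sqrt{\frac{sw}{rh}}$ .
For plotting, the width and height in pixels are recovered as
$w_{a}=w\cdot\sqrt{\frac{rsh}{w}}$ , $h_{a}=h\cdot\sqrt{\frac{sw}{rh}}$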
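
As a quick numerical check of these formulas, the sketch below computes the anchor width and height for one hypothetical image size and one (scale, ratio) pair, then verifies that the area fraction equals $s$, that the aspect ratio equals $r$, and that the normalized and pixel values agree. All numbers are illustrative assumptions, not values from the article.

```python
import math

# Hypothetical values, chosen only for illustration
h, w = 400, 600      # image height and width in pixels
s, r = 0.75, 2.0     # scale (area fraction) and aspect ratio (width / height)

# Anchor width and height in pixels
w_a = math.sqrt(r * s * w * h)
h_a = math.sqrt(s * w * h / r)

# Anchor width and height normalized by the image width and height
w_as = math.sqrt(r * s * h / w)
h_as = math.sqrt(s * w / (r * h))

print(math.isclose(w_a * h_a, s * w * h))  # True: the area fraction is s
print(math.isclose(w_a / h_a, r))          # True: the aspect ratio is r
print(math.isclose(w_as * w, w_a), math.isclose(h_as * h, h_a))  # True True
```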

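# Generate multiple anchor boxes
# Suppose the input image has height h and width w.
# Anchor boxes of different shapes are centered on every pixel:
# scale s in (0, 1], aspect ratio r > 0,
# with anchor width sqrt(rswh) and height sqrt(swh/r).
# With scales [s1, s2,..., sn] and aspect ratios [r1, r2,..., rm], using every
# combination would give w*h*n*m anchor boxes for the input image.
# To reduce complexity, only combinations containing s1 or r1 are kept, so each
# pixel gets n+m-1 anchor boxes, w*h*(n+m-1) in total.

def multibox_prior(data, sizes, ratios):
    """
    Generate anchor boxes of different shapes centered on each pixel.
    Args:
        data: tensor of shape (batch size, channels, height, width)
        sizes: 1-D sequence of scales
        ratios: 1-D sequence of aspect ratios (width to height)
    """
    
    # Image height and width, device, and the numbers of scales and aspect ratios
    in_height, in_width = data.shape[-2:]
    device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)
    
    # The scale and ratio lists determine the number of anchor boxes per pixel;
    # convert both lists to tensors
    boxes_per_pixel = (num_sizes+num_ratios-1)
    size_tensor = torch.tensor(sizes, device=device)
    ratio_tensor = torch.tensor(ratios, device=device)
    
    
    # An offset is needed to move each anchor point to the center of its pixel.
    # A pixel has height 1 and width 1, so the center is offset by 0.5
    offset_h, offset_w = 0.5, 0.5
    # Scaling factors that normalize by the image height and width
    steps_h = 1.0 / in_height  # step along the y-axis
    steps_w = 1.0 / in_width  # step along the x-axis
    
    
    # Each pixel is 1 x 1, so adding the 0.5 offset to every pixel index
    # gives the coordinate of the pixel center.
    # Multiplying by the step normalizes it to (0, 1).
    # This yields the x and y coordinates of all anchor box centers
    center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
    center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w
    
    # Build the grids of center y- and x-coordinates;
    # shift_y and shift_x both have shape (in_height, in_width)
    shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing='ij')
    # Flatten both to 1-D tensors of shape (in_height * in_width,),
    # giving the x-coordinate shift_x and y-coordinate shift_y of every center
    shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)
    
    
    # Widths and heights of the boxes_per_pixel anchor boxes,
    # later used to build the corner coordinates (xmin, ymin, xmax, ymax).
    # As written, the ratio terms of w apply in_height/in_width outside the
    # square root, whereas the text places that factor inside it; the sample
    # output shown after this function was produced by the code as written
    w = torch.cat((torch.sqrt(size_tensor * ratio_tensor[0]*in_height/in_width),
                   torch.sqrt(sizes[0] * ratio_tensor[1:])*in_height/in_width))
    h = torch.cat((torch.sqrt(size_tensor * in_width / (ratio_tensor[0] * in_height)),
                   torch.sqrt(sizes[0]  * in_width / (ratio_tensor[1:] * in_height))))

    # Build (-w, -h, w, h) for every anchor shape and repeat it for every center;
    # dividing by 2 gives half-widths and half-heights.
    # anchor_manipulations has shape (boxes_per_pixel * in_height * in_width, 4)
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
                                        in_height * in_width, 1) / 2    
    
    # Every center has boxes_per_pixel anchor boxes.
    # Stack [shift_x, shift_y, shift_x, shift_y] as columns of a 2-D tensor,
    # then repeat each row boxes_per_pixel times along dim 0.
    # The result is the grid of all anchor centers, each repeated boxes_per_pixel times.
    # out_grid has shape (boxes_per_pixel * in_height * in_width, 4)
    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
                dim=1).repeat_interleave(boxes_per_pixel, dim=0)

    # Adding and subtracting the half-width and half-height around each center
    # gives the top-left and bottom-right corners of every anchor box
    output = out_grid + anchor_manipulations
    
    # Return output with shape (1, number of anchor boxes, 4)
    return output.unsqueeze(0)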
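
To make the center computation concrete, here is a minimal plain-Python sketch on a hypothetical 2 x 4 pixel grid (the grid size is an assumption for illustration). It mirrors the `(index + 0.5) * step` logic and the row-major flattening used in the function.

```python
# Hypothetical tiny "image": 2 pixels high, 4 pixels wide
in_height, in_width = 2, 4

# Pixel i spans [i, i + 1), so its center is at i + 0.5; dividing by the
# image size normalizes the coordinate into (0, 1)
center_h = [(i + 0.5) / in_height for i in range(in_height)]
center_w = [(j + 0.5) / in_width for j in range(in_width)]
print(center_h)  # [0.25, 0.75]
print(center_w)  # [0.125, 0.375, 0.625, 0.875]

# Row-major list of (x, y) centers: y varies slowest, x fastest,
# which is the order obtained by flattening the two grids
centers = [(x, y) for y in center_h for x in center_w]
print(len(centers))  # 8
print(centers[4])    # (0.125, 0.75): first pixel of the second row
```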

Test the function defined above:

d2l.set_figsize()
img = d2l.plt.imread('./img/catdog.jpg')
h, w = img.shape[:2]

print(h, w)
X = torch.rand(size=(1, 3, h, w))
Y = multibox_prior(X, sizes=[0.75, 0.5, 0.25], ratios=[1, 2, 0.5])
boxes = Y.reshape(h, w, 5, 4)
print(boxes[250, 250, :, :])

Output:

tensor([[-0.04, -0.05,  0.72,  0.94],
        [ 0.03,  0.04,  0.65,  0.85],
        [ 0.12,  0.16,  0.56,  0.73],
        [-0.13,  0.10,  0.82,  0.80],
        [ 0.11, -0.25,  0.58,  1.14]])
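
The five rows correspond to the n + m - 1 = 5 (scale, ratio) combinations kept per pixel. The sketch below, in plain Python, enumerates those combinations for the `sizes` and `ratios` used in the test and counts the anchors for a hypothetical 4 x 6 pixel input (that input size is an assumption for illustration).

```python
sizes = [0.75, 0.5, 0.25]
ratios = [1, 2, 0.5]

# Every scale paired with the first ratio, then the first scale paired with
# each remaining ratio: n + m - 1 combinations in total
pairs = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]
for s, r in pairs:
    print(f"s={s}, r={r}")

# Total number of anchor boxes for a hypothetical 4 x 6 pixel input
h, w = 4, 6
total = h * w * len(pairs)
print(len(pairs), total)  # 5 120
```

The order of `pairs` matches the order of the rows for each pixel: the scale-varying boxes come first, followed by the ratio-varying ones.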

Define a function that draws all anchor boxes centered on a given pixel:

def show_bboxes(axes, bboxes, labels=None, colors=None):
    """
    Draw all bounding boxes centered on a given pixel.
    Args:
        axes: matplotlib axes to draw on
        bboxes: tensor holding the coordinates of all anchor boxes of one pixel
        labels: list of box labels
        colors: list of box colors
    """
    def _make_list(obj, default_values=None):
        if obj is None:
            obj = default_values
        elif not isinstance(obj, (list, tuple)):
            obj = [obj]
        return obj

    labels = _make_list(labels)
    colors = _make_list(colors, ['b', 'g', 'r', 'm', 'c'])
    for i, bbox in enumerate(bboxes):
        # Draw the box
        color = colors[i % len(colors)]
        rect = bbox_to_rect(bbox.detach().numpy(), color)
        axes.add_patch(rect)
        
        # Add the box label
        if labels and len(labels) > i:
            text_color = 'k' if color == 'w' else 'w'
            axes.text(rect.xy[0], rect.xy[1], labels[i],
                      va='center', ha='center', fontsize=9, color=text_color,
                      bbox=dict(facecolor=color, lw=0))


# Test the function:
d2l.set_figsize()
bbox_scale = torch.tensor((w, h, w, h))
fig = d2l.plt.imshow(img)
show_bboxes(fig.axes, boxes[250, 250, :, :] * bbox_scale,
            ['s=0.75, r=1', 's=0.5, r=1', 's=0.25, r=1', 's=0.75, r=2',
             's=0.75, r=0.5'])

Output:
[Figure: the five anchor boxes centered on pixel (250, 250), drawn on the example image]

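3. Measuring the Similarity Between Anchor Boxes and Ground-Truth Boxes: Intersection over Union (IoU)

How can the similarity between an anchor box and a ground-truth bounding box be quantified?
The Jaccard index measures the similarity between two sets A and B: J(A,B) = |A∩B| / |A∪B|.
Intersection over union (IoU): treating each bounding box as a set of pixels, the Jaccard index of two bounding boxes measures their similarity. This quantity is called the
intersection over union (IoU): the ratio of the area of the two boxes' intersection to the area of their union, with values between 0 and 1.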

# Intersection over union
# The IoU of two boxes is the Jaccard index J(A,B) = |A∩B| / |A∪B| of their
# pixel sets: intersection area divided by union area, a value between 0 and 1.
def box_iou(boxes1, boxes2):
    """Compute pairwise IoU across two lists of anchor or bounding boxes."""
    
    # Area of each anchor or bounding box
    box_area = lambda boxes: ((boxes[:, 2] - boxes[:, 0])*
                              (boxes[:, 3] - boxes[:, 1]))
    # Shapes of boxes1, boxes2, areas1, areas2:
    # boxes1: (number of boxes1, 4)
    # boxes2: (number of boxes2, 4)
    # areas1: (number of boxes1,)
    # areas2: (number of boxes2,)
    areas1 = box_area(boxes1)
    areas2 = box_area(boxes2)
    # inter_upperlefts, inter_lowerrights and inters all have shape
    # (number of boxes1, number of boxes2, 2)
    # If two boxes overlap, inter_upperlefts holds the top-left corner of the overlap
    # and inter_lowerrights holds its bottom-right corner,
    # so inter_lowerrights - inter_upperlefts is positive.
    # If two boxes do not overlap, that difference is negative in at least one coordinate.
    # clamp(min=0) limits tensor elements to a range;
    # here it sets the negative differences to zero.
    # The last dimension of inters holds the width and height of the overlap
    inter_upperlefts = torch.max(boxes1[:, None, :2], boxes2[:, :2])
    inter_lowerrights = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])
    inters = (inter_lowerrights - inter_upperlefts).clamp(min=0)
    
    # Use the differences in inters to compute the intersection area inter_areas
    # and the union area union_areas, both of shape
    # (number of boxes1, number of boxes2)
    inter_areas = inters[:, :, 0] * inters[:, :, 1]
    union_areas = areas1[:, None] + areas2 - inter_areas
    return inter_areas / union_areas
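
For intuition, the same computation can be written for a single pair of boxes in plain Python. This is a minimal sketch, not the batched implementation above; the function name `iou` and the three test boxes are assumptions made for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    # Width and height of the overlap, clamped at 0 when the boxes are disjoint
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7: 1/7
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: 1.0
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes: 0.0
```

Two 2 x 2 squares offset by one unit share a 1 x 1 overlap, so the IoU is 1 / (4 + 4 - 1) = 1/7.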

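4. Labeling Anchor Boxes in Training Data

In the training set:
Each anchor box is treated as a training example. To train an object detection model, every anchor box needs a class label (the class of the object associated with the anchor box) and an offset label (the offset of the ground-truth bounding box relative to the anchor box).

Given an image, let the anchor boxes be $A_1,A_2,...,A_{n_a}$ and the ground-truth bounding boxes be $B_1,B_2,...,B_{n_b}$, where $n_a \geq n_b$.
Define a matrix $X \in \mathbb{R}^{n_a \times n_b}$ whose element $x_{ij}$ in row $i$ and column $j$ is the IoU of anchor box $A_i$ and ground-truth bounding box $B_j$.
The algorithm proceeds as follows:

1. Find the largest element of $X$ and denote its row and column indices by $i_1$ and $j_1$. Assign the ground-truth bounding box $B_{j_1}$ to the anchor box $A_{i_1}$, because $A_{i_1}$ and $B_{j_1}$ are the closest of all anchor box and ground-truth box pairs. After this first assignment, discard all elements in row $i_1$ and column $j_1$ of the matrix.
2. Find the largest of the remaining elements of $X$ and denote its row and column indices by $i_2$ and $j_2$. Assign the ground-truth bounding box $B_{j_2}$ to the anchor box $A_{i_2}$, and discard all elements in row $i_2$ and column $j_2$ of the matrix.
3. At this point, the elements in two rows and two columns have been discarded. Continue in the same way until all elements in all $n_b$ columns of $X$ have been discarded. Each of $n_b$ anchor boxes has now been assigned a ground-truth bounding box.
4. Traverse only the remaining $n_a-n_b$ anchor boxes. For any such anchor box $A_i$, find in row $i$ of $X$ the ground-truth bounding box $B_j$ with the largest IoU with $A_i$, and assign $B_j$ to $A_i$ only if this IoU exceeds a predefined threshold.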

# Implementation: assign ground-truth bounding boxes to anchor boxes
def assign_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5):
    """Assign the closest ground-truth bounding box to each anchor box."""
    
    num_anchors, num_gt_boxes = anchors.shape[0], ground_truth.shape[0]
    # Element x_ij in row i and column j is the IoU of anchor box i and ground-truth box j
    jaccard = box_iou(anchors, ground_truth)
    # Ground-truth index assigned to each anchor box, initialized to -1 (background)
    anchors_bbox_map = torch.full((num_anchors,), -1, dtype=torch.long,
                                  device=device)
    
    # Threshold-based assignment
    # Row-wise maximum of jaccard: max_ious (1-D tensor) holds the maxima and
    # indices (1-D tensor) their positions along dim=1
    max_ious, indices = torch.max(jaccard, dim=1)
    # anc_i: 1-D tensor of anchor indices (dim=0 of jaccard) whose best IoU
    # reaches iou_threshold
    anc_i = torch.nonzero(max_ious >= iou_threshold).reshape(-1)
    # box_j: 1-D tensor of the matching ground-truth indices (dim=1 of jaccard)
    box_j = indices[max_ious >= iou_threshold]
    
    # Label these anchors with the index of their best-matching ground-truth box
    anchors_bbox_map[anc_i] = box_j
    
    col_discard = torch.full((num_anchors,), -1)
    row_discard = torch.full((num_gt_boxes,), -1)
    for _ in range(num_gt_boxes):
        # Locate the largest remaining entry of jaccard: (anc_idx, box_idx)
        max_idx = torch.argmax(jaccard)
        box_idx = (max_idx % num_gt_boxes).long()
        anc_idx = (max_idx / num_gt_boxes).long()
        
        # Assign that ground-truth box to the anchor
        anchors_bbox_map[anc_idx] = box_idx
        # Discard the matched row and column by setting them to -1
        jaccard[:, box_idx] = col_discard
        jaccard[anc_idx, :] = row_discard
        
    # anchors_bbox_map: 1-D tensor of shape (num_anchors,) holding the assigned
    # ground-truth index of each anchor (-1 for background)
    return anchors_bbox_map
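
To trace the assignment procedure on concrete numbers, here is a minimal plain-Python sketch that follows the order of the textual description (greedy matching first, thresholding afterwards). The function name `assign` and the IoU matrix are hypothetical; the comparison uses `>=` to match the implementation above.

```python
def assign(iou, threshold=0.5):
    """iou[i][j] is the IoU of anchor i and ground-truth box j."""
    n_a, n_b = len(iou), len(iou[0])
    assigned = [-1] * n_a              # -1 means background
    rows, cols = set(range(n_a)), set(range(n_b))

    # Greedy phase: repeatedly take the largest remaining entry, assign it,
    # then discard its row and column
    while cols:
        i, j = max(((i, j) for i in rows for j in cols),
                   key=lambda ij: iou[ij[0]][ij[1]])
        assigned[i] = j
        rows.discard(i)
        cols.discard(j)

    # Threshold phase: each remaining anchor takes its best ground-truth box
    # only if the IoU reaches the threshold
    for i in rows:
        j = max(range(n_b), key=lambda j: iou[i][j])
        if iou[i][j] >= threshold:
            assigned[i] = j
    return assigned

# Hypothetical IoU matrix: 4 anchors (rows), 2 ground-truth boxes (columns)
X = [[0.10, 0.60],
     [0.70, 0.20],
     [0.55, 0.30],
     [0.05, 0.10]]
print(assign(X))  # [1, 0, 0, -1]
```

Anchor 1 takes box 0 (IoU 0.70, the global maximum), anchor 0 then takes box 1 (0.60), anchor 2 passes the threshold with box 0 (0.55), and anchor 3 stays background.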

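Once each anchor box has been assigned a ground-truth bounding box, its class label is determined as well. What remains is to label its offset relative to the assigned ground-truth bounding box.
Labeling offsets
To label the class and offset of each anchor box, suppose an anchor box A has been assigned a ground-truth bounding box B. The class of A is then labeled as that of B, and the offset of A is labeled according to the relative position of the centers of B and A together with the relative sizes of the two boxes. Intuitively, one could use the raw coordinate differences between the anchor box and the ground-truth box as the offset, but such labels are hard to learn. Because boxes in a dataset vary in position and size, a transformation is applied to obtain offsets that are more uniformly distributed and easier to fit. Given boxes A and B with center coordinates $(x_a, y_a)$ and $(x_b, y_b)$, widths $w_a$ and $w_b$, and heights $h_a$ and $h_b$, the offset of A is labeled as
$\left(\frac{\frac{x_b-x_a}{w_a} - \mu_x}{\sigma_x}, \frac{\frac{y_b-y_a}{h_a} - \mu_y}{\sigma_y}, \frac{\log \frac{w_b}{w_a} - \mu_w}{\sigma_w},\frac{\log \frac{h_b}{h_a} - \mu_h}{\sigma_h}\right)$
where the constants default to $\mu_x=\mu_y=\mu_w=\mu_h=0$ , $\sigma_x=\sigma_y=0.1, \sigma_w=\sigma_h=0.2$ .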

# Offset computation
def offset_boxes(anchors, assigned_bb, eps=1e-6):
    """Transform anchor boxes and assigned boxes into offset labels."""
    # Convert both sets of boxes to (center x, center y, width, height)
    c_anc = d2l.box_corner_to_center(anchors)
    c_assigned_bb = d2l.box_corner_to_center(assigned_bb)
    
    # Compute the offsets; 10 = 1/sigma_x = 1/sigma_y and 5 = 1/sigma_w = 1/sigma_h
    offset_xy = 10 * (c_assigned_bb[:, :2] - c_anc[:, :2]) / c_anc[:, 2:]
    offset_wh = 5 * torch.log(eps + c_assigned_bb[:, 2:] / c_anc[:, 2:])
    
    offset = torch.cat([offset_xy, offset_wh], dim=1)
    return offset
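
The offset formula can be evaluated by hand for one pair of boxes. The sketch below uses plain Python and the default constants from the text; the helper names `to_center` and `offsets` are hypothetical, and the small `eps` term of the implementation is omitted. The two boxes are the second anchor box and the dog ground-truth box of the test further below.

```python
import math

def to_center(box):
    """(x1, y1, x2, y2) -> (cx, cy, w, h)"""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1

def offsets(anchor, gt, sigma_xy=0.1, sigma_wh=0.2):
    """Offset label of `anchor` relative to `gt`, with all means set to 0."""
    xa, ya, wa, ha = to_center(anchor)
    xb, yb, wb, hb = to_center(gt)
    return ((xb - xa) / wa / sigma_xy,
            (yb - ya) / ha / sigma_xy,
            math.log(wb / wa) / sigma_wh,
            math.log(hb / ha) / sigma_wh)

anchor = (0.15, 0.2, 0.4, 0.4)   # second anchor box of the test below
gt = (0.1, 0.08, 0.52, 0.92)     # dog ground-truth box of the test below
print([round(v, 2) for v in offsets(anchor, gt)])  # [1.4, 10.0, 2.59, 7.18]
```

These four values agree with the second group of offsets (1.40e+00, 1.00e+01, 2.59e+00, 7.18e+00) in the output of the multibox_target test.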

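If an anchor box is not assigned a ground-truth bounding box, its class is simply labeled as "background". Anchor boxes labeled as background are usually called negative anchor boxes, and the rest are called positive anchor boxes. The multibox_target function below labels the classes and offsets of anchor boxes (the anchors argument) using ground-truth bounding boxes (the labels argument). It sets the background class index to zero and increments the integer index of every other class by one.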

#@save
def multibox_target(anchors, labels):
    """Label anchor boxes using ground-truth bounding boxes."""
    
    # labels: shape (batch_size, num_gt_boxes, 5), each row is (class, x1, y1, x2, y2)
    # anchors: shape (1, num_anchors, 4), shared by every image in the batch
    batch_size, anchors = labels.shape[0], anchors.squeeze(0)
    batch_offset, batch_mask, batch_class_labels = [], [], []
    device, num_anchors = anchors.device, anchors.shape[0]
    
    # Process the images in the batch one at a time
    for i in range(batch_size):
        
        # Assign ground-truth bounding boxes to the anchor boxes of this image
        label = labels[i, :, :]
        anchors_bbox_map = assign_anchor_to_bbox(
            label[:, 1:], anchors, device)
        
        # Mask out negative anchor boxes with 0; bbox_mask has shape (num_anchors, 4)
        bbox_mask = ((anchors_bbox_map >= 0).float().unsqueeze(-1)).repeat(
            1, 4)
        
        # Initialize class labels and assigned box coordinates to zero
        class_labels = torch.zeros(num_anchors, dtype=torch.long,
                                   device=device)
        assigned_bb = torch.zeros((num_anchors, 4), dtype=torch.float32,
                                  device=device)
        
        # Label the anchor box classes using the assigned ground-truth boxes.
        # An anchor box without an assignment is labeled as background (value 0)
        indices_true = torch.nonzero(anchors_bbox_map >= 0)
        bb_idx = anchors_bbox_map[indices_true]
        class_labels[indices_true] = label[bb_idx, 0].long() + 1
        assigned_bb[indices_true] = label[bb_idx, 1:]
        
        # Offset transformation
        offset = offset_boxes(anchors, assigned_bb) * bbox_mask
        batch_offset.append(offset.reshape(-1))
        batch_mask.append(bbox_mask.reshape(-1))
        batch_class_labels.append(class_labels)
    bbox_offset = torch.stack(batch_offset)
    bbox_mask = torch.stack(batch_mask)
    class_labels = torch.stack(batch_class_labels)
    return (bbox_offset, bbox_mask, class_labels)

Test:
Define the ground-truth labels and anchor boxes for an image:

ground_truth = torch.tensor([[0, 0.1, 0.08, 0.52, 0.92],
                         [1, 0.55, 0.2, 0.9, 0.88]])
anchors = torch.tensor([[0, 0.1, 0.2, 0.3], [0.15, 0.2, 0.4, 0.4],
                    [0.63, 0.05, 0.88, 0.98], [0.66, 0.45, 0.8, 0.8],
                    [0.57, 0.3, 0.92, 0.9]])

fig = d2l.plt.imshow(img)
show_bboxes(fig.axes, ground_truth[:, 1:] * bbox_scale, ['dog', 'cat'], 'k')
show_bboxes(fig.axes, anchors * bbox_scale, ['0', '1', '2', '3', '4']);

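[Figure: the ground-truth boxes (dog, cat) and the five anchor boxes 0 to 4, drawn on the example image]
Test the multibox_target function defined above: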

labels = multibox_target(anchors.unsqueeze(dim=0),
                         ground_truth.unsqueeze(dim=0))

for i in labels:
    print(i)

Output:

tensor([[-0.00e+00, -0.00e+00, -0.00e+00, -0.00e+00,  1.40e+00,  1.00e+01,
          2.59e+00,  7.18e+00, -1.20e+00,  2.69e-01,  1.68e+00, -1.57e+00,
         -0.00e+00, -0.00e+00, -0.00e+00, -0.00e+00, -5.71e-01, -1.00e+00,
          4.17e-06,  6.26e-01]])
tensor([[0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 1., 1.,
         1., 1.]])
tensor([[0, 1, 2, 0, 2]])

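The first element of labels holds the four offsets labeled for each anchor box, with shape (batch_size, 4*num_anchors). The second element is the mask, with shape (batch_size, 4*num_anchors); its entries correspond one-to-one to the four offsets of each anchor box. Offsets of negative anchor boxes should not affect the objective function, so their mask entries are 0. The third element holds the class labels assigned to the anchor boxes, with shape (batch_size, num_anchors).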
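
The role of the mask can be illustrated with a small plain-Python sketch. It assumes an L1 loss on the offsets, which is only one possible choice and is not specified in this article; the predictions are hypothetical, while the targets and mask are the first eight entries of the output above.

```python
# Offsets and mask for the first two anchor boxes of the output above
target = [0.0, 0.0, 0.0, 0.0, 1.40, 10.0, 2.59, 7.18]
mask = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

# Hypothetical model predictions, for illustration only
pred = [0.3, -0.2, 0.1, 0.5, 1.0, 9.0, 2.0, 7.0]

# Masked L1 loss (an assumed loss): errors on background anchors are
# multiplied by 0 and therefore contribute nothing
loss = sum(abs(p * m - t * m) for p, t, m in zip(pred, target, mask))
print(round(loss, 2))  # 2.17
```

Only the second anchor box contributes (0.40 + 1.00 + 0.59 + 0.18); the nonzero predictions for the first, background anchor box are ignored.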

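5. Predicting Bounding Boxes with Non-Maximum Suppression

At prediction time, multiple anchor boxes are generated for each image, the class and offset of every anchor box are predicted, the anchor boxes are adjusted by the predicted offsets to obtain predicted bounding boxes, and finally only the predicted bounding boxes that meet certain criteria are output.
Converting an anchor box and its predicted offsets into a predicted bounding box is the inverse of the offset transformation described above.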

def offset_inverse(anchors, offset_preds):
    """Predict bounding boxes from anchor boxes and their predicted offsets."""
    anc = d2l.box_corner_to_center(anchors)
    pred_bbox_xy = (offset_preds[:, :2] * anc[:, 2:] / 10) + anc[:, :2]
    pred_bbox_wh = torch.exp(offset_preds[:, 2:] / 5) * anc[:, 2:]
    pred_bbox = torch.cat((pred_bbox_xy, pred_bbox_wh), dim=1)
    predicted_bbox = d2l.box_center_to_corner(pred_bbox)
    return predicted_bbox
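
That this is indeed the inverse of the offset transformation can be checked with a round trip. The following plain-Python sketch works on a single box in center form; the function name `decode` and the two boxes are assumptions for illustration, and the constants 0.1 and 0.2 are the default standard deviations from the text.

```python
import math

def decode(anchor_c, offset, sigma_xy=0.1, sigma_wh=0.2):
    """Invert the offset transform; anchor_c is (cx, cy, w, h)."""
    xa, ya, wa, ha = anchor_c
    ox, oy, ow, oh = offset
    return (ox * sigma_xy * wa + xa,
            oy * sigma_xy * ha + ya,
            math.exp(ow * sigma_wh) * wa,
            math.exp(oh * sigma_wh) * ha)

anchor_c = (0.275, 0.3, 0.25, 0.2)   # an anchor box in center form
gt_c = (0.31, 0.5, 0.42, 0.84)       # the box we expect to recover

# Encode gt_c relative to anchor_c, then decode it again
offset = ((gt_c[0] - anchor_c[0]) / anchor_c[2] / 0.1,
          (gt_c[1] - anchor_c[1]) / anchor_c[3] / 0.1,
          math.log(gt_c[2] / anchor_c[2]) / 0.2,
          math.log(gt_c[3] / anchor_c[3]) / 0.2)
decoded = decode(anchor_c, offset)
print(all(math.isclose(d, g) for d, g in zip(decoded, gt_c)))  # True
```

Multiplying by 0.1 and 0.2 here corresponds to the divisions by 10 and 5 in offset_inverse.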

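When there are many anchor boxes, the model may output many similar, heavily overlapping predicted bounding boxes around the same object. To simplify the output, non-maximum suppression (NMS) merges similar predicted bounding boxes that belong to the same object.
Non-maximum suppression: for a predicted bounding box B, the object detection model computes the predicted probability of each class. Let p be the largest predicted probability; the class corresponding to p is the predicted class of B, and p is called the confidence of the predicted bounding box B. All predicted non-background bounding boxes are sorted by confidence in descending order to form a list L, which is then processed as follows:

Select the predicted bounding box $B_1$ with the highest confidence in L as a reference, and remove from L all non-reference predicted bounding boxes whose IoU with $B_1$ exceeds a predefined threshold $\epsilon$. L now keeps the predicted bounding box with the highest confidence and drops the others that are too similar to it. In short, the boxes with non-maximum confidence are suppressed.
Select the predicted bounding box $B_2$ with the second-highest confidence in L as another reference, and remove from L all non-reference predicted bounding boxes whose IoU with $B_2$ exceeds $\epsilon$.
Repeat the process until every predicted bounding box in L has served as a reference. At that point, the IoU of any pair of predicted bounding boxes in L is no greater than the threshold $\epsilon$, so no two boxes are overly similar.
Output all predicted bounding boxes in the list L.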

def nms(boxes, scores, iou_threshold):
    """Sort predicted boxes by confidence and return the indices kept by NMS."""
    B = torch.argsort(scores, dim=-1, descending=True)
    keep = []  # indices of the predicted bounding boxes to keep
    while B.numel() > 0:
        i = B[0]
        keep.append(i)
        if B.numel() == 1: break
        iou = box_iou(boxes[i, :].reshape(-1, 4),
                      boxes[B[1:], :].reshape(-1, 4)).reshape(-1)
        inds = torch.nonzero(iou <= iou_threshold).reshape(-1)
        B = B[inds + 1]
    return torch.tensor(keep, device=boxes.device)
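
The same procedure can be traced in plain Python on a handful of boxes. This is a minimal sketch, not the tensor implementation above: the helper `iou` is a hypothetical single-pair version of box_iou, and the four boxes with their confidences are the ones used in the example later in this section.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, iou_threshold):
    # Indices sorted by descending confidence (ties keep their original order)
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)          # highest remaining confidence
        keep.append(best)
        # Drop every remaining box that overlaps the reference box too much
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(0.1, 0.08, 0.52, 0.92), (0.08, 0.2, 0.56, 0.95),
         (0.15, 0.3, 0.62, 0.91), (0.55, 0.2, 0.9, 0.88)]
scores = [0.9, 0.8, 0.7, 0.9]
print(nms(boxes, scores, 0.5))  # [0, 3]
```

Boxes 1 and 2 have an IoU of roughly 0.74 and 0.55 with box 0 and are suppressed; box 3 does not overlap box 0 at all and is kept.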


# The multibox_detection function below applies non-maximum suppression
# to the predicted bounding boxes.
#@save
def multibox_detection(cls_probs, offset_preds, anchors, nms_threshold=0.5,
                       pos_threshold=0.009999999):
    """Predict bounding boxes using non-maximum suppression."""
    device, batch_size = cls_probs.device, cls_probs.shape[0]
    anchors = anchors.squeeze(0)
    num_classes, num_anchors = cls_probs.shape[1], cls_probs.shape[2]
    out = []
    for i in range(batch_size):
        cls_prob, offset_pred = cls_probs[i], offset_preds[i].reshape(-1, 4)
        conf, class_id = torch.max(cls_prob[1:], 0)
        predicted_bb = offset_inverse(anchors, offset_pred)
        keep = nms(predicted_bb, conf, nms_threshold)

        # Find all non_keep indices and set their class to background
        all_idx = torch.arange(num_anchors, dtype=torch.long, device=device)
        combined = torch.cat((keep, all_idx))
        uniques, counts = combined.unique(return_counts=True)
        non_keep = uniques[counts == 1]
        all_id_sorted = torch.cat((keep, non_keep))
        class_id[non_keep] = -1
        class_id = class_id[all_id_sorted]
        conf, predicted_bb = conf[all_id_sorted], predicted_bb[all_id_sorted]
        # pos_threshold is the confidence threshold for non-background predictions
        below_min_idx = (conf < pos_threshold)
        class_id[below_min_idx] = -1
        conf[below_min_idx] = 1 - conf[below_min_idx]
        pred_info = torch.cat((class_id.unsqueeze(1),
                               conf.unsqueeze(1),
                               predicted_bb), dim=1)
        out.append(pred_info)
    return torch.stack(out)
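
The step that finds the suppressed indices relies on a counting trick: after concatenating the kept indices with the full index range, kept indices occur twice and suppressed ones exactly once. The plain-Python sketch below reproduces that idea with `collections.Counter`; the values of `keep` and `num_anchors` are assumptions chosen for illustration.

```python
from collections import Counter

num_anchors = 4
keep = [0, 3]                       # indices kept by NMS (illustrative)

# Concatenate keep with all indices: kept indices then appear twice,
# suppressed indices exactly once
combined = keep + list(range(num_anchors))
counts = Counter(combined)
non_keep = sorted(i for i, c in counts.items() if c == 1)
print(non_keep)  # [1, 2]

# Kept boxes come first, suppressed boxes afterwards
all_id_sorted = keep + non_keep
print(all_id_sorted)  # [0, 3, 1, 2]
```

This ordering is why the rows of the returned tensor list the kept predictions first, followed by the rows whose class was set to -1.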

Example:

anchors = torch.tensor([[0.1, 0.08, 0.52, 0.92], [0.08, 0.2, 0.56, 0.95],
                      [0.15, 0.3, 0.62, 0.91], [0.55, 0.2, 0.9, 0.88]])
offset_preds = torch.tensor([0] * anchors.numel())
cls_probs = torch.tensor([[0] * 4,  # predicted background probability
                      [0.9, 0.8, 0.7, 0.1],  # predicted dog probability
                      [0.1, 0.2, 0.3, 0.9]])  # predicted cat probability

# Without non-maximum suppression
fig = d2l.plt.imshow(img)
show_bboxes(fig.axes, anchors * bbox_scale,
            ['dog=0.9', 'dog=0.8', 'dog=0.7', 'cat=0.9'])

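[Figure: the four predicted boxes (dog=0.9, dog=0.8, dog=0.7, cat=0.9) before non-maximum suppression]
The multibox_detection function can now be called to perform non-maximum suppression, with the threshold set to 0.5.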

output = multibox_detection(cls_probs.unsqueeze(dim=0),
                            offset_preds.unsqueeze(dim=0),
                            anchors.unsqueeze(dim=0),
                            nms_threshold=0.5)

fig = d2l.plt.imshow(img)
for i in output[0].detach().numpy():
    if i[0] == -1:
        continue
    label = ('dog=', 'cat=')[int(i[0])] + str(i[1])
    show_bboxes(fig.axes, [torch.tensor(i[2:]) * bbox_scale], label)

[Figure: the predicted boxes kept after non-maximum suppression, drawn on the example image]

Reference: Dive into Deep Learning (动手学深度学习)
