Model Deployment: Anchor-Based Decoding and Post-Processing for YOLOv5, YOLOv7, and Similar Models

Ceri

Last modified: 2024-01-04 16:43:23

Category: Model Deployment. Tags: YOLO, deep learning, artificial intelligence

First published: 2023-12-20 15:19:22

Article link: https://blog.csdn.net/qq_38806886/article/details/135108289

Note: this article is a work in progress; please do not repost it for now.

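Contents

Preface
1. Scale the anchors to each feature map
2. Reshape the input
3. Apply sigmoid to the outputs x, y, w, h, conf, pred
4. Build the grids: prior-box centers at the top-left corner of each cell
5. Process the anchors
6. Decode the outputs of step 3 into new x, y, w, h
7. Combine the outputs
8. Score filtering and non-maximum suppression
Summary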

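Preface

Assume a 640 * 640 input image. After the FPN-PAN structure, the network produces feature maps at three scales, 20 * 20, 40 * 40, and 80 * 80:
The 20 * 20 feature map uses the anchors [116,90], [156,198], [373,326];
The 40 * 40 feature map uses the anchors [30,61], [62,45], [59,119];
The 80 * 80 feature map uses the anchors [10,13], [16,30], [33,23];
The downsampling strides of the three feature maps are 32, 16, and 8, respectively.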

We start with the operations on the 20 * 20 feature map.

1. Scale the anchors to each feature map

Computation: [116,90], [156,198], [373,326] / 32
Result: scaled_anchors = [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]
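
As a quick check of the arithmetic above, the following minimal sketch divides the 20 * 20 anchors by the stride. The variable names (`anchors_20`, `input_size`, `feature_size`) are illustrative and not taken from the original code.

```python
# Anchors of the 20 x 20 feature map, in input-image pixels (from the preface).
anchors_20 = [(116, 90), (156, 198), (373, 326)]
input_size = 640      # assumed square input
feature_size = 20

# Stride: how many input pixels one feature-map cell covers.
stride = input_size / feature_size

# Express each anchor in feature-map cells instead of pixels.
scaled_anchors = [(w / stride, h / stride) for w, h in anchors_20]

print(stride)          # 32.0
print(scaled_anchors)  # [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]
```

The same division with strides 16 and 8 gives the scaled anchors of the 40 * 40 and 80 * 80 feature maps.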

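2. Reshape the input

Original shape: batch_size, 3 * (4 + 1 + num_classes), 20, 20
Reshaped output: batch_size, 3 * (4 + 1 + num_classes), 20, 20 => batch_size, 3, 4 + 1 + num_classes, 20, 20 => batch_size, 3, 20, 20, 4 + 1 + num_classes, i.e. [4, 3, 20, 20, 85] for batch_size = 4 and num_classes = 80.
Note: the 255-channel dimension is laid out as 3 * 85 (anchor first, then the 85 attributes), not as 85 * 3. When converting, first reshape 255 into (3, 85) and only then permute the axes. This needs particular care when writing board-side (on-device) code.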
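
The difference between the two layouts can be seen in a small NumPy sketch. It is only an illustration under the assumptions batch_size = 4 and num_classes = 80; `raw`, `good`, and `bad` are hypothetical names, and the original code uses `torch` `view`/`permute` rather than NumPy.

```python
import numpy as np

batch_size, num_anchors, num_classes, grid = 4, 3, 80, 20
attrs = 4 + 1 + num_classes  # 85

# Stand-in for the raw head output of shape [4, 255, 20, 20].
raw = np.arange(batch_size * num_anchors * attrs * grid * grid, dtype=np.float32)
raw = raw.reshape(batch_size, num_anchors * attrs, grid, grid)

# Correct: split 255 into (3, 85), then move the 85 attributes to the last axis.
good = raw.reshape(batch_size, num_anchors, attrs, grid, grid).transpose(0, 1, 3, 4, 2)

# Wrong: splitting 255 into (85, 3) mixes attributes of different anchors.
bad = raw.reshape(batch_size, attrs, num_anchors, grid, grid).transpose(0, 2, 3, 4, 1)

print(good.shape)  # (4, 3, 20, 20, 85)
# Attribute k of anchor a at cell (row, col) must come from channel a * 85 + k.
print(good[0, 1, 5, 7, 2] == raw[0, 1 * attrs + 2, 5, 7])  # True
print(np.array_equal(good, bad))  # False
```

Both variants produce the same shape, which is why a wrong split is easy to miss when only shapes are checked.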

3. Apply sigmoid to the outputs x, y, w, h, conf, pred

x = sigmoid(output[..., 0])     shape: [4, 3, 20, 20]
y = sigmoid(output[..., 1])     shape: [4, 3, 20, 20]
w = sigmoid(output[..., 2])     shape: [4, 3, 20, 20]
h = sigmoid(output[..., 3])     shape: [4, 3, 20, 20]
conf = sigmoid(output[..., 4])  shape: [4, 3, 20, 20]
pred = sigmoid(output[..., 5:]) shape: [4, 3, 20, 20, 80]
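
On a board without a deep-learning framework, the sigmoid has to be written by hand. The sketch below shows one way to do this in NumPy on a random stand-in tensor; the helper `sigmoid` and the random `output` are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Random stand-in for the reshaped output of shape [4, 3, 20, 20, 85].
rng = np.random.default_rng(0)
output = rng.normal(0.2, 0.5, size=(4, 3, 20, 20, 85)).astype(np.float32)

x = sigmoid(output[..., 0])
y = sigmoid(output[..., 1])
w = sigmoid(output[..., 2])
h = sigmoid(output[..., 3])
conf = sigmoid(output[..., 4])
pred = sigmoid(output[..., 5:])

print(x.shape, conf.shape, pred.shape)  # (4, 3, 20, 20) (4, 3, 20, 20) (4, 3, 20, 20, 80)
```

Every value now lies strictly between 0 and 1, which is what the decoding formulas in the later steps rely on.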

4. Build the grids: prior-box centers at the top-left corner of each cell

grid_x = [[[[ 0.,  1.,  2.,  ..., 17., 18., 19.],
      [ 0.,  1.,  2.,  ..., 17., 18., 19.],
      [ 0.,  1.,  2.,  ..., 17., 18., 19.],
      ...,
      [ 0.,  1.,  2.,  ..., 17., 18., 19.],
      [ 0.,  1.,  2.,  ..., 17., 18., 19.],
      [ 0.,  1.,  2.,  ..., 17., 18., 19.]]]

The third and fourth dimensions hold the row [ 0., 1., 2., ..., 17., 18., 19.] repeated 20 times; the batch dimension and the anchor dimension (3) simply repeat this matrix.

grid_y = [[[[ 0.,  0.,  0.,  ...,  0.,  0.,  0.],
      [ 1.,  1.,  1.,  ...,  1.,  1.,  1.],
      [ 2.,  2.,  2.,  ...,  2.,  2.,  2.],
      ...,
      [17., 17., 17.,  ..., 17., 17., 17.],
      [18., 18., 18.,  ..., 18., 18., 18.],
      [19., 19., 19.,  ..., 19., 19., 19.]]]

Note: grid_y is grid_x transposed over the last two dimensions (the grid is square here).
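
The two grids can be built in a few lines. This is a minimal NumPy sketch, not the author's `torch.linspace(...).repeat(...)` version; it assumes batch_size = 4 and uses broadcasting instead of physically repeating the data.

```python
import numpy as np

batch_size, num_anchors, height, width = 4, 3, 20, 20

# indexing="ij": ys varies along rows (y), xs varies along columns (x).
ys, xs = np.meshgrid(np.arange(height, dtype=np.float32),
                     np.arange(width, dtype=np.float32), indexing="ij")

# Repeat the 20 x 20 matrices over the batch and anchor dimensions.
grid_x = np.broadcast_to(xs, (batch_size, num_anchors, height, width))
grid_y = np.broadcast_to(ys, (batch_size, num_anchors, height, width))

print(grid_x.shape)        # (4, 3, 20, 20)
print(grid_x[0, 0, 3, 7])  # 7.0 (column index)
print(grid_y[0, 0, 3, 7])  # 3.0 (row index)
```

`np.broadcast_to` returns a read-only view; call `.copy()` on the result if the grid has to be modified in place.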

5. Process the anchors

Take the w values of scaled_anchors from step 1 and form a 3 * 1 matrix:

anchor_w = [[ 3.6250],
    [ 4.8750],
    [11.6562]]

Take the h values of scaled_anchors from step 1 and form a 3 * 1 matrix:

anchor_h = [[ 2.8125],
    [ 6.1875],
    [10.1875]]

Expand anchor_w to shape [4, 3, 20, 20]:

anchor_w = [[[[3.6250,  3.6250,  3.6250,  ...,  3.6250,  3.6250,  3.6250],
      [ 3.6250,  3.6250,  3.6250,  ...,  3.6250,  3.6250,  3.6250],
      [ 3.6250,  3.6250,  3.6250,  ...,  3.6250,  3.6250,  3.6250],
      ...,
      [ 3.6250,  3.6250,  3.6250,  ...,  3.6250,  3.6250,  3.6250],
      [ 3.6250,  3.6250,  3.6250,  ...,  3.6250,  3.6250,  3.6250],
      [ 3.6250,  3.6250,  3.6250,  ...,  3.6250,  3.6250,  3.6250]],

     [[ 4.8750,  4.8750,  4.8750,  ...,  4.8750,  4.8750,  4.8750],
      [ 4.8750,  4.8750,  4.8750,  ...,  4.8750,  4.8750,  4.8750],
      [ 4.8750,  4.8750,  4.8750,  ...,  4.8750,  4.8750,  4.8750],
      ...,
      [ 4.8750,  4.8750,  4.8750,  ...,  4.8750,  4.8750,  4.8750],
      [ 4.8750,  4.8750,  4.8750,  ...,  4.8750,  4.8750,  4.8750],
      [ 4.8750,  4.8750,  4.8750,  ...,  4.8750,  4.8750,  4.8750]],

     [[11.6562, 11.6562, 11.6562,  ..., 11.6562, 11.6562, 11.6562],
      [11.6562, 11.6562, 11.6562,  ..., 11.6562, 11.6562, 11.6562],
      [11.6562, 11.6562, 11.6562,  ..., 11.6562, 11.6562, 11.6562],
      ...,
      [11.6562, 11.6562, 11.6562,  ..., 11.6562, 11.6562, 11.6562],
      [11.6562, 11.6562, 11.6562,  ..., 11.6562, 11.6562, 11.6562],
      [11.6562, 11.6562, 11.6562,  ..., 11.6562, 11.6562, 11.6562]]]

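The third and fourth dimensions are filled with 3.6250, forming a 20 * 20 matrix of identical elements.

Along the second dimension, each following element of anchor_w likewise forms a 20 * 20 matrix of identical elements.

anchor_h is handled in the same way: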

anchor_h = [[[[ 2.8125,  2.8125,  2.8125,  ...,  2.8125,  2.8125,  2.8125],
      [ 2.8125,  2.8125,  2.8125,  ...,  2.8125,  2.8125,  2.8125],
      [ 2.8125,  2.8125,  2.8125,  ...,  2.8125,  2.8125,  2.8125],
      ...,
      [ 2.8125,  2.8125,  2.8125,  ...,  2.8125,  2.8125,  2.8125],
      [ 2.8125,  2.8125,  2.8125,  ...,  2.8125,  2.8125,  2.8125],
      [ 2.8125,  2.8125,  2.8125,  ...,  2.8125,  2.8125,  2.8125]],

     [[ 6.1875,  6.1875,  6.1875,  ...,  6.1875,  6.1875,  6.1875],
      [ 6.1875,  6.1875,  6.1875,  ...,  6.1875,  6.1875,  6.1875],
      [ 6.1875,  6.1875,  6.1875,  ...,  6.1875,  6.1875,  6.1875],
      ...,
      [ 6.1875,  6.1875,  6.1875,  ...,  6.1875,  6.1875,  6.1875],
      [ 6.1875,  6.1875,  6.1875,  ...,  6.1875,  6.1875,  6.1875],
      [ 6.1875,  6.1875,  6.1875,  ...,  6.1875,  6.1875,  6.1875]],

     [[10.1875, 10.1875, 10.1875,  ..., 10.1875, 10.1875, 10.1875],
      [10.1875, 10.1875, 10.1875,  ..., 10.1875, 10.1875, 10.1875],
      [10.1875, 10.1875, 10.1875,  ..., 10.1875, 10.1875, 10.1875],
      ...,
      [10.1875, 10.1875, 10.1875,  ..., 10.1875, 10.1875, 10.1875],
      [10.1875, 10.1875, 10.1875,  ..., 10.1875, 10.1875, 10.1875],
      [10.1875, 10.1875, 10.1875,  ..., 10.1875, 10.1875, 10.1875]]]]
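
The expansion shown above amounts to broadcasting three numbers over the batch and grid dimensions. A minimal NumPy sketch, assuming batch_size = 4 and the scaled anchors from step 1:

```python
import numpy as np

batch_size, height, width = 4, 20, 20
scaled_anchors = np.array([(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)],
                          dtype=np.float32)

# [3] -> [1, 3, 1, 1], so broadcasting repeats each value over batch and grid.
anchor_w = np.broadcast_to(scaled_anchors[:, 0].reshape(1, -1, 1, 1),
                           (batch_size, 3, height, width))
anchor_h = np.broadcast_to(scaled_anchors[:, 1].reshape(1, -1, 1, 1),
                           (batch_size, 3, height, width))

print(anchor_w.shape)        # (4, 3, 20, 20)
print(anchor_w[2, 1, 4, 9])  # 4.875
print(anchor_h[0, 2, 0, 0])  # 10.1875
```

On a memory-constrained target it may be preferable to skip the expansion entirely and look up the anchor by its index inside the decoding loop.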

6. Decode the outputs of step 3 into new x, y, w, h

The x, y, w, h outputs are decoded further with the following formulas:

    #   x  0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5 + grid_x
    #   y  0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5 + grid_y
    #   w  0 ~ 1 => 0 ~ 2 => 0 ~ 4 * anchor_w
    #   h  0 ~ 1 => 0 ~ 2 => 0 ~ 4 * anchor_h

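x, y, w, h, grid_x, grid_y, anchor_w, and anchor_h all have shape batch_size, 3, 20, 20. Because the final x, y, w, h must map back onto the original image, they also need to be multiplied by this feature map's downsampling stride, 32.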

x = (x.data * 2 - 0.5 + grid_x) * 32
y = (y.data * 2 - 0.5 + grid_y) * 32
w = ((w.data * 2) ** 2 * anchor_w) * 32
h = ((h.data * 2) ** 2 * anchor_h) * 32
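
A worked example for a single cell makes the formulas concrete. The sketch below wraps them in a hypothetical helper `decode_xywh` (not part of the original code) and evaluates it for the cell in column 5, row 7 with the first anchor of the 20 * 20 feature map, assuming all four sigmoid outputs equal 0.5.

```python
def decode_xywh(x, y, w, h, grid_x, grid_y, anchor_w, anchor_h, stride):
    """Map sigmoid outputs in (0, 1) to a box in input-image pixels."""
    bx = (x * 2.0 - 0.5 + grid_x) * stride        # center x
    by = (y * 2.0 - 0.5 + grid_y) * stride        # center y
    bw = (w * 2.0) ** 2 * anchor_w * stride       # width, 0 to 4 times the anchor
    bh = (h * 2.0) ** 2 * anchor_h * stride       # height, 0 to 4 times the anchor
    return bx, by, bw, bh

# Sigmoid outputs of 0.5 put the center in the middle of the cell
# and reproduce the anchor size exactly.
bx, by, bw, bh = decode_xywh(0.5, 0.5, 0.5, 0.5, 5.0, 7.0, 3.625, 2.8125, 32)
print(bx, by, bw, bh)  # 176.0 240.0 116.0 90.0
```

The function uses only arithmetic operators, so it works unchanged on NumPy arrays or tensors of shape batch_size, 3, 20, 20.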

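7. Combine the outputs

Assume all three feature maps have been decoded with the steps above. For each feature map:

xywh       is reshaped to batch * -1 * 4,
conf       is reshaped to batch * -1 * 1,
pred_cls   is reshaped to batch * -1 * 80;

concatenate: (xywh, conf, pred_cls)

4 * (3 * 20 * 20) * 4, 4 * (3 * 20 * 20) * 1, 4 * (3 * 20 * 20) * 80 => 4 * 1200 * 85, where 4 is batch_size.

The final output over the three feature maps is 4 * (1200 + 4800 + 19200) * 85 => 4 * 25200 * 85.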
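
The shape bookkeeping can be verified with zero-filled stand-ins for the decoded tensors. This is only a sketch of the reshape and concatenate steps, assuming batch_size = 4 and 80 classes; the values themselves are placeholders.

```python
import numpy as np

batch_size, num_classes = 4, 80
levels = []
for size in (20, 40, 80):
    # Zero-filled stand-ins for the decoded tensors of one feature map.
    xywh = np.zeros((batch_size, 3, size, size, 4), dtype=np.float32)
    conf = np.zeros((batch_size, 3, size, size), dtype=np.float32)
    pred_cls = np.zeros((batch_size, 3, size, size, num_classes), dtype=np.float32)

    # Flatten anchors and cells into one axis, then join the attributes.
    level = np.concatenate((xywh.reshape(batch_size, -1, 4),
                            conf.reshape(batch_size, -1, 1),
                            pred_cls.reshape(batch_size, -1, num_classes)), axis=-1)
    levels.append(level)

# Join the three feature maps along the box axis.
prediction = np.concatenate(levels, axis=1)

print([lv.shape for lv in levels])  # [(4, 1200, 85), (4, 4800, 85), (4, 19200, 85)]
print(prediction.shape)             # (4, 25200, 85)
```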

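8. Score filtering and non-maximum suppression

The combined output has shape 4 * 25200 * 85.
First, the boxes whose conf exceeds the conf_thres threshold (typically 0.25) are selected as a coarse filter.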

xc = prediction[..., 4] > conf_thres

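Next, the final confidence is computed according to the number of classes.
(1) Single class:
If there is only one class, pred_cls is set to the same value as conf.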

x[:, 5:] = x[:, 4:5]

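(2) Multiple classes:
If there are several classes, conf * pred_cls is assigned to pred_cls. As for why conf * pred_cls is used as the final output, one explanation is quoted here for reference:

Existing detection models predict an additional IoU score or centerness score as a measure of localization accuracy, and use its product with the classification score as the ranking metric in NMS. These methods alleviate the misalignment between classification score and localization accuracy.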

x[:, 5:] *= x[:, 4:5]

Then the boxes whose pred_cls exceeds conf_thres are selected, and a standard NMS algorithm yields the final result.
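
The filtering and suppression steps for one image can be sketched in NumPy as follows. This is an illustrative greedy, class-wise NMS, not the YOLOv5 implementation; the helper names and the IoU threshold of 0.45 are assumptions, while conf_thres = 0.25 follows the text above.

```python
import numpy as np

def xywh_to_xyxy(boxes):
    """Convert rows of [cx, cy, w, h] to [x1, y1, x2, y2]."""
    out = np.empty_like(boxes)
    out[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
    out[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
    out[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
    out[:, 3] = boxes[:, 1] + boxes[:, 3] / 2
    return out

def iou_one_to_many(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an array of such boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def filter_and_nms(pred, conf_thres=0.25, iou_thres=0.45):
    """pred: [N, 5 + num_classes] for one image.
    Returns rows of [x1, y1, x2, y2, score, class_index]."""
    pred = pred[pred[:, 4] > conf_thres]            # coarse filter on conf
    if pred.shape[0] == 0:
        return np.zeros((0, 6), dtype=np.float32)
    if pred.shape[1] == 6:                          # single class: score = conf
        scores_all = pred[:, 4:5]
    else:                                           # multiple classes: conf * pred_cls
        scores_all = pred[:, 5:] * pred[:, 4:5]
    cls = scores_all.argmax(axis=1)
    score = scores_all.max(axis=1)
    keep = score > conf_thres                       # filter on the final score
    boxes, score, cls = xywh_to_xyxy(pred[keep, :4]), score[keep], cls[keep]

    results = []
    for c in np.unique(cls):                        # NMS within each class
        idx = np.where(cls == c)[0]
        idx = idx[np.argsort(-score[idx])]          # highest score first
        while idx.size > 0:
            best = idx[0]
            results.append([*boxes[best], score[best], c])
            if idx.size == 1:
                break
            ious = iou_one_to_many(boxes[best], boxes[idx[1:]])
            idx = idx[1:][ious <= iou_thres]        # drop heavily overlapping boxes
    return np.array(results, dtype=np.float32).reshape(-1, 6)

# Toy input: [cx, cy, w, h, conf, cls0, cls1]
pred = np.array([[100, 100, 50, 50, 0.9, 0.8, 0.1],
                 [102, 101, 50, 50, 0.8, 0.7, 0.1],   # overlaps the first box
                 [300, 300, 40, 60, 0.6, 0.1, 0.9],
                 [10, 10, 5, 5, 0.1, 0.9, 0.1]],      # fails the conf filter
                dtype=np.float32)
dets = filter_and_nms(pred)
print(dets.shape)  # (2, 6)
```

In the toy input, the second box is suppressed by the first (their IoU is about 0.89) and the fourth never passes the coarse conf filter, so two detections remain.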

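Summary

The steps above are essentially the complete post-processing and decoding procedure for anchor-based object detection; working with anchors keeps it relatively simple.

Related code, with reference to the bilibili creator bulllling: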

import numpy as np
import torch
from torchvision.ops import nms


class DecodeBox():
    def __init__(self, anchors, num_classes, input_shape, anchors_mask = [[6,7,8], [3,4,5], [0,1,2]]):
        super(DecodeBox, self).__init__()
        self.anchors        = anchors
        self.num_classes    = num_classes
        self.bbox_attrs     = 5 + num_classes
        self.input_shape    = input_shape
        #-----------------------------------------------------------#
        #   Anchors of the 20x20 feature map: [116,90],[156,198],[373,326]
        #   Anchors of the 40x40 feature map: [30,61],[62,45],[59,119]
        #   Anchors of the 80x80 feature map: [10,13],[16,30],[33,23]
        #-----------------------------------------------------------#
        self.anchors_mask   = anchors_mask

    def decode_box(self, inputs):
        outputs = []
        for i, input in enumerate(inputs):
            #-----------------------------------------------#
            #   There are three inputs in total; their shapes are
            #   batch_size = 1
            #   batch_size, 3 * (4 + 1 + 80), 20, 20
            #   batch_size, 255, 40, 40
            #   batch_size, 255, 80, 80
            #-----------------------------------------------#
            batch_size      = input.size(0)
            input_height    = input.size(2)
            input_width     = input.size(3)

            #-----------------------------------------------#
            #   For a 640x640 input
            #   stride_h = stride_w = 32, 16, 8
            #-----------------------------------------------#
            stride_h = self.input_shape[0] / input_height
            stride_w = self.input_shape[1] / input_width
            #-------------------------------------------------#
            #   scaled_anchors are now expressed relative to the feature map
            #-------------------------------------------------#
            scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in self.anchors[self.anchors_mask[i]]]
            # print("scaled_anchors:",scaled_anchors)
            #-----------------------------------------------#
            #   After view + permute, the three inputs have the shapes
            #   batch_size, 3, 20, 20, 85
            #   batch_size, 3, 40, 40, 85
            #   batch_size, 3, 80, 80, 85
            #-----------------------------------------------#
            prediction = input.view(batch_size, len(self.anchors_mask[i]),
                                    self.bbox_attrs, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()
            # print("prediction:", prediction.shape)
            #-----------------------------------------------#
            #   Adjustment parameters for the prior-box center
            #-----------------------------------------------#
            x = torch.sigmoid(prediction[..., 0])  
            y = torch.sigmoid(prediction[..., 1])
            #-----------------------------------------------#
            #   Adjustment parameters for the prior-box width and height
            #-----------------------------------------------#
            w = torch.sigmoid(prediction[..., 2]) 
            h = torch.sigmoid(prediction[..., 3]) 
            #-----------------------------------------------#
            #   Objectness confidence: is there an object?
            #-----------------------------------------------#
            conf        = torch.sigmoid(prediction[..., 4])
            #-----------------------------------------------#
            #   Class confidences
            #-----------------------------------------------#
            pred_cls    = torch.sigmoid(prediction[..., 5:])

            FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
            LongTensor  = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor

            #----------------------------------------------------------#
            #   Build the grid: prior-box centers at the top-left corner of each cell
            #   batch_size,3,20,20
            #----------------------------------------------------------#
            grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(x.shape).type(FloatTensor)
            grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(y.shape).type(FloatTensor)

            #----------------------------------------------------------#
            #   Expand the prior-box widths and heights to the grid layout
            #   batch_size,3,20,20
            #----------------------------------------------------------#
            anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
            anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
            anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape)
            anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape)

            #----------------------------------------------------------#
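            #   Adjust the prior boxes with the predictions.
            #   First shift the prior-box center by an offset relative to the cell,
            #   then adjust the prior-box width and height.
            #   x 0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5 => each cell predicts targets within this range
            #   y 0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5 => each cell predicts targets within this range
            #   w 0 ~ 1 => 0 ~ 2 => 0 ~ 4 => the width can be scaled to 0 ~ 4 times the prior
            #   h 0 ~ 1 => 0 ~ 2 => 0 ~ 4 => the height can be scaled to 0 ~ 4 times the prior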
            #----------------------------------------------------------#
            pred_boxes          = FloatTensor(prediction[..., :4].shape)
            pred_boxes[..., 0]  = x.data * 2. - 0.5 + grid_x
            pred_boxes[..., 1]  = y.data * 2. - 0.5 + grid_y
            pred_boxes[..., 2]  = (w.data * 2) ** 2 * anchor_w
            pred_boxes[..., 3]  = (h.data * 2) ** 2 * anchor_h

            #----------------------------------------------------------#
            #   Normalize the boxes to fractions of the feature-map size
            #----------------------------------------------------------#
            _scale = torch.Tensor([input_width, input_height, input_width, input_height]).type(FloatTensor)
            output = torch.cat((pred_boxes.view(batch_size, -1, 4) / _scale,
                                conf.view(batch_size, -1, 1), pred_cls.view(batch_size, -1, self.num_classes)), -1)
            outputs.append(output.data)
        # Return the decoded predictions of all feature maps, not only the last one.
        return outputs

    def non_max_suppression(self, prediction, num_classes, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4):
        #----------------------------------------------------------#
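        #   Convert the predictions to top-left / bottom-right corner format.
        #   Note: self.bbox_iou and self.yolo_correct_boxes are called below
        #   but are not defined in this listing.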
        #   prediction  [batch_size, num_anchors, 85]
        #----------------------------------------------------------#
        box_corner          = np.zeros_like(prediction)
        box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
        box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
        box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
        box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
        prediction[:, :, :4] = box_corner[:, :, :4]

        output = [None for _ in range(len(prediction))]
        for i, image_pred in enumerate(prediction):
            #----------------------------------------------------------#
            #   Take the max over the class predictions.
            #   class_conf  [num_anchors, 1]    class confidence
            #   class_pred  [num_anchors, 1]    class index
            #----------------------------------------------------------#
            class_conf = np.max(image_pred[:, 5:5 + num_classes], 1, keepdims=True)
            class_pred = np.expand_dims(np.argmax(image_pred[:, 5:5 + num_classes], 1), -1)

            #----------------------------------------------------------#
            #   First round of filtering by confidence
            #----------------------------------------------------------#
            conf_mask = np.squeeze((image_pred[:, 4] * class_conf[:, 0] >= conf_thres))

            #----------------------------------------------------------#
            #   Keep only the predictions that pass the confidence filter
            #----------------------------------------------------------#
            image_pred = image_pred[conf_mask]
            class_conf = class_conf[conf_mask]
            class_pred = class_pred[conf_mask]
            if not np.shape(image_pred)[0]:
                continue
            #-------------------------------------------------------------------------#
            #   detections  [num_anchors, 7]
            #   The 7 columns are: x1, y1, x2, y2, obj_conf, class_conf, class_pred
            #-------------------------------------------------------------------------#
            detections = np.concatenate((image_pred[:, :5], class_conf, class_pred), 1)

            #------------------------------------------#
            #   Collect all classes present in the predictions
            #------------------------------------------#
            unique_labels = np.unique(detections[:, -1])

            for c in unique_labels:
                #------------------------------------------#
                #   All predictions of one class that passed the score filter
                #------------------------------------------#
                detections_class = detections[detections[:, -1] == c]

                # Sort by objectness confidence times class confidence, highest first
                conf_sort_index     = np.argsort(detections_class[:, 4] * detections_class[:, 5])[::-1]
                detections_class    = detections_class[conf_sort_index]
                # Non-maximum suppression
                max_detections = []
                while np.shape(detections_class)[0]:
                    # Take the highest-scoring box of this class, then drop every remaining box whose overlap (IoU) with it is not below nms_thres
                    max_detections.append(detections_class[0:1])
                    if len(detections_class) == 1:
                        break
                    ious                = self.bbox_iou(max_detections[-1], detections_class[1:])
                    detections_class    = detections_class[1:][ious < nms_thres]
                # Stack the kept boxes
                max_detections = np.concatenate(max_detections, 0)
                
                # Add max detections to outputs
                output[i] = max_detections if output[i] is None else np.concatenate((output[i], max_detections))
            
            if output[i] is not None:
                # Convert corners back to center / size before mapping boxes to the original image
                box_xy, box_wh      = (output[i][:, 0:2] + output[i][:, 2:4])/2, output[i][:, 2:4] - output[i][:, 0:2]
                output[i][:, :4]    = self.yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)
        return output
    

if __name__ == "__main__":
    import matplotlib.pyplot as plt
    import numpy as np

    #---------------------------------------------------#
    #   Decode the raw predictions of one feature map into real box values
    #---------------------------------------------------#
    def get_anchors_and_decode(input, input_shape, anchors, anchors_mask, num_classes):
        #-----------------------------------------------#
        #   input   batch_size, 3 * (4 + 1 + num_classes), 20, 20
        #-----------------------------------------------#
        batch_size      = input.size(0)
        input_height    = input.size(2)
        input_width     = input.size(3)

        #-----------------------------------------------#
        #   For a 640x640 input: input_shape = [640, 640], input_height = 20, input_width = 20
        #   640 / 20 = 32
        #   stride_h = stride_w = 32
        #-----------------------------------------------#
        stride_h = input_shape[0] / input_height
        stride_w = input_shape[1] / input_width
        #-------------------------------------------------#
        #   scaled_anchors are now expressed relative to the feature map
        #   anchor_width, anchor_height / stride_h, stride_w
        #-------------------------------------------------#

        # [116, 90], [156, 198], [373, 326] / 32
        scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in anchors[anchors_mask[2]]]
        # [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]

        print("scaled_anchors:",scaled_anchors)
        #-----------------------------------------------#
        #   batch_size, 3 * (4 + 1 + num_classes), 20, 20 => 
        #   batch_size, 3, 5 + num_classes, 20, 20  => 
        #   batch_size, 3, 20, 20, 4 + 1 + num_classes
        #-----------------------------------------------#
        print("batch_size:",batch_size)
        prediction = input.view(batch_size, len(anchors_mask[2]),
                                num_classes + 5, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()

        # print("prediction:",prediction.shape)
        #-----------------------------------------------#
        #   Adjustment parameters for the prior-box center
        #-----------------------------------------------#
        x = torch.sigmoid(prediction[..., 0])
        y = torch.sigmoid(prediction[..., 1])
        #-----------------------------------------------#
        #   Adjustment parameters for the prior-box width and height
        #-----------------------------------------------#
        w = torch.sigmoid(prediction[..., 2]) 
        h = torch.sigmoid(prediction[..., 3]) 
        #-----------------------------------------------#
        #   Objectness confidence (is there an object?), 0 - 1
        #-----------------------------------------------#
        conf        = torch.sigmoid(prediction[..., 4])
        #-----------------------------------------------#
        #   Class confidences, 0 - 1
        #-----------------------------------------------#
        pred_cls    = torch.sigmoid(prediction[..., 5:])

        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        LongTensor  = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor

        #----------------------------------------------------------#
        #   Build the grid: prior-box centers at the top-left corner of each cell
        #   batch_size,3,20,20
        #   range(20)
        #   [
        #       [0, 1, 2, 3, ..., 19],
        #       [0, 1, 2, 3, ..., 19],
        #       ... (20 rows)
        #       [0, 1, 2, 3, ..., 19]
        #   ] * (batch_size * 3)
        #
        #   [batch_size, 3, 20, 20]
        #
        #   [
        #       [0, 1, 2, 3, ..., 19],
        #       [0, 1, 2, 3, ..., 19],
        #       ... (20 rows)
        #       [0, 1, 2, 3, ..., 19]
        #   ].T * (batch_size * 3)
        #   [batch_size, 3, 20, 20]
        #----------------------------------------------------------#

        # [4, 3, 20, 20]
        grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat(
            batch_size * len(anchors_mask[2]), 1, 1).view(x.shape).type(FloatTensor)
        # print("grid_x:",grid_x.shape)
        # print("grid_x:",grid_x)
        # [4, 3, 20, 20]
        grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat(
            batch_size * len(anchors_mask[2]), 1, 1).view(y.shape).type(FloatTensor)

        # print("grid_x:",grid_x,len(grid_x),len(grid_x[0]),len(grid_x[0][0]),len(grid_x[0][0][0]))
        # print("grid_y:",grid_y)
        #----------------------------------------------------------#
        #   Expand the prior-box widths and heights to the grid layout
        #   batch_size, 3, 20 * 20 => batch_size, 3, 20, 20
        #   batch_size, 3, 20 * 20 => batch_size, 3, 20, 20
        #----------------------------------------------------------#
        anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
        anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))

        # print("anchor_w:",anchor_w)
        # print("anchor_w:",anchor_w.shape)
        # print("anchor_h:",anchor_h)
        # print("anchor_h:",anchor_h.shape)

        anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape)
        anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape)

        # print("anchor_w:",anchor_w)
        # print("anchor_w:",anchor_w.shape)
        # print("anchor_h:",anchor_h)
        # print("anchor_h:",anchor_h.shape)

        #----------------------------------------------------------#
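        #   Adjust the prior boxes with the predictions.
        #   First shift the prior-box center by an offset relative to the cell,
        #   then adjust the prior-box width and height.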
        #   x  0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5 + grid_x
        #   y  0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5 + grid_y
        #   w  0 ~ 1 => 0 ~ 2 => 0 ~ 4 * anchor_w
        #   h  0 ~ 1 => 0 ~ 2 => 0 ~ 4 * anchor_h 
        #----------------------------------------------------------#
        pred_boxes          = FloatTensor(prediction[..., :4].shape)
        pred_boxes[..., 0]  = x.data * 2. - 0.5 + grid_x
        pred_boxes[..., 1]  = y.data * 2. - 0.5 + grid_y
        pred_boxes[..., 2]  = (w.data * 2) ** 2 * anchor_w
        pred_boxes[..., 3]  = (h.data * 2) ** 2 * anchor_h

        point_h = 5
        point_w = 5
        
        box_xy          = pred_boxes[..., 0:2].cpu().numpy() * 32
        box_wh          = pred_boxes[..., 2:4].cpu().numpy() * 32
        grid_x          = grid_x.cpu().numpy() * 32
        grid_y          = grid_y.cpu().numpy() * 32
        anchor_w        = anchor_w.cpu().numpy() * 32
        anchor_h        = anchor_h.cpu().numpy() * 32
        
        fig = plt.figure()
        ax  = fig.add_subplot(121)
        from PIL import Image
        img = Image.open("street.jpg").resize([640, 640])
        plt.imshow(img, alpha=0.5)
        plt.ylim(-30, 650)
        plt.xlim(-30, 650)
        plt.scatter(grid_x, grid_y)
        plt.scatter(point_h * 32, point_w * 32, c='black')
        plt.gca().invert_yaxis()

        anchor_left = grid_x - anchor_w / 2
        anchor_top  = grid_y - anchor_h / 2
        
        rect1 = plt.Rectangle([anchor_left[0, 0, point_h, point_w],anchor_top[0, 0, point_h, point_w]], \
            anchor_w[0, 0, point_h, point_w],anchor_h[0, 0, point_h, point_w],color="r",fill=False)
        rect2 = plt.Rectangle([anchor_left[0, 1, point_h, point_w],anchor_top[0, 1, point_h, point_w]], \
            anchor_w[0, 1, point_h, point_w],anchor_h[0, 1, point_h, point_w],color="r",fill=False)
        rect3 = plt.Rectangle([anchor_left[0, 2, point_h, point_w],anchor_top[0, 2, point_h, point_w]], \
            anchor_w[0, 2, point_h, point_w],anchor_h[0, 2, point_h, point_w],color="r",fill=False)

        ax.add_patch(rect1)
        ax.add_patch(rect2)
        ax.add_patch(rect3)

        ax  = fig.add_subplot(122)
        plt.imshow(img, alpha=0.5)
        plt.ylim(-30, 650)
        plt.xlim(-30, 650)
        plt.scatter(grid_x, grid_y)
        plt.scatter(point_h * 32, point_w * 32, c='black')
        plt.scatter(box_xy[0, :, point_h, point_w, 0], box_xy[0, :, point_h, point_w, 1], c='r')
        plt.gca().invert_yaxis()

        pre_left    = box_xy[...,0] - box_wh[...,0] / 2
        pre_top     = box_xy[...,1] - box_wh[...,1] / 2

        rect1 = plt.Rectangle([pre_left[0, 0, point_h, point_w], pre_top[0, 0, point_h, point_w]],\
            box_wh[0, 0, point_h, point_w,0], box_wh[0, 0, point_h, point_w,1],color="r",fill=False)
        rect2 = plt.Rectangle([pre_left[0, 1, point_h, point_w], pre_top[0, 1, point_h, point_w]],\
            box_wh[0, 1, point_h, point_w,0], box_wh[0, 1, point_h, point_w,1],color="r",fill=False)
        rect3 = plt.Rectangle([pre_left[0, 2, point_h, point_w], pre_top[0, 2, point_h, point_w]],\
            box_wh[0, 2, point_h, point_w,0], box_wh[0, 2, point_h, point_w,1],color="r",fill=False)

        ax.add_patch(rect1)
        ax.add_patch(rect2)
        ax.add_patch(rect3)

        #plt.show()
        #
    feat            = torch.from_numpy(np.random.normal(0.2, 0.5, [4, 255, 20, 20])).float()
    anchors         = np.array([[116, 90], [156, 198], [373, 326], [30,61], [62,45], [59,119], [10,13], [16,30], [33,23]])
    anchors_mask    = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    get_anchors_and_decode(feat, [640, 640], anchors, anchors_mask, 80)
