Note: this article is a work in progress; please do not repost it for now.
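Preface
Assume a 640 * 640 input image. After the FPN-PAN structure, the network produces feature maps at three scales: 20 * 20, 40 * 40, and 80 * 80:
The 20 * 20 feature map uses the anchors [116,90], [156,198], [373,326];
The 40 * 40 feature map uses the anchors [30,61], [62,45], [59,119];
The 80 * 80 feature map uses the anchors [10,13], [16,30], [33,23];
The downsampling strides of the three feature maps are 32, 16, and 8, respectively.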
The walkthrough below uses the 20 * 20 feature map.
1. Scale the anchors to the feature map
Computation: [116,90], [156,198], [373,326] / 32
Result: scaled_anchors = [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]
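As a quick check of the arithmetic above, the following minimal sketch recomputes the scaled anchors in plain Python. The variable names (`anchors_px`, `input_size`, `feature_size`) are illustrative and not taken from the original code.

```python
# Anchors (width, height) in input-image pixels for the 20 x 20 feature map.
anchors_px = [(116, 90), (156, 198), (373, 326)]
input_size = 640     # square input, as assumed in the text
feature_size = 20    # side length of the coarsest feature map

# The stride is how many input pixels one feature-map cell covers.
stride = input_size / feature_size  # 640 / 20 = 32.0

# Express every anchor in feature-map cells instead of pixels.
scaled_anchors = [(w / stride, h / stride) for w, h in anchors_px]
print(scaled_anchors)
# [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]
```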
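2. Reshape the output
Original shape: batch_size, 3 * (4 + 1 + num_classes), 20, 20
Reshaped output: batch_size, 3 * (4 + 1 + num_classes), 20, 20 => batch_size, 3, 4 + 1 + num_classes, 20, 20 => batch_size, 3, 20, 20, 4 + 1 + num_classes, i.e. [4, 3, 20, 20, 85]
Note: the 255-channel dimension is laid out as 3 * 85 (anchor first, then attributes), not 85 * 3. The 255 channels must therefore first be reshaped to (3, 85), and only then permuted. Pay particular attention to this when writing board-side (on-device) code.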
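The channel-order pitfall can be illustrated with a toy tensor. This is a sketch using NumPy `reshape`/`transpose` as stand-ins for PyTorch `view`/`permute`; the sizes (3 anchors, 4 attributes, 2 x 2 map) are deliberately tiny assumptions so the result can be read by eye.

```python
import numpy as np

num_anchors, num_attrs, h, w = 3, 4, 2, 2  # toy sizes; the real head uses 3 and 85

# Fill channel c of the raw output with the value c so it can be traced.
channels = np.arange(num_anchors * num_attrs, dtype=np.float32)
raw = np.broadcast_to(channels[None, :, None, None],
                      (1, num_anchors * num_attrs, h, w)).copy()

# Correct: split the channels as (anchor, attribute), then move attributes last.
good = raw.reshape(1, num_anchors, num_attrs, h, w).transpose(0, 1, 3, 4, 2)

# Incorrect: split the channels as (attribute, anchor).
bad = raw.reshape(1, num_attrs, num_anchors, h, w).transpose(0, 2, 3, 4, 1)

print(good[0, 1, 0, 0])  # anchor 1 gets its own contiguous channels 4, 5, 6, 7
print(bad[0, 1, 0, 0])   # anchor 1 gets channels 1, 4, 7, 10 from mixed anchors
```

Both results have the same shape, so the mistake does not raise an error; it only shows up as wrong boxes.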
3. Apply sigmoid to the outputs x, y, w, h, conf, pred
x = sigmoid(output[..., 0]) shape: [4, 3, 20, 20]
y = sigmoid(output[..., 1]) shape: [4, 3, 20, 20]
w = sigmoid(output[..., 2]) shape: [4, 3, 20, 20]
h = sigmoid(output[..., 3]) shape: [4, 3, 20, 20]
conf = sigmoid(output[..., 4]) shape: [4, 3, 20, 20]
pred = sigmoid(output[..., 5:]) shape: [4, 3, 20, 20, 80]
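The slicing and resulting shapes can be verified with a small sketch. It uses NumPy and random data as a stand-in for the permuted network output; the helper `sigmoid` and the random generator seed are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
output = rng.normal(size=(4, 3, 20, 20, 85))  # stand-in for the reshaped head output

x = sigmoid(output[..., 0])      # center offset along x
conf = sigmoid(output[..., 4])   # objectness
pred = sigmoid(output[..., 5:])  # per-class scores

print(x.shape, conf.shape, pred.shape)
# (4, 3, 20, 20) (4, 3, 20, 20) (4, 3, 20, 20, 80)
```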
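4. Generate the grid: coordinate matrices of the top-left corner of every cell, which serve as the reference points for the anchor centers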
grid_x = [[[[ 0., 1., 2., ..., 17., 18., 19.],
[ 0., 1., 2., ..., 17., 18., 19.],
[ 0., 1., 2., ..., 17., 18., 19.],
...,
[ 0., 1., 2., ..., 17., 18., 19.],
[ 0., 1., 2., ..., 17., 18., 19.],
[ 0., 1., 2., ..., 17., 18., 19.]]]
The third and fourth dimensions hold the row [0., 1., 2., ..., 17., 18., 19.] repeated 20 times; the batch and anchor (3) dimensions simply repeat this 20 * 20 matrix.
grid_y = [[[[ 0., 0., 0., ..., 0., 0., 0.],
[ 1., 1., 1., ..., 1., 1., 1.],
[ 2., 2., 2., ..., 2., 2., 2.],
...,
[17., 17., 17., ..., 17., 17., 17.],
[18., 18., 18., ..., 18., 18., 18.],
[19., 19., 19., ..., 19., 19., 19.]]]
Note: grid_y is grid_x with its last two dimensions transposed.
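One way to build the same two grids is sketched below with NumPy `meshgrid` and broadcasting. This is an alternative to the `linspace(...).repeat(...)` construction used in the reference code, not a copy of it.

```python
import numpy as np

batch_size, num_anchors, height, width = 4, 3, 20, 20

# With indexing="ij", ys varies along rows and xs varies along columns.
ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")

# Repeat the 20 x 20 matrices over the batch and anchor dimensions.
shape = (batch_size, num_anchors, height, width)
grid_x = np.broadcast_to(xs, shape).astype(np.float32)
grid_y = np.broadcast_to(ys, shape).astype(np.float32)

print(grid_x.shape, grid_y.shape)  # (4, 3, 20, 20) (4, 3, 20, 20)
```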
5. Prepare the anchors
Take the widths from scaled_anchors (step 1) and arrange them as a 3 * 1 matrix:
anchor_w = [[ 3.6250],
[ 4.8750],
[11.6562]]
Take the heights from scaled_anchors (step 1) and arrange them as a 3 * 1 matrix:
anchor_h =[[ 2.8125],
[ 6.1875],
[10.1875]]
Expand anchor_w to shape [4, 3, 20, 20]:
anchor_w = [[[[3.6250, 3.6250, 3.6250, ..., 3.6250, 3.6250, 3.6250],
[ 3.6250, 3.6250, 3.6250, ..., 3.6250, 3.6250, 3.6250],
[ 3.6250, 3.6250, 3.6250, ..., 3.6250, 3.6250, 3.6250],
...,
[ 3.6250, 3.6250, 3.6250, ..., 3.6250, 3.6250, 3.6250],
[ 3.6250, 3.6250, 3.6250, ..., 3.6250, 3.6250, 3.6250],
[ 3.6250, 3.6250, 3.6250, ..., 3.6250, 3.6250, 3.6250]],
[[ 4.8750, 4.8750, 4.8750, ..., 4.8750, 4.8750, 4.8750],
[ 4.8750, 4.8750, 4.8750, ..., 4.8750, 4.8750, 4.8750],
[ 4.8750, 4.8750, 4.8750, ..., 4.8750, 4.8750, 4.8750],
...,
[ 4.8750, 4.8750, 4.8750, ..., 4.8750, 4.8750, 4.8750],
[ 4.8750, 4.8750, 4.8750, ..., 4.8750, 4.8750, 4.8750],
[ 4.8750, 4.8750, 4.8750, ..., 4.8750, 4.8750, 4.8750]],
[[11.6562, 11.6562, 11.6562, ..., 11.6562, 11.6562, 11.6562],
[11.6562, 11.6562, 11.6562, ..., 11.6562, 11.6562, 11.6562],
[11.6562, 11.6562, 11.6562, ..., 11.6562, 11.6562, 11.6562],
...,
[11.6562, 11.6562, 11.6562, ..., 11.6562, 11.6562, 11.6562],
[11.6562, 11.6562, 11.6562, ..., 11.6562, 11.6562, 11.6562],
[11.6562, 11.6562, 11.6562, ..., 11.6562, 11.6562, 11.6562]]]
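The third and fourth dimensions are filled with 3.6250, forming a 20 * 20 matrix of identical elements.
Along the second dimension, each subsequent element of anchor_w fills its own 20 * 20 matrix of identical elements.
anchor_h is expanded in the same way: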
anchor_h = [[[[ 2.8125, 2.8125, 2.8125, ..., 2.8125, 2.8125, 2.8125],
[ 2.8125, 2.8125, 2.8125, ..., 2.8125, 2.8125, 2.8125],
[ 2.8125, 2.8125, 2.8125, ..., 2.8125, 2.8125, 2.8125],
...,
[ 2.8125, 2.8125, 2.8125, ..., 2.8125, 2.8125, 2.8125],
[ 2.8125, 2.8125, 2.8125, ..., 2.8125, 2.8125, 2.8125],
[ 2.8125, 2.8125, 2.8125, ..., 2.8125, 2.8125, 2.8125]],
[[ 6.1875, 6.1875, 6.1875, ..., 6.1875, 6.1875, 6.1875],
[ 6.1875, 6.1875, 6.1875, ..., 6.1875, 6.1875, 6.1875],
[ 6.1875, 6.1875, 6.1875, ..., 6.1875, 6.1875, 6.1875],
...,
[ 6.1875, 6.1875, 6.1875, ..., 6.1875, 6.1875, 6.1875],
[ 6.1875, 6.1875, 6.1875, ..., 6.1875, 6.1875, 6.1875],
[ 6.1875, 6.1875, 6.1875, ..., 6.1875, 6.1875, 6.1875]],
[[10.1875, 10.1875, 10.1875, ..., 10.1875, 10.1875, 10.1875],
[10.1875, 10.1875, 10.1875, ..., 10.1875, 10.1875, 10.1875],
[10.1875, 10.1875, 10.1875, ..., 10.1875, 10.1875, 10.1875],
...,
[10.1875, 10.1875, 10.1875, ..., 10.1875, 10.1875, 10.1875],
[10.1875, 10.1875, 10.1875, ..., 10.1875, 10.1875, 10.1875],
[10.1875, 10.1875, 10.1875, ..., 10.1875, 10.1875, 10.1875]]]]
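The expansion of anchor_w and anchor_h can be sketched with NumPy broadcasting. This is an illustrative alternative to the `repeat(...).view(...)` calls in the reference code, under the same assumed sizes (batch 4, 20 x 20 map).

```python
import numpy as np

scaled_anchors = np.array(
    [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)], dtype=np.float32
)
batch_size, height, width = 4, 20, 20
shape = (batch_size, len(scaled_anchors), height, width)

# Reshape each column to (1, 3, 1, 1) so it broadcasts over batch, rows and columns.
anchor_w = np.broadcast_to(scaled_anchors[:, 0].reshape(1, -1, 1, 1), shape)
anchor_h = np.broadcast_to(scaled_anchors[:, 1].reshape(1, -1, 1, 1), shape)

print(anchor_w.shape, anchor_h.shape)  # (4, 3, 20, 20) (4, 3, 20, 20)
```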
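6. Decode the step-3 outputs into the final x, y, w, h
The sigmoid outputs x, y, w, h are decoded further with the following formulas:
# x 0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5, then + grid_x
# y 0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5, then + grid_y
# w 0 ~ 1 => 0 ~ 2 => 0 ~ 4, then * anchor_w
# h 0 ~ 1 => 0 ~ 2 => 0 ~ 4, then * anchor_h
x, y, w, h, grid_x, grid_y, anchor_w, anchor_h all have shape batch_size, 3, 20, 20. Because the final x, y, w, h must map back onto the input image, they also need to be multiplied by this feature map's downsampling stride, 32.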
x = (x.data * 2 - 0.5 + grid_x) * 32
y = (y.data * 2 - 0.5 + grid_y) * 32
w = ((w.data * 2) ** 2 * anchor_w) * 32
h = ((h.data * 2) ** 2 * anchor_h) * 32
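To make the formulas concrete, here is a minimal scalar sketch that decodes a single prediction. The function name `decode_cell` and its argument names are hypothetical; the arithmetic follows the four formulas above, with the sigmoid applied inside the function.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_cell(tx, ty, tw, th, grid_x, grid_y, anchor_w, anchor_h, stride):
    """Decode one raw prediction into (cx, cy, w, h) in input-image pixels."""
    cx = (sigmoid(tx) * 2 - 0.5 + grid_x) * stride
    cy = (sigmoid(ty) * 2 - 0.5 + grid_y) * stride
    w = (sigmoid(tw) * 2) ** 2 * anchor_w * stride
    h = (sigmoid(th) * 2) ** 2 * anchor_h * stride
    return cx, cy, w, h

# Raw outputs of 0 give sigmoid = 0.5: the center lands in the middle of
# cell (5, 5) and the box is exactly the first anchor, [116, 90].
print(decode_cell(0, 0, 0, 0, grid_x=5, grid_y=5,
                  anchor_w=3.625, anchor_h=2.8125, stride=32))
# (176.0, 176.0, 116.0, 90.0)
```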
7. Assemble the output
Assume all three feature maps have been decoded with the steps above.
xywh is reshaped to batch * -1 * 4,
conf is reshaped to batch * -1 * 1,
pred_cls is reshaped to batch * -1 * 80;
concatenate: (xywh, conf, pred_cls)
4 * (3 * 20 * 20) * 4, 4 * (3 * 20 * 20) * 1, 4 * (3 * 20 * 20) * 80 => 4 * 1200 * 85, where 4 is batch_size.
Concatenating the three feature maps gives a final output of 4 * (1200 + 4800 + 19200) * 85 => 4 * 25200 * 85.
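The box counts above follow directly from the strides. A short check, with illustrative variable names:

```python
input_size, num_anchors = 640, 3
strides = (32, 16, 8)

# Each level contributes num_anchors boxes per cell of its feature map.
per_level = [num_anchors * (input_size // s) ** 2 for s in strides]
total = sum(per_level)

print(per_level, total)  # [1200, 4800, 19200] 25200
```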
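8. Score filtering and non-maximum suppression
The assembled output has shape 4 * 25200 * 85.
First, the boxes whose conf exceeds the conf_thres threshold (typically set to 0.25) are selected as a coarse filter.
xc = prediction[..., 4] > conf_thres
Next, the final confidence is computed according to the number of classes.
(1) Single class:
If there is only one class, pred_cls is set to the same value as conf.
x[:, 5:] = x[:, 4:5]
(2) Multiple classes:
If there are multiple classes, pred_cls is overwritten with conf * pred_cls. One explanation for using conf * pred_cls as the final score is the following:
Existing detectors predict an additional IoU score or centerness score as a measure of localization quality, and use its product with the classification score as the ranking key in NMS. These methods alleviate the misalignment between classification score and localization accuracy.
x[:, 5:] *= x[:, 4:5]
Finally, the boxes whose pred_cls exceeds conf_thres are kept and passed through standard NMS to obtain the final result.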
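The filtering and suppression steps can be sketched in plain Python for the multi-class case. This is a simplified illustration, not the reference implementation: the function names, the input row layout, and the IoU threshold of 0.45 are assumptions, and boxes are taken to be already in (x1, y1, x2, y2) corner format.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def filter_and_nms(rows, conf_thres=0.25, iou_thres=0.45):
    """rows: list of (x1, y1, x2, y2, conf, [class scores])."""
    dets = []
    for x1, y1, x2, y2, conf, cls_scores in rows:
        if conf <= conf_thres:                      # coarse filter on objectness
            continue
        scores = [conf * s for s in cls_scores]     # final score = conf * pred_cls
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] > conf_thres:               # second filter on the final score
            dets.append((x1, y1, x2, y2, scores[best], best))
    dets.sort(key=lambda d: d[4], reverse=True)     # highest score first
    keep = []
    for d in dets:                                  # greedy per-class NMS
        if all(k[5] != d[5] or iou(k[:4], d[:4]) <= iou_thres for k in keep):
            keep.append(d)
    return keep

rows = [
    (10, 10, 110, 110, 0.9, [0.8, 0.1]),    # kept: best box of class 0
    (12, 12, 112, 112, 0.8, [0.7, 0.2]),    # suppressed: overlaps the box above
    (200, 200, 260, 260, 0.6, [0.1, 0.9]),  # kept: class 1
    (300, 300, 320, 320, 0.2, [0.9, 0.9]),  # dropped: conf below conf_thres
]
kept = filter_and_nms(rows)
print([(d[:4], d[5]) for d in kept])
# [((10, 10, 110, 110), 0), ((200, 200, 260, 260), 1)]
```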
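Summary
The above is essentially the full post-processing decode path for anchor-based object detection; working from anchors keeps the decoding relatively simple.
Related code: adapted from the bilibili creator bulllling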
import numpy as np
import torch
from torchvision.ops import nms

class DecodeBox():
    def __init__(self, anchors, num_classes, input_shape, anchors_mask = [[6,7,8], [3,4,5], [0,1,2]]):
        super(DecodeBox, self).__init__()
        self.anchors = anchors
        self.num_classes = num_classes
        self.bbox_attrs = 5 + num_classes
        self.input_shape = input_shape
        #-----------------------------------------------------------#
        #   Anchors for the 20x20 feature map: [116,90],[156,198],[373,326]
        #   Anchors for the 40x40 feature map: [30,61],[62,45],[59,119]
        #   Anchors for the 80x80 feature map: [10,13],[16,30],[33,23]
        #   anchors_mask[i] indexes into self.anchors, so self.anchors must be a
        #   NumPy array; the default mask assumes it is ordered small to large.
        #-----------------------------------------------------------#
        self.anchors_mask = anchors_mask

    def decode_box(self, inputs):
        outputs = []
        for i, input in enumerate(inputs):
            #-----------------------------------------------#
            #   There are three inputs; their shapes are
            #   (batch_size = 1)
            #   batch_size, 3 * (4 + 1 + 80), 20, 20
            #   batch_size, 255, 40, 40
            #   batch_size, 255, 80, 80
            #-----------------------------------------------#
            batch_size = input.size(0)
            input_height = input.size(2)
            input_width = input.size(3)
            #-----------------------------------------------#
            #   For a 640x640 input
            #   stride_h = stride_w = 32, 16, 8
            #-----------------------------------------------#
            stride_h = self.input_shape[0] / input_height
            stride_w = self.input_shape[1] / input_width
            #-------------------------------------------------#
            #   scaled_anchors are now expressed relative to the feature map
            #-------------------------------------------------#
            scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in self.anchors[self.anchors_mask[i]]]
            # print("scaled_anchors:",scaled_anchors)
            #-----------------------------------------------#
            #   After reshaping, the three predictions have shapes
            #   batch_size, 3, 20, 20, 85
            #   batch_size, 3, 40, 40, 85
            #   batch_size, 3, 80, 80, 85
            #-----------------------------------------------#
            prediction = input.view(batch_size, len(self.anchors_mask[i]),
                                    self.bbox_attrs, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()
            # print("prediction:",prediction)
            #-----------------------------------------------#
            #   Adjustment parameters for the anchor center
            #-----------------------------------------------#
            x = torch.sigmoid(prediction[..., 0])
            y = torch.sigmoid(prediction[..., 1])
            #-----------------------------------------------#
            #   Adjustment parameters for the anchor width and height
            #-----------------------------------------------#
            w = torch.sigmoid(prediction[..., 2])
            h = torch.sigmoid(prediction[..., 3])
            #-----------------------------------------------#
            #   Objectness confidence (is there an object?)
            #-----------------------------------------------#
            conf = torch.sigmoid(prediction[..., 4])
            #-----------------------------------------------#
            #   Class confidence
            #-----------------------------------------------#
            pred_cls = torch.sigmoid(prediction[..., 5:])

            FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
            LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
            #----------------------------------------------------------#
            #   Build the grid: top-left corner of each cell,
            #   the reference point for the anchor center
            #   batch_size,3,20,20
            #----------------------------------------------------------#
            grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(x.shape).type(FloatTensor)
            grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat(
                batch_size * len(self.anchors_mask[i]), 1, 1).view(y.shape).type(FloatTensor)
            #----------------------------------------------------------#
            #   Expand the anchor widths and heights to the grid layout
            #   batch_size,3,20,20
            #----------------------------------------------------------#
            anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
            anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
            anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape)
            anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape)
            #----------------------------------------------------------#
            #   Adjust the anchors with the predictions.
            #   First shift the center, offset from the grid point
            #   towards the bottom-right; then adjust width and height.
            #   x 0 ~ 1 => 0 ~ 2 => -0.5, 1.5 => each cell covers targets within a limited range
            #   y 0 ~ 1 => 0 ~ 2 => -0.5, 1.5 => each cell covers targets within a limited range
            #   w 0 ~ 1 => 0 ~ 2 => 0 ~ 4 => width can scale the anchor by 0 to 4 times
            #   h 0 ~ 1 => 0 ~ 2 => 0 ~ 4 => height can scale the anchor by 0 to 4 times
            #----------------------------------------------------------#
            pred_boxes = FloatTensor(prediction[..., :4].shape)
            pred_boxes[..., 0] = x.data * 2. - 0.5 + grid_x
            pred_boxes[..., 1] = y.data * 2. - 0.5 + grid_y
            pred_boxes[..., 2] = (w.data * 2) ** 2 * anchor_w
            pred_boxes[..., 3] = (h.data * 2) ** 2 * anchor_h
            #----------------------------------------------------------#
            #   Normalize the boxes to fractions of the feature-map size
            #----------------------------------------------------------#
            _scale = torch.Tensor([input_width, input_height, input_width, input_height]).type(FloatTensor)
            output = torch.cat((pred_boxes.view(batch_size, -1, 4) / _scale,
                                conf.view(batch_size, -1, 1), pred_cls.view(batch_size, -1, self.num_classes)), -1)
            outputs.append(output.data)
        # Return the list of all three decoded levels, not only the last one
        return outputs

    def non_max_suppression(self, prediction, num_classes, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4):
        #----------------------------------------------------------#
        #   Convert predictions to top-left / bottom-right corner format.
        #   prediction  [batch_size, num_anchors, 85]
        #   This method works on NumPy arrays, so prediction is expected
        #   to be a NumPy array here.
        #----------------------------------------------------------#
        box_corner = np.zeros_like(prediction)
        box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
        box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
        box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
        box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
        prediction[:, :, :4] = box_corner[:, :, :4]

        output = [None for _ in range(len(prediction))]
        for i, image_pred in enumerate(prediction):
            #----------------------------------------------------------#
            #   Take the max over the class predictions.
            #   class_conf  [num_anchors, 1]    class confidence
            #   class_pred  [num_anchors, 1]    class index
            #----------------------------------------------------------#
            class_conf = np.max(image_pred[:, 5:5 + num_classes], 1, keepdims=True)
            class_pred = np.expand_dims(np.argmax(image_pred[:, 5:5 + num_classes], 1), -1)
            #----------------------------------------------------------#
            #   First round of filtering: objectness * class confidence
            #----------------------------------------------------------#
            conf_mask = np.squeeze((image_pred[:, 4] * class_conf[:, 0] >= conf_thres))
            #----------------------------------------------------------#
            #   Keep only the predictions that pass the threshold
            #----------------------------------------------------------#
            image_pred = image_pred[conf_mask]
            class_conf = class_conf[conf_mask]
            class_pred = class_pred[conf_mask]
            if not np.shape(image_pred)[0]:
                continue
            #-------------------------------------------------------------------------#
            #   detections  [num_anchors, 7]
            #   The 7 columns are: x1, y1, x2, y2, obj_conf, class_conf, class_pred
            #-------------------------------------------------------------------------#
            detections = np.concatenate((image_pred[:, :5], class_conf, class_pred), 1)
            #------------------------------------------#
            #   Collect all classes present in the predictions
            #------------------------------------------#
            unique_labels = np.unique(detections[:, -1])
            for c in unique_labels:
                #------------------------------------------#
                #   All predictions of this class after score filtering
                #------------------------------------------#
                detections_class = detections[detections[:, -1] == c]
                # Sort by objectness * class confidence, highest first
                conf_sort_index = np.argsort(detections_class[:, 4] * detections_class[:, 5])[::-1]
                detections_class = detections_class[conf_sort_index]
                # Non-maximum suppression
                max_detections = []
                while np.shape(detections_class)[0]:
                    # Take the highest-scoring box of this class, then drop every
                    # remaining box whose overlap with it reaches nms_thres
                    max_detections.append(detections_class[0:1])
                    if len(detections_class) == 1:
                        break
                    # bbox_iou is a helper that is not shown in this listing
                    ious = self.bbox_iou(max_detections[-1], detections_class[1:])
                    detections_class = detections_class[1:][ious < nms_thres]
                # Stack the kept boxes
                max_detections = np.concatenate(max_detections, 0)
                # Add max detections to outputs
                output[i] = max_detections if output[i] is None else np.concatenate((output[i], max_detections))
            if output[i] is not None:
                box_xy, box_wh = (output[i][:, 0:2] + output[i][:, 2:4])/2, output[i][:, 2:4] - output[i][:, 0:2]
                # yolo_correct_boxes is a helper that is not shown in this listing
                output[i][:, :4] = self.yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)
        return output

if __name__ == "__main__":
    import matplotlib.pyplot as plt
    import numpy as np

    #---------------------------------------------------#
    #   Decode the raw predictions of one feature map into real boxes
    #   (hard-coded here to the 20x20 level: anchors_mask[2], stride 32)
    #---------------------------------------------------#
    def get_anchors_and_decode(input, input_shape, anchors, anchors_mask, num_classes):
        #-----------------------------------------------#
        #   input   batch_size, 3 * (4 + 1 + num_classes), 20, 20
        #-----------------------------------------------#
        batch_size = input.size(0)
        input_height = input.size(2)
        input_width = input.size(3)
        #-----------------------------------------------#
        #   For a 640x640 input: input_shape = [640, 640], input_height = 20, input_width = 20
        #   640 / 20 = 32
        #   stride_h = stride_w = 32
        #-----------------------------------------------#
        stride_h = input_shape[0] / input_height
        stride_w = input_shape[1] / input_width
        #-------------------------------------------------#
        #   scaled_anchors are now expressed relative to the feature map
        #   (anchor_width / stride_w, anchor_height / stride_h)
        #-------------------------------------------------#
        # [116, 90], [156, 198], [373, 326] / 32
        scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in anchors[anchors_mask[2]]]
        # [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]
        print("scaled_anchors:",scaled_anchors)
        #-----------------------------------------------#
        #   batch_size, 3 * (4 + 1 + num_classes), 20, 20 =>
        #   batch_size, 3, 5 + num_classes, 20, 20 =>
        #   batch_size, 3, 20, 20, 4 + 1 + num_classes
        #-----------------------------------------------#
        print("batch_size:",batch_size)
        prediction = input.view(batch_size, len(anchors_mask[2]),
                                num_classes + 5, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous()
        # print("prediction:",prediction.shape)
        #-----------------------------------------------#
        #   Adjustment parameters for the anchor center
        #-----------------------------------------------#
        x = torch.sigmoid(prediction[..., 0])
        y = torch.sigmoid(prediction[..., 1])
        #-----------------------------------------------#
        #   Adjustment parameters for the anchor width and height
        #-----------------------------------------------#
        w = torch.sigmoid(prediction[..., 2])
        h = torch.sigmoid(prediction[..., 3])
        #-----------------------------------------------#
        #   Objectness confidence (is there an object?), 0 - 1
        #-----------------------------------------------#
        conf = torch.sigmoid(prediction[..., 4])
        #-----------------------------------------------#
        #   Class confidence, 0 - 1
        #-----------------------------------------------#
        pred_cls = torch.sigmoid(prediction[..., 5:])

        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
        #----------------------------------------------------------#
        #   Build the grid: top-left corner of each cell,
        #   the reference point for the anchor center
        #   batch_size,3,20,20
        #   range(20)
        #   [
        #       [0, 1, 2, 3 ..., 19],
        #       [0, 1, 2, 3 ..., 19],
        #       ... (20 rows)
        #       [0, 1, 2, 3 ..., 19]
        #   ] * (batch_size * 3)
        #   [batch_size, 3, 20, 20]
        #
        #   [
        #       [0, 1, 2, 3 ..., 19],
        #       [0, 1, 2, 3 ..., 19],
        #       ... (20 rows)
        #       [0, 1, 2, 3 ..., 19]
        #   ].T * (batch_size * 3)
        #   [batch_size, 3, 20, 20]
        #----------------------------------------------------------#
        # [4, 3, 20, 20]
        grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat(
            batch_size * len(anchors_mask[2]), 1, 1).view(x.shape).type(FloatTensor)
        # print("grid_x:",grid_x.shape)
        # print("grid_x:",grid_x)
        # [4, 3, 20, 20]
        grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat(
            batch_size * len(anchors_mask[2]), 1, 1).view(y.shape).type(FloatTensor)
        # print("grid_x:",grid_x,len(grid_x),len(grid_x[0]),len(grid_x[0][0]),len(grid_x[0][0][0]))
        # print("grid_y:",grid_y)
        #----------------------------------------------------------#
        #   Expand the anchor widths and heights to the grid layout
        #   batch_size, 3, 20 * 20 => batch_size, 3, 20, 20
        #   batch_size, 3, 20 * 20 => batch_size, 3, 20, 20
        #----------------------------------------------------------#
        anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0]))
        anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1]))
        # print("anchor_w:",anchor_w)
        # print("anchor_w:",anchor_w.shape)
        # print("anchor_h:",anchor_h)
        # print("anchor_h:",anchor_h.shape)
        anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape)
        anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape)
        # print("anchor_w:",anchor_w)
        # print("anchor_w:",anchor_w.shape)
        # print("anchor_h:",anchor_h)
        # print("anchor_h:",anchor_h.shape)
        #----------------------------------------------------------#
        #   Adjust the anchors with the predictions.
        #   First shift the center, offset from the grid point
        #   towards the bottom-right; then adjust width and height.
        #   x 0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5, then + grid_x
        #   y 0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5, then + grid_y
        #   w 0 ~ 1 => 0 ~ 2 => 0 ~ 4, then * anchor_w
        #   h 0 ~ 1 => 0 ~ 2 => 0 ~ 4, then * anchor_h
        #----------------------------------------------------------#
        pred_boxes = FloatTensor(prediction[..., :4].shape)
        pred_boxes[..., 0] = x.data * 2. - 0.5 + grid_x
        pred_boxes[..., 1] = y.data * 2. - 0.5 + grid_y
        pred_boxes[..., 2] = (w.data * 2) ** 2 * anchor_w
        pred_boxes[..., 3] = (h.data * 2) ** 2 * anchor_h

        # Grid cell whose anchors and decoded boxes are drawn
        point_h = 5
        point_w = 5

        # Map everything back to input-image pixels (stride 32)
        box_xy = pred_boxes[..., 0:2].cpu().numpy() * 32
        box_wh = pred_boxes[..., 2:4].cpu().numpy() * 32
        grid_x = grid_x.cpu().numpy() * 32
        grid_y = grid_y.cpu().numpy() * 32
        anchor_w = anchor_w.cpu().numpy() * 32
        anchor_h = anchor_h.cpu().numpy() * 32

        # Left panel: the three anchors centered on the selected grid point
        fig = plt.figure()
        ax = fig.add_subplot(121)
        from PIL import Image
        img = Image.open("street.jpg").resize([640, 640])
        plt.imshow(img, alpha=0.5)
        plt.ylim(-30, 650)
        plt.xlim(-30, 650)
        plt.scatter(grid_x, grid_y)
        plt.scatter(point_h * 32, point_w * 32, c='black')
        plt.gca().invert_yaxis()

        anchor_left = grid_x - anchor_w / 2
        anchor_top = grid_y - anchor_h / 2

        rect1 = plt.Rectangle([anchor_left[0, 0, point_h, point_w],anchor_top[0, 0, point_h, point_w]], \
            anchor_w[0, 0, point_h, point_w],anchor_h[0, 0, point_h, point_w],color="r",fill=False)
        rect2 = plt.Rectangle([anchor_left[0, 1, point_h, point_w],anchor_top[0, 1, point_h, point_w]], \
            anchor_w[0, 1, point_h, point_w],anchor_h[0, 1, point_h, point_w],color="r",fill=False)
        rect3 = plt.Rectangle([anchor_left[0, 2, point_h, point_w],anchor_top[0, 2, point_h, point_w]], \
            anchor_w[0, 2, point_h, point_w],anchor_h[0, 2, point_h, point_w],color="r",fill=False)

        ax.add_patch(rect1)
        ax.add_patch(rect2)
        ax.add_patch(rect3)

        # Right panel: the three decoded boxes for the same grid point
        ax = fig.add_subplot(122)
        plt.imshow(img, alpha=0.5)
        plt.ylim(-30, 650)
        plt.xlim(-30, 650)
        plt.scatter(grid_x, grid_y)
        plt.scatter(point_h * 32, point_w * 32, c='black')
        plt.scatter(box_xy[0, :, point_h, point_w, 0], box_xy[0, :, point_h, point_w, 1], c='r')
        plt.gca().invert_yaxis()

        pre_left = box_xy[...,0] - box_wh[...,0] / 2
        pre_top = box_xy[...,1] - box_wh[...,1] / 2

        rect1 = plt.Rectangle([pre_left[0, 0, point_h, point_w], pre_top[0, 0, point_h, point_w]],\
            box_wh[0, 0, point_h, point_w,0], box_wh[0, 0, point_h, point_w,1],color="r",fill=False)
        rect2 = plt.Rectangle([pre_left[0, 1, point_h, point_w], pre_top[0, 1, point_h, point_w]],\
            box_wh[0, 1, point_h, point_w,0], box_wh[0, 1, point_h, point_w,1],color="r",fill=False)
        rect3 = plt.Rectangle([pre_left[0, 2, point_h, point_w], pre_top[0, 2, point_h, point_w]],\
            box_wh[0, 2, point_h, point_w,0], box_wh[0, 2, point_h, point_w,1],color="r",fill=False)

        ax.add_patch(rect1)
        ax.add_patch(rect2)
        ax.add_patch(rect3)

        #plt.show()
    #
    feat = torch.from_numpy(np.random.normal(0.2, 0.5, [4, 255, 20, 20])).float()
    anchors = np.array([[116, 90], [156, 198], [373, 326], [30,61], [62,45], [59,119], [10,13], [16,30], [33,23]])
    anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    get_anchors_and_decode(feat, [640, 640], anchors, anchors_mask, 80)