Anchor Boxes: A First Implementation

Latest recommended article published on 2023-12-22 16:40:25

不再懒惰

Tags: deep learning, object detection, computer vision

Article link: https://blog.csdn.net/qq_59841463/article/details/127802711

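Hi everyone, I'm A-Lin. To learn object detection, the first thing to learn is how anchor boxes are generated. I hope we can work through the basics together. I am still learning, so please point out any mistakes you find.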

%matplotlib inline 
import torch
from d2l import torch as d2l

torch.set_printoptions(2)

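## Some PyTorch basics that the anchor box code relies on.

## Daily basics

#torch.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, profile=None, sci_mode=None)

precision sets the number of digits shown after the decimal point. threshold sets the total number of elements above which the printed output is summarized with an ellipsis (the data itself is not discarded). In practice the function is mostly used to set the precision.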
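
To see the effect of precision, here is a small sketch. The tensor values are made up for illustration; profile="default" is used only to reset the printing options before comparing.

```python
import torch

x = torch.tensor([3.14159265, 2.71828183])  # illustrative values

# Reset to PyTorch's default printing options (4 digits after the decimal point).
torch.set_printoptions(profile="default")
full = str(x)
print(full)

# Same call as in this article: keep only 2 digits after the decimal point.
torch.set_printoptions(precision=2)
short = str(x)
print(short)
```

Only the printed text changes; the values stored in the tensor are untouched.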

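#torch.meshgrid(*tensors, indexing=None)

Generates coordinate grids. It is usually given 1-D tensors of evenly spaced values, one per axis, and returns one grid per input tensor.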
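
A minimal sketch of what meshgrid returns, using made-up coordinates. It assumes PyTorch 1.10 or newer, where the indexing argument is available; "ij" gives the row-major layout that matches the legacy default.

```python
import torch

rows = torch.tensor([0.25, 0.75])       # 2 y coordinates (illustrative)
cols = torch.tensor([0.1, 0.5, 0.9])    # 3 x coordinates (illustrative)

# With indexing="ij" both outputs have shape (len(rows), len(cols)).
grid_y, grid_x = torch.meshgrid(rows, cols, indexing="ij")

print(grid_y.shape, grid_x.shape)
print(grid_y)  # every row holds one y value repeated
print(grid_x)  # every row holds all x values

# Flattening both grids pairs every y with every x, one pair per grid cell.
points = torch.stack((grid_x.reshape(-1), grid_y.reshape(-1)), dim=1)
print(points.shape)
```

Flattening the two grids is how a list of per-pixel center coordinates can be built.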

C = torch.cat( (A,B),0 ) # concatenate along dim 0 (stacked vertically, more rows)

C = torch.cat( (A,B),1 ) # concatenate along dim 1 (side by side, more columns)
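
The two lines above can be checked quickly by looking at the resulting shapes. A and B here are placeholder tensors chosen only for illustration.

```python
import torch

A = torch.zeros(2, 3)
B = torch.ones(2, 3)

rows_joined = torch.cat((A, B), 0)   # dim 0 grows: 2 + 2 rows
cols_joined = torch.cat((A, B), 1)   # dim 1 grows: 3 + 3 columns

print(rows_joined.shape)  # torch.Size([4, 3])
print(cols_joined.shape)  # torch.Size([2, 6])
```

Unlike stack, cat never adds a dimension; it only lengthens an existing one.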

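#torch.stack(tensors, dim=0)

Joins a sequence of tensors along a new dimension; all tensors in the sequence must have the same shape.

The result of stack has one more dimension than the inputs, and the dim argument is the index at which that new dimension is inserted.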

import torch

a = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

b = torch.tensor([[11, 22, 33], [44, 55, 66], [77, 88, 99]])

c = torch.stack([a, b], 1)

print(a)

print(b)

print(c)

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

tensor([[11, 22, 33],
        [44, 55, 66],
        [77, 88, 99]])

tensor([[[ 1,  2,  3],
         [11, 22, 33]],
 
        [[ 4,  5,  6],
         [44, 55, 66]],
 
        [[ 7,  8,  9],
         [77, 88, 99]]])

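The repeat() method in PyTorch tiles a tensor, repeating it as a whole block.

With two arguments: (number of copies along dim 0, the row direction; number of copies along dim 1, the column direction). A value of 1 means no repetition along that dimension.

With three arguments: (number of copies along the channel dimension, number of copies along the row direction, number of copies along the column direction).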
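
A short sketch of repeat() with two and three arguments, using a small made-up tensor:

```python
import torch

x = torch.tensor([[1, 2], [3, 4]])   # shape (2, 2)

a = x.repeat(2, 1)      # 2 copies along dim 0, 1 along dim 1 -> shape (4, 2)
b = x.repeat(1, 3)      # 1 copy along dim 0, 3 along dim 1  -> shape (2, 6)
c = x.repeat(2, 1, 1)   # a new leading dimension with 2 copies -> shape (2, 2, 2)

print(a)
print(b)
print(c.shape)
```

Note that the whole tensor is copied as a block: in a, the rows come out as [1, 2], [3, 4], [1, 2], [3, 4].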

#torch.repeat_interleave(input, repeats, dim=None)

x = torch.tensor([1, 2, 3])

x.repeat_interleave(2)

tensor([1, 1, 2, 2, 3, 3])

y = torch.tensor([[1, 2], [3, 4]])

torch.repeat_interleave(y, 2)

tensor([1, 1, 2, 2, 3, 3, 4, 4])

torch.repeat_interleave(y,3,0)

tensor([[1, 2],
        [1, 2],
        [1, 2],
        [3, 4],
        [3, 4],
        [3, 4]])
torch.repeat_interleave(y, 3, dim=1)

tensor([[1, 1, 1, 2, 2, 2],
        [3, 3, 3, 4, 4, 4]])


torch.repeat_interleave(y, torch.tensor([1, 2]), dim=0)
tensor([[1, 2],
        [3, 4],
        [3, 4]])
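
The difference in ordering between repeat_interleave() and repeat() matters when two tensors have to line up row by row. Here is a toy sketch with made-up numbers: each "center" must be combined with every "offset".

```python
import torch

centers = torch.tensor([[10], [20]])      # 2 hypothetical centers
offsets = torch.tensor([[1], [2], [3]])   # 3 hypothetical offsets per center

# Each center repeated 3 times in a row: 10, 10, 10, 20, 20, 20
centers_rep = centers.repeat_interleave(3, dim=0)
# The whole block of offsets tiled twice: 1, 2, 3, 1, 2, 3
offsets_rep = offsets.repeat(2, 1)

combined = centers_rep + offsets_rep
print(combined.flatten().tolist())
```

Every center ends up paired with every offset, grouped by center. Swapping the two calls would misalign the rows.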

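torch.squeeze()

This removes dimensions of size 1. For example, a tensor of shape (1,2,1,3) becomes (2,3): every dimension of size 1 is dropped. The opposite operation, unsqueeze(), inserts a new dimension of size 1; the anchor box function below uses unsqueeze(0) to add a batch dimension.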
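
A quick shape check, with placeholder tensors, that covers squeeze() as well as its counterpart unsqueeze():

```python
import torch

x = torch.zeros(1, 2, 1, 3)

all_squeezed = x.squeeze()      # drop every size-1 dimension
first_squeezed = x.squeeze(0)   # drop only dim 0 (it has size 1)

y = torch.zeros(5, 4)
batched = y.unsqueeze(0)        # insert a new size-1 dimension at index 0

print(all_squeezed.shape)    # torch.Size([2, 3])
print(first_squeezed.shape)  # torch.Size([2, 1, 3])
print(batched.shape)         # torch.Size([1, 5, 4])
```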

## With the basics covered, we can return to anchor boxes.


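def multibox_prior(data,sizes,ratios):
    """Generate anchor boxes with different shapes centered on each pixel."""
    in_height,in_width = data.shape[-2:]
    # data has shape (batch, channels, height, width); print data.shape to check.
    # sizes are the anchor scales relative to the image, ratios are the width-to-height aspect ratios.
    device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)
    # sizes and ratios are plain sequences (list or tuple); the line above only reads their lengths.
    boxes_per_pixel = (num_sizes + num_ratios - 1)
    # Using every sizes x ratios combination would give too many anchors to compute with,
    # so each pixel gets only n + m - 1 anchors.
    size_tensor = torch.tensor(sizes,device=device)
    ratio_tensor = torch.tensor(ratios,device=device)
    # To move the anchor point to the center of a pixel, an offset is needed.
    # A pixel has height 1 and width 1, so the center is offset by 0.5.
    offset_h,offset_w = 0.5,0.5
    steps_h = 1.0 / in_height # scaled step along the y axis
    steps_w = 1.0 / in_width # scaled step along the x axis
    # Scaling the steps normalizes all coordinates to the range [0, 1].
    # The results are the pixel centers, which are also the anchor box centers.
    center_h = (torch.arange(in_height,device=device)+offset_h)*steps_h
    center_w = (torch.arange(in_width,device=device)+offset_w)*steps_w
    
    # Build the grid used to generate the center coordinates.
    # indexing='ij' (PyTorch >= 1.10) matches the legacy default and avoids a deprecation warning.
    shift_y,shift_x = torch.meshgrid(center_h,center_w,indexing='ij')

    shift_y,shift_x = shift_y.reshape(-1),shift_x.reshape(-1) # flatten both grids to 1-D
 
    # To inspect the grid produced above, it can be plotted with matplotlib:
    # d2l.plt.plot(shift_x, shift_y, marker='.', color='red', linestyle="none", markersize=0.1)
    # d2l.plt.show()

    # Generate the "n+m-1" widths and heights:
    # all sizes paired with ratios[0], then sizes[0] paired with the remaining ratios.
    # The in_height / in_width factor compensates for non-square inputs.
    w = torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),sizes[0] * torch.sqrt(ratio_tensor[1:])))* in_height / in_width
    h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),sizes[0] /torch.sqrt(ratio_tensor[1:])))


    # Half-width and half-height offsets (-w, -h, w, h) / 2 for every anchor shape,
    # tiled once per pixel; adding them to a center gives (xmin, ymin, xmax, ymax).
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(in_height * in_width, 1) / 2

    # Center coordinates of every grid cell. The (x, y) pair appears twice because one copy
    # receives the negative half sizes (top-left corner) and the other the positive ones (bottom-right corner).
    # Each center is repeated boxes_per_pixel times so that it lines up with the offsets above.
    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y], dim=1).repeat_interleave(boxes_per_pixel, dim = 0)
    
    
    output = out_grid + anchor_manipulations
    
    # Add a batch dimension: (1, number of anchors, 4)
    return output.unsqueeze(0)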
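
To make the "n+m-1" widths and heights concrete, here is a small worked sketch. It assumes the d2l convention that every size is paired with ratios[0] and that sizes[0] is paired with each remaining ratio, which is what the n+m-1 count implies. The 4 x 4 feature map is a hypothetical input chosen so that the in_height / in_width factor equals 1.

```python
import torch

sizes, ratios = (0.75, 0.5, 0.25), (1, 2, 0.5)
in_height, in_width = 4, 4   # hypothetical square feature map

size_tensor = torch.tensor(sizes)
ratio_tensor = torch.tensor(ratios)

# All sizes with ratios[0], then sizes[0] with the remaining ratios.
w = torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),
               sizes[0] * torch.sqrt(ratio_tensor[1:]))) * in_height / in_width
h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),
               sizes[0] / torch.sqrt(ratio_tensor[1:])))

# The (size, ratio) pair behind each of the 5 anchor shapes, in order.
pairs = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]
for (s, r), wi, hi in zip(pairs, w.tolist(), h.tolist()):
    print(f"s={s}, r={r}: w={wi:.3f}, h={hi:.3f}")

# Total number of anchors for this feature map.
num_anchors = in_height * in_width * len(w)
print(num_anchors)
```

For a fixed size s, changing the ratio keeps the normalized area w * h at s squared and only changes the shape, so r=2 gives a box twice as wide as it is tall.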

img = d2l.plt.imread('./pytorch/img/catdog.jpg')
# imread returns an array of shape (height, width, channels), so the channel dimension comes last
h,w = img.shape[:2]
print(h,w)
X = torch.rand(size=(1,3,h,w))

Y = multibox_prior(X,sizes=(0.75,0.5,0.25),ratios=[1,2,0.5])

Y

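# Reshape so that boxes[i, j, k, :] holds the corner coordinates (xmin, ymin, xmax, ymax) of the k-th anchor centered on pixel (i, j).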
boxes = Y.reshape(h,w,5,4)
boxes[250,250,0,:]
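
One way to sanity-check a box such as the one above is to convert its corners back into a center and a size. For the anchor at pixel (250, 250), the center should come out as ((250 + 0.5) / w, (250 + 0.5) / h). The helper below is a hypothetical name introduced for this sketch, and the box values are made up.

```python
import torch

def corner_to_center(box):
    """Hypothetical helper: (xmin, ymin, xmax, ymax) -> (cx, cy, width, height)."""
    xmin, ymin, xmax, ymax = box
    return torch.stack(((xmin + xmax) / 2, (ymin + ymax) / 2,
                        xmax - xmin, ymax - ymin))

box = torch.tensor([0.3, 0.4, 0.7, 0.6])   # illustrative normalized corners
center_form = corner_to_center(box)
print(center_form)  # approximately [0.5, 0.5, 0.4, 0.2]
```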

def show_bboxes(axes,bboxes,labels=None,colors=None):
    """Display all bounding boxes."""
    def _make_list(obj,default_values=None):
        if obj is None:
            obj = default_values
        elif not isinstance(obj,(list,tuple)):
            obj = [obj]
        return obj
    
    labels = _make_list(labels)
    print(labels)
    colors = _make_list(colors,['b', 'g', 'r', 'm', 'c'])
    print(colors)
    for i,bbox in enumerate(bboxes):
        color = colors[i % len(colors)]
        rect = d2l.bbox_to_rect(bbox.detach().numpy(), color)
        axes.add_patch(rect)
        if labels and len(labels) > i:
            text_color = 'k' if color == 'w' else 'w'
            axes.text(rect.xy[0], rect.xy[1], labels[i],
                      va='center', ha='center', fontsize=9, color=text_color,
                      bbox=dict(facecolor=color, lw=0))

d2l.set_figsize()
bbox_scale = torch.tensor((w, h, w, h))
fig = d2l.plt.imshow(img)

show_bboxes(fig.axes, boxes[250, 250, :, :] * bbox_scale,
            ['s=0.75, r=1', 's=0.5, r=1', 's=0.25, r=1', 's=0.75, r=2',
             's=0.75, r=0.5'])
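
The multiplication by bbox_scale converts normalized coordinates back to pixels before drawing. The sketch below shows that step on its own, followed by the conversion to the (x, y, width, height) form that a matplotlib rectangle expects. The image size and the box are hypothetical values for illustration.

```python
import torch

h, w = 561, 728   # hypothetical image height and width in pixels
bbox_scale = torch.tensor((w, h, w, h))

box = torch.tensor([0.25, 0.25, 0.75, 0.75])   # normalized (xmin, ymin, xmax, ymax)
pixel_box = box * bbox_scale                    # x values scale by w, y values by h

xmin, ymin, xmax, ymax = pixel_box.tolist()
rect = (xmin, ymin, xmax - xmin, ymax - ymin)   # top-left corner, width, height
print(pixel_box)
print(rect)
```

The order (w, h, w, h) matters: x coordinates are scaled by the image width and y coordinates by the image height.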

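If the code above is still unclear, try printing the variables one step at a time; it really helps with understanding. That is how I, A-Lin, worked through it myself.

The final result is five anchor boxes drawn on the image, all centered on pixel (250, 250).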