Faster R-CNN Source Code Walkthrough: RPN


Jane_JU


Category: FasterRCNN. Tags: python, object detection, artificial intelligence

Original link: https://blog.csdn.net/xinjieyuan/article/details/105205326


def grid_anchors(self, grid_sizes, strides):
    # type: (List[List[int]], List[List[Tensor]]) -> List[Tensor]
    """
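    Map anchor positions from grid (feature-map) coordinates back onto the original image.
    Computes the coordinates, on the original image, of all anchors for each predicted feature map.
    Args:
        grid_sizes: height and width of each predicted feature map
        strides: stride on the original image that corresponds to one step on each predicted feature map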
    """
    anchors = []
    cell_anchors = self.cell_anchors
    assert cell_anchors is not None

    # Iterate over the grid size, strides and cell_anchors of each predicted feature level
    for size, stride, base_anchors in zip(grid_sizes, strides, cell_anchors):
        grid_height, grid_width = size
        stride_height, stride_width = stride
        device = base_anchors.device

        # For output anchor, compute [x_center, y_center, x_center, y_center]
        # shape: [grid_width], x coordinates (columns) on the original image
        shifts_x = torch.arange(0, grid_width, dtype=torch.float32, device=device) * stride_width
        # shape: [grid_height], y coordinates (rows) on the original image
        shifts_y = torch.arange(0, grid_height, dtype=torch.float32, device=device) * stride_height

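        # Compute the original-image coordinate of every feature-map cell
        # (these are the offsets applied to the anchor templates).
        # torch.meshgrid takes the row coordinates and the column coordinates and returns
        # a grid of row coordinates and a grid of column coordinates.
        # (The default indexing is "ij"; newer PyTorch versions warn unless indexing="ij" is passed.)
        # shape: [grid_height, grid_width]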
        shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
        shift_x = shift_x.reshape(-1)
        shift_y = shift_y.reshape(-1)

        # Offsets on the original image for the anchor coordinates (xmin, ymin, xmax, ymax)
        # shape: [grid_width*grid_height, 4]
        shifts = torch.stack([shift_x, shift_y, shift_x, shift_y], dim=1)

        # For every (base anchor, output anchor) pair,
        # offset each zero-centered base anchor by the center of the output anchor.
        # Add the anchor templates to the offsets to get every anchor on the original image
        # (the shapes differ, so broadcasting applies)
        shifts_anchor = shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)
        anchors.append(shifts_anchor.reshape(-1, 4))

    return anchors  # List[Tensor(all_num_anchors, 4)]
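
To make the shapes in grid_anchors concrete, here is a minimal standalone sketch of the same shift-and-broadcast steps for a single feature level. The grid size, stride and the two anchor templates are made-up toy values, not taken from any real model configuration.

```python
import torch

# Hypothetical toy values, chosen only for illustration.
grid_height, grid_width = 2, 3
stride_height, stride_width = 16, 16

# Two zero-centered anchor templates in (xmin, ymin, xmax, ymax) form.
base_anchors = torch.tensor([[-8.0, -8.0, 8.0, 8.0],
                             [-16.0, -8.0, 16.0, 8.0]])

shifts_x = torch.arange(0, grid_width, dtype=torch.float32) * stride_width
shifts_y = torch.arange(0, grid_height, dtype=torch.float32) * stride_height

# indexing="ij" (available from PyTorch 1.10) spells out the row-major default.
shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x, indexing="ij")
shift_x = shift_x.reshape(-1)
shift_y = shift_y.reshape(-1)
shifts = torch.stack([shift_x, shift_y, shift_x, shift_y], dim=1)

# [6, 1, 4] + [1, 2, 4] broadcasts to [6, 2, 4], which flattens to [12, 4].
anchors = (shifts.view(-1, 1, 4) + base_anchors.view(1, -1, 4)).reshape(-1, 4)

print(shifts.shape)         # torch.Size([6, 4])
print(anchors.shape)        # torch.Size([12, 4])
print(anchors[2].tolist())  # first template moved to grid cell (row 0, col 1)
```

The anchors come out ordered cell by cell, with all templates for one cell listed before the next cell starts.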

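.reshape(-1) flattens a tensor into a 1-D tensor of shape [N]; .reshape(1, -1) lays the same elements out as a single row, a 2-D tensor of shape [1, N].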
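
A quick sketch of the difference between the two reshape calls, using a small made-up tensor:

```python
import torch

x = torch.arange(6).reshape(2, 3)

flat = x.reshape(-1)    # 1-D tensor
row = x.reshape(1, -1)  # 2-D tensor with a single row

print(flat.shape)     # torch.Size([6])
print(row.shape)      # torch.Size([1, 6])
print(flat.tolist())  # [0, 1, 2, 3, 4, 5]
```

Both hold the same elements in the same order; only the number of dimensions differs.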
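PyTorch has two main functions for joining tensors:
stack()
cat()
What stack() does: it preserves two kinds of information, [1. the sequence order] and [2. each tensor's matrix structure]. It is an "expand, then concatenate" function.

An intuitive picture: if every input is a 2-D matrix (a plane), stack() lines those planes up along a third dimension (for example, a time axis) to form a 3-D block whose length along that axis equals the number of time steps.

This function shows up frequently in natural language processing (NLP) and in convolutional neural networks for computer vision (CV).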
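
The "expand, then concatenate" description can be checked directly: stacking gives the same result as adding a new axis to each input with unsqueeze() and then concatenating along that axis. The tensors here are arbitrary placeholders.

```python
import torch

a = torch.zeros(3, 3)
b = torch.ones(3, 3)

stacked = torch.stack([a, b], dim=0)

# Same thing done by hand: add a leading axis to each input, then concatenate.
expanded = torch.cat([a.unsqueeze(0), b.unsqueeze(0)], dim=0)

print(stacked.shape)                   # torch.Size([2, 3, 3])
print(torch.equal(stacked, expanded))  # True
```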

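1. stack()
Official description: concatenates a sequence of tensors along a new dimension. All tensors in the sequence must have the same shape.

In plain terms: several 2-D tensors are combined into one 3-D tensor, several 3-D tensors into one 4-D tensor, and so on. The inputs are stacked along a newly added dimension.

outputs = torch.stack(inputs, dim=?) → Tensor
Parameters

inputs: the sequence of tensors to join.
Note: pass the tensors as a Python list or tuple.

dim: the index of the new dimension; it must satisfy 0 <= dim < outputs.dim() (negative indices from -outputs.dim() to -1 are also accepted).
Note: outputs.dim() is the number of dimensions of the result, which is one more than that of each input. It is not len(outputs), which returns the size of the first dimension.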

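2. Key points
inputs must be a sequence, and every tensor in it must have the same shape.
----Example: [tensor_1, tensor_2, …] or (tensor_1, tensor_2, …), with tensor_1.shape == tensor_2.shape

dim selects where the new dimension is inserted and must satisfy 0 <= dim < outputs.dim(), where outputs.dim() is the number of dimensions of the output tensor.
If this is unclear, work through the example and then come back to it.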
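
The two constraints can be probed with a short sketch. The shapes are made up; the point is that the valid range for dim follows the number of dimensions of the result (its .dim()), not len() of the result, and that mismatched input shapes are rejected.

```python
import torch

a = torch.zeros(4, 5)
b = torch.zeros(4, 5)

out = torch.stack([a, b], dim=0)
print(out.shape)  # torch.Size([2, 4, 5])
print(out.dim())  # 3, the number of dimensions
print(len(out))   # 2, only the size of the first dimension

# Negative indices count from the end: dim=-1 appends the new axis last.
neg = torch.stack([a, b], dim=-1)
print(neg.shape)  # torch.Size([4, 5, 2])

# Inputs with different shapes cannot be stacked.
try:
    torch.stack([a, torch.zeros(3, 5)], dim=0)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = err
print(type(mismatch_error).__name__)  # RuntimeError
```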

3. Example
1. Prepare two tensors, each of shape [3, 3]

Suppose this is the output at time step T1

T1 = torch.tensor([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

Suppose this is the output at time step T2

T2 = torch.tensor([[10, 20, 30],
                   [40, 50, 60],
                   [70, 80, 90]])

2. Test the stack function

print(torch.stack((T1,T2),dim=0).shape)
print(torch.stack((T1,T2),dim=1).shape)
print(torch.stack((T1,T2),dim=2).shape)
print(torch.stack((T1,T2),dim=3).shape)

outputs:

torch.Size([2, 3, 3])
torch.Size([3, 2, 3])
torch.Size([3, 3, 2])
'dim=3 is not smaller than outputs.dim() (which is 3), so the last call raises an error'
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)

You can copy the code and run it: the shape of the stacked tensor changes with dim.

dim	shape
0	[2, 3, 3]
1	[3, 2, 3]
2	[3, 3, 2]
3	out of range (IndexError)
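
Beyond the shapes, it can help to look at where the elements land for each dim. This sketch reuses the T1 and T2 values from the example above:

```python
import torch

T1 = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
T2 = torch.tensor([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

s0 = torch.stack((T1, T2), dim=0)  # whole matrices placed one after another
s1 = torch.stack((T1, T2), dim=1)  # matching rows paired up
s2 = torch.stack((T1, T2), dim=2)  # matching elements paired up

print(torch.equal(s0[0], T1))  # True
print(s1[0].tolist())          # [[1, 2, 3], [10, 20, 30]]
print(s2[0, 0].tolist())       # [1, 10]
```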

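4. Summary
What the function does:
stack() joins the tensors in a sequence along a new dimension. The caller chooses where that dimension goes, and the choice must lie within the dimension range of the resulting tensor.

Why it exists:
In natural language processing and convolutional neural networks, stack() is typically used when both the [sequence (ordering) information] and the [matrix structure of each tensor] need to be preserved.

Why does it exist in practice?

If you have written an RNN by hand, you know that the per-step outputs are collected in a list holding seq_len tensors of shape [batch_size, output_size]. A list is awkward to compute with, so stack() is used to join them, preserving both [1. the seq_len time-step axis] and [2. each tensor's [batch_size, output_size] structure].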
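
The RNN case might look like the following sketch. The sizes are arbitrary, and random tensors stand in for the per-step outputs of a real recurrent cell.

```python
import torch

seq_len, batch_size, output_size = 5, 2, 4

# Stand-in for the outputs a hand-written RNN loop would collect, one per time step.
step_outputs = [torch.randn(batch_size, output_size) for _ in range(seq_len)]

# Time-major layout: [seq_len, batch_size, output_size]
time_major = torch.stack(step_outputs, dim=0)

# Batch-first layout: [batch_size, seq_len, output_size]
batch_first = torch.stack(step_outputs, dim=1)

print(time_major.shape)   # torch.Size([5, 2, 4])
print(batch_first.shape)  # torch.Size([2, 5, 4])
```

Either way the time-step order is preserved along the new axis, and each step's [batch_size, output_size] matrix stays intact.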

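torch.cat() joins multiple tensors along an existing dimension. In practice its use cases differ from those of torch.stack(), described above; the rest of this section is about cat().

Python itself has no built-in cat() function; the closest analogues are list concatenation with + or numpy.concatenate(). torch.cat() serves the same purpose, but operates on tensors.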

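1. cat()
Purpose: concatenates the given sequence of tensors along a given dimension.

outputs = torch.cat(inputs, dim=?) → Tensor
Parameters
inputs: the sequence of tensors to concatenate; a Python sequence (list or tuple) of tensors, normally of the same dtype
dim: the existing dimension along which to concatenate; it must satisfy 0 <= dim < inputs[0].dim() (negative indices are also accepted). Unlike stack(), no new dimension is added.
2. Key points
The input must be a sequence of tensors whose shapes match in every dimension except dim.
dim cannot fall outside the dimension range of the input tensors.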
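
Note that torch.cat() only needs the shapes to agree outside the concatenation dimension. A small sketch with made-up shapes:

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(4, 3)

# Row counts differ, but dim=0 is the dimension being joined, so this works.
rows = torch.cat([a, b], dim=0)
print(rows.shape)  # torch.Size([6, 3])

# Along dim=1 the row counts (2 vs 4) would have to match, so this fails.
try:
    torch.cat([a, b], dim=1)
    cat_error = None
except RuntimeError as err:
    cat_error = err
print(type(cat_error).__name__)  # RuntimeError
```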
3. Example
1. Prepare the data, each tensor of shape [2, 3]

# x1
x1 = torch.tensor([[11,21,31],[21,31,41]],dtype=torch.int)
x1.shape # torch.Size([2, 3])
# x2
x2 = torch.tensor([[12,22,32],[22,32,42]],dtype=torch.int)
x2.shape  # torch.Size([2, 3])

2. Build inputs

'inputs holds 2 matrices of shape [2, 3]'
inputs = [x1, x2]
print(inputs)
'printed output'
[tensor([[11, 21, 31],
         [21, 31, 41]], dtype=torch.int32),
 tensor([[12, 22, 32],
         [22, 32, 42]], dtype=torch.int32)]

3. Check the result for different values of dim

In [1]: torch.cat(inputs, dim=0).shape
Out[1]: torch.Size([4, 3])

In [2]: torch.cat(inputs, dim=1).shape
Out[2]: torch.Size([2, 6])

In [3]: torch.cat(inputs, dim=2).shape
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

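Copy the code and run it, and the pattern becomes clear.

Summary
torch.cat() joins tensors along an existing dimension; a common use is concatenating tensors that were themselves produced by torch.stack().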
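
As a rough illustration of combining the two functions, the sketch below builds two stacked chunks and then joins them with cat(). The chunk sizes are arbitrary, and constant tensors stand in for real data.

```python
import torch

# Two chunks built with stack(): 3 steps and 5 steps of [2, 4] matrices.
chunk_a = torch.stack([torch.zeros(2, 4) for _ in range(3)], dim=0)  # [3, 2, 4]
chunk_b = torch.stack([torch.ones(2, 4) for _ in range(5)], dim=0)   # [5, 2, 4]

# cat() extends the existing first axis instead of adding a new one.
full = torch.cat([chunk_a, chunk_b], dim=0)

print(full.shape)  # torch.Size([8, 2, 4])
print(full.dim())  # 3, the same number of dimensions as each chunk
```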
