Faster R-CNN

Article link: https://blog.csdn.net/SEREBA/article/details/123332196

1. Convolutional layers

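Every convolutional layer uses kernel_size=3, pad=1, stride=1, so convolution leaves the spatial size of the feature map unchanged.
Every pooling layer uses kernel_size=2, pad=0, stride=2, so each pooling step halves the width and height.
With VGG as the backbone, the convolutional layers output 512-channel feature maps (the 256-d figure applies to the ZF backbone).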
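The size arithmetic behind these settings can be checked with a few lines of Python. This is a minimal sketch, not code from any Faster R-CNN implementation: the helper names are made up for illustration, floor rounding is assumed (some frameworks round pooling output up), and four conv+pool stages are assumed because that is what yields the 1/16 downsampling discussed later.

```python
def out_size(size: int, kernel: int, pad: int, stride: int) -> int:
    """Spatial output size of a conv or pooling layer (floor rounding assumed)."""
    return (size + 2 * pad - kernel) // stride + 1


def conv3x3(size: int) -> int:
    # kernel_size=3, pad=1, stride=1: the size is unchanged
    return out_size(size, kernel=3, pad=1, stride=1)


def pool2x2(size: int) -> int:
    # kernel_size=2, pad=0, stride=2: the size is halved
    return out_size(size, kernel=2, pad=0, stride=2)


size = 800
for _ in range(4):  # assumption: four conv+pool stages
    size = pool2x2(conv3x3(size))
print(size)  # 50, i.e. 800 / 16
```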

2. RPN

(1) Anchor generation

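Anchor generation takes three parameters: base_anchor_size (equal to the stride length) = 16, anchor_ratios = (0.5, 1.0, 2.0), and anchor_scale = (8, 16, 32).
The first anchor has coordinates (0, 0, base_anchor_size-1=15, base_anchor_size-1=15), given as (top-left x, top-left y, bottom-right x, bottom-right y). It is therefore 16 wide and 16 high, with its center at (7.5, 7.5).
Applying the three ratios 0.5, 1.0, 2.0 (read here as width/height) while keeping the area close to 16 × 16 gives three base anchors: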

ratio	center x	center y	width	height
2.0	7.5	7.5	23	12
1.0	7.5	7.5	16	16
0.5	7.5	7.5	11	22

Each of the three aspect-ratio anchors is then scaled by the three scales, giving 3 × 3 = 9 anchors:

scale	ratio	center x	center y	width	height
8	2.0	7.5	7.5	184	96
16	2.0	7.5	7.5	368	192
32	2.0	7.5	7.5	736	384
8	1.0	7.5	7.5	128	128
16	1.0	7.5	7.5	256	256
32	1.0	7.5	7.5	512	512
8	0.5	7.5	7.5	88	176
16	0.5	7.5	7.5	176	352
32	0.5	7.5	7.5	352	704
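The two tables can be reproduced with a short NumPy sketch. It assumes that ratio means width/height (the convention the tables follow) and that the area is held close to 16 × 16 before rounding. The function name and signature are illustrative and not taken from a particular codebase.

```python
import numpy as np


def generate_anchors(base_size=16, ratios=(2.0, 1.0, 0.5), scales=(8, 16, 32)):
    """Return a (len(ratios) * len(scales), 4) array of (x1, y1, x2, y2) anchors."""
    ctr = (base_size - 1) / 2.0          # 7.5 for base_size = 16
    area = float(base_size * base_size)  # 256
    anchors = []
    for r in ratios:                     # assumption: r = width / height
        w = np.round(np.sqrt(area * r))  # 23, 16, 11
        h = np.round(w / r)              # 12, 16, 22
        for s in scales:
            ws, hs = w * s, h * s
            anchors.append([ctr - (ws - 1) / 2, ctr - (hs - 1) / 2,
                            ctr + (ws - 1) / 2, ctr + (hs - 1) / 2])
    return np.array(anchors)


anchors = generate_anchors()
widths = anchors[:, 2] - anchors[:, 0] + 1
heights = anchors[:, 3] - anchors[:, 1] + 1
print(np.stack([widths, heights], axis=1))
```

The printed (width, height) pairs follow the row order of the second table, and every anchor stays centered on (7.5, 7.5).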

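If the feature map were $800 \times 600$, this would produce $800 \times 600 \times 9 = 4{,}320{,}000$ anchors. In practice 800 × 600 is closer to the size of the input image: at a stride of 16 the feature map is only about $50 \times 38$, which gives roughly $50 \times 38 \times 9 = 17{,}100$ anchors.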
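To see where the anchor count comes from, the nine base anchors can be copied to every feature-map cell by adding a per-cell offset. The sketch below assumes a stride of 16 and a feature map of 50 × 38 cells (roughly what an 800 × 600 image yields at that stride). `shift_anchors` is a hypothetical helper name, and the base anchors are placeholders because only the count matters here.

```python
import numpy as np


def shift_anchors(base_anchors, feat_h, feat_w, stride=16):
    """Replicate (A, 4) base anchors over a feat_h x feat_w grid -> (feat_h * feat_w * A, 4)."""
    shift_x = np.arange(feat_w) * stride
    shift_y = np.arange(feat_h) * stride
    sx, sy = np.meshgrid(shift_x, shift_y)  # each of shape (feat_h, feat_w)
    shifts = np.stack([sx.ravel(), sy.ravel(), sx.ravel(), sy.ravel()], axis=1)
    # Broadcast: (K, 1, 4) + (1, A, 4) -> (K, A, 4)
    all_anchors = shifts[:, None, :] + base_anchors[None, :, :]
    return all_anchors.reshape(-1, 4)


base = np.zeros((9, 4))                     # placeholder for the 9 base anchors
all_anchors = shift_anchors(base, feat_h=38, feat_w=50)
print(all_anchors.shape[0])                 # 17100 = 50 * 38 * 9
```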
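The RPN uses these anchors to decide whether each grid cell contains an object: an anchor that covers an object is positive, otherwise it is negative. Every anchor at every cell gets this two-way classification, so for a feature map of size w × h the classification output has $w \times h \times 9 \times 2$ values.
(Because so many anchors are generated, 128 positive and 128 negative anchors are sampled from them for training.)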
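One simple way to do this sampling is to mark the surplus anchors as ignored. The sketch below assumes a label convention of 1 = positive, 0 = negative, -1 = ignored. `sample_anchors` is an illustrative name, and the case where fewer than 128 positives exist is simply left as it is, which real implementations may handle differently.

```python
import numpy as np


def sample_anchors(labels, num_pos=128, num_neg=128, rng=None):
    """Keep at most num_pos positives and num_neg negatives; set the rest to -1."""
    rng = np.random.default_rng() if rng is None else rng
    labels = labels.copy()
    for value, limit in ((1, num_pos), (0, num_neg)):
        idx = np.flatnonzero(labels == value)
        if len(idx) > limit:
            # Randomly pick the surplus anchors and exclude them from training
            drop = rng.choice(idx, size=len(idx) - limit, replace=False)
            labels[drop] = -1
    return labels


labels = np.array([1] * 200 + [0] * 1000 + [-1] * 50)
sampled = sample_anchors(labels, rng=np.random.default_rng(0))
print((sampled == 1).sum(), (sampled == 0).sum())  # 128 128
```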

(2) Anchor adjustment (first bounding-box regression)

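The positive anchors produced above do not match the shape of the actual GT boxes, so each positive anchor is transformed to bring it closer to its GT box.
The RPN computes the translation $(t_x, t_y)$ and the scale factors $(t_w, t_h)$ between a positive anchor and its GT box. For a feature map of size w × h, the regression output therefore has $w \times h \times 9 \times 4$ values.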
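The usual parameterization of these targets normalizes the center offset by the anchor size and takes the logarithm of the size ratio. The sketch below assumes that parameterization and (x1, y1, x2, y2) boxes with inclusive pixel coordinates. `bbox_transform` is an illustrative name.

```python
import numpy as np


def bbox_transform(anchors, gt_boxes):
    """Regression targets (t_x, t_y, t_w, t_h) mapping each anchor to its GT box.

    anchors, gt_boxes: (N, 4) arrays of (x1, y1, x2, y2).
    """
    aw = anchors[:, 2] - anchors[:, 0] + 1.0
    ah = anchors[:, 3] - anchors[:, 1] + 1.0
    ax = (anchors[:, 0] + anchors[:, 2]) / 2.0
    ay = (anchors[:, 1] + anchors[:, 3]) / 2.0

    gw = gt_boxes[:, 2] - gt_boxes[:, 0] + 1.0
    gh = gt_boxes[:, 3] - gt_boxes[:, 1] + 1.0
    gx = (gt_boxes[:, 0] + gt_boxes[:, 2]) / 2.0
    gy = (gt_boxes[:, 1] + gt_boxes[:, 3]) / 2.0

    tx = (gx - ax) / aw   # shift in x, in units of anchor width
    ty = (gy - ay) / ah   # shift in y, in units of anchor height
    tw = np.log(gw / aw)  # log scale factor for width
    th = np.log(gh / ah)  # log scale factor for height
    return np.stack([tx, ty, tw, th], axis=1)


anchor = np.array([[0.0, 0.0, 15.0, 15.0]])
print(bbox_transform(anchor, np.array([[8.0, 0.0, 23.0, 15.0]])))  # shift of half a width in x
```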

In short, the RPN performs binary classification of the anchors and a first fine adjustment of their coordinates.

(3) Selecting suitable anchors

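An anchor whose IoU with a GT box is greater than 0.7 is labeled a positive anchor, and an anchor whose IoU is less than 0.3 is labeled a negative anchor. Anchors with an IoU between 0.3 and 0.7 do not take part in training.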
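The labeling rule can be sketched as a pairwise IoU computation followed by two threshold tests. Both function names are illustrative. Boxes are assumed to be (x1, y1, x2, y2) with inclusive pixel coordinates, and only the two thresholds stated here are implemented.

```python
import numpy as np


def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between (N, 4) anchors and (M, 4) GT boxes -> (N, M)."""
    ax1, ay1, ax2, ay2 = [anchors[:, i][:, None] for i in range(4)]
    gx1, gy1, gx2, gy2 = [gt_boxes[:, i][None, :] for i in range(4)]
    iw = np.clip(np.minimum(ax2, gx2) - np.maximum(ax1, gx1) + 1, 0, None)
    ih = np.clip(np.minimum(ay2, gy2) - np.maximum(ay1, gy1) + 1, 0, None)
    inter = iw * ih
    area_a = (ax2 - ax1 + 1) * (ay2 - ay1 + 1)
    area_g = (gx2 - gx1 + 1) * (gy2 - gy1 + 1)
    return inter / (area_a + area_g - inter)


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """1 = positive, 0 = negative, -1 = ignored during training."""
    max_iou = iou_matrix(anchors, gt_boxes).max(axis=1)
    labels = np.full(len(anchors), -1, dtype=np.int64)
    labels[max_iou < neg_thresh] = 0
    labels[max_iou > pos_thresh] = 1
    return labels


gt = np.array([[0.0, 0.0, 15.0, 15.0]])
candidates = np.array([[0.0, 0.0, 15.0, 15.0],         # IoU 1.0 -> positive
                       [0.0, 0.0, 15.0, 7.0],          # IoU 0.5 -> ignored
                       [100.0, 100.0, 115.0, 115.0]])  # IoU 0.0 -> negative
print(label_anchors(candidates, gt))  # [ 1 -1  0]
```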

3. Proposal layer

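This layer also receives im_info, which contains the image width W, the image height H, and feat_scale. Before entering the convolutional network the image is resized to $M \times N$, and the network then downsamples it further. With VGG, for example, the feature map is 1/16 the size of the input, $(W / 16, H / 16)$, so feat_scale=16.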

(1) Adjust the anchors

The anchors generated by the RPN in the previous step are shifted and resized according to the regression outputs $(t_x, t_y, t_w, t_h)$.
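Applying the regression outputs is the inverse of computing the targets: the center is moved by a fraction of the anchor size and the width and height are multiplied by an exponential factor. A minimal sketch, assuming that parameterization and (x1, y1, x2, y2) boxes with inclusive pixel coordinates. `bbox_transform_inv` is an illustrative name.

```python
import numpy as np


def bbox_transform_inv(anchors, deltas):
    """Apply (t_x, t_y, t_w, t_h) deltas to (N, 4) anchors and return adjusted boxes."""
    aw = anchors[:, 2] - anchors[:, 0] + 1.0
    ah = anchors[:, 3] - anchors[:, 1] + 1.0
    ax = (anchors[:, 0] + anchors[:, 2]) / 2.0
    ay = (anchors[:, 1] + anchors[:, 3]) / 2.0

    px = deltas[:, 0] * aw + ax  # new center x
    py = deltas[:, 1] * ah + ay  # new center y
    pw = np.exp(deltas[:, 2]) * aw  # new width
    ph = np.exp(deltas[:, 3]) * ah  # new height

    return np.stack([px - (pw - 1) / 2, py - (ph - 1) / 2,
                     px + (pw - 1) / 2, py + (ph - 1) / 2], axis=1)


anchor = np.array([[0.0, 0.0, 15.0, 15.0]])
shifted = bbox_transform_inv(anchor, np.array([[0.5, 0.0, 0.0, 0.0]]))
print(shifted)  # the box moves right by half its width
```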

(2) Sort the anchors

The anchors are sorted by positive score in descending order, and the top N are kept with their adjusted sizes, forming the positive anchors.

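(3) Remove some positive anchors

The positive anchors are constrained to the image boundary (they are mapped back to the original image to check whether they cross it), and very small positive anchors are discarded. NMS is then applied to the remaining positive anchors.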
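The three steps of the proposal layer (sort by score, clip and drop small boxes, NMS) can be sketched as below. The default values for `min_size`, `pre_nms_top_n` and `iou_thresh` are assumptions chosen for illustration and do not come from this article. Boundary handling is done by clipping, which is one common way to keep boxes inside the image.

```python
import numpy as np


def nms(boxes, scores, iou_thresh):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        iw = np.maximum(0.0, np.minimum(boxes[i, 2], boxes[rest, 2])
                        - np.maximum(boxes[i, 0], boxes[rest, 0]) + 1)
        ih = np.maximum(0.0, np.minimum(boxes[i, 3], boxes[rest, 3])
                        - np.maximum(boxes[i, 1], boxes[rest, 1]) + 1)
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]  # drop boxes that overlap the kept one too much
    return keep


def filter_proposals(boxes, scores, im_w, im_h,
                     min_size=16, pre_nms_top_n=6000, iou_thresh=0.7):
    order = scores.argsort()[::-1][:pre_nms_top_n]  # sort by score, keep top N
    boxes, scores = boxes[order].copy(), scores[order]
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, im_w - 1)  # clip to image
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, im_h - 1)
    ws = boxes[:, 2] - boxes[:, 0] + 1
    hs = boxes[:, 3] - boxes[:, 1] + 1
    big = (ws >= min_size) & (hs >= min_size)  # remove very small boxes
    boxes, scores = boxes[big], scores[big]
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]


boxes = np.array([[0.0, 0.0, 99.0, 99.0], [5.0, 5.0, 104.0, 104.0],
                  [200.0, 200.0, 299.0, 299.0], [-20.0, -20.0, 50.0, 50.0],
                  [10.0, 10.0, 12.0, 12.0]])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.95])
kept_boxes, kept_scores = filter_proposals(boxes, scores, im_w=300, im_h=300)
print(kept_boxes)
```

In this toy input the 3 × 3 box is dropped for being too small, the second box is suppressed because it overlaps the first heavily, and the box that starts outside the image is clipped to start at (0, 0).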

4. ROI Pooling

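Input: the proposal boxes of different sizes output by the RPN, and the feature maps produced by the convolutional network.
The ROI Pooling layer solves the problem that proposals differ in size while the following layers need a fixed-size input.

Each proposal box is mapped to its corresponding location on the feature map.
The mapped region is divided into $pool_w \times pool_h$ sections.
Max pooling is applied to each section.
After these steps the output size is fixed, giving the proposal feature maps.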
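These steps can be written out for a single proposal in plain NumPy. This is a minimal sketch that assumes a (C, H, W) feature map, a spatial scale of 1/16, and bin boundaries computed with floor and ceil. `roi_max_pool` is an illustrative name and the loop-based form favors readability over speed.

```python
import numpy as np


def roi_max_pool(feature, roi, pool_h=7, pool_w=7, spatial_scale=1.0 / 16):
    """feature: (C, H, W); roi: (x1, y1, x2, y2) in image coordinates.

    Returns a (C, pool_h, pool_w) array regardless of the ROI size.
    """
    C, H, W = feature.shape
    # Step 1: map the proposal onto the feature map (round half up)
    x1, y1, x2, y2 = [int(np.floor(v * spatial_scale + 0.5)) for v in roi]
    roi_w = max(x2 - x1 + 1, 1)
    roi_h = max(y2 - y1 + 1, 1)

    out = np.zeros((C, pool_h, pool_w), dtype=feature.dtype)
    for ph in range(pool_h):
        # Step 2: split the region into pool_h x pool_w sections
        hs = min(max(y1 + int(np.floor(ph * roi_h / pool_h)), 0), H)
        he = min(max(y1 + int(np.ceil((ph + 1) * roi_h / pool_h)), 0), H)
        for pw in range(pool_w):
            ws = min(max(x1 + int(np.floor(pw * roi_w / pool_w)), 0), W)
            we = min(max(x1 + int(np.ceil((pw + 1) * roi_w / pool_w)), 0), W)
            if he > hs and we > ws:
                # Step 3: max pooling inside each section
                out[:, ph, pw] = feature[:, hs:he, ws:we].max(axis=(1, 2))
    return out


feature = np.arange(64, dtype=np.float64).reshape(1, 8, 8)
pooled = roi_max_pool(feature, (0, 0, 112, 112), pool_h=2, pool_w=2)
print(pooled)
```

With the default 7 × 7 setting the same call returns an array of shape (1, 7, 7), whatever the size of the proposal.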
See also: ROI Pooling层详解 (a detailed walkthrough of the ROI Pooling layer)

5. Classification (second bounding-box regression)

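ROI Pooling yields, for every proposal, a proposal feature map with $7 \times 7 = 49$ cells per channel. These proposal feature maps pass through fully connected layers and a softmax to predict the class each proposal belongs to, and the box coordinates are regressed a second time.
Fully connected layers require input and output of fixed shape, and the ROI Pooling step above provides exactly that fixed-size input.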
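The shape bookkeeping of this stage can be illustrated with a toy head. Everything below is an assumption made for the sketch: a single hidden fully connected layer without biases, random weights, 8 channels, 32 hidden units and 21 classes. A real network uses larger layers and trained weights.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def detection_head(pooled, w_fc, w_cls, w_reg):
    """pooled: (R, C, 7, 7) proposal feature maps for R proposals."""
    x = pooled.reshape(pooled.shape[0], -1)  # flatten to (R, C * 7 * 7)
    x = np.maximum(x @ w_fc, 0)              # fully connected layer + ReLU
    cls_prob = softmax(x @ w_cls)            # (R, num_classes)
    bbox_deltas = x @ w_reg                  # (R, 4 * num_classes)
    return cls_prob, bbox_deltas


rng = np.random.default_rng(0)
num_rois, channels, hidden, num_classes = 5, 8, 32, 21  # toy sizes
pooled = rng.standard_normal((num_rois, channels, 7, 7))
w_fc = rng.standard_normal((channels * 7 * 7, hidden)) * 0.01
w_cls = rng.standard_normal((hidden, num_classes)) * 0.01
w_reg = rng.standard_normal((hidden, 4 * num_classes)) * 0.01

cls_prob, bbox_deltas = detection_head(pooled, w_fc, w_cls, w_reg)
print(cls_prob.shape, bbox_deltas.shape)  # (5, 21) (5, 84)
```

Because `w_fc` has a fixed number of rows, the flattened input must always have `channels * 7 * 7` values, which is why the fixed-size output of ROI Pooling matters.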

Reference: 一文读懂Faster RCNN