[FPN] Feature Pyramid Networks for Object Detection

Article link: https://blog.csdn.net/bryant_meng/article/details/81351096


CVPR 2017

Feature pyramids are a basic component in recognition systems for detecting objects at different scales.

But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive.

The authors propose FPN, which adds only marginal extra cost and, without bells and whistles, surpasses all existing single-model entries, including those from the COCO 2016 challenge winners.

1 Motivation

Recognizing objects at vastly different scales is a fundamental challenge in computer vision.


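The R-CNN family performs detection on a single-scale feature map (b). ConvNets are already somewhat robust to scale, but not robust enough. The wide range of object scales remains a hard problem, especially for small objects, so many papers work on it. The simplest approach resembles data augmentation: rescale each image to several sizes and feed them to the network during training. Larger images consume a lot of memory, however, so images are usually rescaled only at test time, which means running inference several times per image and makes testing slow (a). Another approach is that of SSD (c), which detects on feature maps of different scales. In principle it should detect on the different-scale feature maps that have already been computed, but it gives up the earlier low-level feature maps: it starts from conv4_3 and appends several extra conv layers to produce more high-level semantic feature maps to detect on.

So this paper aims both to reuse the different-scale features that the ConvNet has already computed and to give the low-level, high-resolution features strong semantics. The natural idea is therefore to fuse in the high-level, low-resolution feature maps. Similar work includes RON: Reverse Connection with Objectness Prior Networks for Object Detection.

CNNs generally contain both types of features: the first few layers learn low-level features, and the later layers learn high-level features. The authors combine low-resolution, semantically strong features with high-resolution, semantically weak features.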

2 Notion

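Low-level features: usually small details in the image, such as edges, corners, colors, pixels, and gradients; they can be extracted with filters, SIFT, or HOG.
High-level features: built on top of low-level features; they can be used to recognize and detect objects or object shapes in an image, and they carry richer semantic information.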
Image pyramid

3 Advantages

In ablation experiments, we find that for bounding box proposals, FPN significantly increases the Average Recall (AR) by 8.0 points; for object detection, it improves the COCO-style Average Precision (AP) by 2.3 points and PASCAL-style AP by 3.8 points.
In addition, our pyramid structure can be trained end-to-end with all scales and is used consistently at train/test time, which would be memory-infeasible using image pyramids.

4 Feature Pyramid Networks


4.1 Bottom-up pathway

The authors use ResNet as the backbone.

We denote the output of these last residual blocks as {C2, C3, C4, C5} for conv2, conv3, conv4, and conv5 outputs, and note that they have strides of **{4, 8, 16, 32}** pixels with respect to the input image.
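To make the strides concrete, here is a small sketch that computes the spatial size of each bottom-up output for a given input size. The function name `bottom_up_sizes` is hypothetical, and rounding up with ceiling division is an assumption about how padded stride-2 layers behave, not something stated in the text.

```python
def bottom_up_sizes(height, width, strides=(4, 8, 16, 32)):
    """Return {"C2": (h, w), ..., "C5": (h, w)} for an input of height x width.

    Assumption: each level's size is ceil(input / stride), which is what
    padded stride-2 layers typically produce.
    """
    sizes = {}
    for level, stride in enumerate(strides, start=2):
        # -(-a // b) is integer ceiling division
        sizes[f"C{level}"] = (-(-height // stride), -(-width // stride))
    return sizes


print(bottom_up_sizes(224, 224))
```

For a 224×224 input this gives 56×56, 28×28, 14×14, and 7×7 maps for C2 through C5.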


4.2 Top-down pathway and lateral connections

We upsample the spatial resolution by a factor of 2 (using nearest neighbor upsampling for simplicity). The upsampled map is then merged with the corresponding bottom-up map (which undergoes a 1×1 convolutional layer to reduce channel dimensions) by element-wise addition.

Designing better connection modules is not the focus of this paper, so we opt for the simple design described above.
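The merge step described above can be sketched in plain Python on nested lists. This is a simplified illustration, not the authors' implementation: the helper names are hypothetical, the 1×1 convolution produces a single output channel with no bias, and the coarser map is assumed to be exactly half the size of the finer one.

```python
def upsample_nearest_2x(fmap):
    """Nearest-neighbor upsampling of a 2D map (list of rows) by a factor of 2."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def conv1x1(channels, weights):
    """1x1 convolution to one output channel: a per-pixel weighted sum over input channels."""
    h, w = len(channels[0]), len(channels[0][0])
    return [
        [sum(wt * ch[y][x] for wt, ch in zip(weights, channels)) for x in range(w)]
        for y in range(h)
    ]


def merge_top_down(top, lateral):
    """Upsample the coarser top-down map and add it element-wise to the lateral map."""
    up = upsample_nearest_2x(top)
    return [[a + b for a, b in zip(r_up, r_lat)] for r_up, r_lat in zip(up, lateral)]


# Toy example: a 2x2 top-down map and a 2-channel 4x4 bottom-up map.
top = [[1, 2], [3, 4]]
bottom_up = [[[1] * 4 for _ in range(4)], [[2] * 4 for _ in range(4)]]
lateral = conv1x1(bottom_up, [0.5, 0.25])  # every pixel becomes 1.0
merged = merge_top_down(top, lateral)
```

In the toy example each value of `top` is spread over a 2×2 block and then shifted by the lateral value at that position.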

4.3 Steps for building a Faster R-CNN detector with FPN


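First, pick an image to process and apply the preprocessing steps to it.
Then, feed the preprocessed image into a pretrained feature network (such as ResNet); this builds the so-called bottom-up network.
Next, as shown in Figure 5, build the corresponding top-down network: upsample layer 4, reduce the channel dimension of layer 2 with a 1x1 convolution, add the two element-wise, and finally apply a 3x3 convolution.
Next, run the RPN on each of layers 4, 5, and 6 in the figure: a 3x3 convolution followed by two branches, each a 1x1 convolution, one for classification and one for regression.
Next, feed the candidate RoIs from the previous step to layers 4, 5, and 6 and apply RoI pooling on each (producing a fixed 7x7 feature).
Finally, attach two 1024-d fully connected layers on top of the previous step, then split into two branches that connect to the corresponding classification and regression layers.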
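The RoI pooling step in the list above turns a region of any size into a fixed grid. Below is a minimal sketch of quantized RoI max pooling on a single-channel map. The function name `roi_max_pool` is hypothetical, and the RoI is assumed to be already expressed in feature-map cells (image coordinates divided by the level's stride) with exclusive end coordinates.

```python
def roi_max_pool(fmap, roi, out_size=7):
    """Max-pool the region roi = (x0, y0, x1, y1) of a 2D map to out_size x out_size.

    Coordinates are feature-map cells; x1 and y1 are exclusive. The region is
    split into out_size x out_size bins (bins may overlap by one cell when the
    region size is not a multiple of out_size), and each bin keeps its maximum.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_size):
        ys = y0 + (i * roi_h) // out_size                # floor of bin start
        ye = y0 + -(-((i + 1) * roi_h) // out_size)      # ceil of bin end
        row = []
        for j in range(out_size):
            xs = x0 + (j * roi_w) // out_size
            xe = x0 + -(-((j + 1) * roi_w) // out_size)
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# A 14x14 map whose value encodes its position; pooling to 7x7 uses 2x2 bins.
fmap = [[y * 14 + x for x in range(14)] for y in range(14)]
pooled = roi_max_pool(fmap, (0, 0, 14, 14), out_size=7)
```

Whatever the RoI size, the output is always `out_size` by `out_size`, which is what lets the fully connected layers that follow have a fixed input dimension.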

5 Applications

5.1 Feature Pyramid Networks for RPN

RPN is a sliding-window class-agnostic object detector.

Because the head slides densely over all locations in all pyramid levels, it is not necessary to have multi-scale anchors on a specific level. Instead, we assign anchors of a single scale to each level.

Before: one level, with anchors of multiple scales.
Now: multiple levels, with anchors of a single scale on each level.
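As a sketch of "one scale per level": the paper assigns anchor areas of 32², 64², 128², 256², and 512² pixels to levels P2 through P6, with aspect ratios 1:2, 1:1, and 2:1 at every level. Those values come from the paper rather than from the notes above; the names below and the convention that a ratio means height divided by width are assumptions made for illustration.

```python
import math

# Anchor side length (square root of the area) per pyramid level, as in the paper.
ANCHOR_SIDES = {"P2": 32, "P3": 64, "P4": 128, "P5": 256, "P6": 512}
ASPECT_RATIOS = (0.5, 1.0, 2.0)  # assumed to mean height / width


def anchor_shapes(side, ratios=ASPECT_RATIOS):
    """Return (width, height) pairs whose area is side**2 for each aspect ratio."""
    shapes = []
    for ratio in ratios:
        width = side / math.sqrt(ratio)
        height = side * math.sqrt(ratio)
        shapes.append((width, height))
    return shapes


pyramid_anchors = {level: anchor_shapes(side) for level, side in ANCHOR_SIDES.items()}
total_anchor_types = sum(len(shapes) for shapes in pyramid_anchors.values())
```

Each level carries 3 anchor shapes, so the five levels together define 15 anchor types, while every individual level only has to handle a single scale.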


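After the RPN generates RoIs, which pyramid level should each RoI's features be taken from?
Here $k_0$ is the level from which the single-scale Faster R-CNN takes its feature map. The ResNet paper uses C4, so $k_0 = 4$ (C5 then plays the role of the fc layers; some variants pool from C5 and add extra fc layers after it). An RoI of width $w$ and height $h$ is assigned to level $k = \lfloor k_0 + \log_2(\sqrt{wh}/224) \rfloor$, where 224 is the canonical ImageNet pre-training size. For example, an RoI with half the canonical side length ($w = h = 112$, so $\sqrt{wh} = 112$) gives $k = k_0 - 1 = 4 - 1 = 3$.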
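The paper's assignment rule, k = ⌊k0 + log2(√(wh) / 224)⌋, can be sketched as a small function. The name `roi_to_level` is hypothetical, and clamping the result to the range P2 to P5 is an assumption (a common implementation choice) rather than something spelled out in these notes.

```python
import math


def roi_to_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Map an RoI of width w and height h (in input-image pixels) to a pyramid level.

    Implements k = floor(k0 + log2(sqrt(w * h) / canonical)), then clamps the
    result to [k_min, k_max] (the clamping is an assumption of this sketch).
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))


print(roi_to_level(112, 112))  # half the canonical side length
print(roi_to_level(224, 224))  # the canonical size maps to k0
```

Halving the RoI side length moves it one level down, to a finer-resolution map, and doubling it moves it one level up.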

5.2 Feature Pyramid Networks for Fast RCNN

Fast R-CNN is a region-based object detector in which Region-of-Interest (RoI) pooling is used to extract features. Fast R-CNN is most commonly performed on a single-scale feature map. To use it with our FPN, we need to assign RoIs of different scales to the pyramid levels.

6 Experiments on Object Detection

6.1 Region Proposal with RPN

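This section examines how effective the RPN is once FPN is added; see Table 1. All results are based on ResNet-50. The metric is AR (Average Recall). The superscript 100 on AR means 100 proposals per image, and the subscripts s, m, and l refer to small, medium, and large objects in the COCO dataset. Braces {} in the feature column indicate that predictions are made independently at each level.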
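To illustrate what the AR metric measures, here is a simplified single-image sketch: recall is computed at IoU thresholds 0.50, 0.55, ..., 0.95 (the COCO convention) using only the first 100 proposals, then averaged. The function names are hypothetical. Unlike the official COCO evaluation, this sketch does not enforce one-to-one matching or aggregate over a dataset, and it assumes the proposals are already sorted by score.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_recall(gt_boxes, proposals, max_proposals=100):
    """Recall averaged over IoU thresholds 0.50:0.05:0.95 for one image."""
    if not gt_boxes:
        return 0.0
    kept = proposals[:max_proposals]  # assumes proposals are sorted by score
    # Best IoU that any kept proposal achieves for each ground-truth box.
    best = [max((iou(g, p) for p in kept), default=0.0) for g in gt_boxes]
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    recalls = [sum(b >= t for b in best) / len(best) for t in thresholds]
    return sum(recalls) / len(recalls)
```

A ground-truth box counts as recalled at a threshold when at least one kept proposal overlaps it with an IoU at or above that threshold.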


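Comparing (a), (b), and (c) shows that FPN brings a clear improvement. In addition, comparing (a) and (b) shows that higher-level features are not necessarily more effective than features one level lower.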

6.1.1 How important is top-down enrichment?

Table 1(d)

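This setting keeps only the lateral connections and drops the top-down pathway: each level of the bottom-up pathway simply gets a 1×1 lateral connection followed by a 3×3 convolution to produce the final output, somewhat like Fig. 1(b). The feature column shows that predictions are still made independently per level. The authors conjecture that (d) performs poorly because of the large semantic gaps between different levels of the bottom-up pathway.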

6.1.2 How important are lateral connections?

Table 1(e)
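This variant (a top-down pathway without lateral connections) also performs poorly, because object locations become less precise after repeated downsampling and upsampling.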

6.1.3 How important are pyramid representations?

Table 1(f)

6.2 Object Detection with Fast/Faster RCNN

fast rcnn

faster rcnn

6.3 Comparing with COCO Competition Winners


7 Extensions: Segmentation Proposals

Other applications:
Our method is a generic pyramid representation and can be used in applications other than object detection (to generate segmentation proposals).