Paper Reading: FPN


贾小树


Article link: https://blog.csdn.net/j879159541/article/details/99486663


I. A simple understanding of the network

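1. Network architecture (mnemonic: up, right, down, i.e. the bottom-up pathway, the lateral connections, and the top-down pathway)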
[Figure: FPN network architecture]
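2. The resulting new feature maps P2, P3, P4, P5 carry both high-resolution information and the semantic information of high-level features. As a result, FPN performs well on small objects without much extra time or computation. Moreover, the input is a single-scale image, unlike an image pyramid, which feeds the image at multiple scales and takes far too long.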

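3. P2, P3, P4, P5 and P6 all have the same number of output channels, set to d = 256 in the paper. Having the same channel count is what allows the head parameters to be shared across levels later on. The paper ran an experiment comparing shared and unshared head parameters and found similar performance, concluding that the feature pyramid makes all levels learn semantic features at a similar level.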
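To make the channel-count point concrete, here is a minimal pure-Python sketch (not the paper's code). It treats a head as a single 1×1 convolution, uses a toy d = 2 instead of 256, and the helper names `conv1x1` and `apply_shared_head` are hypothetical. The same weights run on every level regardless of its spatial size; the only hard requirement is that every level has the same number of channels.

```python
# Hypothetical sketch: one shared "head" (a single 1x1 conv) applied to every
# pyramid level. A feature map is a nested list where fmap[y][x] is a channel vector.

def conv1x1(fmap, weights):
    """1x1 convolution: weights[o][i] maps input channel i to output channel o."""
    d_in = len(weights[0])
    out = []
    for row in fmap:
        out_row = []
        for vec in row:
            if len(vec) != d_in:
                raise ValueError(f"head expects {d_in} channels, got {len(vec)}")
            out_row.append([sum(w * v for w, v in zip(w_o, vec)) for w_o in weights])
        out.append(out_row)
    return out


def apply_shared_head(pyramid, weights):
    """Run the same head weights on every level (works only if all levels have d channels)."""
    return {level: conv1x1(fmap, weights) for level, fmap in pyramid.items()}


# Toy example with d = 2 (the paper uses d = 256); the weight values are made up.
head = [[1.0, 0.0], [0.5, 0.5]]
pyramid = {
    "P2": [[[1.0, 3.0] for _ in range(4)] for _ in range(4)],  # 4x4 map
    "P3": [[[2.0, 2.0] for _ in range(2)] for _ in range(2)],  # 2x2 map
}
outputs = apply_shared_head(pyramid, head)
print(outputs["P2"][0][0], outputs["P3"][0][0])  # [1.0, 2.0] [2.0, 2.0]
```

If one level had, say, 512 channels instead of d, `conv1x1` would raise, which is why a single d is used for all levels.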

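4. For the Fast R-CNN experiments, the proposals produced by FPN + RPN are kept fixed. In Fast R-CNN, FPN is mainly used to decide, for RoIs of different sizes, which level's feature map to use for RoI pooling. The feature pyramid is viewed as if it were produced from an image pyramid. Let the set of feature maps be {P2, P3, P4, P5} (P6 is dropped here). For an RoI of size w×h on the input image, the level Pk is chosen as follows, where 224 is the ImageNet input size and the paper sets k0 = 4:

$$k = \left\lfloor k_0 + \log_2\left(\sqrt{wh}/224\right) \right\rfloor$$

In other words, the RoI size determines which feature map its features are extracted from.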
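The level-assignment rule is easy to sketch in Python. `fpn_level_for_roi` is a hypothetical helper; it follows the formula above with k0 = 4 as in the paper, and clamping the result to P2 to P5 is an assumption added here so that very small or very large RoIs still land on an existing level.

```python
import math


def fpn_level_for_roi(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Pick the pyramid level P_k for a w x h RoI (in input-image pixels).

    Computes k = floor(k0 + log2(sqrt(w*h) / canonical)). Clamping k to
    [k_min, k_max] (P2..P5) is an assumption of this sketch.
    """
    if w <= 0 or h <= 0:
        raise ValueError("RoI width and height must be positive")
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))


for w, h in [(56, 56), (112, 112), (224, 224), (448, 448), (32, 32), (1000, 1000)]:
    print(f"{w}x{h} RoI -> P{fpn_level_for_roi(w, h)}")
```

A smaller RoI is mapped to a finer level: halving the RoI side moves it down one level, e.g. a 112×112 RoI goes to P3 instead of P4.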

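5. The whole FPN pyramid has 15 kinds of anchors: there are 5 feature maps (P2 to P6), each with a single anchor scale (areas of 32², 64², 128², 256² and 512² pixels, respectively) and 3 aspect ratios (1:2, 1:1, 2:1). That makes 15 anchor types, not 15 anchors!!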
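A small sketch enumerating the anchor types, using the paper's areas {32², 64², 128², 256², 512²} on P2 to P6 and ratios {1:2, 1:1, 2:1}, read here as height:width (an assumption; conventions differ between implementations). `total_anchors` is a hypothetical helper showing how many actual anchors those 15 types produce, assuming strides 4, 8, 16, 32, 64 and ceil(H/stride) × ceil(W/stride) positions per level.

```python
import math

# Anchor sizes (sqrt of area, in pixels) per level and the shared aspect ratios.
ANCHOR_SIZES = {"P2": 32, "P3": 64, "P4": 128, "P5": 256, "P6": 512}
ASPECT_RATIOS = (0.5, 1.0, 2.0)  # assumed convention: ratio = height / width
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32, "P6": 64}


def anchor_types():
    """Return (level, width, height) for each anchor type; the area stays size**2."""
    types = []
    for level, size in ANCHOR_SIZES.items():
        for ratio in ASPECT_RATIOS:
            types.append((level, size / math.sqrt(ratio), size * math.sqrt(ratio)))
    return types


def total_anchors(img_h, img_w):
    """Hypothetical count: every position of a level gets that level's 3 anchor types."""
    return sum(
        math.ceil(img_h / s) * math.ceil(img_w / s) * len(ASPECT_RATIOS)
        for s in STRIDES.values()
    )


print(len(anchor_types()))      # 15 anchor types
print(total_anchors(800, 800))  # 159882 anchors for an 800x800 input
```

The 15 types are replicated at every position of their level, which is why the actual number of anchors is in the hundreds of thousands.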

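6. In the ablation studies: FPN greatly increases AR on small objects; there is a large semantic gap between the levels of the bottom-up pyramid, so the top-down pathway is needed; lateral connections make localization more accurate; using only the single finest level is not robust to scale variation; and more anchors do not necessarily give higher accuracy, so anchors should be spread across different feature maps.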

7. Why can feature maps from different depths be added directly after upsampling?

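The authors explain that this is because the network is trained end to end: the parameters of the different levels are not fixed, and all levels receive supervision simultaneously during end-to-end training, so what the summation learns fuses shallow and deep information more effectively.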
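The merge itself is mechanically simple. Below is a toy single-channel sketch in pure Python (an illustration, not the paper's code): the coarser map is upsampled 2× with nearest-neighbour interpolation and added element-wise to a lateral map. The 1×1 lateral convolution is reduced to a scalar weight because there is only one channel, the helper names are hypothetical, and the 3×3 smoothing convolution from the paper is left out. Nothing in the addition itself aligns the two maps semantically; that alignment is what end-to-end training has to learn.

```python
# Toy single-channel sketch of the FPN top-down pathway. Assumptions: 2D maps
# as nested lists, one channel, and the 1x1 lateral conv reduced to a scalar.

def upsample_nearest_2x(fmap):
    """Nearest-neighbour 2x upsampling: every value becomes a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized maps (the merge step)."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("maps must have the same spatial size")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lateral(c_map, weight):
    """Stand-in for the 1x1 lateral conv: with one channel it is just a scale."""
    return [[weight * v for v in row] for row in c_map]


def top_down(c_maps, lateral_weight=1.0):
    """Merge bottom-up maps given coarse to fine, e.g. [C5, C4, C3, C2].

    The 3x3 conv that the paper appends to each merged map is omitted here.
    """
    merged = [lateral(c_maps[0], lateral_weight)]  # coarsest level: lateral conv only
    for c in c_maps[1:]:
        merged.append(add_maps(upsample_nearest_2x(merged[-1]), lateral(c, lateral_weight)))
    return merged


c5 = [[1.0, 2.0], [3.0, 4.0]]
c4 = [[1.0] * 4 for _ in range(4)]
p5, p4 = top_down([c5, c4], lateral_weight=0.5)
for row in p4:
    print(row)
```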

8. Why does FPN improve small-object detection so much compared with removing the upsampled deep features (i.e., using only the bottom-up pyramid)?

[Figure: example image with a small object (a handbag)]
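For small objects, on the one hand we need high-resolution feature maps that focus more on small regions; on the other hand, as with the handbag in the figure, we need more global context to judge the handbag's presence and location more accurately.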

9. If time is not a concern, could an image pyramid outperform a feature pyramid?

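The authors think it is possible with carefully tuned training, but the main problem of an image pyramid is its large time and memory cost, whereas a feature pyramid solves multi-scale detection with almost no extra computation.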

II. Data given in the paper

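1. Four kinds of pyramids: (a) the featurized image pyramid, commonly used in multi-scale training and in the era of hand-engineered features; (b) a single feature map, similar to Faster R-CNN; (c) a pyramidal feature hierarchy, similar to SSD; (d) this paper's FPN.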
[Figure: the four pyramid designs (a) to (d)]
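2. An interesting comparison of which structure is better: in RPN the lower one is better, because RPN works in a sliding-window fashion; in Fast R-CNN the two differ little, because Fast R-CNN uses RoI pooling, which makes it insensitive to scale.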

[Figure: the two structures being compared]
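3. Experiments on RPN, Fast R-CNN and Faster R-CNN, including a comparison of C4 and C5: using C4 means too many parameters, so it is better to include C5 among the feature levels as well.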

4.

5.

[Figure]

III. Insightful sentences excerpted from the original paper

1. A single feature level already has some ability to cope with changes in object size
Aside from being capable of representing higher-level semantics, ConvNets are also more robust to variance in scale and thus facilitate recognition from features computed on a single input scale.

2.
To achieve this goal, we rely on an architecture that combines low-resolution, semantically strong features with high-resolution, semantically weak features via a top-down pathway and lateral connections (Fig. 1(d)).

3.
Finally, we append a 3×3 convolution on each merged map to generate the final feature map, which is to reduce the aliasing effect of upsampling.

4.
We note that the parameters of the heads are shared across all feature pyramid levels; we have also evaluated the alternative without sharing parameters and observed similar accuracy. The good performance of sharing parameters indicates that all levels of our pyramid share similar semantic levels.

5. With these changes, the Faster R-CNN baseline performs better
Note that Table 3(a) and (b) are baselines that are much stronger than the baseline provided by He et al. [16] in Table 3(*). We find the following implementations contribute to the gap:
(i) We use an image scale of 800 pixels instead of 600 in [11, 16];
(ii) We train with 512 RoIs per image which accelerate convergence, in contrast to 64 RoIs in [11, 16];
(iii) We use 5 scale anchors instead of 4 in [16] (adding 32²);
(iv) At test time we use 1000 proposals per image instead of 300 in [16]. So comparing with He et al.'s ResNet-50 Faster R-CNN baseline in Table 3(*), our method improves AP by 7.6 points and AP@0.5 by 9.6 points.

6.
Our method introduces small extra cost by the extra layers in the FPN, but has a lighter weight head. Overall our system is faster than the ResNet-based Faster R-CNN counterpart. We believe the efficiency and simplicity of our method will benefit future research and applications.

7. I don't understand this part
As the MLP must predict objects at a range of scales for each pyramid level (specifically a half octave range), some padding must be given around the canonical object size. We use 25% padding. This means that the mask output over {P2, P3, P4, P5, P6} maps to {40, 80, 160, 320, 640} sized image regions for the 5×5 MLP (and to √2 larger corresponding sizes for the 7×7 MLP).
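The quoted numbers can at least be checked with a little arithmetic. Under one plausible reading (an assumption, not spelled out this way in the paper), "25% padding" means the image region is 1.25 times the canonical object size. That gives canonical sizes of 32 to 512 pixels on P2 to P6, the same values as the RPN anchor sizes, which suggests but does not prove this reading. "Half octave" means a factor of √2, which is why the 7×7 MLP's regions are √2 larger. The helper names below are hypothetical.

```python
import math

# Region sizes (pixels) quoted in the paper for the 5x5 MLP on each level.
REGION_5X5 = {"P2": 40, "P3": 80, "P4": 160, "P5": 320, "P6": 640}


def region_sizes_7x7(region_5x5):
    """The quote says the 7x7 MLP covers regions sqrt(2) larger, i.e. half an octave up."""
    return {level: size * math.sqrt(2) for level, size in region_5x5.items()}


def canonical_object_size(region, padding=0.25):
    """Hypothetical reading of '25% padding': region = object size * (1 + padding)."""
    return region / (1 + padding)


sizes_7x7 = region_sizes_7x7(REGION_5X5)
for level, region in REGION_5X5.items():
    print(level, canonical_object_size(region), round(sizes_7x7[level], 1))
```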

