8. SPP-net Paper Summary

Latest recommended article published on 2024-05-07 09:24:12

红薯塔就是爱太阳啊

Category: Paper summaries. Tags: SPP-net, Spatial Pyramid Pooling, object detection

Article link: https://blog.csdn.net/weixin_42270275/article/details/86518754

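In object detection, R-CNN uses region proposals to address the CNN localization problem.
However, when extracting features for the candidate boxes, the network has to be applied repeatedly to about 2000 windows per image, which is time-consuming.
SPP-net solves this problem: features are extracted from the entire image only once (possibly at multiple scales).
Spatial pyramid pooling is then applied to each candidate window on the feature maps to pool a fixed-length representation of that window.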
Our method extracts window-wise features from regions of the feature maps,
while R-CNN extracts directly from image regions.
Our method enables feature extraction in arbitrary windows from the deep convolutional feature maps.

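CNNs have traditionally required fixed-size input images. It turns out that the convolutional layers place no constraint on the input size and can produce feature maps of any size (the size follows the input); only the fully connected layers need a fixed-size input.
The parameters of a convolutional layer do not depend on the input size: the layer is just a kernel sliding over the image, so any input size works, and images of different sizes simply yield feature maps of different sizes. The parameters of a fully connected layer do depend on the input size, because the layer connects every input element to every output neuron, so the number of input and output neurons must be specified in advance. The size of the incoming feature therefore has to be fixed.
So the convolutional part can take images of any size, as long as a fixed-size representation is formed before the fully connected layers.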
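To make the point above concrete, here is a minimal sketch in plain Python. The layer sizes (a 3×3 convolution from 3 to 8 channels, followed by a 10-way fully connected layer) are hypothetical and chosen only for illustration; they are not taken from the paper. The output-size formula is the standard one for a convolution without dilation.

```python
def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1


def conv_param_count(in_channels, out_channels, kernel):
    """Weights plus biases of a conv layer; the input size does not appear."""
    return out_channels * in_channels * kernel * kernel + out_channels


def fc_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# Hypothetical layer: 3x3 convolution, 3 -> 8 channels, stride 1, no padding,
# followed by a fully connected layer with 10 outputs on the flattened map.
for side in (32, 48):
    out_side = conv_output_size(side, kernel=3)
    flattened = 8 * out_side * out_side
    print(side, out_side, conv_param_count(3, 8, 3), fc_param_count(flattened, 10))
# prints:
# 32 30 224 72010
# 48 46 224 169290
```

The convolution keeps its 224 parameters for both input sizes, while the fully connected layer would need a different weight matrix for each one. That is why something has to fix the feature length before the fully connected layers.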
The paper proposes a different pooling strategy, spatial pyramid pooling, to perform some information “aggregation”: it pools features in arbitrary regions (sub-images) to generate fixed-length representations regardless of image size/scale, while the sliding window pooling used in the previous deep networks cannot. This removes the requirement of a fixed-size input.


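By pooling the feature map at the corresponding scales, SPP produces 4×4, 2×2, and 1×1 pooled maps whatever the size of the feature map. These pooled maps are then concatenated into a single column vector and connected to the next fully connected layer, which removes the effect of inconsistent input sizes.
Spatial pyramid pooling uses multi-level spatial bins, whereas sliding window pooling uses only a single window size.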
It can maintain spatial information by pooling in local spatial bins.
These spatial bins have sizes proportional to the image size, so the number of bins is fixed regardless of the image size.
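The pooling described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `spp_single_channel` and `spp` are made up here, max pooling is used inside each bin, the bin boundaries use a floor/ceil rule so that bins scale with the map size, and the output is concatenated channel by channel (the ordering is arbitrary as long as it is consistent).

```python
def spp_single_channel(fmap, levels=(4, 2, 1)):
    """Max-pool one 2-D feature map (a list of rows) into n x n bins per level."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for n in levels:
        for i in range(n):
            # Bin boundaries are proportional to the map size, so there are
            # always exactly n bins along each axis (floor start, ceil end).
            r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                out.append(max(fmap[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out


def spp(feature_maps, levels=(4, 2, 1)):
    """Concatenate the pooled bins of every channel into one flat vector."""
    vec = []
    for fmap in feature_maps:
        vec.extend(spp_single_channel(fmap, levels))
    return vec


# Two inputs with different spatial sizes but the same number of channels (3).
small = [[[float(r + c) for c in range(7)] for r in range(10)] for _ in range(3)]
large = [[[float(r * c) for c in range(20)] for r in range(13)] for _ in range(3)]
print(len(spp(small)), len(spp(large)))  # both are 3 * (16 + 4 + 1) = 63
```

Whatever the height and width of the feature maps, each channel contributes 16 + 4 + 1 = 21 values, so the vector handed to the fully connected layer always has the same length.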

(1) SPP is able to generate a fixed-length output regardless of the input size,
while the sliding window pooling used in the previous deep networks cannot;
(2) SPP uses multi-level spatial bins, while the sliding window pooling uses only a single window size.
Multi-level pooling has been shown to be robust to object deformations [15];
(3) SPP can pool features extracted at variable scales thanks to the flexibility of input scales.
Through experiments we show that all these factors elevate the recognition accuracy of deep networks.
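Adding a stride to a convolution or pooling layer is equivalent to first applying the stride-1 convolution or pooling to the input and then subsampling the result, so positions still correspond one to one. A region of the original image can therefore be mapped to the corresponding region after conv5 by dividing its coordinates by the product of all the strides in the network.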
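A rough sketch of this mapping follows. The stride list, the window coordinates, and the function name `map_window_to_feature_map` are hypothetical. The rounding rule (floor for the left/top edge, ceil for the right/bottom edge, with the right/bottom edge treated as exclusive) is an assumption made here for simplicity; it ignores padding and receptive-field offsets, so it should not be read as the exact rule used in the paper.

```python
import math


def map_window_to_feature_map(window, strides):
    """Project an image-space window (x0, y0, x1, y1) onto the last feature map.

    Assumption: coordinates are divided by the product of all strides, rounding
    the left/top edge down and the right/bottom edge up.
    """
    total_stride = math.prod(strides)
    x0, y0, x1, y1 = window
    return (x0 // total_stride,
            y0 // total_stride,
            math.ceil(x1 / total_stride),
            math.ceil(y1 / total_stride))


# Hypothetical network with four stride-2 layers before conv5: total stride 16.
strides = (2, 2, 2, 2)
print(map_window_to_feature_map((32, 48, 200, 170), strides))  # (2, 3, 13, 11)
```

The mapped window can then be cropped from the conv5 feature map and pooled to a fixed length, so the convolutional layers run once per image rather than once per window.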

Training/running a detector on the feature maps (rather than image regions) is actually a more popular idea.
The deep convolutional features can be pooled.
The scales are also important for the accuracy of deep networks.
Maintaining the complete content is important.
The feature map regions can have strong activations near the window boundaries,
while the image regions may not.
