SPP-Net (Spatial Pyramid Pooling)

有丝吼

Original post: https://blog.csdn.net/sinat_33027857/article/details/80560593

Paper: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Abstract

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g. 224×224) input image. This requirement is “artificial” and may hurt the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with a more principled pooling strategy, “spatial pyramid pooling”, to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. By removing the fixed-size limitation, we can improve all CNN-based image classification methods in general. Our SPP-net achieves state-of-the-art accuracy on the datasets of ImageNet 2012, Pascal VOC 2007, and Caltech101.

The power of SPP-net is more significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method computes convolutional features 30-170× faster than the recent leading method R-CNN (and 24-64× faster overall), while achieving better or comparable accuracy on Pascal VOC 2007.

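Problem addressed: existing deep CNNs require an input image of fixed size. The convolutional layers themselves accept inputs of any size (their output feature maps simply grow or shrink with the input); the constraint comes from the fully connected layers, whose weight matrices expect an input vector of fixed length.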
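
To make the constraint concrete, here is a small NumPy sketch. The single conv layer (4 channels, 3×3 kernel, stride 2) and the 10-output FC layer are hypothetical toy values, not taken from the paper; random arrays stand in for real activations. The FC weights are sized for a 16×16 input, so a 20×20 input produces a longer flattened vector and the matrix product fails.

```python
import numpy as np

def conv_out(size: int, kernel: int, stride: int, pad: int = 0) -> int:
    # Standard output-size formula for a conv or pooling layer
    return (size + 2 * pad - kernel) // stride + 1

C, K, S = 4, 3, 2                      # hypothetical conv layer: 4 channels, 3x3 kernel, stride 2
rng = np.random.default_rng(0)
n_train = conv_out(16, K, S)           # FC layer sized for 16x16 inputs -> 7x7 maps
W_fc = rng.standard_normal((10, C * n_train * n_train))  # 10 outputs, 4*7*7 = 196 inputs

def fc_forward(side: int) -> np.ndarray:
    n = conv_out(side, K, S)
    features = rng.standard_normal((C, n, n))  # stand-in for the conv output
    return W_fc @ features.reshape(-1)         # only works if C*n*n == 196

print(fc_forward(16).shape)   # (10,)
try:
    fc_forward(20)            # 20x20 input -> 9x9 maps -> 324 values
except ValueError as err:
    print("mismatch:", err)
```

The conv part adapts to any input size; only the flatten-then-multiply step breaks, which is exactly the step SPP replaces.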

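Approach: insert a spatial pyramid pooling (SPP) layer between the last convolutional layer and the first fully connected layer. This layer pools the feature maps at several scales, as shown in the figure of the network structure with an SPP layer: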

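On the conv5 feature maps, max pooling is applied at three pyramid levels, each with its own window size and stride, so that every map is divided into 4×4, 2×2 and 1×1 bins respectively. The pooled outputs of the three levels are flattened and concatenated, giving 16 + 4 + 1 = 21 values per channel, i.e. a vector of length 21 × k for k feature maps (21 × 256 = 5,376 in the paper's figure, where conv5 has 256 channels). For an a×a feature map and an n×n level, the paper sets the pooling window to ⌈a/n⌉ and the stride to ⌊a/n⌋. In this way, conv5 feature maps of different sizes, produced by input images of different sizes, are all mapped to a vector of the same dimension, which can then be fed to the fully connected layers.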
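
The pooling step above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `spp` and the default `levels=(4, 2, 1)` are my own choices, and the bin boundaries use floor/ceil rounding (the same rule as PyTorch's adaptive pooling) rather than the paper's ⌈a/n⌉ window and ⌊a/n⌋ stride, because it guarantees every bin is non-empty for any map size.

```python
import numpy as np

def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def spp(feature_map: np.ndarray, levels=(4, 2, 1)) -> np.ndarray:
    """Spatial pyramid max pooling of a (C, H, W) map into a fixed-length vector."""
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        bins = np.empty((c, n, n), dtype=feature_map.dtype)
        for i in range(n):
            # Rows of bin i: [floor(i*h/n), ceil((i+1)*h/n)), never empty
            y0, y1 = (i * h) // n, _ceil_div((i + 1) * h, n)
            for j in range(n):
                x0, x1 = (j * w) // n, _ceil_div((j + 1) * w, n)
                bins[:, i, j] = feature_map[:, y0:y1, x0:x1].max(axis=(1, 2))
        pooled.append(bins.reshape(-1))   # flatten this pyramid level
    return np.concatenate(pooled)         # length c * sum(n * n for n in levels)

rng = np.random.default_rng(0)
for shape in [(256, 13, 13), (256, 10, 17)]:
    v = spp(rng.standard_normal(shape))
    print(shape, "->", v.shape)   # both give (5376,) = (16 + 4 + 1) * 256
```

Two conv5 maps of different spatial size (and even different aspect ratio) come out as vectors of identical length, which is what lets a single FC layer follow the SPP layer.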

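Advantages:

1. It removes the fixed-input-size constraint, so images of arbitrary size and aspect ratio can be fed to the network without cropping or warping.

2. Features are extracted from the feature maps at several scales and then aggregated; the paper notes that this multi-level pooling makes the model more robust to object deformations.

3. Multi-scale feature extraction can also improve accuracy on the downstream task.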

Application example (improving R-CNN for object detection):

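R-CNN runs the convolutional layers separately on every region proposal (roughly 2,000 per image with selective search), which causes a great deal of redundant computation. A better approach is to compute the convolutional feature maps of the whole image only once, project each region proposal's position in the original image onto those feature maps, and feed the convolutional features of each proposal into the fully connected layers for the subsequent classification and bounding-box regression.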
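
Projecting a proposal onto the feature map amounts to dividing its pixel coordinates by the network's total stride. The sketch below assumes a total stride of 16 and a 38×50 feature map (both hypothetical values), and uses a simple round-outward rule; the paper gives its own corner-projection formula, so treat `project_box` as an illustrative helper, not the paper's exact mapping.

```python
import math

def project_box(box, stride, fm_h, fm_w):
    """Map an image box (x1, y1, x2, y2) in pixels to a feature-map window.

    Simplified rule (an assumption): left/top edges round down, right/bottom
    edges round up, and the window is clipped to the map and kept non-empty.
    Returns (fx1, fy1, fx2, fy2) with exclusive end indices.
    """
    x1, y1, x2, y2 = box
    fx1 = min(math.floor(x1 / stride), fm_w - 1)
    fy1 = min(math.floor(y1 / stride), fm_h - 1)
    fx2 = min(max(math.ceil(x2 / stride), fx1 + 1), fm_w)
    fy2 = min(max(math.ceil(y2 / stride), fy1 + 1), fm_h)
    return fx1, fy1, fx2, fy2

# Hypothetical proposals on an image whose conv feature map is 38 x 50 (stride 16)
for box in [(48, 32, 200, 180), (0, 0, 10, 10), (700, 500, 799, 599)]:
    print(box, "->", project_box(box, 16, 38, 50))
```

Each proposal lands on a window of a different size in feature-map cells, which is why the windows cannot go straight into a fixed-size FC layer.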

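This raises a problem: since the region proposals differ in size, their features cannot be fed directly into the fully connected layers, whose input must have a fixed length. This is where the SPP layer comes in. Each region proposal corresponds to a window on the feature maps, and these windows come in many sizes. Each window is divided into 4×4, 2×2 and 1×1 grids of bins, max pooling is applied within every bin, and each window is thereby turned into a feature vector of length (4×4 + 2×2 + 1) × channels, which is passed to the fully connected layers for classification and regression. (In the paper's detection experiments, the pyramid actually has four levels, 1×1, 2×2, 3×3 and 6×6, for a total of 50 bins.)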
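
Putting the pieces together, the sketch below pools every proposal window from a single shared feature map. The random 256×38×50 array stands in for conv5 of one image, and the windows are hypothetical values already expressed in feature-map cells; `pool_proposals` is an illustrative name, and the bins use floor/ceil boundaries rather than the paper's window/stride rule.

```python
import numpy as np

def _ceil_div(a, b):
    return -(-a // b)

def spp(fm, levels=(4, 2, 1)):
    # Max-pool a (C, H, W) window over an n x n grid per level, then concatenate
    c, h, w = fm.shape
    out = []
    for n in levels:
        for i in range(n):
            y0, y1 = (i * h) // n, _ceil_div((i + 1) * h, n)
            for j in range(n):
                x0, x1 = (j * w) // n, _ceil_div((j + 1) * w, n)
                out.append(fm[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(out)   # length C * sum(n * n for n in levels)

def pool_proposals(feature_map, windows, levels=(4, 2, 1)):
    """One fixed-length SPP vector per window (x1, y1, x2, y2), end-exclusive."""
    rows = [spp(feature_map[:, y1:y2, x1:x2], levels) for (x1, y1, x2, y2) in windows]
    return np.stack(rows)        # shape (num_windows, C * sum(n * n))

rng = np.random.default_rng(0)
conv5 = rng.standard_normal((256, 38, 50))   # computed once for the whole image
windows = [(3, 2, 13, 12), (0, 0, 1, 1), (43, 31, 50, 38)]
features = pool_proposals(conv5, windows)
print(features.shape)                        # (3, 5376)
```

The expensive convolutions run once per image and only the cheap pooling runs per proposal, which is where the speedup over R-CNN comes from. Passing `levels=(6, 3, 2, 1)` would give 50 × 256 = 12,800 values per proposal instead of 5,376.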
