(PAMI 2015) Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (SPPNet)

m0_55384957

Last modified: 2023-09-21 17:46:22

Tags: deep learning, artificial intelligence

First published: 2023-09-11 22:31:45

Article link: https://blog.csdn.net/m0_55384957/article/details/132818556

Abstract:

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224×224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with a more principled pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations.

The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features.
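The "compute once, pool per region" idea can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the function name `pool_region`, the stride of 16, the two-level (2×2 and 1×1) grid, and the simple divide-by-stride projection of a window onto the feature map are all assumptions made for the example, and the random array merely stands in for real convolutional features.

```python
import numpy as np

def pool_region(feature_map, box, stride=16, levels=(2, 1)):
    """Max-pool one image-space box (x0, y0, x1, y1) into a fixed-length vector.

    feature_map has shape (k, h, w). The box is projected onto the feature
    map by dividing by an assumed total stride (hypothetical mapping).
    """
    k, h, w = feature_map.shape
    x0, y0, x1, y1 = box
    # Project the window onto feature-map coordinates, keeping at least one cell.
    c0 = min(x0 // stride, w - 1)
    r0 = min(y0 // stride, h - 1)
    c1 = min(w, max(c0 + 1, -(-x1 // stride)))  # ceiling division
    r1 = min(h, max(r0 + 1, -(-y1 // stride)))
    region = feature_map[:, r0:r1, c0:c1]
    rh, rw = region.shape[1:]

    pooled = []
    for n in levels:
        for i in range(n):
            a0, a1 = (i * rh) // n, ((i + 1) * rh + n - 1) // n
            for j in range(n):
                b0, b1 = (j * rw) // n, ((j + 1) * rw + n - 1) // n
                pooled.append(region[:, a0:a1, b0:b1].max(axis=(1, 2)))
    return np.concatenate(pooled)

# Stand-in for the conv features of one 320x480 image, computed a single time.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 20, 30))

# Candidate windows of very different sizes, in image coordinates.
boxes = [(0, 0, 100, 100), (50, 40, 400, 300), (200, 10, 230, 60)]
vectors = [pool_region(feature_map, box) for box in boxes]
print([v.shape for v in vectors])
```

Every window yields a vector of the same length (here 8 × (4 + 1) = 40), so a single detector can be trained on all of them without re-running the convolutional layers per window.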

The reason for the fixed input size:

So why do CNNs require a fixed input size? A CNN mainly consists of two parts: convolutional layers, and fully-connected layers that follow. The convolutional layers operate in a sliding-window manner and output feature maps which represent the spatial arrangement of the activations (Figure 2). In fact, convolutional layers do not require a fixed image size and can generate feature maps of any size. On the other hand, the fully-connected layers need to have fixed-size/length input by their definition. Hence, the fixed-size constraint comes only from the fully-connected layers, which exist at a deeper stage of the network.
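A small sketch makes this constraint concrete. It assumes a single hypothetical convolution (3×3 kernel, stride 1, no padding, 4 filters) followed directly by a fully-connected layer; the helper names `conv_output_size` and `flattened_length` are invented for the illustration.

```python
import numpy as np

def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def flattened_length(image_size, channels=4, kernel=3):
    """Length of the flattened feature map for a square input image."""
    side = conv_output_size(image_size, kernel)
    return channels * side * side

# A fully-connected layer whose weight matrix was sized for 8x8 inputs.
fc_weights = np.zeros((5, flattened_length(8)))

for image_size in (8, 10):
    features = np.ones(flattened_length(image_size))
    try:
        scores = fc_weights @ features
        print(image_size, "ok, scores shape", scores.shape)
    except ValueError:
        # The convolution handled the larger image; the FC layer cannot.
        print(image_size, "mismatch:", features.shape[0], "vs", fc_weights.shape[1])
```

The convolution stage accepts both image sizes, but the flattened length changes from 144 to 256, and only the first matches the weight matrix of the fully-connected layer.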

The method of the SPPNet:

Figure 3 illustrates our method. In each spatial bin, we pool the responses of each filter (throughout this paper we use max pooling). The outputs of the spatial pyramid pooling are kM-dimensional vectors with the number of bins denoted as M (k is the number of filters in the last convolutional layer). The fixed-dimensional vectors are the input to the fully-connected layer.
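The pooling step described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function name `spatial_pyramid_pool` is made up, the pyramid levels (4×4, 2×2, 1×1, so M = 21 bins) are one illustrative choice, and the bin boundaries use a floor/ceil split of the feature map, which may differ from the window/stride rule used in the paper.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(4, 2, 1)):
    """Max-pool a (k, h, w) feature map into a k*M vector.

    M is the total number of bins, sum(n * n for n in levels). Bin edges
    are chosen with floor/ceil so that every bin is non-empty (assumption).
    """
    k, h, w = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                # One max response per filter in this spatial bin.
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
# Two feature maps of different spatial size, both with k = 256 filters.
small = spatial_pyramid_pool(rng.standard_normal((256, 13, 13)))
large = spatial_pyramid_pool(rng.standard_normal((256, 10, 17)))
print(small.shape, large.shape)
```

Both outputs have length kM = 256 × 21 = 5376 regardless of the feature-map size, which is what allows the fully-connected layer that follows to keep a fixed input dimension.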

The effect of the SPPNet:

It is worth noticing that the gain of multi-level pooling is not simply due to more parameters; rather, it is because the multi-level pooling is robust to the variance in object deformations and spatial layout.
