CV, DL, SPPNet: A Detailed Illustrated Guide to the SPP-Net Algorithm (Paper Overview, Architecture, and Example Applications)

Article link: https://blog.csdn.net/qq_41185868/article/details/82873511

Related paper

"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition": abstract, conclusion, and commentary

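Link	https://arxiv.org/abs/1406.4729
Date	June 18, 2014
Authors	Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Summary	The paper proposes a new deep convolutional network structure, SPP-Net, to remove the requirement that deep convolutional networks take input images of a fixed size.
>> Background and pain point: existing deep convolutional networks need a fixed input size in both training and testing. This constrains the aspect ratio and scale of the input image and may reduce recognition accuracy.
>> Solution: the paper inserts a spatial pyramid pooling layer between the last convolutional layer and the fully connected layers. The layer produces a fixed-length feature vector, so the input image size no longer has to be fixed.
>> SPP-Net structure: with the spatial pyramid pooling layer added, SPP-Net can process input images of arbitrary size and scale and still generate a fixed-length representation.
>> Experiments and results: on the ImageNet 2012 dataset, the paper evaluates four different network architectures and shows that SPP improves the accuracy of all of them. It also reports state-of-the-art classification results on Pascal VOC 2007 and Caltech101.
>> Application to object detection: the convolutional features are computed only once for the whole image, and the features of different regions are then extracted efficiently from that shared feature map, which greatly speeds up detection.
In short, by adding a spatial pyramid pooling layer, SPP-Net removes the fixed-input-size constraint of deep convolutional networks and improves both the accuracy and the efficiency of recognition and detection.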
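
The spatial pyramid pooling step described in the summary can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels (4, 2, 1), and the way bin boundaries are rounded are all assumptions chosen for readability. It pools a single channel; a real network would apply the same operation to every channel and concatenate the results.

```python
def spatial_pyramid_pool(feature_map, levels=(4, 2, 1)):
    """Max-pool one channel (a 2-D list, H x W) into a fixed-length vector.

    For each pyramid level n, the map is split into an n x n grid of bins
    and the maximum of each bin is kept, so the output always has
    sum(n * n for n in levels) values, regardless of H and W.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin boundaries scale with the map size (assumed rounding:
            # floor for the start, ceiling for the end, so bins may overlap).
            top = (i * height) // n
            bottom = ((i + 1) * height + n - 1) // n
            for j in range(n):
                left = (j * width) // n
                right = ((j + 1) * width + n - 1) // n
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Two feature maps of different sizes give vectors of the same length.
small = [[r * 5 + c for c in range(5)] for r in range(3)]      # 3 x 5
large = [[r * 13 + c for c in range(13)] for r in range(8)]    # 8 x 13
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))  # → 21 21
```

With levels (4, 2, 1) each channel contributes 16 + 4 + 1 = 21 values, which is why the fully connected layers that follow can keep a fixed input dimension.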

Abstract

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224×224) input image. This requirement is “artificial” and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, “spatial pyramid pooling”, to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning.

The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102× faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007.

In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.

Index Terms—Convolutional Neural Networks, Spatial Pyramid Pooling, Image Classification, Object Detection
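
The detection idea in the abstract (compute the feature map once, then pool features inside arbitrary regions) can be sketched as follows. This is a simplified illustration under stated assumptions: the total stride of 16, the outward-rounding projection in `project_window`, the single 2 x 2 pooling level, the synthetic feature map, and the candidate windows are all hypothetical and are not taken from the paper, whose own window projection is more careful.

```python
def project_window(box, stride=16):
    """Map an image-space box (x0, y0, x1, y1) to feature-map indices.

    Simplified projection: divide by an assumed total stride and round
    outward so that the region covers at least one cell.
    """
    x0, y0, x1, y1 = box
    left, top = x0 // stride, y0 // stride
    right = max(left + 1, -(-x1 // stride))    # ceiling division
    bottom = max(top + 1, -(-y1 // stride))
    return left, top, right, bottom


def pool_region(feature_map, region, grid=2):
    """Max-pool a feature-map region into grid x grid values."""
    left, top, right, bottom = region
    h, w = bottom - top, right - left
    out = []
    for i in range(grid):
        r0 = top + (i * h) // grid
        r1 = top + ((i + 1) * h + grid - 1) // grid
        for j in range(grid):
            c0 = left + (j * w) // grid
            c1 = left + ((j + 1) * w + grid - 1) // grid
            out.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
    return out


# Stand-in for conv features computed ONCE for the whole image
# (a 20 x 30 map would correspond to a 320 x 480 image at stride 16).
feature_map = [[(r * 31 + c * 17) % 97 for c in range(30)] for r in range(20)]

# Hypothetical candidate windows in image coordinates.
windows = [(0, 0, 160, 160), (100, 40, 420, 300), (32, 64, 95, 300)]
features = [pool_region(feature_map, project_window(b)) for b in windows]
print([len(f) for f in features])  # → [4, 4, 4]
```

Every window, whatever its size or aspect ratio, yields a vector of the same length, and none of them requires another pass through the convolutional layers. R-CNN, by contrast, runs the network separately on each candidate window.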

CONCLUSION

SPP is a flexible solution for handling different scales, sizes, and aspect ratios. These issues are important in visual recognition, but received little consideration in the context of deep networks. We have suggested a solution to train a deep network with a spatial pyramid pooling layer. The resulting SPP-net shows outstanding accuracy in classification/detection tasks and greatly accelerates DNN-based detection. Our studies also show that many time-proven techniques/insights in computer vision can still play important roles in deep-networks-based recognition.

Introduction to the SPP-Net algorithm

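The first author of SPP-Net is Kaiming He, and the original paper is "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition". The method targets both classification and detection. In the ILSVRC 2014 competition on ImageNet it ranked second in object detection and third in image classification.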

1. Experimental results

VOC2007

ILSVRC 2014 Classification

ILSVRC 2014 Detection

2. Highlights of SPP-Net

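Before SPP-Net, deep CNNs generally required input images of a fixed size, for example 224×224 (ImageNet), 32×32 (LeNet), or 96×96. The constraint comes from the fully connected layers, which need a fixed-length input; the convolutional layers themselves accept any size. To handle images of arbitrary size, the images therefore had to be cropped or warped first, which loses or distorts image content to some degree and limits recognition accuracy. Moreover, from a physiological point of view, when the human eye sees an image the brain first takes it in as a whole and does not crop or warp it. It is therefore more plausible that the brain gathers shallow information first and only recognizes these arbitrarily shaped objects at a deeper level.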
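
To make the cost of cropping and warping concrete, the small sketch below measures both effects for a single image. It is an illustration only: the function name `crop_and_warp_effects`, the 500×375 example image, and the simplification that "crop" means the largest centered square (real pipelines usually resize first) are assumptions, not details from the paper.

```python
def crop_and_warp_effects(width, height, target=224):
    """Quantify what a fixed target x target input costs an image.

    Returns (kept_fraction, aspect_distortion):
    - kept_fraction: share of the image area kept by the largest
      centered square crop;
    - aspect_distortion: ratio of the horizontal to the vertical scale
      factor when the full image is warped to target x target
      (1.0 means no distortion).
    """
    side = min(width, height)
    kept_fraction = (side * side) / (width * height)
    scale_x = target / width
    scale_y = target / height
    aspect_distortion = scale_x / scale_y
    return kept_fraction, aspect_distortion


kept, distortion = crop_and_warp_effects(500, 375)
print(f"crop keeps {kept:.0%} of the area; warp distortion {distortion:.2f}")
```

For a 500×375 image, the square crop discards a quarter of the content, while warping keeps everything but squeezes the width to 75% of its proper proportion. Pooling to a fixed-length vector after the convolutional layers avoids both.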