Spatial Pyramid Pooling (SPP)

SPP is a pooling structure proposed by Kaiming He et al. in 2015 to solve the problem that, until then, convolutional neural networks could only accept input images of a fixed size.

Compared with a traditional CNN, SPP is inserted just before the fully connected layers. It guarantees that the input to those layers always has the same dimensionality, removing the effect of varying input image sizes on the result. Adding an SPP layer brought clear improvements both to several CNN classification models of the time and to the R-CNN object detection model.


Rather than applying a single average or max pooling operation, SPP takes the final convolutional output of shape (batch, N channels, w, h) and runs it through three max pooling layers to obtain feature maps of shape (batch, N, 4, 4), (batch, N, 2, 2) and (batch, N, 1, 1). Each level is flattened and the three are concatenated, giving a fixed-length vector of shape (batch, N×(16+4+1)) that is fed into the fully connected layer. You can think of this as using three pooling window sizes to extract features, so that a fixed-size feature vector reaches the fully connected layer regardless of the input resolution.
In practice, the three pooling output sizes can be chosen by comparing alternatives on the task at hand.
Below is a demo:

import torch
import torch.nn as nn
import math

def _spp_pool(in_h, in_w, out_hc, out_wc):
    # Pooling output size: H_out = (H_in - F + 2P) / S + 1
    # Kernel F = ceil(in / out), stride S = F, and per-side padding P chosen
    # so the output grid is exactly (out_hc, out_wc) for this input size.
    F_h = int(math.ceil(in_h / out_hc))
    F_w = int(math.ceil(in_w / out_wc))
    S_h, S_w = F_h, F_w
    P_h = (F_h * out_hc - in_h + 1) // 2
    P_w = (F_w * out_wc - in_w + 1) // 2
    return nn.MaxPool2d((F_h, F_w), (S_h, S_w), (P_h, P_w))

# Each pooled level is then flattened with x.view(batch, -1)
# and the levels are concatenated with torch.cat(..., dim=1).
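
Building on the helper above, here is a minimal sketch of a complete SPP layer, assuming pyramid levels of 4×4, 2×2 and 1×1; the module name SPPLayer and the example tensor shapes are illustrative choices, not from the original demo:

class SPPLayer(nn.Module):
    """Spatial pyramid pooling over the last convolutional feature map."""
    def __init__(self, levels=(4, 2, 1)):
        super().__init__()
        self.levels = levels

    def forward(self, x):
        batch, channels, h, w = x.shape
        pooled = []
        for level in self.levels:
            pool = _spp_pool(h, w, level, level)    # e.g. (batch, N, 4, 4)
            pooled.append(pool(x).view(batch, -1))  # flatten each level
        # Concatenate into a fixed-length vector: (batch, N * (16 + 4 + 1))
        return torch.cat(pooled, dim=1)

# Feature maps of different spatial sizes map to the same output length:
# spp = SPPLayer()
# spp(torch.randn(1, 256, 13, 13)).shape  # torch.Size([1, 5376]) = 256 * 21
# spp(torch.randn(1, 256, 10, 15)).shape  # torch.Size([1, 5376]) = 256 * 21

Because the pyramid levels fix the pooled grid sizes, the length of the concatenated vector depends only on the channel count, which is what lets the fully connected layers accept images of arbitrary resolution. Note that for very small feature maps the computed padding can exceed PyTorch's limit of half the kernel size; nn.AdaptiveMaxPool2d(level) is a built-in alternative that avoids the manual kernel/stride/padding arithmetic.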
For reference, the abstract of the original SPP-net paper:

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224×224) input image. This requirement is “artificial” and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, “spatial pyramid pooling”, to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102× faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.
