SPP-net: Spatial Pyramid Pooling in Deep Convolutional Networks

Summary

0. History

Existing deep convolutional neural networks (CNNs) require a fixed-size input image (e.g. 224×224). The input is usually produced by cropping or warping the image, which causes problems: a crop may miss part of the object, and warping may introduce unwanted geometric distortion. This requirement is 'artificial' and may reduce recognition accuracy for images or sub-images of arbitrary size or scale.

1. Objective

Deep networks consist of convolutional (conv) layers followed by fully connected (fc) layers. The fixed-size input requirement comes only from the fc layers, which need a fixed-length input vector. The conv layers work in a sliding-window manner and place no requirement on the input size. The key point is therefore to adapt the conv features of an arbitrary-size image to the fixed input size of the fc layers.
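The size dependence can be checked with simple arithmetic. The sketch below uses the standard conv output-size formula. The kernel, stride, padding, and channel values and the function names are hypothetical and are not taken from any network in the paper.

```python
def conv_out_size(n, kernel, stride, pad):
    """Spatial size of a conv (or pooling) output for an input of size n."""
    return (n + 2 * pad - kernel) // stride + 1


def flattened_length(n, channels, kernel=3, stride=2, pad=1):
    """Length of the vector an fc layer would receive after one conv layer."""
    side = conv_out_size(n, kernel, stride, pad)
    return channels * side * side


# The conv layer accepts any input size; only the flattened length changes.
len_224 = flattened_length(224, channels=8)   # 8 * 112 * 112
len_180 = flattened_length(180, channels=8)   # 8 * 90 * 90

# An fc layer whose weight matrix was sized for len_224 cannot take len_180.
fc_accepts_both = (len_224 == len_180)
```

The conv layer runs on both sizes without change, while the flattened length differs, which is exactly the mismatch the fc layers cannot absorb.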

2. Model adopted

Based on the R-CNN pipeline, with ZF-5 as the baseline network. Selective search (SS) proposes the candidate windows (regions of interest), and within each candidate window a 4-level spatial pyramid pools the features.
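A minimal sketch of spatial pyramid pooling in plain Python, to show how maps of different sizes end up as vectors of the same length. The levels (1, 2, 3, 6) follow my reading of the paper's 4-level pyramid. The bin-edge rule, the function names, and the toy feature maps are assumptions made for illustration.

```python
def spp(feature_maps, levels=(1, 2, 3, 6)):
    """Max-pool each channel into n x n bins per level and concatenate.

    feature_maps: list of channels, each a list of rows (h x w floats).
    Returns a flat list of length channels * sum(n * n for n in levels).
    """
    out = []
    for n in levels:
        for fmap in feature_maps:
            h, w = len(fmap), len(fmap[0])
            for i in range(n):
                # Assumed bin edges: floor for the start, ceil for the end,
                # so every bin contains at least one cell.
                r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
                for j in range(n):
                    c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                    out.append(max(fmap[r][c]
                                   for r in range(r0, r1)
                                   for c in range(c0, c1)))
    return out


def toy_maps(channels, h, w):
    """Deterministic toy feature maps standing in for conv outputs."""
    return [[[float(k + r * w + c) for c in range(w)] for r in range(h)]
            for k in range(channels)]


vec_a = spp(toy_maps(2, 13, 13))   # square 13 x 13 maps
vec_b = spp(toy_maps(2, 10, 7))    # smaller, non-square maps
```

Both vectors have 2 * (1 + 4 + 9 + 36) = 100 entries, so the same fc layers could consume either one.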

3. Key features of the system

a. properties of SPP:

    - generates a fixed-length output vector regardless of the input size

    - uses multi-level spatial bins

    - pools features extracted at variable scales (variable input scales increase scale-invariance and reduce over-fitting)

b. SPP improves the accuracy of 4 different CNN architectures

e. the convolutional layers run only once on the entire image, then the features of each candidate window are pooled by SPP on the feature maps

f. uses of global pooling:

   - reduces model size and overfitting

   - applied at the testing stage after the fc layers to improve accuracy

   - used for weakly supervised object recognition

g. the pooling layer outputs a fixed-size vector for input images of different sizes

h. multi-size training: a different input size is used in each epoch, which increases the accuracy of the system
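One way to keep the pooled output size constant while the training size changes from epoch to epoch is to derive the pooling window and stride from the feature-map size. The sketch below uses the rule win = ceil(a/n), stride = floor(a/n), as I understand it from the SPP-net paper. The function name and the map sizes 13 and 10 are illustrative assumptions.

```python
import math


def pool_params(a, n):
    """Window and stride for pooling an a x a map into about n x n bins."""
    win = math.ceil(a / n)
    stride = a // n
    bins = (a - win) // stride + 1   # outputs per side of a plain sliding pool
    return win, stride, bins


# 13 and 10 stand for the conv feature-map sizes of two training input sizes.
params = {a: [pool_params(a, n) for n in (3, 2, 1)] for a in (13, 10)}
```

For both map sizes each level yields n bins per side, so the fc layers can be shared across epochs. The rule is not exact for every (a, n) pair, so it is worth checking the returned bin count before relying on it.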

i. multi-level pooling helps increase accuracy. The gain is not simply due to more parameters, but because multi-level pooling is robust to variance in object deformations and spatial layout

j. the full-image representation is preferred and increases accuracy

k. multi-view testing: resize the image and crop fixed-size views to generate a set of different views of the image (10 views: center and four corners, each with its horizontal flip; 18 views: the same plus the middle of each side)
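The 10-view scheme can be sketched as a list of crop boxes. The image size 256 and crop size 224 are assumed values for illustration, and `ten_views` is a hypothetical helper. It only returns coordinates and a flip flag, and the caller is expected to do the actual cropping and flipping.

```python
def ten_views(width, height, crop):
    """Return 10 views as (x0, y0, x1, y1, flipped) crop boxes."""
    cx, cy = (width - crop) // 2, (height - crop) // 2
    origins = [
        (cx, cy),                          # center
        (0, 0),                            # top-left
        (width - crop, 0),                 # top-right
        (0, height - crop),                # bottom-left
        (width - crop, height - crop),     # bottom-right
    ]
    views = []
    for flipped in (False, True):          # each crop, then its horizontal flip
        for x0, y0 in origins:
            views.append((x0, y0, x0 + crop, y0 + crop, flipped))
    return views


views = ten_views(256, 256, 224)
```

Adding the four mid-side origins to `origins` would give the 18-view variant in the same way.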

l. accuracy is increased by combining two models with different conv layers

m. mapping a window in the image to the corresponding region of the feature map
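A minimal sketch of the window mapping in item m. The rounding rule (floor plus one for the left/top edge, ceil minus one for the right/bottom edge) and the total stride of 16 are assumptions based on my reading of the paper's appendix. `map_window` is a hypothetical helper name.

```python
import math


def map_window(box, stride=16):
    """Project an image-domain box (x0, y0, x1, y1) onto feature-map indices."""
    x0, y0, x1, y1 = box
    left = x0 // stride + 1                  # floor(x / S) + 1
    top = y0 // stride + 1
    right = math.ceil(x1 / stride) - 1       # ceil(x / S) - 1
    bottom = math.ceil(y1 / stride) - 1
    return left, top, right, bottom


fm_box = map_window((32, 48, 320, 240))
```

The resulting box on the feature map is what SPP then pools into a fixed-length vector, so the conv layers never need to be re-run per window.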

4. Disadvantages

SPP-net still uses a multi-stage pipeline.

5. Personal reviews

Keep going...

Reposted from: https://www.cnblogs.com/lucasdu/p/7878333.html
