[PVANet] "PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection"

bryant_meng

Article link: https://blog.csdn.net/bryant_meng/article/details/88983212

arXiv preprint arXiv:1608.08021, 2016.

Caffe code: https://github.com/sanghoon/pva-faster-rcnn/blob/master/models/pvanet/example_train/train.prototxt
Caffe model visualization tool: http://ethereon.github.io/netscope/#/editor

1 Background and Motivation

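Object detection accuracy is already quite good, and there is a broad commercial market in automotive and surveillance, but speed remains a concern. Starting from speed, the authors redesign the backbone following the "less channels with more layers" principle. The result is strong accuracy on VOC 07 and 12 at a sharply reduced computational cost, reaching real-time.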

2 Advantages / Contributions

83.8% mAP on VOC 2007
82.5% mAP on VOC 2012 (2nd place, with only 12.3% of the computation of the 1st-place ResNet entry)
46 ms/image on Titan X (21.7 FPS)

lightweight feature extraction network

3 Innovations

The authors designed the whole detection network themselves: lightweight, yet competitive in accuracy
A large speed-up, reaching real time

4 Method

4.1 C.ReLU: Earlier building blocks in feature generation

C stands for concatenation. Unlike the original C.ReLU (from "Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units"), the authors add a Scale / Shift operation, analogous to the restoring (scale and shift) step of Batch Normalization, applied per channel. The motivation for this design: "In the early stage, output nodes tend to be 'paired' such that one node's activation is the opposite side of another's." So the number of channels can be halved and the positive and negative responses simply concatenated, with similar accuracy and a 2x speed-up.
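The idea above can be sketched at a single spatial position. This is a minimal illustration, not the authors' Caffe implementation: the function name `crelu` and the parameters `scale` and `shift` are hypothetical, and the order (concatenate, then per-channel scale/shift, then ReLU) is an assumption based on the description above.

```python
def crelu(x, scale, shift):
    """Apply a C.ReLU-style activation to one spatial position.

    x     : list of C channel responses (output of the halved convolution)
    scale : list of 2C per-channel multipliers (hypothetical learned values)
    shift : list of 2C per-channel offsets (hypothetical learned values)
    """
    # Concatenate the responses with their negation: C channels become 2C.
    doubled = list(x) + [-v for v in x]
    if not (len(scale) == len(shift) == len(doubled)):
        raise ValueError("scale and shift must each have 2 * len(x) entries")
    # Per-channel scale/shift, then ReLU (assumed order).
    return [max(0.0, s * v + b) for v, s, b in zip(doubled, scale, shift)]


# With identity scale and zero shift, this reduces to concat(ReLU(x), ReLU(-x)).
example = crelu([1.0, -2.0], scale=[1.0] * 4, shift=[0.0] * 4)
```

The convolution only has to produce C channels while the next layer sees 2C, which is where the roughly 2x saving in the early layers comes from.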

4.2 Inception: Remaining building blocks in feature generation

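Compared with the original GoogLeNet, the authors' stacked Inception block drops the pooling branch and replaces the 5x5 conv with two stacked 3x3 convs. This form captures objects of different sizes well, which the authors illustrate with a figure of stacked Inception blocks.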

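Ha, at first glance this is a little confusing, but no matter. Having weathered bigger storms, there is no reason to be intimidated by work from 2016! A careful look makes it all clear.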

The figure depicts three stacked Inception blocks. In the first layer, the channel shares for receptive fields 1, 3, 5 are $(\frac{1}{2},\frac{1}{4},\frac{1}{4})$ of the total. After stacking two layers, the shares are $(\frac{1}{2},\frac{1}{4},\frac{1}{4})*(\frac{1}{2},\frac{1}{4},\frac{1}{4})$. The only thing to watch is the "multiplication" rule for receptive fields: stacking a kernel of size $a$ on one of size $b$ (stride 1) gives a receptive field of $a + b - 1$, so $1 * x = x$, $3 * 3 = 5$, $3 * 5 = 7$, and so on.

Let's work out $(\frac{1}{2},\frac{1}{4},\frac{1}{4})*(\frac{1}{2},\frac{1}{4},\frac{1}{4})$, i.e. the result for the second layer, which corresponds to receptive fields $(1, 3, 5) * (1, 3, 5)$:

Receptive field 1: only $1 * 1$, i.e. $\frac{1}{2}*\frac{1}{2} = \frac{1}{4}$
Receptive field 3: $1 * 3$ and $3 * 1$, i.e. $\frac{1}{2}*\frac{1}{4} + \frac{1}{4}*\frac{1}{2} = \frac{1}{4}$
Receptive field 5: $1 * 5$, $5 * 1$ and $3 * 3$, i.e. $\frac{1}{2}*\frac{1}{4} + \frac{1}{4}*\frac{1}{2} + \frac{1}{4}*\frac{1}{4} = \frac{5}{16}$
Receptive field 7: $3 * 5$ and $5 * 3$, i.e. $\frac{1}{4}*\frac{1}{4} + \frac{1}{4}*\frac{1}{4} = \frac{1}{8}$
Receptive field 9: only $5 * 5$, i.e. $\frac{1}{4}*\frac{1}{4} = \frac{1}{16}$

The third layer is then $(\frac{1}{4},\frac{1}{4},\frac{5}{16}, \frac{1}{8}, \frac{1}{16})*(\frac{1}{2},\frac{1}{4},\frac{1}{4})$, i.e. $(1, 3, 5, 7, 9) * (1, 3, 5)$: apply the receptive-field "multiplication" rule and multiply the corresponding channel shares.

it slows down the growth of receptive fields for some output features so that small-sized objects can be captured precisely.
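The hand calculation above is a discrete convolution of channel-share distributions, which can be checked with a short script. This is a sketch under the assumptions stated in the notes (stride-1 stacking, shares of 1/2, 1/4, 1/4 per block); the helper name `stack_rf` is hypothetical.

```python
from fractions import Fraction


def stack_rf(dist_a, dist_b):
    """Combine two {receptive_field: channel_share} distributions.

    Stacking a layer with receptive field b on top of one with receptive
    field a (stride 1) yields a + b - 1; the shares multiply.
    """
    out = {}
    for rf_a, share_a in dist_a.items():
        for rf_b, share_b in dist_b.items():
            rf = rf_a + rf_b - 1
            out[rf] = out.get(rf, Fraction(0)) + share_a * share_b
    return dict(sorted(out.items()))


# One Inception block: receptive fields 1, 3, 5 with shares 1/2, 1/4, 1/4.
block = {1: Fraction(1, 2), 3: Fraction(1, 4), 5: Fraction(1, 4)}
two_layers = stack_rf(block, block)
three_layers = stack_rf(two_layers, block)

for rf, share in three_layers.items():
    print(f"receptive field {rf}: {share}")
```

For three layers this gives shares 1/8, 3/16, 9/32, 13/64, 9/64, 3/64, 1/64 for receptive fields 1 through 13. A sizeable fraction of channels keeps a small receptive field, which matches the quoted remark about small objects.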

4.3 HyperNet: Concatenation of multi-scale intermediate outputs

$x \times x$ C.ReLU denotes the $1 \times 1 \to x \times x \to 1 \times 1$ pattern, where the $x \times x$ part takes the form shown in the figure below.

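For the Inception blocks, "# out" denotes the $1 * 1$ conv after concatenation
In the ResNet-style structure, a $1 * 1$ shortcut is used when stride = 2 or when the number of channels changes
Multi-scale features are built as follows: conv3_4 (128) is downscaled, conv5_4 (384) is upscaled, and both are concatenated with conv4_4 (256)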
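The multi-scale concatenation can be sketched as shape arithmetic. The channel counts (128, 256, 384) come from the notes above; the strides of 8, 16 and 32 are an assumption not stated in the notes, and the names `BRANCHES`, `resize_factor` and `hyper_feature_shape` are hypothetical.

```python
# Channel counts are from the notes; the strides are assumed for illustration.
BRANCHES = {
    "conv3_4": {"channels": 128, "stride": 8},
    "conv4_4": {"channels": 256, "stride": 16},
    "conv5_4": {"channels": 384, "stride": 32},
}


def resize_factor(name, target_stride=16):
    """Spatial resize needed to reach the target stride.

    A value below 1 means downscale, above 1 means upscale, 1 means unchanged.
    """
    return BRANCHES[name]["stride"] / target_stride


def hyper_feature_shape(height, width, target_stride=16):
    """Return (channels, height, width) of the concatenated feature map."""
    if height % 32 or width % 32:
        raise ValueError("this sketch expects input sides divisible by 32")
    channels = sum(branch["channels"] for branch in BRANCHES.values())
    return channels, height // target_stride, width // target_stride


shape = hyper_feature_shape(640, 1056)  # hypothetical input size
```

Under these assumptions all three maps are aligned to the conv4_4 resolution before concatenation, giving 128 + 256 + 384 = 768 channels.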

[Figure] Image from the blog post "[目标检测]PVAnet原理，简单明了"

RPN: takes the first 128 channels of convf, followed by a $3 \times 3$ conv (384 channels) and a $1 \times 1$ conv (25 x (2+4) = 150 channels). The 25 anchors come from 5 scales (3, 6, 9, 16, 25) and 5 ratios (0.5, 0.667, 1.0, 1.5, 2.0); 2 is for the two-class score and 4 for the bbox deltas
Head: after RoI pooling $6 * 6 * 512$, then $4096$ (fc), $4096$ (fc), $21$ (20 + 1 classes), $84$ (21 * 4 bbox deltas)
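The channel counts in the two items above follow from simple arithmetic, sketched below. The scales and ratios are from the notes; the base size of 16 pixels and the convention that the ratio is height/width with the area preserved are assumptions for illustration, and all function names are hypothetical.

```python
from itertools import product

SCALES = (3, 6, 9, 16, 25)
RATIOS = (0.5, 0.667, 1.0, 1.5, 2.0)


def rpn_output_channels(scales=SCALES, ratios=RATIOS):
    """Output channels of the RPN 1x1 conv: 2 scores + 4 deltas per anchor."""
    num_anchors = len(scales) * len(ratios)
    return {
        "anchors": num_anchors,
        "cls": 2 * num_anchors,
        "bbox": 4 * num_anchors,
        "total": 6 * num_anchors,
    }


def anchor_sizes(base_size=16, scales=SCALES, ratios=RATIOS):
    """(width, height) per anchor; base_size and ratio = h / w are assumed."""
    sizes = []
    for ratio, scale in product(ratios, scales):
        side = base_size * scale  # side length of the square anchor
        sizes.append((side / ratio ** 0.5, side * ratio ** 0.5))
    return sizes


def head_output_sizes(num_object_classes=20):
    """Classifier and bbox-regressor widths for the detection head."""
    num_classes = num_object_classes + 1  # plus background
    return num_classes, 4 * num_classes


rpn = rpn_output_channels()
head = head_output_sizes()
```

With 5 scales and 5 ratios this reproduces the 150 RPN output channels and the 21 / 84 head outputs listed above.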

4.4 Deep network training

Batch Normalization
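Moving average of loss (Keras has an implementation, ha, so I won't go into it here)
Inception + residual connection (note: after the concatenation in the Inception block the authors add a $1 * 1$ conv; the residual connection is either the identity $x$ or a $1 * 1$ conv, and the output of the Inception $1 * 1$ conv is added to the residual connection)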
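The notes only name the moving average of loss. Assuming it is used to detect a plateau and lower the learning rate when the averaged loss stops improving, a minimal sketch could look like the following. The class name `PlateauLR` and all of its parameters and default values are hypothetical, not taken from the paper or from Keras.

```python
from collections import deque


class PlateauLR:
    """Lower the learning rate when the moving-average loss stops improving."""

    def __init__(self, lr, window=3, patience=2, min_delta=0.01, factor=0.1):
        self.lr = lr
        self.losses = deque(maxlen=window)  # most recent losses
        self.patience = patience            # stale checks tolerated
        self.min_delta = min_delta          # minimum improvement that counts
        self.factor = factor                # multiplicative decay on plateau
        self.best = float("inf")
        self.stale = 0

    def step(self, loss):
        """Record one loss value and return the (possibly reduced) rate."""
        self.losses.append(loss)
        if len(self.losses) < self.losses.maxlen:
            return self.lr  # not enough history for a moving average yet
        average = sum(self.losses) / len(self.losses)
        if average < self.best - self.min_delta:
            self.best = average
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
                self.best = average
        return self.lr


flat = PlateauLR(lr=0.1, window=2, patience=2, min_delta=0.01, factor=0.5)
flat_rates = [flat.step(loss) for loss in (1.0, 1.0, 1.0, 1.0)]

falling = PlateauLR(lr=0.1, window=2, patience=2, min_delta=0.01, factor=0.5)
falling_rates = [falling.step(loss) for loss in (1.0, 0.8, 0.6, 0.4)]
```

Averaging over a window keeps a single noisy mini-batch from triggering a decay; the window length and patience trade responsiveness against stability.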

5 Experiments

5.1 Datasets and Training

ILSVRC2012, MS COCO, PASCAL VOC 2007 and 2012

Pre-training: ILSVRC2012
Then: training on MS COCO and PASCAL VOC 2007 / 2012 trainval
Fine-tuning: PASCAL VOC 2007 / 2012 trainval

5.2 VOC 2007

[Figure: results on VOC 2007]

5.3 VOC 2012

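The MAC count (number of adds and multiplications) is strikingly low, while mAP is on par with the state of the art (2nd place). The authors also note in passing that the 1st-place entry used some tricks, such as multi-scale testing.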

6 Conclusion (Own)

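C.ReLU is quite an inspiring idea. Surprisingly, the upsampling uses a $4 * 4$ conv, although upsampling seems to be independent of the kernel size; this is worth digging into when I have time. The overall approach to network design is inspiring as well.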
