READING NOTE: PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

Original post · 2016-08-30 20:58:12

TITLE: PVANET Deep but Lightweight Neural Networks for Real-time Object Detection

AUTHORS: Kye-Hyeon Kim, Yeongjae Cheon, Sanghoon Hong, Byungseok Roh, Minje Park

ASSOCIATION: Intel Imaging and Camera Technology

FROM: arXiv:1608.08021

CONTRIBUTIONS

  1. An efficient object detector based on CNN is proposed, which has the following advantages:
    • Computational cost: 7.9 GMAC for feature extraction with a 1065x640 input (cf. ResNet-101: 80.5 GMAC)
    • Runtime performance: 750ms/image (1.3FPS) on Intel i7-6700K CPU with a single core; 46ms/image (21.7FPS) on NVIDIA Titan X GPU
    • Accuracy: 81.8% mAP on VOC-2007; 82.5% mAP on VOC-2012 (2nd place)

Method

The authors adopt the Faster R-CNN pipeline of "CNN feature extraction + region proposal + RoI classification". They argue that only the feature extraction part needs to be redesigned, since the region proposal part is not computationally expensive and the classification part can be compressed efficiently with common techniques such as truncated SVD. The guiding principle is "fewer channels with more layers", combined with the adoption of building blocks including concatenated ReLU (C.ReLU), Inception, and HyperNet-style multi-scale features. The structure of the network is as follows:

[Figure from the original post: overall network structure]
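The truncated-SVD compression mentioned for the classification part can be sketched with numpy. This is a minimal illustration of the general technique, not PVANET's actual configuration; the 512x512 weight matrix and rank 64 are made-up sizes for demonstration:

```python
import numpy as np

def truncated_svd_fc(W, k):
    """Compress a fully-connected weight matrix W (out x in) via
    truncated SVD, splitting it into two smaller layers A (out x k)
    and B (k x in) such that A @ B approximates W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]   # fold singular values into the first factor
    B = Vt[:k, :]
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
A, B = truncated_svd_fc(W, 64)
# Parameters drop from 512*512 = 262144 to 2*512*64 = 65536 (4x fewer),
# at the cost of a rank-64 approximation error.
```

Replacing one large FC layer with the two factored layers `A` and `B` is how the compression is typically applied at inference time.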

Some Details

  1. Concatenated rectified linear unit (C.ReLU) is applied to the early stage of the CNN (i.e., the first several layers from the network input) to cut computation roughly in half without losing accuracy. In my understanding, C.ReLU exploits the observation that early-stage filters tend to come in negatively correlated, Gabor-like pairs; concatenating the negated responses before the ReLU means only half of the filters need to be learned, which accelerates the forward pass. If the output of the C.ReLU has 64 channels, its convolution layer only needs to produce 32 channels. Applying it in later stages may hurt performance, because it passes negative responses through as activated signals, where such paired filters are no longer expected.
  2. Inception is applied to the remaining part of the feature-generation sub-network. An Inception module produces output activations with different receptive-field sizes, which increases the variety of receptive fields available to the following layer. The design policies can be found in the related work on Inception.
  3. The authors adopt a multi-scale representation in the spirit of HyperNet, combining several intermediate outputs so that multiple levels of detail and non-linearity can be considered simultaneously. Directly concatenating all abstraction layers would produce redundant information at a much higher computational cost, and layers that come too early are of little help for object proposal and classification. The authors therefore combine 1) the last layer and 2) two intermediate layers whose scales are 2x and 4x that of the last layer, respectively.
  4. Residual structure is also used in this network, which helps to train very deep CNNs.
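The C.ReLU operation from item 1 can be sketched in a few lines of numpy. This is a generic sketch of the concatenated-ReLU idea, not PVANET's exact layer; the tensor shapes below are illustrative:

```python
import numpy as np

def crelu(x, axis=1):
    """Concatenated ReLU: stack x and -x along the channel axis, then
    apply ReLU, so both signs of each filter response survive."""
    return np.maximum(np.concatenate([x, -x], axis=axis), 0.0)

# A conv layer with only 32 output channels yields 64 activated
# channels after C.ReLU, halving the convolution cost.
x = np.random.randn(1, 32, 8, 8)   # (batch, channels, H, W)
y = crelu(x)
print(y.shape)  # (1, 64, 8, 8)
```

The first 32 output channels equal ReLU(x) and the last 32 equal ReLU(-x), which is why the preceding convolution only needs half the filters.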
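The receptive-field variety from item 2 can be made concrete with a small calculation. The branch configuration below is a hypothetical Inception module for illustration, not PVANET's actual one:

```python
def receptive_field(kernel_sizes):
    """Effective receptive field of stride-1 convolutions stacked in
    sequence: each k x k layer grows the field by k - 1."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

# Parallel branches of a hypothetical Inception module: a 1x1 branch,
# a 3x3 branch, and a 5x5-equivalent branch built from two stacked 3x3s.
branches = {"1x1": [1], "3x3": [3], "3x3-3x3": [3, 3]}
fields = {name: receptive_field(ks) for name, ks in branches.items()}
print(fields)  # {'1x1': 1, '3x3': 3, '3x3-3x3': 5}
```

Concatenating these branch outputs gives the next layer features at three different receptive-field sizes, and the two stacked 3x3 convolutions cover a 5x5 field more cheaply than a single 5x5 kernel.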
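The multi-scale combination from item 3 can be sketched as resizing the 2x and 4x feature maps onto the last layer's grid and concatenating along channels. This is a simplified sketch of the hyper-feature idea, with made-up channel counts; it is not the exact PVANET wiring (which uses specific pooling/deconvolution layers):

```python
import numpy as np

def downscale(x, factor):
    """Average-pool a (C, H, W) feature map by an integer factor."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def hyper_concat(f4x, f2x, f1x):
    """Resize the 4x- and 2x-scale maps to the last layer's grid and
    concatenate everything along the channel axis."""
    return np.concatenate([downscale(f4x, 4), downscale(f2x, 2), f1x], axis=0)

f4x = np.random.randn(64, 32, 32)   # early layer, 4x the last layer's scale
f2x = np.random.randn(128, 16, 16)  # intermediate layer, 2x scale
f1x = np.random.randn(256, 8, 8)    # last layer
h = hyper_concat(f4x, f2x, f1x)
print(h.shape)  # (448, 8, 8)
```

The concatenated map carries fine details from the early layer alongside the more abstract features of the last layer, which is what the proposal and classification stages then consume.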
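The residual structure from item 4 is simply an identity shortcut around a learned transformation. A minimal sketch, with an arbitrary stand-in for the learned branch:

```python
import numpy as np

def residual_block(x, f):
    """y = f(x) + x: the identity shortcut lets the signal (and, in
    training, the gradient) bypass f, easing very deep stacks."""
    return f(x) + x

# Chain many blocks; even if every residual branch outputs zero,
# the input still propagates unchanged through the shortcuts.
x = np.ones(4)
for _ in range(100):
    x = residual_block(x, lambda v: 0.0 * v)
print(x)  # [1. 1. 1. 1.]
```

Without the shortcuts, a 100-layer stack of such transformations would have to learn the identity mapping explicitly, which is what makes plain very deep networks hard to optimize.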
Copyright notice: this is an original post by the blogger and may not be reproduced without permission. The blogger's homepage: http://joshua881228.webfactional.com/
