READING NOTE: PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

Original post, 2016-08-30 20:58:12

TITLE: PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

AUTHORS: Kye-Hyeon Kim, Yeongjae Cheon, Sanghoon Hong, Byungseok Roh, Minje Park

ASSOCIATION: Intel Imaging and Camera Technology

FROM: arXiv:1608.08021


  1. An efficient object detector based on CNN is proposed, which has the following advantages:
    • Computational cost: 7.9GMAC for feature extraction with 1065x640 input (cf. ResNet-101: 80.5GMAC)
    • Runtime performance: 750ms/image (1.3FPS) on Intel i7-6700K CPU with a single core; 46ms/image (21.7FPS) on NVIDIA Titan X GPU
    • Accuracy: 81.8% mAP on VOC-2007; 82.5% mAP on VOC-2012 (2nd place)


The author utilizes the pipeline of Faster R-CNN, which is “CNN feature extraction + region proposal + RoI classification”. The author claims that only the feature extraction part needs to be redesigned, since the region proposal part is not computationally expensive and the classification part can be compressed efficiently with common techniques like truncated SVD. The design principle is “less channels with more layers”, together with the adoption of building blocks including concatenated ReLU (C.ReLU), Inception, and HyperNet-style multi-scale features. The overall network structure is illustrated in the original paper (figure not reproduced here).
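The truncated-SVD compression mentioned above can be sketched in a few lines. This is a minimal NumPy illustration with made-up matrix sizes, not the authors' implementation: a fully connected layer's weight matrix is replaced by two low-rank factors.

```python
import numpy as np

def compress_fc_with_svd(W, k):
    """Approximate an FC layer's weight matrix W (out x in) with rank-k
    factors, so y = W @ x becomes y ~= A @ (B @ x). Parameter count
    drops from out*in to (out + in)*k."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]   # (out, k), singular values folded in
    B = Vt[:k, :]          # (k, in)
    return A, B

W = np.random.randn(256, 512)   # hypothetical FC weights
A, B = compress_fc_with_svd(W, 64)
x = np.random.randn(512)
y_full, y_lowrank = W @ x, A @ (B @ x)
print(A.shape, B.shape)         # (256, 64) (64, 512)
```

With k = 64 the two factors hold 256·64 + 64·512 ≈ 49K parameters instead of 256·512 = 131K, at the cost of an approximation error that shrinks as k grows.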

Some Details

  1. Concatenated rectified linear unit (C.ReLU) is applied to the early stage of the CNN (i.e., the first several layers from the network input) to reduce the number of computations by half without losing accuracy. In my understanding, C.ReLU exploits the observation that early convolution layers tend to learn Gabor-like filters in positive/negative pairs, and this helps to accelerate forward propagation: if the C.ReLU output has 64 channels, the preceding convolution layer only needs 32 output channels. It may harm performance if applied to the later stages of the CNN, because it keeps the negated responses as activated signals, forcing every filter and its negation to be treated as equally meaningful.
  2. Inception is applied to the remainder of the feature-generation sub-network. An Inception module produces output activations with different receptive field sizes, which increases the variety of receptive field sizes available to the following layers. All the design policies can be found in the related work.
  3. The author adopts the idea of multi-scale representation, like HyperNet, which combines several intermediate outputs so that multiple levels of detail and non-linearity can be considered simultaneously. Direct concatenation of all abstraction layers would produce redundant information at a much higher computational cost, and layers that are too early in the network are of little help for object proposal and classification. The author therefore combines 1) the last layer and 2) two intermediate layers whose scales are 2x and 4x of the last layer, respectively.
  4. Residual structures are also used in this network, which helps to train the very deep CNN.
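The C.ReLU of item 1 can be sketched in a few lines. This is a minimal NumPy version assuming NCHW feature maps; the paper's actual Caffe implementation also adds per-channel scale/shift after the concatenation.

```python
import numpy as np

def c_relu(x):
    """Concatenated ReLU: concatenate x with -x along the channel axis,
    then apply ReLU. The convolution feeding this layer only needs half
    the output channels, roughly halving its computation."""
    return np.maximum(np.concatenate([x, -x], axis=1), 0.0)

x = np.random.randn(1, 32, 4, 4)   # conv output with only 32 channels
y = c_relu(x)
print(y.shape)                     # (1, 64, 4, 4): 64 activations from 32 channels
```

Note that every output channel pair (c, c+32) is perfectly anti-correlated before the ReLU, which is exactly the pairing behavior observed in early conv layers.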
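The receptive-field mixing of item 2 can be illustrated with a toy single-channel "Inception" in NumPy; the kernels and the naive convolution below are illustrative stand-ins, not the paper's module (which uses learned 1x1/3x3/stacked-3x3 branches).

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D convolution of a single-channel map x
    with kernel k (odd-sized)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def inception_branch_concat(x):
    """Branches with 1x1, 3x3, and 5x5 kernels see receptive fields of
    different sizes; stacking their outputs gives the next layer a mix
    of receptive field sizes from the same input."""
    k1 = np.ones((1, 1))
    k3 = np.ones((3, 3)) / 9.0
    k5 = np.ones((5, 5)) / 25.0
    return np.stack([conv2d_same(x, k) for k in (k1, k3, k5)])

x = np.arange(16, dtype=float).reshape(4, 4)
features = inception_branch_concat(x)
print(features.shape)   # (3, 4, 4): one output channel per branch
```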
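Item 3's multi-scale combination can be sketched as below, a NumPy illustration with max pooling and nearest-neighbor upsampling standing in for the paper's pooling and deconvolution layers; the feature shapes are made up.

```python
import numpy as np

def downscale2x(x):
    """2x2 max pooling on an NCHW feature map (even spatial dims)."""
    n, c, h, w = x.shape
    return x.reshape(n, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))

def upscale2x(x):
    """Nearest-neighbor 2x upsampling on an NCHW feature map."""
    return x.repeat(2, axis=2).repeat(2, axis=3)

def hyper_feature(f4x, f2x, f1x):
    """Bring the 4x-scale and last-layer features to the middle (2x)
    scale and concatenate all three along the channel axis."""
    return np.concatenate([downscale2x(f4x), f2x, upscale2x(f1x)], axis=1)

f1x = np.random.randn(1, 8, 4, 4)    # last layer (smallest scale)
f2x = np.random.randn(1, 8, 8, 8)    # intermediate layer, 2x scale
f4x = np.random.randn(1, 8, 16, 16)  # intermediate layer, 4x scale
hyper = hyper_feature(f4x, f2x, f1x)
print(hyper.shape)                   # (1, 24, 8, 8)
```

Everything ends up at one spatial resolution, so the combined feature can feed the region proposal and classification heads directly.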

