ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
Abstract. Currently, neural network architecture design is mostly
guided by the indirect metric of computation complexity, i.e., FLOPs.
However, the direct metric, e.g., speed, also depends on other factors
such as memory access cost and platform characteristics. Thus, this work
proposes to evaluate the direct metric on the target platform, beyond
only considering FLOPs. Based on a series of controlled experiments,
this work derives several practical guidelines for efficient network de-
sign. Accordingly, a new architecture is presented, called ShuffleNet V2.
Comprehensive ablation experiments verify that our model is state-
of-the-art in terms of the speed-accuracy tradeoff.
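The "direct metric" the abstract argues for (actual speed on the target platform, rather than FLOP counts) can be approximated with a simple wall-clock harness. This is a generic measurement sketch, not code from the paper; the warmup/iteration counts are arbitrary choices:

```python
import time

def measure_latency(fn, warmup=10, iters=100):
    """Measure the average wall-clock latency of a callable.

    Timing the actual operation on the target platform captures effects
    (memory access cost, platform characteristics) that FLOP counting
    misses -- the point the ShuffleNet V2 abstract makes.
    """
    # Warmup runs let caches, JITs, and frequency scaling settle.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    # Average seconds per call.
    return (time.perf_counter() - start) / iters
```

Any layer or model forward pass can stand in for `fn`; comparing two candidate blocks this way ranks them by the direct metric rather than by FLOPs.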
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
Abstract
We introduce an extremely computation-efficient CNN
architecture named ShuffleNet, which is designed specially
for mobile devices with very limited computing power (e.g.,
10-150 MFLOPs). The new architecture utilizes two new
operations, pointwise group convolution and channel shuf-
fle, to greatly reduce computation cost while maintaining
accuracy. Experiments on ImageNet classification and MS
COCO object detection demonstrate the superior perfor-
mance of ShuffleNet over other structures, e.g., lower top-1
error (absolute 7.8%) than the recent MobileNet [12] on the
ImageNet classification task, under a computation budget of
40 MFLOPs. On an ARM-based mobile device, ShuffleNet
achieves ∼13× actual speedup over AlexNet while main-
taining comparable accuracy.
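The channel shuffle operation the abstract names has a simple form: reshape the channel dimension to (groups, channels_per_group), transpose, and flatten, so that channels produced by different group convolutions get interleaved. A minimal sketch over a flat channel list (the real operation acts on a 4-D feature tensor, but the index permutation is the same):

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet's channel shuffle).

    Equivalent to: reshape to (groups, n) -> transpose -> flatten,
    which lets information flow between group convolutions.
    """
    n = len(channels) // groups
    assert n * groups == len(channels), "channel count must divide by groups"
    # After the shuffle, position (i, g) holds the element that was at (g, i).
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

# Six channels in two groups: [0,1,2] from group 0, [3,4,5] from group 1.
# After shuffling, every consecutive pair mixes both source groups.
print(channel_shuffle([0, 1, 2, 3, 4, 5], 2))  # -> [0, 3, 1, 4, 2, 5]
```

Because it is a fixed permutation, the operation costs no FLOPs to speak of, which is why it pairs well with cheap pointwise group convolutions.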
Fine-Grained Head Pose Estimation Without Keypoints
Abstract
Estimating the head pose of a person is a crucial prob-
lem that has a large amount of applications such as aiding
in gaze estimation, modeling attention, fitting 3D models
to video and performing face alignment. Traditionally head
pose is computed by estimating some keypoints from the tar-
get face and solving the 2D to 3D correspondence problem
with a mean human head model. We argue that this is a
fragile method because it relies entirely on landmark detec-
tion performance, the extraneous head model and an ad-hoc
fitting step. We present an elegant and robust way to deter-
mine pose by training a multi-loss convolutional neural net-
work on 300W-LP, a large synthetically expanded dataset,
to predict intrinsic Euler angles (yaw, pitch and roll) di-
rectly from image intensities through joint binned pose clas-
sification and regression. We present empirical tests on
common in-the-wild pose benchmark datasets which show
state-of-the-art results. Additionally we test our method on
a dataset usually used for pose estimation using depth and
start to close the gap with state-of-the-art depth pose meth-
ods. We open-source our training and testing code as well
as release our pre-trained models.
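The "joint binned pose classification and regression" idea reduces to a small computation: classify the angle into coarse bins, then recover a continuous angle as the softmax expectation over bin centers. A sketch of that readout step; the 3-degree bins over [-99, 99) follow the released code for this method and should be treated as an assumption here:

```python
import math

def expected_angle(logits, bin_width=3.0, angle_min=-99.0):
    """Continuous Euler angle from per-bin classification scores.

    Softmax over the bins, then the expectation of the bin centers,
    giving a fine-grained angle from a coarse binned classifier.
    """
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Center of bin i is angle_min + bin_width * (i + 0.5).
    centers = [angle_min + bin_width * (i + 0.5) for i in range(len(logits))]
    return sum(p * c for p, c in zip(probs, centers))
```

At training time this pairs a cross-entropy loss on the bins with a regression loss on the expectation, which is the multi-loss setup the abstract refers to.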
MobileNetV2: Inverted Residuals and Linear Bottlenecks
Abstract
In this paper we describe a new mobile architecture,
MobileNetV2, that improves the state of the art perfor-
mance of mobile models on multiple tasks and bench-
marks as well as across a spectrum of different model
sizes. We also describe efficient ways of applying these
mobile models to object detection in a novel framework
we call SSDLite. Additionally, we demonstrate how
to build mobile semantic segmentation models through
a reduced form of DeepLabv3 which we call Mobile
DeepLabv3.
MobileNetV2 is based on an inverted residual structure where
the shortcut connections are between the thin bottle-
neck layers. The intermediate expansion layer uses
lightweight depthwise convolutions to filter features as
a source of non-linearity. Additionally, we find that it is
important to remove non-linearities in the narrow layers
in order to maintain representational power. We demon-
strate that this improves performance and provide an in-
tuition that led to this design.
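The block structure the abstract describes (thin input, wide expansion with a non-linearity, linear projection back, shortcut between the thin layers) can be illustrated at the shape level. The "weights" below are placeholders (replication and averaging), so only the data flow is meaningful, not the learned filtering:

```python
def inverted_residual(x, expand_ratio=6):
    """Structural sketch of an inverted residual block on one channel vector.

    expand (1x1) -> non-linearity in the wide space -> LINEAR projection
    back to the thin dimension -> shortcut between the thin layers.
    Real blocks use learned 1x1 and depthwise convolutions; here the ops
    are identity-like placeholders.
    """
    c = len(x)
    # 1x1 expansion to c * expand_ratio channels (placeholder: replicate).
    expanded = [v for v in x for _ in range(expand_ratio)]
    # Non-linearity (ReLU6) applied only in the wide expansion space.
    activated = [min(max(v, 0.0), 6.0) for v in expanded]
    # Linear 1x1 projection back to c channels -- deliberately NO
    # non-linearity here, which the paper argues preserves
    # representational power in the narrow layers.
    projected = [sum(activated[i * expand_ratio:(i + 1) * expand_ratio]) / expand_ratio
                 for i in range(c)]
    # The shortcut connects the thin bottleneck layers, not the wide ones.
    return [a + b for a, b in zip(x, projected)]
```

The inversion relative to a classic residual block is that the shortcut skips over the *wide* part: memory-heavy activations live only inside the block.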
Finally, our approach allows decoupling of the in-
put/output domains from the expressiveness of the trans-
formation, which provides a convenient framework for
further analysis. We measure our performance on
ImageNet [1] classification, COCO object detection [2],
VOC image segmentation [3]. We evaluate the trade-offs
between accuracy, and number of operations measured
by multiply-adds (MAdd), as well as actual latency, and
the number of parameters.
DSFD: Dual Shot Face Detector
Abstract
Recently, Convolutional Neural Network (CNN) has
achieved great success in face detection. However, it re-
mains a challenging problem for the current face detection
methods owing to high degree of variability in scale, pose,
occlusion, expression, appearance and illumination. In this
paper, we propose a novel face detection network named
Dual Shot Face Detector (DSFD), which inherits the archi-
tecture of SSD and introduces a Feature Enhance Module
(FEM) that transfers the original feature maps to extend the
single-shot detector into a dual-shot detector. Specifically, a
Progressive Anchor Loss (PAL), computed using two sets of
anchors, is adopted to effectively facilitate the features. Ad-
ditionally, we propose an Improved Anchor Matching (IAM)
method that integrates novel data augmentation techniques
and an anchor design strategy into our DSFD to provide better
initialization for the regressor. Extensive experiments on
popular benchmarks: WIDER FACE (easy: 0.966, medium:
0.957, hard: 0.904) and FDDB (discontinuous: 0.991,
continuous: 0.862) demonstrate the superiority of DSFD
over the state-of-the-art face detectors (e.g., PyramidBox
and SRN). Code will be made available upon publication.
Cascade R-CNN: Delving into High Quality Object Detection
In object detection, an intersection over union (IoU)
threshold is required to define positives and negatives. An
object detector, trained with low IoU threshold, e.g. 0.5,
usually produces noisy detections. However, detection per-
formance tends to degrade as the IoU threshold increases.
Two main factors are responsible for this: 1) overfitting
during training, due to exponentially vanishing positive
samples, and 2) an inference-time mismatch between the
IoUs for which the detector is optimal and those of the in-
put hypotheses. A multi-stage object detection architecture,
the Cascade R-CNN, is proposed to address these prob-
lems. It consists of a sequence of detectors trained with
increasing IoU thresholds, to be sequentially more selec-
tive against close false positives. The detectors are trained
stage by stage, leveraging the observation that the out-
put of a detector is a good distribution for training the
next higher quality detector. The resampling of progres-
sively improved hypotheses guarantees that all detectors
have a positive set of examples of equivalent size, reduc-
ing the overfitting problem. The same cascade procedure
is applied at inference, enabling a closer match between
the hypotheses and the detector quality of each stage. A
simple implementation of the Cascade R-CNN is shown to
surpass all single-model object detectors on the challeng-
ing COCO dataset. Experiments also show that the Cas-
cade R-CNN is widely applicable across detector architec-
tures, achieving consistent gains independently of the base-
line detector strength. The code will be made available at
https://github.com/zhaoweicai/cascade-rcnn.
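The core mechanism is concrete: each cascade stage labels proposals as positive or negative under a progressively higher IoU threshold. A sketch of that labeling step; the 0.5/0.6/0.7 schedule matches the setting described for Cascade R-CNN, while the boxes and helper names are illustrative:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def stage_positives(proposals, gt, thresholds=(0.5, 0.6, 0.7)):
    """Proposals labeled positive at each cascade stage.

    Thresholds increase stage by stage, so later stages are more
    selective against close false positives; in the full method each
    stage also resamples the *refined* boxes from the previous stage,
    which keeps the positive set from vanishing.
    """
    return [[p for p in proposals if iou(p, gt) >= t] for t in thresholds]
```

The point the abstract makes is visible here: with a single fixed set of proposals, raising the threshold shrinks the positive set rapidly, which is why each stage instead consumes the improved hypotheses of the stage before it.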
Deep Learning (latest Chinese edition, PDF)
Deep Learning book, latest high-resolution PDF from September 2017, beta edition
Chapter 1
Introduction
1.1 Who Should Read This Book
1.2 Historical Trends in Deep Learning
1.2.1 The Many Names and Changing Fortunes of Neural Networks
1.2.2 Increasing Dataset Sizes
1.2.3 Increasing Model Sizes
1.2.4 Increasing Accuracy, Complexity, and Real-World Impact
How to Write makefile
On writing makefiles: How to Write makefile.pdf. English-only edition.