Note: Benchmark Analysis of Representative Deep Neural Network Architectures

ABSTRACT

The performance indices of DNNs considered in the paper:
recognition accuracy, model complexity, computational complexity, memory usage, and inference time.
The hardware platforms used in the experiments:
an NVIDIA Titan X Pascal: about 12 TeraFlops
an NVIDIA Jetson TX1 board: about 1 TeraFlops
The dataset used in the experiments: ImageNet-1k
Goals of the paper:
1. a complete analysis of existing DNNs for image recognition;
2. an analysis on two hardware platforms with very different computational capacities.

1. INTRODUCTION

The key findings of the paper are summarized first:
Finding 1: recognition accuracy does not increase as the computational complexity increases; the two are not directly correlated.
Finding 2: there is no linear relationship between model complexity and accuracy.
Finding 3: the desired throughput places an upper bound on the achievable accuracy.
Finding 4: not all DNN models use their parameters with the same level of efficiency.
Finding 5: almost all models are capable of super real-time performance on a high-end GPU, while only some of them can guarantee it on an embedded system.
Finding 6: even DNNs with a very low level of model complexity have a minimum GPU memory footprint of about 0.6 GB.

PERFORMANCE INDICES

A. ACCURACY RATE

Slightly better performance can be achieved by considering the average prediction over multiple crops (four corners plus the central crop, and their horizontal flips).
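A minimal sketch of this ten-crop evaluation, assuming a pretrained PyTorch `model` and a PIL image `img` (both names and the 224-pixel crop size are assumptions, not from the paper; input normalization is omitted for brevity):

```python
import torch
from torchvision import transforms
from torchvision.transforms import functional as TF

# Ten crops: four corners + center, plus their horizontal flips.
ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack([TF.to_tensor(c) for c in crops])),
])

crops = ten_crop(img)                        # shape: (10, 3, 224, 224)
with torch.no_grad():
    probs = model(crops).softmax(dim=-1)     # per-crop class probabilities
    avg = probs.mean(dim=0)                  # average prediction over the 10 crops
top5 = avg.topk(5).indices
```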

B. MODEL COMPLEXITY

We collect the size of the parameter file, in MB, for each of the considered models.
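A minimal sketch of this index, assuming a PyTorch model with 32-bit weights (both assumptions; the paper simply reads the size of each model's parameter file):

```python
import torch

def model_complexity(model: torch.nn.Module):
    """Number of parameters and estimated parameter-file size in MB."""
    n_params = sum(p.numel() for p in model.parameters())
    size_mb = n_params * 4 / 1024 ** 2  # assumes float32 (4 bytes per weight)
    return n_params, size_mb
```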

C. MEMORY USAGE

Memory usage has two components: the memory allocated for the network model and the memory required while processing the batch.
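One possible way to measure the two contributions with PyTorch is sketched below; the tooling is an assumption (the paper does not prescribe it), and note that `torch.cuda.max_memory_allocated` tracks only PyTorch's own allocations, so driver-level readings such as nvidia-smi report larger totals:

```python
import torch

def memory_usage_mb(model, batch):
    """Peak GPU memory after loading the model and after one forward pass."""
    torch.cuda.reset_peak_memory_stats()
    model = model.cuda().eval()
    static_mb = torch.cuda.max_memory_allocated() / 1024 ** 2  # model weights
    with torch.no_grad():
        model(batch.cuda())  # processing memory at the chosen batch size
    total_mb = torch.cuda.max_memory_allocated() / 1024 ** 2
    return static_mb, total_mb
```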

D. COMPUTATIONAL COMPLEXITY

Multiply-adds are counted as two FLOPs.
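A short sketch of this convention for a 2D convolution layer (bias and grouping ignored; the example shapes are the first convolution of ResNet-50, used here only as an illustration):

```python
# For a 2D convolution, the multiply-add (MAC) count is
#   H_out * W_out * C_out * (C_in * K_h * K_w).
def conv2d_flops(h_out, w_out, c_in, c_out, k_h, k_w):
    macs = h_out * w_out * c_out * c_in * k_h * k_w
    return 2 * macs  # one multiply-add counted as two FLOPs

# e.g. ResNet-50's first layer: 7x7 conv, 3 -> 64 channels, 112x112 output
print(conv2d_flops(112, 112, 3, 64, 7, 7) / 1e9, "GFLOPs")  # ~0.24 GFLOPs
```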

E. INFERENCE TIME

We measure inference time in milliseconds per image.
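A minimal timing sketch, assuming PyTorch on a CUDA device: warm-up runs and explicit synchronization are needed because GPU execution is asynchronous, and the average over 10 runs matches the protocol quoted in Section C below.

```python
import time
import torch

def inference_time_ms(model, batch, runs=10, warmup=3):
    """Average per-image inference time in milliseconds over `runs` runs."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):       # warm-up: exclude one-time setup costs
            model(batch)
        torch.cuda.synchronize()      # wait for pending GPU work before timing
        start = time.time()
        for _ in range(runs):
            model(batch)
        torch.cuda.synchronize()
    return (time.time() - start) / runs / batch.size(0) * 1000
```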

RESULTS

A. ACCURACY-RATE VS COMPUTATIONAL COMPLEXITY VS MODEL COMPLEXITY

FIGURE 1: Ball chart reporting the Top-1 and Top-5 accuracy vs. computational complexity. Top-1 and Top-5 accuracy using
only the center crop versus floating-point operations (FLOPs) required for a single forward pass are reported. The size of each ball corresponds to the model complexity. (a) Top-1; (b) Top-5.

For DNN models of known model complexity, the computational complexity of recognizing the same input sample is reported on the x-axis and the recognition accuracy on the y-axis (a sketch of such a ball chart follows the list below).
1. NASNet-A-Large: the highest Top-1 and Top-5 accuracy, and also the highest computational complexity.
2. SE-ResNeXt-50 (32x4d): very high Top-1 and Top-5 accuracy with a relatively low level of model complexity.
3. SENet-154 vs. SE-ResNeXt-101 (32x4d): confirms Finding 2.
4. VGG-13 vs. ResNet-18: confirms Finding 1.
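A sketch of how such a ball chart can be produced with matplotlib; the GFLOPs and parameter counts below are approximate illustrative values, not the paper's full table:

```python
import matplotlib.pyplot as plt

models   = ["ResNet-18", "VGG-13", "SE-ResNeXt-50 (32x4d)", "NASNet-A-Large"]
gflops   = [1.8, 11.3, 4.2, 23.8]    # forward-pass FLOPs (x-axis), approximate
top1     = [69.8, 69.9, 79.1, 82.5]  # Top-1 accuracy (y-axis)
params_m = [11.7, 133, 27.6, 88.9]   # parameters in millions (ball size)

plt.scatter(gflops, top1, s=[p * 10 for p in params_m], alpha=0.5)
for x, y, name in zip(gflops, top1, models):
    plt.annotate(name, (x, y))
plt.xlabel("GFLOPs")
plt.ylabel("Top-1 accuracy [%]")
plt.show()
```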

B. ACCURACY-RATE VS LEARNING POWER

It is known that DNNs do not use their full learning power efficiently. Although many papers exploit this fact to produce compressed DNN models with the same accuracy as the original models [24], here we want to measure how efficiently each model uses its parameters.
accuracy density [4]: Top-1 accuracy divided by the number of parameters.
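A one-line sketch of the definition (the example numbers are hypothetical):

```python
def accuracy_density(top1_percent: float, n_params: int) -> float:
    """Top-1 accuracy divided by the number of parameters."""
    return top1_percent / n_params

# e.g. a hypothetical model with 70% Top-1 accuracy and 5 million parameters
print(accuracy_density(70.0, 5_000_000))  # 1.4e-05 accuracy points per parameter
```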

FIGURE 2: Top-1 accuracy density (a) and Top-1 accuracy vs. Top-1 accuracy density (b). The accuracy density measures how efficiently each model uses its parameters.
1. In Figure 2(a) it can be seen that the models that use their parameters most efficiently are the SqueezeNets, ShuffleNet, the MobileNets, and NASNet-A-Mobile.
2. Among these, NASNet-A-Mobile and MobileNet-v2 are the two providing a much higher Top-1 accuracy.
3. Among the models with a Top-1 accuracy higher than 80%, Inception-v4 and SE-ResNeXt-101 (32x4d) are the ones using their parameters most efficiently.

C. INFERENCE TIME

Average per-image inference time over 10 runs.
TABLE 1: Inference time vs. batch size. Inference time per image is estimated across different batch sizes for the Titan Xp (left) and Jetson TX1 (right). Missing data are due to the lack of enough system memory required to process the larger batches.

D. ACCURACY-RATE VS INFERENCE TIME

FIGURE 3: Top-1 accuracy vs. number of images processed per second (batch size 1) on the Titan Xp (a) and Jetson TX1 (b).

For inputs with batch size 1, the number of frames the DNNs infer per second is reported on the x-axis and the recognition accuracy on the y-axis; (a) shows the measurements on the Titan Xp, (b) those on the Jetson TX1.
Fitting a linear upper bound to each plot, the intercepts are very similar, while the slopes show that the Titan Xp guarantees a lower decay of the maximum achievable accuracy when a larger throughput is needed.
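A small sketch of how such an upper bound can be fitted, using the four Titan Xp operating points listed just below; fitting against the logarithm of the throughput is an assumption made here for the sketch:

```python
import numpy as np

fps  = np.array([250, 125, 62.5, 30])          # throughput targets (images/s)
top1 = np.array([73.27, 78.79, 79.11, 82.50])  # best Top-1 accuracy at each target

# slope < 0: accuracy decays as the required throughput grows; a gentler
# slope (as on the Titan Xp) means a lower decay of the achievable accuracy.
slope, intercept = np.polyfit(np.log2(fps), top1, deg=1)
```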

Analysis of some of the models on the Titan Xp follows; the results on the Jetson TX1 are analogous and are not repeated here.
ResNet-34: a throughput of more than 250 FPS with 73.27% Top-1 accuracy.
Xception: a target of more than 125 FPS with 78.79% Top-1 accuracy.
SE-ResNeXt-50 (32x4d): a target of more than 62.5 FPS with 79.11% Top-1 accuracy.
NASNet-A-Large: a target of more than 30 FPS with 82.50% Top-1 accuracy.

F. MEMORY USAGE VS MODEL COMPLEXITY

TABLE 2: Memory consumption of the different DNN models considered on the Titan Xp for different batch sizes.

FIGURE 4: Plot of the initial static allocation of the model parameters (i.e. the model complexity) and the total memory
utilization with batch size 1 on the Titan Xp.
MEMORY USAGE: the initial static allocation of the model parameters and the total memory utilization for a batch size of 1 on the Titan Xp.
Conclusion: the relationship between the two is roughly linear, which means that the model complexity can be used to reliably estimate the total memory utilization.
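A sketch of checking this linear relationship by regression; the five (model size, memory) pairs are hypothetical placeholders, chosen to be consistent with the ~0.6 GB minimum footprint noted in the findings:

```python
import numpy as np

model_size_mb = np.array([5, 50, 100, 250, 500])      # parameter file sizes
total_mem_mb  = np.array([640, 700, 790, 980, 1300])  # measured, batch size 1

slope, intercept = np.polyfit(model_size_mb, total_mem_mb, deg=1)
estimate = lambda size_mb: slope * size_mb + intercept  # memory from complexity
```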

G. BEST DNN AT GIVEN CONSTRAINTS

Given constraints on memory usage and inference speed, we can look at how the DNN models perform on the Titan Xp and the Jetson TX1 respectively, and thus select the most suitable DNN model under the given constraints (a code sketch of this selection follows the examples below).
TABLE 3: Top 5 models (sorted in decreasing Top-1 accuracy) satisfying memory consumption (≤0.7GB, ≤1.0GB, ≤1.4GB)
and inference speed (≥15FPS, ≥30FPS, ≥60FPS) constraints on the Titan Xp (a) and Jetson TX1 (b).
Titan Xp:
DPN-68: with the low memory usage constraint, a recognition accuracy of at most 75.95% can be achieved (by using the DPN-68 network), independently of the inference time allowed.
SE-ResNeXt-50 (32x4d): under the medium and high memory usage constraints, 79.11% Top-1 accuracy with super real-time throughput.

Jetson TX1:
MobileNet-v1: 69.52% Top-1 accuracy with a super real-time throughput.
ResNet-50: able to guarantee a half real-time throughput, with a recognition accuracy of 76.01%.
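The selection behind Table 3 amounts to a filter-and-sort over the measured tables. A minimal sketch follows; the three records are illustrative values taken from the text above (memory and FPS figures are assumptions), not the full benchmark:

```python
records = [
    # (name, Top-1 accuracy %, memory GB, throughput FPS) -- illustrative
    ("MobileNet-v1", 69.52, 0.65, 65.0),
    ("ResNet-50", 76.01, 0.90, 16.0),
    ("SE-ResNeXt-50 (32x4d)", 79.11, 1.00, 65.0),
]

def best_under(records, max_mem_gb, min_fps, k=5):
    """Top-k models by Top-1 accuracy among those meeting both constraints."""
    feasible = [r for r in records if r[2] <= max_mem_gb and r[3] >= min_fps]
    return sorted(feasible, key=lambda r: -r[1])[:k]

print(best_under(records, max_mem_gb=1.0, min_fps=30))  # the <=1.0GB, >=30FPS cell
```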

CONCLUSION

The design of deep neural networks (DNNs) of increasing complexity able to improve the performance on the ImageNet-1k competition plays a central role in advancing the state of the art also on other vision tasks.

The key findings of this paper are the following:

  • the recognition accuracy does not increase as the number of operations increases: in fact, there are some architectures that, with a relatively low number of operations, such as the SE-ResNeXt-50 (32x4d), achieve very high accuracy (see Figures 1a and b). This finding is independent of the computer architecture used in the experiments;
  • there is not a linear relationship between model complexity and accuracy (see Figures 1a and b);
  • not all the DNN models use their parameters with the same level of efficiency (see Figures 2a and b);
  • the desired throughput (expressed for example as the number of inferences per second) places an upper bound on the achievable accuracy (see Figures 3a and b);
  • model complexity can be used to reliably estimate the total memory utilization (see Figure 4);
  • almost all models are capable of real-time or super real-time performance on a high-end GPU, while just a few of them can guarantee it on an embedded system (see Tables 1a and b);
  • even DNNs with a very low level of model complexity have a minimum GPU memory footprint of about 0.6 GB (see Table 2).